The unified SFPCA (Sparse and Functional Principal Component Analysis) method enjoys many advantages over existing approaches to regularized PCA, because it

  • allows for arbitrary degrees and forms of regularization;

  • unifies many existing methods;

  • admits a tractable, efficient, and theoretically well-grounded algorithm.

The problem is formulated as follows.

\[\max_{u,\,v}\, {u}^{T} {X} {v}-\lambda_{{u}} P_{{u}}({u})-\lambda_{{v}} P_{{v}}({v})\] \[\text{s.t. } \| u \| _ {S_u} \leq 1, \, \| v \| _ {S_v} \leq 1.\] Typically, we take \({S}_{{u}}={I}+\alpha_{{u}} {\Omega}_{{u}}\), where \(\Omega_u\) is a second- or fourth-difference matrix, so that the constraint \(\|u \|_{S_u} \leq 1\) encourages smoothness in the estimated singular vectors. \(P_u\) and \(P_v\) are sparsity-inducing penalties that satisfy the following conditions:

  • \(P \geq 0\); that is, \(P\) takes values in \([0,+\infty)\);

  • \(P(cx) = c P(x)\) for all \(c > 0\), i.e., \(P\) is positively homogeneous of order one.
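For intuition, both conditions hold for the absolute-value (LASSO) penalty \(P(x) = \sum_i |x_i|\). A quick numerical check in base R (illustration only):

```r
# LASSO penalty P(x) = sum(|x_i|): non-negative and positively homogeneous
P <- function(x) sum(abs(x))

x <- c(-1.5, 0.3, 2.0)
P(x) >= 0                              # non-negativity: TRUE
isTRUE(all.equal(P(4 * x), 4 * P(x)))  # P(cx) = c P(x) for c = 4: TRUE
```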

A Wide Range of Modeling Options

Currently, the package supports arbitrary combinations of the following.

Various sparsity-inducing penalties

So far, we have incorporated the following penalties in MoMA. The code shown under a penalty is only an example specification; the arguments should be carefully tailored to your particular data set.

  • LASSO (least absolute shrinkage and selection operator), see moma_lasso;
  • SCAD (smoothly clipped absolute deviation), see moma_scad;
# `gamma` is the non-convexity parameter
moma_scad(gamma = 3, non_negative = TRUE)
  • MCP (minimax concave penalty), see moma_mcp;
# `gamma` is the non-convexity parameter
moma_mcp(gamma = 3, non_negative = TRUE)
  • SLOPE (sorted \(\ell\)-one penalized estimation), see moma_slope;
  • Group LASSO, see moma_grplasso;
  • Fused LASSO, see moma_fusedlasso;
  • L1 trend filtering, see moma_l1tf;
  • Sparse fused LASSO, see moma_spfusedlasso;
  • Cluster penalty, see moma_cluster.
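As a sketch of how a penalty specification plugs into a decomposition, a LASSO penalty restricted to non-negative loadings might be passed to moma_sfpca as below. The argument name `v_sparse` is an assumption here, not confirmed package API; consult the moma_sfpca documentation for the actual interface.

```r
# Hypothetical sketch: `v_sparse` is an assumed argument name, following
# the penalty-constructor style shown above
res <- moma_sfpca(X, v_sparse = moma_lasso(non_negative = TRUE))
```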

Parameter selection schemes

  • Exhaustive search

  • Nested BIC. See select_scheme for details.

Multivariate methods

  • PCA (Principal Component Analysis). See moma_sfpca.

  • CCA (Canonical Correlation Analysis). See moma_sfcca.

  • LDA (Linear Discriminant Analysis). See moma_sflda.

  • PLS (Partial Least Squares) (TODO)

  • Correspondence Analysis (TODO)

Deflation schemes

  • Hotelling’s deflation (PCA)

  • Projection deflation (PCA, CCA, LDA)

  • Schur complement deflation (PCA)

Excellent User Experience

  • Easy-to-use functions. Let \(\Delta\) be a second-difference matrix of appropriate size, such that \(u^T\Delta u = \sum_i (u_{i} - u_{i-1} )^2\). For a matrix \(X\), one line of code can solve the following penalized singular value decomposition problem:

\[\max_{u,\, v} \, {u}^{T} {X} {v} - 4 \sum_i | v_i - v_{i-1}|\] \[ \text{s.t. } u^T(I + 3 \Delta) u \leq 1,\, v^Tv \leq 1.\]
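The claim about \(\Delta\) can be verified directly in base R: the chain-graph second-difference matrix satisfies \(u^T \Delta u = \sum_i (u_i - u_{i-1})^2\). The moma_sfpca call at the end is a hypothetical sketch only; the argument names `u_smooth`, `v_sparse`, and the constructor `moma_smoothness` are assumptions, not confirmed package API.

```r
# Build the second-difference matrix Delta of size n, so that
# t(u) %*% Delta %*% u equals sum((u[i] - u[i-1])^2)
second_diff_matrix <- function(n) {
  D <- diff(diag(n))  # (n-1) x n first-difference operator
  t(D) %*% D
}

u <- rnorm(5)
Delta <- second_diff_matrix(5)
all.equal(drop(t(u) %*% Delta %*% u), sum(diff(u)^2))  # TRUE

# Hypothetical one-line call solving the problem above (argument
# names are assumptions, not the confirmed MoMA interface):
# res <- moma_sfpca(X, u_smooth = moma_smoothness(Delta, alpha = 3),
#                   v_sparse = moma_fusedlasso(lambda = 4))
```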

  • R6 methods for accessing the results.

  • Shiny support for interacting with MoMA results.

  • Fast. MoMA uses the Rcpp and RcppArmadillo libraries for speed (Eddelbuettel and François 2011; Eddelbuettel and Sanderson 2014; Sanderson and Curtin 2016).

References

Eddelbuettel, Dirk, and Romain François. 2011. “Rcpp: Seamless R and C++ Integration.” Journal of Statistical Software 40 (8): 1–18. https://doi.org/10.18637/jss.v040.i08.

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics and Data Analysis 71: 1054–63. https://doi.org/10.1016/j.csda.2013.02.005.

Sanderson, Conrad, and Ryan Curtin. 2016. “Armadillo: A Template-Based C++ Library for Linear Algebra.” Journal of Open Source Software 1 (2): 26. https://doi.org/10.21105/joss.00026.