Principal Component Analysis with MoMA

The `mtcars` Data Set

According to the R documentation for the `mtcars` (Motor Trend Car Road Tests) data set, “the data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models)”.

The data format is as follows.

[, 1]   mpg: Miles/(US) gallon
[, 2]   cyl: Number of cylinders
[, 3]   disp: Displacement (cu.in.)
[, 4]   hp: Gross horsepower
[, 5]   drat: Rear axle ratio
[, 6]   wt: Weight (1000 lbs)
[, 7]   qsec: 1/4 mile time
[, 8]   vs: Engine (0 = V-shaped, 1 = straight)
[, 9]   am: Transmission (0 = automatic, 1 = manual)
[,10]   gear: Number of forward gears
[,11]   carb: Number of carburetors
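
The column layout above can be checked directly in R. The following is a small sketch that uses only the `datasets` package shipped with R; the helper name `binary_cols` is introduced here for illustration.

```r
# Load the built-in data set: 32 cars (rows) by 11 variables (columns)
cars <- datasets::mtcars
dim(cars)
colnames(cars)

# Find the columns that only take the values 0 and 1,
# i.e. the two categorical variables `vs` and `am`
binary_cols <- names(cars)[sapply(cars, function(col) all(col %in% c(0, 1)))]
binary_cols
```

These two 0/1 columns are the ones dropped before the analysis below.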

Suppose we are interested in creating new “features” that

are linear combinations of known features,
capture as much information as possible, and
consist of only some of the known features.

Then we can use the `moma_spca()` function, which performs sparse principal component analysis (SPCA).

library(MoMA)

# Drop the two categorical variables:
# X[, 8]: Engine (0 = V-shaped, 1 = straight)
# X[, 9]: Transmission (0 = automatic, 1 = manual)
X <- as.matrix(datasets::mtcars[, c(1:7, 10, 11)])
# Number of penalty levels in the lambda grid
lambda_len <- 30

a <- moma_spca(X,
    center = TRUE, scale = TRUE,
    v_sparse = moma_lasso(
        lambda = seq(0, 5, length.out = lambda_len)
    ),
    rank = 2
)
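
For comparison, it can help to look at ordinary (dense) PCA on the same matrix. The sketch below uses `prcomp()` from base R's `stats` package, not MoMA, so it is only a baseline and makes no claim about how `moma_spca()` works internally.

```r
# Same nine numeric columns as in the MoMA call above
X <- as.matrix(datasets::mtcars[, c(1:7, 10, 11)])

# Ordinary PCA on centered and scaled data
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings of the first two principal components
dense_loadings <- pca$rotation[, 1:2]
print(round(dense_loadings, 3))

# Count loadings that are exactly zero; dense PCA usually has none,
# so every variable contributes to every component
sum(dense_loadings == 0)
```

A sparsity-inducing penalty such as the lasso is what allows some loadings to become exactly zero, which is the third requirement listed earlier.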

Access the Results

Get the loadings by `get_mat_by_index()$V`.

ld_10 <- a$get_mat_by_index(lambda_v = 10)

cat("chosen lambda for the first PC = ", ld_10$chosen_lambda_v[1], "\n")

## chosen lambda for the first PC =  1.551724

cat("chosen lambda for the second PC = ", ld_10$chosen_lambda_v[2], "\n")

## chosen lambda for the second PC =  1.551724
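
The printed value is consistent with `lambda_v = 10` being an index into the grid passed to `moma_lasso()`, not the penalty value itself. Rebuilding the same grid in base R shows the correspondence:

```r
# The grid of penalty levels used in the call to moma_lasso()
lambda_len <- 30
lambda_grid <- seq(0, 5, length.out = lambda_len)

# The 10th grid point matches the "chosen lambda" reported above
lambda_grid[10]
## [1] 1.551724
```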

print(ld_10$V)

##             PC1        PC2
## mpg  -0.4180873  0.0000000
## cyl   0.4291761  0.0000000
## disp  0.4259293  0.0000000
## hp    0.3736823  0.1333281
## drat -0.2977026  0.2279502
## wt    0.3925498  0.0000000
## qsec -0.1495423 -0.4990620
## gear -0.1446716  0.6250596
## carb  0.1842326  0.5389804
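
To see the sparsity at a glance, one can count the non-zero loadings per component. The sketch below re-enters the printed matrix by hand so that it runs without MoMA; with the package loaded, `ld_10$V` could be used in place of `V`.

```r
# Loadings copied from the output above
V <- matrix(
    c(-0.4180873,  0.0000000,
       0.4291761,  0.0000000,
       0.4259293,  0.0000000,
       0.3736823,  0.1333281,
      -0.2977026,  0.2279502,
       0.3925498,  0.0000000,
      -0.1495423, -0.4990620,
      -0.1446716,  0.6250596,
       0.1842326,  0.5389804),
    ncol = 2, byrow = TRUE,
    dimnames = list(
        c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "gear", "carb"),
        c("PC1", "PC2")
    )
)

# Number of variables each component actually uses
colSums(V != 0)

# Variables that enter the second component
pc2_vars <- rownames(V)[V[, "PC2"] != 0]
pc2_vars
```

At this penalty level the first component still uses all nine variables, while the second uses only five (`hp`, `drat`, `qsec`, `gear`, `carb`).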

Project New Data

Project new data onto the space spanned by the new principal components.

# Let's pretend the first ten rows of `mtcars` are new data
newX <- X[1:10, ]

# `rank = 2`: the dimension of the projected space
# `lambda_v`: an integer index into the lambda grid, selecting the penalty level
a$left_project(newX = newX, rank = 2, lambda_v = 5)$proj_data

##                           PC1        PC2
## Mazda RX4         -0.76202053  1.1568861
## Mazda RX4 Wag     -0.71582657  0.9756667
## Datsun 710        -2.27240343 -0.3454976
## Hornet 4 Drive    -0.09344371 -1.9725363
## Hornet Sportabout  1.63253137 -0.8512317
## Valiant            0.20790912 -2.4343268
## Duster 360         2.66912438  0.2977371
## Merc 240D         -1.98588037 -0.7146268
## Merc 230          -2.16332116 -1.2525665
## Merc 280          -0.41575484  0.6079251
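
The idea behind such a projection can be sketched in base R. The example below uses dense `prcomp()` loadings rather than MoMA's sparse ones, and the least-squares formula is an assumption made for illustration; it is not taken from the implementation of `left_project()` and will not reproduce the numbers above.

```r
X <- as.matrix(datasets::mtcars[, c(1:7, 10, 11)])
newX <- X[1:10, ]

# Fit on the full data; keep the first two loading vectors
pca <- prcomp(X, center = TRUE, scale. = TRUE)
V <- pca$rotation[, 1:2]

# Standardize the new rows with the training means and standard deviations
newX_std <- scale(newX, center = pca$center, scale = pca$scale)

# Least-squares coordinates of each row in the span of V.
# For orthonormal V this reduces to newX_std %*% V; the extra factor
# matters when the columns of V are not orthonormal, as sparse loadings may be.
proj <- newX_std %*% V %*% solve(crossprod(V))
print(round(proj, 3))
```

The key design point is that new data are centered and scaled with the statistics of the original data, not their own, so that old and new observations live in the same coordinate system.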

Start a Shiny App

Start a Shiny app to see how the penalty level affects the loadings and the 2-D projection of the original data.

plot(a)

Luofeng Liao

2019-08-26
