Fit a generalized linear model via maximum penalized likelihood using the exclusive lasso penalty. The regularization path is computed along a grid of values for the regularization parameter (lambda). The interface is intentionally similar to that of glmnet in the package of the same name.

exclusive_lasso(
  X,
  y,
  groups,
  family = c("gaussian", "binomial", "poisson"),
  weights,
  offset,
  nlambda = 100,
  lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
  lambda,
  standardize = TRUE,
  intercept = TRUE,
  lower.limits = rep(-Inf, nvars),
  upper.limits = rep(Inf, nvars),
  thresh = 1e-07,
  thresh_prox = thresh,
  skip_df = FALSE,
  algorithm = c("cd", "pg")
)

Arguments

X

The matrix of predictors (\(X \in \mathbb{R}^{n \times p}\))

y

The response vector (\(y\))

groups

An integer vector of length \(p\) indicating group membership. (Cf. the group argument of grpreg)

family

The GLM response type. (Cf. the family argument of glm)

weights

Weights applied to individual observations. If not supplied, all observations will be equally weighted. Will be re-scaled to sum to \(n\) if necessary. (Cf. the weights argument of lm)
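The re-scaling described above can be sketched in base R. This assumes a simple multiplicative normalization so that the weights sum to \(n\); rescale_weights is a hypothetical helper used only for illustration and is not a function exported by the package.

```r
# Hypothetical helper (not part of the package): rescale observation
# weights so that they sum to n, the number of observations.
rescale_weights <- function(weights) {
  n <- length(weights)
  stopifnot(all(weights >= 0), sum(weights) > 0)
  weights * n / sum(weights)
}

w <- c(1, 1, 2, 4)
w_scaled <- rescale_weights(w)
sum(w_scaled)  # equals length(w)
```

The relative weighting of observations is unchanged; only the overall scale is fixed.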

offset

A vector of length \(n\) included in the linear predictor.

nlambda

The number of lambda values to use in computing the regularization path. Note that the time to run is typically sublinear in the grid size due to the use of warm starts.

lambda.min.ratio

The smallest value of lambda to be used, as a fraction of the largest value of lambda used. Unlike the lasso, the exclusive lasso has no value of lambda at which the solution is entirely zero, but the lasso's lambda_max is still used as the largest value of the grid.
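As a rough illustration of how such a grid might be built, the sketch below computes the lasso's lambda_max for a Gaussian response under the "1/n" loss scaling and spaces the grid evenly on the log scale. Both the centering of y and the log spacing follow the glmnet convention and are assumptions here; the package's internal computation may differ.

```r
set.seed(1)
n <- 50
p <- 20
X <- scale(matrix(rnorm(n * p), ncol = p))  # centered and scaled predictors
y <- rnorm(n)

# Lasso lambda_max under the 1/n loss scaling (assumption: Gaussian
# family with an intercept, so y is centered first).
lambda_max <- max(abs(crossprod(X, y - mean(y)))) / n

nlambda <- 100
lambda.min.ratio <- ifelse(n < p, 0.01, 1e-04)

# Assumption: log-spaced grid from lambda_max down to
# lambda_max * lambda.min.ratio.
lambda <- exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
                  length.out = nlambda))
range(lambda) / lambda_max
```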

lambda

A user-specified sequence of lambdas to use.

standardize

Should X be centered and scaled before fitting?

intercept

Should the fitted model have an (unpenalized) intercept term?

lower.limits

Lower bounds for the coefficients (default -Inf). Can be either a scalar (applied to each coefficient) or a vector of length p (number of coefficients).

upper.limits

Upper bounds for the coefficients (default Inf). Can be either a scalar (applied to each coefficient) or a vector of length p (number of coefficients).

thresh

The convergence threshold for the proximal gradient or coordinate descent algorithm that solves the penalized regression problem.

thresh_prox

The convergence threshold for the coordinate descent algorithm that evaluates the proximal operator.

skip_df

Should the DF calculations be skipped? They are often slower than the actual model fitting; if calling exclusive_lasso repeatedly it may be useful to skip these calculations.

algorithm

Which algorithm to use, proximal gradient ("pg") or coordinate descent ("cd")? Empirically, coordinate descent appears to be faster for most problems (consistent with Campbell and Allen), but proximal gradient may be faster for certain problems with many small groups where the proximal operator may be evaluated quickly and to high precision.
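Which algorithm is faster on a given problem is easiest to check empirically. The sketch below times both on simulated data, using only arguments documented on this page. It assumes the package is installed under the name ExclusiveLasso (inferred from the option and class names on this page) and does nothing otherwise; timings will vary by machine and problem.

```r
set.seed(1)
n <- 100
p <- 200
groups <- rep(1:10, times = 20)
X <- matrix(rnorm(n * p), ncol = p)
y <- X[, 1:10] %*% rep(3, 10) + rnorm(n)

# Assumption: the package name is "ExclusiveLasso". The comparison is
# skipped if the package is not installed.
if (requireNamespace("ExclusiveLasso", quietly = TRUE)) {
  library(ExclusiveLasso)
  # skip_df = TRUE so the timing reflects the fitting algorithm only
  time_cd <- system.time(
    exclusive_lasso(X, y, groups, algorithm = "cd", skip_df = TRUE)
  )
  time_pg <- system.time(
    exclusive_lasso(X, y, groups, algorithm = "pg", skip_df = TRUE)
  )
  print(rbind(cd = time_cd, pg = time_pg)[, "elapsed"])
}
```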

Value

An object of class ExclusiveLassoFit containing

  • coef - A matrix of estimated coefficients

  • intercept - A vector of estimated intercepts if intercept=TRUE

  • X, y, groups, weights, offset - The data used to fit the model

  • lambda - The vector of \(\lambda\) used

  • df - An unbiased estimate of the degrees of freedom (see Theorem 5 in Campbell and Allen, 2017)

  • nnz - The number of non-zero coefficients at each value of \(\lambda\)

Details

Note that unlike Campbell and Allen (2017), we use the "1/n"-scaling of the loss function.

For the Gaussian case: $$\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda P(\beta, G)$$

For other GLMs: $$-\frac{1}{n}\ell(y, X\beta)+ \lambda P(\beta, G)$$
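The objectives above can be written out directly in base R. This is a minimal sketch assuming the penalty is \(P(\beta, G) = \frac{1}{2}\sum_{g \in G} \|\beta_g\|_1^2\), the form given by Campbell and Allen (2017); this page does not define \(P\), so the factor of 1/2 is an assumption. The helper names exclusive_penalty and gaussian_objective are hypothetical and not part of the package.

```r
# Exclusive lasso penalty: half the sum over groups of the squared
# l1 norm of each group's coefficients.
# Assumption: the 1/2 factor follows Campbell and Allen (2017).
exclusive_penalty <- function(beta, groups) {
  sum(tapply(abs(beta), groups, sum)^2) / 2
}

# Gaussian objective with the "1/n" scaling of the loss.
gaussian_objective <- function(X, y, beta, groups, lambda) {
  n <- nrow(X)
  sum((y - X %*% beta)^2) / (2 * n) + lambda * exclusive_penalty(beta, groups)
}

beta   <- c(1, -2, 0, 3)
groups <- c(1, 1, 2, 2)
exclusive_penalty(beta, groups)  # ((1 + 2)^2 + (0 + 3)^2) / 2 = 9
```

Because the l1 norm is squared within each group, the penalty encourages sparsity within groups rather than removing whole groups.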

By default, an optimized implementation is used for family="gaussian" which is approximately 2x faster for most problems. If you wish to disable this code path and use the standard GLM implementation with Gaussian response, set options(ExclusiveLasso.gaussian_fast_path=FALSE).
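One way to switch the fast path off temporarily and then restore the previous setting is sketched below. It uses only base R's options() and getOption(); the call to exclusive_lasso is left as a comment so the snippet runs without the package.

```r
# Disable the Gaussian fast path; options() returns the previous values
old <- options(ExclusiveLasso.gaussian_fast_path = FALSE)
fast_path_disabled <- identical(
  getOption("ExclusiveLasso.gaussian_fast_path"), FALSE
)

# ... call exclusive_lasso(X, y, groups, family = "gaussian") here ...

# Restore the previous setting
options(old)
```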

References

Campbell, Frederick and Genevera I. Allen. "Within Group Variable Selection with the Exclusive Lasso". Electronic Journal of Statistics 11(2), pp.4220-4257. 2017. doi: 10.1214/17-EJS1317

Examples

n <- 200
p <- 500
groups <- rep(1:10, times=50)
beta <- numeric(p); beta[1:10] <- 3
X <- matrix(rnorm(n * p), ncol=p)
y <- X %*% beta + rnorm(n)
exfit <- exclusive_lasso(X, y, groups)