Fit a generalized linear model via maximum penalized likelihood
using the exclusive lasso penalty. The regularization path is computed
along a grid of values for the regularization parameter (lambda).
The interface is intentionally similar to that of glmnet in the package of the same name.
exclusive_lasso(
  X, y, groups,
  family = c("gaussian", "binomial", "poisson"),
  weights, offset,
  nlambda = 100,
  lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
  lambda,
  standardize = TRUE,
  intercept = TRUE,
  lower.limits = rep(-Inf, nvars),
  upper.limits = rep(Inf, nvars),
  thresh = 1e-07,
  thresh_prox = thresh,
  skip_df = FALSE,
  algorithm = c("cd", "pg")
)
X | The matrix of predictors (\(X \in \R^{n \times p}\)) |
---|---|
y | The response vector (\(y\)) |
groups | An integer vector of length \(p\) indicating group membership.
(Cf. the |
family | The GLM response type. (Cf. the |
weights | Weights applied to individual
observations. If not supplied, all observations will be equally
weighted. Will be re-scaled to sum to \(n\) if
necessary. (Cf. the |
offset | A vector of length \(n\) included in the linear predictor. |
nlambda | The number of lambda values to use in computing the regularization path. Note that the time to run is typically sublinear in the grid size due to the use of warm starts. |
lambda.min.ratio | The smallest value of lambda to be used, as a fraction of the largest value of lambda used. Unlike the lasso, the exclusive lasso has no value of lambda at which every coefficient is zero, but the largest value is still taken to be lambda_max from the lasso. |
lambda | A user-specified sequence of lambdas to use. |
standardize | Should |
intercept | Should the fitted model have an (unpenalized) intercept term? |
lower.limits | A vector of lower bounds for each coefficient (default |
upper.limits | A vector of upper bounds for each coefficient (default |
thresh | The convergence threshold for the proximal gradient or coordinate-descent algorithm that solves the penalized regression problem. |
thresh_prox | The convergence threshold for the coordinate-descent algorithm that evaluates the proximal operator. |
skip_df | Should the DF calculations be skipped? They are often slower
than the actual model fitting; if calling |
algorithm | Which algorithm to use, proximal gradient ( |
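The way nlambda and lambda.min.ratio jointly determine the regularization path can be sketched in base R. This is an illustration only: it assumes a glmnet-style log-spaced grid and the Gaussian lasso value lambda_max = max|X'y| / n on centered and scaled data, and the package's internal construction may differ. The helper name lambda_grid is hypothetical and is not part of the package.

```r
# Hypothetical helper: a glmnet-style grid, NOT the package's internal code.
lambda_grid <- function(X, y, nlambda = 100,
                        lambda.min.ratio = ifelse(nrow(X) < ncol(X), 0.01, 1e-04)) {
  n <- nrow(X)
  Xs <- scale(X)       # center and scale columns (assumes standardize = TRUE)
  yc <- y - mean(y)    # center the response (assumes intercept = TRUE)
  # Smallest lambda at which the Gaussian lasso solution is all zero
  lambda_max <- max(abs(crossprod(Xs, yc))) / n
  # Log-spaced values from lambda_max down to lambda_max * lambda.min.ratio
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

set.seed(1)
X <- matrix(rnorm(50 * 10), nrow = 50)
y <- X[, 1] - X[, 6] + rnorm(50)
lam <- lambda_grid(X, y, nlambda = 20)
range(lam)  # smallest and largest values in the grid
```

Because each fit along a decreasing grid can start from the previous solution, a finer grid adds less work per value than a cold start would.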
An object of class ExclusiveLassoFit containing:
coef
- A matrix of estimated coefficients
intercept
- A vector of estimated intercepts if intercept=TRUE
X, y, groups, weights, offset
- The data used to fit the model
lambda
- The vector of \(\lambda\) used
df
- An unbiased estimate of the degrees of freedom (see Theorem 5 in [1])
nnz
- The number of non-zero coefficients at each value of \(\lambda\)
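A minimal usage sketch on simulated data follows. It uses only the arguments and components documented above, and it assumes the returned object is list-like so that components can be read with `$`. The simulated design (four groups of three predictors, one active predictor per group) is made up for illustration, and the fit runs only when the ExclusiveLasso package is installed.

```r
set.seed(42)
n <- 100
p <- 12
X <- matrix(rnorm(n * p), nrow = n)
groups <- rep(1:4, each = 3)                 # four groups of three predictors
beta <- numeric(p)
beta[c(1, 4, 7, 10)] <- c(2, -2, 1.5, -1.5)  # one active predictor per group
y <- drop(X %*% beta + rnorm(n))

# Only run the fit when the package is available.
if (requireNamespace("ExclusiveLasso", quietly = TRUE)) {
  fit <- ExclusiveLasso::exclusive_lasso(X, y, groups = groups,
                                         family = "gaussian", nlambda = 25)
  print(dim(fit$coef))       # estimated coefficients along the path
  print(length(fit$lambda))  # lambda values actually used
  print(fit$nnz)             # non-zero coefficients at each lambda
}
```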
Note that unlike Campbell and Allen (2017), we use the "1/n"-scaling of the loss function.
For the Gaussian case: $$\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda P(\beta, G)$$
For other GLMs: $$-\frac{1}{n}\ell(y, X\beta)+ \lambda P(\beta, G)$$
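To make the Gaussian objective concrete, the sketch below evaluates it directly in base R. The penalty is not defined on this page; the code assumes the form from Campbell and Allen (2017), \(P(\beta, G) = \sum_{g \in G} \|\beta_g\|_1^2 / 2\). Both function names are hypothetical helpers for illustration and are not exported by the package.

```r
# Hypothetical helpers for illustration; not exported by the package.
exclusive_penalty <- function(beta, groups) {
  # Sum over groups of half the squared l1 norm of that group's coefficients
  sum(tapply(abs(beta), groups, sum)^2) / 2
}

gaussian_objective <- function(X, y, beta, groups, lambda) {
  n <- nrow(X)
  resid <- y - drop(X %*% beta)
  # "1/n"-scaled squared-error loss plus the penalty
  sum(resid^2) / (2 * n) + lambda * exclusive_penalty(beta, groups)
}

beta <- c(1, -2, 0, 3)
groups <- c(1, 1, 2, 2)
exclusive_penalty(beta, groups)  # ((1 + 2)^2 + (0 + 3)^2) / 2 = 9
```

Because the l1 norm is squared within each group, coefficients in the same group compete with one another, which is what drives selection within groups.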
By default, an optimized implementation is used for family="gaussian"
which is approximately 2x faster for most problems. If you wish
to disable this code path and use the standard GLM implementation
with Gaussian response, set options(ExclusiveLasso.gaussian_fast_path=FALSE).
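One way to switch the fast path off temporarily and then restore the previous setting is sketched below, using only base R's `options()` and `getOption()`. The variable names are illustrative.

```r
# Save the current setting and disable the Gaussian fast path.
old <- options(ExclusiveLasso.gaussian_fast_path = FALSE)
fast_path <- getOption("ExclusiveLasso.gaussian_fast_path")

# ... calls to exclusive_lasso(..., family = "gaussian") would go here ...

# Restore whatever was set before (including "unset").
options(old)
```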
[1] Campbell, Frederick and Genevera I. Allen. "Within Group Variable Selection with the Exclusive Lasso". Electronic Journal of Statistics 11(2), pp. 4220-4257. 2017. doi: 10.1214/17-EJS1317