mplearn.feature_selection.base_selector.ThresholdedOLS

class mplearn.feature_selection.base_selector.ThresholdedOLS(*, num_features_to_select=None, screening_thresh=None)

Feature selection with the Thresholded OLS selector.

This class is designed to be used as a base feature selector on the minipatches with the mplearn.feature_selection.AdaSTAMPS class.

Parameters
num_features_to_select : int or float, default=None

The number of features to select from the m features in a minipatch.

  • If None, the Bonferroni procedure described in [1] is used to decide automatically how many features to select on the minipatch.

  • If a positive integer, it is the absolute number of features to select on a minipatch.

  • If a float in the interval (0.0, 1.0], it is the fraction of the m features in a minipatch to select.
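The three accepted forms of num_features_to_select can be sketched as a small helper. This is only an illustration of the rules listed above, not the library's code: the function name resolve_num_features is hypothetical, and the rounding rule for the float case (round to nearest, at least one feature) is an assumption the documentation does not state.

```python
def resolve_num_features(num_features_to_select, m):
    """Hypothetical helper: map the parameter to a feature count for m features.

    Returns None when the count is left to the automatic procedure.
    """
    if num_features_to_select is None:
        # The count is decided on the minipatch itself (Bonferroni procedure in [1]).
        return None
    if isinstance(num_features_to_select, bool):
        raise TypeError("num_features_to_select must be None, int, or float")
    if isinstance(num_features_to_select, int):
        if num_features_to_select <= 0:
            raise ValueError("an integer value must be positive")
        return num_features_to_select  # absolute number of features
    if isinstance(num_features_to_select, float):
        if not 0.0 < num_features_to_select <= 1.0:
            raise ValueError("a float value must lie in (0.0, 1.0]")
        # Assumption: round to the nearest integer and keep at least one feature.
        return max(1, int(round(num_features_to_select * m)))
    raise TypeError("num_features_to_select must be None, int, or float")
```

For a minipatch with m = 20 features, an integer 5 and a float 0.25 both resolve to five features under these assumptions.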

screening_thresh : float, default=None

Ignored if the minipatch has more observations n than features m. For high-dimensional minipatches (n < m), screening_thresh should be a float in the interval (0.0, 1.0); an efficient screening rule from [1] is then applied first to reduce the number of features in the minipatch to round(screening_thresh * n).
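The effect of screening_thresh on the number of features that remain can be illustrated with a short sketch. The helper name screened_feature_count is hypothetical. Two behaviours here are assumptions, since the documentation does not cover them: the case n == m is treated as "no screening", and a missing threshold on a high-dimensional minipatch raises an error.

```python
def screened_feature_count(n, m, screening_thresh=None):
    """Hypothetical helper: number of features left after the screening step."""
    if n >= m:
        # Low-dimensional minipatch: screening_thresh is ignored.
        # (Treating n == m the same way is an assumption.)
        return m
    if screening_thresh is None or not 0.0 < screening_thresh < 1.0:
        # Assumption: reject values outside the documented interval.
        raise ValueError("screening_thresh must be a float in (0.0, 1.0) when n < m")
    # High-dimensional minipatch: keep round(screening_thresh * n) features.
    return round(screening_thresh * n)
```

With n = 50 observations and m = 200 features, screening_thresh = 0.5 leaves 25 features, which is fewer than n, so the reduced minipatch is no longer high-dimensional.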

Attributes
selection_indicator_ : ndarray of shape (m,) or (round(screening_thresh * n),)

A binary selection indicator for the features in the minipatch (1 for selected features and 0 for unselected features). For a low-dimensional minipatch (n > m), the shape is (m,). Otherwise, the shape is (round(screening_thresh * n),).

Fk_ : ndarray of shape (m,) or (round(screening_thresh * n),)

The corresponding integer indices of the features in selection_indicator_. Note that these indices correspond to these features’ column indices in the full data X_full (N observations and M features).
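Because the two attributes are described as corresponding entry by entry, the selected features can be expressed as column indices of X_full with a boolean mask. The arrays below are made-up stand-ins for fitted attributes, used only to show the indexing; they are not output from the library.

```python
import numpy as np

# Hypothetical fitted attributes for a minipatch with m = 5 features.
Fk_ = np.array([12, 3, 47, 8, 30])                # column indices in X_full
selection_indicator_ = np.array([1, 0, 0, 1, 1])  # 1 = selected, 0 = not selected

# Keep the full-data column indices whose indicator equals 1.
selected_full_indices = Fk_[selection_indicator_ == 1]
print(selected_full_indices)  # [12  8 30]
```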

References

[1] Giurcanu, M. “Thresholding least-squares inference in high-dimensional regression models.” Electron. J. Statist. 10 (2), 2124-2156, 2016.

fit(X, y, Fk)

Fit the thresholded OLS base selector to a minipatch.

Parameters
X : ndarray of shape (n, m)

The data matrix corresponding to the minipatch (n observations and m features).

y : ndarray of shape (n,)

The target values corresponding to the minipatch.

Fk : ndarray of shape (m,)

The integer indices of the features in the minipatch. Note that these indices correspond to these features’ column indices in the full data X_full. For example, X = X_full[:, Fk].
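A minimal sketch of how the three fit arguments relate to the full data is given below. It assumes uniform random sampling of rows and columns, which is a simplification chosen for illustration (the sampling scheme of AdaSTAMPS is not described here). The names Ik, X_full, and y_full, and the synthetic data, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 200, 50   # full data: N observations, M features
n, m = 40, 10    # minipatch: n observations, m features

# Synthetic stand-ins for the full data.
X_full = rng.normal(size=(N, M))
y_full = rng.normal(size=N)

# Assumption: rows and columns are drawn uniformly without replacement.
Ik = rng.choice(N, size=n, replace=False)  # observation (row) indices
Fk = rng.choice(M, size=m, replace=False)  # feature (column) indices in X_full

X = X_full[np.ix_(Ik, Fk)]  # minipatch data matrix, shape (n, m)
y = y_full[Ik]              # matching targets, shape (n,)
```

With mplearn installed, these arrays match the documented signature and would be passed as fit(X, y, Fk) on a ThresholdedOLS instance.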

Returns
self : object

Fitted estimator.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects. The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
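The <component>__<parameter> naming can be illustrated with a small, library-independent sketch that separates an estimator's own parameters from those addressed to a nested component. The function split_nested_params and the component name "selector" are hypothetical; this shows the naming convention only, not how set_params is implemented.

```python
def split_nested_params(params):
    """Hypothetical helper: split keys of the form <component>__<parameter>."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"screening_thresh": 0.5, "selector__num_features_to_select": 10}
)
print(own)     # {'screening_thresh': 0.5}
print(nested)  # {'selector': {'num_features_to_select': 10}}
```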

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.