R/solvers.R
convex_clustering.Rd
convex_clustering
calculates the convex clustering solution path
at a user-specified grid of lambda values (or just a single value). It is,
in general, difficult to know a useful set of lambda values a priori,
so this function is more useful for timing comparisons and methodological
research than applied work.
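Since a good grid is hard to pick in advance, one common starting point is a log-spaced sequence that covers several orders of magnitude. The sketch below uses only base R; the endpoints (0.01 and 100) and the grid length (25) are placeholder assumptions, not values recommended by the package.

```r
# Hypothetical grid: 25 log-spaced values between 1e-2 and 1e2.
# The endpoints and length are placeholders chosen for illustration.
lambda_grid <- 10^seq(-2, 2, length.out = 25)

# lambda_grid must be strictly positive. The function sorts the grid
# internally, so the ordering here is only for readability.
stopifnot(all(lambda_grid > 0), !is.unsorted(lambda_grid))
```

A grid built this way can then be supplied as the `lambda_grid` argument and refined after inspecting where the solutions change.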
convex_clustering(
  X,
  ...,
  lambda_grid,
  weights = sparse_rbf_kernel_weights(k = "auto", phi = "auto",
                                      dist.method = "euclidean", p = 2),
  X.center = TRUE,
  X.scale = FALSE,
  norm = 2,
  impute_func = function(X) {
    if (anyNA(X)) missForest(X)$ximp else X
  },
  status = (interactive() &&
    (clustRviz_logger_level() %in% c("MESSAGE", "WARNING", "ERROR")))
)
Argument | Description |
---|---|
X | The data matrix (\(X \in R^{n \times p}\)): rows correspond to the observations (to be clustered) and columns to the variables (which will not be clustered). If |
... | Unused arguments. An error will be thrown if any unrecognized arguments are given. All arguments other than |
lambda_grid | A user-supplied set of \(\lambda\) values at which to solve the convex clustering problem. These must be strictly positive values and will be automatically sorted internally. |
weights | One of the following:
|
X.center | A logical: Should |
X.scale | A logical: Should |
norm | Which norm to use in the fusion penalty? Currently only |
impute_func | A function used to impute missing data in |
status | Should a status message be printed to the console? |
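The default `impute_func` shown in the usage calls `missForest` only when `X` contains missing values. As an illustration of the expected shape of such a function (one argument `X`, returning a completed matrix), here is a minimal sketch of a lighter alternative that fills each missing entry with its column mean. The name `mean_impute` and the toy matrix are hypothetical and not part of the package.

```r
# Hypothetical replacement for the default missForest-based imputer:
# fill each missing entry with the mean of the observed values in its column.
mean_impute <- function(X) {
  if (!anyNA(X)) return(X)                       # nothing to impute
  col_means <- colMeans(X, na.rm = TRUE)         # per-column observed means
  missing <- which(is.na(X), arr.ind = TRUE)     # (row, col) index pairs
  X[missing] <- col_means[missing[, "col"]]
  X
}

# Toy data: one missing value in each column.
X_toy <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 3)
mean_impute(X_toy)
```

Judging from the signature above, such a function would presumably be passed as `impute_func = mean_impute`. A column with no observed values would yield `NaN` under this sketch, so it is only suitable when every variable has at least one observation.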
An object of class convex_clustering containing the following elements (among others):
X
: the original data matrix
n
: the number of observations (rows of X)
p
: the number of variables (columns of X)
X.center
: a logical indicating whether X was centered column-wise before clustering
X.scale
: a logical indicating whether X was scaled column-wise before centering
weight_type
: a record of the scheme used to create fusion weights
U
: a tensor (3-array) of clustering solutions
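Because `U` is a plain 3-array, it can be inspected with ordinary base R indexing. The sketch below uses a small hand-built array rather than real output, and it assumes the dimensions are ordered as (observation, variable, lambda index); that ordering is not stated on this page, so check `dim()` on an actual fit first. It relies on the fact that, in convex clustering, observations whose rows of \(U\) coincide belong to the same cluster.

```r
# Toy stand-in for U: 4 observations, 2 variables, 3 lambda values.
# ASSUMPTION: dimensions are ordered (observation, variable, lambda index).
n <- 4; p <- 2; n_lambda <- 3
U_toy <- array(0, dim = c(n, p, n_lambda))
U_toy[, , 1] <- rbind(c(0, 0), c(1, 1), c(4, 4), c(5, 5))   # no fusions yet
U_toy[, , 2] <- rbind(c(0.5, 0.5), c(0.5, 0.5),
                      c(4.5, 4.5), c(4.5, 4.5))             # two clusters
U_toy[, , 3] <- matrix(2.5, nrow = n, ncol = p)             # fully fused

# Count distinct centroid rows in each slice; rounding guards against
# tiny numerical differences between rows that have effectively fused.
n_clusters <- apply(U_toy, 3, function(U_k) nrow(unique(round(U_k, 6))))
n_clusters
# 4 2 1
```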
Compared to the object returned by the CARP function, the returned object is much more "bare-bones": it contains only the estimated \(U\) matrices and none of the information used for dendrogram or path visualizations.
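For orientation, the convex clustering problem is usually written in the form below. This page does not spell out the objective, so the notation is an assumption: \(U_{i\cdot}\) denotes row \(i\) of \(U\), \(w_{ij}\) the fusion weights, and \(q\) the norm used in the fusion penalty.

```latex
\[
  \hat{U}_{\lambda}
  = \operatorname*{arg\,min}_{U \in R^{n \times p}}
    \frac{1}{2} \lVert X - U \rVert_F^2
    + \lambda \sum_{i < j} w_{ij} \, \lVert U_{i\cdot} - U_{j\cdot} \rVert_q
\]
```

Under this reading, `lambda_grid` supplies the values of \(\lambda\), `weights` determines \(w_{ij}\), and `norm` selects \(q\). Larger values of \(\lambda\) push more rows of \(\hat{U}_{\lambda}\) to coincide, which is what merges observations into clusters.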
clustering_fit <- convex_clustering(presidential_speech[1:10, 1:4], lambda_grid = 1:100)
print(clustering_fit)
#> Convex Clustering Fit Summary
#> =============================
#>
#> Algorithm: ADMM [L2]
#> Grid: 101 values of lambda.
#> Fit Time: 0.004 secs
#> Total Time: 0.009 secs
#>
#> Number of Observations: 10
#> Number of Variables: 4
#>
#> Pre-processing options:
#>  - Columnwise centering: TRUE
#>  - Columnwise scaling: FALSE
#>
#> Weights:
#>  - Source: Radial Basis Function Kernel Weights
#>  - Distance Metric: Euclidean
#>  - Scale parameter (phi): 0.1 [Data-Driven]
#>  - Sparsified: 2 Nearest Neighbors [Data-Driven]