Compute Convex Clustering Solution Path on a User-Specified Grid

convex_clustering calculates the convex clustering solution path at a user-specified grid of lambda values (or just a single value). It is, in general, difficult to know a useful set of lambda values a priori, so this function is more useful for timing comparisons and methodological research than applied work.
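Because useful lambda values are hard to know in advance, one common starting point is a log-spaced grid of strictly positive values. The sketch below shows this approach. The helper name `log_lambda_grid` and its default endpoints and length are illustrative assumptions, not values recommended by clustRviz.

```r
# Hypothetical helper: a log-spaced grid of strictly positive lambda values.
# The defaults (1e-3 to 1e2, 50 points) are illustrative assumptions only.
log_lambda_grid <- function(lo = 1e-3, hi = 1e2, n = 50) {
  stopifnot(lo > 0, hi > lo, n >= 2)
  exp(seq(log(lo), log(hi), length.out = n))
}

lambda_grid <- log_lambda_grid()
# fit <- convex_clustering(X, lambda_grid = lambda_grid)
```

Log spacing places more grid points at small lambda, where fusions tend to change quickly, than an evenly spaced grid such as `1:100` would.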

convex_clustering(
  X,
  ...,
  lambda_grid,
  weights = sparse_rbf_kernel_weights(k = "auto", phi = "auto",
    dist.method = "euclidean", p = 2),
  X.center = TRUE,
  X.scale = FALSE,
  norm = 2,
  impute_func = function(X) {
    if (anyNA(X)) missForest(X)$ximp else X
  },
  status = (interactive() && (clustRviz_logger_level() %in% c("MESSAGE", "WARNING",
    "ERROR")))
)
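The default `impute_func` in the signature above calls `missForest`. Any function that returns a complete data matrix can replace it. Below is a minimal sketch of a lighter-weight alternative that fills each missing entry with its column mean. The name `mean_impute` is hypothetical and not part of clustRviz.

```r
# Hypothetical alternative to the missForest default: column-mean imputation.
# is.na() is TRUE for both NA and NaN, matching the documented treatment
# of missing values.
mean_impute <- function(X) {
  X <- as.matrix(X)
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    if (any(miss)) {
      if (all(miss)) stop("column ", j, " is entirely missing")
      X[miss, j] <- mean(X[!miss, j])
    }
  }
  X
}

# fit <- convex_clustering(X, lambda_grid = 1:100, impute_func = mean_impute)
```

Column means ignore relationships between variables, so this approach is cruder than `missForest`. It is also much faster, which may matter for timing comparisons.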

Arguments

X	The data matrix (\(X \in \mathbb{R}^{n \times p}\)): rows correspond to the observations (to be clustered) and columns to the variables (which will not be clustered). If `X` has missing values (`NA` or `NaN`), they will be imputed automatically using `impute_func`.
...	Unused arguments. An error will be thrown if any unrecognized arguments are given. All arguments other than `X` must be given by name.
lambda_grid	A user-supplied set of \(\lambda\) values at which to solve the convex clustering problem. These must be strictly positive values and will be automatically sorted internally.
weights	One of the following: a function which, when called with argument `X`, returns an n-by-n matrix of fusion weights; or an n-by-n matrix containing fusion weights.
X.center	A logical: Should `X` be centered columnwise?
X.scale	A logical: Should `X` be scaled columnwise?
norm	Which norm to use in the fusion penalty? Currently only `1` and `2` (default) are supported.
impute_func	A function used to impute missing data in `X`. By default, the `missForest` function from the package of the same name is used, which provides flexible, potentially non-linear imputation. This function must return a data matrix with no `NA` values. Note that, consistent with base `R`, both `NaN` and `NA` are treated as "missing values" for imputation.
status	Should a status message be printed to the console?
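When `weights` is given as a matrix, it must be n-by-n. The sketch below builds such a matrix by hand. It uses radial basis function (RBF) weights \(w_{ij} = \exp(-\phi \lVert x_i - x_j \rVert^2)\), keeps only each observation's k nearest neighbors, and then symmetrizes. The name `rbf_knn_weights` and its `phi` and `k` defaults are hypothetical. This is not clustRviz's `sparse_rbf_kernel_weights()`, whose exact formula and data-driven choices may differ.

```r
# Hypothetical hand-rolled RBF weights sparsified to k nearest neighbors.
# NOT clustRviz's sparse_rbf_kernel_weights(); phi and k are assumptions.
rbf_knn_weights <- function(X, phi = 1, k = 2) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(phi > 0, k >= 1, k < n)
  D <- as.matrix(dist(X, method = "euclidean"))
  W <- exp(-phi * D^2)
  diag(W) <- 0                           # no self-fusion
  keep <- matrix(FALSE, n, n)
  for (i in seq_len(n)) {
    nn <- setdiff(order(D[i, ]), i)[seq_len(k)]  # k closest other rows
    keep[i, nn] <- TRUE
  }
  keep <- keep | t(keep)                 # keep an edge if either end picks it
  W[!keep] <- 0
  W
}

# fit <- convex_clustering(X, lambda_grid = 1:100, weights = rbf_knn_weights(X))
```

Column centering does not change Euclidean distances, so these weights are unaffected by `X.center`. Column scaling, however, would change them.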

Value

An object of class convex_clustering containing the following elements (among others):

X: the original data matrix
n: the number of observations (rows of X)
p: the number of variables (columns of X)
X.center: a logical indicating whether X was centered column-wise before clustering
X.scale: a logical indicating whether X was scaled column-wise before clustering
weight_type: a record of the scheme used to create fusion weights
U: a tensor (3-array) of clustering solutions
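In convex clustering, two observations belong to the same cluster when their estimated centroids have fused, that is, when their rows of \(U\) coincide. The sketch below recovers cluster labels from a single slice of `U`. It assumes that slice is an n-by-p centroid matrix for one lambda value. The dimension ordering of `U` and the helper name `fused_cluster_labels` are assumptions; check `dim(clustering_fit$U)` before relying on them.

```r
# Hypothetical helper: label observations whose centroid rows have fused.
# Assumes U_k is an n-by-p matrix of estimated centroids at one lambda.
# Rows within `tol` (max absolute difference) of an unlabeled row's
# representative receive the same label; this is a greedy sketch.
fused_cluster_labels <- function(U_k, tol = 1e-4) {
  n <- nrow(U_k)
  labels <- integer(n)
  next_label <- 0L
  for (i in seq_len(n)) {
    if (labels[i] == 0L) {
      next_label <- next_label + 1L
      d <- apply(U_k, 1, function(u) max(abs(u - U_k[i, ])))
      labels[labels == 0L & d <= tol] <- next_label
    }
  }
  labels
}

# Assumed layout: U[, , k] is the solution at the k-th lambda value.
# labels_k <- fused_cluster_labels(clustering_fit$U[, , k])
```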

Details

Compared to the CARP function, the returned object is much more "bare-bones," containing only the estimated \(U\) matrices, and no information used for dendrogram or path visualizations.

Examples

clustering_fit <- convex_clustering(presidential_speech[1:10,1:4], lambda_grid = 1:100)
#> Pre-computing weights and edge sets
#> Computing Convex Clustering Solutions
#> Post-processing
print(clustering_fit)
#> Convex Clustering Fit Summary
#> =============================
#> 
#> Algorithm: ADMM [L2] 
#> Grid: 101 values of lambda. 
#> Fit Time: 0.004 secs 
#> Total Time: 0.009 secs 
#> 
#> Number of Observations: 10 
#> Number of Variables:    4 
#> 
#> Pre-processing options:
#>  - Columnwise centering: TRUE 
#>  - Columnwise scaling:   FALSE 
#> 
#> Weights:
#>  - Source: Radial Basis Function Kernel Weights
#>  - Distance Metric: Euclidean
#>  - Scale parameter (phi): 0.1 [Data-Driven]
#>  - Sparsified: 2 Nearest Neighbors [Data-Driven]
#>