CARP returns a fast approximation to the Convex Clustering solution path along with visualizations such as dendrograms and cluster paths. CARP solves the Convex Clustering problem via an efficient Algorithmic Regularization scheme.

CARP(
  X,
  ...,
  weights = sparse_rbf_kernel_weights(k = "auto", phi = "auto", dist.method =
    "euclidean", p = 2),
  labels = rownames(X),
  X.center = TRUE,
  X.scale = FALSE,
  back_track = FALSE,
  exact = FALSE,
  norm = 2,
  t = 1.05,
  npcs = min(4L, NCOL(X), NROW(X)),
  dendrogram.scale = NULL,
  impute_func = function(X) { if (anyNA(X)) missForest(X)$ximp else X },
  status = (interactive() && (clustRviz_logger_level() %in% c("MESSAGE", "WARNING",



X: The data matrix (\(X \in \mathbb{R}^{n \times p}\)): rows correspond to the observations (to be clustered) and columns to the variables (which will not be clustered). If X has missing values (NA or NaN), they are imputed automatically.


...: Unused arguments. An error will be thrown if any unrecognized arguments are given. All arguments other than X must be given by name.


weights: One of the following:

  • A function which, when called with argument X, returns an n-by-n matrix of fusion weights.

  • A matrix of size n-by-n containing fusion weights
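As an illustration of the function form, the sketch below builds dense Gaussian (RBF) kernel weights in base R. The name `my_rbf_weights`, the `phi` value, and the zeroed diagonal are choices made for this example, not package defaults; the only property taken from the description above is that the function accepts X and returns an n-by-n matrix.

```r
# Hypothetical weight function: dense Gaussian (RBF) kernel weights.
# `phi` is an illustrative scale parameter, not a clustRviz default.
my_rbf_weights <- function(X, phi = 0.1) {
  D <- as.matrix(dist(X, method = "euclidean"))  # n-by-n pairwise distances
  W <- exp(-phi * D^2)                           # closer rows get larger weights
  diag(W) <- 0                                   # no self-fusion weight (a choice made here)
  W
}

set.seed(1)
X_demo <- matrix(rnorm(20), nrow = 5, ncol = 4)  # 5 observations, 4 variables
W_demo <- my_rbf_weights(X_demo)
dim(W_demo)  # one row and one column per observation
```

Under these assumptions, either the function itself or the precomputed matrix `W_demo` would fit the two accepted forms listed above.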


labels: A character vector of length \(n\): observation (row) labels

X.center: A logical: Should X be centered columnwise?


X.scale: A logical: Should X be scaled columnwise?
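To make "columnwise" concrete, the following base R sketch shows what centering and scaling each column look like using `scale()`. It only illustrates the two pre-processing options; how CARP applies them internally is not shown here.

```r
set.seed(1)
X_demo <- matrix(rnorm(20, mean = 5), nrow = 5, ncol = 4)

# Columnwise centering only: each column has its mean subtracted
X_centered <- scale(X_demo, center = TRUE, scale = FALSE)

# Columnwise centering and scaling: each column also divided by its standard deviation
X_scaled <- scale(X_demo, center = TRUE, scale = TRUE)

colMeans(X_centered)     # numerically zero for every column
apply(X_scaled, 2, sd)   # numerically one for every column
```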


back_track: A logical: Should back-tracking be used to exactly identify fusions? By default, back-tracking is not used.


exact: A logical: Should the exact solution be computed using an iterative algorithm? By default, algorithmic regularization is applied and the exact solution is not computed. Setting exact = TRUE often significantly increases computation time.


norm: Which norm to use in the fusion penalty? Currently only 1 and 2 (default) are supported.
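For orientation, the convex clustering problem is commonly written as below. The notation is an assumption made for this note rather than taken from this page: \(U\) holds one centroid per row of \(X\), \(w_{ij}\) are the fusion weights, \(\gamma\) is the regularization parameter, and \(q\) is the norm selected by this argument.

```latex
\[
  \min_{U \in \mathbb{R}^{n \times p}} \;
  \frac{1}{2} \lVert X - U \rVert_F^2
  + \gamma \sum_{i < j} w_{ij} \, \lVert U_{i\cdot} - U_{j\cdot} \rVert_q ,
  \qquad q \in \{1, 2\}
\]
```

In this form, the penalty term is what pulls rows of \(U\) together as \(\gamma\) grows.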


t: A number greater than 1: the size of the multiplicative update to the cluster fusion regularization parameter (not used by back-tracking variants). Typically on the scale of 1.005 to 1.1.
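A multiplicative update means the regularization parameter grows geometrically, so the step size controls how many steps the path takes. The sketch below counts the steps needed to go from a starting value to a target value; `gamma0` and `gamma_max` are hypothetical numbers chosen only for illustration and are not values used by the package.

```r
# Number of multiplicative updates needed for gamma0 * t^k to reach gamma_max.
# gamma0 and gamma_max are hypothetical illustration values.
steps_needed <- function(t, gamma0 = 1e-8, gamma_max = 1) {
  stopifnot(t > 1)
  ceiling(log(gamma_max / gamma0) / log(t))
}

steps_small   <- steps_needed(1.005)  # fine-grained path, many steps
steps_default <- steps_needed(1.05)   # the default step size
steps_large   <- steps_needed(1.1)    # coarse path, few steps

c(steps_small, steps_default, steps_large)
```

Under these assumptions, values of `t` closer to 1 give a finer path at the cost of more iterations.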


npcs: An integer >= 2. The number of principal components to compute for path visualization.
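The default for this argument is the smallest of 4, the number of columns, and the number of rows. The sketch below evaluates that expression on a small matrix and computes the same number of principal components with base R's `prcomp()`. Whether CARP uses `prcomp()` internally is not stated on this page, so treat this as an illustration of the quantity rather than of the implementation.

```r
set.seed(1)
X_demo <- matrix(rnorm(30), nrow = 10, ncol = 3)

# Same expression as the default for this argument
npcs <- min(4L, NCOL(X_demo), NROW(X_demo))  # limited by the 3 columns here

# Keep only the first `npcs` principal components
pca <- prcomp(X_demo, center = TRUE, rank. = npcs)
dim(pca$x)  # one row per observation, one column per retained component
```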


dendrogram.scale: A character string denoting how the scale of dendrogram regularization proportions should be visualized. Choices are 'original' or 'log'; if not provided, a data-driven heuristic choice is used.


impute_func: A function used to impute missing data in X. By default, the missForest function from the package of the same name is used, which provides a flexible, potentially non-linear imputation. The function must return a data matrix with no NA values. Note that, consistent with base R, both NaN and NA are treated as "missing values" for imputation.
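A simpler alternative to the default can be written in base R. The sketch below imputes each missing entry with its column mean; `impute_col_means` is a hypothetical name, and column-mean imputation is used only because it is easy to verify, not because it is recommended over missForest.

```r
# Hypothetical imputation function: replace missing entries by column means.
impute_col_means <- function(X) {
  if (!anyNA(X)) return(X)              # nothing to impute
  for (j in seq_len(ncol(X))) {
    missing <- is.na(X[, j])            # is.na() is TRUE for both NA and NaN
    X[missing, j] <- mean(X[, j], na.rm = TRUE)
  }
  X
}

X_demo <- matrix(c(1, NA, 3,
                   4, NaN, 6), nrow = 3)  # column 1: 1, NA, 3; column 2: 4, NaN, 6
X_filled <- impute_col_means(X_demo)
anyNA(X_filled)  # FALSE
```

A function of this shape takes X and returns a matrix with no missing values, which is the contract described above. Note that a column consisting entirely of missing values would still come back as NaN with this sketch.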


status: Should a status message be printed to the console?


An object of class CARP containing the following elements (among others):

  • X: the original data matrix

  • n: the number of observations (rows of X)

  • p: the number of variables (columns of X)

  • alg.type: the CARP variant used

  • X.center: a logical indicating whether X was centered column-wise before clustering

  • X.scale: a logical indicating whether X was scaled column-wise before centering

  • weight_type: a record of the scheme used to create fusion weights


carp_fit <- CARP(presidential_speech[1:10,1:4])
#> Pre-computing weights and edge sets
#> Computing Convex Clustering [CARP] Path
#> Post-processing
#> CARP Fit Summary
#> ====================
#>
#> Algorithm: CARP (t = 1.05)
#> Fit Time: 0.008 secs
#> Total Time: 1.025 secs
#>
#> Number of Observations: 10
#> Number of Variables: 4
#>
#> Pre-processing options:
#>  - Columnwise centering: TRUE
#>  - Columnwise scaling: FALSE
#>
#> Weights:
#>  - Source: Radial Basis Function Kernel Weights
#>  - Distance Metric: Euclidean
#>  - Scale parameter (phi): 0.1 [Data-Driven]
#>  - Sparsified: 2 Nearest Neighbors [Data-Driven]
#>