CBASS returns a fast approximation to the Convex BiClustering solution path along with visualizations such as dendrograms and heatmaps. CBASS solves the Convex BiClustering problem using an efficient Algorithmic Regularization scheme.

CBASS(
  X,
  ...,
  row_weights = sparse_rbf_kernel_weights(k = "auto", phi = "auto", dist.method =
    "euclidean", p = 2),
  col_weights = sparse_rbf_kernel_weights(k = "auto", phi = "auto", dist.method =
    "euclidean", p = 2),
  row_labels = rownames(X),
  col_labels = colnames(X),
  X.center.global = TRUE,
  t = 1.01,
  back_track = FALSE,
  exact = FALSE,
  norm = 2,
  npcs = min(4L, NCOL(X), NROW(X)),
  dendrogram.scale = NULL,
  status = (interactive() && (clustRviz_logger_level() %in% c("MESSAGE", "WARNING",
    "ERROR")))
)

Arguments

X

The data matrix (\(X \in R^{n \times p}\)). If X has missing values (NA or NaN), they will be automatically imputed.

...

Unused arguments. An error will be thrown if any unrecognized arguments are given.

row_weights

One of the following:

  • A function which, when called with argument X, returns an n-by-n matrix of fusion weights.

  • A matrix of size n-by-n containing fusion weights.

Note that the weights will be renormalized to sum to \(1/\sqrt{n}\) internally.
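
As a rough illustration of the function form, the sketch below builds a dense Gaussian (RBF) kernel weight matrix in base R. The helper name dense_rbf_weights, its fixed phi value, and the zero diagonal are assumptions made for this example; they are not part of clustRviz, whose default is sparse_rbf_kernel_weights() as shown in the usage above.

```r
# Hypothetical helper (not part of clustRviz): dense Gaussian kernel weights.
# Given a matrix with one observation per row, returns a square matrix whose
# (i, j) entry is exp(-phi * ||x_i - x_j||^2), with a zero diagonal.
dense_rbf_weights <- function(X, phi = 0.5) {
  d2 <- as.matrix(dist(X, method = "euclidean"))^2
  w <- exp(-phi * d2)
  diag(w) <- 0  # assumption: an observation carries no weight with itself
  w
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)
W <- dense_rbf_weights(X)
dim(W)  # 5 5
```

Under these assumptions, passing row_weights = dense_rbf_weights would exercise the function form and row_weights = W the matrix form.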

col_weights

One of the following:

  • A function which, when called with argument t(X), returns a p-by-p matrix of fusion weights. (Note the transpose.)

  • A matrix of size p-by-p containing fusion weights.

Note that the weights will be renormalized to sum to \(1/\sqrt{p}\) internally.
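
The transpose matters because a weight function typically measures similarity between the rows of whatever matrix it receives. The following base R sketch shows the resulting dimensions; inv_dist_weights is a hypothetical function invented for this example, not a clustRviz function.

```r
# Toy data: n = 6 observations (rows), p = 3 variables (columns).
set.seed(2)
X <- matrix(rnorm(18), nrow = 6, ncol = 3)

# Hypothetical weight function: inverse-distance weights between the rows
# of the matrix it is given.
inv_dist_weights <- function(M) {
  d <- as.matrix(dist(M))
  w <- 1 / (1 + d)
  diag(w) <- 0
  w
}

dim(inv_dist_weights(X))     # 6 6: one weight per pair of observations
dim(inv_dist_weights(t(X)))  # 3 3: one weight per pair of variables
```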

row_labels

A character vector of length \(n\): row (observation) labels

col_labels

A character vector of length \(p\): column (variable) labels

X.center.global

A logical: Should X be centered globally? I.e., should the global mean of X be subtracted?
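
To make the distinction from column-wise centering concrete, here is a small base R sketch of global centering on a toy matrix. It only illustrates the arithmetic described above; how CBASS combines centering with imputation of missing values is not specified here.

```r
X <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Global centering: subtract the single grand mean (3.5) from every entry.
X_centered <- X - mean(X)

mean(X_centered)      # 0
colMeans(X_centered)  # -2 0 2: individual columns are not centered
```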

t

A number greater than 1: the size of the multiplicative update to the cluster fusion regularization parameter (not used by back-tracking variants). Typically on the scale of 1.005 to 1.1.
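
Because the update is multiplicative, the number of steps needed to cover a given range of the regularization parameter grows quickly as t approaches 1. The sketch below estimates that count in base R; steps_needed, gamma_start, and gamma_end are illustrative names and values chosen for this example, not CBASS internals.

```r
# Number of multiplicative steps needed to grow a regularization parameter
# from gamma_start to gamma_end when every step multiplies it by t.
# gamma_start and gamma_end are illustrative, not values used by CBASS.
steps_needed <- function(t, gamma_start = 1e-8, gamma_end = 1) {
  ceiling(log(gamma_end / gamma_start) / log(t))
}

# Compare the step counts across the typical range of t.
steps <- sapply(c(1.005, 1.01, 1.1), steps_needed)
steps
```

In this toy calculation, values of t closer to 1 take more steps to cover the same range.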

back_track

A logical: Should back-tracking be used to exactly identify fusions? By default, back-tracking is not used.

exact

A logical: Should the exact solution be computed using an iterative algorithm? By default, algorithmic regularization is applied and the exact solution is not computed. Setting exact = TRUE often significantly increases computation time.

norm

Which norm to use in the fusion penalty? Currently only 1 and 2 (default) are supported.

npcs

An integer >= 2. The number of principal components to compute for path visualization.
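
The default in the usage caps npcs at the smaller dimension of X. As a sketch of what that means, the base R code below evaluates the default for a toy matrix and computes that many components with stats::prcomp(); the routine CBASS uses internally may differ.

```r
set.seed(3)
X <- matrix(rnorm(30), nrow = 10, ncol = 3)

# Default from the usage: at most 4 components, capped by the dimensions of X.
npcs <- min(4L, NCOL(X), NROW(X))  # 3 for this 10-by-3 matrix

# Scores of the first npcs principal components, one row per observation.
pca <- prcomp(X, rank. = npcs)
dim(pca$x)  # 10 3
```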

dendrogram.scale

A character string denoting how the scale of dendrogram regularization proportions should be visualized. Choices are 'original' or 'log'; if not provided, a data-driven heuristic choice is used.

status

A logical: Should a status message be printed to the console?

Value

An object of class CBASS containing the following elements (among others):

  • X: the original data matrix

  • n: the number of observations (rows of X)

  • p: the number of variables (columns of X)

  • alg.type: the CBASS variant used

  • row_fusions: A record of row fusions - see the documentation of CARP for details of what this may include.

  • col_fusions: A record of column fusions - see the documentation of CARP for details of what this may include.

Examples

if (FALSE) {
  cbass_fit <- CBASS(presidential_speech)
  print(cbass_fit)
  plot(cbass_fit)
}