convex_biclustering calculates the convex biclustering solution path at a user-specified grid of lambda values (or just a single value). It is, in general, difficult to know a useful set of lambda values a priori, so this function is more useful for timing comparisons and methodological research than applied work.

convex_biclustering(
  X,
  ...,
  lambda_grid,
  row_weights = sparse_rbf_kernel_weights(k = "auto", phi = "auto", dist.method =
    "euclidean", p = 2),
  col_weights = sparse_rbf_kernel_weights(k = "auto", phi = "auto", dist.method =
    "euclidean", p = 2),
  X.center.global = TRUE,
  norm = 2,
  status = (interactive() && (clustRviz_logger_level() %in% c("MESSAGE", "WARNING",
    "ERROR")))
)

Arguments

X

The data matrix (\(X \in \mathbb{R}^{n \times p}\)). Missing values in X (NA or NaN) are imputed automatically.

...

Unused arguments. An error will be thrown if any unrecognized arguments are given.

lambda_grid

A user-supplied set of \(\lambda\) values at which to solve the convex biclustering problem. These must be strictly positive values and will be automatically sorted internally.
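
Since useful values are hard to guess in advance, one common starting point is a log-spaced grid. The following is a minimal sketch in base R; the endpoints (0.01 and 100) and the grid length (25) are arbitrary illustrative choices, not package recommendations.

```r
# Hypothetical endpoints and length; tune these to the scale of your data.
lambda_grid <- exp(seq(log(0.01), log(100), length.out = 25))

# The documented requirement: every value must be strictly positive.
stopifnot(all(lambda_grid > 0))
```

The resulting vector can then be supplied as the lambda_grid argument.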

row_weights

One of the following:

  • A function which, when called with argument X, returns an n-by-n matrix of fusion weights.

  • A matrix of size n-by-n containing fusion weights.

Note that the weights will be renormalized to sum to \(1/\sqrt{n}\) internally.
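
As an illustration of the function form, here is a minimal sketch of a user-written weight function using only base R. The name dense_rbf_weights and the value phi = 0.01 are hypothetical; this is a plain dense Gaussian kernel, not the package's sparse_rbf_kernel_weights, and setting the diagonal to zero is a choice made for this sketch.

```r
# Hypothetical helper: dense Gaussian (RBF) kernel weights between the rows of X.
# phi = 0.01 is an arbitrary illustrative scale, not a package default.
dense_rbf_weights <- function(X, phi = 0.01) {
  D <- as.matrix(dist(X, method = "euclidean"))  # n-by-n pairwise row distances
  W <- exp(-phi * D^2)                           # closer rows get larger weights
  diag(W) <- 0                                   # no weight between a row and itself
  W
}

set.seed(1)
X_demo <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)  # toy data: n = 5, p = 3
W_row <- dense_rbf_weights(X_demo)
dim(W_row)  # 5 5
```

Under the contract described above, either the function itself or the precomputed matrix W_row could be supplied as row_weights.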

col_weights

One of the following:

  • A function which, when called with argument t(X), returns a p-by-p matrix of fusion weights. (Note the transpose.)

  • A matrix of size p-by-p containing fusion weights.

Note that the weights will be renormalized to sum to \(1/\sqrt{p}\) internally.
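
The transpose matters because a weight function computes one weight per pair of rows of whatever it is given. The sketch below uses a hypothetical base-R function, pairwise_weights, to show how the output size changes between X and t(X).

```r
# Hypothetical weight function following the documented contract:
# it returns one weight per pair of rows of its argument.
pairwise_weights <- function(M) {
  W <- exp(-as.matrix(dist(M))^2)
  diag(W) <- 0
  W
}

set.seed(1)
X_demo <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)  # n = 5 rows, p = 3 columns

dim(pairwise_weights(X_demo))     # 5 5: weights between rows
dim(pairwise_weights(t(X_demo)))  # 3 3: weights between columns
```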

X.center.global

A logical: Should X be centered globally? I.e., should the global mean of X be subtracted?
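
For intuition, global centering subtracts a single scalar from every entry, unlike column centering, which subtracts a separate mean per column. This is a small base-R sketch on made-up data; how the package combines centering with imputation of missing values is not specified here.

```r
X_demo <- matrix(c(1, 2, 3, 4, 5, 9), nrow = 2)  # toy 2-by-3 matrix

# Global centering: subtract the grand mean (one scalar) from every entry.
X_global <- X_demo - mean(X_demo)
mean(X_global)      # 0

# For contrast, column centering subtracts a different mean from each column.
X_columns <- scale(X_demo, center = TRUE, scale = FALSE)
colMeans(X_columns) # 0 0 0
```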

norm

The norm used in the fusion penalty. Currently only 1 (\(\ell_1\)) and 2 (\(\ell_2\), the default) are supported.
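
For context, the convex biclustering problem is commonly written in the literature in the form below, where \(q\) corresponds to norm, \(w_{ij}\) to the row weights, and \(\tilde{w}_{kl}\) to the column weights. The symbols \(q\), \(w\), and \(\tilde{w}\) are notation introduced for this sketch, and the exact scaling used internally may differ.

\[
\hat{U}_{\lambda} = \arg\min_{U \in \mathbb{R}^{n \times p}} \frac{1}{2} \|X - U\|_F^2 + \lambda \left( \sum_{i < j} w_{ij} \|U_{i \cdot} - U_{j \cdot}\|_q + \sum_{k < l} \tilde{w}_{kl} \|U_{\cdot k} - U_{\cdot l}\|_q \right)
\]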

status

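A logical: Should status messages be printed to the console? By default, this is TRUE only in interactive sessions where the clustRviz logging level is "MESSAGE", "WARNING", or "ERROR".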

Value

An object of class convex_biclustering containing the following elements (among others):

  • X: the original data matrix

  • n: the number of observations (rows of X)

  • p: the number of variables (columns of X)

  • U: a tensor (3-array) of clustering solutions

Details

Compared to the CBASS function, the returned object is much more "bare-bones," containing only the estimated \(U\) matrices, and no information used for dendrogram or path visualizations.
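
Working with the U element amounts to ordinary 3-array indexing. The sketch below uses a synthetic array rather than a real fit, and it assumes that the third index runs over the lambda values; that ordering is an assumption, so check dim() on an actual result before relying on it.

```r
# Synthetic stand-in for the U element: n = 4, p = 3, and 5 lambda values.
# ASSUMPTION: the third index runs over lambda_grid; verify with dim() on a real fit.
set.seed(1)
U_demo <- array(rnorm(4 * 3 * 5), dim = c(4, 3, 5))

U_first <- U_demo[, , 1]  # hypothetical: the estimate at the first lambda value
dim(U_first)              # 4 3
```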

Examples

if (FALSE) {
  biclustering_fit <- convex_biclustering(presidential_speech, lambda_grid = 1:100)
  print(biclustering_fit)
}