
Apply ResNMTF to data over a range of bicluster numbers, selecting the optimal number, with optional stability analysis

Usage

apply_resnmtf(
  data,
  init_f = NULL,
  init_s = NULL,
  init_g = NULL,
  k_vec = NULL,
  phi = NULL,
  xi = NULL,
  psi = NULL,
  n_iters = NULL,
  k_min = 3,
  k_max = 8,
  distance = "euclidean",
  num_repeats = 5,
  no_clusts = FALSE,
  sample_rate = 0.9,
  n_stability = 5,
  stability = TRUE,
  stab_thres = 0.4,
  remove_unstable = TRUE,
  use_parallel = TRUE
)

Arguments

data

list of n_v matrices, data to be factorised. If only one view is supplied, it can be given as a single matrix.

init_f

list of matrices, initialisation for F matrices

init_s

list of matrices, initialisation for S matrices

init_g

list of matrices, initialisation for G matrices

k_vec

vector of integers, number of clusters to consider in each view, default is NULL

phi

n_v x n_v matrix, default is NULL, restriction matrix for the F matrices

xi

n_v x n_v matrix, default is NULL, restriction matrix for the S matrices

psi

n_v x n_v matrix, default is NULL, restriction matrix for the G matrices

n_iters

integer, default is NULL, number of iterations to run for; if NULL, the algorithm runs until convergence

k_min

positive integer, default is 3, smallest value of k to be considered initially

k_max

positive integer, default is 8, largest value of k to be considered initially

distance

string, default is "euclidean", distance metric to use within the bisilhouette score

num_repeats

integer, default is 5, number of repeats to use within stability analysis

no_clusts

boolean, default is FALSE, whether to return only the factorisation (without cluster assignments)

sample_rate

numeric, default is 0.9, proportion of data to sample for stability analysis

n_stability

integer, default is 5, number of times to repeat stability analysis

stability

boolean, default is TRUE, whether to perform stability analysis or not

stab_thres

numeric, default is 0.4, threshold for stability analysis

remove_unstable

boolean, default is TRUE, whether to remove unstable clusters or not

use_parallel

boolean, default is TRUE, whether to use parallelisation; not applicable on Windows or Linux machines
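The restriction arguments phi, xi and psi are each described above as an n_v x n_v matrix. As a rough sketch of how such matrices might be built for two views: the meaning assigned to individual entries here (entry [i, j] linking views i and j, with 0 meaning no restriction) and the value 200 are illustrative assumptions, not something this page states.

```r
# Hypothetical two-view setup (n_v = 2).
n_v <- 2

# Assumption: entry [i, j] sets the strength of the restriction between
# views i and j, and 0 means "no restriction". The value 200 is arbitrary.
phi <- matrix(0, nrow = n_v, ncol = n_v)
phi[1, 2] <- 200

# No restrictions on the S or G matrices in this sketch.
xi <- matrix(0, nrow = n_v, ncol = n_v)
psi <- matrix(0, nrow = n_v, ncol = n_v)
```

Matrices built this way could then be supplied as apply_resnmtf(data, phi = phi, xi = xi, psi = psi), following the signature in Usage.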

Value

list of results from ResNMTF, containing the following:

- output_f: list of matrices, F matrices
- output_s: list of matrices, S matrices
- output_g: list of matrices, G matrices
- Error: numeric, mean error
- All_Error: numeric, all errors
- bisil: numeric, bisilhouette score
- row_clusters: list of matrices, row clusters
- col_clusters: list of matrices, column clusters
- lambda: list of vectors, lambda vectors
- mu: list of vectors, mu vectors

Examples

row_clusters <- cbind(
  rbinom(100, 1, 0.5),
  rbinom(100, 1, 0.5),
  rbinom(100, 1, 0.5)
)
col_clusters <- cbind(
  rbinom(50, 1, 0.4),
  rbinom(50, 1, 0.4),
  rbinom(50, 1, 0.4)
)
n_col <- 50
data <- list(
  row_clusters %*% diag(c(5, 5, 5)) %*% t(col_clusters) +
    abs(matrix(rnorm(100 * n_col), 100, n_col)),
  row_clusters %*% diag(c(5, 5, 5)) %*% t(col_clusters) +
    abs(0.01 * matrix(rnorm(100 * n_col), 100, n_col))
)
apply_resnmtf(data, k_max = 4)
#> Error in loadNamespace(x): there is no package called ‘bisilhouette’
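The error printed above indicates that the example was run in an environment where the bisilhouette package was not installed. A minimal check using base R's requireNamespace() can confirm whether the package is available before calling apply_resnmtf(); how to install it is not covered on this page, so that step is left out.

```r
# Check whether the package named in the error message is available.
has_bisil <- requireNamespace("bisilhouette", quietly = TRUE)

if (!has_bisil) {
  # Only report the problem; the installation source is not stated here.
  message("Package 'bisilhouette' is not installed; ",
          "install it before calling apply_resnmtf().")
}
```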