CaDrA Search — CaDrA • CaDrA

Perform permutation-based testings on a sample of permuted input scores using candidate_search as the main iterative function for each run.

CaDrA(
  FS,
  input_score,
  method = c("ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer", "custom"),
  method_alternative = c("less", "greater", "two.sided"),
  custom_function = NULL,
  custom_parameters = NULL,
  weights = NULL,
  search_start = NULL,
  top_N = 1,
  search_method = c("both", "forward"),
  max_size = 7,
  n_perm = 1000,
  perm_alternative = c("one.sided", "two.sided"),
  obs_best_score = NULL,
  smooth = TRUE,
  plot = FALSE,
  ncores = 1,
  cache = FALSE,
  cache_path = NULL,
  verbose = FALSE
)

Arguments

FS

a matrix of binary features or a SummarizedExperiment class object from SummarizedExperiment package where rows represent features of interest (e.g. genes, transcripts, exons, etc...) and columns represent the samples. The assay of FS contains binary (1/0) values indicating the presence/absence of omics features.

input_score

a vector of continuous scores representing a phenotypic readout of interest such as protein expression, pathway activity, etc.

NOTE: input_score object must have names or labels that match the column names of FS object.

method

a character string specifies a scoring method that is used in the search. There are 6 options: ("ks_pval" or ks_score or "wilcox_pval" or wilcox_score or "revealer" (conditional mutual information from REVEALER) or "custom" (a user-defined scoring method)). Default is ks_pval.

method_alternative

a character string specifies an alternative hypothesis testing ("two.sided" or "greater" or "less"). Default is less for left-skewed significance testing.

NOTE: This argument only applies to ks_pval and wilcox_pval method

custom_function

If method is "custom", specifies a user-defined function here. Default is NULL.

NOTE: custom_function must take FS and input_score as its input arguments and its final result must return a vector of row-wise scores where its labels or names match the row names of FS object.

custom_parameters

If method is "custom", specifies a list of additional arguments (excluding FS and input_score) to be passed to custom_function. For example: custom_parameters = list(alternative = "less"). Default is NULL.

weights

If method is ks_score or ks_pval, specifying a vector of weights will perform a weighted-KS testing. Default is NULL.

NOTE: weights must have names or labels that match the labels of input_score.

search_start

a vector of character strings (separated by commas) specifies feature names in the FS object to start the search with. If search_start is provided, then top_N parameter will be ignored and vice versa. Default is NULL.

top_N

an integer specifies the number of features to start the search over. By default, it starts with the feature that has the highest best score (top_N = 1).

NOTE: If top_N is provided, then search_start parameter will be ignored and vice versa. If top_N > 10, it may result in a longer search time.

search_method

a character string specifies an algorithm to filter out the best candidates ("forward" or "both"). Default is both (i.e., backward and forward).

max_size

an integer specifies a maximum size that a meta-feature can extend to do for a given search. Default is 7.

n_perm

an integer specifies the number of permutations to perform. Default is 1000.

perm_alternative

an alternative hypothesis type for calculating permutation-based p-value. Options: one.sided, two.sided. Default is one.sided.

obs_best_score

a numeric value corresponding to the best observed score. This value is used to compare against the n_perm calculated best scores. Default is NULL. If set to NULL, we will compute the observed best score based on the given parameters.

smooth

a logical value indicates whether or not to add a smoothing factor of 1 to the calculation of permutation-based p-value. This option is used to avoid a returned p-value of 0. Default is TRUE.

plot

a logical value indicates whether or not to plot the empirical null distribution of the permuted best scores. Default is FALSE.

ncores

an integer specifies the number of cores to perform parallelization for permutation-based testing. Default is 1.

cache

a logical value determines whether or not to cache the permuted best scores. This helps to save time for future loading instead of re-computing the permutation-based testing every time. Default is FALSE.

cache_path

a path to cache permuted best scores. Default is NULL. If NULL, the cache path is set to system home directory (e.g. $HOME/.Rcache) for future loading.

verbose

a logical value indicates whether or not to print the diagnostic messages. Default is FALSE.

Value

a list of 4 objects: key, perm_best_scores, obs_best_score, perm_pval

-key: a list of parameters that was used to cache the results of the permutation-based testing. This is useful as the permuted best scores can be recycled to save time for future loading.

-perm_best_scores: a vector of permuted best scores obtained by performing candidate_search over n_perm iterations of permuted input scores.

-obs_best_score: a user-provided best score or an observed best score obtained by performing candidate_search on a given dataset and input parameters. This value is later used to compare against the permuted best scores (perm_best_scores).

perm_pval: a permutation-based p-value obtained by calculating sum(perm_best_scores > obs_best_score)/n_perm

NOTE: If smooth = TRUE, a smoothing factor of 1 will be added to the calculation of perm_pval.

e.g. (sum(perm_best_scores > obs_best_score) + 1) / (n_perm + c)

This is just to not return a p-value of 0

Examples


# Load pre-computed feature set
data(sim_FS)

# Load pre-computed input-score
data(sim_Scores)

# Set seed for permutation
set.seed(21)

# Define additional parameters and start the function
cadra_result <- CaDrA(
  FS = sim_FS, input_score = sim_Scores, method = "ks_pval", 
  weights = NULL, method_alternative = "less", top_N = 1,
  search_start = NULL, search_method = "both", max_size = 7, 
  n_perm = 10, perm_alternative = "one.sided", plot = FALSE, 
  smooth = TRUE, obs_best_score = NULL,
  ncores = 1, cache = FALSE, cache_path = NULL
)
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |======================================================================| 100%