Perform permutation-based testing on a sample of permuted input scores,
using candidate_search as the main iterative function for each run.
CaDrA(
  FS,
  input_score,
  method = c("ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer", "knnmi",
    "correlation", "custom"),
  method_alternative = c("less", "greater", "two.sided"),
  cmethod = c("spearman", "pearson"),
  custom_function = NULL,
  custom_parameters = NULL,
  weights = NULL,
  search_start = NULL,
  top_N = 1,
  search_method = c("both", "forward"),
  max_size = 7,
  n_perm = 1000,
  perm_alternative = c("one.sided", "two.sided"),
  obs_best_score = NULL,
  smooth = TRUE,
  plot = FALSE,
  ncores = 1,
  cache = FALSE,
  cache_path = NULL,
  verbose = FALSE
)
FS: a matrix of binary features, or a SummarizedExperiment object from the SummarizedExperiment package, where rows represent features of interest (e.g. genes, transcripts, exons, etc.) and columns represent samples. The assay of FS contains binary (1/0) values indicating the presence/absence of omics features.
input_score: a vector of continuous scores representing a phenotypic readout of interest, such as protein expression, pathway activity, etc.
NOTE: input_score must have names or labels that match
the column names of FS.
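Because input_score is matched to FS by name, it can help to check and align the labels before calling CaDrA. The sketch below uses only base R and a small made-up matrix (FS_mat and the sample names S1 to S4 are hypothetical). This section does not say whether CaDrA reorders the scores internally, so the explicit reordering is a precaution rather than a documented requirement.

```r
# Hypothetical toy data: 3 binary features (rows) x 4 samples (columns)
FS_mat <- matrix(
  c(1, 0, 1, 0,
    0, 1, 1, 0,
    1, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("feat", 1:3), paste0("S", 1:4))
)

# Scores listed in a different order than the columns of FS_mat
input_score <- c(S3 = 0.9, S1 = 1.2, S4 = -0.3, S2 = 0.1)

# Every column of FS_mat must have exactly one matching score label
stopifnot(
  !is.null(names(input_score)),
  !anyDuplicated(names(input_score)),
  setequal(names(input_score), colnames(FS_mat))
)

# Reorder so that element i of input_score belongs to column i of FS_mat
input_score <- input_score[colnames(FS_mat)]
print(input_score)
```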
method: a character string specifying the scoring method
used in the search. There are 8 options: "ks_pval", "ks_score",
"wilcox_pval", "wilcox_score",
"revealer" (conditional mutual information from REVEALER),
"knnmi" (k-nearest neighbor mutual information estimator from knnmi),
"correlation", or
"custom" (a user-defined scoring method).
Default is "ks_pval".
method_alternative: a character string specifying the alternative
hypothesis to test ("two.sided", "greater", or "less").
Default is "less", for left-skewed significance testing.
NOTE: This argument only applies to the "ks_pval" and "wilcox_pval"
methods.
cmethod: the correlation method to use, "spearman" or "pearson". Default is "spearman".
NOTE: This argument only applies to the "correlation" method.
custom_function: if method is "custom", a user-defined scoring
function. Default is NULL.
NOTE: custom_function must take FS and input_score
as its input arguments, and its final result must be a vector of row-wise
scores whose names or labels match the row names of FS.
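To illustrate that contract, the function below is a hypothetical custom scoring method. mean_diff_score is not part of CaDrA. For each feature, it returns the mean input score of samples where the feature is present minus the mean of samples where it is absent. It assumes FS reaches the function as a plain binary matrix and that a larger value means a better score. Check both assumptions against the package before relying on it.

```r
# Hypothetical custom scoring function (not part of CaDrA).
# Assumes FS is a binary matrix (features x samples) and that
# input_score is named by the column names of FS.
mean_diff_score <- function(FS, input_score) {
  input_score <- input_score[colnames(FS)]
  # apply() over rows keeps rownames(FS) as the names of the result
  apply(FS, 1, function(row) {
    present <- row == 1
    # A difference of means is undefined if the feature is never or always present
    if (!any(present) || all(present)) return(NA_real_)
    mean(input_score[present]) - mean(input_score[!present])
  })
}

# Toy data for a quick check
FS_mat <- matrix(
  c(1, 0, 1, 0,
    0, 1, 1, 0,
    1, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("feat", 1:3), paste0("S", 1:4))
)
scores <- c(S1 = 1.2, S2 = 0.1, S3 = 0.9, S4 = -0.3)

row_scores <- mean_diff_score(FS_mat, scores)
print(row_scores)
```

Under the documented interface, such a function would be supplied as method = "custom" together with custom_function = mean_diff_score.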
custom_parameters: if method is "custom", a list of
additional arguments (excluding FS and input_score) to be
passed to custom_function, for example
custom_parameters = list(alternative = "less"). Default is NULL.
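One way to picture how custom_parameters is forwarded uses do.call(). The extra list is appended to FS and input_score, and the combined list becomes the argument list of custom_function. This is a sketch under that assumption, not CaDrA's internal code. call_custom and toy_score are hypothetical names. The list element names must match formal arguments of the scoring function.

```r
# Hypothetical helper showing how extra arguments could be forwarded
# to a user-defined scoring function alongside FS and input_score.
call_custom <- function(custom_function, FS, input_score,
                        custom_parameters = NULL) {
  # c() of a list and NULL leaves the list unchanged
  args <- c(list(FS = FS, input_score = input_score), custom_parameters)
  do.call(custom_function, args)
}

# Hypothetical scoring function with one extra argument, 'alternative'
toy_score <- function(FS, input_score, alternative = "two.sided") {
  # Sum of input scores over the samples where each feature is present,
  # with the sign flipped when alternative = "less"
  s <- as.vector(FS %*% input_score[colnames(FS)])
  if (alternative == "less") s <- -s
  setNames(s, rownames(FS))
}

FS_mat <- matrix(c(1, 0, 1,
                   0, 1, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("featA", "featB"), c("S1", "S2", "S3")))
scores <- c(S1 = 2, S2 = 1, S3 = 3)

res_default <- call_custom(toy_score, FS_mat, scores)
res_less <- call_custom(toy_score, FS_mat, scores,
                        custom_parameters = list(alternative = "less"))
print(res_default)
print(res_less)
```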
weights: if method is "ks_score" or "ks_pval", a
vector of weights for performing a weighted KS test. Default is NULL.
NOTE: weights must have names or labels that match the labels of
input_score.
search_start: a vector of character strings (separated by commas)
specifying the feature names in FS to start the search with.
If search_start is provided, the top_N parameter is
ignored, and vice versa. Default is NULL.
top_N: an integer specifying the number of top-scoring features to start the search from. By default, the search starts with the single feature that has the highest score (top_N = 1).
NOTE: If top_N is provided, the search_start parameter
is ignored, and vice versa. A top_N greater than 10 may result in a longer
search time.
search_method: a character string specifying the algorithm used to filter out
the best candidates ("forward" or "both"). Default is
"both" (i.e., backward and forward).
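Here is a simplified greedy search in base R that shows the difference between "forward" and "both". It rests on two simplifying assumptions: a meta-feature is the logical OR of its member rows, and score_fun returns a single number where larger is better. stepwise_search, score_fun and the toy data are hypothetical. This is not CaDrA's candidate_search, which also handles top_N, the different scoring methods and other details.

```r
# Hypothetical simplified stepwise search (not CaDrA's candidate_search).
# A meta-feature is the logical OR of its member rows of FS.
stepwise_search <- function(FS, score_fun, start, max_size = 7,
                            search_method = c("both", "forward")) {
  search_method <- match.arg(search_method)
  union_of <- function(rows) as.integer(colSums(FS[rows, , drop = FALSE]) > 0)
  selected <- start
  best <- score_fun(union_of(selected))
  while (length(selected) < max_size) {
    candidates <- setdiff(rownames(FS), selected)
    if (length(candidates) == 0) break
    # Forward step: score the union of the meta-feature with each candidate
    add_scores <- vapply(candidates,
                         function(f) score_fun(union_of(c(selected, f))),
                         numeric(1))
    if (max(add_scores) <= best) break  # no candidate improves the score
    selected <- c(selected, candidates[which.max(add_scores)])
    best <- max(add_scores)
    # Backward step ("both" only): drop a member if that improves the score
    if (search_method == "both" && length(selected) > 1) {
      drop_scores <- vapply(selected,
                            function(f) score_fun(union_of(setdiff(selected, f))),
                            numeric(1))
      if (max(drop_scores) > best) {
        selected <- setdiff(selected, selected[which.max(drop_scores)])
        best <- max(drop_scores)
      }
    }
  }
  # Every accepted step strictly increases 'best', so the loop terminates
  list(features = selected, score = best)
}

# Toy example: 3 binary features across 6 samples
FS_toy <- matrix(c(1, 0, 0, 0, 0, 0,
                   0, 1, 1, 0, 0, 0,
                   0, 0, 0, 1, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("featA", "featB", "featC"), paste0("S", 1:6)))
target <- c(1, 1, 1, 0, 0, 0)
# Toy score: number of samples where the meta-feature agrees with 'target'
accuracy <- function(meta) sum(meta == target)

res_both <- stepwise_search(FS_toy, accuracy, start = "featB")
res_fwd <- stepwise_search(FS_toy, accuracy, start = "featB",
                           search_method = "forward")
print(res_both)
```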
max_size: an integer specifying the maximum number of features a meta-feature can
contain in a given search. Default is 7.
n_perm: an integer specifying the number of permutations to perform.
Default is 1000.
perm_alternative: the alternative hypothesis type for calculating the
permutation-based p-value. Options: "one.sided", "two.sided". Default is
"one.sided".
obs_best_score: a numeric value corresponding to the best observed
score. This value is compared against the n_perm permuted best
scores. Default is NULL, in which case the observed
best score is computed from the given parameters.
smooth: a logical value indicating whether to add a smoothing
factor of 1 to the calculation of the permutation-based p-value, which
avoids returning a p-value of 0. Default is TRUE.
plot: a logical value indicating whether to plot the empirical
null distribution of the permuted best scores. Default is FALSE.
ncores: an integer specifying the number of cores used to
parallelize the permutation-based testing. Default is 1.
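The permutation loop controlled by n_perm and ncores works in three steps. Shuffle the score values across the fixed sample labels, rerun the search, and keep the best score from each run. The sketch below assumes a user-supplied search_fun that maps a named score vector to one best score, a hypothetical stand-in for a full candidate_search run. It uses parallel::mclapply(), which forks processes and therefore only supports ncores = 1 on Windows. CaDrA's own parallel backend may differ.

```r
library(parallel)

# Hypothetical sketch: compute n_perm permuted best scores.
# search_fun maps a named score vector to one numeric best score.
# With ncores > 1, use RNGkind("L'Ecuyer-CMRG") for reproducible streams.
permuted_best_scores <- function(input_score, search_fun,
                                 n_perm = 1000, ncores = 1) {
  runs <- mclapply(seq_len(n_perm), function(i) {
    # Shuffle the values while keeping the sample names in place
    perm_score <- setNames(sample(unname(input_score)), names(input_score))
    search_fun(perm_score)
  }, mc.cores = ncores)
  unlist(runs)
}

set.seed(21)
scores <- c(S1 = 1.2, S2 = 0.1, S3 = 0.9, S4 = -0.3)
# Toy search function: the larger of the scores assigned to S1 and S2
toy_search <- function(s) max(s[c("S1", "S2")])
perm_scores <- permuted_best_scores(scores, toy_search, n_perm = 20)
print(perm_scores)
```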
cache: a logical value determining whether to cache the
permuted best scores. Caching saves time in future runs, because the scores can be
loaded instead of re-computing the permutation-based testing every time.
Default is FALSE.
cache_path: a path for caching the permuted best scores. Default is NULL,
in which case the cache path is set to the user's home directory
(e.g. $HOME/.Rcache) for future loading.
verbose: a logical value indicating whether to print
diagnostic messages. Default is FALSE.
a list of 4 objects: key, perm_best_scores,
obs_best_score, perm_pval
-key: a list of the parameters that were used to cache the
results of the permutation-based testing. This is useful because the
permuted best scores can be recycled to save time in future runs.
-perm_best_scores: a vector of permuted best scores obtained
by performing candidate_search over n_perm iterations of
permuted input scores.
-obs_best_score: a user-provided best score or an observed best score
obtained by performing candidate_search on a given dataset and input
parameters. This value is later used to compare against the permuted best
scores (perm_best_scores).
-perm_pval: a permutation-based p-value obtained by calculating
sum(perm_best_scores > obs_best_score) / n_perm
NOTE: If smooth = TRUE, a smoothing factor of 1 is added to the
calculation of perm_pval, i.e.
(sum(perm_best_scores > obs_best_score) + 1) / (n_perm + 1)
This prevents returning a p-value of 0.
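The one-sided p-value can be reproduced by hand, which may help when checking cached results. The helper below is a hypothetical re-implementation that assumes the smoothed form adds 1 to both the count and the denominator, (count + 1) / (n_perm + 1). It does not cover perm_alternative = "two.sided", whose exact calculation is not described in this section.

```r
# Hypothetical helper reproducing the one-sided permutation p-value
perm_pvalue <- function(perm_best_scores, obs_best_score, smooth = TRUE) {
  n_perm <- length(perm_best_scores)
  n_exceed <- sum(perm_best_scores > obs_best_score)
  if (smooth) {
    (n_exceed + 1) / (n_perm + 1)  # smoothing keeps the p-value above 0
  } else {
    n_exceed / n_perm
  }
}

perm <- c(0.2, 0.5, 1.1, 0.3, 0.8, 0.4, 0.9, 0.1, 0.6, 0.7)
perm_pvalue(perm, obs_best_score = 1.0, smooth = FALSE)  # 1 / 10
perm_pvalue(perm, obs_best_score = 1.0)                  # (1 + 1) / (10 + 1)
perm_pvalue(perm, obs_best_score = 2.0, smooth = FALSE)  # nothing exceeds 2.0
```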
# Load the CaDrA package and its pre-computed feature set
library(CaDrA)
data(sim_FS)
# Load pre-computed input scores
data(sim_Scores)
# Set seed for permutation
set.seed(21)
# Define additional parameters and start the search function
cadra_result <- CaDrA(
  FS = sim_FS, input_score = sim_Scores, method = "ks_pval",
  weights = NULL, method_alternative = "less", top_N = 1,
  search_start = NULL, search_method = "both", max_size = 7,
  n_perm = 10, perm_alternative = "one.sided", plot = FALSE,
  smooth = TRUE, obs_best_score = NULL,
  ncores = 1, cache = FALSE, cache_path = NULL
)