Performs a heuristic search over a set of binary features to determine whether
there are features whose union is more skewed (enriched at the extremes)
than any of the individual features alone. This is the main functionality of
the CaDrA package.
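To make the idea of a feature "union" concrete, here is a minimal base R sketch. The matrix `FS_toy`, its feature names, and its values are made up for illustration and are not part of the package.

```r
# Toy binary feature matrix: 3 features (rows) x 6 samples (columns).
# All names and values are hypothetical.
FS_toy <- matrix(
  c(1, 0, 0, 0, 0, 0,
    0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat_A", "feat_B", "feat_C"), paste0("s", 1:6))
)

# Union (logical OR) of feat_A and feat_B:
# a sample is 1 if at least one of the two features is 1.
meta_AB <- as.integer(colSums(FS_toy[c("feat_A", "feat_B"), , drop = FALSE]) > 0)
names(meta_AB) <- colnames(FS_toy)
meta_AB
```

The search then asks whether a union such as `meta_AB` is more strongly associated with the input scores than `feat_A` or `feat_B` taken separately.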
candidate_search(
  FS,
  input_score,
  method = c("ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer", "knnmi",
    "correlation", "custom"),
  method_alternative = c("less", "greater", "two.sided"),
  cmethod = c("spearman", "pearson"),
  custom_function = NULL,
  custom_parameters = NULL,
  weights = NULL,
  search_start = NULL,
  top_N = 1,
  search_method = c("both", "forward"),
  max_size = 7,
  best_score_only = FALSE,
  do_plot = FALSE,
  verbose = FALSE
)
FS: a matrix of binary features, or a SummarizedExperiment class object from the SummarizedExperiment package, in which rows represent features of interest (e.g. genes, transcripts, exons) and columns represent samples. The assay of FS contains binary (1/0) values indicating the presence/absence of omics features.
input_score: a vector of continuous scores representing a phenotypic readout of interest, such as protein expression or pathway activity.
NOTE: input_score must have names or labels that match the
column names of FS.
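A quick pre-check of the naming requirement can be written in base R. This is only a sketch: `FS_toy` and `score_toy` are hypothetical objects, and whether candidate_search reorders the scores internally is not stated here, so the explicit reordering is a precaution rather than a documented requirement.

```r
# Hypothetical toy inputs: 2 features x 4 samples, scores in a different order.
FS_toy <- matrix(
  c(1, 0, 1, 0,
    0, 1, 1, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("feat_A", "feat_B"), c("s1", "s2", "s3", "s4"))
)
score_toy <- c(s3 = 0.2, s1 = 1.5, s4 = -0.7, s2 = 0.9)

# Every sample must appear in both objects.
stopifnot(setequal(names(score_toy), colnames(FS_toy)))

# Reorder the scores so they follow the column order of the feature matrix.
score_aligned <- score_toy[colnames(FS_toy)]
score_aligned
```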
method: a character string specifying the scoring method
used in the search. There are 8 options: "ks_pval", "ks_score",
"wilcox_pval", "wilcox_score",
"revealer" (conditional mutual information from REVEALER),
"knnmi" (K-Nearest Neighbor Mutual Information Estimator from knnmi),
"correlation" (based on simple correlation, pearson or spearman), and
"custom" (a user-defined scoring method).
Default is "ks_pval".
method_alternative: a character string specifying the alternative
hypothesis for testing ("two.sided", "greater", or "less").
Default is "less" for left-skewed significance testing.
NOTE: This argument applies only to the "ks_pval" and "wilcox_pval" methods.
cmethod: the correlation method to use, "spearman" or "pearson". Default is "spearman".
NOTE: This argument applies only when method is "correlation".
custom_function: if method is "custom", a user-defined
scoring function. Default is NULL.
NOTE: custom_function must take FS and input_score as its
input arguments, and it must return a vector of row-wise scores
whose labels or names match the row names of FS.
custom_parameters: if method is "custom", a list of
additional arguments (excluding FS and input_score)
to be passed to custom_function. For example:
custom_parameters = list(alternative = "less"). Default is NULL.
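The contract for a custom scoring function can be sketched as follows. The function `mean_diff_score`, its `na.rm` argument, and the toy data are hypothetical. The sketch assumes FS is a plain matrix (not a SummarizedExperiment) whose columns are in the same order as input_score, and the `do.call` line only mimics how extra arguments might be forwarded. It is not taken from the package internals.

```r
# Hypothetical scoring function: for each feature (row), the difference in
# mean input_score between samples with the feature (1) and without it (0).
mean_diff_score <- function(FS, input_score, na.rm = TRUE) {
  scores <- apply(FS, 1, function(x) {
    mean(input_score[x == 1], na.rm = na.rm) -
      mean(input_score[x == 0], na.rm = na.rm)
  })
  # Return a vector of row-wise scores named after the rows of FS.
  names(scores) <- rownames(FS)
  scores
}

# Toy inputs (made up for illustration).
FS_toy <- matrix(
  c(1, 1, 0, 0,
    0, 1, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("feat_A", "feat_B"), c("s1", "s2", "s3", "s4"))
)
score_toy <- c(s1 = 4, s2 = 2, s3 = 1, s4 = 1)

# Extra arguments supplied as a list, in the same shape as custom_parameters.
custom_parameters <- list(na.rm = TRUE)
custom_scores <- do.call(
  mean_diff_score,
  c(list(FS = FS_toy, input_score = score_toy), custom_parameters)
)
custom_scores
```

Following the argument descriptions above, such a function would be supplied with method = "custom", custom_function = mean_diff_score, and custom_parameters = list(na.rm = TRUE).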
weights: if method is "ks_score" or "ks_pval", specifying a
vector of weights will perform a weighted KS test. Default is NULL.
NOTE: weights must have names or labels that match the labels of
input_score.
search_start: a vector of character strings (separated by commas)
specifying the feature names in FS to start the search with.
If search_start is provided, the top_N parameter is
ignored, and vice versa. Default is NULL.
top_N: an integer specifying the number of features to start the search over. By default, the search starts with the single best-scoring feature (top_N = 1).
NOTE: If top_N is provided, the search_start parameter
is ignored, and vice versa. If top_N > 10, the search may take
longer.
search_method: a character string specifying the algorithm used to filter
out the best features ("forward" or "both"). Default is
"both" (i.e. backward and forward).
max_size: an integer specifying the maximum size that a meta-feature
can extend to in a given search. Default is 7.
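The roles of the starting feature, the forward step, and max_size can be illustrated with a generic greedy forward search in base R. This is a sketch under stated assumptions and not the CaDrA implementation: `score_fun` is a stand-in mean-difference score (higher is better), `forward_union_search` and the toy data are hypothetical, and the backward step used when search_method is "both" is not modelled.

```r
# Stand-in score (assumption): difference in mean input_score between
# samples inside and outside the meta-feature. Higher is better.
score_fun <- function(x, input_score) {
  if (all(x == 1) || all(x == 0)) return(-Inf)
  mean(input_score[x == 1]) - mean(input_score[x == 0])
}

# Greedy forward search: starting from `start`, repeatedly add the feature
# whose union with the current meta-feature improves the score the most.
forward_union_search <- function(FS, input_score, start, max_size = 7) {
  selected <- start
  meta <- as.integer(colSums(FS[start, , drop = FALSE]) > 0)
  best <- score_fun(meta, input_score)
  while (length(selected) < max_size) {
    candidates <- setdiff(rownames(FS), selected)
    if (length(candidates) == 0) break
    cand_scores <- vapply(candidates, function(f) {
      score_fun(pmax(meta, FS[f, ]), input_score)
    }, numeric(1))
    if (max(cand_scores) <= best) break  # no candidate improves the score
    pick <- candidates[which.max(cand_scores)]
    selected <- c(selected, pick)
    meta <- pmax(meta, FS[pick, ])
    best <- max(cand_scores)
  }
  list(features = selected, score = best)
}

# Toy data (made up): 3 features x 8 samples, scores sorted high to low.
FS_toy <- matrix(
  c(1, 1, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat_A", "feat_B", "feat_C"), paste0("s", 1:8))
)
score_toy <- setNames(c(10, 9, 8.5, 3, 2.5, 2, 1.5, 1), paste0("s", 1:8))

res <- forward_union_search(FS_toy, score_toy, start = "feat_A", max_size = 7)
res$features
```

In this toy run, feat_B is added because its union with feat_A raises the score, while feat_C is rejected because adding it would lower the score, so the search stops before reaching max_size.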
best_score_only: a logical value indicating whether to return
only the best score corresponding to each of the top_N searches.
Default is FALSE.
do_plot: a logical value indicating whether to plot the overlapping features of the resulting meta-feature matrix.
NOTE: A plot can only be produced if the resulting meta-feature matrix contains
more than 1 feature (e.g. length(search_start) > 1 or top_N > 1).
Default is FALSE.
verbose: a logical value indicating whether to print
diagnostic messages. Default is FALSE.
If best_score_only = TRUE, the heuristic search will return
the best feature whose union meta-feature matrix has the highest score
among the top_N feature searches.
If best_score_only = FALSE, a list of objects pertaining to the
top_N feature searches will be returned. For each top_N feature search,
the result contains 7 objects: (1) its best meta-feature matrix
(feature_set), (2) its observed input scores (input_score),
(3) its corresponding best score pertaining to the union meta-feature
matrix (score), (4) names of the best meta-features (best_features),
(5) rank of the best meta-features in terms of their best scores (best_indices),
(6) marginal scores of the best meta-features (marginal_best_scores),
(7) cumulative scores of the best meta-features (cumulative_best_scores).
NOTE: The legacy function topn_eval is equivalent to the recommended
candidate_search function.
# Load pre-computed feature set
data(sim_FS)
# Load pre-computed input scores
data(sim_Scores)
# Define additional parameters and run the function
candidate_search_result <- candidate_search(
  FS = sim_FS, input_score = sim_Scores,
  method = "ks_pval", method_alternative = "less", weights = NULL,
  search_start = NULL, top_N = 3, search_method = "both",
  max_size = 7, best_score_only = FALSE
)