CaDrA.shiny
An R Shiny Dashboard for Interacting with CaDrA Package
CaDrA.shiny is an interactive R Shiny dashboard developed to allow users to directly interact with CaDrA package. CaDrA is an R package that supports a heuristic search framework aimed at identifying candidate drivers of a molecular phenotype of interest (visit our Github repo for more details).
The CaDrA’s Shiny dashboard has two distinctive features:
Data visualizations include:
CaDrA.shiny is currently containerized using Docker and can be deployed on any compatible cloud-based services.
# Install BiocManager
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# Install CaDrA
BiocManager::install("CaDrA")
# Install GSVA
BiocManager::install("GSVA")
# Install hypeR
library(devtools)
devtools::install_github("montilab/hypeR")
# Install CaDrA.shiny
devtools::install_github("montilab/CaDrA.shiny")
Here, we show how to run CaDrA on a dataset downloaded from the CaDrA Portal, using input_score’s derived by applying GSVA to the downloaded gene expression dataset and the Hallmarks genesets. We will run a CaDrA search to look for genetic drivers of the “Epithelial Mesenchymal Transition” (EMT)-derived score.
## Get a list of descriptors of feature sets available on CaDrA Portal
fs_list <- CaDrA.shiny::get_feature_set(order_by="asc")
## Show the description of the first few feature sets
knitr::kable(head(fs_list), row.names = FALSE)
feature_set_name | description | collection |
---|---|---|
CCLE SCNAs and Mutations | Somatic copy number alterations and mutations from CCLE. See ?CaDrA::CCLE_MUT_SCNA | CCLE |
Simulated Feature Set | Simulated feature set comprises of 1000 genomic features and 100 sample profiles. This simulated data includes 10 left-skewed (i.e. True Positive or TP) and 990 uniformly-distributed (i.e. True Null or TN) features. See ?CaDrA::BRCA_GISTIC_MUT_SIG | Simulated |
TCGA BrCa SCNAs and Mutations | Somatic copy number alterations and mutations from BRCA TCGA. See ?CaDrA::BRCA_GISTIC_MUT_SIG | TCGA |
TCGA_ACC_2016_01_28_GISTIC_MUT_SIG | ACC | TCGA |
TCGA_BLCA_2016_01_28_GISTIC_MUT_SIG | BLCA | TCGA |
TCGA_BRCA_2016_01_28_GISTIC_MUT_SIG | BRCA | TCGA |
## Retrieve the dataset (both genetic feature set and gene expression)
datasets <- CaDrA.shiny::pull_datasets(
feature_set = "TCGA_HNSC_2016_01_28_GISTIC_MUT_SIG",
include_gene_expression = TRUE
)
datasets
$feature_set
class: RangedSummarizedExperiment
dim: 12852 279
metadata(3): experimentData annotation protocolData
assays(1): exprs
rownames(12852): Amp2q11.2 Amp2q31.2 ... ZZEF1 ZZZ3
rowData names(1): Feature
colnames(279): TCGA-BA-4074-01 TCGA-BA-4076-01 ... TCGA-IQ-7631-01
TCGA-IQ-7632-01
colData names(0):
$gene_expression
class: RangedSummarizedExperiment
dim: 20234 566
metadata(3): experimentData annotation protocolData
assays(1): exprs
rownames(20234): SMR3B STATH ... SCARNA27 MS4A5
rowData names(1): Genes
colnames(566): TCGA-4P-AA8J-01 TCGA-BA-4074-01 ... TCGA-WA-A7GZ-01
TCGA-WA-A7H4-01
colData names(1): Samples
## download MSigDB’s Hallmark genesets
hallmarks <- hypeR::msigdb_gsets("Homo sapiens", "H", clean=TRUE)$genesets # returns 50 genesets
# Compute the gsva scores of the 50 hallmark genesets
input_score_matrix <- GSVA::gsva(
expr = SummarizedExperiment::assay(datasets$gene_expression),
gset.idx.list = hallmarks,
method = "gsva",
mx.diff = TRUE,
verbose = FALSE
)
## Show few entries of the returned hallmark-by-sample matrix
knitr::kable(input_score_matrix[1:5, 1:5])
TCGA-4P-AA8J-01 | TCGA-BA-4074-01 | TCGA-BA-4075-01 | TCGA-BA-4076-01 | TCGA-BA-4077-01 | |
---|---|---|---|---|---|
Adipogenesis | -0.0160921 | 0.1648353 | 0.2211107 | -0.0737085 | -0.0115116 |
Allograft Rejection | -0.0049091 | -0.0081298 | -0.0447976 | -0.3880485 | 0.4610353 |
Androgen Response | -0.3956958 | 0.0941687 | 0.1397883 | 0.1266452 | 0.2823991 |
Angiogenesis | 0.3141972 | 0.2880628 | 0.3441984 | -0.1819343 | 0.0226343 |
Apical Junction | 0.1902571 | -0.2399212 | -0.1048696 | -0.0193586 | 0.1687033 |
## Samples to keep based on the overlap between the two inputs
sample_overlap <- intersect(colnames(input_score_matrix), colnames(datasets$feature_set))
input_score <- input_score_matrix["Epithelial Mesenchymal Transition", sample_overlap]
FS <- datasets$feature_set[, sample_overlap, drop = FALSE]
## Pre-filter FS based on occurrence frequency
FS_filtered <- CaDrA::prefilter_data(
FS = FS,
max_cutoff = 0.6, # max event frequency (60%)
min_cutoff = round(5/ncol(FS), 2) # make sure min event frequency has at least 5 samples
)
## Run candidate search
topn_result <- CaDrA::candidate_search(
FS = FS_filtered,
input_score = input_score,
method = "ks_pval", # Use Kolmogorov-Smirnov scoring function
method_alternative = "less", # Use one-sided hypothesis testing
weights = NULL, # If weights are provided, perform a weighted-KS (gsea-like) test
search_method = "both", # Apply both forward and backward search
top_N = 1, # Perform only one search (starting from top scoring feature)
max_size = 7, # Maximum number of features to include in the returned meta-feature
do_plot = FALSE, # Plot after finding the best features
best_score_only = FALSE # Return all results from the search
)
## Fetch the meta-feature yielding the best score over N searches
## .. (in this example, only N=1 search was performed)
topn_best_meta <- CaDrA::topn_best(topn_result)
## Visualize the best results with the meta-feature plot
CaDrA::meta_plot(topn_best_list = topn_best_meta, input_score_label = NULL)
## Permutation seed (for reproducible results)
set.seed(123)
## Run CaDrA Search 100 times on permuted input scores to estimate the null distribution
perm_res <- CaDrA::CaDrA(
FS = FS_filtered,
input_score = input_score,
method = "ks_pval",
method_alternative = "less",
top_N = 1,
max_size = 7,
search_method = "both",
n_perm = 100,
perm_alternative = "one.sided",
ncores = 2,
cache = FALSE
)
## Visualize permutation results
CaDrA::permutation_plot(perm_res = perm_res)
Here, we show how to launch a local instance of the CaDrA Portal, which will be populated with a user-selected set of datasets. In the example, a single dataset (ACC) will be uploaded to the portal.
# Download feature sets from CaDrA portal and return a datalist with appropriate paths to dataset
mydatafile <- CaDrA.shiny::download_feature_sets(
#feature_set = fs_list$feature_set_name, # this would download all TCGA datasets
feature_set = "TCGA_ACC_2016_01_28_GISTIC_MUT_SIG",
include_input_score = TRUE,
include_gene_expression = TRUE,
out_dir = file.path(Sys.getenv("HOME"), "Github") # specify your folder of choice here
)
feature_set_name | feature_set_path | input_score_name | input_score_path | gene_expression_name | gene_expression_path |
---|---|---|---|---|---|
TCGA_ACC_2016_01_28_GISTIC_MUT_SIG | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/feature_set/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG.rds | NA | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/input_score/NA | TCGA_ACC_2016_01_28_Gene_Expression | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/gene_expression/TCGA_ACC_2016_01_28_Gene_Expression.rds |
TCGA_ACC_2016_01_28_GISTIC_MUT_SIG | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/feature_set/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG.rds | NA | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/input_score/NA | TCGA_ACC_2016_01_28_Gene_Expression | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/gene_expression/TCGA_ACC_2016_01_28_Gene_Expression.rds |
There are five tabs on CaDrA’s Shiny dashboard. Explore each tab and see what they do.
To get help with CaDrA, visit our Github CaDrA dicussion or Github CaDrA issues.
To get help with CaDrA.shiny, visit our Github CaDrA.shiny dicussion or Github CaDrA.shiny issues.