vignettes/iedge-simple-run.Rmd
iedge-simple-run.Rmd
For this tutorial we will simulate the input gene expression data, the cis and trans genes with known mediations, and we will run iEDGE to recover these mediations.
We start by generating the dataset. You can use construct_iEDGE
to generate the iEDGE dataset from standard matrices (binary matrices for alteration data or numeric matrices for expression data).
Please note that this simulation is optimized for the evaluation of the iEDGE methology, and its only purpose is to demonstrate how to construct an iEDGE object from scratch for use in standard analysis.
library(iEDGE)
set.seed(1)
# define number of samples
n <- 50
# construct 'cn': binary alternations by samples matrix two
# independent alterations across 50 samples
cn1 <- sample(x= c(1,0), size = n, replace = T, prob = c(0.7, 0.3))
cn2 <- sample(x= c(1,0), size = n, replace = T, prob = c(0.5, 0.5))
cn <- rbind(cn1, cn2)
colnames(cn) <- paste0("sample_", 1:ncol(cn))
# construct 'cis' gene expressions: genes correlated with one or more
# alterations in 'cn'
cis <- t(sapply(c(1,1,1,2,2,2), function(i) {
3 + cn[i, ] + rnorm(n, 0, 0.3)
}))
rownames(cis) <- make.unique(paste0("cis", c(1,1,1,2,2,2)))
# construct 'trans-mediated' gene expression: gene correlated with
# one or more alteration in 'cn' and mediated through a cis gene
cis.m <- sample(rownames(cis), 10, replace = T)
trans.m <- t(sapply(1:length(cis.m), function(i)
3 + cis[cis.m[i], ] + rnorm(n, 0, 0.3)
))
rownames(trans.m) <- make.unique(paste0("t.m.", cis.m))
# construct 'trans-direct' gene expression: trans gene correlated
# with one or more alteration in 'cn' but not mediated through cis
# genes, this would be similar to construction of cis genes
trans.d <- t(sapply(c(1,1,1,2,2,2), function(i)
3 + cn[i, ] + rnorm(n, 0, 0.6)
))
rownames(trans.d) <- make.unique(paste0("t.d.", c(1,1,1,2,2,2)))
# uncorrelated genes
others <- t(sapply(1:30, function(i) 3+rnorm(n, 0, 0.3)))
rownames(others) <- paste0("others_", 1:nrow(others))
# putting genes together in expression matrix
gep <- rbind(cis, trans.m, trans.d, others)
colnames(gep) <- paste0("sample_", 1:ncol(gep))
cisgenes <- list(cn1 = grep("cis1", rownames(cis), value = T),
cn2 = grep("cis2", rownames(cis), value = T))
# the direction column in cn.fdat specifies whether to compute
# one-sided or two-sided p-values in differential tests. In this
# simulation we expect an increase in expression with the presence of
# the alteration, hence a one-sided ("UP") p-value is computed.
cn.fdat <- data.frame(cnid = rownames(cn), direction = "UP")
rownames(cn.fdat) <- rownames(cn)
gep.fdat <- data.frame(geneid = rownames(gep))
rownames(gep.fdat) <- rownames(gep)
dat <- construct_iEDGE(cn, gep, cisgenes, transgenes = NA, cn.fdat, gep.fdat,
cn.pdat = NA, gep.pdat = NA)
The iEDGE object is a list consisting of:
cn: Copy-Number of other binary alteration ExpressionSet:
exprs(dat$cn): the binary matrix encoding presence or absence of an alteration (c alterations by n samples)
fData(dat$cn): alteration annotation data frame (c alterations by g alteration annotation columns). Please note that if direction of differential expression test is dependent on the alteration, specify as such to be used in run_iEDGE, e.g. run_iEDGE(… cndir = “direction”, uptest = “UP”, downtest = “DOWN”) corresponds to reporting one-sided pvalues for significantly upregulated in genes for alterations in which fData(dat$cn)[, “direction”] == “UP”, or significantly downregulated genes for alterations in which fData(dat$cn)[, “direction”] == “DOWN”
pData(dat$cn): sample annotation data frame (n samples by l sample annotation columns)
gep: The gene expression ExpressionSet:
exprs(dat$gep): gene expression matrix, assumed to be already log2 transformed (m genes by n samples)
fData(dat$gep): gene annotation data frame (m genes by j gene annotation columns)
pData(dat$gep): sample annotation data frame (n samples by k sample annotation columns)
cisgenes: A list of character vectors specifying the genes that in the cis regions of each alteration, must be in the same order as the rows in exprs(dat$cn). For copy number alterations we defined the cis genes to be genes in the focal peaks of each amplification or deletion of interest.
transgenes (optional): A list of character vectors specifying the genes outside the cis regions (trans genes). If not specified, all genes in the expression matrix that are not the cis genes are considered. Trans genes specification is not recommended for exploratory analysis but may be useful for testing mediation and pathway enrichment for particular target gene sets in hypothesis-driven approaches.
## $cn
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 2 features, 50 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: sample_1 sample_2 ... sample_50 (50 total)
## varLabels: colid
## varMetadata: labelDescription
## featureData
## featureNames: cn1 cn2
## fvarLabels: cnid direction
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
##
## $gep
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 52 features, 50 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: sample_1 sample_2 ... sample_50 (50 total)
## varLabels: colid
## varMetadata: labelDescription
## featureData
## featureNames: cis1 cis1.1 ... others_30 (52 total)
## fvarLabels: geneid
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
##
## $cisgenes
## $cisgenes$cn1
## [1] "cis1" "cis1.1" "cis1.2"
##
## $cisgenes$cn2
## [1] "cis2" "cis2.1" "cis2.2"
Now that the iEDGE object is created, we can run the iEDGE pipeline on this dataset.
This will create the results object with differential expression tables for cis and trans genes
# cis and trans differential expression analysis only
# no mediation analysis
# no html report
res <- run_iEDGE(dat = dat, header = "testrun", outdir = ".",
cnid = "cnid", cndesc = "cnid", cndir = "direction",
uptest = "UP", downtest = "DOWN",
gepid = "geneid", enrichment = FALSE, bipartite = FALSE, html = FALSE)
## Running iEDGE for data set: testrun
## Making Differential Expression tables...
## Performancing cis analysis...
## Running iEDGE differential expression for 2 alterations:
##
##
|
| | 0%
|
|================================ | 50%
|
|=================================================================| 100%
## Writing table to ./testrun/tables/testrun_cis_sig_fdr_0.25.txt
## Writing table to ./testrun/tables/testrun_cis_full.txt
## Performancing trans analysis...
## Running iEDGE differential expression for 2 alterations:
##
##
|
| | 0%
|
|================================ | 50%
|
|=================================================================| 100%
## Writing table to ./testrun/tables/testrun_trans_sig_fdr_0.05.txt
## Done!
names(res)
## [1] "de" "de.enrich" "pruning" "ui"
head(res$de$cis$full)
## geneid fold.change adj.P.Val.all t high.class mean1.unlog
## cis1.1 cis1.1 2.132349 3.810924e-24 12.45441 case 15.10178
## cis1.2 cis1.2 2.037804 1.156082e-22 11.70850 case 15.71264
## cis1 cis1 1.916725 3.747522e-20 10.70101 case 15.44336
## cis2 cis2 2.045179 1.156082e-22 11.70744 case 15.49973
## cis2.1 cis2.1 1.918057 3.896869e-20 10.65738 case 15.27832
## cis2.2 cis2.2 1.900450 8.035568e-20 10.50649 case 15.25603
## mean0.unlog sd0.unlog sd1.unlog cnid direction interaction_type
## cis1.1 6.542553 1.465369 3.168008 cn1 UP cis
## cis1.2 7.212833 1.903196 3.605576 cn1 UP cis
## cis1 7.670445 2.081563 3.150070 cn1 UP cis
## cis2 7.075236 1.886268 3.559967 cn2 UP cis
## cis2.1 7.446159 1.652364 3.521293 cn2 UP cis
## cis2.2 7.512069 1.634062 3.505537 cn2 UP cis
## logFC AveExpr P.Value adj.P.Val B mean0.log
## cis1.1 1.0924433 3.634254 6.351540e-25 1.905462e-24 69.81907 2.891393
## cis1.2 1.0270153 3.703061 5.743513e-23 8.615269e-23 60.83393 3.004690
## cis1 0.9386434 3.715198 2.498348e-20 2.498348e-20 49.57859 3.076920
## cis2 1.0322269 3.410606 5.780412e-23 1.734124e-22 60.85680 2.977071
## cis2.1 0.9396453 3.447198 3.247391e-20 4.871087e-20 49.15384 3.052547
## cis2.2 0.9263412 3.452841 8.035568e-20 8.035568e-20 47.56241 3.063777
## mean1.log sd0.log sd1.log n0 n1
## cis1.1 3.983836 0.2661433 0.2704973 16 34
## cis1.2 4.031706 0.3132603 0.3018580 16 34
## cis1 4.015564 0.3486787 0.2625479 16 34
## cis2 4.009298 0.3282196 0.3359340 29 21
## cis2.1 3.992192 0.2774623 0.3171413 29 21
## cis2.2 3.990118 0.2782353 0.3182936 29 21
head(res$de$trans$full)
## geneid fold.change adj.P.Val.all t high.class
## t.m.cis1.1 t.m.cis1.1 2.058719 1.113504e-09 8.257112 case
## t.m.cis1.2 t.m.cis1.2 2.048716 3.095161e-09 7.909354 case
## t.m.cis1 t.m.cis1 1.875885 3.747706e-09 7.773147 case
## t.d.1 t.d.1 2.068440 5.221347e-06 5.636337 case
## t.d.1.1 t.d.1.1 2.036324 7.517011e-06 5.514853 case
## t.d.1.2 t.d.1.2 1.910407 5.188124e-04 4.259915 case
## mean1.unlog mean0.unlog sd0.unlog sd1.unlog cnid direction
## t.m.cis1.1 130.97073 62.936666 22.403932 43.210031 cn1 UP
## t.m.cis1.2 137.37791 68.380230 26.189159 37.396372 cn1 UP
## t.m.cis1 131.10487 70.856702 21.898342 30.521851 cn1 UP
## t.d.1 15.96287 7.664441 5.865252 7.642273 cn1 UP
## t.d.1.1 16.21092 7.938313 5.287850 6.876634 cn1 UP
## t.d.1.2 16.02314 8.130853 5.495517 8.454433 cn1 UP
## interaction_type logFC AveExpr P.Value adj.P.Val
## t.m.cis1.1 trans 1.0417471 6.645719 4.544915e-11 2.227009e-09
## t.m.cis1.2 trans 1.0347199 6.729740 1.623003e-10 3.976358e-09
## t.m.cis1 trans 0.9075714 6.716427 2.676933e-10 4.372324e-09
## t.d.1 trans 1.0485433 3.626562 6.926277e-07 8.484689e-06
## t.d.1.1 trans 1.0259674 3.666232 1.073859e-06 1.052382e-05
## t.d.1.2 trans 0.9338801 3.618542 8.470406e-05 6.917498e-04
## B mean0.log mean1.log sd0.log sd1.log n0 n1
## t.m.cis1.1 14.9623347 5.937331 6.979078 0.4059199 0.4280755 16 34
## t.m.cis1.2 13.7029102 6.026131 7.060851 0.5216744 0.3940067 16 34
## t.m.cis1 13.2079775 6.099279 7.006850 0.4695779 0.3438308 16 34
## t.d.1 5.4640829 2.913553 3.962096 0.7221742 0.5880397 16 34
## t.d.1.1 5.0346486 2.968574 3.994541 0.7366806 0.5798271 16 34
## t.d.1.2 0.7893726 2.983503 3.917383 0.7726423 0.7383662 16 34
This will create the results object with differential expression tables for cis and trans genes only, and an associated HTML report
# cis and trans differential expression analysis only
# no mediation analysis
# do html report
res <- run_iEDGE(dat = dat, header = "testrun", outdir = ".",
cnid = "cnid", cndesc = "cnid", cndir = "direction",
uptest = "UP", downtest = "DOWN",
gepid = "geneid", enrichment = FALSE,
bipartite = FALSE, html = TRUE)
## Running iEDGE for data set: testrun
## Making Differential Expression tables...
## Performancing cis analysis...
## Running iEDGE differential expression for 2 alterations:
##
##
|
| | 0%
|
|================================ | 50%
|
|=================================================================| 100%
## Writing table to ./testrun/tables/testrun_cis_sig_fdr_0.25.txt
## Writing table to ./testrun/tables/testrun_cis_full.txt
## Performancing trans analysis...
## Running iEDGE differential expression for 2 alterations:
##
##
|
| | 0%
|
|================================ | 50%
|
|=================================================================| 100%
## Writing table to ./testrun/tables/testrun_trans_sig_fdr_0.05.txt
## Done!
## Writing by alteration cis full DE tables...
## Writing cis boxplots...
## Writing trans boxplots...
## Writing by alteration cis DE tables...
## Writing by alteration trans DE tables...
## Writing summary table...
## Adding links to main table...
## Adding main table to main index...
## Done constructing html report at directory: ./testrun/html
This will create the results object with differential expression tables for cis and trans genes and mediation analysis, and an associated HTML report
# cis and trans differential expression analysis only
# do mediation analysis
# do html report
res <- run_iEDGE(dat = dat, header = "testrun", outdir = ".",
cnid = "cnid", cndesc = "cnid", cndir = "direction",
uptest = "UP", downtest = "DOWN",
gepid = "geneid", enrichment = FALSE,
bipartite = TRUE, html = TRUE)
## Running iEDGE for data set: testrun
## Making Differential Expression tables...
## Performancing cis analysis...
## Running iEDGE differential expression for 2 alterations:
##
##
|
| | 0%
|
|================================ | 50%
|
|=================================================================| 100%
## Writing table to ./testrun/tables/testrun_cis_sig_fdr_0.25.txt
## Writing table to ./testrun/tables/testrun_cis_full.txt
## Performancing trans analysis...
## Running iEDGE differential expression for 2 alterations:
##
##
|
| | 0%
|
|================================ | 50%
|
|=================================================================| 100%
## Writing table to ./testrun/tables/testrun_trans_sig_fdr_0.05.txt
## Done!
## Running pruning for cn1
## [1] 1
## [1] "number cis"
## [1] 3
## [1] "number trans"
## [1] 6
## Performing cis ordering...
## Performing trans ordering on 5 sets...
##
## Running pruning for cn2
## [1] 2
## [1] "number cis"
## [1] 3
## [1] "number trans"
## [1] 10
## Performing cis ordering...
## Performing trans ordering on 8 sets...
##
## Writing by alteration cis full DE tables...
## Writing cis boxplots...
## Writing trans boxplots...
## Writing by alteration cis DE tables...
## Writing by alteration trans DE tables...
## Writing summary table...
## Adding links to main table...
## Adding main table to main index...
## Writing bipartite graphs...
## Done constructing html report at directory: ./testrun/html
## $cn1
## cis trans yind zind taudiff tau0 tau1 tauratio
## 1 cis1.2 t.d.1 2 4 0.5420271 1.0485433 0.506516245 0.5169334
## 3 cis1.1 t.d.1.2 1 6 1.2585709 0.9338801 -0.324690839 1.3476794
## 4 cis1 t.m.cis1 3 3 0.7631243 0.9075714 0.144447104 0.8408422
## 5 cis1.1 t.m.cis1.1 1 1 1.2319340 1.0417471 -0.190186947 1.1825654
## 6 cis1 t.m.cis1.2 3 2 1.0403221 1.0347199 -0.005602134 1.0054142
## Z S pvalue fdr
## 1 1.782062 0.3041573 3.736959e-02 1.345305e-01
## 3 3.035446 0.4146247 1.200901e-03 5.404054e-03
## 4 4.753845 0.1605278 9.979214e-07 5.987529e-06
## 5 6.287764 0.1959256 1.610360e-10 2.898648e-09
## 6 6.139045 0.1694599 4.150955e-10 3.735860e-09
##
## $cn2
## cis trans yind zind taudiff tau0 tau1 tauratio
## 3 cis2.1 t.d.2.2 2 8 0.4916785 1.0756830 0.58400453 0.4570849
## 4 cis2 t.m.cis2 1 1 0.8209718 1.1270505 0.30607867 0.7284251
## 5 cis2.1 t.m.cis2.1 2 3 0.8365285 0.9352537 0.09872523 0.8944402
## 6 cis2.1 t.m.cis2.1.1 2 6 1.0545167 0.8939961 -0.16052060 1.1795540
## 7 cis2.1 t.m.cis2.1.2 2 7 0.9844624 0.8363706 -0.14809181 1.1770648
## 8 cis2.2 t.m.cis2.2 3 2 0.8995824 0.9720952 0.07251276 0.9254057
## 9 cis2 t.m.cis2.3 1 5 0.9507412 0.9825324 0.03179122 0.9676436
## 10 cis2 t.m.cis2.4 1 4 1.2855678 1.1221337 -0.16343419 1.1456459
## Z S pvalue fdr
## 3 1.758017 0.2796779 3.937231e-02 1.476461e-01
## 4 4.995617 0.1643384 2.932394e-07 1.256740e-06
## 5 5.773441 0.1448925 3.883433e-09 2.912574e-08
## 6 6.737861 0.1565061 8.036751e-12 1.205513e-10
## 7 5.507752 0.1787412 1.817228e-08 1.090337e-07
## 8 6.483173 0.1387565 4.490663e-11 4.490663e-10
## 9 5.116760 0.1858092 1.554143e-07 7.770716e-07
## 10 7.032506 0.1828037 1.014282e-12 3.042846e-11
The result html page should open automatically, and it will include an additional column “bipartite” with hyperlinks to the graphical display of the mediation analysis.
The HTML report of the final run is contained here