Abstract
DNA methylation and gene expression are interdependent and both implicated in cancer development and progression, with many individual biomarkers discovered. A joint analysis of the two data types can potentially lead to biological insights that are not discoverable with separate analyses. To optimally leverage the joint data for identifying perturbed genes and classifying clinical cancer samples, it is important to accurately model the interactions between the two data types. Here, we present EBADIMEX for jointly identifying differential expression and methylation and classifying samples. The moderated t-test widely used with empirical Bayes priors in current differential expression methods is generalised to a multivariate setting by developing: (1) a moderated Welch t-test for equality of means with unequal variances; (2) a moderated F-test for equality of variances; and (3) a multivariate test for equality of means with equal variances. This leads to parametric models with prior distributions for the parameters, which allow fast evaluation and robust analysis of small data sets. EBADIMEX is demonstrated on simulated data as well as a large breast cancer (BRCA) cohort from TCGA. We show that the use of empirical Bayes priors and moderated tests works particularly well on small data sets.
Funding statement: This study was supported by Independent Research Fund Denmark, Grant Number: 7016-00379, Sapere Aude, Grant Number: 12-126439 and Innovation Fund Denmark, Grant Number: 10-092320/DSF.
Appendix
A Statistical notes
A.1 Huber Loss
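To limit the influence of a single data type (expression or methylation) when performing classification, we use the Huber loss instead of the log-density.
If $x$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$, the log-density is quadratic in the standardized residual $z = (x - \mu)/\sigma$. In its standard form, the Huber-loss function $L(x)$ keeps this quadratic shape for $|z| \le k$ and is linear when $x$ is more than $k$ standard deviations away from $\mu$:
$$L(x) = \begin{cases} \tfrac{1}{2} z^2, & |z| \le k, \\ k|z| - \tfrac{1}{2} k^2, & |z| > k. \end{cases}$$
We use $k = 2.5$ in our data analyses.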
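As a minimal sketch of the idea in this section, the textbook Huber loss can be written on the standardized residual z = (x - μ)/σ. The exact parametrization used in EBADIMEX is not reproduced here, so the form below is an assumption, and the function names `huber_loss` and `quadratic_loss` are hypothetical.

```python
import math


def quadratic_loss(x, mu, sigma):
    """Normal negative log-density up to an additive constant: z**2 / 2."""
    z = (x - mu) / sigma
    return 0.5 * z * z


def huber_loss(x, mu, sigma, k=2.5):
    """Textbook Huber loss (assumed form, not taken from the paper).

    Equal to z**2 / 2 for |z| <= k and to k * |z| - k**2 / 2 beyond,
    so the two pieces join continuously at |z| = k.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = abs(x - mu) / sigma
    if z <= k:
        return 0.5 * z * z
    return k * z - 0.5 * k * k


if __name__ == "__main__":
    # An observation 10 standard deviations out is penalized far less
    # by the Huber loss than by the quadratic loss.
    print(quadratic_loss(10.0, 0.0, 1.0), huber_loss(10.0, 0.0, 1.0))
```

Because the loss grows only linearly in the tails, a single extreme expression or methylation value cannot dominate the combined score of a sample.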
A.2 Normalization
Let $R_{gi}$ denote the raw read count for gene $g$ in sample $i$. The read counts are scaled by a sample-specific factor $K_i$ and then log-transformed.
We implement two normalization strategies, namely total count (TC) and upper-quartile (UQ). In TC normalization we normalize with the total library size of the specific sample, i.e. $K_i = \sum_g R_{gi}$.
In UQ normalization we set $K_i$ equal to the upper quartile of the set of read counts of sample $i$.
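The two scaling strategies can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the base-2 logarithm, the `pseudo_count` argument, the linear-interpolation quantile, and all function names are choices made for this example.

```python
import math


def upper_quartile(values):
    """75th percentile with linear interpolation between order statistics."""
    s = sorted(values)
    if not s:
        raise ValueError("need at least one value")
    pos = 0.75 * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def scaling_factors(counts, method="TC"):
    """Return one factor K_i per sample; counts[g][i] is the raw count R_gi."""
    n_samples = len(counts[0])
    columns = [[row[i] for row in counts] for i in range(n_samples)]
    if method == "TC":
        factors = [float(sum(col)) for col in columns]
    elif method == "UQ":
        factors = [float(upper_quartile(col)) for col in columns]
    else:
        raise ValueError("method must be 'TC' or 'UQ'")
    if any(k <= 0 for k in factors):
        raise ValueError("every sample needs a positive scaling factor")
    return factors


def normalize(counts, method="TC", pseudo_count=1.0):
    """Assumed transform: log2((R_gi + pseudo_count) / K_i)."""
    factors = scaling_factors(counts, method)
    return [
        [math.log2((r + pseudo_count) / factors[i]) for i, r in enumerate(row)]
        for row in counts
    ]
```

For example, if one sample is sequenced twice as deeply as another but has the same relative composition, both strategies give the two samples identical normalized values when `pseudo_count` is zero.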
A.3 Filtering
We filter out lowly expressed genes using a method outlined in Ding et al. (2015). For each gene we compute a specified quantile of its expression across samples (by default the median, q = 0.5). Typically, a large group of genes displays low median expression (Supplementary Figure S6). Based on a histogram, the genes can be dichotomized into two groups by manually setting a threshold. The selection of this threshold can be aided by fitting a two-component normal mixture to the data: genes whose posterior probability of belonging to the lowly expressed class is larger than one half are not considered in subsequent analysis.
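The mixture-based filter described above can be sketched with a small EM routine. This is a minimal illustration, not the package's code: the initialization from the lower and upper halves of the sorted values, the fixed iteration count, the variance floor `min_sigma`, and the function names are all assumptions.

```python
import math
from statistics import median


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_low(x, weights, means, sigmas):
    """Posterior probability that x belongs to component 0 (the low one)."""
    p0 = weights[0] * normal_pdf(x, means[0], sigmas[0])
    p1 = weights[1] * normal_pdf(x, means[1], sigmas[1])
    if p0 + p1 == 0.0:
        # Both densities underflowed: fall back to the nearest mean.
        return 1.0 if abs(x - means[0]) <= abs(x - means[1]) else 0.0
    return p0 / (p0 + p1)


def fit_mixture(values, n_iter=200, min_sigma=1e-3):
    """Fit a two-component normal mixture by EM; component 0 has the lower mean."""
    s = sorted(values)
    half = len(s) // 2
    if half == 0:
        raise ValueError("need at least two values")
    # Start from the lower and upper halves of the sorted data.
    means = [sum(s[:half]) / half, sum(s[half:]) / (len(s) - half)]
    centre = sum(s) / len(s)
    spread = math.sqrt(sum((v - centre) ** 2 for v in s) / len(s))
    sigmas = [max(spread, min_sigma), max(spread, min_sigma)]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of the low component for every value.
        resp0 = [posterior_low(v, weights, means, sigmas) for v in values]
        # M-step: weighted updates of weight, mean and standard deviation.
        for c in (0, 1):
            r = resp0 if c == 0 else [1.0 - p for p in resp0]
            n_c = sum(r)
            if n_c < 1e-12:
                continue
            weights[c] = n_c / len(values)
            means[c] = sum(rj * v for rj, v in zip(r, values)) / n_c
            var = sum(rj * (v - means[c]) ** 2 for rj, v in zip(r, values)) / n_c
            sigmas[c] = max(math.sqrt(var), min_sigma)
    if means[0] > means[1]:
        # Keep the convention that component 0 is the lowly expressed class.
        weights.reverse()
        means.reverse()
        sigmas.reverse()
    return weights, means, sigmas


def filter_genes(expression, n_iter=200):
    """expression maps gene -> normalized values; returns the genes that are kept."""
    genes = list(expression)
    medians = [median(expression[g]) for g in genes]
    weights, means, sigmas = fit_mixture(medians, n_iter)
    return [
        g for g, m in zip(genes, medians)
        if posterior_low(m, weights, means, sigmas) <= 0.5
    ]
```

A gene is dropped when its posterior probability of belonging to the low component exceeds one half, which matches the rule stated in the text. A manual threshold read off a histogram could be used in place of the fitted mixture.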
A.4 Kullback-Leibler divergence
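Kullback-Leibler divergence can be used as an alternative ranking criterion to p-values. The Kullback-Leibler divergence between two normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is
$$D_{\mathrm{KL}}\big(N(\mu_1, \sigma_1^2) \,\|\, N(\mu_2, \sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}.$$
More generally, for $d$-dimensional multivariate normals $N(\mu_1, \Sigma_1)$ and $N(\mu_2, \Sigma_2)$, we have
$$D_{\mathrm{KL}}\big(N(\mu_1, \Sigma_1) \,\|\, N(\mu_2, \Sigma_2)\big) = \frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2 - \mu_1)^{\top}\Sigma_2^{-1}(\mu_2 - \mu_1) - d + \log\frac{\det\Sigma_2}{\det\Sigma_1}\right].$$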
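The closed-form divergences between normal distributions are easy to evaluate directly. The sketch below uses the standard formulas for the univariate case and for the two-dimensional case, which would correspond to one expression and one methylation coordinate per gene. The function names and the choice to hard-code d = 2 are assumptions made for this example.

```python
import math


def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) for univariate normals."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
        - 0.5
    )


def kl_bivariate_normal(mu1, cov1, mu2, cov2):
    """KL( N(mu1, cov1) || N(mu2, cov2) ) for d = 2.

    Covariances are symmetric 2x2 matrices given as ((a, b), (b, c)).
    """
    (a1, b1), (_, c1) = cov1
    (a2, b2), (_, c2) = cov2
    det1 = a1 * c1 - b1 * b1
    det2 = a2 * c2 - b2 * b2
    if det1 <= 0 or det2 <= 0:
        raise ValueError("covariance matrices must be positive definite")
    # Explicit inverse of the 2x2 matrix cov2.
    inv = ((c2 / det2, -b2 / det2), (-b2 / det2, a2 / det2))
    # trace(inv(cov2) @ cov1)
    trace = inv[0][0] * a1 + inv[0][1] * b1 + inv[1][0] * b1 + inv[1][1] * c1
    d0, d1 = mu2[0] - mu1[0], mu2[1] - mu1[1]
    # Quadratic form (mu2 - mu1)^T inv(cov2) (mu2 - mu1)
    quad = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return 0.5 * (trace + quad - 2.0 + math.log(det2 / det1))
```

With diagonal covariance matrices the bivariate divergence reduces to the sum of the two univariate divergences, which gives a quick consistency check. Note that the divergence is not symmetric in its two arguments, so a ranking depends on which group is taken as the reference.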
B R-package
All functionality of EBADIMEX is implemented as an R-package. The package, with accompanying tutorial, is available at https://github.com/TobiasMadsen/EBADIMEX.
References
Aryee, M. J., A. E. Jaffe, H. Corrada-Bravo, C. Ladd-Acosta, A. P. Feinberg, K. D. Hansen and R. A. Irizarry (2014): “Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays,” Bioinformatics, 30, 1363–1369. doi:10.1093/bioinformatics/btu049
Bailer-Jones, C. and K. Smith (2011): Combining probabilities. Data Processing and Analysis Consortium (DPAC), GAIA-C8-TN-MPIA-CBJ-053.
Bibikova, M., B. Barnes, C. Tsan, V. Ho, B. Klotzle, J. M. Le, D. Delano, L. Zhang, G. P. Schroth, K. L. Gunderson, J. B. Fan and R. Shen (2011): “High density DNA methylation array with single CpG site resolution,” Genomics, 98, 288–295. doi:10.1016/j.ygeno.2011.07.007
Breiman, L., A. Cutler, A. Liaw and M. Wiener (2006): “randomForest: Breiman and Cutler’s random forests for classification and regression.”
Brenet, F., M. Moh, P. Funk, E. Feierstein, A. J. Viale, N. D. Socci and J. M. Scandura (2011): “DNA methylation of the first exon is tightly linked to transcriptional silencing,” PLoS One, 6, e14524. doi:10.1371/journal.pone.0014524
Bullard, J. H., E. Purdom, K. D. Hansen and S. Dudoit (2010): “Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments,” BMC Bioinformatics, 11, 94. doi:10.1186/1471-2105-11-94
Dedeurwaerder, S., M. Defrance, E. Calonne, H. Denis, C. Sotiriou and F. Fuks (2011): “Evaluation of the Infinium Methylation 450k Technology,” Epigenomics, 3, 771–784. doi:10.2217/epi.11.105
Demissie, M., B. Mascialino, S. Calza and Y. Pawitan (2008): “Unequal group variances in microarray data analyses,” Bioinformatics, 24, 1168–1174. doi:10.1093/bioinformatics/btn100
Ding, J., M. K. McConechy, H. M. Horlings, G. Ha, F. C. Chan, T. Funnell, S. C. Mullaly, J. Reimand, A. Bashashati, G. D. Bader, D. Huntsman, S. Aparicio, A. Condon and S. P. Shah (2015): “Systematic analysis of somatic mutations impacting gene expression in 12 tumour types,” Nat. Commun., 6, 8554. doi:10.1038/ncomms9554
Dixon, W. J. and J. W. Tukey (1968): “Approximate behavior of the distribution of Winsorized t (trimming/winsorization 2),” Technometrics, 10, 83–98. doi:10.2307/1266226
Du, P., X. Zhang, C.-C. Huang, N. Jafari, W. A. Kibbe, L. Hou and S. M. Lin (2010): “Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis,” BMC Bioinformatics, 11, 587. doi:10.1186/1471-2105-11-587
Esteller, M. (2008): “Epigenetics in cancer,” N. Engl. J. Med., 358, 1148–1159. doi:10.1056/NEJMra072067
Fisher, R. A. (1932): Statistical methods for research workers, Oliver and Boyd, Edinburgh.
Gelman, A. (2011): arm: Data analysis using regression and multilevel/hierarchical models. http://cran.r-project.org/web/packages/arm.
Grossman, R. L., A. P. Heath, V. Ferretti, H. E. Varmus, D. R. Lowy, W. A. Kibbe and L. M. Staudt (2016): “Toward a shared vision for cancer genomic data,” N. Engl. J. Med., 375, 1109–1112. doi:10.1056/NEJMp1607591
Huber, P. and E. Ronchetti (2009): Robust statistics, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi:10.1002/9780470434697
Jeong, J., L. Li, Y. Liu, K. P. Nephew, T. H.-M. Huang and C. Shen (2010): “An empirical Bayes model for gene expression and methylation profiles in antiestrogen resistant breast cancer,” BMC Med. Genomics, 3, 55. doi:10.1186/1755-8794-3-55
Jjingo, D., A. B. Conley, S. V. Yi, V. V. Lunyak and I. K. Jordan (2012): “On the presence and role of human gene-body DNA methylation,” Oncotarget, 3, 462–474. doi:10.18632/oncotarget.497
Jones, P. A. (2012): “Functions of DNA methylation: islands, start sites, gene bodies and beyond,” Nat. Rev. Genet., 13, 484. doi:10.1038/nrg3230
Jones, P. A. and S. B. Baylin (2007): “The epigenomics of cancer,” Cell, 128, 683–692. doi:10.1016/j.cell.2007.01.029
Karatzoglou, A., A. Smola and K. Hornik (2013): “kernlab: Kernel-based machine learning lab,” R package.
Kass, S. U., N. Landsberger and A. P. Wolffe (1997): “DNA methylation directs a time-dependent repression of transcription initiation,” Curr. Biol., 7, 157–165. doi:10.1016/S0960-9822(97)70086-1
Kristensen, V. N., O. C. Lingjærde, H. G. Russnes, H. K. M. Vollan, A. Frigessi and A.-L. Børresen-Dale (2014): “Principles and methods of integrative genomic analyses in cancer,” Nat. Rev. Cancer, 14, 299–313. doi:10.1038/nrc3721
Kuhn, M. (2015): “caret: Classification and regression training,” Astrophysics Source Code Library.
Levenson, V. V. (2010): “DNA methylation as a universal biomarker,” Expert. Rev. Mol. Diagn., 10, 481–488. doi:10.1586/erm.10.17
List, M., A.-C. Hauschild, Q. Tan, T. A. Kruse, J. Baumbach and R. Batra (2014): “Classification of breast cancer subtypes by combining gene expression and DNA methylation data,” J. Integr. Bioinform., 11, 1–14. doi:10.1515/jib-2014-236
Love, M. I., W. Huber and S. Anders (2014): “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2,” Genome Biol., 15, 550. doi:10.1186/s13059-014-0550-8
Ma, K., B. Cao and M. Guo (2016): “The detective, prognostic, and predictive value of DNA methylation in human esophageal squamous cell carcinoma,” Clin. Epigenetics, 8, 43. doi:10.1186/s13148-016-0210-9
McCarthy, D. J., Y. Chen and G. K. Smyth (2012): “Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation,” Nucleic Acids Res., 40, 4288–4297. doi:10.1093/nar/gks042
Mendizabal, I., J. Zeng, T. E. Keller and S. V. Yi (2017): “Body-hypomethylated human genes harbor extensive intragenic transcriptional activity and are prone to cancer-associated dysregulation,” Nucleic Acids Res., 45, 4390–4400. doi:10.1093/nar/gkx020
Meyer, D., E. Dimitriadou, K. Hornik, A. Weingessel and F. Leisch (2016): e1071: Misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071), TU Wien, 2015, R package version, p. 1–6.
Morris, T. J., L. M. Butcher, A. Feber, A. E. Teschendorff, A. R. Chakravarthy, T. K. Wojdacz and S. Beck (2013): “ChAMP: 450k chip analysis methylation pipeline,” Bioinformatics, 30, 428–430. doi:10.1093/bioinformatics/btt684
R Core Team (2017): R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria.
Ritchie, M. E., B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi and G. K. Smyth (2015): “limma powers differential expression analyses for RNA-sequencing and microarray studies,” Nucleic Acids Res., 43, e47. doi:10.1093/nar/gkv007
Scott, D. W. (2008): Multivariate density estimation: theory, practice, and visualization, John Wiley & Sons, Inc., Hoboken, NJ, USA.
Smyth, G. K. (2004): “Linear models and empirical Bayes methods for assessing differential expression in microarray experiments,” Stat. Appl. Genet. Mol. Biol., 3, 1–25. doi:10.2202/1544-6115.1027
Smith, Z. D. and A. Meissner (2013): “DNA methylation: roles in mammalian development,” Nat. Rev. Genet., 14, 204–220. doi:10.1038/nrg3354
Smith, A. D., D. Roda and T. A. Yap (2014): “Strategies for modern biomarker and drug development in oncology,” J. Hematol. Oncol., 7, 70. doi:10.1186/s13045-014-0070-8
Strand, S. H., T. F. Orntoft and K. D. Sorensen (2014): “Prognostic DNA methylation markers for prostate cancer,” Int. J. Mol. Sci., 15, 16544–16576. doi:10.3390/ijms150916544
Świtnicki, M. P., M. Juul, T. Madsen, K. D. Sørensen and J. S. Pedersen (2016): “PINCAGE: probabilistic integration of cancer genomics data for perturbed gene identification and sample classification,” Bioinformatics, 32, 1353–1365. doi:10.1093/bioinformatics/btv758
Weinstein, J. N., E. A. Collisson, G. B. Mills, K. R. M. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander and J. M. Stuart (2013): “The cancer genome atlas pan-cancer analysis project,” Nat. Genet., 45, 1113–1120. doi:10.1038/ng.2764
Wu, D., J. Gu and M. Q. Zhang (2013): “FastDMA: an infinium humanmethylation450 beadchip analyzer,” PLoS One, 8, e74275. doi:10.1371/journal.pone.0074275
Yang, X., H. Han, D. D. De Carvalho, F. D. Lay, P. A. Jones and G. Liang (2014): “Gene body methylation can alter gene expression and is a therapeutic target in cancer,” Cancer Cell, 26, 577–590. doi:10.1016/j.ccr.2014.07.028
Zhong, D. and H. Cen (2017): “Aberrant promoter methylation profiles and association with survival in patients with hepatocellular carcinoma,” OncoTargets Ther., 10, 2501. doi:10.2147/OTT.S128058
Supplementary Material
The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0050).
© 2019 Walter de Gruyter GmbH, Berlin/Boston