Abstract
The diversity and sheer volume of omics data have brought biological and biomedical research and its applications into a big data era, much as happened in society at large a decade ago. These data pose a new challenge: the shift from horizontal data ensembles (e.g., similar types of data collected from different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of persons with matched information). This shift requires integrative analysis in biology and biomedicine and calls for the urgent development of data integration methods to address the change from population-guided to individual-guided investigations.
Data integration is an effective way to solve complex problems and to understand complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective and reproducible way. Current approaches to integrating biological data follow one of two modes: a “bottom-up integration” mode with follow-up manual integration, or a “top-down integration” mode with follow-up in silico integration.
This paper first summarizes combinatory analysis approaches, to suggest candidate protocols for designing biological experiments that support effective integrative studies in genomics. It then surveys data fusion approaches, to offer guidance on developing computational models for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine, which depends on big biomedical data. Finally, open problems and future directions for the integrative analysis of omics big data are highlighted.
References
Field D, Sansone SA, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K, Kolar P, Kolker E, Maxon M, Millard S, Mugabushaka AM, Perrin N, Remacle JE, Remington K, Rocca-Serra P, Taylor CF, Thorley M, Tiwari B, Wilbanks J (2009) Megascience. ‘Omics data sharing’. Science 326(5950):234–236. https://doi.org/10.1126/science.1180598
Vo TV, Das J, Meyer MJ, Cordero NA, Akturk N, Wei X, Fair BJ, Degatano AG, Fragoza R, Liu LG, Matsuyama A, Trickey M, Horibata S, Grimson A, Yamano H, Yoshida M, Roth FP, Pleiss JA, Xia Y, Yu H (2016) A proteome-wide fission yeast interactome reveals network evolution principles from yeasts to human. Cell 164(1–2):310–323. https://doi.org/10.1016/j.cell.2015.11.037
Madhani HD, Francis NJ, Kingston RE, Kornberg RD, Moazed D, Narlikar GJ, Panning B, Struhl K (2008) Epigenomics: a roadmap, but to where? Science 322(5898):43–44. https://doi.org/10.1126/science.322.5898.43b
Romanoski CE, Glass CK, Stunnenberg HG, Wilson L, Almouzni G (2015) Epigenomics: roadmap for regulation. Nature 518(7539):314–316. https://doi.org/10.1038/518314a
Lage K, Karlberg EO, Storling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, Tumer Z, Pociot F, Tommerup N, Moreau Y, Brunak S (2007) A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol 25(3):309–316. https://doi.org/10.1038/nbt1295
Nicholson JK, Lindon JC (2008) Systems biology: metabonomics. Nature 455(7216):1054–1056. https://doi.org/10.1038/4551054a
Rolland T, Tasan M, Charloteaux B, Pevzner SJ, Zhong Q, Sahni N, Yi S, Lemmens I, Fontanillo C, Mosca R, Kamburov A, Ghiassian SD, Yang X, Ghamsari L, Balcha D, Begg BE, Braun P, Brehme M, Broly MP, Carvunis AR, Convery-Zupan D, Corominas R, Coulombe-Huntington J, Dann E, Dreze M, Dricot A, Fan C, Franzosa E, Gebreab F, Gutierrez BJ, Hardy MF, Jin M, Kang S, Kiros R, Lin GN, Luck K, MacWilliams A, Menche J, Murray RR, Palagi A, Poulin MM, Rambout X, Rasla J, Reichert P, Romero V, Ruyssinck E, Sahalie JM, Scholz A, Shah AA, Sharma A, Shen Y, Spirohn K, Tam S, Tejeda AO, Trigg SA, Twizere JC, Vega K, Walsh J, Cusick ME, Xia Y, Barabasi AL, Iakoucheva LM, Aloy P, De Las Rivas J, Tavernier J, Calderwood MA, Hill DE, Hao T, Roth FP, Vidal M (2014) A proteome-scale map of the human interactome network. Cell 159(5):1212–1226. https://doi.org/10.1016/j.cell.2014.10.050
Friedel CC, Zimmer R (2006) Toward the complete interactome. Nat Biotechnol 24(6):614–615.; Author reply 615. https://doi.org/10.1038/nbt0606-614
Buxton B, Hayward V, Pearson I, Karkkainen L, Greiner H, Dyson E, Ito J, Chung A, Kelly K, Schillace S (2008) Big data: the next Google. Interview by Duncan Graham-Rowe. Nature 455(7209):8–9. https://doi.org/10.1038/455008a
Kirk P, Griffin JE, Savage RS, Ghahramani Z, Wild DL (2012) Bayesian correlated clustering to integrate multiple datasets. Bioinformatics 28(24):3290–3297. https://doi.org/10.1093/bioinformatics/bts595
Mo Q, Wang S, Seshan VE, Olshen AB, Schultz N, Sander C, Powers RS, Ladanyi M, Shen R (2013) Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc Natl Acad Sci U S A 110(11):4245–4250. https://doi.org/10.1073/pnas.1208949110
Rapport DJ, Maffi L (2013) A call for integrative thinking. Science 339(6123):1032. https://doi.org/10.1126/science.339.6123.1032-a
Wen Y, Wei Y, Zhang S, Li S, Liu H, Wang F, Zhao Y, Zhang D, Zhang Y (2016) Cell subpopulation deconvolution reveals breast cancer heterogeneity based on DNA methylation signature. Brief Bioinform. https://doi.org/10.1093/bib/bbw028
Voillet V, Besse P, Liaubet L, San Cristobal M, Gonzalez I (2016) Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework. BMC Bioinformatics 17(1):402. https://doi.org/10.1186/s12859-016-1273-5
Weischenfeldt J, Simon R, Feuerbach L, Schlangen K, Weichenhan D, Minner S, Wuttig D, Warnatz HJ, Stehr H, Rausch T, Jager N, Gu L, Bogatyrova O, Stutz AM, Claus R, Eils J, Eils R, Gerhauser C, Huang PH, Hutter B, Kabbe R, Lawerenz C, Radomski S, Bartholomae CC, Falth M, Gade S, Schmidt M, Amschler N, Hass T, Galal R, Gjoni J, Kuner R, Baer C, Masser S, von Kalle C, Zichner T, Benes V, Raeder B, Mader M, Amstislavskiy V, Avci M, Lehrach H, Parkhomchuk D, Sultan M, Burkhardt L, Graefen M, Huland H, Kluth M, Krohn A, Sirma H, Stumm L, Steurer S, Grupp K, Sultmann H, Sauter G, Plass C, Brors B, Yaspo ML, Korbel JO, Schlomm T (2013) Integrative genomic analyses reveal an androgen-driven somatic alteration landscape in early-onset prostate cancer. Cancer Cell 23(2):159–170. https://doi.org/10.1016/j.ccr.2013.01.002
Shen R, Mo Q, Schultz N, Seshan VE, Olshen AB, Huse J, Ladanyi M, Sander C (2012) Integrative subtype discovery in glioblastoma using iCluster. PLoS One 7(4):e35236. https://doi.org/10.1371/journal.pone.0035236
Zeng T, Wang DC, Wang X, Xu F, Chen L (2014) Prediction of dynamical drug sensitivity and resistance by module network rewiring-analysis based on transcriptional profiling. Drug Resist Updates 17(3):64–76. https://doi.org/10.1016/j.drup.2014.08.002
Shi X, Shen S, Liu J, Huang J, Zhou Y, Ma S (2014) Similarity of markers identified from cancer gene expression studies: observations from GEO. Brief Bioinform 15(5):671–684. https://doi.org/10.1093/bib/bbt044
Shi X, Yi H, Ma S (2015) Measures for the degree of overlap of gene signatures and applications to TCGA. Brief Bioinform 16(5):735–744. https://doi.org/10.1093/bib/bbu049
Bebek G, Koyuturk M, Price ND, Chance MR (2012) Network biology methods integrating biological data for translational science. Brief Bioinform 13(4):446–459. https://doi.org/10.1093/bib/bbr075
Zhang S, Liu CC, Li W, Shen H, Laird PW, Zhou XJ (2012) Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res 40(19):9379–9391. https://doi.org/10.1093/nar/gks725
Liu Y, Devescovi V, Chen S, Nardini C (2013) Multilevel omic data integration in cancer cell lines: advanced annotation and emergent properties. BMC Syst Biol 7:14. https://doi.org/10.1186/1752-0509-7-14
Hieke S, Benner A, Schlenl RF, Schumacher M, Bullinger L, Binder H (2016) Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information. BMC Bioinformatics 17(1):327. https://doi.org/10.1186/s12859-016-1183-6
Shen R, Olshen AB, Ladanyi M (2009) Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25(22):2906–2912. https://doi.org/10.1093/bioinformatics/btp543
Wang W, Baladandayuthapani V, Morris JS, Broom BM, Manyam G, Do KA (2013) iBAG: integrative Bayesian analysis of high-dimensional multiplatform genomics data. Bioinformatics 29(2):149–159. https://doi.org/10.1093/bioinformatics/bts655
Yuan Y, Savage RS, Markowetz F (2011) Patient-specific data fusion defines prognostic cancer subtypes. PLoS Comput Biol 7(10):e1002227. https://doi.org/10.1371/journal.pcbi.1002227
Speicher NK, Pfeifer N (2015) Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery. Bioinformatics 31(12):i268–i275. https://doi.org/10.1093/bioinformatics/btv244
Narayanan M, Vetta A, Schadt EE, Zhu J (2010) Simultaneous clustering of multiple gene expression and physical interaction datasets. PLoS Comput Biol 6(4):e1000742. https://doi.org/10.1371/journal.pcbi.1000742
Kutalik Z, Beckmann JS, Bergmann S (2008) A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 26(5):531–539. https://doi.org/10.1038/nbt1397
Le Van T, van Leeuwen M, Carolina Fierro A, De Maeyer D, Van den Eynden J, Verbeke L, De Raedt L, Marchal K, Nijssen S (2016) Simultaneous discovery of cancer subtypes and subtype features by molecular data integration. Bioinformatics 32(17):i445–i454. https://doi.org/10.1093/bioinformatics/btw434
Seely JS, Kaufman MT, Ryu SI, Shenoy KV, Cunningham JP, Churchland MM (2016) Tensor analysis reveals distinct population structure that parallels the different computational roles of areas M1 and V1. PLoS Comput Biol 12(11):e1005164. https://doi.org/10.1371/journal.pcbi.1005164
Hore V, Vinuela A, Buil A, Knight J, McCarthy MI, Small K, Marchini J (2016) Tensor decomposition for multiple-tissue gene expression experiments. Nat Genet 48(9):1094–1100. https://doi.org/10.1038/ng.3624
Bersanelli M, Mosca E, Remondini D, Giampieri E, Sala C, Castellani G, Milanesi L (2016) Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics 17(Suppl 2):15. https://doi.org/10.1186/s12859-015-0857-9
Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC (2016) Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform 17(4):628–641. https://doi.org/10.1093/bib/bbv108
Luo Y, Wang F, Szolovits P (2016) Tensor factorization toward precision medicine. Brief Bioinform. https://doi.org/10.1093/bib/bbw026
Vargas AJ, Harris CC (2016) Biomarker development in the precision medicine era: lung cancer as a case study. Nat Rev Cancer 16(8):525–537. https://doi.org/10.1038/nrc.2016.56
Lahti L, Schafer M, Klein HU, Bicciato S, Dugas M (2013) Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review. Brief Bioinform 14(1):27–35. https://doi.org/10.1093/bib/bbs005
Genomes Project C, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422):56–65. https://doi.org/10.1038/nature11632
Gerstein M (2012) Genomics: ENCODE leads the way on big data. Nature 489(7415):208. https://doi.org/10.1038/489208b
Nagano T, Lubling Y, Stevens TJ, Schoenfelder S, Yaffe E, Dean W, Laue ED, Tanay A, Fraser P (2013) Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502(7469):59–64. https://doi.org/10.1038/nature12593
Dekker J, Marti-Renom MA, Mirny LA (2013) Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat Rev Genet 14(6):390–403. https://doi.org/10.1038/nrg3454
Yun X, Xia L, Tang B, Zhang H, Li F, Zhang Z (2016) 3CDB: a manually curated database of chromosome conformation capture data. Database (Oxford). https://doi.org/10.1093/database/baw044
Teng L, He B, Wang J, Tan K (2016) 4DGenome: a comprehensive database of chromatin interactions. Bioinformatics 32(17):2727. https://doi.org/10.1093/bioinformatics/btw375
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A (2013) NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res 41(Database issue):D991–D995. https://doi.org/10.1093/nar/gks1193
Kim HS, Minna JD, White MA (2013) GWAS meets TCGA to illuminate mechanisms of cancer predisposition. Cell 152(3):387–389. https://doi.org/10.1016/j.cell.2013.01.027
International Cancer Genome C, Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabe RR, Bhan MK, Calvo F, Eerola I, Gerhard DS, Guttmacher A, Guyer M, Hemsley FM, Jennings JL, Kerr D, Klatt P, Kolar P, Kusada J, Lane DP, Laplace F, Youyong L, Nettekoven G, Ozenberger B, Peterson J, Rao TS, Remacle J, Schafer AJ, Shibata T, Stratton MR, Vockley JG, Watanabe K, Yang H, Yuen MM, Knoppers BM, Bobrow M, Cambon-Thomsen A, Dressler LG, Dyke SO, Joly Y, Kato K, Kennedy KL, Nicolas P, Parker MJ, Rial-Sebbag E, Romeo-Casabona CM, Shaw KM, Wallace S, Wiesner GL, Zeps N, Lichter P, Biankin AV, Chabannon C, Chin L, Clement B, de Alava E, Degos F, Ferguson ML, Geary P, Hayes DN, Hudson TJ, Johns AL, Kasprzyk A, Nakagawa H, Penny R, Piris MA, Sarin R, Scarpa A, Shibata T, van de Vijver M, Futreal PA, Aburatani H, Bayes M, Botwell DD, Campbell PJ, Estivill X, Gerhard DS, Grimmond SM, Gut I, Hirst M, Lopez-Otin C, Majumder P, Marra M, McPherson JD, Nakagawa H, Ning Z, Puente XS, Ruan Y, Shibata T, Stratton MR, Stunnenberg HG, Swerdlow H, Velculescu VE, Wilson RK, Xue HH, Yang L, Spellman PT, Bader GD, Boutros PC, Campbell PJ, Flicek P, Getz G, Guigo R, Guo G, Haussler D, Heath S, Hubbard TJ, Jiang T, Jones SM, Li Q, Lopez-Bigas N, Luo R, Muthuswamy L, Ouellette BF, Pearson JV, Puente XS, Quesada V, Raphael BJ, Sander C, Shibata T, Speed TP, Stein LD, Stuart JM, Teague JW, Totoki Y, Tsunoda T, Valencia A, Wheeler DA, Wu H, Zhao S, Zhou G, Stein LD, Guigo R, Hubbard TJ, Joly Y, Jones SM, Kasprzyk A, Lathrop M, Lopez-Bigas N, Ouellette BF, Spellman PT, Teague JW, Thomas G, Valencia A, Yoshida T, Kennedy KL, Axton M, Dyke SO, Futreal PA, Gerhard DS, Gunter C, Guyer M, Hudson TJ, McPherson JD, Miller LJ, Ozenberger B, Shaw KM, Kasprzyk A, Stein LD, Zhang J, Haider SA, Wang J, Yung CK, Cros A, Liang Y, Gnaneshan S, Guberman J, Hsu J, Bobrow M, Chalmers DR, Hasel KW, Joly Y, Kaan TS, Kennedy KL, Knoppers BM, Lowrance WW, Masui T, Nicolas P, Rial-Sebbag E, Rodriguez LL, Vergely C, 
Yoshida T, Grimmond SM, Biankin AV, Bowtell DD, Cloonan N, deFazio A, Eshleman JR, Etemadmoghadam D, Gardiner BB, Kench JG, Scarpa A, Sutherland RL, Tempero MA, Waddell NJ, Wilson PJ, McPherson JD, Gallinger S, Tsao MS, Shaw PA, Petersen GM, Mukhopadhyay D, Chin L, DePinho RA, Thayer S, Muthuswamy L, Shazand K, Beck T, Sam M, Timms L, Ballin V, Lu Y, Ji J, Zhang X, Chen F, Hu X, Zhou G, Yang Q, Tian G, Zhang L, Xing X, Li X, Zhu Z, Yu Y, Yu J, Yang H, Lathrop M, Tost J, Brennan P, Holcatova I, Zaridze D, Brazma A, Egevard L, Prokhortchouk E, Banks RE, Uhlen M, Cambon-Thomsen A, Viksna J, Ponten F, Skryabin K, Stratton MR, Futreal PA, Birney E, Borg A, Borresen-Dale AL, Caldas C, Foekens JA, Martin S, Reis-Filho JS, Richardson AL, Sotiriou C, Stunnenberg HG, Thoms G, van de Vijver M, van't Veer L, Calvo F, Birnbaum D, Blanche H, Boucher P, Boyault S, Chabannon C, Gut I, Masson-Jacquemier JD, Lathrop M, Pauporte I, Pivot X, Vincent-Salomon A, Tabone E, Theillet C, Thomas G, Tost J, Treilleux I, Calvo F, Bioulac-Sage P, Clement B, Decaens T, Degos F, Franco D, Gut I, Gut M, Heath S, Lathrop M, Samuel D, Thomas G, Zucman-Rossi J, Lichter P, Eils R, Brors B, Korbel JO, Korshunov A, Landgraf P, Lehrach H, Pfister S, Radlwimmer B, Reifenberger G, Taylor MD, von Kalle C, Majumder PP, Sarin R, Rao TS, Bhan MK, Scarpa A, Pederzoli P, Lawlor RA, Delledonne M, Bardelli A, Biankin AV, Grimmond SM, Gress T, Klimstra D, Zamboni G, Shibata T, Nakamura Y, Nakagawa H, Kusada J, Tsunoda T, Miyano S, Aburatani H, Kato K, Fujimoto A, Yoshida T, Campo E, Lopez-Otin C, Estivill X, Guigo R, de Sanjose S, Piris MA, Montserrat E, Gonzalez-Diaz M, Puente XS, Jares P, Valencia A, Himmelbauer H, Quesada V, Bea S, Stratton MR, Futreal PA, Campbell PJ, Vincent-Salomon A, Richardson AL, Reis-Filho JS, van de Vijver M, Thomas G, Masson-Jacquemier JD, Aparicio S, Borg A, Borresen-Dale AL, Caldas C, Foekens JA, Stunnenberg HG, van't Veer L, Easton DF, Spellman PT, Martin S, Barker AD, Chin L, 
Collins FS, Compton CC, Ferguson ML, Gerhard DS, Getz G, Gunter C, Guttmacher A, Guyer M, Hayes DN, Lander ES, Ozenberger B, Penny R, Peterson J, Sander C, Shaw KM, Speed TP, Spellman PT, Vockley JG, Wheeler DA, Wilson RK, Hudson TJ, Chin L, Knoppers BM, Lander ES, Lichter P, Stein LD, Stratton MR, Anderson W, Barker AD, Bell C, Bobrow M, Burke W, Collins FS, Compton CC, DePinho RA, Easton DF, Futreal PA, Gerhard DS, Green AR, Guyer M, Hamilton SR, Hubbard TJ, Kallioniemi OP, Kennedy KL, Ley TJ, Liu ET, Lu Y, Majumder P, Marra M, Ozenberger B, Peterson J, Schafer AJ, Spellman PT, Stunnenberg HG, Wainwright BJ, Wilson RK, Yang H (2010) International network of cancer genome projects. Nature 464(7291):993–998. https://doi.org/10.1038/nature08987
Kozomara A, Griffiths-Jones S (2014) miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 42(Database issue):D68–D73. https://doi.org/10.1093/nar/gkt1181
Quek XC, Thomson DW, Maag JL, Bartonicek N, Signal B, Clark MB, Gloss BS, Dinger ME (2015) lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res 43(Database issue):D168–D173. https://doi.org/10.1093/nar/gku988
Lebron R, Gomez-Martin C, Carpena P, Bernaola-Galvan P, Barturen G, Hackenberg M, Oliver JL (2017) NGSmethDB 2017: enhanced methylomes and differential methylation. Nucleic Acids Res 45(D1):D97–D103. https://doi.org/10.1093/nar/gkw996
Xin Y, Chanrion B, O'Donnell AH, Milekic M, Costa R, Ge Y, Haghighi FG (2012) MethylomeDB: a database of DNA methylation profiles of the brain. Nucleic Acids Res 40(Database issue):D1245–D1249. https://doi.org/10.1093/nar/gkr1193
Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu Y, Djoumbou Y, Mandal R, Aziat F, Dong E, Bouatra S, Sinelnikov I, Arndt D, Xia J, Liu P, Yallou F, Bjorndahl T, Perez-Pineiro R, Eisner R, Allen F, Neveu V, Greiner R, Scalbert A (2013) HMDB 3.0—the human metabolome database in 2013. Nucleic Acids Res 41(Database issue):D801–D807. https://doi.org/10.1093/nar/gks1065
Mitchell A, Bucchini F, Cochrane G, Denise H, ten Hoopen P, Fraser M, Pesseat S, Potter S, Scheremetjew M, Sterk P, Finn RD (2016) EBI metagenomics in 2016—an expanding and evolving resource for the analysis and archiving of metagenomic data. Nucleic Acids Res 44(D1):D595–D603. https://doi.org/10.1093/nar/gkv1195
Friedman A, Perrimon N (2007) Genetic screening for signal transduction in the era of network biology. Cell 128(2):225–231. https://doi.org/10.1016/j.cell.2007.01.007
Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5(2):101–113. https://doi.org/10.1038/nrg1272
Goymer P (2008) Network biology: why do we need hubs? Nat Rev Genet 9(9):650
Hu JX, Thomas CE, Brunak S (2016) Network biology concepts in complex disease comorbidities. Nat Rev Genet 17(10):615–629. https://doi.org/10.1038/nrg.2016.87
New AM, Lehner B (2015) Systems biology: network evolution hinges on history. Nature 523(7560):297–298. https://doi.org/10.1038/nature14537
Chatr-Aryamontri A, Oughtred R, Boucher L, Rust J, Chang C, Kolas NK, O'Donnell L, Oster S, Theesfeld C, Sellam A, Stark C, Breitkreutz BJ, Dolinski K, Tyers M (2017) The BioGRID interaction database: 2017 update. Nucleic Acids Res 45(D1):D369–D379. https://doi.org/10.1093/nar/gkw1102
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, Kuhn M, Bork P, Jensen LJ, von Mering C (2015) STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43(Database issue):D447–D452. https://doi.org/10.1093/nar/gku1003
Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K (2017) KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45(D1):D353–D361. https://doi.org/10.1093/nar/gkw1092
Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, Jassal B, Jupe S, Korninger F, McKay S, Matthews L, May B, Milacic M, Rothfels K, Shamovsky V, Webber M, Weiser J, Williams M, Wu G, Stein L, Hermjakob H, D'Eustachio P (2016) The reactome pathway knowledgebase. Nucleic Acids Res 44(D1):D481–D487. https://doi.org/10.1093/nar/gkv1351
Bohler A, Wu G, Kutmon M, Pradhana LA, Coort SL, Hanspers K, Haw R, Pico AR, Evelo CT (2016) Reactome from a WikiPathways perspective. PLoS Comput Biol 12(5):e1004941. https://doi.org/10.1371/journal.pcbi.1004941
Tyner C, Barber GP, Casper J, Clawson H, Diekhans M, Eisenhart C, Fischer CM, Gibson D, Gonzalez JN, Guruvadoo L, Haeussler M, Heitner S, Hinrichs AS, Karolchik D, Lee BT, Lee CM, Nejad P, Raney BJ, Rosenbloom KR, Speir ML, Villarreal C, Vivian J, Zweig AS, Haussler D, Kuhn RM, Kent WJ (2017) The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45(D1):D626–D634. https://doi.org/10.1093/nar/gkw1134
Koch A, De Meyer T, Jeschke J, Van Criekinge W (2015) MEXPRESS: visualizing expression, DNA methylation and clinical TCGA data. BMC Genomics 16:636. https://doi.org/10.1186/s12864-015-1847-z
van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536. https://doi.org/10.1038/415530a
Zeng T, Li J (2010) Maximization of negative correlations in time-course gene expression data for enhancing understanding of molecular pathways. Nucleic Acids Res 38(1):e1. https://doi.org/10.1093/nar/gkp822
Zeng T, Guo X, Liu J (2014) Negative correlation based gene markers identification in integrative gene expression data. Int J Data Min Bioinform 10(1):1–17
Deng M, Bragelmann J, Schultze JL, Perner S (2016) Web-TCGA: an online platform for integrated analysis of molecular cancer data sets. BMC Bioinformatics 17:72. https://doi.org/10.1186/s12859-016-0917-9
Huang Y, Zaas AK, Rao A, Dobigeon N, Woolf PJ, Veldman T, Oien NC, McClain MT, Varkey JB, Nicholson B, Carin L, Kingsmore S, Woods CW, Ginsburg GS, Hero AO III (2011) Temporal dynamics of host molecular responses differentiate symptomatic and asymptomatic influenza a infection. PLoS Genet 7(8):e1002234. https://doi.org/10.1371/journal.pgen.1002234
Brawand D, Soumillon M, Necsulea A, Julien P, Csardi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M, Albert FW, Zeller U, Khaitovich P, Grutzner F, Bergmann S, Nielsen R, Paabo S, Kaessmann H (2011) The evolution of gene expression levels in mammalian organs. Nature 478(7369):343–348. https://doi.org/10.1038/nature10532
Leek JT, Storey JD (2007) Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet 3(9):1724–1735. https://doi.org/10.1371/journal.pgen.0030161
Manimaran S, Selby HM, Okrah K, Ruberman C, Leek JT, Quackenbush J, Haibe-Kains B, Bravo HC, Johnson WE (2016) BatchQC: interactive software for evaluating sample and batch effects in genomic data. Bioinformatics. https://doi.org/10.1093/bioinformatics/btw538
Vandenbon A, Dinh VH, Mikami N, Kitagawa Y, Teraguchi S, Ohkura N, Sakaguchi S (2016) Immuno-Navigator, a batch-corrected coexpression database, reveals cell type-specific gene networks in the immune system. Proc Natl Acad Sci U S A 113(17):E2393–E2402. https://doi.org/10.1073/pnas.1604351113
Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1):118–127. https://doi.org/10.1093/biostatistics/kxj037
Stein CK, Qu P, Epstein J, Buros A, Rosenthal A, Crowley J, Morgan G, Barlogie B (2015) Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat. BMC Bioinformatics 16:63. https://doi.org/10.1186/s12859-015-0478-3
Reese SE, Archer KJ, Therneau TM, Atkinson EJ, Vachon CM, de Andrade M, Kocher JP, Eckel-Passow JE (2013) A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis. Bioinformatics 29(22):2877–2883. https://doi.org/10.1093/bioinformatics/btt480
Song R, Huang J, Ma S (2012) Integrative prescreening in analysis of multiple cancer genomic studies. BMC Bioinformatics 13:168. https://doi.org/10.1186/1471-2105-13-168
Huang X, Stern DF, Zhao H (2016) Transcriptional profiles from paired normal samples offer complementary information on cancer patient survival—evidence from TCGA pan-cancer data. Sci Rep 6:20567. https://doi.org/10.1038/srep20567
Hwang TH, Atluri G, Kuang R, Kumar V, Starr T, Silverstein KA, Haverty PM, Zhang Z, Liu J (2013) Large-scale integrative network-based analysis identifies common pathways disrupted by copy number alterations across cancers. BMC Genomics 14:440. https://doi.org/10.1186/1471-2164-14-440
Li Q, Seo JH, Stranger B, McKenna A, Pe'er I, Laframboise T, Brown M, Tyekucheva S, Freedman ML (2013) Integrative eQTL-based analyses reveal the biology of breast cancer risk loci. Cell 152(3):633–641. https://doi.org/10.1016/j.cell.2012.12.034
Peifer M, Fernandez-Cuesta L, Sos ML, George J, Seidel D, Kasper LH, Plenker D, Leenders F, Sun R, Zander T, Menon R, Koker M, Dahmen I, Muller C, Di Cerbo V, Schildhaus HU, Altmuller J, Baessmann I, Becker C, de Wilde B, Vandesompele J, Bohm D, Ansen S, Gabler F, Wilkening I, Heynck S, Heuckmann JM, Lu X, Carter SL, Cibulskis K, Banerji S, Getz G, Park KS, Rauh D, Grutter C, Fischer M, Pasqualucci L, Wright G, Wainer Z, Russell P, Petersen I, Chen Y, Stoelben E, Ludwig C, Schnabel P, Hoffmann H, Muley T, Brockmann M, Engel-Riedel W, Muscarella LA, Fazio VM, Groen H, Timens W, Sietsma H, Thunnissen E, Smit E, Heideman DA, Snijders PJ, Cappuzzo F, Ligorio C, Damiani S, Field J, Solberg S, Brustugun OT, Lund-Iversen M, Sanger J, Clement JH, Soltermann A, Moch H, Weder W, Solomon B, Soria JC, Validire P, Besse B, Brambilla E, Brambilla C, Lantuejoul S, Lorimier P, Schneider PM, Hallek M, Pao W, Meyerson M, Sage J, Shendure J, Schneider R, Buttner R, Wolf J, Nurnberg P, Perner S, Heukamp LC, Brindle PK, Haas S, Thomas RK (2012) Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nat Genet 44(10):1104–1110. https://doi.org/10.1038/ng.2396
Cancer Genome Atlas N (2012) Comprehensive molecular characterization of human colon and rectal cancer. Nature 487(7407):330–337. https://doi.org/10.1038/nature11252
Cancer Genome Atlas Research N (2013) Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499(7456):43–49. https://doi.org/10.1038/nature12222
Cancer Genome Atlas Research N (2014) Comprehensive molecular characterization of gastric adenocarcinoma. Nature 513(7517):202–209. https://doi.org/10.1038/nature13480
Cancer Genome Atlas Research N (2014) Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507(7492):315–322. https://doi.org/10.1038/nature12965
Cancer Genome Atlas N (2015) Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 517(7536):576–582. https://doi.org/10.1038/nature14129
Cancer Genome Atlas Research N (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455(7216):1061–1068. https://doi.org/10.1038/nature07385
Cancer Genome Atlas Research N (2012) Comprehensive genomic characterization of squamous cell lung cancers. Nature 489(7417):519–525. https://doi.org/10.1038/nature11404
Akbani R, Ng PK, Werner HM, Shahmoradgoli M, Zhang F, Ju Z, Liu W, Yang JY, Yoshihara K, Li J, Ling S, Seviour EG, Ram PT, Minna JD, Diao L, Tong P, Heymach JV, Hill SM, Dondelinger F, Stadler N, Byers LA, Meric-Bernstam F, Weinstein JN, Broom BM, Verhaak RG, Liang H, Mukherjee S, Lu Y, Mills GB (2014) A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat Commun 5:3887. https://doi.org/10.1038/ncomms4887
Ciriello G, Gatza ML, Beck AH, Wilkerson MD, Rhie SK, Pastore A, Zhang H, McLellan M, Yau C, Kandoth C, Bowlby R, Shen H, Hayat S, Fieldhouse R, Lester SC, Tse GM, Factor RE, Collins LC, Allison KH, Chen YY, Jensen K, Johnson NB, Oesterreich S, Mills GB, Cherniack AD, Robertson G, Benz C, Sander C, Laird PW, Hoadley KA, King TA, Network TR, Perou CM (2015) Comprehensive molecular portraits of invasive lobular breast cancer. Cell 163(2):506–519. https://doi.org/10.1016/j.cell.2015.09.033
Cancer Genome Atlas N (2012) Comprehensive molecular portraits of human breast tumours. Nature 490(7418):61–70. https://doi.org/10.1038/nature11412
Drake JM, Paull EO, Graham NA, Lee JK, Smith BA, Titz B, Stoyanova T, Faltermeier CM, Uzunangelov V, Carlin DE, Fleming DT, Wong CK, Newton Y, Sudha S, Vashisht AA, Huang J, Wohlschlegel JA, Graeber TG, Witte ON, Stuart JM (2016) Phosphoproteome integration reveals patient-specific networks in prostate cancer. Cell 166(4):1041–1054. https://doi.org/10.1016/j.cell.2016.07.007
Cancer Genome Atlas Research N, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45(10):1113–1120. https://doi.org/10.1038/ng.2764
Neapolitan R, Horvath CM, Jiang X (2015) Pan-cancer analysis of TCGA data reveals notable signaling pathways. BMC Cancer 15:516. https://doi.org/10.1186/s12885-015-1484-6
Ruau D, Dudley JT, Chen R, Phillips NG, Swan GE, Lazzeroni LC, Clark JD, Butte AJ, Angst MS (2012) Integrative approach to pain genetics identifies pain sensitivity loci across diseases. PLoS Comput Biol 8(6):e1002538. https://doi.org/10.1371/journal.pcbi.1002538
Liu P, Sanalkumar R, Bresnick EH, Keles S, Dewey CN (2016) Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq. Genome Res 26(8):1124–1133. https://doi.org/10.1101/gr.199174.115
Knouf EC, Garg K, Arroyo JD, Correa Y, Sarkar D, Parkin RK, Wurz K, O'Briant KC, Godwin AK, Urban ND, Ruzzo WL, Gentleman R, Drescher CW, Swisher EM, Tewari M (2012) An integrative genomic approach identifies p73 and p63 as activators of miR-200 microRNA family transcription. Nucleic Acids Res 40(2):499–510. https://doi.org/10.1093/nar/gkr731
Yan Z, Shah PK, Amin SB, Samur MK, Huang N, Wang X, Misra V, Ji H, Gabuzda D, Li C (2012) Integrative analysis of gene and miRNA expression profiles with transcription factor-miRNA feed-forward loops identifies regulators in human cancers. Nucleic Acids Res 40(17):e135. https://doi.org/10.1093/nar/gks395
Berghoff BA, Konzer A, Mank NN, Looso M, Rische T, Forstner KU, Kruger M, Klug G (2013) Integrative “omics”–approach discovers dynamic and regulatory features of bacterial stress responses. PLoS Genet 9(6):e1003576. https://doi.org/10.1371/journal.pgen.1003576
Kim M, Rai N, Zorraquino V, Tagkopoulos I (2016) Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli. Nat Commun 7:13090. https://doi.org/10.1038/ncomms13090
Meng C, Helm D, Frejno M, Kuster B (2016) moCluster: identifying joint patterns across multiple omics data sets. J Proteome Res 15(3):755–765. https://doi.org/10.1021/acs.jproteome.5b00824
Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, Haibe-Kains B, Goldenberg A (2014) Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 11(3):333–337. https://doi.org/10.1038/nmeth.2810
Shi Q, Zhang C, Peng M, Yu X, Zeng T, Liu J, Chen L (2017) Pattern fusion analysis by adaptive alignment of multiple heterogeneous omics data. Bioinformatics. https://doi.org/10.1093/bioinformatics/btx176
Lee CH, Alpert BO, Sankaranarayanan P, Alter O (2012) GSVD comparison of patient-matched normal and tumor aCGH profiles reveals global copy-number alterations predicting glioblastoma multiforme survival. PLoS One 7(1):e30098. https://doi.org/10.1371/journal.pone.0030098
Xiao X, Moreno-Moral A, Rotival M, Bottolo L, Petretto E (2014) Multi-tissue analysis of co-expression networks by higher-order generalized singular value decomposition identifies functionally coherent transcriptional modules. PLoS Genet 10(1):e1004006. https://doi.org/10.1371/journal.pgen.1004006
Kersey PJ, Staines DM, Lawson D, Kulesha E, Derwent P, Humphrey JC, Hughes DS, Keenan S, Kerhornou A, Koscielny G, Langridge N, McDowall MD, Megy K, Maheswari U, Nuhn M, Paulini M, Pedro H, Toneva I, Wilson D, Yates A, Birney E (2012) Ensembl genomes: an integrative resource for genome-scale data from non-vertebrate species. Nucleic Acids Res 40(Database issue):D91–D97. https://doi.org/10.1093/nar/gkr895
Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, Cerami E, Sander C, Schultz N (2013) Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal 6(269):pl1. https://doi.org/10.1126/scisignal.2004088
He S, He H, Xu W, Huang X, Jiang S, Li F, He F, Bo X (2016) ICM: a web server for integrated clustering of multi-dimensional biomedical data. Nucleic Acids Res 44(W1):W154–W159. https://doi.org/10.1093/nar/gkw378
Xia J, Fjell CD, Mayer ML, Pena OM, Wishart DS, Hancock RE (2013) INMEX—a web-based tool for integrative meta-analysis of expression data. Nucleic Acids Res 41(Web Server issue):W63–W70. https://doi.org/10.1093/nar/gkt338
Tuncbag N, McCallum S, Huang SS, Fraenkel E (2012) SteinerNet: a web server for integrating ‘omic’ data to discover hidden components of response pathways. Nucleic Acids Res 40(Web Server issue):W505–W509. https://doi.org/10.1093/nar/gks445
Ovaska K, Laakso M, Haapa-Paananen S, Louhimo R, Chen P, Aittomaki V, Valo E, Nunez-Fontarnau J, Rantanen V, Karinen S, Nousiainen K, Lahesmaa-Korpinen AM, Miettinen M, Saarinen L, Kohonen P, Wu J, Westermarck J, Hautaniemi S (2010) Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme. Genome Med 2(9):65. https://doi.org/10.1186/gm186
Krasnov GS, Dmitriev AA, Melnikova NV, Zaretsky AR, Nasedkina TV, Zasedatelev AS, Senchenko VN, Kudryavtseva AV (2016) CrossHub: a tool for multi-way analysis of The Cancer Genome Atlas (TCGA) in the context of gene expression regulation mechanisms. Nucleic Acids Res 44(7):e62. https://doi.org/10.1093/nar/gkv1478
Yu X, Li G, Chen L (2014) Prediction and early diagnosis of complex diseases by edge-network. Bioinformatics 30(6):852–859. https://doi.org/10.1093/bioinformatics/btt620
Zhang Q, Burdette JE, Wang JP (2014) Integrative network analysis of TCGA data for ovarian cancer. BMC Syst Biol 8:1338. https://doi.org/10.1186/s12918-014-0136-9
Zhu R, Zhao Q, Zhao H, Ma S (2016) Integrating multidimensional omics data for cancer outcome. Biostatistics 17(4):605–618. https://doi.org/10.1093/biostatistics/kxw010
Wang XV, Verhaak RG, Purdom E, Spellman PT, Speed TP (2011) Unifying gene expression measures from multiple platforms using factor analysis. PLoS One 6(3):e17691. https://doi.org/10.1371/journal.pone.0017691
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Yu, XT., Zeng, T. (2018). Integrative Analysis of Omics Big Data. In: Huang, T. (eds) Computational Systems Biology. Methods in Molecular Biology, vol 1754. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7717-8_7
DOI: https://doi.org/10.1007/978-1-4939-7717-8_7
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7716-1
Online ISBN: 978-1-4939-7717-8
eBook Packages: Springer Protocols