Integrative Networks Illuminate Biological Factors Underlying Gene–Disease Associations

Krishnan, Arjun; Taroni, Jaclyn N.; Greene, Casey S.

doi:10.1007/s40142-016-0102-5

Integrative Networks Illuminate Biological Factors Underlying Gene–Disease Associations

Genomics (S Williams, Section Editor)
Published: 09 September 2016

Volume 4, pages 155–162, (2016)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Current Genetic Medicine Reports Aims and scope Submit manuscript

Integrative Networks Illuminate Biological Factors Underlying Gene–Disease Associations

Download PDF

Arjun Krishnan¹^nAff5,
Jaclyn N. Taroni^2,3,4 &
Casey S. Greene^2,3,4^nAff6

408 Accesses
3 Citations
1 Altmetric
Explore all metrics

Abstract

Purpose of Review

Integrative networks combine multiple layers of biological data into a model of how genes work together to carry out cellular processes. Such networks become more valuable as they become more context-specific, for example, by capturing how genes work together in a certain tissue or cell type. We discuss the applications of these networks to the study of human disease.

Recent Findings

Once constructed, these networks provide the means to identify broad biological patterns underlying genes associated with complex traits and diseases. We cover the different types of integrative networks that currently exist, and how such networks that encompass multiple biological layers are constructed. We highlight how specificity can be incorporated into the reconstruction of different types of biomolecular interactions between genes, using tissue specificity as a motivating example. We discuss examples of cases where networks have been applied to study human diseases and opportunities for new applications.

Summary

Integrative networks with specificity to tissue or other biological features provide new capabilities to researchers engaged in the study of human disease. We expect improved data and algorithms to continue and improve such networks, allowing them to provide more detailed and mechanistic predictions into the context-specific genetic etiology of common diseases.

Network-Guided Key Gene Discovery for a Given Cellular Process

Understanding multicellular function and disease with human tissue-specific networks

Article 27 April 2015

Modeling regulatory network topology improves genome-wide analyses of complex human traits

Article Open access 14 May 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Many biological concepts and their interactions can be represented as networks. Existing types of networks used in computational analyses include networks of correlation based on gene expression [1, 2], protein–protein interactions [3, 4], and even phenotype–phenotype interactions [5, 6]. Some networks [7, 8•] incorporate data from multiple layers of regulation, including the interactome, transcriptome, and proteome.

In addition to data integration, a significant frontier for biological networks has been to include various forms of context specificity such as capturing interactions related to a certain process [9, 10] in a specific tissue [8•, 11, 12]. This specificity has been achieved by either overlaying multiple resources of interactions [11] or employing a machine learning process that captures context specificity [8•].

Biological networks, once constructed, provide a unique resource for the study of human disease. In many cases, phenotypes are expected to arise due to changes in how information flows through a biological system. For example, somatic mutation events in pancreatic tissue that constitutively activate KRas provide a constitutive growth signal to the system that can result in uncontrolled proliferation and pancreatic ductal adenocarcinoma [13, 14]. Large-scale biological networks serve as systems-level molecular scaffolds of cells on which researchers can locate known disease-associated genes, interpret their relationships in the context of other genes, and gain insights into how these genes might be involved in the disease. Given that the genetic bases of most complex diseases are poorly characterized, these networks also provide a genomic framework for identifying novel genes linked to diseases based on their patterns of network connectivity. These ‘interpretive’ and ‘predictive’ modes are often used in tandem with one cyclically feeding the other toward filling gaps in our knowledge of disease biology.

Integrated and Multi-Omic Networks

Biological networks can be constructed in many different ways and from many types of data. Correlation networks, for example, are one of the first large-scale models of gene interactions built solely based on patterns of shared gene expression and can be used to suggest opportunities for drug development or repurposing [2]. These can be global- or tissue-specific [15]. Integrative, also called multi-omic, approaches, on the other hand, combine data types across levels of biological regulation. One way to combine multiple data types is to overlay distinct information from separate data types. Okada et al. [16] combined data from text mining, risk-associated variants, protein–protein networks, molecular pathways, mouse phenotypes, and other sources to identify biological support for potential rheumatoid arthritis drugs. Such approaches look for broad support for the captured interactions based on evidence in all (intersection) or at least one (union) type of data without explicitly modeling distinct layers of data together.

Approaches that model multiple data types vary in how much they condition on known biological processes. Those that use documented regulatory patterns, e.g., one protein inhibits transcription of another gene, can reach a high level of detail in the constructed models. PARADIGM is an example of this type of approach. The method uses biological regulatory patterns to integrate multi-omic cancer measurements into a model that estimates the activity of individual proteins or pathways in a given tumor biopsy [17]. PARADIGM takes advantage of multiple data layers for the same samples. This makes it and the methods like it well suited to the analysis of cancer genomics data where such opportunities are plentiful, but such methods are not as well suited to the analysis of large public data compendia. In these collections, such matched data are often unavailable.

Methods that integrate public data into networks can first transform these datasets into scores for each gene pair. Some data types, such as protein–protein interactions, are already pairwise: edges exist for pairs of genes that encode proteins that physically interact. Other data types must be wrangled into pairwise relationships. For example, genome-wide expression information can be converted to correlations between each gene pair. Once converted to pairwise scores, machine learning methods allow researchers to combine different datasets, including gene expression and protein–protein interaction information, into a single network [9, 10, 18, 19]. These integrated networks capture broad biological processes and can be used for gene function prediction or other tasks [18, 20, 21]. In this review, we discuss their applications that aid in understanding the genetic and genomic basis of human phenotypes.

Incorporating Tissue Specificity into Networks

Constructing the biological gene networks specific to the hundreds of tissues and cell types in multicellular organisms is a major goal toward applying network biology to higher organisms. Pursuing this goal is to realize Waddington’s vision of “the complex system of interactions”—pegs representing genes and strings representing their “chemical tendencies”—underlying developmental landscape pulled by interactions anchored to genes [22]. This goal is being actively pursued using a variety of approaches addressing different facets of the challenge such as capturing specific types of gene interactions (e.g., functional, physical, or regulatory) and expanding the coverage of these networks to all genes in the genome and all tissues and cell types in the body.

Overlaying gene (co-)expression, obtained from samples of a particular tissue, on protein–protein interaction (PPI) networks has been a straightforward and popular way to generate tissue-specific molecular networks [23•]. Magger et al. [11] have shown that tissue-specific PPI constructed in this manner are valuable for disease gene prediction using label-propagation methods. Comprehensive curation and comparison of multiple sources of tissue-gene expression have since been carried out [24, 25], results from which can be used to create tissue-specific PPI for an expanded set of genes and tissues. As an example, Cornish et al. [26] have incorporated more tissues and cell types to show that how tightly interconnected disease genes are in such tissue-specific networks and can reveal interesting links between diseases and tissues. While comparing favorably with known associations from the literature, their results also point to novel tissue disease links including one implicating mast cells in multiple sclerosis. These ‘data-overlay’ approaches succeed when many high-quality documented annotations of gene expression and interaction are available.

Tissue-specific networks can also be generated using tissue-specific genome-wide data using approaches that do not require numerous documented annotations. Regulatory networks with tissue specificity have been constructed by mapping transcription factor binding sites to open chromatin regions in different tissues/cell types identified based on DNase I sensitivity [27]. Alternative efforts have inferred networks by employing tissue-expressed promoter and enhancer elements [28•]. Still, others have developed inference methods to mine functional [29] or regulatory relationships [30] from large amounts of gene expression data from a single tissue. Such approaches are particularly well suited when the cells being assayed (on a large scale) most closely reflect the situation in complex human tissues.

However, large troves of existing genomic data available in public repositories, especially hundreds of thousands of gene expression samples, are not resolved to tissues and cell types. This represents a threefold problem: (1) Many datasets are not annotated to the tissue/cell type of origin due to unclear or entirely missing metadata; (2) Most samples are cell type/tissue mixtures; and (3) Many cell types are hard to isolate at all or enough to profile gene expression. This challenge hence requires alternative approaches for integration and network inference.

A third type of approach aimed at addressing this challenge relies on applying machine learning methods to simultaneously extract tissue and functional signal from data compendia representing heterogeneous tissue collections. Networks generated by these methods aim to be complete and predictive in general; however, specific edges may be difficult to interpret. Work in Caenorhabditis elegans [31] and Mus musculus [32] demonstrated the potential viability of these approaches. Recently, tissue-specific networks constructed in this manner were generated for humans as well [8•]. To perform this analysis, the authors developed gold standards of expression for 144 human tissues. They combined this with a gold standard of relationships within cellular pathways to generate 144 tissue-specific gold standards. The machine learning process shown in Fig. 1 constructs tissue-specific models from more than 32,000 experiments, most of which were not annotated or resolved to specific tissues. The models were then used to produce tissue-specific networks for each tissue. Comparison of tissue-specific networks constructed in this manner to those constructed only on tissue-specific data revealed consistent improvements with the machine learning-based integration of the complete data compendium. In addition to improved performance, the machine learning approach could also be applied to generate networks for more tissues than the integration restricted to tissue-specific data.

Applying Tissue-Specific Networks to Study Human Disease

Human diseases are complex, and it is now clear that in many cases, weak association signals are revealing broad networks of variants associated with such phenotypes [8•, 33]. These may be numerous variants of weak effect acting additively, or the single-variant projections of complex epistatic disease models [34, 35]. In either case, the challenge now is to identify the biological signal hidden in a noisy set of statistical associations [36].

Networks provide a mean to investigate these associations and potentially separate the disease-associated signal from statistical noise. Network-based approaches to identify and interpret disease associations are demonstrating success across diverse diseases including coronary artery disease [37], hypertension [8•], cancer [38–40], multiple sclerosis [41], Alzheimer’s disease [42], and autism spectrum disorder [43•]. Network-based approaches also present unique challenges—e.g., genes with more single nucleotide polymorphisms (SNPs) are also more likely to be highly connected within the network [44]. Approaches that employ network-based analysis should carefully evaluate and control such biases.

In addition to considering networks, tissue specificity of either causes or symptoms is a key feature of many diseases, and work by Lage et al. demonstrated the importance of considering tissues [45•]. The importance of tissue specificity has carried over to recent work that examines many genetic associations for a specific disease, for example, rheumatoid arthritis is an autoimmune disease. Walsh et al. [46] recently demonstrated that genetic associations with the disease reveal how those variants can impact cell lineage-specific regulation to contribute to the etiology of the disease. There are a number of strategies for incorporating tissue specificity into the network analysis of disease. We review four approaches that use network analysis of associations of SNPs, gene expression, or exome sequencing to identify factors underlying complex diseases.

Tissue-specific networks can identify biologic commonalities underlying variants associated with human phenotypes and use those commonalities to identify disease-associated genes by their network connectivity patterns. In Greene et al. [8•], the authors constructed 144 tissue-specific networks of predicted intra-pathway relationships (Fig. 1). They then developed a procedure called the network-wide association study (NetWAS). The NetWAS trains a classifier to identify genes that should be associated with a disease based on their network connectivity patterns. Specifically, genes with a nominal association are used as positives for a machine learning algorithm and those without a nominal association are used as negatives. A classifier is then trained with these labels and all pairwise network edges as features. The classifier is then applied back to the network to identify genes with edges that indicate potential associations. This approach outperformed a hypertension genome-wide association study (GWAS) alone, and the network-prioritization process also ranked the gene targets of antihypertensive drugs more highly than GWAS. This suggests that association-guided network-based approaches may also aid drug development and repurposing.

Tissue- and cell type-specific networks that integrate a large amount of data can be particularly valuable for the study of rare diseases. For instance, systemic sclerosis (SSc) is a rare autoimmune disease characterized by fibrosis in skin and internal organs (e.g., pulmonary fibrosis). The challenges of studying of SSc at the molecular level are common to many rare diseases: sample sizes tend to be small and internal organ biopsies are often difficult to obtain. Thus, there is a critical need for leveraging additional biological data to make inferences about rare disease pathobiology. Taroni et al. [47•] mined gene expression spanning multiple tissues and clinical manifestations in SSc to derive a disease-associated gene set used to query the tissue-specific networks from Greene et al. [8•]. The authors then performed differential network analysis on skin- and lung-specific networks to compare disease signal in the major organ systems affected by fibrosis. The authors first identified gene–gene interactions that were highly specific to each tissue by subtracting the global edge weights from the skin and lung networks. Then, these highly tissue-specific edges were compared to identify functional differences. They found a set of edges highly specific to lung and suggestive of a distinct macrophage phenotype in lung as compared to what could be inferred from SSc skin gene expression data. The authors also developed a cell type-aware multi-network approach that detected genes which preferentially downregulated in skin during improvement of immunomodulatory treatment. This work suggests that tissue- and cell type-specific functional genomic networks can provide insight into rare disease processes that are difficult to capture experimentally.

The advent of high-throughput sequencing has enabled the discovery of regulatory variants relevant to disease especially through studying the genetics of gene/protein expression [48]. Projects such as the GTEx [49, 50] and FANTOM5 [51] have used this approach to highlight regulatory variants that are specific to or shared across tissues. Analogous to tissue-specific functional networks, tissue-specific regulatory networks that can be inferred from such data have the potential to delineate the role of regulatory variants in specific tissues. Marbach and colleagues [28•] inferred tissue-specific regulatory networks by overlaying transcription factor binding site occurrences over promoter and enhancer elements detected in hundreds of tissues/cell types in the FANTOM5 project. They then used these networks to assess links between diseases and tissues by calculating how surprisingly tightly connected disease genes (implicated by GWAS) were in a particular tissue regulatory network (termed connectivity enrichment analysis). This analysis entails ranking genes by their summarized GWAS p-values, calculating the average connectivity of the genes above each rank along the list, and estimating the area under the curve (AUC) of connectivity as a function of rank. A connectivity score is finally reported based on the comparison of the observed AUC with an empirical distribution of AUCs generated by performing the analysis on thousands of permuted rankings. By performing this analysis across multiple diseases and networks, the authors find that disease genes are indeed tightly clustered in tissues relevant to the disease.

Traditional network-based disease gene prediction algorithms take as input either only known high-confidence disease genes or all genes irrespective of evidence. Krishnan and colleagues [43•] have recently developed an evidence-weighted approach that incorporates the trust with which disease genes are known from the sources of evidence be it high-confidence candidates identified in sequencing studies or weak associations mined from literature. They used this approach in conjunction with the brain-specific network, developed previously [8•], to discover novel candidate genes associated with autism spectrum disorder (ASD). Intuitively, the underlying machine learning method learns the pattern of network connectivity characteristic of known ASD genes (weighted proportional to their evidence) and then identifies other genes in the network that highly resemble known genes in their network pattern. The method is highly general and can be applied to any complex disease to predict new genes based all the genes previously identified in single exome/whole-genome sequencing studies or collated from multiple datasets.

Taken together, integrative networks serve as powerful means to both interpret existing knowledge about complex physiology and disease, while also offering data-driven predictions that point to uncharted territory, be it novel genes associated with diseases, new pathway memberships, key regulators of disease genes/pathways, surprising associations between tissues and diseases, or differential effects of disease on different tissues. Based on their interest and scope, experimental/biomedical researchers use these network-based tools to identify a small number of targets for careful investigation or to gather a prioritized list of candidates to guide further large-scale genetic screens.

The Road Ahead

A major bottleneck in building accurate tissue- and cell type-specific networks is the extreme scarcity of prior knowledge about tissue-specific genes and interactions based on high-quality experimental evidence. As we amass even a few scores of such specific genes and interactions across under-studied tissues/cell types, they can be used to train better machine learning classifiers and serve as reliable evaluation standards. A lesser but still significant challenge is the unavailability of large-scale genomic data uniformly from normal and disease states across a wide range of tissues/cell types. Such biases in input datasets, along with scarce prior knowledge, creep into the final inferred networks for different tissues/cell types, affecting their robustness and genomic coverage. In contrast, rare examples of massive efforts focused on a single tissue/cell type would help us describe the scope of data needed to construct better networks. Such efforts are fueled by a large collection of data from a single tissue/cell type including transcriptional profiles across a range of different genetic perturbations and time points, sometimes further profiling protein–protein or protein–DNA binding on a large scale, which together facilitate integrative methods to reconstruct accurate networks [52, 53]. Overall, both approaches—broadly covering many tissues and deeply covering single cell types—have enormous merits to be pursued simultaneously and, ideally, feed into each other.

If sufficient gene expression data are available, at least for multiple tissues in healthy and a particular disease condition, it may be possible to utilize differential network analysis to study across tissues akin to Taroni et al. [47•]. We illustrate with a hypothetical example centered on blood and a disease end-target tissue, kidney, as an example for how this strategy may be used to gain insight into disease mechanisms. Tissue- and condition-specific networks could be learned from four compendia of data (disease blood, healthy blood, disease kidney, and healthy kidney), using the same functional standards and then compared. If an intra-pathway relationship exists in both disease and healthy blood, but that intra-pathway relationship is absent in healthy kidney when compared to disease, we might infer that a leukocyte population is present in the diseased tissues. Furthermore, if the intra-pathway relationships identified in blood were connected to other pathways in diseased tissues (blood and kidney) alone that would suggest immune infiltrate with a particular phenotype in diseased kidney. With data available on genetic variants, it could also be possible to link cell type-specific changes in networks to underlying causal variants [54]. These multi-tissue differential network analyses could then guide further analyses and experiments centered on this complex multi-cell lineage disease process.

In addition to the gene networks specific to individual tissues, another milestone for the biological networks is the ability to capture cross-tissue interactions. Much of human physiology relies on biochemical interactions such as hormonal signaling and immune response that span across tissues. Capturing these interactions requires particular data from multiple tissues of matched individuals. Dobrin et al. [55] proposed one of the earliest approaches toward this goal by calculating gene–gene correlation across hypothalamus, liver, and adipose tissues in mouse and using the resulting networks to study molecular crosstalk between these tissues related to obesity. However, coordinated changes in gene expression across tissues can arise due to a common underlying genetic or regulatory mechanism. Such mechanisms could induce changes in both tissues, without reflecting bona-fide cross-tissue interactions. The recently produced large multi-tissue, multi-individual gene expression data from the GTEx consortium helps to address this challenge. Long and colleagues [56] calculated inter-tissue interactions while carefully controlling for the identical genetic regulation. Using these inter-tissue networks, they highlight specific signaling links between heart and whole-blood or lung. Similar to the methods described above for constructing tissue-specific networks, another recent study [57] has taken a network-overlay approach to infer cross-tissue interactions. In this study, Ramilowski and colleagues overlay tissue/cell type gene expression data from the FANTOM5 database onto known physical interactions between ligands and receptors, identifying cases where the ligand is expressed in one tissue and the receptor in another. While these methods are beginning to shine light on important aspects of human biology, the ground is fertile for novel integrative methods that can construct such cross-tissue maps on a large scale.

As network-based approaches are gaining use in genetic association analysis of individual complex diseases, another natural leap is to the domain of multiple-phenotype analyses. Since many genes and pathways have pleiotropic effects, i.e., their activity can alter multiple phenotypes, phenome-wide association studies (PheWAS) allow for this additional information to be incorporated. Complementary to identifying multiple variants associated with a trait of interest using GWAS, PheWAS interrogates the association of a variant of interest with a range of traits/phenotypes. Phenotype data for such an analysis are abundant, albeit in a noisy manner, in electronic health records (EHRs) of thousands of patients. If these patients have also been genotyped, PheWAS can be used to link variation at those locations to EHR-derived phenotypes or clinical outcomes that vary in the patient population [58]. Denny and colleagues [59] have applied this approach to specifically discover several novel pleiotropic variants associated with multiple phenotypes, a feat that is challenging in a disease-centric setting as in GWAS. Although a few previous studies have used a disease network (inferred based on phenotypic similarity) in combination with a gene network to model many-to-many disease–gene associations [60], much work is needed to develop network-based approaches that can complement PheWAS in identifying the tissue-specific effects of pleiotropic genes.

Most networks approaches developed to date have dealt with a single data type or a single integrated portrait from multiple data types over a common biological entity, the gene. New methods are being developed that incorporate multiple network types into a single heterogeneous network (‘hetnet’) [41]. These methods have helped to prioritize disease-associated genes [41], and the ongoing Project Rephetio [61] provides a promising method for drug repurposing using these data. Although hetnets are in their early days, they may provide a powerful mean to develop algorithms that can consider multiple biological entities simultaneously to identify the basis of human phenotypes.

Conclusions

Progress to date on network-based methods to identify the basis of human phenotypes has been promising. The initial step, connecting genetic variants to the gene that they affect remains challenging [62]. Advances in this domain will naturally translate to improvements for network-based methods which generally focus on genes. In addition to the supervised methods to construct networks highlighted in this review, new and powerful unsupervised learning approaches may allow us to construct networks in cases where biological knowledge is limited or unavailable [63]. While we anticipate that large-scale integrated networks will make their first contributions at the level of identifying a shared genetic or pathway basis behind observed associations, we look forward to detailed networks that can suggest a mechanistic hypothesis. These networks will require the combination of new algorithms and analytical methods as well as detailed data in targeted domains. Although more remains to be done, progress in this area is encouraging, and we look forward to advances in the years to come.

References

Papers of particular interest, published recently, have been highlighted as: • Of importance

Langfelder P, et al. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008;9(1):559.
Article Google Scholar
Penrod NM, Greene CS, Moore JH. Predicting targeted drug combinations based on Pareto optimal patterns of coexpression network connectivity. Genome Med. 2014;6(4):33.
Article PubMed PubMed Central Google Scholar
Rual J-F, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005;437(7062):1173–8.
Article CAS PubMed Google Scholar
Menche J, Sharma A, Kitsak M, Ghiassian SD, Vidal M, Loscalzo J, Barabási A-L. Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015;347(6224):1257601.
Article PubMed PubMed Central Google Scholar
Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L. The human disease network. Proc Natl Acad Sci USA. 2007;104(2007):8685–90.
Article CAS PubMed PubMed Central Google Scholar
Li L, et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci Trans Med. 2015;7(311):311ra174.
Article Google Scholar
Wong AK, Park CY, Greene CS, Bongo LA, Guan Y, Troyanskaya OG. IMP: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res. 2012;40(Web Server):W484–90.
Article CAS PubMed PubMed Central Google Scholar
• Greene CS, et al. Understanding multicellular function and disease with human tissue-specific networks. Nat Genet. 2015;47(6):569–76. The authors build tissue-specific networks for 144 tissues and show that network-based reprioritization of genetic associations can better identify disease genes than the underlying genome-wide association study.
Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, Theesfeld CL, Dolinski K, Troyanskaya OG. Discovery of biological networks from diverse functional genomic data. Genome Biol. 2005;6(13):R114.
Article PubMed PubMed Central Google Scholar
Huttenhower C, Haley EM, Hibbs MA, Dumeaux V, Barrett DR, Coller HA, Troyanskaya OG. Exploring the human genome with functional maps. Genome Res. 2009;19(6):1093–106.
Article CAS PubMed PubMed Central Google Scholar
Magger O, Waldman YY, Ruppin E, Sharan R. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput Biol. 2012;8(9):e1002690.
Article CAS PubMed PubMed Central Google Scholar
Goya J, Wong AK, Yao V, Krishnan A, Homilius M, Troyanskaya OG. FNTM: a server for predicting functional networks of tissues in mouse. Nucleic Acids Res. 2015;43(W1):W182–7.
Article PubMed PubMed Central Google Scholar
Eser S, Schnieke A, Schneider G, Saur D. Oncogenic KRAS signalling in pancreatic cancer. Br J Cancer. 2014;111(5):817–22.
Article CAS PubMed PubMed Central Google Scholar
Bryant KL, Mancias JD, Kimmelman AC, Der CJ. KRAS: feeding pancreatic cancer proliferation. Trends Biochem Sci. 2014;39(2):91–100.
Article CAS PubMed PubMed Central Google Scholar
Pierson E, et al. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Comput Biol. 2015;11(5):e1004220.
Article PubMed PubMed Central Google Scholar
Okada Y, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature. 2014;506(7488):376–81.
Article CAS PubMed Google Scholar
Vaske CJ, Benz SC, Sanborn JZ, Earl D, Szeto C, Zhu J, Haussler D, Stuart JM. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics. 2010;26(12):i237–45.
Article CAS PubMed PubMed Central Google Scholar
Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 2008;9(Suppl 1):S4.
Article PubMed PubMed Central Google Scholar
Warde-Farley D, et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 2010;38(Web Server):W214–20.
Article CAS PubMed PubMed Central Google Scholar
Guan Y, Myers CL, Hess DC, Barutcuoglu Z, Caudy AA, Troyanskaya OG. Predicting gene function in a hierarchical context with an ensemble of classifiers. Genome Biol. 2008;9(Suppl 1):S3.
Article PubMed PubMed Central Google Scholar
Park CY, Wong AK, Greene CS, Rowland J, Guan Y, Bongo LA, Burdine RD, Troyanskaya OG. Functional knowledge transfer for high-accuracy prediction of under-studied biological processes. PLoS Comput Biol. 2013;9(3):e1002957.
Article CAS PubMed PubMed Central Google Scholar
Waddington CH. The strategy of the genes. London: Allen & Unwin; 1957.
Google Scholar
• Bossi A, Lehner B. Tissue specificity and the human protein interaction network. Mol Syst Biol. 2009;5:260. The authors perform one of the first studies to comprehensively analyze tissue-specificity of protein physical interactions.
Santos A, Tsafou K, Stolte C, Pletscher-Frankild S, O’Donoghue SI, Jensen LJ. Comprehensive comparison of large-scale tissue expression datasets. PeerJ. 2015;3(2015):e1054.
Article PubMed PubMed Central Google Scholar
Uhlén M, Hallström BM, Lindskog C, Mardinoglu A, Pontén F, Nielsen J. Transcriptomics resources of human tissues and organs. Mol Syst Biol. 2016;12(4):862.
Article PubMed PubMed Central Google Scholar
Cornish AJ, Filippis I, David A, Sternberg MJE. Exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types. Genome Med. 2015;7(2015):95.
Article PubMed PubMed Central Google Scholar
Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. Circuitry and dynamics of human transcription factor regulatory networks. Cell. 2012;150(6):1274–86.
Article CAS PubMed PubMed Central Google Scholar
• Marbach D, Lamparter D, Quon G, Kellis M, Kutalik Z, Bergmann S. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods. 2016;13(4):366–70. The authors assemble 394 human tissue-specific gene regulatory networks and integrate them with 37 genome-wide association studies to corroborate that disease genes are tightly connected in tissues relevant to the disease.
Dowell KG, Simons AK, Wang ZZ, Yun K, Hibbs MA. Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate. PloS one. 2013;8(2):e56810.
Article CAS PubMed PubMed Central Google Scholar
Lefebvre C, et al. A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol Syst Biol. 2010;6(2010):377.
PubMed PubMed Central Google Scholar
Chikina MD, Huttenhower C, Murphy CT, Troyanskaya OG. Global prediction of tissue-specific gene expression and context-dependent gene networks in Caenorhabditis elegans. PLoS Comput Biol. 2009;5(6):e1000417.
Article PubMed PubMed Central Google Scholar
Guan Y, Gorenshteyn D, Burmeister M, Wong AK, Schimenti JC, Handel MA, Bult CJ, Hibbs MA, Troyanskaya OG. Tissue-specific functional networks for prioritizing phenotype and disease genes. PLoS Comput Biol. 2012;8(9):e1002694.
Article CAS PubMed PubMed Central Google Scholar
Sarkar AK, Ward LD, Kellis M. Functional enrichments of disease variants across thousands of independent loci in eight diseases. BioRxiv. 2016. doi:10.1101/048066.
Google Scholar
Greene CS, Penrod NM, Williams SM, Moore JH. Failure to replicate a genetic association may provide important clues about genetic architecture. PLoS One. 2009;4(6):7.
Article Google Scholar
Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: genetic interactions create phantom heritability. Proc Natl Acad Sci. 2012;109(4):1193–8.
Article CAS PubMed PubMed Central Google Scholar
Williams SM, Haines JL. Correcting away the hidden heritability. Ann Hum Genet. 2011;75(3):348–50.
Article PubMed Google Scholar
Zhao Y, Chen J, Freudenberg JM, Meng Q, Rajpal DK, Yang X. Network-based identification and prioritization of key regulators of coronary artery disease locisignificance. Arterioscler Thromb Vasr Biol. 2016;36(5):928–41.
Article CAS Google Scholar
Hofree M, Shen JP, Carter H, Gross A, Ideker T. Network-based stratification of tumor mutations. Nat Methods. 2013;10(11):1108–15.
Article CAS PubMed PubMed Central Google Scholar
AlQuraishi M, Koytiger G, Jenney A, MacBeath G, Sorger PK. A multiscale statistical mechanical framework integrates biophysical and genomic data to assemble cancer networks. Nat Genet. 2014;46(12):1363–71.
Article CAS PubMed PubMed Central Google Scholar
Qian DC, et al. Identification of shared and unique susceptibility pathways among cancers of the lung, breast, and prostate from genome-wide association studies and tissue-specific protein interactions. Hum Mol Genet. 2015;24(25):7406–20.
Article CAS PubMed Google Scholar
Himmelstein DS, Baranzini SE. Heterogeneous network edge prediction: a data integration approach to prioritize disease-associated genes. PLOS Comput Biol. 2015;11(7):e1004259.
Article PubMed PubMed Central Google Scholar
Song A, Yan J, Kim S, Risacher SL, Wong AK, Saykin AJ, Shen L, Greene CS. Network-based analysis of genetic variants associated with hippocampal volume in Alzheimer’s disease: a study of ADNI cohorts. BioData Min. 2016;9(1):3.
Article PubMed PubMed Central Google Scholar
• Krishnan A, Zhang R, Yao V, Theesfeld CL, Wong AK, Tadych A, Volfovsky N, Packer A, Lash A, Troyanskaya OG. Genome-wide characterization of genetic and functional dysregulation in autism spectrum disorder. Nat Neurosci. 2016. The authors use a human brain-specific gene network to predict autism-associated genes across the genome and further elucidate their functional and developmental characteristics.
Greene CS, Himmelstein DS. Genetic association-guided analysis of gene networks for the study of complex traits. Circ Cardiovasc Genet. 2016;9(2):179–84.
Article PubMed Google Scholar
• Lage K, Hansen NT, Karlberg EO, Eklund AC, Roque FS, Donahoe PK, Szallasi Z, Jensen TS, Brunak S. A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Proc Natl Acad Sci. 2008;105(52):20870–75. The authors link diseases to tissues on a large-scale based on literature mining and further validate these links by showing that disease genes are over-expressed in the affected tissues.
Walsh AM, et al. Integrative genomic deconvolution of rheumatoid arthritis GWAS loci into gene and cell type associations. Genome Biol. 2016;17(1):79.
Article PubMed PubMed Central Google Scholar
• Taroni JN, Mahoney JM, Greene CS, Martyanov V, Wood TA, Christmann R, Farber HW, Lafyatis RA, Denton CP, Hinchcliff ME, Pioli PA, Whitfield ML. A novel network approach reveals tissue-specific cellular modulators of the immune-fibrotic axis in systemic sclerosis. BioRxiv. 2016. The authors use multiple tissue-specific networks to prioritize genes and molecular processes perturbed in the rare autoimmune disease systemic sclerosis.
Albert FW, Kruglyak L. The role of regulatory variation in complex traits and disease. Nat Rev Genet. 2015;16(4):197–212.
Article CAS PubMed Google Scholar
Lonsdale J, et al. The genotype-tissue expression (gtex) project. Nat Genet. 2013;45(6):580–5.
Article CAS Google Scholar
Ardlie KG, et al. The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348(6235):648–60.
Article Google Scholar
Andersson R, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507(7493):455–61.
Article CAS PubMed Google Scholar
Ciofani M, et al. A validated regulatory network for Th17 cell specification. Cell. 2012;151(2):289–303.
Article CAS PubMed PubMed Central Google Scholar
Yosef N, et al. Dynamic regulatory network controlling TH17 cell differentiation. Nature. 2013;496(7446):461–8.
Article CAS PubMed PubMed Central Google Scholar
Civelek M, Lusis AJ. Systems genetics approaches to understand complex traits. Nat Rev Genet. 2014;15(1):34–48.
Article CAS PubMed Google Scholar
Dobrin R, Zhu J, Molony C, Argman C, Parrish ML, Carlson S, Allan MF, Pomp D, Schadt EE. Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease. Genome Biol. 2009;10(5):R55.
Article PubMed PubMed Central Google Scholar
Long Q, Argmann C, Houten SM, Huang T, Peng S, Zhao Y, Tu Z, Zhu J. Inter-tissue coexpression network analysis reveals DPP4 as an important gene in heart to blood communication. Genome Med. 2016;8(1):15.
Article PubMed PubMed Central Google Scholar
Ramilowski JA, Goldberg T, Harshbarger J, Kloppmann E, Kloppman E, Lizio M, Satagopam VP, Itoh M, Kawaji H, Carninci P, Rost B, Forrest ARR. A draft network of ligand-receptor-mediated multicellular signalling in human. Nat Commun. 2015;6(2015):7866.
Article CAS PubMed PubMed Central Google Scholar
Hall MA, et al. Detection of pleiotropy through a Phenome-wide association study (PheWAS) of epidemiologic data as part of the environmental architecture for genes linked to environment (EAGLE) study. PLoS Genet. 2014;10(12):e1004678.
Article PubMed PubMed Central Google Scholar
Denny JC, et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013;31(12):1102–10.
Article CAS PubMed PubMed Central Google Scholar
Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol. 2010;6(1):e1000641.
Article PubMed PubMed Central Google Scholar
Himmelstein D, Lizee A, Khankhanian P, Brueggeman L, Chen S, Hessler C, Hadley D, Baranzini S. Rephetio: repurposing drugs on a hetnet. Thinklab. 2015. doi:10.15363/thinklab.4.20.
Google Scholar
Whalen S, Truty RM, Pollard KS. Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat Genet. 2016;48(5):488–96.
Article CAS PubMed PubMed Central Google Scholar
Tan J, Hammond JH, Hogan DA, Greene CS. ADAGE-based integration of publicly available pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions. mSystems. 2016;1(1):e00015–25.
Article Google Scholar

Download references

Funding

This work was supported in part by a grant from the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative to CSG (GBMF 4552) and funding from the National Institutes of Health under award U01-TR001263.

Author information

Arjun Krishnan
Present address: 270B Carl Icahn Laboratory, Princeton University, Princeton, NJ, 08544, USA
Casey S. Greene
Present address: Perelman School of Medicine, 10-131 SCTR, 3400 Civic, Philadelphia, PA, 19104, USA

Authors and Affiliations

Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, USA
Arjun Krishnan
Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
Jaclyn N. Taroni & Casey S. Greene
Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
Jaclyn N. Taroni & Casey S. Greene
Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
Jaclyn N. Taroni & Casey S. Greene

Authors

Arjun Krishnan
View author publications
You can also search for this author in PubMed Google Scholar
Jaclyn N. Taroni
View author publications
You can also search for this author in PubMed Google Scholar
Casey S. Greene
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Arjun Krishnan or Casey S. Greene.

Ethics declarations

Disclosure

Arjun Krishnan declares that he has no conflict of interest. Jaclyn N. Taroni reports grants from the NIH during the conduct of study. Casey S. Greene reports grants from the Gordon and Betty Moore Foundation and the National Institutes of Health during the conduct of the study. He also reports grants from the Cystic Fibrosis Foundation, the National Institutes of Health, the National Science Foundation, the Norris-Cotton Cancer Center, and the American Cancer Society, outside the submitted work.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

This article is part of the Topical collection on Genomics.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Krishnan, A., Taroni, J.N. & Greene, C.S. Integrative Networks Illuminate Biological Factors Underlying Gene–Disease Associations. Curr Genet Med Rep 4, 155–162 (2016). https://doi.org/10.1007/s40142-016-0102-5

Download citation

Published: 09 September 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s40142-016-0102-5

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Integrative Networks Illuminate Biological Factors Underlying Gene–Disease Associations