Introduction

The human genome-wide association studies (GWAS) of the last several years have provided the first unbiased views of the genetics of common complex diseases, such as coronary artery disease, diabetes, and cancer. Many of the loci contain novel genes not previously connected to their respective disease, indicating that there is great potential to discover new pathways and new targets for therapeutic intervention. These GWAS do, however, have some important limitations. First, human GWAS are not well powered to study genetic interactions, such as gene-by-gene or gene-by-environment interactions (Zuk et al. 2012). Second, it will be difficult to move from locus to a disease pathway directly in humans (Altshuler et al. 2008). And third, for most diseases, GWAS have identified only a small fraction of the total genetic contributions and, thus, there is a great deal more to be discovered (Altshuler et al. 2008; Manolio et al. 2009).

To simplify genetic analysis, natural variations relevant to disease have been studied in mice and rats (Ahlqvist et al. 2011; Flint and Mackay 2009; Keane et al. 2011). This has generally involved traditional linkage mapping methods with crosses between different strains to identify quantitative trait loci (QTLs). An important problem with such analysis has been poor mapping resolution because the QTLs generally contain hundreds of genes, making the identification of the causal genes difficult.

To address these limitations, we have developed an association-based approach using classical inbred strains of mice (Bennett et al. 2010). We follow previous attempts to apply association in mice (Cervino et al. 2007; Grupe et al. 2001; Guo et al. 2007; Liao et al. 2004; Liu et al. 2007; Pletcher et al. 2004) with two differences. First, we correct for population structure, which is very extensive in mice, using an efficient mixed-model algorithm (EMMA) (Kang et al. 2008). Second, to capture loci with effect sizes typical of complex traits in mice (in the range of 5 % of total trait variance), we supplemented the population with recombinant inbred (RI) strains.

Over the last few years, we have typed the hybrid mouse diversity panel (HMDP) strains for a variety of clinical traits as well as intermediate phenotypes, and have shown that the HMDP has sufficient power to map genes for highly complex traits with resolution that is in most cases less than a megabase. In this essay, we review our experience with the HMDP, describe various ongoing projects, and discuss how the HMDP may fit into the larger picture of common diseases and different approaches.

Overview of the HMDP

The hybrid mouse diversity panel (HMDP) consists of a population of over 100 inbred mouse strains selected for use in systematic genetic analyses of complex traits (Table 1). Our goals in selecting the strains were to (1) increase the resolution of genetic mapping, (2) have a renewable resource that is available to all investigators worldwide, and (3) provide a shared data repository that would allow the integration of data across multiple scales, including genomic, transcriptomic, metabolomic, proteomic, and clinical phenotypes. The core of our panel for association mapping (Bennett et al. 2010; Cervino et al. 2007; Grupe et al. 2001) consists of 29 classic parental inbred strains, which are a subset of a group of mice commonly called the mouse diversity panel. We settled on our strains by eliminating closely related strains and removing wild-derived strains. The decision to remove wild-derived strains reflects a tradeoff between statistical power and genetic diversity. Although we sacrificed genetic diversity by leaving out wild-derived strains, doing so increased the statistical power (assuming the same number of animals) to identify trait-affecting genetic variants that are polymorphic among the classical inbred strains, and these variants account for a tremendous amount of phenotypic diversity among the classical inbred strains.

Table 1 The 114 strains typed within the HMDP for metabolic phenotypes

In order to increase power, we included panels of RI mice, including the BXD, CXB, BXA/AXB, and BXH panels. Power calculations with the inclusion of these additional strains indicated that we have 70 % power to detect SNPs that contribute ~10 % of the overall variance of a complex trait (Bennett et al. 2010). We have recently shown that power can be further increased by performing meta-analysis in which data from the HMDP are combined with data from traditional crosses (Furlotte et al. 2012). Power can also be increased by typing additional commercially available RI panels, as discussed below.
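The idea of a simulation-based power estimate can be illustrated with a short sketch. It is not the procedure used in Bennett et al. (2010): it assumes unrelated strains, a single mean phenotype per strain, a Fisher z-test in place of a mixed model, and a placeholder significance threshold (`alpha`) that is not taken from the text. The function name and all default values are hypothetical, so the sketch is not expected to reproduce the published power figures.

```python
import math
import numpy as np

def simulate_power(n_strains=100, var_explained=0.10, maf=0.3,
                   alpha=1e-5, n_sim=2000, seed=0):
    """Fraction of simulated studies in which a causal SNP reaches alpha.

    Simplifying assumptions (not the published procedure): strains are
    unrelated, there is one mean phenotype per strain, and significance is
    judged with a Fisher z-test on the genotype-phenotype correlation.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        # Inbred strains are homozygous, so genotypes are coded 0/1.
        g = (rng.random(n_strains) < maf).astype(float)
        if g.std() == 0:
            continue  # monomorphic draw cannot be tested; counted as a miss
        g_std = (g - g.mean()) / g.std()
        noise = rng.standard_normal(n_strains)
        # Phenotype with the requested share of variance due to the SNP.
        y = (math.sqrt(var_explained) * g_std
             + math.sqrt(1.0 - var_explained) * noise)
        r = np.corrcoef(g, y)[0, 1]
        z = math.atanh(r) * math.sqrt(n_strains - 3)
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
        hits += p < alpha
    return hits / n_sim

power_10pct = simulate_power(var_explained=0.10)
power_05pct = simulate_power(var_explained=0.05)
```

Comparing the two calls shows how quickly power falls as the variance explained by a locus drops, which is the motivation for adding strains to the panel.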

A key feature of the panel is that genotyping is not necessary due to the wealth of single nucleotide polymorphism (SNP) genotypes known across the mouse strains (Keane et al. 2011; Kirby et al. 2010). The inbred strains used for the HMDP were previously genotyped by the Broad Institute, and these genotypes were then combined with genotypes from the Wellcome Trust Centre for Human Genetics (WTCHG) (Table 2). Genotypes of RI strains at the Broad Institute were inferred from WTCHG genotypes by interpolating alleles at polymorphic SNPs among parental strains, with ambiguous genotypes called as missing. Of the 140,000 SNPs available, 107,145 were informative with an allele frequency >5 % and were used for GWAS in our publications (Bennett et al. 2010; Farber et al. 2011; Park et al. 2011). Additional genotyping classifications have been performed recently (Keane et al. 2011; Kirby et al. 2010), and the resulting 4 million SNP genotypes are freely available (Table 2).
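The informative-SNP filter described above can be sketched in a few lines. This is a minimal illustration, assuming genotypes are coded 0/1 (inbred strains are homozygous) with NaN for missing calls; the function name and the toy matrix are hypothetical and are not part of the HMDP genotype files.

```python
import numpy as np

def informative_snps(genotypes, min_maf=0.05):
    """Boolean mask of SNPs whose minor allele frequency exceeds min_maf.

    genotypes: (n_strains, n_snps) array coded 0/1, with np.nan marking
    missing calls. This coding is an assumption made for illustration.
    """
    g = np.asarray(genotypes, dtype=float)
    # Frequency of the allele coded 1, ignoring missing calls.
    freq = np.nanmean(g, axis=0)
    maf = np.minimum(freq, 1.0 - freq)
    return maf > min_maf

# Toy example: 4 strains x 3 SNPs (hypothetical data).
toy = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, np.nan],
    [0, 0, 0],
])
mask = informative_snps(toy)  # first SNP is monomorphic and is dropped
```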

Table 2 URL sites useful for the HMDP as well as sites used to develop the HMDP, including sites at The Jackson Laboratory (JAX) (Bar Harbor, ME)

In the current HMDP panel consisting of over 100 inbred and RI strains, we used ~860,000 SNPs to examine linkage disequilibrium (LD) blocks. These SNPs have greater than 5 % minor allele frequency (mean and median minor allele frequency of 28 and 31 %) and are polymorphic between the strains. Using 0.8 as the r² cutoff to define LD, there are a total of 13,706 LD blocks with a median size of 42.8 kb per block (mean of 143.3 kb per block) scattered throughout the genome (Fig. 1). Since LD blocks are likely to define the window in which a candidate gene for a locus resides, the presence of small LD blocks in the HMDP suggests that the number of causal candidate genes will be, on average, fewer than five per locus. This is a considerable improvement over mapping resolution in traditional linkage studies and/or in outbred stock mice where, on average, the number of candidate genes for each locus ranges from 10 to 50. Large blocks (defined as LD blocks >1 Mb) are also present in the HMDP panel, but infrequently: only 1.5 % of the LD blocks (211 of 13,706 total) are large, although together they encompass 12.5 % of the genome. The X chromosome contains multiple large LD blocks, suggesting that mapping resolution for this chromosome is reduced compared to the autosomes. These large blocks reflect regions of the genome that were inherited by all strains from shared ancestors, as described by Frazer et al. (2007). The high-resolution mapping property of the HMDP becomes particularly important in systems genetics, as one common goal is to identify causal genes that coordinately regulate network function. Such drivers have been reported in numerous studies as candidate genes for “QTL hot spots,” but the true identity of such drivers has been elusive, mainly due to lack of resolution in mapping.
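To make the notion of an r²-defined LD block concrete, the following sketch groups consecutive SNPs into a block for as long as the r² between adjacent SNPs stays at or above the cutoff. The text does not specify the block-calling algorithm actually used, so this adjacent-pair rule, the function names, and the toy genotypes are assumptions for illustration only. The last line simply re-derives the 1.5 % figure quoted above.

```python
import numpy as np

def r_squared(a, b):
    """Squared Pearson correlation between two 0/1 genotype vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0  # monomorphic SNP: r^2 undefined, treated as unlinked
    r = np.corrcoef(a, b)[0, 1]
    return float(r * r)

def ld_blocks(genotypes, positions, cutoff=0.8):
    """Group consecutive SNPs into blocks while adjacent r^2 >= cutoff.

    genotypes: (n_strains, n_snps) 0/1 array with SNPs sorted by position.
    positions: base-pair coordinate of each SNP.
    Returns a list of (start_bp, end_bp) tuples, one per block.
    """
    g = np.asarray(genotypes)
    blocks = []
    start = 0
    for j in range(1, g.shape[1]):
        if r_squared(g[:, j - 1], g[:, j]) < cutoff:
            blocks.append((positions[start], positions[j - 1]))
            start = j
    blocks.append((positions[start], positions[-1]))
    return blocks

# Toy example: 6 strains x 4 SNPs (hypothetical data).
toy_genotypes = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
])
toy_positions = [1000, 5000, 90000, 120000]
blocks = ld_blocks(toy_genotypes, toy_positions)
sizes_kb = [(end - start) / 1000 for start, end in blocks]

# Share of large (>1 Mb) blocks reported in the text: 211 of 13,706.
large_block_percent = round(100 * 211 / 13706, 1)
```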

Fig. 1

Map of linkage disequilibrium (LD) blocks along the genome in the hybrid mouse diversity panel (HMDP) population. The LD blocks were determined using ~860,000 SNPs that have >5 % minor allele frequency (mean and median minor allele frequency of 28 and 31 %) and are polymorphic between the strains. Using 0.8 as the r² cutoff to define LD, there are a total of 13,706 LD blocks with a median size of 42.8 kb per block (mean of 143.3 kb per block) scattered throughout the genome. Only 1.5 % of the LD blocks (211 of 13,706 total) have a large size encompassing 12.5 % of the genome. The location of the large blocks in the genome can be identified by lines that cross the red bar. See text for more details

In addition to the excellent resolution, the HMDP has important advantages for systems genetics and for analysis of genetic interactions. The progeny from a genetic cross are unique and as such can be characterized for a limited number of phenotypes, whereas the inbred strains of the HMDP can be examined for an unlimited number of phenotypes since the data are cumulative. The same concept applies to interactions. Thus, mice of the same genotype can be examined under a variety of conditions to identify gene-by-environment interactions, and epistatic interactions can be tested using targeted perturbations on specific genetic backgrounds. This is also an important feature of other replicate mouse genetic systems such as consomic strains, collaborative cross strains, and wild-derived inbred strains.

Discoveries using the HMDP

The successful use of the HMDP to identify complex trait genes was recently highlighted in a study of bone mineral density (BMD) (Farber et al. 2011). BMD is a polygenic phenotype that is commonly investigated in human and rodent genetic studies and is the single strongest predictor of osteoporotic fracture (Cummings et al. 2002; Farber and Rosen 2010). For this study, total body, spine, and femur areal BMD data were generated on 16-week-old male mice from 96 HMDP strains. The whole bone transcriptome was also profiled using Illumina gene expression microarrays. The authors used EMMA to perform genome-wide association for the three BMD measures. A total of four genome-wide significant associations were identified on Chromosomes (Chrs) 7, 11, 12, and 17, each affecting BMD at one or more sites. The Chr 12 association for total-body BMD was chosen for further analysis since the 3 Mb window surrounding the association contained only 14 candidate genes. Interestingly, the most significant association was with a nonsynonymous SNP (rs29131970) in the additional sex-combs like 2 (Asxl2) gene (Fig. 2). This polymorphism was predicted to have deleterious effects on ASXL2 protein function. To gain further support for Asxl2 being the causal gene, existing human genome-wide association data (generated in ~6,000 Icelandic subjects) were used to evaluate SNPs within the human syntenic region for association with BMD (Styrkarsdottir et al. 2008). One SNP (rs7563012) was significant after Bonferroni correction and was located in intron 3 of the human ASXL2 gene (Fig. 2). Together these data suggested that Asxl2 influences BMD in both humans and mice. Consistent with this hypothesis, BMD was found to be lower in Asxl2 −/− knockout mice.

Fig. 2

Variation in Asxl2 in mice and humans is associated with bone mineral density (BMD). a Genome-wide association in the HMDP for total BMD identifies an association on chromosome (Chr) 12. b A nonsynonymous SNP (rs29131970) in Asxl2 that was predicted to alter protein function was the most significantly associated Chr 12 SNP in the HMDP. c Human SNPs within ASXL2 were also associated with BMD in ~6,000 Icelandic individuals. d Male mice deficient in Asxl2 (−/−) display significant decreases relative to wild-type controls (+/+) in total BMD, spine BMD, and femur BMD residuals after adjustments for age and body weight. Data shown in d are residual mean ± SEM, *P < 0.05

Network analysis in the HMDP is another powerful approach for investigating complex traits from a systems-level perspective. Coexpression networks can be used to annotate genes of unknown function based on the known functions of the genes to which they are most closely connected. This “guilt by association” approach has been shown to be a robust gene annotation tool (Wolfe et al. 2005) and was used to determine the mechanism through which Asxl2 influenced BMD. Weighted Gene Co-expression Network Analysis (WGCNA) was first used to generate a coexpression network from the bone microarray data, which identified Asxl2 as being connected to genes involved in myeloid cell differentiation. In bone, osteoclasts are bone-resorbing cells of myeloid origin (Teitelbaum and Ross 2003). Additionally, in a human protein-protein interaction network, ASXL2 interacts with TRAF6, a key component of the major signaling pathway regulating osteoclastogenesis (Teitelbaum and Ross 2003). Thus, based on network inferences, Asxl2 was predicted to be involved in the differentiation of osteoclasts. To test this prediction, expression of Asxl2 was knocked down in osteoclast precursors. An ~50 % reduction in Asxl2 transcript levels inhibited the formation of TRAP+ (a marker of mature osteoclasts) multinuclear cells. These data suggested that Asxl2 influences BMD, at least in part, through its regulation of osteoclastogenesis. This work highlights the ability of the HMDP, combined with systems genetics, to move from association to gene to mechanism within a single study.
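The “guilt by association” step can be illustrated with a minimal sketch of a weighted coexpression network. It uses the unsigned soft-threshold adjacency commonly associated with WGCNA, a_ij = |cor(x_i, x_j)|^β, and then lists the genes most strongly connected to a query gene. This is not the WGCNA package itself (which is an R library with many additional steps such as module detection); the function names, the choice β = 6, the gene labels, and the simulated expression values are all assumptions for illustration.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor|^beta with a zero diagonal.

    expr: (n_samples, n_genes) expression matrix.
    beta: soft-thresholding power (6 is an illustrative choice here).
    """
    cor = np.corrcoef(expr, rowvar=False)
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)  # a gene is not its own neighbor
    return adj

def top_neighbors(adj, gene_names, query, k=2):
    """Names of the k genes most strongly connected to the query gene."""
    i = gene_names.index(query)
    order = np.argsort(adj[i])[::-1][:k]
    return [gene_names[j] for j in order]

# Simulated data: genes A, B, and C share a latent factor; D and E are noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal(50)
expr = np.column_stack([
    latent + 0.1 * rng.standard_normal(50),    # gene_A
    latent + 0.1 * rng.standard_normal(50),    # gene_B
    -latent + 0.1 * rng.standard_normal(50),   # gene_C (anti-correlated)
    rng.standard_normal(50),                   # gene_D
    rng.standard_normal(50),                   # gene_E
])
names = ["gene_A", "gene_B", "gene_C", "gene_D", "gene_E"]
adjacency = soft_threshold_adjacency(expr)
neighbors = top_neighbors(adjacency, names, "gene_A")
```

Because the adjacency is unsigned, the anti-correlated gene_C counts as a close neighbor of gene_A alongside gene_B; annotating a query gene then amounts to inspecting the known functions of such neighbors.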

In another study, Park et al. (2011) used the HMDP resource to report on gene networks associated with conditional fear. In this study, the authors combined behavioral phenotypes with gene expression data in two regions of the brain (striatum and hippocampus) to identify groups of genes that coordinately regulate the behavior of the animals. Overall, they observed significant overlap between the local QTLs and QTL hotspots in the two tissues, as well as module conservation and preservation of highly connected genes (also known as “hubs”) in the striatum and hippocampus networks. The authors were also able to identify tissue-specific network modules between the striatum and the hippocampus, and after performing functional enrichment analysis of the modules, they arrived at pathways likely to contribute to the differences in hippocampus and striatum function. Finally, using modules as functional units, the authors were able to demonstrate correlations with behavioral traits, thus helping to prioritize candidate genes and pathways for behavioral traits.

In addition to identifying cellular mechanisms and genes underlying physiological traits, the HMDP has been used to investigate the relationships across various biological scales at the global level (Ghazalpour et al. 2011). For example, among the intermediate phenotypes that have been examined in liver are transcript levels (in triplicate, using the Affymetrix platform) and a set of peptides corresponding to about 1,000 proteins (using quantitative mass spectrometry analysis) (Ghazalpour et al. 2011). The correlation between protein and transcript levels was quite weak, with a correlation coefficient of less than 0.4 in most cases, similar to what has been observed in previous studies with yeast and worms. More surprising was the finding that transcript levels were much more strongly correlated with clinical traits (primarily metabolic) than were protein levels. One possible explanation is that transcript levels may be reactive rather than causal with respect to physiologic traits (Ghazalpour et al. 2011).

Overall, the HMDP is being used to develop a multiscale understanding of a number of complex traits, including a recent report on elevated heart rate (Smolock et al. 2012). There are already over 70 traditional clinical traits reported and there are ongoing studies related to diet-induced obesity, hearing loss, heart failure, atherosclerosis, lipoprotein metabolism, bone metabolism, vascular injury, hematopoietic stem cells, air pollution, gut flora, addictive behavior, hepatotoxicity, and diabetic complications. In addition, gene expression microarrays have been used to quantify mRNA levels in liver, bone, adipose, brain, peritoneal macrophages, aorta, and heart, and proteomic and metabolomic profiling has been performed in liver.

Integration of the HMDP with other resources

The HMDP is just one of several recently proposed approaches to improve the resolution of mouse genetic studies. Other approaches include the Collaborative Cross (CC) (Churchill et al. 2004), outbred designs (Valdar et al. 2006; Yalcin et al. 2010), and the use of consomic strains (Gregorova et al. 2008; Singer et al. 2004; Takada et al. 2008). Each approach has advantages and disadvantages relative to the HMDP. The CC is a recently developed panel of RI strains that are descended from eight founder strains. A key difference between the HMDP and the CC is that three of the CC founders are wild-derived strains. Wild-derived strains introduce a significantly larger amount of genetic variation and corresponding phenotypic variation compared to the HMDP. For this reason, it is likely that more genes in the CC will have effects on traits than in the HMDP. However, the additional variation may make it relatively more difficult to map quantitative trait loci polymorphic in both panels, since the increased variation of the CC and outbred panels will reduce the relative effect size of the same variant compared to the HMDP (Kang et al. 2008). The CC will ultimately contain approximately 300 strains compared to the current 100 strains of the HMDP. However, the HMDP may be enlarged to 260 strains (see Future directions and conclusions section) (Collaborative Cross Consortium 2012). Both the CC and the HMDP use inbred strains, so they share the advantage of accumulating data on each strain over time as more and more studies are performed. Utilizing inbred strains also facilitates studies with perturbations because genetically identical animals can be phenotyped both with and without a perturbation.

An advantage of outbred designs is that they have higher resolution than the HMDP (up to 100 kb). On the other hand, a disadvantage in outbred designs, either utilizing a specially designed Heterogeneous Stock (Valdar et al. 2006) or commercially available outbred mice (Yalcin et al. 2010), is that each animal is unique. Overall, several mapping strategies are, or will be, available to tackle the genetics of complex diseases.

Consomic strains are also proving valuable for reducing genetic intervals containing candidate genes (Hoover-Plow et al. 2006; Prows et al. 2008) and for identifying causal genes (Burrage et al. 2010). However, consomic strains have limited uses for initial genomic studies, as only two alleles are sampled and consomic strains by definition consist of large areas of LD which were captured from the donor strain during breeding. Nonetheless, consomic strains have enriched the ability to test for epistasis and to reduce genetic intervals by the generation of congenic strains.

Resources for design and analysis of HMDP

UCLA maintains several resources for the design and analysis of HMDP studies (Table 2). The two most relevant to investigators are the EMMA association webserver and EMMA design webserver. Investigators utilizing the HMDP can upload their collected phenotypes to the association webserver which will perform association mapping using EMMA and return population structure-corrected P values for each SNP. The analysis is performed on a high-performance computing infrastructure at UCLA, eliminating the need for investigators applying the HMDP to invest in computational resources to perform the analysis. The EMMA design webserver allows an investigator to estimate the power of a proposed study design through simulations (Kirby et al. 2010) that can help guide an investigator in the design of HMDP studies. In addition, curated genotypes of the HMDP strains are also available.
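The kind of population structure-corrected test that the association webserver returns can be sketched with a simplified linear mixed model, y = μ + xβ + u + e, with u ~ N(0, σg²K) for a kinship matrix K. The code below is not the EMMA implementation: it substitutes a coarse maximum-likelihood grid search for the variance ratio (fitted once under the null model rather than with EMMA's REML procedure), uses a normal approximation for the Wald test, and assumes every SNP is polymorphic. The function names and the simulated data are hypothetical.

```python
import math
import numpy as np

def kinship(G):
    """Kinship estimated from column-standardized 0/1 genotypes."""
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    return Z @ Z.T / G.shape[1]

def fit_null_delta(y, s, U):
    """Grid-search delta = sigma_e^2 / sigma_g^2 by maximum likelihood
    under the intercept-only model (a simplification of EMMA's REML)."""
    n = len(y)
    yr = U.T @ y
    ones_r = U.T @ np.ones(n)
    best_loglik, best_delta = -np.inf, None
    for delta in np.logspace(-3, 3, 61):
        w = 1.0 / (s + delta)
        mu = np.sum(w * ones_r * yr) / np.sum(w * ones_r ** 2)
        resid = yr - mu * ones_r
        sigma_g2 = np.sum(w * resid ** 2) / n
        loglik = -0.5 * (n * math.log(2 * math.pi * sigma_g2)
                         + np.sum(np.log(s + delta)) + n)
        if loglik > best_loglik:
            best_loglik, best_delta = loglik, delta
    return best_delta

def mixed_model_scan(y, G, K):
    """Two-sided Wald P value per SNP from kinship-weighted least squares."""
    n = len(y)
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)  # guard against tiny negative eigenvalues
    delta = fit_null_delta(y, s, U)
    w = 1.0 / (s + delta)
    # Rotating by U makes the covariance diagonal, so GLS becomes weighted LS.
    yr, ones_r, Gr = U.T @ y, U.T @ np.ones(n), U.T @ G
    pvals = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        X = np.column_stack([ones_r, Gr[:, j]])
        XtWX = X.T @ (w[:, None] * X)
        beta = np.linalg.solve(XtWX, X.T @ (w * yr))
        resid = yr - X @ beta
        sigma_g2 = np.sum(w * resid ** 2) / (n - 2)
        se = math.sqrt(sigma_g2 * np.linalg.inv(XtWX)[1, 1])
        pvals[j] = math.erfc(abs(beta[1] / se) / math.sqrt(2.0))
    return pvals

# Simulated example: 80 strains, 200 SNPs, SNP 0 is causal (hypothetical data).
rng = np.random.default_rng(1)
n_strains, n_snps = 80, 200
G = (rng.random((n_strains, n_snps)) < 0.5).astype(float)
G[:, 0] = np.arange(n_strains) % 2
y = 2.0 * G[:, 0] + rng.standard_normal(n_strains)
pvals = mixed_model_scan(y, G, kinship(G))
```

Estimating the variance ratio once and reusing it for every SNP keeps the scan cheap; the trade-off is some loss of accuracy for SNPs with large effects.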

A systems genetics database

Because the HMDP mice are inbred, with fixed genotypes, the data generated from their study are cumulative. To facilitate the integration and analysis of such data, we also developed a database called the Systems Genetics Resource (SGR). The database comprises mouse genomic, transcriptomic, metabolomic, proteomic, and clinical trait data from the HMDP as well as selected traditional mouse crosses and several human studies. The data are accompanied by detailed descriptions of how the data were acquired, with protocols and links to related published papers. A summary of current data sets contained in the database is presented in Table 3.

Table 3 Data sets currently on line on the systems genetics resource (SGR) database

We developed a web-based interface where data can be queried for information on specific gene and trait correlations, gene expression in various tissues, or quantitative trait loci, and can also be downloaded for other types of analyses. Such information can be used, for example, to prioritize candidate genes in genetic studies, and trans-acting loci can be used to generate hypotheses about regulatory pathways (Fig. 3). The intermediate phenotypes can also be used to model gene networks and causal interactions. The power of the SGR is expected to expand as more data are added. Some of the data are also available in the GeneNetwork and the Mouse Genome Informatics databases (Table 2).

Fig. 3

Database plots for interferon-inducible helicase 1 (Ifih1). Sample plots for a given gene of interest that can be obtained from our online database. a Lipopolysaccharide (LPS) response of Ifih1 in macrophages of the HMDP. b Genome-wide association for the expression of Ifih1 in LPS-treated macrophages. c Relative expression levels among mouse strains of the HMDP in adipose, aorta, heart, and liver, and for macrophages treated with control, LPS or oxidation products of 1-palmitoyl-2-arachidonyl-sn-glycero-3-phosphocholine (OxPAPC) media. Robust multi-array average (RMA) refers to an algorithm for background correction and normalization of gene expression microarray data. We used the Affymetrix GCOS RMA algorithm

The SGR is a resource that can be used to answer specific questions and to understand the relationships among different genes. For example, data generated from primary macrophages of the HMDP helped us to determine that a gene of interest, the interferon-inducible helicase 1 (Ifih1), shows gene-by-environment interactions and that it is under the control of the inflammatory stimulus bacterial lipopolysaccharide (LPS) (Fig. 3a). Using eQTL analysis, we also found that the expression of Ifih1 is controlled by three loci on Chrs 5, 8, and 13 (Fig. 3b). We can also compare the expression patterns of Ifih1 among the different tissues available in the database, which include adipose, aorta, heart, liver, and macrophages treated under three different conditions (Fig. 3c). Similarly, we found that the three trans-eQTL on Chrs 5, 8, and 13 are specific to macrophages treated with LPS, but there is also a strong cis-eQTL in the liver and other regulatory loci on Chr 2. Such information can allow us to identify candidate regulators, to examine gene-by-environment interactions and tissue specificities, and to prioritize candidate genes for clinical QTL.

Future directions and conclusions

While the resolution of the HMDP is excellent at most loci, the power is marginal. As judged by QTL studies, few loci contributing to complex clinical traits have effect sizes as large as 10 % and most are below 5 % (Flint and Mott 2008). Thus, using the panel of 100 strains (Bennett et al. 2010), only a subset of the loci contributing to complex traits is likely to be identified. As mentioned above, the power can be enhanced by integrating the results from traditional crosses or by expanding the number of inbred and recombinant inbred strains. Recently, two panels of advanced intercross RI lines have become available from The Jackson Laboratory: the LXS RI panel, which includes 62 strains, and 50 additional, recently developed BXD strains. Also available from The Jackson Laboratory are several sets of cryopreserved strains: 19 strains from the AKXD RI panel, 13 from the AKXL panel, and 15 from the NXSM panel. Thus, it is possible to increase the size of the HMDP to over 260 strains. Recombinant Inbred Congenic Lines, Chromosome Substitution Strains, and Genome Tagged Mice (Peters et al. 2007) could also be employed to increase both power and resolution. A particularly useful complement to the HMDP will be the CC strain set now being generated (Casci 2012; Threadgill and Churchill 2012). Preliminary analyses of the partially inbred CC lines have been promising, and substantial resources for their characterization, including complete genomic sequencing, have been planned (Collaborative Cross Consortium 2012).

As discussed above, human genetic studies of complex traits are limited in several respects, and the HMDP provides a partial solution to some of these limitations. First, human studies are poorly powered to identify genetic interactions, and these can be addressed more effectively in mice. As mentioned above, the HMDP is very convenient for genetic analysis of environmental interactions because mice of the same genotype can be examined under different conditions. Also, epistasis is likely to complicate studies of diseases such as diabetic complications and atherosclerosis, where one set of genes contributes to a predisposing factor (diabetes and elevated cholesterol, respectively), and another set of genes affects the response to these factors. In the HMDP, sensitizing genes can be introduced by breeding dominant mutations onto each of the HMDP strains and examining the F1 progeny. For example, we have bred a dominant hyperlipidemia-inducing gene, APOE-Leiden, onto a number of the HMDP strains and find atherosclerosis to be concordant with previous studies in which recessive mutations were transferred onto different backgrounds (B. Bennett and A. J. Lusis unpublished). Second, it will be difficult to identify, directly in humans, the pathways perturbed by novel GWAS genes. For example, the striking relationship between an allele of APOE and Alzheimer’s disease has been known for nearly 20 years and yet the mechanism remains uncertain. Clearly, studies in mice, where access to tissues and environmental conditions can be standardized, will simplify such analyses. Moreover, studying the genes in the context of natural variation, as opposed to transgenic or gene-targeted mice, may well offer important advantages. Third, for most common disease traits, human GWAS have been able to identify only a small fraction of the total heritability. There are undoubtedly many explanations but, clearly, much remains to be discovered. Studies in mice will most likely identify different, although overlapping, gene sets and perhaps different pathways. Traits that are substantially influenced by environmental factors will be addressed with greater power in mice because such factors can be controlled.

The HMDP panel should be useful as a tool for investigation of basic biological processes as well as complex clinical traits. An example is the study by Ghazalpour et al. (2011), which examines the relationship between transcript levels, protein levels, and metabolic traits. By providing many thousands of genetic perturbations in various combinations, the HMDP enables global dissection of the relationships between biological scales such as DNA methylation, transcription factor binding, histone modification, and transcription.