
8.1 Introduction: Key Experimental Advantages

The earliest written record of an interest in mice is found in a Chinese lexicon, published in about 1100 BC, in which there was a special word for a spotted mouse. In more recent times, directed breeding of unusual or “fancy” mice dates back to at least the eighteenth century in Japan (Keeler and Fuji 1937). As a small social mammal with a short reproductive cycle that breeds well in communal housing all year round, the mouse is an excellent laboratory animal. Mice are used extensively in many areas of biomedical research and are the most commonly used genetically modified mammal. Mice were the first mammals shown to follow Mendel’s laws of genetics (Cuenot 1902) and, in the early twentieth century, were one of the first organisms in which genetic linkage was demonstrated (Haldane et al. 1915). More recently, much of the utility of the mouse in genetic studies has come from the relative ease with which embryonic stem cells can be derived, chosen genes modified, and the genetic change transmitted to offspring. This technological feat was recognized by the award of the 2007 Nobel Prize in Physiology or Medicine (Mak 2007). The mouse was the second mammal (after humans) for which a draft genome sequence was determined (Waterston et al. 2002), and there is now a “finished” genome (Church et al. 2009) of higher quality than that available for humans. The availability of the draft sequence further increased the value of the mouse in genetic and postgenomic research.

The Mouse Genome Database (http://www.informatics.jax.org/) is the closest murine equivalent of the databases for C. elegans (WormBase; see Chap. 2) or Drosophila (FlyBase; see Chap. 3), but there are numerous other important online or print resources describing the biology, genetics, and genomics of the mouse. The classical work on the anatomy of the adult mouse (Cook 1965) is also available in electronic form (http://www.informatics.jax.org/cookbook/), and the reference works on the developing mouse embryo and fetus (Kaufman 1992; Theiler 1989) underpin the Edinburgh Mouse Atlas Project (EMAP: http://www.emouseatlas.org/emap/home.html). Mouse genetics is well served by the classic works “Genetic Variants and Strains of the Laboratory Mouse” (Lyon et al. 1996) and “Mouse Genetics: Concepts and Applications” (Silver 1995). The primary source for the mouse reference genome sequence is the assembly maintained by the NCBI (currently at version 38); this underpins the genome annotations in the main genome browsers at Ensembl, the University of California at Santa Cruz (UCSC), and NCBI (see Table 8.1).

Table 8.1 Genome sequence resources 

In the wild, mice are widely distributed throughout Europe and Asia, and they were most likely introduced into the Americas and Australia by humans. Mice are omnivores and can be maintained easily on a standard diet from commercial suppliers. The generation time of mice is short, typically 8–12 weeks from egg to egg, varying somewhat between inbred strains. They are prolific breeders: litter sizes of more than ten are not uncommon in outbred stocks, although litters are usually smaller in inbred mice. Despite their prolific nature, some mouse stocks are maintained using assisted reproductive technologies such as in vitro fertilization (IVF) (Takahashi et al. 2010). Mouse embryos can be stored in the gas phase of liquid nitrogen for at least 35 years, and spermatozoa for at least 22 years (Nakagata 2000). Cryopreservation reduces the cost of maintaining large numbers of different mutant lines, because they need not be kept as breeding animals; it also minimizes spontaneous genetic drift in mouse colonies, simplifies shipping of mice between facilities, and can help improve the health status of animals. The reproductive technology of cloning has also been used successfully in mice (Thuan et al. 2010) but, unlike IVF, is not in widespread use.

The origins of laboratory mice are complex, and it seems that most strains derive from the interbreeding of three mouse subspecies: Mus musculus musculus, Mus musculus domesticus, and Mus musculus castaneus (Guenet and Bonhomme 2003). One of the critical factors in the success of the mouse as a laboratory animal is that mice, unlike many other species, can be inbred. Over the last 100 years, many inbred and outbred mouse genetic resources have been developed, as summarized in Table 8.2.

Table 8.2 Genetic resources

8.2 Genome Mapping

The earliest genome maps of the mouse were those constructed by linkage analysis of mutations that caused visible traits (Silver 1995). The first traits included on maps arose spontaneously and were noticed in the fancy mice bred in Europe and the USA. Later, mutations induced by radiation or chemicals gave rise to many more traits that were mapped.

All mouse chromosomes are acrocentric: the centromere lies very close to one end, so that one chromosome arm is long and the other very short. The normal mouse karyotype is diploid and consists of 40 chromosomes: 19 pairs of autosomes plus two X chromosomes in females, or a single X and a Y chromosome in males. The autosomes do not differ substantially in size and can be identified reliably only after dye staining. The first cytogenetic maps, showing the distinctive banding patterns of mouse chromosomes stained with Giemsa or quinacrine (and, more recently, DAPI), allowed the integration of genetic linkage and physical maps (Kouri et al. 1971). A representative Giemsa-banded mouse karyotype and an idealized version of the cytogenetic map are shown in Fig. 8.1.

Fig. 8.1

(a) Standard idiogram for the Giemsa-banded mouse karyotype (Evans 1996). (b) Representative Giemsa-banded mouse karyotype (Courtesy of EP Evans and CV Beechey)

Mouse genetic maps grew slowly over the first 80 years of the twentieth century. The process accelerated with the development of molecular cloning methods and the discovery of highly polymorphic tandem repetitive elements (“microsatellites”), which could be used to distinguish the parental origin of chromosome regions in the offspring of crosses between different inbred strains (Dietrich et al. 1992). These microsatellite-based maps were used mainly for positional cloning of genes underlying Mendelian (Zhang et al. 1994) and quantitative traits (Cormier et al. 1997).
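Linkage maps of this kind are built by counting recombinant offspring. The sketch below is a minimal illustration, assuming a backcross in which each progeny animal has been scored as recombinant or not between two microsatellite markers; the counts are hypothetical and the function names are our own. It converts the observed recombination fraction into centimorgans with Haldane's mapping function, which assumes no crossover interference; real mapping software fits many markers jointly.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Proportion of scored backcross progeny that are recombinant
    between two markers (a simple estimate of r)."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("invalid counts")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Haldane's mapping function, d = -0.5 * ln(1 - 2r) Morgans,
    returned in centimorgans (assumes no crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5) for linked markers")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical example: 18 recombinants among 200 backcross progeny.
r = recombination_fraction(18, 200)
print(f"r = {r:.3f}, distance = {haldane_cm(r):.2f} cM")
```

This prints `r = 0.090, distance = 9.92 cM`; for small r the Haldane distance is close to 100r, and the correction grows as r approaches 0.5.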

Complementary to genetic maps, two main classes of physical map have been developed; these do not depend on animal breeding and ultimately relate the genome sequence to the underlying chromosomes. Direct maps give a view of genome features, as in cytogenetic maps (Kouri et al. 1971); indirect maps involve fragmenting the genome and then reconstructing its structure analytically, as in radiation hybrid (Hudson et al. 2001) and clone-based physical maps. Physical maps were needed for positional cloning of mutations, were useful in hierarchical shotgun genome sequencing, and have been used to validate the genome sequence map independently (Zhou et al. 2007).

The first indirect maps were derived either from panels of hamster cell lines carrying an intact subset of mouse chromosomes (Kozak et al. 1975) or from random chromosome fragments produced by irradiation (the “radiation hybrid” (RH) map) (McCarthy et al. 1997). These somatic cell maps can locate any sequence-tagged site (STS) for which a PCR amplicon can be designed that is either absent from the hamster genome or differs in size between hamster and mouse.
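For radiation hybrid panels, the distance between two STSs is usually estimated from how often they are retained discordantly across the hybrid cell lines. The sketch below uses a simplified two-point model, assuming that both markers are retained with the same probability R and that, when the chromosome is broken between them (probability θ), they are retained independently; the expected discordance is then 2R(1 − R)θ, and the distance in centirays is −100 ln(1 − θ). The panel data and function name are hypothetical; RH mapping software typically uses multipoint likelihood methods instead.

```python
import math
from typing import Sequence


def rh_distance_cr(a: Sequence[int], b: Sequence[int]) -> float:
    """Two-point radiation hybrid distance in centirays (cR).

    a, b: presence (1) or absence (0) of each mouse STS, scored in the
    same order across the hybrid panel.
    """
    if len(a) != len(b) or not a:
        raise ValueError("markers must be scored on the same non-empty panel")
    n = len(a)
    retention = (sum(a) + sum(b)) / (2 * n)          # pooled estimate of R
    discordance = sum(x != y for x, y in zip(a, b)) / n
    discordance_if_unlinked = 2 * retention * (1 - retention)
    if discordance_if_unlinked == 0:
        raise ValueError("markers retained in all or none of the hybrids")
    theta = discordance / discordance_if_unlinked    # breakage probability
    if theta >= 1:
        return math.inf                              # no evidence of linkage
    return -100.0 * math.log(1.0 - theta)


# Hypothetical 10-hybrid panel: the two STSs disagree in 2 lines.
sts_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
sts_b = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1]
print(round(rh_distance_cr(sts_a, sts_b), 1))
```

Here R = 0.4 and 20 % of hybrids are discordant, giving θ ≈ 0.42 and a distance of about 53.9 cR; centiray distances depend on the radiation dose used to build the panel.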

The other, molecular type of indirect physical map is based mainly on yeast artificial chromosomes (YACs) or bacterial artificial chromosomes (BACs), using the same techniques pioneered in other organisms, e.g., BAC restriction digest “fingerprinting” in C. elegans (see Chap. 2). BAC clones are more stable and more easily manipulated than YACs (Kim et al. 1996); nevertheless, YACs have been useful for studying long-range genomic structure and in functional studies (Sharpe et al. 1999).
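Fingerprint-based contig assembly starts by comparing the restriction fragment sizes of pairs of clones. As a minimal sketch, assuming fragment sizes have already been measured and that two bands match when their sizes differ by no more than a fixed tolerance, the hypothetical function below counts shared bands by greedy one-to-one matching. Tools such as FPC instead score overlaps by the probability of chance coincidence (the Sulston score), so this count is only illustrative.

```python
from typing import Sequence


def shared_bands(fp1: Sequence[float], fp2: Sequence[float], tolerance: float) -> int:
    """Count bands of fp1 that can be paired, one-to-one, with a band of fp2
    whose size differs by at most `tolerance` (greedy, smallest first)."""
    remaining = sorted(fp2)
    matches = 0
    for size in sorted(fp1):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                matches += 1
                del remaining[i]   # each band of fp2 may be used only once
                break
    return matches


# Hypothetical fragment sizes (bp) for two BAC clones.
bac1 = [1200, 2350, 3100, 4480, 6020]
bac2 = [1195, 2360, 3900, 4475, 7000]
n = shared_bands(bac1, bac2, tolerance=10)
print(n, n / min(len(bac1), len(bac2)))
```

Three of the five bands match within 10 bp, so the clones share 60 % of their bands; a greedy match can undercount in crowded size ranges, which is one reason real tools use more careful scoring.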

The current reference genetic map for the mouse is based on a massive effort to genotype a complex cross, derived from eight inbred strains, at almost 10,200 single nucleotide polymorphism (SNP) markers (Cox et al. 2009). This map density has facilitated both haplotype mapping, which helps to localize quantitative trait loci (QTL) more precisely, and close examination of genome structure, e.g., to better understand how recombination frequency varies with physical distance.

A number of reference maps of the mouse genome exist, based on physical or genetic mapping techniques (see Table 8.3), which have been used to underpin the ultimate physical map: the genome sequence.

Table 8.3 Genome reference maps

8.3 Genomics

8.3.1 General Organization

The mouse genome is composed of 20 pairs of nuclear chromosomes, the largest being chromosome 1 (197 Mb) and the smallest the Y chromosome (15.9 Mb; only its euchromatic region has been assembled), together with the 16.3 kb mitochondrial genome. On all mouse nuclear chromosomes, the relationship between recombination rate and physical size follows a slightly sigmoid shape, with recombination suppressed at the proximal, centromeric end and accentuated at the distal end (Cox et al. 2009).

There are two distinct mouse whole-genome sequence assemblies: one produced by Celera Genomics (Mural et al. 2002) and the other by the public genome project (Table 8.1). The Celera assembly is based on whole-genome shotgun sequence from four inbred mouse strains (A/J, DBA/2J, 129X1/SvJ, and 129S1/SvImJ), whereas the public sequence is a composite of finished BAC sequence and whole-genome shotgun sequence, both from the C57BL/6J (B6) inbred strain. The B6 genome sequence is now essentially finished, consisting of over 95 % finished BAC sequence, with only ~1,200 sequence gaps (Church et al. 2009). There is an ongoing endeavor to improve the public genome assembly, led by the Genome Reference Consortium (Table 8.1).

8.3.2 Protein Coding Genes

Based on the Ensembl 65 annotations of the mouse genome, there are 21,879 “known” and 826 “novel” protein coding genes in the mouse genome. The existence of these genes is based on a combination of computational prediction and in all cases, independent evidence of corresponding RNA transcripts and/or encoded proteins (Curwen et al. 2004). It is possible, however, that some of these “genes” may not encode proteins and are actually misannotated pseudogenes, as discussed below. The gene counts obtained by the other genome annotation databases at NCBI and UCSC (Table 8.1) are broadly similar, but there is an effort to improve and harmonize annotations of the protein-coding genes, known as the Consensus Coding Sequence (CCDS) project. The aim is to identify a core set of mouse (and human) protein coding regions that are annotated consistently and of high quality (Table 8.1).

The protein-coding genes of the mouse vary widely in size, as is typical of a mammalian genome; the largest, encoding dystrophin, spans about 2.6 Mb of the X chromosome. The average mouse exon is 280 bp; introns are much larger, with an average size of 4,981 bp, but their sizes are very widely distributed, the largest intron being over 1 Mb.

8.3.3 Pseudogenes

Functionally inactive gene copies are labeled pseudogenes; they may be transcribed yet apparently unable to be translated into a protein, or they may not be transcribed at all. There are 5,228 pseudogenes annotated in Ensembl 65, but this figure is subject to a number of caveats. Firstly, some pseudogenes may be translated after mRNA editing replaces stop codons (Wagner et al. 2003). Secondly, computational artifacts may lead to misannotation of pseudogenes. Finally, some transcribed pseudogenes are clearly functional; a recent example is the PTENP1 pseudogene, which regulates its “functional” homolog, PTEN, through interactions with microRNAs (Poliseno et al. 2010).
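One simple signal that a gene copy may be a pseudogene is an in-frame stop codon well before the end of its putative coding sequence. The sketch below, a hypothetical helper assuming the standard genetic code and a DNA sequence read in frame from its first base, reports such premature stops. It ignores frameshifts, splicing, and the RNA editing mentioned above, so it is a screening heuristic rather than an annotation method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(cds: str) -> list[int]:
    """0-based codon positions of in-frame stops before the final codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    codons = [cds[3 * i:3 * i + 3] for i in range(n_codons)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


intact = "ATGGCTAAACCCTAA"          # Met-Ala-Lys-Pro-stop
disrupted = "ATGGCTTGAAAACCCTAA"    # in-frame TGA at codon 2
print(premature_stops(intact), premature_stops(disrupted))
```

The intact sequence yields an empty list, while the disrupted copy reports codon 2.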

Pseudogenes have been described both as evolutionary garbage and as working material from which “new” genes may evolve (Lachmann 2010). Comparing genome sequences from wild mouse species (e.g., inbred strains derived from Mus spretus) with the B6 reference sequence may shed some light on these hypotheses, because the two species shared a common ancestor about 1.5–2 million years ago (Guenet and Bonhomme 2003).

8.3.4 Major Protein Coding Gene Families

By comparison with the human genome sequence, it is clear that specific protein-coding gene families have both expanded and contracted in the mouse (Church et al. 2009; Waterston et al. 2002). The olfactory and vomeronasal receptor gene families are expanded in mice, which may reflect the dependence of these animals on olfaction both for exploring their environment and in behavioral interactions. Other large gene families encode KRAB zinc-finger proteins and high-mobility-group (1 and 2) DNA/RNA-binding proteins. There are over 50 immunoglobulin kappa light chain genes in the mouse genome, compared with 13 in humans. Some of these differences may result from lineage-specific selective pressure; perhaps we should not be surprised that the functional classes of gene showing the biggest differences are involved in olfaction, reproduction, and immunity, areas of biology in which mouse and man differ greatly in physiology and behavior (Emes et al. 2003).

8.3.5 RNA Genes: Translational and Other Species

The genome includes many genes that encode functional RNA rather than protein; these RNAs are involved in many different processes, including transcriptional regulation, mRNA processing, translation, and turnover. Annotating RNA genes is qualitatively different from identifying protein-coding genes, because RNA genes lack distinctive sequence signals such as long open reading frames. The best understood RNA genes, which are scattered across the mouse genome, are the 355 tRNA genes that match the expected anticodons and the 247 genes for ribosomal RNAs, spliceosomal RNAs, and telomerase RNA (Waterston et al. 2002).

The number of distinct classes of noncoding RNA molecules has grown dramatically in recent years, due in part to large-scale sequencing of transcripts (Carninci 2006; Okazaki et al. 2002; Strausberg et al. 2002) and to discoveries in other laboratory animals such as C. elegans (see Chap. 2). Many of these transcripts are large noncoding RNAs of mostly unknown function, but some are apparently involved in the regulation of imprinted gene expression, dosage compensation, development, and tumorigenesis (Esteller 2011; Guttman et al. 2009). Two examples of noncoding RNAs with specific functions are XIST and NRON. The XIST RNA regulates the transcriptional activity of the X chromosome as a major part of the mechanism of dosage compensation of X-linked genes (Avner and Heard 2001). NRON, a noncoding RNA identified in the mouse and highly conserved in humans, apparently functions as a component of an RNA–protein complex that represses the NFAT transcription factor (Willingham et al. 2005). For a more extensive review of noncoding RNA genes, consult Esteller (2011).

8.3.6 Small RNA Genes: Regulatory MicroRNAs and Other Species

A highly heterogeneous mixture of small RNAs has been discovered in eukaryotes, including microRNAs (miRNAs), PIWI-interacting RNAs (piRNAs), and small nucleolar RNAs (snoRNAs) (Esteller 2011). The main function of miRNAs (19–24 nt) is posttranscriptional gene silencing by targeting messenger RNAs, and it is estimated that they regulate the translation of about 60 % of protein-coding genes. About 1,400 transcriptional units encoding these small RNAs are found throughout the mouse genome (Esteller 2011).
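Animal miRNAs recognize most of their targets through the “seed” (nucleotides 2–8 of the miRNA), which pairs with a complementary site, usually in the 3' untranslated region (UTR) of the mRNA. As a minimal sketch, the hypothetical functions below derive the 7-nt site complementary to the seed and scan a made-up 3' UTR fragment for exact matches; the let-7a sequence is used only as an example, and real target predictors (e.g., TargetScan) also weigh site type, sequence context, and conservation.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """mRNA site (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper()[1:8]
    return seed.translate(RNA_COMPLEMENT)[::-1]


def find_sites(utr: str, site: str) -> list[int]:
    """0-based start positions of every exact occurrence of `site`."""
    hits, start = [], utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits


mirna = "UGAGGUAGUAGGUUGUAUAGUU"      # example miRNA (let-7a)
site = seed_site(mirna)
utr = "AACUACCUCGGCUACCUCA"           # hypothetical 3' UTR fragment
print(site, find_sites(utr, site))
```

The seed GAGGUAG corresponds to the site CUACCUC, found twice in the hypothetical fragment (positions 2 and 11).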

The main role of piRNAs (24–30 nt) is to bind the PIWI subfamily of Argonaute proteins, which are involved in maintaining genome stability in germline cells. piRNAs are transcribed from genomic regions that contain transcriptionally active transposable elements. Recent evidence suggests that piRNAs also regulate imprinting-related DNA methylation (Watanabe et al. 2011).

The snoRNAs (60–300 nt) are components of small nucleolar ribonucleoproteins (snoRNPs) and are involved in posttranscriptional methylation and pseudouridylation of ribosomal RNA. These modifications of the rRNA are essential for normal folding and stability.

8.3.7 Transposons

Transposons are mobile genetic elements that can integrate into, and move around within, the mammalian genome (Kazazian 2004). The main classes of transposons in the mouse genome are DNA transposons and retrotransposons; retrotransposons are further divided by the presence or absence of flanking long terminal repeat (LTR) sequences. LTR retrotransposons are responsible for most mobile-element insertions in mice; they include intracisternal A-particles (IAPs), early transposons (Etns), and mammalian LTR-retrotransposons (MaLRs). The largest group of non-LTR retrotransposons is the long interspersed repetitive element (LINE) L1 family (see below), of which about 3,000 copies behave as active transposons. It is likely that these mobile elements supply some of the working material from which “new” genes evolve (Kazazian 2004).

8.3.8 Repetitive Sequences

A large proportion of the mammalian genome consists of repetitive sequences, many of which are interspersed repeats derived from transposable elements. Repetitive elements make up 42.1 % of the mouse genome (Church et al. 2009) and belong to 16 different major repeat families. As in humans, repeat density correlates strongly with (G + C) content: LINE elements are concentrated in (A + T)-rich regions and SINE elements are more abundant in (G + C)-rich regions (Waterston et al. 2002). In general, these repetitive elements are viewed as mainly nonfunctional “selfish” DNA, but there are some intriguing exceptions. For example, the repeat hypothesis for X inactivation proposes that the enrichment of LINE elements on the X chromosome, relative to the autosomes, acts as a binding signal for the Xist RNA (Lyon 2003).
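Correlations between repeat density and base composition are usually examined by computing (G + C) content in windows along the sequence. A minimal sketch, assuming a plain DNA string and excluding ambiguous bases such as N from the denominator; the window and step sizes are arbitrary choices, and overlaying repeat annotations (e.g., RepeatMasker output) on these windows would be a further step not shown here.

```python
def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(start, G+C fraction) for each window; N and other symbols are
    excluded from the denominator, and all-N windows are skipped."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        called = sum(base in "ACGT" for base in chunk)
        if called == 0:
            continue
        gc = sum(base in "GC" for base in chunk)
        result.append((start, gc / called))
    return result


print(gc_windows("GGCCAATTNNNN", window=4, step=4))
```

The example returns `[(0, 1.0), (4, 0.0)]`; the final all-N window is skipped, mimicking how assembly gaps are usually left out of such profiles.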

8.4 Postgenomic Analysis

In the sections below, postgenomic approaches to analyze the expression, function, organization, and evolution of the mouse genome are considered briefly, under 13 headings.

8.4.1 Continued Annotation

The B6 reference genomic sequence is still being annotated, e.g., gene predictions are being improved by incorporating manual annotations from the Vega database (Wilming et al. 2007). Genome annotation browsers such as Ensembl also incorporate new data and computational tools to identify previously unpredicted genes (especially those producing noncoding RNAs), define new gene families, and reveal potential regulatory regions. Many projects contribute to this process, but two are worthy of specific mention. The first is the International Knockout Mouse Consortium, whose genes chosen for targeted mutagenesis are being annotated manually by the Havana group at the WTSI (http://vega.sanger.ac.uk/info/data/mouse_knockouts.html) and integrated into Ensembl. The second is the ENCODE project, which is identifying regulatory features for human and mouse from experimental data and mapping them onto the mouse genome (http://www.genome.gov/10005107) (Chen et al. 2008).

8.4.2 Resequencing

The reference genome sequence is from the C57BL/6J inbred strain, which is one of the most widely used mouse strains. At the time of publication of the essentially “finished” genome sequence, there remained ~1,200 sequence gaps (Church et al. 2009). There is an ongoing program to improve the public genome assembly, including filling gaps and resolving other problematic genome regions, being led by the Genome Reference Consortium (Table 8.1). This effort has now filled over 300 sequence gaps.

In addition to the B6 reference sequence, a project to produce high-quality draft sequences of 17 other inbred mouse strains was completed recently (http://www.sanger.ac.uk/resources/mouse/genomes/). This has generated an invaluable resource of sequence variation, ranging from over 56 million single nucleotide polymorphisms to a precise mapping of structural differences between strains (Keane et al. 2011; Yalcin et al. 2011). These resources will enable improved mapping of complex or quantitative trait loci (QTL), for example using in silico mapping (Pletcher et al. 2004) and also in the recombinant inbred strains being bred for the Collaborative Cross (Philip et al. 2011). Two of the inbred strains sequenced were derived from mouse subspecies (CAST/EiJ from Mus musculus castaneus) and species (SPRET/EiJ from Mus spretus) distinct from B6 and so will facilitate the analysis of natural variability in the Mus genus.
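The idea behind in silico mapping can be illustrated with a toy scan: for each SNP, split the sequenced strains by allele and ask how strongly the phenotype differs between the two groups. This is a minimal sketch only, assuming complete biallelic SNP calls and one phenotype value per strain; the phenotype values, SNP identifiers, and function name are hypothetical, the strain names are taken from the text, and the published method (Pletcher et al. 2004) uses haplotype-based statistical tests rather than this raw difference of means.

```python
from statistics import mean


def snp_scan(genotypes: dict[str, dict[str, str]],
             phenotype: dict[str, float]) -> dict[str, float]:
    """For each biallelic SNP, the absolute difference in mean phenotype
    between strains carrying the two alleles (larger = stronger candidate)."""
    scores = {}
    for snp, calls in genotypes.items():
        groups: dict[str, list[float]] = {}
        for strain, allele in calls.items():
            if strain in phenotype:
                groups.setdefault(allele, []).append(phenotype[strain])
        if len(groups) != 2:          # monomorphic (or multiallelic): skip
            continue
        first, second = groups.values()
        scores[snp] = abs(mean(first) - mean(second))
    return scores


# Hypothetical phenotype values and SNP calls for four sequenced strains.
pheno = {"C57BL/6J": 10.0, "A/J": 11.0, "DBA/2J": 20.0, "CAST/EiJ": 21.0}
snps = {
    "snp1": {"C57BL/6J": "A", "A/J": "A", "DBA/2J": "G", "CAST/EiJ": "G"},
    "snp2": {"C57BL/6J": "C", "A/J": "T", "DBA/2J": "C", "CAST/EiJ": "T"},
    "snp3": {"C57BL/6J": "A", "A/J": "A", "DBA/2J": "A", "CAST/EiJ": "A"},
}
print(snp_scan(snps, pheno))
```

Here snp1, whose alleles separate the low- and high-phenotype strains, scores 10.0, snp2 scores 1.0, and the monomorphic snp3 is skipped; with few strains, such scores say nothing about significance or population structure.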

8.4.3 Transcriptome

Extensive cDNA sequencing has been undertaken in the mouse, systematically surveying gene expression in an enormous range of tissues. A combination of approaches, each sampling different aspects of cDNAs, has generated a transcriptional landscape of the mouse: sequencing of random ESTs (expressed sequence tags) (Marra et al. 1999), serial analysis of gene expression (SAGE) (Yamamoto et al. 2001), cap analysis of gene expression (CAGE) (Carninci 2006), and full-length cDNA sequencing (Okazaki et al. 2002). With each technological advance, the mouse transcriptome has yielded numerous surprising findings, e.g., the abundance of long noncoding RNAs (Guttman et al. 2009; Ozsolak and Milos 2010), including sense–antisense transcript pairs (Carninci 2007), and the complexity of alternative splicing (Lee and Wang 2005).

8.4.4 Microarray Analysis

Expression profiling using microarrays has been applied extensively in the mouse, with both spotted cDNA (Hamatani et al. 2004; VanBuren et al. 2002) and oligonucleotide arrays (Cui et al. 2007; Granville and Dennis 2005). Microarray expression profiling remains a powerful technique, and the large legacy of existing data makes comparisons with new experiments productive, but as costs fall it may soon be superseded by RNA-seq (Ozsolak and Milos 2010). This ultrahigh-throughput sequencing method is highly sensitive, quantitative, and able to detect previously unknown transcripts.
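RNA-seq is quantitative because read counts can be normalized for transcript length and sequencing depth. One common normalization is transcripts per million (TPM). The sketch below assumes raw read counts and transcript lengths are already available for each transcript (the identifiers and numbers are hypothetical) and does not model multimapping reads or fragment-length effects.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts per million: reads per kilobase, rescaled to sum to 1e6."""
    rates = {tx: counts[tx] / (lengths_bp[tx] / 1000.0) for tx in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


counts = {"tx1": 100, "tx2": 300, "tx3": 600}
lengths = {"tx1": 1000, "tx2": 2000, "tx3": 3000}
print({tx: round(v) for tx, v in tpm(counts, lengths).items()})
```

Although tx3 has six times as many reads as tx1, its TPM is only twice as large once its greater length is taken into account.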

8.4.5 Expression Analysis: Spatial and Temporal Patterns

Determining when and where genes are expressed is a critical step in predicting gene function. A fundamental requirement in such studies is an anatomical atlas; the reference work for the adult mouse (Cook 1965) is also available in electronic form (http://www.informatics.jax.org/cookbook/), and those for the developing mouse embryo and fetus (Kaufman 1992; Theiler 1989) underpin the Edinburgh Mouse Atlas Project (EMAP: http://www.emouseatlas.org/emap/home.html). These atlases typically combine anatomical drawings, photographs, and photomicrographs of tissue stained only with nonspecific chemical dyes, but they are essential for defining the location of gene expression. An extension of EMAP is the Atlas of Gene Expression (EMAGE: http://www.emouseatlas.org/emage/home.php), which combines in situ hybridization, protein immunohistochemistry, and transgenic reporter data for the mouse. A further valuable resource, integrating spatial and nonspatial expression data from both individual investigators and large-scale projects, is the Gene Expression Database (GXD), maintained at the Jackson Laboratory by the Mouse Genome Informatics team (http://www.informatics.jax.org/expression.shtml). There have been two large-scale systematic efforts to characterize spatial expression patterns of mouse genes using in situ hybridization: the EUREXPRESS project, which characterized the developing mouse at embryonic day 14.5 (Diez-Roux et al. 2011), and the Allen Brain Atlas, which focused on the adult mouse brain (Jones et al. 2009). An alternative approach to localizing gene expression was taken recently by Belgard and colleagues, who used RNA-seq to detect and quantify transcripts in dissected layers of the mouse somatosensory cortex; this allowed them to identify candidate alternatively spliced transcripts that are differentially expressed across layers (Belgard et al. 2011).

8.4.6 Functional Analysis: Gene Replacement or Deletion

Targeted mutagenesis by homologous recombination is one of the great successes of twentieth-century mouse genetics, allowing scientists to test the effect of gene inactivation or deletion on mammalian physiology and to evaluate the likelihood that the orthologous human gene is associated with similar phenotypes or diseases (Mak 2007). Initially, the technology was exploited in individual laboratories, but more recently there have been systematic programs, e.g., KOMP, EUCOMM, and NorCOMM, now coordinated as the International Knockout Mouse Consortium (Collins et al. 2007). Their aim is to generate at least one knockout for every gene in the mouse genome and to make a catalog of the targeted embryonic stem cell lines available to the scientific community. A conventional knockout produces a null allele. This is a first step in determining function, but it may appear uninformative if the mutant mouse exhibits no gross phenotype, and such an apparent lack of abnormality may have several possible causes (Barbaric et al. 2007). One obvious possibility is that the chosen phenotype assay(s) are insufficiently sensitive, specific, or comprehensive; Lewis Wolpert, when told that a knockout mouse had no phenotype, famously said “I say, have you taken your mice to the opera? Can they still tell Wagner from Mozart?” (Wolpert and Garcia-Bellido 1998). A further possibility is that the mutant mice die at such an early stage of development that, although the gene is clearly essential, the function of its product in later life (if any) is obscured. To study gene function, therefore, we require multiple alleles, e.g., conditional knockouts, in which the gene can be inactivated in a chosen tissue or at a chosen developmental stage. This has been achieved using targeted recombination technology “borrowed” from bacteriophages, e.g., the Cre/loxP recombinase system from bacteriophage P1 (Branda and Dymecki 2004; Gu et al. 1993).
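The logic of a conditional (“floxed”) allele can be illustrated with a simple string model. This is a sketch only: it assumes a hypothetical allele in which an exon is flanked by two loxP sites in the same orientation, and it models the outcome of Cre-mediated recombination (excision of the intervening DNA, leaving one loxP site) without any of the biochemistry. Inverted loxP pairs, which Cre inverts rather than excises, are not handled. In a real experiment, it is the tissue- or stage-specific expression of Cre that determines where and when excision occurs.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp loxP site (13-8-13)


def cre_excise(allele: str) -> str:
    """Remove the DNA between the first two directly repeated loxP sites,
    leaving a single loxP 'footprint'; return the input if < 2 sites."""
    first = allele.find(LOXP)
    if first == -1:
        return allele
    second = allele.find(LOXP, first + len(LOXP))
    if second == -1:
        return allele
    return allele[:first] + allele[second:]


exon = "ATGCCCGGGTTT"                        # hypothetical floxed exon
floxed = "GAATTC" + LOXP + exon + LOXP + "GGATCC"
deleted = cre_excise(floxed)
print(exon in floxed, exon in deleted, deleted.count(LOXP))
```

The example prints `True False 1`: the exon is present before recombination, absent afterwards, and a single loxP site remains.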

8.4.7 Functional Analysis: RNAi Knockdowns

RNA interference (RNAi), a method of reducing gene function using short double-stranded RNA (dsRNA) molecules, was discovered in the nematode worm C. elegans (Chap. 2) and is now used in many other organisms, including mice (Hitz et al. 2009). It has not been taken up widely in mice, due in part to difficulties in delivering bioactive dsRNA to cells in whole animals. However, some exciting biology has resulted, e.g., in the fields of type 1 diabetes (Kissler et al. 2006), transcriptional network interactions during infection (Amit et al. 2009), tumor-suppressor genomics (Zender et al. 2008), and neurobiology (Bai et al. 2003; Thakker et al. 2005). New ways to deliver plasmid DNA encoding dsRNAs in mice would be needed to allow experiments similar to those done in C. elegans, where simply feeding the worms bacteria carrying plasmids that express dsRNA (Timmons et al. 2001) is highly effective.

8.4.8 Functional Analysis: Random Mutagenesis

Spontaneous mutation in mice has generated many interesting and valuable abnormal traits; it was soon realized, however, that this random process could be accelerated using various irradiation, chemical, or biological treatments (reviewed in more detail by Flaherty and colleagues, 1998). The mutagenic potential of high-energy radiation was first described in Drosophila (Muller 1927), but radiation was soon used to produce mutations in mice. Various chemicals have been used successfully as mutagens in mice, e.g., ethyl methanesulfonate (EMS) and chlorambucil, but the most widely used is N-ethyl-N-nitrosourea (ENU). The main reasons for the predominance of ENU are that it is probably the most potent known mutagen, introducing about 1 functional sequence change per locus per 750 progeny, and that it produces almost exclusively point mutations (Hitotsumachi et al. 1985). ENU mutagenesis produces a wide variety of functional alleles, which can include null (loss of function), hypomorphic (decreased function), hypermorphic (increased function), and even neomorphic (novel function) alleles. This variety of alleles can reveal functions of a gene that were not suspected from simple null alleles, or help to dissect different aspects of the function of a single protein. Examples include the Ikaros gene (Papathanasiou et al. 2003), where a null allele has multiple, pleiotropic effects, and the cytoplasmic dynein heavy chain 1 (Dnchc1) gene, where animals homozygous for the null allele die at an early stage of embryogenesis and heterozygotes show no obvious abnormalities (Hafezparast et al. 2003). Point mutations produced by ENU in mice closely mirror the spectrum of common human mutations, but it is still relatively time consuming to identify the gene affected. Developments in sequence capture (Olson 2007) and next-generation DNA sequencing (Bentley et al. 2008; Wheeler et al. 2008) make this obstacle less important.
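The quoted ENU rate translates directly into the size of a screen. Under the simplifying assumptions that each progeny animal independently carries a new functional mutation at a given locus with probability 1/750 and that every such mutation would be detected, the number of progeny n needed to recover at least one mutant with probability p satisfies 1 − (1 − 1/750)^n ≥ p. A minimal sketch (the function name is hypothetical):

```python
import math


def progeny_needed(rate_per_locus: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - rate)**n >= confidence."""
    if not (0 < rate_per_locus < 1 and 0 < confidence < 1):
        raise ValueError("rate and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - rate_per_locus))


print(progeny_needed(1 / 750, 0.95))
```

Under these assumptions, about 2,250 progeny (2,246 exactly) would need to be screened for a 95 % chance of recovering a mutation at one particular locus; incomplete penetrance or insensitive assays would raise this number.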

There are also various biological mutagenesis methods that depend on random insertion of novel DNA sequences that disrupt gene function and produce abnormal traits, e.g., based on retroviruses (Soriano et al. 1987) or transposable elements (Carlson and Largaespada 2005; Ivics et al. 2009). The Sleeping Beauty transposon, a member of the Tc1/mariner family reconstructed from fish sequences, has been adapted for insertional mutagenesis in mice. These methods have lower mutagenesis efficiency than ENU, but have the advantage of easier gene identification.

8.4.9 Functional Analysis: Genetic Reference Populations

The concept of mouse genetic reference populations (GRP) (Argmann et al. 2005) covers the range from sets of the conventional inbred strains described earlier to the more sophisticated set of recombinant inbred (RI) strains known as the Collaborative Cross (CC) strains (Churchill et al. 2004). The CC strains are being produced by combining the genomes of eight genetically diverse founder strains (A/J, C57BL/6J, 129S1/SvImJ, NOD/LtJ, NZO/HiLtJ, CAST/EiJ, PWK/PhJ, and WSB/EiJ) and then inbreeding for at least 23 generations. The ultimate goal of the CC is to breed about 1,000 RI lines, but even with 66 lines it is possible to map QTL to a resolution of <1.2 Mb (Durrant et al. 2011). These strains will be powerful tools for dissecting complex traits, particularly because the eight founder strains were among those chosen for resequencing (Yalcin et al. 2011), so the genome sequence of each CC strain can be inferred from its founder haplotypes.
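The number of inbreeding generations matters because residual heterozygosity falls only gradually. A minimal sketch, assuming repeated brother–sister (full-sib) mating from non-inbred parents, no selection, and the classical recurrence F_t = (1 + 2F_(t−1) + F_(t−2))/4 for the inbreeding coefficient; the CC breeding scheme starts from eight-way crosses, so its exact trajectory will differ from this idealized one.

```python
def sib_mating_inbreeding(generations: int) -> list[float]:
    """Inbreeding coefficient F after each generation of full-sib mating,
    starting from non-inbred founders (F = 0 in the two previous generations)."""
    f_two_back, f_one_back = 0.0, 0.0
    trajectory = []
    for _ in range(generations):
        f = (1.0 + 2.0 * f_one_back + f_two_back) / 4.0
        trajectory.append(f)
        f_two_back, f_one_back = f_one_back, f
    return trajectory


F = sib_mating_inbreeding(23)
print([round(F[g - 1], 3) for g in (1, 2, 3, 10, 20, 23)])
```

Under these assumptions F is 0.25 after one generation and about 0.986 after 20 generations, the conventional threshold for calling a strain inbred, rising to about 0.993 after 23.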

8.4.10 Functional Analysis: Systematic Phenotyping

The mouse clinic concept (Gailus-Durner et al. 2005; Brown et al. 2006) grew from the idea of the broad health check carried out on a human being at a hospital or general health practice. Humans may present with vague symptoms and so a broad range of different tests is often conducted to aid the clinician in reaching a differential diagnosis. This is a systematic, explorative approach, rather than the hypothesis-driven approach that has been typical of many labs investigating knockout mice. The guiding principle is that our understanding of mammalian gene function is imperfect and that the most unbiased method of detecting novel phenotypes in mutant mice is to subject animals to a broad range of assays, covering all physiological systems. The success of this approach is shown by the detection of many novel phenotypes, even in conventional inbred mouse strains that have been studied intensively for many years (http://www.europhenome.org/).

8.4.11 Interactome and Gene Networks

Within the past decade, high-throughput genomic technologies have revolutionized biomedical science. The genetic, expression, or protein data generated by these technologies, when combined with functional annotations, allow network interactions to be predicted and subsets of interacting molecules (interactomes) to be defined. These approaches were first developed in simpler organisms such as yeast (Goll and Uetz 2006) but have since been extended to humans, mice, and other more complex organisms.

The interactomes of the hair cells of the inner ear and the photoreceptor cells of the retina include dynamic Usher syndrome protein complexes in mouse and man (Kremer et al. 2006). Studies of spatiotemporal expression, together with immunohistochemistry and immunoelectron microscopy in wild-type and mutant mice, have contributed significantly to dissecting the interactions of the Usher complexes in stereocilia.

A transcriptional network mediating mouse dendritic cell responses to pathogen-associated molecules has been constructed using a combination of expression profiling and targeted knockdown using shRNA (Amit et al. 2009).

8.4.12 Proteomics and Structural Genomics

Mouse proteomics has lagged behind studies in humans, yet there have been numerous valuable investigations, e.g., of proteomic alterations in disease or disease models (Madan and Amar 2008; Nath et al. 2009; Zhai et al. 2009). Furthermore, Prokai and colleagues have demonstrated that “shotgun” proteomics for protein abundance profiling from mouse tissue is practical using liquid chromatography and mass spectrometry (Prokai et al. 2009).

The Protein Structure Initiative aims at the determination of the 3D structure of all proteins, predominantly from humans, but also some from mouse (http://www.nigms.nih.gov/Research/FeaturedPrograms/PSI/) (Markley et al. 2009).

8.4.13 Comparative Genomics

One of the motivating factors behind the study of mouse genetics and genomics has been the expectation that the fundamental biological similarity between humans and mice could inform a better understanding of human health and disease. It has become clear that this similarity extends to the molecular level: if we align the mouse and human genomes, 15,187 protein-coding genes have simple 1:1 orthologous relationships, corresponding to about 80 % of human genes (Church et al. 2009); furthermore, these genes tend to be clustered into blocks of conserved sequence. Indeed, in the process of finishing the mouse genome, one approach that was used to help detect BAC contig overlaps was to use the syntenic human sequence as a virtual probe (Church et al. 2009; Denny et al. 2001).

We can also think of comparative genomics as a way of analyzing the mouse C57BL/6J reference sequence using the recently acquired data from four wild-derived inbred strains, CAST/EiJ, WSB/EiJ, PWK/PhJ, and SPRET/EiJ, which are representative of the Mus musculus castaneus, Mus musculus domesticus, Mus musculus musculus, and Mus spretus taxa, respectively (Keane et al. 2011); the three Mus musculus subspecies contributed to the genomes of the common laboratory strains. The evolutionary divergence between the laboratory mouse and Mus spretus is about 1.5 million years, so it will be fascinating to perform de novo assemblies of these sequences and compare them with the B6 genome. The three Mus musculus subspecies (castaneus, musculus, and domesticus) interbreed and their F1 hybrid offspring are fertile, whereas in crosses with Mus spretus only the female F1 hybrid offspring are fertile. Some of the molecular bases of this difference are understood (Dejager et al. 2009), yet comparative genome analysis is likely to give further clues to the causes of hybrid infertility and insights into the process of speciation.

An excellent comparison of rat, mouse, and human genome maps is given in Chap. 9, but Szpirer and Goran do not extend their comparisons to include the newest mouse genome sequences. Keane et al. (2011) sequenced 17 inbred mouse strains and noted that some of the sequences in the non-B6 strains failed to align with the B6 reference genome, yet could be aligned with rat or rabbit genomes. The authors do not describe the content of these “missing” sequences.

8.5 Conclusion

A major challenge for the biologists of the twenty-first century is determining functions for the 22,000–37,000 genes in the human genome. One of the tools we can use to help us rise to this challenge is the study of “model” organisms such as mice. One might believe, now that the sequences of the human and mouse genomes are in hand, that determining gene function would be a simple, albeit time-consuming, matter of studying, one at a time, the phenotypes of mice in which a single gene has been inactivated. It is clear, however, that this will give only a partial picture of gene function, because of the structural and functional complexity of mammalian genes. A large proportion of genes are expressed as multiple transcripts, produced from distinct promoters or by alternative splicing. Genes are subject to spatial and temporal regulation, e.g., activated during embryonic development, then switched off and later reactivated only in specific adult tissue(s). Additional complexity is imposed by posttranslational modifications that modulate protein function. Furthermore, some genes are functionally redundant, i.e., multiple genes encode similar proteins, so that deletion of only one of them may not produce a phenotype. Genes that encode functional RNAs rather than proteins must also be accounted for. We therefore need other, subtler alleles, such as point mutations and conditional knockouts, as well as ways to reduce gene function partially, such as RNAi, as described earlier. The genetic and functional genomic “toolbox” in the mouse has grown extensively in the last two decades, and as systematic mutagenesis and phenotyping are combined, we grow ever closer to the goal of determining the function of every mouse gene and so, by inference, of its human counterpart.