Introduction

Developing fundamental knowledge about the biology of the brain is essential if we are ever to understand how the most complex structure known to humanity enables us to breathe, think, and make sense of our world. While the rate of progress in neuroscience increases with each year, one stumbling block that remains is the lack of unifying information to provide foundational data for brain research. What is needed is comprehensive data on how the brain is constructed: the basic parts list. How does each gene function and where is it active in the brain? What are the functional units of information processing within the brain? How many cell types are there and how are they wired together? Fortunately, efforts to generate this “parts list” are underway in many large-scale investigations funded by both private and public sources. Genomes have been mapped for many species (Lander et al. 2001; Waterston et al. 2002), and microarray and RNAseq studies have revealed where genes are turned on and off across large brain areas (Kang et al. 2011; Lindblad-Toh et al. 2011; Shimogori et al. 2010). A truly massive effort is ongoing to map connectivity in the human brain using diffusion imaging and various MRI techniques (Van Essen et al. 2012), and the Allen Institute and others are mapping the connectome of the mouse brain with conventional and Cre-dependent viral tracers (Madisen et al. 2010). Since the inception of the Allen Institute, we have been forging ahead on numerous fronts to provide as much information as possible about the molecular neuroanatomy of the mammalian brain. We have produced detailed histological reference atlases of the developing and adult mouse brain (Dong 2008) and spinal cord (Watson et al. 2008) and are in the process of creating prenatal and adult human brain reference atlases. 
A microarray-based gene expression map of approximately 500 regions per hemisphere of the adult human brain and RNAseq data for 16 critical brain regions across several prenatal and postnatal ages are also available via the Allen Brain Atlas data portal (www.brain-map.org).

Our most longstanding projects are our high-resolution in situ hybridization (ISH) studies in mouse, human, and macaque monkey (Bernard et al. 2012; Lein et al. 2007; Zeng et al. 2012). Using industrial-scale methodologies, with careful and exhaustive quality control to ensure repeatability and accuracy, these studies provide a wealth of fundamental information about how the brain employs its ~20,000 available genes to construct and control a normal nervous system. In this review, we detail our efforts in the adult and developing mouse brain and spinal cord and describe the data, tools, and analyses of these projects (Box 1).

Box 1 Overview of Allen Brain Atlas Resources for the Mouse Central Nervous System. The following resources and tools are all freely available via the Allen Brain Atlas data portal at www.brain-map.org. Informational videos and tutorials demonstrating use of these resources are available via the data portal, and on YouTube (http://www.youtube.com/user/AllenInstitute)

Allen mouse brain atlas

Since its original conception, the Allen Mouse Brain Atlas has grown from an ambitious idea into a prominent resource within the neuroscience community. The atlas uses gene-specific ISH transcripts to localize expression in the brains of adult C57BL/6J mice (for more detailed methods, see Lein et al. 2007). By employing a standardized, high-throughput production pipeline, the Allen Institute has been able to produce a vast catalog of comparable experiments. Since its first public data release in December 2004, the atlas has grown to contain 21,706 sagittal and 4,342 coronal ISH experiments for ~20,000 genes. For each of these experiments, brain sections were scanned at 10× in order to provide cellular resolution in all neuroanatomical structures. Enhancing the value of the ISH experiments is an assortment of informatics tools designed to help a user visualize and sort the data in order to answer specific scientific questions. These tools include (1) an integrated reference atlas based on the Allen Reference Atlas (ARA) (Dong 2008); (2) informatics-based searches for genes with specific expression patterns; (3) the Anatomic Gene Expression Atlas (AGEA), an interactive viewer of brain-wide expression pattern correlations (Ng et al. 2009); and (4) the Brain Explorer 3-D viewer application (Lau et al. 2008) (Fig. 1). Including ISH experiments and analysis tools, approximately 17 terabytes of adult mouse brain data have been published online.

Fig. 1

Illustration of the assortment of Allen Mouse Brain Atlas informatics tools. a An AGEA correlation map with a seed voxel in the lateral group of the dorsal thalamus (green crosshair). b For this seed voxel, the gene Plekhg1 (left hemisphere) is returned by AGEA’s “Gene Finder” feature. A matched ARA plate (right hemisphere) shows that expression exists primarily in the thalamus. c The gene Cnr1 is returned by the “differential search” feature when requesting expression in the whole brain, but NOT in the thalamus. d Plekhg1 (blue) and Cnr1 (white) expression visualized using Brain Explorer. The expression patterns of these genes are largely distinct

The integrated reference atlas is unquestionably the foundational framework within the Allen Mouse Brain Atlas. It is a high-resolution color atlas drawn over 21 sagittal and 132 coronal Nissl sections. It includes a systematic hierarchical ontology of brain structures. The ARA has been integrated into all other analysis tools in order to provide a consistent anatomic context for gene expression. The reference atlas can be viewed alongside ISH data. By using the “Synchronize with other viewers” tool, any point on an ISH experiment can be matched to the nearest point in the reference atlas (Fig. 1b). In order to make this possible, all ISH experiments have been virtually reconstructed in 3-D and registered to the 3-D volume of the reference atlas.

Another vital feature of the Allen Mouse Brain Atlas is its gene expression detection algorithm. For each ISH experiment, the algorithm identifies pixels that correspond with gene expression and assesses expression intensity. Combining these values with the reference atlas registration information, expression intensity scores can be calculated for any structure in the reference atlas ontology. The compilation of these scores is described as an experiment’s expression profile. Using this measure, genes can be classified and searched in different ways. The “NeuroBlast” search feature returns genes with an expression profile similar to that of a specified gene of interest. The “Differential” search returns a list of genes with higher expression in one structure compared to another (Fig. 1c).
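The expression profile and the “Differential” search described above can be illustrated with a minimal sketch. It assumes a simple score (summed intensity of expressing pixels divided by the pixel count of each structure) and a fold-change threshold; neither definition is taken from the atlas pipeline, and the function names, gene names, and intensities are hypothetical.

```python
from collections import defaultdict


def expression_profile(pixels):
    """Aggregate per-pixel detections into one score per structure.

    `pixels` is an iterable of (structure, is_expressing, intensity) tuples,
    a hypothetical stand-in for the registered output of the detection step.
    The score (summed intensity of expressing pixels / pixels in structure)
    is an assumed definition, not the atlas's own formula.
    """
    total = defaultdict(int)
    signal = defaultdict(float)
    for structure, is_expressing, intensity in pixels:
        total[structure] += 1
        if is_expressing:
            signal[structure] += intensity
    return {s: signal[s] / total[s] for s in total}


def differential_search(profiles, target, contrast, min_fold=2.0, eps=1e-6):
    """Rank genes whose score in `target` exceeds that in `contrast`."""
    hits = []
    for gene, profile in profiles.items():
        fold = (profile.get(target, 0.0) + eps) / (profile.get(contrast, 0.0) + eps)
        if fold >= min_fold:
            hits.append((gene, fold))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy data: two placeholder genes, two structures, made-up intensities.
toy_pixels = {
    "GeneA": [("thalamus", True, 0.8), ("thalamus", True, 0.6),
              ("cortex", False, 0.0), ("cortex", True, 0.1)],
    "GeneB": [("thalamus", False, 0.0), ("thalamus", True, 0.1),
              ("cortex", True, 0.9), ("cortex", True, 0.7)],
}
profiles = {gene: expression_profile(px) for gene, px in toy_pixels.items()}
thalamus_enriched = differential_search(profiles, "thalamus", "cortex")
print(thalamus_enriched)
```

In this toy case only the placeholder gene with the higher thalamic score passes the threshold. The small `eps` term avoids division by zero when the contrast structure has no detected expression.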

Expanding upon the expression profile data calculated for each experiment is the AGEA analysis tool. Fundamentally, AGEA derives correlations between gene expression at single points in the brain and all other points. ISH experiments sectioned in the coronal plane were virtually reconstructed in 3-D and then divided into cubic voxels of 200 μm per side. Selecting a seed voxel will display a heat map showing the correlation of expression in other voxels (Fig. 1a). The reference atlas has been integrated into the AGEA tool such that each voxel has a neuroanatomical identity. However, the correlation maps are derived entirely from gene expression data. Utilizing the collection of correlation maps, a hierarchical binary tree of spatial clusters was generated. At each level of the tree, groups of voxels are dichotomized to achieve maximal dissimilarity of aggregate correlation values. Essentially, each group of voxels, beginning with the whole brain, is divided at each level by the most significant spatial difference in collective gene expression. AGEA’s “Gene Finder” tool returns a list of genes for which expression is specific to the highly correlated voxels in the heat map.
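The seed-voxel heat map can be sketched as a correlation between one voxel's expression vector (one value per gene) and every other voxel's vector. The sketch below assumes Pearson correlation, since the text above says only “correlations”; the voxel identifiers, the helper names, and the four-gene vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def correlation_map(voxel_expression, seed):
    """Correlate the seed voxel's expression vector with every voxel's vector."""
    seed_vector = voxel_expression[seed]
    return {voxel: pearson(seed_vector, vector)
            for voxel, vector in voxel_expression.items()}


# Toy data: voxel id -> expression values across four hypothetical genes.
voxels = {
    "v0": [1.0, 2.0, 3.0, 4.0],
    "v1": [2.0, 4.0, 6.0, 8.0],  # same pattern as v0, scaled
    "v2": [4.0, 3.0, 2.0, 1.0],  # reversed pattern
}
heat_map = correlation_map(voxels, "v0")
print(heat_map)
```

Because the map depends only on the expression vectors, anatomical labels play no part in the calculation, which mirrors the point that the correlation maps are derived entirely from gene expression data.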

Once a list of genes has been accrued, the interactive Brain Explorer 3-D viewer is useful for visualizing expression. It displays voxel-level gene expression. Because the data have been registered into a canonical 3-D space, multiple experiments viewed simultaneously allow for a comparison of relative expression position (Fig. 1d), with anatomical context provided by 3-D brain structures derived from the reference atlas. Selecting any expressing voxel will show the user a thumbnail of the ISH data at that position and directly link to the high-resolution ISH images in the online Allen Mouse Brain Atlas.

The scope and utility of the Allen Mouse Brain Atlas have attracted a multitude of users. Since the publication of the majority of the data for the atlas in 2006, it has attracted an average of 10,352 unique visitors per month. In order to improve the user experience, updates to the atlas are released annually. These typically include replacement of poor-quality experiments and improvements to tools and viewers. According to PubMed, the primary platform paper for the atlas (Lein et al. 2007) has been cited 384 times as of May 2012, reflecting its extensive use as a validation of and catalyst for new research (Jones et al. 2009). For example, a recent project using RNAseq to generate a quantitative atlas of gene expression in the mouse neocortex (Belgard et al. 2011) uses ISH data from the Allen Mouse Brain Atlas to visualize and verify its approach. Another recent publication informatically relates expression patterns from the Allen Mouse Brain Atlas with connectivity patterns from the Brain Architecture Management System, finding a statistically significant relationship between gene expression and connectivity (French and Pavlidis 2011).

Allen developing mouse brain atlas

To add a temporal dimension to the mouse brain ISH data set, the Allen Developing Mouse Brain Atlas was launched in 2008. Repeating the approach taken for the adult mouse brain atlas described above, the Allen Developing Mouse Brain Atlas details the expression patterns of ~2,000 unique genes at seven significant stages of mouse brain development. This survey comprises 15,066 sagittal and 714 coronal ISH experiments, all of which are displayed at 10× resolution. A set of tools similar to those of the Allen Mouse Brain Atlas has been deployed for this atlas, including (1) a series of anatomic reference atlases; (2) informatics-based searches for genes with specific expression patterns; (3) the Brain Explorer 3-D viewer application; and (4) AGEA. In all, approximately 12 terabytes of data and analysis tools have been made available online as part of this atlas. These data are accessed by an average of 2,903 unique users each month. As with the adult atlas, updates of data, tools, and user interface are released each year.

Most genes were selected for inclusion in the atlas based on a known or hypothesized influence on brain development. Gene selection focused on transcription factors; neuropeptides, neurotransmitters, and their receptors; receptor tyrosine kinases and their ligands; axon guidance molecules; and anatomical and cell-type markers. Expression of these genes was assayed at embryonic days 11.5, 13.5, 15.5, and 18.5, and postnatal days 4, 14, and 28. By focusing on these types of genes at these specific ages, the progression of numerous neurodevelopmental events such as neuronal migration and differentiation has been captured. For example, the cerebral cortex is made up primarily of excitatory glutamatergic neurons and inhibitory GABAergic interneurons. Excitatory neurons mature in the ventricular zone of the dorsal pallium and radially migrate a short distance to form the cortical plate. Inhibitory interneurons are born primarily in the subpallium and tangentially migrate a relatively long distance to reach the cortical plate. The gene Lhx6 is expressed in the medial and central ganglionic eminences in the subpallium and marks a subset of interneurons migrating from these regions into the cortex (Liodis et al. 2007). Data for Lhx6 illustrate this key element of cortical development across time (Fig. 2).

Fig. 2

ISH data for the interneuron marker gene Lhx6 in the developing mouse brain. a Cross sections of pallium from embryonic day 11.5 to postnatal day 14. Migrating interneurons can be found in the marginal zone (MZ) as early as E13.5. By E18.5, Lhx6-positive interneurons are found throughout the depth of the pallium. This scattered expression pattern persists throughout adulthood. b Sagittal sections of the forebrain from E11.5 to P14. The medial and central ganglionic eminences (MGE and CGE), which are both sites of cortical interneuron generation, are revealed by Lhx6 expression. Over developmental time, expression in both areas is diminished as Lhx6-positive interneurons migrate to the cortex and cease to be generated. 1, 2/3, 4, 5, 6a, and 6b are distinct layers in the postnatal cortex; cc corpus callosum; CP cortical plate; Dg diagonal domain; LV lateral ventricle; Pal pallium; PP preplate; SP subplate; SVZ subventricular zone; VZ ventricular zone

A high-resolution, full-color anatomic reference atlas and ontology have been created for the Allen Developing Mouse Brain Atlas. This reference atlas was generated by Luis Puelles and includes annotations of the seven stages of mouse brain development utilized in the gene expression atlas. Instead of following the conventional model of brain neuroanatomy, which centers on defining functionally distinct nuclei and layers, Dr. Puelles has created an atlas set that illustrates the developmental Bauplan of the vertebrate brain. His “prosomeric model” holds that the developing brain is segmentally organized (Rubenstein et al. 1994). These segments, known as neuromeres, are delineated in the reference atlas.

Digital registration to the reference atlas allows gene expression data to be analyzed in many of the same ways described for the Allen Mouse Brain Atlas. A “NeuroBlast” search can be performed for any image series in order to find genes exhibiting similar expression patterns. The “Anatomic” search returns a list of genes that are enriched in a specified structure. The “Temporal” search finds genes that are enriched at a specified age in a specified structure. The Brain Explorer application can be used to view ISH data in 3-D. Within this application all seven developmental ages can be viewed simultaneously, giving an efficient overview of a gene’s expression pattern throughout development. The reference atlas has been integrated into this tool so expression can be visualized in a neuroanatomical context. There is also a version of AGEA derived from these data, which has most of the functionality of the Allen Mouse Brain Atlas version.
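One way to picture the “Temporal” search is as a ratio between a gene's score in a structure at the chosen age and its mean score in that structure at the other six ages. The sketch below is built on that assumption only; the actual enrichment criterion is not described here, and the function name, threshold, gene names, structure label, and scores are all hypothetical.

```python
# The seven developmental stages sampled in the atlas.
AGES = ["E11.5", "E13.5", "E15.5", "E18.5", "P4", "P14", "P28"]


def temporal_search(scores, structure, age, min_ratio=2.0, eps=1e-6):
    """Rank genes enriched in `structure` at `age` relative to the other ages.

    `scores` maps gene -> {(structure, age): score}. Enrichment is an assumed
    ratio: score at the chosen age over the mean score at the other ages.
    """
    hits = []
    for gene, table in scores.items():
        at_age = table.get((structure, age), 0.0)
        others = [table.get((structure, a), 0.0) for a in AGES if a != age]
        baseline = sum(others) / len(others)
        ratio = (at_age + eps) / (baseline + eps)
        if ratio >= min_ratio:
            hits.append((gene, ratio))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: one placeholder gene peaks at E13.5, the other is flat over time.
toy_scores = {
    "GeneX": {("structureS", a): (0.9 if a == "E13.5" else 0.1) for a in AGES},
    "GeneY": {("structureS", a): 0.5 for a in AGES},
}
peaking_at_e13 = temporal_search(toy_scores, "structureS", "E13.5")
print(peaking_at_e13)
```

A gene with constant expression across ages yields a ratio near 1 and is excluded, so only stage-specific patterns are returned.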

While the scope of the Allen Developing Mouse Brain Atlas is limited to approximately 2,000 genes instead of complete genomic coverage, its strategic gene selection, sampling at seven stages of development, and analytical tools make it an exceptional resource for visualizing neurodevelopmental events in the mouse brain.

Allen spinal cord atlas

The Allen Spinal Cord Atlas contains neonatal and adult mouse ISH data for about 20,000 genes spanning the length of the spinal cord. Coronal sections of spinal column at postnatal day 4 and spinal cord at postnatal day 56 were collected and processed with the ISH protocol described in Lein et al. (2007). These data are augmented by the inclusion of a detailed spinal cord reference atlas drawn by Charles Watson and Gulgun Sengul (Sengul et al. 2012; Watson et al. 2008). Over 80 structures in the neonate and adult cords have been delineated in the reference atlas. Each ISH section has an associated “nearest reference section,” which can be accessed by loading an ISH experiment and its age-appropriate reference atlas into the viewer and using the “Synchronize with other viewers” tool. In addition, all adult spinal cord segments have been classified into cervical, thoracic, lumbar, sacral, or coccygeal groups. Nissl and AChE reference sets have been produced to further assist in the neuroanatomical localization of gene expression (Fig. 3).

Fig. 3

ISH data and associated reference resources for the spinal cord radial glia marker Gfap in the adult (P56) and neonate (P4) mouse. a1 Adult thoracic spinal cord section in the coronal plane showing Gfap expression radially arrayed in the white matter. 2 The nearest reference atlas section showing many divisions of white and gray matter. 3 The nearest reference Nissl section showing the pattern of cell bodies expected for the ISH section. 4 Nearest reference acetyl cholinesterase-stained (AChE) section showing cell bodies and terminals involved in the cholinergic system. b1 Thoracic spinal cord section from the P4 mouse showing Gfap expression radially arrayed in the white matter and surrounding the central canal. 2 The nearest reference atlas section showing detailed neuroanatomical annotations. 3 The nearest reference Nissl section showing the pattern of cell bodies expected for the ISH section. 4 The nearest reference AChE section showing cell bodies and terminals involved in the cholinergic system

The Allen Spinal Cord Atlas has received an average of 1,500 unique visitors each month since its first public data release in July 2008. In all, over 1.7 terabytes of data have been made available online. The atlas has been utilized by many researchers, but one of its most notable impacts was in 2011 when a paper titled “Adult Spinal Cord Radial Glia Display a Unique Progenitor Phenotype” (Petit et al. 2011) was published, describing a putative new cell type discovered based on the atlas data. The study was a collaborative effort with scientists from the University of British Columbia, the Montreal Neurological Institute, and the Allen Institute. It defined a previously unknown subtype of adult radial glia within the spinal cord that expresses a set of 122 genes, which are also expressed in classic neural stem cells. This discovery suggests the existence of previously overlooked stem cell-like populations that may one day assist in the repair of spinal cord injury or in the treatment of spinal cord disease.

Validation of the top 1,000 Allen mouse brain atlas genes

While the scope of the Allen Mouse Brain Atlas is impressive, its true value depends on its biological accuracy. In order to characterize and improve data accuracy, the most frequently viewed genes in the atlas were compared to external sources (Allen Institute for Brain Science 2010). Cumulative webpage views from the Allen Mouse Brain Atlas were calculated for 17,825 unique genes from October 2005 to March 2010. A webpage hit was tallied whenever a page containing ISH data for a specific gene was accessed. The 1,000 genes receiving the greatest number of hits were selected for validation. We focused on the top 1,000 in order to preferentially improve accuracy among the most frequently viewed genes, which represented 49 % of all website hits (Fig. 4).

Fig. 4

Website hits shown by gene rank. The top 1,000 genes account for only 5 % of all genes but 49 % of the total website hits for the Allen Mouse Brain Atlas
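The share of hits reported in Fig. 4 is a cumulative fraction over a ranked list. A minimal sketch of that calculation follows, using made-up hit counts and a hypothetical helper `top_n_share`; neither the counts nor the helper comes from the atlas itself.

```python
def top_n_share(hits_by_gene, n):
    """Fraction of all page hits that went to the n most-viewed genes."""
    ranked = sorted(hits_by_gene.values(), reverse=True)
    total = sum(ranked)
    return sum(ranked[:n]) / total if total else 0.0


# Toy data: four placeholder genes with invented hit counts.
toy_hits = {"g1": 50, "g2": 30, "g3": 15, "g4": 5}
share_top1 = top_n_share(toy_hits, 1)
share_top2 = top_n_share(toy_hits, 2)
print(share_top1, share_top2)
```

Applied to the real per-gene hit counts with `n=1000`, the same calculation would give the proportion of traffic captured by the validated set.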

There were two ways in which a gene in the top 1,000 was validated: (1) comparison to peer-reviewed literature or (2) comparison to one of two online databases. The 100 highest-ranked genes were validated by published literature exclusively. These genes are highly studied within the neuroscience community, so comparable data were readily available for all 100 (top 50 are shown in Table 1). Comparable data were defined as articles containing images of mRNA or protein localization experiments from adult, wild-type mouse or rat. Images showing gene expression patterns were carefully compared to corresponding ISH images from the Allen Mouse Brain Atlas. Written descriptions of expression patterns were also utilized, but only as a complement to image comparison. Whole-brain comparisons were preferred whenever possible, but partial-brain comparisons were accepted when necessary. The standard for validation was strict for comparisons to peer-reviewed literature sources. In order for a gene to be validated, all structures available for comparison had to have equivalent expression patterns.

Table 1 Comparison of top 50 Allen Mouse Brain Atlas genes to literature

The remaining 900 genes from the top 1,000 were initially compared to two online databases: (1) the St. Jude Brain Gene Expression Map (BGEM) project (Magdaleno et al. 2006), and (2) the Gene Expression Nervous System Atlas (GENSAT) project (Gong et al. 2003). When a database comparison was not possible or did not validate our atlas data, peer-reviewed literature was consulted using the criteria described above. The BGEM database presents dark-field images of mRNA expression across the mouse brain using radioactive riboprobes for in situ hybridization. The GENSAT database contains gene expression data generated from bacterial artificial chromosome (BAC) transgenic mice, in which endogenous protein-coding sequences have been replaced with sequences encoding an EGFP reporter gene in each transgenic vector. Because its method is more similar to that of the Allen Mouse Brain Atlas, BGEM was the preferred source for database comparisons. The standard for validation by database comparison was lower than that for literature comparison. For database comparison, gross agreement of expression pattern across major structures was the standard for validation (Fig. 5).

Fig. 5

Validation of Allen Mouse Brain Atlas experiments by external data sources. a Pdyn data from the Allen Mouse Brain Atlas (left hemisphere) compared with a radiolabeled in situ hybridization section from an adult mouse from Lin et al. (2006) (right hemisphere). b Gad1 data from the Allen Mouse Brain Atlas (top) compared with a radiolabeled in situ hybridization section from BGEM (bottom). c Gfap data from the Allen Mouse Brain Atlas (top) compared with a BAC transgenic adult mouse section from GENSAT (bottom). Regions where each gene is strongly expressed are similar between Allen Mouse Brain Atlas data and literature/database sources

When no comparable database or literature source could be found for a gene in the top 1,000 genes, an attempt was made to validate gene expression using Allen Developing Mouse Brain Atlas experiments. Data from the Allen Developing Mouse Brain Atlas using an identical ISH probe to the top 1,000 gene in question were compared to external sources. For example, no comparable adult data could be found for the gene Lhx6 (data shown in Fig. 2). However, there were three developmental experiments available in the BGEM database (embryonic days 11 and 15 and postnatal day 7). Each of these experiments showed comparable expression to age-corresponding Allen Developing Mouse Brain Atlas experiments. Because the ISH probe could be shown to produce a validated expression pattern earlier in development, the adult experiments using an identical probe sequence were also considered validated.

The results of the validation project corroborate the overall data accuracy of the top 1,000 genes from the Allen Mouse Brain Atlas (Table 2). We were able to find comparable external data for 782 of the top 1,000 genes. Of these, 96 % (752 genes) were validated by their comparison. The expression patterns of the remaining 4 % (30 genes) were found to contradict comparable external data. For these mismatches, a redo experiment was initiated in an attempt to produce a supported expression pattern. Subsequent experiments have produced validated data for 9 of these 30 genes. For the 21 remaining genes, further experiments are ongoing to resolve mismatches. The possibility remains that some external data used for comparison were either themselves inaccurate or were not, in fact, biologically comparable.
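The percentages in this paragraph follow directly from the reported counts. As a quick arithmetic check, the short sketch below recomputes them; the variable names are illustrative, and the count of genes without comparable external data is derived here by subtraction rather than quoted from the text.

```python
# Counts as reported in the text.
top_genes = 1000
with_external_data = 782
validated = 752
resolved_by_redo = 9

# Derived quantities.
mismatched = with_external_data - validated          # 30
unresolved = mismatched - resolved_by_redo           # 21
no_comparable_data = top_genes - with_external_data  # 218 (derived, not quoted)

pct_validated = round(100 * validated / with_external_data)    # 96
pct_mismatched = round(100 * mismatched / with_external_data)  # 4

print(f"validated: {validated}/{with_external_data} = {pct_validated} %")
print(f"mismatched: {mismatched}/{with_external_data} = {pct_mismatched} %")
print(f"still unresolved after redo experiments: {unresolved}")
```

Note that both percentages are taken relative to the 782 genes with comparable external data, not to the full set of 1,000.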

Table 2 Overall results of the top 1,000 validation project

Conclusions

The mapping resources described here allow researchers to compare gene expression across the entire brain and spinal cord in high spatial resolution. Addition of the temporal axis provides important new information about brain development and the localization of genes involved in neurogenesis, transcription, synaptic activity, and neurotransmission. The data for a single gene are easily obtained for all projects and can provide a revealing gestalt of that gene’s activity over time and region. This is illustrated by the gene Somatostatin, which exhibits highly regionalized expression throughout development and in adulthood in both the brain and the spinal cord (Fig. 6).

Fig. 6

Various ISH sections for a single mouse gene, Somatostatin (Sst). a Sagittal Allen Developing Mouse Brain Atlas sections showing the whole embryo at E11.5, the head at E13.5, and the brain at E18.5 and P4. Sst expression shifts from a few small regions at E11.5 to most major brain regions by P4. b Sagittal (left) and coronal (right) Allen Mouse Brain Atlas sections showing strong scattered expression in the adult cortex and caudoputamen. Enriched expression can be seen in the reticular thalamus and the superior colliculus. Many other sites also show regionalized Sst expression. c Cervical Allen Spinal Cord Atlas sections of neonate spinal column (left) and adult spinal cord (right) showing scattered expression throughout the spinal gray matter, and enrichment in laminae 1–3 compared to deeper layers. The enrichment in laminae 1–3 can be seen in the sagittal P4 and sagittal P56 brain sections near the base of the medulla. Undefined spinal cord expression can be seen in the sagittal E11.5 and E13.5 sections

Current efforts at the Allen Institute for Brain Science include nonhuman primate and human brain mapping studies that will allow seamless cross-species comparisons. The addition of human microarray and RNA sequencing data provides other genomic modalities to search across species and time. Future projects include detailed mapping of the connectome of the mouse brain and a complete cataloging of neuronal cell types in the cerebral cortex. Combined, these online public resources, which are made freely available to researchers around the world, will help to develop the essential parts list of molecular neuroanatomical tools needed to advance our understanding of how the human brain functions.