Multiple cross mapping (MCM) has been used in the laboratory as an empirical method to significantly reduce the QTL interval (Hitzemann et al. 2000, 2002). Interest in MCM built from the observation that QTL data generated by three different intercrosses in three different laboratories resulted in the detection of what appeared to be an identical QTL for basal locomotor activity (BLA) on distal Chr 1 [C57BL/6J (B6) × BALB/cJ (C), Flint et al. 1995; B6 × A/J (A), Gershenfeld et al. 1997; B6 × DBA/2J (D2), Koyner et al. 2000]; the QTL was not detected in a C×LP/J (LP) intercross (Hitzemann et al. 2000). We proposed that this multiple cross information could be used to develop an empirical algorithm for sorting microsatellite markers to detect chromosomal regions with the highest probability of containing the QTL (Hitzemann et al. 2000). The underlying principle was that the genetic map contained an embedded haplotype structure derived from the common lineage of the inbred strains and that this structure provided information (and thus statistical power) which could enhance QTL analyses. A simple sequential sort of the microsatellite data confirmed that there were three regions on distal Chr 1 that showed a relative enrichment in marker density. This analysis showed promise in that the two distal areas of enrichment generally coincided with the estimated peak position(s) of the basal activity/open-field activity QTLs (Flint et al. 1995; Gershenfeld et al. 1997; Koyner et al. 2000).

Despite the apparent promise of the paradigm, the initial attempt at multiple cross mapping relied on opportunistic samples, and consequently important data were missing. For example, it was not known whether a QTL on distal Chr 1 would be detected in an A×C, A×D2, or C×D2 intercross. If a QTL was present in one or more of these crosses, it would certainly confound, if not invalidate, the approach. With this point in mind, it was concluded that rather than rely on opportunistic data, a prospective study was needed with a completely balanced design. Given that data were on hand for B6×D2 and C×LP intercrosses (Hitzemann et al. 2000), the QTL analysis was extended to include the remaining four crosses (B6×C, B6×LP, D2×LP, and D2×C; Hitzemann et al. 2002). The balanced MCM design was first applied to the study of a QTL for ethanol-induced activation, also on Chr 1. The empirical algorithm reduced the QTL interval to approximately 3 cM, which was confirmed by fine mapping in heterogeneous stock (HS) mice (Hitzemann et al. 2002).
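
The balanced design follows from simple enumeration: four strains give six pairwise intercrosses. The following Python sketch is illustrative only and was not part of the original analysis; the helper name `all_intercrosses` is hypothetical. It lists the six crosses and recovers the four that remained to be generated.

```python
from itertools import combinations

# The four inbred strains of the balanced design (abbreviations as in the text).
STRAINS = ["B6", "D2", "C", "LP"]

def all_intercrosses(strains):
    """Return every unordered pair of strains, i.e. every possible intercross."""
    return [frozenset(pair) for pair in combinations(strains, 2)]

# Crosses for which data were already on hand (Hitzemann et al. 2000).
on_hand = {frozenset(("B6", "D2")), frozenset(("C", "LP"))}

crosses = all_intercrosses(STRAINS)
remaining = [c for c in crosses if c not in on_hand]

print(len(crosses))  # 6 crosses in the balanced design
print(sorted("x".join(sorted(c)) for c in remaining))
# ['B6xC', 'B6xLP', 'CxD2', 'D2xLP']
```

Crosses are stored as unordered pairs because reciprocal F1 animals were used, so the direction of a cross is not distinguished here.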

The potential of using the MCM to also integrate QTL, gene expression, and sequence analysis has been discussed previously, and some preliminary evidence to support this use has been provided (Hitzemann et al. 2000, 2002; Xu et al. 2002). The essence of the argument is that the same algorithm used to reduce the QTL interval can also be used to sort through differences in gene expression and sequence to find those differences most relevant for the QTL of interest. The earliest proof of principle (in rodent models) for the integration of QTL and gene expression data came from studies that identified genes involved in insulin resistance (Aitman et al. 1999; Collison et al. 2000) and airway hyperresponsiveness (Karp et al. 2000). The first application of the approach to neural phenotypes is found in Sandberg et al. (2000). These authors reported marked differences in brain gene expression between two inbred mouse strains, B6 and 129S1/SvImJ, in whole brain and discrete brain regions (cortex, midbrain, hippocampus, and cerebellum). These data led the authors to the salient observation that some of these differences appeared to coincide with the known location of “behavioral” QTLs; particular note was made of the fact that Kcnj9 (which encodes GIRK3, a G-protein-coupled inwardly rectifying potassium channel) had a markedly lower expression in the B6 strain and was located in a QTL-rich region on Chr 1 (see, e.g., Flint 2003). Subsequent publications from this laboratory (Carter et al. 2001; Lockhart and Barlow 2001; but also see Gerschwind 2000; Belknap et al. 2001; Flint and Mott 2001; Wayne and McIntyre 2002) continued the argument and provided additional data for combining analyses of transcript levels (using expression arrays) with information from QTL mapping to nominate candidate genes (see also Mackay 2001).
We now report that by coupling the MCM approach with gene expression analysis, we can confirm that Kcnj9 is indeed a strong candidate quantitative trait gene (QTG) for basal locomotor activity; further, we have used the MCM strategy to identify a candidate quantitative trait nucleotide (QTN) associated with the differential gene expression.

Materials and methods

Animals

Male and female B6, D2, C, and LP mice were obtained from The Jackson Laboratory and used to establish breeding pairs, which provided all animals used in the experiments. Details of the B6×D2 intercross are provided elsewhere (Hitzemann et al. 2000, 2002). For the remaining five crosses, reciprocal F1 animals were developed, and these pairs were mated to obtain the F2 animals. A new C×LP intercross was generated for this study; these animals were subsequently used as the founders of a C×LP advanced intercross (see Davarsi 1998). Equal numbers of males and females were used in all crosses. Animals were maintained on a 7 am–7 pm light–dark cycle with food and water available ad libitum. Animals were tested only between 10 am and 3 pm. All procedures were approved by the institutional animal review boards at both OHSU and the Portland VAMC. The sample of heterogeneous stock (HS) animals used here has been described previously (Demarest et al. 2001).

Measurement of basal locomotor activity

Mice were removed from the home cage, injected with saline (10 mL/kg, ip), which is a mild stressor, and placed individually in the testing arena; the arena floor was covered with standard laboratory bedding. Activity was monitored for 20 min under standard laboratory lighting conditions. One week later, the test was repeated and the two responses were averaged (Markel et al. 1995). Locomotor activity was assessed in a San Diego Instruments Flex Field locomotor system. The apparatus comprises a four-by-eight array of photocells mounted in a 25 × 47 cm metal frame, situated 1 cm off the floor, and surrounding a 22 × 42 × 20 cm high plastic arena. Activity was recorded over eight 2.5-min blocks. Data were collected as distance (cm) traveled in each interval.

DNA isolation

High-molecular-weight genomic DNA was isolated from liver samples as follows: 250–500 mg of liver tissue was minced with a sterile razor blade, transferred to a 15-mL polypropylene Falcon tube with 5 mL lysis buffer [100 mM Tris-HCl (pH 8.0), 5 mM EDTA, 100 µg/mL proteinase K, 200 mM NaCl], and incubated with rocking at 55°C overnight. After incubation, 20 µL/mL of 5 M NaCl was added with gentle inversion. The tissue digest was extracted twice with equilibrated phenol, once with equal volumes of phenol and chloroform:isoamyl alcohol (24:1), and once with chloroform:isoamyl alcohol alone. DNA was precipitated with 0.5 vol of 7.5 M ammonium acetate and 2 vol of ice-cold ethanol. Dried DNA pellets were resuspended in double-distilled water (ddH2O). Purity and concentration of the final samples were evaluated by u.v. spectroscopy, and only samples with a 260/280 ratio >1.4 were used for genotyping.

Genotyping microsatellite polymorphisms

All of the genotyping involved the -(CA)n-repeating microsatellites first described by Dietrich et al. (1992). The PCR primer sets were obtained from Research Genetics; 1–5 ng of genomic DNA was amplified with 18 pmol of each primer, 0.5 units of Taq polymerase [AmpliTaq (Perkin Elmer Cetus) or Taq DNA polymerase (Boehringer Mannheim Biochemica)], and 100 nM dNTPs in a 20-µL reaction under the standard conditions recommended by the manufacturer. All reactions were amplified in a Perkin Elmer 9700 Thermal Cycler. Products were separated by electrophoresis in 1× TBE buffer on a 6% agarose gel (3:1 NuSieve; SeaKem FMC, Inc.). Product bands were visualized by ethidium bromide staining.

QTL data analysis

The collection and analysis of the B6×D2 data set have been described previously (Koyner et al. 2000; Demarest et al. 2001; Hitzemann et al. 2000). For the B6×D2 intercross, at least two and perhaps three separate QTLs have been identified on Chr 1 (Koyner et al. 2000 and unpublished observations). The focus of the current investigation is on the most distal QTL (located at ~90 cM), which has been detected in the B6×D2 intercross at −Log P = 7.4. The threshold for confirmation of this QTL in the other five crosses was set at −Log P = 4, which exceeds the threshold for confirmation established by Lander and Kruglyak (1995). Data for the other five crosses were collected in two cohorts, an initial cohort of ~400 animals followed by a second cohort of ~200 animals. Within the first cohort, ~250 animals were genotyped for each cross, including all of the animals in the phenotypic top and bottom 20%. In general, the intermarker distance was kept at 15 cM. The phenotype × genotype interaction was analyzed separately for each D1Mit marker across the region of interest by using standard ANOVA procedures. The threshold for a putative QTL from this stage of the analysis was set at p < 0.05; despite the multiple comparisons being made (five intercrosses), the low threshold was seen as acceptable to avoid the catastrophe of a false negative and because the QTL would be confirmed in the larger sample. The first stage had a sample power of 0.95 to detect a QTL with an effect size of h²QTL = 0.05. The B6×C and B6×LP intercrosses met this threshold. All of the remaining animals in both cohorts were then genotyped. All data are presented graphically as the −Log P value (equivalent to the logarithm of the likelihood for linkage or LOD value) obtained for each marker. DNA samples from the HS animals described in Demarest et al. (2001) were sent to Oxford for analysis by using the procedures outlined in Mott et al. (2000). These data are also reported as the −Log P.
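
The first-stage selection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name `select_for_genotyping` is hypothetical, the phenotype scores are simulated, and filling the remainder of the ~250 animals at random from the middle of the distribution is an assumption, because the text does not state how the non-extreme animals were chosen.

```python
import random

def select_for_genotyping(phenotypes, tail_fraction=0.20, target_n=250, seed=0):
    """Pick animals to genotype: all phenotypic extremes plus a random fill.

    phenotypes: dict mapping animal id -> phenotype score.
    The random fill from the middle of the distribution is an assumption.
    """
    ranked = sorted(phenotypes, key=phenotypes.get)
    n_tail = int(len(ranked) * tail_fraction)
    low = ranked[:n_tail]
    high = ranked[len(ranked) - n_tail:]
    middle = ranked[n_tail:len(ranked) - n_tail]
    n_fill = max(0, min(len(middle), target_n - 2 * n_tail))
    rng = random.Random(seed)
    return set(low) | set(high) | set(rng.sample(middle, n_fill))

# Hypothetical cohort of 400 animals with simulated activity scores (cm).
sim = random.Random(1)
cohort = {f"mouse{i:03d}": sim.gauss(1000, 200) for i in range(400)}

chosen = select_for_genotyping(cohort)
print(len(chosen))  # 250: 80 low + 80 high + 90 from the middle
```

Genotyping every animal in the tails keeps most of the linkage information, since the phenotypic extremes are the most informative individuals, while reducing the genotyping load for the initial screen.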

To test for cis-regulation of gene expression, the WebQTL mapping service and BXD recombinant inbred (RI) transcriptome database were used (http://webqtl.roswellpark.org/). The database contains expression data (U74Av2 chip; see below) for 27 of the BXD RI strains, the two parental strains, and a B6D2 F1 intercross. A total of 575 unique markers are entered into the analysis. Marker regression is used to detect QTLs, and the program automatically performs a permutation test (n = 1000); cis-regulated genes are expected to map to their respective chromosomal locations.
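
The logic of a marker-by-marker scan with a permutation test can be sketched as follows. This is a simplified Python illustration and not the WebQTL implementation: the association statistic (absolute Pearson correlation between genotype and expression), the three toy markers, and the simulated expression values are all assumptions made for the example.

```python
import random

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def genome_scan(genotypes, trait):
    """Return (best marker, max |r|) over all markers."""
    best = max(genotypes, key=lambda m: abs(pearson_r(genotypes[m], trait)))
    return best, abs(pearson_r(genotypes[best], trait))

def permutation_p(genotypes, trait, n_perm=1000, seed=0):
    """Empirical p-value for the best marker, correcting for all markers scanned."""
    rng = random.Random(seed)
    _, observed = genome_scan(genotypes, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-trait relationship
        if genome_scan(genotypes, shuffled)[1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 27 strains, genotypes coded 0 (B6 allele) or 1 (D2 allele).
N_STRAINS = 27
genotypes = {
    "M1": [i % 2 for i in range(N_STRAINS)],
    "M2": [1 if i < 13 else 0 for i in range(N_STRAINS)],
    "M3": [(i // 3) % 2 for i in range(N_STRAINS)],
}
# Simulated expression that tracks marker M2 (assumed to lie at the gene).
expression = [8.0 + g + 0.01 * (i % 5) for i, g in enumerate(genotypes["M2"])]

best, stat = genome_scan(genotypes, expression)
p = permutation_p(genotypes, expression)
print(best, round(stat, 3), p)
```

Because each permutation records the maximum statistic over all markers, the resulting p-value is corrected for the number of markers scanned; a transcript whose best marker lies at its own chromosomal location would then be taken as cis-regulated.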

Preparation of tissue for microarray analysis

For all experiments, animals were sacrificed by cervical dislocation followed immediately by decapitation, with the brains removed and frozen on dry ice. The entire process took less than 60 s. Brains were then stored at −80°C; no sample was stored for more than 3 months. Within this time frame, we could find no effect of storage on the quality or quantity of the RNA extracted. This was assessed at 2 weeks and 1, 2, and 3 months of storage, by using a whole brain sample and the Affymetrix test chip. For the whole brain studies, RNA was isolated without further manipulation as described below. Brains used for microdissection were removed from the freezer, placed on a stage chilled with dry ice, allowed to thaw for 1–2 min, and then sliced in 0.5- or 1-mm coronal sections. For dissection of the dorso-medial striatum, the section extending 1–2 mm rostral to bregma was used. Tissue was punched (0.5 mm, internal diameter) bilaterally from the caudal to rostral surface, at both 1 and 1.5 mm from midline, just below the corpus callosum, angling the punch to avoid the corpus callosum on the rostral surface. The lateral bed nucleus of the stria terminalis (BSTL) was punched from a section that extended approximately from bregma to +0.5 mm rostral; tissue was punched from the caudal to rostral surface by using the lateral ventricle and anterior commissure as the guide. For the central nucleus of the amygdala (CeA), a section that extended −1.0 to −1.5 mm from bregma was used. The external capsule and the medial globus pallidus (previously referred to as the entopeduncular nucleus) were used to guide the external and internal lateral boundaries of the punch. The samples from the CeA and BSTL were combined.

RNA isolation

Total RNA was isolated with TRIZOL® Reagent (Life Technologies) with a modification of the single-step acid guanidinium isothiocyanate phenol-chloroform extraction method, according to the manufacturer’s protocol. The extracted RNA was then purified with RNeasy (Qiagen). RNA samples were evaluated by u.v. spectroscopy for purity and concentration. Only samples with a 260/280 ratio of >1.8 were used. Samples too dilute for subsequent analysis were concentrated by precipitation with 7.5 M ammonium acetate. RNA quality was monitored by visualization on an ethidium bromide-stained denaturing formaldehyde agarose gel.

Oligonucleotide arrays (Affymetrix)

Samples containing at least 10 µg of total RNA were sent to the OHSU Gene Microarray Shared Resource (GMSR) facility for analysis. The procedures used at the facility precisely follow the manufacturer’s specifications. Additional details are found at (www.ohsu.edu/gmsr/). Following labeling, all samples were hybridized to the GeneChipTest3 for quality control. If target performance did not meet recommended thresholds, the sample was discarded. Samples passing the threshold were then hybridized to the U74Av2 array.

Analysis of oligonucleotide array data (MAS 5.0)

The Affymetrix Microarray Suite 5 (MAS 5.0) software was used for the initial processing of all data. The software parameters used for processing the data were set as follows: alpha 1 = 0.1; alpha 2 = 0.15; tau = 0.015; gamma1H = 0.0025; gamma1L = 0.0025; gamma2H = 0.003; gamma2L = 0.003; scaling factor = 200. The parameter values were determined empirically through data modeling; details of the modeling algorithms may be obtained by contacting the GMSR (see above). Two of the parameters are noteworthy: alpha 1 sets the p value for detection (present call) at p < 0.1; the scaling factor sets the mean signal for all probe sets.
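
The role of alpha 1 and alpha 2 can be illustrated with a short Python sketch. It assumes the usual MAS 5.0 convention, in which a detection p value below alpha 1 gives a present call, a value between alpha 1 and alpha 2 gives a marginal call, and any other value gives an absent call; the function name `detection_call` is hypothetical and the p values in the example are invented.

```python
ALPHA1 = 0.10  # alpha 1 as set in this study
ALPHA2 = 0.15  # alpha 2 as set in this study

def detection_call(p_value, alpha1=ALPHA1, alpha2=ALPHA2):
    """Map a detection p value to a Present (P), Marginal (M) or Absent (A) call.

    Assumes the standard convention: P if p < alpha1, M if alpha1 <= p < alpha2,
    A otherwise.
    """
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"

# Invented detection p values for three probe sets.
for p_value in (0.05, 0.12, 0.50):
    print(p_value, detection_call(p_value))
```

With these settings a probe set is counted as present only when its detection p value is below 0.1, which is the criterion applied when transcripts are described as "called as present".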

Analysis of the whole brain array data (standard procedures)

The whole brain array data were collected in two separate experiments (n = 3/strain/experiment); the correlation between experiments 1 and 2 was used as a measure of assay reliability. Data from both experiments were combined, and standard ANOVA procedures were used to detect strain differences in gene expression; the Newman-Keuls test was used for post-hoc comparisons. Even though the analysis in the current study focuses only on a small interval of Chr 1, a genome-wide correction for multiple comparisons was used. It was assumed that there were approximately 5000 unique transcripts from the U74Av2 chip expressed in the brain, and thus a conservative Bonferroni correction was used to set the threshold at p < 0.00001 (0.05/5000) for the ANOVA. An alternative method for analysis would be to identify the differentially expressed genes by using the false discovery rate (Storey and Tibshirani, 2003). Setting the false discovery rate at 0.01 lowers the threshold for detecting differentially expressed genes to ~p < 0.001.
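
The Bonferroni arithmetic is shown in the Python sketch below. The threshold follows directly from the numbers in the text (0.05 divided by 5000 assumed transcripts); the gene names and p values are invented for illustration and are not study data.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 5000)
print(f"{threshold:.0e}")  # 1e-05

# Invented ANOVA p values for a handful of transcripts (not study data).
p_values = {"gene_a": 3e-7, "gene_b": 4e-4, "gene_c": 2e-6, "gene_d": 0.2}
significant = sorted(g for g, p in p_values.items() if p < threshold)
print(significant)  # ['gene_a', 'gene_c']
```

In this toy example, gene_b (p = 4e-4) fails the Bonferroni threshold but would pass the less stringent cutoff of ~p < 0.001 associated with a false discovery rate of 0.01.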

Analysis of the array data (exploratory procedures)

Array data were also collected (N = 3/strain) for the dorsomedial striatum and central extended amygdala. A priori (see Discussion) there were reasons to believe that these regions would have a role in basal locomotor activity (see Hitzemann and Hitzemann 1999). The data from these regions were qualitatively examined to determine whether there were transcripts differentially expressed in these regions but not in the whole brain or vice versa.

Analysis of B6 and D2 sequence data

The general strategy for extracting the B6 and D2 sequence data for Kcnj9 is found in Marshall et al. (2002). Polymorphisms for 2 kb upstream from the transcription start site were analyzed with the TRANSFAC software (Wingender et al. 2001) (http://www.gene-regulation.com/) to determine whether the polymorphisms occurred within consensus transcription factor binding sites.

DNA sequencing

Sequencing reactions were performed by using Big Dye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems). The PCR products were directly sequenced on a 64-lane upgraded ABI 373 automated DNA sequencer.

Results

Step 1: multiple cross mapping (MCM)

The data in Fig. 1A and 1B summarize the multiple cross mapping results. No QTL in this region of Chr 1 (150–190 Mb) was detected in the D2×C, D2×LP, or C×LP intercrosses (Fig. 1A); the latter result confirms an earlier observation (Hitzemann et al. 2000). The QTL, originally detected in the B6×D2 intercross (Koyner et al. 2000), was confirmed (p < 0.0001) in both the B6×C and B6×LP intercrosses (Fig. 1B). For all three QTLs, the B6 allele was associated with increased BLA (Fig. 1C). It should be noted that the shared QTL was temporally dependent and was strongest for the first 2.5-min interval after the animal was placed in the testing apparatus (data shown in Fig. 1). Further details on the temporal dependence of this QTL are found in Koyner et al. (2000).

Figure 1

Multiple cross mapping of basal locomotor activity (BLA). BLA was mapped in the six intercrosses that can be formed from the B6, D2, C, and LP mouse strains. The data in Fig. 1 are from the first 2.5-min interval of the test session. Sample sizes for all intercrosses except the B6 × D2 intercross were 500–600; sample size for the B6 × D2 intercross was approximately 1100/marker. Data analysis used marker by marker ANOVA procedures. Panel A shows the genotype × phenotype interaction for the three B6 intercrosses: data are presented for D1Mit150, located at 175.94 Mb. Results are presented as the mean ± SE. Panel B illustrates the B6 × D2, B6 × LP, and B6 × C intercrosses. Panel C compares the C × LP, C × D2, and D2 × LP intercrosses with the B6 × D2 intercross.

Step 2: using the MCM algorithm to reduce the QTL interval

Polymorphic Mit series microsatellites within the region of interest, whose location could be confirmed in MGSCv3 (www.ensembl.org/Mus_musculus/) (N = 106), were sorted to find the subset of markers for which the B6 strain was different from the C, LP, and D2 strains and C = D2 = LP, i.e., the residual set contained only those markers that had identical alleles for the C, LP, and D2 strains. Each step of the selection process is illustrated in Fig. 2. Seventy-two of the markers were polymorphic between the B6 and D2 strains (Panel B); this proportion was not significantly different from the proportion for all Chr 1 markers (263 out of 444) (χ2 = 2.7, p > 0.09). Selecting for those markers also polymorphic between the B6 and C strains reduced the number of residual markers to 50 (Panel C). Selecting for markers also polymorphic between the B6 and LP strains had only a small effect (Panel D). The inclusion of additional selection routines (e.g., B6 vs A; B6 vs C3H/HeJ; B6 vs AKR/J) had no further effect, i.e., the subset of markers in which the B6 strain was different from the C, LP, and D2 strains identified all of the markers for which the B6 strain was different from the other six inbred strains in the Mit catalogue. The fourth selection (C = LP) (Panel E) reduced the number of residual markers to 18. The fifth and final selection (D2 = LP) reduced the number of residual markers to 12 (Panel F). A sixth selection was not necessary as the markers with identical alleles for the C, LP, and D2 strains were already identified. Overall, the selection process identified two and perhaps three distinct chromosomal regions; of these, the middle region at ~174 Mb appeared the most QTL congruent (see Fig. 1).
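
The sequential sort can be expressed compactly in code. The Python sketch below applies the five selections in the order given in the text; the marker names and allele sizes are invented for illustration and are not taken from the Mit catalogue, and the function name `mcm_sort` is hypothetical.

```python
# Invented allele sizes (bp) for the four strains; not values from the Mit catalogue.
MARKERS = {
    "marker_a": {"B6": 120, "D2": 120, "C": 124, "LP": 124},
    "marker_b": {"B6": 150, "D2": 142, "C": 142, "LP": 142},
    "marker_c": {"B6": 98, "D2": 104, "C": 98, "LP": 104},
    "marker_d": {"B6": 200, "D2": 210, "C": 206, "LP": 206},
    "marker_e": {"B6": 176, "D2": 170, "C": 170, "LP": 170},
}

# Ordered selection steps, mirroring the text of Step 2.
STEPS = [
    ("B6 != D2", lambda a: a["B6"] != a["D2"]),
    ("B6 != C", lambda a: a["B6"] != a["C"]),
    ("B6 != LP", lambda a: a["B6"] != a["LP"]),
    ("C == LP", lambda a: a["C"] == a["LP"]),
    ("D2 == LP", lambda a: a["D2"] == a["LP"]),
]

def mcm_sort(markers, steps=STEPS):
    """Apply each selection in turn; return the residual marker set after every step."""
    residual = dict(markers)
    history = []
    for label, keep in steps:
        residual = {name: alleles for name, alleles in residual.items() if keep(alleles)}
        history.append((label, sorted(residual)))
    return history

for label, names in mcm_sort(MARKERS):
    print(f"{label:9s} -> {names}")
# The final residual set is ['marker_b', 'marker_e'].
```

A sixth test (C = D2) is redundant for the same reason given in the text: once C = LP and D2 = LP both hold, C = D2 follows.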

Figure 2

The application of the MCM algorithm to the microsatellite markers across the interval of interest. Panel A illustrates the distribution of all microsatellite markers in the Mit catalog, fully informative for the B6, D2, C, and LP strains and lying between 155 and 195 Mb on Chr 1. Panel B illustrates the distribution of the markers that are polymorphic between the B6 and D2 strains. Panel C illustrates the markers that are polymorphic between the B6 and (D2 and C) strains. Panel D illustrates the markers that are polymorphic between the B6 and (D2, C, and LP) strains. Panel E illustrates the markers that are polymorphic between the B6 and (D2, C, and LP) strains and identical between the C and LP strains. Panel F illustrates the markers that are polymorphic between the B6 and (D2, C, and LP) strains and identical among C, LP, and D2 strains.

The selection process was also applied to single nucleotide polymorphisms (SNPs) (see, e.g., Hitzemann et al. 2002). To examine this issue, we first turned to the Whitehead-Roche SNP database found at (http://www.nervenet.org.main.dictionary.html). There were no SNPs in this database for the interval between 170 and 180 Mb; however, it was of some interest to note that over the 163–170 Mb interval, the D2 and C strains had an identical haplotype. We next turned to the SNP database found at (www.celera.org). Data were extracted through use of the Celera Discovery System and Celera Genomics-associated databases for the B6, D2, and A strains over the interval of 160–190 Mb. The A strain was included in the analysis, given that data were not available for the C and LP strains and given that there is evidence of a QTL in the region of interest for a B6×A intercross (Gershenfeld et al. 1997). The SNPs lying between 160 and 190 Mb and polymorphic between the B6 and D2 and the B6 and A strains are illustrated in Figs. 3A and 3B. The subset of markers polymorphic between the B6 and both the D2 and A strains is illustrated in Fig. 3C; in this latter category, there were a total of 9600 SNPs. The distribution of these residual SNPs was qualitatively similar to the microsatellite distributions found in Figs. 2C and 2D and included a high density of SNPs between 171 and 174 Mb. Overall, both the microsatellite and SNP data suggested that a QTL was likely to be centered at ~173 Mb. It is of interest to note that of the 3000 10-kb segments plotted in Fig. 3C, 1287 contained no SNPs; further, there were large SNP-poor domains [from the perspective of the B6 vs (D2 and A) strains] that appear to stretch over several Mb. This “mosaic” structure is in general agreement with that reported by Wade et al. (2002).
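
The binning behind the SNP density plots can be sketched in Python as follows. The 10-kb bin width and the 160–190 Mb interval are taken from the text; the function name `snp_density` and the SNP positions in the toy example are invented for illustration.

```python
BIN_SIZE_BP = 10_000

def snp_density(positions_bp, start_bp, end_bp, bin_size=BIN_SIZE_BP):
    """Count SNPs in consecutive fixed-width bins covering [start_bp, end_bp)."""
    n_bins = (end_bp - start_bp) // bin_size
    counts = [0] * n_bins
    for pos in positions_bp:
        if start_bp <= pos < start_bp + n_bins * bin_size:
            counts[(pos - start_bp) // bin_size] += 1
    return counts

# The 160-190 Mb interval gives 30 Mb / 10 kb = 3000 bins.
empty_interval = snp_density([], 160_000_000, 190_000_000)
print(len(empty_interval))  # 3000

# Invented SNP positions (bp), for illustration only.
toy = snp_density([160_000_500, 160_004_000, 160_025_000], 160_000_000, 160_040_000)
print(toy)           # [2, 0, 1, 0]
print(toy.count(0))  # 2 bins contain no SNPs
```

Counting the zero-valued bins in this way corresponds to the tally of SNP-free 10-kb segments reported above, and runs of consecutive empty bins mark the SNP-poor domains.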

Figure 3

The application of the MCM algorithm to single nucleotide polymorphisms (SNPs) across the regions of interest. Panel A illustrates the SNPs polymorphic between the B6 and D2 strains (generated through use of the Celera Discovery System and Celera Genomics associated databases). Panel B illustrates the SNPs polymorphic between the B6 and A strains. Panel C illustrates the SNPs polymorphic between the B6 and (D2 and A) strains. Data are presented as SNPs/10 kb interval.

Step 3: confirmation that a QTL is associated with the interval(s) predicted by MCM

To confirm that a QTL was indeed centered at 173 Mb, BLA was mapped in heterogeneous stock animals (N = 550) at G35 following the design of Talbot et al. (1999) and was analyzed as described by Mott et al. (2000). These data (Fig. 4) confirmed the presence of a QTL in the predicted region. Additional verification of the MCM predicted region was obtained from the B6.D2-Mtv7 congenic strain (Taylor and Frankel 1993; Ferraro et al. 2001). The maximum extent of the D2 introgressed region is indicated by the black bar in Fig. 4. The congenic mice were backcrossed to B6 mice, and these mice were subsequently mated to produce mice B6 or D2 homozygous or heterozygous across the congenic interval; data from a group of standard D2 mice are included for comparison. Fig. 5 illustrates that, compared with the B6 homozygotes, BLA was significantly lower in the congenic animals; the heterozygote animals were somewhat intermediate, although the data suggest partial D2 dominance.

Figure 4

QTL mapping for BLA in HS/Npt mice on distal Chr 1. Data were extracted for a sample of 550 HS mice by using the analysis strategy described by Mott et al. (2000). Data are plotted by using the cM placement of the markers found at (www.jax.org); the Mb scale provides the best estimate of marker position available in MGSCv3 (www.ensembl.org). Importantly, it should be noted that the QTL peak was found at D1Mit113, which is located at 173.1 Mb. This marker also gave the QTL peak for Kcnj9 expression (Table 4). The black bar illustrates the maximum extent of the introgressed interval in the B6.D2-Mtv congenic mice (see Fig. 5).

Figure 5

Basal locomotor activity in B6.D2 Mtv congenic mice. B6.D2 Mtv congenic mice (Frankel and Taylor, 1993) were backcrossed to B6 mice, and the progeny were mated to produce animals B6 and D2 homozygous and heterozygous across the Mtv interval (see Fig. 4). Data are also provided for D2 inbred mice. N = 10–15/group. Data are for the first 2.5-min interval of the test session.

Step 4: gene expression among the four inbred mouse strains across the region of interest

Gene expression data (Affymetrix U74Av2 gene chip) were collected for whole brain (N = 6/strain), the dorsomedial striatum (N = 3/strain), and the central extended amygdala (N = 3/strain). Data were analyzed with the MAS 5.0 software, provided by Affymetrix. The whole brain data were collected in two cohorts (N = 3/strain); test/retest reliability was >0.99 for all strains, focusing only on those genes (or transcripts) called as present (“p” value <0.1). Genome-wide, the coefficient of variation was less than 0.1 for all strains; examples of these data are found in Table 1. Of the 12422 genes and transcripts on the U74Av2 chip, 7175 were called as present in one or more of the strains, and 6750 were called as present in all four strains. Given the redundancy of the U74Av2 chip, the actual number of unique genes expressed will be significantly less than 6750. A standard one-way ANOVA revealed that genome wide the number of genes or transcripts differentially expressed at p < 10⁻⁷, 10⁻⁶, 10⁻⁵, 10⁻⁴, and 10⁻³ were 181, 249, 337, 506, and 784, respectively.

Table 1 Whole brain gene expression, Chromosome 1 (168–178 Mb)

Steps 1–3 identified a QTL region of interest, centered at ~173 Mb, which for the purpose of the gene expression experiments was broadly defined as the interval from 168 to 178 Mb. MGSCv3 (www.ensembl.org) reports 117 genes, 10 predicted genes, and 9 ambiguous gene sequences for this interval. The major class of genes within the interval is a family of olfactory receptor genes (N = 17). Fifty-nine genes or expressed sequence tags (ESTs) associated with the 168- to 178-Mb interval are found on the U74Av2 chip, and of these 33 were called as expressed in whole brain, the dorsomedial striatum, and the central extended amygdala (Tables 1, 2, and 3). No differential gene expression among brain regions was detected for this interval. The data in Table 1 illustrate that six genes (Aldh9a1, Mgst3, Rgs5, Sdhc, Kcnj9, and Dfy) or 19% of the total met the threshold from the ANOVA (F(3,21) > 17, p < 0.00001) for a significant difference in expression (see Methods). Although only a qualitative comparison is possible, given the smaller sample sizes for the dorsomedial striatum and central extended amygdala, the whole brain pattern of differential gene expression for Aldh9a1, Rgs5, and Kcnj9 appeared to persist in these discrete brain regions (Tables 1, 2, and 3). The pattern of Sdhc expression persisted in the dorsomedial striatum, but not the central extended amygdala, whereas for Mgst3, the whole brain pattern persisted in the extended amygdala but not the striatum. Finally, across the three tissues, three distinct patterns of Dfy expression were found. Of the genes significantly differentially expressed in the whole brain, only Kcnj9 met the MCM criteria of B6 different from the D2, C, and LP strains and the MCM criteria of D2 = C = LP.

Table 2 Dorsomedial striatum gene expression, Chromosome 1 (168–178 Mb)
Table 3 Central extended amygdala gene expression, Chromosome 1 (168–178 Mb)

Step 5: testing for cis-regulation of the candidate quantitative trait gene(s)

For candidate QTGs detected from the integration of QTL and gene expression analyses, it follows that the QTGs must show apparent cis-regulation. Regulation of Kcnj9 transcription was characterized by using the WebQTL mapping service and the BXD recombinant inbred (RI) transcriptome database (see Methods). Cis-regulated genes are expected to map to their respective chromosomal locations. The results obtained are summarized in Table 4 and illustrate that Kcnj9 exhibits the expected cis-regulation. In addition, suggestive trans modifiers were found on Chrs 1 (proximal to Kcnj9), 3, 4, 18, and 19.
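
The distinction between cis and trans mapping can be made explicit with a small Python sketch. It is illustrative only: the function name `classify_eqtl` is hypothetical, the 10-Mb window is an arbitrary assumption rather than a criterion used in the study, and the gene position in the example is a placeholder near the QTL center rather than a reported coordinate for Kcnj9.

```python
def classify_eqtl(gene_chr, gene_mb, peak_chr, peak_mb, window_mb=10.0):
    """Label an expression QTL 'cis' if its peak lies near the gene, else 'trans'.

    window_mb is an assumed tolerance, chosen here only for illustration.
    """
    if gene_chr == peak_chr and abs(gene_mb - peak_mb) <= window_mb:
        return "cis"
    return "trans"

# Placeholder gene position (Chr 1, 173.0 Mb); the peak marker position of
# 173.1 Mb is the one reported for D1Mit113 in the legend of Fig. 4.
print(classify_eqtl("1", 173.0, "1", 173.1))  # cis
print(classify_eqtl("1", 173.0, "3", 50.0))   # trans
```

The appropriate window depends on marker density and on the resolution of the RI panel, so any fixed value should be treated as a tunable assumption.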

Table 4 QTL analysis for Kcnj9 expression

Step 6: interrogation of the sequence databases

With public and private databases (Ensembl and Celera), both the coding and promoter sequences of Kcnj9 were analyzed for polymorphisms between the D2 and B6 strains. For the coding region, the BLAST searches were aligned; the DBA sequence was identical to the sequence of the Genbank entry. Six silent polymorphisms were detected in the coding region (Table 5). The promoter region (1.1 kb) was queried using both “in silico” (e.g., Marshall et al. 2002) and direct sequencing approaches. Seventeen polymorphisms were detected between the B6 and D2 strains; for all of these polymorphisms, the D2 and A strains were identical. One polymorphism, at −237 bp, was predicted by using the TRANSFAC program (Wingender et al. 2001) (http://www.gene-regulation.com/) to disrupt the binding of three different transcription factors, i.e., Ikaros 1, MZF1, and C/EBPβ (Table 6). For these factors, the homology of the binding site motif to that found in Kcnj9 was highest for Ikaros 1 and poorest for C/EBPβ (Table 6). Note that a polymorphism has never been reported for the bp of interest in the Ikaros 1 binding site. The sequence structure was confirmed in independent B6 and D2 samples. The C and LP strains were then sequenced; these strains had the D2 genotype.
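
Locating polymorphisms relative to the transcription start site reduces to a position-by-position comparison of aligned sequences. The Python sketch below is a simplified illustration: the function name `promoter_polymorphisms` is hypothetical, the two sequences are toy strings rather than Kcnj9 sequence, and gapped alignments (insertions or deletions) are not handled.

```python
def promoter_polymorphisms(seq_a, seq_b, tss_index):
    """List mismatches between two aligned, equal-length sequences.

    Positions are reported relative to the transcription start site (TSS):
    the base at tss_index is +1 and the base just upstream is -1 (no position 0).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            offset = i - tss_index
            position = offset + 1 if offset >= 0 else offset
            diffs.append((position, a, b))
    return diffs

# Toy aligned sequences (not Kcnj9); the TSS is the base at index 6.
b6_toy = "ACGTTGACCA"
d2_toy = "ACCTTGACTA"
print(promoter_polymorphisms(b6_toy, d2_toy, tss_index=6))
# [(-4, 'G', 'C'), (3, 'C', 'T')]
```

Each reported position could then be checked against predicted transcription factor binding sites, which is the role played by the TRANSFAC analysis in the study.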

Table 5 Nucleotide changes within the coding region of Kcnj9
Table 6 Evidence for disruption of transcription binding site in the 5′ UTR of Kcnj9 for the B6 but not D2, C, or LP strains

Discussion

Multiple Cross Mapping (MCM) is one of two mapping strategies that formally incorporate the haplotype structure of the mouse genome into QTL analysis (Hitzemann et al. 2000, 2002). The other and related strategy, which we have termed Multiple Strain Mapping (MSM), has been described in at least two different contexts (Grupe et al. 2001; Wade et al. 2002). To be fully successful, MSM will require both phenotypic and genotypic data from a large panel of inbred mouse strains, probably >50 (Chesler et al. 2001; Hitzemann et al. 2002). Both MCM and MSM strategies recognize that the haplotype structure present in the mouse genome provides a source of information and, thus, statistical power, which can be leveraged to reduce the QTL interval. The current study extends the MCM method and illustrates its usefulness for interrogating gene expression and sequence information.

The reasons for choosing the four strains used in the current study have been described in detail elsewhere (Hitzemann et al. 2002). Briefly, these strains are the phenotypic extremes for a number of phenotypes of interest to the laboratory, including BLA (B6, C vs D2, LP), ethanol-induced activation (D2, C vs LP, B6) and haloperidol-induced catalepsy (D2, C vs B6, LP) (Koyner et al. 2000; Demarest et al. 2001; Kanes et al. 1993). The strains were not chosen so as to optimize genetic diversity and/or to integrate with existing sequence databases, which for future initiatives may be desirable. The question arises as to whether or not four strains (and the associated six intercrosses) are necessary, sufficient or even optimal for MCM. Presumably, it would be possible to answer this question statistically, given the availability of very dense SSLP and/or SNP databases for multiple inbred strains. For the present and focusing on the seven standard laboratory strains in the Mit catalog of microsatellites, we have found that four strains are always sufficient to specify any of the microsatellite markers, provided that the B6 strain, the most genetically unique (see www.jax.org), is one of the four (unpublished observations). The fact that four strains are sufficient reflects both the relatedness of the strains and the “relatively” simple underlying haplotype structure (e.g., Wade et al. 2002). The question of whether or not a balanced intercross design is always necessary appears difficult to answer a priori. Most of the relevant information is contained in three or four of the crosses (see Fig. 2 and Hitzemann et al. 2002). Unfortunately, there appears to be no direct mechanism for determining which crosses will be the most informative. However, it may be possible to determine a priori which crosses are unnecessary; for example, in the current study, only two of the non-B6 crosses are necessary to establish the markers where the C, D2, and LP strains have identical alleles.

The use of four strains is also likely to prove optimal for a reason that is not addressed in the current study but will be addressed in future iterations of the MCM protocol. It is reasonable to argue that genetic background effects may reduce or even silence the detection of a QTL, which in turn would confound the MCM algorithm (see Hitzemann et al. 2002). In the current study, the QTL was detected in HS animals, formed by an eight-way cross (Demarest et al. 2002) that included the four MCM strains (Fig. 4). These data suggest that the QTL persists despite the heterogeneous genetic background. However, at present, one can only estimate (Mott et al. 2000) which strains contribute to the QTL and the direction of the individual strain effects (remembering that for an eight-way cross, there are 36 different genotypes). Further, despite the similarity in location, the QTL detected in the eight-way cross may not be the same as that detected in the two-strain intercrosses. We argue that a solution to this problem is to construct a balanced four-way cross of the MCM strains. The four-way cross is markedly simpler (10 different genotypes), and it should be possible to determine genotype precisely or almost precisely. For the strains used in the current study, there is a subset of microsatellite markers (~300/genome) which discriminate among all four strains. In addition, we find that for most locations, there are two or three sufficiently closely linked markers to permit a strong estimation of genotype. Overall, the four-way cross should provide an effective mechanism for investigating genotype vs phenotype interactions in the context of a complex genetic background.
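The genotype counts quoted above follow from counting unordered pairs of founder alleles at a locus, n(n + 1)/2 for n founder strains. The short sketch below reproduces the arithmetic; the function names are illustrative, and the eight placeholder labels stand in for the HS founders, which are not listed here.

```python
from itertools import combinations_with_replacement


def genotype_count(n_strains):
    """Number of unordered two-allele genotypes for n founder strains."""
    return n_strains * (n_strains + 1) // 2


def unordered_genotypes(strains):
    """Enumerate the genotypes explicitly (homozygotes and heterozygotes)."""
    return list(combinations_with_replacement(strains, 2))


mcm_strains = ["B6", "C", "D2", "LP"]
# Placeholder labels only; the identity of the eight HS founders is not assumed.
hs_founders = ["founder_%d" % i for i in range(1, 9)]

print(genotype_count(len(mcm_strains)), genotype_count(len(hs_founders)))
print(unordered_genotypes(mcm_strains))
```

For the four MCM strains the enumeration gives four homozygous and six heterozygous classes, the latter corresponding to the six possible intercrosses.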

The current study again confirms that the MCM strategy can parse a broad QTL interval into regions with low and high probability of containing the QTG or QTGs. The data in Fig. 2 confirm a previous observation (Hitzemann et al. 2000) with opportunistic intercross samples and suggest the presence of two or three high probability regions. Among these, the region centered at 173–174 Mb appears to be the most QTL congruent. However, the problem with the microsatellite data used here is the relatively small number of markers across the region of interest (~100), which in turn is reduced to a handful of markers (N = 12) by the selection process. Thus, one could argue that the selection is an artifact of the Mit database. Of great interest is the observation that despite the marked differences in the origin of the SNPs and microsatellites (SSLPs), the two distributions of residual markers show a remarkable qualitative similarity and clearly (we argue) define those regions of high and low probability for containing the QTL or QTLs (Figs. 2 and 3). Similarly, Wade et al. (2002) have noted that there is a general overlap between regions of high and low SSLP polymorphisms and regions of high and low SNP density (although this is not always the case).

Two methods were used to confirm that the MCM strategy had indeed located the regions likely to contain the QTG(s). One, the B6.D2-Mtv7 congenic strain (Taylor and Frankel 1993; Ferraro et al. 2001) captured the QTL. Two, fine mapping in HS animals (Talbot et al. 1999; Mott et al. 2000) placed the QTL in the expected location. Recently, Talbot et al. (2003) have shown that this QTL is distinct from the QTL for fear conditioning. Overall, this is the second example of where the MCM approach (for reduction of the QTL interval) has been confirmed (see also Hitzemann et al. 2002).

The main advantage of the MCM strategy is that it simultaneously reduces the QTL interval and provides a mechanism for interrogating expression and sequence data. Across the region of interest (Table 1), 19% of the genes were differentially expressed at p < 10⁻⁶ or better; this is approximately five times the genome-wide rate. Previously, we argued (Belknap et al. 2001) that the rate of differential gene expression would likely be low (see, e.g., Sandberg et al. 2000), and thus, expression would be an effective filter for sorting through candidate QTGs. At least in the current example, this is not the case. Further, given that the U74Av2 chip provides coverage for only 50% of the genes in the region, there may well be ten or more genes that would meet the criteria for differential expression. The question arises as to whether or not this will be typical for QTL-rich regions such as distal Chromosome 1. At the moment, we have too few data to answer this question. However, it is perhaps of interest to note that there are only two regions on all of Chr 1 which have an SNP density similar to that found at ~173 Mb (unpublished observations).

The application of the MCM algorithm to the gene expression data reveals only one gene, Kcnj9, which meets the criteria of B6 different from C, D2, and LP and C = D2 = LP. This pattern of expression detected in the whole brain was also found to be qualitatively similar in both the extended amygdala and dorsomedial striatum; previous studies have suggested that the extended amygdala has a key role in regulating the response to a novel environment (Hitzemann and Hitzemann 1997, 1999). Kcnj9 encodes a G-protein-coupled inwardly rectifying potassium channel (known as GIRK3), which is coupled to a wide variety of neurotransmitter systems, including dopamine, opiate, and serotonin; thus, a broad effect on behavior would not be unexpected (see Torrecilla et al. 2002). Sandberg et al. (2000) noted that, compared with the 129S1 strain, Kcnj9 expression was markedly lower in the B6 strain; this pattern was found in whole brain and multiple brain regions. These data led the authors to suggest that Kcnj9 may be associated with the open-field/basal locomotor activity QTL on distal Chr 1. The data presented here confirm and extend their conclusion.
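The expression criterion (B6 different from C, D2, and LP, with C = D2 = LP) can be illustrated with a simple filter on strain means. The sketch below is an assumption-laden simplification: the gene names, expression values, and both thresholds (`tol`, `min_diff`) are hypothetical, and in practice a statistical test on replicate arrays would replace the fixed cutoffs.

```python
# Hypothetical strain means (log2 expression units); not data from the study.
EXPRESSION = {
    "gene_x": {"B6": 7.00, "C": 8.10, "D2": 8.00, "LP": 8.05},
    "gene_y": {"B6": 7.00, "C": 7.10, "D2": 8.00, "LP": 8.00},
    "gene_z": {"B6": 9.00, "C": 9.05, "D2": 9.00, "LP": 8.95},
}


def fits_expression_pattern(means, tol=0.2, min_diff=0.5):
    """True when C, D2 and LP agree within tol and B6 differs by >= min_diff.

    tol and min_diff are illustrative cutoffs, not values used in the study.
    """
    others = [means["C"], means["D2"], means["LP"]]
    non_b6_similar = max(others) - min(others) <= tol
    b6_differs = abs(means["B6"] - sum(others) / len(others)) >= min_diff
    return non_b6_similar and b6_differs


candidates = [gene for gene, means in EXPRESSION.items() if fits_expression_pattern(means)]
print(candidates)
```

A gene that differs between B6 and only some of the other strains (as in the second hypothetical entry) is rejected, which is what gives the multiple cross pattern more discriminating power than a single pairwise comparison.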

For a QTG found within the QTL interval and detected from analysis of expression data, it follows that the QTG should show apparent cis-regulation. The term “apparent” is key since the analysis strategy used to detect cis-regulation could not detect the difference between true and pseudo cis-regulation. The latter term would apply to the situation where a gene or regulatory site affects the expression of the candidate QTG but is closely linked only from the QTL perspective and may indeed lie some considerable distance from the gene of interest. The WebQTL mapping service and the BXD RI strain transcriptome database were used to determine that Kcnj9 did indeed exhibit apparent cis-regulation (Table 4). Taking Kcnj9 expression as the phenotype and performing a standard QTL analysis with permutation test (N = 1000), a peak LOD score of >12 was detected on distal Chr 1. Interestingly, evidence for modifiers was found on Chr 1 (proximal to Kcnj9) and Chrs 3, 4, 18, and 19; none of these peaks were associated with the DNA binding proteins found in Table 6.
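The logic of treating expression as a phenotype and setting a significance threshold by permutation can be sketched as follows. This is not the WebQTL implementation: it assumes single-marker regression on two-class (B or D) genotypes, a LOD computed as (n/2) log10(RSS_null/RSS_marker), and simulated data in which one arbitrarily chosen marker carries the effect. All function names, sample sizes, and effect sizes are hypothetical.

```python
import math
import random


def lod(pheno, geno):
    """Single-marker LOD: (n/2) * log10(RSS_null / RSS_marker)."""
    n = len(pheno)
    grand_mean = sum(pheno) / n
    rss_null = sum((y - grand_mean) ** 2 for y in pheno)
    rss_marker = 0.0
    for allele in set(geno):
        ys = [y for y, g in zip(pheno, geno) if g == allele]
        class_mean = sum(ys) / len(ys)
        rss_marker += sum((y - class_mean) ** 2 for y in ys)
    if rss_null == 0.0:
        return 0.0
    if rss_marker == 0.0:
        return float("inf")
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_threshold(pheno, markers, n_perm=1000, alpha=0.05, seed=0):
    """Approximate (1 - alpha) quantile of the maximum LOD over shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(pheno)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null_max.append(max(lod(shuffled, geno) for geno in markers))
    null_max.sort()
    k = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[k]


def simulate(n_strains=30, n_markers=20, causal=5, seed=1):
    """Simulated RI-like data; the causal marker index is arbitrary."""
    rng = random.Random(seed)
    markers = [[rng.choice("BD") for _ in range(n_strains)] for _ in range(n_markers)]
    pheno = [(2.0 if g == "B" else 0.0) + rng.gauss(0.0, 0.5) for g in markers[causal]]
    return pheno, markers


pheno, markers = simulate()
threshold = permutation_threshold(pheno, markers, n_perm=1000)
causal_lod = lod(pheno, markers[5])
print(round(causal_lod, 2), round(threshold, 2))
```

Taking the maximum LOD across all markers within each permutation is what makes the resulting threshold genome-wide rather than pointwise. A peak exceeding that threshold at the gene's own location indicates only apparent cis-regulation, for the reasons given above.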

Although the emphasis of the current study has been on integrating QTL and expression data, the multiple cross approach could easily be applied to interrogating coding region sequences. Marshall et al. (2002) have recently developed an algorithm for extracting coding region sequence information for the B6 and D2 strains from public and private databases and detecting putative functional polymorphisms. The algorithm can be readily applied to the A and 129 strains for which extensive sequence data are also available. Previously we (Xu et al. 2002) used the MCM to determine that coding sequence polymorphisms in Cas1 and Bdnf were not associated with a QTL for ethanol-induced activation. The QTL was originally mapped in a B6 × D2 intercross. Polymorphisms between the B6 and D2 strains in both Cas1 and Bdnf (both considered to be plausible candidate QTGs) were first detected by both direct sequencing and “in silico” approaches. Sequence data were then obtained for the C and LP strains, which did not match with the QTL results (Hitzemann et al. 2000). A similar approach was applied to Kcnj9 in the current study. Six polymorphisms between the B6 and D2 strains were detected in the coding sequence, none of which are predicted to change amino acid composition; however, the possibility that these polymorphisms may affect transcriptional efficiency cannot be discounted. Numerous polymorphisms were found between the B6 and D2 strains in the promoter region, consistent with the high density of SNPs (Fig. 3). One of these polymorphisms was predicted to disrupt a binding domain for three transcription factors (Ikaros, MZF1, and C/EBPβ). For these factors, the binding of Ikaros1 was predicted to be the most affected. Although generally associated with the hemopoietic system, Ikaros1 is also found in the brain (Yu et al. 2002).

The multiple cross method is tied to the idea that the haplotype structure of the mouse genome can be used to reduce the QTL interval and interrogate expression and sequence databases (Hitzemann et al. 2000, 2002). Wade et al. (2002) have reached an essentially similar conclusion based on their analysis of SNP structure (but see also Grupe et al. 2001). For the moment, we favor the multiple cross as opposed to the multiple strain method, given the relative lack of detailed maps for a large number of strains; however, given that collecting the data for multiple strains is technically feasible and given the advantages of working with precisely determined strain means, the multiple strain approach may ultimately prove the most effective. Whether one favors the multiple cross or the multiple strain approach, the need for alternatives to the conventional approaches for detecting QTGs is obvious. Perhaps the most widely accepted approach for moving from QTL to QTG involves capturing the QTL in a congenic strain, followed by further reduction of the QTL interval in recombinant congenic strains (Darvasi 1998). This process may well take 3–4 years, with no guarantee of success; i.e., the QTL may be lost during the production of the congenic animals. When one succeeds in capturing the QTL in a small interval, the next steps still require an effective mechanism for determining which polymorphisms within the interval are functionally important and which are not.

The multiple cross approach emphasizes the most successful aspect of QTL analysis, namely, QTL detection. For rodent behavioral traits alone, more than 120 QTLs have been detected and confirmed with LOD scores of >3 (Crabbe et al. 1999; Flint 2003), and most certainly this is an underestimate. This success in QTL detection, which has occurred almost entirely in the last 10 years, is due to several factors, including marked improvements in experimental design and analysis and the availability of dense genetic maps with easily genotyped markers (e.g., Belknap et al. 1996; Darvasi 1998; Churchill and Doerge 1994; Dietrich et al. 1994, 1996). The multiple cross approach builds from this success and is most effective when one starts with known QTLs, which focuses the genotyping effort. The goal of the multiple cross approach is not to detect new QTLs. The multiple cross approach will be aided by several tools and databases which should soon become widely available to all investigators. For example, multiple strain SNP and sequence databases will be developed (e.g., Wade et al. 2002), which in turn will provide greater detail of haplotype structure and thus greater statistical power to dissect QTLs. In addition, one of the authors (R. Hitzemann) has posted a brain gene expression database for nine inbred mouse strains, which should be publicly available at www.jax.org in the summer of 2003. However, a cautionary note must be sounded about expression databases, especially when working with the brain. Currently available databases have only a limited power to detect small but potentially functionally important changes. Further, some changes may occur only in very discrete brain regions and/or only at very discrete stages of development, and these will be missed. Thus, the current configuration of the gene expression tool will be most useful for those genes that show both global and persistent differences in expression.

Overall, the data presented here illustrate a strategy for moving in a timely fashion from a QTL to a “candidate” QTG and to a “candidate” QTN. The conversion of a “candidate” QTG and “candidate” QTN to a “proven” QTG and “proven” QTN will presumably be implemented by a variety of strategies. In some cases it will be possible to swap alleles, selectively silence gene expression, or use related molecular methods. For some cases, pharmacological approaches may be the most useful. However, regardless of the method used, we argue that the multiple cross approach has the potential to move one quickly to the final stage.

Note added in proof: Kcnj10, an ATP-sensitive inwardly rectifying potassium channel, is found 12,000 bp from Kcnj9. Hybridization of the whole brain samples to the new Affymetrix MOE430A array has revealed that Kcnj10 exhibits the Kcnj9 pattern of differential regulation, i.e., low expression in the B6 strain. However, Kcnj10 exhibits only suggestive cis-regulation, and thus we still conclude that Kcnj9 is the best candidate for the QTG.