Introduction

The processing of emotionally salient information is a fundamental feature of the mammalian nervous system and is likely to be involved in a variety of psychiatric disorders (Gershenfeld and Paul 1998). Fear learning has been extensively studied for several decades in both mice and humans (LeDoux 2003) and provides a framework for the investigation of emotional regulation. In particular, there is a positive relationship between fear learning as measured in a laboratory setting and anxiety in humans (Lissek et al. 2005), suggesting that fear learning provides a means of probing the activity of brain circuits related to anxiety. In an effort to identify the genetic basis of emotionality in mice, a number of previous studies have examined fear learning (Caldarone et al. 1997; Gershenfeld and Paul 1997; Owen et al. 1997; Ponder et al. 2007a, 2007b; Radcliffe et al. 2000; Talbot et al. 2003).

Short-term, bi-directional selection can be used to create lines of mice with differences in the selection phenotype while limiting inbreeding and fixation of trait-irrelevant alleles (Belknap et al. 1997). Short-term selected lines (STSL) are thus particularly well suited to assess correlated phenotypes and can be used to directly address questions about the nature of the selected phenotypic difference (Phillips et al. 2002). We and others have previously used short-term selection to produce reliable differences in fear conditioning in a cross between C57BL/6J and DBA/2J (Ponder et al. 2007a; Radcliffe et al. 2000). In the present study we created an STSL for contextual fear conditioning based on an F2 cross between C57BL/6J (B6) and A/J (AJ) mice.

After creating the STSL, we used a number of tests to determine whether additional phenotypes were correlated with selection. The effects of the number of CS–US pairings and of different shock intensities on fear conditioning were examined to determine whether the difference in fear conditioning was the result of specific experimental parameters. Next, extinction of freezing to context was examined in both lines. In addition, fear potentiated startle was used to gauge fear learning in a freezing-independent manner. We also measured acute sensitivity to foot shock to determine whether selection had altered shock sensitivity. We then evaluated fear conditioning in isogenic strains that had opposite tyr alleles to directly evaluate the role of tyr in fear conditioning. Finally, we used a combination of QTL analysis and gene-expression microarrays to identify other genes that are differentially expressed between the selected lines.

Materials and methods

Environment and housing

All experiments were performed in accordance with the National Institutes of Health guidelines for the care and use of laboratory animals and were approved by Columbia University’s (most studies) or University of Chicago’s (studies of B6-albino and A/J-pigmented mice) Institutional Animal Care and Use Committees. Mouse colony rooms were maintained on a 12/12 light/dark cycle with lights on at 07:00 h. Two to five same-sex littermates were housed in clear plastic cages with standard corn-cob type bedding. All mice were maintained with food and water ad libitum, except during testing. Certain procedures were followed in all behavioral experiments: testing was conducted during the light phase between 09:00 and 17:00 h, and mice were brought into the testing room in their home cages and allowed to adapt for a minimum of 30 min before testing.

Fear conditioning

Fear conditioning procedures were identical to those described previously in Ponder et al. (2007a). Fear conditioning chambers obtained from Med Associates (St. Albans, VT) had inside dimensions of 29 cm × 19 cm × 25 cm with metal walls on each side, clear plastic front and back walls and ceilings, and stainless steel bars on the floor. A fluorescent light provided dim illumination (∼3 lux) and a fan provided a low level of masking background noise. Chambers were cleaned with 80% ethanol (New York) or 10% isopropanol (Chicago) between animals. Behavior was recorded with digital video and analyzed with FreezeFrame software from Actimetrics (Evanston, IL). Mice were between 7 and 14 weeks old on the first day of testing.

Testing for fear conditioning was conducted over 3 days. On test day 1, baseline activity was measured, beginning 30 s after the mice were placed into the test chambers and ending 150 s later. The mice were then exposed twice to the conditioned stimulus (CS), which consisted of an 85 dB, 3 kHz tone that persisted for 30 s and co-terminated with the unconditioned stimulus (US), which was a 2-s, 0.5 mA foot shock, with a 30 s inter-trial interval (ITI) between pairings.

On day 2, the testing environment was identical to day 1; however, neither tones nor shocks were presented. Freezing in response to the test chamber (context) was measured beginning 30 s after the mice were placed into the test chambers and ending 150 s later. Selective breeding was based on a freezing score, which was the percentage of time spent freezing on day 2 minus the percentage of time spent freezing on day 1, both during the 150-s periods mentioned above.
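The freezing score defined above can be illustrated with a minimal sketch. The function names and the example durations are hypothetical and are not taken from the FreezeFrame software; only the 150-s window and the day 2 minus day 1 subtraction come from the text.

```python
def percent_freezing(freezing_seconds: float, window_seconds: float = 150.0) -> float:
    """Percentage of the scoring window spent freezing."""
    if not 0.0 <= freezing_seconds <= window_seconds:
        raise ValueError("freezing time must lie within the scoring window")
    return 100.0 * freezing_seconds / window_seconds


def freezing_score(day1_freezing_s: float, day2_freezing_s: float,
                   window_seconds: float = 150.0) -> float:
    """Day 2 (context) % freezing minus day 1 (pre-training) % freezing."""
    return (percent_freezing(day2_freezing_s, window_seconds)
            - percent_freezing(day1_freezing_s, window_seconds))


# Hypothetical mouse: 6 s frozen at baseline, 60 s frozen during the context test.
example_score = freezing_score(6.0, 60.0)
print(example_score)
```

Because the score is a difference of percentages, a mouse that is simply inactive on both days receives a score near zero rather than a high score.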

On day 3, the context was altered in several ways: a different experimenter wore a different style of gloves, the transfer cages had no bedding, the metal shock grid was covered with a white plastic floor, a bent white plastic wall was inserted into the test chamber, a yellow light filter was placed over the chamber lights, chambers were cleaned with 0.1% acetic acid solution, and the vent fan was partially obstructed to change the background noise. The CS was presented at the same times as on day 1, but no foot shock was administered. The “freezing to tone” score is an average of the percentage of time spent freezing during the two 30-s CS presentations. Data from day 3 were not used to determine which animals would be selected as breeders.

Selective breeding

Procedures for mass selection to create STSL are described in Ponder et al. (2007a). The foundation population was an F1 cross between C57BL/6J (B6) female and A/J (AJ) male mice obtained from The Jackson Laboratory (Bar Harbor, ME). The B6AJF1 mice were intercrossed at Columbia University to obtain an F2 population. One hundred and forty-four F2 mice (72 of each sex) were tested for fear conditioning (FC). From these mice, the 12 males and 12 females with the highest freezing scores were bred to create the high selected line regardless of family origin (brother-sister breeder pairs were avoided). In addition, three alternate breeder pairs were created from the mice with the next highest freezing scores, and their offspring were used for the subsequent line only when primary breeder pairs failed to produce offspring. The same procedure was repeated with the lowest freezing animals from the F2 population in order to create the low selected line. From the high-freezing breeder pairs, 96 offspring from 12 families (as close to four male and four female animals per family as possible) constituted the first selected generation (denoted S1). Mice were selected in the same manner until the S4 litter had been produced. The procedure to produce the low selected line was identical, except that mice with the lowest freezing scores were bred in each generation. The line names reflect the number of generations removed from the F2 population, and their freezing behavior (L for low, H for high), so that the offspring of the 4th selected breeders for the high freezing line were called S4H. Forty-eight mice per line from the S4 generation were phenotyped for FC. All phenotypes were measured in naïve mice between 7 and 14 weeks old. Equal numbers of male and female mice were tested. The lines were terminated after the S4 generation.
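The mass selection step can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the `Mouse` record, the function names, and the greedy pairing rule are hypothetical, since the text specifies only that the 12 most extreme mice of each sex were chosen and that brother-sister pairs were avoided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mouse:
    mouse_id: str
    sex: str        # "M" or "F"
    family: str     # litter of origin
    score: float    # freezing score (day 2 % minus day 1 %)


def select_breeders(mice, n_per_sex=12, high=True):
    """Pick the n most extreme mice of each sex, ignoring family origin."""
    chosen = {}
    for sex in ("M", "F"):
        ranked = sorted((m for m in mice if m.sex == sex),
                        key=lambda m: m.score, reverse=high)
        chosen[sex] = ranked[:n_per_sex]
    return chosen


def pair_breeders(males, females):
    """Greedy pairing that skips brother-sister pairs (assumed rule)."""
    available = list(females)
    pairs = []
    for male in males:
        for female in available:
            if female.family != male.family:
                pairs.append((male, female))
                available.remove(female)
                break
    return pairs
```

Calling `select_breeders(mice, high=False)` gives the corresponding low-line breeders. A greedy pairing can leave a mouse unpaired when only siblings remain, which is one situation in which alternate breeder pairs would be needed.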

Effect of number of CS/US pairings on freezing behavior

The effect of the number of CS–US pairings on the freezing exhibited on day 2 was tested by using different numbers of tone-shock pairings (0, 1, 2, 4 and 8) on the training day in an approximately equal number of male and female mice from the S3 generation (n = 40 per line; age = 8–14 weeks). All sessions were identical to those described above, except that the session length was shortened or extended to accommodate the different number of CS–US pairings. On day 1 the 0 CS–US group was presented with two 30-s tones but received no shocks. Day 2 lasted 5 min regardless of day 1 treatment group.

Effect of shock magnitude on learning

To test the effect of the shock magnitude (in mA) on the freezing behavior exhibited on day 2, we varied the intensity of the shocks presented on day 1 in male and female mice from the S4 generation (n = 42–45 per line; age = 8–10 weeks). All groups contained approximately equal numbers of male and female mice. The training session was identical to the session used for selection except that different shock intensities were used (0, 0.1, 0.3, 0.5, 0.7 or 0.9 mA; n = 8 per line per intensity).

Extinction of freezing to context

Extinction of freezing to context was tested in male and female mice from the S4 generation (n = 12 per sex/per line; age = 7–12 weeks). The training session was identical to the session used for selection except that a “no-shock” control group was also included. The “no-shock” control group received a training session that was identical to day 1 except that it did not include any shocks. The testing sessions for days 3–9 were identical to day 2. Corrected freezing scores for each day were obtained by subtracting the baseline freezing on training day from time spent freezing on each subsequent testing day.

Fear potentiated startle

Fear potentiated startle (FPS) was tested in male and female mice from the S4 generation (n = 36 low; n = 35 high; age = 7–14 weeks). Two-thirds of the mice in this study were male and only one-third were female as a result of availability; this ratio was uniform across treatment groups and lines, and none of our analyses detected main effects or interactions involving sex. The protocol and apparatus are similar to those that have been described previously (Ponder et al. 2007a). On day 1 mice from each line (low or high) were randomly presented with one of three training sessions. The paired training session (P) consisted of 20 light + shock pairings (30-s chamber light co-terminating with 0.5-s 0.14 mA shock). The unpaired training session (UP) consisted of 10 light + shock pairings, 10 shock alone trials (0.5-s 0.14-mA shock) and 10 light alone trials (30-s chamber light, no shock). The no shock training session (NS) consisted of 20 light alone trials; no shocks were presented to mice in this treatment group. On day 2 all mice were presented with the same set of stimuli, which consisted of 12 light plus startle trials and 12 trials that consisted of the startle stimulus alone, in the absence of the light cue. The P and NS groups consisted of 12 mice, and the UP group 11 mice. The maximum response for each trial in the 65 ms following the onset of the startle stimulus was used for all calculations.

Shock sensitivity

Sensitivity to the foot shock was assessed using the startle apparatus described above in male and female mice from the S4 generation (n = 14 per line; age = 11–13 weeks). After a 5-min adaptation period, five startle trials were presented, separated by an average of 15 s, with pseudo-random ITI. Each startle trial consisted of a 40-ms burst of 100-dB white noise. Next, a total of 25 shocks were presented in five blocks that consisted of one shock of each of the five intensities (0.1, 0.3, 0.5, 0.7, and 0.9 mA; all shocks were 500 ms in duration) in a pseudo-random order. Each shock trial was separated by a pseudo-random ITI that averaged 90 s. After the last shock, a final block of five startle trials was presented, after which the session ended. The startle response to each shock trial was measured in the same manner as in FPS except that the response window was 500 ms and both the maximum and average responses over that time window were used for analysis.

Fear conditioning in B6, B6-albino, AJ and AJ-pigmented strains

Mice for these studies were obtained from The Jackson Laboratory and bred in house at the University of Chicago. These were the only studies conducted at the University of Chicago. All equipment, analysis and personnel were identical to those used for the selection studies conducted at Columbia University. Fear conditioning was tested in C57BL/6J (B6), C57BL/6J−Tyrc−2J− (B6-albino), A/J (AJ) and A.B6−Tyr+/J (AJ-pigmented). B6-albino mice are a B6 inbred strain with a point mutation inactivating the tyrosinase gene, and AJ-pigmented mice are a congenic A/J strain with the B6 allele for tyrosinase. Testing procedures were identical to those used for selection except that we employed a 0.7 mA shock for some of the studies, in an effort to elicit greater freezing behavior from the low-freezing B6 mice. A total of 20 B6, 20 B6-albino, 22 AJ and 21 AJ-pigmented mice were used at the 0.5 mA shock intensity. For the 0.7 mA shock intensity, 20 B6, 20 B6-albino, 13 AJ and 6 AJ-pigmented mice were used. Males and females were approximately equally represented in all groups.

Gene expression

We compared gene expression between the high and low selected lines using Affymetrix MOE430v2 microarrays. Gene expression was measured in the amygdala or hippocampus of experimentally naïve male mice from the S4 generation. We collected a total of 72 samples, 36 per selected line, with 18 representing the hippocampus and the other 18 representing the amygdala. Collection of tissue has been described previously (Ponder et al. 2007a).

A second microarray experiment measured gene expression differences in the hippocampus of 7-week-old B6 and B6-albino inbred strains, again using Affymetrix MOE430v2 microarrays. Total RNA was isolated from 18 naïve male mice, 9 per inbred strain. We did not examine gene expression in the amygdala in these mice because we had previously observed much greater experimental noise in amygdala samples, which we believe is partially due to difficulties in reproducibly micro-dissecting the same piece of amygdala relative to the consistency of hippocampal dissections.

In both experiments, total RNA from three male siblings was pooled and hybridized to a single Affymetrix MOE430v2 array. The concentration of the purified total RNA was quantified using a spectrophotometer and combined such that each individual sample was represented equally in each pool. Total RNA samples were sent to the NINDS/NIMH Microarray Consortium (http://arrayconsortium.tgen.org) where they were processed, labeled and hybridized to the arrays using standard Affymetrix protocols.

Array data were normalized within each experiment using Bioconductor 1.8 and the GCRMA algorithm (Gentleman et al. 2004; Irizarry et al. 2003). The normalized expression levels from the S4 selected mice were then subject to a two-way ANOVA with brain region and line as factors. The normalized expression levels from the B6 and B6-Albino mice were subject to a two-tailed t-test. We then used the program QValue (http://faculty.washington.edu/~jstorey/qvalue) to calculate q-values from the p-values from the ANOVAs and t-tests (Storey and Tibshirani 2003) and used a q < 0.05 threshold for significance. Present, marginal or absent (P/M/A) calls were calculated in R using the “mas5calls” command to verify all probe sets were expressed. In the selected line comparison, the fold change for each significant transcript was determined for each brain region to find the probe sets with largest differences in expression.
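The conversion of p-values into q-values can be illustrated with a simplified stand-in. The sketch below is not the QValue program: it implements the Benjamini-Hochberg adjustment, which corresponds to a q-value calculation in which the proportion of true null hypotheses is fixed at 1 rather than estimated from the data, so it is somewhat more conservative. The function name and the example p-values are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Conservative stand-in for Storey q-values: the proportion of true
    nulls (pi0) is assumed to be 1 instead of being estimated.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for four probe sets.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = bh_q_values(example_p)
significant = [i for i, value in enumerate(example_q) if value < 0.05]
print(example_q, significant)
```

In this toy example only the first probe set passes the q < 0.05 threshold, although three of the four unadjusted p-values are below 0.05.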

QTL mapping

Breeders from the S3 generation were genotyped by KBiosciences (Hoddesdon, Hertfordshire, UK) with 165 informative markers selected from Petkov et al. (2004). We searched for quantitative trait loci (QTL) by comparing the frequency of each marker allele in the high and low selected populations from each generation to the probability distribution defined by genetic drift. Allele frequencies that are unlikely to be the result of genetic drift are presumed to reflect the effects of selective breeding (Belknap et al. 1997).
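The logic of testing allele frequencies against drift can be sketched with a small Monte Carlo simulation. This is not the analytic method of Belknap et al. (1997); it is a hypothetical illustration that assumes simple binomial sampling of alleles among 24 breeders per line per generation, a starting frequency of 0.5 in the F2, and three rounds of sampling. All function names and default parameters are assumptions.

```python
import random


def drift_frequency(p0, n_breeders, generations, rng):
    """Allele frequency after repeated binomial sampling of 2N allele copies."""
    p = p0
    for _ in range(generations):
        copies = sum(rng.random() < p for _ in range(2 * n_breeders))
        p = copies / (2 * n_breeders)
    return p


def drift_p_value(observed_diff, p0=0.5, n_breeders=24, generations=3,
                  n_sim=1000, seed=0):
    """Fraction of drift-only simulations in which two independent lines
    differ in allele frequency at least as much as observed."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sim):
        high = drift_frequency(p0, n_breeders, generations, rng)
        low = drift_frequency(p0, n_breeders, generations, rng)
        if abs(high - low) >= abs(observed_diff):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_sim + 1)
```

A marker whose frequency difference between the high and low lines is rarely matched in the drift-only simulations would be treated as a candidate QTL. The resolution of the estimate is limited by `n_sim`, so genome-wide thresholds as small as those reported in the Results would require an analytic approach or far more simulations.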

Results

Creation of high and low fear conditioning selected lines

Bi-directional selection for freezing score (defined as the difference between freezing to context on day 2 and pre-training freezing observed on day 1) resulted in a highly significant divergence between the high and low lines (Fig. 1a). A three-way ANOVA with line, generation and sex as factors showed significant effects of line (F [1,651] = 769.8; p < 0.001), generation (F [3,651] = 7.09; p < 0.001) and sex (F [1,651] = 7.82; p < 0.01). Females froze more than males. The only significant interaction was between generation and line (F [3,651] = 8.94; p < 0.001). The difference between the lines was significant from the S1 generation (F [1,190] = 120.6; p < 0.001) and behavior progressively diverged in each subsequent generation. To determine the significance of the response to selection, one-way ANOVAs were performed within each line for generation (F2–S4). There was a significant effect of generation in the low line (F [4,471] = 37.0; p < 0.001) and in the high line (F [4,474] = 23.9; p < 0.001).

Fear conditioning was tested in the F2 population in three 2-h sessions with 48 animals each, with a 30 min break between each session. A one-way ANOVA of freezing score by session number (1st, 2nd, 3rd) showed no effect of time of day on freezing behavior (F [2,141] = 1.76; p = 0.18).
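The degrees of freedom reported for the session analysis follow directly from the design: three sessions give 3 − 1 = 2 between-group degrees of freedom, and 144 mice in three groups give 144 − 3 = 141 within-group degrees of freedom. A minimal sketch of the one-way ANOVA calculation is given below; the function name is hypothetical and the scores are placeholders, not the study data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder scores only; the layout (3 sessions x 48 mice) follows the text.
sessions = [[float((i * 7 + s) % 11) for i in range(48)] for s in range(3)]
f_stat, df_between, df_within = one_way_anova(sessions)
print(df_between, df_within)  # degrees of freedom: 2 and 141
```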

Fig. 1

Response to selection. (a) Freezing score (% freezing to context − % pre-training freezing), (b) % freezing to tone, (c) % pre-training freezing, and (d) % freezing to the altered context. F2 is a single population common to both selected lines, represented by two bars. Each other bar is a single high or low selected generation. Bars are mean ± SE. * p < 0.05

Selection for freezing to context also altered freezing in response to tone (Fig. 1b). The freezing to tone was analyzed with a three-way ANOVA for sex, generation (S1–S4) and line. There was a significant effect of line (F [1,648] = 193; p < 0.001), generation (F [3,648] = 6.69; p < 0.001) and sex (F [1,648] = 16.2; p < 0.001). Females froze more than males. Although there were no significant interactions, we chose to perform one-way ANOVAs for generation within each line; these tests showed significant effects of generation in both the low line (F [4,471] = 17.4; p < 0.001) and the high line (F [4,471] = 5.4; p < 0.001). Had we applied a correction for freezing in the altered context (prior to the presentation of the tone; Fig. 1d), the two selected lines would not have been nearly as divergent.

Pre-training freezing changed in response to selection (Fig. 1c), which suggests a difference unrelated to learning. A three-way ANOVA for sex, line and generation found no significant effect of sex, so data were then collapsed on sex for further analysis of pre-training freezing. A two-way ANOVA of generation (S1–S4) and line showed a main effect of line (F [1,664] = 73.4; p < 0.001) and generation (F [3,664] = 14.7; p < 0.001) and their interaction (F [3,664] = 5.31; p < 0.01). Planned comparisons showed a difference in pre-training freezing in the S1 (F [1,664] = 4.0; p < 0.05), the S2 (F [1,664] = 18.1; p < 0.001), the S3 (F [1,664] = 57.4; p < 0.001) and the S4 (F [1,664] = 14.2; p < 0.001). These differences confound the interpretation of the selection phenotype as a measure of learning.

Selection also affected freezing in the altered context (Fig. 1d). A three-way ANOVA with line, generation and sex as factors, showed significant effects of line (F [1,648] = 230.1; p < 0.001), generation (F [3,648] = 7.09; p < 0.001) and sex (F [1,648] = 5.59; p < 0.05; females froze more), with a significant interaction between line and generation (F [3,648] = 41.4; p < 0.001). A one-way ANOVA with generation showed a significant effect in the low line (F [4,471] = 59.3; p < 0.001) and in the high line (F [4,471] = 54.8; p < 0.001).

Effect of number of CS/US pairings on freezing behavior

The difference in freezing behavior between the selected lines was consistent over a range of tone-shock pairings (Fig. 2). In a three-way ANOVA there was a significant effect of line (F [1,60] = 125.0; p < 0.001) and number of shocks (F [4,60] = 19.8; p < 0.001), but not sex. There was a significant interaction between line and shock (F [4,60] = 6.61; p < 0.001) and sex and shock (F [4,60] = 2.79; p < 0.05). Separate one-way ANOVAs were then used to test the effect of number of shocks in each line individually. There was a significant effect of number of shocks in both the low line (F [4,35] = 3.02; p < 0.05) and the high line (F [4,35] = 23.6; p < 0.001). In the low line one shock (F [1,35] = 1.28; p = 0.27) was not different from no shocks, but two shocks (F [1,35] = 4.4; p < 0.05), four shocks (F [1,35] = 10.9; p = 0.01) and eight shocks (F [1,35] = 4.0; p = 0.05) were significantly different. In the high line one shock (F [1,35] = 63.9; p < 0.001), two shocks (F [1,35] = 50.7; p < 0.001), four shocks (F [1,35] = 49.2; p < 0.001), and eight shocks (F [1,35] = 67.6; p < 0.001) were all different from no shocks. Planned comparisons between the lines were not different for no shocks (F [1,60] = 0.94; p = 0.33), but were different for one shock (F [1,60] = 53.1; p < 0.001), two shocks (F [1,60] = 32.4; p < 0.001), four shocks (F [1,60] = 17.2; p < 0.001) and eight shocks (F [1,60] = 47.4; p < 0.001).

Fig. 2

Freezing score as a function of the number of CS/US pairings. Mice from the high and low lines were exposed to 0, 1, 2, 4, or 8 shocks of 0.5 mA. Bars are mean ± SE. * p < 0.05

Effect of shock magnitude on learning

The difference in freezing behavior between the selected lines was observed at all shock intensities tested in the S4 generation (Fig. 3). In a three-way ANOVA for line, sex and shock, sex had no significant main effect or interactions, so sex was collapsed in further analyses. A two-way ANOVA revealed a significant effect of line (F [1,75] = 157.9; p < 0.001), shock intensity (F [5,75] = 19.3; p < 0.001) and their interaction (F [5,75] = 4.1; p < 0.001). One-way ANOVAs were then used to test the effect of increasing shock intensity in each line individually. There was a significant effect of shock intensity in the low line (F [5,39] = 11.1; p < 0.001) and the high line (F [5,36] = 10.6; p < 0.001). Increasing shock intensity resulted in increased freezing in both lines. In the low line 0.1 mA (F [1,75] = 0; p = 0.96), 0.3 mA (F [1,75] = 0.1; p = 0.78), 0.5 mA (F [1,75] = 1.5; p = 0.22) were not different than no shock, but 0.7 mA (F [1,75] = 8.5; p < 0.01) and 0.9 mA (F [1,75] = 11.9; p < 0.001) were. In the high line the 0.1 mA shock was not different from no shock (F [1,75] = 2.6; p = 0.11), but the 0.3 mA (F [1,75] = 20.4; p < 0.001), 0.5 mA (F [1,75] = 37.4; p < 0.001), 0.7 mA (F [1,75] = 35.5; p < 0.001) and 0.9 mA shock intensities (F [1,75] = 55.7; p < 0.001) were different from no shock. Line was significantly different at all shock intensities, including the 0.0 mA (no shock) intensity.

Fig. 3

Freezing score as a function of shock intensity. The control group (marked 0) was not exposed to any shocks. All other groups were exposed to two shocks of the indicated intensity. Bars are mean ± SE. * p < 0.05

Extinction of freezing to context

Despite the difference between the selected lines in freezing behavior, the number of no-stimulus presentations required for extinction of freezing was nearly identical (Fig. 4). The high selected line showed much higher freezing on day 2, but shocked and un-shocked mice were similar after 4 days of exposure to the context. The low selected line showed much less freezing behavior after training on day 2, but it did show elevated freezing for the 3 days subsequent to training. An initial four-way ANOVA for sex, treatment, line and day showed no significant effects or interactions with sex. A three-way ANOVA of treatment, line and day revealed a significant three-way interaction between treatment, line and day (F [7,308] = 4.4; p = 0.001).

Fig. 4

Extinction of contextual fear conditioning. Solid lines are groups exposed to foot shock. Dashed lines are control groups not exposed to foot shock. The high freezing line is in black and the low freezing line in grey. Each point represents the freezing to context on that day minus pre-training freezing on day 1. Points are mean ± SE. * p < 0.05

Planned comparisons to investigate the source of the line × treatment × day interaction first compared the two treatment groups in a single line on day 2. Mice exposed to shock froze more on day 2 in the low line (F [1,22] = 8.0; p < 0.01) and high line (F [1,22] = 81.3; p < 0.001), but again the high line froze more than the low line in the no shock treatment group (F [1,22] = 13.7; p < 0.01). The two treatment groups were first equivalent on day 5 in the low freezing line and on day 6 in the high freezing line, indicating only a minor difference in extinction. The rate of extinction in the high freezing line is driven as much by the increasing inactivity of the no-shock control as it is by the daily decrease in freezing of the shocked group. The no-shock control for the high freezing line “freezes” much more than either treatment group in the low line on day 3 and beyond. This is another indication of differential activity levels between the selected lines.

Fear potentiated startle

Fear potentiated startle (FPS) measures fear learning by assessing the increase in the startle response in the presence of a CS after it has been paired with the US, relative to the startle response in the absence of that CS. Fear potentiated startle was exhibited by the low freezing line in the paired treatment group (P), but was not observed in the P group from the high line or in the no shock (NS) or the unpaired (UP) negative control groups from either line. A two-way ANOVA for line and treatment showed no effect of line, but a significant effect of treatment (F [2,65] = 4.93; p = 0.01). The low line showed a trend towards greater FPS (Fig. 5), though the effect was non-significant (F [1,22] = 2.66; p = 0.12). This non-significant trend was opposite to the expected direction; we would have predicted that fear conditioning as measured by freezing to context would have been correlated with fear potentiated startle, as had been observed in our previous study (Ponder et al. 2007a).

Fig. 5

Fear potentiated startle. Grey bars are high selected mice and white bars are low selected mice. Two control groups, no shock (NS) and unpaired (UP), are included along with the fear conditioned potentiated groups (P). Bars are mean ± SE.

Shock sensitivity

Selection for high or low freezing could have changed the frequency of alleles that influence sensitivity to shock, resulting in differential fear conditioning due to differential sensitivity to the US, rather than a difference in learning. We measured the acute startle response at each of five shock intensities in both high and low selected mice. There was a main effect of sex for both the max (F [1,24] = 9.45; p < 0.01) and average measure (F [1,24] = 12.4; p < 0.01) but no significant interaction involving sex and line, so sex was removed for all further analyses (males showed somewhat greater shock sensitivity compared to females). There was no main effect of line, or interaction between line and shock intensity, for the max measure (Fig. 6a); however, for the average measure there was a significant interaction between line and shock intensity (F [4,104] = 4.59; p < 0.01), which reflected greater shock sensitivity in the low line at all shock intensities (Fig. 6b). While selection was associated with altered sensitivity to foot shock as measured by the average (but not the max) measure, the low line appeared to be more sensitive than the high freezing mice, which is opposite to the observed fear learning differences. There was a clear main effect of shock intensity for both the max (F [4,104] = 117; p < 0.001) and average measures (F [4,104] = 197; p < 0.001), indicating greater response among both lines to increasing shock intensities. Because the low line appeared more sensitive to the shock, it appears that shock sensitivity is unrelated to the robust response to selection observed in these mice.

Fig. 6

Shock sensitivity. Sensitivity to shock was determined by measuring the acute motor response to shocks. (a) Max response to shock. (b) Average response to shock. Points are mean ± SE.

Fear conditioning in albino and congenic strains

We tested the inbred strains used to create the selected lines, along with an albino B6 strain and an AJ strain with a functional tyrosinase allele (Fig. 7), at shock intensities of 0.5 and 0.7 mA. An ANOVA for line with B6 versus B6-albino showed no significant difference in freezing to context at the 0.5 mA shock intensity (F [1,38] = 0.41; p = 0.53) but a significant decrease in freezing to context in the B6-albino mice relative to the B6 mice at the 0.7 mA shock intensity (F [1,38] = 4.15; p < 0.05). An ANOVA for line with the AJ versus AJ-pigmented strain showed no significant difference for either the 0.5 mA shock intensity (F [1,41] = 1.52; p = 0.22) or the 0.7 mA shock intensity (F [1,17] = 0.00; p = 0.97). We used the 0.7 mA shock intensity because the B6 mice have very low freezing at 0.5 mA, and thus it was difficult to detect the expected decrease in freezing caused by the albino allele when it was expressed on the B6 background.

Fig. 7

Freezing score in mice with different tyr alleles. Behavior of B6, B6-albino, AJ, and AJ-pigmented mice trained with 0.5 or 0.7 mA shocks. Bars are mean ± SE. * p < 0.05

Gene expression

The results of a two-way ANOVA of microarray data from the S4 selected lines for the factors brain region and line identified 364 probe sets that showed a main effect of line and that had q-values <0.05. Of the 364 probe sets identified using the ANOVA, 76 had a fold change of 1.5 or greater in at least one brain region (Table 1). The list contained seven genes represented by two probe sets: coiled-coil domain containing 91 (Ccdc91), galactokinase 2 (Galk2), glial fibrillary acidic protein (Gfap), heat shock protein 1 (Hspb1), myocyte enhancer factor 2C (Mef2c), Musashi homolog 2 (Msi2) and suppressor of Ty 16 homolog (Supt16h), such that the 76 probe sets corresponded to 69 unique genes.
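The fold-change filter can be sketched as follows. The sketch assumes that the normalized expression values are on a log2 scale and that fold change is taken as the ratio of line means in whichever direction is larger; neither assumption is stated in the text, and the function names and example values are hypothetical.

```python
from statistics import mean


def fold_change(log2_high, log2_low):
    """Symmetric fold change between two groups of log2 expression values."""
    diff = mean(log2_high) - mean(log2_low)
    return 2.0 ** abs(diff)


def passes_fold_filter(by_region, threshold=1.5):
    """True if the fold change reaches the threshold in at least one region.

    by_region maps a region name to a (high-line values, low-line values) pair.
    """
    return any(fold_change(high, low) >= threshold
               for high, low in by_region.values())


# Hypothetical probe set: a two-fold difference in hippocampus only.
probe = {
    "amygdala": ([8.0, 8.1, 7.9], [7.8, 7.9, 7.7]),
    "hippocampus": ([9.0, 9.0, 9.0], [8.0, 8.0, 8.0]),
}
print(passes_fold_filter(probe))
```

Applying the filter after the q-value threshold, as described above, restricts the list to probe sets that are both statistically significant and large in magnitude in at least one brain region.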

Table 1 Differentially expressed genes in the amygdala and hippocampus of high and low selected lines

The results of the t-test of microarray data from the B6 and B6-albino inbred strains did not identify any genes with a main effect of line and a q-value <0.05. This experiment was fundamentally different from the selected line comparison because all the mice within each inbred strain were genetically identical. RNA from three identical mice was combined for each chip, and each group was measured with three chips. We used q-values to address the multiple testing issues associated with the 45,101 probe sets contained on the MOE430 2.0 array; however, had we found a gene that was non-significantly differentially expressed between these two inbred strains but was significantly differentially expressed between the selected lines, we would have accepted an uncorrected p-value of <0.05. Even with this relaxed threshold, none of the genes detected in the selected lines showed expression differences between the B6 and B6-albino mice.

Quantitative trait loci (QTL)

We used the method of Belknap et al. (1997) to calculate p-values for each marker that was typed in the selected lines (Table 2). Marker rs3686613, which is on Chromosome 7 and within 4 Mb of the tyr allele, was significant at the genome-wide level with p < 9.6 × 10^−10. Chromosomes 2, 3, 5, 10, 13 and 14 can be considered suggestive at the genome-wide level with p < 0.0034.

Table 2 QTLs based on allele frequency changes following selection

Discussion

Short-term selection for fear conditioning in a cross between B6 and AJ strains produced a divergence in behavior after the first generation of selection, and behavior continued to diverge over four generations of selection (Fig. 1). While the difference in freezing to context was robust, a number of correlated phenotypes suggest that the observed difference might not be completely accounted for by a difference in learning. A difference in the pre-training activity was observed between the selected lines in the later selected generations. Freezing is used to measure fear learning, so differences in pre-training activity might be expected to confound this measure of learning. This was not the case in our prior selection study in which B6 and D2 mice were used as the foundation for selection (Ponder et al. 2007a). Additionally, fear potentiated startle did not differ between the selected lines; this suggests that the response to selection was specific to freezing as a measure of learning and raises concerns as to whether freezing can be interpreted unambiguously in these lines. Again, this result differed from our prior selection experiment (Ponder et al. 2007a). Microarray and QTL analysis identified many significantly differentially expressed genes and chromosomal regions. While there was strong initial reason to think the QTL on chromosome 7 was due to the albino allele itself, we identified only modest evidence in support of this idea. Microarray studies showed that none of the gene expression differences were related to the tyrosinase allele. Thus, while there are reasons to be cautious about the nature of the behavioral differences between the high and low selected lines, we have identified QTL regions and genes that may be involved in fear learning.

Evidence of non-learning related changes in response to selection

Several tests indicated a difference in the activity levels between the two selected lines. A difference was observed in pre-training freezing in the first selected generation, and the difference became more pronounced with each subsequent generation. The experiments for number of tone-shock pairings (Fig. 2), shock magnitude (Fig. 3), and extinction (Fig. 4) all included no-shock controls for each line, and each indicated that the high freezing line was less active than the low freezing line in the absence of fear conditioning. Thus, part or all of the response to selection may be due to differences in activity rather than differences in learning.

Whereas our prior study of lines selected for differences in freezing behavior (Ponder et al. 2007a) showed a corresponding difference in FPS, no such differences were observed in the present study (Fig. 5). This raises questions as to whether there were true differences in fear learning between the two lines. Unlike rats, mice do not readily show FPS; consistent with this, neither the high nor the low line showed a difference between the paired group and either of the two control groups. It is therefore difficult to interpret these data because, unlike in our prior study (Ponder et al. 2007a), the parameters we selected failed to induce significant FPS in either line.

A test of the shock sensitivity in the startle apparatus suggests a difference in shock sensitivity between the selected lines (Fig. 6), but the low freezing line appears more sensitive to the shock than the high freezing line. Since it was the high freezing mice that were less sensitive to the shock, this difference is unlikely to play a role in the response to selection.

Microarray and QTL studies in the selected lines

A number of the genes that are differentially expressed are located on chromosomes where we also identified QTLs. A subset of these genes could be differentially expressed because cis-acting expression QTLs (eQTLs) located on these chromosomes cause the behavioral QTLs. We have previously used such evidence to implicate Csnk1e in methamphetamine response (Palmer et al. 2005), and have gone on to show that this same gene is involved in human sensitivity to amphetamine (Veenstra-VanderWeele et al. 2006). Other genes may also have eQTLs and be differentially expressed simply because they are in linkage disequilibrium with the true QTL allele as a result of the few crossovers present after only four generations of recombination, and thus were altered by selection pressure in the same manner as the SNP alleles that we used as markers. Such undesirable linkage is especially prevalent in short-term selected lines derived from F2 crosses between inbred strains. Thus, many of the genes in our list are likely to be unrelated to fear conditioning. As an example of this phenomenon, 15 of the 69 genes in Table 1 are located on chromosome 7.

Three genes that were differentially expressed between the high and low selected lines had also been identified in our previous study of selected lines (Ponder et al. 2007a): suppressor of Ty 16 homolog (Supt16h), ectonucleoside triphosphate diphosphohydrolase 4 (Entpd4; aka Lysal1) and diacylglycerol kinase, gamma (Dgkg). These genes may provide particularly interesting candidates for future study.

The results of the QTL analysis show that the region of chromosome 7 that contains the tyrc allele was a factor in mediating the response to selection. Additional QTLs were identified at suggestive levels. We have previously examined fear conditioning in chromosome substitution strain (CSS) lines, in which individual AJ chromosomes have been backcrossed onto a B6 background (Ponder et al. 2007b). The present study shows only limited concordance with those results. Chromosomes 3, 7, 10 and 14 were identified in both studies. The present results also identified chromosomes 2, 5 and 13 at suggestive levels of significance. The CSS study identified genome-wide significant QTLs on chromosomes 6, 11 and 18 that were not observed in the present study. We have previously reported the results of another STSL study in which we used short-term selection for fear conditioning but started with an F2 cross between B6 and DBA/2J (D2) mice (Ponder et al. 2007a). In that study we identified several of the same chromosomes as reported in the present study. Specifically, chromosomes 5, 10, 13 and 14 are implicated in both studies. The mapping precision of this technique is likely to be rather poor; however, it is worth noting that the chromosome 5 and 10 QTLs both appear more proximal in our previous study than in this one, while the chromosome 13 and 14 QTLs identify similar regions of those chromosomes.

There are several possible explanations for the inconsistent results observed in these studies. For the comparison between the present results and our previous study of CSS lines, differences are likely due to a combination of limited power and epistatic interactions that are present in the selected lines but do not exist in the much simpler CSS lines. Differences between results from CSS lines and F2 crosses have been reported previously for physiological phenotypes (Stylianou et al. 2006). For the comparison between the present data and our previous selection study, different F2 populations (B6xAJ versus B6xD2) were used as a foundation for selection, and hence it is likely that alleles unique to either AJ or D2 are the cause of at least some of the observed differences. In all cases, the limited power and the suggestive nature of the QTLs provide additional explanations for the inconsistent results.

The albino allele and the response to selection

We identified some evidence that suggests the albino allele is partially responsible for the response to selection. An observation made during the creation of the selected lines was that all low selected mice were albino after the second selection generation. This indicated that the tyrc allele, which causes albinism, had become fixed in the low selected line. QTL analysis confirmed that the frequency of a marker 4 MB away from the tyr allele changed over the course of selection in a manner that is extremely unlikely to be accounted for by genetic drift (Table 2). This is interpreted as evidence of a nearby QTL for freezing behavior. In our previous selected line study, no alleles achieved complete fixation after four generations of selection (Ponder et al. 2007a). The result was even more surprising because the tyrc allele is recessive, and hence would generally be slower to respond to selective pressure as compared to additive or dominant alleles. These data suggested the presence of a strong QTL at or near the tyrc allele.

In order to examine the effect of the tyrc allele in isolation, we compared B6 and B6-albino mice. B6-albino mice are the result of a spontaneous loss-of-function mutation in tyr. B6-albino mice showed significantly decreased freezing compared to B6 mice when the 0.7 mA shock was used; however, there was no difference between the groups when the 0.5 mA shock was used (Fig. 7). Contextual fear conditioning was also tested in a congenic AJ strain containing a functional tyr allele (AJ-pigmented). The AJ-pigmented strain showed a non-significant trend towards more freezing to context than the albino strain at the 0.5 mA shock intensity, but no difference when the 0.7 mA shock was used. In a previous study of inbred CSS we identified the CSS-7 strain (which carries both the albino allele and the rest of the AJ version of chromosome 7) as having significantly lower freezing behavior than B6 when training used the 0.5 mA shock intensity (Ponder et al. 2007b). Thus, when all of chromosome 7 is derived from AJ, the effect on freezing behavior is observable at the 0.5 mA intensity and is also greater in magnitude than when just the tyrc allele is different. This observation suggests the possibility that there are additional, closely linked alleles on chromosome 7 that also underlie the response to selection. Thus, this study has provided some evidence that tyrc reduces freezing behavior and may have been partially responsible for the change in marker allele frequency observed over the course of selection (Table 2); however, there is also evidence that suggests the existence of additional alleles.

We were initially concerned that the automated video scoring system might incorrectly score freezing in white (albino) mice. To investigate this possibility, we asked several human observers unfamiliar with the issue to score a selection of video files from albino and pigmented mice. We found that the observations of human scorers were highly correlated with automated video scoring. The r² values for the correlation between automated and human scoring were 0.92, 0.79, 0.87, 0.85 and 0.52 (average = 0.79). This indicates that the difference in freezing was not an artifact of our use of automated scoring.
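The agreement statistic used in this comparison can be reproduced with a short Python sketch. The five r² values are those reported in the text; the paired freezing scores are invented placeholders, and the function name r_squared is hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# r² values as reported in the text.
reported_r2 = [0.92, 0.79, 0.87, 0.85, 0.52]
average_r2 = mean(reported_r2)
print(round(average_r2, 2))

# Hypothetical percent-freezing scores for the same videos (invented data).
automated = [10.0, 35.0, 50.0, 72.0, 90.0]
human = [12.0, 30.0, 55.0, 70.0, 88.0]
example_r2 = r_squared(automated, human)
print(example_r2)
```

Averaging the five reported values gives 0.79, in agreement with the figure quoted in the text.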

AJ mice are albino due to the tyrc allele of the tyrosinase gene, located on chromosome 7 (87.3 MB), which inhibits the production of melanin as a result of a single nucleotide mutation. Therefore, mice with the tyrc allele lack melanin in the hair, skin and eyes, resulting in white fur and red eyes. Albino mice have fewer rod photoreceptors, defects in visual projections at the optic chiasm, an increased occurrence of glaucoma, and spatiotemporal defects in neuronal development of the retina (Beermann et al. 2004; Rachel et al. 2002). A visual task measured in the Morris water maze showed that albino strains in general, and the AJ strain in particular, show long latencies to locate a visible platform (Owen et al. 1997), indicating a general visual deficiency in these strains. The effect of visual deficit on contextual fear conditioning is unknown; however, AJ mice show a very high level of contextual fear conditioning compared with B6 mice (Ponder et al. 2007b), suggesting a robust ability to learn the association between shock and context despite potential visual impairments. On the surface, the high freezing in the AJ mice may seem to contradict our evidence that the tyrc allele (present in AJ but not B6) decreases freezing. Multiple alleles are expected to influence fear learning; while the tyrc allele may decrease fear learning in the AJ strain, this strain apparently harbors other alleles that result in higher freezing behavior compared to B6. Thus, the two observations are not contradictory.

It has been known for decades that the tyrc allele is associated with behavioral differences (Churchill et al. 2004; Crabbe et al. 1999; DeFries 1969; Henry and Schlesinger 1967; Katz and Doyle 1981; Meier and Foshee 1965; Rhoades and Henry 1977; Talbot et al. 2003; van Abeelen and Kroes 1968; Winston and Lindzey 1964). A QTL for behavior on the plus maze on chromosome 7 at the tyr locus has been previously reported (Cohen et al. 2001). Interestingly, a recent study reported that pigmentation is related to social inhibition in human subjects (Moehler et al. 2006), raising the possibility (though clearly quite speculative) that these findings may model a source of human behavioral variability.

Microarrays identified numerous gene expression differences in the amygdala and hippocampus of these selected lines. A second microarray experiment was used to assess the gene expression differences between the B6 and B6-albino strains to see how many of the differentially expressed genes in the selected lines were a result of the tyr allele specifically. The data indicated that no genes were differentially expressed in the brain as a result of the tyrc allele on a B6 background. Additionally, the data indicated that tyrosinase gene expression is undetectable in the brain regions examined.

Because of the apparent effect of tyr on fear learning, as well as the potential (unrealized in the present study) for video-based automated scoring systems to be confounded by coat color, these results could be taken as a warning against using populations where different tyr alleles are segregating for the study of complex behavior where examination of tyr is not the objective. Had we based our cross on an F2 between B6 and AJ-pigmented, or between B6-albino and AJ, we would have been better able to assess the effect of other alleles on fear conditioning.

Conclusions

The results of this study provide some support for the hypothesis that the tyr allele alters fear learning. The extent to which the overall response to selection reflects differences in learning is not perfectly clear. Additional findings of this study include the identification of QTLs and the identification of specific genes that are differentially expressed in the brains of selected mice. Taken together these results provide a foundation for the systematic, unbiased identification of genes that mediate differences in fear learning.