Introduction

Meiotic recombination rate, which describes the frequency with which genetic information is exchanged between homologous chromosomes during meiosis, is a key population genetic parameter. Given the well-described effects of genetic hitchhiking (Gillespie 2000; Maynard Smith and Haigh 1974) and background selection (Charlesworth et al. 1993) on levels of standing variation, it is now clear that recombination rate has a major impact on the extent to which natural selection affects linked sites. As recombination rate increases, not only does the efficacy of selection increase due to decreased interference among sites (Hill and Robertson 1966), but theory also predicts that linked variability will be decreased due to the fixation of beneficial variants or purging of deleterious variants to a degree that decreases with increased rates of recombination.

However, recombination rate varies widely at several scales. For example, both the frequency and distribution of crossover events can differ markedly even among closely related species, as seen in Drosophila (e.g., True et al. 1996) and primates (Ptak et al. 2005; Ptak et al. 2004; Wall et al. 2003; Winckler et al. 2005). In addition, within humans, recombination rates have been shown to vary among populations (Fearnhead and Smith 2005) and individuals (Coop et al. 2008). In Drosophila as well, recombination rates have been shown to vary among strains (Brooks and Marks 1986). Furthermore, recombination intensity, or recombination rate scaled to physical distance (in units of centimorgans per megabase pair [cM/Mb]) is heterogeneous within the genome. In D. melanogaster, for instance, the X chromosome and the autosomes exhibit striking differences, with a telomere-associated decrease in recombination intensity on the X chromosome but little or no such effect on the autosomes (True et al. 1996).

Although these represent coarse-scale differences in recombination intensity, crossover frequency appears to vary at a fine scale as well. Recombination hotspots have been observed in a wide range of taxa, including Arabidopsis (Drouaud et al. 2006), maize (Brown and Sundaresan 1991; Dooner et al. 1985; Fu et al. 2002), yeast (for review see Lichten and Goldman 1995; Petes 2001), humans (Myers et al. 2005), nonhuman primates (Ptak et al. 2005; Ptak et al. 2004; Wall et al. 2003; Winckler et al. 2005) and mice (Guillon and de Massy 2002; Kauppi et al. 2007). In humans, recombination hotspots are generally defined as regions with local increases in recombination intensity relative to background or the surrounding region, with the magnitude of the local increase between 10- and several 1000-fold (e.g., Li et al. 2006). Another defining feature of these hotspots is that the majority of recombination events occur within these regions. In humans, for instance, up to 80% of recombination occurs in only 10–20% of the sequence (Myers et al. 2005), and in studies mapping individual meiotic exchange events, approximately 60% of all such exchanges occur in previously identified hotspots (Coop et al. 2008).

It is generally thought that the recombinational landscape in Drosophila does not resemble that found in humans, at least in this latter regard. This is primarily due to previously reported patterns of linkage disequilibrium (LD), which decays relatively rapidly, and homogeneously, in Drosophila (e.g., Langley et al. 2000; Long et al. 1998; Palsson et al. 2004; Tatarenkov and Ayala 2007). The absence of long haplotype blocks in Drosophila has thus been interpreted as being inconsistent with a recombinational landscape of highly localized hotspots in which the majority of recombination events takes place. However, it remains to be seen whether there is significant heterogeneity in recombination intensity at a fine scale in D. melanogaster, with fluctuations in the magnitude of recombination intensity rivaling that found in humans.

For most applications, inferences of recombination intensity in D. melanogaster have employed statistical methods that rely on the relationship between the physical and genetic maps. There are several examples of this type of approach, which integrate this map information over different physical scales. The adjusted coefficient of exchange (Kindahl 1994), for instance, is based on the relationship between the physical and genetic maps at a megabase scale; other sliding window approaches, with variable window sizes, have been taken as well (e.g., Carvalho and Clark 1999; Hey and Kliman 2002). Polynomial regression approaches estimate recombination intensity as the derivative of a polynomial curve describing genetic location as a function of physical location. This method suffers from the fact that the estimate of recombination intensity is sensitive to the degree of the polynomial equation, which has varied among studies (Comeron and Kreitman 2000; Kliman and Hey 1993; Marais et al. 2001; Singh et al. 2004). These methods, as well as traditional genetic approaches, have shown substantial variation in recombination rate at a broad scale, with regions of highly suppressed recombination, such as centromeric regions, and recombination intensity increasing with increasing distance from the centromere (e.g., Lindsley and Sandler 1977). For instance, the pericentromeric heterochromatin of the X chromosome has a genetic map distance of only 0.04 cM, although it may comprise up to half of total X chromosome DNA, whereas the remainder of the chromosome has a total genetic map length of 65 cM (Lindsley and Sandler 1977; Roberts 1965).

Although it is clear that recombination intensities in D. melanogaster are heterogeneous, the resolution of statistical inference of recombination intensity is limited by the genetic map, and the statistical methodology leaves considerable uncertainty about the scale at which recombination rate variation occurs. A critically unanswered question is whether the wide range in recombination intensity evident at a broad scale in D. melanogaster is recapitulated at a fine scale. Classical half-tetrad analysis has been used to produce fine-scale maps of only a few regions of the genome (e.g., Clark et al. 1988; Hilliker and Chovnick 1981; Hilliker et al. 1980), but this approach is not readily applied to any arbitrary chromosomal location. Although there is some evidence in D. pseudoobscura suggesting up to 40-fold variation in the rate of crossovers at a local scale (Cirulli et al. 2007), little is known about the fine-scale structure of recombination intensity variation in D. melanogaster.

Here we experimentally evaluate heterogeneity in recombination intensity at a local scale in a segment of the X chromosome of D. melanogaster. We generated a strain carrying mutations in both white and echinus; these X-linked genes are separated by approximately 1.2 Mb and 4 cM. We focused on this region in part because previous regression-based approaches have been suggestive of moderate to high recombination intensity in this region (Hey and Kliman 2002; Kindahl 1994). Were there localized regions in D. melanogaster that were subject to high levels of recombination, our chances of capturing such a region would likely be bolstered by focusing on a region with high recombination intensity at a broad scale. In addition, two different recombination estimators, which integrate genetic and physical map information at different scales, are somewhat incongruous in this region, which gave us confidence that there might be local heterogeneity in recombination intensity in this region in particular. Using a two-step crossing scheme, we were able to screen hundreds of thousands of flies to identify >500 male flies containing a single recombination event in this region. By genotyping SNP markers across this region in our recombinant flies, we obtained a detailed picture of the local recombinational landscape.

Our results show that crossover frequency is significantly heterogeneous in this region of the D. melanogaster X chromosome, with recombination intensities varying approximately 3.5-fold across this 1.2-Mb interval. Moreover, fine-scale recombination intensity appears correlated with codon bias in the expected direction, which is suggestive of a biological significance to this fine-scale variation. Simulation results indicate that within this range of recombination intensities, nucleotide diversity and empirical estimates of recombination intensity are weakly but significantly positively correlated under a variety of models of selection. Empirical estimates of recombination intensity do not, however, correlate with available estimates of standing nucleotide variation in this region. Our simulation results suggest that data from more loci are needed to confidently detect such correlations given the modest magnitude of heterogeneity in recombination intensity observed for this region. These results contribute to the emerging picture that recombinational landscapes of Drosophila species are significantly heterogeneous at a fine scale and that the magnitude of the heterogeneity may vary among species. This variation has important implications for understanding the physical scale at which natural selection can affect the evolutionary trajectories of linked sites.

Materials and Methods

Fly Strains

The double-mutant strain of D. melanogaster used in this experiment contained two X-linked recessive mutations with visible phenotypes, corresponding to mutations in the white (w) and echinus (ec) genes. This strain was constructed by crossing a white mutant line (stock number 145 from Bloomington Drosophila Stock Center) with an echinus mutant line (stock number 32 from Bloomington Drosophila Stock Center) and screening for recombinant phenotypes in the F2 generation. We subsequently created an extracted X-chromosome (i.e., a line in which all of the X chromosomes are identical) w-ec line by crossing our initial w-ec line to the FM7a balancer stock. The wild-type line used in this study (UgX54A) corresponds to an X-extracted line from a Ugandan population of D. melanogaster. We screened for inversions by crossing these two parental lines and performing standard polytene chromosome squashes of the F1 progeny; these lines appear to be inversion-free relative to one another. We determined M versus P cytotype by crossing these two lines to lines with known cytotypes at 29°C and dissecting ovaries from female progeny. The w-ec line appears to have the M cytotype, whereas the African line has the P cytotype.

Experimental Crosses

We used a two-step crossing scheme to generate the recombinant male flies used in this study (Fig. 1). All crosses were set up in bottles on standard yeast–glucose media and involved 20 virgin female and 20 male flies. A total of 160 bottles of crosses were established in the first round; these crosses were conducted at 21°C because this decreases the effects of hybrid dysgenesis induced by the M/P cytotype incompatibility of the two parental lines (Kidwell et al. 1977). A total of 258 bottles of crosses were established in the second round, all of which involved virgin female flies 24–36 h old. These crosses were incubated at 25°C for the first 5 days and were held at 21°C thereafter. The initial incubation temperature (25°C) was chosen because crossover frequency appears to increase at temperatures >22°C, at least for some regions (Ashburner 1989). Although 25°C is a temperature at which MxP crosses result in hybrid dysgenesis to some extent, these effects are decreased in the F1 generation. That is, the incidence of hybrid dysgenesis in MPxM crosses (F1 daughters of an MxP cross mated to an M male) is markedly decreased relative to MxP crosses (Engels 1979). However, the effects of the cytotype incompatibility are not completely mitigated in the F1 generation, which suggests that M/P effects might play a limited role in some of our findings. The shift to 21°C was necessitated by incubator space limitations. Parents were cleared on day 4, and F2 progeny were scored on day 18, where day 0 is the day the crosses were set up. Male and female flies were counted, and male flies were scored for the recombinant phenotype. A total of 536 recombinant males were collected (273 +ec male and 263 w+ male flies).

Fig. 1

Schematic representation of our two-step crossing scheme. Boxed F2 male progeny correspond to the two recombinant genotypes identified by our screen. These male flies contain a single crossover in the region between white and echinus

SNP Markers

Eight single nucleotide polymorphisms (SNPs) were used to genotype the recombinant male flies generated during this experiment. These correspond to fixed differences between our two parental lines. These SNPs were ascertained by Sanger sequencing of several noncoding regions in the 1.2-Mb region separating the white and echinus genes. With the exception of one marker (4647), all SNPs were located in intergenic regions. These markers were designed before the initiation of our experimental crosses. Our initial list of homogeneously spaced candidate noncoding loci was refined based on ascertained SNPs among lines as well as pyrosequencing assay constraints, yielding a less homogeneously spaced set of markers that was ultimately used for genotyping. The absolute physical positions (release 5.1 of the D. melanogaster genome) of our markers, including our visible markers (white, 40355, 25645, 29649, 28369, 4647, 479, 52943, 52792, and echinus), were 2.69, 2.79, 2.83, 2.88, 2.98, 3.05, 3.18, 3.30, 3.44, and 3.73 Mb, respectively. The distances between adjacent markers were therefore 95, 44, 46, 102, 69, 131, 122, 144, and 286 kb, respectively. More precise mapping information is available on request. Note that these are the physical distances in the iso-1 reference strain.
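As a consistency check on the marker spacing, the nine inter-marker distances listed above can be accumulated from the position of white to recover approximate coordinates for all ten markers. The following is an illustrative sketch only: the variable names are ours, and the starting coordinate (2,690 kb) is the rounded value quoted in the text, so the recovered positions are approximate.

```python
from itertools import accumulate

# Inter-marker distances (kb) as listed in the text, ordered from white to echinus.
distances_kb = [95, 44, 46, 102, 69, 131, 122, 144, 286]

# Rounded position of white (kb) quoted in the text; an approximation.
start_kb = 2690

# Running sum of distances, seeded with the start coordinate.
# Ten coordinates result: two visible markers plus eight SNPs.
positions_kb = list(accumulate(distances_kb, initial=start_kb))

# Total physical span covered by the nine intervals.
total_kb = sum(distances_kb)

for pos in positions_kb:
    print(f"{pos / 1000:.2f} Mb")
print(f"span: {total_kb} kb")
```

Working in integer kilobases avoids floating-point rounding when the coordinates are accumulated; conversion to Mb is done only for display.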

Genotyping Recombinant Male Flies

DNA was extracted from single recombinant male flies using a standard squish protocol. Each fly was flash frozen in liquid nitrogen, crushed with a pestle, and subsequently immersed in a buffered solution (10 mM Tris-Cl pH 8.2, 1 mM ethylenediaminetetraacetic acid [EDTA], 25 mM NaCl, and 200 μg/ml proteinase K). This was incubated at 37°C for 30 min and then at 95°C for 2 min to inactivate the proteinase K.

Individuals were amplified by polymerase chain reaction (PCR) using primers designed by Pyrosequencing Assay Design Software (Biotage, Uppsala, Sweden). One primer in each amplifying primer pair was biotinylated. Amplifying conditions were as follows: 95°C for 5 min, followed by 45 cycles at 95°C for 15 s, at 55°C for 30 s, and at 72°C for 15 s, with final extension at 72°C for 5 min. Biotinylated PCR products were immobilized onto streptavidin-coated beads by combining 3 μl beads, 40 μl binding buffer (10 mM Tris-HCl, 2 M NaCl, 1 mM EDTA, and 0.1% Tween 20), and 22 μl H2O with 15 μl PCR product and vortexing for 10 min at room temperature. The immobilized DNA was washed for 5 s in 70% EtOH, and the DNA was denatured for 5 s in 0.2 M NaOH and washed in 10 mM Tris-acetate for 5 s; these three steps were performed using a vacuum prep tool (Biotage). The DNA was immersed in 40 μl 0.3 μM sequencing primer (designed by Pyrosequencing Assay Design Software [Biotage]) that had been diluted to that concentration (from 10 μM) using annealing buffer (20 mM Tris-acetate and 2 mM Mg-acetate; pH 7.6). This was subsequently incubated at 85°C for 2 min. The PyroGold (Biotage) substrate, enzyme, and dNTPs were added to the PSQ cartridge at half strength (diluted 1:1 with distilled water), and the SNPs were genotyped using a PSQ 96MA Pyrosequencer (Pyrosequencing AB, Uppsala, Sweden) and SNP software (Biotage). For those individuals that posed challenges with ≥1 SNP genotyping reaction (this occurred in approximately 20% of individuals), genotypes were inferred where possible based on the assumption of a single recombination event in this region. 
Importantly, the distribution of crossovers in this 1.2-Mb region, when only those flies with full genotype information were considered, did not significantly differ from the distribution of crossovers in this region when all flies (including those with inferred genotypes) were considered, suggesting that this inference procedure is not systematically biasing our results.

Recombination Maps

We used a multipoint linkage approach to infer recombination maps in our two classes of recombinant male flies as implemented in MapMaker version 3.0 (Lander et al. 1987). Recombination rates are presented as cM (using Haldane’s function), and recombination intensities are presented as cM/Mb.
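For readers unfamiliar with Haldane's map function, the conversion from a recombination fraction r to a map distance is d = -50 ln(1 - 2r) cM. The sketch below illustrates this conversion and the subsequent scaling to cM/Mb; it is not MapMaker's multipoint procedure, and the function names are hypothetical.

```python
import math

def haldane_cm(r):
    """Map distance in cM from recombination fraction r, using Haldane's function."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)

def intensity_cm_per_mb(r, length_kb):
    """Recombination intensity (cM/Mb) for an interval of the given physical length."""
    if length_kb <= 0:
        raise ValueError("interval length must be positive")
    return haldane_cm(r) / (length_kb / 1000.0)

# Example: a recombination fraction of 1% over a 100-kb interval.
print(haldane_cm(0.01))
print(intensity_cm_per_mb(0.01, 100))
```

At the small recombination fractions relevant to sub-megabase intervals, the Haldane distance is nearly identical to 100r, so the choice of map function has little practical effect at this scale.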

Genomic Correlates of Recombination Intensity

To correlate local recombination intensity with genomic features, we estimated both codon bias and intronic GC content. We retrieved the sequences of all genes located in this region based on release 5.1 of the D. melanogaster genome. We concatenated all exonic sequences within individual genes and estimated codon bias for each gene using stand-alone implementation of codonW (downloaded from http://codonw.sourceforge.net). We used the codons defined as preferred in D. melanogaster to estimate the frequency of optimal codons (FOP) for each gene in this region. We similarly concatenated all of the intronic sequence within individual genes to estimate intronic GC content in a gene-by-gene fashion. We placed each gene into one of the nine intervals defined by our SNP markers based on its physical position and assigned each gene and intron the recombination intensity estimate for that interval.
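The two per-gene summaries described above can be sketched as follows. This is a simplified illustration and not codonW's implementation: here FOP is taken as the fraction of preferred codons among codons that have synonymous alternatives (Met, Trp, and stop codons are excluded), and the preferred-codon set shown in the example is a one-codon placeholder rather than the actual D. melanogaster table.

```python
# Codons without synonymous alternatives (standard code), excluded from FOP.
NON_DEGENERATE = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def gc_content(seq):
    """Fraction of G+C among unambiguous bases in a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt

def fop(cds, preferred):
    """Frequency of optimal codons in a concatenated, in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    scored = [c for c in codons
              if c not in NON_DEGENERATE and set(c) <= set("ACGT")]
    if not scored:
        return float("nan")
    return sum(c in preferred for c in scored) / len(scored)

# Hypothetical toy input: one preferred codon, a five-codon sequence.
toy_preferred = {"CTG"}
print(fop("ATGCTGCTGTTATAA", toy_preferred))
print(gc_content("GGCCAATT"))
```

Concatenated exons of one gene would be passed to `fop`, and concatenated introns of the same gene to `gc_content`, giving one value of each per gene.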

In addition, we compiled nucleotide polymorphism and divergence data from previous reports, yielding a data set of 45 loci, each of which was sampled in both African and non-African populations. These correspond to genes and noncoding regions within our 1.2-Mb interval and were taken from several published sources (Bauer DuMont and Aquadro 2005; Jensen et al. 2007; Ometto et al. 2005; Pool et al. 2006), although some unpublished data were included (V. L. Bauer DuMont and C. F. Aquadro, 2008, personal communication). For each gene, we normalized estimates of average pairwise nucleotide diversity by divergence with D. simulans in an effort to control for variation in mutation rate and selective constraint among loci. We assigned each locus to one of the nine intervals based on its physical map location in release 5.1 of the D. melanogaster genome and ascribed to each locus the recombination intensity estimate for that interval.
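The assignment of loci to marker-defined intervals, and the normalization of diversity by divergence, can be sketched as below. Interval boundaries are reconstructed from the rounded start coordinate and inter-marker distances given under SNP Markers, so they are approximate; the function names and the example locus values are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

# Approximate interval boundaries (kb), rebuilt from the distances in the text.
distances_kb = [95, 44, 46, 102, 69, 131, 122, 144, 286]
boundaries_kb = list(accumulate(distances_kb, initial=2690))

def interval_index(pos_kb):
    """0-based index (0..8) of the interval containing pos_kb, or None if outside."""
    if not boundaries_kb[0] <= pos_kb < boundaries_kb[-1]:
        return None
    return bisect_right(boundaries_kb, pos_kb) - 1

def normalized_diversity(pi, divergence):
    """Pairwise diversity scaled by divergence; None when divergence is not positive."""
    if divergence <= 0:
        return None
    return pi / divergence

# Hypothetical locus at 3,100 kb with illustrative diversity and divergence values.
print(interval_index(3100))
print(normalized_diversity(0.004, 0.05))
```

Each locus would then inherit the recombination intensity estimated for the interval returned by `interval_index`.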

Simulations

We conducted simulations using SFS_CODE (http://www.sfscode.sourceforge.net/; Hernandez 2008). Each replicate consisted of forward-evolving a 5000-bp segment of noncoding DNA and sampling 10 diploid individuals at the end of each simulation run. A random 500-bp window within this 5000-bp segment was chosen for each replicate, and nucleotide diversity (n = 20 alleles) was calculated for this window. Ten thousand iterations were implemented for each parameter set. In our explored parameter space, the population recombination parameter ρ (per site) varied from 0.01 to 1; the selection parameter γ = |2Ns| varied from 0.1 to 10; and θ = 2N_eμ was constant at 0.004. The distribution of selective effects followed a three-point mass model, with a single γ for advantageous and deleterious mutations. The probability that a novel mutation is advantageous ranged from 0.05 to 0.15, and the probability that a novel mutation is deleterious was between 0.1 and 0.5; the remaining mutations were neutral.
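The summary statistic computed from each replicate can be illustrated with a short sketch. This is not part of SFS_CODE; it assumes the sampled alleles are available as equal-length aligned strings, and the function names are ours.

```python
import random
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site among aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = n * (n - 1) // 2
    diffs = sum(sum(a != b for a, b in zip(s1, s2))
                for s1, s2 in combinations(seqs, 2))
    return diffs / (pairs * length)

def random_window(seqs, width, rng):
    """Cut the same randomly placed window of the given width from every sequence."""
    start = rng.randrange(len(seqs[0]) - width + 1)
    return [s[start:start + width] for s in seqs]

# Toy example with three 4-bp alleles (the study sampled 20 alleles, 500-bp windows).
toy = ["AAAA", "AAAT", "AAAA"]
print(nucleotide_diversity(toy))
print(random_window(toy, 2, random.Random(1)))
```

In the setting described above, `random_window` would be called with a width of 500 on the 5000-bp simulated segment before diversity is computed.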

Results and Discussion

We empirically quantified the fine-scale structure of recombination intensity in a 1.2-Mb region of the D. melanogaster X chromosome. This region corresponds to the interval between the w and ec genes and, based on broad-scale recombination estimators, appears subject to moderate levels of recombination (approximately 2 to 3 cM/Mb). We crossed an X-extracted D. melanogaster line containing both visible mutations to a wild-type African line and back-crossed the F1 female progeny to male flies of the w-ec parental line to generate 190,642 F2 progeny. The 91,539 F2 male flies were scored for both phenotypic markers, and male flies with a single recombination event in this interval should carry one visible mutation but not the other. In total, 536 recombinant male progeny were recovered, 273 with the +ec phenotype and 263 with the w+ phenotype. Importantly, this phenotypic screen will only capture those individuals with an odd number of crossover events; thus, we were unable to include double crossovers in our fine-scale mapping of the crossover distribution in this region. However, triple crossovers could be detected by this approach, as could gene conversion events, although none were detected in this screen.

Our mapping experiment suggests that the recombinational distance between w and ec is 0.59 cM for this pair of chromosomes, which contrasts with the previously reported average distance of 4.0 cM (Lindsley and Grell 1967). This latter distance is based on amalgamating map data determined from multiple laboratories, likely corresponding to crossing different lines under different conditions. The laboratory lines were most likely also of North American origin. Given the extensive variability in rates of crossing over among lines (as much as two-fold; see Brooks and Marks 1986) as well as the sensitivity of recombination rates to factors such as maternal age (Bridges 1927; Chadov et al. 2000; Lake 1984; Redfield 1964, 1966; Stern 1926) and temperature (Grushko et al. 1991; Plough 1917, 1921; Smith 1936; Stern 1926), this difference between our new estimate and the previous estimates of recombination distance is perhaps not surprising. However, our estimated average recombination intensity in this 1.2-Mb region is only 0.49 cM/Mb, which is also markedly lower than estimates inferred from other approaches, such as the adjusted coefficient of exchange or a regression polynomial approach, both of which estimate recombination intensity in this region as 2 to 3 cM/Mb. We do not believe that this decrease in recombination intensity is caused by a lack of double crossovers being taken into account in our experimental screen because given crossover interference, it seems unlikely that these events play a significant role at this physical scale in Drosophila (Cirulli et al. 2007). More likely, this decrease in crossover intensity reflects variation among lines in crossover frequency. The divergence (including small insertions and deletions) between the Ugandan line and the laboratory strain may also result in a lower rate of genetic exchange, as might chromosomal rearrangements too small to be scored cytologically (Sturtevant and Beadle 1936).
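The two summary values quoted above appear consistent with a simple calculation from the counts reported in this section, as the following sketch shows. It assumes that the recombination fraction is the number of recombinant males divided by the number of F2 males scored, and that Haldane's function is applied; the multipoint estimate from MapMaker need not be computed in exactly this way.

```python
import math

recombinants = 536        # recombinant F2 males recovered
males_scored = 91_539     # F2 males scored for both markers
region_mb = 1.2           # approximate physical size of the w-ec interval

# Recombination fraction and Haldane map distance (cM).
r = recombinants / males_scored
distance_cm = -50.0 * math.log(1.0 - 2.0 * r)

# Average recombination intensity across the interval (cM/Mb).
intensity = distance_cm / region_mb

print(round(distance_cm, 2))
print(round(intensity, 2))
```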

Finally, the M/P cytotype differences between our two parental lines may have played a role in the reduction in overall recombination intensity detected in our experiment. Although the extent of hybrid dysgenesis in this second round of crosses (MPxM) is expected to be decreased relative to an MxP cross, we did these crosses at a temperature where the effects of hybrid dysgenesis can be manifested (although not to the same degree that they would at higher temperature) (Kidwell et al. 1977). Cytotype differences have been known to play a role in rates of crossing over in F1 daughters of MxP crosses (Kidwell 1977), so it is possible that the M/P cytotype differences between our two parental lines could contribute to our observed decreased rate of crossing over relative to previous reports. In the future, we hope to explore this discrepancy further. Until it becomes clear how much of our observed decrease in recombination intensity is caused by M/P cytotype differences versus other contributing factors, we do not believe that this result is necessarily biologically significant. Rather, we focus on the magnitude of the variation in recombination intensity within this region, independent of the previously published map distance, because we believe it to be more robust to cytotype differences between our parental lines. Although overall rates of recombination may be decreased in our experimental crosses because of M/P effects, we do not have any reason to expect a priori that M/P crosses alone will generate fine-scale structure of recombination intensity.

Fine-Scale Mapping of Crossover Distribution

By genotyping each of our 536 recombinant male flies at eight SNP markers (in addition to w and ec), we were able to localize the single crossover event in each fly to one of nine intervals. We constructed recombination maps for this region in each of our two classes of recombinants (w+ and +ec) using Mapmaker. Because Mapmaker is run in a likelihood framework, we can statistically test whether estimates of recombination distances were significantly different between these two classes of recombinant flies. We compared the log-likelihood of the recombination map based on combining the w+ and +ec flies to the sum of the log-likelihoods of the maps based on each class alone. Twice the difference in these log-likelihoods should be approximately χ2 distributed with nine degrees of freedom in this fully nested comparison. Because a more highly parameterized model, in which each recombinant class is allowed to have its own recombination map, is not a significantly better fit to the data than a model in which these recombinant classes are combined (P = 0.49, χ2 test), this suggests that the recombination maps are not significantly different between the w+ and +ec flies. Moreover, estimates of recombination intensity (recombination distance divided by physical distance) based on +ec flies are significantly correlated with those based on w+ flies (Spearman’s ρ = 0.67, P = 0.0499). As a consequence, we combine the two classes of recombinants and present a single recombination intensity map in this region (Fig. 2).
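The likelihood-ratio comparison described above can be sketched as follows. The log-likelihood values in the example are hypothetical placeholders, not MapMaker output, and the function names are ours. To keep the sketch free of external dependencies, the χ2 upper-tail probability is computed with the closed-form expression that holds for odd degrees of freedom.

```python
import math

def chi2_sf_odd_df(x, df):
    """Upper-tail probability of a chi-square variate with odd df (closed form)."""
    if df < 1 or df % 2 != 1:
        raise ValueError("df must be a positive odd integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    n = (df - 1) // 2
    # Q(n + 1/2, h) = erfc(sqrt(h)) + exp(-h) * sum_{j=1..n} h^(j-1/2) / Gamma(j+1/2)
    series = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, n + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series

def lrt_pvalue(lnl_separate_sum, lnl_combined, df=9):
    """Likelihood-ratio statistic and P value for separate versus combined maps."""
    stat = 2.0 * (lnl_separate_sum - lnl_combined)
    return stat, chi2_sf_odd_df(max(stat, 0.0), df)

# Hypothetical log-likelihoods: sum over the two class-specific maps vs. combined map.
stat, p = lrt_pvalue(-100.0, -104.0)
print(stat, p)
```

A statistic near the median of the χ2 distribution with nine degrees of freedom (about 8.34) yields a P value near 0.5, which is the situation in which the simpler, combined map is retained.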

Fig. 2

Recombination intensity (in cM/Mbp) as estimated by crossover frequency per physical distance. Error bars correspond to the 95% confidence interval. The intervals are defined by eight SNP markers in combination with the two visible phenotypic markers (white and echinus)

Within this region, the distribution of recombination events (taking into account physical interval size) is significantly different from uniform (P = 0.004, χ2 test). The range of recombination intensities spans 0.25 to 0.86 cM/Mbp, which corresponds to approximately 3.5-fold variation in recombination intensity in this 1.2-Mb region. Although it appears as though there are localized regions with increased recombination intensity, we are reluctant to deem them “hotspots” given that the magnitude of the local increase appears mild and because patterns of LD in Drosophila are inconsistent with a preponderance of recombination occurring in a restricted fraction of the sequence. To distinguish these regions from hotspots such as those found in humans, which can exhibit massive local increases in recombination intensity, and in which a disproportionate number of recombination events occur, we will refer to these regions in Drosophila as “recombination peaks.”
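The logic of the uniformity test can be sketched as follows: under a uniform recombination intensity, the expected number of crossovers in each interval is proportional to its physical length. The interval lengths and total number of recombinants are taken from the text; per-interval observed counts are not reported here, so the statistic function is shown without observed data, and the function names are ours.

```python
# Interval lengths (kb) from the SNP Markers section; 536 recombinants in total.
distances_kb = [95, 44, 46, 102, 69, 131, 122, 144, 286]
n_recombinants = 536

def expected_counts(lengths, total):
    """Expected crossovers per interval if intensity were uniform across the region."""
    span = sum(lengths)
    return [total * length / span for length in lengths]

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic for observed versus expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = expected_counts(distances_kb, n_recombinants)
print([round(e, 1) for e in expected])

# Fold variation implied by the reported extremes of recombination intensity (cM/Mbp).
fold = 0.86 / 0.25
print(round(fold, 2))
```

With nine intervals and a fixed total, such a statistic would conventionally be compared with a χ2 distribution with eight degrees of freedom; the degrees of freedom used in the original test are not stated in the text.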

Our results thus indicate that the distribution of crossover events in this small region of the D. melanogaster X chromosome is significantly heterogeneous. This echoes a previous finding from D. pseudoobscura, which suggested that fine-scale rates of crossing over are significantly different from uniform as well (Cirulli et al. 2007). Notably, this previous report in D. pseudoobscura was focused on an X-linked region of similar size, making the direct comparison of results appropriate. However, the height of the recombination peaks captured in the present experiment in D. melanogaster appears quite low, on the order of approximately 3.5-fold. This may in part reflect the resolution of our recombination map, given that we divided our 1.2-Mb region into nine intervals. In a comparable study in D. pseudoobscura, which divided a 2-Mb region into eight intervals, recombination intensity variation was similar in magnitude, ranging from approximately 2 to 7 cM/Mbp (Cirulli et al. 2007). Closer dissection of the fine-scale structure of recombination intensity variation in D. pseudoobscura by additional genotyping in the interval with the highest estimated crossover rate suggested that recombination intensities can vary up to approximately 40-fold (Cirulli et al. 2007). One possibility for this ascertained difference in magnitude in recombination intensity variation between D. melanogaster (approximately 3.5-fold) and D. pseudoobscura (up to 40-fold) is that of resolution. Alternatively, this difference could reflect the generally increased rate of recombination in D. pseudoobscura relative to D. melanogaster (Hamblin and Aquadro 1999; Ortiz-Barrientos et al. 2006), which could in principle result from an increase in the number or intensity of recombination peaks. Further comparisons of the recombinational landscapes of D. melanogaster and D. pseudoobscura are likely to shed light on whether interspecific differences are principally driven by the frequency of recombination peaks or their intensities.

Recombination Intensity and Nucleotide Diversity

Nucleotide polymorphism and regional rate of recombination have been shown to be positively correlated in D. melanogaster and D. simulans (e.g., Begun and Aquadro 1992; Begun et al. 2007), and this could result from genetic hitchhiking (Gillespie 2000; Maynard Smith and Haigh 1974), background selection (Charlesworth et al. 1993), or some combination of both (Kim and Stephan 2000). However, the bulk of previous studies correlating nucleotide diversity and recombination rate in Drosophila have been limited to coarse-scale estimates of recombination rate (although see Kulathinal et al. 2008), and, as a consequence, our understanding of this relationship is limited in resolution. As more fine-scale recombination rate data emerge, we can begin to dissect the physical scale at which this correlation is manifested, which can aid in disentangling the relative roles of different selective forces in generating this pattern.

To investigate whether the range of recombination intensities captured in this study is sufficiently wide to significantly impact levels of standing nucleotide diversity under different models of selection, we used a simulation approach. We used SFS_CODE (Hernandez 2008) to simulate the evolution of a 5-kb stretch of DNA, taking into account the effects of both advantageous and deleterious mutations. To make these simulations comparable with typical empirical polymorphism data sets, we randomly sampled a 500-bp window within the 5-kb locus to estimate nucleotide diversity and sampled 20 alleles to generate our population sample.

For the purposes of presentation, we also included simulations with recombination intensities far exceeding those detected in this study. Our simulation results suggest that even moderately strong selection yields a significant, positive correlation between polymorphism and recombination intensities. Although this is visually apparent across a range of recombination intensities spanning two orders of magnitude (Fig. 3), it is also statistically significant within the narrow range of recombination intensities captured here (ρ ≈ 0.01–0.04) (Table 1). However, the magnitude of the increase in diversity with increasing recombination in this range, reflected in the left-most two values of ρ in each panel in Fig. 3, appears to be small. In addition, the strength of the correlation appears to depend on the strength of selection: as the selection parameter increases, the dependence of polymorphism on recombination intensity increases as well (Table 1). The frequency of selective events appears to play a role as well, with more frequent selective events having a more pronounced effect on nucleotide diversity (Table 1). Overall, these results suggest that under a variety of modes of selection, recombination intensity and nucleotide diversity are positively correlated even within a modest range of recombination intensity values. However, the magnitudes of the correlation coefficients indicate that the association between recombination rate and diversity at this scale is weak, suggesting that genome-scale sampling is likely to be required to recover this pattern.

Table 1 Representative simulation results: Selection models and correlation coefficients based on 10,000 iterations each of 4 recombination intensity values
Fig. 3

Representative simulation results. Boxplots correspond to distributions of nucleotide diversity as a function of recombination rate under different parameter combinations, with the selection parameter ranging from 0.1 to 10 and the probability of a novel mutation being advantageous ranging from 0.05 to 0.15. For these plots, the probability of a novel mutation being deleterious is constant at 0.5. The range of the population recombination parameter ρ is consistent across plots, ranging from 0.01 to 1, and the left-most two values in each plot (0.01 and 0.05) correspond roughly to the range of recombination intensity captured by our empirical study. The notch in each box corresponds to the median value, and the lower and upper edges of the box correspond to the 25th and 75th percentiles, respectively. Whiskers extend to the most extreme data point, which is no more than 1.5 times the interquartile range from the box.

Given the availability of empirical polymorphism data in this region of the X chromosome in D. melanogaster, we compared our empirically generated fine-scale recombination intensity estimates with estimates of nucleotide diversity in this region. We compiled polymorphism data from several sources, which collectively represent 45 loci across this 1.2-Mb region (which fall into five of our nine intervals) sampled separately in both African and non-African populations. Because these loci include both coding and noncoding regions, we normalized our estimates of nucleotide diversity by divergence with D. simulans to control for the effects of mutation rate variation. For both African populations and non-African populations, recombination intensity and normalized nucleotide diversity are not significantly correlated (Fig. 4).
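The rationale for this normalization is that, under neutrality, both expected diversity and expected divergence scale with the local mutation rate, so their ratio is approximately insensitive to it. The short Python sketch below illustrates the arithmetic. The function name and the two loci are hypothetical and the values are invented for illustration; they are not data from this study.

```python
def normalize_by_divergence(pi, divergence):
    """Return nucleotide diversity divided by between-species divergence."""
    if divergence <= 0:
        raise ValueError("divergence must be positive to normalize")
    return pi / divergence


# Hypothetical loci (illustrative values only): locus_B is assumed to mutate
# twice as fast as locus_A, which doubles both its diversity and its divergence.
loci = [
    {"name": "locus_A", "pi": 0.004, "divergence": 0.05},
    {"name": "locus_B", "pi": 0.008, "divergence": 0.10},
]

for locus in loci:
    ratio = normalize_by_divergence(locus["pi"], locus["divergence"])
    print(f'{locus["name"]}: pi / divergence = {ratio:.3f}')
```

Raw diversity differs twofold between the two toy loci, but the normalized values coincide, which is the behavior the normalization is intended to produce when mutation rate is the only source of variation.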

Fig. 4

Scatterplot of recombination intensity and nucleotide diversity (normalized by pairwise divergence with D. simulans) in African and non-African populations

The absence of a significant correlation between local nucleotide diversity and recombination intensity in our data contrasts with the correlation recently reported in D. pseudoobscura (Kulathinal et al. 2008). However, local recombination intensity varied by more than one order of magnitude across the region studied in D. pseudoobscura (Kulathinal et al. 2008), whereas it varied only approximately 3.5-fold in the region studied here. A further crucial difference, this time between our simulation results and our empirical results for D. melanogaster, is the number of loci sampled. The significant correlations detected in our simulations were based on 10,000 iterations of a given selection/recombination model, which contrasts with the 45 loci sampled in our empirical data. To investigate whether the absence of a significant correlation in our empirical sample was caused by a lack of statistical power, we sampled 11 iterations of each parameter set (for a total of 44 iterations across 4 recombination intensity categories corresponding to ρ = 0.01, 0.02, 0.03, and 0.04) and examined the correlation between polymorphism and recombination intensity with these subsets. In no case was there a significant positive correlation between polymorphism and recombination intensity, suggesting that the limited number of loci sampled in our population genetic data set may underlie our inability to capture such an association given the limited range of recombination intensities detected in this region of the D. melanogaster X chromosome.
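The subsampling logic can be sketched in Python as follows. Several elements are assumptions made for illustration rather than details taken from our analysis: the correlation statistic (Spearman's rank correlation with tied values given their mean rank), the one-sided permutation test used to assess significance, all function names, and the toy diversity values, which merely stand in for simulation output and carry a weak built-in trend.

```python
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def subsample_test(pi_by_rho, n_per_class, n_perm, rng):
    """Draw n_per_class diversity values per recombination class, then return
    (Spearman correlation, one-sided permutation P value for a positive trend)."""
    x, y = [], []
    for rho, pis in pi_by_rho.items():
        for pi in rng.sample(pis, n_per_class):
            x.append(rho)
            y.append(pi)
    observed = spearman(x, y)
    shuffled = y[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(x, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy stand-in for simulation output (not real results): 1,000 diversity
# values per recombination class with a weak positive trend.
rng = random.Random(1)
pi_by_rho = {
    rho: [rng.gauss(0.005 + 0.01 * rho, 0.002) for _ in range(1000)]
    for rho in (0.01, 0.02, 0.03, 0.04)
}
r, p = subsample_test(pi_by_rho, n_per_class=11, n_perm=999, rng=rng)
print(f"Spearman correlation = {r:.3f}, one-sided permutation P = {p:.3f}")
```

Repeating the call many times and recording how often P falls below 0.05 gives a direct estimate of power at a sample size of 44 loci.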

Base Composition Evolution

It has previously been reported that codon bias and the GC content of noncoding sequences are significantly negatively correlated with recombination rate on the Drosophila X chromosome (Singh et al. 2005). This is contrary to the naïve expectation under a Hill-Robertson interference model, in which the efficacy of selection on codon bias would increase with increasing recombination. Indeed, this expected positive correlation is seen on the autosomes specifically and in the genome in general (Kliman and Hey 1993; Marais et al. 2001; Singh et al. 2005), although the role of selection in generating this correlation remains unclear (e.g., Marais et al. 2001; Marais and Piganeau 2002; Singh et al. 2004). To examine whether the negative correlations observed at the scale of the whole X chromosome were recapitulated at a fine scale, we estimated codon bias and intronic GC content for the 66 genes within the 1.2-Mb region under study. Consistent with the negative correlation at the level of the entire X chromosome, we found that codon bias is significantly negatively correlated with fine-scale recombination rate in this 1.2-Mb region of the X chromosome (Spearman’s ρ = −0.31, P = 0.01) (Fig. 5a). However, intronic GC content (based on 47 intron-containing genes) was not significantly correlated with fine-scale estimates of recombination intensity in this region (Fig. 5b). This difference between coding and noncoding sequences is not likely to reflect differences in power. Although there are more point estimates of codon bias than there are of intronic GC content, the average length of the intronic sequence (13.7 kb) is substantially greater than the average exonic length (1.3 kb) per gene, which suggests that the intronic point estimates reflect less noise than their codon bias counterparts.
More likely, the difference observed between codon bias and intronic GC content with respect to their relation with recombination intensity reflects differences in the evolutionary forces serving to modulate base composition of coding versus noncoding sequences. While base composition at synonymous sites may be subject to selection for translational efficiency, for instance, base composition at intronic sites may be subject to different selective pressures and/or different strengths of selection. In addition, the physical scale at which base composition is biologically relevant may differ between coding and noncoding sequences, for instance, with perhaps short-range functional significance in coding sequence and long-range significance in noncoding sequences.
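For concreteness, a per-gene estimate of intronic GC content can be computed along the following lines. This Python sketch makes assumptions not stated in the text: all introns of a gene are pooled into a single estimate, ambiguous bases are ignored, and the function name and example sequences are hypothetical.

```python
def pooled_gc_content(introns):
    """GC fraction across all introns of one gene, pooled by length.

    introns: iterable of DNA strings. Bases other than A, C, G, and T
    (for example N) are excluded from both numerator and denominator.
    """
    gc = 0
    called = 0
    for seq in introns:
        for base in seq.upper():
            if base in "GC":
                gc += 1
                called += 1
            elif base in "AT":
                called += 1
    if called == 0:
        raise ValueError("no unambiguous bases to score")
    return gc / called


# Hypothetical gene with two short introns (toy sequences only).
print(pooled_gc_content(["GGCCAT", "ATATNNGC"]))
```

Pooling weights each intron by its length, so long introns dominate the estimate; averaging per-intron values instead would weight all introns equally and would be noisier for short introns.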

Fig. 5

Scatterplot of recombination intensity and (a) codon bias as measured by FOP and (b) intronic GC content

Conclusion and Future Directions

Our investigation of the fine-scale structure of recombination intensity in a 1.2-Mb region of the D. melanogaster X chromosome showed significant heterogeneity in the distribution of crossover events. Although the magnitude of the variation in crossover frequency appears small (approximately 3.5-fold), these data, in combination with similar results from D. pseudoobscura (Cirulli et al. 2007), suggest that recombination intensity in Drosophila does indeed vary significantly at a fine scale. Moreover, our estimates of recombination intensity correlate with genomic features, such as codon bias, even at this fine scale, which suggests that this heterogeneity is biologically significant. Furthermore, provided that sampling is sufficiently dense, our simulation results show that recombination intensity and levels of nucleotide diversity should be weakly but significantly positively correlated within the range of empirical values of recombination intensity measured here. Although we did not find evidence for a significant association between fine-scale recombination intensity and levels of nucleotide diversity in our empirical study, our simulation results suggest that this may be due to a lack of statistical power given the limited number of loci in this region with available population genetic data.

More work is needed to assess the importance of scale with respect to both the distribution of crossover events in Drosophila and the relationship between polymorphism and recombination intensity. Perhaps most importantly, our understanding of the magnitude of recombination intensity fluctuation at a local scale (i.e., the height of the recombination peaks) would benefit considerably from increased resolution of our recombination map. Because the intervals studied here range from 40 to 300 kb, we are limited in our ability to detect heterogeneity at an ultra-fine scale, such as on the order of kilobases or even tens of kilobases, because these effects might be swamped in these larger intervals. At increased resolution, we will have considerably more power to detect whether recombination peaks vary in intensity in D. melanogaster (as they do in humans, for instance) and to characterize the distribution of recombination peak width with respect to physical distance covered. Critically, it also remains to be seen whether the heterogeneity in recombination intensity captured by our study is a general feature of other D. melanogaster lines and populations, of the X chromosome, or of the D. melanogaster genome as a whole. Moreover, the extent to which recombination intensity varies at a fine scale in other Drosophila species merits further study, particularly in close relatives of D. melanogaster, and comparative analysis of the recombinational landscapes among species will shed much light on the lability of recombination rates over evolutionary time. Finally, as our recombination maps in Drosophila continue to gain resolution, we can determine the biologically relevant scale at which levels of nucleotide diversity are affected by recombination rates.