Main

Although speciation plays a central role in evolution, its underlying molecular mechanisms are not well understood. Only a small number of genes involved in speciation have been documented1, and only one such gene, Prdm9, is known in mammals2,3. Prdm9 contributes to hybrid sterility in male (PWD × B6)F1 mice from crosses between male Mus musculus domesticus C57BL/6 (hereafter B6) and female Mus musculus musculus PWD/Ph (hereafter PWD)4. Although its genetic basis is only partially understood5,6, this hybrid sterility is characterized by failure of pairing (synapsis) of homologous chromosomes and by arrest of meiotic prophase owing to failed repair of recombination intermediates2. Homologous recombination and synapsis are interdependent, essential meiotic processes7, and evidence suggests that synapsis often nucleates at recombination sites8. Beyond the PWD × B6 cross, Prdm9 allele identity and dosage have been associated with variation in measures of fertility and meiotic success in many other mouse crosses9.

PRDM9 has several functional domains, including a DNA-binding zinc-finger array and a PR/SET domain responsible for histone H3 lysine 4 trimethylation (H3K4me3)10 (Fig. 1a). By binding to specific DNA sequence targets, PRDM9 determines the positions of the double-strand break (DSB) events that initiate meiotic recombination11. As a result, DSBs and downstream recombination events cluster into small discrete regions called hotspots12,13. The PRDM9 zinc-finger array, encoded by a minisatellite repeat, is highly polymorphic within and across mammalian species3,14,15,16 and is among the fastest-evolving regions in the genome, with strong evidence that natural selection has shaped this evolution3. It is unknown whether polymorphism in the PRDM9 zinc-finger array has effects beyond directly altering the positions of DSB hotspots.

Figure 1: Humanizing the zinc-finger domain of PRDM9 does not impact fertility.
a, Domain structure of the re-engineered PRDM9 protein. b, γH2AX staining of the sex body (green) and SYCP3 staining of the chromosome axis (red) in late pachytene in B6B6/B6 (top) and B6H/H (bottom). c, As b, but for (PWD × B6)F1PWD/B6 and (PWD × B6)F1PWD/H. d, SYCP1 staining of the synaptonemal complex transverse filament (green) and SYCP3 staining of the chromosome axis (red) in pachytene for (PWD × B6)F1PWD/B6 and (PWD × B6)F1PWD/H. Arrows, unsynapsed autosomes. Scale bars, 10 μm.

Humanizing Prdm9 restores hybrid fertility

To explore the DNA-binding characteristics of PRDM9, we generated a line of humanized B6 mice by replacing the portion of mouse Prdm9 exon 10 that encodes the zinc-finger array with the orthologous sequence from the human reference PRDM9 allele (the ‘B’ allele) (Fig. 1a, Extended Data Fig. 1). A feature of PRDM9, explored further below, is the co-evolution of its zinc-finger array with the genomic background in which it sits13,17. Minisatellite mutational processes at PRDM9 can produce new alleles carrying duplications, deletions or rearrangements within the zinc-finger array, yielding an almost complete change in PRDM9 binding sites and thus in hotspot locations14. Because the human PRDM9 zinc-finger array evolved on a lineage separated from mice for ~150 million years, our approach allows us to assess the functional properties of a PRDM9 zinc-finger array unaffected by changes it has induced in the background genome, similar to a new allele arising at random in the population.

Humanization of the Prdm9 zinc-finger array in B6 inbred mice had no effect on fertility (Extended Data Fig. 2), and cytogenetic comparisons revealed no significant differences in zygotene DSB counts (DMC1 immunoreactivity, Extended Data Fig. 2b), crossover counts (MLH1 foci, Extended Data Fig. 2c), sex body formation (γH2AX immunostaining, Fig. 1b) or quantitative measures of fertility and successful synapsis (see later). The full fertility of humanized mice implies that there are unlikely to be any specific PRDM9 binding sites essential for meiosis. One mechanism underlying speciation in many settings involves Dobzhansky–Muller incompatibilities: hybrid dysfunction arising from deleterious epistatic interactions between alleles that evolved in separate lineages1. Given these results, it seems likely that if such interactions involving PRDM9 occur, they do not reflect constrained co-evolution of Prdm9 with specific genes.

To explore the role of PRDM9 in hybrid fertility directly, we crossed PWD females with B6B6/H males. As expected18, male (PWD × B6)F1PWD/B6 hybrids (we use superscripts to indicate Prdm9 genotypes and write the female strain first in crosses) were sterile, as evidenced by failures in siring pups (Extended Data Fig. 2e), sex body formation (Fig. 1c) and synapsis (Fig. 1d). In contrast, all these defects were completely rescued in (PWD × B6)F1PWD/H hybrids inheriting the engineered humanized zinc-finger array (Fig. 1c, d, Extended Data Fig. 2e). Thus the zinc-finger domain of PRDM9, and hence probably the DNA-binding properties of this protein, underlies the role of Prdm9 in hybrid sterility.

Although (PWD × B6)F1PWD/B6 male mice are completely sterile, male progeny of the reciprocal cross, (B6 × PWD)F1B6/PWD, are semi-fertile9. A 4.7 Mb locus on the PWD X chromosome (Hstx2) contributes to these fertility differences6. We also tested the effect of humanization in this reciprocal cross and again found that it restored full fertility to otherwise semi-fertile males (see below and Supplementary Information). Thus humanization of PRDM9 acts at least partially independently of Hstx2.

Our reprogramming of PRDM9 binding sites mimics the consequences of mutational changes in its zinc-finger array. The restoration of hybrid fertility suggests that newly arising alleles that also reset PRDM9 binding sites are likely to produce the same rescue, so hybrid sterility between subspecies driven by Prdm9 will be evolutionarily transient. This raises the question, which we return to below, of which properties distinguish Prdm9 alleles associated with reduced fertility.

Humanizing the recombination landscape

To characterize the consequences of re-engineering the zinc-finger domain for recombination, we generated high-resolution DSB maps for mice with different Prdm9 alleles and genomic backgrounds, using ChIP-seq-based single-stranded DNA sequencing19 on adult testes. This approach identifies 3′ single-stranded DNA ends decorated with DMC1, which arise as intermediates after SPO11 creates DSBs. In addition to mapping DSB hotspots, our hotspot-calling algorithm estimates a hotspot ‘heat’, proportional to the fraction of cells in which DMC1 marks that locus (Supplementary Information). DMC1 heat therefore depends both on the relative frequency of DSB formation and on how long DMC1 marks persist20. We also obtained complementary information by performing ChIP-seq for H3K4me3, a histone modification deposited directly in cis by PRDM9 binding11.
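
To make the frequency versus persistence ambiguity concrete, here is a toy Python sketch under a simple steady-state assumption that is ours, not the study's hotspot-calling model: if cells are sampled uniformly over a prophase window of length T, a locus breaks in a fraction p of cells, and DMC1 stays on each break for a time τ (ignoring edge effects at the window boundaries), the expected fraction of marked cells is p·τ/T. The function name and parameters are hypothetical. Doubling p or doubling τ gives the same heat, which is why heat alone cannot distinguish more breaks from slower repair.

```python
def expected_dmc1_heat(p_dsb: float, dmc1_lifetime: float, window: float) -> float:
    """Expected fraction of sampled cells carrying a DMC1 mark at one locus.

    Toy steady-state assumption: cells are sampled uniformly across a prophase
    window of length `window`; the locus breaks in a fraction `p_dsb` of cells,
    and each break stays DMC1-coated for `dmc1_lifetime` (same time units).
    """
    if not 0.0 <= p_dsb <= 1.0:
        raise ValueError("p_dsb must be a probability")
    if not 0.0 <= dmc1_lifetime <= window:
        raise ValueError("dmc1_lifetime must fit within the sampling window")
    return p_dsb * dmc1_lifetime / window


base = expected_dmc1_heat(0.02, 1.0, 5.0)
more_breaks = expected_dmc1_heat(0.04, 1.0, 5.0)    # twice as many DSBs
slower_repair = expected_dmc1_heat(0.02, 2.0, 5.0)  # DMC1 persists twice as long
print(base, more_breaks, slower_repair)
```

Both perturbations double the heat relative to `base`, so in this model a heat difference cannot be attributed to breakage or to repair without extra information, such as cytological counts of DSBs per cell.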

Relative to wild-type B6 mice21, B6H/H mice showed completely changed hotspot landscapes (2.6% overlap; Extended Data Fig. 3), with hotspots in the humanized mouse strongly enriched for a motif matching the previously reported human PRDM9 binding motif13 (Extended Data Fig. 4). Most DSB hotspots overlapped H3K4me3 peaks (89%, P < 0.05). The correlation in total DSB heat between wild-type and humanized mice increased at larger genomic scales (Extended Data Fig. 3b), consistent with earlier studies showing that large-scale crossover rates depend on factors other than PRDM916,20,22.
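
One plausible way to compute an overlap figure such as the 2.6% above is the fraction of hotspots in one map that overlap any hotspot in the other. The sketch below assumes half-open genomic intervals and non-overlapping (merged) reference hotspots; the exact overlap definition used in the study is not stated here, so the function and its inputs are hypothetical. Note that the statistic is not symmetric: swapping query and reference can change the result.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Interval = Tuple[str, int, int]


def overlap_fraction(query: List[Interval], reference: List[Interval]) -> float:
    """Fraction of query hotspots that overlap at least one reference hotspot.

    Assumes reference intervals on a chromosome do not overlap each other
    (e.g. merged peak calls), so sorting by start also sorts their ends.
    """
    starts_ends: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in reference:
        starts_ends.setdefault(chrom, []).append((start, end))
    for intervals in starts_ends.values():
        intervals.sort()
    ends = {chrom: [end for _, end in ivs] for chrom, ivs in starts_ends.items()}

    hits = 0
    for chrom, start, end in query:
        ivs = starts_ends.get(chrom, [])
        # index of the first reference interval that ends after the query start
        i = bisect_left(ends.get(chrom, []), start + 1)
        if i < len(ivs) and ivs[i][0] < end:
            hits += 1
    return hits / len(query) if query else 0.0


wild_type = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
humanized = [("chr1", 150, 250), ("chr1", 300, 400), ("chr2", 20, 30), ("chr3", 1, 5)]
print(overlap_fraction(humanized, wild_type))
print(overlap_fraction(wild_type, humanized))
```

In this invented example the two directions give 1/4 and 1/3, so a reported overlap percentage should always say which map was used as the query.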

In the heterozygous mouse, despite the presence of two different Prdm9 alleles, we found a number of hotspots similar to that in homozygous mice (Supplementary Table 1). Furthermore, almost all B6B6/H hotspots (95.8%) were also found in either B6B6/B6 or B6H/H mice (Extended Data Figs 3c, 4c, Supplementary Table 2). The human allele exhibited 2.7-fold dominance over the wild-type allele (Supplementary Table 2), with even stronger dominance at hotter hotspots (Extended Data Fig. 3c). Comparison of homozygous and heterozygous hotspot heats (Extended Data Fig. 3d, e) implies that B6 hotspots operate similarly in the heterozygote but are proportionally less active. For additional DSB hotspot analyses, see Supplementary Information and Extended Data Fig. 5.

Humanization restores symmetric binding

Next we examined DSB hotspot maps for hybrid males: infertile (PWD × B6)F1PWD/B6, reciprocal semi-fertile (B6 × PWD)F1B6/PWD, humanized rescue (PWD × B6)F1PWD/H, and reciprocal humanized rescue (B6 × PWD)F1H/PWD, with wild-type PWD for comparison. Sequence differences between the PWD and B6 genomes allowed us to determine whether individual hotspots in these hybrids were ‘symmetric’, with DSBs occurring equally on both chromosomes, or ‘asymmetric’, with a preference towards either the PWD or B6 chromosome (Supplementary Information Section 5).

We found that most DMC1 signal (71.8%) in (PWD × B6)F1PWD/B6 or (B6 × PWD)F1B6/PWD hybrids occurs within asymmetric DSB hotspots (Fig. 2a, Extended Data Fig. 6). Further, DSBs associated with the PWD allele occur largely on the B6 chromosome and those associated with the B6 allele occur largely on the PWD chromosome. We also measured asymmetry of the H3K4me3 mark at each hotspot and found the same pattern, confirming that DSB asymmetry largely reflects underlying differences in PRDM9 binding and methylation between the two homologues. This H3K4me3 asymmetry resembles that previously described for (B6 × CAST)F1B6/CAST hybrids17, but is considerably more extreme. Sequence differences directly disrupting PRDM9 binding motifs explain almost all cases of binding asymmetry (83.4% of PWD hotspots; 91.3% of B6 hotspots), and result from rapid mutational accumulation along the separate lineages from the common ancestor of B6 and PWD (Extended Data Fig. 6g).
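
A minimal sketch of how hotspots could be labelled symmetric or asymmetric from allele-informative read counts, and how a signal-weighted share (like the 71.8% above) could then be computed. The thresholds (a B6 read fraction between 0.4 and 0.6 for ‘symmetric’, above 0.9 or below 0.1 for ‘asymmetric’) and all function names are illustrative assumptions, not the study's calling procedure, which would also need to account for mapping bias and sampling noise at low read counts.

```python
from typing import Iterable, Tuple


def b6_fraction(b6_reads: int, pwd_reads: int) -> float:
    """Share of allele-informative reads that map to the B6 homologue."""
    total = b6_reads + pwd_reads
    if total == 0:
        raise ValueError("hotspot has no allele-informative reads")
    return b6_reads / total


def classify(b6_reads: int, pwd_reads: int,
             sym_halfwidth: float = 0.1, asym_cutoff: float = 0.9) -> str:
    """Label a hotspot using illustrative (hypothetical) thresholds."""
    f = b6_fraction(b6_reads, pwd_reads)
    if abs(f - 0.5) <= sym_halfwidth:
        return "symmetric"
    if max(f, 1.0 - f) > asym_cutoff:
        return "asymmetric"
    return "intermediate"


def signal_share(hotspots: Iterable[Tuple[int, int, float]], label: str) -> float:
    """Fraction of total DMC1 heat in hotspots carrying the given label.

    Each hotspot is (b6_reads, pwd_reads, heat).
    """
    rows = list(hotspots)
    total = sum(heat for _, _, heat in rows)
    in_label = sum(heat for b6, pwd, heat in rows if classify(b6, pwd) == label)
    return in_label / total


example = [(50, 50, 10.0), (95, 5, 30.0), (2, 98, 40.0), (70, 30, 20.0)]
print(signal_share(example, "asymmetric"))
```

Weighting by heat rather than counting hotspots matters here: a few hot asymmetric hotspots can dominate the total signal even if most hotspots are symmetric.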

Figure 2: DSB hotspot asymmetry in hybrids.
a, Distribution of the fraction of reads originating from the B6 chromosome in the infertile (PWD × B6)F1PWD/B6 mouse. PRDM9 control at each hotspot is attributed to B6 (blue), PWD (pink) or undetermined (grey). b, As a, but for non-shared hotspots, unique to either the rescue (PWD × B6)F1PWD/H mouse (top) or the infertile (PWD × B6)F1PWD/B6 mouse (bottom). c, Relative contributions of B6 and humanized PRDM9 to DMC1 signal in (B6/CAST)F2B6/H. Bars represent the three possible genomic backgrounds. d, Individual chromosome effects (relative to chromosome 1) when comparing DMC1 signals in (PWD × B6)F1PWD/B6 relative to (PWD × B6)F1PWD/H, for shared DSB hotspots. Error bars, ±1 s.e. e, As d, but for H3K4me3. f, Comparison of DMC1 chromosome effects (as in d) with the fitted chromosome effects, using a model including the symmetric hotspot measures for the three Prdm9 alleles. Error bars, ±3 s.e. (95% simultaneous confidence level for 19 chromosomes).
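
The caption's statement that ±3 s.e. gives roughly 95% simultaneous coverage for 19 chromosomes can be checked with a Bonferroni calculation, assuming approximately normal estimates: each interval then needs two-sided coverage of 1 − 0.05/19. The Bonferroni choice is our assumption about how the multiplier was derived; the sketch uses only Python's standard library.

```python
from statistics import NormalDist


def simultaneous_z(n_intervals: int, family_coverage: float = 0.95) -> float:
    """Two-sided normal multiplier giving joint coverage via Bonferroni."""
    alpha_per_interval = (1.0 - family_coverage) / n_intervals
    return NormalDist().inv_cdf(1.0 - alpha_per_interval / 2.0)


print(round(simultaneous_z(1), 2))   # a single interval
print(round(simultaneous_z(19), 2))  # one interval per chromosome
```

The single-interval multiplier recovers the familiar value of about 1.96, and the 19-interval multiplier comes out at about 3.0, consistent with the ±3 s.e. error bars.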

Such asymmetry can arise through meiotic drive favouring mutations that disrupt PRDM9 binding motifs, in populations where these motifs are active. Any new mutation disrupting PRDM9 binding at a hotspot is preferentially transmitted to offspring: in individuals heterozygous for the mutation, DSBs occur preferentially on the non-mutant chromosome and are then repaired by copying from the mutant chromosome23. This phenomenon has been observed at PRDM9 binding motifs in humans13 and mice17 and causes rapid accumulation of mutations disrupting PRDM9 binding. B6 and PWD Prdm9 alleles are largely subspecies-specific15, so only the B6 lineage has experienced strong erosion of the B6 binding motif, and only the PWD lineage has experienced strong erosion of the PWD binding motif. This asymmetric erosion explains the highly asymmetric PRDM9 binding in F1 hybrids.
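
The speed of this erosion follows from a standard drive recursion. Assuming random mating, no selection, and that heterozygotes transmit the binding-disrupting allele with probability (1 + d)/2 (d is a hypothetical drive strength, not a value estimated in this study), the allele frequency p changes each generation as p' = p² + p(1 − p)(1 + d) = p + d·p(1 − p). The toy sketch below simply iterates that recursion.

```python
from typing import List


def next_frequency(p: float, d: float) -> float:
    """One generation of drive on an allele that disrupts a PRDM9 motif.

    Toy model: random mating, no selection, and heterozygotes transmit the
    disrupting allele with probability (1 + d) / 2, with 0 <= d <= 1.
    """
    return p + d * p * (1.0 - p)


def trajectory(p0: float, d: float, generations: int) -> List[float]:
    """Allele frequencies from generation 0 up to `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], d))
    return freqs


no_drive = trajectory(0.01, 0.0, 200)
with_drive = trajectory(0.01, 0.1, 200)
print(no_drive[-1], round(with_drive[-1], 4))
```

Without drive the frequency never changes; with d = 0.1 a rare disrupting allele grows by a factor of about 1.1 per generation while rare and approaches fixation within this 200-generation window, which is why motif erosion can be rapid on evolutionary timescales.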

Because the human allele has never been present in mice, its binding sites have not experienced erosion in the mouse genome. As a consequence, DSBs at hotspots attributable to the human allele occur mostly (57%) in symmetric hotspots, and the remaining, asymmetric hotspots are mainly (84.2%) explained by mutations that coincidentally fall within the human PRDM9 binding motif (Fig. 2b). Conversely, only 30% of DSBs at hotspots attributable to the B6 allele occur in symmetric hotspots. The same pattern is seen in the reciprocal crosses. Thus, humanization reprograms hotspot positions genome-wide and, as a consequence, reduces hotspot asymmetry in the hybrids.

Meiotic drive might also explain dominance, as seen for the human Prdm9 allele over the B6 allele in the B6B6/H mouse, because B6 motifs are heavily eroded on the B6 background. To test this, we created F2 mice to analyse the behaviour of the B6 and humanized alleles on a neutral Mus musculus castaneus (CAST/EiJ) background that has not been affected by B6 motif erosion (Extended Data Fig. 6h). Dominance of the human allele disappeared in regions of the genome carrying two copies of the CAST genome; removing the effect of motif erosion removes the dominance (Fig. 2c). This result excludes some factors that might influence dominance (Supplementary Information), and also suggests that recently arisen Prdm9 alleles might be dominant over older alleles, whose binding motifs meiotic drive will have had more time to degrade.

Chromosome-specific trans effects of humanization

The infertile and humanized rescue mice share some hotspots, controlled by the PWD allele. DMC1 heat at these shared hotspots is strongly correlated between the two mice (r2 = 0.63), but far more weakly than between the infertile and reciprocal mice (0.95). To explore this weaker correlation, we calculated, for each shared hotspot, the ratio of its DMC1 heats in the two mice. These ratios differed substantially across chromosomes (Fig. 2d). Thus, replacing the B6 allele with the human allele affects, in trans, hotspots that neither allele binds directly, and this effect is observed at broad genomic scales. This trans effect might reflect differences in either the formation or the downstream processing of DSBs. In contrast to DMC1, H3K4me3 heat showed no significant differences in ratio between chromosomes (Fig. 2e), implying that the trans effect probably operates downstream of PRDM9 binding. Furthermore, comparison of DMC1 heats between B6B6/B6 and B6B6/H mice also revealed chromosome effects (Extended Data Fig. 7). Because the B6 background is fully homozygous and asynapsis is observed only in the infertile mouse, such trans effects cannot depend on SNP presence and cannot simply be a consequence of asynapsis.
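
The per-chromosome comparison can be sketched as follows: for each shared hotspot take log2 of the ratio of its DMC1 heats in the two mice, average within each chromosome, and express each chromosome relative to chromosome 1, as in Fig. 2d. The log scale, the use of the mean, and the standard-error formula (which treats the chromosome means as independent) are our assumptions for illustration, not a description of the study's statistical model; the input heats are invented.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def chromosome_effects(heats: Dict[str, List[Tuple[float, float]]],
                       reference: str = "chr1") -> Dict[str, Tuple[float, float]]:
    """Mean log2 heat ratio per chromosome, relative to `reference`.

    heats maps chromosome -> [(heat_in_mouse_a, heat_in_mouse_b), ...] for
    shared hotspots. Returns chromosome -> (effect, standard error).
    """
    summary = {}
    for chrom, pairs in heats.items():
        logs = [math.log2(a / b) for a, b in pairs if a > 0 and b > 0]
        if len(logs) < 2:
            continue  # too few hotspots to estimate a standard error
        summary[chrom] = (mean(logs), stdev(logs) / math.sqrt(len(logs)))

    ref_mean, ref_se = summary[reference]
    effects = {}
    for chrom, (m, se) in summary.items():
        if chrom == reference:
            effects[chrom] = (0.0, 0.0)
        else:
            # difference of two means assumed independent
            effects[chrom] = (m - ref_mean, math.sqrt(se ** 2 + ref_se ** 2))
    return effects


example = {
    "chr1": [(2.0, 2.0), (4.0, 4.0), (3.0, 3.0)],
    "chr2": [(4.0, 2.0), (8.0, 4.0)],
    "chr3": [(1.0, 2.0), (3.0, 6.0)],
    "chr4": [(5.0, 5.0)],  # a single hotspot: skipped
}
effects = chromosome_effects(example)
print(effects)
```

Working on the log scale makes a doubling and a halving of heat symmetric around zero, so chromosomes with elevated and reduced heat in one mouse appear as positive and negative effects of comparable size.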

Next, we sought to identify the drivers of these chromosome-specific differences in DMC1 heat between the infertile and humanized rescue mice by testing various potential predictors (Supplementary Information). After an exhaustive search over possible models built from the predictors considered, the best-fitting model was highly predictive (r2 = 0.84; Fig. 2f) and included only symmetric hotspot measures (the total H3K4me3 signal from PRDM9 binding on both homologues, that is, symmetrically, at the same hotspots, summed over the entire chromosome) for each of the three Prdm9 alleles (P < 0.01 in each case). The trans effect is thus explained by the direct differences in PRDM9 binding targets across mice alone, without any additional information about other features such as SNP diversity, consistent with the zinc-finger array of Prdm9 being the sole difference between the infertile and rescue mice. Moreover, only symmetric hotspots (a minority in the infertile mouse) provide predictive power.

The fitted model implies that, at the chromosome level, lower overall symmetric binding results in increased DMC1 heat. The same relationship holds (P < 0.0002; Supplementary Information) in the comparison between B6B6/B6 and B6B6/H mice. Although the B6 background is completely homozygous, so PRDM9 is predicted to deposit H3K4me3 equally on both homologues, total levels of H3K4me3 marking still differ across chromosomes, and these differences correlate with the observed differences in DMC1 heat between the two genotypes. This excludes sequence differences at or near hotspots, or asynapsis itself, as a cause, and suggests that the total amount of symmetric binding on each chromosome, rather than simply a lack of asymmetric binding, plays an important role in predicting DMC1 heat. The direction of causality is reasonably clear (binding precedes DSB formation, and the H3K4me3 mark lacks similar chromosome effects), whereas confounding influences should be shared between the mice being compared and thus cannot alone explain the observed inter-chromosomal differences. It therefore appears that differences in the overall level of symmetric binding by PRDM9 drive downstream trans effects at chromosomal scales, with lower symmetric binding somehow increasing the number, or the repair time, of DSBs even at distant hotspots.

PRDM9 binding symmetry and synapsis

Sterile (PWD × B6)F1PWD/B6 hybrids show very high rates of asynapsis, particularly of specific chromosomes5, and fail to form the sex body during early meiosis5,9. By contrast, these phenotypes are completely rescued in (PWD × B6)F1PWD/H hybrids harbouring the humanized Prdm9 allele (Fig. 1c, d). Having seen a relationship between PRDM9 binding symmetry and the recombination process, we examined binding symmetry in relation to fertility. For different male mice, we measured three quantitative fertility phenotypes24 (Fig. 3a) and calculated several genome-wide measures of hotspot symmetry (Extended Data Fig. 8; Supplementary Information). Across all nine mice studied, DMC1 symmetry measures were significantly correlated with the rate of proper synapsis (P = 0.0083; rank correlation permutation test). In humanized hybrid mice, the observed increase in symmetry was accompanied by improved fertility. Notably, this improvement is larger than the effect of the Hstx2 modifier, which is responsible for the difference in asynapsis and fertility between the sterile and reciprocal hybrids5 (Fig. 3a). An additional hybrid, (B6 × CAST)F1B6/CAST, showed intermediate PRDM9 binding symmetry17 and also an intermediate level of asynapsis. As expected, symmetry measures in homozygous mice (PWD, B6B6/B6, B6B6/H, B6H/H) are much higher than in hybrids, and these mice show the highest synapsis rates and fertility measures.

Figure 3: Humanizing PRDM9 restores proper synapsis and rescues fertility in hybrids.
a, Fertility metrics in hybrid mice. Error bars represent bootstrap 95% confidence intervals (symmetry metric) or ±1 s.e. (other metrics). b, Chromosome effects in DMC1 signals (as Fig. 2d) versus previously reported5 asynapsis rates for five chromosomes in infertile (PWD × B6)F1PWD/B6. Error bars, ±1 s.e.

Previous work5 showed that in the infertile (PWD × B6)F1 mouse, synapsis failure occurs at different rates among the five chromosomes tested. Comparing these reported asynapsis rates with the chromosome-specific DMC1 heat effects described above, we found an identical ranking (P = 0.017, rank correlation permutation test; Fig. 3b). Because these DMC1 heat effects are strongly predicted by symmetric H3K4me3 levels in the infertile mouse, this result implies that chromosomes with lower symmetric PRDM9 binding experience higher asynapsis rates. This may explain why lower genome-wide symmetric PRDM9 binding accompanies higher overall asynapsis rates across different mice.
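
Because only five chromosomes are involved, a rank-correlation permutation test can be carried out exactly by enumerating all 5! = 120 orderings. The sketch below assumes Spearman's rho, no tied values and a two-sided test, and the input values are invented; we have not confirmed that this is exactly the procedure used. Under these assumptions an identical ranking is matched or exceeded in absolute value only by itself and by the fully reversed ordering, giving P = 2/120 ≈ 0.017, the same value as reported above.

```python
from fractions import Fraction
from itertools import permutations
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Ranks 1..n; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(rx: Sequence[int], ry: Sequence[int]) -> Fraction:
    """Spearman's rho from untied ranks, kept exact with Fraction."""
    n = len(rx)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - Fraction(6 * d_squared, n * (n * n - 1))


def exact_permutation_p(x: Sequence[float], y: Sequence[float]) -> Fraction:
    """Two-sided exact P: share of all y orderings with |rho| >= observed."""
    rx, ry = ranks(x), ranks(y)
    observed = abs(spearman_rho(rx, ry))
    orderings = list(permutations(ry))
    extreme = sum(1 for perm in orderings if abs(spearman_rho(rx, perm)) >= observed)
    return Fraction(extreme, len(orderings))


# Invented values for five chromosomes with identical rank order
dmc1_effect = [0.10, 0.30, 0.20, 0.50, 0.40]
asynapsis_pct = [5.0, 12.0, 8.0, 30.0, 20.0]
p = exact_permutation_p(dmc1_effect, asynapsis_pct)
print(p, float(p))
```

Exact enumeration is cheap here (120 orderings) and avoids the Monte Carlo error of random permutations; it also shows that, with five items, a perfect ranking is the strongest evidence such a test can deliver.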

Having found elevated DMC1 heat on chromosomes affected by asynapsis (where homologous pairing fails), we examined DMC1 and H3K4me3 heats in two additional settings in which no homologue exists, so homologous pairing cannot occur: the X chromosome in male mice and, separately, autosomal hotspots in humanized hybrid mice where the human PRDM9 binding motif lies within a region deleted in the PWD genome. In both settings, we observed elevated DMC1 heat relative to autosomal hotspots bound symmetrically by PRDM9 (Extended Data Fig. 9). Elevated DMC1 heat might, therefore, be a consistent signature of non-pairing of homologous chromosomes during meiosis. It could be explained by an increased probability of a DSB occurring at that site, or by DMC1 coating breaks for longer (delayed repair). However, the total number of RAD51-marked DSBs initiated per cell is tightly regulated25 and remains unchanged even in Prdm9 knockouts26, whereas in both knockouts and infertile hybrids DSB marks persist late into pachytene, suggesting a failure of repair5,9,26. Therefore, the elevated DMC1 signals we observe may be explained by persistence of DMC1 where homologous repair is compromised or delayed.

PRDM9-dependent homologue interactions

Given our chromosome-scale observations, we next asked whether symmetric binding at individual hotspots might also influence DMC1 heat. At each human-controlled hotspot in the humanized rescue mouse, we measured the component of total DMC1 heat contributed by the B6 chromosome alone and compared it with the DMC1 heat at the same hotspot in B6H/H. This comparison revealed a strong, clear elevation of DMC1 heat in the hybrid at asymmetric hotspots (>90% asymmetry, towards binding of only the B6 chromosome) relative to symmetric hotspots (those within 10% of complete symmetry) (Fig. 4a). However, as in the chromosomal analysis, H3K4me3 enrichment showed no difference between symmetric and asymmetric sites in these mice (Fig. 4b). Indeed, the ratio of average DMC1 heat to H3K4me3 enrichment was far higher at asymmetric than at symmetric hotspots (Fig. 4c), across all hybrid mice, backgrounds and Prdm9 alleles tested (Extended Data Fig. 9d). This effect reflects a consistent elevation of DMC1 heat at DSB sites on individual chromosomes when the homologue is not strongly bound (Extended Data Fig. 9e, f). This phenomenon cannot easily be explained by factors such as local heterozygosity within or outside the PRDM9 motif, the type of mutation(s) disrupting PRDM9 binding, or outlier effects (Extended Data Figs 9, 10; Supplementary Information Section 13).
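
The comparison in Fig. 4c can be sketched as grouping hotspots by symmetry class and by H3K4me3 bin, then taking the median DMC1/H3K4me3 ratio in each group, so that asymmetric and symmetric hotspots are compared at matched levels of PRDM9-deposited H3K4me3. The function name, bin edges, class labels and input values are hypothetical.

```python
from bisect import bisect_right
from statistics import median
from typing import Dict, List, Sequence, Tuple


def median_ratio_by_bin(hotspots: List[Tuple[str, float, float]],
                        h3k4me3_edges: Sequence[float]) -> Dict[Tuple[str, int], float]:
    """Median DMC1/H3K4me3 ratio for each (symmetry class, H3K4me3 bin).

    hotspots: (class_label, dmc1_heat, h3k4me3_enrichment).
    Bin i holds enrichments with h3k4me3_edges[i-1] <= value < h3k4me3_edges[i].
    """
    groups: Dict[Tuple[str, int], List[float]] = {}
    for label, dmc1, h3 in hotspots:
        if h3 <= 0:
            continue  # the ratio is undefined without H3K4me3 signal
        key = (label, bisect_right(h3k4me3_edges, h3))
        groups.setdefault(key, []).append(dmc1 / h3)
    return {key: median(ratios) for key, ratios in sorted(groups.items())}


data = [
    ("symmetric", 2.0, 4.0), ("symmetric", 3.0, 3.0), ("symmetric", 1.0, 4.0),
    ("asymmetric", 6.0, 3.0), ("asymmetric", 8.0, 2.0),
]
print(median_ratio_by_bin(data, h3k4me3_edges=[3.5]))
```

In this invented example, within the lower H3K4me3 bin the asymmetric median ratio (3.0) exceeds the symmetric one (1.0), which is the kind of pattern Fig. 4c reports; binning guards against the comparison being driven by differences in binding strength.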

Figure 4: Asymmetric DSB hotspots show elevated DMC1 signals but no H3K4me3 elevation.
a, Comparison of B6H/H and (PWD × B6)F1PWD/H DMC1 signals (medians shown). Signals are compared for symmetric and asymmetric hotspots in (PWD × B6)F1PWD/H, on the shared B6 chromosomes. Error bars, 95% confidence intervals. b, As a, but for H3K4me3. c, Comparison of DMC1 signals in (PWD × B6)F1PWD/B6, at symmetric and asymmetric hotspots, binned by H3K4me3 enrichment (medians shown). H3K4me3 and DMC1 signals are estimated on the PWD chromosome only, for hotspots associated with B6 PRDM9.

Thus, elevated DMC1 heat on the bound chromosome appears to be a universal feature of hotspots where PRDM9 binds asymmetrically, relative to symmetrically bound hotspots. By contrast, the H3K4me3 results suggest that the mark is deposited independently on each homologue (Supplementary Information Section 12.2). This implies that DMC1 heat elevation depends on a process that requires symmetric PRDM9 binding, acts downstream of H3K4me3 deposition and involves both homologues. While we cannot exclude the possibility that more DSBs somehow occur at asymmetric hotspots, this would require early, precise pairing of homologues, at least at hotspots, before DSB formation, to determine which hotspots are symmetrically bound. Although there is some evidence of pre-meiotic homologue association27, current data do not suggest the existence of precise pairing before DSB formation28. The alternative, and more plausible, explanation is that DSBs at sites where PRDM9 binds asymmetrically are processed more slowly, so DMC1 is removed later than at symmetric hotspots. Although our data represent the collective behaviour of populations of cells, this model suggests a mechanism, operating within individual cells, by which PRDM9-dependent interaction between homologues influences downstream DSB processing; we discuss this below (see also Supplementary Information Section 14).

Discussion

Only one mammalian speciation gene, Prdm9, has so far been identified. Humanizing the zinc-finger array of Prdm9 redirects binding, thereby entirely reprogramming recombination hotspots, and in doing so reverses the hybrid infertility between musculus and domesticus subspecies. This modification mimics the consequences of a newly arising allele and thus suggests that Prdm9 evolution (for example, rapid fixation of particular existing variants3,15 or novel alleles arising by mutation) in either or both subspecies would also restore hybrid fertility.

Multiple lines of evidence in our data, at chromosomal, whole-organism and individual-hotspot scales, strongly suggest novel roles for PRDM9, dependent upon symmetric binding, in the formation or processing of DSBs downstream of H3K4me3 deposition. Several aspects of our data and of published data (for example, the comparison between B6B6/B6 and B6B6/H mice; see also Supplementary Information) also mean that our results cannot be fully explained by sequence differences within or around hotspots that do not specifically affect binding symmetry.

Pervasive asynapsis is proposed to be the underlying cause of infertility in hybrid mice5. We observed a positive relationship between symmetric PRDM9 binding and correct synapsis of homologous chromosomes later in meiosis. Replacing the B6 allele with the humanized allele in hybrids greatly increases symmetric binding, restoring proper synapsis and fertility. Many apparently complex relationships have previously been reported between naturally occurring mouse Prdm9 alleles, allelic dosage, and quantitative fertility measures in hybrids9. Each of ten manipulations shown or predicted to increase PRDM9 binding symmetry also increases meiotic success and fertility (Supplementary Information), supporting the idea that the link between binding symmetry and fertility might be very general, and causal.

The erosion of PRDM9 binding sites through meiotic drive17 also occurs at human hotspots13, and probably across many mammals. In two populations separated for sufficient time, differential PRDM9 binding site erosion will decrease binding symmetry in hybrids, which is likely to reduce fertility (though not necessarily to the extreme of sterility). Therefore, PRDM9 may affect hybrid fertility across many mammalian species and so might repeatedly act in driving early speciation steps, although the rapid evolution of the PRDM9 zinc-finger array implies that this direct role is unexpectedly transient. However, even subtle or transient PRDM9-driven reductions in fertility might still provide a selective advantage to additional mutations contributing towards speciation. This mechanism differs from previously characterized causes of intrinsic hybrid incompatibility, such as differences in ploidy, chromosomal rearrangements or incompatibilities between genes. The extent to which it has contributed to speciation in nature is an interesting question for future research.

One plausible mechanism for the effects of binding (a)symmetry involves a role for PRDM9 binding in aiding homology search, a process thought to involve single-stranded DNA formed around DSBs invading the homologous chromosome to probe for homology29. It has been suggested that synaptonemal complex proteins are loaded at some DSB sites, from which synapsis then spreads7,8. Extending this model to incorporate the property that asymmetrically bound sites are less favourable for homology search would parsimoniously predict each symmetry-related phenomenon we observed: DSBs at asymmetric hotspots would be repaired more slowly, elevating their DMC1 signal, and chromosomes with fewer symmetric hotspots overall would show delayed DSB repair and higher asynapsis rates, ultimately causing subfertility or sterility in animals with low symmetric binding. It is not known how homology search occurs efficiently in the nuclear environment, given the enormous potential search space of the genome30, or why hotspots exist at all. Both phenomena could be explained by the above model, in which homology search is focused at least partly on hotspot positions. Indeed, hotspots might greatly increase search efficiency by directing homology search to PRDM9 binding sites.

Methods

Gene targeting in embryonic stem cells

A C57BL/6J (B6) mouse genomic BAC clone (RP23-159N6) encompassing the Prdm9 gene was used for subcloning of homology regions. A 7 kb XmaI/SpeI fragment upstream of exon 10 and a 2.5 kb BamHI/SpeI fragment downstream of exon 10 were used as 5′ and 3′ homology regions, respectively. The intervening 4 kb SpeI/BamHI fragment, encoding exon 10 and flanking intronic regions, was subcloned, and an internal 1.4 kb BglII–NheI fragment containing the coding region of the zinc-finger array was replaced with a synthesized fragment (Life Technologies) encoding the zinc-finger array of the human B allele. All coding sequence 5′ of the first zinc finger, and the entire 3′ untranslated region (UTR) downstream of the stop codon, were left as mouse sequence. This humanized fragment was then assembled between the two homology arms, upstream of a neomycin selection cassette. PhiC31 attP sites were incorporated immediately downstream of the 5′ homology arm and between the PGK promoter and the neomycin phosphotransferase open reading frame, equipping the locus for PhiC31 integrase-mediated cassette exchange in subsequent manipulations31.

The completed targeting vector was linearized with ApaI and electroporated into mycoplasma-free C57BL/6N JM8F6 embryonic stem cells (Extended Data Fig. 1a). JM8F6 cells were a gift from B. Skarnes, Wellcome Trust Sanger Institute. Following selection in 210 μg ml−1 G418, recombinant clones were screened by PCR for homologous recombination over the 3′ arm. A forward primer (5′-TACCGGTGGATGTGGAATGTG-3′) binding within the PGK promoter was used together with a reverse primer (5′-TGACAGCAAAAACCACCTCTA-3′) binding downstream of the 3′ homology arm to amplify a 2.7 kb fragment from correctly recombined clones. Positive clones were examined for correct recombination at the 5′ end by long-range PCR using a forward primer (5′-CAGAGGACCTTTAGTCTGTGAGGG-3′) binding upstream of the 5′ homology arm and a reverse primer (5′-AGCAGAGGCTTGACCTATCGCTAA-3′) binding within the humanized region. Correctly targeted clones yielded a 10.4 kb amplicon. Sanger sequencing of this 10.4 kb amplicon across the 5′ homology arm, using primer 5′-CCTTTCTCAATGATCCACAAAT-3′, confirmed correct integration of the 5′ attP sequence needed for future manipulations of the locus. Southern blotting with a probe against neomycin confirmed that only a single integration event had occurred.

Mouse production and matings

Mice were housed in individually ventilated cages and received food and water ad libitum. All studies received local ethical review approval and were performed in accordance with the UK Home Office Animals (Scientific Procedures) Act 1986. Experimental groups were determined by genotype and were therefore not randomized, and no animals were excluded from the analysis. Sample sizes for fertility studies and cytogenetics (see below) were chosen on the basis of previously published studies5,9,32. No statistical methods were used to predetermine sample size. All phenotypic characterization was performed blind to experimental group.

ES cells from correctly targeted clones were injected into albino C57BL/6J blastocysts and the resulting chimaeras were mated with albino C57BL/6J females. Successful germline transmission yielded black pups and F1 mice harbouring the humanized Prdm9 allele were identified using the above attP screening PCR. F1 heterozygous male mice were bred with C57BL/6J Flp recombinase deleter mice (Tg(ACTB-Flpe)9205Dym (Jax stock 005703)) and offspring were screened for the deletion of the selection cassette using a forward primer (5′-TTCTGCCATCACTTCCTTCGGTGA-3′) binding immediately upstream of the cassette and a reverse primer (5′-TCTGAAGCCCAACTATTTCATTAATACCCC-3′) binding immediately downstream of the cassette. A 677-bp amplicon was obtained from the Flp-deleted humanized allele and a 491-bp amplicon was obtained from the wild-type allele. Heterozygous humanized mice without the selection cassette were then backcrossed with C57BL/6J to remove the Flp transgene before intercrossing to obtain experimental cohorts of heterozygous, homozygous and wild-type mice which were genotyped with the above PCR. PWD/PhJ mice were a gift from J. Forejt, Institute of Molecular Genetics, Prague, Czech Republic and CAST/EiJ were sourced from MRC Harwell.
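
Genotype calling from this cassette-deletion PCR reduces to a small rule over the observed band sizes: a 677 bp product indicates the Flp-deleted humanized allele and a 491 bp product the wild-type allele. The ±20 bp sizing tolerance, the helper name and the genotype labels are assumptions for illustration; a reaction showing neither band is left uncalled rather than guessed.

```python
from typing import Iterable

HUMANIZED_BP = 677  # product from the Flp-deleted humanized allele
WILD_TYPE_BP = 491  # product from the wild-type allele


def call_genotype(band_sizes_bp: Iterable[float], tolerance_bp: float = 20.0) -> str:
    """Call a Prdm9 genotype from band sizes seen on a gel (hypothetical helper)."""
    sizes = list(band_sizes_bp)

    def present(target: int) -> bool:
        return any(abs(size - target) <= tolerance_bp for size in sizes)

    has_humanized = present(HUMANIZED_BP)
    has_wild_type = present(WILD_TYPE_BP)
    if has_humanized and has_wild_type:
        return "B6/H"
    if has_humanized:
        return "H/H"
    if has_wild_type:
        return "B6/B6"
    return "no call"


print(call_genotype([680, 489]))
```

Because the two products differ by almost 190 bp, a generous sizing tolerance still cannot confuse them; an unexpected band size is reported as "no call" so that failed reactions are not mistaken for homozygotes.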

Fertility was assessed in male mice aged between 2 and 4 months by recording the average number of pups obtained when they were bred with 7-week-old wild-type C57BL/6J females. Paired testis weight was recorded and normalized against lean body weight, as assessed using an EchoMRI-100 Small Animal Body Composition Analyzer.
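The two fertility read-outs are simple ratios. The minimal sketch below assumes that litter sizes are recorded per litter, testis weight in mg and lean mass in g; these units and the function names are our assumptions rather than the original protocol.

```python
from statistics import mean


def mean_litter_size(litter_sizes):
    """Average number of pups per litter sired by one male."""
    return mean(litter_sizes)


def normalized_testis_weight(paired_testes_mg, lean_mass_g):
    """Paired testis weight (mg) per gram of lean body mass (EchoMRI)."""
    return paired_testes_mg / lean_mass_g
```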

Immunohistochemistry analyses

Spermatocytes from mice approximately 9 weeks of age were prepared for immunohistochemistry by surface spreading33,34. In brief, the testis tunica was removed, and the tubules were cut with a razor blade and disaggregated by pipetting in PBS containing protease inhibitors (Complete, Roche). Following centrifugation at 5,800 g for 5 min, the cells were resuspended in 0.1 M sucrose and spread onto the surface of slides in a drop of 1% paraformaldehyde in PBS. The slides were left to dry for 3 h at room temperature in a humidified box, washed in 0.4% Photo-Flo 200 (Kodak), and either used immediately or stored at −80 °C. The following antibodies were used for immunohistochemistry: mouse anti-MLH1 (BD, 51-1327GR); mouse anti-phospho-H2A.X (Millipore 05-636, clone JBW301); rabbit anti-SYCP1 (Novus Biologicals, NB300-229); rabbit anti-DMC1 (Santa Cruz Biotechnology sc-22768, H-100); mouse anti-SYCP3 (Santa Cruz Biotechnology sc-74569, D-1); and rabbit anti-SYCP3 (Abcam ab15093). Non-specific binding sites were blocked by incubating the cells with 0.2% BSA, 0.2% gelatin and 0.05% Tween-20 in PBS (B/ABD buffer). Cells were incubated with the primary antibodies overnight at 4 °C. Following washes in B/ABD buffer and detection with secondary antibodies, the slides were mounted in DAPI/Vectashield (Oncor) and analysed by epifluorescence on an Olympus BX60 microscope equipped with a Sensys CCD camera (Photometrics, USA), using Genus Cytovision software (Leica).

Spermatocytes were staged on the basis of SYCP3 staining. For MLH1 analysis, only pachytene cells with 19 or more MLH1 foci co-localizing with SYCP3 were considered, according to criteria defined in ref. 35. For DMC1 analysis, randomly selected cells from any stage were scored, and the number of DMC1 foci per cell was counted using the PointPicker macro in ImageJ (64-bit). For SYCP1 analysis, only cells in pachytene were considered; cells with 19 fully synapsed autosomes (co-localizing SYCP1 and SYCP3 signals) and one XY body were considered normal. For γH2AX characterization, cells in pachytene or diplotene were scored, and cells in which the γH2AX signal covered only a clearly identifiable XY body were considered normal.
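The scoring rules above can be stated as explicit predicates. The sketch below uses a hypothetical per-cell record whose field names are our own. Cells outside the scored stages return `None`, meaning "not scored" rather than "abnormal".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpreadCell:  # hypothetical record for one scored spermatocyte spread
    stage: str                     # from SYCP3 staining, e.g. "pachytene"
    mlh1_foci_on_axes: int         # MLH1 foci co-localizing with SYCP3
    fully_synapsed_autosomes: int  # SYCP1 co-localizing with SYCP3 along the axis
    xy_bodies: int
    gh2ax_only_on_xy_body: bool    # γH2AX confined to one identifiable XY body


def include_in_mlh1_count(cell: SpreadCell) -> bool:
    """Only pachytene cells with >= 19 axis-associated MLH1 foci are counted."""
    return cell.stage == "pachytene" and cell.mlh1_foci_on_axes >= 19


def sycp1_normal(cell: SpreadCell) -> Optional[bool]:
    if cell.stage != "pachytene":
        return None  # SYCP1 is scored in pachytene only
    return cell.fully_synapsed_autosomes == 19 and cell.xy_bodies == 1


def gh2ax_normal(cell: SpreadCell) -> Optional[bool]:
    if cell.stage not in ("pachytene", "diplotene"):
        return None  # γH2AX is scored in pachytene or diplotene only
    return cell.gh2ax_only_on_xy_body
```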

Prdm9 expression via RT–PCR analysis

To verify correct expression of the humanized Prdm9 allele, we performed exon-spanning end-point RT–PCR on whole-testis cDNA prepared using Tetro reverse transcriptase (Bioline), with a forward primer binding to exon 9 (5′-CATTAAGTGGGGAAGCAAGA-3′) and a reverse primer binding within the 3′ UTR, immediately downstream of the humanized zinc-finger domain encoded by exon 10 (5′-GGGATTTAATTCCCTTTTCTAGTCA-3′) (Extended Data Fig. 1b). qPCR analysis of Prdm9 transcripts was performed using two primer pairs amplifying regions within the 3′ UTR (5′-GAATGAGAAAGCCAACAGCA-3′ and 5′-GGACAACCAGACTGCACAGA-3′; 5′-AGCCAACAGCAATAAAACCA-3′ and 5′-GGGATTTAATTCCCTTTTCTAGTCA-3′), normalizing against a housekeeping gene (Hprt; 5′-AGCTACTGTAATGATCAGTCAACG-3′ and 5′-AGAGGTCCTTTTCACCAGCA-3′), using Power SYBR Green PCR Master Mix (Applied Biosystems) and a Bio-Rad CFX96 cycler according to the manufacturer’s instructions. Relative expression was calculated using the Livak (2−ΔΔCt) method. Expression of the humanized Prdm9 allele was unaffected by the genetic manipulation (Extended Data Fig. 1b, c).
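The Livak method computes relative expression as 2^−ΔΔCt. Here ΔCt is the target Ct minus the reference-gene (Hprt) Ct, and ΔΔCt is the sample ΔCt minus the calibrator ΔCt. The method assumes approximately 100% amplification efficiency for both amplicons. The sketch below averages replicate Cts first; that choice, and the use of a wild-type calibrator, are our assumptions.

```python
from statistics import mean


def livak_relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression 2^-ddCt; each argument is a list of replicate Ct values.

    ct_target / ct_ref: Prdm9 and Hprt Cts in the test sample.
    ct_target_cal / ct_ref_cal: the same in the calibrator sample.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)        # normalize to Hprt
    d_ct_cal = mean(ct_target_cal) - mean(ct_ref_cal)
    dd_ct = d_ct_sample - d_ct_cal                      # relative to calibrator
    return 2.0 ** (-dd_ct)
```

A sample whose Prdm9 Ct is one cycle earlier than the calibrator's, at equal Hprt Ct, gives a relative expression of 2.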

Single-stranded DNA sequencing and double-strand break (DSB) detection

Testis cells from B6H/H, B6B6/H, wild-type PWD, the infertile (PWD × B6)F1PWD/B6, the reciprocal semi-fertile (B6 × PWD)F1PWD/B6, the humanized rescue (PWD × B6)F1PWD/H, (B6 × CAST)F1B6/CAST and (B6/CAST)F2B6/H males were subjected to single-stranded DNA sequencing (SSDS) as previously described19. In addition, we used sample C57BL/6 (sample 1) from ref. 21, aligned to mm9/NCBI37; this sample was also re-mapped to mm10/NCBI38 with a modified BWA mapper19. Further samples from ref. 21, namely 9R (sample 2), 13R (samples 1 and 2) and Prdm9 knockout (B6–/–) (sample 1)10, were also used in the comparative analysis of DSB maps (Extended Data Fig. 3e). B6H/H and B6B6/H libraries were prepared in D. Camerini-Otero’s laboratory (NIH) and sequenced on a HiSeq 2000 platform using paired-end reads (read 1: 36 bp; read 2: 40 bp); these samples were aligned to the mouse mm9/NCBI37 reference genome.

Wild-type PWD, the infertile (PWD × B6)F1PWD/B6, the reciprocal semi-fertile (B6 × PWD)F1PWD/B6, the humanized rescue (PWD × B6)F1PWD/H, (B6 × CAST)F1B6/CAST and (B6/CAST)F2B6/H samples were prepared at the Wellcome Trust Centre for Human Genetics and sequenced on HiSeq 2000 and HiSeq 2500 platforms using paired-end reads (51 bp each). These samples were aligned to the mouse mm10/NCBI38 reference genome with a modified BWA mapper19. Variation in the number of sequenced fragments between samples reflects the difficulty of precisely assessing DNA concentration before sequencing. Only fragments with high mapping quality (at least 20) were retained for DSB hotspot calling, and only one copy of each duplicated fragment was kept (here, a fragment is considered duplicated if at least one other fragment maps to the same genomic position). Supplementary Table 1 gives details of the samples considered in this study.
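The fragment filter above keeps high-quality fragments and one copy per genomic position. A minimal sketch follows. It assumes a fragment's "position" means its chromosome, start, end and strand; including the strand is our assumption, motivated by the strand-specific ssDNA signal. The `Fragment` type is hypothetical.

```python
from typing import NamedTuple


class Fragment(NamedTuple):  # hypothetical representation of one mapped fragment
    chrom: str
    start: int
    end: int
    strand: str
    mapq: int


def filter_fragments(fragments, min_mapq=20):
    """Keep fragments with MAPQ >= min_mapq, retaining one copy per position."""
    seen = set()
    kept = []
    for frag in fragments:
        if frag.mapq < min_mapq:
            continue  # low mapping quality
        key = (frag.chrom, frag.start, frag.end, frag.strand)
        if key in seen:
            continue  # duplicate of an already-kept fragment
        seen.add(key)
        kept.append(frag)
    return kept
```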

H3K4me3 ChIP-seq

ChIP-seq was performed as previously described36, with several modifications (noted here). In brief, the testis tunica was removed, and the tubules were dissociated with tweezers and fixed in 1% formaldehyde in PBS for 5 min, followed by glycine quenching (125 mM final concentration) for 5 min at room temperature. After washing, pellets were resuspended in 900 μl cold RIPA lysis buffer, dounced 20 times and sonicated in 300-μl aliquots in a Bioruptor Twin sonication bath at 4 °C for three 10-min periods of 30 s on, 30 s off at high power. Cell debris was then pelleted and removed, and the aliquots were pooled. For each sample, 50 μl of equilibrated magnetic beads were resuspended in 100 μl PBS/BSA and added to the chromatin for pre-clearing for 2 h at 4 °C with rotation. The beads were removed, and 100 μl of pre-cleared chromatin was set aside for the input control. Rabbit polyclonal anti-H3K4me3 antibody (5 μl; Abcam ab8580) was added to the remaining pre-cleared chromatin and incubated overnight at 4 °C with rotation. A further 50 μl of beads was washed and resuspended as before, then incubated with the chromatin samples for 2 h at 4 °C with rotation. Beads were then washed and de-crosslinked at 65 °C as described36; for input controls, 50 μl of pre-cleared chromatin was used. After de-crosslinking, samples were incubated with 80 μg RNase A at 37 °C for 60 min and then with 80 μg Proteinase K at 55 °C for 90 min. DNA was purified using a Qiagen MinElute Reaction Cleanup Kit.

ChIP and input (total chromatin) DNA samples were sequenced as multiplexed paired-end Illumina libraries, yielding 51-bp reads. We prepared two biological replicates plus one genomic input control for each of the infertile (PWD × B6)F1PWD/B6, reciprocal (B6 × PWD)F1B6/PWD and rescue (PWD × B6)F1PWD/H mice, yielding roughly 40–50 million usable read pairs per replicate. For the B6B6/B6 and B6H/H mice, we prepared one biological replicate each (yielding 70–80 million usable read pairs per sample) and subsequently split the read pairs into pseudoreplicates. Reads were aligned to mm10 using BWA aln37 (v. 0.7.0) followed by Stampy38 (v. 1.0.23, option bamkeepgoodreads). Read pairs not mapped as a proper pair with an insert size below 10 kb were removed. Read pairs representing likely PCR duplicates were removed with samtools rmdup, as were pairs in which neither read had a mapping quality greater than 0. Fragment coverage was computed at each position in the genome and in 100-bp non-overlapping bins using in-house code and the samtools39 and bedtools40 packages.
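The pair filter and binned coverage described above can be sketched as follows. This is not the in-house code. The sketch treats fragments as 0-based half-open intervals and defines a bin's coverage as the sum of per-base fragment coverage within it. Both conventions are our assumptions; the original may, for example, count fragment midpoints instead.

```python
import numpy as np


def keep_pair(proper_pair, insert_size, mapq1, mapq2, max_insert=10_000):
    """Proper pair, insert < 10 kb, and at least one read with MAPQ > 0."""
    return proper_pair and abs(insert_size) < max_insert and max(mapq1, mapq2) > 0


def binned_fragment_coverage(fragments, chrom_length, bin_size=100):
    """fragments: iterable of (start, end), 0-based half-open, on one chromosome.

    Returns per-base coverage and its sum over non-overlapping bins.
    """
    diff = np.zeros(chrom_length + 1, dtype=np.int64)
    for start, end in fragments:
        diff[start] += 1  # coverage rises at the fragment start
        diff[end] -= 1    # and falls just after its last base
    per_base = np.cumsum(diff[:-1])
    n_bins = -(-chrom_length // bin_size)  # ceiling division; last bin may be partial
    padded = np.zeros(n_bins * bin_size, dtype=np.int64)
    padded[:chrom_length] = per_base
    per_bin = padded.reshape(n_bins, bin_size).sum(axis=1)
    return per_base, per_bin
```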

DSB hotspot detection and map comparison

To analyse DMC1 data, we developed a novel ChIP-seq peak caller specific to DSB hotspots. It exploits the shift in the mapping of single-stranded DNA (ssDNA) reads between the 5′ and 3′ DNA strands to call hotspots. These ssDNA segments result from the resection of DNA ends that accompanies a DSB and are isolated by DMC1 ChIP19. For each hotspot, the caller estimates, in particular, the centre of the hotspot and its heat, loosely defined as the number of reads mapping to the hotspot that are predicted to represent real signal. The caller handles sample replicates and can call hotspots using several samples jointly; details are given in Supplementary Information. DSB hotspots from two different samples are considered to overlap if their centres are at most 600 bp apart. DMC1 hotspot heats were normalized so that the sum of hotspot heats is identical in every sample and equal to the sum of hotspot heats in B6B6/B6 (sample 1).
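The overlap rule (centres at most 600 bp apart) and the heat normalization are easy to state in code. The sketch below covers only these two steps, not the peak caller itself. Hotspots are represented as hypothetical `(chrom, centre)` pairs, and a sorted-list search finds any partner centre within the window.

```python
from bisect import bisect_left


def overlapping(centres_a, centres_b, max_dist=600):
    """For each (chrom, centre) in A, report whether B has a centre within max_dist bp."""
    by_chrom = {}
    for chrom, c in centres_b:
        by_chrom.setdefault(chrom, []).append(c)
    for centres in by_chrom.values():
        centres.sort()
    result = []
    for chrom, c in centres_a:
        centres = by_chrom.get(chrom, [])
        i = bisect_left(centres, c - max_dist)  # first B centre >= c - max_dist
        result.append(i < len(centres) and centres[i] <= c + max_dist)
    return result


def normalize_heats(heats, reference_total):
    """Rescale heats so that they sum to reference_total (e.g. the B6B6/B6 sample 1 sum)."""
    total = sum(heats)
    return [h * reference_total / total for h in heats]
```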

H3K4me3 enrichments were computed at DSB hotspots identified by DMC1 ChIP-seq using our previously published method36 (Supplementary Information Subsection 7.1). H3K4me3 hotspots were also called de novo, without reference to DSB hotspots, using the same approach36. The de novo calls were used to generate a list of regions likely to be trimethylated independently of PRDM9, by intersecting calls in mice carrying different Prdm9 alleles. In comparisons involving both DMC1 and H3K4me3 data, we excluded DSB hotspots contained in any of these PRDM9-independent trimethylated regions, and we used H3K4me3 enrichments computed at DSB hotspots (Supplementary Information). De novo calls were used only for the analysis in Extended Data Fig. 6d, e.
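The PRDM9-independent regions are those called in every Prdm9 background. The sketch below intersects per-allele call sets and then drops DSB hotspots lying wholly inside such a region. It works on one chromosome at a time with sorted, non-overlapping half-open intervals. Representing hotspots as intervals, and reading "contained" as full containment, are our assumptions.

```python
def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals (start, end)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out


def prdm9_independent(calls_per_allele):
    """Regions called in every Prdm9 background (one call list per allele)."""
    regions = calls_per_allele[0]
    for calls in calls_per_allele[1:]:
        regions = intersect(regions, calls)
    return regions


def exclude_contained(hotspots, regions):
    """Drop hotspots (start, end) lying entirely within any region."""
    return [h for h in hotspots
            if not any(s <= h[0] and h[1] <= e for s, e in regions)]
```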

DNA binding motif analyses

We developed a new Bayesian approach to identify DNA motifs enriched at DSB hotspots (Supplementary Information). We used FIMO (MEME Suite version 4.9.1) to find the genome-wide locations of these motifs. Using Mus famulus and Mus caroli as outgroups, we reconstructed an ancestral reference genome for B6 and PWD. This allowed us to identify the lineage (B6 or PWD) on which each mutation distinguishing the two strains occurred. See Supplementary Information for details.
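Once an ancestral base is available, assigning a B6–PWD difference to a lineage follows a parsimony argument: the strain that differs from the ancestor carries the derived allele. The single-site rule below is a simplified illustration of this idea and not the reconstruction procedure in Supplementary Information. The function name and return labels are hypothetical.

```python
def assign_lineage(b6_base, pwd_base, ancestral_base):
    """Return the lineage ('B6' or 'PWD') on which a single-base difference arose."""
    if b6_base == pwd_base:
        return "no difference"
    if ancestral_base == b6_base:
        return "PWD"         # PWD carries the derived base
    if ancestral_base == pwd_base:
        return "B6"          # B6 carries the derived base
    return "unresolved"      # ancestor matches neither strain (or is unknown)
```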

DSB hotspot assignment in hybrids

Using SNPs between the B6 and PWD genomes, each read pair from a hybrid DSB library (DMC1 ChIP-seq) was assigned to one of the categories ‘B6’, ‘PWD’, ‘unclassified’ or ‘uninformative’, using criteria detailed in Supplementary Information. For each DSB hotspot, the fraction of informative reads from the B6 chromosome was then computed as the number of ‘B6’ reads mapped within 1 kb of the hotspot centre divided by the sum of ‘B6’ and ‘PWD’ reads in that region. We followed a similar approach for H3K4me3 ChIP-seq, additionally correcting for background signal.
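Given reads already classified by SNP, the per-hotspot B6 fraction is a ratio over informative reads within the window. The sketch below represents each read by a single position (for example, its midpoint) paired with its label. That representation is our assumption, and the background correction used for H3K4me3 is not included.

```python
def b6_fraction(hotspot_centre, reads, window=1000):
    """Fraction of informative reads within `window` bp of the centre that are 'B6'.

    reads: iterable of (position, label), label in
    {'B6', 'PWD', 'unclassified', 'uninformative'}.
    Returns None when no informative reads fall in the window.
    """
    b6 = pwd = 0
    for pos, label in reads:
        if abs(pos - hotspot_centre) > window:
            continue
        if label == "B6":
            b6 += 1
        elif label == "PWD":
            pwd += 1
    informative = b6 + pwd  # 'unclassified' and 'uninformative' reads are ignored
    return b6 / informative if informative else None
```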

Chromosome effects

To test for statistically significant differential elevation of DMC1 (or H3K4me3) heats between chromosomes following Prdm9 humanization of the infertile (PWD × B6)F1PWD/B6 mice, we fitted a quasi-Poisson model to these heats, including a predictor for each chromosome. Specifically, we estimated parameters α, β and γ by fitting the model $\log \mathbb{E}[d_{\text{rescue}}] = \alpha + \beta \log d_{\text{infertile}} + \gamma_c$, where $d_{\text{infertile}}$ and $d_{\text{rescue}}$ are the DMC1 heats of a particular hotspot shared between the infertile (PWD × B6)F1PWD/B6 and rescue (PWD × B6)F1PWD/H mice, and $c$ is a categorical variable representing the chromosome on which the DSB hotspot occurs. Furthermore, for any of the hybrid mice we considered and for a given autosome, we defined the ‘total H3K4me3 signal from PRDM9 binding on both homologues (that is, symmetrically) at the same hotspots, summed over the entire chromosome’, also referred to as ‘the sum of “symmetric” heats’, as $\sum_i r_i (1 - r_i) h_i^2$. Here $r_i$ is the fraction of DMC1 reads coming from the B6 chromosome for hotspot $i$, $h_i$ is the H3K4me3 heat of that hotspot, and the sum is taken over all hotspots on that chromosome that are under the control of a specific (PWD, B6 or humanized) PRDM9. (Our analyses always refer to this sum of symmetric heats for a specific allele.) For the B6 mouse, which has two B6 chromosomes, we defined the sum of symmetric heats as $\sum_i h_i^2 / 4$. This is the special case of the formula above with $r_i = 1/2$, corresponding to all hotspots being fully symmetric. Under the assumptions described in Supplementary Information, this quantity can also be interpreted as proportional to the expected number of hotspots with PRDM9 bound on both homologues. Details and motivations for this definition are given in Supplementary Information Section 8, together with a slight adjustment we used in practice to provide robustness against outliers in the value of $h_i$.
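A quasi-Poisson fit of per-chromosome effects can be sketched with iteratively reweighted least squares (IRLS). The sketch assumes a log-link model of the form log E[d_rescue] = α + β log d_infertile + γ_c. Chromosome 1 is the reference level, so each γ_c is an effect relative to Chromosome 1. Quasi-Poisson point estimates coincide with Poisson maximum-likelihood estimates, and only the standard errors are scaled by the Pearson dispersion φ. The function names are hypothetical; this is not the published analysis code.

```python
import numpy as np


def design_matrix(d_infertile, chrom, reference="1"):
    """Columns: intercept (alpha), log d_infertile (beta), one dummy per non-reference chromosome (gamma_c)."""
    levels = sorted(set(chrom) - {reference})
    cols = [np.ones(len(chrom)), np.log(np.asarray(d_infertile, dtype=float))]
    for level in levels:
        cols.append(np.array([c == level for c in chrom], dtype=float))
    return np.column_stack(cols), levels


def fit_quasi_poisson(y, X, max_iter=100, tol=1e-10):
    """IRLS for a log-link Poisson GLM; dispersion from the Pearson statistic."""
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response
        XtW = X.T * mu            # Poisson/log-link working weights equal mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta)
    phi = np.sum((y - mu) ** 2 / mu) / (n - p)  # quasi-Poisson dispersion
    cov = phi * np.linalg.inv((X.T * mu) @ X)
    return beta, phi, np.sqrt(np.diag(cov))
```

Under this coding, exp(γ_c) is the multiplicative change in rescue heat on chromosome c relative to Chromosome 1, at fixed infertile heat.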

We proceeded similarly in the B6B6/B6–B6B6/H comparison. The observed effects reported in Fig. 2d–f and Extended Data Fig. 7 are normalized to the effect for Chromosome 1. Precise definitions for the model, and for the 14 chromosome effect predictors tested, are given in Supplementary Information.

Analysis code availability and source data

Code used for the analyses in this study is available at https://github.com/anjali-hinch/hybrid-rescue. The source data generated in this publication have been deposited in NCBI’s Gene Expression Omnibus (accession number GSE73833).