Introduction

Extensive advances in comparative cytogenomics and, more recently, in genome sequence comparisons have significantly increased our knowledge of mammalian chromosomal evolution (Froenicke and Lyons 2008; Kemkemer et al. 2009). In particular, the availability of whole genome sequences from diverse mammalian orders has considerably improved the definition of chromosomal syntenies and the resolution of evolutionary breakpoints. These approaches support a non-random distribution of evolutionary breakpoints, as well as a significant correlation between synteny breakpoints and different classes of repeat sequences, which appear to be lineage-specific (Froenicke and Lyons 2008). For instance, breakpoints are considerably enriched in segmental duplications and various tandem repeats in primates (Kemkemer et al. 2009; Farre et al. 2011), and have been found to be preferentially associated with centromeres in marsupials, bovids and rodents (Thomas et al. 2003; Metcalfe et al. 2007; Mlynarski et al. 2010). In rodents, the high rate of chromosomal change (3.2–3.5 rearrangements/Myr vs 1.6 rearrangements/Myr in humans; Coghlan et al. 2005) underscores the dynamic role of centromeres in the evolution of genome architecture (Eichler and Sankoff 2003; Adega et al. 2009; Mlynarski et al. 2010). Among rodents, muroid species such as the house mouse show extensive karyotypic repatterning (Mlynarski et al. 2010; Trifonov et al. 2010). An analysis of the genus Mus, which includes the house mouse, revealed that this high rate of chromosomal change was not restricted to the house mouse lineage but was characteristic of the whole genus (Veyrunes et al. 2006). In fact, these authors observed a burst in the rate of chromosomal diversification that coincided with the subgeneric radiation.
Subsequent karyotypic change among the subgenera varied radically in a clade-specific fashion, the two extremes being the African pygmy mice (subgenus Nannomys), which are characterized by an extraordinary karyotypic diversity, and the subgenus Mus, which has a remarkably conserved karyotype (Veyrunes et al. 2006), although both show similar species diversity.

The subgenus Mus comprises 14 species, including the house mouse; their phylogenetic relationships are now well established and dated. Three major geographic clades are recognized: (1) a Southeast Asian group with three species, Mus caroli, Mus cervicolor and Mus cooki; (2) an Indian group (Mus booduga, Mus terricolor, Mus nitidulus, Mus fragilicauda and Mus famulus); and (3) a Palearctic group with Mus spretus, Mus macedonicus, Mus spicilegus, Mus cypriacus and the subspecies of Mus musculus (Suzuki et al. 2004; Chevret et al. 2005). The most recently described species, Mus lepidoides, has been tentatively assigned to an additional clade (Shimada et al. 2010). Apart from M. lepidoides, for which no karyotypic data exist, all taxa share a conserved karyotype of 20 pairs of acrocentric chromosomes (2n = 40), with two exceptions: M. terricolor and one subspecies of the house mouse, Mus musculus domesticus. Chromosomal variability in M. terricolor involves heterochromatin addition on the short arms of up to three acrocentric chromosomes, as well as several Robertsonian (Rb) translocations and pericentric inversions (Sen and Sharma 1983). M. musculus domesticus shows an extraordinary and extremely well-documented Robertsonian diversity: in addition to populations with the ancestral 2n = 40 karyotype, this subspecies comprises more than 90 chromosomal races carrying different combinations of Rb translocations throughout its western European range (Piálek et al. 2005).

A Robertsonian translocation is the fusion of two non-homologous acrocentric chromosomes at their centromeres to form a single metacentric chromosome (Robertson 1916). This rearrangement is the predominant mode of chromosomal change in mammals and is well documented in humans, bovids, shrews, sheep and the house mouse (Searle and Wójcik 1998; Bandyopadhyay et al. 2001; Piálek et al. 2005; Nguyen et al. 2008; Adega et al. 2009). Rb translocations typically involve the pericentromeric region, which consists of satellite sequences (satDNA). These are large arrays of tandemly repeated sequences that undergo concerted evolution through various potential mechanisms such as unequal crossing-over, gene conversion, rolling-circle replication or transposition (Plohl et al. 2008). This homogenization results in greater similarity of satellite sequences within than between species, not only within chromosomes but also between chromosomes. Several authors have argued that particular properties of satellite DNA, i.e. a high degree of homology between sequences on all autosomes and large uninterrupted satellite arrays, may make them ideal substrates for non-homologous recombination, thereby promoting rearrangements, and Rb translocations in particular (Redi et al. 1990a; Garagna et al. 2001; Kalitsis et al. 2006). Such a view is compatible with the sequence organization and composition in several Rb-prone organisms such as humans, cattle, gerbils and mice (Chaves et al. 2003b; Redi et al. 1990a; Garagna et al. 1993; Chaves et al. 2003a; Gauthier et al. 2010). In the house mouse, the centromeric and pericentromeric regions are composed of three related classes of AT-rich satellite DNAs (Mitchell 1996; Kalitsis et al. 2006). The most abundant is the major satellite DNA (234-bp monomers), which constitutes the pericentromeric region (Pietras et al. 1983).
The centromeric region consists of the minor satellite DNA (120-bp monomers) (Kipling and Warburton 1997; Mitchell 1996; Plohl et al. 2008). Finally, a recently discovered sequence, the TLC (TeLoCentric) satellite (146-bp monomers), lies between the minor satellite and the telomeres in the laboratory strain. Remarkably, this satellite is in reverse orientation relative to the two other satDNAs, i.e. the major and minor satellites (Kalitsis et al. 2006). The extraordinary Robertsonian radiation of the house mouse has stimulated a plethora of studies aimed at identifying the factors driving its diversity and high rate. Garagna and collaborators (Redi et al. 1990b; Garagna et al. 1993, 2001) compared the composition and organization of the major and minor satellite sequences among several species and subspecies of the subgenus Mus (Narayanswami et al. 1992). Their main conclusion was that M. musculus domesticus was the only taxon combining features favorable to Rb formation, such as a large and homogeneous quantity of satellite sequences (Redi et al. 1990a). Since then, advances in genomic approaches have uncovered inverted repeats within or between homologous satellite sequences in both humans and house mice (Bandyopadhyay et al. 2001; Kalitsis et al. 2006). The presence of several similar sequences in reverse orientation was considered the main trigger of Rb fusions in these species, through aberrant non-homologous recombination.
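Because an Rb fusion joins two whole acrocentric chromosomes into one metacentric, it lowers the chromosome count without changing the number of chromosome arms. The minimal sketch below illustrates this arithmetic starting from the all-acrocentric 2n = 40 karyotype. It assumes fusions between distinct autosome pairs only (ignoring monobrachial homology, sex-chromosome fusions and the loss of small amounts of centromeric material), and the names `rb_karyotype` and `Karyotype` are hypothetical.

```python
from dataclasses import dataclass

STANDARD_2N = 40     # all-acrocentric karyotype of the subgenus Mus
AUTOSOME_PAIRS = 19  # 19 autosome pairs plus the sex-chromosome pair


@dataclass(frozen=True)
class Karyotype:
    diploid_number: int
    metacentrics: int
    arm_number: int  # acrocentric counted as 1 arm, metacentric as 2


def rb_karyotype(homozygous: int, heterozygous: int = 0) -> Karyotype:
    """Karyotype after Rb fusions, assuming each fusion joins two
    different autosome pairs and no pair is used twice."""
    if homozygous < 0 or heterozygous < 0:
        raise ValueError("fusion counts must be non-negative")
    if 2 * (homozygous + heterozygous) > AUTOSOME_PAIRS:
        raise ValueError("not enough autosome pairs for that many fusions")
    # Homozygous fusion: 4 acrocentrics -> 2 metacentrics (2n drops by 2).
    # Heterozygous fusion: 2 acrocentrics -> 1 metacentric (2n drops by 1).
    diploid = STANDARD_2N - 2 * homozygous - heterozygous
    metacentrics = 2 * homozygous + heterozygous
    acrocentrics = diploid - metacentrics
    return Karyotype(diploid, metacentrics, acrocentrics + 2 * metacentrics)


for n_fusions in (0, 3, 9):
    print(n_fusions, rb_karyotype(n_fusions))
```

The arm number stays at 40 whatever the number of fusions, which is why Rb races differ in 2n but not in the number of chromosome arms.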

The present study compares the structure and orientation of the three satellite DNA sequences characteristic of the house mouse (major, minor and TLC) across a broad taxonomic sample of the subgenus Mus. Specifically, the phylogenetic distribution of these satDNA sequences is explored to gain insight into their evolutionary dynamics and to evaluate their contribution to the unique Rb proneness of M. musculus domesticus within the subgenus.

Materials and methods

Material

Animals were obtained from the Conservatoire Génétique de la Souris Sauvage (Institut des Sciences de l’Evolution, Montpellier, France) or collected in the wild and are listed in Table 1. Eight species and three subspecies of the subgenus Mus were sampled. One individual per taxon was analysed, and one specimen from each of the three other subgenera belonging to the genus Mus was also investigated to serve as outgroups.

Table 1 Presence (+)/absence (−) of the three satellite DNAs in taxa within the genus Mus

Chromosomal analyses

Mitotic metaphases were obtained by the air-drying method from bone marrow cells after yeast stimulation (Lee and Elder 1980). Chromosomes were identified by DAPI banding following the nomenclature of Cowell (1984). All observations were made with a Zeiss Axiophot fluorescence microscope equipped with an image analyzer (Cytovision 3.93.2, Genetix). Between 10 and 20 metaphases were examined per individual.

Probes

The three satellite DNA probes (major, minor and TLC; Kalitsis et al. 2003, 2006) were labeled by nick translation with digoxigenin-11-dUTP (DIG) according to the Roche protocol or with biotin-14-dATP (BIO) following Invitrogen's recommendations. For the CO-FISH experiments, we used LNA-modified oligonucleotide probes (purchased from Eurogentec, Belgium) with the following sequences: 5′-CaCtCaTcTaAtAtGtTc-3′DIG for the major satellite and 5′-tTcGtTgGaACgCgtT-3′BIO for the minor satellite (LNA modifications are indicated in lowercase letters; cf. Fig. 1). The TLC probe was obtained as follows: (1) a PCR was performed using the TLC probe as template, with one 5′-phosphorylated primer (5′-TGTGCCGGTCTGATTTTCTA-3′); (2) after checking for positive amplification on a 2 % agarose gel, the PCR product was digested with strandase I exonuclease (Sigma). This enzyme digests 5′-phosphorylated strands, so the strand containing the phosphorylated primer was selectively degraded. Purification with the QIAquick PCR purification kit (Qiagen) yielded a single-stranded TLC DNA probe, which was then labeled with DIG by end-labeling (Roche) following the manufacturer's instructions.
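The strand selection step above can be sketched in a few lines: the exonuclease removes the strand that begins with the phosphorylated primer, so the surviving single strand is its reverse complement, and the choice of primer fixes which chromosomal strand the probe will recognize. This is a minimal sketch, assuming the primer given in the text is the phosphorylated one; the internal sequence of the toy amplicon and the helper names are hypothetical, and the enzyme itself is not modeled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def surviving_strand(amplicon: str, phosphorylated_primer: str) -> str:
    """Strand (5'->3') left after digestion of the 5'-phosphorylated
    strand, i.e. the strand whose 5' end is the phosphorylated primer."""
    top = amplicon.upper()
    bottom = reverse_complement(top)
    primer = phosphorylated_primer.upper()
    if top.startswith(primer):
        return bottom  # top strand carries the 5'-phosphate and is degraded
    if bottom.startswith(primer):
        return top     # bottom strand carries the 5'-phosphate and is degraded
    raise ValueError("primer matches neither 5' end of the amplicon")


# Toy amplicon: the TLC primer from the text followed by a hypothetical
# internal sequence (the real TLC amplicon is not given here).
TLC_PRIMER = "TGTGCCGGTCTGATTTTCTA"
toy_amplicon = TLC_PRIMER + "GATTACAGATTACA"
probe = surviving_strand(toy_amplicon, TLC_PRIMER)
print(probe)
```

The result does not depend on which strand of the amplicon is written out: the surviving strand always ends with the reverse complement of the phosphorylated primer.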

Fig. 1
figure 1

Schematic chromosomal location of individual probes after the CO-FISH procedure (the newly formed strands are destroyed). The orientation of probes is given according to Falconer et al. (2010), i.e. the major and TLC probes hybridize to the A-rich strand and the minor probe to the T-rich strand.

Fluorescence in situ hybridization

The slides were first treated with HCl (0.04 ‰), dehydrated in an ethanol series (70, 90 and 100 %), incubated in a 0.15 ‰ pepsin solution for 10 min at 37 °C, and dehydrated again in an ethanol series (70, 90 and 100 %). They were then denatured for 2 min at 72 °C in 70 % formamide, 2xSSC, dehydrated in an ice-cold ethanol series and air-dried. The probes were denatured for 10 min at 72 °C. The slides were hybridized overnight with one or two probes (200 ng/slide; hybridization buffer: 50 % formamide, 2xSSC, 40 mM Na2HPO4/NaH2PO4, 1× Denhardt's solution, 10 % dextran sulphate, 0.1 % SDS). After hybridization, the slides were washed at 39 °C in 2xSSC and 4XT (20xSSC; 0.05 % Tween) and then incubated at 37 °C for 20 min with an anti-digoxigenin antibody conjugated with FITC (Roche) and/or an anti-biotin antibody conjugated with CY3 (Sigma). The slides were counterstained with DAPI (4′,6-diamidino-2-phenylindole) and mounted in Vectashield antifade solution (Vector Laboratories).

CO-FISH

CO-FISH (chromosome orientation FISH) is a procedure that reveals the relative orientation of two or more sequences within a chromosome. It requires the prior incorporation of bromodeoxyuridine (BrdU) during a single S phase, so that only the newly synthesized DNA strand of each chromatid is BrdU-substituted. The in vivo protocol was adapted from Falconer et al. (2010). BrdU was injected intraperitoneally at 1-h intervals, 6 h before the animal was sacrificed (4 mg BrdU per injection; total dose, 20 mg BrdU). Metaphases were then prepared as described above (Lee and Elder 1980). BrdU incorporation was verified by immunofluorescence with a biotin-conjugated anti-BrdU antibody (Abcam) following the manufacturer's instructions. The CO-FISH method described by Goodwin and Meyne (1993) was then followed. The chromosome slides were treated with RNase (0.5 mg/ml) at 37 °C for 10 min and incubated for 15 min in Hoechst 33258 (0.5 μg/ml). The slides were irradiated with ultraviolet light (365 nm) for 30 min and digested with exonuclease III (3 U/μl), which destroys the newly synthesized (BrdU-substituted) strands; the remaining parental DNA strands were hybridized with the LNA probes (major-DIG, minor-BIO) for 1 h at 37 °C. The slides were then washed in 2xSSC at 60 °C. To reveal the TLC, the chromosomes were hybridized overnight at 37 °C with the TLC-DIG and minor-BIO probes, followed by post-hybridization washes in 2xSSC and in 4XT at 39 °C. Hybridization signals were detected as described above, and the slides were counterstained with DAPI and mounted in Vectashield antifade solution.
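The logic of strand-specific detection in CO-FISH can be sketched as a toy model. Once the BrdU-substituted strands are destroyed, each sister chromatid keeps a single parental strand, and the two sisters keep opposite strands, so a strand-specific probe lights up only one chromatid. The sketch assumes a hypothetical convention (chromatid 0 keeps the parental "top" strand; a "forward" array has its A-rich strand on top) and takes the probe targets from the Fig. 1 legend; the function and dictionary names are illustrative.

```python
from typing import Literal

Target = Literal["A-rich", "T-rich"]

# Strand targeted by each probe, as given in the Fig. 1 legend.
PROBE_TARGET = {"major": "A-rich", "TLC": "A-rich", "minor": "T-rich"}


def signal_chromatid(array_forward: bool, probe_target: Target) -> int:
    """Index (0 or 1) of the sister chromatid showing a CO-FISH signal.

    Toy convention (hypothetical): chromatid 0 keeps the parental top
    strand and chromatid 1 the parental bottom strand; a 'forward' array
    has its A-rich strand on the top strand."""
    if probe_target == "A-rich":
        target_on_top = array_forward
    elif probe_target == "T-rich":
        target_on_top = not array_forward
    else:
        raise ValueError(f"unknown probe target: {probe_target!r}")
    return 0 if target_on_top else 1


# Major and minor arrays in the same (forward) orientation:
major = signal_chromatid(True, PROBE_TARGET["major"])
minor = signal_chromatid(True, PROBE_TARGET["minor"])
print("same chromatid" if major == minor else "different chromatids")
# prints "different chromatids"
```

Flipping the orientation of either array moves its signal to the other sister chromatid, which is what makes the chromatid pattern informative about relative orientation.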

PCR

The presence or absence of the satellite DNAs was checked by PCR amplification. The PCR primers (5′→3′) were TGATTTTCGGTTTTCTTGCC and TGAAGGACCTGGAATATGG for the major satellite, AAATCCCGTTTCCAACGAAT and TGGAAAATGATGAAAACCACA for the minor satellite, and TGTGCCGGTCTGATTTTCTAG and AGAAAATGGGAAATGCACAG for the TLC satellite. Each 25 μl reaction contained 0.08 mM dNTP, 1.5 mM MgCl2, 0.4 mM of each primer, 1× buffer, 0.75 U of Taq polymerase (GoTaq, Promega), 4 ng of DNA and 11.35 μl of purified water. The amplification consisted of an initial denaturation at 94 °C for 4 min, followed by 30 cycles of 94 °C for 30 s, annealing for 30 s (58 °C for the TLC satellite, 60 °C for the major and minor satellites) and 72 °C for 30 s, and a final extension at 72 °C for 10 min. The PCR products were visualized on a 2 % agarose gel.
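As a rough, illustrative sanity check (not the procedure used to choose the annealing temperatures above), the primer pairs can be summarized by length, GC content and the Wallace-rule melting temperature (2 °C per A or T plus 4 °C per G or C). The Wallace rule is only a crude approximation for oligonucleotides of this length, and the helper names are hypothetical.

```python
# Primer pairs and annealing temperatures as listed in the text.
PRIMERS = {
    "major": ("TGATTTTCGGTTTTCTTGCC", "TGAAGGACCTGGAATATGG"),
    "minor": ("AAATCCCGTTTCCAACGAAT", "TGGAAAATGATGAAAACCACA"),
    "TLC": ("TGTGCCGGTCTGATTTTCTAG", "AGAAAATGGGAAATGCACAG"),
}
ANNEALING_C = {"major": 60, "minor": 60, "TLC": 58}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule: Tm ~ 2 degC per A/T + 4 degC per G/C (crude estimate)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for satellite, pair in PRIMERS.items():
    for primer in pair:
        print(f"{satellite:5s} {primer:22s} len={len(primer):2d} "
              f"GC={gc_fraction(primer):.0%} Tm~{wallace_tm(primer)} °C "
              f"(annealing {ANNEALING_C[satellite]} °C)")
```

A nearest-neighbor Tm calculation would be more appropriate for 19–21-nt primers; the Wallace rule is used here only because it is transparent.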

Results

Distribution of satellite sequences in the subgenus Mus

The results of the in situ hybridization of the three satellite sequences are detailed for each species and subspecies in Table 1, supplemented with data from the literature. The major and minor satDNA sequences were detected in the Indian and Palearctic groups on all chromosomes except the Y. The only deviations from this pattern concerned the major satDNA: in Mus musculus musculus the signal on chromosomes 1 and X was faint, and in M. spretus no signal was detected on chromosomes 4 and 7. In contrast, the three Southeast Asian species showed a remarkable lack of both the minor and major satDNAs.

The distribution of the TLC sequences varied greatly between taxa and between chromosomes within taxa. The TLC satDNA was detected on all chromosomes except the Y in M. caroli, M. cooki, M. macedonicus and M. fragilicauda (Fig. 2a, b, d, f and Online resource 1). The same result was observed in M. musculus domesticus and Mus musculus castaneus, but the amount varied notably between chromosomes. In M. musculus domesticus, the TLC was always present on chromosomes 2 and 12, whereas it was rarely detected on chromosomes 7, 11 and 14 (in only 4 of 15 metaphases). In M. musculus castaneus, the TLC was rarely detected on chromosomes 1, 2 and 3, whereas it was always present on chromosome 19. This sequence was present only on chromosomes 1, 3 and 4 in M. cervicolor (Fig. 2c) and on chromosomes 1–3, 5, 7, 10, 11, 14 and 15 in M. spretus (see Online resource 1). In the remaining taxa, M. famulus, M. cypriacus and M. musculus musculus, no TLC satDNA could be detected (data not shown). When present, the major satDNA was always located distal to the minor and TLC satellites. The resolution of the hybridization experiments was, however, insufficient to determine which of the minor or TLC satDNAs was adjacent to the proximal telomere. The sex chromosomes differed in centromeric composition: none of the satDNAs was detected on the Y chromosome in any of the taxa studied, whereas the X chromosome displayed a satDNA content similar to that of the autosomes in most of the analysed species. The analyses of specimens from the three other subgenera of the genus Mus, Mus pahari (Coelomys), Mus mattheyi (Nannomys) and Mus platythrix (Pyromys), detected no signal for the major, minor or TLC satDNAs.

Fig. 2
figure 2

FISH pattern of the different satellites. The chromosomal distribution of satellites is shown in a M. cooki, b M. caroli and c M. cervicolor, d M. fragilicauda, e M. cypriacus and f M. macedonicus. The probes used and the detection color are indicated on each image. Hybridization signals are visualized either by FITC (green) or by CY3 (red); metaphase spreads are counterstained with DAPI (blue). Scale bar indicates 10 μm

PCR assays were performed for all combinations of taxa and satDNAs using the primers described above, with both negative and positive controls. In all cases, the results confirmed the FISH analyses: all taxa in which the satellite sequences were detected by FISH produced a PCR product in the form of a large smear, consistent with the presence of multiple tandem monomer repeats of the satellites (major, minor and TLC). Conversely, when no signal was detected by FISH, no PCR amplification was observed. Taken together, negative FISH and PCR results are interpreted as indicating the absence of a repeat sequence with sufficient homology to the M. musculus domesticus reference satDNA probe or primers.

Orientation of satellite sequences

Using the CO-FISH technique, we determined the orientation of the satellite sequences in M. spretus, M. musculus castaneus, M. musculus domesticus and M. musculus musculus. In all taxa, the major and minor satellites were organized head-to-tail relative to each other (Fig. 3): because the two probes target opposite strands, detection of their signals on different chromatids indicates that the two arrays share the same orientation (see Fig. 1). In contrast, the relative orientation of the TLC and minor satDNAs differed between species. These sequences showed the same orientation in M. musculus castaneus and M. musculus domesticus (Fig. 3), as their signals were located on different chromatids. In M. spretus, both signals were detected on the same chromatid, indicating that the TLC and minor satellite DNAs had opposite orientations (see Fig. 1). Note that it was not possible to detect both the TLC and the minor satDNA on all chromosomes with this technique: CO-FISH is less sensitive than standard FISH and only detects hybridization targets larger than 50 kb (Goodwin and Meyne 1993). We were therefore able to determine the orientation for chromosomes 2, 11 and 18 in M. musculus domesticus, for chromosomes 5 and 11 in M. spretus and for chromosomes 6, 7, 9, 10, 12, 15, 17 and 19 in M. musculus castaneus.
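The inference used above reduces to one rule: two arrays are co-oriented exactly when "probes target the same strand type" and "signals fall on the same chromatid" are either both true or both false. The sketch below encodes that rule and applies it to the observations reported in this section; the function name and the way observations are tabulated are illustrative assumptions.

```python
def co_oriented(probes_same_strand: bool, signals_same_chromatid: bool) -> bool:
    """True if two satellite arrays share the same orientation.

    After CO-FISH each sister chromatid keeps one parental strand, and the
    two sisters keep opposite strands. For co-oriented arrays, probes that
    target the same strand type land on the same chromatid, and probes that
    target opposite strand types land on different chromatids."""
    return probes_same_strand == signals_same_chromatid


# Probe polarity from Fig. 1: major and TLC target the A-rich strand and
# minor the T-rich strand, so both probe pairs have opposite polarity.
OBSERVATIONS = [
    ("major/minor", "M. m. musculus, M. m. domesticus, M. m. castaneus, M. spretus", False),
    ("TLC/minor", "M. m. domesticus, M. m. castaneus", False),
    ("TLC/minor", "M. spretus", True),
]

for pair, taxa, same_chromatid in OBSERVATIONS:
    verdict = "head-to-tail" if co_oriented(False, same_chromatid) else "inverted"
    print(f"{pair:12s} {taxa}: {verdict}")
```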

Fig. 3
figure 3

CO-FISH pattern using major and minor LNA probes in a M. musculus musculus, b M. musculus domesticus, c M. musculus castaneus and d M. spretus. CO-FISH pattern of TLC and minor in e M. musculus domesticus, f M. musculus castaneus and g M. spretus. The major and TLC probes are visualized by FITC (green) and the minor by CY3 (red). Metaphase spreads are counterstained with DAPI (blue). Scale bar indicates 10 μm. The orientation of the different satellites is schematized for each species in the inset

Discussion

Satellite sequence structure in the subgenus Mus

This study presents a comparative survey of satellite sequence composition (distribution and orientation) in 11 taxa within the same subgenus. Two patterns were evident in the taxonomic distribution of satDNA composition. The minor and major satDNAs co-occurred in all taxa except the three Asian species (M. caroli, M. cervicolor and M. cooki). Whereas our results for the minor satDNA agree with previously published data (Redi et al. 1990b; Garagna et al. 1993), the distribution of the major satDNA differed from earlier reports in two species. In M. caroli, our study detected no trace of the major satDNA. Previous studies based on enzymatic digestion of DNA gave conflicting results: a faint signal was present in the study by Garagna et al. (1993), whereas other analyses (Dod et al. 1989; Kipling et al. 1995) revealed that the satDNA of M. caroli had a periodic structure of 60–79 bp, distinct from that of the major satDNA in M. musculus domesticus (234 bp). The latter results support the absence of the major satDNA in M. caroli, which most likely possesses small amounts of related sequences that may cross-hybridize under certain conditions. In M. spretus, the presence of the major satDNA agrees with previous analyses using restriction enzyme digestion and FISH, both of which detected these sequences, albeit in moderate quantities compared with M. musculus domesticus (Dod et al. 1989; Garagna et al. 1993). However, the chromosomal distribution of the major satellite sequences in M. spretus differed between studies: Garagna et al. (1993) found no major satDNA on chromosome 16, whereas in our survey this satDNA was absent from chromosomes 4 and 7. Such variation may well reflect genomic diversity within this species (Boursot et al. 1985), resulting from geographic differences in repeat copy number among populations.

In contrast to this taxonomic distribution, the TLC satDNA showed a distinct pattern. Kalitsis et al. (2006) detected the TLC sequences by Southern blot in three species of the subgenus Mus: M. musculus (the M. musculus domesticus strain C57BL/6, M. musculus castaneus and a M. musculus musculus/M. musculus domesticus strain), M. spretus and M. caroli. Our results confirmed the presence of these sequences in the latter two species and extended it to M. macedonicus, M. fragilicauda, M. cooki and three chromosomes of M. cervicolor, whereas the TLC was absent from the remaining species. Within M. musculus, however, our analyses detected these sequences in only two subspecies, since no signal was observed in M. musculus musculus. The absence of the TLC satDNA in M. musculus musculus has recently been confirmed by Sasaki et al. (2012), who tested wild-derived strains of this subspecies from 12 different origins. Among these, the only mice that produced a positive signal were hybrids between M. musculus musculus and a TLC-carrying subspecies, such as the strain used in the original assay (Kalitsis et al. 2006). The most intriguing result of our study lay in the relative orientation of the three satDNA sequences in M. musculus domesticus. The CO-FISH analyses confirmed the results of Garagna et al. (2001) for the minor and major sequences in M. musculus domesticus, and revealed the same head-to-tail orientation of these tandem repeats in the other two subspecies (M. musculus musculus and M. musculus castaneus) as well as in M. spretus. Surprisingly, the TLC data did not agree with previous results. Using a sequencing approach, Kalitsis et al. (2006) had demonstrated that the TLC satellite was in reverse orientation relative to the major and minor satDNAs in nine centromere-containing fosmids cloned from M. musculus domesticus (C57Bl/6J inbred strain). This was not the case in our analyses: in both M. musculus domesticus and M. musculus castaneus, the TLC and minor satDNAs were oriented head-to-tail, whereas a reverse orientation was found in M. spretus. One explanation for these discordant results may lie in the origin of the mice studied (wild mice vs inbred strain), particularly since classical inbred strains are known to have a composite genome with a predominantly domesticus background and various contributions from other subspecies of the house mouse (Frazer et al. 2007) and even from other species such as M. spretus (Song et al. 2011).

This study highlights several noteworthy features of satDNA distribution in the subgenus Mus: (1) the absence of the major and minor satellite sequences in the Asian clade, (2) the diversity in orientation of the TLC sequence and (3) the absence of all three satDNAs from the Y chromosome in all species. In M. musculus domesticus, the minor satDNA has been identified as the sequence involved in centromeric function, since CENP-B, a protein involved in kinetochore formation, binds to this satellite (Mitchell 1996). The absence of such sequences in the Asian species suggests that other sequences most likely fulfill this role. Two candidate sequences are available, both of which are partly homologous to the critical 17-bp CENP-B box. The first is the short 79-bp satellite motif described in M. caroli by Kipling et al. (1995), which carries nine of the 17 bp, while the second is the TLC (11 of the 17 bp), which is particularly abundant in this species. Nevertheless, the TLC satDNA, although present on all chromosomes in M. caroli and M. cooki, was detected on only three chromosomes of the third Asian species, M. cervicolor. It is thus likely that additional sequences involved in centromere function remain to be discovered in these species. The absence of the satDNAs from the Y chromosome in all the taxa examined extends previous observations in the house mouse (Garagna et al. 1993). The lack of, or very low, homology between the Y centromere and the centromeres of all other chromosomes in fact seems to be a common characteristic of mammalian genomes, since such differences have been recorded in a wide diversity of species (Vidal-Rioja et al. 1987; Kunze et al. 1999; Van Vuuren and Robinson 2001). The recent characterization of the house mouse Y centromere revealed a unique higher-order repeat structure with distant homology to the minor satellite sequences (76.8 %; Pertile et al. 2009).
These results support the notion that the centromeric sequences of mammalian Y chromosomes evolve independently from, and at a different rate than, those of the other chromosomes.

Evolution of satellite DNAs in the subgenus Mus

The mode and tempo of satDNA evolution in the subgenus Mus were investigated by mapping the presence/absence of the three satellite sequences onto the phylogenetic tree of Cazaux et al. (2011) (Fig. 4). This approach allowed us to polarize the probable sequence of events underlying the evolution of the M. musculus domesticus satDNA arrays and to propose a pattern of short-term evolution of these sequences. No homology to these sequences was detected in the three species belonging to the other subgenera, suggesting that these satellite sequences are specific to the subgenus Mus. However, Dod et al. (1989) detected very small amounts of satDNAs with a main periodicity of 240 bp (corresponding to the periodicity of the major satellite) in species of two of the other subgenera (Nannomys and Pyromys) as well as in Rattus norvegicus, but none in less closely related genera. These authors proposed that such a DNA repeat structure may have originated in a common ancestor well before the radiation of the genus Mus. Even if this is the case, the amplification and diversification of these sequences clearly took place within the subgenus Mus. Among the three satDNAs, the TLC had the most widespread distribution, being present in species of all three clades of the subgenus Mus. This pattern suggests that the TLC appeared in the ancestor of the subgenus Mus 6.5 Myr ago and was subsequently lost independently three times: in (1) M. famulus (divergence time estimated at 2.8 Myr), (2) M. cypriacus (0.53 Myr) and (3) M. musculus musculus (0.5 Myr; cf. Fig. 4; Chevret et al. 2003; Suzuki et al. 2004; Cucchi et al. 2006). In contrast to the TLC, the major and minor satellites were detected only in species of the Indian and Palearctic clades. This distribution indicates that both satellites most likely appeared and were dramatically amplified after the divergence of the Southeast Asian group (cf. Fig. 4).
Data from the literature, as well as complementary Southern blot analyses, provided clues to the evolution of the repeat sequences of the major and minor satDNAs. In all species studied except two, monomer size matched that of the reference M. musculus domesticus (see Online resource 2). The two exceptions were M. fragilicauda and M. famulus, which showed no hybridization signal with the major probe under standard conditions (see Online resource 2). For the minor satDNA, these two species exhibited a variably faint hybridization smear, indicating that the corresponding restriction sites have likely been lost. Both species belong to the Indian clade, suggesting that both the major and minor satDNAs have diverged in this group relative to the reference sequences of M. musculus domesticus.
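The parsimony reasoning behind Fig. 4 can be made explicit with Fitch's small-parsimony algorithm, which counts the minimum number of presence/absence changes a tree requires. The sketch is hypothetical in two respects: the tree is a simplified topology built from the clades named in the text (outgroups first, then the Southeast Asian group diverging before the Indian and Palearctic clades, with arbitrary branching within clades), not the Cazaux et al. (2011) tree itself; and each satellite is coded as one binary character per taxon (M. cervicolor, with the TLC on three chromosomes, is coded as present).

```python
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, tuple]  # a taxon name or a (left, right) pair of subtrees


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch small-parsimony pass for one character on a rooted binary tree.

    Returns the candidate state set at the root and the minimum number of
    state changes needed to explain the tip states."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


# Hypothetical simplified topology (see lead-in).
OUTGROUPS = ("pahari", ("mattheyi", "platythrix"))
SE_ASIAN = (("caroli", "cooki"), "cervicolor")
INDIAN = ("fragilicauda", "famulus")
PALEARCTIC = ("spretus", (("macedonicus", "cypriacus"),
                          ("domesticus", ("castaneus", "musculus"))))
TREE = (OUTGROUPS, (SE_ASIAN, (INDIAN, PALEARCTIC)))

TAXA = leaves(TREE)
NO_TLC = {"famulus", "cypriacus", "musculus", "pahari", "mattheyi", "platythrix"}
WITH_MAJOR_MINOR = {"fragilicauda", "famulus", "spretus", "macedonicus",
                    "cypriacus", "domesticus", "castaneus", "musculus"}
TLC = {t: 0 if t in NO_TLC else 1 for t in TAXA}
MAJOR_MINOR = {t: int(t in WITH_MAJOR_MINOR) for t in TAXA}

print("TLC: minimum changes =", fitch(TREE, TLC)[1])
print("major/minor: minimum changes =", fitch(TREE, MAJOR_MINOR)[1])
```

Fitch counts changes without assigning their direction; reading the four TLC changes as one gain followed by three losses, and the single major/minor change as a gain after the Southeast Asian split, relies on rooting with the outgroups, as in the text.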

Fig. 4
figure 4

Reference phylogenetic tree of the subgenus Mus adapted from Cazaux et al. (2011). Arrows along the branches indicate the appearance (white) and disappearance (black) of satellites. The diagrams on the right-hand side schematize the composition, organization and orientation of satellites in the proximal chromosomal region for each of the taxa studied: telomeres (black), TLC satellite (blue), minor satellite (red) and major satellite (green). The relative spatial organization of the satDNAs is depicted as similar in all species; however, while the distal location of the major satDNA is most likely correct, the relative position of the minor and TLC satDNAs is tentative and will have to be confirmed by additional analyses. M. pahari, M. mattheyi and M. platythrix are the outgroups.

The existence of a centromeric region consisting of a succession of contiguous, related satDNA sequences appears to be a feature shared with other organisms (Nijman and Lenstra 2001; Di Meo et al. 2006; Gauthier et al. 2010). Such a structure is compatible with the progressive proximal expansion model, whereby new sequences appear by mutation and expand in the centromere, pushing older sequences outwards into the pericentromeric region (Henikoff et al. 2001; Schueler et al. 2005). As a consequence, this turnover of satDNA sequences creates a spatial gradient of homogeneity: the young, homogeneous repeats correspond to the functional centromere, whereas homogeneity in the pericentromeric region decreases gradually with distance from this core. In agreement with this scenario, the minor satDNA, which constitutes the centromeric region in the house mouse, possesses monomers with a remarkably high sequence identity (95 %; Kalitsis et al. 2006) as a result of intense concerted evolution. In contrast, the major satDNA, which forms the pericentromeric region, is expected to be more divergent than the minor satDNA. Sequence analyses apparently do not support this prediction, as the degree of monomer similarity matches that of the minor satDNA (96 %; Vissel and Choo 1989; Kalitsis et al. 2006), although no study has addressed this question in a spatial context (i.e. near vs far from the centromere core).

In M. musculus domesticus, pairwise analyses show that the TLC shares a higher sequence similarity with the other two satDNAs (minor, ∼70 %; major, ∼57 %) than the latter two share with each other (∼41 %). Given that the TLC predates the other two satDNAs, these observations suggest that the TLC may have given rise to the minor and major satellites, or to a common progenitor sequence, prior to the differentiation of the Indian and Palearctic clades. The older age of the TLC is consistent with the observed variation in TLC content among genomes and chromosomes, suggesting that this satDNA has undergone recurrent episodes of erosion and amplification within the subgenus. In M. musculus domesticus at least, this satDNA appears to be subject to relaxed homogenization, as TLC monomers show the lowest level of sequence identity (82 %; Kalitsis et al. 2006). Previous studies have suggested that the minor satDNA may have derived from the major satDNA, although no conclusive evidence was found (Garagna et al. 1993). The co-occurrence of the minor and major satellites in the present phylogenetic tree provides no information on their relative order of origin, their functional role when they appeared (i.e. centromere vs pericentromere), or their rate of divergence. Further insight into the evolution of satDNA architecture in the genus and subgenus Mus requires comparative analyses of the nucleotide sequences in the different species to determine extant rates of sequence homogenization and patterns of divergence. Moreover, elucidating the nature of the satDNA sequences present in the Asian species and in the other subgenera will shed light on the origin and dynamics of satellite sequences in the genus.
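The suggestion that the TLC may be ancestral rests on a simple pattern in the pairwise similarities: the TLC is more similar to each of the other two satellites than they are to each other. The sketch below makes that check explicit using the approximate values quoted above. `percent_identity` is a generic helper for pre-aligned sequences (gapped columns skipped), an assumption rather than the method used by Kalitsis et al. (2006), and "most central" is only a heuristic, not a phylogenetic test.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length;
    columns with a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Approximate pairwise similarities (%) in M. m. domesticus quoted in the text.
REPORTED = {
    frozenset({"TLC", "minor"}): 70.0,
    frozenset({"TLC", "major"}): 57.0,
    frozenset({"minor", "major"}): 41.0,
}


def mean_similarity(name: str) -> float:
    """Mean reported similarity of one satellite to the two others."""
    values = [v for pair, v in REPORTED.items() if name in pair]
    return sum(values) / len(values)


# The pattern described in the text: the TLC is closer to each of the
# other two satellites than those two are to each other.
tlc_bridges = all(
    REPORTED[frozenset({"TLC", other})] > REPORTED[frozenset({"minor", "major"})]
    for other in ("minor", "major")
)
satellites = sorted({name for pair in REPORTED for name in pair})
most_central = max(satellites, key=mean_similarity)

for name in satellites:
    print(f"{name:5s} mean similarity to the others: {mean_similarity(name):.1f} %")
print("TLC bridges the other two:", tlc_bridges, "| most central:", most_central)
```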

Satellite sequences and formation of Rb fusions

The chromosomal radiation of the house mouse has been studied extensively since its discovery in the 1970s, yet the factors that triggered it remain speculative, although analyses of the centromeric region of Rb chromosomes have provided clues to how this rearrangement forms. The breakpoint was shown to lie within the minor satDNA, since the telomeric sequences and presumably the TLC satDNA were lost, whereas 50–70 kb of the minor satDNA and the entire major satDNA were retained in the Rb chromosome (Garagna et al. 1995; Nanda et al. 1995). The identification of the minor satDNA as the molecular substrate of the rearrangement reinforces the role of satDNAs in promoting genomic plasticity in this and other mammals (Garagna et al. 2001; Chaves et al. 2003a; Kalitsis et al. 2006; Di Meo et al. 2006; Adega et al. 2009; Gauthier et al. 2010). Previous comparative genomic analyses explored what distinguishes the centromeric region of M. musculus domesticus. These studies showed that M. musculus domesticus had not only the largest amount of satellite sequences but also the longest satDNA arrays, uninterrupted by non-satellite sequences (Garagna et al. 1993). An additional significant feature was the high degree of homology between satDNA sequences within the genome (Vissel and Choo 1989; Kalitsis et al. 2006). Such characteristics increase the probability of chromosomal rearrangements arising through mispairing during non-homologous recombination. The relationship between the presence of substantial amounts of satDNA in the genome and chromosomal plasticity has been investigated in other groups of mammals: whereas some groups conform well to the predictions (Halkka et al. 1994; Gauthier et al. 2010; Acosta et al. 2010), others apparently do not (Kunze et al. 1999; Slamovits and Rossi 2002).
An alternative model postulates that chromosomally variable lineages should have satDNA families in a dynamic state (undergoing rapid changes in copy number), whereas chromosomally conservative lineages should show high intragenomic heterogeneity, leading to reduced rates of non-homologous recombination and thus to stasis in copy number (Slamovits and Rossi 2002; Ellingsen et al. 2007). The present phylogenetic survey of satDNA sequence composition and organization in taxa closely related to M. musculus domesticus revealed no major differences in their chromosomal structure. This was also the case for the newly described TLC, which showed the same head-to-tail orientation in all the wild mice investigated except M. spretus. These results provide no support for inverted satDNAs acting as a mechanism that promoted the rapid and diverse chromosomal radiation of M. musculus domesticus. Thus, within a wide phylogenetic framework, this study indicates that almost all the satDNA features required for Rb formation are also found in one or more other subspecies or species of the subgenus Mus. In particular, the overall similarity in satDNA structure (composition, organization and orientation) between M. musculus domesticus and M. musculus castaneus does not explain their different levels of chromosomal diversity. Although extensive karyotypic surveys of M. musculus castaneus may be lacking, Rb translocations have so far been described in only two individuals of this subspecies, both from India. However, the taxonomic assignment of these specimens needs to be confirmed by molecular analyses (Chakrabarti and Chakrabarti 1977).

These observations suggest that these genomic characteristics may be necessary but not sufficient to trigger the observed chromosomal radiation. Factors other than satDNA organization have also been put forward. One, postulated for the human genome, involves the CENP-B protein, which, through a putative nicking activity at its binding site (the CENP-B box), could promote a high rate of exchange between satDNA sequences on different chromosomes (Kipling and Warburton 1997; Garagna et al. 2001). Another possibility is illustrated by several marsupial hybrids, in which the failure of DNA methylation and subsequent mobile-element activity have been shown to trigger chromosomal instability (O’Neill et al. 1998). No changes in methylation patterns had been observed in placental hybrids (Roeder 1997; Dobigny et al. 2006) until the recent study of Brown et al. (2012), the first to show a link between methylation and retroelements in placental mammalian hybrids. In addition, comparative analyses of LINE-1 abundance in species of the subgenus Mus showed that M. musculus domesticus has the highest content of LINE-1 sequences (Rebuzzini et al. 2009). This substantial accumulation of LINE-1 in M. musculus domesticus may make its genome more labile and, under certain circumstances, promote chromosomal change more readily than in the related species of the subgenus. The hypothesis that transposable elements (TEs), particularly those embedded in the centromere, generate chromosomal instability by disrupting methylation patterns may be especially relevant to the house mouse, since M. musculus domesticus is the subspecies with the highest rate of range expansion, owing to passive transport by humans. This dispersal success has likely provided multiple opportunities for genetic admixture between differentiated populations, a key feature of the TE model of chromosomal plasticity (Metcalfe et al. 2007; Carbone et al. 2009).