Introduction

Microsatellites, or simple sequence repeats (SSRs), are DNA tracts consisting of relatively short base-pair motifs repeated in tandem. They occur throughout the genome in a variety of contexts (exons, introns, and intergenic regions). Because of their wide distribution and high level of polymorphism, microsatellites have proved useful in constructing genetic maps, in facilitating positional cloning, and in population genetic studies.

In most cases, changes to microsatellites are thought to be neutral, but this depends on whether microsatellite mutations have functional consequences, or whether microsatellites are linked to other mutations that do (for discussion see Li et al. 2004). Microsatellites can affect the regulation or function of genes; the most striking cases are the trinucleotide repeat expansion diseases. Huntington’s disease, for example, results from the expansion of a trinucleotide repeat that encodes a polyglutamine tract in the protein (MacDonald et al. 1993). This polyglutamine tract may facilitate novel protein-protein interactions, causing disease (Bao et al. 1996). Beyond protein function, microsatellites can also affect other functional processes, such as transcription, mRNA splicing, and cellular localization. Microsatellites in introns may affect splicing or mRNA structure, influencing mRNA stability and, ultimately, translation and protein expression.

A phylogenetic approach is needed to explain the evolutionary birth, dynamics, and death of microsatellites. The first phylogenetic study to report interspecific variation in repeat length examined a globin gene in primates (Messier et al. 1996). Subsequent studies have described the birth and dynamics (Primmer and Ellegren 1998; Zhu et al. 2000), as well as the death, of a microsatellite (Taylor et al. 1999). The life cycle of a microsatellite has been discussed as a balance between growth, through polymerase slippage or recombination, and death, through mutations that introduce imperfections into the repeat structure (Levinson and Gutman 1987; Schlotterer and Tautz 1992; Goldstein and Clark 1995; Zhu et al. 2000; Ellegren 2004; Buschiazzo and Gemmell 2006).

In vitro experiments have demonstrated that DNA slippage occurs at very high rates (Hentschel 1982; Streisinger and Owen 1985; Schlotterer and Tautz 1992). A functional mismatch repair system has been shown to reduce the mutation rate of microsatellites by 100- to 1000-fold (Strand et al. 1993). Hence, in vivo mutation rates result from two processes: the rate at which slippage errors arise and the efficiency with which the mismatch repair system corrects them. In addition, an enabling mutation outside the repeat might affect repeat stability through local effects on DNA structure, such as curvature, rigidity, or other structural attributes known to affect polymerase fidelity (Ball et al. 2005).
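The two-process view can be illustrated with simple arithmetic: the observed in vivo rate is approximately the rate at which slippage errors arise divided by the fold reduction contributed by mismatch repair. The sketch below is a minimal illustration under that assumption; only the 100- to 1000-fold range comes from Strand et al. (1993), while the slippage-error rate of 1e-2 per locus per generation is a purely hypothetical value chosen for the example.

```python
def in_vivo_rate(slippage_rate: float, mmr_fold_reduction: float) -> float:
    """Approximate in vivo mutation rate: slippage errors that arise,
    divided by the fold reduction due to functional mismatch repair."""
    if mmr_fold_reduction < 1:
        raise ValueError("fold reduction must be >= 1")
    return slippage_rate / mmr_fold_reduction


# Hypothetical slippage-error rate (per locus per generation), for illustration only.
SLIPPAGE_RATE = 1e-2

for fold in (100, 1000):  # fold range reported by Strand et al. (1993)
    print(f"{fold:>5}-fold repair: in vivo rate ~ {in_vivo_rate(SLIPPAGE_RATE, fold):.0e}")
```

With this hypothetical input, the in vivo rate falls to roughly 1e-4 to 1e-5, showing how strongly the efficiency of repair, rather than slippage alone, can shape observed microsatellite mutation rates.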

There may be a sequence-dependent threshold for instability that relates to the probability of the polymerase slippage or recombination event that first creates the mutation (for discussion see Oliveira et al. 2006). One study concluded that at least eight repeats are required before DNA polymerase slippage can extend the number of repeats (Rose and Falush 1998), whereas another (Zhu et al. 2000) indicated that microsatellites can expand from fewer repeats. This may depend on the properties of the specific sequence, as Pupko and Graur (1999) found no general rule for establishing a threshold.

Myostatin (MSTN) is one of the most important genes known to be involved in muscle development (Lee 2004). In mammals, MSTN acts as a potent negative regulator of skeletal muscle growth (Grobet et al. 1997; McPherron et al. 1997). The importance of the gene in regulating muscle mass has implications both for agriculture (Bellinge et al. 2005) and, potentially, for human health (Joulia-Ekaza and Cabello 2007). A phylogenetic analysis in ruminants showed that MSTN was under positive selection during the divergence of Bovinae and Caprinae, possibly leading to changes in function between these closely related clades (Tellgren et al. 2004). Further, polymorphism in the gene in cattle was reported to show signatures of selection (Wiener et al. 2003). Recently, a polymorphic microsatellite within intron 1 of the MSTN gene (MSTN01) was described in bovines (De la Rosa-Reyna et al. 2006). It is located ∼400 bp from the start of the first intron (total intron length ∼510 bp, varying with microsatellite length) and consists of a mononucleotide T motif that in bovines exhibits allelic variants of 12 to 21 repeats. In this study, we characterize the evolutionary origin and dynamics of this microsatellite using phylogenetic methods.

Poly(T) repeats are structurally interesting at the DNA level because poly(T) sequences are intrinsically curved. T-T steps in double-helical DNA have unusual roll and tilt angles compared with other dinucleotide steps (Bolshoy et al. 1991). An NMR structure of poly(T) DNA also indicated that the junctions between T-tracts and nonrepetitive sequences appear bent (Yoon et al. 1988). Finally, single-stranded poly(T) shows no appreciable stacking energy, and the stacking energetics are driven by the A-A stacks on the opposite strand (Saenger 1984). These unusual structural properties make the evolutionary dynamics of poly(T) repeats of general interest. We therefore extended the study to examine how far the dynamics of the MSTN microsatellite are shared by poly(T) repeats in two pairs of closely related mammalian genomes.

Materials and Methods

DNA Samples

DNA was extracted from muscle, hair, or blood with a Promega Wizard Genomic DNA isolation kit or a Qiagen DNA isolation kit. Hair and blood samples from four eland (Taurotragus oryx) and seven Texas white-tailed deer (Odocoileus virginianus texanus) were obtained from game ranches in northeastern Mexico; DNA from these samples was isolated with the Wizard Genomic DNA Purification Kit (Promega).

The sequences of pig (Sus scrofa), goat (Capra hircus), sheep (Ovis aries), and water buffalo (Bubalus bubalis) were extracted from GenBank (accession numbers AJ237662, AF393619, AF393618, and DQ091762, respectively).

Microsatellite Amplification and Sequencing

Ten to fifty nanograms of DNA from each sample were amplified by 30 cycles of PCR, using primers MSTN2F 5′-CACGACGTTGTAAAACGACCCACGGAGTGTGAGTAGTCCTG-3′ and MSTN2R 5′-TTTACTTCCTTATTGCTCTTACTA-3′ (described by De la Rosa-Reyna et al. 2006) and 2.5 U of Taq DNA polymerase (Perkin Elmer, Roche Molecular Systems, Inc., Branchburg, NJ). The amplification program consisted of predenaturation (94°C for 2 min), followed by 35–40 cycles of denaturation (94°C for 2 min), primer annealing (55°C for 45 s), and extension (72°C for 2.5 min), with a final extension at 72°C for 7 min. The PCR products were resolved by electrophoresis in a 1.5% agarose gel, stained with SYBR Gold, and visualized under UV light. Each PCR product was cloned using the pGEM Easy Kit (Promega Corp., Madison, WI) or the 4-TOPO-TA vector (Invitrogen, Applied Biosystems, Foster City, CA), and plasmid DNA from individual clones was sequenced with the SequiTherm EXCEL II DNA Sequencing Kit (Epicentre Technologies, Madison, WI). Samples were sequenced in both directions on a LI-COR IR2 DNA Sequencer (LI-COR, Inc., Lincoln, NE), assembled into contigs, and then aligned using Clustal X (Thompson et al. 1997).
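Because the amplification program is given step by step, it can be written down as data and checked arithmetically. The sketch below encodes the thermal profile as stated and computes the total hold time; ramp times between temperatures are not reported and are therefore ignored, and the cycle count is left as a parameter so that 30, 35, or 40 cycles can be compared. The `Step` type and function name are hypothetical conveniences, not part of any instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Thermal profile as described in the text; ramping between steps is not modeled.
PREDENATURATION = Step("predenaturation", 94, 120)
CYCLE = (
    Step("denaturation", 94, 120),
    Step("annealing", 55, 45),
    Step("extension", 72, 150),
)
FINAL_EXTENSION = Step("final extension", 72, 420)


def hold_time_minutes(n_cycles: int) -> float:
    """Total time spent at the programmed temperatures, excluding ramp times."""
    per_cycle = sum(step.seconds for step in CYCLE)
    total = PREDENATURATION.seconds + n_cycles * per_cycle + FINAL_EXTENSION.seconds
    return total / 60


for n in (30, 35, 40):
    print(f"{n} cycles: {hold_time_minutes(n):.1f} min of hold time")
```

Each cycle contributes 5.25 min of hold time, so the range from 30 to 40 cycles changes the run length by about 50 min before ramping is added.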

Phylogenetic Analysis

The coding sequences of MSTN from Sus scrofa (AF019623), Antilocapra americana (AY629309), Ovis aries (AF019622), Capra hircus (AY436347), Bubalus bubalis (DQ159987), Taurotragus derbianus (AY629304), Bos taurus (AF320998), and Bos gaurus (AY62303) were aligned with ClustalW. A model of nucleotide substitution was selected with Modeltest (Posada and Crandall 1998). MrBayes (Ronquist and Huelsenbeck 2003) was used to construct the phylogeny under a GTR model with gamma-distributed rates across sites, running 1 million generations, sampling a tree every 100 generations, and discarding the first 500 sampled trees as burn-in. The resulting tree was consistent with the relationships reported by Hassanin and Douzery (1999). For species whose entire coding region was not available, phylogenetic position was inferred from Hassanin and Douzery (1999).
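The MCMC settings determine how many trees remain for summarizing the posterior. The sketch below assumes one tree is sampled every 100 generations and that the burn-in of 500 is counted in sampled trees. It then estimates each topology's posterior probability as its frequency among the retained samples. The function names, the toy chain, and the assumption that identical topologies share one canonical Newick string are illustrative. This is not MrBayes's own tree summary (its `sumt` command), and any extra sample written at the start of a run is ignored.

```python
from collections import Counter


def retained_samples(generations: int, sample_freq: int, burn_in_samples: int) -> int:
    """Number of sampled trees left after discarding the burn-in samples."""
    return generations // sample_freq - burn_in_samples


def topology_posteriors(sampled_topologies: list[str], burn_in_samples: int) -> dict[str, float]:
    """Estimate posterior probabilities as the frequency of each topology
    among the post-burn-in samples."""
    kept = sampled_topologies[burn_in_samples:]
    if not kept:
        raise ValueError("burn-in discards every sample")
    counts = Counter(kept)
    return {topology: n / len(kept) for topology, n in counts.most_common()}


print(retained_samples(1_000_000, 100, 500))

# Toy chain of canonical Newick strings (hypothetical, for illustration only).
chain = ["((a,b),c);"] * 2 + ["((a,b),c);"] * 6 + ["((a,c),b);"] * 2
print(topology_posteriors(chain, burn_in_samples=2))
```

Under this reading of the settings, 10,000 trees are sampled and 9,500 are retained after burn-in.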

Genome Comparisons

The multiple alignment of the human (Homo sapiens) genome with 16 other vertebrate genomes was downloaded from the UCSC Genome Bioinformatics web site (Karolchik et al. 2002). Gene prediction data for the same human genome assembly were obtained from the UCSC Table Browser. Perfect T/A repeats of length ≥5 were identified in the human and mouse genome alignment chains, and the corresponding regions in the chimpanzee and rat chains, respectively, were tested for changes in repeat length and for mutations (imperfections). The same analysis was performed on the pairwise human-chimpanzee alignment as a control against any bias introduced by the more distant multiple sequence alignment.
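A minimal sketch of the repeat comparison follows. It assumes each alignment block has already been reduced to two equal-length gapped rows, with the reference row from human or mouse and the query row from chimpanzee or rat. Parsing UCSC alignment files is omitted. The helper names, the gap handling, and the three outcome labels ("conserved", "length_change", "imperfect") are hypothetical choices for illustration, not the pipeline actually used.

```python
import re
from collections import Counter


def classify_runs(ref_row: str, qry_row: str, min_len: int = 5) -> list[tuple[str, int, str]]:
    """For each perfect T or A run of >= min_len in the reference row of a
    pairwise alignment, classify the aligned segment of the query row."""
    if len(ref_row) != len(qry_row):
        raise ValueError("alignment rows must have equal length")
    # Soft-masked (lowercase) bases are treated like uppercase ones.
    ref, qry = ref_row.upper(), qry_row.upper()
    results = []
    for base in "TA":
        # A maximal run of `base`, allowing alignment gaps inside and at its edges.
        pattern = re.compile(rf"-*(?:{base}-*){{{min_len},}}")
        for match in pattern.finditer(ref):
            ref_len = match.group().count(base)
            segment = qry[match.start():match.end()].replace("-", "")
            if set(segment) - {base}:
                status = "imperfect"      # another base interrupts the run
            elif len(segment) == ref_len:
                status = "conserved"      # identical run length
            else:
                status = "length_change"  # pure run of a different length
            results.append((base, ref_len, status))
    return results


def conservation_by_length(pairs, min_len: int = 5) -> dict[int, float]:
    """Fraction of reference runs of each length that are conserved in the query."""
    conserved, total = Counter(), Counter()
    for ref_row, qry_row in pairs:
        for _base, ref_len, status in classify_runs(ref_row, qry_row, min_len):
            total[ref_len] += 1
            if status == "conserved":
                conserved[ref_len] += 1
    return {n: conserved[n] / total[n] for n in sorted(total)}


# Toy alignment rows (invented for illustration, not genome data).
pairs = [
    ("GCTTTTTTTAG", "GCTTTTTTTAG"),    # T(7) conserved
    ("GCTTTTTTTAG", "GCTTTGTTTAG"),    # T(7) interrupted by a G
    ("GCTTTTTT--AG", "GCTTTTTTTTAG"),  # T(6) in reference, T(8) in query
]
print(conservation_by_length(pairs))
```

Plotting the returned fractions against run length gives a conservation curve for poly(T) tracts. This simple version does not count a query run that is lengthened by a substitution just outside the aligned reference run.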

Results and Discussion

PCR products were cloned and sequenced to determine heterozygosity. Aligning the set of sequences allowed us to locate the MSTN01 microsatellite motif, as reported in Table 1. Species for which the sequence was retrieved or determined include Bos taurus (cattle), Bos gaurus (gaur), Bubalus bubalis (water buffalo), Taurotragus oryx (eland), Taurotragus derbianus (giant eland), Ovis canadensis (bighorn sheep), Ovis aries (sheep), Capra hircus (goat), Oryx gazella (gemsbok), Cervus elaphus (elk), Alces alces (moose), Odocoileus hemionus (mule deer), Odocoileus virginianus (white-tailed deer), Antilocapra americana (pronghorn), Mesoplodon stejnegeri (Stejneger’s beaked whale), Sus scrofa (pig), Camelus dromedarius (dromedary camel), Lama guanicoe (guanaco), Lama pacos (alpaca), and Equus caballus (horse). The most divergent ruminant in our study was Antilocapra americana, which diverged from other ruminants in the middle of the Oligocene (Hassanin and Douzery 2003). Cervidae diverged from Bovidae during the late Oligocene, and Bovidae split into Bovinae and Antilopinae during the early Miocene. The last common ancestor of eland/giant eland and other bovids lived approximately 16 million years ago (Hassanin and Ropiquet 2004).

Table 1 Microsatellite sequences from all alleles across artiodactyl species (plus a perissodactyl outgroup) where the sequence was generated

In Fig. 1, we map the development of the microsatellite onto a phylogeny based on the MSTN coding sequence. For species whose coding sequence is available, this phylogeny agrees with earlier data and with expected species relationships (Hassanin and Douzery 1999; Hassanin and Ropiquet 2004). The ancestral state, as seen in camel, pig, and whale, contains five to seven repeated Ts that do not appear to be polymorphic, given the homozygosity and similarity across species. The variability among these three outgroups is consistent with stepwise gain or loss of Ts on a slow interspecific timescale. No polymorphism was observed in a fourth outgroup, pronghorn, in which a G imperfection has also been introduced. Two models for the birth(s) and subsequent dynamics of the microsatellite are presented in Figs. 1a and b.

Fig. 1

The tree based on the myostatin coding sequence is shown, with additional species placed according to accepted species relationships. A GTR model with inverse gamma-distributed rates across sites was calculated in MrBayes (Ronquist and Huelsenbeck 2003). The myostatin coding sequence was not available for some species; their positions in the tree were inferred from Hassanin and Douzery (1999). Microsatellites are shown at the right. The substitutions indicated on the branches show the minimal number of substitutions necessary for microsatellite expansion(s) and/or deaths according to (a) a model with two independent birth events and (b) a model with one birth and one death event. The names of the different clades are also indicated on the tree.

The first model (Fig. 1a) involves two independent birth events, while a presumed ancestral state is maintained among the caprids. In this model, a threshold is crossed independently in the bovid and cervid lineages, and polymorphism emerges. Although the dynamics may be more complex, no lineage without polymorphism carries more than T(7), and the putative ancestral state includes T(7), so the addition of a single T would generate a microsatellite. Interestingly, the pronghorn, which is not polymorphic and whose repeat has not expanded substantially, has eight Ts, but an imperfection has also occurred, producing T(4)GT(4). If the threshold is this simple, one might speculate that the G insertion preceded the addition of the eighth T. After the microsatellites emerged in the two clades, polymorphic imperfections arose in cattle and in the lineage leading to mule deer and white-tailed deer, after its divergence from moose.
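The threshold argument can be made concrete with a small helper that writes alleles in the T(n) notation used here and asks whether any perfect run reaches a given length. The default threshold of 8 follows Rose and Falush (1998). The function names and the strict "longest perfect run" rule are assumptions made for illustration. Under this rule, the pronghorn allele T(4)GT(4), although it carries eight Ts in total, never presents a perfect T(8).

```python
import re


def repeat_notation(seq: str, base: str = "T") -> str:
    """Write runs of two or more `base` in T(n) notation, e.g. TTTTGTTTT -> T(4)GT(4)."""
    return re.sub(rf"{base}{{2,}}", lambda m: f"{base}({len(m.group())})", seq.upper())


def longest_perfect_run(seq: str, base: str = "T") -> int:
    """Length of the longest uninterrupted run of `base` (0 if absent)."""
    return max((len(run) for run in re.findall(rf"{base}+", seq.upper())), default=0)


def crosses_threshold(seq: str, threshold: int = 8, base: str = "T") -> bool:
    """Hypothetical rule: a microsatellite can be born only once a perfect
    run of `base` reaches `threshold` repeats."""
    return longest_perfect_run(seq, base) >= threshold


for allele in ("TTTTTTT", "TTTTTTTT", "TTTTGTTTT"):
    print(repeat_notation(allele), longest_perfect_run(allele), crosses_threshold(allele))
```

In this toy rule, a G insertion that precedes the eighth T keeps the longest perfect run at four, which is the order of events speculated for the pronghorn lineage.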

In the alternative model (Fig. 1b), the microsatellite emerged once preceding the divergence of cervids from caprids and bovids and then was lost on the caprid lineage. Arguing against this are the lack of imperfection in the caprid sequence and the correspondence of the conserved caprid sequence (among gemsbok, goat, and sheep) with the putative ancestral state at the last common ancestor of bovids and caprids according to the previous model.
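Coding the microsatellite as a present/absent character and counting changes with Fitch parsimony shows why presence alone cannot choose between the two models. Two births (Fig. 1a) and one birth plus one death (Fig. 1b) each cost two changes. The sketch assumes a clade-level tree built from the relationships described in the text, collapses camel, pig, and whale into a single outgroup leaf, and assigns one state per clade. The tree, the state coding, and the function are illustrative simplifications, not the analysis behind Fig. 1.

```python
from typing import Union

Tree = Union[str, tuple]


def fitch(tree: Tree, states: dict[str, int]) -> tuple[set[int], int]:
    """Fitch parsimony for one character on a rooted binary tree written as
    nested 2-tuples of leaf names. Returns (root state set, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Clade-level tree simplified from the relationships described in the text;
# the non-ruminant outgroups are collapsed into one leaf (an assumption).
TREE = ("outgroup", ("pronghorn", ("cervids", ("caprids", "bovines"))))
# 1 = polymorphic microsatellite present, 0 = absent (clade-level simplification).
STATES = {"outgroup": 0, "pronghorn": 0, "cervids": 1, "caprids": 0, "bovines": 1}

print(fitch(TREE, STATES))
```

Because the two models tie under this count, the choice between them rests on sequence detail, such as the absence of imperfections in the caprid sequence, rather than on the number of changes.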

Although no general threshold has been established for the birth of a microsatellite (Pupko and Graur 1999), the data here indicate an initiation length consistent with the T(8) suggested as necessary by Rose and Falush (1998). Further, if the model in Fig. 1a is correct, the microsatellite shows threshold-like behavior, with the threshold crossed in two separate instances, each leading to the birth of a microsatellite. The pronghorn lineage is of particular interest, and it would be desirable to determine the order of mutations there. Unfortunately, no polymorphism was detected in the three pronghorn individuals examined, and no close sister species of pronghorn is available for comparison.

To test for such threshold behavior in other microsatellites in mammalian genomes, we examined the conservation of poly(T) tracts between sister genomes (human vs. chimpanzee and mouse vs. rat), as shown in Fig. 2 for human vs. chimpanzee. In both genome comparisons, and irrespective of the location of the microsatellite (exon, intron, or noncoding), a transition to length instability was observed beginning at about T(7). From these comparisons, the MSTN microsatellite appears to behave similarly to other poly(T) tracts. Because of the phylogenetic distance among the species in the multiple sequence alignment from the UCSC Genome Browser (Karolchik et al. 2002), these results were compared with those from a pairwise alignment of the human and chimpanzee genomes, to check for a possible bias in the multiple alignment against identifying divergent pairs in noncoding regions. The similarity of the results (data not shown) from the multiple and pairwise alignments argues against such a bias.

Fig. 2

Poly(T) repeats of length ≥5 in the human genome were compared with the corresponding chimpanzee sequences for conservation patterns, using the multiple sequence alignment from the UCSC Genome Browser web site (Karolchik et al. 2002). A similar pattern was observed across exonic, intronic, and noncoding regions: (a) conservation decreased beyond a length of ∼7 T, corresponding to (b) an increase in length instability at that point. A similar result was observed in a comparison of the mouse and rat genomes (data not shown). (c) The introduction of imperfections into repeats showed a weaker dependence on repeat length.

Another proposal is that the birth of a microsatellite is influenced by an enabling mutation. We searched for potential enabling substitutions that were specific to and conserved in Caprini and Sus but not in Bovinae and Cervidae, but found no obvious substitution in the immediate vicinity of the microsatellite (data not shown). Whether enabling mutations outside the microsatellite play an important role is controversial, and our dataset does not shed light on this question.

Imperfections, which have been characterized as an important part of the microsatellite life cycle (Buschiazzo and Gemmell 2006), occur in the MSTN microsatellite in many of the species studied. A previous study of microsatellite evolution in ruminant artiodactyls (Brohede and Ellegren 1999) found the mutation rate to be higher at the ends of microsatellites than in the center. In our study, substitutions occurred at the end of the poly(T) sequence, which could indicate a higher mutation rate there. In most sequences studied, an extra G is present just upstream of the poly(T) sequence. Further, some Bos taurus individuals carry an additional GTG or GTGTTG preceding the poly(T) region; these insertions are not seen in any of the other sequences examined. Two Bos taurus individuals also have G imperfections at the 3′ end of the sequence. Among the cervids and bovids, we did not identify any individual in which imperfections had interrupted the repeat enough to be expected to affect its behavior as a microsatellite. Even if such a late-stage microsatellite were found in an individual, its fixation would depend on population-genetic forces, including effective population size and any selection acting on the microsatellite itself.

Brohede and Ellegren (1999) also found that the substitution rate in the first few nucleotide positions flanking microsatellites was higher than in the flanking sequence farther away. In our study the 5′-flanking region shows the opposite pattern, where the five closest flanking bases are highly conserved, but the next five bases are not.
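The flanking-region comparison can be expressed as a per-position identity score averaged over 5-bp windows that move away from the repeat. The sketch assumes gap-free 5′ flanks that are already aligned and trimmed to end at the base immediately before the poly(T) tract. The identity measure (the share of sequences carrying the most common base), the function names, and the toy sequences are illustrative assumptions, not the study's data or method.

```python
from collections import Counter


def column_identity(column: list[str]) -> float:
    """Fraction of sequences carrying the most common base at one position."""
    return Counter(column).most_common(1)[0][1] / len(column)


def flank_window_identity(flanks: list[str], window: int = 5) -> list[float]:
    """Mean per-position identity of aligned 5' flanks, in windows that move
    away from the repeat (window 0 = the bases closest to the repeat)."""
    length = len(flanks[0])
    if any(len(f) != length for f in flanks):
        raise ValueError("flanks must be aligned to equal length")
    # Index 0 is the base immediately 5' of the repeat, index 1 the next one out, ...
    scores = [column_identity([f[length - 1 - i] for f in flanks]) for i in range(length)]
    return [sum(scores[i:i + window]) / len(scores[i:i + window])
            for i in range(0, length, window)]


# Invented 10-bp flanks ending immediately 5' of the poly(T) tract.
toy_flanks = ["ACGTACAGGC", "TTGCACAGGC", "GCATACAGGC"]
print(flank_window_identity(toy_flanks))
```

For these toy flanks, the window nearest the repeat is fully conserved and the next window is not, which is the kind of contrast described in the preceding paragraph.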

Some evidence suggests that microsatellite expansion rates (or expansion abilities) differ among species (Rubinsztein et al. 1995). If selective forces operate on microsatellites (Garza et al. 1995), these might also differ among species. Effective population size, which varies among species, is generally an important determinant of substitution patterns. Alternatively, species-specific differences in dynamics may reflect differences in polymerases or DNA repair enzymes. Both statistical tests of substitution patterns among mammalian species and biochemical assays of DNA repair activity have provided preliminary evidence for systematic species-specific differences in DNA repair (Ota and Penny 2003 and references therein). In the present study we observe both species-specific differences in MSTN01 length and different imperfections in different species, although this is not evidence for or against a difference in the underlying enzymatic process. The similarity between the behavior of the MSTN microsatellite and that of poly(T) tracts in both rodent and primate genomes may also indicate an important general role for DNA chemistry in the process, in addition to any subsequent role of lineage-specific biochemistry or population biology. The absence of a substantial difference in the behavior of microsatellites in exons and in noncoding DNA may further support DNA structure as a driving force.

Other microsatellite regions have been found in MSTN genes from several species. MSTN was sequenced in three species of ictalurid (catfish), and polymorphic microsatellites were found in noncoding regions (Gregory et al. 2004). In Pseudosciaena crocea (yellow croaker), a microsatellite sequence, (CA)30 and (CA)26 separated by TA, occurs in the 3′-UTR of MSTN, but its functional consequences are unknown (Xue et al. 2006).

This analysis of the MSTN gene identified the birth of a microsatellite in ruminants at a locus that does not behave as a microsatellite in pigs or the other mammals examined. Interestingly, microsatellite behavior is not observed in caprids/ovids, and the microsatellite may have emerged twice independently within the ruminants. Such complex microsatellite dynamics in an important gene like MSTN have not been described before, and the hint that the microsatellite behavior may be linked to selection on this gene, although unsubstantiated, is an intriguing possibility.