Introduction

Trans-specific microsatellites are a valuable tool in pines because de novo development is difficult in the large, highly duplicated conifer genome (Kostia et al. 1995; Soranzo et al. 1998; Joyner et al. 2001; Mariette et al. 2001). Although a number of methods have been developed to circumvent problems associated with repetitive DNA (Zhou et al. 2002), the transfer of microsatellites between related conifer species is one of the more promising (Echt et al. 1999; Kutil and Williams 2001; Shepherd et al. 2002). Shepherd et al. (2002) reported successful microsatellite transfer in hard pines (subgenus Pinus) for species that diverged a few million generations ago. Polymorphism was observed to be high within these narrow phylogenetic limits (Shepherd et al. 2002). Microsatellite transfer across pine subgenera (about 14 million generations since divergence) has also been shown to be feasible, but the extent of polymorphism is unknown at this level (Kutil and Williams 2001). Trans-specific microsatellites can provide a test of evolutionary models for microsatellite DNA regions (Karhu et al. 2000; Primmer and Ellegren 1998) because they are often phylogenetically informative (see Zhu et al. 2000). Variation relevant for inferring phylogenetic relationships can be found both in the flanking sequences and in the repeat motif. Trans-specific microsatellites are also useful for anchoring genetic maps in comparative mapping, for analysing the conservation of gene order across orthologous linkage groups and for verifying quantitative trait loci (QTLs) (Devey et al. 1999; Meksem et al. 2001).

A common observation in microsatellite transfer studies is that allele lengths are longer in focal species than in non-focal species. These results may be explained by an inherent bias in protocols that select for longer-than-average microsatellites in the focal species (i.e., “ascertainment bias” as defined by Ellegren et al. 1995). However, evidence of ascertainment bias remains contradictory. It has not been found in intensive studies of wasps and cattle (Crawford et al. 1998; Zhu et al. 2000). In other cases, such as between humans and chimpanzees, inter-specific length differences in orthologous microsatellites cannot entirely be explained by ascertainment bias (Cooper et al. 1998). In fact, Cooper et al. (1998) showed that longer microsatellites in humans could be explained by a mutational bias in favour of microsatellite expansions and a higher average genome-wide microsatellite mutation rate in the human lineage.

Pines are woody perennial species with a long lifespan. They are characterized by highly outcrossing mating systems and approximately 10 years per generation as a general rule (Kutil and Williams 2001). Mediterranean pines are a heterogeneous assembly of hard (diploxylon) pine species that have been traditionally considered to be relic taxa from the Tertiary period, belonging to different evolutionary lineages (Klaus 1989). The complex phylogenetic relationships among the Mediterranean pine species themselves (subsections Pinaster and Pineae, based on Frankis 1993 and Liston et al. 1999) and among these and other Eurasian pines (subsection Pinus) have been the subject of active investigation using morphological and molecular data, and the conclusions drawn from these studies are still under debate (see Price et al. 1998 for a review). Pine species occur over a broad area in the Mediterranean basin, showing a remarkable ecological plasticity (Barbéro et al. 1998) and playing an important biological and commercial role. Some of the singular ecological adaptations of Mediterranean tree species have taken on a special significance in recent years as more precise predictions on climate change become available. The genetic variability of those Eurasian forest tree species that are adapted to warm and dry conditions, such as P. halepensis or P. pinaster, or of marginal populations from southern locations of widespread species such as P. sylvestris, is of great interest to breeders and genetic conservation programmes.

Eurasian hard pines show distinct patterns of genetic variation and population structure. Highly heterozygous species, such as P. sylvestris (Prus-Glowacki and Stephan 1994), are sympatric with species showing a remarkable lack of polymorphism, such as P. pinea (Fallour et al. 1997). Pinus halepensis has a widespread and continuous range, whereas P. pinaster has a widespread but scattered range and P. canariensis is an island endemic. Trans-specific microsatellites are needed to address evolutionary questions related to gene flow and adaptation at a multispecific (ecosystem) level. To date, there has not been a microsatellite-transfer study for pines in the Mediterranean region, and de novo microsatellites are available for only a few species, namely P. sylvestris (Kostia et al. 1995; Soranzo et al. 1998; Karhu et al. 2000), P. halepensis (Keys et al. 2000) and P. pinaster (Mariette et al. 2001).

In the investigation reported here we tested microsatellite transfer from P. taeda and P. sylvestris to seven Eurasian hard pines widely distributed along the Mediterranean basin (P. pinaster, P. pinea, P. halepensis, P. canariensis, P. nigra, P. sylvestris and P. uncinata). Transferred markers (i.e., the clear amplification pattern of a single locus) were screened for gene diversity and allelic richness in order to demonstrate their practical usefulness in genetic studies. Because the set of species included in this study showed a wide range of phylogenetic distances, we were also able to address the extent of polymorphism retention from a few million years ago to over 100 million years ago (estimated divergence time between New World and Eurasian hard pines), thereby extending previous results by Shepherd et al. (2002). Furthermore, trans-specific microsatellites were used to (1) study sequence variation in the flanking and repeat regions across different Eurasian pine species and (2) evaluate the usefulness of the transferred markers for a population-level distinction between P. sylvestris and P. uncinata, two closely related cross-mating species.

Materials and methods

Species classification and DNA sources

We followed the taxonomic classification of Price et al. (1998), with the exception of recognizing a subsection Pinaster (Frankis 1993; Liston et al. 1999), which includes the species placed by Price et al. (1998) in subsections Halepenses and Canarienses together with two species from subsection Pinus: P. pinaster and P. merkusii. Two hard pine (subgenus Pinus) species were used as focal taxa of microsatellite loci: P. taeda and P. sylvestris. Pinus taeda belongs to the subsection Australes (New World hard pines), while P. sylvestris belongs to subsection Pinus (Eurasian hard pines). Seven native pine taxa occurring in the Mediterranean basin (including Canary Islands pine, a former Mediterranean-distributed species) were included as non-focal species for transfer, namely P. pinea (subsection Pineae); P. halepensis, P. canariensis and P. pinaster (subsection Pinaster); P. nigra, P. sylvestris and P. uncinata (subsection Pinus). Note that P. sylvestris was used as both a focal and a non-focal species.

Control DNA samples of P. taeda were randomly selected from a three-generation outbred pedigree (Devey et al. 1991). Twenty individuals of each of the other seven species were randomly selected from three to four different native populations, with one exception. This exception was P. uncinata, which has a very restricted distribution range; samples were collected in just one population in the Pyrenees (northeastern Spain). DNA was extracted from needle tissue following the protocol described by Dellaporta et al. (1983). Of the 20 DNA samples collected, two were used to evaluate amplification and optimize PCR conditions. All 20 samples were used to estimate the polymorphism of the transferred loci or to confirm monomorphism. In addition, two samples from P. taeda and P. sylvestris were used as focal-species controls.

Microsatellite transfer

Transference is defined here as the positive amplification of a PCR band of the expected size. Twenty-two microsatellite markers (Table 1) were assessed for transfer. Markers were selected at random from already available sources. Nineteen markers were originally developed in P. taeda from different DNA libraries (16 from low-copy and under-methylated libraries; Auckland et al. 2002) and the other three markers were originally developed in P. sylvestris from total genomic libraries (Soranzo et al. 1998). The amplification of P. taeda microsatellite loci followed the protocol described by Auckland et al. (2002) (protocol a in Table 1). We attempted to optimize the PCR analyses by making the following modifications: 5% DMSO was used for amplification, and new annealing temperatures of the touchdown profile were tested (Table 1). The total number of cycles and the concentrations of the PCR mixture components were not changed. Microsatellite loci described originally in P. sylvestris were tested for transfer using the protocol described in Soranzo et al. (1998) (protocol b in Table 1). A Perkin-Elmer GeneAmp 9700 thermal cycler (Perkin Elmer, Foster City, Calif.) was used to carry out all reactions.

Table 1 Results of transfer and PCR conditions used to amplify 22 microsatellite loci in seven Eurasian hard pine species

PCR amplification was initially assessed on 1.7% agarose gels. Products of positive amplifications were then resolved on 6% acrylamide/bisacrylamide (19:1), 7 M urea and 1× TBE denaturing gels. The gels were run at 45 W constant power for 1–2 h using a Li-Cor 4200 Series automated DNA sequencer (Li-Cor, Lincoln, Neb.). Amplified fragments were sized by Gene ImagIR ver. 3.56 software (Scanalytics), using external standards. A positive result of transfer to non-focal species was reported when a clear amplification pattern of the expected size and polymorphism were observed (class 1). Other observed results were classified as follows: class 2, monomorphic amplification product of the expected size; class 3, non-specific amplification, complex pattern or poor amplification; class 4, no amplification.
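The four-class scoring scheme above can be written down as a small decision rule. The following is only a sketch: the names `TransferClass`, `classify_transfer` and `is_transferred` are hypothetical, and the three inputs are assumed to be judgements already made by eye from the gels.

```python
from enum import IntEnum


class TransferClass(IntEnum):
    POLYMORPHIC = 1       # clear pattern of the expected size, more than one allele
    MONOMORPHIC = 2       # clear product of the expected size, a single allele
    NONSPECIFIC = 3       # non-specific, complex or poor amplification
    NO_AMPLIFICATION = 4  # no product at all


def classify_transfer(amplified: bool, clear_expected_size: bool, n_alleles: int) -> TransferClass:
    """Assign a marker/species combination to one of the four classes."""
    if not amplified:
        return TransferClass.NO_AMPLIFICATION
    if not clear_expected_size:
        return TransferClass.NONSPECIFIC
    if n_alleles > 1:
        return TransferClass.POLYMORPHIC
    return TransferClass.MONOMORPHIC


def is_transferred(cls: TransferClass) -> bool:
    """Classes 1 and 2 are the ones counted as successful transfer."""
    return cls in (TransferClass.POLYMORPHIC, TransferClass.MONOMORPHIC)
```

Under this rule a marker with a clean single band in all 20 trees would fall in class 2 and still count as transferred, whereas a smeared or multi-band pattern would fall in class 3 regardless of apparent variation.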

Expected heterozygosity (He) was calculated following Nei (1973). Allelic richness (A) at a locus was measured as the number of different alleles observed in a sample of 10 to 20 diploid individuals. In order to estimate allelic richness for a fixed sample size, a rarefaction method (Hurlbert 1971) was used to estimate A10, the number of different alleles at a locus for a sample of ten diploid individuals. The results obtained from non-focal species were compared with analogous population genetic data from P. taeda (Al-Rababah and Williams 2002). Differences in average microsatellite size (population genetic data) and length (sequencing data, see below) between P. taeda and non-focal species were tested using a non-parametric sign test. One goal of the present study was to develop a set of highly polymorphic markers able to distinguish, at a population level, between P. sylvestris and P. uncinata, two closely related cross-mating species. The utility of trans-specific microsatellites for this purpose was tested using a Fisher exact test of population differentiation between the two species (Raymond and Rousset 1995a). Genotypes of 20 trees per species, each scored at eight trans-specific microsatellites, were considered for the Fisher exact test, which was performed using the genepop version 3.3 software package (Raymond and Rousset 1995b).
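The two diversity statistics can be illustrated with a short sketch. It assumes the uncorrected form of Nei's gene diversity, 1 minus the sum of squared allele frequencies, and Hurlbert's rarefaction formula applied to gene copies (20 copies for ten diploid individuals). The function names and the allele sizes used in the usage note are hypothetical, and no claim is made that this reproduces the software actually used.

```python
from collections import Counter
from math import comb


def expected_heterozygosity(alleles):
    """Gene diversity 1 - sum(p_i^2) from a flat list of allele sizes (two per tree)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def rarefied_allelic_richness(alleles, n_individuals=10):
    """Expected number of distinct alleles in a random subsample of
    2 * n_individuals gene copies (Hurlbert's rarefaction)."""
    counts = Counter(alleles)
    n = sum(counts.values())   # gene copies actually scored
    g = 2 * n_individuals      # gene copies in the standardised sample
    if g > n:
        raise ValueError("cannot rarefy to a sample larger than the data")
    total = comb(n, g)
    # For each allele: 1 - P(allele absent from the subsample).
    return sum(1.0 - comb(n - c, g) / total for c in counts.values())
```

For example, 20 trees carrying two equally frequent alleles (`[150] * 20 + [153] * 20`) give a gene diversity of 0.5 and a rarefied richness very close to 2, while a monomorphic sample gives 0 and 1.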

Sequence variation of microsatellite loci

Two amplification products from each of three trans-specific microsatellites (PtTX3107, PtTX4001 and PtTX3116; see Results) were sequenced for all non-focal species with successful cross-amplification. PCR products were precipitated with ethanol and then cloned using the pGEM-T Easy vector (Promega, Madison, Wis.). At least four clones from each transformation were sequenced to reduce Taq polymerase PCR artifacts. Only consensus sequences are reported. DNA sequencing was carried out using dye terminator sequencing reagents (Perkin-Elmer) and an automated ABI 377 sequencer. GenBank accession numbers are given in Table 2. Sequences were aligned using the Clustal W method included in the MegAlign software (DNASTAR), followed by manual alignment adjustments.

Table 2 GenBank accession numbers for trans-specific microsatellite sequences

To test for polarity of mutation, we computed the rate of change for a given nucleotide position from the repeat motif as the observed number of substitutions and insertions/deletions at this position over all the sequences divided by the number of sequences analysed. Generalized linear models were tested using the glm procedure of SAS version 9.0 statistical package (SAS Institute, Cary, N.C.). In these models, the rate of change was considered to be the response variable and the distance to the repeat motif was the explanatory variable. Models were constructed combining 5′ and 3′ flanks. Confidence intervals for parameter estimates were obtained using standard methods. Base substitution at position 52 in PtTX4001 was excluded from regression analysis because it was common to all non-focal species. Finally, non-parametric unpaired Wilcoxon tests were computed to test if there were higher rates of change within the 5–10 bp immediately adjacent to the repeat region than in flanking sequences further away. This level of analysis (5–10 bp) was chosen following Brohede and Ellegren (1999).
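The per-position rate of change and its relation to distance from the repeat can be sketched as follows. Several points here are assumptions rather than the procedure actually run in SAS: changes are counted against the focal-species sequence, the denominator is the number of non-focal sequences, a gap character counts as an indel, and the model is an ordinary least-squares fit of rate against the natural logarithm of distance. The function names are hypothetical.

```python
from math import log


def rate_of_change(reference, others, position):
    """Fraction of aligned sequences that differ from the reference at one
    alignment column (substitutions and gaps both count as changes)."""
    changes = sum(1 for seq in others if seq[position] != reference[position])
    return changes / len(others)


def fit_log_decay(distances, rates):
    """Least-squares fit of rate = a + b * ln(distance); returns (a, b).

    Distances are in bp from the repeat motif and must be >= 1.
    """
    xs = [log(d) for d in distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    b = sxy / sxx               # slope: change in rate per unit of ln(distance)
    a = mean_y - b * mean_x     # intercept: fitted rate at distance 1 bp
    return a, b
```

A negative slope `b` corresponds to a rate of change that falls away from the repeat region; a confidence interval for `b` would additionally require the residual variance, which is omitted here for brevity.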

Results

The percentage of microsatellites successfully transferred (classes 1 and 2 in Table 1) from P. taeda to subsection Pinus was moderately high (36–53%). The levels of polymorphism varied among species within this subsection from three polymorphic markers in P. nigra to five in P. sylvestris and P. uncinata. The transference rate was slightly lower to pines in subsections Pineae and Pinaster (36–42%). Polymorphism was found in two markers in P. halepensis, P. pinaster and P. canariensis (see Table 1), whereas no polymorphism was found in P. pinea. Out of the ten microsatellites that were successfully transferred from P. taeda (classes 1 and 2, Table 1), five (50%) showed polymorphism (class 1) in some of the non-focal species, which diverged from the focal species over an evolutionary period of approximately 100 million years (or ten million generations).

Trans-specific microsatellites originally developed in P. sylvestris were always polymorphic but also more phylogenetically limited (Table 1). Positive cross-amplification was mainly achieved within its own subsection (P. uncinata and P. nigra), although two microsatellites were successfully transferred to P. canariensis (SPAC11.6 and SPAG7.14). All primers with clear amplification patterns in P. sylvestris were also successfully amplified in P. uncinata.

Polymorphism level varied among markers in non-focal species (Table 3). With the allelic richness corrected for a sample of ten diploid individuals (A10), it ranged between 1.0 and 14.0. Expected heterozygosity (He) ranged from moderate (0.29) to high (0.93) values. Microsatellites with a higher number of repeats, such as PtTX4001, PtTX3032 or SPAG7.14, generally corresponded with higher expected heterozygosity values. When all five polymorphic markers transferred from P. taeda were taken into consideration, in 22 out of 26 cases the average allele size was larger in the focal species; the only exceptions were those corresponding to microsatellites with a complex repeat motif pattern (PtTX3116 and PtTX3032; Tables 1, 3). This difference was highly significant, as shown by a non-parametric sign test (P<0.001).
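The order of magnitude of this P value can be checked with an exact sign test. The sketch below assumes a two-sided test with no ties and a null probability of 0.5 for either direction of difference; the exact variant used in the study is not stated, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(n_positive, n_total):
    """Two-sided exact sign test P value under H0: P(positive) = 0.5."""
    k = max(n_positive, n_total - n_positive)
    # Probability of a result at least as extreme in one direction.
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)


p = sign_test_p(22, 26)  # 22 of 26 comparisons larger in the focal species
```

With 22 of 26 comparisons in the same direction, this gives a P value of roughly 0.0005, in line with the reported P<0.001.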

Table 3 Gene diversity and allelic richness for eight nuclear microsatellites in Pinus taeda (focal species) and seven Eurasian hard pine species (nt non-tested microsatellites; - tested but not transferred microsatellites)

Important polymorphism differences were also apparent among species for each marker, with P. sylvestris being consistently more polymorphic at all transferred loci (Table 3). Despite its restricted distribution, levels of diversity in P. uncinata remained high when compared with widespread species such as P. sylvestris (Table 3). Trans-specific polymorphism was common between P. sylvestris and P. uncinata, with an average of 30% of the alleles showing the same size in both species (data not shown). However, P. uncinata was clearly differentiated from P. sylvestris, as shown by a Fisher exact test of differentiation based on allele frequencies (P<0.01).

Cloning and sequencing of successfully transferred markers enabled flanking and repeat regions to be analysed across all seven species (Fig. 1). Basic repeat structure was conserved in two loci, PtTX3107 and PtTX4001. Slight changes in the repeat motif were observed in only three species: P. halepensis at PtTX3107, and P. pinaster and P. canariensis at PtTX4001. Sequences flanking the repeat region were also highly conserved in PtTX3107. Only five point mutations were identified in four different species: P. pinaster, P. pinea, P. halepensis and P. canariensis (Fig. 1). The flanking region of PtTX4001 was more variable, with point mutations per 100 bp ranging from 2.63 in P. halepensis to 4.21 in P. canariensis and P. nigra and with one base insertion in the latter. Pines belonging to subsection Pinus (P. uncinata, P. sylvestris and P. nigra) shared two point mutations that were not present in P. pinaster, P. halepensis or P. canariensis (Fig. 1). In contrast to PtTX3107 and PtTX4001, microsatellite locus PtTX3116 showed a complex pattern of variation among species. The main tri-nucleotide repeat region in the focal taxon (TTG) was only found in one of the non-focal species (P. canariensis); instead, tri- and six-nucleotide compound motifs (TTA and TTATTG) or perfect repetitions of the six-nucleotide motif TTATTG (P. pinaster) were present (Fig. 1). The flanking sequence of the PtTX3116 microsatellite was also considerably variable, in particular in the regions closer to the repeat motif.

Fig. 1

Alignment of nucleotide sequences among eight hard pine species for three microsatellite loci. The first sequence corresponds always to the focal species, Pinus taeda. Gaps are indicated by dashes, identical nucleotides are indicated by dots, N=A/C/G/T. The repeat regions are shown in bold. Numbers indicate the positions of the last nucleotide of the row when repeat regions are excluded. The name of the locus and accession number of the focal-species sequence are provided for each alignment

When all three microsatellite markers were considered together (PtTX3107, PtTX3116 and PtTX4001), there were fewer perfect repeats in the non-focal species than in the focal species, P. taeda, a decrease which was also associated with a total shortening of the repeat region (Fig. 1). There were 14 perfect repeats of a trinucleotide motif CAT in PtTX3107 of P. taeda, whereas there were between four and nine in the non-focal species. In PtTX4001, the dinucleotide motif CA was repeated 15 times in P. taeda but only an average of eight times in the remaining species. The difference in microsatellite length between P. taeda and non-focal species was highly significant as shown by a non-parametric sign test (P<0.001).
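Counting perfect repeats of this kind can be done mechanically on a sequence string. The sketch below takes the longest uninterrupted run of a motif as the number of perfect repeats, which is an assumption about how the counts were made; the function name and the toy sequences are hypothetical and are not the GenBank sequences.

```python
import re


def longest_perfect_run(sequence, motif):
    """Largest number of uninterrupted tandem copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Toy examples: a perfect (CAT)14 tract and an interrupted tract.
perfect = "GGA" + "CAT" * 14 + "TTC"
interrupted = "CATCATCGTCATCATCAT"
```

Here `longest_perfect_run(perfect, "CAT")` returns 14, while the interrupted tract yields 3, because the CGT imperfection splits it into runs of two and three copies.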

A polarity of mutation was also observed. The rate of change in the flanking sequence followed a logarithmic curve, moderately decreasing from the regions close to the repeat motifs to the extremes of the flanking sequences (Fig. 2). The value of the slope parameter (b=−0.0204), which indicates the decrease of the rate of change with distance, was significantly different from zero. The 95% confidence intervals for this parameter varied from −0.0289 to −0.0118. The non-parametric Wilcoxon test showed a significantly higher rate of change within the five nucleotides closer to the repeat motif (P<0.004) than for the rest of the sequence as well as a marginal significance level when the number of nucleotides in the proximity of the repeat region was increased to ten (P<0.083). Polarity for mutation was only apparent in the 5′ flanking region of the repeat motifs (Fig. 2), in particular for those loci showing a higher sequence variation (see, for example, the 5′ flanking region of PtTX3116 in Fig. 1).
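The comparison between positions near the repeat and positions further away can be sketched as an unpaired Wilcoxon (Mann-Whitney) test. This version uses average ranks for ties and a tie-corrected normal approximation without continuity correction, and reports a two-sided P value; whether the published values are one- or two-sided, or exact, is not stated, so the function (a hypothetical name) is illustrative only.

```python
from math import erfc, sqrt


def rank_sum_test(near, far):
    """Unpaired Wilcoxon / Mann-Whitney test.

    Returns (U statistic for `near`, two-sided normal-approximation P value).
    """
    pooled = sorted((v, g) for g, grp in enumerate((near, far)) for v in grp)
    n1, n2 = len(near), len(far)
    n = n1 + n2
    rank_sum_near = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2   # mean of ranks i+1 .. j for tied values
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_near += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_near - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                   # every value tied: no evidence either way
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))
```

In use, `near` would hold the rates of change for the five (or ten) positions adjacent to the repeat and `far` the rates for the remaining flanking positions; many of these rates are zero, which is why the tie correction matters.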

Fig. 2

Rate of nucleotide change at the 5′ (circles) and 3′ (triangles) flanking sequences in the 0- to 90-bp range of distance from the repeat region. A logarithmic adjustment is shown. r² is the coefficient of determination. Loci used in this analysis are PtTX3107, PtTX3116 and PtTX4001

Discussion

Nuclear microsatellites from P. sylvestris (three loci tested) transferred preferentially within its own subsection (Pinus), while some of the 19 microsatellites from P. taeda (subsection Australes) amplified across subsections Pinus (36–53%), Pineae (36%) and Pinaster (36–42%). The relatively high transfer rate (53%) from P. taeda to P. sylvestris and P. uncinata, which belong to a subsection that probably diverged from that of P. taeda more than 100 million years ago (or 10 million generations) (Geada-López et al. 2002), suggests the utility of low-copy and under-methylated microsatellites as a generic source of molecular markers in hard pines and extends the results obtained by Shepherd et al. (2002) in a New World hard pine complex.

In pines, Echt et al. (1999) successfully transferred 80% of the microsatellites tested within subgenus Strobus but only 29% between subgenera Strobus and Pinus, which diverged over 130 million years ago (approximately 13 million generations in pines) according to the fossil record (Miller 1977). Kutil and Williams (2001), using triplet-repeat microsatellites, found high transfer rates (100%) from P. taeda (subsection Australes) to a close relative of the same subsection, P. palustris, but only moderate transfer success (47%, similar to that in our study) to P. halepensis (subsection Pinaster). Shepherd et al. (2002) obtained the same transfer rates between two different pine subsections (Australes and Oocarpae) of New World hard pines as within one of them (Australes). This result might be explained by the substantially shorter period (fewer than 20 million years or two million generations; Geada-López et al. 2002) since New World hard pine subsections split.

Trans-specific microsatellites turned out to be polymorphic in 25 out of 62 cases (40%), showing moderate to high levels of heterozygosity (0.29–0.93), and therefore providing a valuable source of molecular tools for population genetic studies. This is particularly true in species for which no other microsatellite markers are available (P. nigra, P. uncinata and P. canariensis) or are scarce (P. pinaster). The set of markers developed was also useful for distinguishing between P. uncinata and P. sylvestris, thereby providing a potential tool for the characterization of forest reproductive material and the study of hybridization between these two closely related species. Pinus sylvestris was consistently more polymorphic at all of the transferred loci, which agrees with the high levels of variation found in this species using such various molecular markers as allozymes (Prus-Glowacki and Stephan 1994), mtDNA and RFLPs (Soranzo et al. 2000). In contrast, no trans-specific microsatellite was polymorphic in P. pinea and, consequently, de novo development of microsatellites seems mandatory for this species. This lack of polymorphism in P. pinea was not a surprising result, as this conifer species has also been shown to have a remarkably low heterozygosity with other markers, namely allozymes (Fallour et al. 1997) and cpSSRs (Gómez et al. 2002, G.G. Vendramin personal communication), a level that is only comparable to that of the North American P. resinosa (Echt et al. 1998). However, the apparently monomorphic PtTX3107 in P. pinea presented a well-structured repeat region (see Fig. 1), which did turn out to be polymorphic when we tested an additional sample of individuals from other populations (a new 159-bp variant, with eight repetitions of the CAT motif, was found).

High sequence similarity among primer binding sites, flanking regions, and repeat motifs is treated as orthology for trans-specific microsatellites (Kutil and Williams 2001). Conserved flanking regions and consistent repeat motifs in PtTX3107 and PtTX4001 supported the cross-amplification of orthologous sequences. In contrast, repeat motifs were not conserved across species in PtTX3116, which could be evidence of paralogy. Karhu et al. (2000) observed locus duplication at microsatellite RPS 105 (not included in this study) in the soft pine P. strobus. Sequences for RPS 105a and RPS 105b had a 94% similarity in the flanking region but a totally different repeat structure. Elsik and Williams (2001) showed that pine microsatellites occur in families, with the different members of the same family having different repeat motifs but the same flanking regions. Paralogy in PtTX3116 could explain the observed high similarity of flanking regions and primer-binding sites in cross-amplified sequences showing different repeat motifs.

The point mutations and indels found in the flanking sequences of microsatellite repeat motifs have been considered to be phylogenetically informative in several species, including crabs (Orti et al. 1997), wasps (Zhu et al. 2000) and pines (Shepherd et al. 2002). If we consider the variation in flanking sequences (PtTX3116 excluded) that we observed, our results are consistent with the inclusion of P. pinaster in the assembly of Mediterranean hard pine species (see, for instance, base substitutions at positions 66, 107 and 109 in PtTX4001) and with the differentiation of Eurasian from New World hard pines, as suggested previously by Frankis (1993), Krupkin et al. (1996) and Geada-López et al. (2002).

The rate of change in the flanking sequence of trans-specific microsatellites was higher in the regions close to the repeat motifs. This result can be interpreted either as higher sequence instability near the microsatellite region or as base substitution events causing imperfections in the extremes of the repeat region. Point mutations causing imperfections arise more frequently in the extremes than in the centre of repeat arrays (“polarity of substitutions”, as defined by Brohede and Ellegren 1999; Ellegren 2000). Brohede and Ellegren (1999) found a relative sequence instability within a 5- to 10-bp region at the border of microsatellite repeat regions in cattle and subsequently advanced three different models to account for this variation: (1) differences in the probability of replication slippage in nearly completely replicated repeat tracts and in tracts where replication has just started; (2) loop formation in connection with homologous recombination or non-reciprocal gene conversion; (3) a higher tendency to accept mismatches in repeat ends than in the middle of the repeat regions during homologous recombination following DNA damage. In pines, variable evolutionary rates for different parts of a microsatellite locus have been recently suggested (Karhu 2001), but differences in the rate of change in the flanking sequences, as shown by this study, have not been previously reported. However, our results are based on only three microsatellite markers with highly conserved priming sites and might not stand up to a broader microsatellite sampling.

Microsatellite markers transferred from P. taeda showed, in general, a smaller average length (sequencing data) or size (diversity screening data) in the non-focal species. Ascertainment bias could explain these results. However, this study was limited in its ability to estimate ascertainment bias because it did not include reciprocal transference studies (Cooper et al. 1998; Crawford et al. 1998). An alternative hypothesis to ascertainment bias is related to the evolution of microsatellite loci. Young microsatellites, evolving from a shorter allele length, have a bias toward allele length expansion (Taylor et al. 1999). Imperfections (interruptions of the repeat motif) tend to accumulate over the life cycle of a microsatellite region, which lowers the mutation rate and eventually leads to the degradation and loss of the repeat motif (Zhu et al. 2000). Mutational bias in favour of microsatellite expansions in P. taeda (microsatellites in young stages) or a higher average genome-wide microsatellite mutation rate in P. taeda than in non-focal species could also explain our results. Additional support for the evolutionary hypothesis comes from the observation that trans-specific markers that diverged over relatively short evolutionary times did not show differences in average allele size or length. This was seen in the data for P. uncinata, P. sylvestris and P. nigra, all of which are from subsection Pinus, and is supported by data for New World hard pines (Shepherd et al. 2002).

In conclusion, given the difficulty of de novo development of microsatellite markers in species with limited commercial value, cross-amplification of loci among phylogenetically close species would seem to be an adequate strategy, in particular when the objective is only to develop a limited number of markers to accomplish population or conservation genetic studies. In hard pines, successful microsatellite transfer has proved to be feasible for species that diverged as long as 100 million years ago (ten million generations). A limited sequencing effort can produce relevant information about the evolution of this widespread kind of marker, allowing the selection of those loci that conform most closely to the theoretical microsatellite mutation models, such as the stepwise mutation model, that underlie common estimates of gene diversity and population structure parameters.