Introduction

The genus Taxodium is well known for its extreme tolerance to waterlogging and therefore has great ecological and economic potential (Qi et al. 2014). Taxodium ‘zhongshansa’ comprises interspecific hybrid clones generated from three Taxodium species: Taxodium mucronatum, Taxodium distichum, and Taxodium ascendens (Yu et al. 2011). T. ‘zhongshansa’ trees are versatile, with ornamental use in urban areas, ecological use in wetlands, and economic use in timber production (Ying and Yu 2005; Hu et al. 2012; Zhang et al. 2012; Xu et al. 2013). To date, progress has been made in understanding the physiological response to flooding stress, as well as in superior clone selection and rapid propagation of T. ‘zhongshansa’ (Ying and Yu 2005). Several hybrid clones, such as ‘zhongshansa 302’, ‘zhongshansa 118’, and ‘zhongshansa 405’, have been selected and widely planted in eastern China (Yu et al. 2011). Nevertheless, little is known about the molecular genetics of these trees, primarily because of a lack of reliable molecular markers in Taxodium. This greatly hinders both the exploration of the mechanisms underlying economically important traits and molecular breeding. Hence, there is an urgent need to develop reliable molecular markers for Taxodium.

DNA-based molecular markers provide powerful tools for constructing genetic linkage maps, assessing the genetic diversity of germplasm, quantifying population genetic structure, relatedness, and evolution, determining quantitative trait loci (QTLs), and performing comparative genomics (Siongng et al. 2009; Koelling et al. 2012; Varshney et al. 2005; Cavagnaro et al. 2011). Microsatellite markers, or simple sequence repeats (SSRs), are tandem repeats of short units of 1–6 nucleotides that occur in both coding and noncoding regions of eukaryotic genomes (Tautz and Renz 1984). Because of their high level of polymorphism, transferability, reproducibility, codominant inheritance, and ease of detection, SSRs are a preferred choice of molecular marker in genetics research (Powell et al. 1996; Gupta et al. 2003).

Currently, there are two types of SSRs based on their origin: genomic SSRs (gSSRs) and expressed sequence tag-derived SSRs (EST-SSRs) (Yang et al. 2013). Traditional methods of developing SSRs from genomic DNA are costly and time-consuming, involving the construction of genomic DNA libraries, probe hybridization, cloning, and sequencing. EST-SSRs, also called gene-based SSRs or genic markers, can be developed efficiently and cost-effectively with the advent of next-generation sequencing (NGS) systems (Zhai et al. 2014). With the Illumina system, the short read-based de novo assembly of contigs does not require a reference sequence, and a large number of SSR markers can be developed using this method (Iorizzo et al. 2011). Compared with SSR markers derived from large-scale cloning and sequencing of DNA or from insufficient public EST libraries, the development of EST-SSRs from transcriptome sequences provides adequate resources for mining large numbers of SSR markers quickly in non-model organisms (Barbará et al. 2007; Mardis 2008).

EST-SSR markers have many advantages in molecular genetics research. (1) They can detect variation in the expressed portion of the genome, allowing gene tagging to bridge between significant genetic traits and genes; (2) they can be developed at low cost and high speed from EST databases; (3) once developed, these markers, unlike genomic SSRs, may be used across a number of related species (Gupta et al. 2003); and (4) they are codominant and reproducible (Saha et al. 2004). Therefore, EST-SSR markers have been developed in many important plant, animal, and microbial species, such as wheat (Nicot et al. 2004), shrimp (Franklin et al. 2004), and Auricularia polytricha (Zhou et al. 2014).

The aim of this study was to develop EST-SSR markers as genomic tools for T. ‘zhongshansa’. The transcriptome data of ‘zhongshansa 406’ were obtained by Illumina sequencing (Qi et al. 2014). MIcroSAtellite (MISA) was used to detect SSR loci in the ESTs (Thiel et al. 2003), and Primer3.0 was used to design primers to amplify EST-SSR molecular markers (Cardle et al. 2000). Moreover, the characteristics of the EST-SSRs were analyzed, and the polymorphism and cross-species transferability of a subset of EST-SSR primers were estimated. To our knowledge, this study is the first to develop SSR markers in the genus Taxodium.

Materials and Methods

Plant Materials

Altogether, 12 genotypes were used as plant materials: five genotypes of Taxodium mucronatum Tenore (TM01, TM02, TM03, TM04, and TM05), four genotypes of Taxodium distichum (L.) Rich (TD01, TD02, TD03, and TD04), two genotypes of T. ascendens Brongn (TB01 and TB02), and one hybrid clone, T. ‘zhongshansa 405’ (TA 405), which is an interspecific hybrid clone generated from the cross T. mucronatum (♀) × T. distichum (♂). The plant samples were collected from Nanjing Botanical Garden in April 2014. Young leaves collected from each tree were transported to the laboratory in an ice box and stored at −80 °C prior to DNA extraction.

Identification of SSRs in ESTs and Primer Design

MIcroSAtellite (MISA; http://pgrc.ipk-gatersleben.de/misa/misa.html) was used to identify SSR loci in the transcriptome data. The criteria were set to detect mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs with a minimum of ten, six, five, five, five, and five repeats, respectively. When the distance between two microsatellites was less than 100 bp, they were considered a compound microsatellite. The output data included unigene IDs, repeat motifs, number of repeats, and start and end positions. Primers were designed using Primer 3.0 software (available online at http://www.genome.wi.mit.edu/genome_software/other/primer3.html ). A 50-bp region flanking each end of the repeat was excluded from primer site consideration. All primers were designed to have similar annealing temperatures to allow for uniform polymerase chain reaction (PCR) cycling conditions. The primer size range was 17–25 bp, the PCR product size range was 100–300 bp, and the GC content range was 40–60 %, with 50 % as the optimum.
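The detection criteria above can be illustrated with a short sketch. This is not the MISA program itself; it is a minimal regular-expression approximation that assumes perfect repeats only, uses the minimum repeat numbers and the 100-bp compound threshold stated above, and introduces hypothetical names (find_ssrs, group_compound) and a made-up test sequence for illustration.

```python
import re

# Minimum repeat numbers per motif length (mono- to hexanucleotide), as stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
MAX_GAP = 100  # bp; SSRs separated by less than this are grouped as one compound SSR


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    size = len(motif)
    return not any(
        motif == motif[:k] * (size // k) for k in range(1, size) if size % k == 0
    )


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples, 0-based half-open coordinates."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            if is_primitive(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1), (m.end() - m.start()) // size))
                pos = m.end()
            else:
                # Skip non-primitive motifs but keep scanning from the next base.
                pos = m.start() + 1
    return sorted(hits)


def group_compound(hits, max_gap=MAX_GAP):
    """Group neighbouring SSRs whose separation is below max_gap."""
    groups = []
    for hit in hits:
        if groups and hit[0] - groups[-1][-1][1] < max_gap:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


# Hypothetical sequence: an (A)12 run followed 5 bp later by an (AG)7 run.
example = "CCGT" + "A" * 12 + "CGTCA" + "AG" * 7 + "TTC"
ssrs = find_ssrs(example)
compound = group_compound(ssrs)
```

In this example the two repeats lie 5 bp apart, so they fall into a single group and would be counted as one compound SSR under the stated criterion.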

DNA Extraction and PCR Amplification

DNA was isolated from the fresh leaves following the CTAB method (Doyle and Doyle 1987). PCR reactions were carried out in a 10 μL reaction mix consisting of 3.75 mmol L−1 MgCl2, 0.4 mmol L−1 dNTPs, 0.75 μmol L−1 of each primer, 0.5 U of Taq DNA polymerase, and 20 ng of genomic DNA. The PCR amplification protocol consisted of an initial denaturation at 94 °C for 3 min, followed by 30 cycles of 30 s at 94 °C, 45 s at the annealing temperature of 59 °C, and 45 s at 72 °C, ending with a final extension at 72 °C for 7 min. Amplification products were separated on 12 % polyacrylamide gels and stained with ethidium bromide for SSR band detection.

Polymorphism and Transferability Detection of EST-SSRs

To verify the effectiveness of this set of SSR markers, 503 primer pairs were randomly selected and used to amplify 12 genomic DNA templates from three Taxodium species. A primer pair was considered polymorphic when its locus had at least two alleles at any frequency. The SSR marker diversity indices (Na, observed number of alleles; Allele freq, allele frequency) were estimated with POPGENE version 1.31 (Yeh et al. 1999). Polymorphism information content (PIC) was calculated using the formula developed by Anderson et al. (1993). Additionally, the selected EST-SSR markers were assessed across three closely related species of Taxodium. The amplification ratios and polymorphism parameters (Pa, the number of polymorphic loci; Na, observed number of alleles; and Ho, observed heterozygosity of polymorphic SSRs) across the three species were summarized to analyze transferability and polymorphic variation.
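The diversity statistics named above can be sketched as follows. This is a minimal illustration, assuming diploid codominant genotypes scored as fragment sizes and the expression PIC = 1 − Σ p_i², which is the form commonly attributed to Anderson et al. (1993). The function names and the example genotypes are hypothetical and are not taken from POPGENE.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: list of (allele_a, allele_b) pairs, one pair per diploid individual."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(freqs):
    """Polymorphism information content as 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def observed_heterozygosity(genotypes):
    """Share of individuals carrying two different alleles at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


# Hypothetical locus: 11 homozygotes (150/150 bp) and one heterozygote (150/156 bp).
locus = [(150, 150)] * 11 + [(150, 156)]
freqs = allele_frequencies(locus)
locus_pic = pic(freqs.values())
locus_ho = observed_heterozygosity(locus)
print(len(freqs), round(locus_pic, 3), round(locus_ho, 3))
```

With 12 diploid individuals, a single heterozygote gives the rare allele a frequency of 1/24 (about 0.042) and a PIC of about 0.08 under this formula.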

Functional Annotation of EST-SSRs

An overview of the functional categories of the 503 EST-SSRs was produced by a Gene Ontology (GO) analysis using Blast2GO v2.5 software. The GO analysis assigned the sequences into three categories, ‘molecular function (MF)’, ‘biological process (BP)’, and ‘cellular component (CC)’. The terms in each category were further analyzed.

Results

Frequency and Distribution of EST-SSRs in Taxodium ‘zhongshansa’

A total of 108,692 EST sequences with a combined size of 69.309123 Mb were examined in T. ‘zhongshansa’. MISA identified 10,038 SSR loci in 8137 (7.49 %) SSR-containing sequences in the EST database. The frequency of occurrence of SSR loci was one per 6.90 kb of EST sequence in T. ‘zhongshansa’. Of the 8137 SSR-containing ESTs, 6761 (83.09 %) contained one SSR locus and 1376 (16.91 %) contained two or more loci. In addition, 943 SSR loci (9.39 % of the 10,038 SSRs) were in compound formation according to the predefined criteria (Table 1).
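The summary figures in this paragraph follow directly from the reported counts. The short calculation below reproduces them; the variable names are illustrative, and the only inputs are the numbers stated in the text.

```python
# Counts reported in the text.
n_ests = 108_692          # EST sequences examined
total_kb = 69_309.123     # total EST length (69.309123 Mb) in kb
n_ssr_loci = 10_038       # SSR loci identified
n_ssr_ests = 8_137        # ESTs containing at least one SSR
n_single = 6_761          # ESTs with exactly one SSR locus
n_multi = 1_376           # ESTs with two or more SSR loci
n_compound = 943          # SSR loci in compound formation

pct_ssr_ests = 100 * n_ssr_ests / n_ests        # share of ESTs carrying SSRs
kb_per_ssr = total_kb / n_ssr_loci              # average spacing between SSR loci
pct_single = 100 * n_single / n_ssr_ests
pct_multi = 100 * n_multi / n_ssr_ests
pct_compound = 100 * n_compound / n_ssr_loci

print(f"{pct_ssr_ests:.2f} % of ESTs contain SSRs; one SSR per {kb_per_ssr:.2f} kb")
print(f"{pct_single:.2f} % single-locus, {pct_multi:.2f} % multi-locus, "
      f"{pct_compound:.2f} % compound")
```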

Table 1 Summary of EST-SSRs analysis in T. ‘zhongshansa’ transcriptome

As shown in Table 2, the most frequent repeat motifs were mononucleotide repeats (6581, 65.56 %), followed by trinucleotide (2246, 22.37 %) and dinucleotide (1080, 10.76 %) repeats. Tetranucleotide (98, 0.98 %), pentanucleotide (17, 0.17 %), and hexanucleotide (16, 0.16 %) repeats occurred less frequently. Table 2 also shows the distribution of repeat numbers. About 30.40 % (3052) of the EST-SSRs contained 10 repeat units, followed by those with 5 (15.03 %, 1509), 11 (14.72 %, 1478), and 6 (10.36 %, 1040) repeat units. The least represented repeat number was 9 (0.88 %, 88). Moreover, there were 1324 potential EST-SSRs containing more than 13 repeat units of mononucleotide repeats.

Table 2 Distribution of EST-SSRs based on the number of repeat units and motif types

EST-SSR Marker Development in Taxodium ‘zhongshansa’

In total, 1958 EST-SSR markers were developed from the 10,038 SSR loci in T. ‘zhongshansa’ using Primer3.0 software. The SSR primers were named TAxxxx, where TA denotes Taxodium. The remaining 8080 SSR loci were excluded from primer design because of unsuitable sequence structures, such as short flanking sequences, mononucleotide repeats (which easily lead to a high mismatch ratio during DNA amplification), and compound SSRs. Detailed information on the markers is given in Additional file 1.

Polymorphism and Cross-Species Transferability Detection of EST-SSR Markers

A sample of 503 selected SSR primer pairs was tested for polymorphism across 12 genomic DNA templates using the optimized SSR-PCR system. Two hundred fifty-seven primer pairs (51.1 % of the sample) amplified the expected bands, while the remaining 246 primer pairs amplified no product or products of unexpected size. Eighty-one of the 257 primer pairs (Table 3) amplified polymorphic products (alleles of different sizes), and the other 176 (Additional file 2) amplified monomorphic products. Figure 1 shows typical polymorphic and monomorphic bands amplified by three of the 257 primer pairs. In total, 174 alleles were found at the 81 polymorphic SSR loci. The number of alleles detected per locus ranged from 1 to 4, with an average of 2.148. The allele frequency for all primers varied from 0.042 to 0.958, with an average of 0.466. The mean polymorphism information content (PIC) was 0.323, with a maximum of 0.66 and a minimum of 0.08. The SSR sizes of the 81 polymorphic loci ranged from 12 to 36 bp; however, 80.25 % (65) of them ranged from 12 to 20 bp (Table 3).

Table 3 Characteristics of 81 polymorphic EST-SSR markers for Taxodium with allele number (Na), allele frequency (Allele freq), and polymorphism information content (PIC)
Fig. 1 Typical polyacrylamide gel electrophoresis patterns of amplification results using three EST-SSR markers. Note: a and b show the polymorphic bands of primers TA 30 (two alleles) and TA 40 (four alleles); c shows the monomorphic band of primer TA 163. M represents the DNA marker (50 bp). Lanes 1–12 represent the amplification products across the 12 genotypes (lanes 1–5: TM01, TM02, TM03, TM04, and TM05, respectively; lanes 6–9: TD01, TD02, TD03, and TD04, respectively; lanes 10 and 11: TB01 and TB02, respectively; lane 12: TA405)

Meanwhile, all 81 polymorphic SSR markers developed from T. ‘zhongshansa’ amplified clear bands in all three Taxodium species, indicating a 100 % cross-species transferability ratio. The polymorphic variation of the 81 primer pairs in the three species was further evaluated using three parameters: Pa, Na, and Ho (Table 4). These parameters revealed only weak interspecies differences among the three investigated species of Taxodium.

Table 4 Variability parameters for SSRs of the three Taxodium taxa

Functional Annotation for SSR-Containing EST Sequences

The original ESTs of the 503 EST-SSRs in T. ‘zhongshansa’ were annotated with gene ontology (GO) terms. Of these, 264 (52.49 %) had significant matches in Nr (NCBI non-redundant protein sequences) and protein family (Pfam) on the basis of sequence similarity (Additional file 3). The GO analysis covered three categories, biological process (BP), molecular function (MF), and cellular component (CC), and the number and percentage of annotations in each category were BP (98, 37.12 %), MF (31, 11.74 %), and CC (135, 51.14 %). BP was further subdivided into eight groups: cellular process (60, 61.22 %), metabolic process (13, 13.27 %), localization (12, 12.24 %), signaling (5, 5.10 %), response to stimulus (3, 3.06 %), single organism process (2, 2.04 %), multicellular organismal process (2, 2.04 %), and developmental process (1, 1.02 %) (Fig. 2a). The matches detected for MF were binding (18, 58.06 %), catalytic activity (10, 32.26 %), channel regulator activity (2, 6.45 %), and enzyme regulator activity (1, 3.23 %) (Fig. 2b). The groups in CC were organelle (63, 46.67 %), cell (50, 37.04 %), membrane (15, 11.11 %), virion (6, 4.44 %), and extracellular region part (1, 0.74 %) (Fig. 2c).

Fig. 2 Two hundred sixty-four gene ontology annotations based on Blast2GO analysis (level 2). a Biological process; b molecular function; c cellular component

Discussion

Frequency, Type, and Distribution of SSRs in Taxodium ‘zhongshansa’

An EST database is an important resource for EST-SSR marker development (Vidushi et al. 2015). Fortunately, enormous quantities of EST sequences can now be generated efficiently with next-generation sequencing (NGS) technologies (Ashrafi et al. 2012). However, only a high-quality transcriptome can yield a large number of functional SSR markers. The quality of a transcriptome obtained from the Illumina platform can be assessed by the number of assembled transcripts, the total bases of transcripts, the mean length of transcripts, the N50 statistic, and the number of long transcripts (≥1 kb) (Wu et al. 2014; Jayasena et al. 2014; O’Neil and Emrich 2013). Here, the N50 statistic is defined as the length of the shortest transcript in the set of longest transcripts whose combined length reaches at least half of the sum of the lengths of all transcripts. Transcriptomes with a larger number of assembled transcripts and more total bases are thought to be of higher quality (Zhao et al. 2011). In the transcriptome of T. ‘zhongshansa’, 143,636 transcripts with a total of 106.3 Mb were assembled, of which 31,665 were long transcripts (≥1 kb). In addition, the mean length and N50 length of all transcripts in the Taxodium transcriptome were 740 and 1324 bp, respectively (Qi et al. 2014). Compared with the transcriptomes of pepper (Ashrafi et al. 2012) and seashore paspalum (Jia et al. 2015), the transcriptome of T. ‘zhongshansa’ contained a larger number of transcripts and more total bases, indicating a higher quality of the Taxodium transcriptome.
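The N50 definition given above can be made concrete with a short sketch. The function name and the toy transcript lengths are illustrative assumptions; the logic follows the definition in the text: sort the transcripts from longest to shortest and return the length at which the running total first reaches half of the total assembly length.

```python
def n50(lengths):
    """Length of the shortest transcript in the smallest set of longest
    transcripts whose combined length is at least half the total length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one transcript")


# Toy example: total length is 54, so half is 27; 10 + 9 + 8 = 27 reaches it.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Because N50 weights long transcripts more heavily than the mean does, it can exceed the mean length substantially, as with the 1324 bp N50 and 740 bp mean reported for the Taxodium transcriptome.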

In addition to the quality of the transcriptome, the frequency and density of SSRs are also affected by the search criteria set to identify SSRs, the SSR search tools, the datasets used for database mining, and the nature of the species (Durand et al. 2010).

Compared with previous investigations, the proportion of SSR-containing ESTs in T. ‘zhongshansa’ (7.49 %, 8137) was higher than that in some other plants, such as citrus (5.83 %) (Chen et al. 2006), Jatropha curcas (6.8 %) (Yadav et al. 2011), Rosaceae (4 %) (Jung et al. 2005), grape (2.5 %) (Scott et al. 2000), wheat (1.72 %), rice (1.92 %), maize (0.88 %), and soybean (1.45 %) (Gao et al. 2003). These differences can be partly attributed to the SSR search criteria, which also change relative estimates of EST-SSR frequency, and to the characteristics of the EST databases analyzed (Yadav et al. 2011). The average SSR frequency is one per 6.9 kb of EST sequence in T. ‘zhongshansa’, which is lower than that in citrus (one per 5.2 kb) (Chen et al. 2006), J. curcas (one per 6 kb) (Yadav et al. 2011), and Rosaceae (one per 5.5 kb) (Jung et al. 2005), but higher than that in wheat (one per 17.42 kb), rice (one per 11.81 kb), maize (one per 28.32 kb), and soybean (one per 23.80 kb) (Gao et al. 2003), showing that microsatellite frequencies differ among plants.

The predominance of mononucleotide repeats in Taxodium ‘zhongshansa’ is in accordance with the results from J. curcas L. (Kumari et al. 2013) and wheat (Asadi and Monfared 2014), but in contrast to those from citrus (Chen et al. 2006), lily (Du et al. 2014), pineapple (Ong et al. 2012), and Arabidopsis (Cardle et al. 2000), in which trinucleotides were the most numerous repeats. A potential explanation for the high proportion of mononucleotide SSR loci in this study is that mononucleotide SSRs may expand rapidly after interspecific hybridization (Gao et al. 2011). SSRs with 10, 5, 11, and 6 repeat units accounted for 69.73 % of the total SSRs; thus, a rough trend could be discerned in which the size of the SSRs mainly ranged from 10 to 18 bp (Table 2). Cardle et al. (2000) demonstrated that short SSRs of less than 20 bp are highly polymorphic. The SSR motifs were also compared with those from previous studies on pines (Chagné et al. 2004; Echt et al. 2011) and spruce (Rungis et al. 2004). The (A)n, (T)n, and (AT)n motifs were also the most abundant in these coniferous species. Lagercrantz et al. (1993) reported that the A/T richness of microsatellites is high in the genomic sequences of plants, and AT repeats were preferentially found at the 3′ end of EST sequences (Rungis et al. 2004).

EST-SSR Polymorphisms and Cross-Species Transferability

This is the first report on the development and characterization of a set of functional SSR markers in Taxodium. The rate of primer amplification was 51.1 %, which is slightly lower than that in grape (Scott et al. 2000) and J. curcas L. (Yadav et al. 2011). The ratio of polymorphic primers (16.1 %) is also as low as that reported for some typical coniferous trees (Postolache et al. 2014). Coniferous trees have huge genomes with a very high content of repetitive DNA, such as transposable elements, which may lead to the low polymorphism rate (Kovach et al. 2010; Wagner et al. 2012). The PIC value, which reflects allelic diversity and allele frequencies among the tested individuals, can be used to evaluate the informativeness of each EST-SSR (Botstein et al. 1980). Three levels of PIC (PIC > 0.5, 0.5 > PIC > 0.25, and PIC < 0.25) are generally used to judge polymorphism. Among the 81 polymorphic primer pairs of T. ‘zhongshansa’, only eight (9.88 %) had a high PIC value (>0.5), 43 (53.09 %) had a moderate PIC value ranging from 0.25 to 0.5, and the remaining 30 (37.04 %) had a low PIC value (<0.25). The relatively lower polymorphism of EST-SSRs compared with genomic SSRs has also been reported in studies of other plants (Cardle et al. 2000; Chagné et al. 2004).
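The three-level PIC classification can be sketched as a small helper. The function name is hypothetical, and because the thresholds in the text leave the boundary values of exactly 0.25 and 0.5 undefined, this sketch assumes that both fall in the moderate class. The class counts are the ones reported above.

```python
def pic_class(pic):
    """Classify a PIC value as 'high', 'moderate', or 'low'.

    Assumption: values of exactly 0.25 and 0.5 are treated as moderate,
    since the text does not say where the boundaries belong.
    """
    if pic > 0.5:
        return "high"
    if pic >= 0.25:
        return "moderate"
    return "low"


# Maximum, mean, and minimum PIC values reported in the Results.
reported = {"max": 0.66, "mean": 0.323, "min": 0.08}
classes = {name: pic_class(value) for name, value in reported.items()}

# Reported class counts among the 81 polymorphic primer pairs.
counts = {"high": 8, "moderate": 43, "low": 30}
n_polymorphic = sum(counts.values())
shares = {level: round(100 * n / n_polymorphic, 2) for level, n in counts.items()}

# Share of polymorphic primer pairs among the 503 pairs tested.
polymorphic_ratio = round(100 * n_polymorphic / 503, 1)
print(classes, shares, polymorphic_ratio)
```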

The transferability of DNA markers depends on genomic similarity and can therefore reflect the genetic relationship between species (Zhang et al. 2014). Usually, EST-SSR markers have higher transferability rates between species with closer genetic relationships (Durand et al. 2010). Nevertheless, the actual transferability is also affected by other factors, such as nucleotide deletions, insertions, and substitutions at the SSR loci (Decroocq et al. 2003). This investigation revealed that the EST-SSR markers developed from T. ‘zhongshansa’ were highly transferable across three closely related species. In particular, the amplification rate was 100 % for the three species belonging to Taxodium.

The polymorphic variation did not differ markedly between the three Taxodium species (Table 4), which might be attributed to their close relatedness and the extensive gene flow between them. The close relatedness is shown by the similar morphological features (Cheng 1983), the sympatric natural distributions (Florin 1963), and the high crossing compatibility (Ying and Yu 2005; Yu et al. 2011) of the three species. Indeed, Denny and Arnold (2007) treated these three species of Taxodium as one species with three botanical varieties. In addition to high crossability, the open-pollinated mating system and wind-dispersed pollen make extensive gene flow between the three Taxodium species possible. Long-term and frequent interspecies gene flow could greatly dilute the genetic differentiation between species. Thus, although Taxodium maintained high genetic diversity, little difference was found between the three species.

The Functional Annotation for SSR-Containing EST Sequences and its Potential Application

Gene ontology, the functional annotation schema for gene and protein sequences, has become the de facto standard in nearly all public databases (Götz et al. 2008). In this paper, the specific functions of important genes were described in more detail using GO terms. The SSR-containing sequences were mainly related to intracellular part (21), membrane (18), intracellular membrane-bounded organelle (13), intracellular organelle lumen (12), nuclear part (11), metabolism of macromolecular compounds (10), and immunity (7). Further analyses of these genes should provide information about cell metabolism and may help to understand the resistance of Taxodium to waterlogging stress (Palta et al. 2012). Notably, the 81 polymorphic SSRs were mainly associated with cellular macromolecule metabolic process, cellular response to stress, and hydrolase activity acting on glycosyl bonds, which may help T. ‘zhongshansa’ produce and maintain ATP to adapt to anaerobic environments (Qi et al. 2014). Thus, these SSR loci might be used to target genes when exploring the relationship between gene function and the resistance phenotype in Taxodium.

Conclusions

This paper reports the characteristics of SSRs in T. ‘zhongshansa’ EST sequences, including the frequency and distribution of SSR repeat lengths and motifs. We also identified and characterized a set of EST-SSRs and evaluated their polymorphism and cross-species transferability. This set of EST-SSR markers may provide a useful tool for subsequent studies in T. ‘zhongshansa’, such as the construction of genetic maps, QTL mapping, and marker-assisted selection. Furthermore, the functional categorization of the SSR-containing sequences of T. ‘zhongshansa’ revealed that these EST sequences were transcribed from many functional genes involved in cellular and molecular processes. Hence, this set of EST-SSR markers may also be used to explore the relationship between gene function and phenotype.