Introduction

The genus Paeonia includes three sections, namely sect. Moutan, sect. Paeonia, and sect. Onaepia (Stern 1946). Intersectional hybridisation between sect. Moutan and sect. Paeonia has produced a new group known as Itoh hybrids. Tree peony (Paeonia suffruticosa Andrews) belongs to section Moutan, which is composed of the subsection Vaginatae and the subsection Delavayanae (Stern 1946). It is a perennial woody flowering plant native to China with approximately 2,000 cultivars. China is also the origin and developmental centre of the cultivated tree peony, with a cultivation history of 1,600 years. Due to its striking ornamental and medicinal value, as well as its unique cultural symbolism, the tree peony has become one of the most important horticultural flower crops in the world and is widely cultivated in many countries in Asia, America, Europe, and Australia (Cheng 2007). In addition, recent studies have shown that the tree peony seed has a high oil yield (27–33 %) and that the seed oil contains abundant unsaturated fatty acids, such as linolenic acid (33.87 %), oleic acid (27.73 %), and linoleic acid (21.40 %) (Zhou et al. 2009). Thus, it has been proposed that the tree peony seed has great potential for edible oil production as a new crop with broad market prospects, and it is gaining increasing attention from the Chinese government (Li et al. 2012a).

Traditionally, the breeding of tree peony is based on hybridisation among species and cultivars and the phenotypic selection of novel or improved offspring, followed by vegetative propagation of released cultivars. Although traditional breeding still plays an important role in improving the quality of floral crops, it is limited by the long selection cycle and the enormous resources required to cultivate new varieties. Thus, any method that offers a shortcut to breeding would be invaluable in the improvement of woody flower crops such as the tree peony. Marker-assisted selection (MAS) has great potential to accelerate crop breeding; it enables breeders to quickly select cultivars with desirable traits, thereby saving time and resources. However, the basic requirement of MAS is the availability of molecular markers associated with important traits. In tree peony, MAS has been hindered by the lack of effective molecular markers.

Within the last two decades, many types of markers have been developed and used for crop breeding (Paux et al. 2012). Of these various markers, simple sequence repeats (SSRs) are widely used due to their co-dominant inheritance, multi-allelic nature, high reproducibility and transferability, extensive genome coverage and simple detection (Agarwal et al. 2008; Varshney et al. 2005). According to the original sequences used to identify SSR markers, SSRs can be divided into genomic SSRs and expressed sequence tag (EST) SSRs. The traditional development of genomic SSRs is both time-consuming and cost-intensive and involves the construction and sequencing of a genomic library (Eujayl et al. 2004). In contrast, EST-SSRs can be rapidly mined at a lower cost from expressed sequences, which are more conserved compared to non-coding sequences. Thus, EST-SSRs demonstrate a higher level of transferability across related species compared to genomic SSR markers (Zhou et al. 2014; Zeng et al. 2010). Moreover, due to their association with coding sequences, EST-SSRs have the potential for direct gene tagging for important horticultural traits (Bouck and Vision 2007).

To date, more than 200 SSR markers have been developed in tree peony (Gao et al. 2013; Yu et al. 2013; Gai et al. 2012; Zhang et al. 2011, 2012; Hou et al. 2011a, b; Homolka et al. 2010; Wang et al. 2009), and they have been applied effectively to study hybrid origin (Yuan et al. 2010), genetic diversity (Yuan et al. 2012), cultivar identification (Zhang et al. 2012), genetic relationships (Yu et al. 2013) and the domestication of cultivated tree peonies (Yuan et al. 2014). However, compared with crop plants such as maize, rice, and wheat, the number of SSR markers available for further molecular breeding of tree peony is inadequate. In addition, the tree peony genome is very large, approximately 16 G (Gao et al. 2013). Given their limited number, we speculate that the SSR markers developed so far are unlikely to cover the entire tree peony genome or to be evenly distributed across it. Meanwhile, even fewer SSRs have been developed for the herbaceous species of the genus Paeonia, such as P. lactiflora (Li et al. 2011; Sun et al. 2011; Gilmore et al. 2013). Thus, it remains essential to develop more effective SSR markers for MAS in tree peony breeding.

In recent years, an increasing number of EST data sets have become available for model and non-model organisms; however, relatively few ESTs are currently available for tree peony. Transcriptome sequencing using next-generation sequencing (NGS) technology provides a simple and effective method to develop a large number of unigene-based SSR markers. In tree peony, our laboratory has previously performed de novo transcriptome sequencing for P. suffruticosa ‘Luo Yang Hong’ using the Illumina sequencing platform and identified a total of 59,275 unigene ESTs (Zhou et al. 2013). In this study, a comprehensive set of EST-SSRs for tree peony was identified and characterised in terms of frequency and distribution, informativeness, and transferability to related species. Moreover, the effectiveness of these SSR markers was assessed by analysing the genetic relationships among 56 accessions of Paeonia.

Materials and methods

Plant material and DNA extraction

P. suffruticosa ‘Luo Yang Hong’ was first used to confirm the amplification specificity of the synthesised EST-SSR primers. Two other common cultivated varieties, P. ostii ‘Feng Dan’ and P. suffruticosa ‘Hong Qiao’, which were crossed to establish the mapping population in our laboratory, were used to screen for polymorphic microsatellite loci. A set of 56 accessions, including 36 tree peonies (section Moutan) (8 species and 28 cultivars), 8 herbaceous peonies (section Paeonia) (2 species and 6 cultivars), and 12 intersectional hybrids between the two types (Itoh hybrids), was used to analyse the polymorphism and transferability of 30 EST-SSR markers. The genetic relationships among these 56 accessions were also analysed using the same 30 EST-SSR markers. The tree peony species used in this study included P. jishanensis, P. qiui, P. rockii, P. decomposita, P. ostii, P. delavayi var. lutea and P. ludlowii, all of which belong to the subsection Vaginatae except P. delavayi var. lutea and P. ludlowii, which belong to the subsection Delavayanae. The herbaceous peony species included P. veitchii and P. lactiflora. The 46 cultivar accessions used (Table 2, 9–48, 51–56) represented six distinct cultivar types: Suffruticosa hybrid, Rockii hybrid, Lemoinei hybrid, Itoh hybrid, Lactiflora cultivar, and Hybrid peony. Detailed descriptions of the 56 accessions are listed in Table 2. Fresh young leaves of all samples were collected in the spring of 2012 from the Luoyang National Tree Peony Collection (Luoyang, Henan, China) and the Jiufeng Forestry Experiment Station of Beijing Forestry University (Beijing, China).

Genomic DNA was extracted from silica gel-dried leaves using a DNAsecure plant kit (Tiangen Biotech, Beijing, China). The DNA quality and quantity were confirmed using 1 % agarose gels and a Unico UV–visible spectrophotometer (Unico, USA), respectively. The working concentration of DNA was adjusted to 25 ng/μL.

EST-SSR identification, primer design, and PCR amplification

A total of 59,275 unigenes were obtained from the flower bud transcriptome sequences of tree peony ‘Luo Yang Hong’ (Zhou et al. 2013). The simple sequence repeat identification tool (SSRIT) (http://www.gramene.org/db/markers/ssrtool) (Temnykh et al. 2001) was used to search for SSRs within this unigene EST database. In this study, only SSR loci containing di-, tri-, tetra-, penta- and hexa-nucleotide motifs with minimum repeat numbers of 6, 5, 4, 4, and 3, respectively, were considered. Mononucleotide repeats and complex SSR types were excluded from the study. To predict whether each SSR was located in the coding sequence (CDS), 5′ untranslated region (5′ UTR) or 3′ untranslated region (3′ UTR), the ORF Finder software (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) was used to identify the initiation and termination codons in the EST sequences.
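The search criteria above can be illustrated with a minimal sketch. This is not SSRIT itself; it is a regular-expression approximation that assumes perfect repeats only, and the function name find_ssrs and the example sequence are hypothetical.

```python
import re

# Minimum repeat numbers stated in the text: di 6, tri 5, tetra 4, penta 4, hexa 3.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Illustrative only: motifs that are themselves repeats of a shorter unit
    (e.g. AA or ATAT) are skipped, so mononucleotide runs are excluded and
    a di-nucleotide repeat is not reported again as a tetra- or hexa-repeat.
    """
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif of length `unit`, followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            is_shorter_unit = any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            )
            if is_shorter_unit:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical unigene fragment carrying an (AG)6 repeat.
    print(find_ssrs("TTT" + "AG" * 6 + "CCC"))
```

With these thresholds the shortest reportable SSR is 12 bp, for example (AG)6.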

Primers were designed to flank the SSRs using Primer Premier 5.0 software (Premier Biosoft International, Palo Alto, CA, USA). Primers were synthesised by Shanghai Sangon Biological Engineering Technology (Shanghai, China). Polymerase chain reactions (PCRs) were performed in 10-μl reaction volumes containing 25 ng of template DNA, 5 μl of 1× Power Taq PCR MasterMix (BioTeke, Beijing, China), 0.5 μl of each primer (10 μM) and 3 μl of sterile distilled water. The PCR program comprised an initial step at 94 °C for 3 min, followed by 30 cycles of 94 °C for 30 s, the appropriate annealing temperature (see Table S1) for 30 s and 72 °C for 1 min, and a final extension at 72 °C for 10 min. The products were first examined on 2 % agarose gels. Amplified products that showed a band of the expected size were then separated by 6 % denaturing polyacrylamide gel electrophoresis and visualised using silver staining. To confirm the reproducibility of the results, PCR amplification with each primer pair was performed three times.

Putative functional annotation

To identify the putative function of the unigene sequences containing the polymorphic microsatellite loci, the sequences were subjected to a BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) search against the NCBI nonredundant protein database using the BLASTX program. Each unigene sequence was assigned a putative function based on its highest-scoring BLASTX hit. Sequences with an E-value of less than 10⁻⁶ and a score of more than 50 were designated as ‘significant similarity matches’ and were annotated.
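The annotation rule above can be sketched as a simple filter. This is a minimal illustration and does not run BLASTX; it assumes the hits have already been parsed into (query, subject, E-value, score) tuples, and the function name annotate and the example identifiers are hypothetical.

```python
def annotate(hits, max_evalue=1e-6, min_score=50.0):
    """Keep the highest-scoring significant hit for each query sequence.

    `hits` is an iterable of (query_id, subject_description, evalue, score).
    A hit is significant when evalue < max_evalue and score > min_score,
    mirroring the thresholds stated in the text.
    """
    best = {}
    for query, subject, evalue, score in hits:
        if evalue < max_evalue and score > min_score:
            if query not in best or score > best[query][1]:
                best[query] = (subject, score)
    # Drop the scores and return query -> annotation only.
    return {query: subject for query, (subject, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical parsed hits for two unigenes.
    example_hits = [
        ("unigene_A", "protein X [Vitis vinifera]", 1e-30, 210.0),
        ("unigene_A", "protein X-like [Theobroma cacao]", 1e-25, 180.0),
        ("unigene_B", "hypothetical protein", 1e-3, 40.0),
    ]
    print(annotate(example_hits))
```

Queries with no hit passing both thresholds, such as unigene_B here, are left without an annotation.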

Data analysis

The SSR marker diversity was estimated using POPGENE version 1.31 (Yeh et al. 1999) with the following parameters: the number of alleles (Na), the observed and expected heterozygosities (Ho and He), polymorphic information content (PIC), and Shannon index. The utility of the EST-SSRs was validated using cluster analyses to evaluate the genetic relationships between species and cultivars. A neighbour-joining (NJ) tree was constructed using the proportion of shared alleles coefficient in the program PowerMarker version 3.25 (Liu and Muse 2005). Eight herbaceous peony accessions were used as an outgroup in the NJ tree. The robustness of the dendrogram was subsequently tested using bootstrap analysis with 1,000 replications.
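For orientation, the per-locus statistics and the shared-allele distance named above can be sketched as follows. This is not the POPGENE or PowerMarker implementation: it assumes diploid genotypes coded as allele-size pairs, the simple (uncorrected) expected heterozygosity He = 1 − Σp², the PIC formula of Botstein et al. (1980), the natural-log Shannon index, and a distance of one minus the mean proportion of shared alleles. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Na, Ho, He, PIC and Shannon index for one locus.

    `genotypes` is a list of (allele1, allele2) tuples, one per individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    freqs = [count / total for count in counts.values()]

    he = 1.0 - sum(p * p for p in freqs)
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    shannon = -sum(p * math.log(p) for p in freqs)
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return {"Na": len(freqs), "Ho": ho, "He": he, "PIC": pic, "I": shannon}


def shared_allele_distance(ind1, ind2):
    """1 minus the mean proportion of shared alleles over all loci.

    `ind1` and `ind2` are equal-length lists of (allele1, allele2) tuples.
    """
    shared = []
    for g1, g2 in zip(ind1, ind2):
        c1, c2 = Counter(g1), Counter(g2)
        shared.append(sum(min(c1[a], c2[a]) for a in c1) / 2.0)
    return 1.0 - sum(shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical genotypes (allele sizes in bp) at a single locus.
    locus = [(100, 104), (100, 100), (104, 104), (100, 104)]
    print(locus_stats(locus))
    print(shared_allele_distance([(100, 104)], [(100, 100)]))
```

A pairwise matrix of such distances is the kind of input a neighbour-joining algorithm takes; the tree building itself is left to dedicated software.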

Results

Frequency and distribution of EST-SSRs in the tree peony transcriptome

A total of 59,275 unigenes (19,288 contigs and 39,987 singletons) with a mean length of 698 bp, representing approximately 39.5 Mb, were searched for perfect SSRs. Using the SSRIT tool, 4,373 potential EST-SSRs were identified in 3,787 (6.4 %) unigene sequences, of which 500 sequences (13.2 %) contained more than one SSR. On average, one SSR was identified for every 9.24 kb, corresponding to one SSR for every 13.6 unigenes in the tree peony transcriptome. The EST-SSRs comprised five types of repeat motifs, which were unevenly distributed. Among the motifs identified, di-nucleotide repeats were the most abundant type with a frequency of 46.26 % (2,023), followed by tri-nucleotide (27.30 %, 1,194), hexa-nucleotide (21.88 %, 957), tetra-nucleotide (3.06 %, 134) and penta-nucleotide repeats (1.49 %, 65) (Table 1).
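The proportions reported above follow directly from the counts in the text, as the following short calculation shows. The counts are taken from this section; the variable and function names are hypothetical.

```python
# Counts reported in the text for each repeat class.
SSR_COUNTS = {"di": 2023, "tri": 1194, "hexa": 957, "tetra": 134, "penta": 65}
TOTAL_UNIGENES = 59275
SSR_UNIGENES = 3787          # unigenes containing at least one SSR
MULTI_SSR_UNIGENES = 500     # unigenes containing more than one SSR


def summarise(counts, total_unigenes, ssr_unigenes, multi_ssr_unigenes):
    """Recompute the summary percentages from the raw counts."""
    total_ssrs = sum(counts.values())
    return {
        "total_ssrs": total_ssrs,
        "class_percent": {
            name: round(100 * n / total_ssrs, 2) for name, n in counts.items()
        },
        "percent_unigenes_with_ssr": round(100 * ssr_unigenes / total_unigenes, 1),
        "percent_with_multiple": round(100 * multi_ssr_unigenes / ssr_unigenes, 1),
        "unigenes_per_ssr": round(total_unigenes / total_ssrs, 1),
    }


if __name__ == "__main__":
    for key, value in summarise(
        SSR_COUNTS, TOTAL_UNIGENES, SSR_UNIGENES, MULTI_SSR_UNIGENES
    ).items():
        print(key, value)
```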

Table 1 Summary of EST-SSRs identified in tree peony transcriptome

The frequencies of the 28 major di- and tri-nucleotide repeat motif types and the repeat numbers of the SSRs are shown in Fig. 1. Among the di-nucleotide motifs, the most common were AG/CT (41.4 %) and GA/TC (39.9 %), which were also the most dominant motifs among all SSRs, followed by AT/TA (11 %), CA/TG (4.4 %), AC/GT (2.9 %), and CG/GC (0.4 %). Among the tri-nucleotide motifs, CCA/TGG (10.3 %) and GAA/TTC (10.3 %) were the most abundant (Fig. 1a, b). However, there were no obvious dominant motifs among the tetra-, penta- and hexa-nucleotide repeats. SSR lengths were mostly between 12 and 24 bp, which accounted for 99.1 % of the total SSRs, followed by 25–66 bp (0.9 %). SSR loci with a length of 18 bp were the most common. The repeat unit number of the SSRs ranged from 3 to 17; SSRs with six repeats were the most abundant, followed by SSRs with five, three and seven repeats, while motifs with more than 11 reiterations were notably rare, with a frequency of 0.2 % (Fig. 1c).

Fig. 1 Characterisation of SSRs in tree peony transcriptome. a Frequency distribution of di-nucleotide SSRs based on motif type; b frequency distribution of tri-nucleotide SSRs based on main motif type; c number of different repeat motifs

EST-SSR marker development, and polymorphic microsatellite loci screening and functional annotation

A total of 2,989 SSR-containing sequences were amenable to primer design, while the remaining 1,384 sequences did not qualify because the sequences flanking the SSRs were too short or otherwise inappropriate. Of the 2,989 primer pairs, 788 were selected for synthesis on the basis of a software design score of more than 85. Of these, 373 primer pairs (47.3 %) generated PCR products of the expected sizes. Details of these 373 EST-SSR primers are available in Table S1. In addition, 84 primer pairs amplified fragments that were larger than expected, suggesting that there may be an intron within the amplicons. In such cases, any polymorphism observed may be caused by variation within the intron, which could overshadow a putative polymorphism of the microsatellite. Thus, these 84 primer pairs were excluded from further analyses. The remaining 331 primer pairs either failed to amplify bands after being subjected to various PCR programs or generated multiple bands; thus, they were not selected for further analysis. Our analysis of the position of the 373 validated SSR markers within the transcriptome sequences showed that 219 markers (58.7 %) were present in the CDS, 147 markers (39.4 %) were present in the UTR (11.3 % in the 3′ UTR and 28.1 % in the 5′ UTR), and the remaining seven markers were present in unigenes without any hit to known proteins. In addition, 61.9 % of di-repeats, 75 % of tetra-repeats, and 100 % of penta-repeats occurred within the UTR, whereas 82.1 % of the tri-repeats and 73.5 % of the hexa-repeats occurred within the CDS.

All 373 validated primer pairs were first screened for polymorphism in the two parents of one mapping population. The amplification results showed that 149 primer pairs (39.9 %) were polymorphic, including 54 di-, 41 tri-, 5 tetra-, 1 penta-, and 48 hexa-motif loci. The proportions of polymorphic di-, tri-, tetra-, penta-, and hexa-motif loci were 38.8, 35.0, 31.3, 33.3, and 49.0 %, respectively. Comparison of the three types of EST-SSR loci showed that the proportion of polymorphic loci was highest for those located in the 3′ UTR (52.4 %), followed by those located in the 5′ UTR (41.0 %) and the CDS (37.4 %). For the 149 polymorphic loci, a total of 323 alleles were detected, and the average number of alleles per locus was 2.17 with a range of 2–4 (Table S2).

To explore the potential function of these polymorphic SSR-containing sequences, the 149 transcriptome sequences corresponding to the 149 polymorphic markers were searched against GenBank using BLASTX. The results showed that 108 (72.5 %) of the 149 sequences containing polymorphic EST-SSRs were similar to known protein sequences, while the other 41 (27.5 %) did not show a significant match in the database and thus may encode proteins that are unique to tree peony. Of the 108 annotated sequences, 50 sequences (46.3 %) matched proteins with putative functions in grape, 18 sequences (16.7 %) matched Theobroma cacao proteins, 11 sequences (10.2 %) matched Ricinus communis proteins, 9 sequences (8.3 %) matched genes in Prunus persica, and the remaining sequences (18.5 %) matched genes of other dicotyledonous plants, such as Populus trichocarpa and Solanum lycopersicum, among others. None of the sequences matched genes in monocotyledons (Table S2).

Polymorphic analysis and transferability of 30 polymorphic EST-SSR markers

In total, 30 EST-SSR markers that were polymorphic between the two parents of the mapping population were randomly selected and further used to assess polymorphism in 36 accessions of tree peonies (Table 2, 1–36). The results showed that all 30 markers (100 %) were polymorphic in the genotypes examined, revealing 254 alleles. The number of alleles varied from 3 to 12 with an average of 7.4 per locus. Furthermore, the observed heterozygosity (Ho) and expected heterozygosity (He) ranged from 0.19 to 0.81 (mean 0.52) and from 0.43 to 0.87 (mean 0.74), respectively. The PIC values for the 30 SSR markers ranged from 0.36 for PS290 to 0.85 for PS004, with an average value of 0.69, and 28 markers showed PIC values greater than 0.5. These findings indicated a high level of informativeness within these EST-SSR loci. In addition, the Shannon index, which also reflects the polymorphism of markers, ranged from 0.83 to 2.14 with a mean value of 1.56 (Table 3).

Table 2 Abbreviation and origin of 56 accessions used in this study
Table 3 Polymorphism information of 30 EST-SSR markers in 36 tree peony accessions, including the following parameters: the number of alleles (Na), observed and expected heterozygosities (Ho and He), PIC, and Shannon index

The transferability of these 30 primer pairs was also tested across nine Paeonia species (Table 2, 1–7, 49–50). All 30 primer pairs showed 100 % transferability in the seven tree peony species, namely P. delavayi var. lutea, P. ludlowii, P. jishanensis, P. qiui, P. rockii, P. decomposita, and P. ostii. Three primer pairs, PS099, PS106, and PS219, failed to amplify a fragment in P. veitchii, and two primer pairs, PS099 and PS166, failed to amplify a fragment in P. lactiflora; the remaining 26 primer pairs successfully amplified PCR products in both P. veitchii and P. lactiflora. Thus, the transferability rate across the nine species ranged from 90 to 100 %, with an average transferability ratio of 95.3 % (Table S3).

Evaluation of the genetic relationship within 56 genotypes using EST-SSR profiles

To evaluate the usefulness of the 30 newly developed EST-SSRs, a genetic relationship study was performed on the 56 accessions. An NJ tree was constructed based on the EST-SSR data (Fig. 2), in which the accessions of section Paeonia (Table 2, 49–56) were placed at one extreme of the dendrogram and were used as an outgroup. P. lactiflora and the herbaceous peony cultivars first formed an independent cluster with a 92 % bootstrap value and were subsequently grouped together with P. veitchii with a 56 % bootstrap value. The wild species P. delavayi var. lutea and P. ludlowii, which have fleshy floral disks, formed a clade separate from the other accessions examined in this study, with a bootstrap value of 81 %. The remaining 34 tree peony accessions and 12 Itoh hybrids, all of which have leathery floral disks, formed another clade.

Fig. 2 NJ tree of the 56 accessions, constructed using PowerMarker version 3.25 from the data of 30 EST-SSR markers. Bootstrap values over 50 %, based on 1,000 resamplings of the data set, are displayed above the branches. The abbreviations used in the tree are explained in Table 2

The 34 tree peony accessions and 12 Itoh hybrids were primarily divided into two clusters. In cluster I, seven Rockii hybrid cultivars first formed a separate branch with a 92 % bootstrap value, were then grouped together with P. rockii, and were finally grouped together with P. decomposita with a 50 % bootstrap value. Cluster II consisted of all of the cultivar accessions of the Suffruticosa hybrid, Lemoinei hybrid and Itoh hybrid types and the two P. ostii accessions, which can be explained by their known origins and by previous studies. Cluster II was further separated into two subclusters. In subcluster I, 13 samples fell into one large branch and formed two small groups with 84 % and 54 % bootstrap values. Six cultivars belonging to the P. suffruticosa Zhongyuan group formed one group. Five cultivars belonging to the P. suffruticosa Japan group and the two P. ostii samples comprised the other group. In subcluster II, ‘Souvenir de Maxime Cornu’ formed a single branch, the other Lemoinei hybrids except for ‘Chromatella’ formed group A, and all Itoh hybrids except for ‘Lafayette Escadrille’ formed group B. These results indicated that the grouping of the germplasm was highly consistent with its origins.

Discussion

Next-generation sequencing (NGS) technologies are becoming increasingly inexpensive and provide high-throughput data, so that enormous and rapidly growing quantities of sequencing data are now being generated and made publicly available. Transcriptome sequencing has been demonstrated to be an effective method for obtaining EST sequences, which are essential for mining a large number of functional SSR markers and for identifying novel genes. Currently, a large number of EST-SSR markers based on transcriptome sequences have been developed in diverse species, such as vetch (Chung et al. 2014), rubber tree (Li et al. 2012b), linseed (Kale et al. 2012), and pigeonpea (Dutta et al. 2011). The development of these gene-based SSR markers has directly or indirectly accelerated progress in MAS breeding (Zhai et al. 2014; Iorizzo et al. 2011).

The proportion of ESTs containing SSRs varies among plant species. A survey of dicotyledonous species revealed that this ratio ranged from 2.65 to 16.82 % (Kumpatla and Mukhopadhyay 2005). In this study, approximately 6.4 % of the transcriptome sequences possessed at least one SSR, which was similar to reports for alfalfa (6.1 %) (Wang et al. 2013), bermudagrass (6.4 %) (Tan et al. 2012), and orchardgrass (7 %) (Bushman et al. 2011). Moreover, the frequency of EST-SSRs was one SSR in every 9.24 kb, which was much higher than in alfalfa (1/12.06 kb) (Liu et al. 2013a), pineapple (1/13 kb) (Ong et al. 2011), and lotus (1/13.04 kb) (Pan et al. 2010). However, this value was lower than in levant cotton (1/2.4 kb) (Jena et al. 2012), castor bean (1/1.77 kb) (Qiu et al. 2010), and gerbera (1/5.6 kb) (Gong and Deng 2010). These results indicated that the frequency of SSRs in EST sequences is highly variable among plant species, and this difference may, to some extent, be attributed to the size of the data set, the SSR search criteria, and the mining tool used for the SSR search (Varshney et al. 2005).

To date, both di- and tri-nucleotide repeats have been found to be the predominant repeat types in many plants, whereas the dominant repeat motifs usually differ (Varshney et al. 2005). In this study, di-nucleotide repeats were the most abundant type, followed by tri-nucleotide repeats, which was consistent with previous reports on tree peony (Shu et al. 2009; Gai et al. 2012) and other plants, such as pear (Zhang et al. 2014) and tea (Sharma et al. 2009). A potential explanation for the high ratio of di-nucleotide SSR loci in this study is that the unigene sequences contained more information representing UTRs. Furthermore, we found that shorter motifs appeared more frequently than longer motifs, except for hexa-nucleotides, which were more frequent than tetra- and penta-nucleotides in the transcriptome. As shown in Fig. 1, the most dominant di-nucleotide repeat motif was AG/CT, followed by GA/TC, and the same results were found in radish (Zhai et al. 2014) and sweet potato (Wang et al. 2011). The most prevalent tri-nucleotide motifs were CCA/TGG and GAA/TTC, which was consistent with recent results obtained in tree peony (Gai et al. 2012), pineapple (Ong et al. 2011), and rubber tree (Feng et al. 2009). Furthermore, a common feature of most dicotyledonous plants is the rarity of GC/CG among di-nucleotide motifs. No GC/CG repeat motifs were observed in common bean (Chen et al. 2014) or alfalfa (Wang et al. 2013). In this study, the GC/CG repeat motif accounted for only 0.4 % of the di-nucleotide repeats, which was consistent with the results obtained in herb epimedii (Zeng et al. 2010) and coffee (Aggarwal et al. 2007).

In this study, 47.3 % of the EST-SSR primers amplified the target genomic DNA, which was lower than the rate reported by Gai et al. (2012) in tree peony (81.2 %) but higher than that reported by Wang et al. (2009) in tree peony (40.9 %). Such results indicated that the amplification efficiency of EST-SSRs was higher than that of genomic SSRs, which may be due to the relatively high conservation of the transcribed regions containing the primer sequences. Compared to other plants, the amplification success rate was comparable to that in sugarcane, where 48.5 % of the primers amplified (Cordeiro et al. 2001), higher than in alfalfa (30 %) (Wang et al. 2013) and lower than in rubber tree (59.8 %) (Feng et al. 2009). The relatively low amplification success rate in this study may be due to the presence of introns, alternative splicing, or errors in sequence assembly. To the best of our knowledge, few studies to date have developed so many EST-SSR markers in tree peony in a single study. In this study, we obtained 149 polymorphic EST-SSR markers from 373 valid EST-SSRs in the parents of our mapping population, P. ostii ‘Feng Dan’ and P. suffruticosa ‘Hong Qiao’, a polymorphic ratio of 39.9 %, thereby providing a powerful tool for the construction of an SSR-based linkage map in further studies. Compared to other plants, the polymorphic ratio of EST-SSRs in this study was at a medium level (Varshney et al. 2005). The number of samples and their geographic origins may affect the ratio of polymorphism. For example, Wang et al. (2011) found 816 primers yielding amplification products, of which 195 (23.9 %) and 342 (41.9 %) primers exhibited polymorphism between the two mapping parents and among eight cultivated sweet potatoes, respectively. Thus, if more samples were screened in our study, the number of polymorphic loci would most likely increase.

As described above, di-, tetra-, and penta-nucleotide repeats were primarily associated with the UTR, while tri- and hexa-nucleotide repeats primarily occurred within the CDS. Approximately 76.7 % of the SSRs located in the CDS were tri- and hexa-nucleotide repeats. This appears to be a common feature of EST-SSRs and has often been found in other organisms (Gupta et al. 2014; Morgante et al. 2002). It may result from selection and evolution: length variation in tri- and hexa-nucleotide SSRs preserves the reading frame within coding regions, whereas length variation in di-, tetra-, and penta-nucleotide SSRs can easily shift the reading frame and cause deleterious mutations. Our study also revealed that the proportion of polymorphic SSR markers was considerably higher in the UTR than in the CDS, which was consistent with previous reports in pigeonpea (Dutta et al. 2011) and castor bean (Qiu et al. 2010). The polymorphism detected by EST-SSRs from different EST regions (3′ UTR, 5′ UTR, and CDS) varies across taxonomic levels. In general, SSRs from the 3′ UTR are most polymorphic at the cultivar level, 5′ UTR SSRs are most polymorphic between cultivars and species, and CDS-SSRs tend to differentiate at higher taxonomic levels (Scott et al. 2000). This may be attributed to a lower tolerance for mutations in the CDS than in the UTR.

The PIC value of each EST-SSR, which reflects allelic diversity and allele frequencies among the sampled individuals, was used to assess the informativeness of the markers, which were categorised as high (PIC > 0.5), moderate (0.25 < PIC < 0.5), or low (PIC < 0.25) (Botstein et al. 1980). In this study, 28 out of 30 primers exhibited a high PIC value, which indicated good informativeness as markers. In general, EST-SSR markers reveal a lower level of polymorphism than genomic SSR markers. However, in this study, the level of polymorphism was higher than in previous reports (Yu et al. 2013; Zhang et al. 2011). Potential reasons include (1) the use of highly polymorphic EST-SSR markers for diversity assessment; and (2) the use of a diverse set of genotypes derived from intra- and inter-specific hybridisation and from wild species.

Because they are located in transcribed regions, EST-SSRs display a high degree of transferability to related species. Ellis and Burke (2007) summarised the transferability of EST-SSRs among plant taxa and reported that cross-genera transferability ranged from 10 to 90 %. Our results showed that the EST-SSRs identified in P. suffruticosa had a high transfer rate (95.3 %) in other species of Paeonia. High transferability of EST-SSRs has also been reported in other plants, such as eucalyptus (Zhou et al. 2014) and citrus (Luro et al. 2008). This high success rate indicated that different species in the genus Paeonia may be evolutionarily closely related. Thus, these novel EST-SSR markers may be used as anchor markers for future comparative mapping and evolutionary studies in the Paeoniaceae family. No correlation between transferability and the PIC values of the SSR markers was observed in this study.

In recent years, SSRs have been extensively applied in genetic diversity analysis, genetic relationship analysis, linkage mapping and association mapping studies (Wang et al. 2013; Liu et al. 2013b; Agrama et al. 2007). In this study, the validity of the developed EST-SSR markers was investigated by analysing the genetic relationships among the 56 collected Paeonia accessions. Cluster analysis in the NJ tree demonstrated that the major clusters corresponded to known pedigrees. In the NJ tree, P. rockii and the group of seven Rockii hybrid cultivars were closely related to P. decomposita, which was consistent with recent molecular marker studies (Zhao et al. 2008; Yu et al. 2013). To the best of our knowledge, the ancestors of the Japanese cultivars were from China, particularly from the Zhongyuan group. This close relationship between the Japanese and Chinese cultivars is supported by our clustering pattern. P. ostii is thought to have contributed to the formation of the Suffruticosa hybrid (Cheng 2007). Indeed, in this study it shared more genetic similarity with the cultivars of the Japan group and was placed as their sister clade. Subcluster II in the NJ tree consisted of all of the cultivar accessions of the Lemoinei hybrid and Itoh hybrid types. ‘Souvenir de Maxime Cornu’ formed a single branch, the other Lemoinei hybrids except for ‘Chromatella’ formed group A, and all Itoh hybrids except for ‘Lafayette Escadrille’ formed group B. ‘Souvenir de Maxime Cornu’, from France, is the crossbred offspring of P. delavayi and the P. suffruticosa Zhongyuan Group, whereas the other nine Lemoinei hybrids originated from similar crosses in which the pollen was collected from the P. suffruticosa Japan Group (Cheng 2007; Wister 1995). The Itoh hybrids in this study were the result of intersectional hybridisation between P. lactiflora and the Lemoinei hybrid. We also found some incongruence in a few branches. For example, the Itoh hybrid ‘Lafayette Escadrille’ and the Lemoinei hybrid ‘Chromatella’ formed a branch. The incongruence between the dendrogram and the division of cultivars might be due to the limited number of samples. The clustering results also indicated that the EST-SSR markers were an informative and effective tool for analysing genetic relationships among a wide range of diverse accessions and showed promise for the development of additional molecular markers that could reliably determine the relative genetic contributions of specific populations to existing and newly developed cultivars.

In conclusion, using transcriptome sequences to identify EST-SSR markers is a convenient and cost-effective approach in tree peony. The large set of EST-SSR markers identified in this study has substantially enlarged the molecular marker repository for tree peony. With their high polymorphism and transferability, these novel EST-SSR markers represent a powerful molecular tool for germplasm identification, genetic diversity analyses, genetic relationship studies, and comparative mapping in tree peony and related species. Moreover, these markers may also be valuable for the construction of genetic linkage maps, which are a prerequisite for establishing marker–trait relationships and applying MAS in the future improvement of this important flower crop.