Introduction

Lilium spp., perennial ornamental plants belonging to the Liliaceae family [1], are widely distributed in the cold and temperate regions of the Northern Hemisphere [2]. Lilium is the second most popular cut flower in the world, and more than 10,000 cultivars have been registered by breeders worldwide, which reflects its popularity. Lilium is taxonomically classified into seven sections: Martagon, Pseudolirium (2a, 2b, 2c, 2d), Archelirion, Liriotypus, Sinomartagon (5a, 5b, 5c), Leucolirion (6a, 6b), and Daurolirion [3]. Lilium cultivars, in contrast, are mainly divided into four groups, Asiatic (A), Oriental (O), Longiflorum (L), and Trumpet (T) [4], which have been crossed with each other to establish different hybrid groups (LA, OT, LO, OA, etc.). China is the center of distribution of Lilium, with approximately 55 native species [5, 6] distributed across 29 provinces [1]. In addition to their ornamental value, the bulbs of some Lilium species are rich in nutrients such as carbohydrates, proteins, and polysaccharides and are low in fat, so they can be used as food [7]. Modern studies have also found that lily bulbs contain steroidal saponins [8], alkaloids, and flavonoids [9]. The bulbs of three species, L. lancifolium, L. pumilum, and L. brownii var. viridulum, are recorded in the Chinese Pharmacopoeia for treating bronchitis, pneumonia, and cough [7].

Molecular marker-assisted breeding is considered a promising way to shorten the breeding cycle of plants, but it requires a large amount of molecular marker information. The development of molecular markers can therefore help to characterize and identify lines for use in efficient Lilium breeding programs. In addition, although the two taxonomic classification systems have been widely accepted [3, 4], some nodes in the phylogeny of Lilium remain uncertain, possibly because of a lack of appropriate markers. With the development of molecular biology techniques, SSR (simple sequence repeat) markers have been widely used in plant genetic research [10,11,12] and in verifying hybrid authenticity because of their stability, high polymorphism, codominance, and ease of use [13, 14]. Expressed sequence tag-simple sequence repeat (EST-SSR) markers, which are more useful for studies of plant functional traits than other genomic SSR types [15], can be developed from transcriptome data. One hundred and twelve EST-SSR markers were developed in Oriental L. ‘Sorbonne’, of which 70 were selected to assess genetic diversity in Lilium [16]. Twenty-six EST-SSRs were developed in five Lilium species, of which 19 showed higher polymorphism among Asiatic and Oriental cultivars [17]. Yuan et al. [18] developed 172 EST-SSRs in L. regale. In addition, 113 SNP markers and 292 common EST-SSRs were developed in four Lilium cultivars and tulips [19]. However, because of the complex genetic background of Lilium species, more molecular markers still need to be developed.

Double-flowered lily cultivars are favored by many people because of their showy appearance and their lack of pollen, which prevents pollen contamination. ‘Elodie’ is a distinctive Lilium cultivar whose flowers show different petaloid organ types, depending on the degree of petaloidy of whorl 3, even when plants are grown under the same conditions [20]. This unstable flowering phenotype may lower its commercial value and ornamental quality, but it also makes ‘Elodie’ a good cultivar for exploring functional markers or key genes related to the degree of stamen petaloidy. At present, 4099 Lilium EST-SSRs can be found in the NCBI databases; however, given the enormous genome size of Lilium, the available molecular markers, genomic resources, and specific functional genes remain insufficient. We therefore performed RNA-seq analysis using all three stamen types (petaloid, partially petaloid, and normal) as samples [20]. Developing SSR markers from the transcriptome database, analyzing their distribution and composition, and evaluating their polymorphism will contribute to the analysis of genetic diversity in Lilium. In this study, we developed novel polymorphic EST-SSR markers from an Illumina transcriptome sequencing data set of L. ‘Elodie’ and used them to assess the genetic diversity of 65 germplasm resources. The results provide a theoretical basis for Lilium germplasm identification and genetic relationship analysis.

Materials and methods

Transcriptome sequencing

The petaloid stamens, partially petaloid stamens, and normal stamens of L. ‘Elodie’ were all sampled for transcriptome sequencing. RNA was extracted using Trizol reagent (Invitrogen, CA, USA), and an RNA-seq data set containing 1,078,058,364 raw reads was generated by Hangzhou Lianchuan Biotechnology Co., Ltd. on an Illumina HiSeq 4000 platform. Reads from the three separate libraries were pooled for assembly, and the assembly was normalized to obtain unigenes. The transcriptome data have been submitted to the NCBI Sequence Read Archive under accession number PRJNA680629 [20] and were used here as background data.

Materials

A total of 65 germplasm resources were selected for genetic diversity analysis, including 22 Asiatic (A) varieties, 18 Oriental (O) varieties, seven LA varieties, nine OT varieties, one LL variety, one LO variety, six native species, and one variety of unknown background (Table 1, Supplementary file 1). Six cultivars (‘Matrix’, ‘Roselily Elena’, ‘Heartstrings’, ‘Zambesi’, ‘Fusion’, and ‘S-1’) were selected for primer screening.

Table 1 The germplasm collections of Lilium used in this study

Total genomic DNA was extracted from leaves with a plant genomic DNA extraction kit (Tolo Biotech., Shanghai) according to the manufacturer’s protocol. DNA quality was checked on 0.8% agarose gels, and DNA concentration was measured using a NanoDrop Lite spectrophotometer (Thermo Scientific, Massachusetts, USA).

SSR locus identification and primer design

SSRs were screened in the assembled unigenes using MISA (MIcroSAtellite identification tool, http://pgrc.ipk-gatersleben.de/misa/misa.html) [21], searching for di-, tri-, tetra-, penta-, and hexanucleotide repeats. The microsatellite characteristics were analyzed statistically in Excel 2007. Primers were designed in batch for the identified SSR sequences using Primer 3.0 [22] to ensure high amplification efficiency. The primer selection criteria were as follows: the repeat unit was two to five bases with only one repeat type; the expected product length was greater than 150 bp and less than 300 bp; the primer length was 20–23 bp; the Tm was 60 °C; and no single base was repeated more than four times in a row. In addition, the 5′ and 3′ ends of the primers did not contain two consecutive A/T bases, and the primers contained no repeated sequences.
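The MISA search described above scans each unigene for perfect tandem repeats of a given unit length. As a rough illustration only, the sketch below reproduces that idea with Python regular expressions for di- to hexanucleotide units. The minimum repeat numbers in `MIN_REPEATS` are an assumption (thresholds commonly used with MISA); the text does not state the settings used in this study, and compound SSRs are not handled. `SSR`, `is_primitive`, and `find_perfect_ssrs` are hypothetical names, not part of MISA.

```python
import re
from dataclasses import dataclass

# Assumed minimum numbers of repeats per unit length; not stated in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


@dataclass
class SSR:
    motif: str
    repeats: int
    start: int  # 0-based start position in the unigene
    end: int    # 0-based exclusive end position


def is_primitive(motif: str) -> bool:
    """False if the motif is a repeat of a shorter unit (e.g. 'AGAG' or 'AA')."""
    n = len(motif)
    return all(motif != motif[:k] * (n // k) for k in range(1, n) if n % k == 0)


def find_perfect_ssrs(seq: str, min_repeats=MIN_REPEATS) -> list[SSR]:
    """Report perfect SSRs with 2-6 nt units, scanning each unit length in turn."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in sorted(min_repeats.items()):
        # One unit captured in group 1, followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip non-primitive units, e.g. 'AGAG' found by the tetranucleotide scan.
            if is_primitive(motif):
                hits.append(SSR(motif, len(m.group(0)) // unit, m.start(), m.end()))
    return hits


unigene = "TTT" + "AG" * 8 + "CCC" + "CCG" * 6 + "T"
for ssr in find_perfect_ssrs(unigene):
    print(ssr)
# SSR(motif='AG', repeats=8, start=3, end=19)
# SSR(motif='CCG', repeats=6, start=22, end=40)
```

A repeat picked up by several unit lengths is kept only under its shortest (primitive) unit, which is how SSRs are normally classified.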

PCR reaction system and electrophoresis detection

One hundred primer pairs (Supplementary file 2) were then selected for initial screening to assess polymorphism and determine the optimal annealing temperature of each pair. PCR was performed on an ABI 2720 Thermal Cycler (Applied Biosystems, Foster City, California, USA) following the standard protocol of the 2 × Master mix (Tsingke Biotechnology Co., Ltd.) in a final volume of 20 µL containing 1 µL of genomic DNA, 0.15 µL of forward primer (10 µM), 1.2 µL of reverse primer (10 µM), 10 µL of 2 × Tsingke Master mix, 6.45 µL of deionized water, and 1.2 µL of Premix Taq™ Hot Start Version (TaKaRa, China). The PCR program was as follows: initial denaturation at 95 °C for 5 min; 35 cycles of 94 °C for 30 s, annealing at 54–60 °C (depending on the primer pair) for 30 s, and 72 °C for 30 s; a final extension at 72 °C for 5 min; and a hold at 4 °C. The PCR products were separated by 3% agarose gel electrophoresis. Primer pairs that amplified strong bands of the expected size were selected for a second round of polymorphism screening using fluorescently labeled PCR, with the same reaction system and program as in the first round. PCR products with distinct bands were then run on an ABI 3730 Genetic Analyzer, and the data were collected using GeneMapper 4.1 software (Applied Biosystems, Foster City, CA, USA).
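The per-reaction volumes listed above add up to the stated 20 µL. When the same mix is prepared for many accessions, the volumes are usually scaled up with a small excess. The sketch below does that arithmetic; the 10% overage and the choice to pipette genomic DNA into each tube separately are assumptions, not details from the protocol, and `scale_master_mix` is a hypothetical helper.

```python
# Per-reaction volumes (µL) of the 20 µL SSR PCR described in the text.
PER_REACTION_UL = {
    "genomic DNA": 1.0,
    "forward primer (10 µM)": 0.15,
    "reverse primer (10 µM)": 1.2,
    "2x Tsingke Master mix": 10.0,
    "deionized water": 6.45,
    "Premix Taq Hot Start Version": 1.2,
}


def scale_master_mix(n_reactions: int, overage: float = 0.10,
                     per_tube=("genomic DNA",)) -> dict[str, float]:
    """Volumes (µL) for a shared mix covering n_reactions plus an assumed overage.

    Components listed in `per_tube` (here the template DNA) are left out
    because they are assumed to be added to each tube individually.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in PER_REACTION_UL.items() if name not in per_tube}


total = sum(PER_REACTION_UL.values())
print(f"volume per reaction: {total:.2f} µL")  # volume per reaction: 20.00 µL
for name, vol in scale_master_mix(65).items():  # one reaction per accession
    print(f"{name}: {vol} µL")
```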

EST-SSR analysis, cluster analysis, and population structure analysis

The screened primers with good polymorphism and high specificity, together with 10 previously reported primers [18, 23] (Table 2), were used for diversity analysis of the 65 varieties. Fragment analysis was conducted in GeneMapper 4.1, and the sizes of the SSR alleles were converted with Convert 1.31 [24]. POPGENE version 1.32 [25] was used to calculate the observed number of alleles (Na), effective number of alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), and polymorphism information content (PIC). A UPGMA (unweighted pair-group method with arithmetic means) dendrogram was generated with the population genetics software Populations 1.2.30 [26], and iTOL v6.5.2 (http://itol.embl.de/) [27] was used to edit the tree figure.
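UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster as the size-weighted average of the original distances. The text does not say which genetic distance was computed in Populations 1.2.30, so the sketch below assumes the proportion-of-shared-alleles distance (Dps) for diploid SSR genotypes. Both `dps_distance` and `upgma` are hypothetical illustrations, not the Populations implementation, and the genotypes in the usage example are invented.

```python
from collections import Counter
from itertools import combinations


def dps_distance(g1, g2):
    """1 - mean proportion of shared alleles over loci (assumed distance).

    Genotypes are lists of (allele, allele) tuples, one per locus;
    loci scored as None in either individual are skipped.
    """
    shared = [sum((Counter(x) & Counter(y)).values()) / 2
              for x, y in zip(g1, g2) if x is not None and y is not None]
    return 1 - sum(shared) / len(shared)


def upgma(labels, dmat):
    """Build a UPGMA tree from a square distance matrix; return it as Newick."""
    # cluster id -> (Newick text, number of members, node height)
    nodes = {i: (lab, 1, 0.0) for i, lab in enumerate(labels)}
    dist = {frozenset(p): dmat[p[0]][p[1]] for p in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(nodes) > 1:
        # Closest pair among the current clusters.
        i, j = min(combinations(sorted(nodes), 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((i, j))] / 2
        (tree_i, n_i, h_i), (tree_j, n_j, h_j) = nodes.pop(i), nodes.pop(j)
        for k in nodes:  # size-weighted average distance to the new cluster
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        nodes[next_id] = (f"({tree_i}:{height - h_i:.4f},{tree_j}:{height - h_j:.4f})",
                          n_i + n_j, height)
        next_id += 1
    return next(iter(nodes.values()))[0] + ";"


genotypes = {  # two hypothetical SSR loci per accession
    "A1": [(150, 152), (200, 204)],
    "A2": [(150, 152), (200, 206)],
    "O1": [(160, 164), (210, 212)],
}
names = list(genotypes)
dmat = [[dps_distance(genotypes[a], genotypes[b]) for b in names] for a in names]
print(upgma(names, dmat))  # (O1:0.5000,(A1:0.1250,A2:0.1250):0.3750);
```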

Table 2 Characteristics of 16 polymorphic EST-SSR markers in Lilium

Population structure was assessed with a model-based approach using STRUCTURE 2.3.4 [28]. The run length was set to 200,000 MCMC (Markov chain Monte Carlo) iterations, using correlated allele frequencies under the admixture model. The number of subpopulations (K) was set from K = 1 to K = 10. The optimal K is normally the value at which the likelihood LnP(D) is maximal; if LnP(D) shows no obvious turning point, LnP(D) is combined with the ad hoc statistic ΔK to determine the most appropriate K [29], and the K at which ΔK is maximal is taken as the best value. The results of 10 replicate runs at the best K value were analyzed with the online software STRUCTURE HARVESTER [30].
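The ΔK statistic of Evanno et al. [29] divides the absolute second-order rate of change of LnP(D) between neighbouring K values by the standard deviation of LnP(D) across replicate runs at that K, that is, ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)]. STRUCTURE HARVESTER [30] performs this calculation in practice; the sketch below computes it from the mean LnP(D) of the replicates (an assumption about the exact averaging), and `evanno_delta_k` and the LnP(D) values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for each K that has both neighbours (K - 1 and K + 1).

    lnp_by_k maps K -> list of LnP(D) values from replicate STRUCTURE runs
    (at least two replicates per K, so that a standard deviation exists).
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            # Absolute second-order difference of the mean LnP(D).
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Hypothetical LnP(D) values for two replicate runs at K = 1..4.
runs = {1: [-1000.0, -1002.0], 2: [-800.0, -802.0],
        3: [-790.0, -794.0], 4: [-785.0, -789.0]}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(best_k)  # 2
```

ΔK is undefined at the smallest and largest K tested, which is why a peak can only be detected between them.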

Results

SSR identification and primer polymorphism analysis of the transcriptome data

In total, 5509 potential EST-SSR motifs were identified and used for EST-SSR mining. The distribution of the di-, tri-, tetra-, penta-, and hexanucleotide repeat motifs is shown in Supplementary file 3 and Supplementary file 4. The proportions of the different SSR types varied greatly, with dinucleotide and trinucleotide repeats predominating. Trinucleotide repeats were the most numerous (1446, 26.25% of all SSRs), followed by dinucleotide repeats (1318, 23.92%). Tetra-, penta-, and hexanucleotide repeats numbered 27, 11, and 43, respectively, together accounting for 1.47% of all SSRs. There were seven dominant dinucleotide motifs and 29 dominant trinucleotide motifs. Among the dinucleotide motifs, AG/CT was the most frequent (9.29%), followed by GA/TC (7.01%). The most frequent trinucleotide motif was CCG/CGG (3.49%) (Supplementary file 5).
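The motif classes in this paragraph pair each motif with its reverse complement (AG with CT, GA with TC, CCG with CGG), and each percentage is a count divided by the 5509 identified SSRs. The sketch below reproduces both steps with the counts reported above. This pairing keeps rotational variants such as AG/CT and GA/TC apart, as the text does; some SSR studies merge them into one class. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def motif_class(motif: str) -> str:
    """Pair a motif with its reverse complement, alphabetically: 'CT' -> 'AG/CT'."""
    first, second = sorted((motif, reverse_complement(motif)))
    return f"{first}/{second}"


def percent(count: int, total: int, ndigits: int = 2) -> float:
    """Share of all SSRs, in percent, rounded to two decimals as in the text."""
    return round(100 * count / total, ndigits)


TOTAL_SSRS = 5509
counts = {"trinucleotide": 1446, "dinucleotide": 1318, "tetra+penta+hexa": 27 + 11 + 43}
for repeat_type, n in counts.items():
    print(repeat_type, n, percent(n, TOTAL_SSRS))
# trinucleotide 1446 26.25
# dinucleotide 1318 23.92
# tetra+penta+hexa 81 1.47
print(motif_class("CT"), motif_class("TC"), motif_class("CGG"))  # AG/CT GA/TC CCG/CGG
```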

Development of EST-SSR markers and EST-SSR analysis

A total of 2057 primer pairs were designed from the regions flanking the SSRs using Primer 3. One hundred primer pairs were selected randomly (Supplementary file 2) and screened in eight varieties: ‘Matrix’ (A), ‘Roselily Elena’ (O), ‘Heartstrings’ (LA hybrid), ‘Zambesi’ (OT hybrid), ‘Fusion’ (LL hybrid), ‘Claude Shride’ (Martagon), L. regale, and ‘S-1’ (variety). Forty-four primer pairs (44%) did not produce any amplification products, whereas 56 primer pairs (56%) produced reproducible polymorphic fragments. Of these 56 pairs, 13 produced clear, polymorphic bands. Six of them (SSR45, SSR59, SSR71, SSR77, SSR84, and SSR100) showed the most stable polymorphism, whereas the other seven (SSR22, SSR23, SSR43, SSR47, SSR60, SSR63, and SSR81) failed to amplify in some varieties. These six polymorphic primer pairs were therefore selected from the 13 for fluorescently labeled PCR, together with 10 previously reported primers [18, 23].

Allele richness of SSR loci

Sixteen primer pairs detected a total of 250 alleles in the 65 individuals (Table 3). The number of alleles per locus ranged from 4 to 33, with an average of 15.625. The total number of effective alleles was 92.432, ranging from 1.4826 (SSR59) to 12.1857 (IVFLMRE107) per locus, with an average of 5.7770. Shannon's information index (I) ranged from 0.6635 (SSR59) to 2.9552 (IVFLMRE725), with an average of 1.9578. The polymorphism information content (PIC) ranged from 0.3255 (SSR59) to 0.9179 (IVFLMRE107), with an average of 0.7583. All 16 primer pairs were informative (PIC > 0.25), and overall the 16 EST-SSR markers showed high polymorphism. The observed heterozygosity (Ho) and expected heterozygosity (He) ranged from 0.2917 (SSR59) to 0.5122 (IVFLMRE381) and from 0.3289 (SSR59) to 0.9302 (IVFLMRE107), respectively, with means of 0.4111 and 0.7658.
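The statistics in this paragraph are simple functions of the allele frequencies p_i at each locus: Ne = 1/Σp_i², He = 1 − Σp_i², I = −Σp_i ln p_i, and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j² (the Botstein et al. formula), while Ho is the fraction of heterozygous individuals. The sketch below computes them for one locus, assuming diploid genotypes. Programs such as POPGENE may apply a sample-size correction to He, so their values can differ slightly. `locus_stats` and the example genotypes are hypothetical.

```python
from collections import Counter
from math import log


def locus_stats(genotypes):
    """Diversity statistics for one SSR locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    """
    alleles = [a for g in genotypes for a in g]
    freqs = [c / len(alleles) for c in Counter(alleles).values()]
    sum_p2 = sum(p * p for p in freqs)
    he = 1 - sum_p2  # expected heterozygosity, no sample-size correction
    pic = he - sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                   for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return {
        "Na": len(freqs),                                       # observed alleles
        "Ne": 1 / sum_p2,                                       # effective alleles
        "I": -sum(p * log(p) for p in freqs),                   # Shannon's index
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": he,
        "PIC": pic,
    }


stats = locus_stats([(150, 152), (150, 150), (152, 154), (150, 154)])
print({k: round(v, 4) for k, v in stats.items()})
# {'Na': 3, 'Ne': 2.6667, 'I': 1.0397, 'Ho': 0.75, 'He': 0.625, 'PIC': 0.5547}
```

Because PIC subtracts a further positive term from He, PIC is always somewhat lower than He at the same locus, as in Table 3.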

Table 3 Genetic characterization of 16 polymorphic EST-SSR markers in Lilium

Clusters based on the dendrogram

Cluster analysis divided the 65 samples into two main groups (Fig. 1). Group 1 included 18 Oriental varieties, nine OT (Oriental × Trumpet) varieties, one LL (L. pardalinum × L. longiflorum) variety, one LO (Longiflorum × Oriental) variety, and four native species. Group 2 included 22 Asiatic varieties, seven LA (Longiflorum × Asiatic) varieties, two native species (L. wenshanense and L. lancifolium), and one unknown variety. The A and LA samples fell into one group, within which some A and LA varieties were separated by an obvious genetic distance and formed two clusters. The unknown variety S-1 was placed in the LA cluster. The O and OT samples clustered into one group, but the genetic distance between O and OT was small and the two were not clearly separated. Four native species (L. sargentiae, L. henryi, L. leucanthum, and L. leucanthum var. centifolium) clustered together. L. leucanthum var. centifolium, a variety of L. leucanthum, grouped with L. leucanthum, and L. lancifolium ‘Flore Pleno’, the double-flowered form of L. lancifolium, grouped with L. lancifolium. There was considerable cross-mixing between groups and populations, indicating substantial genetic variation within populations.

Fig. 1

UPGMA cluster phenogram showing the genetic relationship among 65 Lilium accessions

In the population structure analysis of the 65 Lilium samples, the optimal K determined with the ΔK method of Evanno et al. [29] in STRUCTURE HARVESTER was 2, indicating two major genetic clusters, which are shown in different colors (Fig. 2). The STRUCTURE analysis thus divided the 65 genotypes into two subgroups, consistent with the UPGMA cluster analysis.

Fig. 2

STRUCTURE analysis of 65 Lilium germplasm resources. A Magnitude of ΔK as a function of K; B Ln probability of data, LnP(D), over 5 runs for each K value; C Model-based population structure of the Lilium germplasm resources (K = 2), in which different colors represent different genetic clusters

Discussion

A large number of high-quality clean reads (1,067,547,290) were obtained from RNA-seq of L. ‘Elodie’ and assembled into unigenes, providing a substantial basis for EST-SSR identification and characterization [20]. Simple sequence repeats have been applied in plant genetics and breeding owing to their codominant inheritance, hypervariability, multiallelic nature, reproducibility, relative abundance, and extensive genome coverage [31]. We identified 5509 potential SSRs in the 53,182 annotated unigenes (10.36%), a frequency higher than in other reports in Lilium [18, 23] but lower than in some other plants, such as Camellia (15.5%) [32], Ricinus communis (28.4%) [33], and coffee [34]. Du et al. [23] reported that Lilium had more trinucleotide than dinucleotide motifs, and the same pattern was observed in our study, with 1446 trinucleotide and 1318 dinucleotide motifs. The most frequent dinucleotide motif in L. ‘Elodie’ was AG/CT (9.29%), as also described in other Lilium species and cultivars [17, 18, 23]. In addition, the most abundant trinucleotide motif in L. ‘Elodie’ was CCG/CGG (3.49%), which has also been reported as the most abundant trinucleotide repeat in other monocot plants [21, 35,36,37].

The development of SSR markers for screening the genetic diversity of Lilium germplasm has been widely adopted [16, 18, 23]. In the present study, 44 of the 100 selected SSR primer pairs did not produce any amplification products, probably because of the large genome of Lilium and the presence of large introns within the primer-binding regions, which may prevent primer annealing [38]. A total of 56 primer pairs (56%) produced reproducible polymorphic fragments, from which six primer pairs that amplified consistently in the screening samples were selected for the diversity assessment of Lilium. Because only one organ of L. ‘Elodie’, the stamen (petaloid, partially petaloid, and normal), was sampled for transcriptome sequencing, the sequencing data may be limited in scope and may make it difficult to distinguish isoforms; therefore, ten SSR markers developed by other researchers from different cultivars and tissues [18, 23] were also used for polymorphism screening and genetic diversity analysis. In total, the 16 primer pairs detected 250 alleles, with an average of 15.625 alleles per SSR locus. The average PIC of the SSR primers was 0.7583, consistent with a previous report of 0.76 in 84 Lilium germplasm accessions [17]. However, other studies in Lilium reported comparatively lower genetic diversity, with average PIC values of 0.55 [23], 0.53 [16], and 0.493 [18], which may be related to the number of markers and germplasm resources used. The effective number of alleles per locus was also high, ranging from 1.4826 (SSR59) to 12.1857 (IVFLMRE107), indicating the capacity of these SSRs to detect relatively high polymorphism. Because EST-SSR markers developed from transcriptome data lie in sequences that are conserved across the genomes of closely related species, polymorphic EST-SSR markers are often transferable among species [31]. Our results showed that the six EST-SSRs developed in L. ‘Elodie’ (LA) were cross-amplified in other Lilium species, consistent with other reports [17]. All SSR markers used in this study had PIC values ≥ 0.3255, and primer IVFLMRE725, which detected the largest number of alleles (33), had a PIC value of 0.9151, suggesting a close association between PIC and allele richness, as also reported in Vigna unguiculata ssp. sesquipedalis [39]. In addition, the ten previously reported primers were mostly developed from different cultivars and organs, which helps to offset the limitation of developing EST-SSR markers from a single tissue in this study. These EST-SSR markers can serve as candidate markers for future genetic map construction and evolutionary analysis in Liliaceae plants.

According to the Royal Horticultural Society (RHS), the genus Lilium is classified into nine groups based on a particular characteristic or a combination of characteristics [3]. The Asiatic subgroup was composed mainly of AA lines. L. lancifolium, one of the breeding parents of currently available Asiatic hybrid cultivars, should be classified into the Asiatic group. Our result agrees with the genetic background of L. lancifolium: L. lancifolium ‘Flore Pleno’ (A), the double-flowered cultivar of L. lancifolium, grouped together with the single-flowered L. lancifolium. LA lines were hybrids of L. longiflorum and L. formolongi [40], which was supported by the clustering pattern. Because L. lancifolium was one of the original parents of LA lines [41], LA lines were closely related to AA lines, and Longiflorum × Asiatic hybrids were genetically closer to Asiatic hybrids than to Oriental hybrids. Oriental hybrids were derived from eastern species such as L. auratum, L. japonicum, L. rubellum, and L. speciosum and from their hybrids with L. henryi. In the present study, L. henryi was classified into the Oriental subgroup, in agreement with its genetic background. L. sargentiae was involved in the breeding of currently available Trumpet and Aurelian hybrid cultivars [42]. L. leucanthum var. centifolium and L. leucanthum both have trumpet flowers, and they were classified into the same group as L. sargentiae. OT lines are Oriental-Trumpet hybrids derived from the progenies of Oriental hybrids and Chinese Aurelian hybrids [43]. Lee et al. [17] also clustered OO and OT lines into one group, and this finding was supported by our clustering pattern. Population structure analysis was useful for understanding the genetic diversity of Lilium: it divided the 65 germplasm resources into two major clusters, consistent with the UPGMA analysis.

In our research, ‘Fusion’ (LL) and ‘White Triumph’ (LO) were both grouped into the Oriental subgroup, possibly because few LL and LO germplasm resources were collected. LA varieties and Asiatic hybrids, as well as OT varieties and Oriental hybrids, remained difficult to distinguish, as reported in previous studies of Lilium [16, 17]; thus, the number of EST-SSRs needs to be increased in the future. It is also necessary to develop more specific EST-SSR markers to distinguish the four sub-genomes (O, L, A, and T) of Lilium.

In general, although 4099 Lilium EST-SSRs have been reported in the NCBI databases, new EST-SSRs still need to be developed to supplement the genetic information available for Lilium. Evaluating the germplasm diversity of Lilium is also important for understanding the genetic relationships between cultivated and wild species. In this study, six new EST-SSRs were developed from transcriptome sequencing data to enrich the genetic resources of Lilium. We also included several varieties that differed from those analyzed in previous studies. The genetic relationship analysis based on these resources will therefore help clarify genetic relationships in Lilium and serve specific breeding objectives in the future.

Conclusion

In this study, EST-SSRs were identified from an Illumina transcriptome sequencing data set of L. ‘Elodie’. After primer screening, six pairs of polymorphic EST-SSR markers with the most abundant polymorphic bands were selected from 100 pairs. These newly identified SSRs, together with 10 previously reported SSRs, were validated in a genetic diversity analysis of 65 Lilium germplasm resources. Both UPGMA and STRUCTURE analyses divided the genotypes into two major clusters. This study provides valuable information for Lilium germplasm research and for molecular marker-assisted breeding in the future.