Introduction

The genus Casuarina belongs to the family Casuarinaceae, a member of the angiosperms. Species in the genus, commonly known as casuarinas, beefwoods, she-oaks, Australian pines or ironwoods, are fast-growing evergreen trees native to Australia, Southeast Asia and the Pacific archipelagos (Wheeler et al. 2011). Many casuarinas are economically and ecologically important. They are valued as pioneer trees for degraded sites and for soil improvement, and no other species can replace them on foreshores. Along the southern coasts of China, Casuarina trees are commonly planted as windbreaks to stabilize moving sand and are used in agroforestry, general rehabilitation, and wood production programs (Zhong et al. 2010). However, Casuarina trees suitable for establishing coastal shelter forests are few and of low quality. Low temperature is one of the main factors limiting the expansion of cultivation, fast growth, and high yield of Casuarina species. It is therefore important to breed high-quality, cold-tolerant Casuarina genotypes.

Molecular marker-assisted breeding is considered a novel way to streamline breeding and selection through indirect selection, shortening the breeding cycle, and is especially beneficial for resistance breeding (Du et al. 2015). Microsatellites, or simple sequence repeats (SSRs), have been applied to the construction of chromosome maps and to cultivar identification (Song et al. 2004; Tsukazaki et al. 2008, 2009, 2015). Compared with other molecular markers, SSR markers are preferable because of their reproducibility, multi-allelic nature, co-dominant inheritance, relative abundance, and good genome coverage (Tóth et al. 2000). SSRs are generally categorized into two groups based on their origin: genomic SSRs from genomic sequences, and expressed sequence tag (EST)-SSRs from transcribed RNA sequences (Varshney et al. 2005). As one of the powerful next-generation sequencing (NGS) methods, Illumina RNA-seq has proven to be robust and efficient, with higher throughput and much lower cost than other systems (Garg et al. 2011; Grabherr et al. 2011; Ono et al. 2011; Yang et al. 2011; Feng et al. 2012), and has been used in many plants to reveal gene annotation and expression under biotic and abiotic stresses (Marguerat and Bähler 2010; Oshlack et al. 2010; Mochida and Shinozaki 2011; Wang and Chen 2012; Postnikova et al. 2013). In addition to the discovery of new genes and investigations of gene expression, RNA-seq offers other advantages, including the detection of thousands of single nucleotide polymorphisms (SNPs) and EST-SSR loci for marker development. A large number of SNP and SSR markers will provide the foundation for future research in population and evolutionary genetics.

Casuarina equisetifolia L. has the largest native range in coastal regions of Northern Australia, Malaysia and Southeast Asia, and Oceania (Parrotta 1993). At present, the C. equisetifolia genome has not been fully sequenced. Kullan et al. (2016) validated 42 SSR markers from 86,415 ESTs of C. equisetifolia and C. junghuhniana. However, the number of SSR markers reported for Casuarina species is still far smaller than that for other commonly planted trees such as Eucalyptus (Rabello et al. 2005), live oaks (Cavender-Bares et al. 2015), pine (Feng et al. 2014), and Kandelia candel (Islam et al. 2006, 2008). It is therefore necessary to develop more novel, cross-species transferable SSR markers to promote genetic and breeding studies in Casuarina species.

In previous work, we performed a global survey of the transcriptome profiles of C. equisetifolia on the Illumina HiSeq 2500 platform. A total of 118,270 unigenes were obtained by de novo assembly, creating an initial reference transcriptome (Li et al. 2017). In the present study, our aims were to: (1) identify a large number of EST-SSR markers from the C. equisetifolia transcriptome information obtained; (2) validate the identified EST-SSR markers for the detection of polymorphism in Casuarina and Allocasuarina species; and (3) evaluate the transferability of the validated EST-SSR markers across Casuarina and Allocasuarina.

Materials and methods

Plant materials and genomic DNA extraction

A cold-resistant C. equisetifolia individual tree (Zhou 7) was used for RNA-seq. For genetic relationship analysis, fresh juvenile needles of 26 genotypes from four Casuarina species (C. equisetifolia, C. cunninghamiana, C. glauca, and C. cristata) and three Allocasuarina species (A. huegeliana, A. littoralis and Allocasuarina sp.) were collected from the Zhoushan Germplasm Bank of the Institute of Tree Breeding, Zhejiang, China, and dried in silica gel for DNA extraction. Detailed information on the plant materials is listed in Table 1. Genomic DNA of the 26 genotypes was extracted from the collected needles using the Plant Genomic DNA Extraction kit (Bioteke, Beijing, China) following the manufacturer’s instructions. DNA quality was evaluated by electrophoresis on a 1.5% agarose gel, and DNA concentration was determined using a Thermo Scientific NanoDrop™ 2000C spectrophotometer (Nanodrop Technologies, South Logan, Utah, USA).

Table 1 Seed source of Allocasuarina and Casuarina species used in this study

Identification of EST-SSR markers

The MicroSAtellite program (MISA, http://pgrc.ipk-gatersleben.de/misa) was used to detect and locate SSRs in the unigenes (Sharma et al. 2007). The search criteria were a minimum of five repeats for simple motifs (di-, tri-, tetra-, penta-, and hexanucleotides) and a minimum of three repeats for complex (imperfect) repeats. Compound SSRs are SSR loci containing multiple different repeat motifs separated by non-repetitive sequences. In this analysis, the maximum interruption allowed between two different SSRs in a compound SSR was 100 bp. Mononucleotide repeats were excluded.
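The search criteria above can be illustrated with a minimal Python sketch. It is not MISA and does not reproduce its algorithm: it only finds perfect di- to hexanucleotide repeats with at least five units and groups neighbouring SSRs that lie within 100 bp of each other. The function names (find_ssrs, group_compound) and the example sequence are hypothetical, and imperfect repeats are not handled.

```python
import re

# Assumed thresholds, taken from the criteria stated in the text:
# at least five repeats for di- to hexanucleotide motifs; no mononucleotides.
MIN_REPEATS = {2: 5, 3: 5, 4: 5, 5: 5, 6: 5}
MAX_GAP = 100  # maximum interruption (bp) between SSRs in a compound SSR


def find_ssrs(seq):
    """Return perfect SSRs as (start, end, motif, repeats); 0-based, end exclusive."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(MIN_REPEATS.items()):
        # One motif of `size` bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "AGAG").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // size))
    return sorted(hits)


def group_compound(hits, max_gap=MAX_GAP):
    """Group consecutive SSRs separated by at most max_gap bp."""
    groups = []
    for hit in hits:
        if groups and hit[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


# Hypothetical example sequence: (AG)6 and (AAG)5 separated by 5 bp.
example = "TTGCA" + "AG" * 6 + "CCGTA" + "AAG" * 5 + "GTC"
ssrs = find_ssrs(example)
compound = [g for g in group_compound(ssrs) if len(g) > 1]
```

In this example the two SSRs are 5 bp apart, so they fall into a single group that would be reported as one compound locus under the 100 bp rule.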

Development and selection of EST-SSR primers

SSR primer pairs for each unique microsatellite locus were designed using Primer 3.0 (Rozen and Skaletsky 2000), with target microsatellites containing at least five repeats and yielding PCR products of 80–500 bp. The genomic DNA of 12 genotypes was first used for primary screening of 150 SSR primer pairs. Each 20 μL PCR reaction mixture contained 10 μL of 2 × Power Taq PCR Master Mix (Bioteke, Beijing, China), 1.5 μL each of the forward and reverse SSR primers (10 μM each), 3 μL of DNA template (20 ng/μL), and 4 μL of sterile distilled water. SSR-PCR amplification was performed in a DNA thermal cycler (Life ECO, Hangzhou Bioer) using the following parameters: 7 min at 94 °C, followed by 30 cycles of 45 s at 94 °C, 45 s at 55 °C and 2 min at 72 °C, with a final extension of 7 min at 72 °C. The PCR products were resolved by 6% polyacrylamide gel electrophoresis (PAGE) and silver staining in accordance with the protocol described by Creste et al. (2001).

EST-SSR markers validation and genetic relationship analysis

Fifteen polymorphic SSR primer pairs were used as valid SSR markers for genetic relationship analysis among the 26 genotypes from Casuarina and Allocasuarina species. The SSR-PCR bands obtained were scored as present (1) or absent (0), and two binary qualitative data matrices were constructed. A pair-wise similarity matrix was constructed using the Dice similarity coefficient (DSC). The relationships between the genotypes were displayed as a dendrogram, constructed using NTSYS-pc 2.2 r software (Rohlf 1992) based on the unweighted pair group method with arithmetic mean (UPGMA). Tree confidence was assessed by a bootstrap analysis with 100 replications using the DendroUPGMA tool available at http://genomes.urv.cat/UPGMA (Garcia-Vallve et al. 1999). For each locus, the number of alleles and allele frequencies were calculated using the program POPGEN version 1.32 (Yeh et al. 2000). The polymorphism information content (PIC) (Botstein et al. 1980) of each EST-SSR primer pair was calculated using PIC-CALC 0.6 software.
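As an illustration of the similarity step, the Dice coefficient between two binary band profiles is twice the number of shared bands divided by the total number of bands in both profiles. The following is a minimal Python sketch, not the NTSYS-pc implementation; the function names and the example band profiles are hypothetical.

```python
def dice_similarity(a, b):
    """Dice coefficient for two equal-length 0/1 band profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of band positions")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    # Convention assumed here: two profiles without any bands score 0.
    return 2 * shared / total if total else 0.0


def similarity_matrix(profiles):
    """Pairwise Dice similarity for a list of band profiles."""
    n = len(profiles)
    return [[dice_similarity(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical band profiles for three genotypes (1 = band present).
profiles = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
]
matrix = similarity_matrix(profiles)
```

Here the first two genotypes share two of their three bands each, giving a similarity of 2 × 2 / 6 ≈ 0.67. A matrix of this kind is the input to UPGMA clustering.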

Results

Distribution and frequency of the EST-SSR motifs

The EST-SSRs were mined from the C. equisetifolia transcriptome data reported in the earlier study by Li et al. (2017). From these data, a total of 10,291 SSRs were identified in the 118,270 assembled unigenes (8.7%), with 1089 unigene sequences containing more than one SSR. Among the identified SSRs, 344 (3.34%) were present in compound formation, while the others were perfect SSRs with a single repeat type. The total size of the examined sequences was 139,375,239 bp, and the distribution density was one SSR for every 13.54 kb. The total sequence length of the SSR loci was 153,984 bp, with an average of 14.96 bp per locus. The average repeat sequence lengths of di-, tri-, tetra-, penta-, and hexanucleotide SSRs were 13.95, 16.12, 19.32, 27.50, and 33.05 bp, respectively (Table 2). The number of SSR repeats ranged from 5 to 12, and the most frequent repeat number was six (3158 or 30.69%), followed by five (2154 or 20.93%), seven (1765 or 17.15%) and nine (1067 or 10.37%). Dinucleotide repeats were the most frequent type (6546 or 63.61%), followed by trinucleotide (3356 or 32.61%) and tetranucleotide (306 or 2.97%) repeats. There were far fewer pentanucleotide and hexanucleotide repeat motifs, accounting for only 0.23% (24) and 0.57% (59), respectively (Table 3).
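The summary statistics in this paragraph follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the variable names are illustrative, and the frequency is taken as SSRs per assembled unigene, which is an assumption about how the 8.7% figure was derived.

```python
# Counts reported in the text.
total_ssrs = 10_291
total_unigenes = 118_270
total_bp = 139_375_239      # total size of examined sequences
ssr_bp = 153_984            # total sequence length of SSR loci
motif_counts = {"di": 6546, "tri": 3356, "tetra": 306, "penta": 24, "hexa": 59}

# Frequency: SSRs per 100 unigenes (assumed definition of the 8.7% figure).
frequency_pct = round(100 * total_ssrs / total_unigenes, 2)

# Density: average spacing between SSRs, in kb.
density_kb = round(total_bp / total_ssrs / 1000, 2)

# Mean SSR length in bp.
mean_ssr_bp = round(ssr_bp / total_ssrs, 2)

# Share of each motif class, in percent.
motif_pct = {k: round(100 * v / total_ssrs, 2) for k, v in motif_counts.items()}
```

The motif counts sum to 10,291, and the recomputed values agree with the figures given in the text (8.7%, 13.54 kb, 14.96 bp, and the per-class percentages).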

Table 2 Distribution of the EST-SSR motifs in transcriptome of C. equisetifolia
Table 3 Distribution of the number of repeats in EST-SSR loci in transcriptome of C. equisetifolia

Distribution and frequency of EST-SSR types

A total of 82 simple repeat motifs were identified in C. equisetifolia. The most abundant motifs were AG/CT (54.24%) among dinucleotides and AAG/CTT (13.02%) among trinucleotides. Motifs with proportions below 1.0% included ACT/AGT (0.32%), CG/CG (0.33%), CCG/CGG (0.83%), and ACG/CGT (0.90%) (Table 4).

Table 4 Number and frequencies of the main repeat motif types in C. equisetifolia transcriptome

Validation and evaluation of polymorphic SSR markers

Of the 10,291 EST-SSR loci identified in C. equisetifolia, 3745 had trinucleotide, tetranucleotide, pentanucleotide, or hexanucleotide motifs, with amplicon sizes ranging between 100 and 300 bp. To validate the identified SSR markers, we designed 150 SSR primer pairs from the 3745 SSR loci for PCR screening. These primers were tested for DNA amplification in 12 genotypes from Casuarina and Allocasuarina species. After PCR, 55 SSR primer pairs generated products of the expected size, of which 15 (27.3%) produced clear, reproducible and polymorphic bands (Fig. 1). These 15 polymorphic SSR primer pairs were therefore used as valid SSR markers for further genetic diversity study.

Fig. 1 Polyacrylamide gel electrophoretic pattern of 26 Casuarina and Allocasuarina genotypes amplified with the primers of SSR marker CeUg19361. Lane M molecular marker; Lanes from 1 to 26 correspond to the 26 Casuarina and Allocasuarina genotypes listed in Table 1; the numbers on the left of the figure indicate the DNA size markers in kilobases (kb)

Genetic relationship analysis among the 26 genotypes from four Casuarina species and three Allocasuarina species listed in Table 1 was performed with the 15 SSR primer pairs. A total of 42 polymorphic bands, ranging in size from 131 to 280 bp, were detected from the 15 SSR loci. The number of bands detected by each SSR primer pair ranged from 2 to 5, with an average of 2.8. The PIC value varied from 0.2624 to 0.6177, with a mean of 0.4265 (Table 5). The PIC index describes diversity within accessions (intra-populational diversity) and evaluates the degree of polymorphism at each locus: a PIC value below 0.25 indicates low polymorphism, a value between 0.25 and 0.5 average polymorphism, and a value above 0.5 a highly polymorphic locus (Botstein et al. 1980). Of the 15 loci, 5 had PIC values exceeding 0.5 and could be considered highly polymorphic, while 10 had average polymorphism.
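For reference, the PIC of Botstein et al. (1980) for a locus with allele frequencies p_i is 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². The following minimal Python sketch computes this value and applies the thresholds stated above. It is not the PIC-CALC implementation; the function names and the example frequencies are hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content from allele frequencies summing to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def classify_pic(value):
    """Thresholds from the text: < 0.25 low, 0.25 to 0.5 average, > 0.5 high."""
    if value > 0.5:
        return "high"
    if value >= 0.25:
        return "average"
    return "low"


# Hypothetical loci: two equally frequent alleles, and four equally frequent alleles.
two_alleles = pic([0.5, 0.5])
four_alleles = pic([0.25, 0.25, 0.25, 0.25])
```

With two equally frequent alleles the PIC is 0.375 (average polymorphism), which shows why loci with only two or three bands rarely exceed the 0.5 threshold.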

Table 5 Characteristics of 15 SSR markers used in this study

Genetic relationship analysis

Based on the 42 polymorphic bands detected from the 15 SSR loci, the 26 genotypes belonging to four Casuarina species and three Allocasuarina species were clearly divided into three main clades (I, II, III) in the UPGMA dendrogram at a low Dice genetic similarity coefficient (GSC) of 0.53, a division supported by a 100% bootstrap value (Fig. 2). Clade I, supported by a 72% bootstrap value, consists of a mixture of species: A. huegeliana, Allocasuarina sp., A. littoralis and C. cunninghamiana. C. cristata formed an independent clade II, supported by a 77% bootstrap value. C. equisetifolia and C. glauca formed clade III with 100% bootstrap support, well separated from clades I and II, indicating that the two species in clade III are genetically distinct from the five species in clades I–II. Meanwhile, intraspecific variation within Casuarina and Allocasuarina species was low, with high Dice GSC values ranging from 0.68 (C. cristata) to 0.93 (A. littoralis). At the interspecific level, genetic variation was low between C. equisetifolia and C. glauca (GSC 0.70), between A. littoralis and C. cunninghamiana (0.73), and between A. huegeliana and Allocasuarina sp. (0.78). In addition, the UPGMA dendrogram did not separate the genera Casuarina and Allocasuarina, and it revealed close relationships between A. littoralis and C. cunninghamiana and between C. equisetifolia and C. glauca.

Fig. 2 UPGMA cluster diagram of 26 Casuarina and Allocasuarina genotypes based on SSR markers. The numbers in each node represent bootstrap support values (those lower than 50 are not shown)

Discussion

Compared with time-consuming genomic SSR development, identification of SSR markers from EST sequences is less labor intensive (Varshney et al. 2005). In addition, EST-SSRs are potentially tightly linked to functional genes that can be associated with important genetic traits, and they show a high level of transferability across species and genera. In the present study, we identified a total of 10,291 SSR loci from 118,270 assembled unigene sequences of C. equisetifolia (8.7%), which is slightly lower than the frequency observed in Casuarina transcriptome data (10.7%, Kullan et al. 2016). This distribution frequency (8.7%) is much higher than those obtained from cereal species (3.5%) and other tree species such as Eucommia ulmoides (2.9%), Pinus massoniana (3.62%), Pinus koraiensis (4.24%), and Camellia oleifera (4.99%), and much lower than those from Populus (14.83%) and Eucalyptus (34.87%) (Varshney et al. 2002; Rabello et al. 2005; Wen et al. 2013; Zhang et al. 2015). The differences in distribution frequency may be due mainly to species specificity, the size of the dataset, the quality of the sequenced unigenes, the database mining tools, and the SSR search criteria (Varshney et al. 2005; Biswas et al. 2012). A direct comparison of the abundance and frequency of SSRs across reports is therefore difficult. In C. equisetifolia, the most abundant motifs were AG/CT (54.24%) and AAG/CTT (13.02%). Similar results were reported by Kullan et al. (2016), where AG/CT (62.4%) and AAG/CTT (9%) were the most prevalent motifs among dinucleotide and trinucleotide repeats, respectively.

Cross-species amplification of SSR loci is considered a cost-effective approach to developing microsatellite markers for new species. The transferability of EST-derived SSRs across species and genera has been documented in many plants, especially in studies involving economically important crop species (Ellis and Burke 2007). Yasodha et al. (2005; 2009) confirmed the transferability of Eucalyptus SSR markers across two species, C. equisetifolia and C. glauca. Among 42 Casuarina SSR markers, 80% were transferable across four species: C. cunninghamiana, C. junghuhniana, C. equisetifolia, and C. glauca (Kullan et al. 2016). The present study confirmed the transferability of 15 polymorphic Casuarina SSR markers across Casuarina and Allocasuarina species, which suggests that these markers may be valuable for analyses of genetic diversity and genetic structure in the genera Casuarina and Allocasuarina.

The Casuarinaceae is a Gondwanan family consisting of four genera (Allocasuarina, Casuarina, Ceuthostoma and Gymnostoma) and 97 species, including monoecious or dioecious shrubs and trees (Wilson and Johnson 1989; Kamalakannan et al. 2006). Allocasuarina species are considered to be the nearest relatives of the genus Casuarina because they share many morphological features (Wilson and Johnson 1989; Pinyopusarerk and House 1993; Yasodha et al. 2004). However, our SSR markers did not separate the genera Casuarina and Allocasuarina, in line with previous studies of morphological and molecular diversity among Casuarina and Allocasuarina species based on morphometric parameters and inter simple sequence repeat (ISSR) markers (Yasodha et al. 2004; Kamalakannan et al. 2006; Chezhian et al. 2009). Further work on the molecular phylogeny of the Casuarinaceae is therefore needed, and the 15 transferable Casuarina SSR markers can be used in phylogenetic analyses to clarify the relationships between the two genera. In addition, these EST-derived SSR markers can facilitate the development of gene-based markers that may increase the efficiency of marker-assisted selection and breeding of Casuarina trees.

The present study showed that all of the genotypes have low levels of genetic diversity at the species level, especially those of A. littoralis. These results are in accordance with the observations made by Kamalakannan et al. (2006) and Chezhian et al. (2009). This may be due to the small number of parent trees used for bulking of seeds and the limited number of genotypes used in the present study. For A. littoralis, the narrow genetic diversity may be attributed mainly to its restricted geographical distribution. In addition, the close genetic relationship between C. equisetifolia and C. glauca revealed in the present study is well supported by the results of Kamalakannan et al. (2006) and Chezhian et al. (2009).

Although SSR markers developed from ESTs are widely used in genetic diversity analyses, only 15 polymorphic SSR markers were validated from the 150 SSR primer pairs tested in the present study, which suggests that the EST-SSR markers of C. equisetifolia have a low level of polymorphism. Compared with EST-SSR markers, genomic SSR markers can detect more allelic diversity and thus show higher discrimination power, as reported in a variety of plant species (Eujayl et al. 2001; Chabane et al. 2005; Hu et al. 2011; Zhang et al. 2014). This may be due to selection against alterations in conserved coding sequences (Scott et al. 2000), which limits SSR variation in these regions. Therefore, many more informative EST-SSR markers need to be developed from Illumina RNA-seq data or public EST databases to meet the increasing demand for genetic and breeding studies in Casuarina and Allocasuarina species.

Conclusions

Based on transcriptome data of C. equisetifolia, 10,291 EST-SSR loci with 2–6 nucleotide repeat types were identified from 118,270 assembled unigenes. On average, one EST-SSR was found every 13.54 kb, and the distribution frequency of SSRs was 8.7%. The most common repeat motifs were AG/CT among dinucleotides and AAG/CTT among trinucleotides. Fifteen polymorphic Casuarina SSR markers were developed and proved to be transferable across Casuarina and Allocasuarina species; they can be applied in population and evolutionary genetic studies in Casuarinaceae and will be useful in the selection and breeding of Casuarina trees.

Author contribution statement

NL, YQZ, HMD, HZP, BJ and HBL designed the experiments; NL and HBL performed the experiments and analyzed the data; NL and HBL wrote the manuscript.