Introduction

Approximately 200 plant genomes have been sequenced so far; of those, roughly 50 correspond to ferns, bryophytes and algae, and the rest are mostly temperate-climate crops (Mukherjee et al. 2018). Most tropical species have yet to be sequenced. This is worrisome, since fragmentation and loss of habitat in the tropics are happening at a very fast pace (Aguilar et al. 2018; Cousins 2020; Escobar 2019) and mass extinctions are expected in these regions, even as a consequence of habitat disturbance (Alroy 2017). This situation endangers many species on which humans have depended for survival for thousands of years.

One of the biggest challenges for germplasm collections is the molecular characterization of accessions and their preservation from genetic erosion (Barcaccia 2009). Molecular markers are critical for determining the genetic diversity within collections and in the wild, as well as for selecting core collections of manageable size that represent the genetic diversity of the collection while maintaining allele specificity and accession rarity (Curry 2017; Reyes-Valdes et al. 2018). Microsatellites, or simple sequence repeats (SSRs), are among the most widely used molecular markers in genetic studies such as population genetics, molecular breeding, and paternity testing (Ellegren 2004). SSRs are abundant, co-dominant, multi-allelic, highly reproducible and easy to use (Richard et al. 2008), and they can be isolated either by data mining of existing sequences (Sharma et al. 2007) or by generating and sequencing SSR-enriched libraries (Kijas et al. 1994; Zane et al. 2002). SSRs are still the markers of choice for many population genetic studies in tropical plants (e.g. Martínez-Castillo et al. 2019a, b; Chaluvadi et al. 2018; Yamanaka et al. 2019).

In the United States of America, the National Plant Germplasm System (NPGS) currently maintains 596,198 accessions from 13,480 species within 239 families (Bretting and Bennet 2007; NPGS 2020). All tropical perennial plants considered in the present study are included in NPGS, and they represent an important genetic resource for people living in the tropics: sapodilla [Manilkara zapota (L.) P. Royen] Sapotaceae, lychee (Litchi chinensis Sonn.) Sapindaceae, rambutan (Nephelium lappaceum L.) Sapindaceae, mangosteen (Garcinia mangostana Linn.) and false mangosteen [Garcinia cochinchinensis (Lour.) Choisy] Clusiaceae all have edible fruits (Arias et al. 2012; Finocchiaro 2020), whereas the two species of bamboo [Bambusa vulgaris Schrad. ex J.C. Wendl and Guadua angustifolia Kunth] Poaceae have edible shoots (Singhal et al. 2013). For these species, the number of SSRs available is not sufficient, since the theoretical number of loci needed for accurate evolutionary inference of populations is greater than 30 (Pollock et al. 1998; Takezaki and Nei 1996). For rambutan, no SSRs have been developed, though transferability of 12 SSRs from lychee has been described (Hock et al. 2005). For lychee, only 4 and 12 SSRs were reported in separate studies (Ekue et al. 2009; Viruel and Hormaza 2004), respectively. For sapodilla, only 8 and 17 SSRs were reported in two studies (Moraes et al. 2013; Silva-Junior et al. 2016), respectively. For mangosteen, only 17 SSRs were described (Samsir et al. 2016). For bamboo, only 16 SSRs were developed, and those came from chloroplast sequencing (Vieira et al. 2016). Our main objective was to develop large sets of nuclear SSR markers for the seven species mentioned above, with the goal that this information will be a valuable resource for conservation programs in germplasm banks and for in-depth population-genetic studies of these species.

Materials and methods

DNA extraction and preparation of SSR libraries

Leaf samples of sapodilla, lychee, mangosteen (two species), rambutan and bamboo (two species) were received from the USDA-ARS Tropical Agriculture Research Station (TARS), Mayaguez, Puerto Rico, and organized into five groups for developing simple sequence repeats (SSRs) to be used in germplasm collection and identification. The list of accessions is shown in Table 1. SSR-enriched libraries were prepared as described in Arias et al. (2015), using the same restriction enzymes, the same adapters (adapter 1: SSRLIBF1; adapter 2: SSRLIBF3; Techen et al. 2010), and the same biotinylated oligonucleotide repeats and conditions.

Table 1 List of accessions used for Roche 454 pyrosequencing and SSR development

Sequencing and SSR primer design

To avoid generating chimeric DNA during PCR in SSR-enriched library preparation, two DNA samples from each of the five groups were processed separately. Equal volumes of the two libraries were then mixed before proceeding to library preparation for sequencing. DNA quality of the pooled pairs of samples was evaluated using a Qubit™ fluorometer with the Quant-iT™ PicoGreen® reagent (Invitrogen, Carlsbad, CA) and an Agilent 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA) equipped with a DNA Ladder 1000 LabChip (Agilent Technologies, New Castle, DE) and its corresponding ladder. The libraries were sequenced using 70 × 75 mm Titanium Pico-Titer Plates (Roche, Branford, CT) on a Roche 454 GS FLX (Roche, Indianapolis, IN) with the GS Titanium sequencing kit XLR70 (200 cycles). Read length distribution was analyzed with Roche 454 v.2.0 image/signal analysis and base caller programs. Contigs were assembled using Roche 454 gsAssembler version 2.0 (Roche, Branford, CT). SSR detection and primer design followed the protocol previously described (Arias et al. 2015). When a contig contained more than one repeat, primer sets within the contig were given alphabetical sub-indexes, e.g. “_a”, “_b”, “_c”. Designed primer sets were tested on 4 or 10 DNA samples per group in 384-well clear microtiter plates HSP3811 (Bio-Rad, Hercules, CA) in 5 µL reactions with 10 ng DNA using Titanium Taq DNA Polymerase (Clontech, Mountain View, CA). Amplicons were analyzed by capillary electrophoresis on an ABI 3730XL DNA analyzer (Applied Biosystems, Foster City, CA) and data were processed using GeneMapper v. 3.7 (Applied Biosystems, Foster City, CA).
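As an illustration of the detection and naming steps described above, the following is a minimal Python sketch; it is not the SSR-Finder pipeline of Arias et al. (2015). The minimum copy numbers in MIN_COPIES, the helper names find_ssrs and label_markers, and the contig identifier contig00042 are assumptions made for this example only. The sketch finds perfect tandem repeats with 2 to 6 nt motifs and appends “_a”, “_b”, and so on when a contig contains more than one repeat.

```python
import re
from string import ascii_lowercase

# Assumed thresholds (not from the paper): minimum number of tandem copies
# required for each motif length (2-6 nt).
MIN_COPIES = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_copies=MIN_COPIES):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for size, copies in sorted(min_copies.items()):
        # One motif of `size` bases followed by at least copies-1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit,
            # e.g. "AA" (mononucleotide) or "AGAG" (really (AG)n).
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


def label_markers(contig_id, hits):
    """Name markers; add _a, _b, ... when a contig holds more than one repeat."""
    if len(hits) == 1:
        return [contig_id]
    return [f"{contig_id}_{ascii_lowercase[i]}" for i in range(len(hits))]


# Hypothetical contig carrying an (AG)8 and an (ATC)6 repeat.
contig = "GGC" + "AG" * 8 + "TTCAGGCTA" + "ATC" * 6 + "GG"
hits = find_ssrs(contig)
labels = label_markers("contig00042", hits)
for label, (start, motif, copies) in zip(labels, hits):
    print(label, start, motif, copies)
```

Primer design around each detected repeat is a separate step and is not sketched here.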

Results

SSR-enriched libraries were prepared for five groups of tropical plant species (Fig. 1), using two accessions per group from the TARS collection, as indicated in Table 1. The libraries were sequenced on a Roche 454 pyrosequencer, yielding 227–354 thousand reads per group. Histograms of read number vs. read length showed the maximum number of reads for each library at approximately 300–350 base pairs (bp), with reads reaching up to 600 bp in length (Fig. 2). Libraries of bamboo and mangosteen were processed together in the same region of a picotiter plate, and the reads were then separated by the sequence of the oligonucleotide adapters used. The number of contigs assembled for each group was between 1582 and 19,862, and their sequences were submitted to GenBank, National Center for Biotechnology Information (NCBI), under the accession numbers shown in Table 1. The total number of repeats detected by the SSR-Finder software in each library was between 949 (bamboo) and 8084 (rambutan), and the number of unique primers designed was between 353 and 1557, also for bamboo and rambutan, respectively (Table 2). A total of 384 SSRs were tested for each of the fruit trees (rambutan, sapodilla, lychee and mangosteen) and 336 SSRs were tested for bamboo; these 1872 SSRs are provided in Supplementary Table S1. The number of samples used to test the SSRs varied: 10 samples were used for rambutan, lychee and bamboo, whereas only 4 samples were tested for sapodilla and mangosteen. The overall number of repeats by motif size (2, 3, 4 or ≥ 5 nucleotides, nt) is also listed in Table 2.

Fig. 1

Plants used in the present study. Top: sapodilla (left), rambutan (middle), bamboo (right); Bottom: mangosteen (left), lychee (right). Insets within pictures show fruits cut open. Photographs courtesy of USDA-ARS, Peggy Greb (USDA Image Database, open access)

Fig. 2

Read-length distribution in Roche 454 pyrosequencing of SSR-enriched libraries of Manilkara zapota (sapodilla), Litchi chinensis (lychee), Garcinia mangostana (mangosteen), Nephelium lappaceum (rambutan) and Bambusa vulgaris or Guadua angustifolia (bamboo). Mangosteen and bamboo were processed using different adapters and loaded on the same region of the Roche 454 plate. The “y” axis shows the number of reads and the “x” axis shows the read length in base pairs (bp)

Table 2 Data on SSR set development in seven tropical perennial plants

In the sequenced SSR-enriched libraries of tropical plants, the number of distinct repeat motifs ranged from 98 in bamboo to 149 in sapodilla. However, a small group of nine motifs represented 72 to 83% of all the repeats found in each library; these motifs are shown in Fig. 3. A total of 149 SSR markers did not result in amplification, leaving 1,723 usable markers generated from this work. In general, less than 10% of the primer sets designed failed to produce amplicons, with the lowest values observed in sapodilla, mangosteen and lychee (5.2, 6.3 and 7.8%, respectively), and the highest in rambutan and bamboo (9.4 and 11.6%, respectively) (Fig. 4). The SSRs that resulted in no amplification on the DNAs tested are marked as gray-shaded cells in Supplementary Table S1. Screening of SSR markers on 4 to 10 individual DNA samples revealed polymorphism in 30.1 to 52.3% of the markers tested; in the case of bamboo, where SSR-enriched libraries were prepared using two different genera (Guadua angustifolia and Bambusa vulgaris), 50% of the markers amplified only one of these two species (Fig. 4).
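The amplification figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that each reported percentage corresponds to a whole number of markers rounded to one decimal place, so the per-group counts it recovers are inferred rather than taken from Supplementary Table S1.

```python
# Markers tested per group and the reported percentage of primer sets
# that gave no amplification (both from the Results).
tested = {"sapodilla": 384, "mangosteen": 384, "lychee": 384,
          "rambutan": 384, "bamboo": 336}
pct_no_amp = {"sapodilla": 5.2, "mangosteen": 6.3, "lychee": 7.8,
              "rambutan": 9.4, "bamboo": 11.6}

# Assumption: each percentage is a whole-marker count rounded to one decimal,
# so rounding the product back to an integer recovers that count.
failed = {g: round(tested[g] * pct_no_amp[g] / 100) for g in tested}
usable = {g: tested[g] - failed[g] for g in tested}

total_tested = sum(tested.values())
total_failed = sum(failed.values())
total_usable = sum(usable.values())

for g in tested:
    print(f"{g}: tested={tested[g]} failed={failed[g]} usable={usable[g]}")
print(total_tested, total_failed, total_usable)
```

Under that assumption the inferred failures are 20, 24, 30, 36 and 39 markers for sapodilla, mangosteen, lychee, rambutan and bamboo, which sum to the 149 reported and leave 1,723 usable markers out of 1872 tested.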

Fig. 3

Percentage of nuclear simple sequence repeat (SSR) markers that were polymorphic or resulted in no amplification. The total number of SSRs tested was 384 for Garcinia mangostana (mangosteen, GAM), 384 for Litchi chinensis (lychee, LIC), 384 for Manilkara zapota (sapodilla, MAZ), 384 for Nephelium lappaceum (rambutan, NEL) and 336 for Bambusa vulgaris (bamboo, GUA). “One spp. only” indicates that 50% of the markers developed using SSR-enriched libraries of Bambusa vulgaris and Guadua angustifolia amplified only one of these two species. The percentage of polymorphism is probably underestimated given the small number of samples tested

Fig. 4

Percentage of simple sequence repeat (SSR) motifs found in non-mononucleotide nuclear-SSR-enriched libraries over a total of 1093 repeats and 99 motifs of Garcinia mangostana (mangosteen), 1948 repeats and 144 motifs of Litchi chinensis (lychee), 2135 repeats and 149 motifs of Manilkara zapota (sapodilla), 1117 repeats and 114 motifs of Nephelium lappaceum (rambutan) and 480 repeats and 98 motifs of Bambusa vulgaris (bamboo). Between 72 and 83% of the repeats corresponded to the 9 motifs indicated in the figure legend. Percentage values ≥ 5% are numerically indicated in the colored areas

Several criteria were applied to select the top-quality SSR markers for each group of tropical plants: amplification of all the samples tested, high fluorescent signal, minimal background amplification (nonspecific amplicons), absence of multiple peaks, and absence of stutter peaks. Applying these criteria to the 1872 SSRs tested resulted in the 178 top-quality markers reported in Table 3, of which 36, 47, 38, 31 and 26 correspond to rambutan, sapodilla, lychee, mangosteen and bamboo, respectively. Examples of SSRs chosen as top-quality markers are shown in Fig. 5: three markers of sapodilla showing amplification of four DNA samples, all with high levels of fluorescence (30,000 units scale) and minimal background.

Table 3 Selected best quality, polymorphic primer sets for microsatellites of five tropical plant groups

Fig. 5

Examples of simple sequence repeat (SSR) markers considered to be of good quality in the present work. Three markers of Manilkara zapota (Stv_maz_13697; Stv_maz_06044; Stv_maz_06859) tested on four DNA samples and showing discrimination of all the samples tested. The “x” axis corresponds to amplicon sizes in base pairs (bp); the “y” axis for all the markers was set at a maximum of 30,000 fluorescent units

Discussion

We report between 297 and 364 new nuclear SSR markers for each of the five groups of tropical plants studied: rambutan, sapodilla, mangosteen, lychee and bamboo. Overall, this is a 20-fold higher number of SSR markers than currently exists in the literature for the five plant groups studied, e.g. 15-fold more for sapodilla and 22-fold more for lychee. One advantage of using nuclear SSRs is that the topology of reconstructed phylogenies can differ from that obtained using plastid data (Lin et al. 2019). This could be an advantage in the case of bamboo (Guadua angustifolia and Bambusa vulgaris), since to the best of our knowledge the 297 SSRs are the first nuclear SSRs reported for these species; the 16 SSRs previously reported for bamboo are of chloroplast origin (Vieira et al. 2016).
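The fold increases quoted for sapodilla and lychee can be reproduced approximately from numbers given elsewhere in the text. The sketch below is a hedged reconstruction, not the authors' calculation: it assumes that the two earlier studies for each species reported non-overlapping marker sets (so their counts can be summed) and that the new marker count is the number tested minus the inferred number that failed to amplify.

```python
# Previously published SSR counts cited in the Introduction.
# Assumption: the two studies per species do not share markers.
prior = {"sapodilla": 8 + 17, "lychee": 4 + 12}

# Markers tested here and reported no-amplification percentages (Results).
tested = {"sapodilla": 384, "lychee": 384}
pct_no_amp = {"sapodilla": 5.2, "lychee": 7.8}

fold = {}
for species in prior:
    # Assumption: new markers = tested minus those that failed to amplify.
    failed = round(tested[species] * pct_no_amp[species] / 100)
    new_markers = tested[species] - failed
    fold[species] = new_markers / prior[species]

for species, value in fold.items():
    print(f"{species}: about {value:.1f}-fold")
```

With these inputs the ratios are about 14.6 for sapodilla and 22.1 for lychee, consistent with the rounded 15-fold and 22-fold figures.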

The potential use of the SSR markers developed in the present study goes beyond the seven species considered. One of the characteristics of SSRs is their high level of transferability between closely related species (Ziya et al. 2016). For example, SSRs developed for sapodilla could be useful in the other 64 species that belong to the pantropical genus Manilkara, which contains about 30 species in America, about 20 in Africa and about 15 in Asia, Australia and the Pacific, several of them used for their timber, fruit and latex (Armstrong 2010). Furthermore, it is possible that the SSRs developed here could be used in species that do not belong to the same genus. For example, SSR markers developed for lychee in two separate studies (Hock et al. 2005; Ekue et al. 2009) have shown transferability to species within different genera of the Sapindaceae family. Sapindaceae is a tropical and subtropical family that contains about 1580 species, several of them with edible fruits (Buerki et al. 2010).

SSRs are being used in multiple conservation efforts to preserve species in the tropics. For example, nine polymorphic SSRs were used to understand the genetic structure and diversity of Annona cherimola Mill., to preserve germplasm that could be a source of biotic and abiotic stress resistance and to guarantee food security for future generations (Larranaga et al. 2017). In date palm, Phoenix dactylifera, 19 SSRs were used to determine the population structure of 195 accessions from Asia and Africa and to understand their vulnerability to diseases and insect pests given sudden changes in climate (Chaluvadi et al. 2018). Also, 46 polymorphic SSRs were used to analyze the genetic structure of mango cultivars from around the world, to conserve germplasm and to facilitate the use of genetic resources for breeding purposes (Yamanaka et al. 2019). Even now, when genotyping by sequencing (GBS) has become inexpensive, SSRs remain a preferred tool for determining the genetic diversity of maize landraces and preserving rare allele sources: they are effective, robust, reproducible and simple to use, and they do not require large bioinformatics infrastructure or expertise for data analysis (Hayano-Kanashiro et al. 2017).

Germplasm conservation and population-genetic studies can be performed with a small number of markers; many have used between 8 and 17 SSRs (Amici et al. 2019; Ekue et al. 2009; Moraes et al. 2013; Samsir et al. 2016; Silva-Junior et al. 2016; Viruel and Hormaza 2004). Indeed, the most common number of loci used in population studies of wild species was six, and usually no more than twelve (Koskinen et al. 2004). However, for estimation of the population-genetic parameter θ (4Neμ), a linear gain in accuracy occurs when the number of loci is increased from 1 to 100 (Carling and Brumfield 2007), and the theoretical number of loci needed for accurate evolutionary inference has been estimated at between 30 and hundreds of loci (Pollock et al. 1998; Takezaki and Nei 1996). Thus, the number of nuclear SSR markers provided in the present work for rambutan, sapodilla, lychee, mangosteen (two species) and bamboo (two species) would allow those theoretical ideals to be met for each of these groups. In addition, we report 26–47 top-quality polymorphic markers for each group, which are sufficient for screening large numbers of samples and for facilitating their correct identification and conservation in germplasm banks; for more in-depth characterization of population-genetic parameters, we provide hundreds of SSRs. The level of polymorphism of the markers reported here, 30–50%, is probably underestimated given the small number of accessions (4–10 per group) used for testing.
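The point about underestimated polymorphism can be made concrete with an idealized calculation. The sketch below is not from the study: it assumes a biallelic locus, diploid individuals sampled at random and independently, and a hypothetical minor-allele frequency of 0.10; the function name prob_looks_monomorphic is likewise illustrative. Germplasm accessions are not a random population sample, so the numbers only indicate the direction of the bias.

```python
def prob_looks_monomorphic(minor_freq, n_individuals, ploidy=2):
    """Chance that a truly biallelic locus shows a single allele in the sample.

    Assumes independent random sampling of allele copies (an idealization).
    """
    copies = ploidy * n_individuals
    # Either every sampled copy is the major allele or every copy is the minor.
    return (1 - minor_freq) ** copies + minor_freq ** copies


# Hypothetical minor-allele frequency; 4 and 10 are the sample sizes used here.
p4 = prob_looks_monomorphic(0.10, 4)
p10 = prob_looks_monomorphic(0.10, 10)
print(f"n=4: {p4:.2f}  n=10: {p10:.2f}")
```

Under these assumptions, roughly 43% of such loci would appear monomorphic when 4 individuals are screened, against about 12% with 10 individuals, which is consistent with the expectation that screening more accessions would reveal more polymorphic markers.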

Conclusions

The seven perennial plant species considered in the present study (rambutan, sapodilla, lychee, two species of mangosteen and two species of bamboo) represent an important genetic resource for people living in the tropics. The markers reported here will help to generate information relevant to conservation genetics and breeding programs for these species.