Introduction

Genetic improvement of plantain and banana has been restricted by the complex genetic structure and ploidy of the crop, which often lead to low fertility and seed viability. It is further hindered by the large amount of space required for growth, maintenance, and evaluation of plant populations. Molecular marker-assisted breeding could be exploited to enhance the efficiency of genetic improvement in Musa. One of the major setbacks in the application of molecular breeding to Musa improvement is the paucity of informative molecular markers. Although simple sequence repeats (SSRs) are considered to be the best markers in many crop species because they are codominant, multiallelic, and randomly dispersed throughout the genome [1], their development through the conventional approach of genomic library construction is time consuming and resource intensive [2]. Mining publicly available DNA sequence databases for SSRs provides an attractive, time- and cost-effective alternative. Expressed sequence tags (ESTs) are particularly attractive for marker development since they represent coding regions of the genome and are being generated at a rapid pace for many genomes [3–5]. Moreover, the frequency of microsatellites is significantly higher in ESTs than in genomic DNA [6, 7]. In addition, EST-derived SSRs can lead to the development of gene-based maps, which help in the identification of candidate genes and increase the efficiency of marker-assisted selection [8]. Similar advantages of EST-SSRs have been reported in other horticultural crops such as grapes [9], Prunus [10], and cocoa [11]. Nearly 35,000 banana ESTs are currently available in GenBank at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/), and their number is increasing every year.
Therefore, the present study was initiated (1) to mine SSR markers from Musa ESTs, (2) to analyze the frequency and distribution of SSRs, (3) to validate the EST-SSR markers for detection of polymorphism in Musa genotypes, and (4) to assess the transferability of the markers among Musa species, the related genus Ensete, and the family Zingiberaceae of the order Zingiberales.

Materials and Methods

Data Mining for SSR Marker

ESTs from 11 Musa complementary DNA (cDNA) libraries, developed from various organs and under different stress conditions, were retrieved from NCBI. Poly-A tails were trimmed from the ESTs using the TrimEST program (http://emboss.bioinformatics.nl/cgi-bin/emboss/trimest). All ESTs were then reassembled using the CAP3 program (http://pbil.univ-lyon1.fr/cap3.php). The assembled sequences were searched for SSR motifs with WEBSAT (http://purl.oclc.org/NET/websat/). Using the same program, EST-SSR primers were designed for SSRs with at least six repeat units for dinucleotide motifs, four repeat units for tri- and tetranucleotide motifs, and three repeat units for penta- and hexanucleotide motifs.
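The repeat-number thresholds above can be illustrated with a minimal sketch. This is not the WEBSAT algorithm; it is a simple regular-expression search for perfect repeats, and the names `MIN_REPEATS` and `find_ssrs` are hypothetical. Mononucleotide repeats are left out because no threshold for them is stated in the text.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
# Mononucleotide repeats are omitted: their threshold is not given.
MIN_REPEATS = {2: 6, 3: 4, 4: 4, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "TATA"), so each SSR is reported once.
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


# Hypothetical test sequence: (TA)6 and (GGA)4 separated by spacer bases.
example = "GGC" + "TA" * 6 + "CCG" + "GGA" * 4 + "T"
print(find_ssrs(example))
```

Positions are 0-based. A production pipeline would also need to handle compound and imperfect repeats, which this sketch ignores.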

Plant Materials, Amplification, and Confirmation of SSRs

Eighteen Musa accessions representing different genomic groups of the core collection of Musa germplasm, maintained at NRCB, Trichy, were used to validate the EST-SSR markers designed from the Musa EST database. Genomic DNA of these accessions was extracted as described by Gawel and Jarret [12] with minor modifications.

Twenty-four primer pairs for Musa EST-SSRs were synthesized, representing mono- (1), di- (11), tri- (8), tetra- (1), penta- (2), and hexanucleotide (1) repeats. PCR was carried out in a 12.5-μl volume containing template DNA (20 ng), dNTPs (0.2 mM), primers (0.5 μM each), 1.25 U of Taq polymerase, and 1× PCR buffer. Amplification was carried out in an Eppendorf thermocycler with the following program: initial denaturation at 94 °C for 5 min; 25 cycles of denaturation at 94 °C for 1 min, annealing at the primer-specific temperature for 1 min, and extension at 72 °C for 1 min; and a final elongation at 72 °C for 7 min. The PCR products were run on 3 % agarose in 1× Tris–borate–EDTA buffer and visualized by ethidium bromide staining. A 50-bp molecular ladder was used to estimate the allele sizes.

The putative functions of the identified functional markers were annotated by BLASTX against the NCBI non-redundant protein database (http://www.ncbi.nlm.nih.gov/RefSeq/). The EST-SSRs that amplified successfully in the various genomic groups of Musa were used to test transferability across genera and families: all primer pairs producing PCR bands with Musa genomic DNA were tested on Ensete superbum and Ensete glaucum of the family Musaceae and on Zingiber officinale, Elettaria cardamomum, and Curcuma longa of the family Zingiberaceae.

To confirm the presence of microsatellite repeats, MESSR17 was used to amplify the genomic DNA of six cultivars representing different genomic groups (AA, BB, AAB, ABB, and AAAB), and the products were electrophoresed on a 3 % agarose gel. The bands were eluted from the gel and purified using the Qiagen gel extraction kit. The purified fragments were cloned using the pGEMT TA cloning kit for sequencing. The positive clones identified by PCR were sequenced on an automated sequencer (ABI PRISM 3700 DNA analyzer). The final edited sequences from the different genotypes were compared with the original SSR-containing EST sequence through multiple sequence alignment using Lasergene v7.1 software.

Results and Discussion

Abundance, Frequency, and Distribution of SSRs in Musa ESTs

A total of 32,708 EST sequences were retrieved from 11 cDNA libraries available in the public domain. They were derived from different organs and/or stress conditions of two species (Musa acuminata and Musa balbisiana) and bispecific hybrids, and they were used to evaluate the presence of SSR motifs. These ESTs were trimmed with TrimEST. Elimination of redundant sequences and improvement of sequence quality using CAP3 resulted in 21,050 potentially unique ESTs (Table 1). These unique sequences were subjected to WEBSAT analysis, which identified 5,158 (24.5 %) SSR-containing ESTs, comprising 1,349 contigs and 3,809 singletons. Among the 11 libraries screened, all libraries except IV and VII had 20 % of SSR-containing ESTs. The percentage of SSR-containing ESTs in banana (24.5 %) was higher than that reported for other horticultural crops such as cassava (9.7 %) [13], pepper (10.2 %) [14], and coffee (6.8 %) [15]. Varshney et al. [16] reported that the frequency of EST-SSRs may depend on the parameters used for mining SSRs. The high frequency of SSRs in banana ESTs might also be attributed to the higher mutation rate in clonally propagated crops [17]. This is supported by Sreenu et al. [18], who suggested that the distribution and frequency of SSRs result from factors such as mutation and DNA repair mechanisms.

Table 1 Details of Musa EST libraries retrieved from NCBI

Out of 5,158 SSR-containing ESTs, 4,691 contained only one SSR, 234 had more than one SSR, and 505 had compound SSRs; the detailed data for individual libraries are given in Table 2. The Musa EST-SSRs contained diverse types of repeat motifs. Among the different repeat units, trinucleotide repeat motifs were the most frequent (37.6 %), followed by mono- (32.9 %), di- (15.9 %), tetra- (8.6 %), hexa- (3.07 %), and pentanucleotide (1.7 %) motifs. A similar trend has been observed in cereals [19] and in legumes [20]. Most EST sequences consist of exonic regions, which are under strong selection against frameshift mutations because they are translated into proteins. As codons are functional units of three nucleotides, indel mutations that shift the sequence by three nucleotides do not perturb the reading frame of a gene [21]. Therefore, trinucleotide repeats are expected to be the most abundant SSR class in ESTs.
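The reading-frame argument can be made concrete with a small sketch. The function name `preserves_reading_frame` is hypothetical, and the check covers only the arithmetic of the frame shift, not any biological consequence of the inserted or deleted codons.

```python
def preserves_reading_frame(motif_length, repeat_change):
    """Return True if gaining or losing `repeat_change` repeat units of a
    motif `motif_length` nucleotides long shifts the downstream sequence
    by a multiple of three nucleotides (i.e. by whole codons)."""
    return (motif_length * repeat_change) % 3 == 0


# Gain of one repeat unit for each motif length from di- to hexanucleotide.
for motif_length in (2, 3, 4, 5, 6):
    print(motif_length, preserves_reading_frame(motif_length, 1))
```

Under this check, only tri- and hexanucleotide motifs tolerate a change of a single repeat unit without a frameshift, which is consistent with the expectation stated above.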

Table 2 Summary of SSRs detected and categorized after sequence assembly

The size of the SSRs varied from 10 bp (5 repeats for dinucleotide motifs) to 60 bp (20 repeats for a trinucleotide motif), and the average number of repeats was 8.3 for dimeric, 4.8 for trimeric (58.3 % with 4 repeats), 3.2 for tetrameric (85.8 % with 3 repeats), 3.42 for pentameric (72.4 % with 3 repeats), and 3.2 for hexameric (85 % with 3 repeats) SSRs. The longest SSR regions observed were (CGA)20 for tri-repeats, (TA)25 for di-repeats, (GATA)9 for tetra-repeats, (ATGCC)7 for penta-repeats, and (GAACCG)6 and (GCCGTC)6 for hexa-repeats. Among the 11 libraries, library III had the maximum repeat length for di-, tri-, and hexa-repeats, followed by library IX for tetra- and penta-repeats. In general, trinucleotides with the repeat motifs CCG/CGG were dominant, followed by AAG/CTT, when compared with other repeats (Table 3). The SSRs in Musa ESTs showed a more or less equal contribution of A + T-rich (52 %) and G + C-rich (48 %) repeat motifs. This is in accordance with the findings of Lescot et al. [22] and Cheung and Town [23], who reported 47 % GC content in the protein-matching regions of cDNAs and in Musa bacterial artificial chromosome (BAC) end sequences, respectively.

Table 3 Frequency and distribution of different SSR types in the 21,050 nonredundant Musa ESTs

Class I and II Group of Musa EST-SSRs

Based on the nucleotide length of the SSR region, simple perfect SSRs were classified into two groups, namely, class I (>20 nucleotides) and class II (10–20 nucleotides) SSRs. The number of class I SSRs per library ranged from 0 to 234 and the number of class II SSRs from 2 to 1,272, with the minimum and maximum values observed in libraries IV and III, respectively. In general, the most abundant repeat types in class I SSRs were mono-repeats followed by tri-repeats, while trinucleotide repeats followed by mono-repeats were more abundant in class II SSRs. Pentanucleotide repeats were scarce in both classes. The numbers of ESTs with class I SSRs for the different repeat types were tri- (307), di- (187), hexa- (21), tetra- (10), and pentanucleotides (7). Similarly, the numbers with class II SSRs were tri- (1,700), di- (648), hexa- (131), penta- (83), and tetranucleotides (70). Class II SSRs thus outnumbered class I SSRs, and tri-repeats were the most abundant of these repeat types in both classes. Library II had the highest proportion of class I SSRs (38 % of its total SSRs), whereas library III had the maximum number of class II SSRs.
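The length-based classification can be sketched as follows, using the cut-offs given above (class I: more than 20 nucleotides; class II: 10 to 20 nucleotides). The function name `ssr_class` is hypothetical, and returning `None` for SSRs shorter than 10 nucleotides is an assumption, since the text does not assign such repeats to a class.

```python
def ssr_class(motif, repeats):
    """Classify a simple perfect SSR by the total length of its repeat region."""
    length = len(motif) * repeats  # length of the SSR region in nucleotides
    if length > 20:
        return "I"
    if length >= 10:
        return "II"
    return None  # assumption: shorter repeats are not classified


# Longest tri-repeat reported in the text, and a minimal di-repeat.
print(ssr_class("CGA", 20), ssr_class("TA", 6))
```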

Transferability of EST-Derived SSR Markers

A total of 24 EST-SSR primer pairs were synthesized and evaluated with DNA extracted from 18 Musa germplasm accessions representing diverse genomic groups. Of these, 21 EST-SSR primer pairs amplified the genomic DNA of all the accessions. The high amplification success suggested that the flanking regions of these loci are sufficiently conserved and that the markers could be used for comparative analyses of genetic diversity in the genus Musa. The failure of the remaining three primer pairs to amplify may be due to deletion of the microsatellite at the specific locus [24] or to mutations (in-dels) in the primer binding sites [25].

Nine of the 21 amplifying primer pairs (43 %) showed polymorphism among the Musa accessions. The high polymorphism observed in this study contrasts with the results of Wang et al. [26], who reported a lower rate of polymorphism for EST-SSRs than for genomic SSRs. This suggested that these EST-SSR markers could be exploited for improving banana through marker-assisted selection (MAS) and for developing linkage maps. The higher frequency of polymorphism observed is possibly due to the inclusion of diverse indigenous land races, exotic introductions, and wild species. The number of alleles detected ranged from 1 to 3, with an average of 1.8 alleles per locus. The amplification results suggested that di- and tri-repeat SSR motifs are more polymorphic than tetra-, penta-, and hexa-repeats.

In the present study, the observed size of the amplicon was larger than the expected size for the primer MESSRm17. A similar result was reported by Thiel et al. [27]. Such differences in amplicon size are attributed to the presence of introns and insertions–deletions (in–dels) in the corresponding genomic sequences, as substantiated by Saha et al. [28] through sequence analysis.

Sequence Comparison of SSR Region Across Musa Genomes

EST-SSR polymorphism was confirmed at the nucleotide level by cloning and sequencing the products amplified with the primer MESSR17 from six of the 18 accessions that exhibited polymorphism. The BLAST analysis (Table 4) showed that all the sequences matched the original EST sequence from which the MESSR17 primer was designed. The sequence alignment showed variation in the repeat length of the GGA and CTT motifs among the Musa accessions (Fig. 1). This suggested that the allelic diversity at the MESSR17 locus is due to differences in repeat length, which could arise from insertion or deletion of repeats in the microsatellite regions.

Table 4 Details of EST-SSR primers used for studying the transferability across Musaceae and Zingiberaceae
Fig. 1 Alignment of six different genomic groups of Musa EST sequences obtained by amplification using the MESSR17 primer pair. Repetitive sequences are indicated in boxes

The comparison of sequence sizes and nucleotides of the different genomic groups suggested that the BB genome accessions (Attikole and Bhimkol) have more nucleotide base pairs than the other genomic groups, namely, AA, AAB, ABB, and AAAB. This is mainly due to variation in the length of the CTT and GGA motifs and to a point mutation in the GGA motif in both BB accessions. Apart from this variation in motif length, a few point mutations were observed in the region flanking the SSR motif. This phenomenon is thought to reflect the innately evolving nature of the genome, which is indicative of the evolutionary relationships among the taxa [29].

To confirm the specificity of the MESSR17 marker across BB genomes, 10 accessions of the BB genome were compared with accessions belonging to other genomic groups through mobility differences of the amplified product in a 3 % MetaPhor agarose gel. Eight of the ten BB accessions exhibited monomorphic bands of the same size as the sequenced BB accessions, and the remaining two exhibited polymorphic bands similar to those of other accessions. This result suggested that MESSR17 cannot be used as a BB-specific marker, as SSRs are highly mutable loci [30].

Transferability of Musa EST-SSR Markers Across Genera and Families

To test the transferability of Musa EST-SSR markers within the order Zingiberales, seven Musa species (M. acuminata, M. balbisiana, M. ornata, M. laterita, M. nagensium, M. itinerans, and M. sikkimensis), two species of the related genus Ensete (E. superbum and E. glaucum), and three genera of the family Zingiberaceae (E. cardamomum, C. longa, and Z. officinale) were amplified with the functional Musa EST-SSR primers (Fig. 2). The amplification results revealed that all 21 functional EST-SSR primers produced amplicons in all Musa and Ensete species of Musaceae. The high level of transferability of EST-derived SSRs reflects the conserved nature of the coding sequences in the SSR flanking regions. This result suggested that these transferable EST-SSR markers could be used for identification of markers associated with specific traits in other Musa species and related genera. Similar results were observed among related species of grapes [31] and pines [32] and among related genera of sugarcane [33] and sunflower [34].

Fig. 2 Amplification patterns obtained with primer MESSR17 in 3 % agarose gel electrophoresis of various genera of the order Zingiberales. M 100-bp molecular weight marker, 1 Musa acuminata, 2 M. balbisiana, 3 M. ornata, 4 M. laterita, 5 M. nagensium, 6 M. itinerans, 7 M. sikkimensis, 8 Ensete superbum, 9 Ensete glaucum, 10 Zingiber officinale, 11 Elettaria cardamomum, 12 Curcuma longa

Amplification of Musa EST-SSR markers in other genera of Zingiberaceae revealed that these markers could be used for a variety of studies. Out of the 21 functional EST-SSR markers, transferability was observed for Z. officinale (12), C. longa (10), and E. cardamomum (12) of the family Zingiberaceae. Gao et al. [35] have also experimentally demonstrated the transferability of EST-SSR markers from monocot (wheat) to dicot (soybean) species. These results indicated that EST-SSRs can be transferred across relatively large taxonomic distances, spanning not just species within a genus and multiple genera within a family but in some instances families within the same order. Currently, no ESTs are available for E. cardamomum in the public domain, unlike Zingiber spp. (38,115) and Curcuma spp. (12,593), which have a sufficient number of ESTs (NCBI). These transferable Musa EST-SSR markers could therefore also be used in cardamom improvement programs, as cardamom-specific primers are scarce. Fatokun et al. [36] reported that different plant species often share orthologous genes for very similar functions. Similarly, Bennetzen and Freeling [37] and Gale and Devos [38] stated that gene content and gene order are highly conserved among different plant species. This suggested that Musa EST-SSR markers have the potential to be developed into conserved orthologous set (COS) markers, which would facilitate comparative mapping between the Zingiberaceae and Musaceae families and comparative studies among other members of the order Zingiberales.

In summary, the Musa EST databases contain a large number of functional SSR markers, which could be exploited not only for studying genetic diversity but also for identifying candidate genes for specific traits, as these EST-SSRs are part of or adjacent to functional genes. The high percentage of cross-species, cross-genera, and cross-family transferability also suggested that these Musa EST-SSR markers will be a valuable resource for comparative mapping through the development of COS markers, for evolutionary studies, and for the improvement of members of the Zingiberaceae and Musaceae.