Introduction

Downy mildew, caused by the obligate oomycete pathogen Sclerospora graminicola Schott, is a serious threat to pearl millet [Pennisetum glaucum (L.) R. Br.] production and is reported to cause losses of up to 40% in favorable environments. Pearl millet is a principal crop for food security in the arid and semi-arid regions of India and Africa and is now gaining further attention because of its nutritive value (Kodkany et al. 2013; Kumar et al. 2016). It is one of the most widely cultivated drought- and high temperature-tolerant C4 cereals, grown for forage, grain and stover. Developing downy mildew resistant parental lines through marker-assisted selection, and durable hybrids from them, is a viable and feasible strategy for disease control. Molecular markers are among the most important genetic resources for crop improvement.

Molecular markers linked to a particular trait, such as disease resistance, can be used to track that trait in a segregating population and can be effectively utilized for marker-assisted selection (MAS). Although molecular marker studies in pearl millet date back to the early 1990s, with the first molecular map reported by Liu et al. (1994), efforts are still underway to develop and apply molecular markers in pearl millet so that it stands alongside other cereals such as rice, sorghum, maize, wheat and barley (Sehgal et al. 2012; Punnuri et al. 2016). The development of the downy mildew resistant hybrid HHB-67 improved by RFLP marker-assisted selection is a landmark in the history of pearl millet improvement: it was associated with the discovery of QTLs linked to downy mildew resistance and also led to the successful adoption of this hybrid in north-western India (Hash et al. 2003). The availability of the pearl millet genome sequence in the near future (http://ceg.icrisat.org/ipmgsc/) is expected to add to and enhance the genetic resources. Nevertheless, efforts must continue towards developing functional molecular markers for specific biotic or abiotic stress resistance/tolerance.

Genetic maps of pearl millet based on restriction fragment length polymorphism (RFLP) and amplified fragment length polymorphism (AFLP) markers, together with simple sequence repeat (SSR)- and diversity array technology (DArT)-based maps, have proven useful in identifying quantitative trait loci (QTLs) and in breeding for drought tolerance, disease resistance, stover quality and grain mineral content (Hash et al. 2003; Supriya et al. 2012; Kannan et al. 2014; Kumar et al. 2016). A high-density linkage map with genotyping-by-sequencing (GBS) markers was recently developed by Punnuri et al. (2016) for tagging leaf spot resistance.

Of late, RNA-seq, or transcriptome sequencing on next-generation sequencing platforms, has provided large amounts of reliable information on the functions and expression of genes at a given time point (Devran et al. 2015; Varshney 2016). SSR markers developed from such data, aptly known as EST-SSRs, give insight into gene function and can be associated with the trait under study. EST-SSRs have an intrinsic advantage over genomic SSRs because they originate from coding regions of the genome and are transferable to related species (Kalia et al. 2011; Fougat et al. 2014). The open accessibility of the large amount of transcriptome sequencing data generated in various crops at various stages or time points has greatly favored the development of SSRs (Varshney et al. 2005a, b, 2016).

EST-SSRs have been developed in pearl millet for drought tolerance (Rajaram et al. 2013) and abiotic stress tolerance (Bertin et al. 2005; Senthilvel et al. 2008; Thudi et al. 2010; Sehgal et al. 2012). Because EST-SSRs are associated with a particular function, they can be efficiently employed in QTL tagging and molecular breeding for a particular trait. In the present study, transcriptome data of downy mildew resistant and susceptible genotypes were used to mine and develop functional EST-SSR markers on the basis of annotated functional genes involved in plant disease resistance.

Materials and methods

Plant materials and DNA extraction

A total of 12 downy mildew (DM) resistant/susceptible genotypes of pearl millet were used for the validation of newly designed EST-SSR primers (Supplementary Table I). These resistant and susceptible genotypes can be used as potential parental lines for the development of mapping population(s) to identify DM resistance QTL/genes. Genotypes were obtained from the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Telangana, and the Main Pearl Millet Research Station, Junagadh Agricultural University, Jamnagar, Gujarat. Genomic DNA was extracted from fresh tissue of 10-day-old plants by the CTAB method described by Zidani et al. (2005). The quality and integrity of the DNA were checked on a 0.8% agarose gel, and the DNA was quantified using a NanoDrop ND-1000 (NanoDrop Products, DE, USA).

Screening of the Trinity assembly for EST-SSR mining and primer design

In an earlier study, we performed transcriptome analysis of a downy mildew resistant genotype (P310-17) and a susceptible genotype (7042S) of pearl millet under inoculated and control conditions. Assembly of the 1,295,196 pre-processed high-quality (HQ) reads from the four multiplexed pearl millet samples with the Trinity assembler yielded 26,690 transcripts (Kulkarni et al. 2016). These assembled transcripts were used for mining and developing EST-SSR markers in the present study. The transcripts were screened for SSR motifs using the Perl script MIcroSAtellite identification tool (MISA), with unit size/minimum number of repeats set to 2-7, 3-5, 4-4, 5-3 and 6-3 and the maximum number of bases interrupting two SSRs in a compound microsatellite set to 100 (Thiel et al. 2003). The FASTA sequences of transcripts harboring SSR motifs were extracted with an in-house Perl script, and primers flanking the SSR motifs were designed using BatchPrimer3 (You et al. 2008). The primer design parameters were as follows: primer length 18–23 bp (optimum 20 bp); Tm 57–62 °C (optimum 60 °C); GC content 30–70% (optimum 55%); product size 120–300 bp (optimum 150 bp); maximum 5′ self-complementarity 5.0; and maximum 3′ self-complementarity 0. The EST-SSR primers were synthesized by Sigma-Aldrich, USA.
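The repeat thresholds used for SSR screening (2-7, 3-5, 4-4, 5-3 and 6-3) can be illustrated with a short Python sketch. This is not MISA itself: it only detects perfect repeats with a regular expression and does not handle compound microsatellites (the 100-base interruption rule). The names MIN_REPEATS and find_ssrs are hypothetical and introduced only for illustration.

```python
import re

# Minimum number of repeats per motif length, taken from the screening
# criteria stated in the text (unit size - minimum repeats).
MIN_REPEATS = {2: 7, 3: 5, 4: 4, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Illustrative only: compound SSRs and imperfect repeats are ignored.
    """
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # Capture one motif of length `unit`, then require at least
        # (min_rep - 1) further identical copies directly after it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example AGAG is really the di-nucleotide AG).
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTGAC" + "AG" * 8 + "CCATA" + "CCG" * 5 + "TTA"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

In the example sequence, the (AG)8 and (CCG)5 tracts pass the thresholds, whereas a tract such as (AG)6 would be rejected because di-nucleotide repeats need at least seven copies.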

Functional annotation of SSR-harboring transcripts

The assembled transcripts harboring SSR motifs were functionally annotated using Blast2GO, an online tool for the functional annotation of (novel) sequences (Conesa et al. 2005). BLASTX of the contigs was carried out against the non-redundant (nr) database using an E-value cut-off of 1E-6. Sequences with BLAST hits were mapped to retrieve the gene ontology (GO) terms associated with the hits. Annotation was then carried out on the mapped sequences to select GO terms, and InterPro annotation was run to retrieve domain/motif information in a sequence-wise manner. GOSlim terms (cut-down versions of the GO ontologies containing a subset of the terms in the whole GO) for the molecular function, biological process and cellular component categories associated with the best BLASTX hit to an Arabidopsis thaliana protein were assigned to the corresponding pearl millet transcripts (http://www.geneontology.org/GO.slims.shtml).

Polymorphism survey of newly designed EST-SSR markers

On the basis of the functional annotation, transcripts harboring SSR motifs and having functional relevance to downy mildew resistance were selected for validation on a panel of 12 genotypes. The PCR mixture (10 µL) contained 5 µL of 2X PCR master mix (Fermentas), 1 µL of genomic DNA (50 ng), 0.8 µL of primers (10 pmol each of forward and reverse primer) and 3.2 µL of nuclease-free water. A touchdown PCR profile was used: initial denaturation at 94 °C for 5 min; five cycles of 94 °C for 30 s, annealing from 65 °C down to 60 °C for 30 s (1 °C decrement per cycle) and 72 °C for 1 min; then 30 cycles of 94 °C for 30 s, annealing at a constant 60 °C for 30 s and 72 °C for 1 min; and a final extension at 72 °C for 7 min. Amplified PCR products were separated on a 3% MetaPhor agarose gel (0.5 μg/ml EtBr) alongside a 50 bp DNA ladder (Fermentas, USA) in 1X TBE buffer at a constant 120 V for about 2.5–3 h. The gels were visualized and documented with a gel documentation system (Bio-Rad, USA). The amplified bands were scored as present (1) or absent (0). Polymorphism information content (PIC) was calculated using PowerMarker V3.25 (Liu and Muse 2005).
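For readers unfamiliar with PIC, the following Python sketch computes it from allele frequencies using the standard definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. That PowerMarker applies exactly this formula to the present data is an assumption here, and the function name pic is hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one marker.

    `freqs` is a sequence of allele frequencies that must sum to 1.
    Uses the standard allele-frequency definition (assumed, not taken
    from the source text).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity term.
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings between identical heterozygotes.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


if __name__ == "__main__":
    print(pic([0.5, 0.5]))    # two equally frequent alleles
    print(pic([0.75, 0.25]))  # one common and one rare allele
```

Under this definition a marker with two equally frequent alleles has a PIC of 0.375, which is the maximum possible for a two-allele marker; a more skewed allele distribution gives a lower value.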

Results and discussion

Identification, type and frequency of EST-SSRs

The selection of assembled transcripts is critical for mining and developing EST-SSRs from RNA-seq data. In the present study, the de novo transcripts assembled with Trinity were selected on the basis of the largest N50 value (1219 bp), the maximum transcript length (1.74 kb) and an average contig length of 969.02 bp, and were used for mining and developing SSRs (Kulkarni et al. 2016). Longer transcripts provide a better opportunity, in terms of the longest open reading frame, for mining SSRs and for picking primers from the sequences flanking the SSRs (Bosamia et al. 2015). Of the 26,690 transcripts, 3586 (13.4%) harbored SSR motifs. A total of 4510 SSRs were identified, and 695 transcripts harbored more than one SSR (Table 1). SSRs have been found in 6–11% of the EST sequences of several cereal species, viz. barley, maize, rye, rice, sorghum and wheat (Varshney et al. 2002, 2009). The percentage found in the present study (13.4%) is comparatively higher. The SSR frequency was one per 5.73 kb, which is higher than in previous studies in pearl millet (Senthilvel et al. 2008) and agrees with reports of one SSR every 4–8 kb in cereals (Kantety et al. 2002). SSR frequency depends on the amount of data and the parameters used for mining (Varshney et al. 2007; Ding et al. 2015). The higher frequency of SSRs in the present study can be attributed to the use of RNA-seq generated transcripts.
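The reported figures of 13.4% and one SSR per 5.73 kb can be reproduced from the counts given above if the total transcript length is approximated as the number of transcripts multiplied by the average contig length. That approximation is an assumption, since the text does not state how the total length was obtained, and the function name ssr_density is hypothetical.

```python
def ssr_density(n_transcripts, mean_len_bp, n_ssr_transcripts, n_ssrs):
    """Summarise SSR occurrence in a transcript set.

    Total length is estimated as n_transcripts * mean_len_bp, which is an
    assumption made for this illustration.
    """
    total_kb = n_transcripts * mean_len_bp / 1000.0
    return {
        "pct_transcripts_with_ssr": 100.0 * n_ssr_transcripts / n_transcripts,
        "kb_per_ssr": total_kb / n_ssrs,
    }


if __name__ == "__main__":
    # Counts taken from the text: 26,690 transcripts, mean length 969.02 bp,
    # 3586 SSR-containing transcripts, 4510 SSRs.
    stats = ssr_density(26690, 969.02, 3586, 4510)
    print(round(stats["pct_transcripts_with_ssr"], 1))  # 13.4
    print(round(stats["kb_per_ssr"], 2))                # 5.73
```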

Table 1 Statistics of EST-SSRs identified in pearl millet transcripts assembled by Trinity

EST-SSRs can be classified on the basis of their size, type of repeat unit and motif length (Bosamia et al. 2015). Across repeat type classes, tri-nucleotide repeats were the most frequent (54.6%), followed by hexa-nucleotide (16.1%), penta-nucleotide (14.5%), tetra-nucleotide (8.0%) and di-nucleotide (6.8%) repeats (Table 2; Fig. 1). These results contrast with the earlier study of Senthilvel et al. (2008) in pearl millet, where di-nucleotide repeats were the most common, followed by tri-, penta-, tetra- and hexa-nucleotide repeats. Earlier studies in cereals, however, also found tri-nucleotides to be the most abundant repeat class (Varshney et al. 2002, 2005a, b), and tri-nucleotide repeats are generally abundant among SSRs (Parekh et al. 2016). This abundance depends on factors such as the SSR search criteria, the size of the dataset, the database mining tools and EST sequence redundancy (Varshney et al. 2005a, b; Wang et al. 2014). The prominence of tri-nucleotide repeats in the pearl millet transcriptome is also consistent with the highest percentage of tri-nucleotide repeats (57.9%) in the chickpea transcriptome sequenced by 454 pyrosequencing (Garg et al. 2011) and with studies in sorghum, barley and bread wheat (Gupta and Varshney 2000; Kantety et al. 2002; Varshney et al. 2002; Thiel et al. 2003). It has also been observed in many species that transcripts or exons contain more tri-nucleotide SSRs than other repeats (Bosamia et al. 2015; Kumar et al. 2016; Cardle et al. 2002; Morgante et al. 2002; Li et al. 2004). Hexa-nucleotide repeats, which contributed 16.1% of the total SSR motifs, have not previously been reported as the second most frequent repeat motif in cereals. Their frequency could be explained by the finding that hexa-nucleotide repeats, like tri-nucleotide repeats, do not perturb reading frames and so tolerate expansions or deletions in coding regions (Asp et al. 2007; Metzgar et al. 2000; Morgante et al. 2002).

Table 2 Distribution and frequencies of EST-SSRs in pearl millet
Fig. 1 Distribution of EST-SSR repeat motifs

The di-nucleotide repeat class AG/CT (53.7%; Table 3) was the most abundant in the pearl millet transcriptome, as reported in sorghum, barley, wheat and tall fescue (Gupta and Varshney 2000; Thiel et al. 2003; Saha et al. 2004). The predominance of AG/CT is also supported by the studies of Kantety et al. (2002), Varshney et al. (2002) and Jayashree et al. (2006), in which SSRs were mined from nucleotide databases. Yu et al. (2004) and Jayashree et al. (2006) also indicated that AG and CCG repeats are the most abundant classes of repeats in cereals. Among the tri-nucleotides, the CCG/CGG class was the most abundant (44.9%), as observed in the closely related species sorghum and foxtail millet (Jia et al. 2007) and in most other studies (Eujayl et al. 2002; Gupta and Varshney 2000; Kantety et al. 2002; Saha et al. 2004; Varshney et al. 2002). Moreover, the abundance of CCG repeats is reported to be a specific feature of monocots and could be attributed to their increased GC content or to high frequencies of amino acids encoded by GC-rich codons (Morgante et al. 2002). Among the tetra-, penta- and hexa-nucleotide repeats, AGGG/CCCT (8.6%), CCGCG/CGCGG (9.5%) and CCGGCG/CCGGCG (8.9%) were the most abundant repeat classes, respectively (Supplementary Table S2). As with the tri-nucleotides, the dominance of GC-rich repeats among penta- and hexa-nucleotides could be attributed to the GC content of monocots.
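The paired class names used in this section (AG/CT, CCG/CGG and so on) group a motif with its rotations and with its reverse complement, because the same repeat reads differently depending on the starting base and the strand. A minimal Python sketch of that grouping is given below. The function names are hypothetical, and the naming convention (alphabetically smallest rotation of each strand, joined by a slash) is an assumption inferred from the class labels that appear in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


def rotations(motif):
    """All cyclic rotations of a motif, e.g. CCG -> CCG, CGC, GCC."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif):
    """Strand- and phase-independent class label for a repeat motif."""
    motif = motif.upper()
    forward = min(rotations(motif))
    reverse = min(rotations(reverse_complement(motif)))
    # Both strands are always shown, even when they give the same label.
    return "/".join(sorted([forward, reverse]))


if __name__ == "__main__":
    for m in ["GA", "TC", "GCC", "CGG", "CCCT", "CGCCGG"]:
        print(m, "->", motif_class(m))
```

With this convention GA and TC both fall into AG/CT, and GCC and CGG into CCG/CGG. The hexa-nucleotide CCGGCG is a case where the reverse complement is a rotation of the motif itself, which is why its label repeats the same string on both sides of the slash.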

Table 3 Frequencies of different repeat motifs of di- and tri-nucleotide repeats in pearl millet EST-SSRs

The SSRs were further classified into class I (≥20 bp) and class II (≥10 but <20 bp) repeats on the basis of their length. Class II repeats (3786) were more numerous than class I repeats (724) (Table 4). Similar observations were recorded by Rajaram et al. (2013) in the pearl millet drought transcriptome. The average interval between SSRs (kb/SSR) was the largest for di-nucleotide motifs, followed by tetra-nucleotide and penta-nucleotide repeats. The smallest interval, found for tri-nucleotide repeat motifs, is consistent with their being the most numerous in the pearl millet transcripts.
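The class I/class II split can be written as a small Python function. The sketch assumes that the length in question is the total repeat tract (motif length multiplied by the number of repeats); the function name ssr_class is hypothetical.

```python
def ssr_class(motif_len, repeats):
    """Classify an SSR by total tract length in bp.

    class I: >= 20 bp; class II: >= 10 bp and < 20 bp.
    Returns None for tracts shorter than 10 bp.
    """
    tract_len = motif_len * repeats
    if tract_len >= 20:
        return "class I"
    if tract_len >= 10:
        return "class II"
    return None


if __name__ == "__main__":
    print(ssr_class(2, 7))   # 14 bp
    print(ssr_class(3, 7))   # 21 bp
    print(ssr_class(6, 3))   # 18 bp
```

Under the mining thresholds used in this study, the shortest tract that can be reported is 14 bp (a di-nucleotide repeated seven times), so every mined SSR falls into one of the two classes.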

Table 4 Classification of EST-SSR according to the length of motif

Functional annotation of transcripts harboring EST-SSRs

Transcripts harboring SSRs were functionally annotated to identify their function and significance in disease resistance, so that function could be correlated with the molecular markers (Supplementary Table 3). Functional annotation was therefore a prerequisite for selecting SSRs for primer design and validation with respect to downy mildew resistance.

Of the 3586 transcripts subjected to BLASTX against the non-redundant database, 2566 were annotated (Supplementary Figure S1). The annotated transcripts showed the maximum homology with Setaria italica (2510), followed by Zea mays (184) and Sorghum bicolor (167) (Supplementary Figure S2). Pearl millet and foxtail millet belong to the same millet family and share common ancestry, which could explain why the top BLAST hits of pearl millet were to foxtail millet (Devos et al. 2000).

The functionally annotated SSR-harboring transcripts were categorized into cellular component, molecular function and biological process ontologies (Fig. 2). In the cellular component ontology, the largest number of transcripts was associated with plastid, GO:0009536 (450), followed by mitochondria, GO:0005739 (275), plasma membrane, GO:0005886 (160) and cytosol, GO:0005829 (100). In the molecular function ontology, the largest number of transcripts was attributed to nucleotide binding, GO:0000166 (501), followed by protein binding, GO: (380), DNA binding, GO:0003677 (300) and kinase activity, GO:0016301 (200). In the biological process ontology, the largest number of transcripts was represented by transport, GO:0006810 (337), followed by transcription, DNA-templated, GO:0009056 (289), cellular protein modification, GO:0006464 (287) and response to stress, GO:0016043 (285).

Fig. 2 Gene ontology distribution of EST-SSR harboring pearl millet transcripts into cellular component, molecular function and biological process

Validation of EST-SSRs and polymorphism

The SSR markers were selected from functionally annotated transcripts that harbored SSR motifs and were associated with downy mildew resistance genes previously reported in the pearl millet-downy mildew interaction and similar plant-pathogen interactions (Kini et al. 2000; Shivakumar et al. 2000; Geetha and Shetty 2002; Shivakumar et al. 2003; Chandrashekhara et al. 2010). EST-SSR primers were thus selected for disease resistance genes, pathogenesis-related proteins, transcription factors and other genes involved in disease resistance. Of the 3586 transcripts subjected to primer design, 2419 primer pairs were successfully designed from 2321 transcripts under stringent criteria, of which 230 primer pairs were selected on the basis of transcript function in plant-pathogen interaction (Supplementary Table S4); the remaining 2586 EST-SSR markers are listed in Supplementary Table S5. The newly developed EST-SSR markers were named PMES1-PMES230 and were validated for amplification on twelve parental genotypes. Of the 230 EST-SSR markers, 160 amplified a product of the size expected from the design parameters, and 33 of these (20.62%) were polymorphic (Fig. 3).

Fig. 3 Electrophoretic profile of polymorphic EST-SSR markers run on 3% MetaPhor agarose

The PIC of the newly developed EST-SSRs ranged from 0.188 to 0.375, with an average of 0.354 (Table 5). A total of 69 polymorphic loci, with an average of 2.2 alleles, were detected across the surveyed panel of twelve genotypes. Eight primer pairs (PMES-016, PMES-050, PMES-053, PMES-061, PMES-065, PMES-082, PMES-086, PMES-110, PMES-158 and PMES-159) produced three alleles, and the remaining 25 polymorphic primer pairs displayed two alleles across the panel of twelve genotypes. The PIC value of a marker provides an estimate of its discriminating power in a set of accessions by taking into account not only the number of alleles but also the relative frequency of each allele (Smith et al. 2000). Although the PIC values were slightly lower than the polymorphism levels reported for pearl millet SSR markers in earlier studies, it is well documented that EST-derived SSRs are less polymorphic than those derived from genomic libraries (Senthilvel et al. 2008; Rajaram et al. 2013). Previous studies using EST-SSR markers for genetic diversity analysis in other crops reported PIC values of 0.44 in bread wheat and 0.45 in barley (Gupta and Varshney 2000; Thiel et al. 2003). The average PIC value was also lower than the PIC values of genomic SSR markers (0.4–0.9) in pearl millet, sorghum and foxtail millet (Caniato et al. 2007). EST-SSR markers are generally less polymorphic and yield fewer alleles and lower PIC values than genomic SSRs (Varshney et al. 2009). In the present study, tri-nucleotide repeat markers were more polymorphic than di-, tetra- and penta-nucleotide repeat-based markers, as observed previously in pearl millet (Gerard et al. 2006; Senthilvel et al. 2008; Rajaram et al. 2013).

Table 5 Details of 33 EST-SSR primers showing polymorphism among 12 pearl millet genotypes

On the basis of the functions assigned by Blast2GO annotation, eight transcripts harboring SSR motifs and coding for peroxidase 16-like, transcription factor myc7e, wound stress protein precursor, peroxidase 16-like, zinc finger ccch domain-containing protein 11-like, wrky transcription factor and beta purothionin were found to be more polymorphic than the rest of the SSR markers. Functional or gene-based SSR and SNP primers underlying drought tolerance have been reported in pearl millet (Rajaram et al. 2013; Sehgal et al. 2012); in this study, we present a preliminary report of functional markers associated with downy mildew resistance. The polymorphic EST-SSR markers could be further validated in mapping populations segregating for downy mildew resistance in pearl millet, and such information can then be deployed for QTL mapping of disease resistance.

Given the economic losses caused by downy mildew in pearl millet and the limited information hitherto available on gene/EST-SSRs, the primers developed in the present study add to the existing repertoire of data for pearl millet. The polymorphic EST-SSR markers among the 2419 EST-SSRs developed in the present study can be used in mapping populations segregating for downy mildew resistance to advance marker-assisted breeding for downy mildew resistance in pearl millet, and for association mapping, genetic diversity assessment, genome mapping and saturation of the pearl millet genetic map. The EST-SSR markers can also be employed for preparing a “transcript map” and tested for transferability to other minor millets for which fewer genetic resources are available.