Introduction

Grain legumes are among the most important crops in many countries and provide one-third of human dietary protein. Most of the economically important grain legumes belong to the genus Vigna in the tribe Phaseoleae. The genus Vigna consists of seven subgenera, namely Vigna, Haydonia, Lasiocarpa, Sigmoidotropis, Plectotropis, Ceratotropis and Macrorhyncha (Marechal et al. 1978). The African subgenus Vigna contains the important species cowpea [Vigna unguiculata (L.) Walp.] and the subgenus Ceratotropis includes about 16 species distributed across Asia, including mungbean [Vigna radiata (L.) Wilczek], azuki bean [Vigna angularis (Willd.) Ohwi & Ohashi], blackgram [Vigna mungo (L.) Hepper], rice bean [V. umbellata (Thunb.) Ohwi & Ohashi] and moth bean [V. aconitifolia (Jacq.) Marechal]. The species belonging to the subgenera Vigna and Ceratotropis are significant as they provide an important and inexpensive source of dietary protein to the people of Asia and Africa. Despite systematic and continuous efforts through conventional breeding approaches, substantial yield gains in these grain legumes have not been achieved. Further, the yields of these crops are severely affected by many biotic factors like yellow mosaic virus, powdery mildew and bruchid infestation.

Molecular or genetic markers are powerful tools for genetic research and breeding. A variety of molecular marker techniques such as restriction fragment length polymorphism (RFLP, Botstein et al. 1980), random amplified polymorphic DNA (RAPD, Williams et al. 1990; Welsh and McClelland 1990), inter simple sequence repeat (ISSR, Zietkiewicz et al. 1994), microsatellite or simple sequence repeat (SSR, Tautz and Renz 1984) and amplified fragment length polymorphism (AFLP, Vos et al. 1995) have been developed for gene tagging, diversity analysis and genetic linkage map construction in many crop plants. The availability of large expressed sequence tag (EST) databases for many crop species provides a valuable resource for the development of molecular markers like intron length polymorphism (ILP). Introns are non-coding sequences of genes and are less conserved than the exons. The variation in the intron sequences can be exploited as molecular markers (Wang et al. 2005). For developing the ILP markers, intron positions are determined by comparing the EST sequences with the genomic sequences of a model plant. Once the intron is identified, PCR primers are designed for the flanking region so as to amplify the intronic region. ILP markers, being part of the genes, are more useful as genetic markers because they represent variation in the transcribed portion of the genome. ILP markers have been developed in many plant species including soybean (Shu et al. 2010), rice (Wang et al. 2005), tomato (Wang et al. 2010), medicago (Choi et al. 2004) and foxtail millet (Gupta et al. 2011). Like other molecular markers, ILP markers can be used for a variety of applications like molecular mapping, gene tagging, genetic diversity analysis and comparative studies. In addition, ILP markers show a high rate of transferability to related species owing to a higher conservation of EST sequences across species (Wang et al. 2005; Yang et al. 2007). Therefore, ILP markers developed in one species can be used in a related species for which sufficient genomic resources for marker development are not available.

Despite the importance of Vigna species, molecular breeding research in these crops still lags behind that in other grain legumes such as common bean and soybean. There is an urgent need to develop a large number of molecular markers in Vigna species, which breeders can use in various molecular breeding strategies to increase breeding efficiency. Therefore, the present study was conducted with the aim of developing ILP markers in cowpea and evaluating their transferability to other Vigna species.

Materials and methods

Plant material and DNA extraction

Ten cowpea genotypes and eight other Vigna species used in the study are listed in Table 1. Total genomic DNA was extracted from 10-day-old seedlings using the modified CTAB method. The quality of DNA was checked on 1 % agarose gel and the quantity was determined at 260 nm using a UV spectrophotometer (Unicam UV 300, UK).

Table 1 List of Vigna species used in the study along with their accession number and source

EST sequence retrieval and primer design

Unigene sequences of cowpea were downloaded from the HarvEST:Cowpea assembly P12 (http://harvest.ucr.edu). These sequences were aligned with the Arabidopsis thaliana and soybean (Glycine max) genomic sequences using the WebGMAP software (Liang et al. 2009) to predict the intron positions. For aligning EST sequences with the genomic sequences, the inter-species mapping option was used (i.e. only hits that satisfy minimum matched query length of 30 nucleotides, minimum matched query identity of 30 % and minimum matched coverage of query sequence of 10 % were considered valid hits). The position of introns was determined by identifying the gaps in the cowpea ESTs. Primer3 software (http://frodo.wi.mit.edu/primer3/) was used to design PCR primer pairs from highly conserved EST sequences to amplify the intronic regions. For designing primers, user-defined parameters were used, viz. optimum primer length was 20 mer (range 18–25 mer), optimum annealing temperature was 60 °C (range 55–62 °C) and optimum GC content was 50 % (range 30–80 %); the rest of the parameters were left at their default values.
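The gap-based intron prediction and the primer length and GC limits described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the WebGMAP or Primer3 implementation: it assumes each alignment has already been reduced to a list of exon blocks on the forward strand, and the names ExonBlock, predict_introns and within_design_limits, as well as all coordinates and the primer sequence, are hypothetical. The annealing temperature criterion is deliberately left out.

```python
from typing import List, NamedTuple


class ExonBlock(NamedTuple):
    """One aligned block of an EST on the reference genome (1-based, inclusive)."""
    est_start: int
    est_end: int
    genome_start: int
    genome_end: int


class Intron(NamedTuple):
    est_position: int  # last EST base before the intron insertion site
    length: int        # intron length on the reference genome, in bp


def predict_introns(blocks: List[ExonBlock]) -> List[Intron]:
    """Report the genomic gap between each pair of consecutive exon blocks."""
    ordered = sorted(blocks, key=lambda b: b.est_start)
    introns = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.genome_start - left.genome_end - 1
        if gap > 0:
            introns.append(Intron(est_position=left.est_end, length=gap))
    return introns


def gc_percent(primer: str) -> float:
    """GC content of a primer, in percent."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def within_design_limits(primer: str) -> bool:
    """Check the length (18-25 mer) and GC content (30-80 %) ranges from the text."""
    return 18 <= len(primer) <= 25 and 30.0 <= gc_percent(primer) <= 80.0


# Toy alignment (made-up coordinates): three exon blocks, hence two introns.
toy_blocks = [
    ExonBlock(1, 120, 1000, 1119),
    ExonBlock(121, 260, 1274, 1413),
    ExonBlock(261, 400, 1853, 1992),
]
toy_introns = predict_introns(toy_blocks)

# Toy primer (made-up sequence): 20 bases, half of them G or C.
toy_primer_ok = within_design_limits("ATGCATGCATGCATGCATGC")
```

In this toy case the two predicted introns are 154 bp and 439 bp long, and primers would be placed in the exon blocks on either side of an insertion site.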

ILP marker analysis

PCR reactions were performed in a 20-μl volume containing 10 mM Tris-HCl (pH 9.0), 50 mM KCl, 1.5 mM MgCl2, 0.2 mM of each dNTP, 0.5 unit Taq DNA polymerase (Bangalore Genei, Bangalore, India), 50 ng template DNA, and 20 ng each of forward and reverse primer. PCR amplifications were performed in an Eppendorf Mastercycler Gradient (Eppendorf, Hamburg, Germany) using the following thermal profile: one cycle of 95 °C for 2 min, followed by 39 cycles of 94 °C for 30 s, 50–65 °C for 30 s (depending on the primer), 72 °C for 1 min and a final extension at 72 °C for 7 min.

To evaluate the allelic variation among the cowpea genotypes, PCR products were resolved on a denaturing polyacrylamide gel. The PCR products were mixed with an equal amount of loading buffer (98 % formamide, 10 mM EDTA, 0.1 % bromophenol blue, 0.1 % xylene cyanol), denatured for 3 min at 94 °C and then kept on ice for 2 min. About 5 μl of each reaction mixture was electrophoresed on a 6 % denaturing polyacrylamide gel containing 7 M urea in 1× Tris-borate-EDTA buffer. Electrophoresis was performed at a constant power of 50 W for about 2 h in a Sequi-Gen GT Sequencing system (Bio-Rad, USA). Gels were stained using a modified silver staining protocol as described in Gupta and Gopalakrishna (2010). The gel was scanned on a photo scanner and the data was scored manually on a white light illuminator.

To assess the transferability of cowpea ILP markers across Vigna species, ILP markers were screened on eight other Vigna species. PCR products were resolved on a 2 % agarose gel in 1× Tris-borate-EDTA buffer, stained with ethidium bromide and photographed on a gel documentation system (Syngene, UK). To check whether the PCR products amplified the same target gene, the PCR product amplified by the marker CILP66 in different Vigna species was cloned using the InsTAclone™ PCR cloning kit (Fermentas, Germany) and sequenced using the M13 universal primer. The multiple sequence alignment was performed using the ClustalW2 program (http://www.ebi.ac.uk/Tools/msa/clustalw2/).

Statistical analysis

Allelic variation was calculated from the frequencies of genotypes at each locus as the polymorphism information content (PIC). Scoring of bands was done as presence (1) or absence (0) for each locus. The PIC of each cowpea ILP marker was calculated by applying the formula of Anderson et al. (1993): $\mathrm{PIC}_i = 1 - \sum_{j} P_{ij}^{2}$, where $P_{ij}$ is the frequency of the jth allele for the ith locus. Genetic similarity analyses were performed using NTSYS-pc version 2.0 (Rohlf 1998). Data was entered in a binary matrix as discrete variables and pair-wise similarities were obtained using Jaccard’s coefficient. The matrices of similarity coefficients were subjected to the unweighted pair group method with arithmetic average (UPGMA) to estimate the genetic relatedness among the genotypes and generate the dendrogram.
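The PIC formula and Jaccard's coefficient described above can be illustrated with a short sketch. It is not the NTSYS-pc implementation; the function names and the toy allele frequencies and band profiles are assumptions made purely for illustration.

```python
from typing import Sequence


def pic(allele_freqs: Sequence[float]) -> float:
    """PIC for one locus: 1 minus the sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity of two 0/1 band profiles: shared bands / bands in either."""
    if len(a) != len(b):
        raise ValueError("band profiles must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0


# Toy data, not taken from the study.
pic_two_alleles = pic([0.5, 0.5])         # two equally frequent alleles
pic_three_alleles = pic([0.5, 0.3, 0.2])  # three alleles, unequal frequencies

genotype_a = [1, 0, 1, 1, 0, 1]
genotype_b = [1, 1, 0, 1, 0, 1]
similarity = jaccard(genotype_a, genotype_b)  # 3 shared bands out of 5 present
```

Note that bands absent from both genotypes do not enter Jaccard's coefficient, which is why the fifth position in the toy profiles has no effect on the result.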

Results

Intron identification

A total of 12,117 cowpea unigene sequences were aligned with the genomic sequences of Arabidopsis and soybean. With the Arabidopsis genome, 2,926 cowpea ESTs showed significant hits and 778 ESTs had one or more intron insertion sites. The total number of introns identified with Arabidopsis was 1,539 with an average of 1.97 introns per EST. In the case of soybean, 10,273 cowpea ESTs showed a significant match and 5,914 ESTs with 14,885 introns were identified with an average of 2.51 introns per EST. The maximum number of introns present in an EST was eight in the case of Arabidopsis and ten in the case of soybean. One hundred ESTs were randomly selected for intron number and size comparison. In these 100 ESTs, 190 introns were identified in the case of Arabidopsis and 359 introns in the case of soybean. Only 12 % of ESTs had the same number of introns in both species, while 88 % of ESTs had more introns in soybean than in Arabidopsis. Introns were much longer in soybean than in Arabidopsis: intron length varied from 53 to 1,104 bp in Arabidopsis with an average of 154.6 bp, while in soybean it varied from 73 to 9,255 bp with an average of 439 bp. In Arabidopsis, 79.5 % of introns were less than 200 bp and only 1.6 % of introns were larger than 500 bp (Fig. 1), while in soybean 50 % of introns were less than 200 bp and 31.7 % of introns were larger than 500 bp.
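The per-EST averages and the size-class percentages reported above follow from simple counting, sketched below. The intron counts are those given in the text; the list of intron lengths is made up, and the treatment of the 200 bp and 500 bp boundaries is an assumption, since the text does not state it.

```python
from typing import Dict, Iterable


def size_class_percentages(lengths: Iterable[int]) -> Dict[str, float]:
    """Percentage of introns below 200 bp, from 200 to 500 bp, and above 500 bp."""
    counts = {"<200": 0, "200-500": 0, ">500": 0}
    total = 0
    for n in lengths:
        if n < 200:
            counts["<200"] += 1
        elif n <= 500:  # assumed boundary handling: 200 and 500 fall in the middle class
            counts["200-500"] += 1
        else:
            counts[">500"] += 1
        total += 1
    if total == 0:
        raise ValueError("no intron lengths supplied")
    return {label: 100.0 * count / total for label, count in counts.items()}


# Introns per EST, from the counts reported in the text.
arabidopsis_avg = 1539 / 778   # about 1.98; the text reports 1.97
soybean_avg = 14885 / 5914     # about 2.52; the text reports 2.51

# Toy intron lengths (made up) to show the binning.
toy_distribution = size_class_percentages([53, 90, 154, 180, 250, 439, 620, 1104])
```

The small differences between the computed and reported averages are consistent with the reported values having been truncated rather than rounded.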

Fig. 1 Distribution of intron sizes in Arabidopsis and soybean based on cowpea ESTs

ILP marker analysis

A total of 110 PCR primer pairs targeting one or more introns were developed from randomly chosen cowpea EST sequences. Of the 110 PCR primer pairs, 98 primer pairs successfully amplified a single marker fragment in the two cowpea genotypes GC-3 and V-130. Ten primer pairs did not amplify and two primer pairs produced multiple fragments. Based on the size estimation, all PCR products were found to be larger than the targeted EST sequence, suggesting the presence of intron(s) in the amplified products. These markers were named CILP (cowpea intron length polymorphism) markers and details are provided in the online resource (Supplementary Table S1). The EST sequences used for ILP marker development were functionally annotated based on similarity against the non-redundant protein database in GenBank using the BLASTX search tool (http://blast.ncbi.nlm.nih.gov/Blast.cgi). A putative function was successfully assigned to about 90 % of sequences based on the most informative high-scoring hit and 10 % of ESTs had an unknown or hypothetical function.

Intraspecific variability among ten cowpea genotypes was examined with 45 randomly selected CILP markers. Sixteen CILP markers (36 %) were found to be polymorphic between the genotypes and collectively yielded 33 alleles with an average of 2.0 alleles per locus. The number of alleles varied from one to three. The PIC of individual loci ranged from 0.18 to 0.64, with a mean value of 0.34 (Supplementary Table S2). The UPGMA analysis distributed the ten genotypes into three main clusters with Jaccard’s similarity coefficient ranging from 0.34 to 0.88 (Fig. 2). The first cluster consisted of seven cowpea genotypes (Arka Suman, C-352, CO(CP)7, GC-3, EC634640, GC-4 and IC349906) and genotypes GC-4 and IC349906 had the least distance among the ten genotypes. The second cluster had only one genotype (V-240) and the third cluster contained two genotypes (V-130 and Pusa Phalguni).
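The clustering step behind such a dendrogram can be sketched in a few lines. This is a minimal pure-Python illustration of UPGMA (average linkage), not the NTSYS-pc routine used in the study; the genotype labels A to D and their distances (taken as 1 minus Jaccard's similarity) are made up, and ties are broken simply by taking the first minimal pair.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def upgma(names: Sequence[str],
          dist: Dict[FrozenSet[str], float]) -> List[Tuple[Cluster, Cluster, float]]:
    """Repeatedly merge the two clusters with the smallest average pairwise distance.

    dist maps frozenset({a, b}) to the distance between original items a and b.
    Returns the merges in order as (cluster_1, cluster_2, average_distance).
    """
    def average(c1: Cluster, c2: Cluster) -> float:
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters: List[Cluster] = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


# Toy distances (1 - Jaccard similarity) among four hypothetical genotypes.
toy_dist = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.6,
    frozenset(("A", "D")): 0.7,
    frozenset(("B", "C")): 0.5,
    frozenset(("B", "D")): 0.8,
    frozenset(("C", "D")): 0.3,
}
toy_merges = upgma(["A", "B", "C", "D"], toy_dist)
```

With these toy distances, A and B join first, C and D join next, and the two pairs are finally merged at the mean of the four between-pair distances.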

Fig. 2 Dendrogram showing genetic relatedness among ten cowpea genotypes based on cowpea ILP markers using Jaccard’s similarity coefficient

Cross-species transferability of ILP markers

To check the transferability of CILP markers across Vigna species, all 98 CILP markers were screened on eight Vigna species. Ninety-three CILP markers (95 %) were found to be transferable to other Vigna species and five markers (5 %) were cowpea-specific (Supplementary Table S3). Of the 93 transferable CILP markers, 77 were transferable to all eight Vigna species, three markers to seven Vigna species, five markers to six Vigna species and eight markers to fewer than six Vigna species. Most of the CILP markers produced significant length variation among the Vigna species (Fig. 3). The sequence of the cloned product from the marker CILP66 showed that the same target gene was amplified in all the Vigna species. There were three introns present in the amplified fragments, as revealed earlier by WebGMAP analysis. The exonic regions were highly conserved among the Vigna species and large differences, including length variation and point mutations, were observed in the intronic regions (Supplementary Fig. S1).
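The transferability figures above can be cross-checked with a short tally. The counts are those reported in the text; the variable names and dictionary labels are only illustrative.

```python
# Counts as reported in the text.
screened = 98
transferable_by_species_count = {
    "all eight species": 77,
    "seven species": 3,
    "six species": 5,
    "fewer than six species": 8,
}

# Markers that amplified in at least one other Vigna species.
transferable = sum(transferable_by_species_count.values())

# Markers that amplified only in cowpea.
cowpea_specific = screened - transferable

# Percentages rounded to whole numbers, as in the text.
pct_transferable = round(100 * transferable / screened)
pct_cowpea_specific = round(100 * cowpea_specific / screened)
```

The four categories add up to the 93 transferable markers, leaving five cowpea-specific markers, which matches the reported 95 % and 5 %.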

Fig. 3 Amplification of ILP marker CILP66 in different Vigna species. Lanes M 100-bp DNA ladder, 1 V. vexillata, 2 V. umbellata, 3 V. glabrescens, 4 V. aconitifolia, 5 V. mungo, 6 V. radiata, 7 V. angularis, 8 V. trilobata, 9 V. unguiculata

The 93 CILP markers showing transferability collectively yielded 168 alleles. The dendrogram constructed from the similarity matrix clearly separated the Vigna species into distinct lineages (Fig. 4). Cowpea and its wild relative V. vexillata, which belong to the subgenus Vigna, were placed separately from the other species, which belong to the subgenus Ceratotropis (V. radiata, V. mungo, V. angularis, V. glabrescens, V. aconitifolia, V. umbellata and V. trilobata).

Fig. 4 Dendrogram showing phylogenetic relationships among different Vigna species based on CILP markers

Discussion

Precise prediction of intron position is critical for the successful amplification of introns from genomic DNA in a species. In this study, cowpea ESTs were aligned with Arabidopsis and soybean genomic sequences to predict the position of introns. A total of 778 cowpea ESTs carrying 1,539 introns were identified with reference to the Arabidopsis genome and 5,914 ESTs harboring 14,885 introns were identified with reference to soybean genomic sequences. The density of introns is an important feature of genome architecture. Intron density varies by several orders of magnitude between species, mostly involving extensive intron gain or loss (Carmel et al. 2007; Roy 2006). However, the mechanism and evolutionary forces responsible for such gains or losses are largely unknown. In the present study, there were large differences in the size of introns between Arabidopsis and soybean (Fig. 1). The average size of introns in soybean (439 bp) was much larger than in Arabidopsis (154.6 bp). Wang et al. (2010) also observed vast differences in the size of introns between tomato and Arabidopsis. Various factors like insertion of transposable elements (Bartolomé et al. 2002) and frequency and size of deletion events (Petrov et al. 2000) may lead to changes in intron size.

For designing the PCR primers, those cowpea EST sequences were chosen in which the intron insertion position was the same in both Arabidopsis and soybean. In the study, 98 of the 110 primer pairs were successfully characterized. DNA sequence analysis indicated that six of the ten primer pairs that failed to amplify had targeted introns of size greater than 1 kb, indicating that large product size may be the major reason for PCR failure. Wang et al. (2010) have also shown that the success rate of intron amplification in tomato depends on the product size and decreases as the product size increases. The success rate of ILP primer amplification in this study (89 %) was comparable to that of ILP markers in other plants like tomato (71 %; Wang et al. 2010) and soybean (88.2 %; Shu et al. 2010). Based on significant homology of the target ESTs to the reported proteins, 90 % of the ILP markers were categorized into different functional classes (dehydrogenases, helicases, reductases, phosphatases, kinases, ATPases, transcription factors, transport proteins, receptor proteins etc.).

Forty-five cowpea ILP markers tested on the ten cowpea genotypes were successfully amplified in all the genotypes and 36 % produced length polymorphisms. The PIC value of the polymorphic markers ranged from 0.18 to 0.64 with an average of 0.34, which was higher than that of ILP markers in foxtail millet (Gupta et al. 2011) and lower than that reported in rice (Huang et al. 2010).

The ten cowpea genotypes were grouped into three main clusters based on UPGMA analysis, with Jaccard’s similarity coefficient ranging from 0.34 to 0.88 (Fig. 2). Earlier, using unigene-based microsatellite markers, 20 cowpea genotypes (which included the ten genotypes from the current study) were grouped into six clusters with Jaccard’s coefficient ranging from 0.33 to 0.77 (Gupta and Gopalakrishna 2010). In both studies, Arka Suman and C-352 were placed in the same group. Similarly, GC-3 and CO(CP)7, and V-130 and Pusa Phalguni, were placed together. However, there were some differences in the clustering of genotypes between the two studies. These differences may be mainly due to variation in the number of genotypes and the number of markers used in the two studies. Nevertheless, the comparable similarity coefficients in the two studies indicate that ILP markers detect substantial variation between cowpea genotypes and can be used for germplasm characterization and genetic diversity studies. ILP markers have been used to detect genetic diversity in other plants such as rice (Zhao et al. 2009), soybean (Shu et al. 2010) and maize (Holland et al. 2001).

In this study, 95 % of the CILP markers were found to be transferable to other Vigna species. Significant differences in amplified product size among Vigna species were observed with most of the CILP markers (Fig. 3). The high transferability of these CILP markers suggests the conserved nature of exons in Vigna species. This was further confirmed by sequencing and multiple alignment of the cloned PCR product, which showed that the homologous regions were amplified in all the Vigna species. Exonic regions were highly conserved, while large variability, including additions, deletions and point mutations, was observed in the intronic regions, which resulted in length variation and polymorphism among the Vigna species (Supplementary Fig. S1). Similar observations have been made in rice (Wang et al. 2005) and foxtail millet (Gupta et al. 2011). The highly conserved exonic regions and the highly variable intronic regions make ILP markers useful across species and genera, thereby increasing their utility and markedly decreasing the developmental cost. The dendrogram generated based on CILP markers clearly separated the eight Vigna species into distinct lineages (Fig. 4). For example, V. vexillata was clustered close to V. unguiculata, which was consistent with the earlier RFLP (Fatokun et al. 1993) and gene-derived marker (Wang et al. 2008) studies. Similarly, closely related Vigna species (V. radiata and V. mungo; V. angularis and V. umbellata) were clustered together, which was in complete agreement with the taxonomic classification (Verdcourt 1970). These results indicate that cowpea ILP markers are efficient in revealing phylogenetic relationships. ILP markers share many advantages of SSR markers like locus specificity, codominant nature, high reproducibility and convenient detection. In addition, being functional markers, ILP markers will directly reflect the variation within the genes and would be more useful in gene tagging, genetic map construction and comparative mapping studies.

In conclusion, 98 ILP markers were developed in the study based on cowpea ESTs and these markers showed high transferability to other Vigna species. The number of these ILP markers in Vigna species could be further increased by utilizing the complete Vigna EST database. These markers will be of great importance in genome analysis and molecular breeding of cowpea and other Vigna species.