Introduction

Pigeonpea [Cajanus cajan (L.) Millspaugh] is an important food legume crop of the semi-arid tropical regions of Africa and Asia. In spite of breeding a number of pure line varieties, the productivity of pigeonpea has remained low at around 780 kg/ha for the last five decades (http://faostat.fao.org/). In order to achieve a breakthrough in the productivity of pigeonpea, a hybrid breeding technology based on elements of the cytoplasmic-nuclear male-sterility system has been implemented in the pigeonpea breeding programme at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) (Saxena et al. 2005; Saxena 2008). Besides giving a 30–35% yield advantage, the pigeonpea hybrids also have greater tolerance to drought and resilience against some common biotic and abiotic stresses. It is, therefore, anticipated that the area under pigeonpea hybrids will increase substantially in the near future and will contribute towards increasing the crop production. However, this is possible only when the hybrid breeding system is versatile and is adapted to different agro-ecological zones and cropping systems.

To exploit the full potential of heterosis and enhance the adaptation of hybrids to different regions, it is important to have genetic diversity information as well as phenotypic information on parental lines. Prior selection of crossing parents on the basis of genetic divergence, instead of evaluating F1, F2, and advanced generations, may help breeders to concentrate their efforts on the most promising combinations. Based on molecular marker assays, genetic distance has been estimated and heterosis predicted in several crop species such as rice (Zhang et al. 1994), maize (Lanza et al. 1997; Barbosa et al. 2003) and sorghum (Jordan et al. 2003). In the case of pigeonpea, although the hybrid breeding programme at ICRISAT has been developing and using a number of hybrid parental lines, no genetic diversity information is available for them. This may be attributed partly to the unavailability of adequate molecular marker resources in the past. Therefore, estimation of genetic diversity in parental lines should be useful for selecting crossing parental genotypes that may enhance the hybrid vigour.

Another important challenge in sustainable hybrid production is managing the supply of adequate quantities of pure hybrid seeds to the farmers. As with hybrids of any crop, a high level of genetic purity is essential for maintaining the level of heterosis observed in pigeonpea. Traditionally, breeders or seed companies carry out Grow-Out Tests (GOTs) on representative samples of the seed lot to assess the purity of hybrid seeds. The GOT involves growing plants to maturity and assessing several morphological and floral characteristics to determine the purity of the hybrid. As only one crop cycle can be grown annually in pigeonpea, it takes almost a full cropping season to assess the purity of hybrid seeds by conventional GOT. This, in turn, results in major capital investment in storing the seeds for the next cropping season. Furthermore, GOT can be subjective, as several aspects of plant phenotype can be affected by environmental conditions. Thus there is an urgent need for a precise and efficient assay in pigeonpea so that hybrid seeds produced in one season can be released for marketing and cultivation in the same season. In fact, molecular marker-based hybrid purity tests have been developed and are in routine use in many species such as rice (Jena and Pandey 1999; Yashitola et al. 2002; Sundaram et al. 2008), maize (Asif et al. 2006), cotton (Ali et al. 2008) and safflower (Naresh et al. 2009).

Because simple sequence repeat (SSR) markers have recently become available in reasonably good numbers in pigeonpea (see Varshney et al. 2010), they can be used for assessing genetic diversity in hybrid parental lines as well as for developing a molecular marker-based hybrid purity test. Therefore, the present study was undertaken with the following objectives: (1) molecular characterization of parental (A, B, and R) lines of selected hybrids; (2) elucidation of genetic relationships among A, B, and R lines of hybrids; (3) identification of the most suitable markers for assessing purity of ICPH 2438 hybrid seeds; and (4) detection of heterozygosity in hybrid parental lines.

Materials and methods

Plant material and DNA extraction

A total of 159 pigeonpea lines including 37 A (cytoplasmic male sterile), 38 B (maintainer), and 84 R (fertility restorer) lines that are in use in the hybrid breeding programme at ICRISAT were used for molecular characterization (Electronic Supplementary Material [ESM] Table 1). The A lines used in the present study represented different backcross generations (BC2 to BC9), while all the B and R lines used represented recombinant inbred lines (RILs). For identification of suitable markers for assessing the purity of the ICPH 2438 hybrid, one batch of seeds of this hybrid was obtained from the Pigeonpea Breeding Division of ICRISAT and another batch was obtained from Maharashtra State Seeds Corporation Limited (Mahabeej), Nagpur, India.

Genomic DNA was isolated and purified from leaves of 2- to 3-week-old single plants from each line following the protocol described in Cuc et al. (2008). The DNA quantity for each sample was assessed on 0.8% agarose gel and DNA concentrations were normalized to 5 ng/μl.

SSR markers and polymerase chain reactions (PCRs)

For molecular characterization of parental lines of hybrids, a total of 148 unlabelled primer pairs were used (ESM Table 2). For purity assessment of hybrid seeds and to detect the heterogeneity in the parental lines, labeled primer pairs for two selected polymorphic SSR markers (CCB4 and CCttc006) were used. The amplification conditions, however, remained the same in both cases.

PCRs were performed in a 5 μl reaction volume [0.5 μl of 10 × PCR buffer, 0.3 μl of 25 mM MgCl2, 0.5 μl of 2 mM dNTPs, 0.15 μl of 10 pM primer (MWG-Biotech AG, Bangalore, India), 0.3 U of Taq polymerase (Bioline, London, UK), and 1.0 μl (5 ng) of template DNA] in a 96-well microtitre plate (ABgene, Rockford, Illinois, USA) using a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems, Foster City, California, USA). A touch-down PCR programme was used to amplify the DNA fragments. An initial denaturation for 3 min at 95°C was followed by 10 cycles of denaturation for 20 s at 94°C, annealing for 20 s at 55°C (the annealing temperature being reduced by 1°C per cycle), and extension for 30 s at 72°C. Subsequently, 35 cycles of denaturation for 20 s at 94°C, annealing for 20 s at 48°C, and extension for 30 s at 72°C were used, followed by a 20 min final extension at 72°C.

Electrophoresis of PCR products

For diversity analysis, the amplification products obtained by using non-labeled primer pairs were separated on a 6% polyacrylamide gel and visualized by silver staining (Promega Silver Sequencing System, Wisconsin, USA) as described in Thudi et al. (2010). For purity assessment experiments, the amplification products obtained by using fluorescent dye-labeled primer pairs, together with Liz Gene Scan-500 labeled internal size standards, were analyzed on 36-cm capillaries with POP7 polymer on ABI 3130 Genetic Analyzer. Fragment analysis data were collected by the data collection software and pre-processed by GeneScan software version 3.7 (Applied Biosystems, Foster City, California, USA). GeneScan data were imported, converted to pseudogel images, and further analyzed by Genotyper software version 3.7 (Applied Biosystems, Foster City, California, USA).

Data scoring and analysis

For diversity analysis, marker profiles obtained on silver-stained polyacrylamide gels were scored manually. For purity assessment experiments, the allelic data obtained in bp were analyzed as allele sizes.

For understanding relationships among parental lines of hybrids, allelic data thus obtained were used to prepare a dissimilarity matrix and to construct a two-dimensional (2D) plot using the factorial analysis method with DARWIN V5.0.128 software (Perrier et al. 2003).
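As a rough illustration of how a dissimilarity matrix can be built from allelic data, the sketch below assumes a simple-matching index for diploid genotypes, i.e. one minus the average proportion of matching alleles per locus. The text does not state which index was selected in DARwin, so the index, the function names, and the genotype values are assumptions made only for illustration.

```python
def simple_matching_dissimilarity(g1, g2, ploidy=2):
    """Return 1 - mean proportion of matching alleles over shared loci.

    g1, g2: dict mapping marker name -> tuple of allele sizes (bp),
    or None for missing data. This index is an assumption; the paper
    does not name the index used.
    """
    shared = [m for m in g1 if g1[m] is not None and g2.get(m) is not None]
    if not shared:
        raise ValueError("no shared loci with data")
    total = 0.0
    for marker in shared:
        remaining = list(g2[marker])
        matches = 0
        for allele in g1[marker]:
            if allele in remaining:
                remaining.remove(allele)  # count each allele copy once
                matches += 1
        total += matches / ploidy
    return 1.0 - total / len(shared)


def dissimilarity_matrix(lines):
    """Pairwise dissimilarities for a dict of line name -> genotype."""
    names = sorted(lines)
    return {(a, b): simple_matching_dissimilarity(lines[a], lines[b])
            for a in names for b in names}


# Hypothetical genotypes at two loci (allele sizes in bp)
lines = {
    "line_A": {"m1": (228, 228), "m2": (293, 293)},
    "line_B": {"m1": (228, 228), "m2": (293, 293)},
    "line_R": {"m1": (220, 220), "m2": (290, 290)},
    "line_X": {"m1": (228, 228), "m2": (290, 290)},
}
matrix = dissimilarity_matrix(lines)
print(matrix[("line_A", "line_X")])  # 0.5
```

A matrix of this kind is the input to an ordination method such as the factorial analysis mentioned above; the ordination step itself is not reproduced here.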

The polymorphism information content (PIC) refers to the value of a marker for detecting polymorphism within a given germplasm, depending on the number of detectable alleles and the distribution of their frequency. In the present study, the PIC value of markers was calculated using the following formula (Anderson et al. 1993):

$$ PIC = 1 - \sum\limits_{i = 1}^{k} {P_{i}^{2} } $$

where $k$ is the total number of alleles detected for a given marker locus and $P_{i}$ is the frequency of the $i$th allele in the lines analyzed.
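The PIC formula above can be evaluated directly from the allele calls at one marker locus. The following is a minimal sketch; the function name and the example allele calls are hypothetical and are not data from this study.

```python
from collections import Counter


def pic(alleles):
    """PIC = 1 - sum(P_i^2) over the k alleles observed at one locus.

    alleles: iterable of allele calls (e.g. sizes in bp), one entry per
    allele scored across the lines analyzed.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls supplied")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical calls: two alleles at equal frequency
print(pic([228, 228, 220, 220]))  # 0.5
```

A marker with a single allele gives a PIC of 0, and the value approaches 1 as the number of equally frequent alleles grows.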

Results

Microsatellite (SSR) polymorphism in hybrid parental lines

A total of 148 SSR markers, including 116 available in the public domain and 32 newly developed at ICRISAT and reported for the first time in this study (ESM Table 2), were used for characterization of the 159 parental lines (ESM Table 1) of hybrids. These lines included 37 A (cytoplasmic male sterile) lines, 38 B (maintainer) lines, and 84 R (fertility restorer) lines that are being used in the hybrid breeding programme at ICRISAT. Of the 148 markers, 41 (Table 1) showed polymorphism among the 159 lines analyzed. These polymorphic markers amplified a total of 130 alleles with an average of 3.1 alleles per marker in the lines surveyed (Table 1). The largest group of markers (17) amplified two alleles, while a maximum of six alleles were amplified by two markers (CCtta011 and CCtc002). The PIC values calculated for these 41 polymorphic markers ranged from 0.01 (CCac007 and CCac031) to 0.81 (CCtta011) with an average of 0.41 per marker (Table 1).

Table 1 SSR polymorphism among male sterile (A), maintainer (B), and restorer (R) lines

Among 37 A lines studied, the 40 polymorphic markers amplified a total of 106 alleles with a range of 2 (21 markers) to 6 (CCtta011) with an average of 2.6 alleles per marker. The PIC values, across A lines, ranged from 0.05 (ICPM131) to 0.75 (CCtta011) with an average of 0.34. In the case of 38 B lines, the 34 polymorphic markers amplified a total of 94 alleles with a range of 2 (17 markers) to 6 (CCtta011) with an average of 2.7 alleles per marker. The PIC values across B lines varied from 0.06 (CCgtt004) to 0.78 (CCtta011) with an average of 0.39 per marker. In the case of 84 R lines, 115 alleles were obtained by 39 polymorphic markers with a range of 2 (19 markers) to 6 (CCtta011 and CCtc002) with an average of 2.9 alleles per marker and the PIC values varied from 0.03 (CCtta008) to 0.78 (CCtta011) with an average of 0.37 per marker.

Genetic relationships among A, B, and R lines

Genetic dissimilarity among the parental lines varied from 0.03 (ICPB 2044 and ICPB 2162) to a maximum of 0.52 (ICPL 92045 and ICPA 2148) with a mean dissimilarity of 0.28. The genetic dissimilarity estimates for 159 lines were employed to generate a two-dimensional distribution plot by using factorial analysis with DARWIN V5.0.128 software (Perrier et al. 2003) (Fig. 1). The analysis classified the parental lines into two main clusters, ‘I’ (64 lines) and ‘II’ (95 lines) (Fig. 1), and distinguished all 159 (A, B, and R) lines. Most of the male sterile lines (34) were grouped in cluster ‘I’ together with 14 maintainers and 16 restorers. For instance, the male sterile line ICPA 2039 and its corresponding maintainer line ICPB 2039 were grouped together in cluster ‘I’. Three male sterile lines, ICPA 2043, ICPA 2047, and ICPA 2092, were grouped in cluster ‘II’. Of the lines used in the present study, 120 were of non-determinate (NDT) growth habit and 39 were of determinate (DT) growth habit. Interestingly, 34 DT lines were grouped in cluster ‘I’ together with 30 NDT lines, whereas cluster ‘II’ contained the remaining five DT and 90 NDT lines.

Fig. 1

Diversity analysis among 159 (37 A, 38 B, and 84 R) lines of pigeonpea. The figure shows A, B and R lines in blue, red and black, respectively

Relationships among parental lines of hybrids

A total of 56 combinations of A, B, and R lines included in the present study are being utilized for the production of hybrids in the Pigeonpea Breeding Division at ICRISAT. Therefore, marker genotyping data were used to examine how diverse the parents of a given combination are, i.e. the A and R lines, and the A and B lines. For this purpose, the number of polymorphic markers and the dissimilarity index were estimated between the A and R lines, and between the A and B lines, for each of the 56 hybrids (ESM Table 3). Varied levels of diversity were observed between parental genotypes for different hybrids. For instance, 26 hybrids showed higher genetic diversity (>30%) while 30 hybrids showed lower genetic diversity (<30%) between their respective A and R lines. The highest genetic dissimilarity (41%) between A and R lines was found for ICPH 2460 and ICPH 3522, whereas the lowest (14%) was observed for ICPH 3341 and ICPH 3477. On analyzing the relationships of A and B lines in pairwise combinations of the respective hybrids, genetic similarities were observed in the range of 67% (ICPA 2042 and ICPB 2042) to 94% (ICPA 2039 and ICPB 2039) with an average of 78%. In order to assess whether genetic divergence between A and R lines for a given hybrid combination is correlated with yield, genetic dissimilarity coefficient values between A and R lines for the 56 hybrids were analyzed in relation to the yield data for two environments. However, no significant correlation was observed between the genetic dissimilarity of A and R lines and the yield of the respective hybrids (data not shown).
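The text does not specify which statistic was used to relate parental dissimilarity to hybrid yield. As one possible approach, the sketch below computes a Pearson correlation coefficient; the choice of Pearson's r, the function name, and the numbers are assumptions for illustration and are not results from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical A-R dissimilarities and hybrid yields (kg/ha)
dissimilarity = [0.14, 0.22, 0.30, 0.35, 0.41]
yield_kg_ha = [1900, 2400, 1700, 2300, 2000]
r = pearson_r(dissimilarity, yield_kg_ha)
print(f"r = {r:.2f}")
```

With yield data from two environments, such a coefficient could be computed per environment or on the mean across environments; which of these was done is not stated.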

Diagnostic markers for hybrid purity assessment

As the hybrid ICPH 2438 is being used for commercial purposes, this hybrid was selected for molecular marker-based hybrid purity testing. In the parental lines of ICPH 2438, i.e. ICPA 2039 and ICPR 2438, 22 SSR markers showed polymorphism and provided an average genetic dissimilarity value of 0.37. For hybrid purity testing of this hybrid, SSR markers were selected further based on the following two criteria: (a) the marker amplifies polymorphic, high-quality (based on peak height) alleles between the A and R lines; and (b) a marker selected under criterion (a) does not show polymorphism between the A and B lines. While 22 markers fulfilled criterion (a), 21 SSR markers remained after applying criterion (b) (Table 2).
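The two selection criteria can be expressed as a simple filter over marker genotypes. The sketch below assumes each parental line is homozygous and represented by one allele size per marker; the peak-height quality check in criterion (a) is not modelled, and the marker names and allele sizes are hypothetical.

```python
def diagnostic_candidates(a_line, b_line, r_line):
    """Return markers polymorphic between A and R but monomorphic between A and B.

    Each argument maps marker name -> allele size (bp) for a homozygous line.
    Markers with missing data in any line are skipped.
    """
    selected = []
    for marker, a_allele in a_line.items():
        b_allele = b_line.get(marker)
        r_allele = r_line.get(marker)
        if a_allele is None or b_allele is None or r_allele is None:
            continue
        polymorphic_a_r = a_allele != r_allele   # criterion (a), without quality check
        monomorphic_a_b = a_allele == b_allele   # criterion (b)
        if polymorphic_a_r and monomorphic_a_b:
            selected.append(marker)
    return selected


# Hypothetical genotypes for three markers
a_line = {"marker1": 228, "marker2": 150, "marker3": 301}
b_line = {"marker1": 228, "marker2": 154, "marker3": 301}
r_line = {"marker1": 220, "marker2": 160, "marker3": 301}
print(diagnostic_candidates(a_line, b_line, r_line))  # ['marker1']
```

In this example, `marker2` is rejected because it differs between the A and B lines, and `marker3` because it does not distinguish the A and R lines.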

Table 2 List of SSR markers showing polymorphism between parental lines of ICPH 2438 hybrid

Although 21 SSR markers were polymorphic between A and R lines and monomorphic between A and B lines, two SSR markers, namely CCB4 (Fig. 2) and CCttc006, were found to be the most suitable diagnostic markers for hybrid purity assessment. To assess the reliability of these two markers, 379 and 91 seeds from seed-lots of ICPH 2438 obtained from ICRISAT and Mahabeej, respectively, were analyzed together with seeds of the A (ICPA 2039) and R (ICPR 2438) lines. Analysis of markers CCB4 and CCttc006 on DNA samples of 379 seeds of the hybrid seed-lot from ICRISAT showed hybrid purity (alleles from both A and R lines) in 94.2 and 98.7% of cases, respectively. Similarly, the 91 seeds from the Mahabeej seed-lot showed hybrid purity in 95.6 and 97.8% of cases with the CCB4 and CCttc006 markers, respectively (Table 3).

Fig. 2

A snapshot showing the purity assessment of hybrid ICPH 2438 with the CCB4 marker. The parental lines, ICPA 2039 (A line) and ICPR 2438 (R line), show 228 bp and 220 bp alleles, respectively, on screening with the diagnostic SSR marker CCB4, while the seeds (seed 1 and seed 2) showing both alleles (228 and 220 bp) represent true hybrids

Table 3 A summary of purity assessment of ICPH 2438 hybrid seeds from two sources
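The scoring logic behind a co-dominant marker purity test can be sketched as follows: a seed is counted as a true hybrid only if it carries both the A-line and the R-line allele. The allele sizes for CCB4 (228 bp in the A line, 220 bp in the R line) are taken from Fig. 2; the function names, the category labels, and the seed genotypes in the example are hypothetical.

```python
def classify_seed(observed_alleles, a_allele, r_allele):
    """Classify one seed from the allele sizes (bp) observed at a diagnostic marker."""
    alleles = set(observed_alleles)
    if alleles == {a_allele, r_allele}:
        return "hybrid"        # both parental alleles present
    if alleles == {a_allele}:
        return "A-type"        # only the female-parent allele
    if alleles == {r_allele}:
        return "R-type"        # only the male-parent allele
    return "off-type"          # unexpected allele or no call


def purity_percent(seeds, a_allele, r_allele):
    """Percentage of seeds scored as true hybrids."""
    if not seeds:
        raise ValueError("no seeds supplied")
    hybrids = sum(1 for s in seeds
                  if classify_seed(s, a_allele, r_allele) == "hybrid")
    return 100.0 * hybrids / len(seeds)


# Hypothetical sample: 19 heterozygous seeds and one seed with the A allele only
seeds = [(220, 228)] * 19 + [(228,)]
print(purity_percent(seeds, a_allele=228, r_allele=220))  # 95.0
```

When two markers are scored on the same seeds, one stricter option is to count a seed as hybrid only if both markers agree; whether the study combined the markers in this way is not stated.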

Detection of genetic heterozygosity in parental lines

It is important to have 100% or maximum level of homozygosity in A and R lines to produce hybrids of high yield performance as well as uniform seeds. Since SSR markers are co-dominant (Gupta and Varshney 2000), they can be used for identification of both homozygotes and heterozygotes at a particular marker locus. Thus, these markers have great potential to test even residual heterozygosity within the parental lines. To test genetic heterozygosity of parental lines of the ICPH 2438 hybrid, 48 seeds of each parental line, i.e. ICPA 2039 and ICPR 2438, were analyzed with CCB4 and CCttc006 markers. However, no heterozygote was observed within A and R lines with either marker.

Discussion

This study reports for the first time the primer sequences for 32 novel SSR markers, and therefore enhances the existing repertoire of 156 SSR markers available for pigeonpea in the public domain (10 from Burns et al. 2001; 20 from Odeny et al. 2007; 110 from Odeny et al. 2009; 16 from Saxena et al. 2010a). In a polymorphism survey of 159 hybrid pigeonpea parental genotypes representing male sterile (A), maintainer (B), and restorer (R) lines with 148 SSR markers, 27.7% of the markers showed polymorphism. The low level of polymorphism observed in this study is in agreement with earlier SSR-based diversity studies in pigeonpea (Burns et al. 2001; Odeny et al. 2007; Saxena et al. 2010a,b). As in the study of Burns et al. (2001), all the genotypes employed in the present study were from the cultivated gene pool, and only 3.1 alleles per primer pair were detected across the genotypes. A low level of diversity was also reported based on other marker systems such as amplified fragment length polymorphism (AFLP; Panguluri et al. 2006) and Diversity Arrays Technology (DArT; Yang et al. 2006). In contrast to the low level of diversity among A lines in the present study, Souframanien et al. (2003) reported a higher level of genetic diversity. It is, however, important to note that in the present study all A lines were derived from C. cajanifolius (A4), while in the study of Souframanien et al. (2003) the A lines analyzed were derived from two different wild relatives, C. sericeus and C. scarabaeoides.

In the present study, no consistent relationship was observed between microsatellite polymorphism and repeat unit length (data not shown). In the past, some studies reported a positive relationship between degree of polymorphism and repeat unit length (Weber 1990; Moretzsohn et al. 2005; Kapila et al. 2008), while several other studies reported no relationship or a weak correlation between microsatellite polymorphism and repeat unit length (He et al. 2003; Ferguson et al. 2004; Cuc et al. 2008; Saxena et al. 2010b).

Although, as mentioned above, some SSR-marker-based genetic diversity studies have recently been undertaken in pigeonpea (Burns et al. 2001; Odeny et al. 2007; Saxena et al. 2010a,b), SSR markers have not yet been exploited to assess diversity within and among the A, B, and R lines of A4 (C. cajanifolius) cytoplasm that are being used for development of pigeonpea hybrids at ICRISAT. Therefore the results of this study cannot be compared with existing studies. Nonetheless, the present study grouped the majority of A lines (34) and 14 B lines together in the major cluster ‘I’, while the majority of the B lines (24) and a few A lines (3) were grouped in the other major cluster ‘II’. In an ideal condition for hybrid breeding, A and B lines should be homogeneous except for the male sterility, but in the present study the majority of the A lines used were still in early backcrossing generations (BC2 to BC4). Therefore, these A lines cannot be considered as iso-nuclear with the corresponding B lines. On the other hand, ICPA 2039 and ICPB 2039 used in the study represented the BC9 generation; these lines shared the maximum number of common alleles and were grouped together in cluster ‘I’ with 94% genetic similarity. In fact, these lines are now at the BC11 stage, and an even higher genetic similarity is expected if they are examined again for molecular diversity. Therefore, the genetic background of this particular combination of A and B lines is satisfactory for producing homogeneous A line seeds. Differences that remain between these iso-nuclear lines are mainly due to differences in the organellar genomes, while the nuclear material is expected to be almost identical after several backcrosses. Also, the grouping of several A lines together in the same cluster can be attributed to the shared pedigree of the maintainer lines from which these A lines have been generated. On the other hand, differences in the pedigree of maintainer lines for some other A lines caused them to be classified in the other cluster.

In general, it is assumed that greater variation between A and R lines leads to greater hybrid vigour. However, no correlation was observed between the genetic distance of A and R lines and the yield of the hybrids analyzed. While some reports showed a positive correlation between the genetic distance of parental lines and yield (Zhang et al. 1994; Lanza et al. 1997; Barbosa et al. 2003; Jordan et al. 2003), several other studies showed either a weak or no correlation (Chowdari et al. 1998; Kidwell et al. 1999; Kwon et al. 2002). Factors that may mask the correlation between genetic divergence and yield include the use of a relatively small set of SSR markers which, moreover, come from anonymous regions of genomic DNA. In fact, if markers linked with heterotic components are used for such analysis, there is a high possibility of observing a correlation between genetic divergence and hybrid vigour (Bohn et al. 1999; Prasad et al. 2000).

The commercial success of hybrid pigeonpea technology depends to a large extent on the quality of hybrid seeds supplied to farmers. This makes it necessary to take every possible care at each step of seed multiplication before marketing. Therefore, genetic purity assessment of hybrids is critical not just for farmers but also for plant breeders, the large-scale seed-producing private sector, and quality control agencies. The duration and other limitations of the field GOT can be managed effectively by applying molecular markers. The use of SSR markers for assessing seed purity of hybrids is almost routine for several crops, e.g. tomato (Smith and Register 1998), rice (Yashitola et al. 2002; Sundaram et al. 2008), safflower (Naresh et al. 2009), and cotton (Ali et al. 2008). Although some studies reported the suitability of even one marker for hybrid purity assessment tests (Yashitola et al. 2002; Mishra et al. 2003; Nandakumar et al. 2004), Sundaram et al. (2008) demonstrated the superiority of using two to three markers in a multiplex over a single-marker system. The present study reports the suitability of two SSR markers for a hybrid purity test. Furthermore, both of these markers amplify distinct alleles in A (CCB4-228 bp; CCttc006-293 bp) and R lines (CCB4-220 bp; CCttc006-290 bp), which suggests that they can be used in a multiplex for hybrid purity assessment. This will reduce the cost and time of PCR assays and will also increase the accuracy of determining purity in parental lines as well as in hybrid seeds. It is also important to note that the two markers gave somewhat different levels of purity in the seed lots of ICRISAT (CCB4-94.2%; CCttc006-98.7%) and Mahabeej (CCB4-95.6%; CCttc006-97.8%). This is not surprising, as these markers most likely represent two different loci in the genome and do not seem to be linked; therefore, they would differ a little in segregating the alleles in a given population (Hashemi et al. 2009).

Residual heterozygosity, if any, in A and R lines may result in the breakdown of the CMS system in the long run (Jena and Pandey 1999) and may produce hybrids with poor performance. Therefore, it is recommended that A and R lines should have the highest level of homozygosity. Screening of the A and R lines of ICPH 2438 with the two markers CCB4 and CCttc006 showed 100% homozygosity in these lines. This clearly indicates the high quality of maintenance of purity of hybrid parental lines at ICRISAT.

In conclusion, this study adds a novel set of 32 SSR markers and reports a polymorphism survey on 159 parental (37 A, 38 B, and 84 R) lines of hybrids with 41 polymorphic markers out of a total of 148 SSR markers used. The most important outcome of this study is identification of two SSR markers, namely CCB4 and CCttc006, for testing the purity of hybrid ICPH 2438.