Introduction

Rice (Oryza sativa L.) is the most important staple crop, feeding over half of the world’s population. The present world population of about 6 billion is likely to cross 8 billion by the year 2030, which means that rice production must increase by 40% to keep pace with population growth. Hybrid rice is one of the technological options that will impact the world’s rice production (Khush 2005). The availability of efficient male sterility systems is a prerequisite for the development of hybrids in rice. Three main types of male sterility systems are well recognized in rice: genetic male sterility (GMS), cytoplasmic male sterility (CMS) and environment-sensitive genic male sterility (EGMS). Of these, the CMS system has been the most effective for hybrid rice breeding. To date, more than 40 different CMS sources have been reported in rice (Virmani and Shinjyo 1988). Among them, the Wild Abortive (WA), Boro Type (BT), Hong Lian (HL), Gambiaca, Dian, Lead (Ld), Dissi and Assam Rice Collection (ARC) sources are currently used in hybrid rice breeding worldwide. However, the WA type, identified in the wild species O. sativa f. spontanea, is the most widely used CMS source as it is more stable and can be easily restored. The CMS based hybrid breeding system involving male sterile (CMS/A), maintainer (CMS/B) and restorer (R) lines is popularly known as three-line breeding (Virmani 1993).

Hybrids have a yield superiority of about 15–20% (1.0 tonne/ha) over the best commercial inbred varieties under similar conditions. This yield advantage can be realized only when the hybrid seeds have high genetic purity. It has been reported that even 1% impurity in hybrid seed can lead to a yield reduction of about 100 kg per ha in farmers’ fields (Mao et al. 1998). The Indian Seed Act (1966) recommends ensuring 98% genetic purity in commercial hybrid seed lots (Sindhu and Kumar 2002). Hybrid rice seed production is a highly technical and skilled job involving various intricacies. The process comprises two steps: (i) foundation seed production/CMS line multiplication (A/B) and (ii) certified/hybrid seed production (A/R). During foundation seed production, besides biological contamination, CMS line seed lots are often contaminated by their respective maintainer lines (shedders) as mechanical admixtures. It is not possible to distinguish a CMS line from its maintainer on the basis of seed morphology, as they are isonuclear genotypes. During hybrid seed production, maintainer line plants present in CMS rows, if not rogued out before flowering, will lead to the production of genetically impure hybrid seed that also contains CMS line seeds (A and B). Use of such contaminated hybrid seed by farmers will lead to a proportionate yield reduction due to the appearance of male sterile plants in the commercial hybrid crop. Hence, maintenance of parental line purity is a prerequisite for ensuring high genetic purity of hybrid seeds. Traditionally, the seed purity of parental lines and hybrids is estimated through the grow-out test (GOT) by observing various morphological traits (Verma 1996). Although the GOT is dependable, it requires a significant amount of time and resources.

The use of DNA markers can accelerate testing of the genetic purity of parental lines and hybrid seed lots without resorting to the GOT. Developing DNA markers suitable for genetic purity analysis requires a two-pronged strategy: identification of a mitochondrial genome based DNA marker capable of distinguishing the male sterile and fertile counterparts of the CMS line, and of one or two nuclear genome based polymorphic markers for testing the homozygous or heterozygous nature of the genotype under test. In the recent past, efforts were made to differentiate CMS and maintainer lines of rice using PCR based DNA markers such as randomly amplified polymorphic DNA (RAPD) (Sane et al. 1997; Jena and Pandey 1999), amplified fragment length polymorphism (AFLP) (Guanghua et al. 2003) and sequence tagged site (STS) (Komari and Nitta 2004; Yashitola et al. 2004). The low reproducibility of RAPD and the lengthy procedure of AFLP make these markers impractical for routine use in seed purity analysis. STS markers are easier to use and more reliable for large-scale genetic purity testing of seed lots. In this study, our aim was to develop an STS marker capable of distinguishing the male sterile and fertile counterparts of diverse cytoplasmic sources of rice.

Material and methods

Plant materials and DNA isolation

A total of 91 genotypes comprising 36 CMS lines of 6 different cytoplasmic sources (WA, mutagenized IR62829B, Gambiaca, Dissi, Kalinga and O. rufipogon) with their corresponding maintainers, 40 restorer lines, 5 hybrids and 10 inbred varieties were used in this study (Table 1). In addition, a set of 20 coded samples from a commercial seed lot of a CMS line from Pioneer Overseas Corporation, a multinational seed company engaged in hybrid rice research and development in India, was used in this analysis. Genomic DNA was extracted from four-week-old rice seedlings using a modified Dellaporta method (Dellaporta et al. 1983).

Table 1 Plant materials used in the study

Amplified fragment length polymorphism (AFLP) analysis

AFLP analysis was carried out as described by Vos et al. (1995) with minor modifications. A total of 64 AFLP primer combinations were selected to analyze polymorphism between male sterile and fertile counterparts of two WA-CMS lines viz., IR58025A&B and IR62829A&B. Amplification was performed in MJ Research thermal cycler with a programme of initial denaturation at 94°C for 2 min, cyclic denaturation at 94°C for 1 min, annealing at 50°C for 1 min, extension at 72°C for 1 min and the cycle was repeated 20 times. Pre-selective amplification was confirmed in 1.5% agarose gel electrophoresis, diluted 50 times with TE0.1 buffer and used for selective amplification using adapter primers with 3 selective nucleotides using the programme of initial denaturation at 95°C for 5 min, cyclic denaturation at 94°C for 20 s, annealing at 65°C for 40 s, extension at 72°C for 1 min followed by another round of cyclic denaturation at 94°C for 1 min, annealing at 56°C for 1 min, extension at 72°C for 1 min and final extension at 72°C for 7 min. Selective PCR amplified products were resolved on 5% denaturing polyacrylamide sequencing gel electrophoresis (PAGE) in 1X TBE buffer followed by silver staining procedure.

Isolation, cloning and sequencing of AFLP fragment

The amplified product distinguishing CMS and maintainer lines was excised from the polyacrylamide gel with a sterile razor blade, soaked in 500 μl of buffer containing 500 mM ammonium acetate, 0.1% SDS and 0.1 mM EDTA, crushed and centrifuged at 14,000 rpm for 15 min. The supernatant containing the eluted fragment was precipitated with 2.5 volumes of absolute alcohol and 0.1 volume of 3 M sodium acetate. Samples were incubated at room temperature for 5–10 min and centrifuged at 10,000 rpm for 10 min. The pellet was washed twice with 70% alcohol, air-dried and dissolved in 30–50 μl TE0.1 buffer. The DNA concentration of the eluted fragment was checked on a 1% agarose gel and the fragment was used for amplification.

The isolated DNA fragment was further amplified with the selective AFLP primer pair and the PCR product was cloned into the pCR TOPO 2.1 vector (Invitrogen, Carlsbad, California) according to the manufacturer’s instructions. Recombinant plasmid DNA was extracted by the alkaline lysis method (Sambrook et al. 1989). Five positive clones were used for sequence analysis. A database sequence similarity search was performed using the Blast-X algorithm (Altschul et al. 1990) through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov.in) system. Similarity among all clone sequences was assessed by multiple sequence alignment using the Gene Tool software version 2.0 (Bio Tools Inc., Edmonton, Canada).

Development of Sequence Tagged Site (STS) primers and PCR analysis

On the basis of the DNA sequence variation of the cloned PCR product, as observed after database sequence alignment, a specific primer pair was designed using Gene Tool software (Bio Tools Inc., Edmonton, Canada) and synthesized at Sigma-Bangalore, India. These primers were tested for polymorphism among a set of CMS and maintainer lines. PCR amplification was performed in a 15 μl reaction volume containing 10–15 ng genomic DNA, 10 mM Tris–HCl (pH 8.4), 50 mM KCl, 1.5 mM MgCl2, 400 μM dNTPs, 2.5 pM of each primer and 1 U Taq polymerase. The amplification profile was standardized as an initial denaturation at 95°C for 5 min, followed by 30 cycles of denaturation at 94°C for 8 s, annealing at 62°C for 5 s and extension at 72°C for 40 s, and a final extension at 72°C for 7 min. PCR products were subjected to electrophoresis on a 1.2% agarose gel, stained with ethidium bromide and visualized under a UV gel documentation system.
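As a rough check on throughput, the programmed hold times of the STS amplification profile described above can be added up. The sketch below is only arithmetic on the stated step durations; it ignores ramping between temperatures, which depends on the thermal cycler, and the variable and function names are illustrative rather than taken from the study.

```python
# Hold times in seconds, taken from the STS amplification profile in the text.
INITIAL_DENATURATION_S = 5 * 60   # 95 C for 5 min
FINAL_EXTENSION_S = 7 * 60        # 72 C for 7 min
CYCLES = 30
CYCLE_STEPS_S = {"denaturation": 8, "annealing": 5, "extension": 40}


def total_hold_seconds(initial_s, cycle_steps_s, cycles, final_s):
    """Sum the programmed hold times; ramp times are deliberately ignored."""
    return initial_s + cycles * sum(cycle_steps_s.values()) + final_s


total_s = total_hold_seconds(
    INITIAL_DENATURATION_S, CYCLE_STEPS_S, CYCLES, FINAL_EXTENSION_S
)
print(f"Programmed hold time: {total_s} s ({total_s / 60:.1f} min)")
```

The real run time will be longer than this figure because of heating and cooling between steps.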

Results and discussion

Identification of polymorphic AFLP marker distinguishing CMS and maintainer lines

Out of 64 primer combinations screened on DNA of two WA-CMS lines (IR58025A & IR62829A) with their respective maintainers (IR58025B & IR62829B), one primer combination namely M-CAA/E-ACA produced a polymorphic amplification profile with the presence of an approximately 510 bp fragment in the CMS line and absence in the maintainer line (Fig. 1). The polymorphic profile was further confirmed among other WA-CMS lines with the selected primer pair (data not shown).

Fig. 1 AFLP polymorphism observed between CMS (IR58025A&IR62829A) and maintainer lines (IR58025B&IR62829B) in 5% denaturing polyacrylamide gel. Arrow indicates polymorphic fragment of about 510 bp size present in CMS and absent in maintainer lines

Cloning and sequencing of AFLP fragment

Because AFLP analysis is laborious and time-intensive, it is not suitable for large-scale analysis. STS markers are site specific, robust, quick to assay and highly reproducible, and can therefore effectively overcome the limitations encountered with AFLP markers. For this reason, AFLP markers are often converted into STS markers for simple, rapid, PCR based screening (Reamon-Buttner and Jung 2000). Hence, the polymorphic AFLP fragment obtained in the present study was eluted and cloned for sequence analysis. Five clones of the CMS line insert were sequenced and their exact size was found to be 511 bp. Multiple sequence alignment indicated that the sequences of the different clones of the same line were identical. The actual size of the polymorphic product was 478 bp after excluding the forward and reverse adapter primer sequences at its 5′ and 3′ ends, respectively (Fig. 2).

Fig. 2 Sequence alignment of 478 bp polymorphic AFLP fragment as amplified from WA-CMS lines: IR58025A and IR62829A with rice mitochondrial genome database. The sequence underlined at 3′ end indicates the nucleotide variation at AFLP primer (E-5′-ACA-3′) binding site. Shaded sequences in 5′ and 3′ ends of CMS sequence indicate the region of BF-STS-401 primers (Forward: 5′-GCCACTATTCCACAATGCATG-3′; Reverse: 5′-CCCTTTCCTGCTTCCCTTTTTTA-3′) designed to produce 464 bp PCR product in CMS lines

Homology search and sequence alignment

Database sequence alignment of 478 bp CMS specific DNA sequence using BLAST algorithm revealed 97% similarity to a region of the rice mitochondrial DNA (Accession #: BA000029; Version: BA000029.3) (Notsu et al. 2002). The homology extended between 3–388 bases of the CMS specific sequence that corresponds to 165588–165970 nucleotide region of total 490520 bp in accession BA000029.3. Based on this, the actual 478 bp sequence was located in rice mitochondrial genome database that corresponds to 165586–166060 nucleotide region in accession BA000029.3. The sequences, CMS specific sequence as well as database sequence, were compared by multiple sequence alignment option using Gene Tool software. This sequence alignment indicated that there were about 68 mismatches, which included 4 additions, 21 deletions, 16 transitional mutations and 27 transversional mutations of nucleotides throughout the length of 478 bp of CMS specific sequence. Most of this sequence variation was observed between 389 and 478 bp of CMS specific sequence (Fig. 2).
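The four classes of difference counted above can be illustrated with a small sketch that classifies each column of a gapped pairwise alignment. It assumes that "addition" means a base present only in the CMS (query) sequence and "deletion" a base present only in the database (reference) sequence, and that only A, C, G, T and the gap symbol "-" occur. The toy sequences are invented for illustration and are not the sequences of Fig. 2.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_column(query_base, ref_base):
    """Classify one aligned column; '-' denotes an alignment gap."""
    if query_base == ref_base:
        return "match"
    if ref_base == "-":
        return "addition"      # base present only in the query
    if query_base == "-":
        return "deletion"      # base present only in the reference
    pair = {query_base, ref_base}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"    # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"      # purine<->pyrimidine


def tally_differences(query, ref):
    """Count difference classes over two gapped sequences of equal length."""
    if len(query) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(
        classify_column(q, r) for q, r in zip(query.upper(), ref.upper())
    )
    counts.pop("match", None)
    return dict(counts)


# Toy alignment (hypothetical), not the sequences from the study.
toy = tally_differences("ACGT-ATGT", "ACATGA-CG")
print(toy)

# The class counts reported in the text add up to the stated total.
reported = {"addition": 4, "deletion": 21, "transition": 16, "transversion": 27}
print(sum(reported.values()))
```

The final line confirms that the four reported classes account for all 68 mismatches.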

Our results indicated that the mismatch between CMS specific sequence and database sequence at the primer binding site, which corresponds to 166058 and 166059 nucleotide region, might be responsible for the polymorphism observed between CMS and maintainer lines in AFLP analysis. In that position, CMS specific sequence has 5′-TGT-3′ whereas database sequence has 5′-TCG-3′. The sequence of the reverse AFLP primer E-ACA is complementary to 5′-TGT-3′ of CMS specific sequence and not to 5′-TCG-3′ of database sequence. Therefore, E-ACA could successfully amplify in CMS line. It is inferred from the database sequence that the fertile plants having normal cytoplasm may possess the sequence of 5′-TCG-3′, which is not complementary to E-ACA. This might be the probable reason why E-ACA could not amplify in maintainer lines.
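The complementarity argument can be checked directly by taking the reverse complement of the selective bases. The following is a minimal sketch in plain Python; the function names are illustrative, and it models only perfect pairing of the three selective nucleotides, not primer annealing as a whole.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def selective_bases_pair(selective_bases, template_site):
    """True if the selective bases pair perfectly with the template site.

    Both arguments are written 5'->3'.
    """
    return reverse_complement(selective_bases) == template_site.upper()


print(selective_bases_pair("ACA", "TGT"))  # CMS specific sequence
print(selective_bases_pair("ACA", "TCG"))  # database (normal cytoplasm) sequence
```

The first call returns True and the second False, matching the reasoning that E-ACA can prime on 5′-TGT-3′ but not on 5′-TCG-3′.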

Comparative analysis with rice mitochondrial genome database (www.rmg.rice.dna.affrc.go.jp) indicated that the sequenced region identified in this experiment was found to be the intronic region of nad1 & nad5 subunits of mitochondrial NADH gene in rice. The query sequence also showed 96% similarity with wheat mitochondrial NADH gene subunit nad1 & nad5 and 98% similarity with maize mitochondrial NADH gene subunit nad1 & nad2 (all of which encode subunits of complex I, the NADH dehydrogenase). This indicated that mitochondrial nad genes are highly conserved among monocots (Souza et al. 1992; Clifton et al. 2004). It is evident from earlier studies that sequence variations in mitochondrial genes are responsible for pollen sterility in several crop species including BT cytoplasm of rice (Gallagher et al. 2002; Satoh et al. 2004; Pineau et al. 2005; Yamamoto et al. 2005; Wang et al. 2006). However, genes responsible for pollen sterility in commercially useful WA cytoplasm of rice have not been characterized so far. Whether the intron variation in mitochondrial nad genes identified in this experiment influences cytoplasmic male sterility mechanism operating in rice needs to be investigated.

STS primer designing and PCR analysis

As per the sequence variations observed in male sterile line, a specific primer pair, designated as BF-STS-401 (Forward: 5′-TGCCACTATTCCACAATGCATG-3′; Reverse: 5′-CCCTTTCCTGCTTCCCTTTTTTA-3′), was developed to produce 464 bp PCR product in CMS lines. Although this new primer pair clearly amplified the expected product size of 464 bp in CMS lines, faint amplification of the same size was also observed in maintainer lines. This faint amplification remained unchanged even with stringent PCR conditions (data not shown). In order to distinguish CMS and maintainer lines clearly, an additional mitochondrial genome specific STS primer pair, which produces monomorphic amplification indiscriminately in all CMS and maintainer lines, was multiplexed with BF-STS-401. The primer pair, designated as BF-STS-402 (Forward: 5′-TAGGGCCATGACGGTTTTG-3′; Reverse: 5′-CGCGTCCTTCCCCAATT-3′), was derived from the mitochondrial atp9 subunit of rice (data not shown), which amplified PCR product of 335 bp size. The multiplex PCR amplified 464 bp size product only in CMS and not in maintainer line, and 335 bp size product monomorphic in both lines (in 2% agarose gel electrophoresis) (Fig. 3A). Interestingly, multiplex PCR programme eliminated the faint amplification in maintainer line, which was observed when BF-STS-401 was used alone. A possible reason for the disappearance of faint amplification in multiplex PCR reaction could be the competitive utilization of Taq polymerase enzyme for the sequence specific amplification by the multiplex primer pairs at stringent thermo profile conditions (Wu et al. 1989, 1991). The primer pair, BF-STS-402 also served as a positive control to avoid possible errors in PCR analysis.
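The way a multiplex lane is read can be written down as a small decision rule: the 335 bp BF-STS-402 band acts as the positive control, and the 464 bp BF-STS-401 band indicates sterile cytoplasm. The sketch below is an illustration only; the 15 bp sizing tolerance and the function names are assumptions made here, not values or procedures from the study.

```python
CMS_BAND_BP = 464       # BF-STS-401 product reported for sterile cytoplasm
CONTROL_BAND_BP = 335   # BF-STS-402 product reported for all lines
TOLERANCE_BP = 15       # hypothetical allowance for reading sizes off a gel


def has_band(observed_bp, expected_bp, tolerance_bp=TOLERANCE_BP):
    """True if any observed band lies within the tolerance of the expected size."""
    return any(abs(size - expected_bp) <= tolerance_bp for size in observed_bp)


def call_cytoplasm(observed_bp):
    """Interpret one multiplex lane from its estimated band sizes in bp."""
    if not has_band(observed_bp, CONTROL_BAND_BP):
        # Without the control band, the absence of 464 bp cannot be trusted.
        return "inconclusive"
    if has_band(observed_bp, CMS_BAND_BP):
        return "sterile cytoplasm"
    return "fertile cytoplasm"


lanes = {"A line": [460, 338], "B line": [335], "failed lane": []}
for name, bands in lanes.items():
    print(name, "->", call_cytoplasm(bands))
```

Treating a lane without the control band as inconclusive reflects the role of BF-STS-402 as a safeguard against PCR failure being misread as a fertile off type.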

Fig. 3 Multiplex STS marker (BF-STS-401 & BF-STS-402) analysis (in 2% agarose gel electrophoresis) distinguishing male sterile (CMS) and fertile lines in different genotypes. The genotypes used here are listed in Table 1. M indicates 100 bp DNA ladder. (A) Amplification profile of multiplex STS marker among individual samples of two WA-CMS pairs (A&B). (B) Validation in a set of WA type CMS lines (A&B). (C) Validation in five diverse CMS lines having WA, mutagenized IR62829B, Gambiaca, Dissi and Kalinga sterility inducing cytoplasmic sources. (D) Amplification profile of multiplex STS primer pair in five WA-CMS based hybrids with their respective parental lines (A, B and R); R* indicates a restorer line possessing sterile cytoplasm. (E) Validation in 20 mixed samples from a commercial seed company. S indicates CMS line; F indicates fertile off type

Validation of STS markers in diverse genotypes

To further investigate the utility of the STS marker developed in the present study, the multiplex primer pair was tested on all 91 genotypes listed in Table 1. Analysis of the 36 CMS lines revealed that the multiplex STS marker clearly distinguished all CMS and maintainer lines of the different cytoplasmic sources except DMS4A (O. rufipogon cytoplasm) (Fig. 3B, C). In the case of DMS4A/B, only the 335 bp product of BF-STS-402 was observed in both the CMS and maintainer lines (data not shown). These results indicated that five of the six cytoplasms analyzed in this study are closely related at the molecular level. In other words, the male sterility gene(s) of these CMS sources might be present in the same mitochondrial genomic region as that amplified by the STS primer pair BF-STS-401 in the present study. Traditionally, cytoplasmic sources are classified on the basis of fertility restoration behavior in hybrids involving a representative CMS line and a common set of restorers. Further, it has been reported that the nuclear genetic background of the CMS line also influences the fertility restoration pattern in hybrids to some extent (Rai and Hash 1990). Therefore, the similarities among the WA, mutagenized IR62829B, Gambiaca, Dissi and Kalinga CMS sources observed in this study need to be confirmed by comparing the fertility restoration behavior of these sources with that of WA. If a significant difference in their fertility restoration behavior is established, it will warrant identification of the differential genomic regions responsible for the male sterility mechanism among these sources.

The marker was further analyzed in a set of genotypes comprising 40 restorers of WA cytoplasm, 5 hybrids and 10 inbred varieties. All restorer lines except PRR78 (a restorer line with sterile cytoplasm) and all inbred varieties produced only the 335 bp product, as observed earlier in the maintainer lines. The amplification profile of all hybrids was identical to that of the CMS line. This is expected because CMS based hybrids also possess the sterile cytoplasm. The restorer line PRR78 of the hybrid PusaRH10 produced a profile similar to that of the CMS line and its hybrid (Fig. 3D). These results support the view that the observed polymorphism might be associated with the sterile cytoplasm.

To confirm the utility of the CMS specific multiplex STS marker for detecting contaminants with fertile cytoplasm in either CMS line or hybrid seed lots, a set of 20 mixed samples from a commercial CMS line seed lot obtained from Pioneer Overseas Corporation, Hyderabad, was tested. The analysis showed that the mixed samples contained one fertile contaminant and that the rest were sterile, which was further verified against their phenotype (M.G. Gangashetti, personal communication) (Fig. 3E). For further confirmation, the multiplex STS marker was validated at the laboratory of Mahyco Research Centre, Maharashtra, India, using samples of Maharashtra Hybrid Seeds Company Limited, India (data not shown). These results suggested that the multiplex marker (BF-STS-401 & BF-STS-402) can reliably identify contaminants in commercial samples.
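The outcome of the coded-sample test can be summarised as a simple tally of S and F calls. The sketch below only restates the reported result (one fertile off type among 20 samples); the position of the F sample in the list is arbitrary, and because the samples were a mixed test set, the resulting fraction should not be read as a purity estimate for the seed lot.

```python
from collections import Counter

# Calls for the 20 coded samples as summarised in the text: one fertile (F)
# off type, the rest sterile (S). The position of the F sample is arbitrary.
calls = ["S"] * 19 + ["F"]

counts = Counter(calls)
off_type_fraction = counts["F"] / len(calls)
print(f"S = {counts['S']}, F = {counts['F']}, off types = {off_type_fraction:.0%}")
```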

Generally, CMS lines are multiplied in seed production plots with a satisfactory isolation distance, leaving little or no chance of biological contamination by foreign pollen from nearby rice fields. Under such circumstances, the only impurity that can be expected in a CMS line seed lot is the maintainer line, introduced as a mechanical admixture during the various stages of handling CMS line seed, including harvesting and threshing. The multiplex marker developed in the present study can be effectively deployed to distinguish CMS line seed from that of the maintainer or any other self-fertile genotype without the laborious and lengthy GOT procedure. Further, the present marker can be used in conjunction with one or two nuclear genome based DNA markers to test the genetic purity of CMS line or hybrid seed lots (Nandakumar et al. 2004). However, the selection of polymorphic markers is crucial and should be done with great care, as there are reports that a single polymorphic marker may not always be able to distinguish all the contaminants in commercial seed lots. In such cases, the use of two markers can further improve the accuracy of seed purity analysis. Komari and Nitta (2004) designed a strategy to test the seed purity of japonica rice hybrids based on BT cytoplasm, in which they advocated the use of two sets of markers: one for differentiating CMS line/hybrid plants from maintainer/restorer lines, and the other for differentiating between maintainer and restorer lines. To differentiate maintainer and restorer lines, a marker linked to the fertility restorer gene (Rf1) has been suggested. Recently, Garg et al. (2006) reported the utility of a fertility restoration gene linked marker for testing the genetic purity of hybrid seed lots in rice.

In conclusion, the multiplex STS primer pair reported in the present study will be useful for differentiating CMS lines (having WA, mutagenized IR62829B, Gambiaca, Dissi and Kalinga cytoplasms) from their corresponding maintainers and from any other male fertile genotypes. This marker, in combination with carefully chosen nuclear genome based markers specific to hybrid-parental lines, can be reliably used to test the genetic purity of CMS line and hybrid seed lots on a commercial scale.