Introduction

Exploiting hybrid vigor, or heterosis, is important for improving crop yield (Schnable and Springer 2013). The use of heterosis in rice, rapeseed, sorghum and other major crops relies on male sterile lines (Atlin 1995, Lambright 2019, Westhues et al. 2017, Xie et al. 2019). Cotton (Gossypium spp.) is a major fiber and oil crop, widely cultivated around the world as an essential industrial raw material. As in other crops, heterosis has been exploited in cotton as an important way to improve lint yield and fiber quality. In cotton production, superior F1 hybrids show significant advantages over their parents in earliness, yield, fiber quality and tolerance to biotic and abiotic stresses (Zhang & Pan 1999, Munir et al. 2016, Tian et al. 2019, Zaidi et al. 2020). Breeding hybrid cultivars is therefore an important route to these breeding goals. Most F1 hybrids planted commercially are produced by hand-emasculation and pollination (HEP). Although F2 seed is not strictly hybrid seed, it is extensively planted in China because F1 seed is expensive to produce, even though F2 plants may yield less than F1 plants. In the 1990s, F2 seed was planted on more than 80% of the hybrid cotton area in China. Since the beginning of the 21st century, the successful planting of hybrids carrying the transgenic Bt trait has accelerated the use of cotton hybrids, which now cover the whole Yangtze River Cotton Growing Region (YaRCGR), most of the Yellow River Cotton Growing Region (YeRCGR), and part of the Northwestern Inland Cotton Growing Region (NICGR), including Xinjiang.

Male sterility has been found in 617 species belonging to 162 genera in 43 families, including major crops such as rice, maize, wheat and cotton (Shen et al. 2017, Wan et al. 2019, Yang et al. 2021). A male sterile line facilitates large-scale production of hybrid seed, and male sterility systems are one of the main means of hybrid breeding and seed production in cotton (Zhang & Pan 1999, Budar & Pelletier 2001). Both cytoplasmic male sterile (CMS) and genic male sterile (GMS) hybrids have been developed and planted commercially in China and India. The GMS genes ms2 and ms5ms6 and the causal mitochondrial genes for G. harknessii CMS, all used commercially in hybrid seed production, have been cloned (Wu et al. 2022, Ma et al. 2022, Mao et al. 2022, Xuan et al. 2022).

During the past few decades, more than 100 hybrid cultivars, most of them transgenic Bt bollworm-resistant cultivars, have been developed and extensively planted in China, and they have played an important role in Chinese cotton production. Almost all cotton in the YaRCGR, and most in the YeRCGR, is planted with transgenic Bt hybrids produced by HEP, CMS or GMS. Most hybrids grown today are produced by hand-emasculation and pollination, because such hybrids are easy to develop: their parents can be chosen freely, and some of them can also be grown as F2. This article reviews advances in the development of cotton hybrid cultivars by HEP, CMS and GMS and offers a perspective on the future development and commercial production of hybrid cotton.

Hybrid breeding and production by hand-emasculation and pollination

Hybrid cultivar development by hand-emasculation and pollination

Loden and Richmond (1951), Meyer (1969), Davis (1978), Meredith (1984, 1990) and Zhang & Pan (1999) published comprehensive reviews of heterosis utilization at different times. Interspecific heterosis between G. hirsutum L. and G. barbadense L. was first reported in cotton in 1894 (Mell 1894). Subsequently, Ball (1908), Hua et al. (1963) and others confirmed prominent interspecific heterosis for agronomic and fiber traits (see the review by Zhang & Pan 1999). Interspecific F1 hybrids of this kind, such as Varalaxmi, JKHyl1, DCH32, HB224, NHB12 and TCHB213, developed in India, were the first to be cultivated in commercial production (Paroda & Basu 1993).

Kime & Tilley (1947) reported yield heterosis in both the F1 and F2 of intervarietal hybrids, and showed for the first time that a superior F1 with large heterosis was more likely to be obtained from crosses between relatively distant parents. Simpson (1948) not only confirmed the existence of hybrid vigor but also showed that obvious hybrid vigor persisted in the F2 generation. However, heterosis in cotton has not yet been utilized effectively in the USA or in other cotton growing countries.

Hybrid cotton combining high yield with excellent fiber quality was first achieved by Dr. Patel and his colleagues in India. The successful Hybrid 4 was bred by crossing the commercial Gujarat variety G-67 with ‘American Nectariless’ introduced from the USA. It was characterized by very strong boll setting and good spinning quality, and was the first commercial cotton hybrid released not only in India but in the world. Not long after, another distinguished interspecific hybrid, Varalaxmi (G. hirsutum × G. barbadense), was released in Karnataka. Varalaxmi had the same yield potential as Hybrid 4 but better fiber quality. Since then, many interspecific and intervarietal hybrids have been bred, released and planted on a large scale in place of conventional varieties, and they were considered one of the main factors behind the increased yield and improved fiber quality in India during the 1970s. In 1993, hybrids occupied 28% of the total cotton area in India but accounted for about 40% of total cotton production (Paroda & Basu 1993).

At almost the same time, in 1972, scientists in Sichuan, China, discovered a male sterile plant in the cultivar Dongting 1, a pedigree selection from Deltapine 15. Hybrid cotton development using this naturally occurring GMS mutant, designated Dong-A, was initiated, and more than 20 hybrids have been developed in this province (Huang et al. 1982, Zhang et al. 1992, Zhang 1995, Zhang & Jing 1997, Zhang & Pan 1999, Huang 2007, Xing et al. 2017). Scientists in other provinces, such as Henan and Hunan, also focused on developing cotton hybrids by HEP. Several hybrids, such as Xiangzamian 2 (XZM 2), Wanza 40 and Jimian 16, were developed; however, their F1 seed was not planted commercially because of the high cost of producing it by HEP, and their F2 seed was extensively planted instead.

Since the beginning of the 21st century, with the successful introduction of transgenic Bacillus thuringiensis (Bt) cotton inbreds carrying a toxin-producing gene, many transgenic Bt hybrid cultivars have been developed and the use of cotton hybrids has accelerated. F1 hybrids are now planted in almost the entire YaRCGR and most of the YeRCGR. Many transgenic Bt hybrids, such as Nankang 3 and Sikang 3 in Jiangsu, Lumianyan 25 in Shandong, Biaoza A1 in Hebei, Cikangza 3 in Zhejiang, Huakangmian 1 in Hubei and Xiangzamian (XZM) 8 in Hunan, have been developed and commercially planted on a large scale in China. In India, hybrids containing Monsanto's Cry1Ac gene were developed and commercially planted by Indian seed companies (Jayaraman 2005). The two largest cotton growing countries in the world have thus both developed and grown many transgenic insect-resistant cotton hybrids.

F2 Heterosis and its utilization

Heterosis in F2 hybrids was reported early and attracted some interest in the USA (Simpson 1948, Meredith et al. 1970, Meredith 1990, Dever & Gannway 1992, Tang et al. 1993a, 1993b). In China, at almost the same time as the Dong-A GMS line was identified and utilized in the mid-1970s, hybrid vigor in the F2 was confirmed in cotton production, with F2 yields usually 6–10% higher than those of the control cultivars (Xing et al. 1987, Huang 2007). It had long been known that hybrid vigor persists in the F2 generation (Kime & Tilley 1947, Simpson 1948). From a theoretical point of view, the F1 hybrid is genetically heterozygous and may express prominent heterosis, whereas the F2 generation segregates into homozygous and heterozygous individuals. Hybrid vigor in the F2 should therefore decrease as the proportion of heterozygous plants declines; however, some heterozygous individuals, and hence some heterosis, remain. Because cotton has an indeterminate growth habit with long periods of flowering, boll setting and harvesting, segregation for maturity in the F2 causes little practical difficulty. Although the F2 usually yields less than its F1, it has often been used in China as a less expensive alternative to F1 seed. Since the late 1980s, seven hybrids, such as Zhongmiansuo (ZMS) 28 (Jing et al. 1995), Jimian 18 (Cui et al. 1994), XZM 1 and 2 (Li et al. 1997) and Suzha 16 (Qian et al. 1997), have been developed specifically for use in the F2 generation and released at different times in China (Huang 2007). Their yield performance and growing area are given in Table 1. In the New Cotton Cultivar Trial Test organized by Hunan province from 1994 to 1996, the lint yield of XZM 2 increased on average by 19.30% in the F1 and 7.4% in the F2.
Our QTL mapping using an immortalized population derived from XZM 2 revealed that dominance played an important role in the genetic basis of heterosis for yield and its components (Liu et al. 2012). In contrast, additive genetic variance was predominantly responsible for variability in fiber quality traits, so XZM 2 showed little heterosis for fiber quality (Wang et al. 2007). XZM 2 was released in Hunan in 1997, extended to the YaRCGR in 2001, and planted on more than 1.5 million ha. Another well-known hybrid cultivar, Wanza 40, increased lint yield by 16.48% in the F1 in 1996 and by 11.40% in the F2 in 1997 in the New Cotton Cultivar Trial Test organized by Anhui province, China, and was planted on more than 1.3 million ha. Interestingly, these two hybrids, developed by the Cotton Research Institutes of Hunan and Anhui, respectively, carry marker traits: yellow pollen from the male parent in XZM 2, and glandlessness from the female parent in Wanza 40. Their F1 can therefore be distinguished easily from the F2 by scoring the marker trait in seed or in the field: the F1 does not segregate, whereas the F2 clearly does. In general, an F2 can be expected to retain approximately half the heterosis of its F1, but the cost of hybrid seed is reduced more than tenfold. In Chinese practice, the F1 seed produced on 1 ha is sufficient to plant approximately 100 ha of F1 the following year, and seed harvested from those fields can plant about 10,000 ha of F2 in the third year. The transgenic Bt hybrid ZMS29, released by CRI/CAAS, may be the most successful of these hybrids, with the largest planting acreage in China.
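
The “approximately half” rule and the area arithmetic above can both be written out explicitly. The sketch below is a minimal single-locus dominance model, assuming disomic inheritance, no epistasis, and an F2 obtained by self-pollination of the F1; the genotypic values `a` and `d`, the function names, and the default 100-fold multiplication factor are illustrative assumptions, not measured parameters.

```python
from fractions import Fraction


def heterozygote_fraction(generation: int) -> Fraction:
    """Fraction of heterozygotes at one locus in generation F<n> under selfing.

    The F1 is fully heterozygous; each round of selfing halves the fraction.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return Fraction(1, 2 ** (generation - 1))


def heterosis(generation: int, a: float, d: float) -> float:
    """Mean deviation from the mid-parent at one locus (no epistasis assumed).

    Genotypic values: AA = +a, Aa = d, aa = -a, so the mid-parent value is 0.
    Homozygotes split equally between AA and aa, so their +a and -a cancel.
    """
    het = heterozygote_fraction(generation)
    hom_each = (1 - het) / 2
    return float(hom_each * a + het * d - hom_each * a)


def planted_area(seed_field_ha: float, multiplication: float = 100.0) -> dict:
    """Hectares sown from one hand-crossed F1 seed field in the next two years."""
    f1_ha = seed_field_ha * multiplication  # year 2: commercial F1 crop
    f2_ha = f1_ha * multiplication          # year 3: F2 crop sown with seed from F1 fields
    return {"F1": f1_ha, "F2": f2_ha}


for gen in (1, 2, 3):
    print(f"F{gen}: heterozygotes {heterozygote_fraction(gen)}, "
          f"heterosis {heterosis(gen, a=2.0, d=1.0)}")
# F1: heterozygotes 1, heterosis 1.0
# F2: heterozygotes 1/2, heterosis 0.5
# F3: heterozygotes 1/4, heterosis 0.25
print(planted_area(1.0))  # {'F1': 100.0, 'F2': 10000.0}
```

Under this model the F2 keeps exactly half of the F1 dominance deviation at every locus. Observed figures, such as 19.30% versus 7.4% for XZM 2, also reflect trial years, environment and epistasis, which the sketch ignores. The area function counts only hand-crossed seed fields, not the other components of seed cost.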

Table 1 Yield and fiber quality performance of the main hybrid cottons produced by hand-emasculation and pollination that were developed and mostly grown in China (Huang 2007)

Heterosis exploitation through utilization of GMS lines

Genic male sterile lines

Male sterility is an important tool for hybrid breeding and heterosis utilization in many crops, especially self-pollinated crops and, less often, cross-pollinated crops. Male sterility, the abnormal development of male organs and loss of their reproductive function so that no functional pollen is produced, is common in the plant kingdom (Chen & Liu 2014b). In most cases it arises from spontaneous mutations that yield male sterile mutants. Male sterile lines can be divided into genic (GMS) and cytoplasmic (CMS) types according to their mode of inheritance.

Many male sterile lines showing either complete or partial sterility have been identified in Gossypium. GMS is generally simply inherited, controlled by one or two dominant or recessive genes. Since Justus and Leinweber (1960) identified the first GMS line, ms1, a total of 19 GMS genes in 17 GMS lines (ms1, ms2, ms3, Ms4, ms5, ms6, Ms7, ms8, ms9, Ms10, Ms11, Ms12, ms13, ms14, ms15, ms16, Ms17, Ms18 and Ms19) have been identified. Among them, ms1, ms2, ms3, ms13, ms14, ms15 and ms16 are each inherited as a single gene, whereas ms5ms6 and ms8ms9 are each controlled by a pair of duplicate recessive genes. Ms11, Ms12, ms13, Ms18 and Ms19 were found in G. barbadense and all the others in G. hirsutum (see the review by Zhang & Pan 1999). Scientists in Sichuan province, China, discovered a male sterile plant in the cultivar Dongting 1 in 1972 (Zhang & Jing 1997). Genetic analysis showed that this naturally occurring GMS mutation, designated Dong-A, is controlled by one recessive male sterility gene (Huang et al. 1982, Zhang et al. 1992, Zhang 1995). Currently, only completely sterile GMS lines, such as ms2 (1355A), ms14 (Dong-A) and ms5ms6 lines, are used to produce hybrid seed in China and India.

Genomics and molecular genetics of GMS in cotton

Early studies on cotton GMS lines focused mainly on collecting and identifying them. Using linkage tests, the ms8ms9 sterility genes were mapped to chromosomes 12 and 26 (chr. A12 and D12) (Rhyne 1991), ms3 to chr. D07 (Kohel et al. 1984), and Ms11 to chr. A12 (Turcotte and Feaster 1979). Wang et al. (2007b) mapped ms2 to chr. D02 using the ‘1355A’ GMS line. Using ‘Zhongkang-A’ (ZK-A), Chen et al. (2009) located ms5 on chr. A12 between the SSR markers NAU3561 and NAU2176 at a genetic distance of 3.2 cM, and ms6 on chr. D12 between the markers BNL1227 and NAU460 at a genetic distance of 3.1 cM. They also located ms15 on chr. A12 between NAU2176 and NAU1278 at a genetic distance of 2.7 cM (Chen et al. 2009).
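
Map distances like these translate into a risk of separating a marker from the target gene during marker-assisted selection. The sketch below converts centimorgans into recombination fractions with the Haldane mapping function, which assumes no crossover interference (other mapping functions, such as Kosambi's, give slightly different values). It hypothetically treats 3.2 cM as the whole NAU3561 to NAU2176 interval with ms5 at its midpoint; the function names are illustrative.

```python
import math


def haldane_r(distance_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane: no interference)."""
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def single_marker_error(distance_cm: float) -> float:
    """Chance per meiosis that one linked marker and the gene are separated."""
    return haldane_r(distance_cm)


def flanking_marker_error(left_cm: float, right_cm: float) -> float:
    """Chance per meiosis that both flanking markers are kept but the gene is lost.

    This needs a crossover on each side of the gene; treating the two intervals
    as independent follows from the no-interference assumption.
    """
    return haldane_r(left_cm) * haldane_r(right_cm)


# Hypothetical placement: ms5 in the middle of a 3.2 cM flanking interval.
left, right = 1.6, 1.6
print(f"single marker: {single_marker_error(left):.5f}")
print(f"flanking pair: {flanking_marker_error(left, right):.7f}")
```

With these assumptions, selecting on one marker 1.6 cM away misclassifies roughly 1.6% of gametes, whereas requiring both flanking markers lowers the error to about 0.025%.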

Using cDNA-AFLP to screen for genes differentially expressed between fertile and sterile plants of “Dong-A”, Hou et al. (2002) identified three differentially expressed genes, GHA27, GHA28 and GHA47, at the uninucleate stage. In “Dong-A”, the expression of many key genes required for anther development was suppressed at the meiotic and uninucleate microspore stages (Wei et al. 2013); these genes were mainly associated with hormone synthesis, sucrose and starch metabolism, the pentose phosphate pathway, glycolysis, flavonoid metabolism and histone synthesis. In the duplicate ms5ms6 line, differentially expressed genes were involved in processes such as pollen cell wall development (Ma et al. 2007). RNA-seq analysis of 1355A (ms2) identified a total of 2446 genes, enriched in pollen wall and anther development, that were differentially expressed between male fertile and sterile plants at different stages of anther development (Wu et al. 2015).

The release of cotton genome sequences (Zhang et al. 2015, Hu et al. 2019) has accelerated genetic and genomic research in cotton, including the cloning of male sterility genes. The GMS genes ms2 and ms5ms6 have been cloned (Ma et al. 2022, Wu et al. 2022, Mao et al. 2022). Wu et al. (2022) fine-mapped and cloned the ‘1355A’ male sterility gene ms2, which encodes a polygalacturonase (GhNSP). The Ghnsp mutant accumulates abundant de-esterified homogalacturonan in the tapetum and exine, which leads to defective pollen exine formation and, finally, male sterility. In the duplicate ms5ms6 line, male sterility results from mutations in both copies of GhCYP450, which encodes a cytochrome P450 protein essential for pollen exine formation and pollen development. GhCYP450 acts as a hydroxylating monooxygenase that converts a C12 fatty acid into 7-OH-lauric acid, which is required for the synthesis of sporopollenin precursors. Compared with the fertile wild type TM-1, GhCYP450-At (ms5) carries a mutation causing premature termination of translation, while GhCYP450-Dt (ms6) has a 7 bp (GGAAAAA) insertion in the promoter region in addition to three amino acid changes (D98E, E168K, G198R) in the coding region. Elucidating the mechanism of ms5ms6 male sterility will deepen our understanding of this system and support its use in exploiting heterosis (Mao et al. 2022).

In higher plants, tapetum development and pollen formation are unique events required for development of the male gametes, and any error will lead to abnormal pollen development (Zhang et al. 2016). Fatty acid metabolism is involved in the synthesis of the cuticle and of sporopollenin, which are important components of the pollen exine and wall (Chacon et al. 2013); GhCYP450, for example, hydroxylates fatty acids required for sporopollenin precursor synthesis. GhNSP, in turn, functions as a pectin-degrading enzyme that degrades de-esterified pectin (homogalacturonan). Accordingly, the Ghnsp mutant accumulates abundant de-esterified homogalacturonan in the tapetum and pollen exine, develops a thicker nexine, fails to form spines on the pollen wall surface, lacks an intine, and is finally male sterile (Wu et al. 2022).

Development of hybrid cotton cultivars using GMS lines

In Sichuan province, Dong-A and its sibling lines have been used to breed more than twenty hybrids of various types, including high-yield, superior-fiber-quality and transgenic Bt cottons, such as Chuanzamian (CZM) 4, CZM 27, CZM 32, CZM 35 and CZM 40. These hybrids generally outyielded local cultivars by more than 10%, showing prominent heterosis, high yield potential and superior fiber quality (Huang 2007, Xing et al. 2017). Among them, CZM 1, 2 and 4, released in 1985, were the first GMS cotton hybrids developed in China. CZM 4 was the first Fusarium wilt resistant hybrid and was therefore planted on a large scale (Huang & Shi 1988).

In India, the ms5ms6 GMS line Greg was imported from the USA in 1970. The first GMS hybrid, ‘Suguna’, was developed and released, but it was not extensively grown (Paroda & Basu 1993, Raja et al. 2018). This double recessive GMS system is also widely used in cotton hybrid seed production in China. Several hybrid cultivars have been bred using ms5ms6 sterile lines, including the good-fiber-quality cultivar NAU 98–4 and the high-yielding transgenic Bt cultivar NAU 6, and planted extensively in the YaRCGR (Zhang & Zhu 2004, 2005). By transferring the Bt gene and the ms5ms6 GMS genes into ZMS 12, the successful commercial cultivar with the largest planting acreage in China (Xing et al. 2017), CRI/CAAS bred and released the ZK-A GMS line (Xing et al. 1999). Using ZK-A as a parent, the transgenic Bt hybrids ZMS38 and ZMS54 have been developed and planted extensively in the YeRCGR (Xing et al. 2017).

Production of GMS hybrid seed by hand-pollination

The production of GMS hybrid seed by hand-pollination with the “one line, two purposes” procedure, which involves identifying and roguing the male fertile plants and then hand-pollinating the sterile plants, was reviewed previously (Zhang & Pan 1999). Multiplication of a GMS line such as ms5ms6 and the production of hybrid seed are illustrated in Fig. 1. For a duplicate male sterile line, when heterozygous fertile plants such as Ms5ms5ms6ms6 or ms5ms5Ms6ms6 are crossed as male parents with male sterile plants (ms5ms5ms6ms6), their progeny segregate into male sterile and fertile plants in a 1:1 ratio, so the population can be used like a single recessive male sterile line in hybrid seed production. Three isolated plots are needed: one for propagating the GMS line, one for the restorer (male) parent, and one for producing hybrid seed. Hybrid seed is produced over large areas entirely by hand-pollinating the sterile plants, plant by plant, with pollen from the restorer cultivar; this dependence on hand-pollination is one of the important characteristics of GMS systems.
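
The 1:1 segregation used to maintain a duplicate GMS line can be checked by enumerating gametes. The sketch below assumes that ms5 and ms6 assort independently (they lie on chr. A12 and D12) and that plants are male sterile only when homozygous recessive at both loci; allele names follow the text, while the helper functions are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

Genotype = tuple  # one (allele, allele) pair per locus


def gametes(genotype: Genotype) -> list:
    """All equally likely gametes: one allele from each locus (independent assortment)."""
    return list(product(*genotype))


def cross(mother: Genotype, father: Genotype) -> Counter:
    """Count offspring genotypes over all gamete pairings."""
    offspring = Counter()
    for egg, sperm in product(gametes(mother), gametes(father)):
        # Sort each locus so ("ms5", "Ms5") and ("Ms5", "ms5") count as one genotype.
        child = tuple(tuple(sorted(pair)) for pair in zip(egg, sperm))
        offspring[child] += 1
    return offspring


def is_male_sterile(genotype: Genotype) -> bool:
    """Duplicate recessive GMS: sterile only if no dominant (capitalized) allele is present."""
    return all(not allele[0].isupper() for locus in genotype for allele in locus)


def sterile_fraction(mother: Genotype, father: Genotype) -> Fraction:
    progeny = cross(mother, father)
    sterile = sum(n for g, n in progeny.items() if is_male_sterile(g))
    return Fraction(sterile, sum(progeny.values()))


STERILE = (("ms5", "ms5"), ("ms6", "ms6"))      # ms5ms5ms6ms6, female
FERTILE_HET = (("Ms5", "ms5"), ("ms6", "ms6"))  # Ms5ms5ms6ms6, male

print(sterile_fraction(STERILE, FERTILE_HET))  # 1/2, i.e. 1 sterile : 1 fertile
```

The fertile half of this progeny is again Ms5ms5ms6ms6, so crossing the two classes reproduces the same 1:1 population in each generation.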

Fig. 1

Multiplication and utilization of ms5ms6 GMS lines in production of hybrid seeds

In F1 hybrid seed production, fertile plants can only be recognized and eliminated once flowering begins. This procedure is tedious and time-consuming and increases the cost of hybrid seed production. It would therefore be helpful to develop GMS lines with marker traits expressed at the seed or seedling stage. Such GMS lines have been identified, for example a leaf abnormality that identifies ms2 male sterility (Quisenberry & Kohel 1968) and a virescent marker completely associated with ms16 sterility (81A) (Feng 1988, Zhang & Pan 1990, Zhang et al. 1992), but no hybrid cultivars have yet been developed using them.

Unlike the ordinary “one line, two purposes” procedure illustrated in Fig. 1, hybrid seed produced using the duplicate GMS line can be used for the F1 and potentially the F2, which further reduces the cost of hybrid seed. Although approximately 6% (1/16) male sterile plants will segregate out in this F2 population, they can be removed at the squaring or flowering stage without much effect on total yield. Moreover, in regions with a high rate of natural cross-pollination, it is not even necessary to remove these male sterile plants (Weaver 1987, Jing et al. 1994).
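
The figure of about 6% follows from Mendelian ratios: the F1 from a duplicate GMS line crossed with a normal fertile parent is heterozygous at both loci, and an F2 plant is sterile only if it is homozygous recessive at both. The sketch assumes independent assortment and complete self-pollination of the F1 (natural outcrossing is ignored); the function names are hypothetical.

```python
from fractions import Fraction


def recessive_homozygote_prob(parent1: str, parent2: str) -> Fraction:
    """P(offspring is homozygous recessive at one locus).

    Genotypes are two-letter strings such as "Aa"; lowercase marks the recessive allele.
    """
    p1 = Fraction(sum(a.islower() for a in parent1), 2)
    p2 = Fraction(sum(a.islower() for a in parent2), 2)
    return p1 * p2


def f2_sterile_fraction(n_loci: int) -> Fraction:
    """Sterile fraction in the F2 when sterility needs homozygosity at all n unlinked loci."""
    per_locus = recessive_homozygote_prob("Aa", "Aa")  # selfed F1 at each locus
    return per_locus ** n_loci


for n in (1, 2):
    frac = f2_sterile_fraction(n)
    print(f"{n} locus/loci: {frac} = {float(frac):.2%} male sterile in the F2")
# 1 locus/loci: 1/4 = 25.00% male sterile in the F2
# 2 locus/loci: 1/16 = 6.25% male sterile in the F2
```

A single recessive GMS gene would leave a quarter of F2 plants sterile, which helps explain why using the F2 is practical with the duplicate ms5ms6 system.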

Heterosis exploitation through utilization of CMS lines

Cytoplasmic male sterile lines

Many types of CMS lines have been developed in cotton (see the review by Zhang & Pan 1999). Since 1965, CMS lines with cytoplasm from G. arboreum L. (A2), G. anomalum Wawr. & Peyr. (B1) (Meyer & Meyer 1965), G. harknessii Brandg. (D2-2) (Meyer 1975) and G. trilobum (DC.) Skov. (D8) (Stewart 1992) have been developed in the USA. A new type of CMS line, 104-7A, selected from the progeny of a cross between G. hirsutum cv. Shiduan 5 and G. barbadense cv. Junhaimian, was developed in China (Jia 1990). The CMS line “Xiangyuan A” was developed by crossing G. thurberi with the Dong-A GMS line and backcrossing to G. hirsutum as the recurrent parent (Zhu et al. 2013). The relationships among these newly identified CMS lines remain to be explored. Male sterility in CMS lines with G. arboreum and G. anomalum cytoplasm is unstable and strongly influenced by environmental factors, and these lines have no practical value (Meyer & Meyer 1965). Only G. harknessii CMS lines (CMS-D2-2) and 104-7A have been used to produce hybrid cotton, in India and China (Xing et al. 2017), respectively.

CMS can be further divided into sporophytic and gametophytic types according to how pollen abortion is controlled. In sporophytic CMS, sterility is determined by the genotype of the sporophyte (the pollen-producing plant), so the F1 from a cross between the CMS line and a restorer line is fully pollen fertile. G. harknessii CMS-D2-2 and 104-7A are both sporophytic. In contrast, G. trilobum CMS-D8 is gametophytic: fertility is determined by the genotype of each pollen grain (the gametophyte), so only half of the pollen of the hybrid F1 is fertile.
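
The difference between the two restoration modes can be made concrete for an F1 that is heterozygous at the restorer locus. The sketch below assumes CMS cytoplasm, a single dominant restorer allele written generically as "Rf", and equal numbers of pollen grains carrying each allele; the function name is hypothetical.

```python
from fractions import Fraction


def fertile_pollen_fraction(plant: tuple, mode: str) -> Fraction:
    """Fraction of functional pollen on a plant carrying CMS cytoplasm.

    plant: the two nuclear alleles at the restorer locus, e.g. ("Rf", "rf").
    mode: "sporophytic" (fertility set by the plant's own genotype) or
          "gametophytic" (fertility set by each pollen grain's allele).
    """
    if mode == "sporophytic":
        # One dominant Rf in the diploid plant restores every pollen grain.
        return Fraction(1) if "Rf" in plant else Fraction(0)
    if mode == "gametophytic":
        # Meiosis yields equal numbers of grains carrying each allele;
        # only grains that themselves carry Rf are functional.
        return Fraction(sum(allele == "Rf" for allele in plant), len(plant))
    raise ValueError(f"unknown mode: {mode}")


F1 = ("Rf", "rf")  # CMS line (rf rf) crossed with a restorer (Rf Rf)
for mode in ("sporophytic", "gametophytic"):
    print(mode, fertile_pollen_fraction(F1, mode))
# sporophytic 1
# gametophytic 1/2
```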

Genetics and genomics of CMS fertility restoration

Two different dominant genes, Rf1 and Rf2, restore fertility to the two main CMS systems, CMS-D2-2 and CMS-D8, respectively. The Rf1 gene from D2-2 can restore fertility to both CMS-D2-2 and CMS-D8 lines, whereas the Rf2 gene from D8 restores fertility only to CMS-D8 lines (Zhang & Stewart 2001). Rf1 and Rf2 are distinct but closely linked genes, 0.93 cM apart (Zhang & Stewart 2004). Because it restores both CMS systems, Rf1 has great potential for heterosis exploitation.

Both restorer genes, Rf1 and Rf2, have been fine-mapped. Guo et al. (1998) first reported a random amplified polymorphic DNA (RAPD) marker, OPV-15300, linked to Rf1. Combining bulked segregant analysis (BSA) with near-isogenic lines (NILs), the same laboratory further anchored Rf1 with three more SSR and two RAPD markers (Liu et al. 2003). A high-resolution genetic map of Rf1 containing 13 markers within 0.9 cM was constructed and used to screen a bacterial artificial chromosome (BAC) library of the restorer line 0–613-2R carrying Rf1. From the sequences of 50 BAC ends of single positive clones, two new sequence-tagged site (STS) markers tightly linked to Rf1 were developed and integrated into this map. A physical map of the Rf1 region was constructed by fingerprinting the positive clones digested with HindIII, and Rf1 was further delimited to an interval of approximately 100 kb spanned by a minimum of two BAC clones, 081-05 K and 052-01N (Yin et al. 2006). Sequencing of these two BACs revealed five clustered, highly similar pentatricopeptide repeat (PPR) protein genes as Rf1 candidates for CMS-D2-2 restoration (Yang et al. 2010), consistent with most reports that restorer of fertility (Rf) genes encode PPR proteins (Kim & Zhang 2018). However, it remains difficult to determine which of these genes (possibly several) is responsible for Rf1, or even for Rf2.

Owing to its importance in heterosis utilization, many RAPD, SSR, STS and InDel markers linked to Rf1 have been developed and used in marker-assisted selection breeding (Zhang & Stewart 2004; Feng et al. 2005).

A RAPD marker, UBC188-500, closely linked to Rf2 was first reported by Zhang & Stewart (2001, 2004). Rf1 and Rf2, which restore fertility to CMS-D2 and CMS-D8, respectively, were anchored within a genetic distance of 1.4 cM (Wang et al. 2007a, Wang et al. 2009b). PPR genes are clustered at the Rf1 locus (Wu et al. 2014, Wu et al. 2017, Zhang et al. 2018). By integrating BSA-seq, high-throughput SNP genotyping and InDel markers, Feng et al. (2021) anchored the Rf2 locus to a 1.48 Mb interval containing 8 PPR genes on chr. D05. Using homocap-seq technology, Gao et al. (2022) found extensive differences within the D05_PPR cluster in the restorer line, suggesting that this cluster is associated with fertility restoration. Cloning of the restorer genes will facilitate their exploitation for cotton heterosis and improve the efficiency of the three-line hybrid seed production system.

Molecular mechanism of CMS in cotton

The mitochondrion, a semi-autonomous organelle, has a genome rich in repeat sequences that mediate rearrangements, generating new chimeric genes closely associated with cytoplasmic male sterility (Chase 2007, Hu et al. 2014). To date, 31 CMS genes located in mitochondrial genomes have been identified in 14 crop species, such as rice, maize, radish, soybean and wheat (Jing et al. 2012, Luo et al. 2013, Iwabuchi et al. 1999, Yamagishi et al. 2019, Melonek et al. 2021, Jiang et al. 2022, Yang et al. 2022). Wang et al. (1998) analyzed mitochondrial DNA from multiple CMS lines using RAPD markers and concluded that aberrant mitochondrial DNA is the main driver of CMS in cotton. Feng et al. (2000) detected significant differences by RFLP between the mitochondrial genomes of the G. harknessii CMS line and normal fertile G. hirsutum. Comparing the mitochondrial proteins and DNA of a G. harknessii CMS A-line and its corresponding B-line, Wang (2000) found that a 31 kDa polypeptide was missing from CMS mitochondria at the abortion stage. Using four mitochondrial genes (rrn26S, atp9, atp6 and coxII) as probes, the CMS line was found to lack a 1.9 kb fragment homologous to coxII; mutation of coxII may lead to mitochondrial dysfunction and thus male sterility (Wang 2000, Wang et al. 2009a). Li et al. (2018) assembled the mitochondrial genomes of 2074A, 2074S, and their B-line and restorer line, and found four ORFs transcribed specifically in 2074A. Aberrant transcription of cox3 was found in the CMS line H276A (Khan et al. 2022a). Methylated genes mainly related to starch, sucrose and galactose metabolism pathways are differentially expressed, and five key genes associated with CMS have been identified (You et al. 2022). Some differentially expressed miRNAs may regulate the occurrence of CMS (Li et al. 2021).

Sequenced mitochondrial genome information for the cotton three-line hybrid system is given in Table 2. By comparative analysis of the mitochondrial genome and transcriptome, four specific ORFs were identified in the CMS-D2 line Simian 3A; one of them, orf606a, a homolog of orf610a that is highly expressed in the CMS-D2 line, was proposed as a CMS-D2 candidate gene (Xuan et al. 2022). At the same time, the complete mitochondrial (mt) genome of the CMS-D2 line “ZBA” was assembled as a single circular molecule 634,036 bp in length. Among its 194 annotated ORFs, 36 protein-coding genes, six rRNA genes and 24 tRNA genes, a previously unknown chimeric gene, orf610a, composed of atp1 and a 485-bp downstream sequence of unknown origin, was identified. orf610a is expressed specifically in the CMS-D2 line. Ectopic expression in Arabidopsis thaliana of orf610a fused with a mitochondrial targeting peptide caused partial male sterility. Interaction between ORF610a and the nuclear-encoded protein RD22 indicated an association between ORF610a and pollen abortion, suggesting that the mt CMS gene orf610a may account for CMS-D2 male sterility (Zhang et al. 2022). Pollen abortion in the CMS line is accompanied by programmed cell death and accumulation of reactive oxygen species (Fig. 2). The key remaining problem is to elucidate, together with the functional Rf genes, the molecular mechanism of the mutual regulation between the CMS cytoplasm and the nuclear restorer genes.

Table 2 Sequenced mitochondrial genome information for three-line hybrid system in cotton
Fig. 2

Model of the mechanism underlying male sterility in the cotton CMS-D2 line. The ORF606a protein may be toxic, causing decreased ATP synthesis and a burst of ROS in the tapetum, eventually leading to male sterility

Development and production of CMS hybrid cultivars

A CMS-based hybrid breeding system employs three lines: the CMS line (A-line), the maintainer line (B-line) and the restorer line (R-line). Of these, selection of the R-line is the most important.
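
How the three lines work together can be sketched as a small crossing model in which cytoplasm is inherited from the female parent and nuclear alleles from both parents. The sketch assumes sporophytic CMS restored by a single dominant gene (written generically as "Rf"); the class, function and line names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Plant:
    cytoplasm: str   # "S" = CMS cytoplasm (e.g. D2-2), "N" = normal
    restorer: tuple  # two alleles at the restorer locus, e.g. ("Rf", "rf")

    def male_fertile(self) -> bool:
        # Sporophytic CMS with one dominant restorer gene is assumed.
        return self.cytoplasm == "N" or "Rf" in self.restorer


def cross(female: Plant, male: Plant) -> Counter:
    """Progeny of a cross: cytoplasm from the female, one nuclear allele from each parent."""
    progeny = Counter()
    for egg, pollen in product(female.restorer, male.restorer):
        progeny[Plant(female.cytoplasm, tuple(sorted((egg, pollen))))] += 1
    return progeny


A_LINE = Plant("S", ("rf", "rf"))  # male sterile
B_LINE = Plant("N", ("rf", "rf"))  # maintainer: same nucleus, normal cytoplasm
R_LINE = Plant("N", ("Rf", "Rf"))  # restorer; its cytoplasm does not reach the F1

a_x_b = cross(A_LINE, B_LINE)  # multiplies the A-line
a_x_r = cross(A_LINE, R_LINE)  # produces the commercial F1
print("A x B all sterile:", all(not p.male_fertile() for p in a_x_b))
print("A x R all fertile:", all(p.male_fertile() for p in a_x_r))
```

Both lines print `True`: crossing A with B regenerates a fully sterile A-line, while crossing A with R gives a uniformly fertile F1 that still carries the CMS cytoplasm, which is why the restoring ability of the R-line decides whether the hybrid sets seed normally.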

Development of CMS R-lines

Restorer lines are generally selected through cross hybridization, backcross breeding and testcross screening. By Agrobacterium-mediated introduction of a glutathione S-transferase (GST) gene into a restorer line, Wang et al. (2003) developed a strong restorer R-line named “Zhedaqianghui”, with which the CMS hybrid Zheza 2 was developed (Zhu et al. 2006). Through distant hybridization between G. hirsutum and G. anomalum, a restorer line carrying an Rf gene introgressed from G. anomalum was produced; it had strong restoring ability and a 100% restoration rate for the CMS-D2-2 line (Hua et al. 2003). Excellent CMS A- and R-lines selected during the past decades have made major contributions to the exploitation of heterosis in cotton.

Development of CMS hybrid cultivars

In 2005, the first two three-line CMS hybrids, Yinmian 2 and Zheza 2, were developed by the Biotechnology Institute/CAAS and Zhejiang University, respectively. Yinmian 2 is a transgenic Bt insect-resistant hybrid cultivar (Zhu et al. 2006, Guo et al. 2007) with high yield, 21.1% higher than the control ZMS 41. Subsequently, the transgenic Bt CMS hybrids ZMS83 and ZMS 99, developed by CRI/CAAS in 2011 and 2016, respectively, and Luza 2138, developed by the Shandong Cotton Research Center, China, in 2017, have been released and planted in China (Xing et al. 2017). However, because producing CMS hybrid seed with hand-pollination is costly, these hybrids are not extensively planted.

Some interspecific CMS-D2-2 hybrids, such as Xinluzhong 24 and 43, have been developed in Xinjiang. Xinluzhong 43, derived from a cross between G. hirsutum acc. H-268A and G. barbadense acc. 75R, was released in Xinjiang in 2009. Its lint yield (2217.9 kg/ha) is only about 1.5% higher than that of the control (2185.1 kg/ha), but its fiber quality is much better: fiber length 35.0 mm, strength 36.46 cN/tex and micronaire 3.53.

The G. harknessii cytoplasm has been reported to reduce F1 hybrid yield. Partial female sterility causes a high rate of seed abortion, which lowers seed-cotton and lint yield (Weaver 1986, Wei et al. 1995, Wang et al. 1997). Nevertheless, extensive testcrossing and selection may still produce excellent hybrids that overcome these negative effects.

Producing hybrid seeds with male sterile lines pollinated by insects

Cotton produces large, spheroidal, spiny pollen grains that are not dispersed by wind, so insects are the natural agents of pollen transfer. For this reason, much attention has been paid to producing hybrid seed on male sterile lines pollinated by insects, as reviewed previously (Zhang & Pan 1999). That review concluded that the honeybee (Apis mellifera L.) is the main pollination vector and that transfer efficiency depends on the number and distribution of colonies. Honeybee pollination of male sterile lines is feasible for hybrid seed production if the bee supply can essentially be met. Good pollen dispersal requires more than 0.5–1.0 visiting bees per 100 flowers (Waller et al. 1985, Feng 1990). However, even where the natural crossing rate is high, male sterile plants produced a lower average seed-cotton yield than fertile plants. Where the natural crossing rate is low, the yield decrease was larger still (Gururajan & Srinivasan 1975).
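The density threshold above can be checked from simple field counts of visiting bees and open flowers in a plot. The sketch assumes a three-way classification: "adequate" at or above 1.0 bee per 100 flowers, "marginal" within the quoted 0.5–1.0 range, and "insufficient" below it. This banding, the counting protocol and the helper names are our own assumptions. The source gives only the 0.5–1.0 range.

```python
def bees_per_100_flowers(bees_counted: int, flowers_counted: int) -> float:
    """Visiting bees per 100 open flowers in a counted plot."""
    if flowers_counted <= 0:
        raise ValueError("flowers_counted must be positive")
    return 100.0 * bees_counted / flowers_counted


def pollination_adequacy(density: float, low: float = 0.5, high: float = 1.0) -> str:
    """Classify bee density against the 0.5-1.0 bees per 100 flowers range (assumed banding)."""
    if density >= high:
        return "adequate"
    if density >= low:
        return "marginal"
    return "insufficient"


density = bees_per_100_flowers(12, 1500)  # hypothetical counts
print(density, pollination_adequacy(density))  # 0.8 marginal
```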

Honeybees are widely kept and easily available, so they are considered the most effective vector for hybrid seed production. At the blooming stage, 22 species of Hymenoptera and 5 of Diptera were found to be associated with cotton pollination in the cotton growing area of Sichuan, China (Department of Biology/Nanchong Normal College 1978). Among the Hymenoptera, honeybees, bumble bees (Bombus spp.) and leafcutter bees (Megachile conjunctiformis) were the main pollinators of cotton. In Zu county, Hebei, China, honeybee visits to cotton fields peaked twice a day, at 11 o’clock in the morning and at 3 o’clock in the afternoon. Environmental conditions strongly influenced insect activity, and insecticide spraying at the full blooming stage had the most severe effect. Unfavorable weather such as heavy storms, thunder and showers also reduced pollination activity, particularly that of honeybees (Feng 1990).

In 1999, hybrid seed of the transgenic Bt bollworm-resistant combination ZK-A × Bollgard 33B was produced in net houses by hand pollination (1000 m2) and by honeybee pollination (600 m2). The bee-pollinated net house held one honeybee colony of about 6000–7000 bees to pollinate the ms5ms6 line. The planting pattern was 1 row of the Ms line (Bollgard 33B, Ms5Ms6) to 3 rows of the ms line (ZK-A, ms5ms6). Under this pattern, honeybee pollination gave a hybrid seed yield of 1110.5 kg/ha, which was workable given seed costs. Hand pollination gave a significantly higher yield of 1291.6 kg/ha, a difference of 16.31% relative to the bee-pollinated yield (Xing et al. 2002). The lower bee-pollinated seed yield came mainly from fewer bolls per plant (10.9 vs 13.5 for bee vs hand pollination, a 23.85% difference). Lower boll weight also contributed (4.9 g vs 5.2 g, a 6.12% difference). Both percentages are likewise relative to the bee-pollinated values. In insect-resistant cottons, avoiding insecticide sprays against bollworm at the full blooming stage helps increase seed yield.
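The percentages reported by Xing et al. (2002) match the hand-pollination advantage expressed relative to the bee-pollinated value. Expressed as the reduction under bee pollination relative to hand pollination, the same gaps are smaller. The sketch below recomputes both from the figures quoted above. The function name is illustrative.

```python
def pct_gap(hand: float, bee: float) -> tuple:
    """Return the hand-vs-bee gap as (percent of bee value, percent of hand value)."""
    diff = hand - bee
    return 100.0 * diff / bee, 100.0 * diff / hand


# (hand pollination, bee pollination) values quoted from Xing et al. (2002)
data = {
    "seed yield (kg/ha)": (1291.6, 1110.5),
    "bolls per plant": (13.5, 10.9),
    "boll weight (g)": (5.2, 4.9),
}
for trait, (hand, bee) in data.items():
    rel_bee, rel_hand = pct_gap(hand, bee)
    print(f"{trait}: +{rel_bee:.2f}% vs bee, -{rel_hand:.2f}% vs hand")
# seed yield (kg/ha): +16.31% vs bee, -14.02% vs hand
# bolls per plant: +23.85% vs bee, -19.26% vs hand
# boll weight (g): +6.12% vs bee, -5.77% vs hand
```

Bee pollination therefore cost about 14% of the hand-pollinated seed yield. Any cost comparison should weigh that figure against the saving in hand labor.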

A three-year survey of hybrid seed production in large fields was then conducted in both the YeRCGR and the YaRCGR (Xing et al. 2005). It used alternating row patterns of 3 ms:1 Ms, 4 ms:1 Ms, 5 ms:1 Ms and 6 ms:1 Ms. Fifteen colonies per hm2 were placed along the sides of the three fields, each colony holding about 8000–10,000 bees. The study examined how parent proportion, bee species, parent planting pattern, honeybee behavior and weather affected honeybee-pollinated hybrid seed production. Averaged over the two locations, hybrid seed yield (Ms line:ms line) was as follows:
1:3, 1220.4 kg/hm2
1:4, 1242.1 kg/hm2
1:5, 998.8 kg/hm2
1:6, 639.2 kg/hm2
The authors drew three conclusions (Table 3). A 1:4 parent proportion is ideal. Honeybees pollinate more effectively than bumble bees. Mixed and alternating planting patterns do not differ remarkably. Weather clearly affects honeybee pollination and can directly cause hybrid seed yield loss.
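The two-location averages above can be tabulated to locate the best row ratio. The sketch also shows the share of rows occupied by the seed-bearing ms parent. The dictionary layout (keyed by ms rows per Ms row) is our own choice. The yield values are those quoted from Xing et al. (2005).

```python
# Average hybrid seed yield (kg/hm2) over two locations, keyed by ms rows per Ms row
yield_by_ratio = {3: 1220.4, 4: 1242.1, 5: 998.8, 6: 639.2}

best = max(yield_by_ratio, key=yield_by_ratio.get)
print(f"best ratio: 1 Ms : {best} ms")  # best ratio: 1 Ms : 4 ms

for ms_rows, seed_yield in sorted(yield_by_ratio.items()):
    # fraction of rows planted with the male sterile (seed-bearing) parent
    female_share = ms_rows / (ms_rows + 1)
    print(f"1:{ms_rows}  female rows {female_share:.0%}  seed yield {seed_yield} kg/hm2")
```

The female share keeps rising with each additional ms row. Seed yield, however, does not increase beyond 1:4 and drops sharply at 1:5 and 1:6.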

Table 3 Effects of parent plant proportion on hybrid seed production (Xing et al. 2005)

Challenges and opportunities

Core parental development for heterosis utilization

Germplasm enhancement is the foundation of hybrid breeding. ZMS-12 is the Upland cotton cultivar with the largest planting area in China. It combines high yield, superior fiber quality and disease resistance with wide adaptability and high combining ability. ZMS 12 and lines derived from its pedigree were used to develop elite hybrids, including ZMS-28, ZMS-29, XZM 2 and Jimian18 (Table 1). Owing to this good combining ability, 84 cultivars had been bred directly from ZMS-12 before 2002, including seven hybrids and six transgenic Bt pest-resistant cultivars (Guo et al. 2002). Among them, ZMS-12, ZMS-28 and Jimian18 are widely cultivated in the YeRCGR, and ZMS-29 and XZM-2 are extensively planted in the YaRCGR (Huang 2007, Xing et al. 2017) (Table 1). Transferring the Bt gene and the ms5ms6 GMS genes into ZMS 12 produced a GMS line in the ZMS 12 background, Zhongkang-A (ZK-A) (Xing et al. 2017). Because ZK-A has high combining ability, superior hybrids are easily identified with it, and CRI/CAAS has used it to develop ZMS38 and ZMS54. ZMS 12 thus plays a very important role as a core parent in Chinese cotton breeding, including heterosis utilization. Fig. 3 presents the pedigrees of hybrids developed using ZMS-12 and Shiyuan 321 as foundation parents. The development and use of ZMS 12 show that great attention should be paid to core parent development.
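Tracing which cultivars descend from a core parent is a traversal over pedigree records. The sketch below encodes only the links stated in the paragraph above, as a deliberately simplified, hypothetical graph. The real pedigrees in Fig. 3 contain intermediate lines that are not reproduced here, and several hybrids listed under ZMS-12 were actually derived from its pedigree lines rather than directly.

```python
from collections import deque

# Simplified parent -> derived-line links taken only from the text above (hypothetical structure);
# intermediate pedigree lines shown in Fig. 3 are collapsed into direct links.
PEDIGREE = {
    "ZMS-12": ["ZK-A", "ZMS-28", "ZMS-29", "XZM-2", "Jimian18"],
    "ZK-A": ["ZMS38", "ZMS54"],
}


def descendants(root: str, graph: dict) -> list:
    """Breadth-first traversal returning every line derived from root, nearest first."""
    seen, queue, order = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(descendants("ZMS-12", PEDIGREE))
# ['ZK-A', 'ZMS-28', 'ZMS-29', 'XZM-2', 'Jimian18', 'ZMS38', 'ZMS54']
```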

Fig. 3 Pedigrees of hybrids developed using ZMS12 and SY321 as core parents

F2 heterosis utilization in cotton

As labor costs keep rising, simplified, highly efficient and low-cost cotton cultivation has become an inevitable trend, and this challenges cotton heterosis utilization. Mass commercial production of hybrid seed requires an efficient and controllable pollination process. F2 hybrids do show heterosis, usually yielding 6%–10% more than the control cultivars, and they have been used extensively in China for many years. F2 heterosis utilization will therefore remain an option in future cotton breeding, at least in China. The Xinjiang Uygur Autonomous Region has gradually become one of the largest cotton production bases in the world. In 2021, cotton in Xinjiang covered more than 2.50 million hectares and produced 5.39 million tons of lint (http://www.stats.gov.cn). These figures account for approximately 82.8% of China's cotton planting area and 89.5% of its cotton production. Xinjiang's output in turn constitutes almost one fifth of world cotton production, so the region occupies a unique position in the global cotton industry. Some breeders have developed hybrids specifically for use in the F2 generation in this region, such as Xinluzhong 32 and Xinluzao 14, 67 and 69.
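F2 heterosis is typically smaller than F1 heterosis. The classical quantitative-genetics expectation illustrates why. With only additive and dominance effects (no epistasis), one generation of selfing halves the dominance deviation, so the expected F2 mean lies halfway between the mid-parent and the F1. This textbook model is our addition and does not come from the source. The yields below are hypothetical. Note also that the 6%–10% quoted above is measured against control cultivars, not against the mid-parent.

```python
def f2_expected_mean(p1: float, p2: float, f1: float) -> float:
    """Expected F2 mean under an additive-dominance model without epistasis:
    the F2 keeps half of the F1's deviation from the mid-parent."""
    mid_parent = (p1 + p2) / 2
    return mid_parent + (f1 - mid_parent) / 2


def heterosis_pct(value: float, reference: float) -> float:
    """Heterosis as percent deviation from a reference (mid-parent or control)."""
    return 100.0 * (value - reference) / reference


p1, p2, f1 = 1500.0, 1300.0, 1680.0  # hypothetical parent and F1 yields, kg/ha
mid = (p1 + p2) / 2
f2 = f2_expected_mean(p1, p2, f1)
print(f2, heterosis_pct(f1, mid), heterosis_pct(f2, mid))  # 1540.0 20.0 10.0
```

Real F2 populations can depart from this expectation, for example through epistasis. For this reason, F2 heterosis has to be tested directly rather than inferred from F1 performance.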

Developing F2 hybrids for use in production faces the following challenges:

  1. In selecting parental lines, the two parents should differ little in fiber length (within 1.5 mm) and in growth period. Otherwise, lint length in the F2 will not be uniform enough to meet the demands of yarn spinning.

  2. G. trilobum CMS-D8 is a gametophytic CMS, so hybrids developed with it can be used in the F2. Only half of the F1 hybrid's pollen is restored to fertility. However, only Rf-carrying pollen is functional, so every F2 plant inherits Rf and all F2 plants have completely restored fertility.

  3. Hybrids must be selected and tested for high heterosis in the F2, because some F2 populations show no heterosis even when the F1 shows high heterosis.

  4. In the F2, heterosis exists for yield but not for fiber quality.

  5. Chemical gametocides offer a further option. Hand emasculation makes it easy to produce crosses conveniently and freely during hybrid cultivar development. In cotton, breeders have screened some chemical gametocides that kill the stamens without damaging the pistil or its normal capacity for fertilization. Eliminating hand emasculation could further reduce the cost of hybrid seed production.
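The genetic reasoning behind the second point above can be sketched with a single-locus model of gametophytic restoration in CMS cytoplasm. The model uses one restorer allele Rf, and the allele labels and function names are ours. The model is a textbook simplification: it assumes that each pollen grain's fertility depends only on its own genotype and that egg cells are unaffected.

```python
from collections import Counter
from itertools import product


def f2_from_gametophytic_f1() -> Counter:
    """Self an Rf/rf F1 in CMS cytoplasm under gametophytic restoration.

    Egg cells carry Rf or rf in equal numbers, but only pollen grains that carry
    Rf are functional, so every F2 zygote receives Rf from its pollen parent.
    """
    eggs = ["Rf", "rf"]
    functional_pollen = ["Rf"]  # rf pollen grains abort in the sterile cytoplasm
    genotypes = Counter()
    for egg, pollen in product(eggs, functional_pollen):
        genotypes["/".join(sorted([egg, pollen]))] += 1
    return genotypes


def pollen_fertility(genotype: str) -> float:
    """Fraction of functional pollen: a grain is fertile only if it carries Rf."""
    alleles = genotype.split("/")
    return alleles.count("Rf") / len(alleles)


f2 = f2_from_gametophytic_f1()
for genotype, n in f2.items():
    print(genotype, n, pollen_fertility(genotype))
```

Every F2 class carries Rf, and even the Rf/rf class sheds half functional pollen. By contrast, under sporophytic restoration by a single dominant Rf gene, one quarter of the F2 (rf/rf) would be male sterile.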

Exploration and enhancement of environment‑sensitive genic male sterility

The most desirable way to produce hybrid seed is to use a GMS or CMS line combined with insect pollination. If honeybees deliver enough pollen, A- and B-lines can produce equal seed-cotton yields in the field. Environment‑sensitive genic male sterility (EGMS) is induced by environmental factors such as light and temperature, and its discovery and enhancement have allowed some GMS traits to be used in hybrid crop breeding. By controlling the environment, an EGMS line can serve both as the sterile line and as its own maintainer line, which enables a two-line hybrid system. Since the 1970s, EGMS has repeatedly been found in major crops such as rice, wheat and soybean. EGMS falls into three categories according to the dominant environmental factor: photoperiod-sensitive GMS (PGMS), temperature-sensitive GMS (TGMS) and photo-thermosensitive GMS (PTGMS) (Chen et al. 2019). In Upland cotton, the PGMS mutant CCRI9106 was derived from CCRI040029 through space mutation breeding. CCRI9106 is male sterile under long-day conditions. The trait is controlled by a single recessive gene, ys-1, mapped to chromosome D12 (Liu et al. 2014, Zhang et al. 2020). In rice, photoperiod- and thermo-sensitive GMS is caused by a point mutation in a novel noncoding RNA that produces a small RNA (Zhou et al. 2012). However, the mutation mechanism of ys-1 remains to be explored. So does the use of this PGMS in heterosis utilization, which could effectively solve the high cost of pollination in hybrid seed production. Producing hybrid seed on insect-pollinated EGMS lines also requires further study.
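The two-line logic described above can be summarized for a long-day-sterile PGMS line such as CCRI9106. The same line is sown where days are long to serve as the female parent in hybrid seed production, and where days are short to self and multiply itself. The sketch assumes a sharp day-length threshold. `CRITICAL_DAY_LENGTH_H` and its value are hypothetical, because the text gives no critical photoperiod for CCRI9106. Real fertility transitions are likely gradual and may also interact with temperature.

```python
from enum import Enum


class Role(Enum):
    SEED_PARENT = "male sterile: use as female parent in hybrid seed production"
    SELF_PROPAGATION = "male fertile: self to multiply the PGMS line itself"


# Hypothetical critical day length (hours); not reported for CCRI9106 in the text.
CRITICAL_DAY_LENGTH_H = 13.5


def pgms_role(day_length_h: float, critical: float = CRITICAL_DAY_LENGTH_H) -> Role:
    """Role of a long-day-sterile PGMS line under a given photoperiod (sharp threshold assumed)."""
    return Role.SEED_PARENT if day_length_h >= critical else Role.SELF_PROPAGATION


print(pgms_role(14.5).value)
print(pgms_role(12.0).value)
```

Because one line covers both the A-line and the B-line roles, a maintainer line is no longer needed. Only a pollen parent remains to be chosen for the hybrid.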