Introduction

Cytoplasmic male sterility (CMS) is a maternally inherited trait that prevents a plant from forming pollen (Levings 1993; Schnable and Wise 1998). The genetic model of CMS includes cytoplasm having the ability to induce male sterility (designated as S) and non-inducible cytoplasm (N). Also included in this model is a nuclear gene that suppresses the action of S. This gene is termed restorer-of-fertility (Rf). In most cases, the dominant Rf allele restores pollen fertility; hence, the recessive rf allele is necessary for the induction of male sterility. The male sterile plant can receive pollen to set seeds because its female reproductive organ remains functional.
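The genetic model described above can be written as a simple rule. The following is a minimal sketch that assumes a single Rf locus with a fully dominant restorer allele; the function name and the tuple encoding of genotypes are illustrative choices, not taken from the cited literature.

```python
def is_male_sterile(cytoplasm, rf_alleles):
    """Predict male sterility under a simplified one-locus CMS model.

    cytoplasm  : "S" (sterility-inducing) or "N" (non-inducing)
    rf_alleles : the two nuclear alleles, e.g. ("Rf", "rf")

    Assumption: one Rf locus and a fully dominant Rf allele.
    """
    if cytoplasm not in ("S", "N"):
        raise ValueError("cytoplasm must be 'S' or 'N'")
    if len(rf_alleles) != 2 or any(a not in ("Rf", "rf") for a in rf_alleles):
        raise ValueError("rf_alleles must hold two alleles, each 'Rf' or 'rf'")
    # Sterility requires S-cytoplasm and the absence of any dominant Rf allele.
    return cytoplasm == "S" and all(a == "rf" for a in rf_alleles)


for cyt in ("S", "N"):
    for geno in (("Rf", "Rf"), ("Rf", "rf"), ("rf", "rf")):
        state = "sterile" if is_male_sterile(cyt, geno) else "fertile"
        print(f"[{cyt}] {''.join(geno)}: male {state}")
```

Under these assumptions, only the combination of S-cytoplasm with the rfrf genotype is predicted to be male sterile.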

As hybrid maize gained importance in the early twentieth century (Duvick 2001), breeders anticipated applying this new breeding method to other crops. However, this attempt was hampered by the hermaphroditism of many crops (i.e., male and female organs are closely positioned in a flower), which permits self-pollination and thereby reduces the purity of the resultant hybrid seed. The idea that CMS can be used to emasculate the seed parental line was proposed for several crops (Kaul 1988), and Jones (1943) was the first to implement it in onion. The hybrid expresses sterility if the pollen parental line is rfrf; hence, the pollen parental line for hybrid seed production should be RfRf to secure pollination if the harvested product is a seed or fruit. Hybrid breeding through the use of CMS has now been widely introduced across many crops (Kim and Zhang 2018). Unfortunately, not all crop species are amenable to CMS-based breeding because an appropriate CMS is not available. For CMS-based breeding to be implemented in a given crop, S and both the dominant and recessive alleles of Rf must first be characterized.

CMS has been reported in more than 150 plant species (Laser and Lersten 1972). Potential sources of CMS are as follows. In some crops such as onion, maize, and sugar beet, CMS plants were discovered among breeding materials (Jones 1936; Owen 1945; Rogers and Edwardson 1952). A common strategy to identify CMS is an intraspecific cross between distantly related populations or an interspecific cross (Duvick 1965; Mikami et al. 1985; Marinković and Miller 1995). The rationale for the occurrence of a CMS plant from two normal fertile parents is that CMS is suppressed in one parent by the dominant Rf allele, but the Rf allele segregates out in the offspring because the other parent carries the recessive rf allele. Overall, this strategy entails the elucidation of cryptic CMS (Touzet 2012) using crossing experiments. When the donor is sexually incompatible, CMS can be transferred via protoplast fusion followed by plant regeneration (e.g., Pelletier and Budar 2015). In plant reproductive biology, CMS is known as a principal cause of gynodioecy, a sexual system in which females and hermaphrodites co-occur in a population (Touzet 2012). Gynodioecy has been found in more than 18% of 449 angiosperm families (Dufay et al. 2014), and CMS from such species may be introduced into related crops.
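The rationale for cryptic CMS can be illustrated with a small segregation calculation. This is a sketch under stated assumptions: a single Rf locus with a dominant restorer, strictly maternal transmission of the cytoplasm, and a backcross as the generation in which sterility appears. The function names and the choice of a backcross are illustrative and are not taken from the cited studies.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def is_male_sterile(cytoplasm, rf_alleles):
    # Simplified model: sterile only with S-cytoplasm and no dominant Rf allele.
    return cytoplasm == "S" and all(a == "rf" for a in rf_alleles)


def cross(seed_parent, pollen_parent):
    """Return offspring classes and their expected frequencies.

    Each parent is (cytoplasm, (allele1, allele2)). The cytoplasm is taken
    from the seed parent only (maternal inheritance); nuclear alleles
    segregate in a Mendelian fashion.
    """
    cytoplasm = seed_parent[0]
    counts = Counter()
    for egg, pollen in product(seed_parent[1], pollen_parent[1]):
        genotype = tuple(sorted((egg, pollen)))
        counts[(cytoplasm, genotype)] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


def sterile_fraction(offspring):
    return sum(freq for (cyt, geno), freq in offspring.items()
               if is_male_sterile(cyt, geno))


# Two male-fertile parents: one hides S-cytoplasm behind RfRf, the other is [N] rfrf.
cryptic = ("S", ("Rf", "Rf"))
maintainer = ("N", ("rf", "rf"))

f1 = cross(cryptic, maintainer)               # all [S] Rfrf, male fertile
bc1 = cross(("S", ("Rf", "rf")), maintainer)  # F1 backcrossed to the rfrf parent

print("sterile fraction in F1: ", sterile_fraction(f1))
print("sterile fraction in BC1:", sterile_fraction(bc1))
```

In this model the F1 is uniformly fertile, whereas half of the backcross progeny are expected to be male sterile, which is how a cross between two fertile parents can expose a previously hidden S-cytoplasm.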

Several efforts based on the above-mentioned strategy have been made to identify and introduce CMS into crops, and some have succeeded (Bosemark 1998; Sattari et al. 2007; Reddy et al. 2008; Yamagishi and Bhat 2014; Huang et al. 2014; Saxena et al. 2018). However, not all CMS is practically useful because of the difficulty in handling the phenotypic instability that accompanies its use. For example, a caveat of Polima-type CMS in rapeseed is the reversion of pollen fertility under high temperatures in certain genotypes (Burns et al. 1991). Such issues can be overcome if we elucidate the mechanisms that underlie the expression of CMS. Here, we review important advances and insightful studies on the molecular genetics of CMS and Rf, with an emphasis on recent findings. We stress that previously unappreciated aspects of this research field, such as multiple allelism of Rf, the effect of environmental factors, and their interactions, should be considered for the successful application of CMS-based breeding across a wider range of crops. Previous reviews on related subjects may complement this review, allowing readers to delve deeper into CMS and Rf (Horn et al. 2014; Chen and Liu 2014; Gaborieau et al. 2016; Bohra et al. 2016; Chen et al. 2017; Kim and Zhang 2018; Kubo et al. 2020; Toriyama 2021; Xu et al. 2022).

S-cytoplasm is associated with mitochondria

Cytoplasmic inheritance has been associated with mitochondria, plastids, and infectious agents such as viruses (Mogensen 1996; Greiner et al. 2015; Vong et al. 2019). A CMS in fava bean was reported to be linked to cytoplasmic double-stranded RNA (Pfeiffer 1998), but the vast majority of CMS have been shown to stem from mitochondria (Schnable and Wise 1998). Although plant mitochondrial genomes typically contain 20–40 protein-coding genes and 20–30 other genes (the number varies across species) (Møller et al. 2021), few mutations in these genes have been associated with CMS. A wild beet CMS termed G-type is an exception in which several mutations that truncate reading frames of mitochondrial genes such as cox2 and nad9 have been found (Ducos et al. 2001; Darracq et al. 2011). The association of non-coding RNA has been proposed in one case (Stone et al. 2017), and many cases of CMS have been associated with a unique mitochondrial ORF that has the potential to encode a polypeptide absent from N-cytoplasm (hereafter termed S-orf). S-mitochondria contain the same suite of genuine mitochondrial genes as N-mitochondria (Satoh et al. 2004; Makarenko et al. 2019; Wang et al. 2020; Zhong et al. 2021). Generally, the S-orfs identified thus far are not homologous to each other, although a few instances of interspecific similarity exist (e.g., Tang et al. 2017). One of their notable features is patchy homology to mitochondrial genes such as those for subunits of ATP synthase (Köhler et al. 1991; Handa et al. 1995; Tang et al. 1996). The rest of the ORF region (or sometimes the entire ORF region) is unique to each S-orf (Yamamoto et al. 2005; Kim et al. 2007).

Heterologous expression of S-orf with a mitochondria-transit peptide in transgenic plants yielded a male-sterile phenotype (Yamamoto et al. 2008; Wang et al. 2006; He et al. 1996; Jing et al. 2012; Yang et al. 2010; Luo et al. 2013; Jiang et al. 2022), but the male sterility phenotype is not always reproduced in similar experiments (Wintz et al. 1995; Chaumont et al. 1995; Kojima et al. 2010). On the other hand, pollen-fertile revertants have been obtained from CMS maize and common bean; their S-orfs are lost or have suffered a frame-shift mutation (Wise et al. 1987; Janska et al. 1998), supporting the notion that the S-orfs are responsible for the male sterility phenotype.

Plant mitochondrial genome editing is now feasible through the deployment of mitoTALEN, a chimeric protein consisting of a mitochondria-transit peptide and a transcription activator-like effector nuclease (Kazama et al. 2019). Unlike the CRISPR/Cas9 system, mitoTALEN does not require guide RNA (Kazama et al. 2019), whose import into mitochondria seems difficult. By using mitoTALEN, DNA double-strand breaks were introduced into S-orfs of rice, rapeseed, and tomato CMSs, which caused genome rearrangements resulting in the loss of these S-orfs from the mitochondrial genomes (Kazama et al. 2019; Omukai et al. 2021; Kuwabara et al. 2022; Takatsuka et al. 2022). Plants that lost S-orfs were male fertile, indicating that the S-orfs are responsible for CMS. An interesting result was reported by Omukai et al. (2021), who produced rice plants depleted of orf352, which is associated with RT102 cytoplasm; these plants were only partially male fertile rather than fully fertile. Omukai et al. (2021) suggested that an additional unidentified mitochondrial gene is associated with the male sterility phenotype, implying that the CMS gene is not always a single ORF. The occurrence of multiple unique ORFs in addition to genuine genes is usual in plant mitochondria irrespective of their ability to induce male sterility (e.g., Marienfeld et al. 1997). It seems likely that most such unique ORFs do not impinge upon the production of pollen, but each has the potential to evolve into a CMS-causing gene. This notion is consistent with the proposed evolutionary history of rice WA352c (for WA-type CMS), which appears to have evolved from a unique ORF that did not induce male sterility (Tang et al. 2017).

The location and expression of S-orf can be classified as illustrated in Fig. 1. In many cases, S-orfs are located upstream or downstream of mitochondrial genes, and co-transcription of the S-orfs and the mitochondrial genes has been shown to occur frequently (Fig. 1a; Dewey et al. 1987; Rathburn and Hedgcoth 1991; Song and Hedgcoth 1994; Wang et al. 2006). S-orfs fused with mitochondrial genes have been reported in several plant species (Fig. 1b, c; Bailey-Serres et al. 1986; Yamamoto et al. 2005; Meyer et al. 2018), but they are rarer than the co-transcription type. One may expect a fusion protein from such a gene organization; however, in sugar beet, the fusion protein seems to be processed into two independent proteins (Fig. 1b, Yamamoto et al. 2005). On the other hand, the fusion protein was detected in CMS sorghum (Fig. 1c, Bailey-Serres et al. 1986). Note that cox1 in wild beet with G-type CMS has an NH2-terminal extension that can be considered an S-orf, resulting in the detection of the fusion protein (Fig. 1c, Meyer et al. 2018). Fewer cases of solitary S-orf have been recorded (Fig. 1d; Yamamoto et al. 2008; Park et al. 2013). To explain these observations, we favor the notion that, because S-orf is an evolutionarily young gene, it does not have its own promoter and terminator sequences. Therefore, the easiest way to achieve S-orf expression is to capture preexisting sequences. This possibility may be reflected in the fact that the co-transcription type is the most frequent mode of S-orf expression. Gene fusion may facilitate the expression of S-orf, but the fusion must not severely impair the function of the fused mitochondrial gene, which may make the emergence of fusion-type S-orf difficult; hence, this type is rarer. The expression of a solitary S-orf is only possible if the sequences necessary for its expression are available, but this may be an evolutionarily rare scenario in the plant mitochondrial genome.

Fig. 1

Locations and expression mechanisms of S-orfs in the mitochondrial genome. Striped and open boxes represent S-orfs and mitochondrial genes, respectively. S-orfs encode either a chimeric sequence consisting of a partial mitochondrial gene and a sequence of unknown origin, or a sequence that is entirely of unknown origin. Curved lines represent transcripts. Striped and open circles represent translation products of S-orfs and mitochondrial genes, respectively. Proteins derived from S-orfs are responsible for male sterility. a S-orf is located independently of the mitochondrial gene. They are co-transcribed, and then translated into two independent proteins. b S-orf fused with a mitochondrial gene is co-transcribed, and then translated into a fusion protein. The fusion protein is proteolytically cleaved into two proteins. c S-orf fused with a mitochondrial gene is co-transcribed, and then translated into a fusion protein. d S-orfs are solitary

CMS-associated ORFs vary in their primary structure, but several groups may be possible at the biochemical level

The above-mentioned studies indicate that the protein product of S-orf likely alters mitochondrial function, but the application of sequence-based approaches to infer S-orf function has been difficult because of the great nucleotide-sequence diversity among S-orfs. However, it has been shown that the translation products predicted from S-orfs have highly hydrophobic domains (Dewey et al. 1987; Köhler et al. 1991; Yamamoto et al. 2005; Kazama et al. 2008; Okazaki et al. 2013). Note that hydrophobicity does not always imply strict membrane localization, because petunia PCF protein and wild beet ORF129 protein (derived from I-12CMS(3)/E-type cytoplasm) were detected in both the mitochondrial membrane and the matrix (Nivison et al. 1994; Yamamoto et al. 2008), suggesting that they are loosely associated with the mitochondrial membrane.

S-orfs can be classified according to the biochemical properties of their translation products. One is the oligomer-forming group, to which maize urf13-T (from T-type CMS), radish orf138 (Ogura-type CMS), and sugar beet preSatp6 (Owen-type CMS) belong (Rhoads et al. 1998; Duroc et al. 2005; Yamamoto et al. 2005). The translation products of these ORFs are highly hydrophobic and are detected only in the membrane fraction, indicating that they are likely integrated into the mitochondrial membrane to form oligomers, whereas a specific association with another protein was not supported (Duroc et al. 2005, 2009; Yamamoto et al. 2005). Rhoads et al. (1998) proposed a model of a molecular pore made from the translation products of urf13-T. This molecular pore is formed in the presence of specific chemicals such as T-toxin produced by Cochliobolus heterostrophus race T (the causal fungus of Southern corn leaf blight), and the pore acts as an uncoupler. A similar molecular pore may be formed in radish and sugar beet CMS plants, although this remains to be examined. The amount of oligomer-formed preSATP6 protein correlates well with the severity of the CMS phenotype (Arakawa et al. 2018, 2019b, 2020a), suggesting that oligomer-formed preSATP6 protein is quantitatively associated with the CMS phenotype.

Another group consists of proteins that can bind specific proteins other than themselves. This group includes rice WA352c (WA-type CMS) and rice orfH79 (HL-type CMS). WA352c protein was shown to bind to COX11 protein, which is a copper metallochaperone involved in cytochrome c oxidase assembly (Luo et al. 2013; Wang et al. 2018). ORFH79 seems to be able to bind one or more proteins, but its partner may not be a single protein (Liu et al. 2012; Wang et al. 2013). The question of whether an additional group or an alternative classification of S-orf is possible remains to be answered.

Functional aspects of CMS-associated ORF

The fact that there is trans-specific similarity among S-orf gene products at the biochemical level, such as that seen in the oligomer-forming group, points to the presence of a shared mechanism that underlies the expression of male sterility. The translation product of S-orf is believed to impair mitochondrial function, thereby causing pollen sterility (Horn et al. 2014). Given that many S-orfs are constitutively expressed (Touzet and Meyer 2014), the degree of impairment should be small or largely compensated in organs other than anthers; otherwise, large deleterious effects would incur a fitness penalty, which would reduce the yield of hybrid crops. Three working hypotheses have been discussed about the expression of CMS. The first invokes an anther-specific factor that activates the protein. This hypothesis was proposed based on the fact that URF13-T protein (maize T-type CMS) becomes an uncoupler in the presence of certain chemicals (Flavell 1974; Levings 1993). The second hypothesis postulates a difference in the relative importance of mitochondria among organs, of which the anther is highly dependent on mitochondria (Levings 1993). According to this hypothesis, a subtle effect of S-orf in non-anther organs is expected; however, it may be undetectable. Maize with T-type CMS shows slight morphological differences such as in its height, leaf number, and grain yield (Duvick 1965). Rice plants expressing orfH79 (HL-type CMS) exhibited shorter roots, altered mitochondrial function, and increased reactive oxygen species (ROS) compared to plants with N-cytoplasm, and these alterations disappeared when orfH79 was suppressed (Yu et al. 2015), apparently consistent with this hypothesis. Some S-orf proteins are reported to be cytotoxic in non-anther cells (e.g., Wang et al. 2006; Kojima et al. 2010; Nakai et al. 1995). Note that, although some S-cytoplasms cause vegetative abnormalities such as chlorosis, these are attributed to the plastid genome and can be genetically separated from the male sterility phenotype (Yamagishi and Bhat 2014). Presence/absence of S-orf is not the only factor that differentiates the N- and S-cytoplasm. Additionally, differences have been found between N- and S-mitochondrial DNA sequences (e.g., Satoh et al. 2004; Wang 2020). Detailed analyses are necessary to ascertain whether phenotypes other than male sterility are caused by S-orf. In the third hypothesis, S-orf is translated only in the anther. Translation products of common bean orf239 and rice WA352c were detected in the anther but not in other organs (Abad et al. 1995; Luo et al. 2013). In this case, the effect of the S-orf protein is not necessarily weak.

How the S-orf proteins impair mitochondrial function is unclear. If the S-orf protein binds to a specific mitochondrial protein, this binding may be deleterious to the protein’s function, thereby damaging mitochondria (Luo et al. 2013). Sabar et al. (2003) investigated sunflower PET-type CMS and detected a decrease in the enzymatic capacity of Complex V (F1F0-ATP synthase), for which the translation product of orf522, an S-orf of this CMS, is suggested to compete with atp8 due to shared homology in their 5' coding regions. This notion may be applicable to S-orfs having homology to genuine mitochondrial genes. These data imply that some S-orfs directly impair the function of mitochondrial proteins. For the oligomer-forming-type S-orfs, it has been postulated that the oligomer may act as a mild uncoupler (Duroc et al. 2009), but this notion needs further physiological support. While the enzymatic capacity of respiratory complexes is unchanged in the plant with orf138 (Duroc et al. 2009), a decrease in Complex V was found in the plant with preSatp6 (Wesołowski et al. 2015). It was reported that not all preSATP6 proteins form oligomers under certain extraction conditions (Wesołowski et al. 2015). Given the high hydrophobicity of these oligomer-forming proteins, it is possible that their non-specific binding to other mitochondrial proteins has a small harmful effect that accumulates and eventually crosses a threshold in the anther. This explanation is congruent with the above-mentioned notion that protein–protein interactions play a key role in the impairment of mitochondrial function, and may be a common mechanism that unifies the expression of CMS. On the other hand, mitochondrial function can be regulated by other mechanisms that have been extensively investigated within the context of CMS. For example, there is an intimate association between mitochondrial shape and function, and this link has been shown to drive changes in the respiratory supercomplex (Cogliati et al. 2013). Interestingly, Meyer et al. (2018) showed an alteration in the mitochondrial supercomplex in wild beet with G-type CMS. We believe that not enough evidence has been brought forth to conclude that the protein–protein interaction model explains all the dynamics of CMS. The post-translational action of oligomer-forming-type S-orf should be investigated further.

Mitochondrial alteration by CMS-associated ORF

Mitochondria are involved in the production of energetic substances such as ATP through the respiratory chain in the inner membrane (Logan 2006). Considering the hypothesis that anther development highly depends on mitochondria, anthers may consume more ATP than other organs, and an ATP shortage caused by mitochondrial impairment could affect their development. However, Touzet and Meyer (2014) argue against this notion based on the phenotypes of mutants with defects in the subunits of the respiratory complexes. A detailed study is necessary to determine whether changes in the ATP/ADP ratio observed in some CMS plants are the cause of CMS or a by-product. Mitochondria not only produce ATP, but they are also involved in the catabolism of amino acids and the provision of carbon skeletons for the biosynthesis of an array of compounds (Sweetlove et al. 2007). Whether and how S-orf affects these biosynthetic pathways remains to be examined.

Because mitochondria are a site of ROS production (Scheffler 1999), their impairment sometimes leads to the overproduction of ROS. An increase in ROS accumulation is frequently detected in CMS plants (Horn et al. 2014). Whereas ROS can directly damage biological macromolecules (Scheffler 1999), ROS is also associated with biological processes such as programmed cell death (PCD) (Gechev et al. 2006). In the developmental process of pollen, the anther tapetum tissue undergoes PCD at a certain stage of microspore development (Tariq et al. 2022). The anther tapetum is a sporophytic tissue surrounding microspore mother cells/microspores and plays important roles in pollen development, primarily by providing components for pollen production (Tariq et al. 2022). The PCD of the tapetum is crucial for pollen development because mutants with defects in this process fail to produce functional pollen (Tariq et al. 2022). In sunflowers with PET-type CMS and rice with WA-type CMS, the onset of PCD is earlier than in the N-cytoplasm plant, which may be caused by altered ROS accumulation (Balk and Leaver 2001; Luo et al. 2013). On the other hand, Arakawa et al. (2019b) observed a correlation between the persistence of tapetal debris and the severity of sugar beet Owen-type CMS, suggesting that the onset of PCD is delayed or that PCD proceeds slowly. These observations point to mitochondria intervening in or modulating pollen development to induce male sterility. This viewpoint was investigated by Geddy et al. (2005), who found that rapeseed with nap CMS exhibited an altered expression pattern of apetala 3, a floral homeotic gene, pointing to a developmental abnormality in the anther. The link between mitochondrial function and developmental abnormalities is apparent, as homeotic conversion of the stamen into other floral organs such as the carpel and petal is seen in carrot and wheat CMS (Linke et al. 2003; Murai et al. 2002). In these CMS plants, the expression of the so-called B-class MADS box genes is disrupted (Linke et al. 2003; Murai et al. 2002). Overall, mitochondria are likely involved in anther development; however, the details underlying this involvement remain unknown. Hence, our understanding of CMS should be rooted in clearly elucidating the role of mitochondria in floral development; specifically, investigations should address how mitochondria affect nuclear gene expression within the context of floral development (i.e., retrograde signaling).

Genetic and molecular aspects of restorer-of-fertility

It is not always the case that a single Rf is identified for an S-cytoplasm. There have been cases in which multiple Rfs have been identified (Schnable and Wise 1998; Kaul 1988). In general, an Rf specifically counteracts the cognate S-cytoplasm, a genetic rule used for discriminating S-cytoplasms (Duvick 1965). A test cross is a simple procedure to determine whether a plant has a specific Rf, but it is time-consuming and laborious, and it is a major obstacle for hybrid breeding programs in some crops. To reduce the cost, alternative methods such as DNA marker-assisted selection are necessary. The molecular organization and polymorphism of Rf are crucial information for developing DNA markers of Rf.

Table 1 summarizes Rf-gene products based on cloned Rf or its strong candidates. Note that the gene product of maize Rf4 is not a mitochondrial protein (Jaqueth et al. 2020). In many cases, the Rf-gene product is reported to be a protein with a pentatricopeptide repeat (PPR) motif (PPR-Rf) (Gaborieau et al. 2016). PPR genes are involved in post-transcriptional processes in mitochondria and plastids, and their translation products are generally considered to be able to bind to RNA in a sequence-specific manner (Barkan and Small 2014). Genes encoding PPR proteins occur 400 to more than 1000 times in land-plant genomes, and they are classified according to the difference in the PPR motif and the presence or absence of auxiliary domains (Barkan and Small 2014). Among the classes of PPR proteins, the P-class protein is the gene product of PPR-Rf, except for two cases (Table 1). Within the P-class, Fujii et al. (2011) identified a distinctive subgroup to which PPR-Rfs belong (termed restorer-of-fertility-like, RFL). Because it has been frequently observed that the expression of S-orfs is altered to reduce S-orf protein upon fertility restoration (Brown et al. 2003; Wang et al. 2006; Tang et al. 2014), how PPR-Rf alters S-orf expression is the principal question. Direct or indirect association between PPR-Rf protein and S-orf mRNA has been shown (Kazama et al. 2008; Qin et al. 2014; Jiang et al. 2022). A large protein complex including PPR-Rf protein is reported in several cases (Gillman et al. 2007; Hu et al. 2012), suggesting that an additional protein is involved in fertility restoration. Radish Rfo is a well-characterized PPR-Rf (Brown et al. 2003; Qin et al. 2014; Yamagishi et al. 2021). In the Rfo locus, multiple RFLs are clustered, of which PPR-B (or orf687) has been identified as the gene responsible for fertility restoration (Koizuka et al. 2003; Desloire et al. 2003; Uyttewaal et al. 2008). The translation product of PPR-B has been shown to bind to the coding region of orf138 mRNA to inhibit translation (Wang et al. 2021), a clear instance of post-transcriptional suppression by PPR-Rf. Yamagishi et al. (2021) showed that a single nucleotide substitution in orf138 that coincides with the binding site is enough to prevent fertility restoration by Rfo, indicating that the interaction between PPR-B and orf138 mRNA occurs in a highly sequence-specific manner. On the other hand, sorghum Rf1 and barley Rfm1 belong to another class of PPR protein (PLS-DYW), but the function of these genes is unknown, partly due to the absence of knowledge about their S-orfs.

Table 1 List of gene products of restorer-of-fertility (Rf) and Rf candidate

Post-transcriptional mechanisms are also involved in fertility restoration governed by other types of Rf. Rice Rf2 (for Lead Rice-type CMS) encodes a glycine-rich protein that can reduce the translation product of orf79 (the S-orf of this CMS) by mRNA degradation (Itabashi et al. 2011; Kazama et al. 2016). The recessive rf2 allele (i.e., the male sterility-inducing allele) has a missense mutation. Barley Rfm3 and rye Rfp1 likely encode mitochondrial transcription termination factor family (mTERF) proteins, which are known to be involved in transcription, splicing, or tRNA maturation (Quesada 2016). Therefore, they may restore pollen fertility via a post-transcriptional mechanism. Small et al. (2019) pointed out that RNA processing enzymes make up about 15% of the plant mitochondrial proteome, suggesting the importance of post-transcriptional mechanisms in plant mitochondria. This may be associated with the prevalence of post-transcriptional mechanisms in fertility restoration.

A post-translational mechanism associated with the inactivation of S-orf protein is found in sugar beet Rf1 (for Owen-type CMS) (Kitazaki et al. 2015). The molecular action of sugar beet Rf1 is not evident in the amount of preSatp6 (S-orf of this CMS) mRNA or its protein product; however, although the total amount of preSATP6 protein is almost unchanged, its oligomer form is highly reduced in anthers where Rf1 is expressed, suggesting that Rf1 can alter the higher-order structure of preSATP6 protein (Kitazaki et al. 2015). The nucleotide sequence of sugar beet Rf1 resembles Oma1, known to be involved in the quality control of mitochondria (Matsuhira et al. 2012). Sugar beet Rf1 is not the orthologue of Oma1, but a paralogue that evolved through segmental duplication (Arakawa et al. 2020a). Unlike authentic Oma1, sugar beet Rf1 has a defect in the zinc-binding domain that is crucial for peptidase activity (Matsuhira et al. 2012). On the other hand, the translation product of sugar beet Rf1 (dominant allele) can bind preSATP6 protein, but neither that of recessive rf1 nor authentic Oma1 does (Arakawa et al. 2019a). It is inferred that the translation product of Rf1 acts as a molecular chaperone in fertility restoration.

It should be noted that the relationship between S-orf and Rf-gene products is unclear in some cases. Rice Rf17 (for CW-type CMS) encodes a protein resembling acyl carrier protein (named RMS) (Fujii and Toriyama 2009). Among the alleles of Rf17, those with reduced expression restore pollen fertility, but neither the wild-type allele nor the knockout allele does (Suketomo et al. 2020). Perhaps RMS plays an important role in pollen development (hence a plant with a knockout allele is male sterile), but CW-mitochondria use RMS to express male sterility (i.e., RMS transmits a signal from CW-mitochondria for CMS expression). Although RMS encodes a mitochondrial protein, its relationship with orf307 (S-orf of CW-CMS) is unknown (Fujii and Toriyama 2009). Maize Rf4 (for C-type CMS) encodes a basic helix-loop-helix protein, whose null allele had previously been identified as ms23, which is responsible for genic male sterility (Jaqueth et al. 2020). For the recessive rf4 allele to allow male sterility in C-type cytoplasm but not in N-cytoplasm, it must have a missense mutation that inhibits heterodimer formation of this protein with a partner protein (Jaqueth et al. 2020). Although it is unknown how this allele interacts with C-type mitochondria, its function points to an association between flower development and mitochondria. Maize Rf2a (for T-type CMS) is the first cloned Rf in plants (Cui et al. 1996) and encodes mitochondrial aldehyde dehydrogenase. The enzymatic activity of RF2a indicated that it has a broad substrate range (Liu and Schnable 2002), but its substrate in the anther is unknown. Both null and missense alleles are known for rf2a. A long-standing and unanswered question is how Rf2a interacts with urf13-T.

Rfs belonging to unique gene families

RFL is a subgroup of the P-class PPR gene family and is nearly ubiquitous in plants irrespective of the presence of the CMS/Rf system (Fujii et al. 2011). PPR-type Rf is frequently embedded within an RFL gene cluster (Gaborieau et al. 2016), which shows copy number variation of the clustered genes. Although sugar beet Rf1 (for Owen-type CMS) is a paralogue of Oma1 and not a PPR-type Rf, it was identified from a gene cluster consisting of Oma1 paralogues resembling Rf1. This Oma1-paralogue cluster evolved after the establishment of Amaranthaceae, perhaps in the genus Beta (Arakawa et al. 2020b). The Oma1-paralogue cluster seems to be ubiquitous in sugar beet and leaf beet irrespective of the Rf1/rf1 genotype (Ohgami et al. 2016; Arakawa et al. 2018). As with RFL, copy number variation is seen between sugar beet lines (Moritani et al. 2013; Arakawa et al. 2020b). Other Rfs belonging to a unique gene family are barley Rfm3 and rye Rfp1 (Bernhard et al. 2019; Hackauf et al. 2017), which both encode or are associated with mTERF proteins. Melonek and Small (2022) found that plant mTERF genes form a gene family and that a subgroup of this family, to which rye Rfp1 belongs, has greatly expanded in cereals. These data suggest that some Rfs have evolved from gene families that formed and expanded in certain lineages.

Evolutionary patterns of RFL and the Oma1 paralogues are similar; they tend to form gene clusters that show copy number variation, and they have traces of positive selection (Fujii et al. 2011; Arakawa et al. 2020b). This may also be the case for the subgroup of the cereal mTERF gene family to which rye Rfp1 belongs (Melonek and Small 2022). Such evolutionary patterns are reminiscent of resistance (R) gene clusters (Young 2000), suggesting that the evolutionary forces underlying these gene families are common despite the difference in their gene products. It has been proposed that the R gene and Rf have evolved to cope with ever-rising pathogens and S-orfs, respectively. The mitochondrial incentive to evolve S-orfs stems from the maternal inheritance of mitochondria, because pollen is useless for them and male sterility is beneficial to maximize their transmission to progeny if it allows resource reallocation (Touzet 2012). In contrast, the nuclear genome incurs a fitness penalty from S-orfs and favors the deployment of a suppressor (Touzet 2012). This situation evokes co-evolution between opposing parties, the so-called arms race (Fujii et al. 2011). However, more study is needed before all the Rf and Rf-like genes (i.e., RFLs and the Oma1 paralogues) can be considered weapons. For example, maize Rf2a, Rf4, and rice Rf17 appear to be outside an arms race. Among the Arabidopsis RFLs, RFL8 has been shown to be involved in the expression of a genuine mitochondrial gene, ccmFN2, which needs Brassicaceae-specific compensation because of its truncated gene organization in this lineage (Nguyen et al. 2021).

We note that little attention has been paid to the function of the gene cluster to which Rf belongs. In Arabidopsis, the RPP1 locus contains an R gene cluster whose constituents encode nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins conferring resistance to downy mildew (Botella et al. 1998). The clustered genes were dissected and examined for their specificity to the races of the pathogen. Interestingly, multiple NBS-LRR genes in the cluster conferred resistance to a single race, indicating that RPP1 is a complex locus (Botella et al. 1998). In the case of sugar beet Rf1, multiple Oma1 paralogues in a haplotype are likely to participate in fertility restoration; hence, we consider Rf1 a complex locus (Arakawa et al. 2020a, b). Wang et al. (2006) reported that the rice Rf-1 locus contains two PPR genes that can restore Boro II-type CMS. Intraspecific polymorphism of the Rf locus has been reported in several crops (Yamagishi and Bhat 2014; Arakawa et al. 2018; Melonek et al. 2019), but functional analyses of haplotypes have been scarce. Does only a single ORF in a haplotype participate in the restoration of fertility, or do multiple ORFs? Moreover, Arakawa et al. (2018, 2019b, 2020a) found that haplotypes of sugar beet Rf1 correspond to alleles that differ in the strength of fertility restoration (i.e., multiple allelism) (Fig. 2). Haplotype polymorphism should be considered when developing DNA markers to discriminate Rf alleles.

Fig. 2

Allelomorphs and organization of restorer-of-fertility 1 (Rf1) alleles in sugar beet. Sugar beet Rf1 is a complex locus consisting of ORFs encoding OMA1-like proteins. Boxes and wedges represent exons and introns, respectively. Gene directions are from right to left. Names of individual ORF copies are labeled above. A vertical arrow indicates the position of a premature stop codon. Protein products from ORFs indicated by striped boxes have the ability to bind preSATP6 protein (the protein product of the S-orf of Owen-type CMS), whereas translation products from ORFs indicated by open boxes do not. This figure is based on Arakawa et al. (2020a).

Unstable CMS: its cause and potential application

Penetrance is an important factor in deciding whether a CMS can be implemented in a breeding program. Low penetrance (i.e., phenotypically unstable CMS) means occasional pollen production in the CMS line, which leads to contamination of the hybrids with self-pollinated plants. Xiao et al. (2022) found a variant of maize S-type CMS that exhibits a more stable male sterility phenotype. Although both the original and the variant mitochondria expressed orf355 (the S-orf of maize S-type CMS) at a similar level, the gene dosage of nad1 was different (Xiao et al. 2022), suggesting that mitochondrial genetic architecture affects CMS expression. It may be possible to make S-mitochondria more stable by mitochondrial genome editing.

Factors that impact CMS penetrance include low and high temperature, air humidity, day length, and soil moisture (Kim et al. 2013; Bernhard et al. 2017; Downes and Marshall 1971; Reddy and Reddi 1970; Van der Veken et al. 2018; Duvick 1965; Bueckmann et al. 2016; Murai and Tsunewaki 1993; Elkonin and Tsvetova 2012). In soybean, CMS cytoplasm derived from the NJCMS1A line was combined with two different nuclear backgrounds, and the two lines were grown at high temperatures during the flowering period. CMS stability differed between the two lines (Ding et al. 2020, 2021). This is an instance in which genetic background plays an important role in CMS stability.

We propose that there might be cases where environment-sensitive CMS is caused by environment-sensitive Rf. For example, when dominant Rf and environment-sensitive Rf are prevalent but true rf is scarce in a gene pool, the cognate CMS appears to be environmentally sensitive until it is combined with the true rf. In some cases, a known Rf has been associated with environmental sensitivity. Rice Rf6 (for HL-type cytoplasm) restores male fertility more stably than Rf5 under high temperatures (Zhang et al. 2017). Maize Rf9 expression is influenced by temperature (Gabay-Laughnan et al. 2009). The Rf1 locus of a Japanese leaf beet accession is occupied by a very weak Rf1 that conditions male sterility but occasionally restores partial male fertility under low temperatures (Arakawa et al. 2018; T. Arakawa, unpublished observation). In sugar beet, Owen-type mitochondria were combined with various genetic backgrounds, and the plants were grown under two different environments (normal temperature and high temperature). The plants' phenotypes were classified into three groups: constant male sterile, constant male fertile, and male fertile under normal temperature but male sterile under high temperature (i.e., temperature-sensitive male sterility) (Matsuhira et al. 2022). In this experiment, the genetic backgrounds examined included different Rf1 haplotypes, and the association of these haplotypes with the three phenotypic groups was investigated (Matsuhira et al. 2022). Each of the three phenotypic groups had unique Rf1 haplotypes, suggesting an association between temperature-sensitive male sterility and the Rf1 allele (Matsuhira et al. 2022).

Even though genetic background has long been known to affect the stability of CMS, identifying the genes responsible is challenging because of their subtle effects. The expression of environment-sensitive Rfs may be masked depending on the conditions under which the plant is grown. Additionally, the segregation of fertility restoration by such Rfs may deviate from Mendelian ratios; hence, such Rfs should be identified as QTL. This strategy has been adopted successfully to identify QTL in maize (Tie et al. 2006; Kohls et al. 2011; Feng et al. 2015; Su et al. 2016). A detailed study of environment-sensitive Rf would provide clues for obtaining stable CMS.
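As an illustration of what a deviation from Mendelian segregation means in practice, the following minimal sketch applies a chi-square goodness-of-fit test to fertile and sterile plant counts. It assumes the simplest textbook case, a single dominant Rf with sporophytic action, for which an F2 population in S cytoplasm is expected to segregate 3 fertile : 1 sterile. The function names and all counts are hypothetical and are not taken from the studies cited above; other modes of restoration (e.g., gametophytic) would require a different expected ratio.

```python
# Chi-square goodness-of-fit test of F2 counts against a 3:1 expectation.
# Assumption: one dominant, sporophytically acting Rf in S cytoplasm.
# All counts passed to these functions are hypothetical illustrations.

CRITICAL_CHI2_DF1_P05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_3_to_1(fertile: int, sterile: int) -> float:
    """Return the chi-square statistic for a 3 fertile : 1 sterile expectation."""
    total = fertile + sterile
    if total <= 0:
        raise ValueError("at least one plant must be scored")
    expected_fertile = total * 3 / 4
    expected_sterile = total / 4
    return ((fertile - expected_fertile) ** 2 / expected_fertile
            + (sterile - expected_sterile) ** 2 / expected_sterile)


def deviates_from_mendelian(fertile: int, sterile: int) -> bool:
    """True if the counts reject the 3:1 ratio at the 5% level."""
    return chi_square_3_to_1(fertile, sterile) > CRITICAL_CHI2_DF1_P05
```

A population scored under conditions that mask an environment-sensitive Rf would show an excess of sterile plants and a large statistic, which is one reason why such loci may be better treated as QTL than as simple Mendelian factors.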

A breeding line equipped with stable CMS requires a maintainer line, which has the same nuclear genotype as the CMS line but N cytoplasm, for the propagation of the CMS line (Fig. 3a). The cost of preparing the maintainer line could be saved if the male sterility phenotype were conditional, that is, if the male sterile line could self-pollinate under permissive conditions (Fig. 3b). A similar approach is adopted in hybrid breeding of rice using genic male sterility that is sensitive to photoperiod (Ding et al. 2012). This indicates that environmental sensitivity can be regarded as environment inducibility if permissive and nonpermissive conditions are defined. Wheat photoperiod-sensitive CMS (D2-type) has been applied to maintainer-less hybrid breeding, in which a line is self-pollinated under short-day conditions but is converted into a seed parent for hybrid production under long-day conditions (Murai et al. 2008). CMS sugar beet (Owen-type) with a genotype that is male sterile under high temperature but male fertile under normal temperature was tested for its applicability to the maintainer-less system (Matsuhira et al. 2022). The results showed that it is a good seed parent under nonpermissive conditions but self-pollinates to set seeds under permissive conditions (Fig. 3c). Genes associated with environment inducibility should be identified to establish a novel breeding system.

Fig. 3

Genotypes and phenotypes of plants with different types of male sterility-inducing cytoplasm. Outer circles and inner circles indicate cytoplasm and genotypes, respectively. a A breeding system using stable CMS (shown as S) requires a maintainer line, containing N cytoplasm and homozygous for the recessive rf allele, to propagate the cognate CMS line. The CMS line is crossed with a restorer line, homozygous for the dominant Rf allele, to produce male fertile hybrids. b A plant with environment-sensitive CMS (the cytoplasm is shown as SES) is male fertile under permissive conditions but male sterile under nonpermissive conditions. The male fertile plant can self-pollinate to set seeds; hence, a maintainer line is not necessary. The male sterile plant can be used as a seed parent in hybrid seed production. c A plant with stable CMS (shown as S) is male fertile under permissive conditions but male sterile under nonpermissive conditions when the plant has environment-sensitive Rf genes (RfES).
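The relationships summarized in Fig. 3 can be restated as a small decision rule. The sketch below is only an illustration of the caption, not a model from the cited studies: the enum and function names are hypothetical, the nuclear genotype is reduced to three classes, and the SES cytoplasm of panel b is assumed to be combined with recessive rf. Combinations that the figure does not describe are rejected rather than guessed.

```python
from enum import Enum


class Cytoplasm(Enum):
    N = "N"        # non-inducing cytoplasm
    S = "S"        # stable CMS cytoplasm (Fig. 3a, c)
    S_ES = "S_ES"  # environment-sensitive CMS cytoplasm (Fig. 3b)


class Nuclear(Enum):
    RF = "Rf"        # carries a dominant restorer allele
    RF_ES = "Rf_ES"  # carries an environment-sensitive restorer (Fig. 3c)
    RF_NULL = "rf"   # homozygous for the recessive rf allele


def is_male_fertile(cyto: Cytoplasm, nuclear: Nuclear, permissive: bool) -> bool:
    """Return True if the plant is expected to shed pollen, following Fig. 3."""
    if cyto is Cytoplasm.N:
        return True  # N cytoplasm never induces sterility (e.g., maintainer line)
    if cyto is Cytoplasm.S:
        if nuclear is Nuclear.RF:
            return True        # restored hybrid (Fig. 3a)
        if nuclear is Nuclear.RF_NULL:
            return False       # stable CMS line (Fig. 3a)
        return permissive      # Rf_ES: fertile only when permissive (Fig. 3c)
    # S_ES cytoplasm: only the rf background is assumed here (Fig. 3b).
    if nuclear is Nuclear.RF_NULL:
        return permissive
    raise ValueError("combination not described in Fig. 3")
```

Under this reading, the systems in panels b and c give the same phenotype in each environment, and differ only in whether the conditional element resides in the cytoplasm or in the nucleus.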

Conclusions and perspectives

The implementation of CMS in the breeding programs of a variety of crop species has long been desired but has been difficult to achieve. CMS expression may involve intricate interactions with known and unknown genes. Fine-tuning of CMS expression would become possible if these genes and interactions were unveiled. The mitochondrial genome was untouchable in terms of genetic manipulation for a long time, but genome editing technology is changing the situation. As novel technology emerges in this field (cf. Nakazato et al. 2022), approaches based on the mitochondrial genome will increase in both research and practical application. CMS expression has a quantitative nature, part of which is associated with the expression level of the CMS-associated ORF (Kazama et al. 2016; Onodera et al. 2015), the amount of the oligomeric form of the CMS-associated protein (Arakawa et al. 2018, 2019b, 2020a), and the amount of tapetal debris persisting in the later developmental stages (Arakawa et al. 2019b). This may imply that S-orf emits a signal to which the anther tissue responds. A possible candidate for such a signal is ROS, but whether the behavior and biological role of ROS are consistent with this notion remains to be investigated. The involvement of mitochondria in flower development should be studied in detail, with the aim of elucidating its association with CMS expression. It is possible that S-orf does not play a major role but only slightly modifies flower development through a currently unknown mechanism.

PPR-Rf is prevalent in crops, but instances of non-PPR-Rf are increasing. The molecular basis of the S-cytoplasm specificity of Rf can be attributed to the ability of the PPR-Rf protein to recognize S-orf mRNA in a sequence-specific manner, but many Rfs lack such a molecular basis. Some Rfs belong to gene families unique to specific lineages (e.g., sugar beet Oma1 paralogues and cereal mTERF genes). The evolutionary patterns of such Rfs show a striking similarity to those of RFL and R genes. This similarity reflects arms-race evolution, but a detailed study is necessary to explore other possibilities. The Rf locus may be a complex locus in which multiple constituent genes can counteract a single S-cytoplasm. Moreover, differences between haplotypes may reflect functionally different allelomorphs (e.g., sugar beet Rf1 alleles differ in their strength of fertility restoration).

CMS expression is more or less affected by environmental factors. Perhaps CMS expression depends on a delicate balance between the action of S-orf and that of other genes, both nuclear and mitochondrial. Expression of S-orf and these genes may be affected by environmental factors. The instability of CMS expression can be corrected by the nuclear background or, in the future, by mitochondrial genome editing. Control of CMS by the environment has the potential to be a tool for breeding if the appropriate nuclear background and the permissive and nonpermissive conditions are defined. The appropriate genetic background may include alleles of known Rf, but further study is necessary to clarify the genes involved in this genetic background.