6.1 Introduction

Cytoplasmic genomes are present in the cytoplasm and are distinct from nuclear genomes. In plants, these include chloroplast and mitochondrial genomes.

Mitochondria originated from an endosymbiosis between a host cell and an alpha-proteobacterium 1.45 billion years ago. Chloroplasts, on the other hand, originated from the endosymbiosis of a cyanobacterium 1.5 billion years ago. The host cell took control over the symbiotic bacteria over time, and eventually, mitochondria and chloroplasts came to play their own parts within the cell as an organelle that synthesizes adenosine triphosphate (ATP) and an organelle responsible for carbon fixation, respectively. Most of the genes of each bacterium were lost from its genome as the host’s nuclear genome acquired them. Although hydrogenosomes present in certain anaerobic protists (e.g., Trichomonas) have entirely lost their original genomes following an endosymbiotic process, mitochondria and chloroplasts still retain their own genomes with functional genes, which are crucial for proper functioning of the organelles and the successful passage of hereditary information to the next generation (Dyall et al. 2004; Keeling 2010; Dunning Hotopp 2011).

Particular attention should be drawn to the distinct characteristics of the chloroplast genome, the plant mitochondrial genome, and the animal mitochondrial genome. The chloroplast genomes of most plant species are composed of two inverted repeats that separate a large single-copy (LSC) region from a small single-copy (SSC) region. The plant mitochondrial genomes, on the other hand, are believed to be circular, but their size and sometimes even their structure vary with species. The mammalian mitochondrial genomes are well conserved, with almost identical structure and encoded genes.

The genes encoded in the nuclear genome are essential for these organelles to function normally. Of the proteins associated with mitochondria, over 1000 are encoded in the nucleus and transferred to mitochondria. These proteins are indispensable for normal photosynthesis and respiration. Furthermore, genes originating from the nuclear genome are deeply associated with the expression of genes encoded in each organelle genome. Chloroplast genes are transcribed by a plastid-encoded RNA polymerase (PEP) and a nucleus-encoded RNA polymerase (NEP). Although PEP and NEP use different promoters, nearly all genes have both PEP and NEP promoters, and, as in most cases of bacterial expression, these genes are transcribed as polycistronic RNAs. Mitochondrial genes, in contrast, are not transcribed by an organelle-encoded polymerase: mitochondria do not carry their own polymerase genes and therefore depend on a nucleus-encoded RNA polymerase (Borner et al. 2015; Hammani and Giege 2014). Transcripts are subject to post-transcriptional modifications, such as splicing, RNA editing, and maturation of their 5′ and 3′ ends.

Signal transduction that enables interactive control and regulation of genes encoded in the nucleus and organelles has also been identified, and it plays a crucial role in stress and developmental responses. Although not much is known about the metabolic pathways and biosynthesis involved, several molecules have been identified as signal carriers. These signals are referred to as anterograde (nucleus to organelle) and retrograde (organelle to nucleus) signals, which affect signal transduction pathways and switch gene expression on or off. Intermediates in the metabolic pathways of reactive oxygen species (ROS), calcium ions, and the tricarboxylic acid (TCA) cycle have been demonstrated to be involved in these mechanisms, and the alternative oxidase (AOX) pathway, in particular, has been studied extensively as a model for mitochondrial retrograde regulation (MRR) (Ng et al. 2014; Rhoads 2011).

In view of these facts, it is of extreme importance to focus attention on cytoplasms during the analysis of growth, development, and functionality of plants, and it is also essential to consider nucleus-cytoplasm compatibility. With agricultural crops, cytoplasmic male sterility/fertility restoration (CMS/Rf) systems are often used for F1 hybrid breeding. The use of a line exhibiting CMS as the female parent and a line with restorer genes as pollen parent allows for F1 hybrid seed production without the need for emasculation. These systems are especially useful for breeding agricultural crops not only because hybrids often exhibit hybrid vigor but also because they help protect information regarding genes carrying superior traits. With a typical CMS-Rf system, CMS-inducing genes are present in mitochondria, whereas the genes that restore male fertility are present in the nuclear genome (Hanson and Bentolila 2004; Bohra et al. 2016).

Furthermore, comparing cytoplasmic genomes between different species or genera helps to identify the lineage relationship between those species or genera. Estimating a lineage relationship from the nuclear genome can be quite challenging in species that have large or polyploid nuclear genomes. Cytoplasmic genomes, on the other hand, are extremely small, and because they are transmitted from mother to offspring, investigating cytoplasmic genomes can reveal maternal-line ancestry.

The widespread use of next-generation sequencing (NGS) in recent years has enabled sequencing of the entire genome of chloroplasts and mitochondria with relative ease and affordability. As a result, organelle genome data have been steadily accumulated from diverse plant species. In fact, chloroplast and mitochondrial genomes have been completely sequenced for several onion lines. This chapter lays out the characteristics of the cytoplasmic genomes of these onions.

6.2 Cytoplasm Species and Their Origin in Onion

Onion, one of the most important crops in the world, is a diploid species (2n = 2x = 16) that has been cultivated for more than 5000 years (Khosa et al. 2016). Onions are cultivated in 147 countries in the world, with the world’s onion production reaching 111 metric tons in 2014 (FAOSTAT 2014).

No wild form of onion has ever been found, and only cultivated varieties exist today. Allium vavilovii, which can be crossed with Allium cepa, grows naturally in the Tien-shan mountains and the Pamir–Altai mountains, where onions are believed to have originated (Kik 2002). A. cepa and its close relatives are classified into four groups, and it has been shown that A. cepa seems to originate either from A. vavilovii or from hybridization between Allium galanthum and Allium fistulosum (Gurushidze et al. 2007).

The presence of CMS in onions has long been known, and onion was, in fact, the first crop to be used for F1 hybrid breeding. Since the discovery of CMS in onion by Jones and Emsweller (1936), a considerable increase in crop yields has been achieved through heterosis breeding (Khosa et al. 2016). To date, two types of male-sterility-inducing cytoplasm, namely the CMS-S and CMS-T types, have been widely used for commercial breeding.

The CMS-S type shows normal development until the tetrad stage of pollen development, which is followed by degeneration of the protoplasm in the tetrads, resulting in empty pollen grains (Holford et al. 1991a). The CMS-T type, on the other hand, shows abnormal meiosis during pollen development (Kik 2002).

S-type cytoplasmic male sterility is restored to fertility by a single nuclear locus (Jones and Clarke 1943; Jones and Davis 1944), whereas male sterility due to T cytoplasm is restored by three independent loci: aa, bb, and cc (Schweisguth 1973). The Normal-type cytoplasm, on the other hand, does not exhibit sterility. These three types of cytoplasms are identified by the restriction fragment patterns of chloroplast and mitochondrial DNAs.

Normal-type and CMS-T cytoplasms resemble one another and are, therefore, categorized into the M group, which is distinguished from the CMS-S-type cytoplasm (de Courcel et al. 1989; Holford et al. 1991b). In 2000, Havey showed that the CMS-S type originates from Allium x proliferum and that the CMS-T type and the Normal-type originate from A. vavilovii (Havey 2000).

Not much progress has been made in the improvement of Allium through introgression breeding with wild relatives. This is attributable to the fact that onions are biennials with a 2-year life cycle and that only a limited number of related species are available for breeding. A previous study has demonstrated that while A. vavilovii can be crossed with A. cepa, crossing the triploid A. x proliferum with A. cepa can be quite challenging (Kik 2002).

Introgression of the A. fistulosum or A. galanthum cytoplasm into A. cepa produces male sterility. Although CMS induced by A. galanthum has been shown to produce empty pollen grains that are phenotypically similar to the pollen produced by the CMS-S type, CMS-S restorer genes do not restore fertility to galanthum-CMS (Havey 1999). A single locus has been found to restore male fertility in A. fistulosum (Japanese bunching onion) in which CMS has been induced by A. galanthum (Yamashita et al. 2005).

6.3 Chloroplast Genome

Chloroplast genomes are generally composed of two inverted repeats that separate an LSC from an SSC. This structure is common to most plants. The same is true for the following five onion chloroplast genomes sequenced so far: the CMS-S type and Normal type reported by von Kohn et al. (2013) and the CMS-S type, CMS-T type, and Normal type reported by Kim et al. (2015).

In the analysis performed by von Kohn et al. (2013), 28 single nucleotide polymorphisms (SNPs), 2 restriction fragment length polymorphisms (RFLPs), and 1 insertion–deletion (InDel) were found in the sequences of the CMS-S type and the Normal type. Furthermore, an InDel of 45 bp was present in the accD gene, with the CMS-S type showing the same sequence as tobacco, orchid, and rice. InDels of 99, 59, and 50 bp were also present in the trnL-trnT, psbM-petN, and rps16-trnQ intergenic regions, respectively, with the CMS-S type having the shorter sequence in each case. An InDel of 90 bp was also found in the rps4-trnT intergenic region, where the CMS-S type has one copy of a 45-bp repeat and the Normal type has three tandem copies.

In the analysis by Kim et al. (2015), on the other hand, SNP calling across all regions revealed 323 SNPs and 141 InDels between the CMS-S type and the Normal type. Furthermore, SNPs at four locations and InDels at two locations were found between the Normal type and the CMS-T type.

Between the two registered CMS-S-type sequences, differences were found in SNPs at three locations and an InDel at one location (Table 6.1). Among the two Normal-type sequences and the CMS-T-type sequence, differences were found in SNPs at three locations and InDels at two locations (Table 6.2). None of these SNPs and InDels were specific to the Normal type or the CMS-T type. Because comparable variation is seen within the CMS-S type, between the two Normal types, and between the Normal type and the CMS-T type, these differences appear to reflect variation among lines rather than mutations specific to the Normal type or the CMS-T type. These findings suggest that while chloroplast genomes are useful for distinguishing between the CMS-S type and the Normal type, they may be of little value for distinguishing between the Normal type and the CMS-T type.

Table 6.1 Positions of SNPs and InDels between two CMS-S chloroplast genomes
Table 6.2 Positions of SNPs and InDels among three chloroplast genomes classified as M group
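
The SNP and InDel tallies discussed above are typically derived from a pairwise alignment of two genome sequences. The following is a minimal sketch of such a tally, not the pipeline used in the cited studies; the function name `count_variants`, the convention that `-` marks a gap, and the toy sequences are all assumptions made for illustration. A run of consecutive gap columns in one sequence is counted as a single InDel event.

```python
def count_variants(aligned_a: str, aligned_b: str) -> tuple[int, int]:
    """Count SNPs and InDel events between two gapped, equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    indels = 0
    open_gap = None  # which row currently carries an open gap run: "a", "b", or None
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a column with gaps in both rows carries no information
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != open_gap:
                indels += 1  # a new gap run starts: one InDel event
            open_gap = side
        else:
            open_gap = None
            if x != y:
                snps += 1  # aligned bases differ: one SNP
    return snps, indels


# Toy alignment (hypothetical sequences, not onion data)
toy_s = "ACGT--ACGTA"
toy_n = "ACCTGGAC-TA"
snp_count, indel_count = count_variants(toy_s, toy_n)
print(snp_count, indel_count)
```

Counting gap runs rather than gap columns matters here: under this convention the 90-bp difference in the rps4-trnT region would be reported as one InDel, not 90.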

6.4 Mitochondrial Genome

As described in the previous section, it is important to distinguish between different types of onion cytoplasms because those that induce CMS are used with the Normal-type to develop F1 hybrids. Like chloroplast genomes, mitochondrial DNAs are also widely used for classification of onion cytoplasms, and as demonstrated by Havey, these cytoplasms have been classified into the CMS-S type, the Normal-type, and the CMS-T type, the latter two of which belong to the M group (Havey 2000). These three types of cytoplasms (i.e., CMS-S type, CMS-T type and Normal type) were first identified through an RFLP analysis of mitochondrial DNAs, which revealed three types of mitochondrial genomes (de Courcel et al. 1989).

A specific mitochondrial gene called orf725 has been shown to induce CMS (Kim et al. 2009). This gene, which is a cox1 gene with an additional sequence of 576 bp attached to its 3′ end, is found in both the CMS-S type and the CMS-T type. It has been shown to be transcribed in both CMS lines, and because editing creates a stop codon 30 bases inward from the 3′ end, its product in the CMS lines is 184 aa longer than that of the normal-type cox1 gene. This additional sequence shows a high level of homology with orfA501, which is a gene specific to the CMS cytoplasm of chive (A. schoenoprasum L.) (Engelke and Tatlioglu 2002), and according to protein structure prediction, it contains two transmembrane domains.

Although orf725 is transcribed in both CMS lines, the normal-type cox1 is not transcribed in the CMS-S type. In the CMS-T type, however, a normal-type cox1 transcript of the same size as that of the Normal type is present (Kim et al. 2009). Furthermore, as mentioned earlier, the restorer genes for CMS-S and CMS-T are located at different loci, and the development of sterile pollen differs between these two cytoplasms. Thus, although orf725 has long been considered a potential inducer of CMS, it has yet to be established whether it is the inducer common to both the CMS-S type and the CMS-T type. On the other hand, a segregation analysis carried out using a line with a cytoplasm resembling CMS-T not only revealed a segregation ratio of 3:1 but also successfully identified the presence or absence of restorer genes by using markers for the same locus as the restorer gene of the CMS-S type (Kim 2014). Thus, future research must reassess CMS-T. In addition, it is not yet known what mechanisms are involved in the induction of male sterility and in the restoration of fertility, which would be an interesting topic to explore.
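
A 3:1 segregation ratio such as the one mentioned above is usually checked with a chi-square goodness-of-fit test. The sketch below is illustrative only: the plant counts are invented, the function names are hypothetical, and it assumes that the fertile phenotype is the dominant (3/4) class. The critical value 3.841 is the standard chi-square threshold for one degree of freedom at the 5% level.

```python
CRITICAL_CHI2_DF1_P05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_3_to_1(fertile: int, sterile: int) -> float:
    """Chi-square statistic for observed counts against a 3 fertile : 1 sterile expectation."""
    total = fertile + sterile
    if total <= 0 or fertile < 0 or sterile < 0:
        raise ValueError("counts must be non-negative and not both zero")
    expected = (0.75 * total, 0.25 * total)
    observed = (fertile, sterile)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def consistent_with_3_to_1(fertile: int, sterile: int) -> bool:
    """True if the 3:1 hypothesis is not rejected at the 5% level."""
    return chi_square_3_to_1(fertile, sterile) < CRITICAL_CHI2_DF1_P05


# Hypothetical counts for illustration (not data from Kim 2014)
statistic = chi_square_3_to_1(fertile=152, sterile=48)
print(round(statistic, 3), consistent_with_3_to_1(152, 48))
```

A fit to 3:1 is what a single dominant restorer locus would predict, which is why the result bears on whether CMS-T restoration really involves loci different from that of CMS-S.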

Plant mitochondrial genomes are far more complicated than plant chloroplast genomes. Their sizes vary with species, ranging from several dozen kilobases to several megabases. Although the structure of plant mitochondrial genomes is often represented as a single master circle, it has also been suggested that they may exist as multiple circles or linear molecules. Furthermore, mitochondrial genomes are characterized by many repeats, through which recombination generates numerous recombinant molecules.

Kim et al. were the first to sequence the CMS-S-type onion mitochondrial genome (Kim et al. 2016). This was followed by the sequencing of the CMS-S type from another line (Tsujimura et al. 2018) and then of the Normal type by our group.

Figure 6.1 shows the RFLPs of the mitochondrial genomes we sequenced. Comparison with the RFLP patterns presented by de Courcel et al. (1989) shows that their patterns resemble those of our CMS-S type and Normal type. Furthermore, it is evident that the CMS-S type and the Normal type are distinct types.

Fig. 6.1 Restriction fragment patterns of onion mitochondrial genome. M: molecular marker, S: CMS-S-type mtDNA, and N: Normal-type mtDNA
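
Restriction fragment patterns like those in Fig. 6.1 can be predicted from a sequence by an in silico digest. The following is a minimal sketch under stated assumptions: the genome is treated as a single circle, the enzyme is represented only by its recognition sequence (the EcoRI site GAATTC is used purely as an example; the enzymes used for the figure are not stated in the text), and the toy sequences and the function name `circular_digest` are hypothetical.

```python
def circular_digest(sequence: str, site: str) -> list[int]:
    """Fragment lengths (longest first) after cutting a circular sequence at every occurrence of site."""
    seq = sequence.upper()
    site = site.upper()
    n = len(seq)
    if not site or len(site) > n:
        raise ValueError("site must be non-empty and no longer than the sequence")
    # Extend by len(site) - 1 bases so that sites spanning the origin of the circle are found.
    extended = seq + seq[: len(site) - 1]
    cuts = [i for i in range(n) if extended.startswith(site, i)]
    if not cuts:
        return [n]  # no site: the circle stays intact
    # Each fragment spans the distance from one cut to the next around the circle;
    # a single cut gives distance 0, which corresponds to one linear fragment of length n.
    lengths = [(cuts[(k + 1) % len(cuts)] - cuts[k]) % n or n for k in range(len(cuts))]
    return sorted(lengths, reverse=True)


# Two toy circles differing by a single base inside the second site (hypothetical sequences)
toy_normal = "GAATTCAAAAGAATTCCC"
toy_cms_s = "GAATTCAAAAGAGTTCCC"
print(circular_digest(toy_normal, "GAATTC"))
print(circular_digest(toy_cms_s, "GAATTC"))
```

In the toy example a single base change removes one recognition site and merges two fragments into one, which is the kind of band difference an RFLP comparison detects.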

Interestingly, the CMS-S-type genome we sequenced appeared to exist as three circles. Our initial NGS result yielded two circles, i.e., a 170-kbp circle (MC1) and a 140-kbp circle. MC1 contains a direct repeat of 3.4 kbp, and when polymerase chain reaction (PCR) was carried out using primers designed to flank this repeat region, amplification was observed with all primer combinations, indicating that recombination occurs. This PCR result suggests that recombination through the 3.4-kbp direct repeat in MC1 gives rise to two circles (Fig. 6.2). These findings suggest that the CMS-S type could be mapped as a total of three circles.

Fig. 6.2 Mitochondrial genome structure of A. cepa cv. “Momiji-3”. Genes encoding proteins (red), rRNA (green), and tRNA (black) are shown outside (forward direction) and inside (reverse complementary direction) the circle. A direct repeat sequence of 3.4 kbp (R1) is also indicated as a box (purple) outside the MC1 circles
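
The way recombination between two copies of a direct repeat can resolve one circle into two can be sketched with plain strings. This is a toy model, not an analysis of the MC1 sequence: the function name `resolve_direct_repeat` is hypothetical, the sequences are invented, and the sketch assumes exactly two repeat copies, neither of which spans the arbitrary start position of the circle.

```python
def resolve_direct_repeat(master: str, repeat: str) -> tuple[str, str]:
    """Split a circular master sequence into two subcircles by recombination
    between two direct copies of `repeat`. Each subcircle keeps one copy."""
    first = master.find(repeat)
    second = master.find(repeat, first + len(repeat)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("master circle must contain two copies of the repeat")
    # Subcircle 1: first repeat copy plus the unique region up to the second copy.
    sub1 = master[first:second]
    # Subcircle 2: second repeat copy plus the remaining unique region, wrapping past the origin.
    sub2 = master[second:] + master[:first]
    return sub1, sub2


# Toy master circle: unique region, repeat, unique region, repeat, unique region
toy_repeat = "GATC"
toy_master = "AAAA" + toy_repeat + "CCCCCC" + toy_repeat + "TT"
circle_1, circle_2 = resolve_direct_repeat(toy_master, toy_repeat)
print(circle_1, circle_2, len(circle_1) + len(circle_2) == len(toy_master))
```

The combined length of the two subcircles equals that of the master circle, and primers flanking the repeat would amplify across the junctions of all three forms, which is consistent with amplification in all primer combinations.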

The Normal-type mitochondrial genome, on the other hand, was assembled as a single circle according to the NGS result (Fig. 6.3). The total genome length was approximately 310 kbp for the CMS-S type and 537 kbp for the Normal type; the latter is thus roughly 1.7 times as long as that of the CMS-S-type cultivar “Momiji-3”. The size difference between these two types can largely be explained by repeat regions, which account for 1.5% of the entire genome in the CMS-S type and 30.7% in the Normal type. Although it is possible that these repeat regions in the Normal type induce recombination that could give rise to multiple circles, as in the CMS-S type, no band patterns were detected with pulsed-field gel electrophoresis (PFGE).

Fig. 6.3 Mitochondrial genome structure of A. cepa “Normal type”. Genes encoding proteins (red), rRNA (green), and tRNA (black) are shown outside (forward direction) and inside (reverse complementary direction) the circle
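
The size comparison above can be checked with simple arithmetic. The sketch below uses only the approximate figures quoted in the text (310 kbp and 1.5% repeats for the CMS-S type, 537 kbp and 30.7% repeats for the Normal type); the function name `repeat_contribution` is hypothetical, and the results are only as precise as those rounded inputs.

```python
def repeat_contribution(size_a_kbp: float, repeat_frac_a: float,
                        size_b_kbp: float, repeat_frac_b: float) -> dict[str, float]:
    """Compare two genome sizes and estimate how much of the difference lies in repeats."""
    repeat_a = size_a_kbp * repeat_frac_a  # repeat content of genome A in kbp
    repeat_b = size_b_kbp * repeat_frac_b  # repeat content of genome B in kbp
    size_diff = size_b_kbp - size_a_kbp
    return {
        "size_ratio": size_b_kbp / size_a_kbp,
        "repeat_a_kbp": repeat_a,
        "repeat_b_kbp": repeat_b,
        "share_of_difference_in_repeats": (repeat_b - repeat_a) / size_diff,
    }


# Approximate values quoted in the text: CMS-S type (A) versus Normal type (B)
summary = repeat_contribution(310, 0.015, 537, 0.307)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

With these inputs, repeats amount to roughly 5 kbp in the CMS-S type and roughly 165 kbp in the Normal type, so they cover about 70% of the 227-kbp size difference; the remainder lies in non-repeat sequence.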

Table 6.3 summarizes the differences in protein-coding gene regions between the CMS-S type and the Normal type. Of the base substitutions and InDels found at a total of 14 positions, 12 were accompanied by amino acid substitutions, with particularly significant differences, including a frameshift, found in cox1, cox3, and nad6 (Fig. 6.4). Furthermore, we obtained information on transcripts for each line by RNA sequencing (RNAseq) and estimated the transcribed regions with respect to the coding regions.

Table 6.3 Nucleotide differences in the coding region of protein-coding genes between “Momiji-3” and “Normal type”
Fig. 6.4 Details of coding region differences in the genes cox1, cox3, and nad6. Block arrows show protein-coding regions, and graphs behind the block arrows indicate transcript abundance calculated from the coverage of sequence reads in RNAseq experiments

In the case of cox1, the sequence after the stop codon differed between the CMS-S type and the Normal type as previously reported, and in the CMS-S type, cox1 was encoded as orf725. The C-terminus of the CMS-S type was longer than that of the Normal type by 184 aa, with each transcribed region covering the coding region. In the case of cox3, owing to a frameshift caused by an InDel of 5 bp on the 5′ side, the N-terminus of the Normal type was shorter than that of the CMS-S type by 43 aa. The transcribed region started from the same position in both genomes. In the case of nad6, the C-terminus of the Normal type was shorter than that of the CMS-S type by 75 aa owing to an InDel. In both genomes, however, the transcribed region ended within the coding region. In Arabidopsis as well, transcription termination in nad6 has been shown to occur upstream of the stop codon, suggesting the presence of a unique transcription termination system (Raczynska et al. 2006; Forner et al. 2007). Because the position of transcription termination is almost identical between the CMS-S type and the Normal type, a similar transcription system is probably at work in onions.
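
Estimating a transcribed region from read coverage, as done for the comparisons above, can be sketched as follows. This is a simplified illustration rather than the analysis actually performed: reads are given as hypothetical half-open intervals on a single strand, the function names and the depth threshold are assumptions, and the toy coordinates are invented.

```python
def per_base_coverage(length: int, reads: list[tuple[int, int]]) -> list[int]:
    """Read depth at each base, from half-open read intervals [start, end)."""
    diff = [0] * (length + 1)
    for start, end in reads:
        if not 0 <= start < end <= length:
            raise ValueError(f"read ({start}, {end}) lies outside the reference")
        diff[start] += 1  # depth rises where a read begins
        diff[end] -= 1    # and falls just past where it ends
    coverage, depth = [], 0
    for delta in diff[:length]:
        depth += delta
        coverage.append(depth)
    return coverage


def transcribed_span(coverage: list[int], min_depth: int = 1):
    """Outermost half-open interval whose end bases reach min_depth, or None if no base does."""
    covered = [i for i, depth in enumerate(coverage) if depth >= min_depth]
    return (covered[0], covered[-1] + 1) if covered else None


# Toy example: a 10-base "coding region" at positions 0-10 and three hypothetical reads
toy_coverage = per_base_coverage(10, [(0, 4), (2, 6), (5, 8)])
span = transcribed_span(toy_coverage)
cds_end = 10
print(toy_coverage, span, span[1] < cds_end)
```

In the toy case the covered span ends before the end of the coding region, mirroring the nad6 situation in which the transcribed region ends within the coding region.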

Western blot analysis of the product of orf725, a suspected inducer of CMS, using a COX1 antibody demonstrated differences between the CMS-S type and the Normal type (Fig. 6.5). No difference was found, however, between the CMS-S lines that have the Ms restorer gene and those that do not. This implies that the presence of a restorer gene does not affect the translated products of orf725. Two hypotheses can be inferred from this finding. First, genes other than orf725 may actually be responsible for inducing CMS. Genes such as the above-mentioned cox3 and nad6 may show differences at the translation level as well. Alternatively, there could be an unknown open reading frame (ORF) that exhibits unexpected effects. Second, a restorer gene may play a part in fixing a defect that arises when the orf725 product is assembled into its complex. COX1 is a subunit of respiratory chain complex IV. It could be that, even when no functional protein is produced, a restorer gene assists complex formation so that a functional complex is produced. Further analysis, which may include investigation of the activity of respiratory chain complexes, is needed to verify these hypotheses.

Fig. 6.5 Western blot analysis of COX1 and NAD9. Nms/ms; Normal-type with ms/ms, CMS-S type with Ms/ms, and CMS-S type with ms/ms