Introduction

Wheat (Triticum aestivum L.) is one of the most widely distributed and largest food crops in the world. As population and consumption grow, global wheat production must be increased in a sustainable and effective manner. Heterosis (hybrid vigor) is a natural phenomenon in which the hybrid offspring of genetically diverse individuals show better yield, stress resistance, and fertility than their parents (Fu et al. 2014). To date, heterosis has been applied to rice (Oryza sativa L.), maize (Zea mays L.), rape (Brassica napus L.), soybean (Glycine max (Linn.) Merr.), and other crops; in general, hybrid varieties have higher and more consistent yields than common varieties (Longin et al. 2012; Cerna et al. 1997). Wheat is a self-pollinating plant, so hybrid breeding requires blocking self-pollination, which can be achieved by genetic male sterility, photoperiod- and thermo-sensitive male sterility, chemical agents, or cytoplasmic male sterility (Singh et al. 2014).

Cytoplasmic male sterility (CMS) is a widespread phenomenon in higher plants that results from a complex interaction between the nucleus and the cytoplasm (Bohra et al. 2016). It is characterized by maternal inheritance, pollen failure, and normal pistils. Sterility genes are encoded by the mitochondrial genome, while most fertility restorer genes are encoded by the nuclear genome (Xiao et al. 2020). The application of CMS in hybrid breeding has become a major international trend. CMS has been found in more than 150 plant species, such as maize, rice, rape, beet (Beta vulgaris L.), carrot (Daucus carota L.), onion (Allium cepa L.), sunflower (Helianthus annuus L.), and wheat (Carlsson et al. 2008).

Mitochondria are semi-autonomous organelles with their own genetic material and genetic system. They are the main site of aerobic respiration and are known as the powerhouse of the cell. Unlike chloroplast genomes, mitochondrial genomes vary widely in size among eukaryotes, ranging from 6 kb in Plasmodium to 200–2000 kb in higher plants (Palmer and Herbon 1987). Plant mitochondrial DNAs (mtDNAs) have remarkable features that distinguish them from those of other organisms. Higher plants harbor large, highly variable mtDNAs, and the genomes vary greatly in size and structure even between close relatives (Tang et al. 2015; Chen et al. 2017a). In most plant species, mtDNA gene sequences evolve very slowly compared with animal mtDNA sequences. Plant mtDNAs have a high frequency of homologous recombination, which makes rearrangement easy; when a rearrangement is not lethal, it may result in CMS (Gualberto et al. 2014).

At present, CMS in wheat mainly comprises the T (T. timopheevii), P (Primepi), V (Aegilops ventricosa), K (Aegilops kotschyi), and S (T. spelta) cytoplasm types, among others (Wu et al. 2010; Sun et al. 1985). Abnormal anther development in K-type CMS wheat may be related to sucrose metabolism (Ba et al. 2019). Sequencing of the mitochondrial genome of K-type CMS showed that the K-type CMS line was missing the rpl5 gene (Huitao Liu et al. 2011). Proteomic analysis of P-type CMS in wheat suggested that it may result from cellular dysfunction caused by disturbed carbohydrate metabolism, inadequate energy supply, and impaired protein synthesis (Zhang et al. 2021). Proteomic approaches also indicated that V-type CMS in wheat may be correlated with premature programmed cell death (Guo Baojian et al. 2011). Abnormal expression of the chalcone synthase (CHS) gene has been found in S-type CMS, and abnormal methylation has been found in leaves of S-type CMS at the seedling stage (Ba et al. 2014, 2017). Although there is much research on CMS in wheat, there is little research on the relationship between CMS and the mitochondrial genome. In S-type nucleo-cytoplasmic interaction male sterility in wheat (S denotes the sterile cytoplasm from T. spelta), fertility conversion is relatively easy, restorer sources are wide, and the restorer genes are mostly distributed in common wheat, so new restorer lines can be created by accumulating them through hybridization; this type therefore has good application prospects (Zhang et al. 1986; Ba et al. 2014). To study the mechanism of S-type CMS more systematically, we sequenced and assembled the complete mitochondrial genome of an S-type CMS line. In addition, we compared the complete mitochondrial genomes of the CMS line and its maintainer and characterized their collinearity, InDels, SNPs, and Ka/Ks. The results provide insight into the intricate relationships between mtDNA and CMS generation and may stimulate further research.

Materials and methods

Plant materials

An S-type CMS line (S1376A) and its maintainer line (1376B) were provided by the Wheat Breeding Center, Northwest A&F University. Both lines have been bred successively for over 20 years, so the CMS line is stable and its nuclear background is the same as that of the maintainer. Plants were cultivated in experimental fields at Huaibei Normal University, Huaibei City (116.80°E, 33.95°N), Anhui Province, China, in November 2020.

Microscopic observation

Fresh anthers were collected after the wheat started heading, and the developmental stage of the microspores was determined by acetic acid magenta staining of squash preparations combined with light microscopy. Anther fertility was assessed with iodine-potassium iodide (KI-I2): the collected flowers were stained with a 1% KI-I2 solution and observed under an optical microscope. Healthy, fertile pollen grains are spherical and stain dark, whereas sterile pollen grains are shrunken and remain unstained.

Sequencing and assembly of mitochondrial genome

Approximately 5 g of fresh wheat leaves were collected, and mtDNA was extracted by an improved method (Chen et al. 2011). The mitochondrial genome was then sequenced and assembled using second- and third-generation sequencing platforms. The second-generation platform was the Illumina NovaSeq 6000 (Genepioneer Biotechnologies). The raw data were filtered with the fastp software (version 0.20.0, https://github.com/OpenGene/fastp) according to the following criteria: adapter and primer sequences in reads were trimmed; reads with an average quality value below Q5 were discarded; and reads containing more than 5 N bases were discarded, yielding high-quality reads. The third-generation platform was nanopore sequencing; the software LoRDEC (V0.6) was used to correct the raw third-generation data with the second-generation data (Salmela and Rivals 2014).
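The two read-level filters described above can be illustrated with a minimal sketch. This is not fastp itself; it is plain Python written for illustration, and it assumes Phred+33 quality encoding. The function names and the example reads are hypothetical.

```python
from statistics import mean

def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]

def keep_read(sequence, quality_line, min_mean_q=5, max_n=5):
    """Return True if a read passes both filters described in the text."""
    # Discard reads containing more than max_n ambiguous (N) bases.
    if sequence.upper().count("N") > max_n:
        return False
    # Discard reads whose average quality value is below min_mean_q.
    if mean(phred_scores(quality_line)) < min_mean_q:
        return False
    return True

# Hypothetical example reads: (sequence, quality string)
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),  # high quality, no N
    ("ACGTNNNNNN", "IIIIIIIIII"),  # six N bases
    ("ACGTACGTAC", "$$$$$$$$$$"),  # very low average quality
]
kept = [seq for seq, qual in reads if keep_read(seq, qual)]
print(kept)
```

Only the first read survives in this toy example; adapter trimming, which fastp performs before these filters, is not reproduced here.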

Mitochondrial genome assembly involved several steps. First, the raw third-generation data were assembled into contigs with the third-generation assembler Canu (Koren et al. 2017). Second, the contigs were aligned with BLAST v2.6 (https://blast.ncbi.nlm.nih.gov/Blast.cgi) against a plant mitochondrial gene database (mitochondrial gene sequences of published species on NCBI), and the matching contigs were extended and circularized. Third, the assembly was corrected with NextPolish 1.3.1 (https://github.com/Nextomics/NextPolish) using both second- and third-generation data, and the final assembly was obtained after manual inspection and correction (Hu et al. 2020).

Genome annotations and sequence analysis

Annotation of the mitochondrial genome was divided into the following steps: (1) protein-coding genes and rRNAs were annotated by BLAST alignment against published plant mitochondrial sequences used as references, with further manual adjustment based on related species; (2) tRNAs were annotated using tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/) (Chan and Lowe 2019); and (3) Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) was used to annotate ORFs, with the minimum length set to 102 bp, and redundant sequences and sequences overlapping known genes were excluded. Sequences longer than 300 bp were aligned against the NR database for annotation. To obtain more accurate annotation, the above results were checked and manually corrected. The mitochondrial genome map was then drawn using OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html).
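As an illustration of the ORF step, the sketch below scans both strands in three reading frames and keeps ORFs of at least 102 bp. It is not the NCBI Open Reading Frame Finder; the ATG start codon, the standard stop codons, and the convention of counting the stop codon in the ORF length are assumptions made here, since the text states only the minimum length.

```python
STOPS = {"TAA", "TAG", "TGA"}           # standard stop codons (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_len):
    """Yield (start, end) of ATG..stop ORFs in the three frames of one strand."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:   # length includes the stop codon
                    yield start, i + 3
                start = None                   # close the ORF at the stop

def find_orfs(seq, min_len=102):
    """Return ORFs on both strands as (strand, start, end) tuples.

    Coordinates for the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    forward = [("+", s, e) for s, e in orfs_in_strand(seq, min_len)]
    reverse = [("-", s, e)
               for s, e in orfs_in_strand(reverse_complement(seq), min_len)]
    return forward + reverse

# Hypothetical 102 bp ORF (ATG + 32 codons + stop) flanked by two bases each side
example = "CC" + "ATG" + "GCT" * 32 + "TAA" + "GG"
print(find_orfs(example))
```

Raising `min_len` above 102 removes the single ORF in this example, which shows how the length threshold controls the number of candidates.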

Sequence analysis included relative synonymous codon usage (RSCU) analysis and scattered repeat analysis. RSCU was calculated as (the number of occurrences of a given codon for an amino acid / the number of occurrences of all codons for that amino acid) / (1 / the number of synonymous codons for that amino acid), that is, the observed frequency of the codon divided by its expected frequency under equal usage. Scattered repeats were identified using the Vmatch v2.3.0 software (http://www.vmatch.de/) combined with Perl scripts. The parameters were set to minimum length = 30 bp and Hamming distance = 3, and four forms were recognized: forward, palindromic, reverse, and complement.
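The RSCU definition above can be sketched in a few lines. This is an illustrative Python version, not the Perl script used in the study; it assumes the standard genetic code and skips stop codons, and the function and variable names are hypothetical.

```python
from collections import Counter

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def rscu(cds_list):
    """Return {codon: RSCU} for the amino acids observed in the given CDSs."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":                       # stop codons are skipped
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue                        # amino acid not observed
        for c in codons:
            # observed share of the codon divided by the expected share 1/n
            values[c] = (counts[c] / total) / (1 / len(codons))
    return values

# Hypothetical toy CDS: histidine is encoded three times by CAT and once by CAC
values = rscu(["CATCATCATCAC"])
print(values["CAT"], values["CAC"])
```

A value above 1 means the codon is used more often than expected under equal usage of its synonymous codons, which is the criterion applied in the Results.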

Comparative analysis of S1376A and 1376B

First, the mitochondrial genomes of S1376A and 1376B were globally aligned using the auto mode of the MAFFT v7.427 software, and SNPs and InDels were identified from the alignment. KaKs Calculator 2.0 was used to calculate Ka/Ks from the MAFFT-aligned sequences. The mitochondrial genome of S1376A was compared with that of its maintainer line using the Mauve software (Rissman et al. 2009). Nucleotide diversity (Pi) was calculated with DnaSP v5 using a window of 400 bp and a step size of 200 bp. Signal peptides were predicted using SignalP 4.1 in eukaryote (EUK) mode.

Determination of O2·−, H2O2, and MDA content

The anthers of S1376A and 1376B were collected at the uninucleate, binucleate, and trinucleate stages and stored at −80 °C. The content of superoxide radical (O2·−) was determined by the hydroxylamine oxidation method (Lukatkin 2002). Hydrogen peroxide (H2O2) was determined by the titanium chloride method (Liu et al. 2010). Malondialdehyde (MDA) content was determined by a modified version of the method of Hodges et al. (Hodges 1999).

Determination of AsA and GSH content

The content of ascorbic acid (AsA) was determined by the Kampfenkel method (Kampfenkel et al. 1995), and glutathione (GSH) was determined by the DTNB cycling method (Nagalakshmi and Prasad 2001).

Enzyme activity determinations

The activity of superoxide dismutase (SOD) was determined by the nitro blue tetrazolium (NBT) photoreduction method and the guaiacol method (Shen et al. 2014). Peroxidase (POD) activity was determined by guaiacol colorimetry (Hui Fang et al. 2012). Catalase (CAT) activity was determined by the UV absorption method of Chance et al. (1955), and ascorbate peroxidase (APX) activity was determined by the UV absorption method (Miyake et al. 2006). Glutathione reductase (GR) activity was determined by a modified version of the method of Alonso González and Lynch (1998). Monodehydroascorbate reductase (MDHAR) activity was determined by a modified version of the method of Knörzer et al. (2008).

Results

Cytological observation of S1376A

To determine anther fertility, the anthers were stained with KI-I2. The pollen grains of 1376B were rich in starch (Fig. 1e), while the pollen grains of the sterile line contained no starch (Fig. 1a), indicating that pollen of the sterile line S1376A was completely aborted. To determine the anther stage, we used acetic acid magenta staining and found that the anthers were at the uninucleate stage when the spike had begun to emerge but was not fully emerged (Fig. 1b, f), at the binucleate stage when heading was complete (Fig. 1c, g), and at the trinucleate stage some time after heading was complete (Fig. 1d, h).

Fig. 1

Pollen development stages and wheat fertility. Pollen grains of the sterile and maintainer lines stained with I2-KI solution (a, e). Anthers in b and f, c and g, and d and h were at the uninucleate, binucleate, and trinucleate stages, respectively

Characteristics of mitochondrial genome assembly in S1376A

To characterize the S1376A mitochondrial genome, we sequenced it using the Illumina NovaSeq 6000 and the nanopore sequencing platform. The mitochondrial genome of S1376A was assembled as a single molecule with a length of 452,638 bp, a GC content of 44.36%, an AT skew of 0.001, and a GC skew of 0.002 (Fig. 2). The percentages of clean bases with quality values greater than or equal to 20 and 30 were 98.05% and 94.18%, respectively. The mitochondrial genes of the CMS line were annotated with BLAST and tRNAscan-SE, and 69 genes were identified (Table 1), including 26 tRNA genes, 8 rRNA genes (three copies of rrn5, three of rrn18, and two of rrn26), and 35 protein-coding genes. The protein-coding genes fall into several functional groups: one gene for maturase (matR), one gene for a membrane transport protein (mttB), nine genes for complex I (nad1-7, 9, 4L), one gene for complex III (cob), three genes for complex IV (cox1-3), seven genes for complex V (atp1, 4, 6 (2), 8 (2), 9), four genes for cytochrome c biogenesis (ccmB, ccmC, ccmFN, ccmFc), and nine genes for ribosomal proteins (rpl5, 16, rps1, 2, 4, 7, 12–14). Of the annotated genes, rrn26, rps3, and rps19 are pseudogenes; these three pseudogenes may be non-functional remnants formed during the evolution of gene families, that is, DNA fragments that have lost their function, cannot be transcribed into normal mRNA, and thus cannot be expressed. ORF annotation identified 2634 ORFs as CDS, of which 1327 were on the forward strand and 1307 on the reverse strand. There were 82 ORFs longer than 300 bp; BLAST revealed that 7 of them had high homology to known sequences (score > 400). In particular, orf1637 was highly similar to fatty acyl-CoA reductase 2-like in Aegilops tauschii Coss., and orf499 was highly similar to ribosomal protein S7 in maize chloroplasts. In S1376A, there were 84 unique ORFs, 42 on the forward strand and 42 on the reverse strand. Only orf1079, orf2232, orf2650, orf342, and orf958 were longer than 300 bp; however, annotation and BLAST showed that they did not encode proteins.
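The composition statistics reported above can be computed as in the following minimal sketch. It assumes the commonly used definitions GC content = (G + C)/(A + C + G + T), AT skew = (A − T)/(A + T), and GC skew = (G − C)/(G + C); the text does not state which formulas were used, and the example sequence is hypothetical.

```python
def composition_stats(seq):
    """Return GC content, AT skew, and GC skew of a DNA sequence."""
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    return {
        "gc_content": (g + c) / (a + c + g + t),
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }

# Hypothetical toy sequence: A=3, T=2, G=4, C=1
stats = composition_stats("AAATTGGGGC")
print(stats)
```

Under these definitions, skew values close to zero, such as the 0.001 and 0.002 reported for S1376A, indicate nearly equal numbers of A and T and of G and C on the sequenced strand.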

Fig. 2

Mitochondrial genome map. Genes encoded on the forward strand are located outside the circle, and those encoded on the reverse strand are located inside. The inner gray circle represents the GC content

Table 1 CMS line mitochondrial genome encoding genes

Relative synonymous codon usage (RSCU) analysis of PCGs

Because of the degeneracy of codons, each amino acid corresponds to at least one codon and at most six codons. Codon usage differs greatly among species and organisms. This unequal use of synonymous codons is called codon usage bias, and relative synonymous codon usage (RSCU) is a measure of it. The bias is believed to result from natural selection, mutation, and genetic drift. We used our own Perl script to filter unique CDS and calculate the RSCU values (Fig. 3 and Supplementary Fig. S1). Our analysis of the S-type CMS mitochondrial genome showed that 31 codons had RSCU > 1 (Supplementary Table S1), indicating that these codons were preferentially used in the genome. The arginine codon AGA showed the strongest bias, with an RSCU of 1.55, followed by the histidine codon CAU, with an RSCU of 1.54. Among the preferentially used codons, most ended with U (16 of 31) or A (12 of 31); of the other three codons, two ended in C and one ended in G.

Fig. 3

RSCU pie chart. The height of the outermost cylinder is RSCU, the inner layer is the amino acid, and the innermost three layers are codons

Repeated sequence analysis

There were 9 large repeats in the mitochondrial genome of the CMS line, defined as sequences with length ≥ 500 bp and similarity above 90%. The longest repeat was 9882 bp, with a similarity of 99.98%. The shortest was 1362 bp, with a similarity of 90.05%. Four repeats had a similarity of 100%, with lengths of 7035 bp, 5469 bp, 2045 bp, and 1634 bp (Supplementary Table S2).

Scattered repeats, unlike tandem repeats, are dispersed throughout the genome; such DNA sequences are generally moderately repetitive. We used the Vmatch v2.3.0 software to identify repeats in four forms: forward, palindromic, reverse, and complement. Only two forms of scattered repeats were found in the mitochondrial genome of the sterile line, namely forward repeats and palindromic repeats. Only a few repeats of either type were longer than 40 bp, and repeats of 80–89 bp (F: 2, P: 5) and 90–99 bp (F: 1, P: 4) were the least frequent. The longest scattered repeat was a forward repeat of 7035 bp, which contained the two rrn26 genes. Most of the other scattered repeats were between 30 and 39 bp (F: 99, P: 95) (Fig. 4, Supplementary Table S3). Among the scattered repeats of 30 bp, there were 20 forward repeats and 29 palindromic repeats. These 49 scattered repeats covered eight genic regions, including two in nad7, two in nad4 (intron), two in rps3 (intron), one in cox1, and one in ccmC, as well as 46 intergenic spacers (IGS) (Supplementary Table S4).
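To make the four repeat forms concrete, the sketch below classifies a pair of equal-length sequences by comparing the first copy with the second copy as is (forward), reversed (reverse), complemented (complement), or reverse-complemented (palindromic), allowing up to three mismatches as in the Hamming distance setting. This is not Vmatch; the definitions follow the usual F/R/C/P convention and are stated here as an assumption, and the function names and test sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def repeat_type(a, b, max_distance=3):
    """Classify copy b relative to copy a, or return None if no form matches."""
    if len(a) != len(b):
        return None
    candidates = {
        "forward": b,
        "reverse": b[::-1],
        "complement": b.translate(COMPLEMENT),
        "palindromic": b.translate(COMPLEMENT)[::-1],  # reverse complement
    }
    for name, seq in candidates.items():
        if hamming(a, seq) <= max_distance:
            return name
    return None

# Hypothetical copies: the second is the reverse complement of the first
first_copy = "ACGTTGCAAC"
second_copy = "GTTGCAACGT"
print(repeat_type(first_copy, second_copy))
```

Because forward is tested first, a pair that satisfies several forms within the mismatch limit is reported as forward; a real repeat finder reports each form separately.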

Fig. 4

Graph of the distribution and length of the scattered repeats. The distribution of four kinds of scattered repeats on the genome (A). The two black lines represent the genome, and the same repeats are connected by line segments. The abscissa of B is the type of scattered repeats, and the ordinate is the number of scattered repeats. F, forward repetition; P, palindromic repetition; R, reverse repetition; C, complementary repetition

Collinearity analysis of S1376A and 1376B

Collinearity describes the conservation of gene content and relative gene order among species or lines derived from the same ancestral type (that is, the homology of genes and the order of genes). We used the Mauve software to compare the mitochondrial genomes of S1376A and 1376B, and the results showed that neither sequence inversion nor rearrangement had occurred in either genome; there were only slight differences in gene location and type (Fig. 5). This indicated that the sequences of the two lines had good collinearity and high homology. Specific differences in gene type and location require further analysis.

Fig. 5

Collinearity of the maintainer line and the CMS line

Identification of InDels and SNPs in S1376A and 1376B

Insertions/deletions (InDels) refer to the insertion or deletion of nucleotide fragments of different sizes at the same site of the genome between related species or individuals of the same species. Single nucleotide polymorphisms (SNPs) are polymorphisms of the nucleic acid sequence caused by changes in a single nucleotide, including transitions, transversions, and single-base deletions and insertions. By searching for SNPs and InDels between the S-type CMS line and the maintainer line, we identified 46 InDels and 303 SNPs (Supplementary Table S5). The InDels were distributed in 39 orfs (Table 2) and 3 PCGs (cob, nad1, nad2 (intron)). Only 13 SNPs were located in genes: two in cox3, one in matR, one in nad4 (intron), one in rps4, one in trnF, two in trnS, two in nad7 (intron), and three in trnW (Table 3). Studies have shown that cytoplasmic genetic variation in Indian cauliflower (Brassica oleracea L. var. botrytis L.) is related to SNPs and InDels (Singh et al. 2021). We think that the genetic variation of the S-type CMS line may also be related to the 303 SNPs and 46 InDels.
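How SNPs and InDels can be counted from a pairwise alignment is sketched below. The sketch assumes two gapped sequences of equal length, as produced by an aligner such as MAFFT, counts each mismatching column as one SNP, and counts each uninterrupted run of gaps as one InDel event. These counting conventions and the example alignment are illustrative assumptions, not the procedure documented in the text.

```python
def count_variants(aln_a, aln_b):
    """Return (snps, indels) for two aligned, gapped sequences."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    snps = 0
    indels = 0
    open_gap = None  # which sequence currently carries an open gap run
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" and y == "-":
            continue                  # column carries no information
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != open_gap:      # a new gap run starts one InDel event
                indels += 1
                open_gap = side
        else:
            open_gap = None
            if x != y:
                snps += 1             # mismatching column
    return snps, indels

# Hypothetical alignment: one substitution, one 2 bp gap, one 1 bp gap
snps, indels = count_variants("ACGTAC--GTTA", "ACCTACGGGT-A")
print(snps, indels)
```

Counting gap runs rather than gap columns matches the definition of an InDel as a fragment of any size inserted or deleted at one site.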

Table 2 Identification of InDels in the CMS line
Table 3 SNPs encoding genes in the CMS line

Subsequently, nucleotide diversity (Pi) was calculated. We used DnaSP v5 with a window of 400 bp and a step size of 200 bp. The overall nucleotide diversity was 0.00067, and the number of polymorphic sites was 302 (Supplementary Table S6). Among the windows, 211 fragments showed nucleotide diversity; the maximum was 0.0725, corresponding to 29 polymorphic sites, and the minimum was 0.0025, accounting for 134 polymorphic sites (Fig. 6).
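A sliding-window Pi calculation with the stated window and step can be sketched as follows. This is not DnaSP; it assumes Pi is the average number of pairwise differences per compared site and that alignment columns containing gaps are excluded, and the toy sequences are hypothetical. For two sequences, this definition gives 29/400 = 0.0725 and 1/400 = 0.0025, which agrees with the maximum and minimum values reported above.

```python
from itertools import combinations

def window_pi(seqs, start, size):
    """Average pairwise differences per site in one window (gap columns skipped)."""
    columns = [col for col in zip(*(s[start:start + size] for s in seqs))
               if "-" not in col]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))

def sliding_pi(seqs, window=400, step=200):
    """Return [(window_start, pi), ...] along aligned sequences of equal length."""
    length = len(seqs[0])
    last_start = max(length - window, 0)
    return [(start, window_pi(seqs, start, window))
            for start in range(0, last_start + 1, step)]

# Hypothetical 1000 bp alignment with a single difference at position 250
seq_a = "A" * 1000
seq_b = "A" * 250 + "C" + "A" * 749
profile = sliding_pi([seq_a, seq_b])
print(profile)
```

Because the step is half the window, each site falls in up to two windows, so a single polymorphic site can raise Pi in two adjacent windows.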

Fig. 6

Nucleotide diversity

Ka/Ks analysis of S1376A and 1376B

Ka/Ks is the ratio of the non-synonymous substitution rate (Ka) to the synonymous substitution rate (Ks) and indicates whether a protein-coding gene is under selection pressure: Ka/Ks > 1 indicates positive selection, Ka/Ks < 1 indicates negative (purifying) selection, and Ka/Ks ≈ 1 indicates neutral evolution (Li et al. 2009). We calculated the Ka/Ks of protein-coding genes and found that non-synonymous mutations occurred in cob, rps4, and cox3, with rates of 0.0016, 0.0011, and 0.0032, respectively. The synonymous mutation rates of cob and matR were 0.0065 and 0.0025, respectively. Only cob had both non-synonymous and synonymous mutations, and its Ka/Ks value was 0.24 < 1 (Table 4). This showed that cob was negatively selected during evolution. Whether negative selection of cob can induce pollen abortion needs to be verified in the S-type CMS line and the maintainer line.
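The interpretation rule for Ka/Ks can be written as a small helper. The sketch below is illustrative only: the tolerance used to decide that a ratio is approximately 1 is a hypothetical choice, and the cob values are the rounded rates quoted in the text, so the ratio computed from them (about 0.25) differs slightly from the reported 0.24, presumably because that value was obtained from unrounded rates.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Return (Ka/Ks, label); the tolerance for 'about 1' is a hypothetical choice."""
    if ks == 0:
        return None, "undefined (Ks = 0)"   # ratio cannot be computed
    ratio = ka / ks
    if abs(ratio - 1) <= tolerance:
        label = "neutral evolution"
    elif ratio > 1:
        label = "positive selection"
    else:
        label = "negative selection"
    return ratio, label

# cob: rounded Ka and Ks values quoted in the text
ratio, label = classify_selection(ka=0.0016, ks=0.0065)
print(f"cob: Ka/Ks = {ratio:.2f} ({label})")
```

For genes with only non-synonymous changes and no synonymous changes, Ks is zero and the ratio is undefined, which is why only cob receives a Ka/Ks value here.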

Table 4 Ka/Ks of mitochondrial genes of the CMS line and the maintainer line

Analysis of physiological indicators

cob encodes cytochrome b, a subunit of ubiquinol-cytochrome c reductase, which is complex III of the mitochondrial respiratory chain. Complex III is widely regarded as a main source of ROS (Mazat et al. 2020; Brand 2016). We therefore measured the relevant indicators. The H2O2 content of S1376A and 1376B differed significantly at the uninucleate and binucleate stages, and at the binucleate stage it was higher in S1376A than in 1376B. The O2·− content of S1376A and 1376B differed significantly only at the binucleate stage, although it was higher in S1376A at each stage. The MDA content of S1376A and 1376B differed significantly at all three stages, and S1376A accumulated more H2O2. SOD activity in anthers increased in 1376B and decreased in S1376A, and it was significantly higher in 1376B than in S1376A at the binucleate and trinucleate stages. POD activity in anthers differed significantly between the two lines at the same stage; it increased in S1376A and then decreased in 1376B. CAT activity differed significantly between S1376A and 1376B anthers at the binucleate stage, when it was significantly higher in 1376B than in S1376A (Fig. 7). The contents of AsA and GSH in 1376B were significantly higher than those in S1376A at the binucleate stage. APX activity in 1376B remained significantly higher than in S1376A at the binucleate stage, and the difference was significant at the other stages. GR activity in 1376B was markedly higher than that in S1376A at the uninucleate and binucleate stages, while MDHAR activity was notably higher in 1376B than in S1376A only at the binucleate stage (Fig. 8). In summary, the contents of antioxidant substances in S1376A were deficient and the activities of antioxidant enzymes were low, especially at the binucleate stage.

Fig. 7

ROS metabolism in the anthers of maintainer and sterile lines at different stages. Different lowercase letters indicate significant differences between lines at the 0.05 level, the same as below

Fig. 8

Effects of ascorbate–glutathione cycle on the anthers of maintainers and sterile lines at different stages

Discussion

Mitochondria are important sites of energy supply for maintaining normal cell functions. In addition, mitochondria participate in cell differentiation, apoptosis, cell signaling, and other processes. Mitochondria carry their own genetic material, which coordinates with the nucleus to ensure the assembly and function of the large number of proteins in the mitochondrial respiratory chain (Osellame et al. 2012). The plant mitochondrial genome has highly variable intergenic regions containing distinct repeats, frequent structural rearrangements, large amounts of gene loss, and a highly variable RNA editing process (Gualberto et al. 2014). CMS is a complex trait that is believed to be related to the mitochondrial genome and may be influenced by the evolutionary pattern of the mitochondrial genome and by gene transfer between organelles and the nucleus in plant cells (Chen et al. 2017b). There has been a lot of previous research on male sterility in wheat, most of which focused on proteomics (Geng et al. 2018), cytology (Martin et al. 2010), and epigenetics (Ba et al. 2014), but there is little research on mitochondrial genome sequencing. At present, the mitochondrial genomes of only three types of wheat CMS have been sequenced. The first is K-type CMS, in which rpl5 was absent in Ks3 and trnH was absent in Km3; twenty-two unique orfs were predicted in Ks3, representing potential candidate genes for K-type CMS (Huitao Liu et al. 2011). The second is Al-type CMS, in which 12 orfs were selected as candidate causative gene sequences, among which orf279 was considered the candidate causative gene sequence of the Al-type CMS line (Hao et al. 2021). The third is T-type CMS, whose mechanism has been determined to be due to the presence of orf279 (Melonek et al. 2021). In this research, S-type CMS was selected as the research material, and its mitochondrial genome sequence was compared with that of the maintainer line, which provides a reference for further exploring the mechanism of S-type CMS.

Mitochondrial genome characteristics of CMS line

Animal mitochondrial genomes and plant chloroplast genomes are structurally conserved, with a compact and very stable gene arrangement. In contrast, plant mitochondrial genomes vary greatly in size and structure and are sparsely arranged (Li et al. 2020a; Alverson et al. 2010). Repeated sequences are common in plant mitochondrial genomes, and most plant mitochondrial genomes are large, probably because they contain many repeat sequences (Lowe and Chan 2016; Li et al. 2020a). However, there are exceptions. In the Cucurbita mitochondrial genome, 38% of the genome comprises repeat sequences, but its full mtDNA length is 371 kb (Alverson et al. 2010). The Vitis mtDNA is about 773 kb long, but only 7% of the genome was found to comprise repeat sequences (Goremykin et al. 2009). Our sequencing results showed that the S-type CMS mitochondrial genome was 452,638 bp long and encoded 69 genes in total. The length of long repeats was 40,400 bp, and the length of scattered repeats was 60,563 bp, accounting for 8.9% and 13.4% of the mtDNA, respectively. The total length of repeat sequences was much lower than that of K-type CMS (Huitao Liu et al. 2011).
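The repeat proportions quoted above follow directly from the reported lengths, as the short check below shows. The variable and function names are hypothetical; the numbers are those given in the text.

```python
GENOME_LENGTH = 452_638  # bp, S1376A mitochondrial genome

REPEAT_LENGTHS = {       # bp, as reported in the text
    "long repeats": 40_400,
    "scattered repeats": 60_563,
}

def repeat_fraction(repeat_length, genome_length=GENOME_LENGTH):
    """Fraction of the genome covered by repeats of the given total length."""
    return repeat_length / genome_length

fractions = {name: repeat_fraction(length)
             for name, length in REPEAT_LENGTHS.items()}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The two fractions round to 8.9% and 13.4%. They are reported separately because the text does not state whether the two repeat classes overlap, so adding them may overestimate the total.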

Comparison of mtDNA between the CMS line and the maintainer line

The mitochondrial genome of the S-type CMS line in wheat was larger than that of its maintainer, as has also been found for Nicotiana tabacum and Brassica rapa (Wang et al. 2020; Li et al. 2020b). No sequence inversion or rearrangement occurred in the two genomes, and the sequences of the CMS line and the maintainer line were collinear and highly homologous. However, comparative analysis of CMS and maintainer mtDNA found 46 InDels (in 39 orfs and 3 genes) and 303 SNPs (13 of them in genes). A similar phenomenon was found in soybean, where an mtDNA study of a CMS line found 209 SNPs and 110 InDels (He et al. 2021). We believe that SNPs and InDels may affect the fertility of wheat. Similarly, a study of Indian cauliflower (Brassica oleracea L. var. botrytis L.) mtDNAs found that its cytoplasmic genetic alteration was related to SNPs and InDels (Singh et al. 2021). Our Ka/Ks analysis showed that only the cob gene had both synonymous and non-synonymous mutations. Some studies have shown that cob is related to CMS. An unusual transcription pattern of the mitochondrial cob gene was found in onion (Allium cepa L.) with S-type CMS, and sequencing of the gene revealed that a chloroplast DNA sequence was inserted into the upstream region of cob in the S-type cytoplasm (Sato 1998). The “Pampa” cytoplasm of rye (Secale cereale L.) contains an additional cob-homologous transcript that might be causally correlated with the CMS phenotype (Dohmen et al. 1994). In our research, negative selection occurred in the mitochondrial cob gene of the S-type male sterile line of wheat. After comparing the sequence of this gene, we discovered that several bases were inserted into it, resulting in a frameshift and a downstream shift of the stop codon. We speculate that this might be causally correlated with the S-type CMS phenotype in wheat.

cob differs between S1376A and 1376B. It encodes cytochrome b, a subunit of ubiquinol-cytochrome c reductase, which is complex III of the mitochondrial respiratory chain. The function of complex III is to oxidize reduced ubiquinone (ubiquinol) and transfer a pair of electrons to cytochrome c. Four electrons from cytochrome c are then transferred to oxygen by complex IV (cytochrome c oxidase) to form H2O (Zhao et al. 2019; Ji et al. 2013). Studies have shown that complex III is the main source of ROS, mainly in the form of H2O2 and O2·− (Mazat et al. 2020; Brand 2016). MDA is a main indicator of membrane lipid peroxidation. At the binucleate stage, we found that the contents of H2O2 and MDA in the S-type CMS line were significantly higher than those in 1376B, indicating that ROS accumulated in S1376A. ROS are toxic by-products of aerobic metabolism that may lead to cell death if not cleared in time (Maxwell et al. 2002). Organisms have many mechanisms of ROS elimination. One mechanism involves antioxidants, such as GSH and AsA. Another relies on the protective system of antioxidant enzymes, including SOD, POD, CAT, and APX (Yang et al. 2018). GR and MDHAR take part in the ascorbate–glutathione cycle and play a vital role in scavenging accumulated ROS (Miller et al. 2010). We found that the content of antioxidant substances in the anthers of S1376A was deficient and the activity of antioxidant enzymes was low, leading to the accumulation of ROS and MDA. Similar results were found in other plants. Flavonoid levels and antioxidant enzyme activities in flower buds of soybean CMS lines were lower than those of maintainers (Ding et al. 2019). ROS levels were higher in the rice CMS line than in the maintainer line (Yan et al. 2014). More ROS accumulated in the anthers of cotton CMS lines (Jiang et al. 2007). In the pepper (Capsicum annuum L.) CMS line, the activity of antioxidant enzymes was lower than that of the maintainer, and more ROS accumulated (Deng et al. 2012). These results indicate that ROS metabolism may be related to male sterility, especially at the binucleate stage. However, whether this phenomenon is related to the cob mutation still needs to be determined.