INTRODUCTION

The human mitochondrial genome consists of circular DNA molecules (mtDNA) approximately 16 569 base pairs in length. Unlike nuclear autosomal DNA, mtDNA is present in hundreds to thousands of copies per cell and is inherited strictly through the maternal line [1]. The matrilineal inheritance of mtDNA and the absence of recombination between molecules lead to the evolution of mitochondrial genomes through the sequential accumulation of mutations from generation to generation and to the stochastic distribution of mutant mtDNA forms during cell division [1, 2]. This makes mtDNA a fairly simple genetic system for studying macro- and microevolutionary processes and, in human populations, for studying demographic history and the biomedical aspects of population differentiation as humans spread across the planet.

The main manifestation of the mutation process in mitochondrial genomes is mtDNA heteroplasmy, i.e., the coexistence of different mtDNA molecules within the same cell or individual. In heteroplasmy, mtDNA molecules differ at one or more nucleotide positions and thus form major and minor (mutant) mtDNA haplotypes. Earlier studies of mtDNA variation by dideoxy (Sanger) sequencing showed that heteroplasmy is fairly rare: in human populations, approximately 1–2% of individuals were heteroplasmic [3, 4]. However, this method can detect a minor variant only when it is present at a level of 10–20% or higher [5, 6]. High-throughput massively parallel sequencing (MPS) technologies have shown that mtDNA heteroplasmy is much more common. For example, in a study of mtDNA from the whole blood of 1526 mother–child pairs, approximately 45% of individuals carried heteroplasmic sites with a minor-base level of 1% or higher [7]. In population studies performed using MPS, heteroplasmy with a minor-base level of 5% or higher was detected in 25–30% of the individuals studied [5, 8].

According to current views, mtDNA evolution begins with a mutation in one of the mtDNA molecules of a germline cell, which gives rise to heteroplasmy in the mitochondrion, followed by the segregation of mtDNA molecules into two (sometimes more) homogeneous lineages [9–11]. Segregation of mtDNA toward homoplasmy is accelerated by the “bottleneck” mechanism, which sharply reduces the number of mtDNA molecules in primordial germ cells [10]. As a result, the proportions of wild-type and mutant mtDNA molecules shift randomly until one type of molecule or the other reaches homoplasmy.
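The random shift toward homoplasmy can be illustrated with a toy model. The sketch below assumes, purely for illustration, that each generation passes through a bottleneck in which a fixed number of mtDNA molecules (the hypothetical parameter `bottleneck_size`) is sampled at random from the previous generation's pool. It is a simple neutral drift model, not a model used in this study, and it ignores selection and mtDNA replication within the cell.

```python
import random


def drift_to_homoplasmy(p0: float, bottleneck_size: int, seed: int = 0,
                        max_generations: int = 10_000) -> tuple[float, int]:
    """Track the mutant mtDNA fraction through repeated germline bottlenecks.

    Hypothetical neutral model: each generation, `bottleneck_size` molecules
    are drawn with replacement from the current pool, and the mutant fraction
    of that sample becomes the next generation's heteroplasmy level.
    Returns (final mutant fraction, number of generations simulated).
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a fraction between 0 and 1")
    rng = random.Random(seed)
    p = p0
    for generation in range(1, max_generations + 1):
        # Binomial sampling of the molecules that pass through the bottleneck
        mutants = sum(rng.random() < p for _ in range(bottleneck_size))
        p = mutants / bottleneck_size
        if p in (0.0, 1.0):  # homoplasmy: all wild-type (0) or all mutant (1)
            return p, generation
    return p, max_generations


# Start from 30% heteroplasmy with a bottleneck of 20 molecules
for seed in range(3):
    print(drift_to_homoplasmy(0.3, bottleneck_size=20, seed=seed))
```

In this model a smaller bottleneck fixes one type of molecule in fewer generations on average, which is the sense in which the bottleneck accelerates segregation.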

Studies of mtDNA polymorphism in human populations have largely ignored the effect of heteroplasmy on mutational spectra, since most population data were obtained by dideoxy DNA sequencing, which detects only a small fraction of minor polymorphic variants. Interpopulation differences are therefore based on the distribution of “major” mtDNA haplotypes, i.e., those present in mitochondria at the highest frequencies. To reconstruct spectra of mtDNA nucleotide substitutions from population data, phylogenetic analysis of mtDNA sequences is used; it makes it possible to trace how genetic changes in mitochondrial genomes (nucleotide substitutions, deletions, and insertions) occurred during evolution in the direction from ancestors to descendants. These studies showed that mtDNA mutational spectra are dominated by transitions, among which pyrimidine substitutions predominate over purine ones (when mutations are scored on the L strand of mtDNA). The most common substitutions in various ethnogeographic groups of humans are T→C [12, 13]. It was also found that the distribution of mtDNA mutations did not differ between populations of high-mountain and non-mountainous regions [13], although the spectra of mtDNA nucleotide substitutions under normal conditions and under hypoxia were expected to differ, especially in the frequency of T→C transitions, which are presumably the most sensitive to the level of oxidative stress in mitochondria [14].
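Because the two mtDNA strands are complementary, the same mutation is written differently depending on the reference strand: a pyrimidine transition scored on the L strand is a purine transition on the H strand. The helpers below (illustrative names, using the arrow notation of the text) make this explicit:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}


def on_complementary_strand(substitution: str) -> str:
    """Rewrite a substitution such as 'T→C' as it appears on the other strand."""
    ref, alt = substitution.split("→")
    return f"{COMPLEMENT[ref]}→{COMPLEMENT[alt]}"


def substitution_class(substitution: str) -> str:
    """Classify a single-base substitution by the chemical class of its bases."""
    ref, alt = substitution.split("→")
    if ref == alt:
        raise ValueError("not a substitution")
    if {ref, alt} <= PYRIMIDINES:
        return "pyrimidine transition"
    if {ref, alt} <= PURINES:
        return "purine transition"
    return "transversion"


print(on_complementary_strand("T→C"))  # A→G
print(substitution_class("T→C"), "/", substitution_class("A→G"))
# pyrimidine transition / purine transition
```

This is why a statement about pyrimidine versus purine predominance is meaningful only together with the strand on which substitutions are scored.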

The features of mtDNA mutational spectra mentioned above were discovered by phylogenetic analysis of nucleotide substitutions inherited through the maternal line over a very long time interval, long enough for the genetic effects of selection and of population adaptation to different living conditions to become apparent. Meanwhile, the primary source of mtDNA mutations is the genetic changes that occur in the mitochondrial genomes of germline cells and are maintained in heteroplasmy in mothers and their children. It is therefore important to ask to what extent the spectra of nucleotide substitutions observed in heteroplasmy in the first generation coincide with the spectra of mutations inherited over successive generations, as reconstructed by phylogenetic analysis of population data on mtDNA variability. Addressing this question is the aim of this work.

MATERIALS AND METHODS

The distribution of nucleotide substitutions in the mitochondrial genomes of 649 individuals with mtDNA heteroplasmy at a level of 1% or higher was analyzed using the results of a study of 1526 mother–child pairs [7]. Two samples were formed for the analysis: nucleotide substitutions inherited in the heteroplasmic state by children from their mothers (the “inherited” category in [7]; 416 substitutions) and mutations that arose in the children and were detected in them in the heteroplasmic state (the “de novo” category in [7]; 477 substitutions).

Because, according to the UK National Institute for Health Research (NIHR) project [15], these data were obtained mostly from individuals of European origin (more than 80% of the total sample), we reconstructed the spectrum of mtDNA nucleotide substitutions in Europeans by phylogenetic analysis for comparison. For this purpose, we used complete mitochondrial genome sequences from the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/) for Russians (N = 377), Poles (N = 96), Czechs (N = 102), Hungarians (N = 78), Serbs (N = 163), Basques (N = 176), and Danes (N = 174) (1166 mitochondrial genomes in total). Phylogenetic analysis was performed by the maximum parsimony method implemented in the mtPhyl v4.015 software package (https://sites.google.com/site/mtphyl/home). Nucleotide substitutions were recorded in the order of their appearance on the mtDNA phylogenetic tree, in the direction from ancestors to descendants. The statistical significance of differences in substitution frequencies between populations was assessed with the two-sided Fisher’s exact test. All nucleotide substitution spectra were scored relative to the mtDNA L strand.
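How substitutions are placed on a tree and recorded from ancestors to descendants can be sketched with Fitch's small-parsimony algorithm for a single alignment column. This is an illustration under stated assumptions, not the mtPhyl implementation: the tree topology is taken as given, rooted and strictly binary, node names are hypothetical, and ties between equally parsimonious bases are broken alphabetically.

```python
def fitch_substitutions(children: dict[str, list[str]],
                        leaf_base: dict[str, str],
                        root: str) -> list[tuple[str, str]]:
    """Fitch small-parsimony reconstruction for one alignment column.

    children: internal node -> its two child nodes (rooted binary tree)
    leaf_base: leaf -> observed base at this site (L strand)
    Returns ('X→Y', descendant node) pairs; each change is listed after
    the changes on its ancestral branches.
    """
    sets: dict[str, set[str]] = {}

    def up(node: str) -> set[str]:
        # Bottom-up pass: intersection of the child sets if non-empty, else union
        if node not in children:
            sets[node] = {leaf_base[node]}
        else:
            left, right = (up(child) for child in children[node])
            sets[node] = (left & right) or (left | right)
        return sets[node]

    up(root)
    changes: list[tuple[str, str]] = []

    def down(node: str, state: str) -> None:
        # Top-down pass: keep the parent's base when the child's set allows it,
        # otherwise take the alphabetically first base in the child's set
        for child in children.get(node, []):
            child_state = state if state in sets[child] else min(sets[child])
            if child_state != state:
                changes.append((f"{state}→{child_state}", child))
            down(child, child_state)

    down(root, min(sets[root]))
    return changes


# Hypothetical tree: root -> (outgroup, n1); n1 -> (n2, seq3); n2 -> (seq1, seq2)
children = {"root": ["outgroup", "n1"], "n1": ["n2", "seq3"], "n2": ["seq1", "seq2"]}
leaf_base = {"outgroup": "T", "seq1": "C", "seq2": "C", "seq3": "T"}
print(fitch_substitutions(children, leaf_base, "root"))  # [('T→C', 'n2')]
```

The outgroup fixes the root state, so the single change is recorded in the ancestor-to-descendant direction (T→C on the branch leading to n2). The recursion is kept for brevity; very deep trees would need an iterative traversal.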

To assess the effect of selection on mtDNA variability, we compared the distribution of nonsynonymous (NS) and synonymous (S) substitutions between substitutions associated with haplogroups (H) and private substitutions (P) located in the terminal branches of the phylogenetic tree, following the method proposed in [16]. The analysis compares the NS/S ratios of groups H and P using Fisher’s exact test. The neutrality index is calculated as NI = (NS/S)_P / (NS/S)_H. In the absence of selection, NI is assumed to be close to 1.0; NI > 1.0 indicates negative (purifying) selection, and NI < 1.0 indicates positive selection.
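The neutrality index defined above can be computed directly from the four counts. A minimal sketch; the counts in the usage line are made up for illustration and are not the values in Table 4. As in [16], statistical significance would still come from Fisher’s exact test on the 2 × 2 table of NS and S counts in groups P and H.

```python
def neutrality_index(ns_p: int, s_p: int, ns_h: int, s_h: int) -> float:
    """NI = (NS/S)_P / (NS/S)_H.

    ns_p, s_p: nonsynonymous and synonymous counts among private (P) substitutions
    ns_h, s_h: the same counts among haplogroup-associated (H) substitutions
    """
    if s_p == 0 or ns_h == 0 or s_h == 0:
        raise ValueError("NI is undefined when S_P, NS_H or S_H is zero")
    return (ns_p / s_p) / (ns_h / s_h)


def reading(ni: float) -> str:
    """Direction of departure from neutrality (significance is tested separately)."""
    if ni > 1.0:
        return "excess of NS in terminal branches: negative (purifying) selection"
    if ni < 1.0:
        return "deficit of NS in terminal branches: positive selection"
    return "no departure from neutrality"


ni = neutrality_index(ns_p=60, s_p=40, ns_h=50, s_h=50)  # hypothetical counts
print(ni, "->", reading(ni))  # 1.5 -> excess of NS in terminal branches: ...
```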

For comparison, we also used previously reconstructed spectra of nucleotide substitutions in complete mitochondrial genomes from other Eurasian populations: Northeastern Siberia (Eskimos, Aleuts, Chukchi, Koryaks, and Yukagirs; N = 336), Southern Siberia and adjacent territories of northeastern China (Buryats, Barguts, and Khamnigans; N = 430), and Western Asia (Persians, Qashqais, and Lebanese; N = 340) [12], as well as the indigenous peoples of the Pamirs (N = 202) and Tibet (N = 268) [13].

RESULTS AND DISCUSSION

Analysis of the phylogenetically reconstructed spectra of nucleotide substitutions in mitochondrial genomes showed similar proportions of transitions and transversions in Europeans and in the other regions of Eurasia (Table 1). The transition-to-transversion ratio ranges from 23.5 : 1 in the South Siberian group to 27.8 : 1 in Tibetans; in Europeans it is intermediate, 24.7 : 1.
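The ratio quoted here is the count of transitions (A↔G, C↔T) divided by the count of transversions in a spectrum. A minimal sketch, assuming a spectrum stored as a hypothetical dictionary of counts keyed by substitution; the counts are made up and are not the data in Table 1:

```python
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def ts_tv_ratio(spectrum: dict[str, int]) -> float:
    """Transition:transversion ratio from substitution counts keyed like 'T→C'."""
    transitions = transversions = 0
    for substitution, count in spectrum.items():
        ref, alt = substitution.split("→")
        if frozenset((ref, alt)) in TRANSITION_PAIRS:
            transitions += count  # A↔G or C↔T
        else:
            transversions += count
    if transversions == 0:
        raise ValueError("no transversions: ratio is undefined")
    return transitions / transversions


# Made-up counts for illustration only
counts = {"T→C": 150, "C→T": 60, "A→G": 100, "G→A": 90,
          "C→A": 8, "A→C": 4, "T→A": 2, "G→T": 2}
print(f"{ts_tv_ratio(counts):.1f} : 1")  # 25.0 : 1
```

Read the other way, a ratio of 24.7 : 1 means that transversions make up 1/(24.7 + 1) ≈ 3.9% of all substitutions.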

Table 1. Spectra of phylogenetically reconstructed mtDNA nucleotide substitutions (L strand) in various human populations

In all regional mtDNA spectra, pyrimidine transitions predominate, chiefly T→C substitutions, which on average account for approximately 33.4% of all nucleotide substitutions. The next most common substitutions are A→G in most regional groups and G→A in the Pamir and Northeast Asian groups. Among transversions, C→A substitutions predominate in all populations except Tibetans, in whom A→C substitutions are more common (Table 1). Overall, however, the population groups studied do not differ significantly in the frequency distribution of nucleotide substitutions (P > 0.1, Fisher’s exact test).

To study the spectra of nucleotide substitutions observed in heteroplasmy at a level of 1% or higher in the first generation, we used data on mtDNA polymorphism in mother–child pairs from [7]. Substitutions inherited by children in the heteroplasmic state (the “inherited” category) and substitutions that arose in the children (the “de novo” category) were analyzed. These samples were pooled, since mtDNA substitutions in children are both inherited from mothers (via oocytes) and arise at the embryonic stage and later (Table 2). Because [7] used an MPS platform designed for sequencing short DNA fragments (the Illumina HiSeq 2500 sequencer), mtDNA reads may be contaminated with sequences from nuclear copies of the mitochondrial genome. To reduce this risk, it is recommended to raise the detection threshold for minor bases in heteroplasmic mixtures to 5% or higher, which excludes low-frequency variants, including those originating from nuclear copies of mtDNA [17]. Table 2 therefore also shows the spectrum of substitutions observed in heteroplasmy at a level of 5% and above.
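Raising the threshold amounts to discarding calls whose minor-base fraction falls below it before the spectrum is tallied. A minimal sketch; the record layout and the example calls are hypothetical and do not reproduce the data format of [7]. Following the description of minor (mutant) haplotypes above, the minor base is treated as the derived variant.

```python
from collections import Counter
from typing import Iterable


def heteroplasmy_spectrum(calls: Iterable[tuple[int, str, str, float]],
                          threshold: float = 0.05) -> Counter:
    """Tally 'major→minor' substitutions whose minor-base fraction >= threshold.

    calls: (position, major_base, minor_base, minor_fraction), bases on the
    L strand; the minor base is assumed to be the derived (mutant) variant.
    """
    return Counter(
        f"{major}→{minor}"
        for _position, major, minor, fraction in calls
        if fraction >= threshold
    )


# Made-up calls (positions and fractions are illustrative only)
calls = [(1001, "T", "C", 0.12), (2002, "T", "C", 0.02),
         (3003, "A", "G", 0.30), (4004, "G", "A", 0.06)]
print(heteroplasmy_spectrum(calls, threshold=0.01))  # spectrum at 1% and higher
print(heteroplasmy_spectrum(calls, threshold=0.05))  # spectrum at 5% and higher
```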

Table 2. Spectra of mtDNA nucleotide substitutions observed in heteroplasmy in the first generation (L strand)

A comparative analysis shows that both mutation spectra are characterized by a predominance of pyrimidine transitions, chiefly T→C substitutions (Table 2). Statistically significant differences are found only in the frequencies of the C→T transition (P = 0.016) and the A→C transversion (P = 6.7 × 10⁻⁴). The two spectra also differ, though not significantly, in the overall frequency of transversions: the 1% spectrum contains more transversions than the 5% spectrum (6.2 versus 3.8%). Comparison with the spectra of heteroplasmic substitutions showed that the phylogenetically reconstructed spectrum of European mitochondrial genomes differs more from the 1% spectrum than from the 5% spectrum. It differs from the spectrum of heteroplasmy at 1% and higher in the frequencies of the C→T and G→A transitions (P = 2 × 10⁻⁶ and 0.034, respectively) and the A→C and T→A transversions (P = 5.6 × 10⁻⁵ and 0.02, respectively), but from the spectrum of heteroplasmy at 5% and above in the frequency of only one substitution, the T→A transversion (P = 0.026). Our results thus indicate that the spectra of nucleotide substitutions reconstructed by phylogenetic analysis of population data on mtDNA variability are similar to those observed in heteroplasmy in the first generation, particularly for heteroplasmy at a level of 5% and above.
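Each pairwise comparison above can be framed as a 2 × 2 table: the count of one substitution type versus all other substitutions, in each of the two spectra. The sketch below implements the two-sided Fisher’s exact test from hypergeometric probabilities, using the common convention of summing the probabilities of all tables with the same margins that are no more probable than the observed one; the counts in the example are made up. Where SciPy is available, `scipy.stats.fisher_exact` provides an equivalent test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: the two spectra; columns: substitution of interest vs all others.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: a substitution seen 40 times among 400 in spectrum 1
# and 15 times among 300 in spectrum 2
p = fisher_exact_two_sided(40, 360, 15, 285)
print(f"P = {p:.3g}")
```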

Phylogenetic analysis makes it possible to reconstruct mutational changes in mtDNA over time at no fewer than two hierarchical levels: an older level, comprising nucleotide substitutions inherited over many generations and exposed to selection (these substitutions lie in the nodes and trunks of the phylogenetic tree and define mtDNA haplogroups), and a “younger” level, comprising nucleotide substitutions located in single terminal branches of the tree [16]. This approach has revealed the effect of negative selection during mtDNA evolution in various human populations: as a rule, more nonsynonymous substitutions are found at the periphery of phylogenetic trees than in the trunks and nodes, especially at the base of the tree [16, 18–20].
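Once each substitution has been placed on a branch, the two levels can be separated mechanically: a change on a branch leading to a single sequence is private (P), and a change on an internal branch, shared by two or more sequences, is counted in the haplogroup-associated group (H). A simplified sketch with hypothetical node names and effect labels; the full procedure of [16] may apply additional criteria.

```python
from collections import Counter


def tally_by_level(changes: list[tuple[str, str]],
                   children: dict[str, list[str]]) -> Counter:
    """Count NS and S substitutions on internal (H) and terminal (P) branches.

    changes: (descendant node of the branch, effect), effect being 'NS' or 'S'
    children: internal node -> child nodes; nodes not listed here are tips
    """
    counts: Counter = Counter()
    for node, effect in changes:
        level = "H" if node in children else "P"  # branch to a tip -> private
        counts[(level, effect)] += 1
    return counts


children = {"root": ["seq1", "n1"], "n1": ["seq2", "seq3"]}
changes = [("n1", "NS"), ("n1", "S"), ("seq2", "NS"), ("seq3", "NS"), ("seq1", "S")]
print(tally_by_level(changes, children))
```

The four resulting counts, NS and S in groups H and P, are the inputs to the neutrality index NI = (NS/S)_P / (NS/S)_H defined in Materials and Methods.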

Analysis of mtDNA variability data for Europeans showed that the spectra of nucleotide substitutions associated with haplogroups and with terminal branches of the phylogenetic tree differ only in the frequency of G→A substitutions (P = 0.037), which are more frequent at the periphery of the tree than in the trunks (Table 3). Pyrimidine substitutions, chiefly T→C, dominate both spectra. Transversion frequencies also do not differ: 4.1% in tree trunks and 3.8% in terminal branches. Analysis of protein-coding regions showed that European mitochondrial genomes are subject to negative selection (neutrality index NI = 1.32, P = 0.006) (Table 4). When individual substitution types are considered across the tree, only G→A substitutions show a statistically significant effect of negative selection (P = 5.8 × 10⁻⁴), manifested as a predominance of nonsynonymous over synonymous substitutions in terminal branches. Notably, the predominance of pyrimidine over purine transitions (chiefly T→C) is observed only among synonymous substitutions; among nonsynonymous substitutions, purine substitutions predominate instead (Table 4). Moreover, the spectra of nonsynonymous substitutions (both in haplogroup trunks and in terminal branches) are dominated by G→A transitions, which were previously shown to exhibit a high degree of homoplasy [21, 22].

Table 3. Frequencies of nucleotide substitutions associated with haplogroups and with terminal branches of the mtDNA phylogenetic tree of Europeans (L strand)
Table 4. Analysis of the distribution of nonsynonymous (NS) and synonymous (S) substitutions in the mtDNA phylogenetic tree of Europeans (test by Elson et al. [16])

In this work, we used phylogenetic analysis of mitochondrial genome variability to study the spectra of nucleotide substitutions observed in human populations. These substitutions represent “major” mtDNA variants that emerged from heteroplasmy over many generations, i.e., on an evolutionary time scale. We also analyzed the spectra of mtDNA nucleotide substitutions observed in heteroplasmy in the first generation. The compared mutation spectra (L strand of mtDNA) proved to be nearly identical in their main parameters: the distribution of pyrimidine and purine substitutions (with a predominance of T→C transitions) and the ratio of transitions to transversions.

Our data may therefore indicate that the selective processes directing mtDNA evolution over one generation and over many generations are of a similar nature, i.e., are driven by negative selection, which is reliably detected in analyses of phylogenetic trees of mtDNA haplotypes in human populations. According to the present and other studies [16, 18–20, 23, 24], in protein-coding regions of mitochondrial genomes the influence of negative selection is associated with an increased number of nonsynonymous nucleotide substitutions at the periphery of phylogenetic trees of mtDNA haplotypes. In addition, a possible effect of negative selection, associated with differences in the distribution of pathogenic nonsynonymous substitutions, was identified by comparing mitochondrial genomes in mother–child pairs [25]. If we accept that the heteroplasmic mixtures of mutations observed in the first generation are subject to negative selection, it becomes important to understand the origins of this phenomenon and to study the distribution of nucleotide substitutions at earlier stages, that is, in oocytes and primordial germ cells. However, the data obtained so far are contradictory, apparently mainly because of the small numbers of mutations analyzed. One study of heteroplasmic mtDNA in oocyte mitochondria observed a predominance of purine over pyrimidine substitutions [26], whereas another found, on the contrary, a predominance of T→C transitions [27].

A study of mtDNA variability in primordial germ cells showed that negative selection intensifies as germ cells develop, as reflected in a decreasing ratio of nonsynonymous to synonymous substitutions at later stages of embryonic development [10]. However, this conclusion was later challenged because of a possible admixture of mutations originating from nuclear copies of mtDNA [28]. On the basis of a reanalysis, it was proposed that selection on mtDNA mutations during early embryonic development cannot be attributed solely to negative selection, since positive selection (even in favor of harmful mutations) may operate at some stages [28]. Further studies of the spectra of mtDNA mutations at various stages of embryogenesis are therefore needed to clarify the genetic changes that occur in germline cells.