1 Introduction

About 2 billion years ago, a bacterium was engulfed by another cell as an endosymbiont [1]. The relationship between bacterium and cell turned out to be a beneficial one, perhaps as the bacterium provided a new source of energy for the cell, which in turn provided protection from the environment. Over millions of years of evolution, the bacterium lost its independence and became an organelle of the cell: the mitochondrion. Currently most cells in our body cannot survive without mitochondria which, besides being the main energy producers in the cell, are involved in various other processes including intracellular calcium signalling, iron–sulphur cluster biogenesis, and cell death [2–5]. Mitochondria can be highly dynamic organelles, continuously undergoing fusion and fission events which lead to a diverse range of mitochondrial morphologies, from fragmented states to continuous networks [6, 7]. Correctly balancing mitochondrial fusion and fission is important for cellular functionality [8, 9], and dysfunctions in mitochondrial fusion–fission dynamics have been observed in numerous diseases [7, 10–14]. Models of mitochondrial fusion–fission and implications for cellular health are discussed in Sect. 3.2.

Due to their rich evolutionary history [15], mitochondria retain their own genetic material: mitochondrial DNA (mtDNA). MtDNA is tiny compared to nuclear DNA and in humans comprises only 16,569 base pairs, encoding only 37 genes. The number of mtDNA molecules in a cell depends on the type of cell, and can vary over time. Replication of mtDNA can occur independent of the cell cycle [16] (though it is linked to certain stages of the cell cycle, see, e.g., [17]), ensuring continuous turnover of the mtDNA population in most dividing as well as non-dividing cells. Errors can occur during replication of mtDNA molecules. Mitochondria do contain machinery to repair mtDNA [18, 19], but this machinery is possibly less efficient than nuclear DNA repair. When mutations occur, they often coexist with wildtype mtDNA molecules, a situation which is called heteroplasmy. Denoting the number of wildtype and mutant mtDNA molecules in a cell by w and m, respectively, the level of heteroplasmy is defined as

$$\displaystyle{ h = \frac{m} {w + m} }$$
(1)

and can vary between cells in the same tissue, between different tissues, and between individuals.

Mutations in mtDNA can have detrimental consequences, and mitochondrial diseases (including mutations of both mtDNA and nuclear DNA) affect ∼1 in 4300 of the adult human population [20]. Because of the large number of mtDNAs in a cell, the presence of a few mutants does not immediately cause major problems. The heteroplasmy value has to exceed a critical threshold, typically 60–90%, before biochemical defects are observed [21–24]. Interestingly, the same pathogenic mutations that cause mtDNA diseases are also found in healthy individuals but at much lower levels [25, 26]. Can these low frequency mutants suddenly expand? Are there ways in which we can prevent them from doing so? Understanding the dynamics of mtDNA molecules inside cells and the way in which mutants can accumulate over time, or even take over the whole mtDNA population (homoplasmy), is a crucial step in understanding the progression of mtDNA diseases [66, 115, 120] and diseases which might be linked to mitochondria (e.g. cancer [11], Parkinson’s [10], diabetes [7], and Alzheimer’s [12]).

Because the severity of mitochondrial diseases is partly related to the proportion of cells that have heteroplasmy values above the critical threshold, it is important to have knowledge of how the heteroplasmy distribution changes over time. Stochastic modelling allows us to investigate these important distributions and analyse the cell-to-cell variability in heteroplasmy, as opposed to deterministic models which typically only describe mean behaviour.

Mutations (pathological and non-pathological) accumulate with age in any healthy individual. Mutations associated with ageing are typically seen in postmitotic tissues (e.g. myocardium and brain) and often a cell contains very high fractions of a single mutation, which is said to have clonally expanded [27]. Different cells usually contain distinct mutations though some types of mutations occur more often (like the ‘common deletion’, e.g. [28]). It is not yet clear how clonal expansion arises, and there exist several hypotheses on this topic.

Many of the hypotheses on mutant clonal expansion involve a selection advantage for the mutant species such as: a shorter replication time for deletion mutants because of their smaller genomes [29–31]; the survival-of-the-slowest (SOS) hypothesis which assumes that certain mutations reduce the release of damaging superoxide molecules, with the result that these mutant mitochondria are less often degraded than wildtype mitochondria [32, 33]; the ‘crippled mitochondrion’ hypothesis which states that mitochondrial biogenesis is partly controlled by the mitochondrion itself, and that mutant mtDNA molecules create a microenvironment that stimulates their own replication [34, 35]. Even though all of these hypotheses have some attractive features, none of them is fully supported by experimental data (a critical review is given in [42]). A recently proposed selection mechanism involves a faster replication rate for mutants caused by differences in transcription rates [36] and is discussed in more detail in Sect. 4.

One of the hypotheses that does not involve a direct selection advantage for mutants is the vicious cycle hypothesis. Unlike the SOS hypothesis, it states that mutants create more damaging radicals and thereby create more mutants which then create more radicals, forming a vicious cycle [37–39]. The vicious cycle hypothesis predicts a whole range of different mutations to occur, which is not seen experimentally, and evidence points towards replication as the major source of errors [40]. Another hypothesis is that stochastic drift of mtDNA molecules over time can account for the observed clonal expansion. In theory, mutants can take over entire mtDNA populations purely by chance because of the stochasticity of mtDNA replication and degradation, and cell divisions. This idea is also not fully supported by data, as versions of it cannot explain clonal expansion in short-lived animals [41]. Currently a debate exists in the literature as to whether selective differences exist between types of mtDNA in mixed populations [41–44].

Studying the dynamics of mtDNA molecules and mutant accumulation (with or without additional selection advantages for mutants), and the effects of cell divisions and possible nuclear feedback mechanisms, is an important step in understanding the ageing process and the progression of mtDNA diseases. There have been numerous models describing mtDNA dynamics in individual non-dividing cells, in dividing cells, on a tissue level, or across generations (reviewed below). Both the physical (fusion–fission) and the genetic (mtDNA dynamics) properties of mitochondria are linked and stochastic, meaning that physical and genetic stochastic models are valuable. We will discuss the different types of models that are considered and some of their main results.

2 Experimental Observations

In healthy cells, mtDNA levels are controlled and remain fairly constant over time as we age [45]. The number of mtDNA molecules per cell ranges from about 100 to \(10^{5}\) and depends heavily on the type of cell (e.g. mtDNA levels per cell have been measured to be 3650 ± 620 in skeletal muscle and 6970 ± 920 in heart [45], and mature human oocytes have around \(10^{5}\) mtDNA copies). In heteroplasmic cells, i.e. cells with both wildtype and mutant mtDNA, total mtDNA copy number can be 5- to 17-fold higher than in cells with only wildtype mtDNA [35, 46–48], and this proliferation is one of the hallmarks of certain heteroplasmic mtDNA mutations.

As we age, heteroplasmy levels can start to vary dramatically between tissues [49, 50], being particularly high in the putamen, cerebral cortex and substantia nigra [51, 52]. Mammalian aged tissues show a mosaic pattern of healthy cells and severely dysfunctional cells, meaning that a patch of healthy cells can occur directly adjacent to cells with high mutant loads (reviewed in [53]). A small number of very dysfunctional cells can sometimes have large effects on tissue performance, and this non-linearity provides another reason why studying distributions rather than mean behaviours is important. By the eighth decade of life, ∼0.1–5% of postmitotic cells develop mitochondrial deficiencies (such as COX deficiency) due to high mutant levels [54–56]. Perhaps surprisingly, rodents show similar levels of deficiency at only 3 years [57, 58].

Experimental measurements of parameters that are often used in stochastic models of mtDNA dynamics are summarized in Table 1. Note that interpreting experimental data can be challenging because sources of uncertainty are introduced through experimental measurement [59]. A Bayesian model was constructed to partially address this problem [60], inferring posterior parameter distributions for the substantia nigra region of the human brain (Table 1).

Table 1 Measured values (or values often used in models) of parameters relevant for mathematical models of mitochondria

3 Stochastic Modelling of Mitochondria

Our coverage of stochastic models of mitochondria starts by introducing a well-known and often-used model of mtDNA dynamics, the ‘relaxed replication model’. This model is discussed in some detail because it gives an intuitive possible explanation for the observed threshold effect and it is referred to by many subsequent models. Afterwards, a brief discussion of other in silico models of mtDNA dynamics and mitochondrial fusion–fission dynamics is given. Finally, an analytical model of mtDNA dynamics is discussed, which generalizes the relaxed replication model by allowing for arbitrary nuclear control of the replication and degradation rates of mtDNA. The specific role of mtDNA dynamics in ageing and development is discussed in the next section. We note briefly the types of stochastic mitochondrial phenomena that we do not consider: variability in mitochondrial network structure independent of genetics, e.g. [8, 71, 72], organ-to-organ variability, e.g. [66], how variability in mitochondrial content and function might link to gene expression variability, e.g. [73], and how individual membrane potentials might fluctuate, e.g. [74].

3.1 Relaxed Replication and Its Implications

In 1999, a stochastic model was developed to study how populations of mtDNA molecules vary over time [75]. This model of mtDNA dynamics is known as ‘relaxed replication’ because it assumes mtDNA turnover occurs continuously over time, independent of cell division. It has been subsequently used in a variety of other models, e.g. [41, 43, 76] and has obtained experimental support, e.g. [77]. Two situations were investigated, both of which concerned simulations of cells with two different types of mtDNA molecules. In the first case both types were neutral and in the second case one of the types was pathological. The aim was to see how the presence of mutant molecules affects the overall dynamics, and how the fraction of mutants varies over time. The population of mtDNA molecules is assumed to be well-mixed and cells are assumed to be non-dividing.

In the case of two neutral types of mtDNA molecules, the cell tries to meet a certain copy number, and the main dynamics are described by the following ODE:

$$\displaystyle\begin{array}{rcl} \frac{dN} {dt} & =& C -\mu N \\ & =& \mu (N_{\mathrm{opt}} - N){}\end{array}$$
(2)

where N is the total number of mtDNA molecules, C is the copy rate at which new mtDNA molecules are generated, and μ is their degradation rate. The constant C is chosen such that the total population is controlled towards a desired value \(N_{\mathrm{opt}}\). When embedded in a stochastic framework, the above equation describes an Immigration-Death model with constant immigration (C) and death (μ) rates. The corresponding master equation for two neutral alleles A and B (with N = A + B) is given by:

$$\displaystyle\begin{array}{rcl} \frac{\partial P_{AB}(t)} {\partial t} & =& \mu N_{\mathrm{opt}}\Big(P_{A-1,B}(t) + P_{A,B-1}(t)\Big) +\mu (A + B + 1)\Big(P_{A+1,B}(t) + P_{A,B+1}(t)\Big) \\ & & -2\mu (N_{\mathrm{opt}} + A + B)P_{AB}(t) {}\end{array}$$
(3)

Master equation descriptions of mtDNA populations can sometimes be solved explicitly [102, 107], but this is often not the case and approximation methods must be used, such as the system size expansion (discussed in Sect. 3.3). The original implementation of this model combines deterministic immigration events with stochastic death events, which is arguably less natural than a full stochastic model (see supplement of [78]). The initial conditions were taken such that \(A(0) + B(0) = N_{\mathrm{opt}}\), i.e. the system started in steady state. Simulations showed that, on average, the proportions of alleles A and B remain constant. The probability that a certain allele takes over the entire population was found to be equal to the initial allele frequency, consonant with a random drift model. The relaxed replication model is often referred to as ‘the random drift model’ (though the dynamics are non-trivial), and this term has now come to refer to models with an absence of selective differences between mutant and wildtype species.
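The fixation result can be illustrated with a minimal fully stochastic sketch. It is not the original implementation of [75]: it assumes that each replication event is assigned to an allele in proportion to its current frequency (as in the proportional allocation of Eq. (4)), and all function names and parameter values are hypothetical. Only the sequence of events is simulated, since waiting times do not affect which allele fixes.

```python
import random

def simulate_neutral_fixation(a0, b0, n_opt, mu, rng):
    """Run one stochastic trajectory until allele A or B is lost.

    Assumed rates (a sketch, not the original implementation):
      birth of A: mu * n_opt * A / (A + B)
      birth of B: mu * n_opt * B / (A + B)
      death of A: mu * A,  death of B: mu * B
    Returns True if A is the surviving allele.
    """
    a, b = a0, b0
    while a > 0 and b > 0:
        n = a + b
        rates = (mu * n_opt * a / n, mu * n_opt * b / n, mu * a, mu * b)
        # Choose the next event with probability proportional to its rate.
        r = rng.random() * sum(rates)
        if r < rates[0]:
            a += 1
        elif r < rates[0] + rates[1]:
            b += 1
        elif r < rates[0] + rates[1] + rates[2]:
            a -= 1
        else:
            b -= 1
    return a > 0

def fixation_fraction(a0, b0, n_opt=10, mu=1.0, runs=400, seed=1):
    """Fraction of simulated cells in which allele A takes over."""
    rng = random.Random(seed)
    wins = sum(simulate_neutral_fixation(a0, b0, n_opt, mu, rng)
               for _ in range(runs))
    return wins / runs
```

Under these assumptions the allele fraction A∕(A + B) is unchanged on average by every event, so `fixation_fraction(3, 7)` should lie close to the initial frequency 0.3, up to sampling noise.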

To include the presence of mutant mtDNA, the dynamics in Eq. (2) were slightly changed to dN∕dt = C(N) − μN, i.e. the replication rate now depends on the state of the system. It was argued that a severely pathological mutant should not contribute to the replication feedback, meaning that C(N) = C(w), i.e. the control is only dependent on the wildtype species. The replication rate was then multiplied by the fraction of the species (proportional selection) which leads to:

$$\displaystyle\begin{array}{rcl} \frac{dw} {dt} & =& C(w) \frac{w} {w + m} -\mu w \\ \frac{dm} {dt} & =& C(w) \frac{m} {w + m} -\mu m \\ C(w)& =& \mu N_{\mathrm{opt}}\left (\alpha +(1-\alpha ) \frac{w} {N_{\mathrm{opt}}}\right ){}\end{array}$$
(4)

The parameter α ≥ 1 describes the response of the system. The idea is that the cell still tries to maintain the same number of wildtype mtDNAs as it would when no mutants are present, i.e. when \(w = N_{\mathrm{opt}}\), \(C(w) =\mu N_{\mathrm{opt}}\) as it was in Eq. (2).

Using Eq. (4), the deterministic steady states of the system [denoted by \((w_{s},m_{s})\)] can easily be found, and they form a line defined by

$$\displaystyle{ w_{s} + \frac{1} {\alpha } m_{s} = N_{\mathrm{opt}} }$$
(5)

Stochastic events will cause trajectories to fluctuate around the deterministic steady state line until one of the absorbing boundaries (h = 0 or h = 1) is reached. The survivor species will continue to fluctuate around its own steady state value.
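As a quick numerical check of Eq. (5), the sketch below evaluates the right-hand sides of Eq. (4) at points on and off the steady state line. The function names and the parameter values (\(N_{\mathrm{opt}} = 1000\), α = 5, μ = 0.1) are hypothetical and chosen only for illustration.

```python
def replication_rate(w, n_opt, mu, alpha):
    """C(w) from Eq. (4)."""
    return mu * n_opt * (alpha + (1.0 - alpha) * w / n_opt)

def rates_of_change(w, m, n_opt, mu, alpha):
    """Right-hand sides (dw/dt, dm/dt) of Eq. (4)."""
    c = replication_rate(w, n_opt, mu, alpha)
    total = w + m
    return c * w / total - mu * w, c * m / total - mu * m

def mutant_steady_state(w_s, n_opt, alpha):
    """m_s on the line of Eq. (5): w_s + m_s / alpha = N_opt."""
    return alpha * (n_opt - w_s)

# Hypothetical illustrative parameters.
N_OPT, MU, ALPHA = 1000.0, 0.1, 5.0
for w_s in (1000.0, 600.0, 200.0):
    m_s = mutant_steady_state(w_s, N_OPT, ALPHA)
    dw, dm = rates_of_change(w_s, m_s, N_OPT, MU, ALPHA)
    print(w_s, m_s, dw, dm)  # both derivatives vanish up to round-off
```

A point below the line, e.g. (w, m) = (600, 1000), gives positive dw∕dt and dm∕dt, so the deterministic dynamics push the total copy number back up towards the line.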

Like before, the proportion of cells that become fixed on a certain allele is observed to be the same as the initial allele frequency. However, when w becomes fixed its steady state value is \(N_{\mathrm{opt}}\), whereas a fully mutant cell will have copy numbers fluctuating around \(\alpha N_{\mathrm{opt}}\). If the simulation starts in steady state with the initial frequencies both 0.5 (meaning \(w_{0} = m_{0} = \frac{N_{\mathrm{opt}}\alpha } {1+\alpha }\)), then eventually 50% of cells will be (on average) in state \((N_{\mathrm{opt}},0)\) and 50% in state \((0,\alpha N_{\mathrm{opt}})\). This means that for long times \(\langle w\rangle = N_{\mathrm{opt}}/2\) and \(\langle m\rangle =\alpha N_{\mathrm{opt}}/2\), i.e. 〈w〉 and 〈m〉 decrease and increase over time, respectively. Therefore, the heteroplasmy of the tissue as a whole (\(\frac{\sum _{i}m_{i}} {\sum _{i}m_{i}+\sum _{i}w_{i}}\) where i denotes the ith cell) increases. Note, however, that it does not follow that the mean heteroplasmy per cell, \(\frac{1} {n}\sum _{i}h_{i}\), increases with time.
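The distinction between tissue-level heteroplasmy and mean per-cell heteroplasmy can be made concrete with a small sketch of the long-time limit described above. The function names and the values \(N_{\mathrm{opt}} = 1000\) and α = 5 are hypothetical.

```python
def heteroplasmy(w, m):
    """Eq. (1): fraction of mutant molecules in one cell."""
    return m / (w + m)

def tissue_heteroplasmy(cells):
    """Pooled heteroplasmy: total mutants over total mtDNA across cells."""
    total_m = sum(m for _, m in cells)
    total_w = sum(w for w, _ in cells)
    return total_m / (total_w + total_m)

def mean_cell_heteroplasmy(cells):
    """Average of the per-cell heteroplasmies h_i."""
    return sum(heteroplasmy(w, m) for w, m in cells) / len(cells)

# Hypothetical illustrative values.
n_opt, alpha = 1000, 5
w0 = m0 = n_opt * alpha / (1 + alpha)
initial = [(w0, m0)] * 4
# Long-time limit: half of the cells fix on wildtype, half on mutant.
final = [(n_opt, 0)] * 2 + [(0, alpha * n_opt)] * 2

print(tissue_heteroplasmy(initial), mean_cell_heteroplasmy(initial))
print(tissue_heteroplasmy(final), mean_cell_heteroplasmy(final))
```

In this example the tissue heteroplasmy rises from 0.5 to α∕(1 + α) because fully mutant cells carry α times more mtDNA, while the mean per-cell heteroplasmy stays at 0.5.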

A larger value of α means that 〈m〉 approaches a larger value, which seems to be disadvantageous. The benefit is, however, that w will deviate from its desired value \(N_{\mathrm{opt}}\) more slowly as h increases. More precisely, \(w_{s}/N_{\mathrm{opt}} = (1 - h)/(1 - (1 - 1/\alpha )h)\), and a high α can therefore be interpreted as an attempt of the cell to keep the wildtype population near its optimal value. The consequence of a high α is an effective heteroplasmy threshold (Fig. 1) and this simple model could therefore be an explanation of the experimentally observed threshold effect [21–24]. A generalization of the model described in Eq. (4), in which the mutants were allowed to contribute to the feedback as well [i.e. C = C(w, m)], was described deterministically [76]. The more the mutants contribute to the feedback, the less pronounced the threshold effect seen in Fig. 1.
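The relation between the steady state wildtype copy number and h follows from Eq. (5) together with the definition of heteroplasmy in Eq. (1). The intermediate steps can be written as:

$$\displaystyle\begin{array}{rcl} m_{s} & =& \frac{h} {1 - h}\,w_{s} \\ N_{\mathrm{opt}} & =& w_{s} + \frac{1} {\alpha } m_{s} = w_{s}\,\frac{1 - (1 - 1/\alpha )h} {1 - h} \\ \frac{w_{s}} {N_{\mathrm{opt}}} & =& \frac{1 - h} {1 - (1 - 1/\alpha )h}\end{array}$$

For α = 1 this reduces to a linear decline, \(w_{s}/N_{\mathrm{opt}} = 1 - h\). For an illustrative value α = 10, the ratio is still about 0.91 at h = 0.5 and about 0.71 at h = 0.8, and falls to about 0.53 at h = 0.9, which is the sharp drop at high heteroplasmy referred to as an effective threshold.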

Fig. 1 Steady state lines of the relaxed replication model. (a) The number of wildtype mtDNA (relative to the desired value \(N_{\mathrm{opt}}\)) as a function of α. For high α, w remains close to \(N_{\mathrm{opt}}\) for a wide range of heteroplasmies until it suddenly drops at high h, creating an effective threshold effect. The functions that are plotted are described by \(w_{s}/N_{\mathrm{opt}} = (1 - h)/(1 - (1 - 1/\alpha )h)\). This behaviour has been observed in skeletal muscle fibres [77]. (b) The line of steady states, showing that a large value for α results in a large number of mutants if h is high

3.2 In Silico Models of mtDNA Dynamics and Mitochondrial Dynamics

In many models, mtDNA molecules are assumed to be well-mixed and each mtDNA has a given probability per unit time of being replicated and degraded. The occurrence of mitophagy events, events involving the degradation of a whole mitochondrion, means that all the mtDNA molecules within are simultaneously degraded. In this case, it becomes important to know which mtDNA resides in which mitochondrion. Moreover, mitochondria can only fuse with others when they are sufficiently close, meaning that spatial positions start to play a role. The possible roles of fission and fusion and the networks of mitochondria that are produced are unclear [8]. Various models describe fusion and fission dynamics [79–82], some of which include spatial effects [80, 82]. A brief overview of their results is given here.

The model in [82] incorporates random mtDNA turnover, fusion and fission of mitochondria, and spatial effects. MtDNA turnover was assumed to consist of (1) replication events of individual mtDNAs (with possible state feedback), and (2) degradation of whole mitochondria. The cell was divided into 16 compartments, and fusion only occurred between mitochondrial pairs in the same or in adjacent compartments. Upon fission, mtDNAs were binomially partitioned into daughter mitochondria which were themselves placed in their own compartment or an adjacent one. Some of these dynamics are described by the following Poisson processes:

$$\displaystyle{ \begin{array}{rlll} (w,m)\,&\mathop{\longrightarrow}\limits_{}^{a_{R}(w,m)} &&(w + 1,m) \\ (w,m)\,&\mathop{\longrightarrow}\limits_{}^{a_{R}(w,m)} &&(w,m + 1) \\ (w,m)\,&\mathop{\longrightarrow}\limits_{}^{a_{D}N_{\mathrm{mito}}} & & \emptyset \\ (w_{1},m_{1}) + (w_{2},m_{2})\,&\mathop{\longrightarrow}\limits_{}^{a_{fus}} &&(w_{1},m_{1}\vert w_{2},m_{2}) \\ (w_{1},m_{1}\vert w_{2},m_{2}\vert w_{3},m_{3})\,&\mathop{\longrightarrow}\limits_{}^{a_{\mathrm{fis}}(w,m)}&&(w_{1}^{{\prime}},m_{1}^{{\prime}}) + (w_{2}^{{\prime}},m_{2}^{{\prime}}\vert w_{3},m_{3}) \\ & &&\text{with}\,\,\,w_{1}^{{\prime}} + w_{2}^{{\prime}} = w_{1} + w_{2} \\ & & & \text{and}\,\,\,m_{1}^{{\prime}} + m_{2}^{{\prime}} = m_{1} + m_{2}\end{array} }$$
(6)

where (w, m) represents a single mitochondrion with w wildtype and m mutant mtDNA molecules, \(N_{\mathrm{mito}}\) is the total number of mitochondria in the cell, \(a_{R}(w,m)\) represents the replication rate with feedback ensuring upregulation of mtDNA copy number as heteroplasmy increases, and \((w_{1},m_{1}\vert w_{2},m_{2})\) represents a fused mitochondrion. The third equation represents a mitophagy event in which an entire mitochondrion is degraded, and the last two equations give examples of fusion and fission events (any number of mitochondria can be fused together). The fission propensity \(a_{\mathrm{fis}}(w,m)\) was assumed to increase with the size of the mitochondrion, i.e. its total mtDNA copy number. Stochastic Gillespie simulations [83] were used to model the system. Among the conclusions were the following: (1) faster fusion–fission dynamics results in a better mixing of mtDNAs, (2) slower mtDNA mixing increases the heteroplasmy variance between cells and speeds up the process of clonal expansion, (3) including replication feedback [similar to Eq. (4)] can lower the fraction of cells with clonally expanded mutants, but this effect is lessened with low fusion–fission rates.
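The fission and fusion steps can be sketched as follows. This is a simplified illustration, not the implementation of [82]: a fused mitochondrion is represented only by its pooled contents rather than by its constituent units, spatial compartments are ignored, and the equal partitioning probability p = 0.5 and all function names are assumptions.

```python
import random

def binomial_draw(n, p, rng):
    """Number of successes in n independent trials with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)

def fission(w, m, rng, p=0.5):
    """Split one mitochondrion (w, m) into two daughters.

    Each mtDNA molecule independently goes to the first daughter with
    probability p (p = 0.5 is an assumption of this sketch).
    """
    w1 = binomial_draw(w, p, rng)
    m1 = binomial_draw(m, p, rng)
    return (w1, m1), (w - w1, m - m1)

def fuse(mito_a, mito_b):
    """Pool the contents of two mitochondria into one fused unit."""
    return (mito_a[0] + mito_b[0], mito_a[1] + mito_b[1])

rng = random.Random(0)
daughter_1, daughter_2 = fission(6, 4, rng)
print(daughter_1, daughter_2, fuse(daughter_1, daughter_2))
```

Fission conserves the total wildtype and mutant counts, as required by the constraints in Eq. (6), but the daughters generally differ in heteroplasmy; this is the source of the mitochondrion-to-mitochondrion variability on which selective mitophagy can act.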

The model described above was extended in [84] to account for the experimentally observed selectivity of mitochondrial fusion and mitophagy. Briefly, the higher the fraction of mutants inside a mitochondrion, the less likely it is to fuse and the more likely it is to be degraded. As in [82], higher rates of fusion and fission led to an increased heteroplasmy variance, but this was only beneficial when mitophagy and fusion were sufficiently selective, allowing for mitochondria with high h to be efficiently removed from the population. A decline of the selectivity of fusion and mitophagy with age can be a reason why mutants expand, and in this case a lower fusion–fission rate is actually beneficial [84].

In some other models the focus is not on mtDNA dynamics, but on the fusion and fission events themselves and how they affect the health of the overall mitochondrial population. It was assumed that mitochondria contain a discrete set of units (referred to as health units, hereditary units, or quality units) that can be exchanged during fusion–fission events and undergo damage over time [79–81]. Less healthy mitochondria (mitochondria with a low membrane potential) are less likely to fuse [85], and usually they are assumed to have a higher degradation rate. Because of this, higher rates of fusion and fission tend to lead to more healthy mitochondria [79–81]. When damaged mitochondria are able to spread their dysfunctions in an infection-like manner, lower fusion–fission rates were found to be beneficial [81].

Various other stochastic models of mitochondria have been constructed, which are briefly summarized here: (1) the effect of a shorter mutant replication time was modelled using both DDEs (delay differential equations) and stochastic simulations [68], and the authors concluded that faster mutant replication is highly unlikely to be the cause of clonal expansion; (2) the role of transcription rates in providing a replication advantage for mutants was investigated [36] (this model is discussed in more detail in Sect. 4); (3) random drift was found to be sufficient to explain mutant loads in human tumors [44]; (4) a model simulating mtDNA segregation in hematopoietic stem cells found evidence for selection against pathogenic mutations [86]; and (5) a model was developed to investigate the dynamics of the network arising from fusion–fission events, and analysis of its equilibrium configurations in both deterministic and stochastic settings revealed a percolation phase transition in the mitochondrial reticulum [72].

3.3 A General Approach to Investigating the Nuclear Control of mtDNA Dynamics

Recently, a general bottom-up theory has been produced to describe mtDNA dynamics in single cells [78]. The full model includes mtDNA turnover with (1) arbitrary copy number feedback control on replication and degradation rates, (2) cell divisions, (3) de novo mutations and replication errors, and (4) a possible selective advantage for mutant mtDNA molecules. Denoting a state with w wildtype and m mutant molecules by (w, m), the dynamics are described by the following set of Poisson processes:

$$\displaystyle{ \begin{array}{rlll} (w,m)\,&\mathop{\longrightarrow}\limits_{}^{\epsilon _{1} + (1 +\epsilon _{2})w\lambda (w,m) \equiv f_{1}(w,m)} &&(w + 1,m) \\ (w,m)\,&\mathop{\longrightarrow}\limits_{}^{\epsilon _{3} + (1 +\epsilon _{4})m\lambda (w,m) \equiv f_{2}(w,m)}&&(w,m + 1) \\ (w,m)\,&\mathop{\longrightarrow}\limits_{}^{\epsilon _{5} + (1 +\epsilon _{6})w\nu (w,m) \equiv f_{3}(w,m)} &&(w - 1,m) \\ (w,m)\,&\mathop{\longrightarrow}\limits_{}^{\epsilon _{7} + (1 +\epsilon _{8})m\nu (w,m) \equiv f_{4}(w,m)}&&(w,m - 1) \\ (w,m)\,&\mathop{\longrightarrow}\limits_{}^{w\mu _{1} \equiv f_{5}(w,m)} &&(w - 1,m + 1) \\ (w,m)\,&\mathop{\longrightarrow}\limits_{}^{w\mu _{2} \equiv f_{6}(w,m)} &&(w,m + 1) \\ (w,m)\,&\mathop{\longrightarrow}\limits_{}^{w\mu _{3} \equiv f_{7}(w,m)} &&(w - 1,m + 2)\end{array} }$$
(7)

where λ(w, m) and ν(w, m) are the replication and degradation rates, respectively; \(\epsilon _{i}\) indicate any possible selective advantage in replication and/or degradation for w or m, which can be multiplicative (even-indexed \(\epsilon _{i}\)) or additive (odd-indexed \(\epsilon _{i}\)); and \(\mu _{i}\) indicate possible mutation processes: spontaneous mutations (\(\mu _{1}\)), replication errors with the original molecule remaining intact (\(\mu _{2}\)), and replication errors in which both the original and replicated molecule become mutated (\(\mu _{3}\)). The rates of the reactions are given by \(f_{j}\) with j = 1, …, R, where R = 7 is the total number of reactions.
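The reactions in Eq. (7) can be simulated directly with a Gillespie-type algorithm. The following is a minimal sketch rather than the code used in [78]; the function names, the argument layout, and the assumption that all rates are non-negative are choices made here for illustration.

```python
import random

# State changes (dw, dm) of the seven reactions, read off Eq. (7).
STATE_CHANGES = [(1, 0), (0, 1), (-1, 0), (0, -1), (-1, 1), (0, 1), (-1, 2)]

def propensities(w, m, lam, nu, eps, mu):
    """Rates f_1..f_7 of Eq. (7).

    lam, nu: callables returning lambda(w, m) and nu(w, m).
    eps: sequence (eps_1, ..., eps_8); mu: sequence (mu_1, mu_2, mu_3).
    """
    l, v = lam(w, m), nu(w, m)
    return [
        eps[0] + (1 + eps[1]) * w * l,
        eps[2] + (1 + eps[3]) * m * l,
        eps[4] + (1 + eps[5]) * w * v,
        eps[6] + (1 + eps[7]) * m * v,
        w * mu[0],
        w * mu[1],
        w * mu[2],
    ]

def gillespie(w, m, t_end, lam, nu, eps, mu, rng):
    """Simulate Eq. (7) up to time t_end and return the final (w, m).

    Assumes all rates are non-negative; additive degradation terms
    (eps_5, eps_7) should be zero if copy numbers may reach zero.
    """
    t = 0.0
    while True:
        rates = propensities(w, m, lam, nu, eps, mu)
        total = sum(rates)
        if total <= 0.0:
            return w, m
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            return w, m
        r = rng.random() * total
        chosen = None
        for k, rate in enumerate(rates):
            if rate > 0.0:
                chosen = k  # fallback to the last possible reaction on round-off
                if r < rate:
                    break
            r -= rate
        dw, dm = STATE_CHANGES[chosen]
        w, m = w + dw, m + dm
```

For example, the relaxed replication control of Eq. (4) corresponds to passing `lam = lambda w, m: C(w) / (w + m)` (with a user-supplied `C`) and a constant `nu`, with all entries of `eps` and `mu` set to zero.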

The stoichiometry matrix of this system is given by

$$\displaystyle{ S_{ij} = \left (\begin{array}{*{10}c} 1&0&-1& 0 &-1&0&-1\\ 0 &1 & 0 &-1 & 1 &1 & 2 \end{array} \right ) }$$
(8)

with the index j representing the different reactions given in (7), and i = 1, …, N denoting the different species (here, i = 1 corresponds to w and i = 2 to m, but the method can be readily extended to deal with more than 2 mtDNA species). Denoting \(P_{w,m}(t)\) as the probability of observing the system in state (w, m) at time t, the system is described by the following master equation:

$$\displaystyle\begin{array}{rcl} \frac{\partial P_{w,m}} {\partial t} & =& \sum _{j=1}^{R}\Bigg(\prod _{ i=1}^{N}\mathbb{E}^{-S_{ij} } - 1\Bigg)f_{j}(w,m)P_{w,m} \\ & =& \left (\mathbb{E}^{-S_{11} }\mathbb{E}^{-S_{21} } - 1\right )f_{1}(w,m)P_{w,m} + \cdots + \left (\mathbb{E}^{-S_{17} }\mathbb{E}^{-S_{27} } - 1\right )f_{7}(w,m)P_{w,m} \\ & =& f_{1}(w - 1,m)P_{w-1,m} - f_{1}(w,m)P_{w,m} + \cdots \\ & +& f_{7}(w + 1,m - 2)P_{w+1,m-2} - f_{7}(w,m)P_{w,m} {}\end{array}$$
(9)

where \(\mathbb{E}^{-S_{ij}}\) is a raising and lowering operator. For non-linear \(f_{j}(w,m)\) this master equation is generally analytically intractable, but can be approximated by a van Kampen system size expansion [87]. The system size expansion treats w and m as the sum of a deterministic component (ϕ) and a fluctuating stochastic component (ξ), scaled by powers of the system size (\(\Omega\)):

$$\displaystyle\begin{array}{rcl} w& =& \phi _{w}\Omega +\xi _{w}\Omega ^{1/2} \\ m& =& \phi _{m}\Omega +\xi _{m}\Omega ^{1/2}{}\end{array}$$
(10)

All quantities in Eq. (9) are then written in terms of \(\Omega\), \(\phi _{i}\), and \(\xi _{i}\), and equal powers of \(\Omega\) are collected. The largest terms, proportional to \(\Omega ^{1/2}\), form the macroscopic rate equations, i.e. they describe the deterministic behaviour of the mean quantities in the system. Next, terms of order \(\Omega ^{0}\), known as the linear noise approximation (LNA), give a linear Fokker–Planck equation

$$\displaystyle{ \frac{\partial \Pi (\xi,t)} {\partial t} = -\sum _{ i,j=1}^{N}A_{ ij}\frac{\partial \left (\xi _{j}\Pi \right )} {\partial \xi _{i}} + \frac{1} {2}\sum _{i,j=1}^{N}B_{ ij}\frac{\partial ^{2}\Pi } {\partial \xi _{i}\partial \xi _{j}} }$$
(11)

with \(A_{ij}\), \(B_{ij}\) given by \(A_{ij} =\sum _{ k=1}^{R}S_{ik}\frac{\partial f_{k}} {\partial \phi _{j}}\) and \(B_{ij} =\sum _{ k=1}^{R}S_{ik}S_{jk}f_{k}\). Evolution equations for the moments of \(\xi _{w}(t)\), \(\xi _{m}(t)\) [and, correspondingly, the moments of w(t) and m(t)] can be extracted from this Fokker–Planck equation, forming a set of coupled ODEs.
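To make the definitions of A and B concrete, the sketch below builds both matrices from the stoichiometry matrix of Eq. (8). It works directly in copy numbers rather than rescaled concentrations (an assumption made here for simplicity), approximates the derivatives by central finite differences, and uses a hypothetical neutral example with linear feedback on replication only; all names and parameter values are illustrative.

```python
# Stoichiometry matrix of Eq. (8): rows are species (w, m), columns reactions.
S = [
    [1, 0, -1, 0, -1, 0, -1],
    [0, 1, 0, -1, 1, 1, 2],
]

def diffusion_matrix(rates):
    """B_ij = sum_k S_ik S_jk f_k for the given reaction rates f_1..f_7."""
    n_reactions = len(rates)
    return [
        [sum(S[i][k] * S[j][k] * rates[k] for k in range(n_reactions))
         for j in range(2)]
        for i in range(2)
    ]

def drift_matrix(rate_fn, w, m, step=1e-4):
    """A_ij = sum_k S_ik d f_k / d x_j, by central finite differences.

    rate_fn(w, m) must return the list of rates f_1..f_7.
    """
    up_w, down_w = rate_fn(w + step, m), rate_fn(w - step, m)
    up_m, down_m = rate_fn(w, m + step), rate_fn(w, m - step)
    d_dw = [(u - d) / (2 * step) for u, d in zip(up_w, down_w)]
    d_dm = [(u - d) / (2 * step) for u, d in zip(up_m, down_m)]
    return [
        [sum(S[i][k] * d[k] for k in range(len(d))) for d in (d_dw, d_dm)]
        for i in range(2)
    ]

def example_rates(w, m, nu0=1.0, beta=-0.01, n_ss=100.0):
    """Hypothetical neutral example: linear feedback on replication only,
    no selection (all epsilon_i = 0) and no mutation (all mu_i = 0)."""
    lam = nu0 + beta * (w + m - n_ss)
    return [w * lam, m * lam, w * nu0, m * nu0, 0.0, 0.0, 0.0]

A = drift_matrix(example_rates, 60.0, 40.0)
B = diffusion_matrix(example_rates(60.0, 40.0))
print(A)
print(B)
```

In this example B is diagonal with entries 2λw and 2λm at steady state, reflecting equal birth and death rates for each species, and A has zero determinant, so one direction in (w, m) space experiences no restoring force.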

For non-linear functions λ(w, m) and ν(w, m) these coupled moment equations cannot be solved analytically, though they can be solved numerically. To make analytical progress, the replication and degradation rates can be linearized around their steady state values (w ss , m ss ) [78], i.e.

$$\displaystyle\begin{array}{rcl} \lambda (w,m)& \approx & \lambda (w_{ss},m_{ss}) +\beta _{w}(w - w_{ss}) +\beta _{m}(m - m_{ss}) \\ \nu (w,m)& \approx & \nu (w_{ss},m_{ss}) +\delta _{w}(w - w_{ss}) +\delta _{m}(m - m_{ss}){}\end{array}$$
(12)

where \(\beta _{j} = \partial _{j}\lambda (w,m)\) and \(\delta _{j} = \partial _{j}\nu (w,m)\), evaluated at \((w_{ss},m_{ss})\), with j = w, m.

Using this linearized system, full solutions for the means and variances of w and m over time were provided by the authors for a simplified version of Eq. (7). The existence of steady state values depends on the eigenvalues of the system’s Jacobian matrix. One of the eigenvalues of the simplified system is zero, and imposing the conditions \(\lambda (w_{ss},m_{ss}) =\nu (w_{ss},m_{ss})\), \(\beta _{w},\beta _{m} < 0\) and \(\delta _{w},\delta _{m} > 0\) ensures that the other eigenvalue is negative. This gives rise to a line of steady state values for w and m, and for timescales on which the LNA is valid the wildtype and mutant steady states are roughly constant.
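The eigenvalue structure can be seen explicitly under the assumption (made here for illustration) that the simplified system corresponds to setting all selection and mutation parameters in Eq. (7) to zero. The mean dynamics are then d〈w〉∕dt = w(λ − ν) and d〈m〉∕dt = m(λ − ν), and with the linearized rates of Eq. (12) the Jacobian at a steady state, written in copy numbers, is

$$\displaystyle{ J = \left (\begin{array}{cc} w_{ss}(\beta _{w} -\delta _{w})&w_{ss}(\beta _{m} -\delta _{m}) \\ m_{ss}(\beta _{w} -\delta _{w})&m_{ss}(\beta _{m} -\delta _{m}) \end{array} \right ),\qquad \det J = 0,\qquad \mathrm{tr}\,J = w_{ss}(\beta _{w} -\delta _{w}) + m_{ss}(\beta _{m} -\delta _{m}) }$$

Because the two rows are proportional, the determinant vanishes, so the eigenvalues are 0 and tr J. The stated sign conditions on the β and δ coefficients make every term in the trace negative, which gives the negative second eigenvalue. The zero eigenvalue corresponds to displacements along the line of steady states, along which there is no restoring force.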

Starting in steady state with noiseless initial conditions, the solutions can be written in the simple form:

$$\displaystyle\begin{array}{rcl} \langle w\rangle & =& w_{ss} \\ \langle m\rangle & =& m_{ss} \\ \langle w^{2}\rangle & =& F_{ 1}^{\text{decay}}(t) +\theta _{ 1}t +\phi _{1} \\ \langle wm\rangle & =& F_{2}^{\text{decay}}(t) +\theta _{ 2}t +\phi _{2} \\ \langle m^{2}\rangle & =& F_{ 3}^{\text{decay}}(t) +\theta _{ 3}t +\phi _{3} \\ \langle h^{2}\rangle '& \equiv & \frac{\langle h^{2}\rangle } {\langle h\rangle (1 -\langle h\rangle )} = \frac{2\lambda (w_{ss},m_{ss})t} {w_{ss} + m_{ss}}{}\end{array}$$
(13)

where \(F_{i}^{\text{decay}}\) are transient functions that die out exponentially with time, \(\langle h^{2}\rangle '\) is the normalized heteroplasmy variance (the quantity typically reported in experimental studies), \(\langle h\rangle =\langle \frac{m} {w+m}\rangle\) is the expected heteroplasmy value (which was approximated by \(\frac{\langle m\rangle } {\langle w\rangle +\langle m\rangle }\)), and the constants \(\theta _{i}\) and \(\phi _{i}\) are functions only of (1) the difference between replication and degradation rates, (2) steady state copy numbers \(w_{ss}\) and \(m_{ss}\), and (3) the turnover rate in steady state \(\lambda (w_{ss},m_{ss})\) (\(=\nu (w_{ss},m_{ss})\)). The structure of the above solutions is such that at nonzero w and m, at most one of the \(\theta _{i}\) can be zero, and \(\theta _{1}\) and \(\theta _{3}\) are non-negative [78].

Several conclusions can be drawn from this linearized system when the assumptions underlying this derivation hold (see below): (1) the variance of at least one species (w or m) increases linearly with time, and (2) heteroplasmy variance increases linearly with time with a rate depending only on steady state copy numbers and the timescale of random turnover. This last observation means that the rate of increase of \(\langle h^{2}\rangle\) does not depend on the specifics of the control mechanism applied (i.e. the specific forms of λ(w, m) and ν(w, m)), meaning that different control mechanisms lead to similar trends in heteroplasmy variance.
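The linear growth of the normalized heteroplasmy variance in Eq. (13) can be compared against a direct simulation. The sketch below considers only the simplest case of constant and equal per-molecule replication and degradation rates with no feedback control (an assumption of this example), and the copy numbers, rates, and function names are hypothetical.

```python
import random

def simulate_cell(w, m, rate, t_end, rng):
    """Birth-death turnover with equal per-molecule replication and
    degradation rates (no feedback control; an assumption of this sketch).
    Returns the heteroplasmy at time t_end."""
    t = 0.0
    while True:
        n = w + m
        if n == 0:
            raise RuntimeError("mtDNA population went extinct")
        t += rng.expovariate(2.0 * rate * n)  # total event rate is 2 * rate * n
        if t > t_end:
            return m / n
        # Pick a molecule uniformly; it is wildtype with probability w / n.
        is_wildtype = rng.random() * n < w
        step = 1 if rng.random() < 0.5 else -1  # replicate or degrade
        if is_wildtype:
            w += step
        else:
            m += step

def normalised_variance(hs):
    """Sample variance of h divided by mean(h) * (1 - mean(h))."""
    mean = sum(hs) / len(hs)
    var = sum((h - mean) ** 2 for h in hs) / (len(hs) - 1)
    return var / (mean * (1.0 - mean))

def predicted_normalised_variance(rate, t, w_ss, m_ss):
    """Last line of Eq. (13)."""
    return 2.0 * rate * t / (w_ss + m_ss)

rng = random.Random(42)
hs = [simulate_cell(50, 50, 1.0, 1.0, rng) for _ in range(1500)]
print(normalised_variance(hs), predicted_normalised_variance(1.0, 1.0, 50, 50))
```

With these illustrative values the prediction is 0.02, and the simulated value should agree up to sampling noise and small corrections beyond the linear noise approximation.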

For arbitrary functions λ(w, m), ν(w, m) and nonzero \(\epsilon _{i}\), \(\mu _{i}\) in Eq. (7), the coupled ODEs provided by the system size expansion can be solved numerically, which allows one to study behaviours away from steady state. Various specific control mechanisms are investigated in [78], including the relaxed replication control given in Eq. (4). Numerical solutions generally agree well with stochastic simulations, meaning that, when the LNA is valid (see below), variability arising from selection and mutation under any control mechanism can be characterized without requiring stochastic simulations.

It is further shown that with nonzero \(\epsilon _{8}\) and \(\mu _{2}\) (i.e. replication errors can occur and mutants are selectively degraded), mutants are only successfully cleared over time when \((1 +\epsilon _{8}) \gg \mu _{2}\), i.e. when selection is sufficiently strong to overcome the increase in m through mutations. Depending on the specific control mechanism used, the wildtype variance can significantly increase as the mutants are cleared.

All the above conclusions are based on the validity of the LNA. Behaviours of w and m start to deviate from the LNA at long times, or if extinction of one of the species (or both) becomes non-negligible. The values of w ss and m ss , which are roughly constant for short times, will start to change at longer times. The steady state of either species can increase or decrease over time and, depending on the details of the feedback functions, will either reach zero, settle down to a constant value, or increase unboundedly. When fixation occurs, the means and variances obtained through the LNA are underestimations of the actual means and variances, while heteroplasmy variance is overestimated by the LNA. Also, for general λ(w, m) and ν(w, m), the transition rates between different states may be highly non-linear, making the LNA less accurate. Higher order correction terms in the system size expansion can be included to improve solutions, and a discussion on the accuracy of the LNA and its higher order terms is given in [88].

4 mtDNA and Ageing

As mentioned in the introduction, mtDNA mutations accumulate with age in any healthy individual. Recent reviews on the relationship between mtDNA mutations and ageing can be found in [89–91]. The mitochondrial theory of ageing proposes that accumulation of mitochondrial damage is the cause of ageing in humans and animals, but whether this causal relationship actually exists is still debated [92]. Here, various models are discussed which describe the accumulation of mutations through the process of random replication, degradation, and/or segregation of mtDNA molecules in cells.

Because of the random fluctuations in both wildtype and mutant mtDNA molecules in a cell, a mutant species can become homoplasmic purely by chance, without experiencing any selective advantage. This idea was modelled in non-dividing cells [43]. Cells were simulated over a human lifetime using the model described in Eq. (2), with the addition that every time an mtDNA replicates a mutation can occur with probability \(P_{\mathrm{mut}}\). Every mutation event is assumed to result in a new mutation, meaning that cells will acquire a variety of different mutations over time. It was shown that \(P_{\mathrm{mut}} = 5 \times 10^{-5}\) is sufficient to obtain 5–10% of cells with h > 0.6 after 100 years. Moreover, the majority (> 80%) of mutated mtDNAs in the cell were the same, agreeing with the experimental observations of a single clonally expanded mutant (as opposed to many different mutations). The model that includes proliferation of pathological mutants [Eq. (4)] was also used, but changing proliferation (i.e. changing the parameter α) had no significant effect on the accumulation of high heteroplasmy cells. The results they obtained suggest that random drift alone can indeed lead to clonal expansion on the scale of a human lifetime. It has further been hypothesized that random drift can also account for the accumulation of mutations seen in cancer and mitochondrial diseases [44, 93].

However, it has been argued that this model of mutant accumulation purely by chance, without any selective advantage for mutants, cannot explain clonal expansion in short-lived animals [41] such as rats and mice, which have an average lifespan of only 3 years. For mutants to have expanded in 5–15% of all cells in such a short time, a much higher \(P_{\mathrm{mut}}\) is required (\(7.6 \times 10^{-3}\) vs \(5 \times 10^{-5}\) on a human lifetime). The problem with requiring such a high mutation rate is that the number of different mutations that are found in the cells at the end of the simulations is very high. On average, more than 30 types of mutants were present in each cell with h > 0.6 after 3 years. Moreover, the most frequent mutant in these cells represented less than 20% of all mtDNAs, meaning that experimentally several different mutants should be observed in these high heteroplasmy cells, which is not the case. This suggests that other mechanisms need to be invoked to explain clonal expansion in these short-lived animals.

Because some kind of replicative advantage seems to be required, several models were constructed to incorporate these advantages. In [68] a shorter replication time for mutants was used, but as mentioned in the previous section this particular mechanism is unlikely to explain clonal expansion. In [36], another mechanism was proposed. In order for mtDNA to be able to replicate, it needs to be transcribed as well, which results in the production of proteins. A negative feedback loop is assumed to exist, i.e. transcription rates of mtDNA drop if protein levels are high enough. Deletion mutants miss large parts of their DNA, so some proteins are not produced at all, meaning that the negative feedback on transcription rate never occurs, resulting in higher replication levels. An ODE model was constructed describing the dynamics of w, m and the level of ATP. The presence of mtDNA molecules is assumed to consume some ATP due to the requirements of producing and maintaining mitochondrial machinery (likewise for other cellular processes), and w is assumed to produce ATP. If ATP levels are high, replication of w and m is low. The higher replication rate of m (assumed to be 50% higher than that of w) makes them increase exponentially, eventually leading to system collapse through ATP exhaustion. The ODE model manages to explain the accumulation of mutants for short-lived animals, and levels of w and m after collapse agree with experimental observations. Next a stochastic model based on the ODE reactions was developed. The values for \(P_{\mathrm{mut}}\) were adjusted such that at the end of the simulations (the final simulation time ranging from 3 to 80 years to model different organisms) 10 ± 0.5% of cells have h > 0.6. The model does not require very high mutation rates in short-lived animals, and also predicts that the average number of different mtDNAs present at the end of the simulations is around 1 [36]. To obtain the desired 10 ± 0.5% of cells with h > 0.6 in humans, very low mutation probabilities (\(P_{\mathrm{mut}} \sim 10^{-7}\)) are required. This seems at odds with the finding that \(P_{\mathrm{mut}}\) is highly likely to lie between \(10^{-4}\) and \(10^{-5}\) in substantia nigra neurons [60], though mtDNA mutation rates may depend on cell type.

5 mtDNA and Development

Unfertilized human eggs contain on the order of \(10^{5}\) copies of mtDNA, some of which may be mutated. After fertilization, the egg starts dividing, and with each division the mtDNAs are stochastically partitioned between the daughter cells. By chance, some daughter cells will inherit more mutants than others, introducing a variance in heteroplasmy across the population of cells (see, e.g., [94]). This allows for the elimination of cells with high mutant loads (as they can become dysfunctional and initiate cell death), while purely wildtype cells and cells with low h survive. An increased heteroplasmy variance therefore provides a mechanism of filtering out mutant mtDNA molecules to reduce mutant load (and thus increase the health) of the offspring. Even though the starting mtDNA copy number of the egg is very high, the copy number per cell falls drastically in early development because of rapid cell divisions with little replication of mtDNA. This fall in copy number per cell further increases the stochasticity and thus the heteroplasmy variance between cells, and is termed the mitochondrial bottleneck.
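The effect of partitioning on heteroplasmy variance can be illustrated with a minimal simulation. The sketch below assumes the most extreme form of the bottleneck, namely binomial partitioning with no mtDNA replication at all between divisions, and follows one daughter per division. The copy numbers, number of divisions, and function names are illustrative assumptions, not values taken from the studies cited here.

```python
import random
import statistics


def partition(count, rng):
    """Molecules (out of `count`) inherited by the tracked daughter cell."""
    return sum(1 for _ in range(count) if rng.random() < 0.5)


def heteroplasmy_after_divisions(w0, m0, divisions, rng):
    """Follow one lineage through `divisions` divisions without replication."""
    w, m = w0, m0
    for _ in range(divisions):
        w, m = partition(w, rng), partition(m, rng)
    total = w + m
    return m / total if total > 0 else None


def simulate(w0=200, m0=200, divisions=4, lineages=2000, seed=0):
    """Mean and variance of heteroplasmy across independent lineages."""
    rng = random.Random(seed)
    hs = [heteroplasmy_after_divisions(w0, m0, divisions, rng)
          for _ in range(lineages)]
    hs = [h for h in hs if h is not None]  # ignore cells that lost all mtDNA
    return statistics.mean(hs), statistics.pvariance(hs)


if __name__ == "__main__":
    mean_h, var_h = simulate()
    print("mean heteroplasmy:", mean_h, "variance:", var_h)
```

All lineages start at h = 0.5 with zero variance. After a few divisions the mean is essentially unchanged while the variance has become appreciable, and it grows faster as the copy number per cell falls.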

The exact mechanism by which heteroplasmy variance increases, however, is highly debated. It might be that random drift in copy numbers between cell divisions, stochastic partitioning at cell divisions, or both, is sufficient to explain the observed heteroplasmy variance [59, 78, 95–99]. However, other studies show a less pronounced decrease in mtDNA copy number per cell [100]. Additional bottlenecking mechanisms were suggested, such as the clustering of mtDNAs during cell division (increasing the stochasticity upon division) [100], or restricting the ability to replicate to only a subpopulation of mtDNAs [101]. Recently, a general model was developed which was able to reproduce all of these mechanisms and, importantly, provides a statistical framework to compare them given experimental observations [102]. Approximate Bayesian Computation (ABC) [103, 104] was used to infer the statistical support for each of the three mentioned mechanisms. Overall, the most support is found for a mechanism involving a combination of random mtDNA turnover and binomial partitioning at cell division.

Several models have been developed to describe the behaviour of mtDNA heteroplasmy through development. Some of these are summarized here, in order of increasing complexity. The first model, originating from population dynamics and termed the Wright formula [105], describes the heteroplasmy mean and variance of a population over any number of cell divisions with binomial partitioning rules. A more detailed model was developed using the Kimura distribution [106] (also originally used in population genetics). The Kimura distribution allows a description of the entire heteroplasmy probability distribution after any number of cell divisions. Recently, a model was developed that includes mtDNA dynamics in between cell divisions [102, 107], providing analytical expressions of the heteroplasmy probability generating function after any number of cell divisions with birth-immigration-death dynamics in between divisions.

5.1 The Wright Formula of Partitioning Variance

Population genetics studies the frequency and interaction of alleles and genes in populations. Genetic drift in allele frequencies arises because the alleles of the offspring are randomly sampled from those of the parents. If random genetic drift is the only force acting on an allele then, after n generations, the variance in allele frequency across a population is given by the Wright formula:

$$\displaystyle{ V _{n} = p(1 - p)\left (1 -\left (1 - \frac{1} {N_{\mathrm{eff}}}\right )^{n}\right ) }$$
(14)

where p and (1 − p) are the initial allele frequencies, and N eff is an effective population size [105]. The mean allele frequency is assumed to be equal to the initial allele frequency.
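Equation (14) is straightforward to evaluate numerically. The following is a minimal sketch; the function and argument names are illustrative choices.

```python
def wright_variance(p, n_eff, generations):
    """Variance in allele frequency after `generations` of pure drift, Eq. (14)."""
    return p * (1.0 - p) * (1.0 - (1.0 - 1.0 / n_eff) ** generations)


if __name__ == "__main__":
    # Variance builds up towards its maximum p * (1 - p) as generations pass.
    for n in (0, 1, 10, 100, 1000):
        print(n, wright_variance(p=0.3, n_eff=50, generations=n))
```

The variance is zero at n = 0, equals p(1 − p)∕N eff after one generation, and saturates at p(1 − p) once every lineage has either lost or fixed the allele.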

To apply this theory to an mtDNA population, the Wright formula can be interpreted as describing the variance in mutant allele frequency, i.e. heteroplasmy, over n cell divisions. At cell division, each daughter cell obtains N eff mtDNA molecules which are randomly sampled with replacement from the mother cell. This approach has been used in various studies of mtDNA heteroplasmy [96, 97, 99, 108, 109]. Some of the conclusions drawn from these studies must be taken with care, as it was shown that including a principled description of the uncertainty arising from sampling small populations might change the interpretation of the data [110].

Applied to mtDNA dynamics, the formula has several limitations. Firstly, the parameter N eff is hard to interpret and does not correspond to any biological entity in the cell [111]. Secondly, knowing only the mean and variance of the heteroplasmy distribution may not be very informative. Finally, the Wright formula ignores stochastic effects resulting from random turnover of mtDNA between cell divisions. As is shown in the next section, including mtDNA turnover leads to a correction of the formula.

5.1.1 The Inclusion of mtDNA Turnover Between Cell Divisions

The model described earlier in Sect. 3.3 was used to adjust the Wright formula to include turnover of mtDNA molecules in between cell divisions [78]. The expected heteroplasmy variance given in Eq. (13) describes the approximate steady state variance in h arising from random mtDNA turnover through birth–death dynamics. This formula does not include cell divisions, and the denominator of the equation, w ss + m ss , thus gives the expected population size without the inclusion of cell divisions. If n is the population size immediately after a cell divides, then in order to maintain a constant average population size n has to increase to 2n before the next cell division. The expected population size is then roughly given by \(\frac{3} {2}n\). Here, no reference to a particular feedback mechanism was made, but if more knowledge about the feedback mechanism is present, then the expected population size can be tailored more appropriately. In the general case, the approximate expected heteroplasmy variance through random mtDNA turnover and cellular turnover is then given by:

$$\displaystyle\begin{array}{rcl} \langle h^{2}\rangle '& =& \frac{2\lambda t} {n_{\mathrm{eff}}} \\ & =& \frac{4t} {3n\tau }{}\end{array}$$
(15)

where n is the number of mtDNA molecules immediately after cell division, and τ is the timescale of mtDNA degradation (assuming that λ = ν = 1∕τ, i.e. constant birth and death rates that are balanced to keep a constant average population size). This leads to the ‘turnover adjusted Wright formula’ proposed in Ref. [78]:

$$\displaystyle{ \langle h^{2}\rangle ' = 1 -\left (1 - \frac{1} {2n}\right )^{g} + \frac{4t} {3n\tau } }$$
(16)

where g is the number of generations (cell divisions) that have occurred, t is the amount of time that has expired since an initial state with \(\langle h^{2}\rangle' = 0\), and τ is the timescale of mtDNA degradation. This adjusted Wright formula now includes random turnover of mtDNA and is written in terms of observable quantities, namely n (the number of mtDNA immediately after cell division), g (the number of cell divisions), and τ (the timescale of mtDNA turnover). Equation (16) was tested against stochastic simulations and shown to be an improvement on the original formula [78]. While these variance-based models are useful in some circumstances, a more detailed approach has benefits as well and was developed by Kimura [106], as described in the next section.
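As a sketch of how Eq. (16) can be used in practice, the function below returns the partitioning and turnover contributions separately. The names are illustrative, and t and τ are assumed to be given in the same time units.

```python
def turnover_adjusted_variance(n, g, t, tau):
    """Normalized heteroplasmy variance of Eq. (16), split into its two terms.

    n   : mtDNA copy number immediately after cell division
    g   : number of cell divisions that have occurred
    t   : elapsed time (same units as tau)
    tau : timescale of mtDNA degradation
    """
    partitioning = 1.0 - (1.0 - 1.0 / (2.0 * n)) ** g
    turnover = 4.0 * t / (3.0 * n * tau)
    return partitioning, turnover


if __name__ == "__main__":
    part, turn = turnover_adjusted_variance(n=100, g=5, t=10.0, tau=5.0)
    print("partitioning:", part, "turnover:", turn, "total:", part + turn)
```

Reporting the two terms separately makes it easy to see which source of randomness dominates for a given copy number, division count, and turnover timescale.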

5.2 The Kimura Distribution

To predict the entire heteroplasmy distribution after any number of cell divisions, the theory developed by Motoo Kimura [106] was applied to mtDNA segregation [112]. The Kimura distribution describes gene frequency distributions under random genetic drift. It is assumed that there is no selection and that there are no de novo mutations. A lack of de novo mutations means that for long times, the heteroplasmy in cells will settle down on either fully wildtype or fully mutant, as these are the only two absorbing states. According to the Kimura model, the total probability distribution for a particular allele (e.g. mutant mtDNAs) consists of three probability distributions: (1) the probability f(0, t) for having lost the allele in generation t, (2) the probability f(1, t) for having fixed on that allele, and (3) the probability distribution ϕ(x, t) giving the probability of observing the allele at frequency x in generation t:

$$\displaystyle\begin{array}{rcl} & & f(0,t) = (1 - p_{0}) +\sum _{ i=1}^{\infty }(2i + 1)p_{ 0}(1 - p_{0})(-1)^{i} \\ & & \qquad \qquad \times F(1 - i,i + 2,2,1 - p_{0})e^{-i(i+1)/(2N_{\mathrm{eff}})t} {}\end{array}$$
(17)
$$\displaystyle\begin{array}{rcl} & & \phi (x,t) =\sum _{ i=1}^{\infty }i(i + 1)(2i + 1)p_{ 0}(1 - p_{0}) \\ & & \qquad \qquad \times F(1 - i,i + 2,2,x)F(1 - i,i + 2,2,p_{0})e^{-i(i+1)/(2N_{\mathrm{eff}})t}{}\end{array}$$
(18)
$$\displaystyle\begin{array}{rcl} & & f(1,t) = p_{0} +\sum _{ i=1}^{\infty }(2i + 1)p_{ 0}(1 - p_{0})(-1)^{i} \\ & & \qquad \qquad \times F(1 - i,i + 2,2,p_{0})e^{-i(i+1)/(2N_{\mathrm{eff}})t} {}\end{array}$$
(19)

where F(a, b, c, d) is the hypergeometric function and p 0 is the initial allele frequency [106]. The variance of the Kimura distribution is the same as the variance described by the Wright formula in Eq. (14). Interpreting the Kimura model for mtDNA segregation means that p 0 is the initial heteroplasmy, and f(0, t), f(1, t), and ϕ(x, t) are the probabilities of observing h = 0, h = 1, and h = x (with x ≠ 0, 1) after t cell divisions, respectively. These equations were used to describe heteroplasmy data from human, mouse, and Drosophila [112]. Overall, the Kimura distribution provides a good description of experimental data [112]. In [112] only the heteroplasmy mean and variance were matched to data, but more detailed fits are possible using an explicit likelihood function. Some experimentally reported increases in heteroplasmy variance become hard to defend if standard errors of the variances are taken into account, assuming that the heteroplasmy variance data is sampled from a Kimura distribution [110].
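The boundary probabilities in Eqs. (17) and (19) can be evaluated by truncating the infinite sums. The sketch below makes two assumptions that should be checked against [106] before use: the exponent is read as −i(i + 1)t∕(2N eff), and the series is truncated after a fixed number of terms (20 by default). Because the first argument of F is a non-positive integer, the hypergeometric function reduces to a finite polynomial, which is summed directly here. Function names are illustrative.

```python
import math


def hyp2f1_terminating(a, b, c, x):
    """Gauss hypergeometric F(a, b, c, x) for non-positive integer a (finite sum)."""
    total, term = 1.0, 1.0
    for k in range(-a):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
    return total


def _series(p0, x, t, n_eff, terms):
    """Shared sum in Eqs. (17) and (19); x is the last argument of F."""
    s = 0.0
    for i in range(1, terms + 1):
        decay = math.exp(-i * (i + 1) * t / (2.0 * n_eff))
        s += ((2 * i + 1) * p0 * (1.0 - p0) * (-1) ** i
              * hyp2f1_terminating(1 - i, i + 2, 2, x) * decay)
    return s


def kimura_loss(p0, t, n_eff, terms=20):
    """f(0, t): probability that the allele has been lost, Eq. (17)."""
    return (1.0 - p0) + _series(p0, 1.0 - p0, t, n_eff, terms)


def kimura_fixation(p0, t, n_eff, terms=20):
    """f(1, t): probability that the allele has fixed, Eq. (19)."""
    return p0 + _series(p0, p0, t, n_eff, terms)


if __name__ == "__main__":
    for t in (5, 20, 100):
        print(t, kimura_loss(0.5, t, 20), kimura_fixation(0.5, t, 20))
```

The truncated series converges quickly when t is comparable to or larger than N eff, but it becomes inaccurate for t much smaller than N eff, where many terms contribute and cancel. For long times the two probabilities approach 1 − p 0 and p 0, as expected when loss and fixation are the only absorbing states.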

An alternative way to obtain the full heteroplasmy distribution is by using stochastic simulations. The advantage of simulations is that de novo mutations and selection mechanisms can be easily included, though they do not provide an explicit analytical distribution as is given in Eqs. (17)– (19).

5.3 Analytical Descriptions of Random Turnover Combined with Cell Divisions

To describe the dynamics of mtDNA molecules over time through cycles of cell divisions, both the turnover within a cell cycle and the partitioning at cell divisions have to be taken into account. An analytical description of these dynamics is given in a recent model which follows the probability distribution of an agent (e.g. an mtDNA molecule) over time. The dynamics of the agents are assumed to arise from a combination of: (1) random turnover of agents between cell divisions according to a birth–immigration–death (BID) model, and (2) stochastic partitioning at cell division [107]. Several possible partitioning schemes are considered and analytic results are demonstrated for two important examples: binomial partitioning and subtractive partitioning. In subtractive partitioning a small number of agents are transferred to a small bud and the larger cell that is left over is tracked in the next generation, a model which is appropriate in various organisms such as budding yeast. A similar approach is taken in [102], with birth–death dynamics in between cell divisions. Here, a summary of the approach taken in [102] is given and some conclusions of the models in both [102] and [107] are discussed.

5.3.1 Within Cell Cycle Dynamics

Within a cell cycle, the time evolution of the probability distribution \(P_{m}(t)\) of observing m agents at time t, according to a birth–death model, is given by the following master equation:

$$\displaystyle{ \frac{\partial P_{m}(t)} {\partial t} =\nu (m + 1)P_{m+1}(t) +\lambda (m - 1)P_{m-1}(t) - (\nu +\lambda )mP_{m}(t) }$$
(20)

where λ and ν are the birth and death rates, respectively. The initial condition is assumed to be \(P_{m}(0) =\delta _{m,m_{0}}\). A solution can be obtained by solving for the probability generating function \(G(z,t) =\sum _{m=0}^{\infty }z^{m}P_{m}(t)\) [113]. Knowing the generating function G(z, t) is equivalent to knowing the probability distribution because all the moments of the distribution can be derived from its derivatives. The generating function of the birth–death model satisfies

$$\displaystyle\begin{array}{rcl} \frac{\partial G(z,t)} {\partial t} & =& \sum _{m=0}^{\infty }z^{m}\frac{\partial P_{m}(t)} {\partial t} \\ & =& \Big(\nu (1 - z) +\lambda (z^{2} - z)\Big)\frac{\partial G(z,t)} {\partial z} {}\end{array}$$
(21)

with \(G(z,0) = z^{m_{0}}\), which can be solved [102, 107] to give

$$\displaystyle\begin{array}{rcl} G(z,t\vert m_{0})& =& \Bigg(\frac{(z - 1)\nu e^{(\lambda -\nu )t} -\lambda z+\nu } {(z - 1)\lambda e^{(\lambda -\nu )t} -\lambda z+\nu }\Bigg)^{m_{0} } \\ & \equiv & \Bigg(\frac{Az + B} {Cz + D}\Bigg)^{m_{0} } \\ & \equiv & g(z,t)^{m_{0} } {}\end{array}$$
(22)

where \(A =\nu l-\lambda\), \(B =\nu -\nu l\), \(C =\lambda l-\lambda\), and \(D =\nu -\lambda l\) with \(l = e^{(\lambda -\nu )t}\).
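A short numerical sketch of Eq. (22) follows. The coefficients are read off directly from the first line of Eq. (22), i.e. A = νl − λ, B = ν − νl, C = λl − λ, D = ν − λl with l = exp((λ − ν)t). The function names are illustrative, and the code assumes λ ≠ ν.

```python
import math


def mobius_coefficients(lam, nu, t):
    """Coefficients (A, B, C, D) of Eq. (22); requires lam != nu."""
    l = math.exp((lam - nu) * t)
    return (nu * l - lam, nu - nu * l, lam * l - lam, nu - lam * l)


def generating_function(z, t, m0, lam, nu):
    """G(z, t | m0) for the linear birth-death process."""
    a, b, c, d = mobius_coefficients(lam, nu, t)
    return ((a * z + b) / (c * z + d)) ** m0


def mean_by_finite_difference(t, m0, lam, nu, step=1e-6):
    """Mean copy number as dG/dz at z = 1, estimated numerically."""
    upper = generating_function(1.0 + step, t, m0, lam, nu)
    lower = generating_function(1.0 - step, t, m0, lam, nu)
    return (upper - lower) / (2.0 * step)


if __name__ == "__main__":
    print(generating_function(0.4, 0.0, 3, 0.3, 0.1))  # equals 0.4 ** 3 at t = 0
    print(mean_by_finite_difference(2.0, 10, 0.3, 0.1))
    print(10 * math.exp((0.3 - 0.1) * 2.0))            # m0 * exp((lam - nu) t)
```

Two quick consistency checks are that G(1, t) = 1 (normalization) and that G(z, 0) = z^{m0} (initial condition). The derivative at z = 1 reproduces the familiar mean m 0 exp((λ − ν)t) of a linear birth–death process.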

When birth and death are balanced (λ = ν) Eq. (22) can be rewritten to give

$$\displaystyle{ G_{\lambda =\nu }(z,t\vert m_{0}) =\Bigg ( \frac{\nu tz - z -\nu t} {\nu tz - 1 -\nu t}\Bigg)^{m_{0} } }$$
(23)

which is obtained by writing λ = ν + ε in Eq. (22) and taking the limit ε → 0.
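The limit can be made explicit. With λ = ν + ε the exponential becomes \(e^{(\lambda -\nu )t} = 1 +\epsilon t + O(\epsilon ^{2})\), so the numerator and denominator of Eq. (22) are, to first order in ε,

$$\displaystyle\begin{array}{rcl} (z - 1)\nu (1 +\epsilon t) - (\nu +\epsilon )z +\nu & =& \epsilon \left (\nu tz - z -\nu t\right ) + O(\epsilon ^{2}) \\ (z - 1)(\nu +\epsilon )(1 +\epsilon t) - (\nu +\epsilon )z +\nu & =& \epsilon \left (\nu tz - 1 -\nu t\right ) + O(\epsilon ^{2}) \end{array}$$

The zeroth-order terms vanish in both expressions and the common factor ε cancels in the ratio, which leaves the balanced form \((\nu tz - z -\nu t)/(\nu tz - 1 -\nu t)\) raised to the power \(m_{0}\).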

5.3.2 Agent Partitioning at Cell Division

The overall generating function of the process containing both cell divisions and birth–death dynamics between these divisions can be written [102, 107] in a similar form to Eq. (22), i.e.

$$\displaystyle{ g_{\mathrm{div}}(z,t,n) =\Bigg (\frac{A_{\mathrm{div}}z + B_{\mathrm{div}}} {C_{\mathrm{div}}z + D_{\mathrm{div}}}\Bigg) }$$
(24)

with

$$\displaystyle\begin{array}{rcl} A_{\mathrm{div}}& =& 2^{n}\lambda (l + l' - 2) - l^{n}l'(\lambda +\nu (l - 2)) \\ B_{\mathrm{div}}& =& l^{n}l'(\lambda +\nu (l - 2)) - 2^{n}(\lambda l' +\nu (l - 2)) \\ C_{\mathrm{div}}& =& -\lambda l^{n}l'(l - 1) + 2^{n}\lambda (l + l' - 2) \\ D_{\mathrm{div}}& =& \lambda l^{n}l'(l - 1) - 2^{n}(\lambda l' +\nu (l - 2)) {}\end{array}$$
(25)

where n is the number of cell divisions, λ and ν are the birth and death rates of the mtDNA dynamics in between each of these divisions, \(l' = e^{(\lambda -\nu )t}\), and \(l = e^{(\lambda -\nu )\tau }\) with τ the length of the cell cycle (which is here assumed to be equal for all cell cycles) [102, 107]. Equation (22) is a special case of Eq. (24) with n → 0 and τ → 0.
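The coefficients of Eq. (25) can be checked numerically. The sketch below reads every power in Eq. (25) as \(l^{n}\), takes t to be the time elapsed since the most recent division (an assumption about the notation), and requires λ ≠ ν, since the balanced case needs a separate limit. Differentiating Eq. (24) at z = 1 gives a mean copy number of \(m_{0}(l/2)^{n}l'\), i.e. growth by a factor l per cycle and halving at each division; the code verifies this. Function names are illustrative.

```python
import math


def division_coefficients(lam, nu, n, t, tau):
    """(A_div, B_div, C_div, D_div) of Eq. (25); requires lam != nu."""
    l = math.exp((lam - nu) * tau)   # growth factor over one full cell cycle
    lp = math.exp((lam - nu) * t)    # growth factor since the last division
    a = 2 ** n * lam * (l + lp - 2) - l ** n * lp * (lam + nu * (l - 2))
    b = l ** n * lp * (lam + nu * (l - 2)) - 2 ** n * (lam * lp + nu * (l - 2))
    c = -lam * l ** n * lp * (l - 1) + 2 ** n * lam * (l + lp - 2)
    d = lam * l ** n * lp * (l - 1) - 2 ** n * (lam * lp + nu * (l - 2))
    return a, b, c, d


def mean_copy_number(m0, lam, nu, n, t, tau):
    """Mean of m from G = g_div ** m0, using g'(1) = (AD - BC) / (C + D)^2."""
    a, b, c, d = division_coefficients(lam, nu, n, t, tau)
    return m0 * (a * d - b * c) / (c + d) ** 2


if __name__ == "__main__":
    lam, nu, n, t, tau = 0.3, 0.1, 3, 1.0, 2.0
    l, lp = math.exp((lam - nu) * tau), math.exp((lam - nu) * t)
    print(mean_copy_number(100, lam, nu, n, t, tau))
    print(100 * (l / 2) ** n * lp)
```

Setting n = 0 and τ = 0 in `division_coefficients` returns the coefficients of Eq. (22), which is a useful regression check when transcribing the expressions.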

5.3.3 Combined Overall Solution

To construct a generating function of the overall process with \(n_{p}\) phases, the generating functions of each phase have to be linked together in an appropriate way. Denoting the parameters describing the ith phase with index i (e.g. \(A_{i}\) corresponds to A in Eq. (25) with λ, ν, n replaced by \(\lambda _{i}\), \(\nu _{i}\), \(n_{i}\)), the overall generating function is given by [102, 107]:

$$\displaystyle\begin{array}{rcl} g_{\mathrm{overall}}& =& \frac{A'z + B'} {C'z + D'} \\ G_{\mathrm{overall}}& =& g_{\mathrm{overall}}^{m_{0} }{}\end{array}$$
(26)

with

$$\displaystyle{ \left (\begin{array}{*{10}c} A'&B'\\ C' &D' \end{array} \right ) =\prod _{ i=1}^{n_{p} }\left (\begin{array}{*{10}c} A_{i} & B_{i} \\ C_{i}&D_{i} \end{array} \right ) }$$
(27)
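The matrix product in Eq. (27) works because composing two maps of the form (Az + B)∕(Cz + D) corresponds to multiplying their coefficient matrices, with any overall scale factor cancelling in the ratio. The sketch below illustrates this for the simplest case of phases without cell divisions, using the birth–death coefficients of Eq. (22); phases that contain divisions would use the coefficients of Eq. (25) instead. The multiplication order follows Eq. (27) as written, and all names are illustrative.

```python
import math
from functools import reduce


def phase_matrix(lam, nu, t):
    """2x2 coefficient matrix ((A, B), (C, D)) of a birth-death phase, Eq. (22)."""
    l = math.exp((lam - nu) * t)
    return ((nu * l - lam, nu - nu * l), (lam * l - lam, nu - lam * l))


def matmul(x, y):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def overall_matrix(matrices):
    """Left-to-right product over phases i = 1..n_p, as in Eq. (27)."""
    return reduce(matmul, matrices)


def apply(matrix, z):
    """Evaluate g(z) = (A z + B) / (C z + D)."""
    (a, b), (c, d) = matrix
    return (a * z + b) / (c * z + d)


if __name__ == "__main__":
    lam, nu = 0.3, 0.1
    two_phases = overall_matrix([phase_matrix(lam, nu, 0.5),
                                 phase_matrix(lam, nu, 1.5)])
    one_phase = phase_matrix(lam, nu, 2.0)
    print(apply(two_phases, 0.3), apply(one_phase, 0.3))
```

Two consecutive phases with identical rates must reproduce a single phase of the combined duration, which provides a simple correctness check before phases with different rates or division numbers are chained together.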

From this overall generating function, means and variances (and, if necessary, the full probability distribution) of m at any time t throughout development can be calculated. By performing a similar analysis for the wildtype population w, heteroplasmy statistics can also be obtained. This allows for the evaluation of important quantities such as the mean and variance of h(t), the probability of crossing a certain heteroplasmy threshold (i.e. \(P(h > h^{{\ast}})\) for some threshold \(h^{{\ast}}\)) throughout development, and the probability P(m = 0, t) that the mutant has been lost, i.e. that the wildtype has fixed.
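For a generating function of the form \(G = g^{m_{0}}\) with g(z) = (Az + B)∕(Cz + D), the low-order statistics follow from derivatives at z = 1 and from the value at z = 0. The sketch below uses the standard identities mean = m 0 g′(1) and variance = m 0 [g″(1) + g′(1) − g′(1)²], together with P(m = 0, t) = G(0, t), the probability that no mutant copies remain. It is demonstrated on the birth–death coefficients of Eq. (22); the function names are illustrative.

```python
import math


def birth_death_coefficients(lam, nu, t):
    """(A, B, C, D) of Eq. (22); requires lam != nu."""
    l = math.exp((lam - nu) * t)
    return (nu * l - lam, nu - nu * l, lam * l - lam, nu - lam * l)


def statistics_from_coefficients(a, b, c, d, m0):
    """Mean, variance and P(m = 0) for G(z) = ((a z + b) / (c z + d)) ** m0."""
    det = a * d - b * c
    g1 = det / (c + d) ** 2              # g'(1)
    g2 = -2.0 * c * det / (c + d) ** 3   # g''(1)
    mean = m0 * g1
    variance = m0 * (g2 + g1 - g1 ** 2)
    p_zero = (b / d) ** m0               # G(0): no copies left
    return mean, variance, p_zero


if __name__ == "__main__":
    lam, nu, t, m0 = 0.3, 0.1, 2.0, 10
    coeffs = birth_death_coefficients(lam, nu, t)
    print(statistics_from_coefficients(*coeffs, m0))
```

For the birth–death case the output can be compared with the known closed forms: mean m 0 l, variance m 0 l(l − 1)(λ + ν)∕(λ − ν), and extinction probability [ν(l − 1)∕(λl − ν)]^{m0}, where l = exp((λ − ν)t).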

One of the conclusions drawn from the model is that an increase in mitochondrial degradation increases heteroplasmy variance, and therefore increases the strength of selection to remove high heteroplasmy cells. This means that clinically increasing mitochondrial degradation may represent a way to reduce heteroplasmy levels in offspring. The more general model considered in [107] provides full, closed-form generating functions for several types of mtDNA dynamics, making it possible to extract all details of copy number distributions at any given time. The approach provides a way to explore the statistics of systems of mtDNA dynamics, with or without cell divisions, with arbitrarily changing population size. This formalism could also be applied to, for example, mtDNA dynamics in tumour cells.

6 Discussion

Mitochondrial dysfunctions and mutations are linked to many different diseases and it is important to understand how these dysfunctions arise, how they develop over time, and how they can be treated. Mathematical models are valuable tools to explore these questions from both the perspective of fundamental biology and of clinical strategies. In this review, the focus is on models of mtDNA dynamics and mitochondrial fusion–fission dynamics, the accumulation of mutant mtDNAs over time, and the role of mtDNA in ageing and development. Of particular importance is the time evolution of heteroplasmy values, since the proportion of cells exceeding a critical heteroplasmy threshold is related to disease severity. Numerous models, deterministic and stochastic, have been constructed describing mtDNA dynamics in different cells and over various timescales. Deterministic models typically describe mean behaviours of heteroplasmy alone; stochastic approaches are vital to describe the biomedically central structure of mtDNA distributions [78, 102, 107].

Existing models cover different mitochondrial aspects and use various approaches. At high copy numbers, mtDNA dynamics are well described by deterministic models (e.g. [33, 38, 39, 76]). The modelling of low copy numbers, fixation probabilities, and population variances requires the construction of stochastic models. Simulation-based models (e.g. [43, 44, 68, 80, 82, 84]), analytical models (e.g. [78, 102, 105–107]) and numerical approximations to analytical models (e.g. [78]) have all been developed, some of which include spatial effects (e.g. [80, 82, 84]) or mitochondrial fusion–fission dynamics (e.g. [79–82, 84]). The nuclear control of mtDNA copy number has been modelled in several ways, e.g. (1) a simple total copy number control [75]; (2) negative feedback control of replication rates dependent on the wildtype [75] or both wildtype and mutant species [76] using proportional selection; and (3) a more general negative feedback control for both random replication and degradation rates.

These mathematical approaches have led to many new discoveries and progress. Predictions made by the relaxed replication model have found experimental support [77, 114]. A recent model [115] increased the power of analysis of large-scale experiments to identify a potential issue with cutting-edge medical therapies: namely, that proliferative differences between different mtDNA types, as may arise in therapeutic contexts, can lead to amplification of potentially harmful mutant mtDNA. The evidence from this joint mathematical and experimental study influenced the UK HFEA’s policy decisions on the implementation of these therapies [116]; the ‘haplotype matching’ approach that it advocated is now gaining support [117]. A model of the mitochondrial bottleneck [102] suggested approaches by which drugs can be used to modulate mtDNA during development and ameliorate disease inheritance, which is now being tested experimentally [118]. Additionally, this model provided clinically motivated strategies for optimally sampling embryos in preimplantation genetic diagnoses to address the inheritance of mtDNA diseases [102]. The many hypotheses existing on how exactly mutant mtDNA molecules expand over time can be tested with simulations and results indicate that (1) random mutant accumulation without any selection advantage has so far failed to explain clonal expansion in short-lived animals, (2) it is unlikely that a shorter mutant replication time causes clonal expansion [68], and (3) a higher mutant replication rate produces outcomes that are roughly consistent with various experimental data [36]. The ability of stochastic mathematical models to test, falsify, or confirm these hypotheses is extremely valuable because knowing how and why mutants accumulate allows us to clinically intervene in this process and potentially create a treatment for mitochondrial diseases.

Despite considerable progress, there are still open problems. Is there a causal relationship between mitochondrial damage and ageing? Mice accumulating mutations on a faster timescale (‘mutator mice’) show accelerated ageing-like phenotypes and shortened lifespan. However, this is only true for homozygous mice; heterozygous mice do not show ageing phenotypes despite having high mutation burdens [119]. How does the cell regulate its mitochondrial copy number? As noted in [78], it is experimentally hard to distinguish between different nuclear feedback mechanisms, as distinct mechanisms can lead to very similar dynamics. The mechanisms by which mutant mtDNA molecules expand are still not fully understood, which is reflected by the large number of hypotheses put forward. The recently developed model in [36] requires a very low mutation probability in human cells, which is not true for all cell types [60]. Moreover, it may not be able to explain clonal expansion of certain point mutations. It is likely that preferential replication of mutants is the result of a combination of multiple mechanisms which are different for distinct mutations and cell types, making a general theoretical description very challenging. We anticipate that the development of stochastic models for mtDNA populations will continue to produce scientific insights as the amount of experimental data characterizing this rich and medically important system increases.