Introduction

The random genetic drift model developed implicitly by Fisher in [11] and explicitly by Wright in [25], and henceforth called the Wright–Fisher model, is one of the most popular stochastic models in population genetics [2, 9]. In its simplest form, it describes the evolution, across non-overlapping generations, of the relative frequencies of two alleles at a single diploid locus in a population of fixed size, where each generation is obtained by random sampling from the parental one, without additional biological mechanisms like mutation, selection, or a spatial population structure. Generalizations to multiple alleles, several loci, the inclusion of mutation and selection etc. then constituted an important part of mathematical population genetics. It is our aim to develop a general mathematical perspective on the Wright–Fisher model and its generalizations. In the present paper, we treat the case of multiple alleles at a single site. In a companion paper [23], we have discussed the simplest case of two alleles in more detail. Generalizations will be addressed in subsequent papers.

Let us first describe the basic mathematical contributions of Wright and Kimura. In 1945, Wright approximated the discrete process by a diffusion process that is continuous in space and time (continuous process, for short) and that can be described by a Fokker–Planck equation. In 1955, by solving this Fokker–Planck equation derived from the Wright–Fisher model, Kimura obtained an exact solution for the Wright–Fisher model in the case of two alleles (see [15]). Kimura [16] also developed an approximation for the solution of the Wright–Fisher model in the multi-allele case, and in 1956, he obtained an exact solution of this model for three alleles ([17]) and concluded that it can be generalized to arbitrarily many alleles. This yields more information about the Wright–Fisher model as well as the corresponding continuous process. Kimura’s solution, however, is not entirely satisfactory. For one thing, it rests on very clever algebraic manipulations, so the general mathematical structure is not very transparent, and this makes generalizations very difficult. Also, Kimura’s approach is local in the sense that it does not naturally incorporate the transitions resulting from the (irreversible) loss of one or more alleles in the population. As a consequence, for instance, the integral of his probability density function over its domain of definition is not equal to 1.

As mentioned, while the original model of Wright and Fisher works with a finite population in discrete time, many mathematical insights into its behavior are derived from its diffusion approximation. After the original work of Wright and Kimura just described, a more systematic approach was developed within the theory of strongly continuous semigroups and Markov processes. In this framework, the diffusion approximation for the multi-allele Wright–Fisher model was derived by Ethier and Nagylaki [6, 7], and a proof of convergence of the Markov chain to the diffusion process can be found in [5]. (In this paper, we are not concerned with deriving the diffusion approximation; in fact, it can be derived in a rather direct manner without appealing to the general theory, as we shall show elsewhere.) One may then derive existence and uniqueness results for solutions of the Fokker–Planck equation from the theory of strongly continuous semigroups [5, 6, 14]. As the diffusion operator of the diffusion approximation becomes degenerate at the boundary, the analysis at the boundary becomes difficult, and this issue is not addressed by the aforementioned results. Recent work of Epstein and Mazzeo [3, 4], however, treats the boundary regularity with general PDE methods.

The full structure of the Wright–Fisher model and its diffusion approximation, however, is only revealed when one can connect the dynamics before and after the loss of an allele, or in analytic terms, when one can extend the process from the interior of the probability simplex to all its boundary strata. In particular, this is needed to preserve the normalization of the probability distribution. Therefore, in this paper, we develop the definition of a general solution that naturally includes the transitions resulting from the disappearance of alleles and derive the formalism for its solution. Since this formalism is rather explicit, it will allow us to derive and generalize the known explicit formulas for the quantities associated with the Wright–Fisher diffusion model, like expected waiting times for the loss of one or several alleles, in a systematic manner. The keys to our approach are the evolution equations for the moments of the probability density and the duality between the forward and backward Kolmogorov equations. We show that there exists a unique global solution of the Fokker–Planck equation. Since, as explained, our concept of a solution is different from (and, as we believe, better adapted to the structure of the Wright–Fisher model than) those treated in the literature, insofar as it extends to the boundary, these results do not follow from the general results of the literature mentioned above.

In the present paper, we only treat genetic drift in the absence of mutation, selection, and recombination. Extensions that can be obtained on the basis of the formalism presented here, in particular to general recombination schemes, will be presented in subsequent publications.

The Global Solution of the Wright–Fisher Model

In this section, we shall first establish some notation, and then prove some propositions as well as the main theorem of this paper.

Notations

\(\Delta _n:=\{(x^1,x^2,\dots ,x^{n+1}):\, x^i\ge 0, \sum _{i=1}^{n+1} x^i=1\}\) is the standard n-simplex in \(\mathbb {R}^{n+1}\) representing the probabilities or relative frequencies of alleles \(A_1,\dots , A_{n+1}\) in our population. Often, however, it is advantageous to work in \(\mathbb {R}^n\) instead of \(\mathbb {R}^{n+1}\), and with \(e_0:= (0,\ldots ,0) \in \mathbb {R}^n,\, e_k:=(0,\ldots ,\underbrace{1}_{k{\mathrm{th}}},\ldots ,0)\in \mathbb {R}^n\), we therefore define

$$\begin{aligned} \Omega _n :=\, \mathrm {intco} \{e_0,\ldots ,e_n \}:= \left\{ \sum \limits _{k=0}^n x^k e_k,\, (x,x^0)=\left( x^1,\ldots ,x^n,1-\sum \limits _{k=1}^n x^k\right) \in \mathrm {int} \Delta _n \right\} . \end{aligned}$$
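For instance, for \(n=2\),

$$\begin{aligned} \Omega _2=\{(x^1,x^2)\in \mathbb {R}^2:\, x^1>0,\, x^2>0,\, x^1+x^2<1\},\qquad x^0=1-x^1-x^2, \end{aligned}$$

so that the vertices \(e_0, e_1, e_2\) correspond to the pure populations in which only a single allele is present.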

Moreover, we shall need the subsimplices corresponding to subsets of alleles, using the following notations

$$\begin{aligned} I_k:= & {} \{\{i_0,\ldots ,i_k \},\, 0\le i_0<\cdots <i_k \le n\}, \quad k\in \{1,\ldots ,n\},\\ V_0:= & {} \{e_0,\ldots , e_n\},\\&\text { the domain representing a population of one allele}, \\ V_k^{(i_0,\ldots ,i_k)}:= & {} \mathrm {intco} \{e_{i_0},\ldots ,e_{i_k}\}, \quad k\in \{1,\ldots ,n\},\\&\text { the domain representing a population of alleles } \{A_{i_0},\ldots ,A_{i_k}\},\\ V_k:= & {} \{\mathrm {intco} \{e_{i_0},\ldots ,e_{i_k}\}\text { for some } i_0<\cdots <i_k \in \overline{0,n} \}, \quad k\in \{1,\ldots ,n\},\\= & {} \bigsqcup \limits _{(i_0,\ldots ,i_k)\in I_k} V_k^{(i_0,\ldots ,i_k)},\\&\text { the domain representing a population of }(k+1) \text { alleles,}\\ \overline{V}_k:= & {} \bigcup \limits _{(i_0,\ldots ,i_k)\in I_k} \overline{V}_k^{(i_0,\cdots ,i_k)}, \quad k\in \{1,\ldots ,n\},\\= & {} \bigsqcup \limits _{i=0}^k V_i,\\&\text { the domain representing a population of at most }(k+1)\,\, \text {alleles.} \end{aligned}$$

We shall also need some function spaces:

$$\begin{aligned} \begin{aligned} H_k^{(i_0,\ldots , i_k)}:&=\, C^\infty \left( \overline{V_k^{(i_0,\ldots , i_k)}} \right) ,\\ H_k&:=\, C^\infty (\overline{V}_k),\quad k\in \{1,\ldots ,n\},\\ H&:=\, \{f:\overline{V}_n\rightarrow [0,\infty ] \text { measurable such that } [f,g]_n<\infty , \forall g\in H_n\},\\&\, \text {where } [f,g]_n:=\,\int \limits _{\overline{V}_n}f(x)g(x)d\mu (x)=\sum \limits _{k=0}^n\int \limits _{V_k} f(x)g(x)d\mu _k(x),\\&\,\quad \quad \quad \quad \quad =\sum \limits _{k=0}^n\sum \limits _{(i_0,\ldots ,i_k)\in I_k}\int \limits _{V^{(i_0,\ldots ,i_k)}_k} f(x)g(x)d\mu ^{(i_0,\ldots ,i_k)}_k(x),\\&\, \text {with } \mu ^{(i_0,\ldots ,i_k)}_k \text { a probability measure on } V^{(i_0,\ldots ,i_k)}_k. \end{aligned} \end{aligned}$$

We can now define the differential operators for our Fokker–Planck equation:

$$\begin{aligned} \begin{aligned} L_k^{(i_0,\ldots ,i_k)}:&\, H_k^{(i_0,\ldots , i_k)}\rightarrow H_k^{(i_0,\ldots , i_k)},\, L_k^{(i_0,\ldots , i_k)}f(x)=\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \frac{\partial ^2 (a_{ij}(x)f(x))}{\partial x^i \partial x^j} ,\\ (L_k^{(i_0,\ldots , i_k)})^{*}:&\, H_k^{(i_0,\ldots , i_k)}\rightarrow H_k^{(i_0,\ldots , i_k)},\, (L_k^{(i_0,\ldots , i_k)})^*g(x)=\frac{1}{2} \sum \limits _{i,j\in \{i_1,\ldots , i_k \}} a_{ij}(x) \frac{\partial ^2 g(x)}{\partial x^i \partial x^j},\\ L_k:&\, H_k\rightarrow H_k,\quad (L_k)_{|{H_k^{(i_0,\ldots , i_k)}}} = L_k^{(i_0,\ldots ,i_k)},\\ L_k^*:&\, H_k\rightarrow H_k,\quad (L_k^*)_{|{H_k^{(i_0,\ldots , i_k)}}} = (L_k^{(i_0,\ldots , i_k)})^{*}, \end{aligned} \end{aligned}$$

where the coefficients are defined by

$$\begin{aligned} a_{ij}(x):=\, x^i(\delta _{ij}-x^j),\quad i,j\in \{1,\ldots ,n\}. \end{aligned}$$
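For instance, for \(n=2\) the diffusion matrix reads

$$\begin{aligned} \big (a_{ij}(x)\big )_{i,j=1,2}=\begin{pmatrix} x^1(1-x^1) &{} -x^1x^2\\ -x^1x^2 &{} x^2(1-x^2) \end{pmatrix}, \end{aligned}$$

with determinant \(x^1x^2(1-x^1-x^2)\); it thus degenerates precisely on the boundary of the simplex, where one of \(x^0,x^1,x^2\) vanishes.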

Finally, we shall need

$$\begin{aligned} w_k^{(i_0,\ldots , i_k)}(x):=\prod \limits _{j=0}^{k} x^{i_j}, \quad k\in \{1,\ldots ,n\}, \end{aligned}$$

with the convention \(x^0=1-\sum \nolimits _{i=1}^n x^i\) from above.

Proposition 2.1

For each \(1\le k\le n\), \(m\ge 0, |\alpha |=\alpha ^1+\cdots +\alpha ^k=m\), the polynomial of degree m in k variables \(x=(x^{i_1},\ldots ,x^{i_k})\) in \(\overline{V_k^{(i_0,\ldots ,i_k)}}\)

$$\begin{aligned} X_{m,\alpha }^{(k)}(x)=x^\alpha +\sum \limits _{|\beta |<m}a^{(k)}_{m,\beta } x^\beta , \end{aligned}$$
(1)

where the \(a^{(k)}_{m,\beta }\) are inductively defined by

$$\begin{aligned} a^{(k)}_{m,\beta }=-\frac{\sum \limits _{i=1}^k (\beta _i+2)(\beta _i+1)a^{(k)}_{m,\beta +e_i}}{(m-|\beta |)(m+|\beta |+2k+1)},\qquad \forall |\beta |<m, \end{aligned}$$

is an eigenvector of \(L_k^{(i_0,\ldots ,i_k)}\) corresponding to the eigenvalue \(\lambda ^{(k)}_m=\frac{(m+k)(m+k+1)}{2}\).

Proof

We have, for any multi-index \(\beta \),

$$\begin{aligned} L_k^{(i_0,\ldots ,i_k)}(x^\beta )=\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots ,i_k\}} \frac{\partial ^2 (x^i(\delta _{ij}-x^j)x^\beta )}{\partial x^i \partial x^j} =-\lambda ^{(k)}_{|\beta |}\, x^\beta +\frac{1}{2}\sum \limits _{i=1}^k \beta _i(\beta _i+1)\, x^{\beta -e_i}, \end{aligned}$$

and hence \(L_k^{(i_0,\ldots ,i_k)} X^{(k)}_{m,\alpha }=-\lambda X^{(k)}_{m,\alpha }\) is an identity between polynomials.

By equating coefficients we obtain

$$\begin{aligned} \lambda ^{(k)}_m=\frac{(m+k)(m+k+1)}{2} \end{aligned}$$

and

$$\begin{aligned} a^{(k)}_{m,\beta }=-\frac{\sum \limits _{i=1}^k (\beta _i+2)(\beta _i+1)a^{(k)}_{m,\beta +e_i}}{(m-|\beta |)(m+|\beta |+2k+1)},\quad \forall |\beta |<m. \end{aligned}$$

This completes the proof. \(\square \)

Remark 2.2

  • When \(k=1\), \(X^{(1)}_{m,m}(x^1)\) is the \(m\)th Gegenbauer polynomial (up to a constant). Thus, the polynomials \(X_{m,\alpha }^{(k)}(x)\) can be understood as a generalization of the Gegenbauer polynomials to higher dimensions.

  • Because of this representation of the eigenvectors, we can easily see that \(\{X_{m,\alpha }^{(n)}(x)\}_{m\ge 0,|\alpha |=m}\) is a basis of \(C^2(\overline{V_n})\).
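To make the recursion concrete, consider \(k=1\) and \(m=2\): starting from the leading coefficient 1, it yields \(a^{(1)}_{2,1}=-\frac{3\cdot 2}{(2-1)(2+1+3)}=-1\) and \(a^{(1)}_{2,0}=-\frac{2\cdot 1\cdot (-1)}{(2-0)(2+0+3)}=\frac{1}{5}\), so that

$$\begin{aligned} X^{(1)}_{2}(x)=x^2-x+\frac{1}{5},\qquad L_1^{(0,1)} X^{(1)}_{2}=\frac{1}{2}\big (x(1-x)X^{(1)}_{2}(x)\big )''=-6\, X^{(1)}_{2}, \end{aligned}$$

in accordance with \(\lambda ^{(1)}_2=\frac{(2+1)(2+2)}{2}=6\).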

Proposition 2.3

If \(X\in C^2(\overline{V_k^{(i_0,\ldots ,i_k)}})\) is an eigenvector of \(L_k^{(i_0,\ldots ,i_k)}\) corresponding to \(\lambda \), then \(w_k^{(i_0,\ldots ,i_k)}X\) is an eigenvector of \((L_k^{(i_0,\ldots ,i_k)})^*\) corresponding to \(\lambda \).

Proof

If \(X\in C^2(\overline{V_k^{(i_0,\ldots ,i_k)}})\) is an eigenvector of \(L_k^{(i_0,\ldots ,i_k)}\) corresponding to \(\lambda \), it follows that

$$\begin{aligned} -\lambda (w_k^{(i_0,\ldots ,i_k)}(x) X)&= \frac{1}{2} w_k^{(i_0,\ldots ,i_k)}(x) \sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \frac{\partial ^2}{\partial x^i \partial x^j}\left( x^i(\delta _{ij}-x^j)X\right) \\&=\frac{1}{2} w_k^{(i_0,\ldots ,i_k)}(x) \sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \left( x^i(\delta _{ij}-x^j)\right) \frac{\partial ^2 X}{\partial x^i \partial x^j}\\&\quad +\frac{1}{2} w_k^{(i_0,\ldots ,i_k)}(x) \sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \frac{\partial \left( x^i(\delta _{ij}-x^j)\right) }{\partial x^i}\frac{\partial X}{\partial x^j}\\&\quad +\frac{1}{2} w_k^{(i_0,\ldots ,i_k)}(x) \sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \frac{\partial \left( x^i(\delta _{ij}-x^j)\right) }{\partial x^j}\frac{\partial X}{\partial x^i}\\&\quad +\frac{1}{2} w_k^{(i_0,\ldots ,i_k)}(x) \sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \frac{\partial ^2 \left( x^i(\delta _{ij}-x^j)\right) }{\partial x^i\partial x^j}X\\&=\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \left( x^i(\delta _{ij}-x^j)\right) \left( w_k^{(i_0,\ldots ,i_k)}(x)\frac{\partial ^2 X}{\partial x^i \partial x^j}\right) \\&\quad + \frac{1}{2}\sum \limits _{j\in \{i_1,\ldots , i_k \}} w_k^{(i_0,\ldots ,i_k)}(x)\left( 1-(k+1)x^j\right) \frac{\partial X}{\partial x^j}\\&\quad + \frac{1}{2}\sum \limits _{i\in \{i_1,\ldots , i_k \}} w_k^{(i_0,\ldots ,i_k)}(x)\left( 1-(k+1)x^i\right) \frac{\partial X}{\partial x^i}\\&\quad -\frac{k(k+1)}{2}w_k^{(i_0,\ldots ,i_k)}(x) X\\&=\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \left( x^i(\delta _{ij}-x^j)\right) \left( w_k^{(i_0,\ldots ,i_k)}(x)\frac{\partial ^2 X}{\partial x^i \partial x^j}\right) \\&\quad +\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \left( x^i(\delta _{ij}-x^j)\right) \frac{\partial w_k^{(i_0,\ldots ,i_k)}(x)}{\partial x^i} \frac{\partial X}{\partial x^j}\\&\quad +\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \left( x^i(\delta _{ij}-x^j)\right) \frac{\partial w_k^{(i_0,\ldots ,i_k)}(x)}{\partial x^j} \frac{\partial X}{\partial x^i}\\&\quad +\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \left( x^i(\delta _{ij}-x^j)\right) \frac{\partial ^2 w_k^{(i_0,\ldots ,i_k)}(x)}{\partial x^i \partial x^j} X\\&=\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \left( x^i(\delta _{ij}-x^j)\right) \frac{\partial ^2 (w_k^{(i_0,\ldots ,i_k)} X)(x)}{\partial x^i \partial x^j}\\&=\Big (L_k^{(i_0,\ldots ,i_k)}\Big )^* \Big (w_k^{(i_0,\ldots ,i_k)}(x) X\Big ). \end{aligned}$$

This completes the proof. \(\square \)
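For readers who wish to verify such identities by computer algebra, the following minimal sketch (our own illustration, not part of the paper; it assumes the Python package sympy) checks Propositions 2.1 and 2.3 symbolically in the case \(k=1\), \(m=2\), where \(X=x^2-x+\frac{1}{5}\), \(w_1(x)=x(1-x)\) and \(\lambda ^{(1)}_2=6\):

```python
import sympy as sp

x = sp.symbols('x')
X = x**2 - x + sp.Rational(1, 5)   # X^{(1)}_2 from the recursion of Proposition 2.1
w = x * (1 - x)                    # weight w_1 = x^1 x^0 on the edge V_1^{(0,1)}
lam = sp.Integer(6)                # lambda^{(1)}_2 = (2+1)(2+2)/2

LX = sp.diff(w * X, x, 2) / 2          # L_1 X      = 1/2 (a_{11} X)''
LswX = w * sp.diff(w * X, x, 2) / 2    # L_1^*(w X) = 1/2 a_{11} (w X)''

assert sp.simplify(LX + lam * X) == 0          # Proposition 2.1: L_1 X = -lambda X
assert sp.simplify(LswX + lam * w * X) == 0    # Proposition 2.3: L_1^*(wX) = -lambda wX
print("eigenvalue relations verified for k = 1, m = 2")
```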

Proposition 2.4

Let \(\nu \) be the exterior unit normal vector of the domain \(V_k^{(i_0,\ldots ,i_k)}\). Then we have

$$\begin{aligned} \sum \limits _{j\in \{i_1,\ldots , i_k \}} a_{ij}\nu ^j =0 \quad \text { on } \partial V_k^{(i_0,\ldots ,i_k)},\quad \forall i\in \{i_1,\ldots , i_k\}. \end{aligned}$$
(2)

Proof

In fact, on the surface \(\{x^{s}=0\}\) for some \(s\in \{i_1,\ldots , i_k \}\), we have \(\nu =-e_s\), and hence \(\sum \nolimits _{j\in \{i_1,\ldots , i_k \}} a_{ij}\nu ^j=-a_{is}=-x^s(\delta _{si}-x^i)=0\). On the surface \(\{x^{i_0}=0\}\) we have \(\nu =\frac{1}{\sqrt{k}}(e_{i_1}+\cdots +e_{i_k})\), hence \(\sum \nolimits _{j\in \{i_1,\ldots , i_k \}} a_{ij}\nu ^j=\frac{1}{\sqrt{k}}\sum \nolimits _{j\in \{i_1,\ldots , i_k \}} a_{ij}=\frac{1}{\sqrt{k}}x^i x^{i_0} =0\). This completes the proof. \(\square \)

Proposition 2.5

\(L_k^{(i_0,\ldots ,i_k)}\) and \((L_k^{(i_0,\ldots ,i_k)})^*\) are weighted adjoints in \(H_k^{(i_0,\ldots , i_k)}\), i.e.

$$\begin{aligned} \left( L_k^{(i_0,\ldots ,i_k)} X,w_k^{(i_0,\ldots ,i_k)} Y\right) =\left( X,\left( L_k^{(i_0,\ldots ,i_k)}\right) ^*\left( w_k^{(i_0,\ldots ,i_k)} Y\right) \right) ,\quad \forall X,Y \in H_k^{(i_0,\ldots , i_k)}. \end{aligned}$$

Proof

We put \(F^{(k)}_i(x):=\sum \nolimits _{j\in \{i_1,\ldots , i_k \}} \frac{\partial (a_{ij}(x)X(x))}{\partial x^j}\). Since \(w_k^{(i_0,\ldots ,i_k)} Y \in C^\infty _0(\overline{V}_k^{(i_0,\ldots ,i_k)})\), we obtain from the second Green formula and Proposition 2.4 that

$$\begin{aligned} \left( L_k^{(i_0,\ldots ,i_k)} X,w_k^{(i_0,\ldots ,i_k)} Y\right)&= \frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \int \limits _{\overline{V}_k^{(i_0,\ldots ,i_k)}} \frac{\partial ^2 (a_{ij}(x)X(x))}{\partial x^i \partial x^j} w_k^{(i_0,\ldots ,i_k)}(x)Y(x)dx\\&=\frac{1}{2}\sum \limits _{i\in \{i_1,\ldots , i_k \}} \int \limits _{\overline{V}_k^{(i_0,\ldots ,i_k)}} \frac{\partial F^{(k)}_i(x)}{\partial x^i} w_k^{(i_0,\ldots ,i_k)}(x)Y(x)dx\\&=\frac{1}{2}\sum \limits _{i\in \{i_1,\ldots , i_k \}} \int \limits _{\partial V_k^{(i_0,\ldots ,i_k)}} F^{(k)}_i(x) \nu _i w_k^{(i_0,\ldots ,i_k)}(x)Y(x)do(x)\\&\qquad -\frac{1}{2}\sum \limits _{i\in \{i_1,\ldots , i_k \}} \int \limits _{\overline{V}_k^{(i_0,\ldots ,i_k)}} F^{(k)}_i(x) \frac{\partial (w_k^{(i_0,\ldots ,i_k)}(x)Y(x))}{\partial x^i} dx\\&=-\frac{1}{2}\sum \limits _{i\in \{i_1,\ldots , i_k \}} \int \limits _{\overline{V}_k^{(i_0,\ldots ,i_k)}} F^{(k)}_i(x) \frac{\partial (w_k^{(i_0,\ldots ,i_k)}(x)Y(x))}{\partial x^i} dx\\&=-\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \int \limits _{\overline{V}_k^{(i_0,\ldots ,i_k)}} \frac{\partial (a_{ij}(x)X(x))}{\partial x^j} \frac{\partial (w_k^{(i_0,\ldots ,i_k)}(x)Y(x))}{\partial x^i} dx\\&=-\frac{1}{2}\sum \limits _{i,j\in \{i_1,\ldots , i_k \}} \int \limits _{\partial V_k^{(i_0,\ldots ,i_k)}} a_{ij}(x)\nu _j X(x)\frac{\partial (w_k^{(i_0,\ldots ,i_k)}(x)Y(x))}{\partial x^i} do(x)\\&\qquad +\left( X,L_k^*(w_k^{(i_0,\ldots ,i_k)} Y)\right) \\&=\left( X,L_k^*(w_k Y)\right) . \end{aligned}$$

\(\square \)

Proposition 2.6

In \(\overline{V}_k^{(i_0,\ldots ,i_k)}\), \(\{X^{(k)}_{m,\alpha } \}_{m\ge 0,|\alpha |=m}\) is a basis of \(H_k^{(i_0,\ldots , i_k)}\) which is orthogonal with respect to the weights \(w_k^{(i_0,\ldots ,i_k)}\), i.e.,

$$\begin{aligned} \left( X^{(k)}_{m,\alpha }, w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\right) = 0, \quad \forall j\ne m, |\alpha |=m, |\beta |=j. \end{aligned}$$

Proof

\(\{X^{(k)}_{m,\alpha }\}_{m\ge 0,|\alpha |=m}\) is a basis of \(H_k^{(i_0,\ldots , i_k)}\) because \(\{x^\alpha \}_{\alpha }\) is a basis of this space. To prove the orthogonality we apply Propositions 2.1, 2.3 and 2.5 as follows

$$\begin{aligned} \begin{aligned} -\lambda ^{(k)}_{m} \left( X^{(k)}_{m,\alpha }, w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\right)&= \left( L_k^{(i_0,\ldots ,i_k)} X^{(k)}_{m,\alpha }, w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta } \right) \\&= \left( X^{(k)}_{m,\alpha }, (L_k^{(i_0,\ldots ,i_k)})^*(w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta })\right) \\&=-\lambda ^{(k)}_{j}\left( X^{(k)}_{m,\alpha }, w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\right) . \end{aligned} \end{aligned}$$

Because \(\lambda ^{(k)}_{m}\ne \lambda ^{(k)}_{j}\), this finishes the proof. \(\square \)
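For instance, for \(k=1\) the recursion of Proposition 2.1 gives \(X^{(1)}_1(x)=x-\frac{1}{2}\) and \(X^{(1)}_2(x)=x^2-x+\frac{1}{5}\), and indeed

$$\begin{aligned} \left( X^{(1)}_{1}, w_1 X^{(1)}_{2}\right) =\int _0^1 \Big (x-\frac{1}{2}\Big )\, x(1-x)\Big (x^2-x+\frac{1}{5}\Big )\, dx=0, \end{aligned}$$

since the integrand is antisymmetric under the substitution \(x\mapsto 1-x\).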

Proposition 2.7

  (i)

    The spectrum of the operator \(L_k^{(i_0,\ldots ,i_k)}\) is

    $$\begin{aligned} Spec(L_k^{(i_0,\ldots ,i_k)})=\bigcup _{m\ge 0} \left\{ \lambda _m^{(k)}=\frac{(m+k)(m+k+1)}{2}\right\} =:\Lambda _k \end{aligned}$$

    and the eigenvectors of \(L_k^{(i_0,\ldots ,i_k)}\) corresponding to \(\lambda ^{(k)}_m\) are of the form

    $$\begin{aligned} X=\sum \limits _{|\alpha |=m} d^{(k)}_{m,\alpha } X^{(k)}_{m,\alpha }, \end{aligned}$$

    i.e., the eigenspace corresponding to \(\lambda ^{(k)}_m\) is of dimension \(\left( {\begin{array}{c}k+m-1\\ k-1\end{array}}\right) \);

  (ii)

    The spectrum of the operator \(L_k\) is the same.

Proof

  (i)

    Proposition 2.1 implies that \(\Lambda _k\subseteq Spec(L_k^{(i_0,\ldots ,i_k)})\). Conversely, for \(\lambda \notin \Lambda _k\), we will prove that \(\lambda \) is not an eigenvalue of \(L_k^{(i_0,\ldots ,i_k)}\). In fact, assume that \(X \in H_k^{(i_0,\ldots , i_k)}\) satisfies \(L_k^{(i_0,\ldots ,i_k)} X= -\lambda X\) in \(H_k^{(i_0,\ldots , i_k)}\). Because \(\{X^{(k)}_{m,\alpha }\}_{m,\alpha }\) is an orthogonal basis of \(H_k^{(i_0,\ldots , i_k)}\) with respect to the weights \(w_k^{(i_0,\ldots ,i_k)}\) (Proposition 2.6), we can represent X by \(X=\sum \limits _{m=0}^\infty \sum \nolimits _{|\alpha |=m} d^{(k)}_{m,\alpha } X^{(k)}_{m,\alpha }\). It follows that

    $$\begin{aligned} \begin{aligned} \sum \limits _{m=0}^\infty \sum \limits _{|\alpha |=m} d^{(k)}_{m,\alpha } (-\lambda ^{(k)}_m)X^{(k)}_{m,\alpha }&=\sum \limits _{m=0}^\infty \sum \limits _{|\alpha |=m} d^{(k)}_{m,\alpha } L_k^{(i_0,\ldots ,i_k)} X^{(k)}_{m,\alpha }\\&=L_k^{(i_0,\ldots ,i_k)} X\\&=-\lambda \sum \limits _{m=0}^\infty \sum \limits _{|\alpha |=m} d^{(k)}_{m,\alpha } X^{(k)}_{m,\alpha }. \end{aligned} \end{aligned}$$

    For any \(j\ge 0\), \(|\beta |=j\), multiplying by \(w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\) and then integrating over \(\overline{V_k^{(i_0,\ldots ,i_k)}}\) we have

    $$\begin{aligned} \begin{aligned}&\sum \limits _{|\alpha |=j} d^{(k)}_{j,\alpha } \lambda ^{(k)}_j\left( X^{(k)}_{j,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\right) \\&\quad =\sum \limits _{|\alpha |=j} d^{(k)}_{j,\alpha } \lambda \left( X^{(k)}_{j,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\right) , \quad \forall j\ge 0, |\beta |=j,\\&\Rightarrow \left( X^{(k)}_{j,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\right) _{\beta ,\alpha } \left( d^{(k)}_{j,\alpha } \lambda ^{(k)}_j\right) _{\alpha }\\&\quad =\left( X^{(k)}_{j,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\right) _{\beta ,\alpha } \left( d^{(k)}_{j,\alpha } \lambda \right) _{\alpha },\quad \quad \forall j\ge 0, |\beta |=j,\\&\Rightarrow d^{(k)}_{j,\alpha } \lambda ^{(k)}_j=d^{(k)}_{j,\alpha } \lambda ,\quad \forall j\ge 0, |\beta |=j, \text { because } det \left( X^{(k)}_{j,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{j,\beta }\right) _{\beta ,\alpha } \ne 0\\&\Rightarrow d^{(k)}_{j,\alpha } = 0,\quad \forall j\ge 0, |\alpha |=j, \text { because } \lambda \ne \lambda ^{(k)}_j. \end{aligned} \end{aligned}$$

    It follows that \(X=0\) in \(H_k^{(i_0,\ldots , i_k)}\). Therefore

    $$\begin{aligned} Spec(L_k^{(i_0,\ldots ,i_k)})=\bigcup _{m\ge 0} \left\{ \lambda _m^{(k)}=\frac{(m+k)(m+k+1)}{2}\right\} =\Lambda _k. \end{aligned}$$

    Moreover, assume that \(X\in H_k^{(i_0,\ldots , i_k)}\) is an eigenvector of \(L_k^{(i_0,\ldots ,i_k)}\) corresponding to \(\lambda ^{(k)}_j\), i.e., \(L_k^{(i_0,\ldots ,i_k)} X=-\lambda ^{(k)}_j X\). We represent X by

    $$\begin{aligned} X=\sum \limits _{m=0}^\infty \sum \limits _{|\alpha |=m} d^{(k)}_{m,\alpha } X^{(k)}_{m,\alpha }. \end{aligned}$$

    It follows that

    $$\begin{aligned} \begin{aligned} \sum \limits _{m=0}^\infty \sum \limits _{|\alpha |=m} d^{(k)}_{m,\alpha } (-\lambda ^{(k)}_m)X^{(k)}_{m,\alpha }&=\sum \limits _{m=0}^\infty \sum \limits _{|\alpha |=m} d^{(k)}_{m,\alpha } L_k^{(i_0,\ldots ,i_k)}X^{(k)}_{m,\alpha }\\&=L_k^{(i_0,\ldots ,i_k)} X\\&=-\lambda ^{(k)}_j \sum \limits _{m=0}^\infty \sum \limits _{|\alpha |=m} d^{(k)}_{m,\alpha } X^{(k)}_{m,\alpha }. \end{aligned} \end{aligned}$$

    For any \(i\ne j\), \(|\beta |=i\), multiplying by \(w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{i,\beta }\) and then integrating over \(\overline{V_k^{(i_0,\ldots ,i_k)}}\) we have

    $$\begin{aligned} \begin{aligned}&\sum \limits _{|\alpha |=i} d^{(k)}_{i,\alpha } \lambda ^{(k)}_i\left( X^{(k)}_{i,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{i,\beta }\right) \\&\quad = \sum \limits _{|\alpha |=i} d^{(k)}_{i,\alpha } \lambda ^{(k)}_j \left( X^{(k)}_{i,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{i,\beta }\right) , \quad \forall i\ne j, |\beta |=i,\\&\Rightarrow \left( X^{(k)}_{i,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{i,\beta }\right) _{\beta ,\alpha } (d^{(k)}_{i,\alpha } \lambda ^{(k)}_i)_{\alpha }\\&\quad =\left( X^{(k)}_{i,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{i,\beta }\right) _{\beta ,\alpha } (d^{(k)}_{i,\alpha } \lambda ^{(k)}_j)_{\alpha },\quad \forall i\ne j, |\beta |=i,\\&\Rightarrow d^{(k)}_{i,\alpha } \lambda ^{(k)}_i=d^{(k)}_{i,\alpha } \lambda ^{(k)}_j,\quad \forall i\ne j, |\beta |=i, \text { because } det \left( X^{(k)}_{i,\alpha },w_k^{(i_0,\ldots ,i_k)} X^{(k)}_{i,\beta }\right) _{\beta ,\alpha } \ne 0,\\&\Rightarrow d^{(k)}_{i,\alpha } = 0,\quad \forall i\ne j, |\alpha |=i, \text { because } \lambda ^{(k)}_i \ne \lambda ^{(k)}_j. \end{aligned} \end{aligned}$$

    It follows that

    $$\begin{aligned} X=\sum \limits _{|\alpha |=j}d^{(k)}_{j,\alpha } X^{(k)}_{j,\alpha }. \end{aligned}$$

    This completes the proof.

  (ii)

    is obvious.\(\square \)

Definition of the Solution

We shall now formally derive the Fokker–Planck equation as the diffusion limit of the Wright–Fisher model and introduce our solution concept for this equation. We consider a diploid population of fixed size N with \(n+1\) possible alleles \(A_1,\ldots ,A_{n+1}\) at a given locus. Suppose that the individuals in the population are monoecious, that there are no selective differences between these alleles, and that there are no mutations. There are 2N alleles in the population in any generation, so it is sufficient to focus on the numbers \(Y_m=(Y_m^1,\ldots ,Y_m^n)\) of the alleles \(A_1,\ldots ,A_n\) at generation time m. Assume that \(Y_0=i_0=(i_0^1,\ldots ,i_0^n)\). According to the Wright–Fisher model, the alleles in generation \(m+1\) are derived by sampling with replacement from the alleles of generation m. Thus, the transition probability is

$$\begin{aligned} \mathbb {P}(Y_{m+1}=j|Y_m=i)=\frac{(2N)!}{(j^0)! (j^1)! \ldots (j^n)!} \prod _{k=0}^n \left( \frac{i^k}{2N}\right) ^{j^k}, \end{aligned}$$

where

$$\begin{aligned} i,j \in S_n^{(2N)}=\Bigg \{i=(i^1,\ldots ,i^n): i^k\in \{0,1,\ldots , 2N\}, \sum _{k=1}^n i^k \le 2N\Bigg \}, \end{aligned}$$

and

$$\begin{aligned} i^0=2N-|i|=2N-i^1-\cdots -i^n;\quad \quad j^0=2N-|j|=2N-j^1-\cdots -j^n. \end{aligned}$$
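For \(n=1\) (two alleles), this is the familiar binomial resampling scheme,

$$\begin{aligned} \mathbb {P}(Y_{m+1}=j|Y_m=i)=\left( {\begin{array}{c}2N\\ j\end{array}}\right) \left( \frac{i}{2N}\right) ^{j}\left( 1-\frac{i}{2N}\right) ^{2N-j},\quad i,j \in \{0,1,\ldots ,2N\}. \end{aligned}$$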

After rescaling

$$\begin{aligned} t=\frac{m}{2N},\quad \, \, X_t=\frac{Y_m}{2N}, \end{aligned}$$

we have a discrete Markov chain \(X_t\) valued in \(\{0, \frac{1}{2N},\ldots ,1\}^n\) with \(t=1\) now corresponding to 2N generations. It is easy to see that

$$\begin{aligned} X_0= & {} p=\frac{i_0}{2N},\nonumber \\ \mathbb {E}(\delta X^i_t)= & {} 0,\nonumber \\ \mathbb {E}(\delta X^i_t\, \delta X^j_t)= & {} X^i_t(\delta _{ij}-X^j_t)\,\delta t,\quad \delta t:=\frac{1}{2N},\nonumber \\ \mathbb {E}(\delta X_t)^\alpha= & {} o(\delta t) \quad \text { for }|\alpha |\ge 3. \end{aligned}$$
(3)

We now denote by \(m_\alpha (t)\) the \(\alpha \)th moment of the distribution about zero at time t, i.e.,

$$\begin{aligned} m_\alpha (t)=\mathbb {E}(X_t)^{\alpha }. \end{aligned}$$

Then

$$\begin{aligned} m_\alpha (t+\delta t)=\mathbb {E}(X_t+\delta X_t)^{\alpha }. \end{aligned}$$

Expanding the right hand side and noting (3), we obtain the following recursion formula, under the assumption that the population size N is sufficiently large for terms of order \(\frac{1}{N^2}\) and higher to be neglected:

$$\begin{aligned} m_\alpha (t+\delta t)=m_\alpha (t)-\frac{|\alpha |(|\alpha |-1)}{2} m_\alpha (t)\,\delta t+\sum \limits _{i=1}^n \frac{\alpha _i (\alpha _i-1)}{2}m_{\alpha -e_i}(t)\,\delta t. \end{aligned}$$
(4)

Under this assumption, the moments change very slowly per generation and we can replace this system of difference equations by a system of differential equations:

$$\begin{aligned} \dot{m}_\alpha (t)=-\frac{|\alpha |(|\alpha |-1)}{2} m_\alpha (t)+\sum \limits _{i=1}^n \frac{\alpha _i (\alpha _i-1)}{2}m_{\alpha -e_i}(t). \end{aligned}$$
(5)
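For instance, for \(\alpha =e_i\), (5) gives \(\dot{m}_{e_i}(t)=0\), so the allele frequencies are conserved in expectation, while for \(\alpha =2e_i\) it gives \(\dot{m}_{2e_i}=-m_{2e_i}+m_{e_i}\). Hence the expected heterozygosity \(h_i(t):=\mathbb {E}\big (X^i_t(1-X^i_t)\big )=m_{e_i}(t)-m_{2e_i}(t)\) satisfies

$$\begin{aligned} \dot{h}_i(t)=-h_i(t),\qquad h_i(t)=p^i(1-p^i)e^{-t}, \end{aligned}$$

the classical decay of heterozygosity at rate 1 in rescaled time, i.e., \(\frac{1}{2N}\) per generation.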

With the aim of finding a continuous process which approximates the above discrete process, we look for a continuous Markov process \(\{X_t\}_{t\ge 0}\) valued in \([0,1]^n\) satisfying the same conditions (3) and (5). Denoting by u(x, t) the probability density function of this continuous process, the condition (3) implies (see for example [9, p. 137], or for a more rigorous analysis [6–8]) that u is a solution of the Fokker–Planck (Kolmogorov forward) equation

$$\begin{aligned} \left\{ \begin{array}{l} u_t=L_nu \text { in }V_n\times (0,\infty ),\\ u(x,0)=\delta _p(x) \text { in }V_n; \end{array}\right. \end{aligned}$$
(6)

and the condition (5) implies

$$\begin{aligned}{}[u_t,x^\alpha ]_n=\left[ u,-\frac{|\alpha |(|\alpha |-1)}{2} x^\alpha +\sum \limits _{i=1}^n \frac{\alpha _i (\alpha _i-1)}{2} x^{\alpha -e_i}\right] _n=[u,L_n^*(x^\alpha )]_n,\quad \forall \alpha , \end{aligned}$$

and hence, since the polynomials are dense in \(H_n\) w.r.t. the product \([.,.]_n\),

$$\begin{aligned}{}[u_t,\phi ]_n=[u,L_n^*\phi ]_n,\quad \forall \phi \in H_n. \end{aligned}$$
(7)

This leads us to the following definition of a solution.

Definition 2.8

We call \(u\in H\) a solution of the Fokker–Planck equation associated with the Wright–Fisher model if

$$\begin{aligned} u_t= & {} L_n u \text { in }V_n\times (0,\infty ), \end{aligned}$$
(8)
$$\begin{aligned} u(x,0)= & {} \delta _p(x) \text { in } V_n, \end{aligned}$$
(9)
$$\begin{aligned}{}[u_t,\phi ]_n= & {} [u,L_n^* \phi ]_n,\quad \, \forall \phi \in H_n. \end{aligned}$$
(10)

We point out that the last of these equations implicitly contains the boundary behavior that we wish to impose upon our solution. This will become clear from our construction in the next section.

The Global Solution

In this subsection, we shall construct the solution and prove its existence as well as its uniqueness. The process of finding the solution is as follows: We first find the general solution of the Fokker–Planck equation (8) by the method of separation of variables. Then we construct a solution depending on certain parameters. We then use the conditions (9) and (10) to determine the parameters. Finally, we verify that we have indeed found the solution.

Step 1 Working on \(V_n\), assume that \(u_n(\mathbf {x},t)=X(\mathbf {x})T(t)\) is a solution of the Fokker–Planck equation (8). Then we have

$$\begin{aligned} \frac{T_t}{T}=\frac{L_n X}{X}=-\lambda . \end{aligned}$$

Clearly \(\lambda \) is a constant which is independent of T and X. From Proposition 2.7 we obtain the local solution of Eq. (8) of the form

$$\begin{aligned} u_n(\mathbf {x},t)=\sum \limits _{m=0}^\infty \sum \limits _{|\alpha |=m}c^{(n)}_{m,\alpha } X^{(n)}_{m,\alpha }(\mathbf {x}) e^{-\lambda ^{(n)}_m t}, \end{aligned}$$

where

$$\begin{aligned} \lambda _m^{(n)}=\frac{(n+m)(n+m+1)}{2} \end{aligned}$$

is the eigenvalue of \(L_n\) and

$$\begin{aligned} X^{(n)}_{m,\alpha }(\mathbf {x}),\quad |\alpha |=m \end{aligned}$$

are the corresponding eigenvectors of \(L_n\).

For \(m\ge 0, |\beta |=m\), we conclude from Proposition 2.3 that

$$\begin{aligned} L_n^*\Big (w_n X^{(n)}_{m,\beta }\Big )=-\lambda _m^{(n)}w_n X^{(n)}_{m,\beta }. \end{aligned}$$

It follows that

$$\begin{aligned} \left[ u_t, w_n X^{(n)}_{m,\beta }\right] _n&= \Big [u,L_n^*\Big (w_n X^{(n)}_{m,\beta }\Big )\Big ]_n\quad \text {(the moment condition)}\\&=-\lambda _m^{(n)}\Big [u,w_n X^{(n)}_{m,\beta }\Big ]_n. \end{aligned}$$

Therefore

$$\begin{aligned} \left[ u, w_n X^{(n)}_{m,\beta }\right] _n&= \left[ u(\cdot ,0), w_n X^{(n)}_{m,\beta }\right] _n e^{ -\lambda _m^{(n)} t}\\&= w_n(\mathbf {p}) X^{(n)}_{m,\beta }(\mathbf {p}) e^{ -\lambda _m^{(n)} t}. \end{aligned}$$

Thus,

$$\begin{aligned} w_n(\mathbf {p}) X^{(n)}_{m,\beta }(\mathbf {p}) e^{ -\lambda _m^{(n)} t}&= \left[ u, w_n X^{(n)}_{m,\beta }\right] _n\\&=\left( u_n, w_n X^{(n)}_{m,\beta }\right) _n\quad \text {(because } w_n \text { vanishes on boundary)}\\&=\sum _{|\alpha |=m} c^{(n)}_{m,\alpha } \left( X^{(n)}_{m,\alpha },w_n X^{(n)}_{m,\beta }\right) _n e^{-\lambda ^{(n)}_m t}. \end{aligned}$$

It follows that

$$\begin{aligned} \Big (c^{(n)}_{m,\alpha }\Big )_{\alpha }= \Bigg [\Bigg ((X^{(n)}_{m,\alpha },w_n X^{(n)}_{m,\beta })_n\Bigg )_{\alpha ,\beta }\Bigg ]^{-1}\Bigg (w_n(\mathbf {p}) X^{(n)}_{m,\beta }(\mathbf {p})\Bigg )_\beta . \end{aligned}$$
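In the simplest case \(n=1\), each eigenspace is one-dimensional, so this formula reduces to

$$\begin{aligned} c^{(1)}_{m}=\frac{w_1(p)\, X^{(1)}_{m}(p)}{\left( X^{(1)}_{m},w_1 X^{(1)}_{m}\right) }=\frac{p(1-p)\, X^{(1)}_{m}(p)}{\int _0^1 x(1-x)\big (X^{(1)}_{m}(x)\big )^2\, dx}, \end{aligned}$$

which recovers the classical two-allele expansion; see the companion paper [23] for that case.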

Step 2 The solution \(u\in H\) satisfying (8)–(10) will be found in the following form

$$\begin{aligned} \begin{aligned} u(\mathbf {x},t)=\sum \limits _{k=1}^n u_k(\mathbf {x},t) \chi _{V_k}(\mathbf {x}) + \sum \limits _{i=0}^n u^i_0(\mathbf {x},t) \delta _{e_i}(\mathbf {x}). \end{aligned} \end{aligned}$$
(11)

We use the condition (10) to iteratively obtain the values of \(u_k, \, k=n-1,\ldots ,0\). In fact, assume that we want to calculate \(u^{(0,\ldots ,n-1)}_{n-1}(x^1,\ldots ,x^{n-1},0,t)\).

We note that, if we choose

$$\begin{aligned} \phi (\mathbf {x})=x^0 x^1\cdots x^{n-1}\, X^{(n-1)}_{k,\beta }(x^1,\ldots ,x^{n-1}), \quad |\beta |=k,\qquad x^0=1-\sum \limits _{i=1}^n x^i, \end{aligned}$$

then \(\phi (\mathbf {x})\) vanishes on all faces of dimension at most \(n-1\) except the face \(V_{n-1}^{(0,\ldots ,n-1)}\). Therefore, the expectation of \(\phi \) will be

$$\begin{aligned}{}[u,\phi ]_n=(u_n,\phi )_n+\left( u^{(0,\ldots ,n-1)}_{n-1}, \phi \right) _{n-1}. \end{aligned}$$

The left hand side can be calculated easily by the condition (10)

$$\begin{aligned}{}[u_t,\phi ]_n= [u,L_n^*(\phi )]_n=-\lambda ^{(n-1)}_k [u,\phi ]_n. \end{aligned}$$
(12)

It follows that

$$\begin{aligned}{}[u,\phi ]_n = \phi (\mathbf {p}) e^{-\lambda ^{(n-1)}_k t}. \end{aligned}$$

The first part of the right hand side is

$$\begin{aligned} (u_n,\phi )_n= \sum _{m,\alpha } c^{(n)}_{m,\alpha }\Bigg ( \int _{V_n} X^{(n)}_{m,\alpha }(\mathbf {x})\phi (\mathbf {x})d\mathbf {x}\Bigg ) e^{-\lambda ^{(n)}_m t}. \end{aligned}$$

Therefore we can expand \(u^{(0,\ldots ,n-1)}_{n-1}(x^1,\ldots ,x^{n-1},0,t)\) as follows

$$\begin{aligned} u^{(0,\ldots ,n-1)}_{n-1}(x^1,\ldots ,x^{n-1},0,t)&= \sum _{m\ge 0} c^{(n-1)}_m(\mathbf {x}) e^{-\lambda ^{(n-1)}_m t}\\&=\sum _{m\ge 0}\sum _{l\ge 0}\sum _{|\alpha |=l} c^{(n-1)}_{m,l,\alpha }X^{(n-1)}_{l,\alpha }(x^1,\ldots ,x^{n-1}) e^{-\lambda ^{(n-1)}_m t}. \end{aligned}$$

Putting this formula into Eq. (12), we obtain all the coefficients \(c^{(n-1)}_{m,l,\alpha }\), and hence \(u^{(0,\ldots ,n-1)}_{n-1}(x^1,\ldots ,x^{n-1},0,t)\). The other components of \(u_{n-1}\) are obtained similarly, and proceeding inductively we obtain all \(u_k,\, k=n-1,\ldots ,0\). Thus, we obtain the global solution in the form

$$\begin{aligned} u(\mathbf {x},t)= & {} \sum \limits _{k=1}^n u_k \chi _{V_k}(\mathbf {x}) +\sum \limits _{i=0}^n u_0^i(\mathbf {x},t) \delta _{e_i}(\mathbf {x})\nonumber \\= & {} \sum \limits _{k=1}^n\sum \limits _{m \ge 0}\sum _{l\ge 0} \sum \limits _{|\alpha |=l}c^{(k)}_{m,l,\alpha } X^{(k)}_{l,\alpha }(\mathbf {x}) e^{-\lambda ^{(k)}_m t} \chi _{V_k}(\mathbf {x}) +\sum \limits _{i=0}^n u_0^i(\mathbf {x},t) \delta _{e_i}(\mathbf {x}). \end{aligned}$$
(13)

It is not difficult to show that u is a solution of the Fokker–Planck equation associated with the Wright–Fisher model.

Step 3 We can easily see that this solution is unique. In fact, assume that \(u_1,u_2\) are two solutions of the Fokker–Planck equation associated with the Wright–Fisher model. Then \(u=u_1-u_2\) will satisfy

$$\begin{aligned} u_t&=L_n u \text { in } V_n \times (0,\infty ),\\ u(x,0)&=0 \text { in } \overline{V}_n,\\ [u_t,\phi ]_n&=[u,L_n^* \phi ]_n,\quad \, \forall \phi \in H_n. \end{aligned}$$

It follows that

$$\begin{aligned}{}[u_t,1]_n&=[u,L_n^*(1)]_n=0,\\ [u_t,x^i]_n&=[u,L_n^*(x^i)]_n=0,\\ \left[ u_t,w_k^{(i_0,\ldots , i_k)} X^{(k)}_{j,\alpha }\chi _{V_k^{(i_0,\ldots , i_k)}}\right] _n&=\left[ u,L_n^*(w_k^{(i_0,\ldots , i_k)} X^{(k)}_{j,\alpha }\chi _{V_k^{(i_0,\ldots , i_k)}})\right] _n\\&=\left[ u,L_k^*(w_k^{(i_0,\ldots , i_k)} X^{(k)}_{j,\alpha }\chi _{V_k^{(i_0,\ldots , i_k)}})\right] _n\\&=-\lambda ^{(k)}_{j} \left[ u,w_k^{(i_0,\ldots , i_k)} X^{(k)}_{j,\alpha }\chi _{V_k^{(i_0,\ldots , i_k)}}\right] _n. \end{aligned}$$

Therefore

$$\begin{aligned}{}[u,1]_n&=[u(\cdot , 0),1]_n=0,\\ [u,x^i]_n&=[u(\cdot ,0),x^i]_n=0,\\ \left[ u,w_k^{(i_0,\ldots , i_k)} X^{(k)}_{j,\alpha }\chi _{V_k^{(i_0,\ldots , i_k)}}\right] _n&=\left[ u(\cdot ,0),w_k^{(i_0,\ldots , i_k)} X^{(k)}_{j,\alpha }\chi _{V_k^{(i_0,\ldots , i_k)}}\right] _ne^{-\lambda ^{(k)}_j t}=0. \end{aligned}$$

Since \(\{1,\{x^i\}_i,\{w_k^{(i_0,\ldots , i_k)} X^{(k)}_{j,\alpha }\chi _{{V_k}^{(i_0,\ldots , i_k)}}\}_{1\le k\le n,(i_0,\ldots ,i_k)\in I_k,j\ge 0,|\alpha |=j} \}\) is also a basis of \(H_n\), it follows that \(u=0\) in \(H\).

In conclusion, we have established:

Theorem 2.9

The Fokker–Planck equation associated with the Wright–Fisher model with \(n+1\) alleles possesses the unique solution

$$\begin{aligned} u(\mathbf {x},t)= & {} \sum \limits _{k=1}^n u_k \chi _{V_k}(\mathbf {x}) +\sum \limits _{i=0}^n u_0^i(\mathbf {x},t) \delta _{e_i}(\mathbf {x})\nonumber \\= & {} \sum \limits _{k=1}^n\sum \limits _{m \ge 0}\sum _{l\ge 0} \sum \limits _{|\alpha |=l}c^{(k)}_{m,l,\alpha } X^{(k)}_{l,\alpha }(\mathbf {x}) e^{-\lambda ^{(k)}_m t} \chi _{V_k}(\mathbf {x}) +\sum \limits _{i=0}^n u_0^i(\mathbf {x},t) \delta _{e_i}(\mathbf {x}). \end{aligned}$$
(14)

Example 2.10

To illustrate this process, we consider the case of three alleles.

We shall construct the global solution for the problem

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\partial u}{\partial t} = L_2 u,\quad \text { in } V_2\times (0,\infty ),\\ u(\mathbf {x},0)=\delta _{\mathbf {p}}(\mathbf {x}),\quad \mathbf {x}\in V_2,\\ {[u_t,\phi ]}_2 =[u,L_2^* \phi ]_2, \quad \text { for all } \phi \in H_2, \end{array}\right. \end{aligned}$$

where the global solution is of the form

$$\begin{aligned} u=u_2 \chi _{V_2}+ u_1^{0,1} \chi _{V_{1}^{0,1}}+u_1^{0,2} \chi _{V_{1}^{0,2}}+u_1^{1,2} \chi _{V_{1}^{1,2}} + u_0^1 \chi _{V_0^1}+u_0^2 \chi _{V_0^2}+u_0^0 \chi _{V_0^0}, \end{aligned}$$

and the product is

$$\begin{aligned} {[}u,\phi ]_2&=\int _{V_2}u_2 \phi _{|V_2}d\mathbf {x}+ \int _{0}^1 u_1^{0,1}(x^1,0,t)\phi (x^1,0)dx^1+ \int _{0}^1 u_1^{0,2}(0,x^2,t)\phi (0,x^2)dx^2\\&\qquad + \frac{1}{\sqrt{2}}\int _{0}^1 u_1^{1,2}(x^1,1-x^1,t)\phi (x^1,1-x^1)dx^1\\&\qquad +u_0^1(1,0,t)\phi (1,0)+u_0^2(0,1,t)\phi (0,1)+u_0^0(0,0,t)\phi (0,0). \end{aligned}$$

Step 1: We find the local solution \(u_2\)

$$\begin{aligned} u_2(\mathbf {x},t)=\sum _{m\ge 0}\sum _{\alpha ^1+\alpha ^2=m} c^{(2)}_{m,\alpha ^1,\alpha ^2} X^{(2)}_{m,\alpha ^1,\alpha ^2}(\mathbf {x})e^{-\lambda _m^{(2)}t}. \end{aligned}$$

To define the coefficients \(c^{(2)}_{m,\alpha ^1,\alpha ^2}\) we use the initial condition and the orthogonality of the eigenvectors \(X^{(2)}_{m,\alpha ^1,\alpha ^2}\) to get

$$\begin{aligned} w_2(\mathbf {p})X^{(2)}_{m,\beta ^1,\beta ^2}(\mathbf {p})&=\Big [u(0),w_2 X^{(2)}_{m,\beta ^1,\beta ^2}\Big ]_2\\&=\Big (u_2(0),w_2 X^{(2)}_{m,\beta ^1,\beta ^2}\Big )_2\quad \text { because } w_2 \text { vanishes on the boundary}\\&=\sum _{\alpha ^1+\alpha ^2=m} c^{(2)}_{m,\alpha ^1,\alpha ^2} \Big (X^{(2)}_{m,\alpha ^1,\alpha ^2},w_2X^{(2)}_{m,\beta ^1,\beta ^2}\Big )\quad \text {for all } \beta ^1+\beta ^2=m. \end{aligned}$$

Since the matrix

$$\begin{aligned} \Big (X^{(2)}_{m,\alpha ^1,\alpha ^2},w_2X^{(2)}_{m,\beta ^1,\beta ^2}\Big )_{(\alpha ^1,\alpha ^2), (\beta ^1,\beta ^2)} \end{aligned}$$

is positive definite, we therefore have unique values of the \(c^{(2)}_{m,\alpha ^1,\alpha ^2}\). It follows that we have a unique local solution \(u_2\).

Step 2: We will use the moment condition to define all other coefficients of the global solution.

Firstly, we define the coefficients of \(u_1^{1,2}\) as follows

$$\begin{aligned} u_1^{1,2}(x^1,1-x^1,t)&=\sum _{m\ge 0}c_{m}(x^1) e^{-\lambda ^{(1)}_m t} \end{aligned}$$
(15)
$$\begin{aligned}&=\sum _{m,l\ge 0}c_{m,l} X^{(1)}_l(x^1) e^{-\lambda ^{(1)}_m t}. \end{aligned}$$
(16)

We note that

$$\begin{aligned} L_2^* \Big (x^1 x^2 X_k^{(1)}(x^1)\Big )=-\lambda ^{(1)}_k x^1 x^2 X_k^{(1)}(x^1). \end{aligned}$$

Therefore

$$\begin{aligned} \Big [u_t, x^1 x^2 X_k^{(1)}(x^1)\Big ]_2=\Big [u, L^*_2 \Big (x^1 x^2 X_k^{(1)}(x^1)\Big )\Big ]_2=-\lambda ^{(1)}_k \Big [u,x^1 x^2 X_k^{(1)}(x^1)\Big ]_2. \end{aligned}$$

It follows that

$$\begin{aligned} \Big [u,x^1 x^2 X_k^{(1)}(x^1)\Big ]_2=p^1 p^2 X_k^{(1)}(p^1) e^{-\lambda _k^{(1)}t}. \end{aligned}$$

Thus we have

$$\begin{aligned} p^1 p^2 X_k^{(1)}(p^1) e^{-\lambda _k^{(1)}t}&=\Big [u,x^1 x^2 X_k^{(1)}(x^1)\Big ]_2\\&=\Big (u_2,x^1 x^2 X_k^{(1)}(x^1)\Big )_2 + \Big (u_1^{1,2},x^1 (1-x^1) X_k^{(1)}(x^1) \Big )_1\\&\qquad \text {because } x^1 x^2 \text { vanish on the other boundaries,}\\&=\sum _{m\ge 0}\Bigg (\sum _{|\alpha |=m} c^{(2)}_{m,\alpha }\Bigg ( \int _{V_2} x^1 x^2 X_{m,\alpha }^{(2)}(x^1,x^2) X_k^{(1)}(x^1)d\mathbf {x}\Bigg )\Bigg )e^{-\lambda ^{(2)}_{m}t}\\&\qquad +\sum _{m\ge 0} c_{m,k} \Big (X_k^{(1)},w_1 X_k^{(1)}\Big ) e^{-\lambda ^{(1)}_m t}\\&\qquad \text {because of the orthogonality of } (\cdot ,\cdot )_1 \text { with respect to } w_1,\\&=\sum _{m\ge 0}r_m e^{-\lambda ^{(2)}_{m}t}+\sum _{m\ge 0} c_{m,k}d_k e^{-\lambda ^{(1)}_m t}\ .\\ \end{aligned}$$

By equating the coefficients of \(e^{-\lambda t}\) we obtain \(u_1^{1,2}\). Similarly we obtain \(u_1^{0,1}\) and \(u_1^{0,2}\). Then, we define the coefficients of \(u_0^1\) from the first moment.

Note that for \(\phi = x^i\) we have \(L_2^*(\phi )=0\), and therefore \([u_t,\phi ]_2=0\), i.e.,

$$\begin{aligned}{}[u,x^i]_2=[u(0),x^i]_2=p^i. \end{aligned}$$

It follows that

$$\begin{aligned} p^1=[u,x^1]= (u_2,x^1)_2 + (u_1^{0,1},x^1)_1+(u_1^{1,2},x^1)_1+ u_0^1(1,0,t). \end{aligned}$$

Thus we obtain \(u_0^1(1,0,t)\). Similarly we get all \(u_0\). Therefore we obtain the global solution u.

It is easy to check that u is a global solution. To prove the uniqueness we proceed as follows: Assume that u is the difference of any two global solutions, i.e., u satisfies

$$\begin{aligned} \left\{ \begin{array}{l} u_t=L_2u \quad \text { in } V_2\times (0,\infty ),\\ u(\mathbf {x},0)=0\quad \text { in } V_2\\ {[u_t,\phi ]}_2={[u,L_2^* \phi ]}_2 \quad \text { for all } \phi \in H_2. \end{array}\right. \end{aligned}$$

We shall prove that

$$\begin{aligned}{}[u,\phi ]_2=0\quad \forall \phi \in H_2. \end{aligned}$$
(17)

In fact,

$$\begin{aligned}{}[u_t,1]_2&=[u,L_2^*(1)]_2=0 \Rightarrow [u,1]_2=[u(0),1]_2=0,\\ [u_t,x^i]_2&=[u,L_2^*(x^i)]_2=0 \Rightarrow [u,x^i]_2=[u(0),x^i]_2=0,\\ [u_t,w_1(x^i)X^{(1)}_m(x^i)]_2&=[u,L_2^*(w_1(x^i)X^{(1)}_m(x^i))]_2=-\lambda _m^{(1)} [u,w_1(x^i)X^{(1)}_m(x^i)]_2.\\ \end{aligned}$$

It implies that

$$\begin{aligned} \left[ u,w_1(x^i)X^{(1)}_m(x^i)\right] _2=\left[ u(0),w_1(x^i)X^{(1)}_m(x^i)\right] _2 e^{-\lambda _m^{(1)} t}=0. \end{aligned}$$

Therefore

$$\begin{aligned} \Big [u_t,w_2(x^1,x^2)X^{(2)}_{m,\alpha }(x^1,x^2)\Big ]_2&=\Big [u,L_2^*(w_2(x^1,x^2)X^{(2)}_{m,\alpha }(x^1,x^2))\Big ]_2\\&=-\lambda _m^{(2)} \Big [u,w_2(x^1,x^2)X^{(2)}_{m,\alpha }(x^1,x^2)\Big ]_2,\\ \Rightarrow \Big [u,w_2(x^1,x^2)X^{(2)}_{m,\alpha }(x^1,x^2)\Big ]_2&=\Big [u(0),w_2(x^1,x^2)X^{(2)}_{m,\alpha }(x^1,x^2)\Big ]_2 e^{-\lambda _m^{(2)} t}=0\ . \end{aligned}$$

We need only to prove that Eq. (17) holds for all

$$\begin{aligned} \phi (x^1,x^2)=(x^1)^m(x^2)^n,\quad \forall m,n\ge 0. \end{aligned}$$
  (1)

    If \(n=0, m\ge 0\), we see that \(\phi \) can be generated from \(\{1,x^1,w_1(x^1)X^{(1)}_m(x^1)\}\), therefore \([u,\phi ]_2=0\);

  (2)

    If \(m=0, n\ge 0\), we see that \(\phi \) can be generated from \(\{1,x^2,w_1(x^2)X^{(1)}_m(x^2)\}\), therefore \([u,\phi ]_2=0\);

  (3)

    If \(n=1, m\ge 1\), we expand \((x^1)^{m-1}\) by

    $$\begin{aligned} (x^1)^{m-1}=\sum _{k\ge 0} c_k X_k^{(1)}(x^1). \end{aligned}$$

    Note that

    $$\begin{aligned} L_2^*\Big (x^1 x^2 X_k^{(1)}(x^1)\Big )=-\lambda _k^{(1)}x^1 x^2 X_k^{(1)}(x^1). \end{aligned}$$

    Therefore

    $$\begin{aligned} \Big [u_t, x^1 x^2 X_k^{(1)}(x^1)]_2=[u, L_2^*\Big (x^1 x^2 X_k^{(1)}(x^1)\Big )\Big ]_2=-\lambda _k^{(1)} [u,x^1 x^2 X_k^{(1)}(x^1)]_2. \end{aligned}$$

    It follows that

    $$\begin{aligned}{}[u,x^1 x^2 X_k^{(1)}(x^1)]_2=[u(0),x^1 x^2 X_k^{(1)}(x^1)]_2 e^{-\lambda ^{(1)}_k t}=0. \end{aligned}$$

    Therefore

    $$\begin{aligned}{}[u,\phi ]_2=\sum _{k\ge 0} c_k [u,x^1 x^2 X_k^{(1)}(x^1)]_2=0; \end{aligned}$$
  (4)

    If \(n\ge 2, m\ge 1\) we proceed by induction w.r.t. n. We have

    $$\begin{aligned} (x^1)^m(x^2)^n&= x^1x^2(x^1+x^2-1)(x^1)^{m-1}(x^2)^{n-2}+(x^1)^m(1-x^1)(x^2)^{n-1}\\&=-w_2(x^1,x^2)(x^1)^{m-1}(x^2)^{n-2}+(x^1)^m(1-x^1)(x^2)^{n-1}. \end{aligned}$$

    By the inductive assumption, we have

    $$\begin{aligned} \left[ u, (x^1)^m(1-x^1)(x^2)^{n-1}\right] _2=0. \end{aligned}$$

    Then, we expand \((x^1)^{m-1}(x^2)^{n-2}\) by

    $$\begin{aligned} (x^1)^{m-1}(x^2)^{n-2}=\sum _{j\ge 0}\sum _{|\alpha |=j}c^{(2)}_{j,\alpha }X^{(2)}_{j,\alpha }(x^1,x^2). \end{aligned}$$

    Therefore

    $$\begin{aligned} \left[ u,w_2(x^1,x^2)(x^1)^{m-1}(x^2)^{n-2}\right] _2=\sum _{j,\alpha }c^{(2)}_{j,\alpha }\left[ u,w_2(x^1,x^2)X^{(2)}_{j,\alpha }(x^1,x^2)\right] _2=0. \end{aligned}$$

    It follows that \([u,(x^1)^m(x^2)^n]_2=0\).

    Thus, \(u=0\).

Applications

In this section, we present some applications of our global solution to the evolution of the process \((X_t)_{t\ge 0}\), such as the expectation and the second moment of the absorption time, the probability distribution of the absorption time for having \(k+1\) alleles, the probability of having exactly \(k+1\) alleles, the \(\alpha \mathrm{th}\) moments, the probability of heterogeneity, and the rate of loss of one allele in a population having \(k+1\) alleles. Several of our formulas are known from other methods, see [9, 15–17, 19, 20], but we emphasize here the general and unifying approach.

The Absorption Time for Having \(k+1\) Alleles

The moments of the sojourn and absorption times were derived by Nagylaki [22] for two alleles, and by Lessard and Lahaie [18] in the multi-allele case. We denote by \(T^{k+1}_{n+1}(p)=\inf \{t>0: X_t\in \overline{V}_k\,|\,X_0=p \}\) the first time when the population has (at most) \(k+1\) alleles. \(T^{k+1}_{n+1}(p)\) is a continuous random variable valued in \([0,\infty )\) and we denote by \(\phi (t,p)\) its probability density function. It is easy to see that \(\overline{V}_k\) is invariant under the process \((X_t)_{t \ge 0}\), i.e. if \(X_s \in \overline{V}_k\) then \(X_t\in \overline{V}_k\) for all \(t\ge s\) (once an allele is lost from the population, it can never again be recovered). We have the equality

$$\begin{aligned} \mathbb {P}\left( T^{k+1}_{n+1}(p)\le t\right) =\mathbb {P}(X_{t}\in \overline{V}_k|X_0=p)=\int _{\overline{V}_k}u(x,p,t)d\mu (x). \end{aligned}$$

It follows that

$$\begin{aligned} \phi (t,p)=\int _{\overline{V}_k}\frac{\partial }{\partial t} u(x,p,t)d\mu (x). \end{aligned}$$

Therefore the expectation for the absorption time of having \(k+1\) alleles is (see [9, p. 194]):

$$\begin{aligned} \begin{aligned} \mathbb {E}(&T^{k+1}_{n+1}(p))=\int _0^\infty t\phi (t,p)dt\\&=\int _{\overline{V}_k}\int _0^\infty t\frac{\partial }{\partial t} u(x,p,t)dtd\mu (x)\\&=\sum \limits _{j=1}^k \sum \limits _{(i_0,\ldots ,i_j)\in I_j} \sum \limits _{m\ge 0} \sum \limits _{|\alpha |=m} c_{m,\alpha }^{(j)} \int _{{V}^{(i_0,\ldots ,i_j)}_j} X^{(j)}_{m,\alpha }(x)\left( \int _0^\infty t\frac{\partial }{\partial t} e^{-\lambda _m^{(j)}t}dt\right) d\mu ^{(i_0,\ldots ,i_j)}_j(x)\\&\qquad +\sum \limits _{i=0}^n \sum \limits _{k=1}^n\sum \limits _{m \ge 0} \sum \limits _{|\alpha |=m}c^{(k)}_{m,\alpha }a^{(k)}_{m,\alpha ,i}\left( \int _0^\infty t\frac{\partial }{\partial t} e^{-\lambda ^{(k)}_m t}dt\right) \\&=\sum \limits _{j=1}^k \sum \limits _{(i_0,\ldots ,i_j)\in I_j} \sum \limits _{m\ge 0} \sum \limits _{|\alpha |=m} c_{m,\alpha }^{(j)} \int _{{V}^{(i_0,\ldots ,i_j)}_j} X^{(j)}_{m,\alpha }(x)\left( -\frac{1}{\lambda _m^{(j)}}\right) d\mu ^{(i_0,\ldots ,i_j)}_j(x)\\&\qquad +\sum \limits _{i=0}^n \sum \limits _{k=1}^n\sum \limits _{m \ge 0} \sum \limits _{|\alpha |=m}c^{(k)}_{m,\alpha }a^{(k)}_{m,\alpha ,i}\left( -\frac{1}{\lambda _m^{(k)}}\right) ; \end{aligned} \end{aligned}$$

and the second moment of this absorption time is (see [1, 20]):

$$\begin{aligned} \begin{aligned} \mathbb {E}(&T^{k+1}_{n+1}(p))^2=\int _0^\infty t^2\phi (t,p)dt\\&=\int _{\overline{V}_k}\int _0^\infty t^2\frac{\partial }{\partial t} u(x,p,t)dtd\mu (x)\\&=\sum \limits _{j=1}^k \sum \limits _{(i_0,\ldots ,i_j)\in I_j} \sum \limits _{m\ge 0} \sum \limits _{|\alpha |=m} c_{m,\alpha }^{(j)} \int _{{V}^{(i_0,\ldots ,i_j)}_j} X^{(j)}_{m,\alpha }(x)\left( \int _0^\infty t^2\frac{\partial }{\partial t} e^{-\lambda _m^{(j)}t}dt\right) d\mu ^{(i_0,\ldots ,i_j)}_j(x)\\&\qquad +\sum \limits _{i=0}^n \sum \limits _{k=1}^n\sum \limits _{m \ge 0} \sum \limits _{|\alpha |=m}c^{(k)}_{m,\alpha }a^{(k)}_{m,\alpha ,i}\left( \int _0^\infty t^2\frac{\partial }{\partial t} e^{-\lambda ^{(k)}_m t}dt\right) \\&=\sum \limits _{j=1}^k \sum \limits _{(i_0,\ldots ,i_j)\in I_j} \sum \limits _{m\ge 0} \sum \limits _{|\alpha |=m} c_{m,\alpha }^{(j)} \int _{{V}^{(i_0,\ldots ,i_j)}_j} X^{(j)}_{m,\alpha }(x)\left( -\frac{2}{(\lambda _m^{(j)})^2}\right) d\mu ^{(i_0,\ldots ,i_j)}_j(x)\\&\qquad +\sum \limits _{i=0}^n \sum \limits _{k=1}^n\sum \limits _{m \ge 0} \sum \limits _{|\alpha |=m}c^{(k)}_{m,\alpha }a^{(k)}_{m,\alpha ,i}\left( -\frac{2}{(\lambda _m^{(k)})^2}\right) . \end{aligned} \end{aligned}$$

In order to see what this means, we consider the case of three alleles:

$$\begin{aligned} \begin{aligned} u(x^1,x^2;t)&=u_2(x^1,x^2;t) \chi _{V_2}+ u_1^{0,1}(x^1,0;t) \chi _{V_{1}^{0,1}}\\&\quad +u_1^{0,2}(0,x^2;t) \chi _{V_{1}^{0,2}}+u_1^{1,2}(x^1,1-x^1;t) \chi _{V_{1}^{1,2}}\\&\quad + u_0^1(t) \delta _{e_1}+u_0^2(t) \delta _{e_2}+u_0^0(t) \delta _{e_0}, \end{aligned} \end{aligned}$$

and the product is

$$\begin{aligned}{}[u,\phi ]_2&=(u_2,\phi )_2+(u_1^{0,1},\phi (\cdot ,0))_1+(u_1^{0,2},\phi (0,\cdot ))_1+(u_1^{1,2},\phi (\cdot ,1-\cdot ))_1\\&\qquad + u_0^1(1,0;t)\phi (1,0)+u_0^2(0,1;t)\phi (0,1)+u_0^0(0,0;t)\phi (0,0)\\&=\int _{V_2}u_2(x^1,x^2;t) \phi (x^1,x^2)dx^1dx^2 + \int _{0}^1 u_1^{0,1}(x^1,0;t)\phi (x^1,0)dx^1\\&\qquad + \int _{0}^1 u_1^{0,2}(0,x^2;t)\phi (0,x^2)dx^2+ \frac{1}{\sqrt{2}}\int _{0}^1 u_1^{1,2}(x^1,1-x^1;t)\phi (x^1,1-x^1)dx^1\\&\qquad +u_0^1(1,0;t)\phi (1,0)+u_0^2(0,1;t)\phi (0,1)+u_0^0(0,0;t)\phi (0,0). \end{aligned}$$

By expansion of eigenvectors, we have

$$\begin{aligned} u_2(\mathbf {x};\mathbf {p};t)=\sum _{m\ge 0}\sum _{|\alpha |=m} c^{(2)}_{m,\alpha }(\mathbf {p}) X^{(2)}_{m,\alpha }(\mathbf {x})e^{-\lambda _m^{(2)}t}, \end{aligned}$$

where \(c^{(2)}_{m,\alpha }(\mathbf {p})\) is uniquely defined. We represent the edge components \(u_1^{0,1}, u_1^{0,2}, u_1^{1,2}\) by

$$\begin{aligned} u_1^{0,1}(x^1,0;t)&=\sum _{m\ge 0}a^{0,1}_{m}(x^1) e^{-\lambda ^{(1)}_m t}; \end{aligned}$$
(18)
$$\begin{aligned} u_1^{0,2}(0,x^2;t)&=\sum _{m\ge 0}a^{0,2}_{m}(x^2) e^{-\lambda ^{(1)}_m t}; \end{aligned}$$
(19)
$$\begin{aligned} u_1^{1,2}(x^1,1-x^1;t)&=\sum _{m\ge 0}a^{1,2}_m(x^1) e^{-\lambda ^{(1)}_m t}, \end{aligned}$$
(20)

where the coefficients \(a^{\cdot ,\cdot }_{m}(x^1)\) are defined as follows:

Putting

$$\begin{aligned} \psi _n(x^1):=x^1(1-x^1)X^{(1)}_n(x^1), \end{aligned}$$

we note that \(\psi _n(0)=\psi _n(1)=0\) and

$$\begin{aligned} L_2^*\psi _n(x^1)=-\lambda _n^{(1)}\psi _n(x^1). \end{aligned}$$

It follows that

$$\begin{aligned} \begin{aligned} \Big [u_t,\psi _n(x^1)\Big ]_2&=\Big [u,L_2^*(\psi _n(x^1))\Big ]_2\\&=-\lambda _n^{(1)}\Big [u, \psi _n(x^1)\Big ]_2. \end{aligned} \end{aligned}$$

Therefore

$$\begin{aligned} \begin{aligned} \psi _n&(p^1)e^{-\lambda _n^{(1)}t}=\Big [u(0),\psi _n(x^1)\Big ]_2 e^{-\lambda _n^{(1)}t}=\Big [u,\psi _n(x^1)\Big ]_2\\&=\Big (u_2,\psi _n(x^1)\Big )_2+\Big (u_1,\psi _n(x^1)\Big )_1+(u_0,\psi _n(x^1))_0\\&=\sum _{m\ge 0}\sum _{|\alpha |=m} c^{(2)}_{m,\alpha } \Big (X^{(2)}_{m,\alpha },\psi _n(x^1)\Big )_2 e^{-\lambda _m^{(2)}t}+\sum _{m\ge 0}\Big (a_{m}(x^1),\psi _n(x^1)\Big )_1 e^{-\lambda ^{(1)}_m t}\\&\quad \text {where } a_m(x^1):=a^{0,1}_{m}(x^1)+a^{1,2}_m(x^1) \text { and note that } \psi _n(0)=\psi _n(1)=0\\&=\Big (a_{0}(x^1),\psi _n(x^1)\Big )_1 e^{-\lambda ^{(1)}_0 t}+\sum _{m\ge 1}\Bigg \{\Big (a_{m}(x^1),\psi _n(x^1)\Big )_1\\&\quad + \sum _{|\alpha |=m-1} c^{(2)}_{m-1,\alpha } \Big (X^{(2)}_{m-1,\alpha },\psi _n(x^1)\Big )_2\Bigg \}e^{-\lambda ^{(1)}_m t}\\&\quad \text {(because of } \lambda ^{(1)}_m=\lambda ^{(2)}_{m-1}). \end{aligned} \end{aligned}$$

Equating the coefficients of the terms \(e^{-\lambda t}\), we obtain

$$\begin{aligned} \Big (a_{0}(x^1),\psi _n(x^1)\Big )_1= & {} \delta _{0,n} \psi _n(p^1)\nonumber \\ \Big (a_{m}(x^1),\psi _n(x^1)\Big )_1= & {} \delta _{m,n} \psi _n(p^1)- \sum _{|\alpha |=m-1} c^{(2)}_{m-1,\alpha } \Big (X^{(2)}_{m-1,\alpha },\psi _n(x^1)\Big )_2,\quad \text {if }m\ge 1.\nonumber \\ \end{aligned}$$
(21)

Remark 3.1

The coefficients of \(u_2\) occur in the representation of the coefficients of \(u_1\) because of the probability flux.

Similarly, because of

$$\begin{aligned} L_2^*(x^1)= & {} 0,\\ \Big [u_t,x^1\Big ]_2= & {} \Big [u,L_2^*(x^1)\Big ]_2=0. \end{aligned}$$

We have

$$\begin{aligned} \begin{aligned} p^1&=\Big [u(0),x^1\Big ]_2=\Big [u,x^1\Big ]_2\\&=\Big (u_2,x^1\Big )_2+\Big (u_1,x^1\Big )_1+(u_0,x^1)_0. \end{aligned} \end{aligned}$$

Thus,

$$\begin{aligned} \begin{aligned} u_0^1(\mathbf {p};t)&=p^1-\sum _{m\ge 0}\sum _{|\alpha |=m} c^{(2)}_{m,\alpha }(\mathbf {p}) \Big (X^{(2)}_{m,\alpha },x^1\Big )_2 e^{-\lambda _m^{(2)}t}\\&\qquad -\sum _{m\ge 0}\Big (a^{0,1}_{m},x^1\Big )_1 e^{-\lambda ^{(1)}_m t}\\&\qquad -\sum _{m\ge 0}\Big (a^{1,2}_{m},x^1\Big ) e^{-\lambda ^{(1)}_m t}\\&=p^1- \Big (a_{0}(x^1),x^1\Big )_1 e^{-\lambda ^{(1)}_0 t}-\sum _{m\ge 1}\Bigg \{\Big (a_{m}(x^1),x^1\Big )_1\\&\quad + \sum _{|\alpha |=m-1} c^{(2)}_{m-1,\alpha } \Big (X^{(2)}_{m-1,\alpha },x^1\Big )_2\Bigg \}e^{-\lambda ^{(1)}_m t}. \end{aligned} \end{aligned}$$

The expectation of the absorption time for having only one allele is

$$\begin{aligned} \begin{aligned} \mathbb {E}(T^{1}_{3}(\mathbf {p}))&=\int _0^\infty t\phi (t,\mathbf {p})dt\\&=\int _0^\infty t\frac{\partial }{\partial t}\Big ( u_0^1(\mathbf {p};t)+u_0^2(\mathbf {p};t)+u_0^0(\mathbf {p};t)\Big )dt. \end{aligned} \end{aligned}$$

We first calculate the first term; the other terms will be obtained similarly. To do this, we expand \(x^1\) in terms of the \(\psi _n(x^1)\)

$$\begin{aligned} x^1=\sum _{n\ge 0} d_n \psi _n(x^1). \end{aligned}$$

We construct a sequence of entropy functions on [0, 1] as follows.

  • \(E_0(x)=x\)

  • \(E_r(x)\) is the unique solution of the boundary value problem

    $$\begin{aligned} \left\{ \begin{array}{l} L_1^*(E_r(x))=-r E_{r-1}(x)\\ E_r(0)=E_r(1)=0. \end{array}\right. \end{aligned}$$

By some simple calculations, we obtain the first few entropy functions:

  (1)

    \(E_0(x)=x\)

  (2)

    \(E_1(x)=-2(1-x) \log (1-x) \)

  (3)

    \(E_2(x)=-8xz(x)+8(1-x)\log (1-x)\)

  (4)

    \(E_3(x)=48(1-x)u(x)+96[xz(x)-(1-x)\log (1-x)]\)

where

$$\begin{aligned} z(x)=\int _x^1 \frac{\ln (1-y)}{y}dy, \quad u(x)=\int _x^1 \frac{z(y)}{1-y}dy. \end{aligned}$$
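As a check for \(r=1\): with \(E_1(x)=-2(1-x)\log (1-x)\) we have \(E_1''(x)=-\frac{2}{1-x}\), hence

$$\begin{aligned} L_1^*(E_1(x))=\frac{1}{2}x(1-x)E_1''(x)=-x=-E_0(x),\qquad E_1(0)=E_1(1)=0, \end{aligned}$$

as required by the defining boundary value problem.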

Lemma 3.2

The entropy functions satisfy

$$\begin{aligned} \frac{\Big (X_{m}^{(1)},x^1\Big )_1}{\lambda _m^{(1)}}=\Big (E_1(x^1),X_{m}^{(1)}\Big )_1,\\ \frac{2\Big (X_{m}^{(1)},x^1\Big )_1}{\Big (\lambda _m^{(1)}\Big )^2}=\Big (E_2(x^1),X_{m}^{(1)}\Big )_1, \end{aligned}$$

and more generally,

$$\begin{aligned} \frac{r! \Big (X_{m}^{(1)},x^1\Big )_1}{\Big (\lambda _m^{(1)}\Big )^r}=\Big (E_r(x^1),X_{m}^{(1)}\Big )_1, \quad r\ge 2. \end{aligned}$$

Proof

We have

$$\begin{aligned} \begin{aligned} \lambda _m^{(1)}\Big (E_1(x^1),X_{m}^{(1)}\Big )_1&= \Big (E_1(x^1),\lambda _m^{(1)}X_{m}^{(1)}\Big )_1\\&=\Big (E_1(x^1),-L_1\Big (X_{m}^{(1)}\Big )\Big )_1\\&=\Big (-L_1^*\Big (E_1(x^1)\Big ),X_{m}^{(1)}\Big )_1,\quad \text {because of } E_1(0)=E_1(1)=0 \\&=\Big (x^1,X_{m}^{(1)}\Big )_1. \end{aligned} \end{aligned}$$

Similarly we have

$$\begin{aligned} \begin{aligned} \Big (\lambda _m^{(1)}\Big )^2\Big (E_2(x^1),X_{m}^{(1)}\Big )_1&= \lambda _m^{(1)}\Big (E_2(x^1),\lambda _m^{(1)} X_{m}^{(1)}\Big )_1\\&= \lambda _m^{(1)}\Big (E_2(x^1),-L_1\Big (X_{m}^{(1)}\Big )\Big )_1\\&=\lambda _m^{(1)}\Big (-L_1^*\Big (E_2(x^1)\Big ),X_{m}^{(1)}\Big )_1,\quad \text {because of } E_2(0)=E_2(1)=0\\&=\lambda _m^{(1)}\Big (2E_1(x^1),X_{m}^{(1)}\Big )_1\\&=\Big (2x^1,X_{m}^{(1)}\Big )_1,\quad \text {because of the above calculation}. \end{aligned} \end{aligned}$$

The proof for all r is similar. \(\square \)

From the Lemma, we can expand \(E_1(x^1)\) as

$$\begin{aligned} E_1(x^1)=\sum _{n\ge 0} \frac{d_n}{\lambda _n^{(1)}} \psi _n(x^1). \end{aligned}$$

Therefore we have

$$\begin{aligned}&\int _0^\infty t\frac{\partial u_0^1(\mathbf {p};t)}{\partial t} dt\\&\quad =\Big (a_{0}(x^1),x^1\Big )_1 \int _0^\infty t \lambda ^{(1)}_0 e^{-\lambda ^{(1)}_0 t}dt \\&\qquad +\sum _{m\ge 1}\Bigg \{\Big (a_{m}(x^1),x^1\Big )_1+ \sum _{|\alpha |=m-1} c^{(2)}_{m-1,\alpha } \Big (X^{(2)}_{m-1,\alpha },x^1\Big )_2\Bigg \} \int _0^\infty t \lambda ^{(1)}_m e^{-\lambda ^{(1)}_m t}dt\\&\quad =\frac{\Big (a_{0}(x^1),x^1\Big )_1}{\lambda ^{(1)}_0}+\sum _{m\ge 1}\frac{\Big (a_{m}(x^1),x^1\Big )_1+ \sum _{|\alpha |=m-1} c^{(2)}_{m-1,\alpha } \Big (X^{(2)}_{m-1,\alpha },x^1\Big )_2}{\lambda ^{(1)}_m}\\&\quad =\sum _{n\ge 0} d_n \left\{ \frac{\Big (a_{0}(x^1),\psi _n(x^1)\Big )_1}{\lambda ^{(1)}_0}+\sum _{m\ge 1}\frac{\Big (a_{m}(x^1),\psi _n(x^1)\Big )_1+ \sum _{|\alpha |=m-1} c^{(2)}_{m-1,\alpha } \Big (X^{(2)}_{m-1,\alpha },\psi _n(x^1)\Big )_2}{\lambda ^{(1)}_m}\right\} \\&\quad =\sum _{n\ge 0} d_n \Bigg \{\frac{\delta _{0,n}\psi _n(p^1)}{\lambda ^{(1)}_0}+\sum _{m\ge 1}\frac{\delta _{m,n}\psi _n(p^1)}{\lambda ^{(1)}_m}\Bigg \},\quad \text {because of (21)}\\&\quad =\sum _{m\ge 0}\frac{d_m}{\lambda _m^{(1)}} \psi _m(p^1)\\&\quad =E_1(p^1). \end{aligned}$$

Thus, we have

$$\begin{aligned} \mathbb {E}(T^{1}_{3}(\mathbf {p}))= E_1(p^1)+E_1(p^2)+E_1(p^0),\qquad p^0=1-p^1-p^2. \end{aligned}$$
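For instance, for the symmetric initial condition \(p^0=p^1=p^2=\frac{1}{3}\) this gives

$$\begin{aligned} \mathbb {E}(T^{1}_{3}(\mathbf {p}))=3\cdot \Big (-2\cdot \frac{2}{3}\log \frac{2}{3}\Big )=4\log \frac{3}{2}\approx 1.62, \end{aligned}$$

i.e., fixation of a single allele takes about \(1.62\times 2N\) generations on average, since \(t=1\) corresponds to 2N generations.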

Remark 3.3

We can obtain the rth moments of this absorption time by the same method, i.e.

$$\begin{aligned} \mathbb {E}(T^{1}_{3}(\mathbf {p}))^r= E_r(p^1)+E_r(p^2)+E_r(p^0). \end{aligned}$$

The Probability Distribution of the Absorption Time for Having \(k+1\) Alleles

We note that \(X_{T^{k+1}_{n+1}(p)}\) is a random variable valued in \(\overline{V_k}\). We consider the probability that this random variable takes its value in \(V_k^{(i_0,\ldots ,i_k)}\), i.e., the probability that, at the first time at which the population has at most \(k+1\) alleles, it consists precisely of the \(k+1\) alleles \(\{A_{i_0},\ldots ,A_{i_k}\}\). Let \(g_k\) be a function of k variables defined inductively by

$$\begin{aligned} \begin{aligned} g_1(p^1)&=p^1;\\ g_2(p^1,p^2)&=\frac{p^1}{1-p^2}g_1(p^2)+\frac{p^2}{1-p^1}g_1(p^1);\\ g_{k+1}(p^1,\ldots ,p^{k+1})&=\sum \limits _{i=1}^{k+1}\frac{p^i}{1-\sum \nolimits _{j\ne i}p^j}g_k(p^1,\ldots ,p^{i-1},p^{i+1},\ldots ,p^{k+1}). \end{aligned} \end{aligned}$$

Then we have

Theorem 3.4

$$\begin{aligned} \mathbb {P}\left( X_{T^{k+1}_{n+1}(p)} \in \overline{V^{(i_0,\ldots ,i_k)}_k}\right) =g_{k+1}(p^{i_0},\ldots ,p^{i_k}). \end{aligned}$$

Proof

Method 1: By proving that

$$\begin{aligned} \mathbb {P}\left( X_{T^{k+1}_{n+1}(p)} \in \overline{V^{(i_0,\ldots ,i_k)}_k}\,\Big |\, X_{T^{k}_{n+1}(p)} \in \overline{V^{(i_1,\ldots ,i_k)}_{k-1}}\right) =\frac{p^{i_0}}{1-p^{i_1}-\cdots -p^{i_k}} \end{aligned}$$

and elementary combinatorial arguments, we immediately obtain the result (see [19]).

Method 2: By proving that it is the unique solution of the classical Dirichlet problem

$$\begin{aligned} \left\{ \begin{array}{l} (L_n)^* v(p)=0\ \quad \text { in } V_n,\\ \lim \limits _{p\rightarrow q} v(p)=1,\quad q\in V_k^{(i_0,\ldots ,i_k)},\\ \lim \limits _{p\rightarrow q} v(p)=0,\quad q\in V_k \backslash V_k^{(i_0,\ldots ,i_k)}. \end{array}\right. \end{aligned}$$

\(\square \)
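For instance, for three alleles (\(n=2\), \(k=1\)), the probability that, when one allele is first lost, the two surviving alleles are \(A_{i_0}\) and \(A_{i_1}\) is

$$\begin{aligned} g_2(p^{i_0},p^{i_1})=\frac{p^{i_0}p^{i_1}}{1-p^{i_1}}+\frac{p^{i_0}p^{i_1}}{1-p^{i_0}}, \end{aligned}$$

and summing over the three possible pairs indeed yields total probability 1, since \(\sum \nolimits _{i\ne j}\frac{p^ip^j}{1-p^j}=\sum \nolimits _{j}\frac{p^j(1-p^j)}{1-p^j}=1\).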

The Probability of Having Exactly \(k+1\) Alleles

The probability of having only the particular allele \(A_i\) is (see [12]):

$$\begin{aligned} \begin{aligned} \mathbb {P}(X_t\in V^{(i)}_0|X_0=\mathbf {p})&=\int \limits _{V^{(i)}_{0}}u_0^{(i)}(\mathbf {x},t)d\mu ^{(i)}_0(\mathbf {x})\\&=u^{(i)}_0(e_i,t)\\&= p^i-\sum \limits _{k=1}^n\sum \limits _{m^{(k)} \ge 0} \sum _{l^{(k)}\ge 0} \sum \limits _{|\alpha ^{(k)}|=l^{(k)}}c^{(k)}_{m^{(k)},l^{(k)},\alpha ^{(k)}} \Big (x^i, X^{(k)}_{l^{(k)},\alpha ^{(k)}}\Big )_k e^{-\lambda ^{(k)}_{m^{(k)}} t}. \end{aligned} \end{aligned}$$

The probability of having exactly the \(k+1\) alleles \(\{A_{i_0},\ldots ,A_{i_k}\}\) (the coexistence probability of the alleles \(\{A_{i_0},\ldots ,A_{i_k}\}\)) is (see [16, 20]):

$$\begin{aligned} \begin{aligned} \mathbb {P}(X_t\in V^{(i_0,\ldots ,i_k)}_k|X_0=\mathbf {p})&=\int \limits _{V^{(i_0,\ldots ,i_k)}_k}u^{(i_0,\ldots ,i_k)}_k(\mathbf {x},t)d\mu ^{(i_0,\ldots ,i_k)}_k(\mathbf {x})\\&=\sum \limits _{m\ge 0}\sum _{l\ge 0}\sum \limits _{|\alpha |=l}c^{(k)}_{m,l,\alpha }\left( \int \limits _{V^{(i_0,\ldots ,i_k)}_k} X^{(k)}_{l,\alpha }(\mathbf {x})d\mu ^{(i_0,\ldots ,i_k)}_k(\mathbf {x})\right) e^{-\lambda _m^{(k)}t}. \end{aligned} \end{aligned}$$

The \(\alpha \mathrm{th}\) Moments

The \(\alpha \mathrm{th}\)-moments are (see [15–17]):

$$\begin{aligned} \begin{aligned} m_\alpha (t)&= [u,\mathbf {x}^\alpha ]_n\\&=\int \limits _{\overline{V_n}}x^\alpha u(\mathbf {x},t)d\mu (\mathbf {x})\\&=\sum \limits _{k=0}^n \sum \limits _{(i_0,\ldots ,i_k)\in I_k} \int \limits _{V^{(i_0,\ldots ,i_k)}_k}\mathbf {x}^\alpha u_k^{(i_0,\ldots ,i_k)}(\mathbf {x},t)d\mu ^{(i_0,\ldots ,i_k)}_k(\mathbf {x}). \end{aligned} \end{aligned}$$

The Probability of Heterogeneity

The probability of heterogeneity is (see [16]):

$$\begin{aligned} \begin{aligned} H_t&= (n+1)! \, [u,w_n]_n\\&= (n+1)! \, (u_n,w_n)_n \quad \text { (because } w_n \text { vanishes on the boundary)}\\&= (n+1)! \, \Big (\sum _{m\ge 0} \sum _{|\varvec{\alpha }|=m} c^{(n)}_{m,\varvec{\alpha }} X^{(n)}_{m,\varvec{\alpha }} e^{-\lambda ^{(n)}_{m}t}, w_n X^{(n)}_{0,\varvec{0}}\Big )_n\\&= (n+1)! \, \Big (c^{(n)}_{0,\varvec{0}} X^{(n)}_{0,\varvec{0}}, w_n X^{(n)}_{0,\varvec{0}}\Big )_n \, e^{-\lambda ^{(n)}_{0}t} \\&\quad \text { (because of the orthogonality of the eigenvectors } X^{(n)}_{m,\varvec{\alpha }})\\&= H_0\, e^{-\frac{n(n+1)}{2}t}\ . \end{aligned} \end{aligned}$$

The Rate of Loss of One Allele in a Population Having \(k+1\) Alleles

The solution has the form

$$\begin{aligned} u=\sum _{k=0}^n u_{k}(\mathbf {x},t) \chi _{V_k}(\mathbf {x}). \end{aligned}$$

The rate of loss of one allele in a population with \(k+1\) alleles equals the rate of decrease of

$$\begin{aligned} u_{k}(\mathbf {x},t)=\sum \limits _{m \ge 0} \sum _{l\ge 0}\sum \limits _{|\alpha |=l}c^{(k)}_{m,l,\alpha } X^{(k)}_{l,\alpha }(\mathbf {x})\chi _{V_k}(\mathbf {x}) e^{-\lambda ^{(k)}_m t}, \end{aligned}$$

which is \(\lambda ^{(k)}_0=\frac{k(k+1)}{2}\). This means that the rate of loss of alleles in the population decreases as k gets smaller in the course of the process (see [10, 13, 16]).

Population Genetics

The Wright–Fisher model, as the basic theoretical model of population genetics, can also be applied to population genetics data, although in most cases the basic model needs to be suitably extended. For instance, single nucleotide polymorphisms (SNPs) can be modeled in this way. As the name indicates, in a SNP, there is a genetic variant in the population that differs from the rest of the population at a single nucleotide position in the genetic sequence. Thus, we may consider the background sequence and its variant as two different alleles, \(A_1, A_2\), and ask for the chances of the variant allele \(A_2\) to sweep the population, for the expected time until this happens or until the variant goes extinct, etc., and apply the formulae given in Sect. 3. A detailed bioinformatic analysis of SNPs in the human genome was carried out in [21]. In that paper, the effects of recombination and of varying population size were analyzed, again on the basis of the multinomial distribution as in Sect. 2.2. In particular, a population bottleneck in human history around 40,000 years ago could be identified on the basis of the current SNP distributions in the human population. For our methods to apply, we thus need to extend the basic Wright–Fisher model to include recombination (which we shall present in another paper) as well as varying population size. Similarly, data about mitochondrial DNA [24] indicate a population bottleneck in human history. Since mitochondrial DNA is solely inherited from the mother, we can treat this as a haploid Wright–Fisher model without recombination. Again, however, to be applicable, our model needs to be extended to handle variable population sizes.

Conclusion

We have developed a new global solution concept for the Fokker–Planck equation associated with the Wright–Fisher model, and we have proved the existence and uniqueness of this solution (Theorem 2.9). From this solution, we can easily read off the properties of the considered process, like the absorption time of having \(k+1\) alleles, the probability of having exactly \(k+1\) alleles, the \(\alpha \mathrm{th}\) moments, the probability of heterogeneity, and the rate of loss of one allele in a population having \(k+1\) alleles.