Mathematical methods in biology and medicine have been developed and used for a long time. Examples cover a variety of experimental contexts and a broad spectrum of mathematical theories and techniques. A pressing question for talented young mathematicians, who are knowledgeable and curious about biology and medicine and who have strong skills in one or several branches of mathematics, is where to turn for interesting, important, and challenging problems in the life-sciences to which mathematics can give a sensible input.

There exists a plethora of names and research branches for theoretical approaches, mathematical modeling and mathematical analysis in the bio-medical sciences, e.g. Theoretical Biology, Bioinformatics, Biomathematics, Biophysics, Mathematical Biology, Theoretical Biosciences, Computational Biology, Mathematics in the Life-Sciences, and many others. But so far there is no clear delineation of which type of research is done in which of these areas, so it is important to take a closer look at each individually. Crucial questions are:

  • Are experimental questions of deeper relevance for biology or medicine addressed?

  • Are specific or profound mathematical tools useful and necessary in that context?

  • Do these mathematical methods help to better understand the observed phenomena or to make further predictions?

Just because one is able to use or develop certain mathematical techniques, it does not always make sense to impose them on every problem one encounters. On the other hand, many researchers are still very hesitant to develop new, or to use deep, mathematical techniques in the life-sciences, even though such methods could be extremely useful for shedding more light on important experimental findings.

Highly interdisciplinary cooperations are already quite successful, even though a crosswise quality control is not always possible. On the one hand, it becomes more and more popular to use computer programs to solve a problem without knowing the theoretical backbone and the abstractions behind the respective code. Strictly speaking, this approach does not allow one to infer scientific conclusions from the obtained results. On the other hand, not knowing by which materials and methods experimental results have been obtained is similarly “risky” for an interpretation or judgement of the relevance of those results. Nevertheless, progress is made, but quality standards for crosswise information should be developed and further education should be improved in this direction in the future.

Among the many fundamental questions in the life-sciences, there exists a wealth of problems of mathematical interest and challenge. Quite often interesting life-science problems lay a natural basis for new and interesting mathematics to be developed. Further, mathematics might be able by its methodologies and concepts to suggest new hypotheses for the functioning of biological systems, in a similar way as it has proven to be very successful for instance in physics, sometimes even before any experiment had been conducted.

Instead of giving examples of new biological problems and their mathematical modeling, which I personally would consider interesting and worth mathematical analysis, I prefer to summarize two older and two more modern classical examples of mathematical modeling and analysis in the life-sciences. These are the results by Turing on reaction-diffusion systems in morphogenesis, Hodgkin and Huxley’s model for the initiation and propagation of action potentials in nerve, the first rigorous chemotaxis models, and the mathematical analysis of molecular sequence alignment. Those contributions have already proven to be fundamental, and they show the impact which mathematical considerations and abstractions can have on the life-sciences. Simply by being combined here, they may help to develop a personal taste for possible future research. These examples also include abstract concepts on how one could think about biological problems. This mathematical, conceptual approach within the life-sciences is not yet developed as prominently as it principally could be, even though it has already proven to be very useful.

Why should we be looking specifically at these four examples?

First, I have chosen contributions close to my own technical expertise. Second, the contributions presented here are fundamental and highly cited, and they have initiated, and still initiate, a lot of follow-up research.

Nearly all formulas and most text passages are taken directly from the original papers cited in the references, without this being indicated further. The list of references to other related literature, which is huge, is kept minimal. The main aim of this note is to create a renewed interest in reading the original papers.

1 Turing

In his 1952 paper on The Chemical Basis of Morphogenesis, [15], A.M. Turing discusses possible chemical mechanisms for pattern formation in early morphogenesis.

“It is suggested that a system of chemical substances, called morphogens, reacting together and diffusing through a tissue, is adequate to account for the main phenomena of morphogenesis. Such a system, although it may originally be quite homogeneous, may later develop pattern or structure due to an instability of the homogeneous equilibrium, which is triggered off by random disturbances …. The purpose of this paper is to discuss possible mechanisms by which genes of a zygote may determine the anatomical structure of the resulting organism … a mathematical model of the growing embryo will be described. This model will be a simplification and an idealization, and consequently a falsification. It is to be hoped that the features retained for discussion are those of greatest importance in the present stage of knowledge.”

Turing analyzes two situations. In one of his mathematical models the cells within the tissue are idealized as geometrical points. In his other model the matter of the tissue is assumed to be continuously distributed. He then attaches a mechanical and a chemical state to each cell within the system. As a first approximation, the mechanical aspects of morphogenesis are ignored in the mathematical models in [15], and situations are considered where chemical aspects seem to be the most significant. Turing was very aware that mechanical and electrical aspects also play an important role in development.

In [15] the mathematical tissue models are considered in a time frame where the cells are neither growing nor dividing, but during which certain substances are reacting chemically and are diffusing through the tissue. Turing calls these substances morphogens, which could e.g. be hormones or skin pigments. He does not intend to specify these morphogens but rather aims to describe important functional behaviors they may have. A detailed discussion about possible realistic mechanisms and examples in developmental biology is included in his paper.

Without cell walls being present, the diffusibility of the chemicals within the “tissue” would be inversely proportional to the square root of their respective molecular weight. This assumption is used as a first approximation. The reaction rates are assumed to obey the law of mass action. Considering \(N\) cells and \(M\) morphogens, the state of the system is given by \(M\cdot N\) numbers, namely the quantities of the \(M\) morphogens in each of the \(N\) cells, which change in time.

The embryo in its blastula stage is considered to be spherically symmetric. Deviations from the spherical symmetry are naturally to be expected and may vary greatly from embryo to embryo within a species. The specific type of these deviations is not so important, but rather the fact that some deviations exist. The system may reach a state of instability in which these irregularities tend to grow. By this a new and stable equilibrium can be reached, where the initial symmetry is entirely gone. The number of such new equilibria is usually not as large as the number of irregularities giving rise to them, i.e. the direction of the axis of the gastrula can vary, but the phenomenon of gastrulation otherwise takes place in a similar fashion in different organisms of the same species. Therefore Turing suggested to look for a breakdown of symmetry and homogeneity in this context. Since systems naturally tend to leave unstable equilibria, they cannot often be observed in such states. Such unstable equilibria do occur when a system changes from a stable equilibrium into an unstable one, triggered by external events.

First, in order to stress the basic strategy of his ideas, Turing considered two crucial morphogens \(X\), \(Y\) acting on a ring of \(N\) cells and onto each other. The morphogen concentrations within each cell are denoted by \(X_{r}\) and \(Y_{r}\), \(r = 1, \ldots , N \). It is assumed that each cell \(r\) exchanges these two signals diffusively with its nearest neighbors \(r-1\) and \(r+1\). Then

$$\begin{aligned} \frac{dX_{r}}{dt} =& f(X_{r}, Y_{r}) + \mu(X_{r+1} - 2 X_{r} + X_{r-1}), \end{aligned}$$
(1)
$$\begin{aligned} \frac{dY_{r}}{dt} =& g(X_{r}, Y_{r}) + \nu(Y_{r+1} - 2 Y_{r} + Y_{r-1}), \end{aligned}$$
(2)

where cell \(N+1\) is identified with cell 1 and cell 0 with cell \(N\). Further, \(\mu\), \(\nu\) are the cell-to-cell diffusion constants for \(X\) and \(Y\), and \(f\) and \(g\) are the respective reaction mechanisms. If \(f(h,k)=g(h,k)=0\), then each single cell \(r\) in the ring of cells is in equilibrium, containing a respective signal concentration \(X_{r}=h\) and \(Y_{r}=k\). Now assuming that the system is not far from such an equilibrium, let \(X_{r} = h + x_{r}\) and \(Y_{r}= k + y_{r}\) with small, non constant perturbations \(x_{r}\) and \(y_{r}\). Then, by Taylor expansion one obtains \(f(h+x_{r}, k + y_{r}) \approx a x_{r} + b y_{r}\), \(g(h+x_{r}, k + y_{r}) \approx cx_{r} + d y_{r}\), where higher powers of \(x_{r}\) and \(y_{r}\) are neglected. Then approximately one has

$$\begin{aligned} \frac{dx_{r}}{dt} =& a x_{r} + b y_{r} + \mu(x_{r+1} - 2x_{r} + x_{r-1}), \end{aligned}$$
(3)
$$\begin{aligned} \frac{dy_{r}}{dt} =& c x_{r} + d y_{r} + \nu(y_{r+1} - 2 y_{r} + y_{r-1}). \end{aligned}$$
(4)

With the ansatz

$$\begin{aligned} x_{r} = \sum_{s=0}^{N-1} \exp [{2 \pi i rs}/{N} ] \xi_{s},\qquad y_{r} = \sum_{s=0}^{N-1} \exp [{2\pi i rs}/{N} ] \eta_{s}, \end{aligned}$$
(5)

one obtains, after some calculations, that

$$\begin{aligned} \frac{d\xi_{s}}{dt} =& \bigl[ a - 4\mu\sin^{2}(\pi s/N) \bigr] \xi_{s} + b\eta_{s}, \end{aligned}$$
(6)
$$\begin{aligned} \frac{d\eta_{s}}{dt} =& c\xi_{s} + \bigl[ d - 4\nu\sin^{2}(\pi s/N) \bigr] \eta_{s} . \end{aligned}$$
(7)

Let \(p_{s}\) and \(\tilde{p}_{s}\) denote the roots of

$$ \bigl[p - a + 4\mu\sin^{2}(\pi s/N)\bigr] \cdot\bigl[p - d + 4 \nu \sin^{2}(\pi s/N)\bigr] = bc $$
(8)

with \(\operatorname{Re} p_{s} \geq \operatorname{Re} \tilde{p}_{s}\). Then one gets

$$\begin{aligned} &\xi_{s} = A_{s} \exp(p_{s} t) + B_{s} \exp(\tilde{p}_{s} t), \qquad\eta_{s} = C_{s} \exp(p_{s} t) + D_{s} \exp(\tilde{p}_{s} t),\quad \mbox{where} \\ &\quad A_{s} \bigl[ p_{s} - a + 4 \mu\sin^{2}(\pi s/N) \bigr] = bC_{s},\ B_{s} \bigl[ \tilde{p}_{s} - a + 4 \mu\sin^{2}(\pi s/N)\bigr] = b D_{s} \end{aligned}$$
(9)

and respective conditions resulting from the equation for \(\eta_{s}\). Thus

$$\begin{aligned} X_{r} =& h + \sum_{s=1}^{N} \bigl[A_{s} \exp(p_{s} t) + B_{s} \exp(\tilde{p}_{s} t) \bigr] \exp(2\pi i rs/N) , \end{aligned}$$
(10)
$$\begin{aligned} Y_{r} =& k + \sum_{s=1}^{N} \bigl[C_{s} \exp(p_{s} t) + D_{s} \exp(\tilde{p}_{s} t) \bigr] \exp(2\pi i rs/N) . \end{aligned}$$
(11)

These expressions will be simplified further below. But before doing so, analogous formulas for the continuous approximation of the discrete ring of tissue are derived. It turns out that the qualitative behavior and the structure-forming properties of both models are similar. Thus a continuous approximation of the originally discrete model is reasonable.

Consider a circle of radius \(\rho\) around the origin. Let \(\theta\) be the angle at which a vector pointing from the origin towards a point on this circle deviates from a fixed reference vector of this type. Let

$$\begin{aligned} \frac{\partial X}{\partial t} =& a(X-h) + b(Y-k) + \frac{\tilde{\mu}}{\rho^{2}} \frac{\partial^{2} X}{\partial\theta^{2}}, \end{aligned}$$
(12)
$$\begin{aligned} \frac{\partial Y}{\partial t} =& c(X-h) + d(Y-k) + \frac{\tilde{\nu}}{\rho^{2}} \frac{\partial^{2} Y}{\partial\theta^{2}}, \end{aligned}$$
(13)

where \(\mu= \tilde{\mu}N^{2}/(2\pi\rho)^{2}\), \(\nu= \tilde{\nu}N^{2}/(2\pi\rho)^{2}\) for the diffusivities of the two morphogens. As before \(a,b,c,d\) are the values at equilibrium of \(\partial f /\partial X\), \(\partial f / \partial Y\), \(\partial g / \partial X\), \(\partial g /\partial Y\). This system can be obtained as the limiting case of the discrete model. Its general solution is

$$\begin{aligned} X =& h + \sum_{s=-\infty}^{\infty}\bigl[ A_{s} \exp(p_{s} t) + B_{s} \exp(\tilde{p}_{s} t) \bigr] \exp(is\theta), \end{aligned}$$
(14)
$$\begin{aligned} Y =& k + \sum_{s=-\infty}^{\infty}\bigl[C_{s} \exp(p_{s} t) + D_{s} \exp(\tilde{p}_{s} t) \bigr] \exp(is\theta), \end{aligned}$$
(15)

where now \(p_{s}\) and \(\tilde{p}_{s}\) denote the roots of

$$ \bigl[p-a +\tilde{\mu}s^{2}/\rho^{2}\bigr] \bigl[p-d + \tilde{\nu}s^{2}/\rho^{2}\bigr] = bc $$
(16)

and fulfill \(A_{s}(p_{s} - a + \tilde{\mu}s^{2}/\rho^{2})=bC_{s}\), \(B_{s} ( \tilde{p}_{s} - a + \tilde{\mu}s^{2}/ \rho^{2} ) = b D_{s}\). These solutions are also a limiting case of the solutions in the discrete setting.

Now looking again at the discrete setting, the asymptotic behavior of (10), (11) will be dominated by those terms within the sums for which the corresponding \(p_{s}\) has the largest real part. Let \(p_{s_{0}}\) be related to one of those leading terms, then also \(p_{N-s_{0}}= p_{s_{0}}\) is related to one of them, since \(\sin^{2}(\pi(N-s_{0}) /N) = \sin^{2}(\pi s_{0}/N)\). We distinguish two cases, namely when \(p_{s_{0}}\) is real and when it is complex, and call these cases stationary, respectively oscillatory. Excluding certain specific situations, the asymptotic behavior of the solution is as follows:

If the value of \(p_{s_{0}}\) for one of the dominating terms is real, then asymptotically

$$X_{r} = h +2 \operatorname{Re} A_{s_{0}} \exp(It + 2\pi i s_{0} r/N) , \quad\ Y_{r} = k + 2 \operatorname{Re} C_{s_{0}} \exp(It + 2\pi i s_{0} r/N) . $$

This is the stationary case, where \(I\) denotes the real part of \(p_{s_{0}}\), indicating the instability.

If the value of \(p_{s_{0}}\) for one of the dominating terms is complex, then asymptotically

$$\begin{aligned} X_{r} =& h + 2 \exp(It) \operatorname{Re} \bigl\{ A_{s_{0}} \exp ( i \omega t + 2 \pi i s_{0} r/N ) \\ &\phantom{h + 2 \exp(It) \operatorname{Re} \bigl\{ }{} + A_{N-s_{0}} \exp ( - i\omega t - 2 \pi i s_{0} r/N ) \bigr\} ,\\ Y_{r} =& k + 2 \exp(It) \operatorname{Re} \bigl\{ C_{s_{0}} \exp ( i \omega t + 2\pi i {s_{0}} r/{N} )\\ &\phantom{k + 2 \exp(It) \operatorname{Re} \bigl\{ }{} + C_{N-s_{0}} \exp ( - i \omega t - 2\pi i {s_{0}} r/{N} ) \bigr\} . \end{aligned}$$

Here \(\omega\) denotes the imaginary part of \(p_{s_{0}}\). This is the oscillatory case.


These formulas can be interpreted in terms of waves. Due to the relation (9) between \(A_{s_{0}}\) and \(C_{s_{0}}\) the pattern of one morphogen defines the pattern of the other. If \(I>0\), then the pattern becomes more pronounced in time.

In the stationary case there are stationary waves on the ring having \(s_{0}\) minima and \(s_{0}\) maxima. Dividing the circumference \(2\pi\rho\) of the ring of cells by \(s_{0}\) one obtains the wavelength of this pattern.

In the oscillatory case two wave trains are moving around the ring in opposite directions. The wave-frequency is \(\omega/(2\pi)\), and the wave velocity equals the wavelength times the wave-frequency.

The wavelengths of the patterns on the ring depend on the chemical data and on the circumference of the ring. Nevertheless, there exists a purely chemical, i.e. true, wavelength, which is independent of the radius of the ring. It is the limit to which the wavelength converges as the ring grows, i.e. the wavelength giving the largest possible instability \(I\).
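
To make the linear analysis concrete, the following sketch computes the roots of (8) for every mode \(s\) and integrates the linearized ring equations (3), (4) from a small localized perturbation; the fastest-growing Fourier mode of the simulation can then be compared with the mode predicted by the dispersion relation. All parameter values are purely illustrative (they are patterned after case (d) below, with a small instability \(I=0.1\), and \(\mu\), \(\nu\) are used directly as cell-to-cell constants).

```python
import numpy as np

# Ring of N cells, linearized system (3), (4).  The coefficients are
# illustrative only: they follow the pattern of case (d) below with a small
# instability I = 0.1, and mu, nu are used directly as cell-to-cell constants.
N = 40
I0 = 0.1
a, b, c, d = I0 - 2.0, 2.5, -1.25, I0 + 1.5
mu, nu = 1.0, 0.5

# Largest root p_s of the dispersion relation (8) for every mode s.
s = np.arange(N)
k = 4.0 * np.sin(np.pi * s / N) ** 2
B = -((a - mu * k) + (d - nu * k))
C = (a - mu * k) * (d - nu * k) - b * c
p_max = ((-B + np.sqrt(B ** 2 - 4.0 * C + 0j)) / 2.0).real
s_pred = int(np.argmax(p_max))
print("predicted dominant mode:", min(s_pred, N - s_pred),
      " growth rate: %.4f" % p_max[s_pred])

# Integrate (3), (4) by explicit Euler from a localized perturbation,
# so that every Fourier mode starts with the same amplitude.
x = np.zeros(N)
x[0] = 1e-3
y = np.zeros(N)
lap = lambda z: np.roll(z, -1) - 2.0 * z + np.roll(z, 1)   # periodic Laplacian
dt, T = 1e-3, 200.0
for _ in range(int(T / dt)):
    x, y = (x + dt * (a * x + b * y + mu * lap(x)),
            y + dt * (c * x + d * y + nu * lap(y)))

spec = np.abs(np.fft.fft(x))
spec[0] = 0.0                    # ignore the homogeneous mode
s_sim = int(np.argmax(spec))
print("dominant mode in the simulation:", min(s_sim, N - s_sim))
```

For this parameter set the two printed mode numbers should agree, illustrating how the mode with the largest \(\operatorname{Re} p_{s}\) eventually dominates the pattern.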

A number of situations can develop:

  1. (a)

    Stationary case with extreme long wavelength:

    An example for this is \(\mu= \nu= 0.25\), \(b=c=1\), and \(a=d\). Thus both morphogens act in a similar fashion. Here \(p_{s} = a - \sin^{2}( \pi s/N) + 1\) is a real number, which is largest for \(s=0\). Therefore the chemical content of all cells is the same, and each of them behaves as if it were isolated. Each cell is in an unstable equilibrium and slips out of it in synchrony with the other cells.

  2. (b)

    Oscillatory case with extreme long wavelength:

    An example for this is again \(\mu= \nu= 0.25\), \(a = d\), but now \(b = - c = 1\). Then \(p_{s} = a - \sin^{2} (\pi s/N) \pm i\). The real part of this complex number is largest when \(s=0\). The behavior is similar to case (a), but now the departure from equilibrium is oscillatory.

  3. (c)

    Stationary waves of extreme short wavelength:

    An example for this is \(\nu=0\), so there is no exchange of the second morphogen with neighboring cells, and \(\mu= 1\), \(a= I -1\), \(b= - c = 1\), \(d = I\). Then

    $$p_{s} = I - 0.5 - 2 \sin^{2}(\pi s/N) + \sqrt{ \bigl( 2 \sin^{2}(\pi s/N) + 0.5 \bigr)^{2} - 1}, $$

    which is largest when \(\sin^{2}(\pi s/N)\) is largest. If \(N\) is even, then the morphogen content of each cell is similar to that of the next but one cell, but its content is distinctively different from that of its immediate neighbors. For an odd number of cells, this arrangement is not possible. In this case the morphogen concentration varies from zero to a maximum value from one location to the position diametrically opposite to it.

  4. (d)

    Stationary waves of finite wavelength:

    This case has many important biological implications. An example is \(a = I - 2\), \(b = 2.5\), \(c = - 1.25\), \(d= I + 1.5\), \(\tilde{\mu}= 1\), \(\tilde{\nu}= 0.5\) with \({\mu}/{\tilde{\mu}} = {\nu}/{\tilde{\nu}} = ( {N}/{2 \pi\rho} )^{2}\), and \(\rho\) being the radius of the ring of \(N\) cells. Let \(U := [ N \sin(\pi s /N)/(\pi\rho)]^{2}\), then the roots are calculated from

    $$(p-I)^{2} + (0.5 + 1.5 U) (p - I) + 0.5 (U - 0.5)^{2} = 0. $$

    For \(U=0.5\) and the related \(s_{0}\) there are stationary waves with \(s_{0}\) minima, with a wavelength equal to the chemical wavelength, and with \(p_{s_{0}} =I\). For every other \(p_{s}\) we have \(\operatorname{Re} p_{s} < I\). If \(\rho\) is chosen such that \(U = 0.5\) cannot be solved for an integral \(s\), then the number of minima will be one of the two nearest integers, usually the one nearest to the non-integral solution. A numerical check of these dispersion roots is sketched after this list.

    The two following effects can only occur for three or more morphogens.

  5. (e)

    Oscillatory case with finite wavelength:

    Here genuine traveling waves can occur. A system with three morphogens corresponding to the previous considerations leads to a \(3 \times3\) matrix \(M\) with \(\det M = 0\) as the analogue of (8). The off-diagonal components of \(M\) are \(m_{ij} = a_{ij}\), \(i\neq j\), and its diagonal elements are \(m_{ii} = a_{ii} - p - \mu_{i} U\), with \(U = [N \sin(\pi s/N) /(\pi\rho)]^{2}\), respectively \(U= [ 2\pi/ \lambda ]^{2}\), where \(\lambda\) is the wavelength. A system of three linear(ized) equations is considered, analogous to (3), (4) and (12), (13), where the \(m_{ij}\) denote the respective coefficients of the reaction terms, instead of \(a,b,c,d\). Parameter values leading to traveling waves are e.g. \(\mu_{1} = 2/3\), \(\mu_{2} = 1/3\), \(\mu_{3} = 0\), for the diffusivities of the three morphogens, and \(a_{11} = -10/3\), \(a_{12} = a_{31}=3\), \(a_{13}= -1\), \(a_{21} = -2\), \(a_{22} = 7/3\), \(a_{23} = a_{33} = 0\), \(a_{32} = - 4\). Consider

    $$\det M = p^{3} + p^{2}(U+1) + p \bigl[1 + 2(U-1)^{2}/9 \bigr] + U + 1 = 0. $$

    The maximal real part of any root \(p\) of this equation is zero; it is attained for \(p = \pm i\) at \(U=1\). So there do exist traveling waves. Adding \(I\) to the \(a_{ii}\), \(i=1,2,3\), one obtains the instability \(I\) in place of zero.

  6. (f)

    Oscillatory case with extreme short wavelength:

    Metabolic oscillations in which neighboring cells are nearly \(180^{\circ}\) out of phase can be obtained only with three or more morphogens. Example data are \(\mu_{1} = 1\), \(\mu_{2} = \mu_{3} = 0\) for the diffusivities, and \(a_{11} = a_{12} = a_{23} = -1\), \(a_{13} = a_{22} = a_{31} = a_{33} = 0\), \(a_{21} = a_{32}=1\). Consider \(\det M = p^{3} + p^{2}(U+1) + 2p + U + 1 = 0\). For \(U \geq0\) all roots have negative real part. For large \(U\) the real part of \(p\) can approach zero as closely as desired, but never attains zero.
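
The statements of cases (d) and (e) can be checked numerically. The following sketch (with an illustrative instability value \(I=0.2\), otherwise the data given above) sweeps \(U\), solves the dispersion relation (16) for case (d), and evaluates the roots of the cubic of case (e); the maximal real part should come out as \(I\) at \(U=0.5\), respectively as \(0\) at \(U=1\).

```python
import numpy as np

# Case (d): dispersion relation (16) with a = I-2, b = 2.5, c = -1.25,
# d = I+1.5, mu = 1, nu = 0.5; the instability I = 0.2 is illustrative.
I0 = 0.2
a, b, c, d = I0 - 2.0, 2.5, -1.25, I0 + 1.5
mu, nu = 1.0, 0.5
U = np.linspace(0.0, 3.0, 3001)
B = -((a - mu * U) + (d - nu * U))
C = (a - mu * U) * (d - nu * U) - b * c
p = ((-B + np.sqrt(B ** 2 - 4.0 * C + 0j)) / 2.0).real
i = int(np.argmax(p))
print("case (d): max Re p = %.4f at U = %.3f  (expected I at U = 0.5)"
      % (p[i], U[i]))

# Case (e): cubic of the three-morphogen example,
# p^3 + p^2 (U+1) + p [1 + 2(U-1)^2/9] + (U+1) = 0.
best = max((float(np.max(np.roots(
            [1.0, u + 1.0, 1.0 + 2.0 * (u - 1.0) ** 2 / 9.0, u + 1.0]).real)), u)
           for u in U)
print("case (e): max Re p = %.4f at U = %.3f  (expected 0 at U = 1)" % best)
```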

Next some general formulas for the two-morphogen case are presented, which allow one to check directly whether a given parameter set can form patterns. Taking the limiting case of a ring of cells of large radius one may write \(U = [N \sin(\pi s/N) / (\pi\rho) ]^{2} = [ {2 \pi}/{\lambda} ]^{2}= [s/\rho]^{2}\) in (8), (16), and obtains for the continuous ring of tissue that

$$( p - a + \tilde{\mu}U) (p - d + \tilde{\nu}U) = bc, $$

whose solutions are

$$p = \frac{a+d}{2} - \frac{\tilde{\mu}+ \tilde{\nu}}{2} U \pm\sqrt{ \biggl( \frac{\tilde{\mu}- \tilde{\nu}}{2} U + \frac{d - a}{2} \biggr)^{2} + bc }. $$

As before, let \(I=I(U)\) denote the real part of \(p\). The corresponding wavelength for this instability is \(\lambda= {2 \pi}/{\sqrt {U}}\). The dominant waves correspond to the maximum of \(I(U)\), which may either be attained at \(U=0\) or \(U= \infty\), or at a stationary point on that part of the curve which is hyperbolic, not straight. In this latter case one has at the maximum

$$\begin{aligned} p= I =& (d\tilde{\mu}- a\tilde{\nu}- 2 \sqrt{\tilde{\mu}\tilde{\nu}} \sqrt{-bc} ) ( \tilde{\mu}- \tilde{\nu})^{-1}, \\ U =& \bigl( a - d + (\tilde{\mu}+ \tilde{\nu}) \sqrt{-bc} / {\sqrt{\tilde{\mu}\tilde{\nu}}} \bigr) (\tilde{\mu}- \tilde{\nu})^{-1} . \end{aligned}$$

Conditions which then lead to the above mentioned four cases (a), (b), (c), (d) for two morphogens under the assumption that \(\tilde{\nu}\leq\tilde{\mu}\) and \(\tilde{\mu}> 0\) are the following:

  1. (a)

    Stationary waves with extreme long wavelength occur for

    • either \(bc > 0\),

    • or \(bc < 0\) and \(({d-a})/{\sqrt{-bc}} > ({\tilde{\mu}+ \tilde{\nu}})/{\sqrt{\tilde{\mu}\tilde{\nu}}}\),

    • or \(bc <0\) and \(({d-a})/{\sqrt{-bc}} < -2 \).

    Conditions for instability, i.e. structure formation in all three cases are:

    • either \(bc > ad\), or \(a + d > 0\).

  2. (b)

    Oscillations with extreme long wavelength, i.e. synchronized oscillations, occur when \(bc < 0\) and \(- 2 < (d-a)/\sqrt{-bc} < 4 \sqrt{\tilde{\mu}\tilde{\nu}}/(\tilde{\mu}+ \tilde{\nu})\).

    There is an instability, if in addition \(a+d > 0\).

  3. (c)

    Stationary waves of extreme short wave-length occur if \(bc< 0\) and \(0= \tilde{\nu}< \tilde{\mu}\). There is an instability, if in addition \(a+d > 0\).

  4. (d)

    Stationary waves of finite length:

    This case has been cited extensively in the literature since the publication of Turing’s paper and is biologically very relevant.

    It occurs if \(bc < 0\) and \(4 \sqrt{\tilde{\mu}\tilde{\nu}}/(\tilde{\mu}+ \tilde{\nu}) < (d-a) / \sqrt{-bc } < (\tilde{\mu}+ \tilde{\nu}) /\sqrt{\tilde{\mu}\tilde{\nu}}\).

    Due to this chain of inequalities we need \(\tilde{\mu}\neq\tilde{\nu}\).

    An instability occurs if additionally \([d \sqrt{\tilde{\mu}/\tilde{\nu}} - a \sqrt{\tilde{\nu}/\tilde{\mu}} ] / \sqrt{-bc} > 2\).

To obtain a better insight into the dependencies of the involved parameters consider the scaling \(\tilde{\nu}= \gamma\tilde{\mu}\), \(\gamma\in(0,1)\). Then the conditions given above read:

$$\begin{aligned} bc < 0 \quad \mbox{ and}\quad & 4\sqrt{\gamma}/(1+\gamma) < (d-a)/\sqrt{-bc} < (1 +\gamma)/\sqrt{\gamma}\\ \mbox{and}\quad & (d/\sqrt{\gamma} - a \sqrt{\gamma})/\sqrt{-bc} > 2. \end{aligned}$$

Thus for very similar diffusion coefficients, i.e. \(1 \neq\gamma \approx1\), at least one of the reaction rates has to be tuned quite precisely in relation to the other three. Some additional formulas for concrete calculations in this case are

$$\begin{aligned} \textstyle\begin{array}{rcl@{\qquad}rcl} a &=& \tilde{\mu}(\tilde{\nu}- \tilde{\mu})^{-1} ( 2\tilde{\nu}U_{0} + \chi) + I ,& b&=& \tilde{\mu}(\tilde{\nu}- \tilde{\mu})^{-1} \bigl(( \tilde{\mu}+ \tilde{\nu}) U_{0} + \chi\bigr) \alpha,\\ c &=& \tilde{\nu}(\tilde{\mu}- \tilde{\nu})^{-1} \bigl(( \tilde{\mu}+ \tilde{\nu}) U_{0} + \chi\bigr) \alpha^{-1} ,& d &=& \tilde{\nu}(\tilde{\mu}- \tilde{\nu})^{-1} (2 \tilde{\mu}U_{0} + \chi ) + I . \end{array}\displaystyle \end{aligned}$$

Here \(2\pi/\sqrt{U_{0}}\) is the chemical wavelength and \(\chi> 0\), \(\alpha\) are parameters.
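
As a plausibility check of this parameterization, the following sketch (all numerical values arbitrary illustrations, with \(\tilde{\mu} > \tilde{\nu}\) written as mu, nu) builds \(a,b,c,d\) from the formulas above for a prescribed \(U_{0}\), \(\chi\), \(\alpha\) and \(I\), verifies that the dispersion relation (16) attains its maximal real part \(I\) at \(U = U_{0}\), and tests the conditions of case (d).

```python
import numpy as np

# Build a, b, c, d from the parameterization above for a prescribed chemical
# wavenumber U0, instability I, and parameters chi, alpha; mu > nu stand for
# the diffusivities mu~, nu~.  All numerical values are illustrative.
mu, nu = 1.0, 0.5
U0, chi, alpha, I0 = 2.0, 0.3, 1.5, 0.2

a = mu / (nu - mu) * (2.0 * nu * U0 + chi) + I0
b = mu / (nu - mu) * ((mu + nu) * U0 + chi) * alpha
c = nu / (mu - nu) * ((mu + nu) * U0 + chi) / alpha
d = nu / (mu - nu) * (2.0 * mu * U0 + chi) + I0

# The dispersion relation (16) should peak at U = U0 with maximal value I.
U = np.linspace(0.0, 5.0, 5001)
B = -((a - mu * U) + (d - nu * U))
C = (a - mu * U) * (d - nu * U) - b * c
p = ((-B + np.sqrt(B ** 2 - 4.0 * C + 0j)) / 2.0).real
i = int(np.argmax(p))
print("max Re p = %.4f at U = %.3f  (expected I = %.1f at U0 = %.1f)"
      % (p[i], U[i], I0, U0))

# Conditions for case (d), stationary waves of finite wavelength.
r = (d - a) / np.sqrt(-b * c)
print("bc < 0:", b * c < 0.0)
print("lower bound < (d-a)/sqrt(-bc) < upper bound:",
      4.0 * np.sqrt(mu * nu) / (mu + nu) < r < (mu + nu) / np.sqrt(mu * nu))
print("instability condition:",
      (d * np.sqrt(mu / nu) - a * np.sqrt(nu / mu)) / np.sqrt(-b * c) > 2.0)
```

All three condition checks should print True, and the maximum should be found at \(U_{0}\), corresponding to the chemical wavelength \(2\pi/\sqrt{U_{0}}\).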

So far the naturally occurring disturbances were assumed not to operate continuously, and the marginal reaction rates were assumed not to change in time. Dropping these assumptions, and instead letting the statistical amplitude of the noise disturbances be constant in time, one obtains instead of (6), (7) that

$$\begin{aligned} \frac{d\xi}{dt} = \tilde{a} \xi+ b \eta+ R_{1}(t), \qquad \frac{d \eta}{dt} = c \xi+ \tilde{d} \eta+ R_{2}(t), \end{aligned}$$

where now \(\xi, \eta\) are written in place of \(\xi_{s}, \eta_{s}\), since \(s\) is fixed, and where \(\tilde{a} = a - 4 \mu\sin^{2}(\pi s/ N)\), \(\tilde{d} = d - 4 \nu\sin^{2}(\pi s/N)\). The disturbances \(R_{1}\), \(R_{2}\) are assumed to be white noise. Now one introduces new variables \(u,v\) by means of

$$\xi= b (u+v), \qquad\eta= (p - \tilde{a}) u + (\tilde{p} - \tilde{a}) v . $$

Here \(p > \tilde{p}\) denote the roots of \((p- \tilde{a}) (p -\tilde{d}) = bc\). Assume that \(v= 0\), since it is small in comparison with \(u\) due to \(dv/dt \approx\tilde{p} v\). This results from just considering leading order terms. Then one obtains after some calculations that

$$\begin{aligned} &u = \int_{-\infty}^{t} \bigl[L_{1}(w) R_{1}(w) + L_{2}(w) R_{2}(w) \bigr] \exp \biggl( \int_{w}^{t} q(z) dz \biggr) dw ,\\ &\quad\mbox{with } L_{1}(t) = ({\tilde{p} - \tilde{a}})/ \bigl[{b(\tilde{p} - p)}\bigr],\ L_{2}(t) = 1/(\tilde{p} - p), \ q = p + bL_{1}'(t). \end{aligned}$$

The interest is not so much in such solutions per se, but rather in the statistical distribution of the values of \(u, \xi, \eta\) at various time instances after the instability has set in. Since we have assumed white noise acting, \(u\) at time \(t\) is distributed according to the normal error law with variance

$$\int_{-\infty}^{t} \bigl[\beta_{1} L_{1}^{2}(w) + \beta_{2} L_{2}^{2}(w) \bigr] \exp \biggl( 2 \int_{w}^{t} q(z) dz \biggr) dw, $$

where the constants \(\beta_{1}\), \(\beta_{2}\) describe the amplitude of the noise disturbances.

If the system is in a distinctly stable state, then \(q(t)\), which is close to \(p(t)\), is distinctly negative and the above variance of \(u\) can be approximated by \([\beta_{1} L_{1}^{2}(t) + \beta_{2} L_{2}^{2}(t) ] [- 2 q(t)]^{-1} \).
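
For constant coefficients this approximation is easy to test: \(u\) then obeys a linear equation with a constant, negative \(q\), driven by white noise, and an Euler-Maruyama simulation can be compared with the variance formula just stated. The following sketch uses hypothetical values for \(q\), \(L_{1}\), \(L_{2}\), \(\beta_{1}\), \(\beta_{2}\).

```python
import numpy as np

# Stable case with constant coefficients: u obeys du/dt = q u + L1 R1 + L2 R2
# with q < 0 and white noise of intensities beta1, beta2.  The stationary
# variance should approach (beta1 L1^2 + beta2 L2^2) / (-2 q).
# All numbers are hypothetical.
rng = np.random.default_rng(1)
q, L1, L2 = -0.8, 0.6, 1.1
beta1, beta2 = 0.05, 0.02
sigma2 = beta1 * L1 ** 2 + beta2 * L2 ** 2      # combined noise intensity

dt, n_steps = 1e-3, 2_000_000
u = 0.0
samples = np.empty(n_steps)
kicks = rng.standard_normal(n_steps) * np.sqrt(sigma2 * dt)
for n in range(n_steps):
    u += dt * q * u + kicks[n]                  # Euler-Maruyama step
    samples[n] = u

burn = n_steps // 10                            # discard the transient
print("simulated variance :", float(samples[burn:].var()))
print("predicted variance :", sigma2 / (-2.0 * q))
```

The two printed numbers should agree to within a few percent for a sufficiently long run.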

If the system is unstable, so \(q(t) > 0\), then the variance of \(u\) can—after some calculations and suitable assumptions—be approximated by

$$\sqrt{\pi}\bigl[\beta_{1} L_{1}^{2}(t_{0}) + \beta_{2} L_{2}^{2}(t_{0}) \bigr] \exp \biggl( 2 \int_{t_{0}}^{t} q(z) dz \biggr) \Big/ \sqrt{q'(t_{0})}. $$

Here \(t_{0}\) denotes the last time instance when \(q(t) = 0\). Thus disturbances near the time when the instability is zero are the only ones which have a reasonable ultimate effect. Those which occur earlier are damped out by the subsequent period of stability. Those which occur later have a shorter period of instability within which to develop to a greater amplitude. The factor \(\exp ( \int_{t_{0}}^{t} q(z) dz )\) is essentially the integrated instability and describes the extent to which one would expect disturbances of an appropriate wavelength to grow between the time instances \(t_{0}\) and \(t\). The factor \(\sqrt{\pi}\beta_{1} L_{1}^{2}(t_{0}) / \sqrt{q'(t_{0})} \) indicates that the disturbances on the first morphogen should be regarded as lasting for a time \(\sqrt{\pi}(b L_{1}(t_{0}) )^{2}/ \sqrt{q'(t_{0})} \).

In his paper Turing also briefly refers to the situation of non-linear reaction rates and gives some qualitative conclusions. He completes his analysis with some practical computations and numerical examples. A biological interpretation of the obtained mathematical results follows. The effects within the ring of cells and within the continuous ring of tissue are extremely similar. The main focus in [15] is on the situation when the reaction-diffusion systems describing the dynamics of the morphogens are just unstable. A strong assumption was the linearity of the reaction rates. The patterns appearing are best described in terms of waves. The situation when there is an instability for an isolated cell, compare case (a), might account for dappled color patterns. If so, then these patterns must be laid down in a latent form when the developing foetus is only a few inches long. There may not be any similarity between the contents of cells which are far apart. Case (b) is similar to case (a) and might account for metabolic oscillations. In case (c) there is a drift from equilibrium which is in opposite directions in neighboring cells. In the biologically most interesting case (d) the peaks of the waves are uniformly spaced around the ring of cells. The number of peaks corresponds to the circumference of the ring of tissue divided by the chemical wavelength. With at least three morphogens it is possible to obtain traveling waves. For all these patterns to occur the range of suitable reaction rates is rather general.

Biological examples and a critical discussion of the relevance of the mathematical model in specific situations follow in the original paper. Further, hints on how to choose biological questions in a sensible way in order to pave the way for a possible detection of fundamental mechanisms in biology are given. After that, chemical waves on the sphere are discussed in the context of gastrulation. Spherical surface harmonics play a fundamental role here. The breakdown of homogeneity and the onset of pattern formation in this case is axially symmetric about a new axis, which is determined by the specificities of the disturbing influences.

At the end of his paper Turing advocates the use of computers, stresses the role of mechanics in developmental biology and the importance of analysing non-linear problems. By intentionally structuring the chapters of his paper in different ways, he made his results comprehensible for a broad readership with varying backgrounds of knowledge. In [15, Chap. 4] he gives a brilliant account of the idea of a bifurcation parameter in common language. His article comprises many more interesting points than those which are commonly cited nowadays. Modestly Turing ends his seminal paper:

“It must be admitted that the biological examples which it has been possible to give in the present paper are very limited. … Taking this in combination with the relatively elementary mathematics used in this paper one could hardly expect to find that many observed biological phenomena would be covered. It is thought, however, that the imaginary biological systems which have been treated, and the principles which have been discussed, should be of some help in interpreting real biological forms.”

Due to his untimely death Turing could not finish his manuscript on “Morphogen Theory of Phyllotaxis” [16], which clearly indicates how he had intended to proceed.

2 Hodgkin-Huxley

Another classical and important example of mathematical modeling in the life-sciences was published at around the same time as Turing’s paper. Hodgkin and Huxley—who received the Nobel Prize in 1963—concluded in [7], A Quantitative Description of Membrane Current and its Application to Conduction and Excitation in Nerve, a series of their previous papers. Some of their results were obtained together with their colleague Katz, who received the Nobel Prize in 1970. Their paper describes experimental results, mathematical modeling and numerical analysis of the flow of electric current through the surface membrane of a giant nerve fibre, the giant axon of the squid.

Data of voltage clamp experiments were used as the basis for a mathematical description of the changes in sodium and potassium conductance associated with an alteration of the membrane potential. Once the parameters in the mathematical model were fixed, this theoretical approach was used to predict the behavior of an exemplary nerve fibre of the squid Loligo pealeii under a variety of new experimental settings. Astonishingly good agreement of the mathematical model with the experimental results was obtained. The mathematical results even accounted for conduction and excitation in quantitative terms.

At rest the membrane of the axon is polarized, i.e. its electrical potential is non-zero. Depending on the intra- and extracellular ionic concentrations and their intrinsic charges, both sodium and potassium have an equilibrium potential. If the membrane potential equals this equilibrium or reversal potential, then no net movement of the respective ions occurs. If the potentials differ, then the respective ions flow out of or into the cell. Specific channels within the membrane conduct either sodium \((Na^{+})\) or potassium \((K^{+})\). When the membrane potential exceeds a threshold, the sodium channels open, and the membrane’s sodium conductance rises. Due to the characteristics of the sodium equilibrium potential, sodium flows inward, and the membrane potential rises further, such that the potassium channels open as well. Potassium then flows outward, according to the characteristics of its own equilibrium potential, and the membrane potential drops back down. This process generates a spike, whose rising phase is caused by the sodium influx and whose falling phase is caused by the potassium efflux. Therefore Hodgkin and Huxley borrowed from the theory of electrical circuits to model their experimental findings mathematically.


The assumptions for their system of equations are based on a number of initial voltage clamp experiments, i.e. measurements of ionic currents while holding the membrane voltage at a set level. These assumptions are the following:

The most relevant ionic currents within the nerve, sodium and potassium, are denoted by \(I_{Na}\) and \(I_{K}\). Further, a small leakage current \(I_{l}\) is introduced, in order to take removals from the system into account. This leakage current is made up of chloride and other ions and is not further specified in the mathematical model. Ionic currents can be carried through the membrane of the giant nerve fibre either by charging the membrane capacitance or by the movement of ions. The larger the capacitance, i.e. the ability to store electrical charge, the more electrical charge can be held at a given voltage. For each component of the ionic current the electrical potential difference and a permeability coefficient can be measured. Thus one obtains

$$I_{Na} = g_{Na} (E - E_{Na}), $$

i.e. the sodium current \(I_{Na}\) equals the sodium conductance \(g_{Na}\), respectively the permeability of the membrane for sodium, times the difference between the membrane potential \(E\) and the equilibrium potential \(E_{Na}\) for the sodium ions. Similar equations are considered for potassium and the leakage current.

The thickness and composition of the excitable membrane were unknown at the time, and the nature of the molecular events underlying changes in the permeability of the membrane could not be measured experimentally. Thus the idea was to develop biological hypotheses for the functioning of the experimental system with the help of a mathematical model. With their model Hodgkin and Huxley analyzed which of the developed theories were consistent with concrete experiments and which theories could be excluded by them. This is an important conceptual ansatz. Especially in situations where direct experimental measurements cannot yet be done, mathematical modeling can give important new insights, also for the planning of new experiments.

Hodgkin and Huxley divided the total membrane current density \(I\) into the contribution of the membrane capacity current and the ionic current density \(I_{i}\), which are assumed to be in parallel

$$\begin{aligned} I = C_{M} \frac{dV}{dt} + I_{i}. \end{aligned}$$
(17)

Here \(V\) denotes the displacement of the membrane potential from its resting value and \(C_{M}\) the membrane capacity per unit area, which is assumed to be constant. For an inward current, \(I\) is positive, and \(V\) is negative for depolarization. Equation (17) neglects, though, dielectric losses within the membrane. Further, the ionic current density is split into

$$I_{i} = I_{Na} + I_{K} + I_{l}, $$

namely the components carried by sodium ions, potassium ions and others. For the individual ionic currents it is assumed that

$$\begin{aligned} I_{Na} =& g_{Na} (E - E_{Na}) = g_{Na}(V - V_{Na}) , \end{aligned}$$
(18)
$$\begin{aligned} I_{K} =& g_{K} (E- E_{K}) = g_{K} (V - V_{K}), \end{aligned}$$
(19)
$$\begin{aligned} I_{l} =& \bar{g}_{l} (E - E_{l})= \bar{g}_{l} (V - V_{l}), \end{aligned}$$
(20)

where \(E_{Na}\), \(E_{K}\) are the equilibrium potentials for the sodium and potassium ions, and \(E_{l}\) is the potential at which the leakage current due to chloride and other ions is zero. Further \(V= E- E_{r}\), \(V_{Na} = E_{Na} - E_{r}\), \(V_{K} = E_{K} - E_{r}\), \(V_{l} = E_{l} - E_{r}\), where \(E_{r}\) is the absolute value of the resting potential. So \(V, V_{Na}, V_{K}, V_{l}\) can be measured directly as displacements from the resting potential. It is assumed that

$$\begin{aligned} g_{K}= \bar{g}_{K} n^{4} , \qquad \frac{dn}{dt} = \alpha_{n} (1-n) - \beta_{n} n. \end{aligned}$$
(21)

Here the constant \(\bar{g}_{K}\) has the dimension of the \(\mbox{conductance/cm}^{2}\); \(\alpha_{n}, \beta_{n}\) are rate constants, which vary with voltage but not with time and have dimensions of 1/time. Further, the dimensionless variable \(n\) varies between 0 and 1, and represents the proportion of particles, for example at the inside of the membrane, and \((1-n)\) represents the proportion elsewhere, e.g. at the outside of the membrane. In this case \(\alpha_{n}\) determines the rate of transfer of particles from the outside of the membrane to the inside, while \(\beta_{n}\) determines the transfer in the opposite direction.

The assumption of the fourth power of \(n\) in (21) was made simply because it results in the best fit of the solutions of the mathematical model to the experimental results. Nevertheless, the ansatz can be interpreted as follows: the potassium ions can only cross the membrane when four similar particles occupy a certain region of the membrane. If a particle has negative charge, the respective \(\alpha_{n}\) should increase and \(\beta_{n}\) should decrease when the membrane is depolarized. In the resting state \(V=0\) the resting value for \(n\) is \(n_{0} = \alpha_{n_{0}}/( \alpha_{n_{0}} + \beta_{n_{0}})\). With this condition the solution of the \(n\)-equation with initial value \(n(0) = n_{0}\) is

$$\begin{aligned} &n = n_{\infty}- (n_{\infty}- n_{0})\exp(-t/ \tau_{n}), \quad\mbox{where } n_{\infty}= \alpha_{n}/( \alpha_{n} + \beta_{n}), \, \tau_{n} = 1/( \alpha_{n} + \beta_{n}),\\ &\quad\mbox{and} \quad g_{K} = \bigl( g_{K_{\infty}}^{1/4} - \bigl[ g_{K_{\infty}}^{1/4} - g_{K_{0}}^{1/4} \bigr] \exp(-t/\tau_{n}) \bigr)^{4} . \end{aligned}$$

Here \(g_{K_{\infty}}\) is the value finally attained by the potassium conductance, and \(g_{K_{0}}\) is the potassium conductance at \(t=0\). The agreement with the experimental results turned out to be very good. Only a stronger initial delay could be observed experimentally.

In order to find suitable functions which relate \(\alpha_{n}\) and \(\beta_{n}\) to the membrane potential, all measurements were plotted against \(V\). The following continuous curves then gave a good fit for the obtained experimental data

$$\alpha_{n} = 0.01 (V + 10) \bigl[ \exp \bigl(0.1 (V+10) \bigr) -1 \bigr]^{-1}, \qquad\beta_{n} = 0.125 \exp(V/80) . $$

These equations can be given a qualitative physical basis if one supposes that the variation of \(\alpha_{n}\) and \(\beta_{n}\) with the membrane potential arises from the effect of the electric field on the movement of a negatively charged particle, which rests on the outside of the membrane when \(V\) is large and positive, and on the inside of the membrane when \(V\) is large and negative. It is speculated that some asymmetry in the structure of the membrane may be responsible for the asymmetry in \(\alpha_{n}\) and \(\beta_{n}\).
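
With these fitted rate functions the voltage clamp behavior of the potassium conductance can be reproduced directly from the closed-form solution given above. In the following sketch the maximal conductance \(\bar{g}_{K}=36~\mbox{m}\,\mbox{mho/cm}^{2}\) is the value used in [7], while the clamp step to \(V=-25~\mbox{mV}\) (a depolarization of 25 mV in the sign convention of this section) is an arbitrary illustrative choice.

```python
import numpy as np

def alpha_n(V):
    return 0.01 * (V + 10.0) / (np.exp(0.1 * (V + 10.0)) - 1.0)

def beta_n(V):
    return 0.125 * np.exp(V / 80.0)

g_K_bar = 36.0        # maximal potassium conductance in m.mho/cm^2, as in [7]

# Resting state V = 0 fixes n0; a clamp step to V = -25 mV (a depolarization
# of 25 mV in the sign convention used here) fixes n_inf and tau_n.
n0 = alpha_n(0.0) / (alpha_n(0.0) + beta_n(0.0))
V_clamp = -25.0
n_inf = alpha_n(V_clamp) / (alpha_n(V_clamp) + beta_n(V_clamp))
tau_n = 1.0 / (alpha_n(V_clamp) + beta_n(V_clamp))

g_K0, g_Kinf = g_K_bar * n0 ** 4, g_K_bar * n_inf ** 4
for t in np.linspace(0.0, 10.0, 6):             # time in msec
    g_K = (g_Kinf ** 0.25
           - (g_Kinf ** 0.25 - g_K0 ** 0.25) * np.exp(-t / tau_n)) ** 4
    print("t = %4.1f ms   g_K = %6.3f m.mho/cm^2" % (t, g_K))
```

The printed values rise with an inflected, S-shaped onset from the small resting conductance towards \(g_{K_{\infty}}\) with the time constant \(\tau_{n}\), which is the behavior fitted to the clamp records.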

For the sodium conductance the following formal assumptions are made:

$$g_{N_{a}} = \bar{g}_{N_{a}} m^{3} h , \qquad \frac{dm}{dt} = \alpha_{m} (1-m) - \beta_{m} m , \qquad \frac{dh}{dt} = \alpha_{h} (1- h) - \beta_{h} h , $$

where \(\bar{g}_{N_{a}}\) is constant and \(\alpha_{m}, \beta_{m}, \alpha_{h}, \beta_{h}\) only depend on \(V\). A physical basis for these equations can be given by an argument as follows: the sodium conductance is assumed to be proportional to the number of sites on the inside of the membrane which are occupied simultaneously by three activating molecules but are not blocked by an inactivating molecule. Here \(m\) represents the proportion of activating molecules on the inside of the membrane, and \((1-m)\) their proportion on the outside of the membrane. Further \(h\) represents the proportion of inactivating molecules on the outside of the membrane and \((1-h)\) their proportion on the inside. Finally \(\alpha_{m}\) or \(\beta_{h}\) and \(\beta_{m}\) or \(\alpha_{h}\) represent the transfer rate constants in the two directions. Solutions of these equations with \(m(0) = m_{0}\), \(h(0)= h_{0}\) are

$$\begin{aligned} m =& m_{\infty}- (m_{\infty}- m_{0}) \exp( - t/ \tau_{m})\qquad h = h_{\infty}- (h_{\infty}- h_{0}) \exp(- t/ \tau_{h}), \end{aligned}$$
(22)

where \(m_{\infty}= \alpha_{m}/(\alpha_{m} + \beta_{m})\), \(\tau_{m} = 1/(\alpha_{m} + \beta_{m})\), \(h_{\infty}= \alpha_{h} / (\alpha_{h} + \beta_{h})\), and \(\tau_{h} = 1/(\alpha_{h} + \beta_{h}) \). In the resting state the sodium conductance is very small when compared with the value attained during a large depolarization. Therefore it is assumed that \(m_{0} = 0\) for a depolarization greater than 30 mV. Further, inactivation is nearly complete for \(V < -30~\mbox{mV}\), so that then also \(h_{\infty}\) is neglected. So the expression for the sodium conductance becomes

$$g_{N_{a}} = \tilde{g}_{N_{a}} \bigl[ 1 - \exp(-t/ \tau_{m})\bigr]^{3} \exp(-t/\tau_{h}), $$

where \(\tilde{g}_{N_{a}} = \bar{g}_{N_{a}} m_{\infty}^{3} h_{0}\) is the value which the sodium conductance would attain if \(h\) would remain at its resting level \(h_{0}\). This equation was fitted to experimental curves by comparing different ratios of \(\tau_{m}\) to \(\tau_{h}\). The rate constants \(\alpha_{m}\), \(\beta_{m}\) were then obtained in a similar manner as \(\alpha_{n}\), \(\beta_{n}\) before:

$$\alpha_{m} = 0.1 (V + 25) \bigl[\exp \bigl( 0.1 (V + 25) \bigr) -1 \bigr]^{-1} , \qquad\beta_{m} = 4 \exp(V/18) , $$

and analogously

$$\alpha_{h} = 0.07 \exp(V/20) , \qquad \beta_{h} = \bigl[ \exp \bigl( 0.1 (V+30) \bigr) + 1 \bigr]^{-1}. $$

One of the most striking properties of the membrane is the extreme steepness of the relation between ionic conductance and membrane potential. The possible meaning of this result can be illustrated as follows:

Suppose that a charged molecule with some affinity for sodium rests either on the inside or the outside of the membrane, and is present in negligible concentrations elsewhere. Let the sodium conductance be proportional to the number of such molecules on the inside of the membrane but be independent of their number at the outside. Boltzmann’s principle then relates the proportion \(P_{i}\) of the molecules on the inside of the membrane to the ones on the outside \(P_{o}\), by

$$P_{i}/P_{o} = \exp\bigl[(\omega+ z e E)/(kT)\bigr], \quad P_{i} + P_{o} = 1 , $$

where \(E\) is the potential difference between the outside and the inside of the membrane, \(\omega\) is the work required to move a molecule from the inside to the outside of the membrane when \(E=0\), \(e\) is the absolute value of the electronic charge, \(z\) is the number of positive electronic charges on the molecules, \(k\) is the Boltzmann constant, and \(T\) is the absolute temperature.

Then \(P_{i} = ( 1 + \exp [ -(\omega+ z e E ) / ( kT) ] )^{-1}\). If \(V > - 30~\mbox{mV}\), then \(h_{\infty}\) is of a similar form, namely \(h_{\infty}= (1 + \exp[(V_{h} -V)/7])^{-1}\). This is consistent with the suggestion that inactivation might be due to the movement of a negatively charged particle which blocks the flow of sodium ions when it reaches the inside of the membrane. This is quite interesting, but on the other hand still some further ad hoc assumptions have to be made in order to obtain satisfactory functions \(\alpha_{h}, \beta_{h}\) which fit to the experiments. So altogether the following system of equations results

$$\begin{aligned} \begin{aligned} I&= C_{M} \frac{dV}{dt} + \bar{g}_{K} n^{4} (V - V_{K}) + \bar{g}_{N_{a}} m^{3} h (V - V_{N_{a}}) + \bar{g}_{l} (V - V_{l}) \\ \frac{dn}{dt} &= \alpha_{n} (1-n) - \beta_{n} n ,\quad \frac{dm}{dt} = \alpha_{m} (1-m) - \beta_{m} m,\quad \frac{dh}{dt} = \alpha_{h} (1-h) - \beta_{h} h \\ \alpha_{n} &= 0.01 (V + 10) \bigl[ \exp\bigl(0.1 (V + 10)\bigr) - 1 \bigr]^{-1} , \qquad \beta_{n} = 0.125 \exp(V/80) \\ \alpha_{m} &= 0.1 (V + 25) \bigl[ \exp\bigl(0.1 (V + 25)\bigr) -1 \bigr]^{-1} , \qquad\beta_{m} = 4 \exp(V/18)\\ \alpha_{h} &= 0.07 \exp(V/20) , \qquad \beta_{h} = \bigl[ \exp\bigl(0.1 (V+30)\bigr) +1 \bigr]^{-1} . \end{aligned} \end{aligned}$$
(23)

The \(\alpha\)'s and \(\beta\)'s are appropriate for a temperature of \(6.3^{\circ}\mbox{C}\). Potentials are given in mV, current densities in \(\upmu \mbox{A/cm}^{2}\), conductances in \(\mbox{m}\,\mbox{mho/cm}^{2}\), capacity in \(\upmu \mbox{F /cm}^{2}\), and time in msec. Here V denotes volt, A ampere, mho Siemens, and F farad. There is one important quantitative difference between the solutions of the mathematical model and the experimental data, namely that the theoretical current has too little delay at the sodium potential. Therefore the equations do not fully account for the delay in the rise of \(g_{K}\). Otherwise the comparison between experiment and theory is astonishingly good.
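
A direct numerical integration of the space-clamped system (23) with \(I=0\), started from a displaced membrane potential, reproduces an action potential. The rate functions below are those of (23); since the capacity, the maximal conductances and the equilibrium potentials are not listed in this section, the sketch uses the values commonly quoted for the original model in [7], and the initial displacement of \(-15~\mbox{mV}\) is an illustrative suprathreshold choice.

```python
import numpy as np

# Space-clamped system (23) with I = 0, integrated by explicit Euler.
# The rate functions are those given in (23).  The constants C_M, the maximal
# conductances and the equilibrium potentials are the values commonly quoted
# for the original model in [7] (uF/cm^2, m.mho/cm^2 and mV, depolarization
# corresponding to negative V).
C_M = 1.0
g_Na, g_K, g_l = 120.0, 36.0, 0.3
V_Na, V_K, V_l = -115.0, 12.0, -10.613

def exprel(x):                         # x / (exp(x) - 1), finite at x = 0
    return 1.0 - x / 2.0 if abs(x) < 1e-7 else x / np.expm1(x)

alpha_n = lambda V: 0.1 * exprel(0.1 * (V + 10.0))
beta_n  = lambda V: 0.125 * np.exp(V / 80.0)
alpha_m = lambda V: exprel(0.1 * (V + 25.0))
beta_m  = lambda V: 4.0 * np.exp(V / 18.0)
alpha_h = lambda V: 0.07 * np.exp(V / 20.0)
beta_h  = lambda V: 1.0 / (np.exp(0.1 * (V + 30.0)) + 1.0)

# Gating variables start at their resting values (V = 0); the membrane
# potential starts from an illustrative suprathreshold displacement of -15 mV.
n = alpha_n(0.0) / (alpha_n(0.0) + beta_n(0.0))
m = alpha_m(0.0) / (alpha_m(0.0) + beta_m(0.0))
h = alpha_h(0.0) / (alpha_h(0.0) + beta_h(0.0))
V = -15.0

dt, T = 0.0025, 20.0                   # msec
trace = []
for _ in range(int(T / dt)):
    I_ion = (g_K * n ** 4 * (V - V_K) + g_Na * m ** 3 * h * (V - V_Na)
             + g_l * (V - V_l))
    V += dt * (-I_ion / C_M)           # membrane action potential: I = 0
    n += dt * (alpha_n(V) * (1.0 - n) - beta_n(V) * n)
    m += dt * (alpha_m(V) * (1.0 - m) - beta_m(V) * m)
    h += dt * (alpha_h(V) * (1.0 - h) - beta_h(V) * h)
    trace.append(V)

peak = min(trace)
print("peak depolarization: %.1f mV at t = %.2f ms"
      % (peak, dt * (trace.index(peak) + 1)))
```

The run should produce a single spike whose peak approaches the sodium potential, followed by a return to rest; reducing the initial displacement to a subthreshold value should abolish the spike.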

For an action potential which propagates in space (a propagated action potential), one has for the membrane current \(i\) per unit length that

$$i = \biggl( \frac{1}{r_{1} + r_{2}} \biggr) \frac{\partial^{2} V}{\partial x^{2}}, $$

where \(r_{1}, r_{2}\) are the external and internal resistances per unit length, and \(x\) is the distance along the fibre. For an axon surrounded by a large volume of conducting fluid, \(r_{1}\) is negligible, thus

$$i= \frac{1}{r_{2}} \frac{\partial^{2} V}{\partial x^{2}} , \qquad \mbox{respectively } I= \frac{a}{2 R_{2}} \frac{\partial^{2} V}{\partial x^{2}}, $$

where \(I\) is the membrane current density, \(a\) is the radius of the fibre and \(R_{2}\) is the specific resistance of the axoplasm. Inserting this formula into (23) one obtains

$$\frac{a}{2 R_{2}} \frac{\partial^{2} V}{\partial x^{2}} = C_{M} \frac{\partial V}{\partial t} + \bar{g}_{K} n^{4} (V- V_{K}) + \bar{g}_{N_{a}} m^{3} h (V- V_{N_{a}}) + \bar{g}_{l} (V - V_{l}). $$

The other equations of the system remain unchanged for this case.

The mathematical model was numerically approximated and extensively studied by Hodgkin and Huxley in [7]. It predicted with fair accuracy many of the electrical properties of the squid giant axon, including many phenomena of excitation, like anode break excitation and accommodation. Interestingly, the mathematical model was tested, without further adjustment, against many follow-up experiments. Not only qualitatively but also quantitatively the model reflects very well:

  • the form, duration, amplitude and threshold of an action potential under zero membrane current at two temperatures,

  • the form, duration, amplitude and velocity of a propagated action potential,

  • the form and amplitude of the impedance changes associated with an action potential,

  • the total inward movement of sodium ions as well as the total outward movement of potassium ions associated with an impulse,

  • the threshold and response during the refractory period,

  • the existence and form of subthreshold responses,

  • the existence and form of an anode break response,

  • and the properties of the subthreshold oscillations,

as can be observed in cephalopod axons. The theory also predicts that a direct current does not excite if it rises too slowly.

The constants used in the mathematical model were entirely derived from voltage clamp records, without any adjustment to make them fit the phenomena they were subsequently applied to. Therefore, the qualitative comparison of the model with new experiments is quite satisfactory. This agreement, as Hodgkin and Huxley pointed out in their paper, does of course not necessarily mean that the model is more than an empirical description of the dynamics of sodium and potassium. They were aware that in principle very different mathematical systems could give equally good results. The system they discuss takes into account a specific mechanism for permeability changes. Interestingly, this fairly simple change in response to alterations in membrane potential is a sufficient explanation for a wide range of experimental phenomena.

Finally Hodgkin and Huxley critically discuss their results. The solutions of the mathematical model cover the short-term responses of the membrane and apply to experiments with an isolated squid giant axon. Additional processes have to be taken into account for a nerve in a living animal, in order to maintain the ionic gradients and not let them run down, as happens in the mathematical model for the isolated squid giant axon. Also, the equations do not account for after-potentials. Easy modifications of the model could account for increases of the resting potential. This could be achieved by reducing the leak conductance and adding a small outward current, which represents the metabolic extrusion of sodium ions. Upon changes of the concentrations of sodium and potassium the resting potential and the action potential of many excitable tissues behave qualitatively similarly. There is, though, a large difference in the exact shape of the action potentials, so at least the parameters in the model have to be different in other applications.

In a few aspects the behavior of the mathematical model does not agree with the experimental results. One assumption was that the membrane capacity behaves like a perfect condenser/capacitor, i.e. approximately like two electrical conductors/plates separated by an insulator that can store energy by becoming polarized. This simplification may account for the fact that the initial fall in potential after application of a short shock is much less marked in the model than in the experimental curves.

Another aspect is that the potassium conductance in the model does not show as much delay in its rise upon depolarization, e.g. to the sodium potential, as was observed in the voltage clamp experiments. Therefore the falling phase of the spike develops too early in the mathematical model.

Further, compared with the experiment, the model showed too large an exchange of internal and external potassium ions per impulse.

The theory strongly supports the idea that the responses of an isolated giant axon of Loligo to electrical stimuli are due to reversible alterations in sodium and potassium permeability which arise from changes in membrane potential.

The impact of the results by Hodgkin, Huxley, and Katz in the decades after their findings was very strong. Hodgkin and Huxley discovered the voltage-dependent ion channels by their combined approach of experimental and theoretical analysis. The fourth power they assumed in their model for \(n\) is interesting; it simply is the minimal power giving the best fits for the data obtained from a series of experiments. One of the seminal contributions of Hodgkin and Huxley was the notion that sodium channels transit between various conformational states in the process of opening. Yet another set of conformations is entered when the channels shut during maintained depolarization. It is known today that the potassium channels indeed have a tetrameric structure. Four homologous subunits change conformation as these channels open.

3 Chemotaxis and Self-Organization

Chemotaxis is a phenomenon which can be observed in a variety of developmental processes, in microorganisms as well as in certain cells of vertebrates. It describes the directed movement of the respective species towards higher concentrations of a chemical signal. This process is very common and important for the correct positioning and relocation of cells during structure formation of cell populations or within tissues, e.g. during embryogenesis. Cells can also react negatively, by moving away from a signal source. In this note, for convenience, only positive chemotaxis is considered.

One of the model organisms in biology for development, and especially also for chemotaxis, is the slime mold amoeba Dictyostelium discoideum, Dd. Under starvation conditions these amoebae produce an attracting chemical messenger, called acrasin. Such self-production is not always the case in chemotaxis; the attractive signal can also arise from external sources. In some amoeboid species the acrasin is degraded by an enzyme called acrasinase, which occurs both bound to the cell membrane and free.

Under starvation conditions Dd aggregates chemotactically and then coordinates further to develop into structures, which are termed fruiting bodies, in order to allow part of the population to survive in a vegetative state. This structure forming process requires the death of about 90 percent of the amoebae for the survival of about 10 percent. Aggregation phenomena, in which spatially separated cells first form a multicellular group and then differentiate, are ideal model systems for the understanding of the interactions between cells during morphogenesis.

An important question in the context of self-organization of Dd is whether chemotaxis is a main driving mechanism, not only for the movement of the amoebae towards higher concentrations of acrasin, but also for aggregation, mound formation and further development in order to survive.


In [12] Keller and Segel proposed a mathematical model for chemotaxis of Dd which takes into account

  • the acrasin of concentration \(v\), which is produced by the amoebae at a rate \(f(v)\) per cell,

  • the acrasinase of concentration \(\eta\), which is produced by the amoebae at a rate \(g(v, \eta)\) per cell,

  • the amoebae, whose concentration \(u\) changes as a result of an oriented chemotactic motion in the direction of a positive gradient of acrasin and a random motion analogous to diffusion.

  • The acrasin and the acrasinase react to form a complex of concentration \(c\), which dissociates into the free acrasinase plus the degraded product.

  • The acrasin, the acrasinase and the complex diffuse according to Fick’s law.

This model system, considered in a two-dimensional spatial domain, e.g. the Petri dish, then reads

$$\begin{aligned} \partial_{t} u =& - \nabla\cdot( D_{1} \nabla v) + \nabla \cdot(D_{2} \nabla u) \end{aligned}$$
(24)
$$\begin{aligned} \partial_{t} v =& - k_{1} v\eta+ k_{-1} c + uf(v) + D_{v} \Delta v \end{aligned}$$
(25)
$$\begin{aligned} \partial_{t} c =& k_{1} v \eta- (k_{-1} + k_{2}) c + D_{c} \Delta c \end{aligned}$$
(26)
$$\begin{aligned} \partial_{t} \eta =& - k_{1} v \eta + ( k_{-1} + k_{2}) c + u g(v,\eta) + D_{\eta}\Delta\eta, \end{aligned}$$
(27)

with Neumann boundary conditions. Here \(k_{1}\), \(k_{-1}\), \(k_{2}\) are the rate constants for the acrasin-acrasinase reactions. The system accounts for a stage where the attractive signal is steadily released by all amoebae in the field. The entire process is modelled as spatially two-dimensional, since all amoebae in the respective experimental situation are first crawling over a more or less flat substrate within the Petri dish. As described in [13] the amoebae are roughly 10 μm in diameter. A lower threshold for aggregation is about \(5 \times10^{4}\) amoebae per square centimeter. The density at their closest packing is about \(10^{6}\) amoebae per square centimeter. Thus the amoebae, when still moving within a monolayer, are separated from each other by less than about 50 μm. Therefore a density description is sensible.
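
The quoted numbers can be turned into a spacing estimate: if each amoeba occupies, on average, a square of area one over the density, the side length of that square is the typical cell-to-cell distance. A minimal sketch:

```python
import numpy as np

# Cell spacing in a monolayer at the densities quoted from [13]: if each
# amoeba occupies on average a square of area 1/density, the side length of
# that square estimates the typical cell-to-cell distance.
for density_per_cm2 in (5e4, 1e6):
    spacing_um = 1e4 / np.sqrt(density_per_cm2)    # 1 cm = 10^4 micrometer
    print("density %.0e per cm^2  ->  spacing about %2.0f micrometer"
          % (density_per_cm2, spacing_um))
```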

Assuming the complex \(c\) to be in a steady state with regard to the chemical reactions, i.e. \(k_{1} v \eta- (k_{-1} + k_{2}) c = 0\), and further that the total concentration of the enzyme (both free and bound) is constant, i.e. \(\eta+ c = \tilde{\eta}\), a simplified model was derived in [12], namely

$$\begin{aligned} \partial_{t} u =& - \nabla\cdot(D_{1} \nabla v ) + \nabla \cdot(D_{2} \nabla u) \end{aligned}$$
(28)
$$\begin{aligned} \partial_{t} v =& - k(v) v + u f(v) + D_{v} \Delta v , \end{aligned}$$
(29)

again in a two-dimensional domain with Neumann boundary conditions, and with \(k(v) = \tilde{\eta}k_{2} k_{1} / ( k_{-1} + k_{2} + k_{1} v ) \). Linearizing around a constant steady state \(\tilde{u}, \tilde{v}\) and performing a perturbation analysis, as was done for the Turing system, results in the following instability condition

$$ D_{1} f(\tilde{v}) + D_{2} f'(\tilde{v}) \tilde{u} > D_{2} \bigl[ k(\tilde{v}) + \tilde{v} k'(\tilde{v}) \bigr]. $$
(30)

If \(f\) and \(k\) are constant, condition (30) reduces to \(D_{1} f > D_{2} k\). Thus the acrasin production and the chemotactic strength of the amoebae have to overcome the decay of acrasin and the random motion of the amoebae in order for an instability to arise. More generally, the two terms on the left hand side of (30) represent two different sources of instability. As mentioned, a high acrasin production and a strong chemotactic response promote instabilities. The random motion of the amoebae and the decay of acrasin act against this, as can be seen from the term on the right hand side of (30).

On the other hand, assuming \(D_{1}\) to be small and then looking at \(f'(\tilde{v}) \tilde{u} > k(\tilde{v}) + \tilde{v} k'(\tilde{v})\), a second possible mechanism for an instability to occur is a large density of amoebae, or an acrasin production which increases rapidly with a small increase of the acrasin level and thereby outweighs its decay.
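The following minimal sketch simply evaluates condition (30) for the two mechanisms just described; all parameter values and rate functions are illustrative assumptions and are not taken from [12].

```python
# Sketch: evaluate the linear instability condition (30) of the simplified
# Keller-Segel model.  All numbers below are purely illustrative.

def unstable(D1, D2, f, fp, k, kp, u_bar, v_bar):
    """True iff (30) holds at the constant steady state (u_bar, v_bar):
       D1*f(v) + D2*f'(v)*u  >  D2*( k(v) + v*k'(v) )."""
    lhs = D1 * f(v_bar) + D2 * fp(v_bar) * u_bar
    rhs = D2 * (k(v_bar) + v_bar * kp(v_bar))
    return lhs > rhs

# Mechanism 1: constant production f and constant decay k, so (30) reduces to
# D1*f > D2*k; a strong chemotactic response (large D1) destabilizes.
f, fp = (lambda v: 1.0), (lambda v: 0.0)
k, kp = (lambda v: 2.0), (lambda v: 0.0)
print(unstable(D1=3.0, D2=1.0, f=f, fp=fp, k=k, kp=kp, u_bar=1.0, v_bar=1.0))   # True
print(unstable(D1=1.0, D2=1.0, f=f, fp=fp, k=k, kp=kp, u_bar=1.0, v_bar=1.0))   # False

# Mechanism 2: D1 small, but a production rate that increases rapidly in v,
# together with a high amoebae density u_bar, destabilizes as well.
f2, fp2 = (lambda v: v ** 2), (lambda v: 2.0 * v)
print(unstable(D1=0.01, D2=1.0, f=f2, fp=fp2, k=k, kp=kp, u_bar=2.0, v_bar=1.0))  # True
```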

Whether the above mentioned instabilities really lead to a clustering of the amoebae, in which they also start to crawl over each other, can only be learned from a non-linear analysis. If the model system is indeed able to indicate the onset of a local accumulation into three-dimensional aggregates of amoebae, instead of a single cell layer that is merely more densely packed in some places and less densely packed in others, then the assumption that the solutions of this system are well-behaved has to break down, since the model setup is spatially two-dimensional. Such an initiation of further self-organization can be observed experimentally in Dd. So one is not only interested in the mere pattern forming behavior of the system, i.e. the emergence of regions of higher and regions of lower amoebae density. The question to be addressed with the mathematical model for chemotaxis of Dd is whether a suitable spatially two-dimensional model for the observed chemotactic cell motion first produces regions of higher cell densities and then is also able to indicate the onset of the experimentally observed three-dimensional self-organization. To answer this question positively with a spatially two-dimensional model, a necessary feature is that the solutions of the model should not exist globally. In [1] it was already conjectured that \(\delta\)-functions might be possible solutions of (28), (29).

On the other hand, one could start with a spatially three-dimensional model right from the beginning. Then the peculiarities of cell motion in the third dimension would also have to be described; this motion is crucially different from mere three-dimensional chemotactic motion. The basic model treated in [10], which for the first time answered in mathematically rigorous terms the biological question of whether chemotaxis is a possible driving mechanism for the onset of self-organization of Dd, reads

$$\begin{aligned} \partial_{t} u =& \Delta u - \chi\nabla\cdot(u \nabla v) \end{aligned}$$
(31)
$$\begin{aligned} \partial_{t} v =& D_{v} \Delta v - \mu v + \beta u \end{aligned}$$
(32)
$$\begin{aligned} u(0,\cdot) =& u_{0},\qquad v(0,\cdot) = v_{0}, \quad u_{0}, v_{0} \geq0 \end{aligned}$$
(33)
$$\begin{aligned} \partial_{\nu}u (t , \cdot) = & \partial_{\nu}v(t, \cdot) = 0 \quad \mbox{on } \partial\varOmega, \ \varOmega\subset {\mathbb {R}}^{2}. \end{aligned}$$
(34)

This system is a specific version of the simplified Keller-Segel model with \(D_{1} = \chi u\) and \(D_{2} = 1\). The amoebae undergo random motion and move up the chemical gradient \(\nabla v\) with a so-called chemotactic sensitivity of strength \(\chi\). The attractive chemical diffuses; it is produced at rate \(\beta\) by the amoebae themselves and decays at rate \(\mu\). Here \(\chi, D_{v}, \mu, \beta\) are all positive constants. The diffusion of the chemical is much faster than the random motion of the cells, so \(D_{v}\) is assumed to be of order \(1/\varepsilon\), with \(\varepsilon\) small. Further, it is assumed that the production rate of the attractive chemical fulfills \(\beta= D_{v} \alpha\), and that \(\alpha\) and \(\mu\) are of order one. Then for \(\bar{w} := \frac{1}{|\varOmega|} \int_{\varOmega}w \, dx\) one obtains

$$\bar{u} (t) = \bar{u}_{0}, \quad ( \partial_{t} + \mu) \bar{v} /D_{v} = \alpha\bar{u} = \alpha\bar{u}_{0} . $$

Introducing the relative density of the chemical \(v^{*} := v -\bar{v}\), one gets \((\partial_{t} + \mu) v^{*} / D_{v} = \Delta v^{*} + \alpha(u - \bar{u}_{0})\). Since the left hand side is of order \(\varepsilon\), for small \(\varepsilon\) one may as well consider the approximating system

$$\begin{aligned} \partial_{t} u =& \Delta u - \chi\nabla\cdot(u \nabla v) = \Delta u - \chi \nabla\cdot\bigl(u \nabla v^{*} \bigr)\\ 0 = & \Delta v^{*} + \alpha( u - \bar{u}_{0}). \end{aligned}$$

Then rescaling \(v^{\sharp}= v^{*}/(\alpha\bar{u}_{0})\), \(u^{\sharp}= u/\bar{u}_{0}\), and immediately dropping the \(\sharp\) again, one obtains

$$\begin{aligned} \begin{aligned} \partial_{t} u &= \Delta u - \tilde{\chi}\nabla\cdot( u \nabla v)\\ 0 &= \Delta v + u - 1, \end{aligned} \end{aligned}$$
(35)

where \(\tilde{\chi}= \alpha\bar{u}_{0} \chi\). By the maximum principle one has for any solution \((u,v)\) that \(u\geq0\), and \(v\) can be computed from \(u\).

In the rigorous bifurcation analysis for general chemotaxis systems given in [14] it was already pointed out that the long time behavior of the solutions can only be non-constant if \(\tilde{\chi}\) is larger than the modulus of the first non-zero eigenvalue of the Laplacian.
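For a concrete impression of this threshold, consider as a purely illustrative special case a rectangular domain: on \([0,a]\times[0,b]\) the eigenvalues of the Neumann Laplacian are \(-\pi^{2}(m^{2}/a^{2} + n^{2}/b^{2})\), \(m,n \geq0\), so the modulus of the first non-zero one is \(\pi^{2}/\max(a,b)^{2}\). A minimal sketch of the resulting check (the value of \(\tilde{\chi}\) is an assumption):

```python
import math

def first_nonzero_neumann_eigenvalue(a, b):
    """Modulus of the smallest non-zero Neumann eigenvalue of the Laplacian
       on the rectangle [0, a] x [0, b]."""
    return (math.pi / max(a, b)) ** 2

chi_tilde = 12.0                                          # assumed alpha * u_bar_0 * chi
threshold = first_nonzero_neumann_eigenvalue(1.0, 1.0)    # ~ 9.87 on the unit square
print(chi_tilde > threshold)   # True: non-constant long time behavior is not excluded
```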

The main results in [10] are the following. For an open, bounded domain \(\varOmega\subset {\mathbb {R}}^{2}\) with \(C^{1}\)-boundary and initial data \(u_{0}\) in \(C^{1}\):

  (a)

    there is a critical number \(c(\varOmega)\) such that for \(\tilde{\chi}= \alpha\bar{u}_{0} \chi< c(\varOmega)\) there exists a unique, smooth positive solution of system (35) with the respective initial data and boundary values for all time. More precisely, if \(u\) is a smooth, positive solution of (35) and \(0 < t^{*} \leq\infty\) is the maximal time of its existence, then there exists \(c_{1}(\varOmega) > 0\) such that \(t^{*} < \infty\) implies

    $$\lim_{k \rightarrow\infty} \limsup_{t \nearrow t^{*}} \int_{\varOmega}( u - k)_{+} \, dx \geq c_{1}(\varOmega). $$
  (b)

    If \(\varOmega\subset {\mathbb {R}}^{2}\) is a disk, then there exists \(c^{*} >0\) such that for \(\tilde{\chi}= \alpha\bar{u}_{0} \chi> c^{*}\) radially symmetric positive initial values can be constructed such that explosion of \(u\) happens in the center of the disk in finite time.

The second result shows that this chemotaxis model exhibits effects which indicate the onset of a self-organization phenomenon, which experimentally later leads to fruiting body formation of Dd. The authors remark that (a) contains information on the rate of explosion if it happens in finite time, and that it would be interesting to know more about the set of explosion points at \(t^{*}\). Further they mention that solutions may exist globally in a weak sense and that the study of singularities after a finite maximal time of existence is another important topic. Interestingly, this last suggestion within the paper was followed up only quite recently. Biologically such an analysis gives hints about what may happen after part of the amoebae have started to self-organize: can more aggregates form later and self-organize too?

The modeling and analysis of the onset of chemotactic self-organization of Dd in [10], a highly interesting developmental process, has also contributed a lot to the understanding of the biological relevance of certain singular solutions of mathematical models in the life-sciences. This was recognized much later than in other sciences, e.g. physics, where the relevance of singular solutions had long been established. In this specific context we refer also to [17].

Although it is not given explicitly in their theorems, the explicit critical constant \(c^{*}\) for the radially symmetric setting, which distinguishes between the existence of global solutions and blowup of solutions, can be deduced directly from the estimates in [10].

In summary, a large enough chemotactic sensitivity, a large enough production rate of the attractive chemical signal and/or a large enough initial mass of amoebae is necessary for the onset of self-organization due to chemotaxis in the mathematical model. Qualitatively this fits very well to what is measured and known experimentally. This first rigorous result on switching between the existence of global solutions and singular solutions, just by a slight variation of certain parameters of the model system, has triggered a large body of mathematical research on chemotaxis systems and aggregation phenomena, and many more results are still expected to come. Within the ‘DMV-Jahresberichte’ two reviews have been provided [8, 9], giving a fairly complete overview of what had been proved by then.

4 Statistical Significance of Molecular Sequence Characteristics

It is well known that an unusual pattern in a nucleic acid or protein sequence, or a region of strong similarity shared by two or more sequences, can be of biological significance. Therefore it is desirable to know whether such a pattern could have arisen simply by chance. In order to identify interesting sequence patterns, appropriate scoring values have to be assigned to individual residues of a single sequence, or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge or secondary structure potential. For multiple sequences, the scores can reflect nucleotide or amino acid similarity. How to measure such characteristics in a reasonable way is an important task in order to obtain an efficient and reliable search algorithm. In this context the basic local alignment search tool BLAST and its follow-up algorithms are highly successful and among the most frequently used software in computational biology. Surprisingly, it is less well known that the success of these programs relies heavily on rigorous mathematical results; only these finally allow for a sound biological interpretation of the obtained output. By using an appropriate random model [2, 3, 6], a mathematical theory was developed which provides precise formulas for assessing the statistical significance of any region with high scores. Significant sequence configurations are characterized with reference to a general scoring scheme. This chapter is strongly based on the summary given in [11].


Determining which patterns are likely or unlikely to occur just by chance in nucleic acid and protein sequence analysis helps to identify features of interest for experimental study. Interesting patterns in a single protein sequence might be unusual concentrations of charged residues or potential glycosylation sites. A region of high similarity between two or more sequences may indicate a common function. A mathematical model appropriate to experimental data was constructed in [2, 3, 6], which provides a benchmark for analyzing various data statistics on a sound basis. This independence random model generates the successive letters of a given sequence independently, such that e.g. the letter \(a_{j}\) is selected with probability \(p_{j}\). In the case of proteins the \(p_{j}\) are usually specified as the actual amino acid frequencies in the sequence under consideration. A random first-order Markov model instead prescribes \(p_{jk}\) as the conditional probability for sampling letter \(a_{k}\) following letter \(a_{j}\). In the case of a single protein sequence the \(p_{jk}\) would correspond to the observed diresidue frequencies. More complex random models can account for more elaborate long-range dependencies.
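To make the two random models concrete, here is a minimal sampling sketch; the alphabet, the letter probabilities and the transition probabilities are hypothetical and chosen only for illustration.

```python
import random

alphabet = ["A", "C", "G", "T"]
p = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}     # independence model: letter frequencies
p_jk = {                                          # first-order Markov model: transitions
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2},
    "G": {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}

def sample_independent(n):
    """Successive letters drawn independently, letter a_j with probability p_j."""
    return "".join(random.choices(alphabet, [p[a] for a in alphabet], k=n))

def sample_markov(n):
    """Letter a_k follows letter a_j with conditional probability p_jk."""
    seq = random.choices(alphabet, [p[a] for a in alphabet], k=1)
    while len(seq) < n:
        prev = seq[-1]
        seq += random.choices(alphabet, [p_jk[prev][a] for a in alphabet], k=1)
    return "".join(seq)

print(sample_independent(30))
print(sample_markov(30))
```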

Other mathematical models which were available prior to [2, 3, 6], and which are outlined in [11], do not allow one to describe properties or mismatches that vary in degree. This limits their value for applications. In sequence comparison one would like to be able to count a mismatch between isoleucine and valine differently from one between glycine and tryptophan.

Let us first describe the mathematical theory given in [2, 3, 6], and summarized in [11], in the context of the analysis of a single protein sequence, with the objective of identifying segments with statistically significant high scores, e.g. for charge concentration, phosphorylation potential, or secondary structure propensity. Let \(A= \{ a_{1}, a_{2}, \ldots \}\) be the alphabet in use, with corresponding letter scores \(\{ s_{1}, s_{2}, \ldots \}\). It will be explained later how to choose sensible scoring criteria. For the random sequence model the letters are sampled from \(A\) independently, with the respective probabilities \(\{ p_{1}, p_{2}, \ldots \}\). The theorems summarized in [11] are proved and further generalized in [2, 3, 6], also to random models where successive letters have a Markov dependence.

One is primarily interested in the segment of the sequence with the largest aggregate (additive) score, which is called the maximal segment score. The length of this segment is determined by the structure of the data themselves, rather than being prescribed. Previous profile studies of protein sequences had used a fixed length, but no clear criteria for choosing this so-called window length had been proposed, and no rigorous significance results were available.
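The maximal segment score itself can be computed in linear time; the following is a minimal sketch (a standard maximum-subarray scan over hypothetical residue scores, not necessarily the formulation used in [11]).

```python
def maximal_segment_score(scores):
    """Return (best score, (start, end)) of the contiguous segment with the
       largest aggregate score; empty segments are not allowed."""
    best, best_range = float("-inf"), (0, 0)
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        if running <= 0:          # restarting beats extending a non-positive prefix
            running, start = s, i
        else:
            running += s
        if running > best:
            best, best_range = running, (start, i)
    return best, best_range

# Hypothetical residue scores along a short sequence:
print(maximal_segment_score([-1.0, 2.0, 3.0, -4.0, 1.0, 2.5, -2.0]))   # (5.0, (1, 2))
```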

For the mathematical ansatz summarized in [11] it is required that

  • at least one score is positive,

  • and the expected score \(E:= \sum p_{i} s_{i}\) is negative.

Otherwise the maximal segment would tend to be the whole sequence under consideration, which is not what one is interested in. For any set of scores with \(E > 0\) one can define the modified scores \(\tilde{s}_{i} := s_{i} - \alpha E\) with suitable \(\alpha> 1\) to fulfill the above condition.

In order to assess the statistical significance of high-scoring segments in a given protein sequence, it is important to know the probability distribution for maximal segment scores from a random sequence of length \(n\). Let \(\lambda_{*}^{n}\) be the unique, positive solution of

$$\sum_{i} p_{i} \exp\bigl( \lambda^{n} s_{i}\bigr) = 1, $$

where the sum runs over all letters of the alphabet \(A\). Note that \(\lambda^{n} = 0\) also solves this equation. Let \(M(n)\) denote the maximal segment score for the random sequence of length \(n\). It was proved in [2, 3, 6] that \(M(n)\) is of order \(\log(n)/\lambda_{*}^{n}\), and that the limiting distribution of the centered maximal segment score \(M(n) - \log(n)/\lambda_{*}^{n} =: \tilde{M}(n)\) fulfills

$$ \operatorname{Prob}\bigl\{ \tilde{M}(n) > x \bigr\} \approx1 - \exp\bigl( - K^{*} e^{- \lambda_{*}^{n} x} \bigr). $$
(36)

Additionally an explicit formula for \(K^{*}\) was given, i.e. a rapidly converging series, which is convenient for computational approaches.
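As a minimal computational sketch (not the implementation used in the BLAST programs), \(\lambda_{*}\) can be obtained by a simple bisection and inserted into (36); \(K^{*}\) is treated as a given constant here, since its series representation from [2, 3, 6] is not reproduced in this note. The two-letter example is chosen such that the root is known in closed form, \(\lambda_{*} = \log((1+\sqrt{5})/2)\).

```python
import math

def lambda_star(p, s, hi=50.0, tol=1e-12):
    """Unique positive root of  sum_i p_i * exp(lambda * s_i) = 1,
       assuming sum_i p_i * s_i < 0 and at least one positive score."""
    g = lambda lam: sum(pi * math.exp(lam * si) for pi, si in zip(p, s)) - 1.0
    while g(hi) < 0:                     # enlarge the bracket if necessary
        hi *= 2.0
    lo = 0.0                             # g(0) = 0; the positive root lies in (0, hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def tail_probability(x, K_star, lam):
    """Right hand side of (36): approximate Prob{ centered maximal score > x }."""
    return 1.0 - math.exp(-K_star * math.exp(-lam * x))

# Two-letter example p = (1/2, 1/2), s = (+1, -2), where lambda* = log((1+sqrt(5))/2).
lam = lambda_star([0.5, 0.5], [1.0, -2.0])
print(lam, math.log((1.0 + math.sqrt(5.0)) / 2.0))   # both ~ 0.4812
print(tail_probability(5.0, K_star=0.1, lam=lam))    # ~ 0.009
```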

The number of separate high-scoring segments, i.e. those with scores exceeding \(x + \log(n)/\lambda_{*}^{n} \) for \(x \in {\mathbb {R}}\), and which are sufficiently far apart, is approximately Poisson distributed with parameter \(K^{*} \exp(-\lambda_{*}^{n} x)\). Therefore the probability of finding \(m\) or more distinct segments with score greater than or equal to \(S\) can be closely approximated by \((1 - e^{-y} \sum_{i=0}^{m-1} y^{i} / i! ) \), where \(y = K^{*} n \exp(-\lambda_{*}^{n} S)\). For \(m=1\) this reduces to the right hand side of Eq. (36). With this distribution one can assess whether or not the count of segments with moderate to high score over a whole protein is unusually high, since (36) allows one to calculate explicitly the probability that some segment from a random sequence has a score larger than a given value. This provides a crucial benchmark for assessing the statistical significance of high-scoring segments.
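A small sketch of this Poisson approximation (the values of \(K^{*}\), \(\lambda_{*}\) and the score level are again purely illustrative):

```python
import math

def prob_at_least_m_segments(m, S, n, K_star, lam):
    """Poisson approximation for the probability of finding m or more distinct,
       well separated segments of score >= S in a random sequence of length n."""
    y = K_star * n * math.exp(-lam * S)
    return 1.0 - math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(m))

# How surprising are three separate segments scoring at least 10 in a sequence
# of length 400?  (hypothetical K* and lambda*)
print(prob_at_least_m_segments(m=3, S=10.0, n=400, K_star=0.1, lam=0.5))   # ~ 0.003
```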

As mentioned before, it is very important to define a good scoring scheme with relevance for the respective experimental situation. Often natural criteria underlie the chosen score assignments. But sometimes one may be confronted with an unusual amino acid composition whose features are not easy to describe. Searching for an optimal scoring scheme to identify a particular region is only possible if there is no statistical difference between the composition of high-scoring chance segments and the composition of similarly scoring true segments. The following mathematical result makes the composition of high-scoring chance segments relevant for the selection of suitable scores.

For a random sequence, growing without bound, the frequency of, say, letter \(a_{i}\) in any sufficiently high-scoring segment approaches \(p_{i} \exp( \lambda^{*} s_{i})\) with probability one. In particular this is true for the maximal segment. Since \(s_{i} = \log(q_{i}/p_{i})/\lambda^{*}\) for \(q_{i} := p_{i} \exp(\lambda^{*} s_{i})\), the score associated with each letter is the logarithm of \(q/p\) to a certain base, where \(p\) is the frequency with which the letter appears by chance, and \(q\) is the letter's implicit target frequency. So the question of an optimal set of scores can be recast into the question of what an optimal set of target frequencies is. The best target frequencies to choose are those of the region of interest, so one merely has to characterize the letter distribution in those regions. The score for the letter \(a_{i}\) can then be set equal to the corresponding log-likelihood ratio, namely \(\log(q_{i}/p_{i})\).
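The correspondence between scores and implicit target frequencies can be illustrated with a hypothetical two-letter alphabet in which \(\lambda^{*}\) happens to have a closed form:

```python
import math

# Hypothetical two-letter example: p = (1/2, 1/2), s = (+1, -2), for which
# lambda* is the logarithm of the golden ratio.
p = [0.5, 0.5]
s = [1.0, -2.0]
lam = math.log((1.0 + math.sqrt(5.0)) / 2.0)

q = [pi * math.exp(lam * si) for pi, si in zip(p, s)]      # implicit target frequencies
print(sum(q))                                               # = 1 by the defining equation
print([math.log(qi / pi) / lam for qi, pi in zip(q, p)])    # recovers the scores +1, -2
```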

When comparing several sequences, a basic problem is to find similar segments in each sequence and to align them. Again the question is whether such subalignments are statistically significant or not. Consider two independent random sequences with letter probabilities \(\{p_{1}, p_{2}, \ldots \}\) and \(\{ \tilde{p}_{1}, \tilde{p}_{2}, \ldots\}\). The pair of letters \(a_{i} b_{j}\), where \(a_{i}\) is from the first sequence and \(b_{j}\) from the second, occurs with probability \(p_{i} \tilde{p}_{j}\). Let the score for such a pairing be \(s_{ij}\), which results in a scoring matrix. As before one assumes that

  • the expected pair score \(\sum_{i,j} p_{i} \tilde{p}_{j} s_{ij}\) is negative,

  • and there is some probability of a positive score.

Now \(\lambda_{*}\) is determined as the unique, positive solution of

$$\sum_{i,j} p_{i} \tilde{p}_{j}\exp(\lambda s_{ij}) = 1, $$

subject to the assumption that the probability distributions \(\{ p_{i} \}\) and \(\{ \tilde{p}_{j} \}\) for the two sequences are not too dissimilar and that the sequence lengths \(m\) and \(n\) grow at roughly equal rates. The previously mentioned rigorous result also holds for the maximal scoring segmental alignment, but with \(n\) now being replaced by \(nm\). For large \(x\) one obtains

$$\operatorname{Prob} \bigl\{ M > x + \log(nm)/\lambda_{*} \bigr\} \leq K^{*} e^{-\lambda_{*} x} . $$

Thus any alignment of segments from two sequences has an unusually high score (statistically significant at the 1% level) if \(M\) exceeds \(x_{0} + \log(nm) / \lambda_{*} \), where \(x_{0}\) is determined by \(K^{*} \exp( - \lambda_{*} x_{0}) = 0.01\), i.e. \(x_{0} = \log(100 K^{*})/\lambda_{*}\). This result can also be generalized in a natural way to the comparison of more than two sequences. The random model for protein sequences upon which this result is based is most useful for showing that the scores of certain subalignments can be explained by chance alone. As before, the random model for protein sequence comparison serves as a benchmark. Also in this case, optimal score matrices can be obtained by refined estimations of the random and target distributions.

Example

Consider a specific protein of length \(n\) and a set of amino acid scores. In order to calculate the level below which 99% of the maximal segment scores for random sequences with similar composition and length will fall, one first takes the amino acid probabilities for a random protein model directly from the protein at hand. From these probabilities and the given scores, one can calculate \(K^{*}\) and \(\lambda_{*}\). Solving \(\exp( - e^{-\lambda_{*} x}) = 0.99\) for \(x\) yields \(x = - \log\log(1/0.99)/ \lambda_{*}\). Then any segment with score greater than \(x + (\log n + \log K^{*} )/\lambda_{*}\) is considered significant at the 99% level.
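A direct transcription of this recipe (the length, \(K^{*}\) and \(\lambda_{*}\) below are hypothetical; in practice they would be computed from the protein's amino acid frequencies and the chosen scores):

```python
import math

def significance_threshold(n, K_star, lam, level=0.99):
    """Score above which a segment of a length-n random sequence is considered
       significant at the given level, following the recipe of the example."""
    x = -math.log(math.log(1.0 / level)) / lam
    return x + (math.log(n) + math.log(K_star)) / lam

print(significance_threshold(n=350, K_star=0.1, lam=0.27))   # ~ 30
```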

The statistical theory for multiple high-scoring segments works similarly.

More generally, in [4] two independent sequences \(X_{1}, \ldots , X_{n}\) and \(Y_{1}, \ldots , Y_{n}\) are considered. It is supposed that the first sequence is i.i.d. with distribution \(\mu_{X}\) and the second one i.i.d. with distribution \(\mu_{Y}\), where \(\mu_{X}\) and \(\mu_{Y}\) are distributions on finite alphabets \(\varSigma_{X}\) and \(\varSigma_{Y}\). A score \(F: \varSigma_{X} \times\varSigma_{Y} \rightarrow {\mathbb {R}}\) is assigned to each letter pair \((X_{i}, Y_{j})\). For the maximal nonaligned segment score \(M_{n} = \max_{ \{ 0\leq i,j \leq n - r, r \geq0\}} \{\sum_{l=1}^{r} F(X_{i+l}, Y_{j+l}) \}\), i.e. the maximal segment score allowing for shifts, it was proved in [4] that

$$M_{n} /\log n \rightarrow\gamma^{*} ( \mu_{X}, \mu_{Y}) , $$

where \(\gamma^{*}\) is determined by a tractable variational formula. Further, the pair empirical measure of \((X_{i+l}, Y_{j+l})\) during the segment where \(M_{n}\) is achieved converges to a probability measure \(\nu^{*}\), which is accessible by the same formula. These results generalize to intrasequence scores with shifts, to asymptotics of the longest quality match, to more than two independent sequences, and to sequences of different lengths. The constant \(\gamma^{*}\) is expressed in terms of relative entropy functions. Vital for applications is the precise limit distribution of \(M_{n}\) centered at \(\gamma^{*} \log n\), which is given in [5].

Due to the paramount relevance of these and follow-up results for BLAST-type programs, it is recommended not only to reread the cited articles, but also the related and follow-up literature. This mathematical theory is the foundation and logical basis of the algorithms involved, and one of the fundamental contributions to today's computational biology tools, a fact which unfortunately is barely known.