Introduction

The time evolution of a spatially homogeneous mixture of chemically reacting molecules is often modeled using a stochastic formulation, which takes into account the inherent randomness of thermal molecular motion. This formulation is important when modeling complex reactions inside living cells, where small populations of key reactants can set the stage for significant stochastic effects. In this entry, we review the basic stochastic model of chemical reactions and discuss the most common techniques used to simulate and analyze this model.

Stochastic Models of Chemical Reactions

We start by considering a set of N molecular species (reactants) \(\mathcal{S}_{1},\ldots,\mathcal{S}_{N}\) that are confined to a fixed volume Ω. These species react through M possible reactions \(R_{1},\ldots,R_{M}\). In this formulation of chemical kinetics, we shall assume that the system is in thermal equilibrium and well mixed, so that the reacting molecules move due to their thermal energy. The population of the different reactants is described by a random process \(X(t) = (X_{1}(t),\ldots,X_{N}(t))^{T}\), where \(X_{i}(t)\) is a random variable that models the abundance (in terms of the number of copies) of molecules of species \(\mathcal{S}_{i}\) in the system at time t. For the allowable reactions, we shall only consider elementary reactions. These can either be monomolecular, \(\mathcal{S}_{i} \rightarrow \mbox{ products}\), or bimolecular, \(\mathcal{S}_{i} + \mathcal{S}_{j} \rightarrow \mbox{ products}\). Upon the firing of reaction \(R_{k}\), a transition occurs from some state \(X =\boldsymbol{x}\) right before the reaction fires to the state \(X =\boldsymbol{x} +\boldsymbol{s}_{k}\), which reflects the change in the population immediately after the reaction has fired. The vector \(\boldsymbol{s}_{k}\) is referred to as the stoichiometric vector of reaction \(R_{k}\). The set of M allowable reactions defines the so-called stoichiometry matrix:

$$\displaystyle{S = \left [\begin{array}{*{10}c} \boldsymbol{s}_{1} & \cdots &\boldsymbol{s}_{M} \end{array} \right ].}$$

To each reaction \(R_{k}\), we associate a propensity function \(w_{k}(\boldsymbol{x})\) that describes the rate of that reaction. More precisely, \(w_{k}(\boldsymbol{x})h\) is the probability (to first order in h) that, given the system is in state \(\boldsymbol{x}\) at time t, \(R_{k}\) fires once in the time interval [t, t + h). The propensity functions for elementary reactions are given in Table 1.

Stochastic Description of Biochemical Networks, Table 1 Propensity functions for elementary reactions. The constants c, c′, and c′′ are related to k, k′, and k′′, the reaction rate constants from deterministic mass-action kinetics. Indeed, it can be shown that c = k, c′ = k′∕Ω, and c′′ = 2k′′∕Ω
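For elementary reactions, these propensities take the standard forms \(w_{k}(\boldsymbol{x}) = c\,x_{i}\) for \(\mathcal{S}_{i} \rightarrow \mbox{ products}\), \(w_{k}(\boldsymbol{x}) = c'\,x_{i}x_{j}\) for \(\mathcal{S}_{i} + \mathcal{S}_{j} \rightarrow \mbox{ products}\) (with i ≠ j), and \(w_{k}(\boldsymbol{x}) = c''\,x_{i}(x_{i} - 1)/2\) for \(\mathcal{S}_{i} + \mathcal{S}_{i} \rightarrow \mbox{ products}\). As a small worked example (chosen here purely for illustration), consider the reversible dimerization \(2\mathcal{S}_{1} \rightarrow \mathcal{S}_{2}\) and \(\mathcal{S}_{2} \rightarrow 2\mathcal{S}_{1}\). Its stoichiometry matrix and propensity functions are

$$\displaystyle{S = \left [\begin{array}{*{10}c} -2& 2\\ 1 &-1 \end{array} \right ],\qquad w_{1}(\boldsymbol{x}) = c''\,\frac{x_{1}(x_{1} - 1)}{2},\qquad w_{2}(\boldsymbol{x}) = c\,x_{2},}$$

so a firing of the dimerization reaction removes two copies of \(\mathcal{S}_{1}\) and adds one copy of \(\mathcal{S}_{2}\), while a firing of the dissociation reaction does the opposite.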

Limiting to the Deterministic Regime

There is an important connection between the stochastic process X(t), as represented by the continuous-time discrete-state Markov chain described above, and the solution of the deterministic reaction rate equations obtained from mass-action kinetics. To see this, let \(\varPhi (t) = [\varPhi _{1}(t),\ldots,\varPhi _{N}(t)]^{T}\) be the vector of concentrations of species \(\mathcal{S}_{1},\ldots,\mathcal{S}_{N}\). According to mass-action kinetics, Φ(⋅ ) satisfies the ordinary differential equation:

$$\displaystyle{\dot{\varPhi }= Sf(\varPhi (t)),\qquad \varPhi (0) =\varPhi _{0}.}$$

where f(Φ) is the vector of deterministic mass-action reaction rates. In order to compare Φ(t) with X(t), which represents molecular counts, we divide X(t) by the reaction volume to obtain the concentration process \(X^{\varOmega }(t) = X(t)/\varOmega\). It turns out that \(X^{\varOmega }(t)\) converges to Φ(t): according to Kurtz (Ethier and Kurtz 1986), for every t ≥ 0:

$$\displaystyle{\lim _{\varOmega \rightarrow \infty }\sup _{s\leq t}\ \left \vert X^{\varOmega }(s) -\varPhi (s)\right \vert = 0,\quad \mbox{ almost surely}.}$$

Hence, over any finite time interval, the stochastic model converges to the deterministic mass-action one in the thermodynamic limit. Note that this is only a large volume limit result. In practice, for a fixed volume, a stochastic description may differ considerably from the deterministic description.
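Continuing the dimerization example introduced above (again purely as an illustration), the reaction rate equations read

$$\displaystyle{\dot{\varPhi }= Sf(\varPhi ) = \left [\begin{array}{*{10}c} -2& 2\\ 1 &-1 \end{array} \right ]\left [\begin{array}{*{10}c} k''\varPhi _{1}^{2}\\ k\varPhi _{2} \end{array} \right ] = \left [\begin{array}{*{10}c} -2k''\varPhi _{1}^{2} + 2k\varPhi _{2}\\ k''\varPhi _{1}^{2} - k\varPhi _{2} \end{array} \right ],}$$

and Kurtz's theorem guarantees that, provided \(X(0)/\varOmega \rightarrow \varPhi _{0}\), the scaled copy numbers \(X^{\varOmega }(t)\) follow these trajectories over any finite time horizon as Ω → ∞.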

Stochastic Simulations

Gillespie’s stochastic simulation algorithm (SSA) constructs sample paths for the random process \(X(t) = (X_{1}(t),\ldots,X_{N}(t))^{T}\) that are consistent with the stochastic model described above (Gillespie 1976). It consists of the following basic steps (a minimal code sketch is given after the algorithm):

  1. Initialize the state X(0) and set t = 0.

  2. Draw a random number τ ∈ (0, ∞) with exponential distribution and mean equal to \(1/\sum _{k}w_{k}(X(t))\).

  3. Draw a random number \(k \in \{ 1,2,\ldots,M\}\) such that the probability of k = i is proportional to \(w_{i}(X(t))\).

  4. Set \(X(t+\tau ) = X(t) +\boldsymbol{ s}_{k}\) and t = t + τ.

  5. Repeat from step 2 until t reaches the desired simulation time.

By running this algorithm multiple times with independent random draws, one can estimate the distribution and statistical moments of the random process X(t).
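A minimal sketch of the SSA in Python is given below; the function name `ssa`, its arguments, and the birth-death example at the end are illustrative choices made here, not part of the original entry.

```python
import numpy as np

def ssa(x0, S, propensities, t_end, rng=None):
    """Gillespie SSA (sketch).

    x0           -- initial copy numbers, shape (N,)
    S            -- stoichiometry matrix, shape (N, M)
    propensities -- function mapping a state x to the M propensities w_k(x)
    t_end        -- final simulation time
    Returns the jump times and the corresponding states.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = 0.0
    x = np.asarray(x0, dtype=int).copy()
    times, states = [t], [x.copy()]
    while t < t_end:
        w = propensities(x)
        w_total = w.sum()
        if w_total == 0:                        # no reaction can fire
            break
        t += rng.exponential(1.0 / w_total)     # step 2: exponential waiting time
        k = rng.choice(len(w), p=w / w_total)   # step 3: pick the firing reaction
        x = x + S[:, k]                         # step 4: apply stoichiometric update
        times.append(t)
        states.append(x.copy())
    return times, states

# Example usage: birth-death process, birth at rate c1 and death at rate c2*x.
c1, c2 = 10.0, 0.5
S = np.array([[1, -1]])                         # one species, two reactions
w = lambda x: np.array([c1, c2 * x[0]])
times, states = ssa([0], S, w, t_end=20.0)
```

Averaging many such independent runs gives Monte Carlo estimates of the quantities discussed next.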

The Chemical Master Equation (CME)

The chemical master equation (CME), also known as the forward Kolmogorov equation, describes the time evolution of the probability that the system is in a given state \(\boldsymbol{x}\). The CME can be derived based on the Markov property of chemical reactions. Suppose the system is in state \(\boldsymbol{x}\) at time t. Within an error of order \(\mathcal{O}(h^{2})\), the following statements apply:

  • The probability that an R k reaction fires exactly once in the time interval [t, t + h) is given by \(w_{k}(\boldsymbol{x})h\).

  • The probability that no reactions fire in the time interval [t, t + h) is given by \(1 -\sum _{k}w_{k}(\boldsymbol{x})h\).

  • The probability that more than one reaction fires in the time interval [t, t + h) is negligible (of order \(\mathcal{O}(h^{2})\)).

Let \(P(\boldsymbol{x},t)\) denote the probability that the system is in state \(\boldsymbol{x}\) at time t. We can express \(P(\boldsymbol{x},t + h)\) as follows:

$$\displaystyle\begin{array}{rcl} & & P(\boldsymbol{x},t + h) = P(\boldsymbol{x},t)\left (1 -\sum _{k}w_{k}(\boldsymbol{x})h\right ) {}\\ & & +\sum _{k}P(\boldsymbol{x} - s_{k},t)w_{k}(\boldsymbol{x} - s_{k})h + \mathcal{O}(h^{2}). {}\\ \end{array}$$

The first term on the right-hand side is the probability that the system is already in state \(\boldsymbol{x}\) at time t and that no reactions occur in the next h time units. In the second term on the right-hand side, the kth term of the summation is the probability that the system at time t is one \(R_{k}\) reaction away from state \(\boldsymbol{x}\) and that an \(R_{k}\) reaction takes place in the next h time units.

Moving \(P(\boldsymbol{x},t)\) to the left-hand side, dividing by h, and taking the limit as h goes to zero yields the chemical master equation (CME):

$$\displaystyle\begin{array}{rcl} \frac{dP(\boldsymbol{x},t)} {dt} & =\sum _{ k=1}^{M}\Big(w_{k}(\boldsymbol{x} -\boldsymbol{ s}_{k})P(\boldsymbol{x} -\boldsymbol{ s}_{k},t)& \\ & \qquad - w_{k}(\boldsymbol{x})P(\boldsymbol{x},t)\Big). &{}\end{array}$$
(1)
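As a concrete instance of (1) (a simple example chosen here for illustration), consider a birth-death process with a single species \(\mathcal{S}\), a birth reaction with propensity \(c_{1}\) and a death reaction with propensity \(c_{2}x\). The CME then reads

$$\displaystyle{\frac{dP(x,t)}{dt} = c_{1}P(x - 1,t) + c_{2}(x + 1)P(x + 1,t) - (c_{1} + c_{2}x)P(x,t),}$$

with the convention P(−1, t) = 0; this is one scalar ordinary differential equation for each copy number x = 0, 1, 2, …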

The CME defines a linear dynamical system in the probabilities of the different states (each state is defined by a specific number of molecules of each of the species). However, there are generally an infinite number of states, and the resulting infinite linear system is not directly solvable. One approach to overcome this difficulty is to approximate the solution of the CME by truncating the states. A particular truncation procedure that gives error bounds is called the finite-state projection (FSP) (Munsky and Khammash 2006). The key idea behind the FSP approach is to keep those states that support the bulk of the probability distribution while projecting the remaining infinite states onto a single “absorbing” state. See Fig. 1.

Stochastic Description of Biochemical Networks, Fig. 1
The finite-state projection

The left panel of the figure shows the infinite state space of a system with two species. The arrows indicate transitions among states caused by the allowable chemical reactions. The underlying stochastic process is a continuous-time discrete-state Markov process. The right panel shows the projected (finite-state) system for a specific projection region (box). The projection is obtained as follows: transitions within the retained states are kept, while transitions that emanate from these states and end at states outside the box are channeled to a single new absorbing state. Transitions into the box are deleted. The resulting projected system is a finite-state Markov process, and the probability of each of its finite states can be computed exactly. It can be shown that the truncation, as defined here, gives a lower bound on the state probabilities of the original full system. The FSP algorithm provides a way of constructing an approximation of the CME that satisfies any prespecified accuracy requirement.
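A minimal sketch of the FSP idea for the birth-death process used above is given below; the particular rates, truncation size, and time horizon are illustrative choices, not part of the original entry.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative finite-state projection for a birth-death process
# (birth at rate c1, death at rate c2*x), truncated to the states x = 0..n_max.
c1, c2, n_max = 10.0, 0.5, 60

# Generator of the projected chain: A[y, x] is the rate of jumping from x to y.
A = np.zeros((n_max + 1, n_max + 1))
for x in range(n_max + 1):
    A[x, x] -= c1                    # rate of leaving x via a birth
    if x < n_max:
        A[x + 1, x] += c1            # birth lands inside the box
    if x > 0:
        A[x - 1, x] += c2 * x        # death: x -> x - 1
        A[x, x] -= c2 * x            # rate of leaving x via a death
# Births from x = n_max leak to the absorbing state (they are not re-added).

# The projected CME is the finite linear system dP/dt = A P; its solution at
# time t is given exactly by the matrix exponential.
P0 = np.zeros(n_max + 1)
P0[0] = 1.0                          # start with zero molecules
P_t = expm(A * 5.0) @ P0             # probability distribution at t = 5

# The probability mass that leaked to the absorbing state bounds the FSP error.
print(P_t.sum(), 1.0 - P_t.sum())
```

If the leaked mass exceeds the desired tolerance, the box is enlarged and the computation repeated, which is the essence of the FSP algorithm.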

Moment Dynamics

While the probability distribution \(P(\boldsymbol{x},t)\) provides great detail about the state \(\boldsymbol{x}\) at time t, statistical moments of the molecule copy numbers often already provide important information about their variability, which motivates the construction of mathematical models for the evolution of such moments over time.

Given a vector of integers \(m := (m_{1},m_{2},\ldots,m_{N})\), we use the notation μ(m) to denote the following uncentered moment of X:

$$\displaystyle{ \mu ^{(m)} :=\mathrm{ E}[X_{ 1}^{m_{1} }X_{2}^{m_{2} }\cdots X_{N}^{m_{N} }]. }$$

Such a moment is said to be of order \(\sum _{i}m_{i}\). With N species, there are exactly N first-order moments \(\mathrm{E}[X_{i}]\), \(\forall i \in \{ 1,2,\ldots,N\}\), which are just the means; N(N + 1)∕2 second-order moments \(\mathrm{E}[X_{i}^{2}]\), \(\forall i\), and \(\mathrm{E}[X_{i}X_{j}]\), \(\forall i\neq j\), which can be used to compute variances and covariances; N(N + 1)(N + 2)∕6 third-order moments; and so on.

Using the CME (1), one can show that

$$\displaystyle\begin{array}{rcl} & & \frac{d\mu ^{(m)}} {dt} =\mathrm{ E}\Big[\sum _{k}w_{k}(X)\Big((X_{1} + s_{1,k})^{m_{1} }(X_{2} + s_{2,k})^{m_{2} } {}\\ & & \qquad \qquad \cdots (X_{N} + s_{N,k})^{m_{N} } - X_{1}^{m_{1} }X_{2}^{m_{2} }\cdots X_{N}^{m_{N} }\Big)\Big], {}\\ \end{array}$$

where \(s_{i,k}\) denotes the ith entry of \(\boldsymbol{s}_{k}\). Because the propensity functions are all polynomials in \(\boldsymbol{x}\) (cf. Table 1), the expected value on the right-hand side can actually be written as a linear combination of other uncentered moments of X. This means that if we construct a vector μ containing all the uncentered moments of X up to some order k, the evolution of μ is determined by a differential equation of the form

$$\displaystyle{ \frac{d\mu } {dt} = A\mu + B\bar{\mu },\quad \mu \in \mathbb{R}^{K},\;\bar{\mu }\in \mathbb{R}^{\bar{K}} }$$
(2)

where A and B are appropriately defined matrices and \(\bar{\mu }\) is a vector containing moments of order larger than k. Equation (2) is exact; we call it the (exact) kth-order moment dynamics, and the integer k is called the order of truncation. Note that the dimension K of (2) is always larger than k since there are many moments of each order. In fact, in general, K is of order \(N^{k}\).

When all chemical reactions have only one reactant, the term \(B\bar{\mu }\) does not appear in (2), and we say that the exact moment dynamics are closed. However, when at least one chemical reaction has two or more reactants, then the term \(B\bar{\mu }\) appears, and we say that the moment dynamics are open since (2) depends on the moments in \(\bar{\mu }\), which are not part of the state μ. When all chemical reactions are elementary (i.e., with at most two reactants), then all moments in \(\bar{\mu }\) are exactly of order k + 1.
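As a simple illustration (using the birth-death reactions from the sketches above, with propensities \(c_{1}\) and \(c_{2}x\)), the first two uncentered moments obey the closed system

$$\displaystyle{\frac{d\mathrm{E}[X]}{dt} = c_{1} - c_{2}\,\mathrm{E}[X],\qquad \frac{d\mathrm{E}[X^{2}]}{dt} = c_{1} + (2c_{1} + c_{2})\,\mathrm{E}[X] - 2c_{2}\,\mathrm{E}[X^{2}],}$$

whereas a bimolecular reaction such as the dimerization above makes \(d\mathrm{E}[X_{1}^{2}]/dt\) depend on the third-order moment \(\mathrm{E}[X_{1}^{3}]\), so the dynamics become open.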

Moment closure is a procedure by which one approximates the exact (but open) moment dynamics (2) by an approximate (but now closed) equation of the form

$$\displaystyle{ \dot{\nu }= A\nu + B\varphi (\nu ),\quad \nu \in \mathbb{R}^{K} }$$
(3)

where \(\varphi (\nu )\) is a column vector that approximates the moments in \(\bar{\mu }\). The function \(\varphi (\nu )\) is called the moment closure function, and (3) is called the approximate kth-order moment dynamics. The goal of any moment closure method is to construct \(\varphi (\nu )\) so that the solution ν to (3) is close to the solution μ to (2).

There are three main approaches to construct the moment closure function \(\varphi (\cdot )\):

  1. Matching-based methods directly attempt to match the solutions to (2) and (3) (e.g., Singh and Hespanha 2011).

  2. Distribution-based methods construct \(\varphi (\cdot )\) by making reasonable assumptions on the statistical distribution of the molecule counts vector X (e.g., Gomez-Uribe and Verghese 2007).

  3. Large volume methods construct \(\varphi (\cdot )\) by assuming that reactions take place in a large volume (e.g., Van Kampen 2001).

It is important to emphasize that this classification is about methods to construct moment closure. It turns out that sometimes different methods lead to the same moment closure function \(\varphi (\cdot )\).
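As one illustrative case (not a prescription from any particular method above), a distribution-based closure that assumes the copy numbers are approximately normally distributed sets all third central moments to zero, which for a single species gives

$$\displaystyle{\mathrm{E}[X^{3}]\;\approx \;3\,\mathrm{E}[X^{2}]\,\mathrm{E}[X] - 2\,\mathrm{E}[X]^{3},}$$

so that the third-order moments appearing in \(\bar{\mu }\) are replaced by this function of the first- and second-order moments collected in ν.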

Conclusion and Outlook

We have introduced complementary approaches to study the evolution of biochemical networks that exhibit important stochastic effects.

Stochastic simulations permit the construction of sample paths for the molecule counts, which can be averaged to study the ensemble behavior of the system. This type of approach scales well with the number of molecular species, but can be computationally very intensive when the number of reaction events to be simulated is very large. This challenge has led to the development of approximate stochastic simulation algorithms that attempt to simulate multiple reaction firings in the same simulation step (e.g., Rathinam et al. 2003).

Solving the CME provides the most detailed and accurate approach to characterize the ensemble properties of the molecular counts, but for most biochemical systems such a solution cannot be found in closed form, and numerical methods scale exponentially with the number of species. This challenge has led to the development of algorithms that compute approximate solutions to the CME, e.g., by aggregating states with low probability, while keeping track of the error (e.g., Munsky and Khammash 2006).

Moment dynamics is attractive in that the number of kth-order moments only scales polynomially with the number of chemical species, but one only obtains closed dynamics for very simple biochemical networks. This limitation has led to the development of moment closure techniques to approximate the open moment dynamics by a closed system of ordinary differential equations.
