1 Introduction

The idea of moment-based methods is most easily explained in the context of stochastic dynamical systems. Abstractly, such a system generates a time-indexed sequence of random variables \(x=x(t)\in \mathcal {X}\), say for \(t\in [0,+\infty )\) on a given state space \(\mathcal {X}\). Let us assume that the random variable x has a well-defined probability density function (PDF) \(p=p(x,t)\). Instead of trying to study the full PDF, it is a natural step to just focus on certain moments \(m_j=m_j(t)\) such as the mean, the variance, and so on, where \(j\in \mathcal {J}\) and \(\mathcal {J}\) is an index set and \(\mathbb {M}=\{m_j:j\in \mathcal {J}\}\) is a fixed finite-dimensional space of moments. In principle, we may consider any moment space \(\mathbb {M}\) consisting of a choice of coarse-grained variables approximating the full system, not just statistical moments. A typical moment-closure based study consists of four main steps:

  1. (S0)

    Moment Space Select the space \(\mathbb {M}\) containing a hierarchy of moments \(m_j\).

  2. (S1)

    Moment Equations The next step is to derive evolution equations for the moments \(m_j\). In the general case, such a system will be high-dimensional and fully coupled.

  3. (S2)

    Moment Closure The large, often even infinite-dimensional, system of moment equations has to be closed to make it tractable for analytical and numerical techniques. In the general case, the closed system will be nonlinear and it will only approximate the full system of all moments.

  4. (S3)

    Justification and Verification One has to justify why the expansion made in step (S1) and the approximation made in step (S2) are useful in the context of the problem considered. In particular, the choice of the \(m_j\) and the approximation properties of the closure have to be addressed.

Each of the steps (S0)–(S3) has its own difficulties. We shall not focus on (S0), as selecting good ‘moments’ or ‘coarse-grained’ variables creates its own set of problems; instead, we consider some classical choices. (S1) frequently requires a lengthy computation. Deriving relatively small moment systems tends to be a manageable task; for larger systems, computer algebra packages may help to carry out some of the calculations. Finding a good closure in (S2) is very difficult. Different approaches have been shown to be successful; the ideas frequently include heuristics, empirical/numerical observations, physical first-principle considerations or a-priori assumptions. This partially explains why mathematically rigorous justifications in (S3) are relatively rare and usually work for specific systems only. However, comparisons with numerical simulations of particle/agent-based models and with explicit special solutions have consistently shown that moment closure methods are an efficient tool. Here we shall also not consider (S3) in detail and refer the reader to suitable case studies in the literature.

Although moment closure ideas appear virtually across all quantitative scientific disciplines, a unifying theory has not emerged yet. In this review, several lines of research will be highlighted. Frequently the focus of moment closure research is to optimize closure methods with one particular application in mind. It is the hope that highlighting common principles will eventually lead to a better global understanding of the area.

In Sect. 13.2 we introduce moment equations more formally. We show how to derive moment equations via three fundamental approaches. In Sect. 13.3 the basic ideas for moment closure methods are outlined. The differences and similarities between different closure ideas are discussed. In Sect. 13.4 a survey of different applications is given. As already emphasized in the title of this review, we do not aim to be exhaustive here but rather try to indicate the common ideas across the enormous breadth of the area.

2 Moment Equations

The derivation of moment equations will be explained in the context of three classical examples. Although the examples look quite different at first sight, we shall indicate how the procedures are related.

2.1 Stochastic Differential Equations

Consider a probability space \((\varOmega ,\mathcal {F},\mathbb {P})\) and let \(W=W(t)\in \mathbb {R}^L\) be a vector of independent Brownian motions for \(t\in \mathbb {R}\). A system of stochastic differential equations (SDEs) driven by W(t) for unknowns \(x=x(t)\in \mathbb {R}^N=\mathcal {X}\) is given by

$$\begin{aligned} \text {d} x = f(x)~\text {d} t + F(x)~\text {d} W \end{aligned}$$
(13.1)

where \(f:\mathbb {R}^N\rightarrow \mathbb {R}^N\), \(F:\mathbb {R}^N\rightarrow \mathbb {R}^{N\times L}\) are assumed to be sufficiently smooth maps, and we interpret the SDEs in the Itô sense [1, 2]. Alternatively, one may write (13.1) using white noise, i.e., via the generalized derivative of Brownian motion, \(\xi :=W'\) [1], as

$$\begin{aligned} x'=f(x)+F(x)\xi ,\qquad '=\frac{\text {d}}{\text {d} t}. \end{aligned}$$
(13.2)

For the equivalent Stratonovich formulation see [3]. Instead of studying (13.1)–(13.2) directly, one frequently focuses on certain moments of the distribution. For example, one may make the choice to consider

$$\begin{aligned} m_{\varvec{j}}(t):=\langle x(t)^{\varvec{j}}\rangle =\langle x_1(t)^{j_1}\cdots x_N(t)^{j_N}\rangle , \end{aligned}$$
(13.3)

where \(\langle \cdot \rangle \) denotes the expected (or mean) value and \(\varvec{j}\in \mathcal {J}\), \(\varvec{j}=(j_1,\ldots ,j_N)\), \(j_n\in \mathbb {N}_0\), where \(\mathcal {J}\) is a certain set of multi-indices so that \(\mathbb {M}=\{m_{\varvec{j}}:\varvec{j}\in \mathcal {J}\}\). Of course, it should be noted that \(\mathcal {J}\) can potentially be a very large set, e.g., for the cardinality of all multi-indices up to order J we have

$$\begin{aligned} \left| \left\{ \varvec{j}\in \mathbb {N}_0^N:|\varvec{j}|=\sum _n j_n\le J\right\} \right| = \left( \begin{array}{c}J+N\\ J \\ \end{array}\right) =\frac{(J+N)!}{J!N!}. \end{aligned}$$
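
To get a feeling for this growth, the following minimal sketch (plain Python, counting the zeroth-order moment as well) tabulates the cardinality for a few values of J and N:

```python
from math import comb

# Number of multi-indices j in N variables with |j| <= J,
# i.e., the dimension of the moment space up to order J.
def moment_count(J: int, N: int) -> int:
    return comb(J + N, J)

for N in (1, 3, 10):
    for J in (2, 4, 8):
        print(f"N={N:2d}, J={J}: {moment_count(J, N):8d} moments")
```

Already for \(N=10\) and \(J=8\) one has to track 43758 moments, which illustrates why closures at low order are so common in practice.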

However, the main steps to derive evolution equations for \(m_{\varvec{j}}\) are similar for every fixed choice of J and N. After defining \(m_{\varvec{j}}=m_{\varvec{j}}(t)\) (or any other “coarse-grained” variables), we may just differentiate \(m_{\varvec{j}}\). Consider as an example the case \(N=1=L\) and \(\mathcal {J}=\{1,2,\ldots ,J\}\), where we write the multi-index simply as \(\varvec{j}=j\in \mathbb {N}_0\). Then averaging (13.2) yields

$$\begin{aligned} m_1'=\langle x'\rangle =\langle f(x)\rangle +\langle F(x) \xi \rangle , \end{aligned}$$

which illustrates the problem that we may never hope to express the moment equations explicitly for any nonlinear SDE if f and/or F are not expressible as convergent power series, i.e., if they are not analytic. The term \(\langle F(x) \xi \rangle \) is not necessarily equal to zero for general nonlinearities F as \(\int _0^t F(x(s))~\text {d} W(s)\) is only a local martingale under relatively mild assumptions [2]. Suppose we simplify the situation drastically by assuming a quadratic polynomial f and constant additive noise

$$\begin{aligned} f(x)=a_2x^2+a_1 x+a_0,\qquad F(x)\equiv \sigma \in \mathbb {R}. \end{aligned}$$
(13.4)

Then we can actually use that \(\langle \xi \rangle =0\) and get

$$\begin{aligned} m_1'=\langle x'\rangle =a_2\langle x^2\rangle +a_1\langle x\rangle +a_0=a_2m_2+a_1m_1+a_0. \end{aligned}$$

Hence, we also need an equation for the moment \(m_2\). Using Itô’s formula one finds the differential

$$\begin{aligned} \text {d} (x^2) = [2x f(x)+\sigma ^2]~\text {d} t + 2x \sigma ~\text {d} W \end{aligned}$$

and taking the expectation it follows that

$$\begin{aligned} m_2'&= 2\langle a_2x^3+a_1 x^2+a_0 x\rangle +\sigma ^2 + \sigma \langle 2x\xi \rangle \nonumber \\&= 2(a_2m_3+a_1m_2+a_0m_1)+\sigma ^2, \end{aligned}$$
(13.5)

where \(\langle 2x\xi \rangle =0\) due to the martingale property of \(\int _0^t 2x(s)~\text {d} W_s\). The key point is that the ODE for \(m_2\) depends upon \(m_3\). The same problem repeats for higher moments and we get an infinite system of ODEs, even for the simplified case considered here. For a generic nonlinear SDE, the moment system is a fully-coupled infinite-dimensional system of ODEs. Equations at a given order \(|\varvec{j}|=J\) depend upon higher-order moments \(|\varvec{j}|>J\), where \(|\varvec{j}|:=\sum _nj_n\).
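
Before closing the hierarchy, it is instructive to have a direct simulation baseline. The following minimal sketch (all parameter values \(a_0,a_1,a_2,\sigma \) are our own illustrative choices) estimates the first moments of the quadratic-drift SDE (13.4) by an Euler–Maruyama Monte Carlo run:

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic drift (13.4) with additive noise; illustrative values chosen
# so that paths stay near the stable equilibrium and moments exist.
a2, a1, a0, sigma = -1.0, -0.5, 0.2, 0.1

T, dt, paths = 2.0, 1e-3, 10_000
x = np.zeros(paths)                      # deterministic start x(0) = 0

for _ in range(int(T / dt)):
    drift = a2 * x**2 + a1 * x + a0
    x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(paths)

# Monte Carlo estimates of m_j(T) = <x(T)^j>.
for j in (1, 2, 3):
    print(f"m_{j}({T}) ~ {np.mean(x**j):.4f}")
```

Such estimates serve as the reference against which the closed moment systems of Sect. 13.3 can be checked.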

Another option to derive moment equations is to consider the Fokker-Planck (or forward Kolmogorov) equation associated to (13.1)–(13.2); see [3]. It describes the probability density \(p=p(x,t|x_0,t_0)\) of x at time t starting at \(x_0=x(t_0)\) and is given by

$$\begin{aligned} \frac{\partial p}{\partial t}=-\sum _{k=1}^N \frac{\partial }{\partial x_k}[pf] +\frac{1}{2} \sum _{i,k=1}^N \frac{\partial ^2}{\partial x_i \partial x_k}[(F F^T)_{ik}p]. \end{aligned}$$
(13.6)

Consider the case of additive noise \(F(x)\equiv \sigma \), quadratic polynomial nonlinearity f(x) and \(N=1=L\) as in (13.4); then we have

$$\begin{aligned} \frac{\partial p}{\partial t}=-\frac{\partial }{\partial x}[(a_2x^2+a_1 x+a_0)p ] +\frac{\sigma ^2}{2}\frac{\partial ^2 p}{\partial x^2}. \end{aligned}$$
(13.7)

The idea to derive equations for \(m_j\) is to multiply (13.7) by \(x^j\), integrate by parts and use some a-priori known properties or assumptions about p. For example, we have

$$\begin{aligned} m_1'&= \langle x'\rangle =\int _\mathbb {R} x\frac{\partial p}{\partial t}~\text {d} x\\&= \int _\mathbb {R} -x\frac{\partial }{\partial x}[(a_2x^2+a_1 x+a_0)p ]~\text {d} x +\int _\mathbb {R} x\frac{\sigma ^2}{2}\frac{\partial ^2 p}{\partial x^2}~\text {d} x. \end{aligned}$$

If p and its derivative vanish at infinity, which is quite reasonable for many densities, then integration by parts gives

$$\begin{aligned} m_1'=\int _\mathbb {R} [(a_2x^2+a_1 x+a_0)p ]~\text {d} x = a_2m_2+a_1m_1+a_0 \end{aligned}$$

as expected. A similar calculation yields the equations for other moments. Using the forward Kolmogorov equation generalizes in a relatively straightforward way to other Markov processes, e.g., to discrete-time and/or discrete-space stochastic processes; in fact, many discrete stochastic processes have natural ODE limits [4–7]. In the context of Markov processes, yet another approach is to utilize transforms such as the characteristic function \(s\mapsto \langle \exp [\text {i} sx]\rangle \) (where \(\text {i}:=\sqrt{-1}\)) or the moment generating function to determine equations for the moments.
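
For additive noise the bookkeeping behind the hierarchy can even be automated symbolically: Itô's formula gives \(\text {d}(x^j)=[jx^{j-1}f(x)+\frac{1}{2}j(j-1)\sigma ^2x^{j-2}]~\text {d} t+j\sigma x^{j-1}~\text {d} W\), and taking expectations replaces each power \(\langle x^k\rangle \) by \(m_k\). A small sympy sketch of this computation (the helper names are our own; \(m_0\equiv 1\)):

```python
import sympy as sp

x, a2, a1, a0, sigma = sp.symbols('x a2 a1 a0 sigma')
f = a2*x**2 + a1*x + a0          # quadratic drift (13.4), additive noise

def moment_ode(j):
    """Right-hand side of m_j' as a linear combination of moments m_k."""
    expr = sp.expand(j*x**(j-1)*f
                     + sp.Rational(1, 2)*j*(j-1)*sigma**2*x**(j-2))
    # replace each monomial x^k by the symbol m_k (note m_0 = 1)
    return sum(c*sp.Symbol(f'm{k}') for (k,), c in sp.Poly(expr, x).terms())

for j in (1, 2, 3):
    print(f"m{j}' =", moment_ode(j))
```

The output reproduces (13.5) for \(j=2\) (with \(m_0=1\)) and makes the upward coupling to \(m_{j+1}\) explicit for every j.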

2.2 Kinetic Equations

A different context where moment methods are used frequently is kinetic theory [8–10]. Let \(x\in \varOmega \subset \mathbb {R}^N\) and consider the description of a gas via a single-particle density \(\varrho =\varrho (x,t,v)\), which is nonnegative and can be interpreted as a probability density if it is normalized; in fact, the notational similarity between p from Sect. 13.2.1 and the one-particle density \(\varrho \) is deliberate. The pair \((x,v)\in \varOmega \times \mathbb {R}^{N}\) is interpreted as position and velocity. A kinetic equation is given by

$$\begin{aligned} \frac{\partial \varrho }{\partial t}+v\cdot \nabla _x \varrho =Q(\varrho ), \end{aligned}$$
(13.8)

where \(\nabla _x=\left( \frac{\partial }{\partial x_1},\ldots ,\frac{\partial }{\partial x_N}\right) ^\top \), suitable boundary conditions are assumed, and \(\varrho \mapsto Q(\varrho )\) is the collision operator acting only on the v-variable at each \((x,t)\in \mathbb {R}^N \times [0,+\infty )\) with domain \(\mathcal {D}(Q)\). For example, for short-range interaction and hard-sphere collisions [11] one would take for a function \(v\mapsto G(v)\) the operator

$$\begin{aligned} Q(G)(v)=\int _{\mathbb {S}^{N-1}}\int _{\mathbb {R}^N}\Vert v-w\Vert [G(w^*)G(v^*)-G(v)G(w)]~\text {d} w~\text {d} \psi \end{aligned}$$

where \(v^*=\frac{1}{2}(v+w+\Vert v-w\Vert \psi )\), \(w^*=\frac{1}{2}(v+w-\Vert v-w\Vert \psi )\) for \(\psi \in \mathbb {S}^{N-1}\) and \(\mathbb {S}^{N-1}\) denotes the unit sphere in \(\mathbb {R}^N\). We denote velocity averaging by

$$\begin{aligned} \langle G\rangle =\int _{\mathbb {R}^N} G(v)~\text {d} v, \end{aligned}$$

where the overloaded notation \(\langle \cdot \rangle \) is again deliberately chosen to highlight the similarities with Sect. 13.2.1. It is standard to make several assumptions about the collision operator such as the conservation of mass, momentum, energy as well as local entropy dissipation

$$\begin{aligned} \langle Q(G)\rangle =0,\quad \langle v Q(G)\rangle =0,\quad \langle \Vert v\Vert ^2Q(G)\rangle =0,\quad \langle \ln (G)Q(G)\rangle \le 0. \end{aligned}$$
(13.9)

Moreover, one usually assumes that the steady states of (13.8) are Maxwellian (Gaussian-like) densities of the form

$$\begin{aligned} \rho _*(v)=\frac{q}{(2\pi \theta )^{N/2}}\exp \left( -\frac{\Vert v-v_*\Vert ^2}{2\theta }\right) ,\quad (q,\theta ,v_*)\in \mathbb {R}^+\times \mathbb {R}^+\times \mathbb {R}^N \end{aligned}$$
(13.10)

and that Q commutes with certain group actions [8] implying symmetries. Note that the physical constraints (13.9) have important consequences, e.g., entropy dissipation implies the local dissipation law

$$\begin{aligned} \frac{\partial }{\partial t}\langle \varrho \ln \varrho -\varrho \rangle +\nabla _x\cdot \langle v(\varrho \ln \varrho -\varrho ) \rangle = \langle \ln \varrho Q(\varrho )\rangle \le 0, \end{aligned}$$
(13.11)

while mass conservation implies the local conservation law

$$\begin{aligned} \frac{\partial }{\partial t}\langle \varrho \rangle +\nabla _x\cdot \langle v\varrho \rangle = 0 \end{aligned}$$
(13.12)

with similar local conservation laws for momentum and energy. The local conservation law indicates that it could be natural, similar to the SDE case above, to multiply the kinetic equation (13.8) by polynomials and then average. Let \(\{m_j=m_j(v)\}_{j=1}^J\) be a basis for a J-dimensional space of polynomials \(\mathbb {M}\). Consider a column vector \(M=M(v)\in \mathbb {R}^J\) containing all the basis elements so that every element \(m\in \mathbb {M}\) can be written as \(m=\alpha ^\top M\) for some vector \(\alpha \in \mathbb {R}^J\). Then it follows

$$\begin{aligned} \frac{\partial }{\partial t}\langle \varrho M\rangle +\nabla _x \cdot \langle v\varrho M\rangle =\langle Q(\varrho )M\rangle \end{aligned}$$
(13.13)

by multiplying and averaging. This is exactly the same procedure as for the forward Kolmogorov equation for the SDE case above. Observe that (13.13) is a J-dimensional set of moment equations when viewed component-wise. This set is usually not closed. We already see by looking at the case \(M\equiv v\) that the second term in (13.13) will usually generate higher-order moments.
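
To spell the last observation out in the simplest setting (a sketch for \(N=1\) and a momentum-conserving collision operator), take \(M\equiv v\) in (13.13); then (13.9) gives

$$\begin{aligned} \frac{\partial }{\partial t}\langle v\varrho \rangle +\frac{\partial }{\partial x}\langle v^2\varrho \rangle =\langle vQ(\varrho )\rangle =0, \end{aligned}$$

so the flux of the first velocity moment is the second velocity moment: the momentum equation cannot be closed without information about the energy/pressure-type quantity \(\langle v^2\varrho \rangle \).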

2.3 Networks

Network dynamical systems are another common situation where moment equations appear. Typical examples occur in epidemiology, chemical reaction networks and socio-economic models. Here we illustrate the moment equations [12–15] for the classical susceptible-infected-susceptible (SIS) model [16] on a fixed network; for remarks on adaptive networks see Sect. 13.4. Given a graph of K nodes, each node can be in two states, infected I or susceptible S. Along an SI-link infections occur at rate \(\tau \) and recovery of infected nodes occurs at rate \(\gamma \). The entire (microscopic) description of the system is then given by all potential non-isomorphic configurations of S and I nodes on the graph, collected in \(x\in \mathbb {R}^N=\mathcal {X}\). Even for small graphs, N can be extremely large since the number of possible node configurations alone, without considering the topology of the graph, is already \(2^K\). Therefore, it is natural to consider a coarse-grained description. Let \(m_I=\langle I\rangle =\langle I\rangle (t)\) and \(m_S=\langle S\rangle =\langle S\rangle (t)\) denote the average numbers of infected and susceptible nodes at time t. From the assumptions about infection and recovery rates we formally derive

$$\begin{aligned} \frac{\text {d} m_S}{\text {d} t} = \gamma m_I - \tau \langle SI \rangle , \end{aligned}$$
(13.14)
$$\begin{aligned} \frac{\text {d} m_I}{\text {d} t} = \tau \langle SI \rangle - \gamma m_I, \end{aligned}$$
(13.15)

where \(\langle SI\rangle =:m_{SI}\) denotes the average number of SI-links. In (13.14) the first term describes that susceptibles are gained proportional to the number of infected times the recovery rate \(\gamma \). The second term describes that infections are expected to occur proportional to the number of SI-links at the infection rate \(\tau \). Equation (13.15) can be motivated similarly. However, the system is not closed and we need an equation for \(\langle SI\rangle \). In addition to (13.14)–(13.15), the result [14, Theorem 1] states that the remaining second-order motif equations are given by

$$\begin{aligned} \frac{\text {d} m_{SI}}{\text {d} t} = \gamma (m_{II} - m_{SI}) +\tau (m_{SSI} - m_{ISI} - m_{SI}),\end{aligned}$$
(13.16)
$$\begin{aligned} \frac{\text {d} m_{II}}{\text {d} t} = -2\gamma m_{II} +2\tau (m_{ISI}+m_{SI}),\end{aligned}$$
(13.17)
$$\begin{aligned} \frac{\text {d} m_{SS}}{\text {d} t} = 2\gamma m_{SI} - 2\tau m_{SSI}, \end{aligned}$$
(13.18)

where we refer also to [12, 13]; it should be noted that (13.16)–(13.18) do not seem to coincide with a direct derivation by counting links [17, (9.2)–(9.3)]. In any case, it is clear that third-order motifs must appear, e.g., if we just look at the motif ISI, then an infection event generates two new II-links, so the higher-order topological motif structure does have an influence on lower-order densities. If we pick the second-order space of moments

$$\begin{aligned} \mathbb {M}=\{m_{I},m_{S},m_{SI},m_{SS},m_{II}\} \end{aligned}$$
(13.19)

the Eqs. (13.14)–(13.15) and (13.16)–(13.18) are not closed. We have the same problems as for the SDE and kinetic cases discussed previously. The derivation of the SIS moment equations can be based upon formal microscopic balance considerations. Another option is to write the discrete finite-size SIS-model as a Markov chain with Kolmogorov equation

$$\begin{aligned} \frac{\text {d} x}{\text {d} t} = P x, \end{aligned}$$
(13.20)

which can be viewed as an ODE of \(2^K\) equations given by a matrix P. One defines the moments as averages, e.g., taking

$$\begin{aligned} \langle I \rangle (t):=\sum _{k=0}^K k x^{(k)}(t), \qquad \langle S \rangle (t) :=\sum _{k=0}^K (K-k) x^{(k)}(t), \end{aligned}$$

where \(x^{(k)}(t)\) denotes the total probability of all states with k infected nodes at time t. Similarly one can define higher moments, multiply the Kolmogorov equation by suitable terms, sum the equation as an analogy to the integration presented in Sect. 13.2.2, and derive the moment equations [14]. For general network dynamical systems, moment equations can usually be derived. However, the choice of which moment (or coarse-grained) variables to consider is far from trivial, as discussed in Sect. 13.4.
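
For a complete graph the microscopic description collapses dramatically: all states with k infected nodes can be aggregated exactly, so (13.20) reduces to a birth-death chain on \(k\in \{0,1,\ldots ,K\}\) with infection rate \(\tau k(K-k)\) and recovery rate \(\gamma k\). The following minimal sketch (illustrative values for \(\tau ,\gamma ,K\) are our own choices) builds the corresponding matrix P and evaluates \(\langle I\rangle (t)\) exactly:

```python
import numpy as np
from scipy.linalg import expm

K, tau, gamma = 50, 0.02, 0.4          # illustrative parameter choices
k = np.arange(K + 1)

# Generator of the birth-death chain (columns index the current state):
# k -> k+1 at rate tau*k*(K-k) (infection along an SI-pair),
# k -> k-1 at rate gamma*k     (recovery).
birth, death = tau * k * (K - k), gamma * k
P = np.zeros((K + 1, K + 1))
P[k[:-1] + 1, k[:-1]] += birth[:-1]
P[k[1:] - 1, k[1:]] += death[1:]
P[k, k] -= birth + death

x0 = np.zeros(K + 1); x0[5] = 1.0      # start with exactly 5 infected
for t in (0.0, 2.0, 5.0, 10.0):
    x = expm(P * t) @ x0               # solve dx/dt = Px as in (13.20)
    print(f"t = {t:4.1f}:  <I> = {k @ x:6.2f}")
```

For general graphs no such exact aggregation is available, which is precisely why the moment equations above, and their closures in Sect. 13.3, are so useful.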

3 Moment Closure

We have seen that moment equations, albeit very intuitive, suffer from the drawback that the number of moment equations tends to grow rapidly and the exact moment system tends to form an infinite-dimensional system given by

$$\begin{aligned} \begin{array}{lcl} \frac{\text {d} m_1}{\text {d} t}&=&h_1(m_1,m_2,\ldots ),\\ \frac{\text {d} m_2}{\text {d} t}&=&h_2(m_2,m_3,\ldots ),\\ \frac{\text {d} m_3}{\text {d} t}&=&\cdots ,\\ \end{array} \end{aligned}$$
(13.21)

where we are going to assume from now on the even more general case \(h_j=h_j(m_1,m_2,m_3,\ldots )\) for all j. In some cases, working with an infinite-dimensional system of moments may already be preferable to the original problem. We do not discuss this direction further and instead try to close (13.21) to obtain a finite-dimensional system. The idea is to find a mapping H, usually expressing the higher-order moments in terms of certain lower-order moments of the form

$$\begin{aligned} H(m_1,\ldots ,m_\kappa )=(m_{\kappa +1},m_{\kappa +2},\ldots ) \end{aligned}$$
(13.22)

for some \(\kappa \in \mathcal {J}\), such that (13.21) yields a closed system

$$\begin{aligned} \begin{array}{lcl} \frac{\text {d} m_1}{\text {d} t}&=&h_1(m_1,m_2,\ldots ,m_\kappa ,H(m_1,\ldots ,m_\kappa )),\\ \frac{\text {d} m_2}{\text {d} t}&=&h_2(m_1,m_2,\ldots ,m_\kappa ,H(m_1,\ldots ,m_\kappa )),\\ \vdots &=&\vdots \\ \frac{\text {d} m_\kappa }{\text {d} t}&=&h_\kappa (m_1,m_2,\ldots ,m_\kappa ,H(m_1,\ldots ,m_\kappa )).\\ \end{array} \end{aligned}$$
(13.23)

The two main questions are

  1. (Q1)

    How to find/select the mapping H?

  2. (Q2)

    How well does (13.23) approximate solutions of (13.21) and/or of the original dynamical system from which the moment equations (13.21) have been derived?

Here we shall focus on describing several of the answers proposed to (Q1). For a general nonlinear system, (Q2) is extremely difficult, and Sect. 13.3.4 provides a geometric conjecture as to why this could be the case.

3.1 Stochastic Closures

In this section we focus on the SDE (13.1) from Sect. 13.2.1. However, similar principles apply to all incarnations of the moment equations we have discussed. One possibility is to truncate [18] the system and neglect all moments higher than a certain order, which means taking

$$\begin{aligned} H(m_1,\ldots ,m_\kappa )=(0,0,\ldots ). \end{aligned}$$
(13.24)

Albeit rather simple, the truncation (13.24) is trivial to implement and, for many examples, does not work as badly as one may think at first sight. A variation on the theme is the method of steady-state moments, obtained by setting

$$\begin{aligned} \begin{array}{lcl} 0&=&h_{\kappa +1}(m_1,m_2,\ldots ,m_\kappa ,m_{\kappa +1},\ldots ),\\ 0&=&h_{\kappa +2}(m_1,m_2,\ldots ,m_\kappa ,m_{\kappa +1},\ldots ),\\ \vdots &=&\vdots \\ \end{array} \end{aligned}$$
(13.25)

and trying to solve the algebraic equations (13.25) for all higher-order moments in terms of \((m_1,m_2,\ldots ,m_\kappa )\). As we shall point out in Sect. 13.3.4, this is nothing but the quasi-steady-state assumption in disguise. Similar ideas as for zero and steady-state moments can also be implemented using central moments and cumulants [18].
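
For the scalar quadratic SDE of Sect. 13.2.1 the hierarchy reads \(m_j'=j(a_2m_{j+1}+a_1m_j+a_0m_{j-1})+\frac{1}{2}\sigma ^2j(j-1)m_{j-2}\) with \(m_0\equiv 1\), so the truncation (13.24) at order \(\kappa \) simply sets \(m_{\kappa +1}=0\). A minimal sketch (same illustrative parameters as the Monte Carlo run in Sect. 13.2.1):

```python
import numpy as np
from scipy.integrate import solve_ivp

a2, a1, a0, sigma = -1.0, -0.5, 0.2, 0.1   # illustrative values
kappa = 4                                   # truncation order

def hierarchy(t, m):
    # m[0] = m_0 = 1 is carried along; m[1..kappa] are the moments.
    mm = np.append(m, 0.0)                  # truncation: m_{kappa+1} := 0
    dm = np.zeros_like(m)
    for j in range(1, kappa + 1):
        dm[j] = j * (a2 * mm[j + 1] + a1 * mm[j] + a0 * mm[j - 1])
        if j >= 2:
            dm[j] += 0.5 * sigma**2 * j * (j - 1) * mm[j - 2]
    return dm

m_init = np.zeros(kappa + 1); m_init[0] = 1.0   # x(0) = 0 deterministically
sol = solve_ivp(hierarchy, (0.0, 2.0), m_init, rtol=1e-8)
print("m_1(2) ~", sol.y[1, -1], ",  m_2(2) ~", sol.y[2, -1])
```

Comparing the output against Monte Carlo estimates gives a quick empirical check of how much the neglected moments actually matter.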

Another common idea for moment closure principles is to make an a priori assumption about the distribution of the solution. Consider the one-dimensional SDE example (\(N=1=L\)) and suppose \(x=x(t)\) is normally distributed. For a normal distribution with mean zero and variance \(\nu ^2\), we know the moments

$$\begin{aligned} \langle x^j\rangle =\nu ^j(j-1)!!,\quad \text {if}\ j\ \text {is even,}\qquad \langle x^j\rangle = 0,\quad \text {if}\ j \text { is odd,} \end{aligned}$$
(13.26)

so one closure method, the so-called Gaussian (or normal) closure, is to set

$$\begin{aligned} m_j&= 0\quad \text {if}\ j\ge 3 \ \text {and}\ j\ \text {is odd},\\ m_j&= (m_2)^{j/2}~(j-1)!! \quad \text {if}\ j\ge 4 \ \text {and}\ j \ \text {is even.} \end{aligned}$$

A similar approach can be implemented using central moments. If x turns out to deviate substantially from a Gaussian distribution, then one has to question whether a Gaussian closure is really a good choice. The Gaussian closure principle is one choice out of a wide variety of distributional closures. For example, one could instead assume the moments of a lognormal distribution [19]

$$\begin{aligned} x\sim \exp [\tilde{\mu }+\tilde{\nu }\tilde{x}],~\tilde{x}\sim \mathcal {N}(0,1),\quad \Rightarrow \langle x^j\rangle =m_j=\exp \left[ j\tilde{\mu }+\frac{1}{2} j^2\tilde{\nu }^2\right] \end{aligned}$$
(13.27)

where ‘\(\sim \)’ means ‘distributed according to’ a given distribution and \(\mathcal {N}(0,1)\) indicates the standard normal distribution. Solving for \((\tilde{\mu },\tilde{\nu })\) in (13.27) in terms of \((m_1,m_2)\) yields a moment closure \((m_3,m_4,\ldots )=H(m_1,m_2)\). The same principle also works for discrete state space stochastic processes, using an a-priori distributional assumption. A typical example is the binomial closure [20]; mixtures of different distributional closures have also been considered [21, 22].
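
Concretely, for the quadratic SDE above a Gaussian closure at \(\kappa =2\) amounts to setting the third cumulant to zero, \(m_3=3m_1m_2-2m_1^3\), while eliminating \((\tilde{\mu },\tilde{\nu })\) from (13.27) yields the lognormal closure \(m_3=m_2^3/m_1^3\). A hedged sketch comparing the two closed two-moment systems (the lognormal variant needs \(m_1>0\), so we start at a positive state; parameters are again our own illustrative choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

a2, a1, a0, sigma = -1.0, -0.5, 0.2, 0.1        # illustrative values

def closed_system(t, m, closure):
    m1, m2 = m
    m3 = closure(m1, m2)                         # the closure map H
    return [a2 * m2 + a1 * m1 + a0,
            2 * (a2 * m3 + a1 * m2 + a0 * m1) + sigma**2]

gaussian = lambda m1, m2: 3 * m1 * m2 - 2 * m1**3   # third cumulant = 0
lognormal = lambda m1, m2: m2**3 / m1**3            # from (13.27)

for name, H in (("Gaussian", gaussian), ("lognormal", lognormal)):
    sol = solve_ivp(closed_system, (0.0, 2.0), [0.3, 0.09],
                    args=(H,), rtol=1e-8)
    print(f"{name:9s}: m1(2) ~ {sol.y[0, -1]:.4f}, m2(2) ~ {sol.y[1, -1]:.4f}")
```

The two answers differ only through the assumed shape of the distribution, which is exactly the point: a distributional closure imports shape information that the truncated hierarchy itself does not contain.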

3.2 Physical Principle Closures

In the context of moment equations of the form (13.13) derived from kinetic equations, a typical moment closure technique is to consider a constrained closure based upon a postulated physical principle. The constraints are usually derived from the original kinetic equation (13.8), e.g., if it satisfies certain symmetries, entropy dissipation and local conservation laws, then the closure for the moment equations should aim to capture these properties somehow. For example, the assumption

$$\begin{aligned} \text {span}\{1,v_1,\ldots ,v_N,\Vert v\Vert ^2\}\subset \mathbb {M} \end{aligned}$$

turns out to be necessary to recover conservation laws [8], while assuming that the space \(\mathbb {M}\) is invariant under suitable transformations is going to preserve symmetries. However, even restricting the space of moments to preserve certain physical properties usually does not constrain the moments enough to get a closure. Following [8], suppose that the single-particle density is given by

$$\begin{aligned} \varrho =\mathfrak {M}(\alpha )=\exp [\alpha ^\top M(v)],\qquad m=m(v)\in \mathbb {M}\,\text {s.t.}\,m(v)=\alpha ^\top M(v) \end{aligned}$$
(13.28)

for some moment densities \(\alpha =\alpha (x,t)\in \mathbb {R}^J\). Using (13.28) in (13.13) leads to

$$\begin{aligned} \frac{\partial }{\partial t}\langle \mathfrak {M}(\alpha )M\rangle +\nabla _x \cdot \langle v\mathfrak {M}(\alpha ) M\rangle =\langle Q(\mathfrak {M}(\alpha ))M\rangle . \end{aligned}$$
(13.29)

Observe that we may view (13.29) as a system of J equations for the J unknowns \(\alpha \). Hence, one has formally achieved closure. The question is what really motivates the exponential ansatz (13.28). Introduce new variables \(\eta =\langle \mathfrak {M}(\alpha ) M\rangle \) and define a function

$$\begin{aligned} H(\eta )=-\langle \mathfrak {M}(\alpha ) \rangle +\alpha ^\top \eta \end{aligned}$$

and one may show that \(\alpha =[\text {D}_\eta H](\eta )\). It turns out [8] that \(H(\eta )\) can be computed by solving the entropy minimization problem

$$\begin{aligned} \min _\varrho \{\langle \varrho \ln \varrho -\varrho \rangle :\langle M\varrho \rangle =\eta \}=H(\eta ), \end{aligned}$$
(13.30)

where the constraint \(\langle M\varrho \rangle =\eta \) prescribes certain moments; we recall that \(M=M(v)\) is the fixed vector containing the moment space basis elements and the relation \(\alpha =[\text {D}_\eta H](\eta )\) holds. From a statistical physics perspective, it may be more natural to view (13.30) as an entropy maximization problem [23] by introducing another minus sign. Therefore, the exponential function in the ansatz (13.28) does not only guarantee non-negativity; it also arises naturally from the Legendre-transform structure associated with the so-called entropy density \(\varrho \mapsto \varrho \ln \varrho -\varrho \), so the ansatz relates to a physical optimization problem [8].
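
Numerically, the minimization (13.30) is typically attacked through its convex dual: minimize \(\langle \exp [\alpha ^\top M(v)]\rangle -\alpha ^\top \eta \) over \(\alpha \in \mathbb {R}^J\), whose critical point satisfies the moment constraints \(\langle \mathfrak {M}(\alpha )M\rangle =\eta \). A minimal sketch on a truncated one-dimensional velocity grid (grid, target moments and starting point are our own illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

v = np.linspace(-8.0, 8.0, 801)               # truncated velocity grid
dv = v[1] - v[0]
M = np.vstack([np.ones_like(v), v, v**2])     # basis (1, v, v^2)
eta = np.array([1.0, 0.5, 1.5])               # target moments <M rho> = eta

def dual(alpha):
    rho = np.exp(alpha @ M)                   # exponential ansatz (13.28)
    return np.sum(rho) * dv - alpha @ eta     # convex dual functional

res = minimize(dual, x0=np.array([0.0, 0.0, -1.0]), method="BFGS")
rho = np.exp(res.x @ M)
print("alpha =", res.x)
print("recovered moments:", M @ rho * dv)     # should reproduce eta
```

Since the basis here is \((1,v,v^2)\), the minimizer is a discretized Maxwellian (13.10), in line with the discussion below.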

To further motivate why an entropy-based closure corresponds to certain physical principles, let us consider the ‘minimal’ moment space

$$\begin{aligned} \mathbb {M}=\text {span}\{1,v_1,\ldots ,v_N,\Vert v\Vert ^2\}. \end{aligned}$$

The closure ansatz (13.28) can then be realized using the vector \(M(v)=(1,v_1,\ldots ,v_N,\Vert v\Vert ^2)\), and the ansatz is related [24] to the Maxwellian density (13.10) since

$$\begin{aligned} \rho _*(v)=\exp [\alpha ^\top M(v)],\quad \alpha =\left( \ln \left( \frac{q}{(2\pi \theta )^{N/2}}\right) -\frac{\Vert v_*\Vert ^2}{2\theta },\frac{v_*}{\theta },-\frac{1}{2\theta }\right) ^\top . \end{aligned}$$

Maxwellian densities are essentially Gaussian-like densities, so we again have a Gaussian closure. Using a Gaussian closure implies that the moment equations become the Euler equations of gas dynamics, which can be viewed as a mean-field model near equilibrium for the mesoscopic single-particle kinetic equation (13.8), which is itself a limit of microscopic equations for each particle [25, 26].

Taking a larger moment space \(\mathbb {M}\) one may also get the Navier-Stokes equation as a limit [8], and this hydrodynamic limit can even be justified rigorously under certain assumptions [27]. This clearly shows that moment closure methods can link physical theories at different scales.

3.3 Microscopic Closures

Since the microscopic level and the macroscopic moment equations are connected via such limits, it seems plausible that moment closure techniques can also be motivated starting from an individual-based model. Here we shall illustrate this approach for the SIS-model from Sect. 13.2.3. Suppose we start at the level of first-order moments and let \(\mathbb {M}=\{m_I,m_S\}\). To close (13.14)–(13.15) we want a map

$$\begin{aligned} m_{SI}=H(m_I,m_S). \end{aligned}$$
(13.31)

If we view the density of the I nodes and S nodes as very weakly correlated random variables then a first guess is to use the approximation

$$\begin{aligned} m_{SI}=\langle SI\rangle \approx \langle S \rangle \langle I\rangle =m_{S}m_{I}. \end{aligned}$$
(13.32)

Plugging (13.32) into (13.14)–(13.15) yields the mean-field SIS model

$$\begin{aligned} \begin{array}{lcl} m_S' &=& \gamma m_I - \tau m_{S}m_{I},\\ m_I' &=& \tau m_{S}m_{I} - \gamma m_I. \end{array} \end{aligned}$$
(13.33)

The mean-field SIS model is one of the simplest examples where one clearly sees that although the moment equations are linear ODEs, the moment-closure ODEs are frequently nonlinear. It is important to note that (13.32) is not expected to be valid for all possible networks as it ignores the graph structure. A natural alternative is to consider

$$\begin{aligned} m_{SI}=\langle SI\rangle \approx \mathfrak {m}_{\text {d}}\langle S \rangle \langle I\rangle = \mathfrak {m}_{\text {d}} m_{S}m_{I}, \end{aligned}$$
(13.34)

where \(\mathfrak {m}_{\text {d}}\) is the mean degree of the given graph/network. Hence it is intuitive that (13.32) is valid for a complete graph in the limit \(K\rightarrow \infty \) [15].

If we want to find a closure similar to the approximation (13.32) for second-order moments with \(\mathbb {M}\) as in (13.19), then the classical choice is the pair-approximation [28–30]

$$\begin{aligned} m_{abc}\approx \frac{m_{ab}m_{bc}}{m_b},\qquad a,b,c\in \{S,I\} \end{aligned}$$
(13.35)

which just means that the density of triplet motifs is approximated by counting the link densities that form the triplet. In (13.35) we have again ignored pre-factors from the graph structure such as the mean excess degree [12, 17]. As before, the assumption (13.35) neglects certain correlations and provides a mapping

$$\begin{aligned} (m_{SSI},m_{ISI})=H(m_{II},m_{SS},m_{SI})=\left( \frac{m_{SS}m_{SI}}{m_S},\frac{m_{SI}m_{SI}}{m_S}\right) \end{aligned}$$
(13.36)

and substituting (13.36) into (13.16)–(13.18) yields a system of five closed nonlinear ODEs. Many other paradigms for similar closures exist. The idea is to use the interpretation of the moments and approximate certain higher-order moments based upon certain assumptions for each moment/motif. In the cases discussed here, this means neglecting certain correlation terms of random variables. At least on a formal level, this approach is related to the other closures we have discussed. For example, forcing maximum entropy means minimizing correlations in the system, while assuming a certain distribution for the moments just means assuming a particular correlation structure of mixed moments.
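
To see the two closures side by side, here is a hedged sketch integrating the mean-field system (13.33) and the pair-approximation system obtained by inserting (13.36) into (13.14)–(13.18). All densities are per-node quantities, the initial link densities assume statistically uncorrelated node states, and the rates and mean degree are our own illustrative choices (graph-structure prefactors are ignored, as discussed above):

```python
import numpy as np
from scipy.integrate import solve_ivp

tau, gamma, mean_deg = 1.0, 0.5, 4.0        # illustrative choices

def mean_field(t, y):
    mS, mI = y
    mSI = mean_deg * mS * mI                 # closure (13.34)
    return [gamma * mI - tau * mSI, tau * mSI - gamma * mI]

def pair_approx(t, y):
    mS, mI, mSI, mII, mSS = y
    mSSI = mSS * mSI / mS                    # closure (13.36)
    mISI = mSI**2 / mS
    return [gamma * mI - tau * mSI,
            tau * mSI - gamma * mI,
            gamma * (mII - mSI) + tau * (mSSI - mISI - mSI),
            -2 * gamma * mII + 2 * tau * (mISI + mSI),
            2 * gamma * mSI - 2 * tau * mSSI]

mI0 = 0.05                                   # 5% initially infected at random
y0 = [1 - mI0, mI0, mean_deg * (1 - mI0) * mI0,
      mean_deg * mI0**2, mean_deg * (1 - mI0)**2]
mf = solve_ivp(mean_field, (0, 20), [1 - mI0, mI0], rtol=1e-8)
pa = solve_ivp(pair_approx, (0, 20), y0, rtol=1e-8)
print(f"m_I(20): mean-field ~ {mf.y[1, -1]:.3f}, "
      f"pair approximation ~ {pa.y[1, -1]:.3f}")
```

The discrepancy between the two endemic levels is a direct measure of the link-level correlations that the mean-field closure (13.32) throws away.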

3.4 Geometric Closure

All the moment closure methods described so far have been extensively tested in many practical examples and frequently lead to very good results; see Sect. 13.4. However, regarding the question (Q2) on the approximation accuracy of moment closure, no completely general results are available. To make progress in this direction, I conjecture that a high-potential approach is to consider moment closures in the context of geometric invariant manifold theory. There is very little mathematically rigorous work in this direction [31], although the relevance [32, 33] is almost obvious.

Consider the abstract moment equations (13.21). Let us assume for illustration purposes that we know that (13.21) can be written as a system

$$\begin{aligned} \begin{array}{lcl} \frac{\text {d} m_1}{\text {d} t}&=&h_1(m_1,m_2,\ldots ,m_\kappa ,m_{\kappa +1},m_{\kappa +2},\ldots ),\\ \frac{\text {d} m_2}{\text {d} t}&=&h_2(m_1,m_2,\ldots ,m_\kappa ,m_{\kappa +1},m_{\kappa +2},\ldots ),\\ \vdots &=&\vdots \\ \frac{\text {d} m_\kappa }{\text {d} t}&=&h_\kappa (m_1,m_2,\ldots ,m_\kappa ,m_{\kappa +1},m_{\kappa +2},\ldots ),\\ \frac{\text {d} m_{\kappa +1}}{\text {d} t}&=&\frac{1}{\varepsilon }h_{\kappa +1}(m_1,m_2,\ldots ,m_\kappa ,m_{\kappa +1},m_{\kappa +2},\ldots ),\\ \frac{\text {d} m_{\kappa +2}}{\text {d} t}&=&\frac{1}{\varepsilon }h_{\kappa +2}(m_1,m_2,\ldots ,m_\kappa ,m_{\kappa +1},m_{\kappa +2},\ldots ),\\ \vdots &=&\vdots \\ \end{array} \end{aligned}$$
(13.37)

where \(0<\varepsilon \ll 1\) is a small parameter and each of the component functions of the vector field h is of order \(\mathcal {O}(1)\) as \(\varepsilon \rightarrow 0\). Then (13.37) is a fast-slow system [34, 35] with fast variables \((m_{\kappa +1},m_{\kappa +2},\ldots )\) and slow variables \((m_1,\ldots ,m_{\kappa })\). The classical quasi-steady-state assumption [36] to reduce (13.37) to a lower-dimensional system is to take

$$\begin{aligned} 0=\frac{\text {d} m_{\kappa +1}}{\text {d} t},\qquad 0=\frac{\text {d} m_{\kappa +2}}{\text {d} t},\qquad \cdots . \end{aligned}$$

This generates a system of differential-algebraic equations and if we can solve the algebraic equations

$$\begin{aligned} 0=h_{\kappa +1}(m_1,m_2,\ldots ),\qquad 0=h_{\kappa +2}(m_1,m_2,\ldots ),\qquad \cdots \end{aligned}$$
(13.38)

via a mapping H as in (13.22) we end up with a closed system of the form (13.23).

The quasi-steady-state approach hides several difficulties that are best understood geometrically from the theory of normally hyperbolic invariant manifolds, which is well exemplified by the case of fast-slow systems. For fast-slow systems, the algebraic equations (13.38) provide a representation of the critical manifold

$$\begin{aligned} \mathcal {C}_0=\{(m_1,m_2,\ldots ):h_j=0 \quad \text {for}\,j>\kappa , j\in \mathbb {N}\}. \end{aligned}$$

However, it is crucial to note that, despite its name, \(\mathcal {C}_0\) is not necessarily a manifold but in general just an algebraic variety. Even if we assume that \(\mathcal {C}_0\) is a manifold and that we are able to find a mapping H of the form (13.22), such a mapping is generically only available locally [34, 37]. Even if we assume in addition that the mapping is possible globally, then the dynamics on \(\mathcal {C}_0\) given by (13.22) does not necessarily approximate the dynamics of the full moment system for \(\varepsilon >0\). The relevant property for a dynamical approximation is normal hyperbolicity, i.e., that the ‘matrix’

$$\begin{aligned} \left. \left( \frac{\partial h_j}{\partial m_l}\right) \right| _{\mathcal {C}_0},\qquad j,l\in \{\kappa +1,\kappa +2,\ldots \} \end{aligned}$$

has no eigenvalues with zero real part; in fact, this matrix is just the derivative of the fast equations with respect to the fast variables, restricted to points on \(\mathcal {C}_0\), but for moment equations it is usually infinite-dimensional. Even if we assume in addition that \(\mathcal {C}_0\) is normally hyperbolic, which is a very strong and non-generic assumption for a fast-slow system [34, 35], the dynamics given via the map H is only the lowest-order approximation. The correct full dynamics is given on a slow manifold

$$\begin{aligned} \mathcal {C}_\varepsilon =\{(m_{\kappa +1},m_{\kappa +2},\ldots )=H(m_1,m_2,\ldots ,m_\kappa )+ \mathcal {O}(\varepsilon )\} \end{aligned}$$
(13.39)

so H is only correct up to order \(\mathcal {O}(\varepsilon )\). This novel viewpoint on moment closure shows why it is probably quite difficult [38] to answer the approximation question (Q2) since for a general nonlinear system, the moment equations will only admit a closure via an explicit formula locally in the phase space of moments. One has to be very lucky, and probably make very effective use of special structures [39, 40] in the dynamical system, to obtain any global closure. Local closures are also an interesting direction to pursue [41].
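
The geometry is already visible in a two-dimensional caricature (a toy system of our own construction, not derived from a particular model): a slow moment \(m_1\) is coupled to a fast moment \(m_2\) that relaxes at rate \(1/\varepsilon \) towards the normally hyperbolic critical manifold \(m_2=m_1^2\). The sketch below compares the full dynamics with its quasi-steady-state reduction:

```python
import numpy as np
from scipy.integrate import solve_ivp

eps = 1e-2                                # time-scale separation

def full(t, m):
    m1, m2 = m
    return [m2 - 2 * m1,                  # slow equation
            (m1**2 - m2) / eps]           # fast relaxation to m2 = m1^2

def reduced(t, m):
    return [m[0]**2 - 2 * m[0]]           # QSSA closure H(m1) = m1^2

t_eval = np.linspace(0.0, 3.0, 7)
sol_f = solve_ivp(full, (0, 3), [1.0, 3.0], t_eval=t_eval,
                  method="Radau", rtol=1e-9, atol=1e-12)
sol_r = solve_ivp(reduced, (0, 3), [1.0], t_eval=t_eval,
                  rtol=1e-9, atol=1e-12)
for t, mf, mr in zip(t_eval, sol_f.y[0], sol_r.y[0]):
    print(f"t = {t:.1f}:  full m1 = {mf:.5f},  reduced m1 = {mr:.5f}")
```

After the initial fast transient, the two trajectories agree up to \(\mathcal {O}(\varepsilon )\), in line with (13.39); normal hyperbolicity of the critical manifold (the fast linearization is \(-1/\varepsilon \)) is exactly what makes the closure work here.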

4 Applications and Further References

Historically, applications of moment closure can be traced back at least to the classical Kirkwood closure [42] as well as to statistical physics applications, e.g., in the Ising model [43]. The Gaussian (or normal) closure has a long history as well [44]. In mechanical applications and related nonlinear vibrations questions, stochastic mechanics models have been among the first areas where moment closure techniques for stochastic processes became standard tools [45, 46], including the idea to simply discard higher-order moments [47]. By now, moment closure methods have permeated practically all natural sciences, as evidenced by the classical books [48, 49]. For SDEs, moment closure methods have not been used as intensively as one might guess, but see [50].

For kinetic theory, closure methods also have a long history, particularly starting from the famous Grad 13-moment closure [51, 52], and moment methods have become fundamental tools in gas dynamics [53]. One particularly important application for kinetic-theory moment methods is the modelling of plasmas [54, 55]. In general, it is quite difficult to study the resulting kinetic moment equations analytically [56, 57], but many numerical approaches exist [58–61]. Of course, the maximum entropy closure we have discussed is not restricted to kinetic theory [62], and maximum entropy principles appear in many contexts [63–67].

One area where moment closure methods have recently been employed intensively is mathematical biology. For example, the pair approximation [12] and its variants [68] are frequently used in various models including lattice models [69–74], homogeneous networks [75, 76] and many other network models [77–80]. Several closures have also included higher-order moments [81, 82], and truncation ideas are still used [83–85]. Applications to various different setups for epidemic spreading are myriad [85, 86]. A typical benchmark problem for moment methods in biology is the stochastic logistic equation [87–93]. Furthermore, spatial models in epidemiology and ecology have been a focus [94–97]. Several survey and comparison papers focusing on epidemic applications and closure methods are available [13, 98–100]. There is also a link from mathematical biology and moment closure to transport and kinetic equations [101, 102], e.g., in applications to cell motion [103]. Physical constraints, as we have discussed for abstract kinetic equations, also play a key role in biology, e.g., in trying to guarantee non-negativity [86].

Another direction is network dynamics [104]; a setting where moment closure methods have been used very effectively are adaptive, or co-evolutionary, networks with dynamics of and on the network [30, 105]. Moment equations are one reason why one may hope to describe the self-organization of adaptive networks [106] by low-dimensional dynamical systems models [107]. Applications include opinion formation [108, 109] with a focus on the classical voter model [110–112]; see [113] for a review of closure methods applied to the voter model. Other applications are found again in epidemiology [114–120] and in game theory [121–123]. The maximum entropy closure we introduced for kinetic equations has also been applied in the context of complex networks [124] and spatial network models in biology [125]. An overview of the use of the pair approximation, several models, and the relation to master equations can be found in [126]. It has also been shown that in many cases low-order or mean-field closures can still be quite effective [127].

On the level of moment equations in network science, one has to distinguish between purely moment- or motif-based choices of the space \(\mathbb {M}\) and the recent proposal to use heterogeneous degree-based moments. For example, instead of just tracking the moment of a node density, one also characterizes the degree distribution [128] of the node via new moment variables [129]. Various applications of heterogeneous moment equations have been investigated [130, 131].

Other important applications are stochastic reaction networks [132–134], where the mean-field reaction-rate equations are not accurate enough [135]. A detailed computation of moment equations from the master equation of reaction-rate models is given in [136]. In a related area, turbulent combustion models are investigated using moment closure [137–141]. For turbulent combustion, one frequently considers so-called conditional moment closures, where one either conditions upon the flow being turbulent or restricts moments to certain parts of phase space; see [142] for a very detailed review.

Further applications we have not focused on here can be found in genetics [143], client-server models in computer science [144, 145], mathematical finance [146], systems biology [147], estimating transport coefficients [148], neutron transport [149], and radiative transport problems [150, 151]. We have also not focused on certain methods to derive moment equations including moment-generating functions [152–154], Lie-algebraic methods [155], and factorial moment expansions [156].

In summary, it is clear that many different areas are actively using moment closure methods and that a cross-disciplinary approach could yield new insights on the validity regimes of various methods. Furthermore, it is important to emphasize again that only a relatively small snapshot of the current literature has been given in this review and a detailed account of all applications of moment closure methods would probably fill many books.