1 Introduction

An often ignored principle of Bayesian inference is that inference should make use of all the relevant information available (Jaynes 2003). In the context of data assimilation, especially for physical systems, one often has access to hierarchies of multiple models, each one more accurate than its predecessor in the hierarchy; higher resolution models can be obtained by simply refining the simulation grid or step-size, or by more accurately capturing the underlying physical phenomena. In addition, one frequently has access to observations of the same physical variable through different types of sensors. This principle asks us not to discard such information, but to incorporate it whenever possible.

Multifidelity data assimilation refers to methods that merge information about the same underlying natural truth obtained through the use of multiple models or observation operators at different levels of fidelity. For a survey of general multifidelity methods and types of multifidelity models see Peherstorfer et al. (2018).

First introduced by Giles (2008) and later more formally defined in Giles (2015), the ‘multilevel’ Monte Carlo method describes the optimal linear coupling between different ‘levels’ of simulation. Here we aim to generalize the notion of levels to that of fidelities, and to introduce a formal notion of structure both in the relation between fidelities and in the way the information contained therein is optimally combined.

This chapter is organized as follows. The rest of the introduction is concerned with describing the data assimilation problem, and the idea of multifidelity models. Control variate theory is introduced in Sect. 2. The problem of multifidelity inference is introduced in Sect. 3, with the multifidelity Kalman filter discussed in Sect. 3.1 and the multifidelity ensemble Kalman filter in Sect. 3.2. We then introduce the concept of multifidelity observations in Sect. 4. A trivial numerical example with the Lorenz ’96 system is shown in Sect. 5. We conclude with some remarks in Sect. 6.

1.1 Notation

Consider a random variable \(\chi \). The distribution of \(\chi \) is denoted by \(\pi _\chi \), and an ensemble representing N samples from the distribution by \(\mathbf {E}_\chi = [\mathbf {\chi }^{(1)},\dots , \mathbf {\chi }^{(N)}]\). We denote the (exact) mean by \(\mathbf {\mu }_{\chi }\), and the empirical sample mean by \(\widetilde{\mathbf {\mu }}_{\chi }\). The covariance between random variables \(\chi \) and \(\upsilon \) is denoted by \(\mathbf {\Sigma }_{\chi ,\upsilon }\), and the empirical sample covariance by \(\widetilde{\mathbf {\Sigma }}_{\chi ,\upsilon }\).

1.2 The Data Assimilation Problem

We seek to model the state X of a dynamical system with an imperfect model,

$$\begin{aligned} X_i = \mathcal {M}_i (X_{i-1}) + \Xi _i, \end{aligned}$$
(1)

where the model errors at different times \(\Xi _i\) are independent of each other. We assume the errors have mean zero, \(\mathbf {\mu }_{\Xi _i}=0\), and covariances \(\mathbf {\Sigma }_{\Xi _i,\Xi _i}\).

Observations of the true state \(X^t_i\) are available at discrete time moments i:

$$\begin{aligned} Y_i = \mathcal {H}_i (X^t_i) + \eta _i, \end{aligned}$$
(2)

where we again assume that the observation errors \(\eta _i\) at different times are independent of each other, have mean zero, \(\mathbf {\mu }_{\eta _i}=0\), and covariances \(\mathbf {\Sigma }_{Y_i,Y_i}\).

Given prior information \(X^b_i\) about the state at time i, and noisy observations of the truth, the filtering problem consists of sequentially computing the posterior \(X^a_i\), in some (usually Bayesian) inference sense.

Explicitly, the Bayesian formulation (Reich and Cotter 2015) aims to find

$$\begin{aligned} \pi (X^a_i) = \pi (X^b_i | Y_i) = \frac{\pi (Y_i | X^b_i) \pi (X^b_i)}{\pi (Y_i)}, \end{aligned}$$
(3)

in some approximate way, as the problem, more often than not, is computationally intractable.

1.3 Multifidelity Models

The focus of multifidelity data assimilation is to compute the inference (3) not with a single model (1), but by leveraging a hierarchy of models at different fidelities. Consider two levels of fidelity, and assume that our high fidelity state variable is X and our coarse fidelity state variable is U. The two models that propagate these quantities in time are:

$$\begin{aligned} X_i = \mathcal {M}^X_i (X_{i-1}),\end{aligned}$$
(4)
$$\begin{aligned} U_i = \mathcal {M}^U_i (U_{i-1}). \end{aligned}$$
(5)

The goal of multifidelity data assimilation is to make use of these different models to incorporate as much information as possible.

An important aspect of multifidelity models, which is a generalization of multilevel hierarchies, is that the state spaces of the different models do not necessarily have to be the same. In fact, we will assume that the fine fidelity model state can be embedded into \(\mathbb {R}^n\) and that the coarse fidelity model state can be embedded into \(\mathbb {R}^r\), where typically \(r < n\), though this is not necessarily the case.

Alternatively, we can think of the word ‘model’ as describing any operator through which information about the system is acquired; in a data assimilation context this will often be the observation operator. Assume that there exist two ways of obtaining observations (2) of the same fundamental phenomenon, one defined by a fine fidelity operator \(\mathcal {H}^\chi \), and the other defined by a coarse fidelity observation operator, \(\mathcal {H}^\upsilon \), such that,

$$\begin{aligned} Y^\chi = \mathcal {H}^\chi (\mathbf {X}^t) + \eta ^\chi ,\end{aligned}$$
(6)
$$\begin{aligned} Y^\upsilon = \mathcal {H}^\upsilon (\mathbf {X}^t) + \eta ^\upsilon , \end{aligned}$$
(7)

in which case the goal shifts to combining and utilizing the observations in some optimal way, without loss of information, but also without duplication of information.

2 Control Variates

The linear control variate technique (Rubinstein and Marcus 1985) is a method for reducing the variance of an estimator by making use of highly correlated data about which additional information is known. Assume our quantity of interest is described by the distribution of the principal variate \(\chi \). The distribution of the highly correlated control variate \(\hat{\upsilon }\) describes information in an alternate way (such as in a different space), and the distribution of an independent (or, more weakly, uncorrelated) ancillary variate \(\upsilon \) describes information related to that of the control variate and shares the same mean, \(\mathbf {\mu }_{\hat{\upsilon }}=\mathbf {\mu }_{{\upsilon }}\). The linear control variate approach builds a total variate \(\zeta \)

$$\begin{aligned} \zeta = \chi - \mathbf {S}(\hat{\upsilon } - \upsilon ), \end{aligned}$$
(8)

where the free parameter \(\mathbf {S}\), known as the gain operator, is chosen to minimize the generalized variance of \(\zeta \). The three variates that make up the total variate will be collectively called the constituent variates.

Theorem 1

(Unbiased nature of linear control variates) Without proof, the mean of the total variate equals the mean of the principal variate,

$$\begin{aligned} \mathbf {\mu }_{\zeta }=\mathbf {\mu }_{\chi }. \end{aligned}$$
(9)

Theorem 2

(Optimal gain for linear control variates) The optimal gain matrix \(\mathbf {S}\) that minimizes the trace of the covariance of (8) is

$$\begin{aligned} \mathbf {S} = \mathbf {\Sigma }_{\chi ,\hat{\upsilon }}{\left( \mathbf {\Sigma }_{\hat{\upsilon },\hat{\upsilon }} + \mathbf {\Sigma }_{\upsilon ,\upsilon }\right) }^{-1}. \end{aligned}$$
(10)

Proof

Observe that the covariance of (8) is

$$\begin{aligned} \mathbf {\Sigma }_{\zeta ,\zeta } = \mathbf {\Sigma }_{\chi ,\chi } - \mathbf {S}\mathbf {\Sigma }_{\hat{\upsilon },\chi } - \mathbf {\Sigma }_{\chi ,\hat{\upsilon }}\mathbf {S}^T + \mathbf {S}\mathbf {\Sigma }_{\hat{\upsilon },\hat{\upsilon }}\mathbf {S}^T + \mathbf {S}\mathbf {\Sigma }_{\upsilon ,\upsilon }\mathbf {S}^T. \end{aligned}$$
(11)

Taking the derivative with respect to \(\mathbf {S}\) of the trace of (11),

$$\begin{aligned} \frac{\partial }{\partial \mathbf {S}}\text {tr}\left( \mathbf {\Sigma }_{\zeta ,\zeta }\right) = -2\mathbf {\Sigma }_{\chi ,\hat{\upsilon }} + 2\mathbf {S}\left( \mathbf {\Sigma }_{\hat{\upsilon },\hat{\upsilon }} + \mathbf {\Sigma }_{\upsilon ,\upsilon }\right) , \end{aligned}$$
(12)

the local minimum is found at (10), as required.

Corollary 1

By simple manipulation, the covariance of the total variate under the optimal gain from Theorem 2 is:

$$\begin{aligned} \mathbf {\Sigma }_{\zeta ,\zeta } = \mathbf {\Sigma }_{\chi ,\chi } - \mathbf {\Sigma }_{\chi ,\hat{\upsilon }}{\left( \mathbf {\Sigma }_{\hat{\upsilon },\hat{\upsilon }} + \mathbf {\Sigma }_{\upsilon ,\upsilon }\right) }^{-1}\mathbf {\Sigma }_{\hat{\upsilon },\chi }, \end{aligned}$$
(13)

from which it is clear that \(\mathbf {\Sigma }_{\zeta ,\zeta } \le \mathbf {\Sigma }_{\chi ,\chi }\) in the symmetric semi-positive definite sense.
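As an illustration, and not part of the original development, the following minimal NumPy sketch verifies Theorem 2 and Corollary 1 numerically. The dimensions, the linear coupling matrix Theta, and the choice of ancillary covariance are arbitrary assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, N = 5, 3, 200_000

Sigma_chi = np.diag([4.0, 3.0, 2.0, 1.0, 0.5])   # covariance of the principal variate (assumed)
Theta = rng.standard_normal((r, n))              # assumed linear coupling: uhat = Theta @ chi

# Exact covariances of the constituent variates.
Sigma_chi_uhat = Sigma_chi @ Theta.T
Sigma_uhat = Theta @ Sigma_chi @ Theta.T
Sigma_ups = Sigma_uhat.copy()                    # ancillary variate: same mean and (here) same covariance

# Optimal gain (10) and the predicted total-variate covariance (13).
S = Sigma_chi_uhat @ np.linalg.inv(Sigma_uhat + Sigma_ups)
Sigma_zeta = Sigma_chi - S @ Sigma_chi_uhat.T

# Monte Carlo check of the variance reduction of the total variate (8).
chi = rng.multivariate_normal(np.zeros(n), Sigma_chi, size=N)
uhat = chi @ Theta.T
ups = rng.multivariate_normal(np.zeros(r), Sigma_ups, size=N)  # independent of chi
zeta = chi - (uhat - ups) @ S.T

print("tr cov(chi) :", np.trace(Sigma_chi))
print("tr cov(zeta):", np.trace(np.cov(zeta.T)), "predicted:", np.trace(Sigma_zeta))
```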

The linear control variate technique can be derived in a parallel but completely alternate way. Taking the principal, control, and ancillary variates as Gaussian random variables, the total variate is the solution to the Bayesian inference problem,

$$\begin{aligned} \pi (\zeta ) = \pi (\chi | \hat{\upsilon }, \upsilon ). \end{aligned}$$
(14)

which is a well-known result due to Kalman (1960).

Following Nelson (1987) we now attempt to introduce the idea of non-linear control variates. Instead of searching for function approximations that follow a set of rules, we will view the problem of finding the total variate \(\zeta \) in terms of the principal, control, and ancillary variates as an inference problem, generalizing (14) to arbitrarily distributed variables. Specifically, we seek to cast the general inference problem (and specific approximations thereof) into an application of some problem-specific transform,

$$\begin{aligned} \zeta = \mathcal {T}(\chi , \hat{\upsilon }, \upsilon ), \end{aligned}$$
(15)

where the function \(\mathcal {T}\) represents a distribution transformation on the principal variate, built by making use of the information given by the control and ancillary variates.

For the remainder of this chapter we will assume that the control variate is related to the principal variate through a deterministic function (coupling),

$$\begin{aligned} \hat{\upsilon } = \theta (\chi ), \end{aligned}$$
(16)

which implies that there is necessarily some loss of information from the space of the total and principal variates to the space of the control and ancillary variates.

An important generalization of the control variate concept is its ability to be applied in a nested form. This means that the total variate \(\zeta \) can itself be an ancillary variate for a finer fidelity principal variate. Assume that we have \(\mathcal {F}\) fidelities, with \(\upsilon _{\mathcal {F}}\) being the coarsest fidelity ancillary variate. Its corresponding control variate is \(\hat{\upsilon }_{\mathcal {F}}\), and its principal and total variates are on level \(\mathcal {F} - 1\): \(\chi _{\mathcal {F} - 1}\) and \(\zeta _{\mathcal {F} - 1}\). The total variate is then also the ancillary variate for the next set,

$$\begin{aligned} \upsilon _{\mathcal {F} - 1} \xleftarrow {} \zeta _{\mathcal {F} - 1}, \end{aligned}$$
(17)

which can be generalized all the way up the chain, until we reach the constituent variates \(\chi _1\), \(\hat{\upsilon }_2\), and \(\upsilon _2\), that represent the full information content through the total variate \(\zeta _1\). Explicitly, from Popov et al. (2020), the total variate for \(\mathcal {F}\) fidelities and the corresponding optimal gain matrices can be written as,

$$\begin{aligned} \zeta = \chi - \sum _{f=1}^{\mathcal {F}} \overline{\,\mathbf {S}}_f\,\left( \hat{\upsilon }_{f} - \upsilon _{f}\right) , \quad \overline{\,\mathbf {S}}_f = \prod _{\psi =1}^{f}\,\mathbf {S}_{\psi }. \end{aligned}$$
(18)
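As a small illustrative sketch (with hypothetical inputs, not from the text), the telescoping combination (18) can be evaluated by accumulating the cumulative gains as a running matrix product while the paired control and ancillary samples are subtracted fidelity by fidelity:

```python
import numpy as np

def total_variate(chi, gains, uhats, upss):
    """Evaluate (18): chi is the finest-fidelity sample, and gains[f],
    uhats[f], upss[f] (for f = 0, ..., F-1, a zero-based analogue of
    f = 1, ..., F) hold the per-fidelity gains and paired samples."""
    zeta = chi.copy()
    Sbar = np.eye(chi.size)                 # running product of gains
    for S_f, uhat_f, ups_f in zip(gains, uhats, upss):
        Sbar = Sbar @ S_f                   # Sbar_f = S_1 S_2 ... S_f
        zeta = zeta - Sbar @ (uhat_f - ups_f)
    return zeta
```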

2.1 Ensemble Control Variates

Instead of employing the exact distribution of a random variable, which is usually intractable, an ensemble of samples is typically used.

We will discuss the ways in which ensemble multifidelity inference is performed in a later section. Here we concern ourselves with the problem of finding an ensemble representation of the total variate (8) given an ensemble of \(N_\chi \) samples of the principal variate \(\mathbf {E}_\chi = [\mathbf {\chi }^{(1)},\dots \mathbf {\chi }^{(N_\chi )}]\) and corresponding pairwise samples of the control variate \(\mathbf {E}_{\hat{\upsilon }} = [\hat{\mathbf {\upsilon }}^{(1)},\dots \hat{\mathbf {\upsilon }}^{(N_\chi )}]\). We seek to find an ensemble of the total variate \(\mathbf {E}_\zeta = [\mathbf {\zeta }^{(1)},\dots \mathbf {\zeta }^{(N_\chi )}]\).

We will define the ensemble means as

$$\begin{aligned} \widetilde{\mathbf {\mu }}_\chi = \frac{1}{N_\chi }\mathbf {E}_\chi \mathbf {1}_{N_\chi } \in \mathbb {R}^{n},\quad \widetilde{\mathbf {\mu }}_{\hat{\upsilon }} = \frac{1}{N_\chi }\mathbf {E}_{\hat{\upsilon }}\mathbf {1}_{N_\chi } \in \mathbb {R}^{r}, \end{aligned}$$
(19)

and the anomalies as,

$$\begin{aligned} \mathbf {A}_\chi = \mathbf {E}_\chi - \widetilde{\mathbf {\mu }}_\chi \mathbf {1}_{N_\chi }^T \in \mathbb {R}^{n\times N_\chi },\quad \mathbf {A}_{\hat{\upsilon }} = \mathbf {E}_{\hat{\upsilon }} - \widetilde{\mathbf {\mu }}_{\hat{\upsilon }} \mathbf {1}_{N_\chi }^T \in \mathbb {R}^{r\times N_\chi }. \end{aligned}$$
(20)

Assume that we are given either the mean and covariance of the ancillary variate (\(\mathbf {\mu }_\upsilon \) and \(\mathbf {\Sigma }_{\upsilon ,\upsilon }\)), or that we are able to derive empirical approximations \(\widetilde{\mathbf {\mu }}_\upsilon \) and \(\widetilde{\mathbf {\Sigma }}_{\upsilon ,\upsilon }\) from an ensemble of \(N_\upsilon \) samples of \(\upsilon \), \(\mathbf {E}_\upsilon \). In the first approach we utilize the linear control variate framework (8).

There are numerous ways in which to derive the ensemble of the total variate. One way is to create a synthetic ensemble of \(N_\chi \) samples of the ancillary variate sampled from its known distribution. Denote this ensemble \(\widetilde{\mathbf {E}}_\upsilon \). Under the linear control variate approach,

$$\begin{aligned} \mathbf {E}_\zeta = \mathbf {E}_\chi - \mathbf {S}(\mathbf {E}_{\hat{\upsilon }} - \widetilde{\mathbf {E}}_\upsilon ), \end{aligned}$$
(21)

where the optimal gain is approximated by

$$\begin{aligned} \widetilde{\mathbf {S}} = \widetilde{\mathbf {\Sigma }}_{\chi ,\hat{\upsilon }}{\left( \widetilde{\mathbf {\Sigma }}_{\hat{\upsilon },\hat{\upsilon }} + \mathbf {\Sigma }_{\upsilon ,\upsilon }\right) }^{-1}. \end{aligned}$$
(22)

The astute reader will recognize this as the ‘perturbed observations’ ensemble Kalman filter (Houtekamer and Mitchell 1998).
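The following is a minimal sketch of the synthetic-ensemble construction (19)–(22). Ensembles are stored column-wise, a Gaussian ancillary variate with known moments is assumed, and the function and argument names are illustrative rather than taken from the text.

```python
import numpy as np

def ensemble_total_variate(E_chi, E_uhat, mu_ups, Sigma_ups, rng):
    n, N = E_chi.shape
    A_chi = E_chi - E_chi.mean(axis=1, keepdims=True)      # anomalies (20)
    A_uhat = E_uhat - E_uhat.mean(axis=1, keepdims=True)
    C_chi_uhat = A_chi @ A_uhat.T / (N - 1)                # empirical (cross-)covariances
    C_uhat = A_uhat @ A_uhat.T / (N - 1)
    S = C_chi_uhat @ np.linalg.inv(C_uhat + Sigma_ups)     # approximate optimal gain (22)
    # Synthetic ancillary ensemble sampled from the known ancillary distribution.
    E_ups = rng.multivariate_normal(mu_ups, Sigma_ups, size=N).T
    return E_chi - S @ (E_uhat - E_ups)                    # ensemble total variate (21)
```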

An alternate formulation that does away with the synthetic ensemble assumes a Gaussian prior on the ancillary variate, and uses the optimal empirical gain,

$$\begin{aligned} \mathbf {E}_\zeta= & {} \widetilde{\mathbf {\mu }}_\chi \mathbf {1}_{N_\chi }^T + \mathbf {A}_\chi \mathbf {T} \left( \mathbf {I}_{N_\chi } - \frac{1}{N_\chi - 1} \mathbf {T}^T \mathbf {A}_{\hat{\upsilon }}^T \mathbf {\Sigma }_{\upsilon ,\upsilon }^{-1}(\widetilde{\mathbf {\mu }}_{\hat{\upsilon }} - \mathbf {\mu }_\upsilon )\mathbf {1}_{N_\chi }^T\right) ,\end{aligned}$$
(23)
$$\begin{aligned} \mathbf {T}= & {} {\left( \mathbf {I}_{N_\chi } - \frac{1}{N_\chi - 1}\mathbf {A}_{\hat{\upsilon }}^T{\left( \widetilde{\mathbf {\Sigma }}_{\hat{\upsilon },\hat{\upsilon }} + \mathbf {\Sigma }_{\upsilon ,\upsilon }\right) }^{-1} \mathbf {A}_{\hat{\upsilon }}\right) }^{\frac{1}{2}}, \end{aligned}$$
(24)

which the astute reader will recognize as the ensemble transform Kalman filter (ETKF)  (Bishop et al. 2001).

Note that if it is not possible to represent the covariance of the ancillary variate exactly, then one needs to compute the optimal gain in alternate ways.

Another interesting approach to ensemble inference is the importance sampling optimal transport procedure (Reich 2013). In essence, one constructs the posterior mean from the importance sampling procedure,

$$\begin{aligned} \mathbf {\mu }_\zeta = \mathbf {E}_\chi \mathbf {w},\end{aligned}$$
(25)
$$\begin{aligned} \left[ \mathbf {w}\right] _i \propto \pi _\upsilon (\hat{\mathbf {\upsilon }}^{(i)}). \end{aligned}$$
(26)

The ansatz is made that the optimal transportation into an equally weighted ensemble with the same mean defines an ensemble with the same empirical moments as those defined by the importance sampling weights,

$$\begin{aligned} \mathbf {E}_\zeta = \mathbf {E}_\chi \mathbf {T}^*, \end{aligned}$$
(27)

where the optimal transport is defined in the Monge-Kantorovich sense,

$$\begin{aligned} \begin{aligned} \mathbf {T}^*&= \text {arg}\,\text {min}_{\mathbf {T}} \sum _{1\le i, j\le N_\chi } [\mathbf {T}]_{ij}{\left\Vert \mathbf {\chi }^{(i)}-\mathbf {\chi }^{(j)}\right\Vert }_2^2 \\&\quad \text {subject to: } \mathbf {T}\mathbf {1}_{N_\chi } = N_\chi \mathbf {w},\, \mathbf {T}^T\mathbf {1}_{N_\chi } = \mathbf {1}_{N_\chi },\, [\mathbf {T}]_{ij} \ge 0, \end{aligned} \end{aligned}$$
(28)

which ensures that the weights of the new posterior ensemble are equal.

Second order accurate (preserving the weighted ensemble covariance) extensions to this formulation exist (Acevedo et al. 2017) and should be used if this methodology is to be attempted operationally.
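For concreteness, the sketch below realizes the importance-sampling/optimal-transport construction (25)–(28) with a generic linear programming solver. A Gaussian ancillary density and a small ensemble are assumed purely for illustration; all names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import multivariate_normal

def transport_resample(E_chi, E_uhat, mu_ups, Sigma_ups):
    n, N = E_chi.shape
    # Importance weights (26): ancillary density evaluated at the control samples.
    w = multivariate_normal(mu_ups, Sigma_ups).pdf(E_uhat.T)
    w = w / w.sum()
    # Pairwise squared distances between principal members (the cost in (28)).
    d2 = ((E_chi[:, :, None] - E_chi[:, None, :]) ** 2).sum(axis=0)
    # Equality constraints of (28): T 1 = N w (row sums), T^T 1 = 1 (column sums).
    A_rows = np.kron(np.eye(N), np.ones(N))
    A_cols = np.kron(np.ones(N), np.eye(N))
    res = linprog(d2.ravel(),
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([N * w, np.ones(N)]),
                  bounds=(0, None))
    T = res.x.reshape(N, N)
    return E_chi @ T     # equally weighted posterior ensemble (27), with mean (25)
```

For larger ensembles, a dedicated (possibly entropy-regularized) optimal transport solver would be preferable to a generic linear program.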

3 Multifidelity Filtering

For ease of exposition we primarily focus on the case of two fidelities; multifidelity extensions will be described separately.

Assume now that the state of a dynamical system is our quantity of interest, and that there are two different fidelities in which we can represent it: fine and coarse. Let the distribution of the principal variate \(X^b\) represent the prior information about the state at fine fidelity. Let \(\hat{U}^b\) be its corresponding control variate, and \(U^b\) be the ancillary variate, the distributions of which describe information about the state at coarse fidelity.

Assume that the prior total variate \(Z^b\) represents the general posterior of the multifidelity inference procedure (14). Note that it is possible, with some abuse of notation, to represent the inference as an application of some nonlinear function,

$$\begin{aligned} Z^b = \mathcal {C}(X^b, \hat{U}^b, U^b), \end{aligned}$$
(29)

with the function \(\mathcal {C}\) defining an implicit assumption about the relationship between the four variates, such as the linear control variate assumption (8) or an optimal transport based assumption (27) if our variates are represented by ensembles. In the most general sense, \(\mathcal {C}\) can represent some non-linear variance reduction technique that is informed by the distributions of the constituent variates (Nelson 1987).

The prior total variate is \(Z^b\) and the posterior total variate is \(Z^a\), defined by the same function applied to its component variates:

$$\begin{aligned} Z^a = \mathcal {C}(X^a, \hat{U}^a, U^a). \end{aligned}$$
(30)

The inference step from the prior total to the posterior total variates is a filtering step which explicitly combines information,

$$\begin{aligned} Z^a = \mathcal {F}(Z^b, \mathcal {H}(Z^b), Y), \end{aligned}$$
(31)

with the function \(\mathcal {F}\) standing in for some filter, such as the Kalman filter.

The principal variate can be propagated by some constituent filter,

$$\begin{aligned} X^a = \widetilde{\mathcal {F}}^{Z}_X(X^b, \mathcal {H}(X^b), Y), \end{aligned}$$
(32)

which is dependent on \(Z^b\), \(Z^a\) and the filter \(\mathcal {F}\) that is implicitly applied between them. Similar formulations can be made for the other constituent variates.

Note that the goal of one step of a multifidelity filter is not to find the posterior total variate \(Z^a\), but rather to find posteriors of its constituent variates, \(X^a\), \(\hat{U}^a\), and \(U^a\). In fact, as the total variate is merely a synthetic construction, the multifidelity inference reduces to performing virtual inference on the total variate by manipulating the principal, control, and ancillary variates. In this way the explicit filtering of the total variate (31) is not performed, but only the constituent filtering problems (32) are explicitly solved.

While the general problem of finding the analysis principal variate \(X^a\) given only the analysis total variate \(Z^a\) is not well posed, the combined problem of finding the distributions of \(X^a\), \(\hat{U}^a\), and \(U^a\) may be posed in terms of a minimum cross entropy problem:

$$\begin{aligned} X^a, \hat{U}^a, U^a = \text {arg}\,\text {min}_{X^a, \hat{U}^a, U^a} H(X^b, X^a) + H(\hat{U}^b, \hat{U}^a) + H(U^b, U^a), \end{aligned}$$
(33)

subject to the constraints,

$$\begin{aligned} Z^a = \mathcal {C}(X^a, \hat{U}^a, U^a),&\quad Z^a = \mathcal {F}(Z^b, \mathcal {H}(Z^b), Y),\end{aligned}$$
(34)
$$\begin{aligned} \mathbb {E}[\hat{U}^a] = \mathbb {E}[U^a],&\quad \hat{U}^a = \theta (X^a), \end{aligned}$$
(35)

from which the constituent filters (32) are implicitly defined.

A powerful assumption that can be made is that the same control structure imposed on the prior is also imposed on the posterior. We call this the ‘control structure consistency assumption’. One way in which this holds in the linear control variate approach is:

$$\begin{aligned} Z^b&= X^b - \mathbf {S}^b(\hat{U}^b - U^b),\end{aligned}$$
(36)
$$\begin{aligned} Z^a&= X^a - \mathbf {S}^a(\hat{U}^a - U^a), \end{aligned}$$
(37)

where we impose the assumption that the (approximately) optimal prior and posterior gains are equivalent,

$$\begin{aligned} \mathbf {S}^b = \mathbf {S}^a, \end{aligned}$$
(38)

meaning that we restrict all possible posterior constituent variates to ones that obey the same structure as their prior counterparts. One way in which this is achieved is by assuming a particular structure on the relationship between the principal and control variate (16) from Popov et al. (2020).

3.1 Multifidelity Kalman Filter

We now introduce the multifidelity Kalman filter (MFKF), fleshed out from Popov et al. (2020). As Gaussian random variables can be trivially combined through known formulas involving their means and covariances, the MFKF is not an algorithm that is needed for practical implementation; rather, it serves to explain the derivation of practical extensions thereof.

We restrict ourselves to a linear principal-control variate coupling (16),

$$\begin{aligned} \hat{U}_i = \theta (X_i) = \boldsymbol{\Theta } X_i, \end{aligned}$$
(39)

with \(\boldsymbol{\Theta }\) a projection operator from the n-dimensional space of the principal variate onto the r-dimensional space of the control variate. The corresponding interpolating operator is denoted \(\mathbf {\Phi }\) (such that \(\boldsymbol{\Theta }\,\boldsymbol{\Phi } = \mathbf {I}_r\)). We decompose the principal variate into its control variate and residual variate components:

$$\begin{aligned} X_i = \boldsymbol{\Phi }\, \hat{U}_i + \Delta X_i. \end{aligned}$$
(40)

Additionally, as is canonical, we restrict ourselves to the case of a linear observation operator \(\mathbf {H}_i\).

For the rest of this chapter we assume that we seek to propagate the total variate,

$$\begin{aligned} Z_i = X_i - \mathbf {S}_i\,(\hat{U}_i - U_i), \end{aligned}$$
(41)

through both a dynamical model (forecast step), and through the analysis step conditioned by observations.

We express the moments of the total variate in terms of the moments of the corresponding constituents:

$$\begin{aligned} \mathbf {\mu }_{Z_i}= & {} \mathbf {\mu }_{X_i} - \mathbf {S}_i(\mathbf {\mu }_{\hat{U}_i} - \mathbf {\mu }_{U_i}),\end{aligned}$$
(42)
$$\begin{aligned} \mathbf {\Sigma }_{Z_i, Z_i}= & {} \mathbf {\Sigma }_{X_i, X_i} + \mathbf {S}_i\mathbf {\Sigma }_{\hat{U}_i, \hat{U}_i}\mathbf {S}_i^T + \mathbf {S}_i\mathbf {\Sigma }_{U_i, U_i}\mathbf {S}_i^T - \mathbf {S}_i\mathbf {\Sigma }_{\hat{U}_i, X_i} - \mathbf {\Sigma }_{X_i, \hat{U}_i}\mathbf {S}_i^T. \end{aligned}$$
(43)

We are now ready to look at the MFKF. For the forecast step, assume that we have a linear fine fidelity model \(\mathbf {M}^X_i\), and a linear coarse fidelity model \(\mathbf {M}^U_i\). Assume that the error \(\Xi _i\) of the fine fidelity model is unbiased and is known to have covariance \(\mathbf {\Sigma }_{\Xi _i, \Xi _i}\). Assume additionally that the coarse fidelity model has no error in the coarse subspace in relation to the truth. This could be because the coarse fidelity model was built to capture this error through data driven closures.

Assume that we have the posterior information at the previous step \(i-1\) about the principal, control, and ancillary variates, and that the relation between the principal and control variate (39) holds. We propagate the means as follows:

$$\begin{aligned} \mathbf {\mu }_{X^b_i}= & {} \mathbf {M}^X_i\mathbf {\mu }_{X^a_{i-1}},\end{aligned}$$
(44)
$$\begin{aligned} \mathbf {\mu }_{\hat{U}^b_i}= & {} \mathbf {M}^U_i\mathbf {\mu }_{\hat{U}^a_{i-1}},\end{aligned}$$
(45)
$$\begin{aligned} \mathbf {\mu }_{U^b_i}= & {} \mathbf {M}^U_i\mathbf {\mu }_{U^a_{i-1}}, \end{aligned}$$
(46)

with the covariances propagated as,

$$\begin{aligned} \mathbf {\Sigma }_{X^b_i,X^b_i}= & {} \mathbf {M}^X_i\mathbf {\Sigma }_{X^a_{i-1},X^a_{i-1}}\mathbf {M}^{X, T}_i + \mathbf {\Sigma }_{\Xi _i, \Xi _i},\end{aligned}$$
(47)
$$\begin{aligned} \mathbf {\Sigma }_{X^b_i,\hat{U}^b_i}= & {} \mathbf {M}^X_i\mathbf {\Sigma }_{X^a_{i-1},\hat{U}^a_{i-1}}\mathbf {M}^{U, T}_i,\end{aligned}$$
(48)
$$\begin{aligned} \mathbf {\Sigma }_{\hat{U}^b_i,\hat{U}^b_i}= & {} \mathbf {M}^U_i\mathbf {\Sigma }_{\hat{U}^a_{i-1},\hat{U}^a_{i-1}}\mathbf {M}^{U, T}_i,\end{aligned}$$
(49)
$$\begin{aligned} \mathbf {\Sigma }_{U^b_i,U^b_i}= & {} \mathbf {M}^U_i\mathbf {\Sigma }_{U^a_{i-1},U^a_{i-1}}\mathbf {M}^{U, T}_i, \end{aligned}$$
(50)

We note that unless the principal variate residual vanishes in control space under the fine fidelity model, that is, unless

$$\begin{aligned} \boldsymbol{\Theta }\mathbf {M}^X_i \Delta X^a_{i-1} = 0, \end{aligned}$$
(51)

holds, the above propagation will violate (39). Therefore, as a useful heuristic, the propagation of the control variate moments can be replaced by the propagation of the projected principal variate moments in order for the relation (39) to hold more strongly at each step,

$$\begin{aligned} \mathbf {\mu }_{\hat{U}^b_i}= & {} \mathbf {M}^U_i\boldsymbol{\Theta }\mathbf {\mu }_{X^a_{i-1}},\end{aligned}$$
(52)
$$\begin{aligned} \mathbf {\Sigma }_{X^b_i,\hat{U}^b_i}= & {} \mathbf {M}^X_i\mathbf {\Sigma }_{X^a_{i-1},X^a_{i-1}}{\left( \mathbf {M}^{U}_i\boldsymbol{\Theta }\right) }^T,\end{aligned}$$
(53)
$$\begin{aligned} \mathbf {\Sigma }_{\hat{U}^b_i,\hat{U}^b_i}= & {} \mathbf {M}^U_i\boldsymbol{\Theta }\mathbf {\Sigma }_{X^a_{i-1},X^a_{i-1}}{\left( \mathbf {M}^{U}_i\boldsymbol{\Theta }\right) }^T. \end{aligned}$$
(54)

This is especially useful if the models are non-linear, generalizing to the multifidelity extended Kalman filter, or in the case of the multifidelity ensemble Kalman filter discussed later in Sect. 3.2.
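To make the forecast step concrete, here is a moment-propagation sketch of (44)–(50) together with the re-correlation heuristic (52)–(54). The linear models M_X and M_U, the projection Theta, and all input moments are assumed to be given; the function name and the model error covariance argument are illustrative.

```python
import numpy as np

def mfkf_forecast(mu_X, mu_U, Sig_XX, Sig_UU, M_X, M_U, Theta, Sig_model):
    """Posterior moments at step i-1 -> prior moments at step i; the control
    variate moments are rebuilt from the projected principal variate, (52)-(54)."""
    mu_Xb = M_X @ mu_X                                # (44)
    mu_Ub = M_U @ mu_U                                # (46)
    mu_Uhatb = M_U @ (Theta @ mu_X)                   # (52)

    MU_Theta = M_U @ Theta
    Sig_XXb = M_X @ Sig_XX @ M_X.T + Sig_model        # (47)
    Sig_XUhatb = M_X @ Sig_XX @ MU_Theta.T            # (53)
    Sig_UhUhb = MU_Theta @ Sig_XX @ MU_Theta.T        # (54)
    Sig_UUb = M_U @ Sig_UU @ M_U.T                    # (50)
    return mu_Xb, mu_Uhatb, mu_Ub, Sig_XXb, Sig_XUhatb, Sig_UhUhb, Sig_UUb
```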

Lemma 1

The fine fidelity model, coarse fidelity model, posterior optimal gain at step \(i-1\), and prior optimal gain at time i are related as follows:

$$\begin{aligned} \mathbf {M}^X_i\mathbf {S}^a_{i-1} = \mathbf {S}^b_i \mathbf {M}^U_i. \end{aligned}$$
(55)

Proof

By simple manipulation,

$$\begin{aligned} \mathbf {M}^X_i\mathbf {S}^a_{i-1}= & {} \mathbf {M}^X_i\mathbf {\Sigma }_{X^a_{i-1},\hat{U}^a_{i-1}}{\left( \mathbf {\Sigma }_{\hat{U}^a_{i-1},\hat{U}^a_{i-1}} + \mathbf {\Sigma }_{U^a_{i-1},U^a_{i-1}}\right) }^{-1}\end{aligned}$$
(56)
$$\begin{aligned}= & {} \mathbf {\Sigma }_{X^b_i,\hat{U}^a_{i-1}}\mathbf {M}^{U,T}_i{\left[ \mathbf {M}^{U}_i\left( \mathbf {\Sigma }_{\hat{U}^a_{i-1},\hat{U}^a_{i-1}} + \mathbf {\Sigma }_{U^a_{i-1},U^a_{i-1}}\right) \mathbf {M}^{U,T}_i\right] }^{-1}\mathbf {M}^U_i\end{aligned}$$
(57)
$$\begin{aligned}= & {} \mathbf {\Sigma }_{X^b_i,\hat{U}^b_i}{\left( \mathbf {\Sigma }_{\hat{U}^b_i,\hat{U}^b_i} + \mathbf {\Sigma }_{U^b_i,U^b_i}\right) }^{-1}\mathbf {M}^U_i = \mathbf {S}^b_i \mathbf {M}^U_i, \end{aligned}$$
(58)

as required.

Theorem 3

The MFKF forecast is the total variate forecast:

$$\begin{aligned} \mathbf {\mu }_{Z^b_i}= & {} \mathbf {M}^X_i \mathbf {\mu }_{Z^a_{i-1}},\end{aligned}$$
(59)
$$\begin{aligned} \mathbf {\Sigma }_{Z^b_i,Z^b_i}= & {} \mathbf {M}^X_i \mathbf {\Sigma }_{Z^a_{i-1},Z^a_{i-1}} \mathbf {M}^{X,T}_i + \mathbf {\Sigma }_{\Xi _i, \Xi _i}. \end{aligned}$$
(60)

Proof

Using Lemma 1, we manipulate the formula for the mean of Z in (42),

$$\begin{aligned} \mathbf {M}^X_i \mathbf {\mu }_{Z^a_{i-1}}= & {} \mathbf {M}^X_i \mathbf {\mu }_{Z^a_{i-1}},\end{aligned}$$
(61)
$$\begin{aligned}= & {} \mathbf {M}^X_i\mathbf {\mu }_{X^a_{i-1}} - \mathbf {M}^X_i\mathbf {S}^a_{i-1}(\mathbf {\mu }_{\hat{U}^a_{i-1}} - \mathbf {\mu }_{U^a_{i-1}}),\end{aligned}$$
(62)
$$\begin{aligned}= & {} \mathbf {\mu }_{X^b_{i}} - \mathbf {S}^b_i \mathbf {M}^U_i(\mathbf {\mu }_{\hat{U}^a_{i-1}} - \mathbf {\mu }_{U^a_{i-1}}),\end{aligned}$$
(63)
$$\begin{aligned}= & {} \mathbf {\mu }_{X^b_{i}} - \mathbf {S}^b_i (\mathbf {\mu }_{\hat{U}^b_i} - \mathbf {\mu }_{U^b_i}) = \mathbf {\mu }_{Z^b_i}, \end{aligned}$$
(64)

as required. A similar manipulation can be performed for the covariance.

In order to obtain an efficient implementation of the analysis step in the MFKF, we need to restrict the projection operator (39) to a class that has ‘nice’ properties. We assume that the joint variability of the principal variate in the orthogonal complement space and control variate is negligible,

$$\begin{aligned} \mathbf {\Sigma }_{\Delta X, \hat{U}} \approx 0, \end{aligned}$$
(65)

or alternatively that the projection operator \(\boldsymbol{\Theta }\) captures the dominant linear modes of the variability in the dynamics of X. Common methods by which such operators can be obtained are the proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD), and variants thereof (Brunton and Kutz 2019).

Theorem 4

If the first two moments of the control and ancillary variate are identical, and assumption (65) holds, then the optimal gain is,

$$\begin{aligned} \mathbf {S} = \frac{1}{2}\boldsymbol{\Phi }. \end{aligned}$$
(66)

Proof

Observe by Theorem 2 and (65),

$$\begin{aligned} \mathbf {S} = \mathbf {\Sigma }_{X,\hat{U}}{\left( \mathbf {\Sigma }_{\hat{U},\hat{U}} + \mathbf {\Sigma }_{U,U}\right) }^{-1} = \frac{1}{2}\boldsymbol{\Phi } \mathbf {\Sigma }_{\hat{U},\hat{U}} \mathbf {\Sigma }_{\hat{U},\hat{U}}^{-1} = \frac{1}{2}\boldsymbol{\Phi }, \end{aligned}$$
(67)

as required.

If we choose a projection operator for which (65) holds, then the optimal gain is constant and does not have to be estimated. Moreover this provides for a clear relationship between the projection operator \(\boldsymbol{\Theta }\) and the optimal gain, such that \(\boldsymbol{\Theta }\mathbf {S} = \frac{1}{2}\mathbf {I}_r\). For the rest of this section we assume that \(\mathbf {S}\) is constant.

We next discuss the analysis step of the MFKF. Note first that the Kalman gain is the optimal gain when the principal variate is the prior information about the state of the dynamics, the control variate is that information cast into observation space, and the ancillary variate consists of the independent observations. Assuming that the arbitrary variate \(W^b_i\) represents some prior information, we write the Kalman gain as a function of \(W^b_i\),

$$\begin{aligned} \mathbf {K}_{W^b_i} = \mathbf {\Sigma }_{W^b_i, \mathbf {H}_i W^b_i} {\left( \mathbf {\Sigma }_{\mathbf {H}_i W^b_i, \mathbf {H}_i W^b_i} + \mathbf {\Sigma }_{Y_i,Y_i} \right) }^{-1}. \end{aligned}$$
(68)

The standard Kalman filter analysis step applied to the total variate, as described by (31):

$$\begin{aligned} Z^a_i(\mathbf {K}_{Z^b_i}) = Z^b_i - \mathbf {K}_{Z^b_i}(\mathbf {H}_i Z^b_i - Y_i), \end{aligned}$$
(69)

can be decomposed into its constituent variates:

$$\begin{aligned} X^a - \mathbf {S}(\hat{U}^a - U^a) = X^b - \mathbf {S}(\hat{U}^b - U^b) - \mathbf {K}_{Z^b_i}\left( \mathbf {H}\left( X^b - \mathbf {S}(\hat{U}^b - U^b)\right) - Y\right) . \end{aligned}$$
(70)

Taking the ‘natural’ decomposition of this relation into components leads to the constituent filters (32):

$$\begin{aligned} X^a= & {} X^b - \mathbf {K}_{Z^b_i}\left( \mathbf {H}X^b - Y\right) ,\end{aligned}$$
(71)
$$\begin{aligned} \hat{U}^a= & {} \hat{U}^b - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\left( \mathbf {H}\mathbf {\Phi }\hat{U}^b - Y\right) ,\end{aligned}$$
(72)
$$\begin{aligned} U^a= & {} U^b - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\left( \mathbf {H}\mathbf {\Phi }U^b - Y\right) , \end{aligned}$$
(73)

which assumes that the control and ancillary variates do not carry any additional information from the orthogonal complement space of the principal variate.

The authors conjecture that the decomposition (71) approximately minimizes the cross entropy functional (33) out of all such decompositions, though there is no strong evidence for this claim as of yet.

The analysis update of the total mean, carried out through its constituent variate means, is:

$$\begin{aligned} \mathbf {\mu }_{X^a}= & {} \mathbf {\mu }_{X^b} - \mathbf {K}_{Z^b_i} (\mathbf {H} \mathbf {\mu }_{X^b} - \mathbf {y})\end{aligned}$$
(74)
$$\begin{aligned} \mathbf {\mu }_{\hat{U}^a}= & {} \mathbf {\mu }_{\hat{U}^b} - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\left( \mathbf {H}\mathbf {\Phi }\mathbf {\mu }_{\hat{U}^b} - \mathbf {y}\right) ,\end{aligned}$$
(75)
$$\begin{aligned} \mathbf {\mu }_{U^a}= & {} \mathbf {\mu }_{U^b} - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\left( \mathbf {H}\mathbf {\Phi }\mathbf {\mu }_{U^b} - \mathbf {y}\right) . \end{aligned}$$
(76)

The corresponding covariance update formulas are:

$$\begin{aligned} \mathbf {\Sigma }_{X^a_i, X^a_i}= & {} \left( \mathbf {I} - \mathbf {K}_{Z^b_i}\mathbf {H}\right) \mathbf {\Sigma }_{X^b_i, X^b_i}\left( \mathbf {I} - \mathbf {K}_{Z^b_i}\mathbf {H}_i\right) ^T + \mathbf {K}_{Z^b_i}\mathbf {\Sigma }_{Y_i, Y_i}\mathbf {K}_{Z^b_i}^T,\end{aligned}$$
(77)
$$\begin{aligned} \mathbf {\Sigma }_{X^a_i,\hat{U}^a_i}= & {} \left( \mathbf {I} - \mathbf {K}_{Z^b_i}\mathbf {H}_i\right) \mathbf {\Sigma }_{X^b_i,\hat{U}^b_i}{\left( \mathbf {I} - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\mathbf {H}_i\mathbf {\Phi }\right) }^T,\end{aligned}$$
(78)
$$\begin{aligned}&+ \mathbf {K}_{Z^b_i}\mathbf {\Sigma }_{Y_i, Y_i}\left( \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\right) ^T,\end{aligned}$$
(79)
$$\begin{aligned} \mathbf {\Sigma }_{\hat{U}^a_i, \hat{U}^a_i}= & {} \left( \mathbf {I} - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\mathbf {H}_i\mathbf {\Phi }\right) \mathbf {\Sigma }_{\hat{U}^b_i, \hat{U}^b_i}{\left( \mathbf {I} - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\mathbf {H}_i\mathbf {\Phi }\right) }^T\end{aligned}$$
(80)
$$\begin{aligned}&+ \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\mathbf {\Sigma }_{Y_i, Y_i}\left( \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\right) ^T,\end{aligned}$$
(81)
$$\begin{aligned} \mathbf {\Sigma }_{U^a_i, U^a_i}= & {} \left( \mathbf {I} - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\mathbf {H}_i\mathbf {\Phi }\right) \mathbf {\Sigma }_{U^b_i, U^b_i}\left( \mathbf {I} - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\mathbf {H}_i\mathbf {\Phi }\right) ^T\end{aligned}$$
(82)
$$\begin{aligned}&+ \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\mathbf {\Sigma }_{Y_i, Y_i}\left( \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\right) ^T. \end{aligned}$$
(83)

The inner working of the MFKF is illustrated in Fig. 1.

Fig. 1 A diagram of the MFKF computations, including the heuristic re-correlation of the principal and control variates (52), and the assumed relation between the forecast variates (16)

Theorem 5

Without proof, if the optimal gain interpolation projection step does not remove additional information from the control and ancillary variate (that is (65) is exact), then the ‘natural’ decomposition (71) is exact, thus the linear control variate combination of the mean is the total variate analysis mean,

$$\begin{aligned} \mathbf {\mu }_{Z^a_i} = \mathbf {\mu }_{X^a_i} - \mathbf {S}(\mathbf {\mu }_{\hat{U}^a_i} - \mathbf {\mu }_{U^a_i}). \end{aligned}$$
(84)

Similarly for the covariances.

Theorem 6

Without proof, if \(\mathbf {S}\) is the optimal gain (Theorems 2 and 4), then the simple relation between the covariances of the principal and total variates is,

$$\begin{aligned} \mathbf {\Sigma }_{Z_i, Z_i} = \left( \mathbf {I}_n - \mathbf {S}\boldsymbol{\Theta } \right) \mathbf {\Sigma }_{X_i, X_i}. \end{aligned}$$
(85)

Thus we are able to obtain a covariance for the total variate by only knowing the covariance of the principal variate.
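As a sketch of the analysis step, the mean updates (74)–(76), together with the Kalman gain (68) evaluated at the prior total variate and the total covariance shortcut of Theorem 6, can be written as follows. The constant optimal gain \(\mathbf {S} = \frac{1}{2}\boldsymbol{\Phi }\) of Theorem 4 and a linear observation operator H are assumed, and the function and argument names are illustrative.

```python
import numpy as np

def mfkf_analysis_means(mu_X, mu_Uhat, mu_U, Sig_XX, S, Theta, Phi, H, R, y):
    Sig_ZZ = (np.eye(len(mu_X)) - S @ Theta) @ Sig_XX         # Theorem 6, eq. (85)
    K = Sig_ZZ @ H.T @ np.linalg.inv(H @ Sig_ZZ @ H.T + R)    # Kalman gain (68) at Z^b
    mu_Xa = mu_X - K @ (H @ mu_X - y)                         # (74)
    mu_Uhata = mu_Uhat - Theta @ K @ (H @ Phi @ mu_Uhat - y)  # (75)
    mu_Ua = mu_U - Theta @ K @ (H @ Phi @ mu_U - y)           # (76)
    mu_Za = mu_Xa - S @ (mu_Uhata - mu_Ua)                    # total analysis mean (84)
    return mu_Xa, mu_Uhata, mu_Ua, mu_Za
```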

An alternate decomposition, which we will not analyze further, and for which Theorem 5 is exact without qualification, is:

$$\begin{aligned} X^a= & {} X^b - \mathbf {K}_{Z^b_i}\left( \mathbf {H}Z^b - Y\right) ,\end{aligned}$$
(86)
$$\begin{aligned} \hat{U}^a= & {} \hat{U}^b - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\left( \mathbf {H}Z^b - Y\right) ,\end{aligned}$$
(87)
$$\begin{aligned} U^a= & {} U^b - \boldsymbol{\Theta }\mathbf {K}_{Z^b_i}\left( \mathbf {H}Z^b - Y\right) , \end{aligned}$$
(88)

These formulas, however, are difficult to implement using ensembles.

We show next how the variability of the total variate \(Z^a_i(\mathbf {K}_{Z^b_i})\), the principal variate \(X^a_i(\mathbf {K}_{Z^b_i})\), and the principal variate analyzed by itself \(X^a_i(\mathbf {K}_{X^b_i})\) are related.

Theorem 7

The covariances of \(Z^a_i(\mathbf {K}_{Z^b_i})\), \(X^a_i(\mathbf {K}_{Z^b_i})\), \(X^a_i(\mathbf {K}_{X^b_i})\) are such that:

$$\begin{aligned} \mathbf {\Sigma }_{Z^a(\mathbf {K}_{Z^b_i}), Z^a(\mathbf {K}_{Z^b_i})} \le \mathbf {\Sigma }_{X^a(\mathbf {K}_{X^b_i}), X^a(\mathbf {K}_{X^b_i})} \le \mathbf {\Sigma }_{X^a(\mathbf {K}_{Z^b_i}), X^a(\mathbf {K}_{Z^b_i})}. \end{aligned}$$
(89)

Proof

By the optimality of the Kalman gain \(\mathbf {K}_{Z^b_i}\) in Theorem 2,

$$\begin{aligned} \mathbf {\Sigma }_{Z^a(\mathbf {K}_{Z^b_i}), Z^a(\mathbf {K}_{Z^b_i})} \le \mathbf {\Sigma }_{Z^a(\mathbf {K}_{X^b_i}), Z^a(\mathbf {K}_{X^b_i})}, \end{aligned}$$
(90)

and by the optimality of the control variate relation \(\mathbf {S}\) from Corollary 1,

$$\begin{aligned} \mathbf {\Sigma }_{Z^a(\mathbf {K}_{X^b_i}), Z^a(\mathbf {K}_{X^b_i})} \le \mathbf {\Sigma }_{X^a(\mathbf {K}_{X^b_i}), X^a(\mathbf {K}_{X^b_i})}. \end{aligned}$$
(91)

The second inequality similarly relies on the optimality of the Kalman gain \(\mathbf {K}_{X^b_i}\).

Theorem 7 shows that the principal variate covariance is an upper bound on the covariance of the total variate.

Relations (89) in Theorem 7 are valid only when the means of the constituent variates are roughly equivalent. This is especially important in the ‘extended’ and ‘ensemble’ extensions to the MFKF. To achieve this, at each step we apply the following heuristic correction:

$$\begin{aligned} \mathbf {\mu }_{X^a_i} \xleftarrow {} \mathbf {\mu }_{Z^a_i}, \quad \mathbf {\mu }_{\hat{U}^a_i} \xleftarrow {} \boldsymbol{\Theta }\mathbf {\mu }_{Z^a_i},\quad \mathbf {\mu }_{U^a_i} \xleftarrow {} \boldsymbol{\Theta }\mathbf {\mu }_{Z^a_i}, \end{aligned}$$
(92)

which additionally enforces the control variate relation (39), ensures that the principal and total variate means are equivalent, and that the control and ancillary variate means are equivalent.

3.2 Multifidelity Ensemble Kalman Filter

Following Popov et al. (2020), we present the multifidelity ensemble Kalman filter (MFEnKF).

Assume now that instead of manipulating the first two moments of our variates, we manipulate ensembles. Assume that we have \(N_X\) pairwise correlated samples of the principal and control variates \((\mathbf {X}^{(i)}, \mathbf {\hat{U}}^{(i)})\) represented by the ensembles \(\mathbf {E}_X\) and \(\mathbf {E}_{\hat{U}}\), respectively, and \(N_U \ge N_X\) samples of the ancillary variate \(\mathbf {U}^{(i)}\), represented by the ensemble \(\mathbf {E}_U\). We wish to construct practical ensemble-based generalizations to the MFKF.

The forecast step, similar to the standard EnKF, and MFKF ((44) and (47)), propagates the ensemble members individually through their respective models,

$$\begin{aligned} \mathbf {X}^{b,(j)}_i= & {} \mathcal {M}^X_i(\mathbf {X}^{a,(j)}_{i-1}) + \mathbf {\xi }_i^{(j)},\quad 1\le j \le N_X\end{aligned}$$
(93)
$$\begin{aligned} \hat{\mathbf {U}}^{b,(j)}= & {} \mathcal {M}^U_i(\hat{\mathbf {U}}^{a,(j)}_{i-1}),\quad 1\le j \le N_X\end{aligned}$$
(94)
$$\begin{aligned} \mathbf {U}^{b,(j)}= & {} \mathcal {M}^U_i(\mathbf {U}^{a,(j)}_{i-1}),\quad 1\le j \le N_U, \end{aligned}$$
(95)

where each \(\mathbf {\xi }_i^{(j)}\) is a random sample accounting for the fine fidelity model error. The coarse fidelity model bias is accounted for by the propagation of both the control variate ensemble and ancillary ensemble through the coarse fidelity model.

Assume that the sample means, anomalies, and covariances are readily available for the three constituent ensembles, from which it is possible to derive the empirical estimates of the first two moments of the total variate,

$$\begin{aligned} \widetilde{\mathbf {\mu }}_{Z_i}= & {} \widetilde{\mathbf {\mu }}_{X_i} - \mathbf {S}_i(\widetilde{\mathbf {\mu }}_{\hat{U}_i} - \widetilde{\mathbf {\mu }}_{U_i}),\end{aligned}$$
(96)
$$\begin{aligned} \widetilde{\mathbf {\Sigma }}_{Z_i, Z_i}= & {} \widetilde{\mathbf {\Sigma }}_{X_i, X_i} + \mathbf {S}_i\widetilde{\mathbf {\Sigma }}_{\hat{U}_i, \hat{U}_i}\mathbf {S}_i^T + \mathbf {S}_i\widetilde{\mathbf {\Sigma }}_{U_i, U_i}\mathbf {S}_i^T - \mathbf {S}_i\widetilde{\mathbf {\Sigma }}_{\hat{U}_i, X_i} - \widetilde{\mathbf {\Sigma }}_{X_i, \hat{U}_i}\mathbf {S}_i^T, \end{aligned}$$
(97)

where once again we assume that the optimal gain is constant (66) from Theorem 4.

Similar to the standard EnKF, it is not explicitly required to compute the full total background covariance, but merely the related cross-covariances with respect to the observation operator:

$$\begin{aligned} \widetilde{\mathbf {\Sigma }}_{Z^b_i, \mathbf {H}_i Z^b_i} = \widetilde{\mathbf {\Sigma }}_{Z^b_i, Z^b_i} \mathbf {H}_i^T,\end{aligned}$$
(98)
$$\begin{aligned} \widetilde{\mathbf {\Sigma }}_{\mathbf {H}_i Z^b_i, \mathbf {H}_i Z^b_i} = \mathbf {H}_i \widetilde{\mathbf {\Sigma }}_{Z^b_i, Z^b_i} \mathbf {H}_i^T, \end{aligned}$$
(99)

which can efficiently be computed by utilizing the observation ensemble anomalies. From this the sample Kalman gain can be computed.

Applying the MFKF formulas (71) to the MFEnKF statistics, it is possible to obtain the corresponding empirical Kalman gain,

$$\begin{aligned} \widetilde{\mathbf {K}}_{Z^b_i} = \widetilde{\mathbf {\Sigma }}_{Z^b_i, \mathbf {H}_i Z^b_i} {\left( \widetilde{\mathbf {\Sigma }}_{\mathbf {H}_i Z^b_i, \mathbf {H}_i Z^b_i} + \mathbf {\Sigma }_{Y_i, Y_i}\right) }^{-1}, \end{aligned}$$
(100)

and the corresponding analysis of the anomalies,

$$\begin{aligned} \mathbf {A}_{X^a_i}= & {} \mathbf {A}_{X^b_i} - \widetilde{\mathbf {K}}_{Z^b_i}(\mathbf {A}_{\mathbf {H}_i X^b_i} - \mathbf {E}_{Y_i}^X),\end{aligned}$$
(101)
$$\begin{aligned} \mathbf {A}_{\hat{U}^a_i}= & {} \mathbf {A}_{\hat{U}^b_i} - \boldsymbol{\Theta }\widetilde{\mathbf {K}}_{Z^b_i}(\mathbf {A}_{\mathbf {H}_i \boldsymbol{\Phi } \hat{U}^b_i} - \mathbf {E}_{Y_i}^{\hat{U}}),\end{aligned}$$
(102)
$$\begin{aligned} \mathbf {A}_{U^a_i}= & {} \mathbf {A}_{U^b_i} - \boldsymbol{\Theta }\widetilde{\mathbf {K}}_{Z^b_i}(\mathbf {A}_{\mathbf {H}_i \boldsymbol{\Phi } U^b_i} - \mathbf {E}_{Y_i}^U), \end{aligned}$$
(103)

where each \(\mathbf {E}_{Y_i}\) is an ensemble of perturbed observations. In Popov et al. (2020) it was shown that there is no unique ‘nice’ solution to the problem of perturbed observations in the MFEnKF, thus we will leave this discussion aside in this chapter.
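One concrete way to realize the analysis, glossing over the perturbed-observation subtleties mentioned above, is sketched below: the empirical total covariance (97), the empirical gain (100), member-wise application of the constituent filters (71)–(73) with perturbed observations, and the mean re-centering heuristic (92). Ensembles are stored column-wise, the constant gain \(\mathbf {S} = \frac{1}{2}\boldsymbol{\Phi }\) of Theorem 4 is assumed, and all names are illustrative.

```python
import numpy as np

def mfenkf_analysis(E_X, E_Uhat, E_U, Theta, Phi, H, R, y, rng):
    n, N_X = E_X.shape
    N_U = E_U.shape[1]
    S = 0.5 * Phi                                          # Theorem 4

    def anom(E):
        return E - E.mean(axis=1, keepdims=True)

    A_X, A_Uhat, A_U = anom(E_X), anom(E_Uhat), anom(E_U)
    C_XX = A_X @ A_X.T / (N_X - 1)
    C_XUh = A_X @ A_Uhat.T / (N_X - 1)
    C_UhUh = A_Uhat @ A_Uhat.T / (N_X - 1)
    C_UU = A_U @ A_U.T / (N_U - 1)

    # Empirical total-variate covariance (97) and empirical Kalman gain (100).
    C_ZZ = C_XX + S @ (C_UhUh + C_UU) @ S.T - S @ C_XUh.T - C_XUh @ S.T
    K = C_ZZ @ H.T @ np.linalg.inv(H @ C_ZZ @ H.T + R)

    # Member-wise constituent updates (71)-(73) with perturbed observations.
    Y_X = rng.multivariate_normal(y, R, size=N_X).T        # shared by the principal/control pairs
    Y_U = rng.multivariate_normal(y, R, size=N_U).T
    E_Xa = E_X - K @ (H @ E_X - Y_X)
    E_Uhata = E_Uhat - Theta @ K @ (H @ Phi @ E_Uhat - Y_X)
    E_Ua = E_U - Theta @ K @ (H @ Phi @ E_U - Y_U)

    # Mean re-centering heuristic (92) through the empirical total mean (96).
    mu_Za = E_Xa.mean(axis=1) - S @ (E_Uhata.mean(axis=1) - E_Ua.mean(axis=1))
    E_Xa += (mu_Za - E_Xa.mean(axis=1))[:, None]
    E_Uhata += (Theta @ mu_Za - E_Uhata.mean(axis=1))[:, None]
    E_Ua += (Theta @ mu_Za - E_Ua.mean(axis=1))[:, None]
    return E_Xa, E_Uhata, E_Ua
```

In a full filter, such an analysis would alternate with the ensemble forecast (93)–(95).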

In order to get an ensemble of \(Z^a\), we can look towards the ensemble transform Kalman filter, specifically at the ‘left transform’ variant (Sakov and Bertino 2011). Using known properties of the matrix shift lemma (Asch et al. 2016) and the linearity of the control variate relation (39) one can write the transformation of the ensemble of \(X^a\) into anomalies of \(Z^a\) given by the ETKF (23) as:

$$\begin{aligned} \mathbf {E}_{Z^a} = \widetilde{\mathbf {\mu }}_{Z^a}\mathbf {1}_{N_X}^T + {\left( \mathbf {I}_n - \mathbf {S}\boldsymbol{\Theta }\right) }^{\frac{1}{2}}\mathbf {A}_{X^a}, \end{aligned}$$
(104)

which can be implemented in any number of computationally efficient ways (Allen et al. 2000) beyond the scope of this chapter.

Note however that this methodology relies on the equality of the covariances of the control and ancillary variates, which especially in the ensemble case will be violated.

We now discuss the main advantage of the MFEnKF: utilizing the same number of fine fidelity model samples as the standard EnKF, the MFEnKF provides a more accurate analysis mean.

Theorem 8

Assume that we have access to the exact Kalman gains \(\mathbf {K}_{Z^b_i}\) and \(\mathbf {K}_{X^b_i}\) from (68) of the theoretical Kalman filters. The variance of the empirical mean of the analysis total variate computed with the total variate Kalman gain is less than the variance of the empirical mean of the analysis principal variate computed with the principal variate Kalman gain,

$$\begin{aligned} \mathbf {\Sigma }_{\widetilde{\mathbf {\mu }}_{Z^a_i(\mathbf {K}_{Z^b_i})}, \widetilde{\mathbf {\mu }}_{Z^a_i(\mathbf {K}_{Z^b_i})}} \le \mathbf {\Sigma }_{\widetilde{\mathbf {\mu }}_{X^a_i(\mathbf {K}_{X^b_i})}, \widetilde{\mathbf {\mu }}_{X^a_i(\mathbf {K}_{X^b_i})}}. \end{aligned}$$
(105)

Proof

Assuming again that \(N_U \ge N_X\), and by Theorem 7,

$$\begin{aligned} \mathbf {\Sigma }_{\widetilde{\mathbf {\mu }}_{Z^a_i(\mathbf {K}_{Z^b_i})}, \widetilde{\mathbf {\mu }}_{Z^a_i(\mathbf {K}_{Z^b_i})}}\le & {} N_X^{-1} \mathbf {\Sigma }_{Z^a_i(\mathbf {K}_{Z^b_i}), Z^a_i(\mathbf {K}_{Z^b_i})} \end{aligned}$$
(106)
$$\begin{aligned}&+ (N_U^{-1} - N_X^{-1})\mathbf {S}\mathbf {\Sigma }_{U^a_i(\mathbf {K}_{Z^b_i}), U^a_i(\mathbf {K}_{Z^b_i})}\mathbf {S}^T,\end{aligned}$$
(107)
$$\begin{aligned}\le & {} N_X^{-1} \mathbf {\Sigma }_{X^a_i(\mathbf {K}_{X^b_i}), X^a_i(\mathbf {K}_{X^b_i})},\end{aligned}$$
(108)
$$\begin{aligned}= & {} \mathbf {\Sigma }_{\widetilde{\mathbf {\mu }}_{X^a_i(\mathbf {K}_{X^b_i})}, \widetilde{\mathbf {\mu }}_{X^a_i(\mathbf {K}_{X^b_i})}}, \end{aligned}$$
(109)

as required.

The perturbed observations MFEnKF is similar to a typical EnKF algorithm in the way in which inflation and B-localization can be applied. An important salient difference is that there is now an additional hyperparameter, namely the inflation factor for the ancillary ensemble \(\alpha _U\). Inflation for the principal and control ensembles \(\alpha _X\) should be the same in order to keep them highly correlated. As optimal inflation is known to depend on the ensemble size (Popov and Sandu 2020), it should generally be the case that \(\alpha _U < \alpha _X\).

3.3 Other ‘Multi-x’ Data Assimilation Algorithms

In this section we discuss other data assimilation algorithms in the ‘multi-’ family that do not, as-of-now, have rigorous multifidelity counterparts.

3.3.1 Particle Filters

In Giles (2008, 2015), the author discusses ‘multilevel’ Monte Carlo simulations. The case of projection and interpolation is ignored, and the optimal gain is explicitly set to the identity. The author examines the component variates as being coupled through their differences, which in a two-level control variate framework is equivalent to examining the variates:

$$\begin{aligned} {\left\{ \begin{array}{ll}U\\ X - \hat{U} \end{array}\right. }, \end{aligned}$$
(110)

treating each as an independent source of information, with means:

$$\begin{aligned} {\left\{ \begin{array}{ll}\mathbf {\mu }_U\\ \mathbf {\mu }_X - \mathbf {\mu }_{\hat{U}} \end{array}\right. }. \end{aligned}$$
(111)

The implicit assumption is that the only important source of information is the mean-estimate, and that \(\mathbf {\mu }_U\) carries negligible uncertainty. The intended use of such algorithms is in small-dimensional cases where large ensembles on coarse fidelity models can be created, thus this is not an unreasonable assumption.

In Gregory et al. (2016) and Gregory and Cotter (2017), Gregory and co-authors propose ‘multilevel’ ensemble transform particle filters (ETPF). The authors again employ a linear control variate structure in which the optimal gain is assumed to be the identity, and in which all variable operations are performed on the same empirical measures. The authors account for the need for their principal and control variate ensembles to be related, but do not consider the optimality of the couplings. Furthermore, the coupling used is one that is optimal for Gaussian random variables; an optimal coupling based on optimal transport could instead be utilized, while at the same time performing transformations between ensembles through optimal transport techniques.

It is of independent interest to develop more rigorous ‘multifidelity’ generalizations of such algorithms using the couplings outlined in this chapter.

3.3.2 Ensemble Kalman Filters

In Chernov et al. (2017) and Hoel et al. (2016), the authors propose a ‘multilevel’ EnKF. The authors extend the empirical measures (110) to account for the spatial relations between levels. In a two-level framework the authors analyze the variables

$$\begin{aligned} {\left\{ \begin{array}{ll}U\\ X - \mathbf {\Phi }\hat{U} \end{array}\right. }, \end{aligned}$$
(112)

again treating each as an independent source of information, with the means:

$$\begin{aligned} {\left\{ \begin{array}{ll}\mathbf {\mu }_U\\ \mathbf {\mu }_X - \mathbf {\Phi }\mathbf {\mu }_{\hat{U}} \end{array}\right. }, \end{aligned}$$
(113)

and the signed empirical measure covariance estimates:

$$\begin{aligned} {\left\{ \begin{array}{ll}\mathbf {\Sigma }_{U,U}\\ \mathbf {\Sigma }_{X,X} - \mathbf {\Sigma }_{\mathbf {\Phi }\hat{U},\mathbf {\Phi }\hat{U}} \end{array}\right. }. \end{aligned}$$
(114)

This covariance estimate is not guaranteed to be positive semi-definite. Additionally, no attention is paid to utilizing an optimal-gain linear control variate structure, or to enforcing the principal-control variate relation.

4 Multifidelity Observations

We now discuss an optimal way in which to combine observations from different sources at roughly similar physical locations of the same phenomenon. In operational literature this is commonly dubbed ‘super-observations’ (Cummings 2005; Oke et al. 2008), though such formulations are largely heuristic in that they take naive averages of interpolations of similar observations. The chief reason why observations are combined instead of used separately is to reduce the observation space dimension, so that similar information is represented in a denser format.

Assume that the true state is \(\mathbf {X}^t\), and recall the multifidelity observation definition (6) where the observations \(Y^\chi \) and \(Y^\upsilon \) have the observation errors \(\eta ^\chi \), and \(\eta ^\upsilon \) that are assumed to be unbiased and independent. We make the additional assumption that the fine fidelity and coarse fidelity observation operators are deterministically related by the coupling

$$\begin{aligned} \mathcal {H}^\upsilon (\mathbf {x}) = \theta (\mathcal {H}^\chi (\mathbf {x})), \end{aligned}$$
(115)

similar to the state relation assumed in (16).

The truth in observation space is assumed to be the expected value of the fine and coarse observations. This can alternatively be reformulated as: the truth in observation space is distributed according to distributions with means \(\mathbf {Y}^\chi \) and \(\mathbf {Y}^\upsilon \) for the fine and coarse observations, respectively.

A canonical way of dealing with such a scenario is by ‘stacking’ the observations, and creating the observation operator

$$\begin{aligned} \mathcal {H}(\mathbf {X}) = \begin{bmatrix}\mathcal {H}^\chi (\mathbf {X})\\ \mathcal {H}^\upsilon (\mathbf {X}) \end{bmatrix}. \end{aligned}$$
(116)

We will not pursue this approach, as it increases the dimensionality of the observations without increasing the information content.

Under the linear control variate approach the total variate observation mean is defined to be:

$$\begin{aligned} \mathbf {\mu }_{Y^\zeta } = \mathbf {\mu }_{Y^\chi } - \mathbf {S}(\theta (\mathbf {\mu }_{Y^\chi } ) - \mathbf {\mu }_{Y^\upsilon } ), \end{aligned}$$
(117)

where one implicitly assumes that \(\mathbb {E}[\theta (Y^\chi )] = \mathbb {E}[Y^\upsilon ]\). The optimal gain is,

$$\begin{aligned} \mathbf {S} = \mathbf {\Sigma }_{Y^\chi ,\theta (Y^\chi )}{\left( \mathbf {\Sigma }_{\theta (Y^\chi ),\theta (Y^\chi )} + \mathbf {\Sigma }_{Y^\upsilon ,Y^\upsilon }\right) }^{-1}, \end{aligned}$$
(118)

with the new covariance of the total observation given by

$$\begin{aligned} \mathbf {\Sigma }_{Y^\zeta ,Y^\zeta } = \mathbf {\Sigma }_{Y^\chi ,Y^\chi } - \mathbf {\Sigma }_{Y^\chi ,\theta (Y^\chi )}{\left( \mathbf {\Sigma }_{\theta (Y^\chi ),\theta (Y^\chi )} + \mathbf {\Sigma }_{Y^\upsilon ,Y^\upsilon }\right) }^{-1}\mathbf {\Sigma }_{\theta (Y^\chi ),Y^\chi }. \end{aligned}$$
(119)

Evaluation of this formula, however, requires knowledge of both \(\mathbf {\Sigma }_{Y^\chi ,\theta (Y^\chi )}\) and \(\mathbf {\Sigma }_{\theta (Y^\chi ),\theta (Y^\chi )}\), which might not be readily available.

An alternate approach is to utilize the importance sampling framework. Assume we have an ensemble of perturbed observations, \(\mathbf {E}_{Y^\chi } = \left[ \mathbf {Y}^{\chi ,{(1)}}, \ldots , \mathbf {Y}^{\chi ,{(M)}}\right] \), representing M independent samples from the assumed distribution of the fine fidelity observation \(\pi _{Y^\chi }\). Apply the importance sampling procedure to generate the weights:

$$\begin{aligned} {[\mathbf {w}]}_i \propto \pi _{Y^\upsilon }\left( \theta \left( \mathbf {Y}^{\chi ,{(i)}}\right) \right) . \end{aligned}$$
(120)

The unbiased mean and covariance estimates of the total observation are given by

$$\begin{aligned} \widetilde{\mathbf {\mu }}_{Y^\zeta } = \mathbf {E}_{Y^\chi }\mathbf {w},\quad \widetilde{\mathbf {\Sigma }}_{Y^\zeta ,Y^\zeta } = \frac{M}{M-1}\mathbf {E}_{Y^\chi }\left( \text {diag}(\mathbf {w}) - \mathbf {w}\mathbf {w}^T\right) \mathbf {E}_{Y^\chi }^T. \end{aligned}$$
(121)
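A brief sketch of the importance-sampling estimates (120)–(121) follows: fine fidelity observation samples are reweighted by the coarse observation density evaluated at the coupled samples. Gaussian observation errors and the function and argument names (including the coupling theta) are assumptions made only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fuse_observations(y_fine, R_fine, y_coarse, R_coarse, theta, M, rng):
    # Perturbed fine observations: M samples from the fine observation distribution.
    E_fine = rng.multivariate_normal(y_fine, R_fine, size=M).T
    # Importance weights (120): coarse density at the coupled samples theta(Y^chi).
    w = multivariate_normal(y_coarse, R_coarse).pdf(theta(E_fine).T)
    w = w / w.sum()
    # Weighted mean and unbiased weighted covariance (121).
    mu = E_fine @ w
    Sigma = M / (M - 1) * E_fine @ (np.diag(w) - np.outer(w, w)) @ E_fine.T
    return mu, Sigma
```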

Alternatively, an ensemble of equally weighted perturbed observations to be used with a perturbed observations EnKF can be derived by the optimal transport framework,

$$\begin{aligned} \mathbf {E}_{Y^\zeta } = \mathbf {E}_{Y^\chi }\mathbf {T}^*, \end{aligned}$$
(122)

given by (27) and (28).

As many of these methods rely on empirical estimates of the total observation covariance matrix, methods such as localization can trivially be applied, especially since in most operational algorithms for physical systems the observation covariance is typically assumed to be diagonal.

5 Numerical Experiments

For the sake of completeness we provide a simple twin experiment on a small dynamical system to test a two-fidelity MFEnKF.

For the fine fidelity model we use the 40-variable Lorenz ’96 system (Lorenz 1996), posed as an ODE:

$$\begin{aligned} \mathbf {x}'_i = f_X(\mathbf {x}) = -\mathbf {x}_{i-1}(\mathbf {x}_{i-2} - \mathbf {x}_{i+1}) - \mathbf {x}_{i} + 8,\quad 1 \le i \le 40, \end{aligned}$$
(123)

where \(\mathbf {x}_0 :=\mathbf {x}_{40}\), \(\mathbf {x}_{-1} :=\mathbf {x}_{39}\) and \(\mathbf {x}_{41} :=\mathbf {x}_{1}\).

Fig. 2 Comparison of the analysis empirical mean RMSE of a localized perturbed observations MFEnKF with a localized perturbed observations EnKF, for various fine fidelity (full order) model ensemble sizes

We use the method of snapshots (Sirovich 1987) to construct the linear interpolation and projection operators, \(\mathbf {\Phi }\) and \(\boldsymbol{\Theta }\), utilizing 20000 snapshots over an expressive time interval of 1000 units.
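For reference, one possible construction of the pair \((\mathbf {\Phi }, \boldsymbol{\Theta })\) from a snapshot matrix is sketched below using the singular value decomposition (equivalent, for this purpose, to the method of snapshots); the orthonormal POD basis gives \(\boldsymbol{\Theta } = \mathbf {\Phi }^T\), so \(\boldsymbol{\Theta }\,\boldsymbol{\Phi } = \mathbf {I}_r\). Snapshot generation and mean handling are omitted, and the function name is illustrative.

```python
import numpy as np

def pod_operators(snapshots, r):
    """snapshots: n-by-K matrix of (mean-centered) state snapshots."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    Phi = U[:, :r]                                   # interpolation operator, n x r
    Theta = Phi.T                                    # projection operator; Theta @ Phi = I_r
    energy = (s[:r] ** 2).sum() / (s ** 2).sum()     # fraction of captured energy
    return Phi, Theta, energy
```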

For the coarse fidelity we consider a reduced order model built using a naive approach, where we evaluate the derivative in the full space and then project onto the reduced space:

$$\begin{aligned} \mathbf {u}' = f_U(\mathbf {u}) = \boldsymbol{\Theta } f_X(\mathbf {\Phi } \mathbf {u}). \end{aligned}$$
(124)

For the Lorenz ’96 system this can be written equivalently as a multivariate quadratic equation.
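A short sketch of the fine fidelity tendency (123) and the naively projected coarse tendency (124) is given below; cyclic indexing is handled with np.roll, and the time integrator is left out.

```python
import numpy as np

def f_X(x, F=8.0):
    # Lorenz '96 tendency (123): x'_i = -x_{i-1} (x_{i-2} - x_{i+1}) - x_i + F.
    return -np.roll(x, 1) * (np.roll(x, 2) - np.roll(x, -1)) - x + F

def make_f_U(Theta, Phi):
    # Naive projected reduced order tendency (124): u' = Theta f_X(Phi u).
    return lambda u: Theta @ f_X(Phi @ u)
```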

The Lorenz ’96 system is known to have a Kaplan-Yorke dimension of 27.1 (Popov and Sandu 2019). For this reason we take \(r=28\) reduced modes to describe the whole system (though this is only possible non-linearly). In the reduced model this captures about \(90\%\) of the total energy of the system, as measured by the ratio of the sum of the retained eigenvalues to the sum of all eigenvalues. In this context it is actually relatively difficult to build a reduced order model for the Lorenz ’96 system.

We compare the algorithm to the standard perturbed observations ensemble Kalman filter. Both algorithms use forecast anomaly inflation and Gaspari-Cohn covariance localization (Gaspari and Cohn 1999).

We observe every other variable every \(\Delta t = 0.05\) time units, with a Gaussian error (2) of \(\mathbf {\Sigma }_{Y,Y} = \mathbf {I}_{20}\).

We perform localization and inflation as follows. For the forecast anomaly inflation we take \(\alpha _X = 1.1\) for the full system and \(\alpha _U = 1.00\) for the coarse system, as the reduced order model is less stable than the full order model and thus does not require inflation. To retain an undersampled ensemble for the ancillary variate, we choose an ensemble size of \(N_U = 25\). The inner parameter of the localization function is selected to match that of a Gaussian kernel, and the radius is set equal to 4 (Petrie and Dance 2010).

Figure 2 shows the relationship between the principal variate ensemble size and the spatio-temporal RMSE of the empirical analysis mean of the MFEnKF and the EnKF. As can be seen, the problem is comparatively difficult for the EnKF, as it requires at least 18 fine fidelity ensemble members for stable behavior. The same RMSE can be achieved with fewer than 10 fine fidelity ensemble members in the MFEnKF framework. Assuming that the coarse fidelity model runs are significantly cheaper (not true in this trivial contrived example), the MFEnKF is clearly superior.

We note that there is some loss of accuracy in the results, due in part to several assumptions that are violated. One is that the orthogonal complement space is uncorrelated with the control space (65). As we are capturing \(90\%\) of the energy of the system, the remaining energy is not negligible, and is no doubt highly correlated with what is captured. Methods to diminish the influence of this error would be needed for operational systems.

6 Discussion

Multifidelity data assimilation, and multifidelity inference in general, seek to leverage the availability of information about reality at multiple resolution levels. The field is still in its infancy, but the multifidelity methods are highly promising. This chapter provides a general philosophical and theoretical framework for the development of such methods. New multifidelity data assimilation approaches should utilize efficient coarse fidelity models to speed up high fidelity inference. The new methods should be grounded in sound statistical and probabilistic theory.

In this chapter we focus on the multifidelity stochastic EnKF. Variational multifidelity approaches have been developed in Stefanescu et al. (2015). Square root multifidelity ensemble Kalman filters, counterparts to the perturbed observations MFEnKF, should be developed in the future. Particle filters that are appropriate for non-Gaussian probability densities, or even hybrid EnKF-PF systems where different variates are assimilated with different algorithms, might provide an avenue for the development of multifidelity particle filtering. Multifidelity hybrid data assimilation, which combines the multifidelity EnKF and multifidelity variational methods, is also a promising future avenue. Finally, the construction of a hierarchy of coarser models to support data assimilation should be carefully investigated. For example, methods based on machine learning (e.g., as discussed in Moosavi et al. 2018a, b, or non-linear projections using autoencoders) are of considerable interest.