1 Introduction

The goal of four-dimensional variational (4D-Var) data assimilation is to estimate unknown control variables of a dynamical system—classically the initial condition of the system—that provide the best fit of the system outputs with observation data over a specific time interval (Courtier 1997; Dimet and Talagrand 1986; Lorenc 1981, 1986; Sasaki 1970). The use of 4D-Var data assimilation is prevalent in oceanography (Bennett 1993) and meteorology (Lynch 2015), where the dynamical system is described by partial differential equations (PDEs); see the recent texts (Law et al. 2015; Reich and Cotter 2015) and references therein for variational data assimilation in general.

We consider two variants of the 4D-Var problem. In the traditional strong-constraint 4D-Var formulation, the model is assumed to be “perfect” and only the initial condition serves as the (unknown) control variable. The weak-constraint 4D-Var formulation additionally accounts for an imperfect model by introducing a forcing term that represents the model error. In the weak-constraint case, the unknown initial condition and the unknown model-error forcing term thus serve as control variables; for various weak-constraint formulations see e.g. Trémolet (2006).

The 4D-Var problem is usually cast as an optimization problem and has very close connections to optimal control theory (Vermeulen and Heemink 2006). In the classical strong-constraint formulation, a cost functional consisting of two terms is introduced: the first term penalizes the misfit between the (unknown) initial condition and its prior background information and the second term penalizes the distance between the predicted system outputs and the observation data. In the weak-constraint case, another term is added which penalizes the model-error forcing. The optimal estimate of the initial condition is then found by minimizing the cost functional subject to the governing equations of the dynamical system, i.e., the PDE. After discretization of the PDE using classical techniques such as finite elements or finite volumes, the 4D-Var problem becomes a large-scale optimization problem which is typically very expensive to solve due to the high-dimensional state and control variable spaces and the associated computation of the cost functional, gradient, and possibly Hessian. Note that in the discretized weak-constraint formulation, the model-error forcing is also assumed to be spatially distributed and thus has approximately the same dimension as the state and initial condition. To lower the tremendous computational cost of solving the problem, an incremental approach has been proposed in Courtier et al. (1994).

Another way to speed up the solution process is a reduced-order approach; such approaches have been proposed successfully for the strong-constraint 4D-Var formulation in, for example, Cao et al. (2007), Daescu and Navon (2008), Dimitriu et al. (2010), Hoteit and Köhl (2006), Robert et al. (2005), Vermeulen and Heemink (2006) and Ştefănescu et al. (2015). There are two kinds of 4D-Var reduced-order approaches in the literature: In the first approach (Hoteit and Köhl 2006; Robert et al. 2005; Vermeulen and Heemink 2006), a reduced basis space is introduced, e.g. using empirical orthogonal functions, for only the control variable (initial condition). By limiting the search space to the reduced space, the optimization cost per iteration decreases and the convergence improves (at least during the first few iterations). In the second approach (Cao et al. 2007; Daescu and Navon 2008; Dimitriu et al. 2010), a reduced-order model for the system dynamics using proper orthogonal decomposition (POD) is additionally introduced. This leads to an additional speed-up and significant overall computational savings compared to reducing only the control space. All of these approaches also consider adapting the basis during the optimization. However, to the best of our knowledge, a posteriori error bounds to assess the sub-optimality of the reduced-order 4D-Var solutions have not yet been developed.

In this paper, we develop efficiently evaluable a posteriori error bounds for reduced order solutions of the strong- and weak-constraint 4D-Var data assimilation problem. We consider the standard quadratic 4D-Var cost functional constrained by parametrized linear parabolic PDEs involving noisy observations in time. Our final goal is not only to recover the “usual” 4D-Var control variables, i.e., the initial condition and model-error forcing, but also the model parameters. A preliminary improvement of the model itself before estimating the state can result in an improved state estimate, see e.g. the application in Habert et al. (2016). We thus obtain a bilevel optimization problem where the outer optimization stage is performed over the model parameters after an inner optimization stage identical to the standard 4D-Var setting, i.e., an optimization over control variables for given fixed model parameters. In this paper, we focus mainly on the inner optimization stage and propose a posteriori error bounds for the control variable. Our main contributions are as follows:

  • In Sect. 3, we consider the strong-constraint 4D-Var formulation. We employ the reduced basis method to generate reduced order approximations for the solution of the parametrized 4D-Var problem, i.e., the state, adjoint, and control variables (here, the initial condition). We then propose an a posteriori error bound for the control variable that allows us to assess the error between the reduced-order 4D-Var solution and the 4D-Var solution of the underlying high-dimensional FE approximation.

  • In Sect. 4, we extend the reduced basis approximation and a posteriori error estimation procedure from the strong- to the weak-constraint case. For simplicity of exposition, we consider the model-error forcing as the only unknown control variable in this section.

  • In Sect. 5, we combine the results from the two previous sections and consider problems with unknown initial condition and model-error forcing.

With the assumption of affine parameter dependence, the reduced-order 4D-Var problems and the a posteriori error bounds can be efficiently evaluated using an offline-online computational decomposition. Problems involving material parameters often naturally satisfy an affine parameter dependence, and even geometric parameters can often be treated after introducing suitable affine mappings onto a reference domain (Rozza et al. 2008). Furthermore, the dimension reduction as well as the a posteriori error bound formulation presented in this paper still hold even for non-affine problems. However, for non-affine problems the computations can no longer be decomposed into offline-online stages, and the online computational efficiency thus suffers. To address this issue, the non-affine case can be treated using the empirical interpolation method (EIM), which replaces the non-affine terms with an affine approximation and thus allows us to regain the online computational efficiency; we refer the interested reader e.g. to Barrault et al. (2004), Grepl et al. (2007) and Maday et al. (2007).

We present numerical results for the strong- and weak-constraint setting in Sect. 6. We consider the dispersion of a pollutant governed by a convection-diffusion equation with a Taylor–Green vortex velocity field. Our goal is to recover the initial condition (in the strong-constraint case) or the model-error forcing (in the weak-constraint case) given noisy measurements of the pollutant concentration at five spatial locations over time. Since we focus on the solution of the inverse problem here, we limit our test case to low Péclet numbers up to 50. The reason is that high Péclet numbers pose significant challenges for model reduction even for the forward problem itself: the high Péclet regime may require stabilization and faces a growing Kolmogorov N-width with an associated growth of the dimension of the reduced-order spaces. However, even Péclet numbers below 50 are still practically relevant and do appear in realistic scenarios, see e.g., Marshall et al. (2006).

We note that there is a close connection between the 4D-Var problem formulation and optimal control and that a posteriori error bounds for reduced order solutions to optimal control problems have been developed previously. However, rigorous and efficiently evaluable error bounds have been proposed mainly for elliptic problems (Kärcher and Grepl 2012; Kärcher et al. 2017; Negri et al. 2012), whereas error bounds for parabolic optimal control problems are either not rigorous (Dedè 2010) or not (online-)efficient (Tröltzsch and Volkwein 2009). The only exception for parabolic problems is Kärcher and Grepl (2014), which considers only scalar time-dependent controls and is based on a perturbation argument, often resulting in a more conservative error bound (Kärcher et al. 2017).

Finally, we note that the reduced basis method has already been used in a parameterized-background data-weak approach to variational data assimilation in Maday et al. (2015a, b). However, this previous work considers the elliptic case and presents a relaxation of the 3D-Var setting, whereas we consider the time-dependent case using the classical 4D-Var formulation. Before introducing some preliminary definitions and assumptions in the following section, we note that although we consider the 4D-Var problem here, our approach directly applies to the 3D-Var setting since the two are formally similar (Lynch 2015).

2 Preliminaries

In this section, we introduce the necessary ingredients and definitions for the subsequent discussion. The 4D-Var problem is usually cast in a fully discrete setting; we thus directly consider a spatial finite element (FE) and temporal finite difference (FD) discretization using the weak variational formulation. We summarize the continuous formulation of the 4D-Var problem in “Appendix 1”.

Let \(Y_\mathrm {e}\) with \(H^1_0({\varOmega }) \subset Y_\mathrm {e}\subset H^1({\varOmega })\) be a Hilbert space of functions over the bounded Lipschitz domain \({\varOmega }\subset \mathbb {R}^d,\) \(d \in \mathbb {N},\) with boundary \({\varGamma }.\) The inner product and induced norm associated with \(Y_\mathrm {e}\) are given by \((\cdot ,\cdot )_Y\) and \(\left||\cdot \right||_Y = \sqrt{(\cdot ,\cdot )_Y},\) respectively. We assume that the norm \(\left||\cdot \right||_Y\) is equivalent to the \(H^1({\varOmega })\)-norm and denote the dual space of \(Y_\mathrm {e}\) by \(Y_\mathrm {e}'.\) We also introduce the Hilbert space for the control, \(U_\mathrm {e}= L^2({\varOmega }),\) together with its inner product \((\cdot ,\cdot )_U,\) induced norm \(\left||\cdot \right||_U = \sqrt{(\cdot ,\cdot )_U},\) and associated dual space \(U_\mathrm {e}'.\) Furthermore, let \(\mathcal {D}\subset \mathbb {R}^P\) be a prescribed P-dimensional compact set in which our P-tuple input parameter \(\mu = (\mu _1,\ldots ,\mu _{P})\) resides.

We divide the time interval [0, T] with fixed final time T into K subintervals of equal length \(\tau = \frac{T}{K}\) and define \(t^k = k \, \tau , \ 0 \le k \le K,\) and \(\mathbb {K}= \{ 1, \dots , K \}.\) We also introduce two conforming finite element approximation spaces \(Y \subset Y_\mathrm {e}\) and \(U \subset U_\mathrm {e}\) of typically large dimension \(\mathcal {N}_Y = \dim (Y)\) and \(\mathcal {N}_U= \dim (U);\) note that Y and U shall inherit the inner product and norm from \(Y_\mathrm {e}\) and \(U_\mathrm {e},\) respectively. We shall assume that the spaces \(Y,U\) and the number of timesteps K are large enough – i.e. Y and U are sufficiently rich and the time-discretization sufficiently fine – such that the FE-FD approximation guarantees a desired accuracy over the whole parameter domain \(\mathcal {D}.\)

We next introduce the (for the sake of simplicity) parameter-independent bilinear forms \(m(w,v) = (w,v)_{L^2({\varOmega })}\) for all \(w,v \in L^2({\varOmega })\) and \(b(\cdot ,\cdot ): U \times Y \rightarrow \mathbb {R}.\) We assume that \(b(\cdot ,\cdot )\) is continuous, i.e.

$$\begin{aligned} \gamma _b= \sup _{w \in U \setminus \{0\}} \sup _{v \in Y \setminus \{0\}} \frac{b(w,v)}{\left||w\right||_{U} \left||v\right||_{Y}} < \infty . \end{aligned}$$
(1)

We also introduce the parameter-dependent bilinear form \(a(\cdot ,\cdot ;\mu ): Y \times Y \rightarrow \mathbb {R},\) which we assume to be continuous, coercive,

$$\begin{aligned} \alpha (\mu )= \inf _{v \in Y \setminus \{0\}} \frac{a(v,v;\mu )}{\left||v\right||_Y^2} \ge \underline{\alpha } > 0 \quad \forall \mu \in \mathcal {D}, \end{aligned}$$
(2)

and affinely parameter-dependent,

$$\begin{aligned} a(w,v;\mu ) = \sum _{q=1}^{Q_a} {\varTheta }_a^q(\mu ) \, a^q(w,v) \quad \forall w,v \in Y, \quad \forall \mu \in \mathcal {D}, \end{aligned}$$
(3)

for some (preferably) small integer \(Q_a.\) Here, the coefficient functions \({\varTheta }_a^q: \mathcal {D}\rightarrow \mathbb {R}\) are continuous and depend on \(\mu ,\) but the continuous bilinear forms \(a^q: Y \times Y \rightarrow \mathbb {R}\) do not depend on \(\mu .\)
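To illustrate how the affine decomposition (3) is exploited computationally, the following minimal Python sketch assembles the parameter-dependent operator from precomputed parameter-independent components; the matrices and coefficient functions are purely illustrative stand-ins, not the actual FE operators of Sect. 6.

```python
import numpy as np

def assemble_A(mu, Theta, A_q):
    """Evaluate A(mu) = sum_q Theta_q(mu) * A_q, cf. (3). The A_q are
    assembled once offline; forming A(mu) online costs only Q_a scalings."""
    return sum(theta(mu) * Aq for theta, Aq in zip(Theta, A_q))

# Toy example with Q_a = 2 (cf. the model problem in Sect. 6):
n = 4
A_diff = np.eye(n)                   # stand-in for the diffusion component
A_conv = np.diag(np.ones(n - 1), 1)  # stand-in for the convection component
Theta = [lambda mu: 1.0 / mu, lambda mu: 1.0]
A = assemble_A(25.0, Theta, [A_diff, A_conv])
```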

We also require the continuous linear functional \(f(\cdot ): Y \rightarrow \mathbb {R}\) and the continuous and linear (observation) operator \(C: Y \rightarrow D,\) where D is a suitable Hilbert space of observations with inner product \((\cdot ,\cdot )_D\) and norm \(\left||\cdot \right||_D.\) Although a more general setting is possible, we consider here the observation space \(D = \mathbb {R}^{\ell }\) and the observation operator given by \(C \phi = (h_1(\phi ), \dots , h_{\ell }(\phi ))^T,\) where \(h_i \in Y'\) are linear output functionals. The continuity constant of the operator C is given by

$$\begin{aligned} \gamma _c= \sup _{v \in Y \setminus \{0\}} \frac{\left||C v\right||_{D}}{\left||v\right||_Y}. \end{aligned}$$
(4)

For the development of the a posteriori error bounds we assume that we have access to a positive lower bound \(\alpha _{\mathrm{LB}}(\mu ): \mathcal {D}\rightarrow \mathbb {R}_{+}\) for the coercivity constant \(\alpha (\mu )\) defined in (2) such that

$$\begin{aligned} 0 < \underline{\alpha } \le \alpha _{\mathrm{LB}}(\mu )\le \alpha (\mu )\quad \forall \mu \in \mathcal {D}. \end{aligned}$$
(5)

We note that \(\alpha _{\mathrm{LB}}(\mu )\) is used in the a posteriori error bound formulation to replace the actual coercivity constant. Whereas the constants \(\gamma _b\) and \(\gamma _c\) are parameter-independent and can thus be computed once offline, we require that the coercivity lower bound can be efficiently evaluated online, i.e., that its computational cost is independent of the FE dimension \(\mathcal {N}.\) Various recipes exist to obtain such bounds (Huynh et al. 2007; Rozza et al. 2008).
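One such recipe is the min-\({\varTheta }\) approach (Rozza et al. 2008), which applies when all coefficient functions are positive and all component forms \(a^q\) are symmetric positive semidefinite; a minimal sketch under these assumptions (not necessarily the recipe used for the numerical tests in Sect. 6):

```python
def alpha_LB(mu, mu_bar, alpha_bar, Theta):
    """Min-Theta coercivity lower bound (Rozza et al. 2008):
    alpha_LB(mu) = alpha(mu_bar) * min_q Theta_q(mu) / Theta_q(mu_bar),
    valid if Theta_q > 0 on D and each a^q is symmetric positive
    semidefinite. alpha(mu_bar) is computed once offline for a fixed
    reference mu_bar; the online cost is O(Q_a), independent of the
    FE dimension."""
    return alpha_bar * min(theta(mu) / theta(mu_bar) for theta in Theta)
```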

3 Strong-constraint 4D-Var

In this section, we consider the strong-constraint 4D-Var data assimilation problem. The extension to the weak-constraint case is considered in Sect. 4.

3.1 Problem statement

For a given parameter \(\mu \in \mathcal {D},\) the classical 4D-Var problem can be stated as the minimization problem

$$\begin{aligned} \begin{aligned}&\min _{y \in Y^K, \, u \in U} J(y,u;\mu ) \quad \text {s.t.} \quad y \in Y^K \quad \text {solves} \\&m(y^k,v) + \tau \, a(y^k,v;\mu ) = m(y^{k-1},v) + \tau f(v) \quad \forall v \in Y, \ \forall k \in \mathbb {K}, \end{aligned} \end{aligned}$$
(6)

with initial condition \(m(y^0,v) = m(u,v)\) for all \(v \in Y,\) and cost functional \(J(\cdot ,\cdot ;\mu ): Y^K \times U \rightarrow \mathbb {R}\) given by

$$\begin{aligned} J(y,u;\mu ) = \frac{1}{2} \left||u - u_d\right||_{U}^2 + \frac{\tau }{2} \sum _{k=1}^K \left||C y^k - z_d^k\right||^2_{D}. \end{aligned}$$
(7)

Here \(u_d \in U\) is the background state (also referred to as the prior), i.e., the best estimate of the true initial condition \(u \in U\) prior to measurements being available, and \(z_d^k \in D,\) \(k \in \mathbb {K},\) is the given data, e.g., observed outputs. The first term in the cost functional penalizes the deviation of the initial condition from the background state; the second term penalizes the deviation of the predicted outputs from the given data/observed outputs. The relative weight of the two terms is determined by the choice of the \((\cdot ,\cdot )_U\) and \((\cdot ,\cdot )_D\) inner products. Note that we use u for the unknown control/initial condition to signify the similarity to optimal control and the notation \(J(\cdot ,\cdot ;\mu )\) to indicate the implicit dependence of the cost functional J on the parameter \(\mu\) through the state y. However, to simplify the notation we often do not explicitly state the dependence of the state and control on the parameter, i.e., we use \(y^k\) and u instead of \(y^k(\mu )\) and \(u(\mu ),\) respectively.

We would like to point out that the first term in (7) represents a Tikhonov regularization of the cost functional and that the regularization parameter is “hidden” in the choice of the inner product. We refer to Engl et al. (1996) for regularization of inverse problems in general and to Puel (2009) for Tikhonov regularization in data assimilation. Furthermore, we note that the choice of the norm for the data misfit term depends on the characteristics of the noise and is inspired by Gaussian noise in this paper. Different noise characteristics may require a different choice of norm; we refer e.g. to Rao et al. (2017) for a discussion using \(L_1\) and Huber norms instead of the \(L_2\) norm. The approach presented in the following is restricted to the case of Gaussian noise.

Employing a Lagrangian approach, we obtain the associated necessary, and in our setting sufficient, first-order optimality conditions: Given \(\mu \in \mathcal {D},\) the optimal solution \((y^*,p^*,u^*) \in Y^K \times Y^K \times U\) satisfies

$$m(y^{*,k} - y^{*,k-1},\phi ) + \tau \, a ( y^{*,k},\phi ;\mu )= \tau \, f(\phi ) \quad \forall \phi \in Y, \ \forall k \in \mathbb {K},$$
(8a)
$$m(y^{*,0}, \phi )= m(u^*,\phi ) \quad \forall \phi \in Y,$$
(8b)
$$m(\varphi , p^{*,k} - p^{*,k+1}) + \tau \, a ( \varphi ,p^{*,k};\mu )= \tau \, \left( z_d^k - C y^{*,k}, C \varphi \right) _{D} \quad \forall \varphi \in Y, \ \forall k \in \mathbb {K},$$
(8c)
$$\left( u^{*} - u_d,\psi \right) _U - m(\psi , p^{*,1})= 0 \quad \forall \psi \in U,$$
(8d)

where the final condition of the adjoint is given by \(p^{*,K+1} = 0.\) Concerning the existence and uniqueness of solutions of the 4D-Var problem specifically and of saddle point problems in general, we refer to Bröcker (2017) and Benzi et al. (2005).

3.1.1 Algebraic formulation

The 4D-Var problem is usually stated using an algebraic formulation (Ide et al. 1997). We thus briefly outline the algebraic equivalent of (6) by introducing a basis for the finite element spaces Y and U such that \(Y = {{\mathrm{span}}}\{\phi _i^y, \, i = 1, \ldots , \mathcal {N}_Y\}\) and \(U = {{\mathrm{span}}}\{\phi _i^u, \, i = 1, \ldots , \mathcal {N}_U\},\) respectively. We express the state, adjoint, and control, respectively, as

$$\begin{aligned} y^k = \textstyle \sum \limits _{i = 1}^{\mathcal {N}_Y} y^k_i \phi _i^y, \qquad p^k = \textstyle \sum \limits _{i = 1}^{\mathcal {N}_Y} p^k_i \phi _i^y, \qquad u = \textstyle \sum \limits _{i = 1}^{\mathcal {N}_U} u_i \phi _i^u, \end{aligned}$$

and denote the corresponding coefficient vectors by \(\mathrm {y}^k = [y_1^k, \ldots , y_{\mathcal {N}_Y}^k]^T \in \mathbb {R}^{\mathcal {N}_Y},\) \(\mathrm {p}^k = [p_1^k, \ldots , p_{\mathcal {N}_Y}^k]^T \in \mathbb {R}^{\mathcal {N}_Y},\) and \(\mathrm {u}= [u_1, \ldots , u_{\mathcal {N}_U}]^T \in \mathbb {R}^{\mathcal {N}_U}.\) We thus obtain the algebraic formulation of the classical 4D-Var minimization problem

$$\begin{aligned} \begin{aligned}&\min J(\mathrm {y},\mathrm {u};\mu ) = \frac{1}{2} (\mathrm {u}- \mathrm {u}_d)^T \mathrm {U}(\mathrm {u}- \mathrm {u}_d) + \frac{\tau }{2} \sum _{k=1}^K \left( \mathrm {C}\mathrm {y}^k - \mathrm {z}_d^k\right) ^T \mathrm {D}\left( \mathrm {C}\mathrm {y}^k - \mathrm {z}_d^k\right) , \\&\text {s.t.} \quad \mathrm {y}^k \in \mathbb {R}^{\mathcal {N}_Y} \quad \text {solves}\quad \mathrm {M}\mathrm {y}^k + \tau \, \mathrm {A}(\mu ) \mathrm {y}^k = \mathrm {M}\mathrm {y}^{k-1} + \tau \mathrm {F}\quad \forall k \in \mathbb {K}, \\&\text {with\, initial\, condition}\quad \mathrm {M}\mathrm {y}^0 = \mathrm {M}_u \mathrm {u}. \end{aligned} \end{aligned}$$
(9)

Here \(\mathrm {M}\in \mathbb {R}^{\mathcal {N}_Y \times \mathcal {N}_Y},\) \(\mathrm {A}(\mu ) \in \mathbb {R}^{\mathcal {N}_Y \times \mathcal {N}_Y},\) \(\mathrm {F}\in \mathbb {R}^{\mathcal {N}_Y},\) and \(\mathrm {C}\in \mathbb {R}^{\ell \times \mathcal {N}_Y}\) are the usual finite element mass matrix, stiffness matrix, load vector, and state-to-output matrix with entries \(\mathrm {M}_{ij} = m(\phi _j^y,\phi _i^y),\) \(\mathrm {A}_{ij}(\mu ) = a(\phi _j^y,\phi _i^y;\mu ),\) \(\mathrm {F}_{i} = f(\phi _i^y),\) and \(\mathrm {C}_{ij} = h_i(\phi _j^y),\) respectively. The matrix \(\mathrm {M}_u \in \mathbb {R}^{\mathcal {N}_Y \times \mathcal {N}_U}\) is given by \((\mathrm {M}_u)_{ij} = m(\phi _j^u,\phi _i^y),\) and \(\mathrm {u}_d \in \mathbb {R}^{\mathcal {N}_U}\) denotes the coefficient vector of the background state \(u_d.\) Furthermore, the matrices \(\mathrm {U}\in \mathbb {R}^{\mathcal {N}_U \times \mathcal {N}_U}\) with entries \(\mathrm {U}_{ij} = (\phi _j^u,\phi _i^u)_U\) and \(\mathrm {D}\in \mathbb {R}^{\ell \times \ell }\) with entries \(\mathrm {D}_{ij} = (e_j,e_i)_D\) can be identified as the inverses of the background and observation error covariance matrices, respectively. Here, \(e_i\) denotes the ith unit vector in \(\mathbb {R}^{\ell }.\)
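To make the algebraic formulation (9) concrete, the following sketch implements the backward-Euler time stepping and the cost functional with dense NumPy matrices; names and data layout are hypothetical, and sparsity and factorization reuse are omitted for brevity.

```python
import numpy as np

def solve_forward(M, A, F, M_u, u, K, tau):
    """Time stepping of (9): (M + tau*A) y^k = M y^{k-1} + tau*F,
    with initial condition M y^0 = M_u u. Returns [y^1, ..., y^K]."""
    y = np.linalg.solve(M, M_u @ u)
    lhs = M + tau * A          # in practice: factorize once and reuse
    ys = []
    for _ in range(K):
        y = np.linalg.solve(lhs, M @ y + tau * F)
        ys.append(y)
    return ys

def cost(ys, u, u_d, z_d, C, U, D, tau):
    """4D-Var cost functional J from (9)."""
    du = u - u_d
    J = 0.5 * du @ (U @ du)                  # background misfit
    for y_k, z_k in zip(ys, z_d):
        r = C @ y_k - z_k
        J += 0.5 * tau * r @ (D @ r)         # data misfit
    return J
```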

The derivation and algebraic formulation of the optimality system (8) is standard and thus omitted for brevity. Further, in our problem setting the first-discretize-then-optimize and first-optimize-then-discretize strategies lead to the same algebraic formulation of the first-order optimality system. For more details on these two approaches, we refer to Hinze et al. (2009) and for time-dependent problems specifically to Stoll and Wathen (2013).
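For completeness, a sketch of the adjoint-based gradient of the reduced cost functional \(j(u;\mu ) = J(y(u),u;\mu )\) used by the Newton-CG method of Sect. 3.4: one backward sweep of the discrete adjoint equation, cf. (8c), followed by the gradient identity from (8d). This is an illustrative reconstruction consistent with the matrices defined above, not the authors' code.

```python
import numpy as np

def gradient(M, A, C, D, U, M_u, ys, z_d, u, u_d, K, tau):
    """Gradient of j(u) = J(y(u), u): backward adjoint sweep
    (M + tau*A)^T p^k = M^T p^{k+1} + tau * C^T D (z^k - C y^k), cf. (8c),
    then grad j(u) = U (u - u_d) - M_u^T p^1, cf. (8d)."""
    p = np.zeros(M.shape[0])          # final condition p^{K+1} = 0
    lhsT = (M + tau * A).T
    for k in reversed(range(K)):      # time steps K, ..., 1
        rhs = M.T @ p + tau * C.T @ (D @ (z_d[k] - C @ ys[k]))
        p = np.linalg.solve(lhsT, rhs)
    return U @ (u - u_d) - M_u.T @ p  # p now holds p^1
```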

3.2 Reduced basis approximation

We first assume that we are given the reduced basis spaces \(Y_N \subset Y\) for the state and adjoint, and \(U_N^0 \subset U\) for the control. Here, \(1 \le N \le N_{\mathrm{max}}\) is the number of iterations of the POD-Greedy sampling procedure used to construct the spaces \(Y_N\) and \(U_N^0\), discussed in Sect. 3.5. Note that the dimensions \(N_Y(N) := \dim (Y_N)\) and \(N_U^0(N) := \dim (U_N^0)\) of the reduced basis spaces depend on N but are in general not equal to N. Furthermore, the basis functions of \(Y_N\) and \(U_N^0\) are orthogonalized with respect to the \((\cdot ,\cdot )_Y\) and \((\cdot ,\cdot )_U\) inner products, respectively.

We next replace the finite element approximation of the PDE constraint in the 4D-Var problem statement (6) with its reduced basis approximation. For a given parameter \(\mu \in \mathcal {D}\), the reduced-order 4D-Var data assimilation problem can thus be stated as

$$\begin{aligned} \begin{aligned}&\min _{y_N \in Y_N^K, \, u_N \in U_N^0} J(y_N,u_N;\mu ) \quad \text {s.t.} \quad y_N \in Y_N^K \quad \text {solves} \\&m(y_N^k,v) + \tau \, a(y_N^k,v;\mu ) = m(y_N^{k-1},v) + \tau f(v) \quad \forall v \in Y_N, \ \forall k \in \mathbb {K}, \end{aligned} \end{aligned}$$
(10)

with initial condition \(m(y_N^0,v) = m(u_N,v)\) for all \(v \in Y_N\).

We can again employ a Lagrangian approach to obtain the reduced-order optimality system: Given any \(\mu \in \mathcal {D}\), the optimal solution \((y_N^*,p_N^*,u_N^*) \in Y_N^K \times Y_N^K \times U_N^0\) satisfies

$$m\left( y_N^{*,k} - y_N^{*,k-1},\phi \right) + \tau \, a \left( y_N^{*,k},\phi ;\mu \right)= \tau \, f(\phi )\quad\forall \phi \in Y_N, \ \forall k \in \mathbb {K},$$
(11a)
$$m\left( y_N^{*,0}, \phi \right)= m\left( u_N^*,\phi \right)\quad \forall \phi \in Y_N,$$
(11b)
$$m\left( \varphi , p_N^{*,k} - p_N^{*,k+1}\right) + \tau \, a \left( \varphi ,p_N^{*,k};\mu \right)= \tau \, \left( z_d^k - C y_N^{*,k}, C \varphi \right) _{D} \quad \forall \varphi \in Y_N, \ \forall k \in \mathbb {K},$$
(11c)
$$\left( u_N^{*} - u_d,\psi \right) _U - m\left( \psi , p_N^{*,1}\right)= 0\quad \forall \psi \in U_N^0,$$
(11d)

where the final condition of the adjoint is given by \(p_N^{*,K+1} = 0\). The reduced-order optimality system can be solved efficiently using an offline-online computational procedure which is briefly discussed in Sect. 3.4.

Note that we use a single reduced basis ansatz and test space for the state and adjoint equations for two reasons: first, a single space for state and adjoint guarantees the stability of the reduced-order optimality system (Gerner and Veroy 2012); and second, the reduced-order optimality system (11) reflects the reduced-order 4D-Var problem (10) only if the spaces of the state and adjoint equations are identical. Since the state and adjoint solutions need to be well-approximated using the single space \(Y_N,\) we combine snapshots of both the state and adjoint equations into the reduced basis space \(Y_N.\)

We also note that the dynamics of the state and adjoint often differ, so separate spaces for the state and adjoint would be beneficial for computational efficiency: the dimension of the state/adjoint reduced basis space, and thus the overall dimension of the reduced-order optimality system, would be considerably smaller. However, this requires a Petrov–Galerkin projection for the state and adjoint, with an associated detriment to stability.
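In matrix terms, the reduced-order operators are obtained by a one-time Galerkin projection of the full-order matrices onto basis matrices \(V_y\) (columns spanning \(Y_N\)) and \(V_u\) (columns spanning \(U_N^0\)); a hypothetical sketch of this offline step:

```python
def project_operators(Vy, Vu, M, A_q, F, C, U, M_u):
    """Offline Galerkin projection onto the reduced spaces. The affine
    components A^q are projected individually so that
    A_N(mu) = sum_q Theta_q(mu) * A_N^q can be formed online, cf. (3);
    all reduced quantities are independent of the FE dimension."""
    return {
        "M":   Vy.T @ M @ Vy,
        "A_q": [Vy.T @ Aq @ Vy for Aq in A_q],
        "F":   Vy.T @ F,
        "C":   C @ Vy,
        "U":   Vu.T @ U @ Vu,
        "M_u": Vy.T @ M_u @ Vu,
    }
```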

3.3 A posteriori error estimation

We turn to the a posteriori error estimation procedure. Although we consider a parametrized problem here, we note that the error bounds proposed below can also be used in the non-parametrized reduced-order setting and are independent of how the reduced-order spaces are constructed, i.e., the bound directly applies to reduced-order approaches where the spaces are constructed e.g. using empirical orthogonal functions, POD, or dual-weighted POD (Daescu and Navon 2008).

As mentioned above, our main goal is to rigorously bound the error in the optimal control, \(u^* - u_N^*\). This will allow us to confirm the fidelity of the reduced-order 4D-Var solution efficiently during the online stage. Our a posteriori error bounds are also crucial in the construction of the reduced basis spaces by the POD-Greedy algorithm (see Sect. 3.5).

To begin, we require the residuals

$$\begin{aligned} r_y^k(\phi ;\mu )&= f(\phi ) - a\left( y_N^{*,k},\phi ;\mu \right) - \frac{1}{\tau } m\left( y_N^{*,k}- y_N^{*,k-1},\phi \right) \quad \forall \phi \in Y, \ k \in \mathbb {K}, \end{aligned}$$
(12)
$$\begin{aligned} r_p^k(\varphi ;\mu )&= \left( z_d^k - C y_N^{*,k}, C \varphi \right) _D - a\left( \varphi ,p_N^{*,k};\mu \right) - \frac{1}{\tau } m\left( \varphi ,p_N^{*,k}-p_N^{*,k+1}\right) \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \forall \varphi \in Y, \ k \in \mathbb {K}, \end{aligned}$$
(13)
$$\begin{aligned} r_u(\psi ;\mu )&= m\left( \psi ,p_N^{*,1}\right) - \left( u_N^* - u_d, \psi \right) _U\quad \forall \psi \in U. \end{aligned}$$
(14)

We also define

$$\begin{aligned} R_y = \left( \tau \sum _{k=1}^K \left||r_y^k\right||_{Y'}^2 \right) ^{1/2}, \qquad R_p = \left( \tau \sum _{k=1}^K \left||r_p^k\right||_{Y'}^2 \right) ^{1/2}, \end{aligned}$$
(15)

and the errors \(e_y^k= y^{*,k} - y_N^{*,k}\), \(e_p^k= p^{*,k} - p_N^{*,k}\), and \(e_u = u^* - u_N^*\). Note that we use \(\left||r_{y,p}^k\right||_{Y'}\) and \(\left||r_u\right||_{U'}\) as a shorthand notation for \(\left||r_{y,p}^k(\cdot ;\mu )\right||_{Y'}\) and \(\left||r_u(\cdot ;\mu )\right||_{U'}\), respectively. We can now state our main result:

Proposition 1

Let \(u^*\) and \(u_N^*\) be the optimal solutions of the full-order and reduced-order 4D-Var problems, (6) and (10), respectively. The error satisfies

$$\begin{aligned} \left||u^* - u_N^*\right||_U \le {\varDelta }_N^u(\mu ) := c_1(\mu ) + \sqrt{c_1(\mu )^2 + c_2(\mu )} \quad \forall \mu \in \mathcal {D}, \end{aligned}$$
(16)

where \(c_1(\mu )\) and \(c_2(\mu )\) are given by

$$\begin{aligned} c_1(\mu )&= \frac{1}{2} \left( \left||r_u(\cdot ;\mu )\right||_{U'} + \frac{1}{\sqrt{\alpha _{\mathrm{LB}}(\mu )}} R_p \right) , \quad \text {and} \end{aligned}$$
(17)
$$\begin{aligned} \quad c_2(\mu )&= \left( \frac{\sqrt{2} + 1}{\alpha _{\mathrm{LB}}(\mu )} R_y R_p + \frac{\gamma _c^2}{2 (\alpha _{\mathrm{LB}}(\mu ))^2} R_y^2 \right) . \end{aligned}$$
(18)

Proof

We start from the error-residual equations obtained from (8) and the definitions of the residuals

$$\begin{aligned} m\left( e_y^k- e_y^{k-1},\phi \right) + \tau \, a\left( e_y^k,\phi ;\mu \right)&= \tau \, r_y^k(\phi ;\mu ), \qquad \qquad \forall \phi \in Y, \; k \in \mathbb {K}, \end{aligned}$$
(19)
$$\begin{aligned} m\left( \varphi ,e_p^k- e_p^{k+1}\right) + \tau \, a\left( \varphi ,e_p^k;\mu \right)&= \tau \, r_p^k(\varphi ;\mu ) -\tau \, \left( C e_y^k, C \varphi \right) _D, \nonumber \\&\quad \quad \,\,\,\,\qquad \qquad \qquad \qquad \qquad \forall \varphi \in Y, \; k \in \mathbb {K}, \end{aligned}$$
(20)
$$\begin{aligned} (e_u,\psi )_U- m\left( \psi ,e_p^1\right)&= r_u(\psi ;\mu ), \qquad \qquad \qquad \forall \psi \in U, \end{aligned}$$
(21)

where \(e_p^{K+1} = 0\) and \(e_y^0 = e_u\). We first choose \(\phi = e_p^k\) in (19) and take the sum from \(k=1\) to K to get

$$\begin{aligned} \sum _{k=1}^K m\left( e_y^k-e_y^{k-1},e_p^k\right) + \tau \sum _{k=1}^K a\left( e_y^k,e_p^k;\mu \right) = \tau \sum _{k=1}^K r_y^k\left( e_p^k;\mu \right) . \end{aligned}$$
(22)

Similarly, choosing \(\varphi = e_y^k\) in (20) and summing from \(k=1\) to K we obtain

$$\begin{aligned} \sum _{k=1}^K m\left( e_y^k,e_p^k-e_p^{k+1}\right) + \tau \sum _{k=1}^K a\left( e_y^k,e_p^k;\mu \right) = \tau \sum _{k=1}^K r_p^k\left( e_y^k;\mu \right) - \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2. \end{aligned}$$
(23)

Finally, from (21) with \(\psi = e_u\) we have

$$\begin{aligned} \left||e_u\right||_U^2 - m\left( e_u,e_p^1\right) = r_u(e_u;\mu ). \end{aligned}$$
(24)

By adding Eqs. (23) and (24), and then subtracting (22) we get

$$\begin{aligned}&\sum _{k=1}^K m\left( e_y^{k-1},e_p^k\right) - \sum _{k=1}^K m\left( e_y^k,e_p^{k+1}\right) - m(e_u,e_p^1) + \left||e_u\right||_U^2 \nonumber \\&\quad = -\,\tau \sum _{k=1}^K r_y^k\left( e_p^k;\mu \right) + \tau \sum _{k=1}^K r_p^k\left( e_y^k;\mu \right) +r_u(e_u;\mu ) - \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2. \end{aligned}$$
(25)

Since \(e_y^0=e_u\), and \(e_p^{K+1}=0\), the left-hand side of (25) reduces to \(\left||e_u\right||_U^2\) and we thus obtain

$$\begin{aligned} \left||e_u\right||_U^2 + \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2&= -\,\tau \sum _{k=1}^K r_y^k\left( e_p^k;\mu \right) + \tau \sum _{k=1}^K r_p^k\left( e_y^k;\mu \right) +r_u(e_u;\mu ) \nonumber \\&\le \left( \tau \sum _{k=1}^K \left||r_y^k\right||_{Y'}^2 \right) ^{1/2} \left( \tau \sum _{k=1}^K \left||e_p^k\right||_Y^2 \right) ^{1/2} \nonumber \\&\quad + \,\left( \tau \sum _{k=1}^K \left||r_p^k\right||_{Y'}^2 \right) ^{1/2} \left( \tau \sum _{k=1}^K \left||e_y^k\right||_Y^2 \right) ^{1/2} + \left||r_u\right||_{U'} \left||e_u\right||_U. \end{aligned}$$
(26)

From the proof for the spatio-temporal energy norm bound in Grepl and Patera (2005) and Kärcher and Grepl (2014) we know that

$$\begin{aligned} \tau \sum _{k=1}^K \left||e_y^k\right||_Y^2 \le \frac{\tau }{(\alpha _{\mathrm{LB}}(\mu ))^2} \sum _{k=1}^K \left||r_y^k\right||_{Y'}^2 + \frac{1}{\alpha _{\mathrm{LB}}(\mu )} \underbrace{m(e_u,e_u)}_{= \left||e_u\right||_U^2}. \end{aligned}$$
(27)

We need an analogous result for the adjoint. To this end, we first choose \(\varphi = e_p^k\) in (20) to obtain

$$\begin{aligned} m\left( e_p^k,e_p^k- e_p^{k+1}\right) + \tau \, a\left( e_p^k,e_p^k;\mu \right) = \tau \, r_p^k\left( e_p^k;\mu \right) - \tau \, \left( C e_y^k, C e_p^k\right) _D. \end{aligned}$$
(28)

We next note from the Cauchy–Schwarz inequality and Young’s inequality that

$$\begin{aligned} 2 \, m\left( e_p^k,e_p^{k+1}\right) \le m\left( e_p^k,e_p^k\right) + m\left( e_p^{k+1},e_p^{k+1}\right) , \end{aligned}$$
(29)

and also that

$$\begin{aligned} 2 \, \tau \, \left( C e_y^k, C e_p^k\right) _D&\le 2 \, \tau \, \left||C e_y^k\right||_D \, \left||C e_p^k\right||_D \le 2 \, \tau \, \left||C e_y^k\right||_D \, \gamma _c\, \left||e_p^k\right||_Y \nonumber \\&\le \frac{2 \, \tau \, \gamma _c^2}{\alpha _{\mathrm{LB}}(\mu )} \left||C e_y^k\right||_D^2 + \frac{\tau \, \alpha _{\mathrm{LB}}(\mu )}{2} \left||e_p^k\right||_Y^2, \end{aligned}$$
(30)

where we also used the definition of the constant \(\gamma _c\). Finally, again from Young’s inequality we obtain

$$\begin{aligned} 2 \, \tau \, r_p^k\left( e_p^k;\mu \right) \le \frac{2 \, \tau }{\alpha _{\mathrm{LB}}(\mu )} \left||r_p^k\right||_{Y'}^2 + \frac{\tau \, \alpha _{\mathrm{LB}}(\mu )}{2} \left||e_p^k\right||_Y^2. \end{aligned}$$
(31)

Multiplying (28) by two, summing from \(k=1\) to K, and invoking (29), (30), and (31), we obtain

$$\begin{aligned} m\left( e_p^1,e_p^1\right) + \tau \sum _{k=1}^K a\left( e_p^k,e_p^k;\mu \right) \le \frac{2 \, \tau }{\alpha _{\mathrm{LB}}(\mu )} \sum _{k=1}^K \left||r_p^k\right||_{Y'}^2 + \frac{2 \, \tau \, \gamma _c^2}{\alpha _{\mathrm{LB}}(\mu )} \sum _{k=1}^K \left||C e_y^k\right||_D^2, \end{aligned}$$
(32)

and hence

$$\begin{aligned} \tau \sum _{k=1}^K \left||e_p^k\right||_Y^2 \le \frac{2 \, \tau }{(\alpha _{\mathrm{LB}}(\mu ))^2} \sum _{k=1}^K \left||r_p^k\right||_{Y'}^2 + 2 \left( \frac{ \gamma _c}{\alpha _{\mathrm{LB}}(\mu )} \right) ^2 \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2. \end{aligned}$$
(33)

Using the inequalities (27) and (33) in (26), invoking the definitions (15), and noting that \((a^2 + b^2)^{1/2} \le |a| + |b|\), it follows that

$$\begin{aligned} \left||e_u\right||_U^2 + \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2&\le \left||r_u\right||_{U'} \left||e_u\right||_U+ R_p \left[ \frac{1}{(\alpha _{\mathrm{LB}}(\mu ))^2} R_y^2 + \frac{1}{\alpha _{\mathrm{LB}}(\mu )} \left||e_u\right||_U^2 \right] ^{1/2} \nonumber \\&\quad + R_y \left[ \frac{2}{(\alpha _{\mathrm{LB}}(\mu ))^2} R_p^2 + 2 \left( \frac{ \gamma _c}{\alpha _{\mathrm{LB}}(\mu )} \right) ^2 \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2 \right] ^{1/2} \nonumber \\&\le \left||r_u\right||_{U'} \left||e_u\right||_U+ R_p \left[ \frac{1}{\alpha _{\mathrm{LB}}(\mu )} R_y + \frac{1}{\sqrt{\alpha _{\mathrm{LB}}(\mu )}} \left||e_u\right||_U\right] \nonumber \\&\quad + R_y \left[ \frac{\sqrt{2}}{\alpha _{\mathrm{LB}}(\mu )} R_p + \frac{\sqrt{2} \, \gamma _c}{\alpha _{\mathrm{LB}}(\mu )} \left( \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2 \right) ^{1/2} \right] . \end{aligned}$$
(34)

We now use Young’s inequality to bound

$$\begin{aligned} R_y \frac{\sqrt{2} \, \gamma _c}{\alpha _{\mathrm{LB}}(\mu )} \left( \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2 \right) ^{1/2} \le \frac{\gamma _c^2}{2 (\alpha _{\mathrm{LB}}(\mu ))^2} R_y^2 + \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2, \end{aligned}$$
(35)

and thereby eliminate the second term on the left-hand side of the inequality (34) to obtain

$$\begin{aligned}&\left||e_u\right||_U^2 \le \left||r_u\right||_{U'} \left||e_u\right||_U+ \frac{1}{\sqrt{\alpha _{\mathrm{LB}}(\mu )}} R_p \left||e_u\right||_U\nonumber \\&\quad + \frac{\sqrt{2} + 1}{\alpha _{\mathrm{LB}}(\mu )} R_y R_p + \frac{\gamma _c^2}{2 (\alpha _{\mathrm{LB}}(\mu ))^2} R_y^2. \end{aligned}$$
(36)

Using the definitions of \(c_1(\mu )\) and \(c_2(\mu )\) in (17) and (18), respectively, (36) simplifies to

$$\begin{aligned} \left||e_u\right||_U^2 - 2 \, c_1(\mu ) \, \left||e_u\right||_U- c_2(\mu ) \le 0. \end{aligned}$$
(37)

We obtain the desired result by bounding the error \(\left||e_u\right||_{U}\) by the larger root of the quadratic inequality. \(\square\)

We note that we currently cannot assess the tightness of the error bound (16) by providing an a priori upper bound for the effectivity, i.e., the ratio of the bound to the error. We present numerical results for the effectivity in Sect. 6.2.

3.4 Computational procedure

We briefly comment on the computational procedure to solve the reduced-order 4D-Var problem and to evaluate the error bound. Given the affine parameter dependence, the offline-online decomposition for the reduced basis approximation is already quite standard in the reduced basis literature (Rozza et al. 2008); for the parabolic case considered in this paper, we also specifically refer to Grepl and Patera (2005) and Kärcher and Grepl (2014). The evaluation of the a posteriori error bounds requires the following ingredients:

  • The dual norm of the residuals \(\left||r_y^k\right||_{Y'}\), \(\left||r_p^k\right||_{Y'}\), and \(\left||r_u\right||_{U'}\);

  • The coercivity lower bound \(\alpha _{\mathrm{LB}}(\mu )\) and the constant \(\gamma _c\).

For the construction of the coercivity lower bound, \(\alpha _{\mathrm{LB}}(\mu )\), various recipes exist (Huynh et al. 2007; Prud’homme et al. 2002; Veroy et al. 2002). The specific choices for our numerical tests are stated in Sect. 6. The constant \(\gamma _c\) is parameter-independent and can be computed by solving a generalized eigenproblem. The offline-online evaluation of the dual norms of the residuals is standard and hence omitted (Rozza et al. 2008). For a summary of the computational cost in the parabolic optimal control context, we refer to Kärcher and Grepl (2014).
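For instance, with \(\mathrm {Y}\) denoting the Gram matrix of the \((\cdot ,\cdot )_Y\) inner product, \(\gamma _c\) is the square root of the largest eigenvalue of the generalized eigenproblem \(\mathrm {C}^T \mathrm {D}\mathrm {C}\, v = \lambda \, \mathrm {Y}v\); a dense sketch with hypothetical names:

```python
import numpy as np
from scipy.linalg import eigh

def continuity_constant(C, D, Ygram):
    """gamma_c = sup_v ||C v||_D / ||v||_Y, cf. (4): square root of the
    largest eigenvalue of C^T D C v = lambda * Ygram v."""
    lam = eigh(C.T @ D @ C, Ygram, eigvals_only=True)  # ascending order
    return np.sqrt(lam[-1])
```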

We solve the full-order and reduced-order 4D-Var problems with a preconditioned Newton-CG method on the “reduced” cost functional \(j(u;\mu ) := J(y(u),u;\mu )\), i.e., we eliminate the PDE-constraint in the minimization problem. The control mass matrix is used as a preconditioner. We present results for the number of CG iterations in Sect. 6. Overall, the online computational cost to solve the reduced-order 4D-Var problem and to evaluate the a posteriori error bound depends only on the reduced basis dimensions \(N_Y\) and \(N_U^0\), and is independent of \(\mathcal {N}\).
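Once the residual dual norms and constants are available, evaluating the bound of Proposition 1 reduces to a few scalar operations; a minimal sketch of (16)-(18):

```python
import math

def delta_N_u(Ru, Ry, Rp, alpha_lb, gamma_c):
    """Control error bound of Proposition 1: Ru = ||r_u||_{U'}, and
    Ry, Rp are the aggregated residual norms from (15)."""
    c1 = 0.5 * (Ru + Rp / math.sqrt(alpha_lb))
    c2 = ((math.sqrt(2) + 1) / alpha_lb * Ry * Rp
          + gamma_c**2 / (2 * alpha_lb**2) * Ry**2)
    return c1 + math.sqrt(c1**2 + c2)   # larger root of (37)
```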

3.5 Greedy algorithm

To construct the reduced basis spaces \(Y_N\) and \(U_N^0\), we use the POD-Greedy sampling procedure in Algorithm 1. Here, \({\varXi }_\mathrm {train}\subset \mathcal {D}\) is a finite but suitably large training sample, \(\mu ^1 \in {\varXi }_\mathrm {train}\) is the initial parameter value, \(N_{\mathrm{max}}\) the maximum number of greedy iterations, and \(\epsilon _\mathrm {tol, min}> 0\) a prescribed error tolerance. We also define the relative error bound \({\varDelta }_{N,\mathrm{rel}}^u(\mu ) = {\varDelta }_N^u(\mu )/\left||u_N^*(\mu )\right||_U\). Furthermore, for a given time history \(v_k \in Y, \ k \in \mathbb {K}\), the operator \(\text {POD}_Y(\{ v_k: k \in \mathbb {K}\})\) returns the largest POD-mode with respect to the \((\cdot ,\cdot )_Y\) inner product (normalized with respect to the Y-norm), and \(v^k_{{\text {proj}},N}(\mu )\) denotes the Y-orthogonal projection of \(v^k(\mu )\) onto the reduced basis space \(Y_N\).

In steps 6 and 7 of Algorithm 1 we expand the reduced basis space \(Y_N\) with the largest POD mode of both the state and the adjoint solution. Note that we apply the POD in these two steps to the time history of the optimal state and adjoint projection errors, i.e., \(e^{y,k}_{\text {proj},N}(\mu ) = y^{*,k}(\mu ) - y^{*,k}_{\text {proj},N}(\mu )\) and \(e^{p,k}_{\text {proj},N}(\mu ) = p^{*,k}(\mu ) - p^{*,k}_{\text {proj},N}(\mu ), \ k \in \mathbb {K}\), and not to the solutions \(y^k(\mu ),\ k \in \mathbb {K}\), and \(p^k(\mu ),\ k \in \mathbb {K}\). This ensures that the POD modes are already orthogonal with respect to the \((\cdot ,\cdot )_Y\) inner product and that we add only new information to \(Y_N\) which is not yet captured in the reduced basis.

In step 8 we expand the reduced basis space \(U_N^0\) with the optimal control at \(\mu ^*\). Due to the time-dependence of the state and adjoint, it is possible that a specific parameter \(\tilde{\mu }\) is picked several times by the greedy search in step 9. Before expanding \(U_N^0\), we thus need to check if the new snapshot is already contained in the reduced basis space \(U_{N-1}^0\), and consequently discard linearly dependent snapshots. By construction, we thus have \(\dim (U_N^0) \le N\) and \(\dim (Y_N) = 2N\) (although it is theoretically possible that \(\dim (Y_N) < 2N\), we did not observe this case in the numerical results). Finally, we note that information from the data assimilation cost functional enters \(Y_N\) through the adjoint equation and the adjoint snapshots.

[Algorithm 1: POD-Greedy sampling procedure for the strong-constraint 4D-Var problem; the listing is not reproduced here.]
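The core operation in steps 6 and 7, namely the largest POD mode of the projection-error time history with respect to a given inner product, can be sketched via the method of snapshots as follows; the same routine with the U Gram matrix serves as the operator \(\text {POD}_U\) of Sect. 4.4. All names are hypothetical.

```python
import numpy as np

def largest_pod_mode(snapshots, basis, gram):
    """Return the dominant POD mode (w.r.t. the gram inner product) of the
    projection errors of the snapshot time history, cf. steps 6-7 of
    Algorithm 1. snapshots: (n, K) array; basis: (n, m) array with
    gram-orthonormal columns; assumes the projection error is nonzero."""
    if basis.shape[1] > 0:   # subtract the gram-orthogonal projection
        snapshots = snapshots - basis @ (basis.T @ (gram @ snapshots))
    corr = snapshots.T @ (gram @ snapshots)      # K x K correlation matrix
    lam, V = np.linalg.eigh(corr)                # ascending eigenvalues
    mode = snapshots @ V[:, -1]
    return mode / np.sqrt(mode @ (gram @ mode))  # normalize in the gram norm
```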

4 Weak-constraint 4D-Var

We next consider the weak-constraint 4D-Var data assimilation problem, thus accounting for possible model errors in the dynamical system. For simplicity, we assume in this section that the initial condition is known and that we are interested only in bounding the model error. We consider the combined problem (unknown initial condition and model error) in the next section.

4.1 Problem statement

To emphasize the relation between the weak-constraint 4D-Var problem and the optimal control setting, we denote in this section the model error by u. However, the model error is now time-dependent, i.e., \(u = u^k, \, k \in \mathbb {K}\), and appears in every time step of the dynamical system. For a given parameter \(\mu \in \mathcal {D}\), the weak-constraint 4D-Var problem is then given by the minimization problem

$$\begin{aligned} \begin{aligned}&\min _{y \in Y^K, \, u \in U^K} J(y,u;\mu ) \quad \text {s.t.} \quad y \in Y^K \quad \text {solves} \\&m(y^k,v) + \tau \, a(y^k,v;\mu ) = m(y^{k-1},v) + \tau \, b(u^k,v) + \tau \, f(v) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \forall v \in Y, \ \forall k \in \mathbb {K}, \end{aligned} \end{aligned}$$
(38)

with initial condition \(m(y^0,v) = m(y_0,v)\) for all \(v \in Y,\) and cost functional \(J(\cdot ,\cdot ;\mu ): Y^K \times U^K \rightarrow \mathbb {R}\) given by

$$\begin{aligned} J(y,u;\mu ) = \frac{\tau }{2} \sum _{k=1}^K \left||u^k - u_d^k\right||_{U}^2 + \frac{\tau }{2} \sum _{k=1}^K \left||C y^k - z_d^k\right||^2_{D}. \end{aligned}$$
(39)

We note that the cost functional now contains the contribution of the model error \(u^k\) as a sum over all time steps. In the optimal control setting, \(u_d^k \in U, \, k \in \mathbb {K}\) denotes the desired optimal control. In the data assimilation setting, however, \(u_d^k\) is usually set to zero since the model error is generally assumed to be unbiased (Law et al. 2015). We also note that a constant (known) bias can be taken into account by adjusting the right-hand side f(v). Similar to the strong-constraint formulation, \(z_d^k \in D\), \(k \in \mathbb {K}\), are the observed outputs.
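Compared to the strong-constraint forward solve sketched in Sect. 3.1.1, only the right-hand side of each time step changes; a minimal sketch of the constraint in (38), with a hypothetical matrix B representing the bilinear form \(b(\cdot ,\cdot )\):

```python
import numpy as np

def solve_forward_weak(M, A, B, F, y0, us, tau):
    """Weak-constraint time stepping, cf. (38):
    (M + tau*A) y^k = M y^{k-1} + tau*B u^k + tau*F,
    with known initial state y0 and model-error trajectory us = [u^1..u^K]."""
    lhs = M + tau * A
    y, ys = y0, []
    for u_k in us:
        y = np.linalg.solve(lhs, M @ y + tau * (B @ u_k) + tau * F)
        ys.append(y)
    return ys
```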

We again obtain the associated necessary and sufficient first-order optimality conditions using a Lagrangian approach: Given \(\mu \in \mathcal {D}\), the optimal solution \((y^*,p^*,u^*) \in Y^K \times Y^K \times U^K\) satisfies

$$m\left( y^{*,k} - y^{*,k-1},\phi \right) + \tau \, a ( y^{*,k},\phi ;\mu )= \tau \, b(u^{*,k},\phi ) + \tau \, f(\phi ) \quad \forall \phi \in Y, \ \forall k \in \mathbb {K},$$
(40a)
$$m(y^{*,0},\phi )= m(y_0,\phi )\quad \forall \phi \in Y,$$
(40b)
$$m(\varphi , p^{*,k} - p^{*,k+1}) + \tau \, a (\varphi ,p^{*,k};\mu )= \tau \, \left( z_d^k - C y^{*,k}, C \varphi \right) _{D} \quad \forall \varphi \in Y, \ \forall k \in \mathbb {K},$$
(40c)
$$\tau \, \left( u^{*,k} - u_d^k,\psi \right) _U - \tau \, b(\psi , p^{*,k})= 0\quad \forall \psi \in U, \ \forall k \in \mathbb {K},$$
(40d)

where the final condition of the adjoint is given by \(p^{*,K+1} = 0\). We note that the adjoint equation of the weak-constraint formulation (40c) is identical to the adjoint of the strong constraint formulation (8c).

4.2 Reduced basis approximation

We again assume that we are given the reduced basis spaces \(Y_N \subset Y\) for the state and adjoint and \(U_N \subset U\) for the control. Whereas the construction of the space \(Y_N\) directly follows from the discussion in Sect. 3.5 for the strong-constraint case, the construction of \(U_N\) needs to be adjusted to account for the time-dependence of the model error. We briefly outline the procedure in Sect. 4.4.

For a given parameter \(\mu \in \mathcal {D}\), we can now state the weak-constraint reduced-order 4D-Var data assimilation problem as follows

$$\begin{aligned} \begin{aligned}&\min _{y_N \in Y_N^K, \, u_N \in U_N^K} J(y_N,u_N;\mu ) \quad \text {s.t.} \quad y_N \in Y_N^K \quad \text {solves} \\&m\left( y_N^k,v\right) + \tau \, a\left( y_N^k,v;\mu \right) = m\left( y_N^{k-1},v\right) + \tau \, b\left( u_N^k,v\right) + \tau \, f(v) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \forall v \in Y_N, \ \forall k \in \mathbb {K}, \end{aligned} \end{aligned}$$
(41)

with initial condition \(m(y_N^0,v) = m(y_0,v)\) for all \(v \in Y_N\). The reduced-order optimality system directly follows from (40) and is thus omitted.

4.3 A posteriori error estimation

We first introduce the residuals for the weak-constraint case

$$\begin{aligned} \tilde{r}_y^k(\phi ;\mu )&= f(\phi ) + b\left( u_N^{*,k},\phi \right) - a\left( y_N^{*,k},\phi ;\mu \right) - \frac{1}{\tau } m\left( y_N^{*,k}- y_N^{*,k-1},\phi \right) \nonumber \\&\qquad \qquad \qquad \qquad \qquad \forall \phi \in Y, \ k \in \mathbb {K}, \end{aligned}$$
(42)
$$\begin{aligned} \tilde{r}_p^k(\varphi ;\mu )&= \left( z_d^k - C y_N^{*,k}, C \varphi \right) _D - a\left( \varphi ,p_N^{*,k};\mu \right) - \frac{1}{\tau } m\left( \varphi ,p_N^{*,k}-p_N^{*,k+1}\right) \nonumber \\&\qquad \qquad \qquad \forall \varphi \in Y, \ k \in \mathbb {K}, \end{aligned}$$
(43)
$$\begin{aligned} \tilde{r}_u^k(\psi ;\mu )&= m\left( \psi ,p_N^{*,k}\right) - \left( u_N^{*,k}- u_d, \psi \right) _U\quad \forall \psi \in U, \ k \in \mathbb {K}. \end{aligned}$$
(44)

Since the adjoint equations (40c) and (8c) are identical, the adjoint residual is actually equivalent to the strong-constraint case, i.e., \(r_p^k= \tilde{r}_p^k\). Similar to (15), we introduce the sums from \(k =1\) to K of the dual norms of the residuals as

$$\begin{aligned} \tilde{R}_{y,p} = \left( \tau \sum _{k=1}^K \left||\tilde{r}_{y,p}^k(\cdot ;\mu )\right||_{Y'}^2 \right) ^{1/2}, \qquad \tilde{R}_u = \left( \tau \sum _{k=1}^K \left||\tilde{r}_u^k(\cdot ;\mu )\right||_{U'}^2 \right) ^{1/2}, \end{aligned}$$
(45)

and the time-dependent model error \(e_u^k= u^{*,k} - u_N^{*,k}\). We may now state our main result:

Proposition 2

Let \(u^{*,k}\) and \(u_N^{*,k}\), \(k \in \mathbb {K}\), be the optimal solutions of the full-order and reduced-order 4D-Var problems (38) and (41), respectively. The error satisfies

$$\begin{aligned} \left( \tau \sum _{k=1}^{K} \left||u^{*,k} - u_N^{*,k}\right||_U^2 \right) ^{1/2} \le \tilde{{\varDelta }}_N^u(\mu ) := c_1(\mu ) + \sqrt{c_1(\mu )^2 + c_2(\mu )} \quad \forall \mu \in \mathcal {D}, \end{aligned}$$
(46)

where \(c_1(\mu )\) and \(c_2(\mu )\) are given by

$$\begin{aligned} c_1(\mu )&= \frac{1}{2} \left( \tilde{R}_u + \frac{\sqrt{2} \, \gamma _b}{\alpha _{\mathrm{LB}}(\mu )} \tilde{R}_p \right) , \quad \text {and} \end{aligned}$$
(47)
$$\begin{aligned} \quad c_2(\mu )&= \left( \frac{2\sqrt{2}}{\alpha _{\mathrm{LB}}(\mu )} \tilde{R}_y \tilde{R}_p + \frac{\gamma _c^2}{2 (\alpha _{\mathrm{LB}}(\mu ))^2} \tilde{R}_y^2 \right) . \end{aligned}$$
(48)

Proof

The proof follows partly from the proof of Proposition 1; we thus stress the differences and refer to the previous proof whenever possible. We again start from the error-residual equations which are now given by

$$m\left( e_y^k- e_y^{k-1},\phi \right) + \tau \, a\left( e_y^k,\phi ;\mu \right)= \tau \, \tilde{r}_y^k(\phi ;\mu ) + \tau \, b(e_u^k,\phi ), \quad\forall \phi \in Y, \; k \in \mathbb {K},$$
(49)
$$m\left( \varphi ,e_p^k- e_p^{k+1}\right) + \tau \, a\left( \varphi ,e_p^k;\mu \right)= \tau \, \tilde{r}_p^k(\varphi ;\mu ) -\tau \, \left( C e_y^k, C \varphi \right) _D, \quad \forall \varphi \in Y, \; k \in \mathbb {K},$$
(50)
$$\tau \, \left( e_u^k,\psi \right) _U- \tau \, b\left( \psi ,e_p^k\right)= \tau \, \tilde{r}_u^k(\psi ;\mu ),\quad \forall \psi \in U, \; k \in \mathbb {K},$$
(51)

where \(e_p^{K+1} = 0\) and \(e_y^0 = 0\), since we guarantee that \(y_0 \in Y_N\). We now choose \(\phi = e_p^k\) in (49), \(\varphi = e_y^k\) in (50), and \(\psi = e_u^k\) in (51), sum all equations from \(k=1\) to K, and combine them following the proof of Proposition 1 to obtain

$$\begin{aligned} \tau \sum _{k=1}^{K}&\left||e_u^k\right||_U^2 + \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2 \nonumber \\&= -\,\tau \sum _{k=1}^K \tilde{r}_y^k\left( e_p^k;\mu \right) + \tau \sum _{k=1}^K \tilde{r}_p^k\left( e_y^k;\mu \right) + \tau \sum _{k=1}^K \tilde{r}_u^k\left( e_u^k;\mu \right) \nonumber \\&\le \tilde{R}_y \left( \tau \sum _{k=1}^K \left||e_p^k\right||_Y^2 \right) ^{1/2} + \tilde{R}_p \left( \tau \sum _{k=1}^K \left||e_y^k\right||_Y^2 \right) ^{1/2} + \tilde{R}_u \left( \tau \sum _{k=1}^K \left||e_u^k\right||_U^2 \right) ^{1/2}. \end{aligned}$$
(52)

We next bound the primal error. Since the primal equation contains the model error on the right-hand side, we need to extend the proof from Grepl and Patera (2005) for the spatio-temporal energy norm bound to include the extra term on the right-hand side. The derivation is similar to the one for the bound of the adjoint in the proof of Proposition 1 [cf. (28)–(33)], but instead of bounding the \((\cdot ,\cdot )_D\) inner product using Cauchy–Schwarz and the constant \(\gamma _c\), we invoke the continuity of the bilinear form \(b(\cdot ,\cdot )\). We can thus derive the bound

$$\begin{aligned} \tau \sum _{k=1}^K \left||e_y^k\right||_Y^2 \le \frac{2 \, \tau }{(\alpha _{\mathrm{LB}}(\mu ))^2} \sum _{k=1}^K \left||\tilde{r}_y^k\right||_{Y'}^2 + 2 \left( \frac{ \gamma _b}{\alpha _{\mathrm{LB}}(\mu )} \right) ^2 \tau \sum _{k=1}^K \left||e_u^k\right||_U^2. \end{aligned}$$
(53)

Furthermore, since the adjoint of the strong- and weak-constraint case are equivalent, we can directly use the bound (33). Using the inequalities (53) and (33) in (52), invoking the definitions (45), and noting that \((a^2 + b^2)^{1/2} \le |a| + |b|\), it follows that

$$\begin{aligned} \tau \sum _{k=1}^{K} \left||e_u^k\right||_U^2&+ \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2 \nonumber \\&\le \left[ \tilde{R}_u + \frac{\sqrt{2} \, \gamma _b}{\alpha _{\mathrm{LB}}(\mu )} \tilde{R}_p \right] \Big ( \tau \sum _{k=1}^K \left||e_u^k\right||_U^2 \Big )^{1/2} \nonumber \\&\quad +\frac{2 \sqrt{2}}{\alpha _{\mathrm{LB}}(\mu )} \tilde{R}_y \tilde{R}_p + \frac{\sqrt{2} \, \gamma _c}{\alpha _{\mathrm{LB}}(\mu )} \tilde{R}_y \left( \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2 \right) ^{1/2}. \end{aligned}$$
(54)

We again use Young’s inequality to bound

$$\begin{aligned} \tilde{R}_y \frac{\sqrt{2} \, \gamma _c}{\alpha _{\mathrm{LB}}(\mu )} \left( \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2 \right) ^{1/2} \le \frac{\gamma _c^2}{2 (\alpha _{\mathrm{LB}}(\mu ))^2} \tilde{R}_y^2 + \tau \sum _{k=1}^K \left||C e_y^k\right||_D^2, \end{aligned}$$
(55)

and thereby eliminate the second term on the left-hand side of (54) to obtain

$$\begin{aligned}&\tau \sum _{k=1}^{K} \left||e_u^k\right||_U^2 \le \left[ \tilde{R}_u + \frac{\sqrt{2} \, \gamma _b}{\alpha _{\mathrm{LB}}(\mu )} \tilde{R}_p \right] \left( \tau \sum _{k=1}^K \left||e_u^k\right||_U^2 \right) ^{1/2} \nonumber \\&\quad + \frac{2 \sqrt{2}}{\alpha _{\mathrm{LB}}(\mu )} \tilde{R}_y \tilde{R}_p + \frac{\gamma _c^2}{2 (\alpha _{\mathrm{LB}}(\mu ))^2} \tilde{R}_y^2. \end{aligned}$$
(56)

Using the definitions of \(c_1(\mu )\) and \(c_2(\mu )\) in (47) and (48), respectively, we obtain

$$\begin{aligned} \tau \sum _{k=1}^{K} \left||e_u^k\right||_U^2 - 2 \, c_1(\mu ) \, \left( \tau \sum _{k=1}^K \left||e_u^k\right||_U^2 \right) ^{1/2} - c_2(\mu ) \le 0. \end{aligned}$$
(57)

The desired result follows again by using the larger root of the quadratic inequality as a bound for the error. \(\square\)

The offline-online computational procedure in the weak-constraint case is analogous to the strong-constraint case discussed in Sect. 3.4 and therefore omitted. Note that we additionally require the constant \(\gamma _b\) now, which is parameter-independent and can be computed by solving a generalized eigenproblem (similar to \(\gamma _c\)). For the Newton–CG method, we use the block-diagonal matrix \(\mathrm{blkdiag}(\tau \mathrm {M}, \ldots , \tau \mathrm {M})\) as a preconditioner.

Similar to the strong-constraint case, we again cannot assess the tightness of the error bound (46) by providing an a priori upper bound for the associated effectivity. Instead, we present numerical results for the weak-constraint case also in Sect. 6.2.

4.4 Greedy algorithm

The POD-Greedy sampling procedure to construct the reduced basis spaces \(Y_N\) and \(U_N\) in the weak-constraint case is very similar to the strong-constraint case. We summarize the procedure in Algorithm 2 and comment only on the differences.

First, since we assume in this section that the initial condition \(y_0\) is known, we initialize the reduced basis space \(Y_N\) with \(y_0/\Vert y_0\Vert _Y\). Second, we additionally require the operator \(\text {POD}_U(\{ v_k: k \in \mathbb {K}\})\), which returns the largest POD mode with respect to the \((\cdot ,\cdot )_U\) inner product (and normalized with respect to the U-norm). Also, \(v^k_{{\text {proj}_U},N}(\mu )\) denotes the U-orthogonal projection of \(v^k(\mu )\) onto the reduced basis space \(U_N\), and \(e^{u,k}_{\text {proj}_U,N}(\mu ) = u^{*,k}(\mu ) - u^{*,k}_{\text {proj}_U,N}(\mu ), \ k \in \mathbb {K}\), denotes the corresponding projection error of the optimal model-error forcing. Since the model-error forcing is time-dependent, we simply replace step 8 in Algorithm 1 with a POD step and add only the largest POD mode \(\zeta\) to \(U_N\). We note that the POD modes \(\zeta\) are orthogonal with respect to the \((\cdot ,\cdot )_U\) inner product and that we now usually have \(\dim (U_N) = N\) and \(\dim (Y_N) = 2N + 1\) (due to the initial condition), i.e., the reduced basis space \(U_N\) is enriched in every greedy step. Again, it is theoretically possible that \(\dim (U_N) < N\) and \(\dim (Y_N) < 2N + 1\), although we did not observe this case in the numerical results.

[Algorithm 2: POD-Greedy sampling procedure for the weak-constraint 4D-Var problem; the listing is not reproduced here.]

5 Combined 4D-Var formulation

We now combine the results from the previous two sections and consider the classical 4D-Var data assimilation problem including model error.

5.1 Problem statement

For a given parameter \(\mu \in \mathcal {D}\), we now consider the minimization problem

$$\begin{aligned} \begin{aligned}&\min _{y \in Y^K, \, u \in U^{K+1}} J(y,u;\mu ) \quad \text {s.t.} \quad y \in Y^K \quad \text {solves} \\&m(y^k,v) + \tau \, a(y^k,v;\mu ) = m(y^{k-1},v) + \tau \, b(u^k,v) + \tau \, f(v) \\&\qquad \qquad \qquad \qquad \qquad \qquad \forall v \in Y, \ \forall k \in \mathbb {K}, \end{aligned} \end{aligned}$$
(58)

with initial condition \(m(y^0,v) = m(u^0,v)\) for all \(v \in Y,\) and cost functional \(J(\cdot ,\cdot ;\mu ): Y^K \times U^{K+1} \rightarrow \mathbb {R}\) given by

$$\begin{aligned} J(y,u;\mu ) = \frac{1}{2} \left||u^0 - u_d^0\right||_{U}^2 + \frac{\tau }{2} \sum _{k=1}^K \left||u^k - u_d^k\right||_{U}^2 + \frac{\tau }{2} \sum _{k=1}^K \left||C y^k - z_d^k\right||^2_{D}. \end{aligned}$$
(59)

In addition to the error between the predicted and observed outputs, the cost functional now contains the deviation of the initial condition from the background state, \(u_d^0 \in U,\) as well as the model error for all time steps. As mentioned earlier, in the data assimilation context we usually have \(u_d^0 \ne 0\) and \(u_d^k = 0, \, 1 \le k \le K\), i.e., the background state is nonzero whereas the model error is assumed to have zero mean.

The associated necessary and sufficient first-order optimality conditions are thus: Given \(\mu \in \mathcal {D}\), the optimal solution \((y^*,p^*,u^*) \in Y^K \times Y^K \times U^{K+1}\) satisfies

$$m\left( y^{*,k} - y^{*,k-1},\phi \right) + \tau \, a ( y^{*,k},\phi ;\mu )= \tau \, b(u^{*,k},\phi ) + \tau \, f(\phi ) \quad \forall \phi \in Y, \ \forall k \in \mathbb {K},$$
(60a)
$$m(y^{*,0},\phi )= m(u^{*,0},\phi )\quad \forall \phi \in Y,$$
(60b)
$$m(\varphi , p^{*,k} - p^{*,k+1}) + \tau \, a (\varphi ,p^{*,k};\mu )= \tau \, \left( z_d^k - C y^{*,k}, C \varphi \right) _{D} \quad\forall \varphi \in Y, \ \forall k \in \mathbb {K},$$
(60c)
$$\tau \, \left( u^{*,k} - u_d^k,\psi \right) _U - \tau \, b(\psi , p^{*,k})= 0\quad\forall \psi \in U, \ \forall k \in \mathbb {K},$$
(60d)
$$\left( u^{*,0} - u_d^0,\psi \right) _U - m(\psi , p^{*,1})= 0\quad\forall \psi \in U,$$
(60e)

where the final condition of the adjoint is given by \(p^{*,K+1} = 0\).

5.2 Reduced basis approximation and error estimation

The reduced-order problem follows directly from (58) and (59) by restricting the state, adjoint, and control spaces to their respective reduced basis spaces. We again introduce an integrated space \(Y_N\) for the state and adjoint, and two separate spaces for the “control,” i.e., \(U_N^0\) for the initial condition \(u_N^0\) and \(U_N\) for the model error \(u_N^k, \, k \in \mathbb {K}\). The greedy procedure to generate these spaces simply combines the algorithms introduced in Sects. 3.5 and 4.4.

For any given \(\mu \in \mathcal {D}\), we can now state the reduced-order minimization problem as follows

$$\begin{aligned} \begin{aligned}&\min _{y_N \in Y_N^K, \, u_N \in U_N^0 \times U_N^K} J(y_N,u_N;\mu ) \quad \text {s.t.} \quad y_N \in Y_N^K \quad \text {solves} \\&m\left( y_N^k,v\right) + \tau \, a\left( y_N^k,v;\mu \right) = m\left( y_N^{k-1},v\right) + \tau \, b\left( u_N^k,v\right) + \tau \, f(v) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \forall v \in Y_N, \ \forall k \in \mathbb {K}, \end{aligned} \end{aligned}$$
(61)

with initial condition \(m(y_N^0,v) = m(u_N^0,v)\) for all \(v \in Y_N\). The reduced-order optimality system directly follows from (60) and is thus omitted.
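The reduced-order operators follow by standard Galerkin projection onto the reduced bases; a minimal sketch, where `Zy` and `Zu` are hypothetical basis matrices whose columns span \(Y_N\) and \(U_N\):

```python
def project_operators(Zy, Zu, A, M, B):
    """Offline Galerkin projection of the full-order operators.
    In practice this is applied to each term of the affine decomposition
    of a(.,.;mu), so the reduced operators can be assembled online."""
    A_N = Zy.T @ (A @ Zy)  # reduced stiffness (per affine term)
    M_N = Zy.T @ (M @ Zy)  # reduced mass matrix
    B_N = Zy.T @ (B @ Zu)  # reduced control-to-state coupling
    return A_N, M_N, B_N
```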

The a posteriori error bound result is a combination of the strong- and weak-constraint case. In addition to the residuals of the state \(\tilde{r}_y^k\), adjoint \(\tilde{r}_p^k\), and model error \(\tilde{r}_u^k\) defined in (42), (43), and (44), we also require the residual

$$\begin{aligned} r_u^0(\psi ;\mu ) = m\left( \psi ,p_N^{*,1}\right) - \left( u_N^{*,0} - u_d^0, \psi \right) _U\quad \forall \psi \in U. \end{aligned}$$
(62)
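The dual norm \(\left||r_u^0(\cdot ;\mu )\right||_{U'}\) (and likewise the dual norms of the other residuals) is computed via its Riesz representative; a minimal sketch with hypothetical names, where `r` is the coefficient vector of the residual functional:

```python
import numpy as np
from scipy.sparse.linalg import spsolve

def dual_norm(r, M_U):
    """||r||_{U'} via the Riesz representative: solve M_U rho = r,
    then ||r||_{U'} = ||rho||_U = sqrt(r^T rho). In the offline-online
    setting this solve is precomputed, and the online evaluation
    reduces to a small quadratic form."""
    rho = spsolve(M_U.tocsc(), r)
    return np.sqrt(r @ rho)
```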

The a posteriori error bound is given in the following proposition.

Proposition 3

Let \(u^{*,k}\) and \(u_N^{*,k}\) be the optimal solutions of the full-order and reduced-order 4D-Var problems (58) and (61), respectively. The error satisfies

$$\begin{aligned}&\left( \left||u^{*,0} - u_N^{*,0}\right||_U^2 + \tau \, \sum _{k=1}^{K} \left||u^{*,k} - u_N^{*,k}\right||_U^2 \right) ^{1/2} \nonumber \\&\quad \le \hat{{\varDelta }}_N^u(\mu ) := c_1(\mu ) + \sqrt{c_1(\mu )^2 + c_2(\mu )} \quad \forall \mu \in \mathcal {D}, \end{aligned}$$
(63)

where \(c_1(\mu )\) and \(c_2(\mu )\) are given by

$$\begin{aligned} c_1(\mu ) = \frac{1}{2} \left( \left( \left||r_u^0(\cdot ;\mu )\right||_{U'}^2 + \tilde{R}_u^2 \right) ^{1/2} + \left( \frac{2 \, \gamma _b^2}{(\alpha _{\mathrm{LB}}(\mu ))^2} + \frac{1}{\alpha _{\mathrm{LB}}(\mu )} \right) ^{1/2} \tilde{R}_p \right) \end{aligned}$$
(64)

and

$$\begin{aligned} c_2(\mu ) = \left( \frac{2\sqrt{2}}{\alpha _{\mathrm{LB}}(\mu )} \tilde{R}_y \tilde{R}_p + \frac{\gamma _c^2}{2 (\alpha _{\mathrm{LB}}(\mu ))^2} \tilde{R}_y^2 \right) . \end{aligned}$$
(65)

The proof follows from the proofs of Propositions 1 and 2 and is thus omitted. The offline-online decomposition is analogous to our previous discussion in Sect. 3.4.
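Given the dual norms of the residuals, the online evaluation of the bound (63) requires only a few scalar operations; a sketch in the notation of (64) and (65), with all inputs assumed to be precomputed scalars:

```python
import numpy as np

def error_bound(ru0_norm, R_u, R_p, R_y, alpha_LB, gamma_b, gamma_c):
    """A posteriori bound (63): Delta = c1 + sqrt(c1^2 + c2),
    with c1 and c2 from (64) and (65)."""
    c1 = 0.5 * (np.sqrt(ru0_norm**2 + R_u**2)
                + np.sqrt(2.0 * gamma_b**2 / alpha_LB**2 + 1.0 / alpha_LB) * R_p)
    c2 = (2.0 * np.sqrt(2.0) / alpha_LB) * R_y * R_p \
         + gamma_c**2 / (2.0 * alpha_LB**2) * R_y**2
    return c1 + np.sqrt(c1**2 + c2)
```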

6 Numerical results

6.1 Problem description

We consider the dispersion of a pollutant governed by a convection-diffusion equation with a Taylor–Green vortex velocity field. The concentration of the pollutant is measured at five spatial locations over time. The computational domain is \({\varOmega }= (-\,1,1)^2\) and we assume homogeneous Dirichlet boundary conditions on the lower boundary \({\varGamma }_D\) and homogeneous Neumann boundary conditions on the remaining boundary \({\varGamma }_N\). The Péclet number serves as our parameter, i.e., we have \(\mu = \mathrm{Pe} \in \mathcal {D}= [10,50]\). The bilinear form a is thus given by

$$\begin{aligned} a(w,v;\mu ) = \frac{1}{\mu } \int _{\varOmega }\nabla w \cdot \nabla v \; \mathrm{d}x+ \int _{\varOmega }(\beta \cdot \nabla w) v \; \mathrm{d}x, \end{aligned}$$
(66)

and the velocity field is \(\beta (x) = ( \sin (\pi x_1) \cos (\pi x_2), - \cos (\pi x_1) \sin (\pi x_2) )^T\). The domain \({\varOmega }\) with measurement sites as well as the velocity field are sketched in Fig. 1. Our model problem is motivated by the source reconstruction of a (possibly) accidental release of an agent, where the velocity field is known (Krysta et al. 2006; Krysta and Bocquet 2007). Although we consider a fixed velocity field here, our problem formulation also directly applies to (affinely) parametrized velocity fields.

We do not consider an additional forcing term and thus set \(f \equiv 0\). The inner product on \(Y_\mathrm {e}= \{ v \in H^1({\varOmega }): v|_{{\varGamma }_D} \equiv 0 \}\) is defined as \((w,v)_Y = \frac{1}{2} a(w,v;\mu ^{\mathrm {ref}}) + \frac{1}{2} a(v,w;\mu ^{\mathrm {ref}})\) for the reference parameter \(\mu ^{\mathrm {ref}}= 30\). Since \(\beta\) is divergence-free and \(\beta \cdot n \equiv 0\) on \({\varGamma }\), one can show that a is coercive and that the symmetric part of a is given by \(1 / \mu \int _{\varOmega }\nabla w \cdot \nabla v \; \mathrm{d}x\). Hence we can use the min-theta approach to construct a coercivity lower bound: \(\alpha _{\mathrm{LB}}(\mu ):= \mu ^{\mathrm {ref}}/ \mu\). For details, we refer to Appendix B.3 of Kärcher (2017).
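For this particular problem the min-theta bound can be verified directly. Since the convective term vanishes in the quadratic form,

$$\begin{aligned} a(w,w;\mu ) = \frac{1}{\mu } \int _{\varOmega }|\nabla w|^2 \; \mathrm{d}x = \frac{\mu ^{\mathrm {ref}}}{\mu } \, a(w,w;\mu ^{\mathrm {ref}}) = \frac{\mu ^{\mathrm {ref}}}{\mu } \, \Vert w\Vert _Y^2, \end{aligned}$$

so that \(\alpha _{\mathrm{LB}}(\mu ) = \mu ^{\mathrm {ref}}/\mu\) is in fact the exact coercivity constant for this example; for instance, \(\alpha _{\mathrm{LB}}(50) = 30/50 = 0.6\) at the upper end of \(\mathcal {D}\).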

Fig. 1

Left: sketch of the computational domain with measurement locations \({\varOmega }_1,\dots ,{\varOmega }_5\). The centers of the sensors are located at \((\pm\, 0.6,\pm\, 0.6)^T\) and \((0,0)^T\); their width and height are 0.1. The colors match those in Fig. 3. Right: plot of the Taylor–Green vortex velocity field. The blue dot indicates the center \((-\,0.1,0.8)^T\) of the Gaussian serving as initial condition. (Color figure online)

We choose the time interval \(I = [0,8]\) and a time step size \(\tau = 0.04\), resulting in \(K = 200\) time steps. For the space discretization we introduce a spatial mesh with an element size of \(h = 0.04\) and corresponding linear finite element approximation spaces \(Y=U\) with \(\mathcal {N}_Y = \mathcal {N}_U = 13{,}131\) degrees of freedom. We assume that the (unknown true) initial condition \(y_0^\text {true}\) is given by a spatial Gaussian function with mean \((-0.1,0.8)^T\) and covariance matrix \(\sigma ^2 \mathbb{I}\), where \(\sigma = 0.1\) and \(\mathbb {I}\) is the identity matrix (the center of the Gaussian is shown as a blue dot in Fig. 1). The average concentrations over the measurement domains shown in Fig. 1 serve as our five outputs \(h_i(\phi ) = |{\varOmega }_i|^{-1} \int _{{\varOmega }_i} \phi \; \mathrm{d}x\), \(i=1,\dots ,5\). We then generate noisy measurements by adding white noise to the outputs computed from the full-order model for the (unknown true) parameter \(\mu ^\text {true} = 30\) with initial condition \(y_0^\text {true}\), such that \(z_d^k = C y^{k,\text {true}} + \eta ^k\), where \(\eta ^k \in \mathbb {R}^5, \ k \in \mathbb {K},\) is a vector containing uncorrelated Gaussian noise in each entry, i.e., \(\eta _i^k \sim N(0,0.05^2), \ i = 1,\ldots ,5, \ k \in \mathbb {K}\). The inverse observation covariance matrix is given by \(\mathrm {D}= 10 \, \mathbb {I}\); in practice, the value 10 produces acceptable results for the 4D-Var problem (a thorough discussion of the impact of Tikhonov regularization on 4D-Var is beyond the scope of this paper; we refer to Puel (2009) for more details). In the strong-constraint case, we assume an optimal prior and set the prior mean \(u_d\) equal to the true initial condition. In the weak-constraint case, we set \(b(\cdot ,\cdot ) = m(\cdot ,\cdot )\) to account for the model-error forcing and \(u_d^k = 0, \ k \in \mathbb {K},\) i.e., the model-error forcing is assumed to have zero mean. In both cases, the inverse prior covariance matrix \(\mathrm {U}\) is given by the mass matrix.
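For illustration, a short Python sketch of this synthetic-data step; the Gaussian normalization and the random seed are not specified above and are chosen here purely for illustration.

```python
import numpy as np

def y0_true(x, sigma=0.1, center=(-0.1, 0.8)):
    """Gaussian initial condition with mean (-0.1, 0.8) and covariance
    sigma^2 I, evaluated at points x of shape (n, 2); unnormalized."""
    d2 = (x[:, 0] - center[0])**2 + (x[:, 1] - center[1])**2
    return np.exp(-d2 / (2.0 * sigma**2))

def make_noisy_data(Z_true, noise_std=0.05, seed=0):
    """z_d^k = C y^{k,true} + eta^k with eta_i^k ~ N(0, noise_std^2);
    Z_true is the K x 5 array of true outputs from the full-order solve."""
    rng = np.random.default_rng(seed)
    return Z_true + rng.normal(0.0, noise_std, size=Z_true.shape)
```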

Fig. 2

State solution for the true initial condition \(y_0^\text {true}\) and for three different parameters \(\mu\)

A preconditioned Newton–CG method takes between 30 s for \(\mu = 10\) (requiring 31 CG iterations) and 54 s for \(\mu = 50\) (requiring 56 CG iterations) to solve the full-order strong-constraint 4D-Var problem. For the weak-constraint case, the solution time ranges from 114 s (\(\mu = 10\), 81 CG iterations) to 189 s (\(\mu = 50\), 137 CG iterations). In Fig. 2, we plot the concentration of the pollutant for three different parameter values and various time steps. The influence of the Taylor–Green vortex and of the Péclet number on the solutions is clearly visible. In Fig. 3 on the left, we plot the five true outputs \(C y^{k,\text {true}}\) over time (the numbering and colors of the curves correspond to the sketch in Fig. 1). The corresponding noisy measurements \(z_d^k\) used for the data assimilation are shown on the right. We note that all computations were performed in Matlab on a computer with a 2.6 GHz Intel Core i7 processor and 16 GB of RAM.

Fig. 3

Outputs \(C y^k(\mu ^\text {true})\) and associated noisy output measurements \(z_d^k\) over time

6.2 Reduced-order 4D-Var approach

We consider the strong- and weak-constraint 4D-Var data assimilation problems separately and present results for the performance of the reduced-order approach in each setting. We thus build separate reduced basis spaces for the strong- and weak-constraint cases by employing the Greedy sampling procedures described in Sects. 3.5 and 4.4, respectively. For both, we choose \(\mu ^\text {start} = 10\) and a training set consisting of 40 equidistant parameters over the parameter domain \(\mathcal {D}\). We set the number of Greedy iterations to \(N_{\max } = 80\) (strong) and \(N_{\max } = 100\) (weak), which yields a maximum relative error bound of approximately \(10^{-2}\).

In Fig. 4 we plot the maximum relative error and error bound over a test sample consisting of 20 randomly chosen parameters in \(\mathcal {D}\) versus the number of Greedy iterations N. The relative error and bound are defined as \(\left||u^*(\mu ) - u_N^*(\mu )\right||_U/\left||u^*(\mu )\right||_U\) and \({\varDelta }_N^u(\mu )/\left||u^*(\mu )\right||_U\) in the strong-constraint case, and as \(\Big ( \tau \sum _{k=1}^{K} \left||u^{*,k}(\mu ) - u_N^{*,k}(\mu )\right||_U^2 \Big )^{1/2} /\Big ( \tau \sum _{k=1}^{K} \left||u^{*,k}(\mu )\right||_U^2 \Big )^{1/2}\) and \(\tilde{{\varDelta }}_N^u(\mu )/ \Big ( \tau \sum _{k=1}^{K} \left||u^{*,k}(\mu )\right||_U^2 \Big )^{1/2}\) in the weak-constraint case. We observe that the error and bound converge at the same rate and that the effectivities, i.e., the ratios of bound to error, thus remain almost constant over N. The mean effectivities over the test sample for \(N_{\mathrm{max}}\) are 480 in the strong-constraint case and 40 in the weak-constraint case. We note that the maximum dimensions of the reduced basis state/adjoint and control spaces are \(N_{Y,{\mathrm{max}}} = 2 N_{\mathrm{max}} = 160\) and \(N_{U,\mathrm{max}}^0 = 21\) (strong-constraint), and \(N_{Y,{\mathrm{max}}} = 2 N_{\mathrm{max}} + 1 = 201\) and \(N_{U,{\mathrm{max}}} = N_{{\mathrm{max}}} = 100\) (weak-constraint). Especially in the strong-constraint case, we thus obtain a considerable reduction in the dimension of the control space, from \(\mathcal {N}_U = 13{,}131\) to \(N_{U,{\mathrm{max}}}^0 = 21\). This reduction is also reflected in the number of CG iterations required to solve the reduced-order 4D-Var problem (see below).
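For reference, the weak-constraint relative error, relative bound, and effectivity for a single \(\mu\) can be computed as in the following sketch (hypothetical names; `u_full` and `u_N` hold the time histories \(u^{*,k}(\mu )\) and \(u_N^{*,k}(\mu )\) as rows):

```python
import numpy as np

def error_bound_effectivity(u_full, u_N, bound, M_U, tau):
    """Relative control error, relative bound, and effectivity
    (bound divided by error) in the time-discrete U-norm."""
    def u_norm(W):
        return np.sqrt(tau * sum(w @ (M_U @ w) for w in W))
    rel_err = u_norm(u_full - u_N) / u_norm(u_full)
    rel_bound = bound / u_norm(u_full)
    return rel_err, rel_bound, rel_bound / rel_err
```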

Fig. 4

Maximum relative control error and error bound over number of Greedy iterations N for strong-constraint case (left) and weak-constraint case (right)

We next report on the online computational times of our reduced-order approach. As for the full-order approach, the reduced-order solution times depend on \(\mu\) (shorter for \(\mu = 10\), longer for \(\mu = 50\)) and, of course, strongly on N. We first consider the strong-constraint case: the solution times for the reduced-order 4D-Var problem range from 10 ms to 1.37 s, and the evaluation of the a posteriori error bound \({\varDelta }_N^u(\mu )\) takes between 2.8 and 29 ms. We note that the computation of the error bound is much faster than the solution of the 4D-Var problem itself. Furthermore, the computational time to evaluate the error bound depends only on N and not on \(\mu\) (i.e., evaluating the bound for fixed N at \(\mu = 10\) or \(\mu = 50\) takes the same time). The overall online speed-up for \(N = N_{\mathrm{max}}\) thus ranges from approximately 23 to 40.

In the weak-constraint case, the solution times for the reduced-order 4D-Var problem range from 99 ms to 12.6 s, and the evaluation of the a posteriori error bound \(\tilde{{\varDelta }}_N^u(\mu )\) takes between 4.8 and 71 ms. Again, the evaluation of the error bound is much faster than the solution of the 4D-Var problem itself. The online speed-up for \(N = N_{\mathrm{max}}\) is now approximately 15.

In order to illustrate the connection between the approximation error and the online solution time, we plot the average online solution time of the reduced-order 4D-Var problem versus the average relative error over the test sample in Fig. 5. Recall that the full-order solution takes approximately 30–54 s for the strong-constraint case and 114–189 s for the weak-constraint case.

Fig. 5

Average online solution time of the reduced-order 4D-Var problem over the average relative error for strong-constraint case (left) and weak-constraint case (right)

We next show results for the number of CG iterations required to solve the reduced-order 4D-Var problem. In Fig. 6, we plot the number of CG iterations as a function of the parameter \(\mu\) for various values of N and \(N_U\), on the left for the strong-constraint case and on the right for the weak-constraint case. In the same plots, we also show the number of CG iterations required to solve the full-order problem. We observe a different behavior in the strong- and weak-constraint cases: in the weak-constraint case, the number of reduced-order CG iterations converges to the number of full-order CG iterations with increasing N; in the strong-constraint case, however, the number of reduced-order CG iterations is bounded by \(N_U^0\), which is significantly smaller than N. The number of reduced-order CG iterations is thus almost constant over \(\mu\) for given N and considerably smaller than the number of full-order CG iterations, even for \(N = N_{\mathrm{max}}\).

Fig. 6

Required number of CG iterations for solving the full- and reduced-order 4D-Var problems as a function of the parameter \(\mu\) and the number of Greedy iterations N. Strong-constraint case (left) and weak-constraint case (right)

Finally, we consider the outer minimization problem and estimate the unknown true parameter \(\mu ^\text {true} = 30\) underlying the noisy measurements. To this end, we define the “optimal” parameters \(\mu ^*\) and \(\mu _N^*\), which minimize the full-order and reduced-order cost functionals

$$\begin{aligned} \mu ^* = \mathop {\hbox {arg min}}\limits _{\mu \in \mathcal {D}} J^*(\mu ) \quad \text {and} \quad \mu _N^*= \mathop {\hbox {arg min}}\limits _{\mu \in \mathcal {D}} J_N^*(\mu ), \end{aligned}$$
(67)

respectively. We compute the optimal estimated parameters \(\mu ^*\) and \(\mu _N^*\) using the Matlab routine fminbnd, which requires only evaluations of the full-order and reduced-order cost functionals, respectively. We also define the maximum relative cost functional error \(e_{J,N}^{\max } = \max _{\mu \in \mathcal {D}} |J^*(\mu ) - J_N^*(\mu )|/|J^*(\mu )|\) and the relative parameter error \(e_{\mu ,N} := |\mu ^* - \mu _N^*|/|\mu ^*|\). We present these errors for the strong- and weak-constraint case as a function of N in Table 1. We observe that in both cases the cost functional error and the parameter error converge very quickly, i.e., the reduced-order approach allows us to recover the optimal parameter \(\mu ^*\). We also note that the (full-order) optimal parameter is close to the true parameter in the strong-constraint case (\(\mu ^* = 29.67\) vs. \(\mu ^\text {true} = 30\)), but not in the weak-constraint case (\(\mu ^* = 45.36\) vs. \(\mu ^\text {true} = 30\)). Since \(\mu _N^* \rightarrow \mu ^*\) with increasing N, the reduced-order optimal parameters inherit this behavior; recovering \(\mu ^*\) is also the best we can expect of them.
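In a Python setting the analogous bounded one-dimensional search can be carried out with `scipy.optimize.minimize_scalar`; this sketch mirrors the idea behind (67), with `J_N` a hypothetical callable evaluating the reduced-order cost at a given \(\mu\).

```python
from scipy.optimize import minimize_scalar

def estimate_parameter(J_N, mu_min=10.0, mu_max=50.0):
    """Bounded minimization of the reduced-order cost J_N^*(mu) over D."""
    res = minimize_scalar(J_N, bounds=(mu_min, mu_max), method="bounded")
    return res.x  # reduced-order estimate mu_N^*
```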

Table 1 Error in cost functional and estimated parameter over number of Greedy iterations N

7 Conclusion

In this paper, we considered the strong- and weak-constraint 4D-Var data assimilation problem. We presented a reduced-order approach to the 4D-Var problem based on the reduced basis method and proposed rigorous and efficiently evaluable a posteriori error bounds for the optimal control, i.e., the initial condition in the strong-constraint setting and the model-error forcing in the weak-constraint setting. For both instances we showed numerical results confirming the validity of the proposed approach. We also presented theoretical results for the combined case with unknown initial condition and model-error forcing.

We note that although we consider a parametrized problem here, the error bounds can also be used in the non-parametrized reduced-order setting and are independent of how the reduced-order spaces are constructed. The bound thus directly applies to reduced-order approaches where the spaces are constructed, e.g., using empirical orthogonal functions, POD, or dual-weighted POD (Daescu and Navon 2008). We also believe that the error bounds can be gainfully applied in a multi-fidelity approach to solve the 4D-Var problem, e.g., in a trust-region approach as proposed in Chen et al. (2011) and Du et al. (2013).

Although we also presented results for the error in the cost functional and for estimating the unknown model parameter, we currently cannot provide rigorous and sharp a posteriori error bounds for these quantities. Furthermore, we considered only a fixed setting for the noise level and regularization parameter; a detailed analysis of the influence of these parameters on the performance of the reduced-order model remains to be performed. These are topics of current and future research in our groups.