1 Introduction

The methods of data assimilation (DA) have become an important tool for the analysis of complex physical phenomena in various fields of science and technology. These methods allow us to combine mathematical models, observational data, and a priori information.

Currently, there is increasing interest in computational technologies that combine flows of real data with hydrodynamic forecasts produced by mathematical models. This is especially true for 4D technologies, which combine flows of observational data and forecasts in a certain spatio-temporal domain. These methods have found their widest application in meteorology and oceanography, where observations are assimilated into numerical models. Geophysical flows are governed by equations derived from fluid dynamics: a set of nonlinear partial differential equations of the first order with respect to time. Formally, this is a Cauchy problem, and an initial condition is necessary to integrate these equations and carry out a prediction. The purpose of assimilation procedures is to construct or refine the initial and boundary conditions (or other model parameters) in order to improve the accuracy of a prediction model Le Dimet and Talagrand (1986), Asch et al. (2016), Fletcher (2017), Carrassi et al. (2018).

At present, two main approaches are well known for the assimilation of observational data in models of geophysical hydrodynamics and oceanography. The first is the statistical approach, which is based on the methods of probability theory and mathematical statistics. Historically, its rigorous justification and limits of applicability were given by Markov (1900) and Kolmogorov (1946). From a methodological point of view, this approach gave rise to optimal interpolation, the Kalman filter and its subsequent modifications, widely used in various fields of science and technology. It is used to estimate unknown quantities from measurement data, taking into account the random nature of measurement errors.

The second approach is based on the methods of the calculus of variations, optimal control (see, e.g., Lions (1968), Pontryagin et al. (1964)) and the theory of adjoint equations (see Marchuk (1995)). Compared to the statistical method, the variational method has greater versatility: it allows one, on a unified methodological basis, to solve the problems of initializing hydrophysical fields, assessing the sensitivity of a model solution, identifying model parameters, etc. The variational approach can assimilate information of various types coming from various measuring systems. In this case the approach is referred to as variational data assimilation (VDA) Le Dimet and Talagrand (1986), Asch et al. (2016), Fletcher (2017), Carrassi et al. (2018). The main idea of the method is to minimize a functional that describes the deviation of the model solution from the observational data; the minimum of this functional is sought on the model trajectories, in other words, in the subspace of model solutions.

Seen as a problem of optimal control, VDA is an optimization problem, and as such we need to exhibit a necessary optimality condition derived from the evaluation of the gradient of the cost function, which must vanish at the optimum. Information on the gradient of the cost function (first-order information) is used to construct the optimality system (OS). To this aim, and for the numerical solution of the optimization problem, the representation of the gradient through adjoint equations (the first-order adjoint problem) is often used Le Dimet and Talagrand (1986), Marchuk (1995). In the case of discontinuous physical processes (rain, deep convection, etc.) the cost function is no longer differentiable, and the formal application of the adjoint operator evaluates only a sub-gradient.

To study the variational data assimilation problem (as an optimal control problem) and to develop efficient algorithms for its numerical solution, second-order information is needed, i.e. information about the Hessian of the cost function. A sufficient optimality condition is that, in addition to the vanishing gradient, the Hessian be positive definite at the optimum; therefore, a second-order analysis must be carried out. Often, to construct the Hessian, it is necessary to differentiate the optimality system; in this case a second-order adjoint problem arises Le Dimet et al. (2002). The investigation of the second-order adjoint equations and of the Hessian of the cost functional plays an important role in the study of the solvability of the variational assimilation problem, the construction of Newton-type algorithms for its numerical solution, the identification of model parameters, and the study of the sensitivity of the optimal solution and its functionals. These issues are the subject of this chapter.

2 Variational Data Assimilation

Variational methods were introduced in meteorology by Sasaki (1958). These methods consider the equations governing the flow as constraints, and the problem is closed by a variational principle, e.g. the minimization of the discrepancy between the model and the observations. The use of optimal control techniques (Lions (1968)) was proposed by Le Dimet (1982), Le Dimet and Talagrand (1986), Talagrand and Courtier (1987), Penenko and Obraztsov (1976), Marchuk et al. (1978).

Consider the mathematical model of a physical process that is described by the nonlinear evolution problem

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi ) +f, \quad t\in (0,T)\\ \varphi \bigl |_{t=0}&{}=&{} u, \\ \end{array} \right. \end{aligned}$$
(1)

where the initial state u is supposed to be from a Hilbert space X, the unknown function \(\varphi =\varphi (t)\) belongs to \(Y=L_2(0,T;X)\) with the norm \( \Vert \varphi \Vert _Y=(\varphi ,\varphi )_Y^{1/2}= (\int _0^T\Vert \varphi (t)\Vert _X^2 dt)^{1/2}\), F is a nonlinear operator mapping Y into Y, \(f\in Y\). We suppose that for given \(u\in X, f\in Y\) there exists a unique solution \(\varphi \in Y\) to (1) with \(\frac{ \displaystyle \partial \varphi }{ \displaystyle \partial t}\in Y\).
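To fix ideas for the sketches that follow, here is a minimal discretization of the abstract model (1), assuming (purely for illustration, not taken from this chapter) a linear operator \(F(\varphi )=A\varphi \), a small state dimension, and an explicit Euler scheme.

```python
import numpy as np

# Toy discretization of (1): d(phi)/dt = F(phi) + f, phi(0) = u,
# with the illustrative linear choice F(phi) = A @ phi.
n, N, dt = 4, 50, 0.02                      # state size, time steps, step size
rng = np.random.default_rng(0)
A = -np.eye(n) + 0.3 * rng.standard_normal((n, n))   # stable-ish drift matrix
f = rng.standard_normal(n)                  # constant forcing

def forward(u):
    """Explicit Euler sweep: phi_{k+1} = phi_k + dt * (A phi_k + f)."""
    phi = np.empty((N + 1, n))
    phi[0] = u
    for k in range(N):
        phi[k + 1] = phi[k] + dt * (A @ phi[k] + f)
    return phi                              # full trajectory phi_0, ..., phi_N

trajectory = forward(rng.standard_normal(n))
```

The same pattern, a forward sweep producing the whole trajectory, underlies all the adjoint computations sketched below.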

Often the initial state u is supposed to be unknown, and one would like to find it using the information from observations. Let us introduce the cost function as a functional on X in the form

$$\begin{aligned} J(u)=\frac{1}{2} (V_1(u-u_{b}), u-u_{b})_{X}+ \frac{1}{2} (V_2(C\varphi -\varphi _{obs}), C\varphi -\varphi _{obs})_{Y_{obs}}, \end{aligned}$$
(2)

where \(u_b\in X\) is a prior (background) function, \(\varphi _{obs}\in Y_{obs}\) is a prescribed function (observational data), \(Y_{obs}\) is a Hilbert space (observation space), \(C:Y\rightarrow Y_{obs}\) is a linear bounded operator (observation operator), \(V_1: X\rightarrow X\) and \(V_2: Y_{obs} \rightarrow Y_{obs}\) are symmetric positive definite bounded operators. Usually, \(V_1, V_2\) are chosen as the inverse covariance operators of the background and observation errors, respectively, Asch et al. (2016), Carrassi et al. (2018).
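Continuing the toy discretization above, the following sketch evaluates a discrete counterpart of (2), assuming an identity observation operator C, observations at every time level, and diagonal weights \(V_1, V_2\); the time integral hidden in the \(Y_{obs}\)-norm is approximated by a plain sum absorbed into \(V_2\). All names and sizes are illustrative.

```python
import numpy as np

n, N, dt = 4, 50, 0.02
rng = np.random.default_rng(0)
A = -np.eye(n) + 0.3 * rng.standard_normal((n, n))
f = rng.standard_normal(n)
V1 = np.eye(n)                       # background weight (inverse covariance)
V2 = 0.5 * np.eye(n)                 # observation weight, per time level
u_b = rng.standard_normal(n)         # background initial state u_b
y = rng.standard_normal((N + 1, n))  # observations phi_obs at every level

def forward(u):
    phi = np.empty((N + 1, n)); phi[0] = u
    for k in range(N):
        phi[k + 1] = phi[k] + dt * (A @ phi[k] + f)
    return phi

def J(u):
    """Discrete analogue of (2): background term plus summed misfit (C = I)."""
    d = forward(u) - y
    return 0.5 * (u - u_b) @ V1 @ (u - u_b) + \
           0.5 * sum(dk @ V2 @ dk for dk in d)

print(J(u_b))
```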

Let us consider the following data assimilation problem with the aim of finding the initial value u: for given \(f\in Y, \varphi _{obs}\in Y_{obs}, u_b\in X\), find \(u\in X\) and \(\varphi \in Y\) such that they satisfy (1), and on the set of solutions to (1) the functional J(u) takes its minimum value, i.e.

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi )+f, \quad t\in (0,T)\\ \varphi \bigl |_{t=0}&{}=&{} u, \\ J(u)&{}=&{}\inf \limits _{{\displaystyle w}\in X} J(w). \\ \end{array} \right. \end{aligned}$$
(3)

This is the so-called hindcast (initialization) variational DA problem, typical of numerical weather prediction and oceanographic applications Le Dimet and Talagrand (1986), Asch et al. (2016), Fletcher (2017), Carrassi et al. (2018). We suppose that a solution of (3) exists. To derive the optimality system, we assume that the solution \(\varphi \) and the operator \(F(\varphi )\) in (1)–(2) are regular enough, and for \(u, w\in X\) introduce the directional (Gâteaux) derivative with respect to u in the direction w (the Gâteaux differential):

$$ dJ(u,w)=\lim _{\tau \rightarrow 0}\frac{J(u+\tau w)-J(u)}{\tau }=\frac{d}{d\tau }J(u+\tau w)\biggl |_{\tau =0}. $$

If \(dJ(u,w)\) is linear with respect to w, then it may be represented as follows:

$$ dJ(u,w)=J'(u)w, $$

where \(J'(u)\) is the gradient of J with respect to u. From (1)–(2) we get

$$\begin{aligned} dJ(u, w) =(V_1(u-u_{b}), w)_{X}+ (C^*V_2(C\varphi -\varphi _{obs}), \tilde{\phi })_{Y}, \end{aligned}$$
(4)

where \(\tilde{\phi }\) is the solution to the tangent linear problem:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \tilde{\phi }}{\displaystyle \partial t}&{}=&{} F'_{\varphi } (\varphi )\tilde{\phi }, \quad t\in (0,T), \\ \tilde{\phi }\bigl |_{t=0}&{}=&{} w. \\ \end{array} \right. \end{aligned}$$
(5)

Here \(F'_{\varphi }(\varphi ): Y\rightarrow Y\) is the Fréchet derivative of F Marchuk et al. (1996) with respect to \(\varphi \), and \(C^*\) is the adjoint operator to C defined by \((C\varphi ,\psi )_{Y_{obs}}=(\varphi ,C^*\psi )_Y, \; \varphi \in Y, \psi \in Y_{obs}\).

Let us introduce the adjoint operator \((F'_{\varphi }(\varphi ))^*: Y\rightarrow Y\) and consider the adjoint problem:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi ^{*}}{\displaystyle \partial t}+ (F'_{\varphi }(\varphi ))^*\varphi ^*&{}=&{} C^*V_2(C\varphi -\varphi _{obs}), \quad t\in (0,T) \\ \varphi ^{*}\bigl |_{t=T}&{}=&{} 0. \\ \end{array} \right. \end{aligned}$$
(6)

The problem (6) is adjoint to the linearized (tangent linear) problem (5); it is therefore linear in \(\varphi ^*\), but it still depends nonlinearly on \(\varphi \).

In what follows, we assume that the direct and adjoint linear problems of the form

$$ \left\{ \begin{array}{rcl} \frac{ \displaystyle \partial \phi }{\displaystyle \partial t}-F'_{\varphi }(\varphi )\phi &{}=&{}p, \quad t\in (0,T) \\ \phi \bigl |_{t=0}&{}=&{} q, \\ \end{array} \right. $$
$$ \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \phi ^*}{\displaystyle \partial t}- (F'_{\varphi }(\varphi ))^*\phi ^*&{}=&{}g, \quad t\in (0,T) \\ \phi ^*\bigl |_{t=T}&{}=&{} 0\\ \end{array} \right. $$

with \( p,g\in Y, q\in X\) have the unique solutions \(\phi , \phi ^*\in Y\) and \(\frac{ \displaystyle \partial \phi }{ \displaystyle \partial t}, \frac{ \displaystyle \partial \phi ^*}{ \displaystyle \partial t}\in Y\) .

From (4)–(6) we get

$$\begin{aligned} dJ(u,w) = (V_1(u-u_{b}), w)_{X}- (\varphi ^*\bigl |_{t=0}, w)_X. \end{aligned}$$
(7)

The relation (7) exhibits the linear dependence of \(dJ(u,w)\) on w. Thus, \(dJ(u,w)=J'(u)w\), and the gradient of J with respect to u is defined by

$$ J'(u) =V_1(u-u_{b}) - \varphi ^*\bigl |_{t=0}. $$

The necessary optimality condition Lions (1968) is \(J'(u)=0\). From (3)–(7) we obtain the optimality system:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi )+f, \quad t\in (0,T), \\ \varphi \bigl |_{t=0}&{}=&{} u, \\ \end{array} \right. \end{aligned}$$
(8)
$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi ^{*}}{\displaystyle \partial t}+ (F'_{\varphi }(\varphi ))^*\varphi ^*&{}=&{} C^*V_2(C\varphi -\varphi _{obs}) , \quad t\in (0,T) \\ \varphi ^{*}\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(9)
$$\begin{aligned} V_1(u-u_{b}) - \varphi ^*\bigl |_{t=0}=0. \end{aligned}$$
(10)

It is worth pointing out that there is no approximation in the derivation of the optimality system; the only assumption is the differentiability of the model operator. Some authors introduce, at this level, a so-called "tangent linear approximation"; this is entirely unnecessary.

We suppose that the system (8)–(10) has a unique solution \(\varphi , \varphi ^{*}\in Y, u\in X\). The system (8)–(10) may be considered as a generalized model of the form \( \mathcal {A}(U)=0\) with the state variable \(U=(\varphi , \varphi ^*, u)\), and it contains the information on the observation data \(\varphi _{obs}\in Y_{obs}\). The optimality system plays a fundamental role in studying the solvability of the original data assimilation problem, in designing efficient algorithms for its solution, and in studying the sensitivity of the optimal solution with respect to observations.
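The following sketch carries out the gradient recipe (8)–(10) for the toy linear problem above, where \(F'_{\varphi }=A\), so the adjoint sweep simply applies the transposed Euler step backward in time. Note the sign convention: the discrete adjoint variable accumulated below equals \(-\varphi ^*\bigl |_{t=0}\) of (10), so it is added rather than subtracted. A finite-difference check of one gradient component is included; all names and sizes are illustrative assumptions.

```python
import numpy as np

n, N, dt = 4, 50, 0.02
rng = np.random.default_rng(0)
A = -np.eye(n) + 0.3 * rng.standard_normal((n, n))
M = np.eye(n) + dt * A               # one Euler step: phi_{k+1} = M phi_k + dt f
f = rng.standard_normal(n)
V1, V2 = np.eye(n), 0.5 * np.eye(n)
u_b = rng.standard_normal(n)
y = rng.standard_normal((N + 1, n))

def forward(u):
    phi = np.empty((N + 1, n)); phi[0] = u
    for k in range(N):
        phi[k + 1] = M @ phi[k] + dt * f
    return phi

def J(u):
    d = forward(u) - y
    return 0.5 * (u - u_b) @ V1 @ (u - u_b) + \
           0.5 * sum(dk @ V2 @ dk for dk in d)

def grad_J(u):
    """One forward sweep plus one adjoint sweep, as in (8)-(10)."""
    d = forward(u) - y
    p = V2 @ d[N]                    # adjoint state at the final time level
    for k in range(N - 1, -1, -1):
        p = M.T @ p + V2 @ d[k]      # transposed model step + misfit forcing
    return V1 @ (u - u_b) + p        # p equals -phi*|_{t=0} in formula (10)

u = rng.standard_normal(n)
e0 = np.eye(n)[0]; eps = 1e-6
fd = (J(u + eps * e0) - J(u - eps * e0)) / (2 * eps)
print(np.isclose(grad_J(u)[0], fd))  # True: adjoint gradient matches FD
```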

3 Computing the Hessian

Consider the Hessian \(\mathcal{H}(u)\) of the functional (2); it depends on \(u\in X\) (which may be the exact solution, the optimal solution, or some arbitrary function \(u\in X\)). For a fixed \(u\in X\) the Hessian \(\mathcal{H}(u)\) is defined by the successive solution of the problems formulated below. First we find \(\varphi \) and \(\varphi ^{*}\) by solving the direct and adjoint problems (as in the optimality system):

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi )+f, \quad t\in (0,T) \\ \varphi \bigl |_{t=0}&{}=&{} u, \\ \end{array} \right. \end{aligned}$$
(11)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \varphi ^{*}}{\displaystyle \partial t}- (F'(\varphi ))^*\varphi ^*&{}=&{} -C^*V_2(C\varphi -\varphi _{obs}) , \quad t\in (0,T) \\ \varphi ^{*}\bigl |_{t=T}&{}=&{} 0. \\ \end{array} \right. \end{aligned}$$
(12)

Note that here u is not necessarily the optimal solution from the optimality system (8)–(10); it is just some fixed function at which we would like to compute the Hessian. (Hence, in general, the functions \(\varphi \) and \(\varphi ^{*}\) do not satisfy the optimality system.) Note also that (11)–(12) are the usual two steps when we compute the gradient of the functional J(u) (at the point u) using the adjoint problem. If for a fixed u the functions \(\varphi , \varphi ^*\) are computed from (11)–(12), the gradient of J with respect to u is defined by

$$\begin{aligned} J'(u) =V_1(u-u_{b}) - \varphi ^*\bigl |_{t=0}. \end{aligned}$$
(13)

To find the Hessian we should differentiate (11)–(13) with respect to u. Then, the action of the Hessian \(\mathcal{H}(u)\) on the function \(v\in X\) is defined by the successive solutions of the following problems:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial {\psi }}{\displaystyle \partial t}- F'(\varphi ){\psi }&{}=&{} {0}, \; t\in (0,T),\\ {\psi }|_{t=0}&{}=&{}{v}, \\ \end{array} \right. \end{aligned}$$
(14)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial {\psi }^{*}}{\displaystyle \partial t}- (F'(\varphi ))^*{\psi }^*&{}=&{} ({F}''(\varphi )\psi )^*\varphi ^* -C^*V_2C{\psi }, \;\; t\in (0,T)\\ {\psi }^{*}\bigl |_{t=T}&{}=&{}0,\\ \end{array} \right. \end{aligned}$$
(15)
$$\begin{aligned} \mathcal{H}(u)v=V_1 {v}-{\psi }^*|_{t=0}. \end{aligned}$$
(16)

Here \(\varphi \) and \(\varphi ^{*}\) are taken from (11)–(12). The problem (15) is the so-called second-order adjoint problem Le Dimet et al. (2002). It involves the second derivative \({F}''(\varphi )\) of the model operator \(F(\varphi )\) and depends on the solution \(\varphi ^*\) of the first-order adjoint problem (12).

If u is the optimal solution, then \(\varphi \) and \(\varphi ^{*}\) are exactly the functions from the optimality system (8)–(10).

Formulas (11)–(16) may be used to compute the Hessian of the original cost functional. To solve the second-order adjoint problem (15), no additional software needs to be developed: one can use the existing code for the first-order adjoint problem (12), taking into account the new right-hand side involving the term \(({F}''(\varphi )\psi )^*\varphi ^*\). An alternative method to compute the Hessian \(\mathcal H\) is the method of finite differences described in Gill et al. (1981). However, this method is not sufficiently accurate, due to the truncation of a local Taylor expansion, and is expensive for practical implementation. The sensitivity matrix method Thacker (1989) is computationally efficient if the dimension of the observation vector is much smaller than the dimension of the state vector, and so is feasible mainly for 3D-Var applications; it requires full storage of the resulting matrix. The second-order adjoint method formulated above computes the action \(\mathcal{H}v\) directly and thus does not require full storage of \(\mathcal H\).

In the finite-dimensional space, \(\mathcal{H}(u)\) is a matrix. To obtain the first column of this matrix, one can choose v in (14)–(16) to be the first basis vector \(v=(1, 0, \ldots , 0)\). To obtain the second column of this matrix, one can choose v in (14)–(16) to be the second basis vector \(v=(0, 1, 0, \ldots , 0)\), and so on.
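A sketch of this procedure for the toy linear problem used earlier: since \(F''=0\) there, the second-order adjoint (15) loses its first right-hand term, and the Hessian-vector product reduces to a tangent sweep (14) followed by an adjoint sweep (15) and the assembly (16). The operator is wrapped matrix-free, and the full (small) matrix is then recovered column by column from canonical basis vectors exactly as described above; all names and sizes are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

n, N, dt = 4, 50, 0.02
rng = np.random.default_rng(0)
A = -np.eye(n) + 0.3 * rng.standard_normal((n, n))
M = np.eye(n) + dt * A                   # Euler step of the tangent linear model
V1, V2 = np.eye(n), 0.5 * np.eye(n)

def hess_vec(v):
    """Action H(u)v by (14)-(16); F'' = 0 for this linear toy model."""
    psi = np.empty((N + 1, n)); psi[0] = v
    for k in range(N):
        psi[k + 1] = M @ psi[k]          # tangent linear sweep (14)
    p = V2 @ psi[N]
    for k in range(N - 1, -1, -1):
        p = M.T @ p + V2 @ psi[k]        # second-order adjoint sweep (15)
    return V1 @ v + p                    # assembly (16); p = -psi*|_{t=0}

# matrix-free handle, ready for iterative eigen/linear solvers
H_op = LinearOperator((n, n), matvec=hess_vec, dtype=np.float64)

# columns of the Hessian from canonical basis vectors, as described above
H = np.column_stack([hess_vec(e) for e in np.eye(n)])
print(np.allclose(H, H.T))               # True: the Hessian is symmetric
print(np.allclose(H_op.matvec(np.ones(n)), H @ np.ones(n)))  # True
```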

In the linear case, the solution is unique if the Hessian is positive definite; the necessary optimality condition given by the optimality system is then also sufficient. From a general point of view, the information given by the Hessian is important for theoretical, numerical and practical issues. For operational models it is impossible to compute the Hessian itself, as it is a square matrix with around \(10^{18}\) entries; nevertheless, the most important information can be extracted from the spectrum of the Hessian, which can be estimated without an explicit determination of this matrix. This information is important for estimating the condition number of the Hessian and for preparing an efficient preconditioning.

The above-obtained system with the second-order adjoint is used to compute the product of the Hessian with any vector. Of course, if we consider all the vectors of the canonical basis, the complete Hessian can be recovered.

The availability of this product gives access to important information. By using Lanczos-type methods and deflation, it is possible to compute the eigenvectors and eigenvalues of the Hessian. Also, to solve the variational data assimilation problem, second-order optimization methods of Newton type are used for equations of the form:

$$ J'\left( u \right) = 0. $$

The iterations are

$$ u_{n + 1} = u_n -\mathcal{H}^{ - 1} \left( {u_n } \right) J'\left( {u_n } \right) , $$

where \(\mathcal H \) is the Hessian of J, or its approximation. At each iteration a linear system must be solved. This is done by carrying out some iterations of a conjugate gradient method, which requires only Hessian-vector products. To construct an approximation of the inverse Hessian, the quasi-Newton BFGS algorithm may be used Polak (1997); it builds up an approximation of \(\mathcal{H}^{ - 1} \) in the course of the minimization process.
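A sketch of both ideas for the toy linear problem: SciPy's Lanczos-based eigsh estimates the extreme eigenvalues (hence the condition number) of the matrix-free Hessian, and one Newton step is taken with the inner linear system solved by conjugate gradients using only Hessian-vector products. Since the toy cost function is quadratic, a single Newton step reaches the minimizer; in the nonlinear case the iteration would be repeated. All names and sizes are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg, eigsh

n, N, dt = 12, 50, 0.02
rng = np.random.default_rng(0)
A = -np.eye(n) + 0.3 * rng.standard_normal((n, n))
M = np.eye(n) + dt * A
f = rng.standard_normal(n)
V1, V2 = np.eye(n), 0.5 * np.eye(n)
u_b = rng.standard_normal(n)
y = rng.standard_normal((N + 1, n))

def forward(u):
    phi = np.empty((N + 1, n)); phi[0] = u
    for k in range(N):
        phi[k + 1] = M @ phi[k] + dt * f
    return phi

def grad_J(u):
    d = forward(u) - y
    p = V2 @ d[N]
    for k in range(N - 1, -1, -1):
        p = M.T @ p + V2 @ d[k]
    return V1 @ (u - u_b) + p

def hess_vec(v):
    psi = np.empty((N + 1, n)); psi[0] = v
    for k in range(N):
        psi[k + 1] = M @ psi[k]
    p = V2 @ psi[N]
    for k in range(N - 1, -1, -1):
        p = M.T @ p + V2 @ psi[k]
    return V1 @ v + p

H_op = LinearOperator((n, n), matvec=hess_vec, dtype=np.float64)

# spectrum bounds via Lanczos iterations, without forming the matrix
lmax = eigsh(H_op, k=1, which='LA', return_eigenvectors=False)[0]
lmin = eigsh(H_op, k=1, which='SA', return_eigenvectors=False)[0]
print('cond(H) ~', lmax / lmin)

# one Newton step: u1 = u0 - H^{-1} J'(u0), inner solve by CG
u0 = u_b.copy()
step, info = cg(H_op, grad_J(u0))
u1 = u0 - step
print('|grad J(u1)| =', np.linalg.norm(grad_J(u1)))  # ~0: J is quadratic here
```

A quasi-Newton alternative, as mentioned above, is to let a BFGS-type routine (e.g. scipy.optimize.minimize with method='L-BFGS-B', given J and grad_J) build the inverse-Hessian approximation internally.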

In some applications (such as sensitivity analysis) one needs to solve a system of equations of the form \(\mathcal{H}(u)v=p\). In this case, computing the Hessian-vector product by (11)–(16) makes iterative algorithms efficient. The following directions for constructing a specialized solver for the equation \(\mathcal{H}(u)v=p\) could be considered: the use of a multi-grid strategy; the use of reduced-order models (Proper Orthogonal Decomposition) or local approximations (splines, wavelets); decomposition of the spatial domain by the 'region of influence' principle, hence decomposition of a global DA problem into a set of local open-boundary DA problems.

The inverse Hessian or its approximations may also be used to estimate the optimal solution error covariances Gejadze et al. (2008, 2011, 2013), Shutyaev et al. (2012). Under the so-called tangent linear hypothesis (TLH), the covariance is often approximated by the inverse Hessian of the objective function. In practice, the same approximation may remain valid even though the TLH is clearly violated. However, we often deal with dynamics so highly nonlinear that the inverse Hessian approach is no longer valid. In this case a new method for computing the covariance matrix, named the 'effective inverse Hessian' method, can be used Shutyaev et al. (2012). This method yields a significant improvement of the covariance estimate as compared to the inverse Hessian. The method is potentially feasible for large-scale applications because it can be used in a multiprocessor environment and operates in terms of Hessian-vector products. The software blocks needed for its implementation are the standard blocks of any existing 4D-Var system. The results given by the method are consistent with the assumption of a 'close-to-normal' nature of the optimal solution error. This should be expected, taking into account the consistency and asymptotic normality of the estimator and the fact that the observation window in variational DA is usually quite large.

4 Parameter Estimation

We should mention the importance of the parameter estimation problem itself. A precise determination of the initial condition is very important in view of forecasting; however, the use of variational data assimilation is not limited to operational forecasting. In many domains (e.g. hydrology) the uncertainty in the parameters is more crucial than the uncertainty in the initial condition (e.g. White et al. (2003)). In some problems the quantity of interest can be represented directly by the estimated parameters as controls. For example, in Agoshkov et al. (2015) the sea surface heat flux is estimated in order to understand its spatial and temporal variability. Parameter estimation problems are common inverse problems considered in geophysics and in engineering applications (see Alifanov et al. (1996), Sun (1994), Zhu and Navon (1999), Storch et al. (2007)). In recent years, interest in parameter estimation using 4D-Var has been rising (Bocquet (2012), Schirber et al. (2013), Smith et al. (2013), Yuepeng et al. (2018), Agoshkov and Sheloput (2017)).

We consider a dynamic formulation of the variational data assimilation problem for parameter estimation in a continuous form. Of course, the initial-condition function may also be considered as a parameter; however, in our dynamic formulation the model comprises two equations: one describing the evolution of the model state (and involving model parameters such as right-hand sides, coefficients, boundary conditions, etc.), and another specifying the initial condition.

Let the model be governed by the evolution problem of the form (1):

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi , \lambda ) +f, \quad t\in (0,T)\\ \varphi \bigl |_{t=0}&{}=&{} u, \\ \end{array} \right. \end{aligned}$$
(17)

where F is a nonlinear operator mapping \(Y\times Y_p\) into Y, \(Y_p\) is a Hilbert space (space of control parameters, or control space). Suppose that for given \(u\in X, f\in Y\) and \(\lambda \in Y_p\) there exists a unique solution \(\varphi \in Y\) to (17) with \(\frac{ \displaystyle \partial \varphi }{ \displaystyle \partial t}\in Y\). The function \(\lambda \) is an unknown model parameter.

Let us introduce the cost function

$$\begin{aligned} \begin{array}{rcl} J(\lambda )= & {} \displaystyle \frac{1}{2} (V_1(\lambda -\lambda _{b}), \lambda -\lambda _{b})_{Y_p}+\displaystyle \frac{1}{2} (V_2(C\varphi -\varphi _{obs}), C\varphi -\varphi _{obs})_{Y_{obs}}, \end{array} \end{aligned}$$
(18)

where \(\lambda _{b}\in Y_p\) is a prior (background) function, \(\varphi _{obs}\in Y_{obs}\) is a prescribed function (observational data), \(Y_{obs}\) is a Hilbert space (observation space), \(C:Y\rightarrow Y_{obs}\) is a linear bounded observation operator, \(V_1: Y_p\rightarrow Y_p\) and \(V_2: Y_{obs} \rightarrow Y_{obs}\) are symmetric positive definite bounded operators.

Let us consider the following data assimilation problem with the aim of estimating the parameter \(\lambda \): for given \(u\in X, f\in Y\), find \(\lambda \in Y_p\) and \(\varphi \in Y\) such that they satisfy (17), and on the set of solutions to (17) the functional \(J(\lambda )\) takes its minimum value, i.e.

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi , \lambda )+f, \quad t\in (0,T)\\ \varphi \bigl |_{t=0}&{}=&{} u, \\ J(\lambda )&{}=&{}\inf \limits _{{\displaystyle v}\in Y_p} J(v). \\ \end{array} \right. \end{aligned}$$
(19)

We suppose that the solution of (19) exists. Let us note that the solvability of the parameter estimation problems (or identifiability) has been addressed, e.g., in Chavent (1983), Navon (1998). To derive the optimality system, we assume the solution \(\varphi \) and the operator \(F(\varphi ,\lambda )\) in (17)–(18) are regular enough, and for \(v\in Y_p\) find the gradient of the functional J with respect to \(\lambda \):

$$ J'(\lambda )v = (V_1(\lambda -\lambda _{b}), v)_{Y_p}+ (V_2(C\varphi -\varphi _{obs}), C\phi )_{Y_{obs}} $$
$$\begin{aligned} = (V_1(\lambda -\lambda _{b}), v)_{Y_p}+ (C^*V_2(C\varphi -\varphi _{obs}), \phi )_{Y}, \end{aligned}$$
(20)

where \(\phi \) is the solution to the problem:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \phi }{\displaystyle \partial t}&{}=&{} F'_{\varphi } (\varphi , \lambda )\phi + F'_{\lambda }(\varphi , \lambda ) v,\\ \phi \bigl |_{t=0}&{}=&{} 0. \\ \end{array} \right. \end{aligned}$$
(21)

Here \(F'_{\varphi }(\varphi , \lambda ): Y\rightarrow Y, \; F'_{\lambda }(\varphi , \lambda ): Y_p\rightarrow Y\) are the Fréchet derivatives of F Marchuk et al. (1996) with respect to \(\varphi \) and \(\lambda \), correspondingly, and \(C^*\) is the adjoint operator to C defined by \((C\varphi ,\psi )_{Y_{obs}}=(\varphi ,C^*\psi )_Y, \; \varphi \in Y, \psi \in Y_{obs}\).

Let us consider the adjoint operator \((F'_{\varphi }(\varphi , \lambda ))^*: Y\rightarrow Y\) and introduce the adjoint problem:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi ^{*}}{\displaystyle \partial t}+ (F'_{\varphi }(\varphi , \lambda ))^*\varphi ^*&{}=&{} C^*V_2(C\varphi -\varphi _{obs}), \\ \varphi ^{*}\bigl |_{t=T}&{}=&{} 0. \\ \end{array} \right. \end{aligned}$$
(22)

Then (20) with (21) and (22) gives

$$ J'(\lambda ) v= (V_1(\lambda -\lambda _{b}), v)_{Y_p}- (\varphi ^*, F'_{\lambda }(\varphi , \lambda ) v)_Y= $$
$$ (V_1(\lambda -\lambda _{b}), v)_{Y_p}- ((F'_{\lambda }(\varphi , \lambda ))^*\varphi ^*, v)_{Y_p}, $$

where \((F'_{\lambda }(\varphi , \lambda ))^*:Y\rightarrow Y_p\) is the adjoint operator to \(F'_{\lambda }(\varphi , \lambda )\). Therefore, the gradient of J is defined by

$$\begin{aligned} J'(\lambda ) = V_1(\lambda -\lambda _{b}) -(F'_{\lambda }(\varphi , \lambda ))^*\varphi ^*. \end{aligned}$$
(23)
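A sketch of formula (23) for a scalar toy model \(\partial \varphi /\partial t=-\lambda \varphi \) (so that \(F'_{\lambda }(\varphi ,\lambda )=-\varphi \)), discretized by explicit Euler and differentiated in discretize-then-optimize fashion; the model, weights and data are illustrative assumptions, and a finite-difference check is included.

```python
import numpy as np

# scalar decay model: d(phi)/dt = -theta * phi; theta plays the role of lambda
N, dt, u0 = 40, 0.05, 1.3
v1, v2 = 1.0, 2.0                    # scalar weights V_1, V_2
theta_b = 0.8                        # background value lambda_b
rng = np.random.default_rng(1)
y = np.exp(-np.arange(N + 1) * dt) + 0.01 * rng.standard_normal(N + 1)

def forward(theta):
    phi = np.empty(N + 1); phi[0] = u0
    for k in range(N):
        phi[k + 1] = (1.0 - dt * theta) * phi[k]    # explicit Euler step
    return phi

def J(theta):
    d = forward(theta) - y
    return 0.5 * v1 * (theta - theta_b) ** 2 + 0.5 * v2 * np.sum(d * d)

def grad_J(theta):
    """Discrete analogue of (23): V1(lambda - lambda_b) - (F'_lambda)* phi*."""
    phi = forward(theta); d = phi - y
    a = v2 * d[N]                    # adjoint variable at the final level
    g = v1 * (theta - theta_b)
    for k in range(N - 1, -1, -1):
        g += a * (-dt * phi[k])      # (F'_lambda)* contribution: F'_lambda = -phi
        a = a * (1.0 - dt * theta) + v2 * d[k]      # adjoint step
    return g

eps = 1e-6
fd = (J(0.9 + eps) - J(0.9 - eps)) / (2 * eps)
print(np.isclose(grad_J(0.9), fd))   # True
```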

From (20)–(23) we get the optimality system (the necessary optimality conditions, Lions (1968)):

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi , \lambda )+f, \quad t\in (0,T), \\ \varphi \bigl |_{t=0}&{}=&{} u, \\ \end{array} \right. \end{aligned}$$
(24)
$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi ^{*}}{\displaystyle \partial t}+ (F'_{\varphi }(\varphi , \lambda ))^*\varphi ^*&{}=&{} C^*V_2(C\varphi -\varphi _{obs}) , \\ \varphi ^{*}\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(25)
$$\begin{aligned} V_1(\lambda -\lambda _{b}) -(F'_{\lambda }(\varphi , \lambda ))^*\varphi ^*=0. \end{aligned}$$
(26)

We assume that the system (24)–(26) has a unique solution. The system (24)–(26) may be considered as a generalized model \( \mathcal {A}(U)=0\) with the state variable \(U=(\varphi , \varphi ^*, \lambda )\), and it contains information about observations.

If the observation operator C is nonlinear, i.e. \(C\varphi =C(\varphi )\), then the right-hand side of the adjoint equation (25) contains \((C'_{\varphi })^*\) instead of \(C^*\) and all the analysis presented below is similar.

To compute the Hessian \(\mathcal{H}(\lambda )\) of the cost function (18) one should differentiate (24)–(25) and (23) with respect to \(\lambda \), following Sect. 3. Then, the action of the Hessian \(\mathcal{H}(\lambda )\) on a function \(w\in Y_p\) is defined by the successive solutions of the following problems:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \phi }{\displaystyle \partial t}-F'_{\varphi }(\varphi ,\lambda )\phi &{}=&{}F'_{\lambda }(\varphi ,\lambda )w, \quad t\in (0,T) \\ \phi \bigl |_{t=0}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(27)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{ \displaystyle \partial \phi ^*}{ \displaystyle \partial t}&{}-&{} (F'_{\varphi }(\varphi ,\lambda ))^*\phi ^*- (F''_{\varphi \varphi }(\varphi ,\lambda )\phi )^*\varphi ^*= (F''_{\lambda \varphi }(\varphi , \lambda )w)^*\varphi ^*-C^*V_2C\phi ,\\ \phi ^*\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(28)
$$\begin{aligned} \mathcal {H}(\lambda )w= V_1w - (F''_{\varphi \lambda }(\varphi , \lambda )\phi )^*\varphi ^* -(F''_{\lambda \lambda }(\varphi , \lambda )w)^*\varphi ^*- (F'_{\lambda }(\varphi , \lambda ))^*\phi ^*. \end{aligned}$$
(29)

The definition of the Hessian \(\mathcal{H}(\lambda )\) by (27)–(29) involves the second-order derivatives of the model operator F with respect to \(\varphi \) and \(\lambda \).

Numerical examples for computing the Hessian for the parameter estimation problems are presented in Gejadze et al. (2010).
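The exact way to apply \(\mathcal{H}(\lambda )\) is the second-order adjoint system (27)–(29). As a cheap consistency check (less accurate, as noted in Sect. 3, but useful for validating an implementation), one can also difference the first-order adjoint gradient of the previous sketch; the scalar decay model below is the same illustrative toy.

```python
import numpy as np

# same scalar decay toy as above, restated so this sketch is self-contained
N, dt, u0 = 40, 0.05, 1.3
v1, v2, theta_b = 1.0, 2.0, 0.8
rng = np.random.default_rng(1)
y = np.exp(-np.arange(N + 1) * dt) + 0.01 * rng.standard_normal(N + 1)

def grad_J(theta):
    phi = np.empty(N + 1); phi[0] = u0
    for k in range(N):
        phi[k + 1] = (1.0 - dt * theta) * phi[k]
    d = phi - y
    a, g = v2 * d[N], v1 * (theta - theta_b)
    for k in range(N - 1, -1, -1):
        g += a * (-dt * phi[k])
        a = a * (1.0 - dt * theta) + v2 * d[k]
    return g

def hess_vec_fd(theta, w, eps=1e-6):
    """H(lambda) w ~ (J'(lambda+eps*w) - J'(lambda-eps*w)) / (2 eps):
    a finite-difference surrogate for the second-order adjoint (27)-(29)."""
    return (grad_J(theta + eps * w) - grad_J(theta - eps * w)) / (2 * eps)

print(hess_vec_fd(0.9, 1.0))         # in the scalar case this is J''(0.9)
```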

5 Sensitivity Analysis

In the environmental sciences, mathematical models contain parameters which cannot be estimated precisely, because they are used to parametrize subgrid processes and therefore cannot be physically measured. Thus, it is important to be able to estimate the impact of these uncertainties on the outputs of the model after assimilation. The optimal solution depends on the parameters, which may contain uncertainties, and for forecasting it is very important to study the sensitivity of the optimal solution and its functionals with respect to these parameters Marchuk (1995), Cacuci (1981), Dontchev (1983), Griesse and Vexler (2007).

The necessary optimality condition is related to the gradient of the original cost function; thus, to study the sensitivity of the optimal solution, one should differentiate the optimality system with respect to the imprecisely known parameters. In this case, we arrive at a second-order adjoint problem Le Dimet et al. (2002). The first studies of the sensitivity of response functions after assimilation with the use of the second-order adjoint were done by Le Dimet et al. (1997) for the variational data assimilation problem aimed at restoring the initial condition, where sensitivity with respect to model parameters was considered. The equations of the forecast sensitivity to observations in four-dimensional (4D-Var) data assimilation were derived by Daescu (2008). Based on these results, a practical computational approach was given by Cioaca et al. (2013) to quantify the effect of observations in 4D-Var data assimilation.

Sensitivity of the optimal solution is related to its statistical properties (see Gejadze et al. (2008, 2011, 2013), Shutyaev et al. (2012)). A general sensitivity analysis in variational data assimilation with respect to observations for a nonlinear dynamic model was given in Shutyaev et al. (2017, 2018) for controls including the initial-value function and the model parameters.

Consider the mathematical model of a physical process that is described by the evolution problem of the form (17):

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi , \lambda ), \quad t\in (0,T)\\ \varphi \bigl |_{t=0}&{}=&{} u.\\ \end{array} \right. \end{aligned}$$
(30)

Suppose that for given \(u\in X\) and \(\lambda \in Y_p\) there exists a unique solution \(\varphi \in Y\) to (30).

We introduce the functional

$$\begin{aligned} J(u)=\frac{1}{2} (V_1(u-u_{0}), u-u_{0})_X+ \frac{1}{2} (V_2(C\varphi -\varphi _{obs}), C\varphi -\varphi _{obs})_{Y_{obs}}, \end{aligned}$$
(31)

where \(u_{0}\in X\) is a prior initial-value function (background state), \(\varphi _{obs}\in Y_{obs}\) is a prescribed function (observational data), \(Y_{obs}\) is a Hilbert space (observation space), \(C:Y\rightarrow Y_{obs}\) is a linear bounded operator, \(V_1: X\rightarrow X\) and \(V_2: Y_{obs} \rightarrow Y_{obs}\) are symmetric positive definite operators.

Consider the variational data assimilation problem with the aim to identify the initial condition: for given \(\lambda \in Y_p\) find \(u\in X\) and \(\varphi \in Y\) such that they satisfy (30), and on the set of solutions to (30), the functional J(u) takes the minimum value, i.e.

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{ \displaystyle \partial \varphi }{ \displaystyle \partial t}&{}=&{} F(\varphi ,\lambda ), \quad t\in (0,T)\\ \varphi \bigl |_{t=0}&{}=&{} u, \\ J(u)&{}=&{}\inf \limits _{ v}J(v). \\ \end{array} \right. \end{aligned}$$
(32)

The necessary optimality condition reduces the problem (32) to the optimality system:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi ,\lambda ), \quad t\in (0,T) \\ \varphi \bigl |_{t=0}&{}=&{} u, \\ \end{array} \right. \end{aligned}$$
(33)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \varphi ^{*}}{ \displaystyle \partial t}- (F'_{\varphi }(\varphi , \lambda ))^*\varphi ^*&{}=&{} -C^*V_2(C\varphi -\varphi _{obs}) , \quad t\in (0,T) \\ \varphi ^{*}\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(34)
$$\begin{aligned} V_1 (u-u_{0})-\varphi ^{*}\bigl |_{t=0}=0 \end{aligned}$$
(35)

with the unknowns \(\varphi , \varphi ^*\), u, where \((F'_{\varphi }(\varphi , \lambda ))^*\) is the adjoint to the Fréchet derivative of F with respect to \(\varphi \).

We assume that the system (33)–(35) has a unique solution. The system (33)–(35) may be considered as a generalized model \( \mathcal {F}(U, \lambda )=0\) with the state variable \(U=(\varphi , \varphi ^*, u)\), and it contains all the available information. All the components of U depend on the parameters \(\lambda \in Y_p\), which may contain uncertainties. An important issue is to study the sensitivity of this generalized model with respect to the parameters.

Let us introduce a response function \(G(\varphi , u, \lambda )\), which is supposed to be a real-valued function and can be considered as a functional on \(Y\times X\times Y_p\). We are interested in the sensitivity of G with respect to \(\lambda \), with \(\varphi \) and u obtained from the optimality system (33)–(35). As is known Marchuk (1995), Cacuci (1981), Dontchev (1983), sensitivity is defined by the gradient of G with respect to \(\lambda \), which is a functional derivative:

$$\begin{aligned} \frac{d {G}}{d \lambda }=\frac{\displaystyle \partial {G}}{ \displaystyle \partial \varphi }\frac{\displaystyle \partial {\varphi }}{\displaystyle \partial \lambda }+ \frac{\displaystyle \partial {G}}{\displaystyle \partial u}\frac{\displaystyle \partial {u}}{\displaystyle \partial \lambda }+\frac{\displaystyle \partial {G}}{\displaystyle \partial \lambda }. \end{aligned}$$
(36)

If \(\delta \lambda \) is a perturbation on \(\lambda \), we get from the optimality system:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \delta \varphi }{ \displaystyle \partial t}&{}=&{} F'_{\varphi }(\varphi ,\lambda )\delta \varphi +F'_{\lambda }(\varphi ,\lambda )\delta \lambda , \quad t\in (0,T) \\ \delta \varphi \bigl |_{t=0}&{}=&{} \delta u, \\ \end{array} \right. \end{aligned}$$
(37)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \delta \varphi ^{*}}{ \displaystyle \partial t}- (F'_{\varphi }(\varphi , \lambda ))^*\delta \varphi ^*- (F''_{\varphi \varphi }(\varphi ,\lambda )\delta \varphi +F''_{\varphi \lambda }(\varphi ,\lambda )\delta \lambda )^*\varphi ^*&{}=&{} -C^*V_2C\delta \varphi , \\ \delta \varphi ^{*}\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(38)
$$\begin{aligned} V_1 \delta u-\delta \varphi ^{*}\bigl |_{t=0}=0, \end{aligned}$$
(39)

and

$$\begin{aligned} \biggl (\frac{d {G}}{d \lambda }, \delta \lambda \biggr )_{Y_p}=\biggl (\frac{\partial {G}}{\partial \varphi }, \delta \varphi \biggr )_Y+ \biggl (\frac{\partial {G}}{\partial u}, \delta u\biggr )_X+\biggl (\frac{\partial {G}}{\partial \lambda }, \delta \lambda \biggr )_{Y_p}, \end{aligned}$$
(40)

where \(\delta {\varphi }\), \(\delta {\varphi ^{*}}\) and \(\delta u\) are the Gâteaux derivatives of \(\varphi \), \(\varphi ^{*}\) and u in the direction \(\delta \lambda \) (for example, \(\delta {\varphi }=\frac{\partial {\varphi }}{\partial \lambda }\delta \lambda \)).

To compute the gradient \(\nabla _\lambda G(\varphi , u, \lambda )\), let us introduce three adjoint variables \(P_1\in Y\), \(P_2\in Y\) and \(P_3\in X\). Taking the inner product of (37) with \(P_1\), of (38) with \(P_2\), and of (39) with \(P_3\), and adding the results, we obtain:

$$ \biggl (\delta \varphi , -\frac{ \partial P_1}{ \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*P_1- (F''_{\varphi \varphi }(\varphi ,\lambda )P_2)^*\varphi ^*+C^*V_2CP_2\biggr )_Y+ \biggl (\delta \varphi \bigl |_{t=T}, P_1\bigl |_{t=T}\biggr )_X+ $$
$$ +\biggl (\delta \varphi ^*, \frac{ \partial P_2}{ \partial t}-F'_{\varphi }(\varphi ,\lambda )P_2\biggr )_Y+\biggl (\delta \varphi ^*\bigl |_{t=0}, P_2\bigl |_{t=0}-P_3\biggr )_X+ $$
$$\begin{aligned} +\biggl (\delta u, -P_1\bigl |_{t=0}+ V_1P_3\biggr )_X+ \biggl (\delta \lambda , - (F'_{\lambda }(\varphi ,\lambda ))^*P_1-(F''_{\varphi \lambda }(\varphi ,\lambda )P_2)^*\varphi ^* \biggr )_{Y_p}=0. \end{aligned}$$
(41)

Here we put

$$ -\frac{ \partial P_1}{ \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*P_1- (F''_{\varphi \varphi }(\varphi ,\lambda )P_2)^*\varphi ^*+C^*V_2CP_2=\frac{\partial {G}}{\partial \varphi }, $$

and

$$ -P_1\bigl |_{t=0}+ V_1P_3=\frac{\partial {G}}{\partial u}, \; P_1\bigl |_{t=T}=0, \; \frac{ \partial P_2}{ \partial t}-F'_{\varphi }(\varphi ,\lambda )P_2=0,\; P_2\bigl |_{t=0}-P_3=0. $$

Hence, we can exclude the variable \(P_3\) by

$$ P_3=P_2\bigl |_{t=0} $$

and obtain the initial condition for \(P_2\) in the form:

$$ V_1P_2\bigl |_{t=0}=\frac{\partial {G}}{\partial u}+P_1\bigl |_{t=0}. $$

Thus, if \(P_1, P_2\) are the solutions of the following system of equations

$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial P_1}{\displaystyle \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*P_1- (F''_{\varphi \varphi }(\varphi ,\lambda )P_2)^*\varphi ^*+C^*V_2CP_2&{}=&{}\frac{\displaystyle \partial {G}}{\displaystyle \partial \varphi }, \quad t\in (0,T) \\ \\ P_1\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(42)
$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{ \displaystyle \partial P_2}{ \displaystyle \partial t}-F'_{\varphi }(\varphi ,\lambda )P_2&{}=&{}0, \quad t\in (0,T) \\ \\ V_1P_2\bigl |_{t=0}&{}=&{} \frac{\displaystyle \partial {G}}{\displaystyle \partial u}+P_1\bigl |_{t=0}, \\ \end{array} \right. \end{aligned}$$
(43)

then from (41) we get

$$ \biggl (\frac{\partial {G}}{\partial \varphi }, \delta \varphi \biggr )_Y+ \biggl (\frac{\partial {G}}{\partial u}, \delta u\biggr )_X=\biggl (\delta \lambda , (F'_{\lambda }(\varphi ,\lambda ))^*P_1+(F''_{\varphi \lambda }(\varphi ,\lambda )P_2)^*\varphi ^* \biggr )_{Y_p}, $$

and the gradient of G is given by

$$\begin{aligned} \frac{d {G}}{d \lambda }=(F'_{\lambda }(\varphi ,\lambda ))^*P_1+(F''_{\varphi \lambda }(\varphi ,\lambda )P_2)^*\varphi ^* +\frac{\partial {G}}{\partial \lambda }. \end{aligned}$$
(44)

We get a coupled system of two differential equations (42) and (43) of the first order with respect to time. One equation has a final condition (backward problem) while the other has an initial condition (forward problem) depending on the initial value for the first equation: it is a non-standard problem.

Let us represent the non-standard problem (42)–(43) in an equivalent form:

$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial P_1}{\displaystyle \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*P_1- (F''_{\varphi \varphi }(\varphi ,\lambda )P_2)^*\varphi ^*+C^*V_2CP_2&{}=&{}\frac{\displaystyle \partial {G}}{\displaystyle \partial \varphi }, \quad t\in (0,T) \\ \\ P_1\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(45)
$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial P_2}{\displaystyle \partial t}-F'_{\varphi }(\varphi ,\lambda )P_2&{}=&{}0, \quad t\in (0,T) \\ \\ P_2\bigl |_{t=0}&{}=&{} v, \\ \end{array} \right. \end{aligned}$$
(46)
$$\begin{aligned} V_1v-P_1\bigl |_{t=0}= \frac{\partial {G}}{\partial u}. \end{aligned}$$
(47)

Here we have three unknowns: \(v\in X, \; P_1, P_2 \in Y\). Let us write (45)–(47) in the form of an operator equation for v. We define the operator \(\mathcal {H}\) by the successive solution of the following problems:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \phi }{\displaystyle \partial t}-F'_{\varphi }(\varphi ,\lambda )\phi &{}=&{}0, \quad t\in (0,T) \\ \\ \phi \bigl |_{t=0}&{}=&{} w, \\ \end{array} \right. \end{aligned}$$
(48)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \phi ^*}{ \displaystyle \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*\phi ^*- (F''_{\varphi \varphi }(\varphi ,\lambda )\phi )^*\varphi ^*&{}=&{}-C^*V_2C\phi , \quad t\in (0,T) \\ \\ \phi ^*\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(49)
$$\begin{aligned} \mathcal {H}w= V_1w - \phi ^*\bigl |_{t=0}. \end{aligned}$$
(50)

Then (45)–(47) is equivalent to the following equation in X:

$$\begin{aligned} \mathcal {H}v= \mathcal {F} \end{aligned}$$
(51)

with the right-hand side \(\mathcal {F}\) defined by

$$\begin{aligned} {\mathcal {F}} = \frac{\partial {G}}{\partial u}+ \tilde{\phi }^*\bigl |_{t=0}, \end{aligned}$$
(52)

where \( \tilde{\phi }^*\) is the solution to the adjoint problem:

$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \tilde{\phi }^*}{\displaystyle \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*\tilde{\phi }^*&{}=&{}\frac{\displaystyle \partial {G}}{\displaystyle \partial \varphi }, \quad t\in (0,T) \\ \\ \tilde{\phi }^*\bigl |_{t=T}&{}=&{} 0. \\ \end{array} \right. \end{aligned}$$
(53)

It is easily seen that the operator \(\mathcal {H}\) defined by (48)–(50) is the Hessian of the original functional J considered on the optimal solution u of the problem (33)–(35): \(J''(u)=\mathcal {H}\). Under the assumption that \(\mathcal {H}\) is positive definite, the operator equation (51) is correctly and everywhere solvable in X, i.e. for every \(\mathcal {F}\) there exists a unique solution \(v\in X\) and

$$ \Vert v\Vert _X \le c \Vert {\mathcal {F}}\Vert _X, \;\; c=const > 0. $$

Therefore, under the assumption that \(J''(u)\) is positive definite on the optimal solution, the non-standard problem (42)–(43) has a unique solution \(P_1, P_2\in Y\).

Based on the above consideration, we can formulate the following algorithm to solve the non-standard problem:

(1) For \(\frac{\partial {G}}{\partial u}\in X, \; \frac{\partial {G}}{\partial \varphi }\in Y\) solve the adjoint problem

$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \tilde{\phi }^*}{\displaystyle \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*\tilde{\phi }^*&{}=&{}\frac{\displaystyle \partial {G}}{\displaystyle \partial \varphi }, \quad t\in (0,T) \\ \\ \tilde{\phi }^*\bigl |_{t=T}&{}=&{} 0\\ \end{array} \right. \end{aligned}$$
(54)

and put

$$ \mathcal {F} = \frac{\partial {G}}{\partial u}+ \tilde{\phi }^*\bigl |_{t=0}. $$

(2) Find v by solving

$$ \mathcal {H}v= \mathcal {F} $$

with the Hessian of the original functional J defined by (48)–(50).

(3) Solve successively the direct and adjoint problems

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial P_2}{\displaystyle \partial t}-F'_{\varphi }(\varphi ,\lambda )P_2&{}=&{}0, \quad t\in (0,T) \\ \\ P_2\bigl |_{t=0}&{}=&{} v, \\ \end{array} \right. \end{aligned}$$
(55)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \tilde{P}_1}{\displaystyle \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*\tilde{P}_1- (F''_{\varphi \varphi }(\varphi ,\lambda )P_2)^*\varphi ^*+C^*V_2CP_2&{}=&{}0, \quad t\in (0,T) \\ \\ \tilde{P}_1\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(56)

and put

$$ P_1=\tilde{P}_1 +\tilde{\phi }^*. $$

Thus, we obtain \(P_1, P_2 \in Y\) as the solutions to the non-standard problem (42)–(43), which determine the sensitivity of the response function with respect to imprecisely known parameters according to (44).
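A sketch of steps (1)–(3) for a linear-quadratic toy: the model \(\partial \varphi /\partial t=A\varphi +\lambda \) with \(\lambda \) a constant-in-time forcing vector, so that \(F'_{\varphi }=A\), \(F'_{\lambda }\) is the injection of a constant vector (its adjoint is a time integral), and all second derivatives vanish. The response is \(G=c^T\varphi (T)\), hence \(\partial G/\partial u=0\), \(\partial G/\partial \lambda =0\), and \(\partial G/\partial \varphi \) is a terminal source. Because the problem is linear-quadratic, the brute-force check (re-assimilating u for a perturbed \(\lambda \) and differencing G) reproduces the adjoint sensitivity to round-off. Everything here is an illustrative assumption, not the chapter's own example.

```python
import numpy as np

n, N, dt = 4, 40, 0.02
rng = np.random.default_rng(2)
A = -np.eye(n) + 0.3 * rng.standard_normal((n, n))
M = np.eye(n) + dt * A                  # Euler step: phi_{k+1} = M phi_k + dt*lam
V1, V2 = np.eye(n), 0.5 * np.eye(n)
u_b = rng.standard_normal(n)
lam = rng.standard_normal(n)            # imprecisely known forcing parameter
y = rng.standard_normal((N + 1, n))
c = rng.standard_normal(n)              # response G = c^T phi_N

def forward(u, lam):
    phi = np.empty((N + 1, n)); phi[0] = u
    for k in range(N):
        phi[k + 1] = M @ phi[k] + dt * lam
    return phi

def backward(src):
    """Adjoint sweep p_k = M^T p_{k+1} + src_k; returns all time levels."""
    p = np.empty((N + 1, n)); p[N] = src[N]
    for k in range(N - 1, -1, -1):
        p[k] = M.T @ p[k + 1] + src[k]
    return p

H = V1 + V2; P = np.eye(n)              # Hessian of J wrt u (use CG at scale)
for k in range(N):
    P = M @ P
    H += P.T @ V2 @ P

def assimilate(lam):
    """Optimal u for given lam: solves grad_u J = 0 (exact in the linear case)."""
    s = forward(np.zeros(n), lam)
    return np.linalg.solve(H, V1 @ u_b + backward((y - s) @ V2)[0])

def G_of(lam):
    return c @ forward(assimilate(lam), lam)[N]

# step (1): adjoint problem (54) with source dG/dphi; F = dG/du + tilde_phi*(0)
src = np.zeros((N + 1, n)); src[N] = c
phit = backward(src)
Fvec = phit[0]                          # dG/du = 0 for this response
# step (2): solve H v = F
v = np.linalg.solve(H, Fvec)
# step (3): forward problem (55), adjoint problem (56), then P1 = ~P1 + ~phi*
P2 = np.empty((N + 1, n)); P2[0] = v
for k in range(N):
    P2[k + 1] = M @ P2[k]
P1 = backward(-(P2 @ V2)) + phit
dG_dlam = dt * P1[1:].sum(axis=0)       # (44): (F'_lambda)* P1; F'' terms vanish

eps = 1e-5                              # brute-force re-assimilation check
fd = np.array([(G_of(lam + eps * e) - G_of(lam - eps * e)) / (2 * eps)
               for e in np.eye(n)])
print(np.allclose(dG_dlam, fd))         # True
```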

6 Sensitivity with Respect to Observations

In geophysical applications, a usual request is the estimation of the sensitivity with respect to observations Langland and Baker (2004), Daescu and Langland (2013), Kalnay et al. (2012), Godinez and Daescu (2009). What will be the impact of an uncertainty on the prediction? Observations are not directly used in the forward model; they are involved only as a forcing term in the adjoint model. Therefore, to apply the general formalism of sensitivity analysis, we should apply it not to the model itself but to the optimality system, i.e. the model plus the adjoint model. A very simple example with a scalar ordinary differential equation is given in Le Dimet et al. (2002), showing that the model alone is not sufficient to carry out sensitivity analysis in the presence of data. Differentiating the optimality system introduces second-order derivatives.

Consider the mathematical model governed by the nonlinear evolution problem of the form (17):

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi , \lambda ) +f, \quad t\in (0,T)\\ \varphi \bigl |_{t=0}&{}=&{} u.\\ \end{array} \right. \end{aligned}$$
(57)

We suppose that for given \(u\in X, f\in Y\) and \(\lambda \in Y_p\) there exists a unique solution \(\varphi \in Y\) to (57) with \(\frac{ \displaystyle \partial \varphi }{ \displaystyle \partial t}\in Y\). The function \(\lambda \) is an unknown model parameter, and we suppose that the initial state u is also unknown, so we will consider a joint parameter and state estimation problem.

Let us introduce the cost function as a functional on \(X\times Y_p\) in the form

$$\begin{aligned} J(u, \lambda )=\frac{1}{2} \Vert V_1^{1/2}(u-u_{b})\Vert _{X}^2 +\frac{1}{2} \Vert V_2^{1/2}(\lambda -\lambda _{b})\Vert _{Y_p}^2 +\frac{1}{2} \Vert V_3^{1/2}(C\varphi -\varphi _{obs})\Vert _{Y_{obs}}^2, \end{aligned}$$
(58)

where \(u_b\in X, \lambda _{b}\in Y_p\) are prior (background) functions, \(\varphi _{obs}\in Y_{obs}\) is a prescribed function (observational data), \(Y_{obs}\) is a Hilbert space (observation space), \(C:Y\rightarrow Y_{obs}\) is a linear bounded operator (observation operator), \(V_1: X\rightarrow X, V_2: Y_p\rightarrow Y_p\) and \(V_3: Y_{obs} \rightarrow Y_{obs}\) are symmetric positive definite bounded operators.

Let us consider the following data assimilation problem with the aim of finding the initial value u and the parameter \(\lambda \): for given \(f\in Y, \varphi _{obs}\in Y_{obs}\), find \(u\in X, \lambda \in Y_p\) and \(\varphi \in Y\) such that they satisfy (57), and on the set of solutions to (57) the functional \(J(u, \lambda )\) takes its minimum value, i.e.

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi , \lambda )+f, \quad t\in (0,T)\\ \varphi \bigl |_{t=0}&{}=&{} u, \\ J(u, \lambda )&{}=&{}\inf \limits _{{\displaystyle w}\in X, {\displaystyle v}\in Y_p} J(w, v). \\ \end{array} \right. \end{aligned}$$
(59)

We suppose that the solution of (59) exists. The necessary optimality condition reduces (59) to the optimality system:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi }{\displaystyle \partial t}&{}=&{} F(\varphi , \lambda )+f, \quad t\in (0,T), \\ \varphi \bigl |_{t=0}&{}=&{} u, \\ \end{array} \right. \end{aligned}$$
(60)
$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \varphi ^{*}}{\displaystyle \partial t}+ (F'_{\varphi }(\varphi , \lambda ))^*\varphi ^*&{}=&{} C^*V_3(C\varphi -\varphi _{obs}) , \quad t\in (0,T) \\ \varphi ^{*}\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(61)
$$\begin{aligned} V_1(u-u_{b}) - \varphi ^*\bigl |_{t=0}=0, \end{aligned}$$
(62)
$$\begin{aligned} V_2(\lambda -\lambda _{b}) -(F'_{\lambda }(\varphi , \lambda ))^*\varphi ^*=0. \end{aligned}$$
(63)

Here \(F'_{\varphi }(\varphi , \lambda ): Y\rightarrow Y, \; F'_{\lambda }(\varphi , \lambda ): Y_p\rightarrow Y\) are the Fréchet derivatives of F with respect to \(\varphi \) and \(\lambda \), correspondingly, and \(C^*\) is the adjoint operator to C defined by \((C\varphi ,\psi )_{Y_{obs}}=(\varphi ,C^*\psi )_Y, \; \varphi \in Y, \psi \in Y_{obs}\).

Supposing that the system (60)–(63) has a unique solution \(\varphi , \varphi ^{*}\in Y, u\in X, \lambda \in Y_p\), we will study the sensitivity of functionals of the optimal solution with respect to the observation data \(\varphi _{obs}\).

We introduce a response function \(G(\varphi , u, \lambda )\), which is supposed to be a real-valued function and can be considered as a functional on \(Z=Y\times X\times Y_p\). We are interested in the sensitivity of G with respect to \(\varphi _{obs}\), with \(\varphi , u\) and \(\lambda \) obtained from the optimality system (60)–(63). The sensitivity is defined by the gradient of G with respect to \(\varphi _{obs}\):

$$\begin{aligned} \frac{d {G}}{d \varphi _{obs}}=\frac{\displaystyle \partial {G}}{ \displaystyle \partial \varphi }\frac{\displaystyle \partial {\varphi }}{\displaystyle \partial \varphi _{obs}}+ \frac{\displaystyle \partial {G}}{\displaystyle \partial \lambda }\frac{\displaystyle \partial {\lambda }}{\displaystyle \partial \varphi _{obs}}+ \frac{\displaystyle \partial {G}}{\displaystyle \partial u}\frac{\displaystyle \partial {u}}{\displaystyle \partial \varphi _{obs}}, \end{aligned}$$
(64)

where \(\frac{\displaystyle \partial {G}}{ \displaystyle \partial \varphi }: Z\rightarrow Y, \frac{\displaystyle \partial {G}}{\displaystyle \partial \lambda }: Z\rightarrow Y_p, \frac{\displaystyle \partial {G}}{\displaystyle \partial u}: Z\rightarrow X\), and \(\frac{\displaystyle \partial {\varphi }}{\displaystyle \partial \varphi _{obs}}, \frac{\displaystyle \partial {\lambda }}{\displaystyle \partial \varphi _{obs}}, \frac{\displaystyle \partial {u}}{\displaystyle \partial \varphi _{obs}}\) are the Gâteaux derivatives of \(\varphi \), \(\lambda , u\) with respect to \(\varphi _{obs}\).

Let \(\delta \varphi _{obs}\) be a perturbation on \(\varphi _{obs}\), then we obtain from the optimality system (60)–(63):

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \delta \varphi }{ \displaystyle \partial t}&{}=&{} F_{\varphi }'(\varphi , \lambda )\delta \varphi + F_{\lambda }'(\varphi , \lambda )\delta \lambda ,\quad t\in (0,T) \\ \delta \varphi \bigl |_{t=0}&{}=&{} \delta u, \\ \end{array} \right. \end{aligned}$$
(65)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \delta \varphi ^{*}}{ \displaystyle \partial t}- (F'_{\varphi }(\varphi , \lambda ))^*\delta \varphi ^* -(F''_{\varphi \varphi }(\varphi , \lambda )\delta \varphi )^*\varphi ^*&{}=&{} (F''_{\varphi \lambda }(\varphi , \lambda )\delta \lambda )^*\varphi ^*\\ -C^*V_3(C\delta \varphi -\delta \varphi _{obs}) , \\ \delta \varphi ^{*}\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(66)
$$\begin{aligned} V_1\delta u - \delta \varphi ^*\bigl |_{t=0}=0, \end{aligned}$$
(67)
$$\begin{aligned} V_2 \delta \lambda - (F''_{\lambda \varphi }(\varphi , \lambda )\delta \varphi )^*\varphi ^* - (F''_{\lambda \lambda }(\varphi , \lambda )\delta \lambda )^*\varphi ^*-(F'_{\lambda }(\varphi , \lambda ))^*\delta \varphi ^*=0, \end{aligned}$$
(68)

and

$$\begin{aligned} \biggl (\frac{d {G}}{d \varphi _{obs}}, \delta \varphi _{obs}\biggr )_{Y_{obs}}=\biggl (\frac{\partial {G}}{\partial \varphi }, \delta \varphi \biggr )_Y+ \biggl (\frac{\partial {G}}{\partial \lambda }, \delta \lambda \biggr )_{Y_p}+ \biggl (\frac{\partial {G}}{\partial u}, \delta u\biggr )_{X}, \end{aligned}$$
(69)

where \(\delta {\varphi }\), \(\delta {\varphi ^{*}}\), \(\delta \lambda , \delta u\) are the solutions of (65)–(68).

Following the methodology presented in Sect. 5, we obtain the gradient of G through solutions of a non-standard problem.

Let \(P_1, P_2\in Y, P_3\in Y_p, P_4\in X\) be the solutions of the following system of equations

$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{ \displaystyle \partial P_1}{\displaystyle \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*P_1- (F''_{\varphi \varphi }(\varphi ,\lambda )P_2)^*\varphi ^*&{}=&{} (F''_{\lambda \varphi }(\varphi , \lambda )P_3)^*\varphi ^*-C^*V_3CP_2\\ +\frac{\displaystyle \partial {G}}{\displaystyle \partial \varphi },\\ P_1\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(70)
$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial P_2}{ \displaystyle \partial t}-F'_{\varphi }(\varphi ,\lambda )P_2-F'_{\lambda }(\varphi ,\lambda )P_3&{}=&{}0, \quad t\in (0,T) \\ P_2\bigl |_{t=0}-P_4&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(71)
$$\begin{aligned} V_1P_4-P_1\bigl |_{t=0}=\frac{\partial {G}}{\partial u}, \end{aligned}$$
(72)
$$\begin{aligned} V_2P_3 - (F''_{\varphi \lambda }(\varphi , \lambda )P_2)^*\varphi ^*- (F''_{\lambda \lambda }(\varphi , \lambda )P_3)^*\varphi ^*- (F'_{\lambda }(\varphi , \lambda ))^*P_1=\frac{\partial {G}}{\partial \displaystyle \lambda }, \end{aligned}$$
(73)

where \(\varphi , \varphi ^{*}\in Y, u\in X, \lambda \in Y_p\) are the solution of the optimality system (60)–(63). Then the gradient of G with respect to \(\varphi _{obs}\) is given by

$$\begin{aligned} \frac{d {G}}{d \varphi _{obs}}=V_3CP_2. \end{aligned}$$
(74)

We obtain a coupled system of two differential equations (70) and (71) of the first order with respect to time, with additional conditions (72)–(73). To study this non-standard problem (70)–(73) with mutually dependent initial conditions for \(P_1, P_2\), we reduce it to a single operator equation involving the Hessian of the original cost function \(J(u,\lambda )\).

The Hessian \(\mathcal {H}: X\times Y_p\rightarrow X\times Y_p\) acts on \(U=(w,v)^T \in X\times Y_p\) and is defined by the successive solution of the following problems:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \phi }{\displaystyle \partial t}-F'_{\varphi }(\varphi ,\lambda )\phi &{}=&{}F'_{\lambda }(\varphi ,\lambda )v, \quad t\in (0,T) \\ \phi \bigl |_{t=0}&{}=&{} w, \\ \end{array} \right. \end{aligned}$$
(75)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{ \displaystyle \partial \phi ^*}{ \displaystyle \partial t}- (F'_{\varphi }(\varphi ,\lambda ))^*\phi ^*- (F''_{\varphi \varphi }(\varphi ,\lambda )\phi )^*\varphi ^*&{}=&{} (F''_{\lambda \varphi }(\varphi , \lambda )w)^*\varphi ^*-C^*V_3C\phi ,\\ \phi ^*\bigl |_{t=T}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(76)
$$\begin{aligned} \mathcal {H}U= \biggl ( V_1w-\phi ^*\bigl |_{t=0}, \; V_2v - (F''_{\varphi \lambda }(\varphi , \lambda )\phi )^*\varphi ^*- (F''_{\lambda \lambda }(\varphi , \lambda )w)^*\varphi ^*- (F'_{\lambda }(\varphi , \lambda ))^*\phi ^*\biggr )^T, \end{aligned}$$
(77)

where \(\lambda , u, \varphi \) and \(\varphi ^*\) are the solutions of the optimality system (60)–(63). It is easily seen that (70)–(73) is equivalent to the following equation in \(X\times Y_p\):

$$\begin{aligned} \mathcal {H}U= \mathcal {F} \end{aligned}$$
(78)

with a right-hand side \(\mathcal {F}\in X\times Y_p\) specified below.

Under the assumption that \(\mathcal {H}\) is positive definite, the operator equation (78) is correctly and everywhere solvable in \(X\times Y_p\), i.e. for every \(\mathcal {F}\) there exists a unique solution \(U\in X\times Y_p\) and the estimate is valid:

$$ \Vert U\Vert _{X\times Y_p} \le c \Vert {\mathcal {F}}\Vert _{X\times Y_p}, \;\; c=const > 0. $$

Therefore, under the assumption that \(J''(u,\lambda )\) is positive definite on the optimal solution, the non-standard problem (70)–(73) has a unique solution \(P_1, P_2\in Y, P_3\in Y_p, P_4\in X\).

Based on (70)–(74), we can formulate the following algorithm to compute the gradient of the response function G:

(1) For \(\frac{\displaystyle \partial {G}}{\displaystyle \partial \lambda }\in Y_p, \; \frac{\displaystyle \partial {G}}{\displaystyle \partial \varphi }\in Y,\; \frac{\displaystyle \partial {G}}{\displaystyle \partial u}\in X\) solve the adjoint problem

$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \tilde{\phi }^*}{\displaystyle \partial t}- (F'_{\varphi }(\varphi , \lambda ))^*\tilde{\phi }^*&{}=&{}\frac{\displaystyle \partial {G}}{\displaystyle \partial \varphi }, \quad t\in (0,T) \\ \tilde{\phi }^*\bigl |_{t=T}&{}=&{} 0\\ \end{array} \right. \end{aligned}$$
(79)

and put

$$ \mathcal {F} = \biggl (\frac{\partial {G}}{\partial u}+\tilde{\phi }^*\bigl |_{t=0}, \; \frac{\partial {G}}{\partial \lambda }+ (F'_{\lambda }(\varphi , \lambda ))^*\tilde{\phi }^*\biggr )^T. $$

(2) Find \(U=(w, v)^T\) by solving

$$ \mathcal {H}U= \mathcal {F} $$

with the Hessian of the original functional J defined by (75)–(77).

(3) Solve the direct problem

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial P_2}{\displaystyle \partial t}-F'_{\varphi }(\varphi ,\lambda )P_2&{}=&{}F'_{\lambda }(\varphi ,\lambda )v, \quad t\in (0,T) \\ P_2\bigl |_{t=0}&{}=&{} w. \\ \end{array} \right. \end{aligned}$$
(80)

(4) Compute the gradient of the response function as

$$\begin{aligned} \frac{d {G}}{d \varphi _{obs}}=V_3CP_2. \end{aligned}$$
(81)

The last formula allows us to estimate the sensitivity of the response functions related to the optimal solution after assimilation, with respect to observation data.
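In practice, step (2) dominates the cost of this algorithm. Since \(\mathcal {H}\) is self-adjoint and, by assumption, positive definite, equation (78) can be solved matrix-free by the conjugate gradient method, each iteration invoking one Hessian application (75)–(77). A minimal sketch of the four steps, reusing the hypothetical `hessian_action` from above and assuming routines `solve_adjoint_79` and `solve_tangent` for problems (79) and (80), with all fields flattened into vectors:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def response_function_gradient(dG_du, dG_dlambda, dG_dphi, model):
    """Steps (1)-(4): gradient of G with respect to the observations, formula (81)."""
    # Step (1): adjoint problem (79) driven by dG/dphi, then assemble F.
    phi_tilde = model.solve_adjoint_79(dG_dphi)
    F = np.concatenate([dG_du + phi_tilde.at_initial_time(),
                        dG_dlambda + model.d1_lambda_adj(phi_tilde)])

    # Step (2): solve H U = F by conjugate gradients (H is SPD by assumption).
    n, m = dG_du.size, dG_dlambda.size
    matvec = lambda U: np.concatenate(hessian_action(U[:n], U[n:], model))
    H = LinearOperator((n + m, n + m), matvec=matvec, dtype=float)
    U, info = cg(H, F)
    assert info == 0, "conjugate gradients did not converge"
    w, v = U[:n], U[n:]

    # Step (3): direct (tangent linear) problem (80).
    P2 = model.solve_tangent(w, v)

    # Step (4): the gradient (81), with C the observation operator, V3 its weight.
    return model.V3 @ model.C(P2)
```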

7 Application for a Sea Thermodynamics Model

We consider the sea thermodynamics problem in the form Marchuk et al. (1987):

$$ T_t +(\bar{U},\mathrm {Grad})T-\mathrm {Div}(\hat{a}_T \cdot \mathrm {Grad} T)=f_T\; \text{ in } \; D\times (t_{0} ,t_1 ), $$
$$ T=T_{0} \text{ for } t=t_{0} \text{ in } D, $$
$$\begin{aligned} -\nu _T \frac{\partial T}{\partial z}=Q \text{ on } \Gamma _S \times (t_{0} ,t_1), \quad \frac{\partial T}{\partial n }=0 \text{ on } \Gamma _{w,c} \times (t_{0} ,t_1 ), \end{aligned}$$
(82)
$$ \bar{U}_n^{\left( - \right) } T+\frac{\partial T}{\partial n }= { Q_T } \text{ on } \Gamma _{w,op} \times (t_{0} ,t_1), $$
$$ \displaystyle \frac{\partial T}{\partial n}=0 \text{ on } \Gamma _H \times (t_{0} ,t_1 ), $$

where \(T=T(x,y,z,t)\) is an unknown temperature function, \(t\in (t_0,t_1)\), \((x,y,z)\in D = \Omega \times (0,H)\), \(\Omega \subset R^2\), \(H=H(x,y)\) is the bottom relief function, \(Q=Q(x,y,t)\) is the total heat flux, \(\bar{U} = (u,v,w)\), \(\widehat{a}_{T}=\;\)diag\(((a_{T})_{ii})\), \((a_T)_{11} = (a_T)_{22} = \mu _T\), \((a_T)_{33} = \nu _T\), \(f_T = f_T(x,y,z,t)\) are given functions. The boundary of the domain \(\Gamma \equiv \partial D\) is represented as a union of four disjoint parts \(\Gamma _S\), \(\Gamma _{w,op}\), \(\Gamma _{w,c}\), \(\Gamma _{H}\), where \(\Gamma _{S} = \Omega \) (the unperturbed sea surface), \(\Gamma _{w,op}\) is the liquid (open) part of the vertical lateral boundary, \(\Gamma _{w,c}\) is the solid part of the vertical lateral boundary, \(\Gamma _{H}\) is the sea bottom, \(\bar{U}_n^{(-)}=(|\bar{U}_n| - \bar{U}_n)/2,\) and \(\bar{U}_n\) is the normal component of \(\bar{U}\). The other notations and a detailed description of the problem statement can be found in Agoshkov et al. (2008).

Problem (82) can be written in the form of an operator equation:

$$\begin{aligned} \begin{array}{c} T_t + LT = \mathcal {F}+BQ, \quad t\in (t_{0},t_1),\\[2mm] T=T_{0},\quad \;\;\; t=t_{0}, \end{array} \end{aligned}$$
(83)

where the equality is understood in the weak sense, namely,

$$\begin{aligned} (T_t,\widehat{T}) + (LT,\widehat{T}) = \mathcal {F}(\widehat{T})+ (B Q,\widehat{T}) \;\; \forall \widehat{T}\in W_2^1(D), \end{aligned}$$
(84)

and the operators L, \(\mathcal {F}\), B are defined by the following relations:

$$ (LT,\widehat{T}) \equiv \int \limits _{D} (-T \mathrm {Div}(\bar{U}\widehat{T}))d D +\int \limits _{\Gamma _{w,op}} \bar{U}_n^{(+)} T\widehat{T} d\Gamma + \int \limits _{D} \widehat{a}_{T}\mathrm {Grad}(T)\cdot \mathrm {Grad}(\widehat{T})d D, $$
$$ \mathcal {F}(\widehat{T}) = \int \limits _{\Gamma _{w,op}} Q_T \widehat{T} d\Gamma + \int \limits _D f_T \widehat{T} dD, \quad (T_t,\widehat{T})=\int \limits _{D} T_t \widehat{T} d D, \quad (B Q,\widehat{T}) = \int \limits _{\Omega } Q \widehat{T}\big |_{z=0} d\Omega , $$

and the functions \(\widehat{a}_T\), \(Q_T\), \(f_T\), Q are such that equality (84) makes sense. The properties of the operator L were studied in Agoshkov et al. (2008).

Problem (82) is linear in \(T, Q\); however, written in the form (83), it is a particular case of the original problem (57), and all the reasoning and methodology presented in Sect. 6 carry over directly to problem (83), understood in the weak sense (84).

We consider the data assimilation problem for the sea surface temperature (see Agoshkov et al. (2008)). Suppose that the functions \(Q\in L_2(\Omega \times (t_0, t_1))\) and \(T_0\in L_2(D)\) are unknown in problem (82). Let also \(T_{obs}(x,y,t)\in L_2(\Omega \times (t_0, t_1))\) be the function on \(\Omega \) obtained for \(t\in (t_{0},t_1)\) by processing the observational data; in its physical sense this function is an approximation to the surface temperature on \(\Omega \), i.e. to \(T\big |_{z=0}\). We admit the case when \(T_{obs}\) is defined only on some subset of \(\Omega \times (t_0, t_1)\), whose indicator (characteristic) function we denote by \(m_0\); for the sake of definiteness, we assume that \(T_{obs}\) is zero outside this subset.

Consider the data assimilation problem for the surface temperature in the following form: find \(T_0\) and Q such that

$$\begin{aligned} \left\{ \begin{array}{rcl} T_t + LT &{}=&{} \mathcal {F} + BQ \;\; \text{ in } \;\; D\times (t_0,t_1),\\[2mm] T &{}=&{} T_0, \;\; \;\; t=t_0\\[2mm] J(T_0, Q) &{}=&{} \inf \limits _{w,v} J(w,v), \end{array}\right. \end{aligned}$$
(85)

where

$$ J(T_0, Q) = \frac{\alpha }{2}\int \limits _{t_0}^{t_1}\int \limits _{\Omega } |Q-Q^{(0)}|^2 d\Omega dt+ \frac{\beta }{2}\int \limits _{D} |T_0-T^{(0)}|^2 dD+ $$
$$\begin{aligned} + \frac{1}{2}\int \limits _{t_0}^{t_1}\int \limits _{\Omega } m_0|T\big |_{z=0}-T_{obs}|^2 d\Omega dt, \end{aligned}$$
(86)

and \(Q^{(0)}=Q^{(0)}(x,y,t), T^{(0)}=T^{(0)}(x,y,z) \) are given functions, \(\alpha , \beta =const>0\).
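In discrete form the functional (86) is a straightforward sum of weighted squared misfits. A minimal numpy sketch, assuming uniform grid cells with surface area `dA`, cell volume `dV` and time step `dt` (all names here are illustrative):

```python
import numpy as np

def cost_J(T0, Q, T_surf, T_obs, m0, Q0, T0_prior, alpha, beta, dt, dA, dV):
    """Discrete analogue of the cost functional (86).

    T_surf       -- surface temperature T|_{z=0}, shape (nt, ny, nx)
    T_obs, m0    -- observations and their indicator function, same shape
    Q, Q0        -- heat flux and its background Q^(0), shape (nt, ny, nx)
    T0, T0_prior -- initial state and its background T^(0), shape (nz, ny, nx)
    """
    j_flux = 0.5 * alpha * np.sum((Q - Q0) ** 2) * dA * dt       # first term of (86)
    j_init = 0.5 * beta * np.sum((T0 - T0_prior) ** 2) * dV      # second term of (86)
    j_obs = 0.5 * np.sum(m0 * (T_surf - T_obs) ** 2) * dA * dt   # observation misfit
    return j_flux + j_init + j_obs
```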

For \(\alpha ,\beta >0\) this variational data assimilation problem has a unique solution. The existence of the optimal solution follows from the classical results of the theory of optimal control problems Lions (1968).

The optimality system determining the solution of the formulated variational data assimilation problem according to the necessary condition \(\mathrm {grad} J=0\) has the form:

$$\begin{aligned} \begin{array}{c} T_t + LT = \mathcal {F}+BQ \quad \text{ in } \;\; D\times (t_0,t_1), \\[2mm] T = T_0, \quad \;\; t=t_0, \end{array} \end{aligned}$$
(87)
$$\begin{aligned} \begin{array}{c} -(T^*)_t + L^* T^* = Bm_0(T_\mathrm{obs}-T) \; \text{ in } \;\; D\times (t_0,t_1), \\[2mm] T^* = 0, \quad \;\; t=t_1, \end{array} \end{aligned}$$
(88)
$$\begin{aligned} \alpha (Q-Q^{(0)}) - T^* =0 \quad \text{ on } \;\; \Omega \times (t_0,t_1), \end{aligned}$$
(89)
$$\begin{aligned} \beta (T_0-T^{(0)}) - T^*\bigl |_{t=t_0} =0 \quad \text{ in } \;\; D, \end{aligned}$$
(90)

where \(L^*\) is the operator adjoint to L.

Here the boundary-value function Q plays the role of \(\lambda \) from Sect. 6, \(\varphi =T\), the operator F has the form \(F(T,Q)=-LT+BQ\), and \(F'_T=-L, F'_Q=B\). Since the operator \(F(T,Q)\) is linear in this case and \(F''_{TT}=F''_{QT}=F''_{QQ}=0\), the Hessian \(\mathcal {H}\) acting on \(U=(w,\psi )^T\), \(w\in L_2(D), \psi \in L_2(\Omega \times (t_0, t_1))\), is defined by the successive solution of the following problems:

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial \phi }{\displaystyle \partial t}+L\phi &{}=&{}B\psi , \quad t\in (t_0,t_1) \\ \phi \bigl |_{t=t_0}&{}=&{} w, \\ \end{array} \right. \end{aligned}$$
(91)
$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \phi ^*}{ \displaystyle \partial t} +L^*\phi ^*&{}=&{}-Bm_0\phi , \quad t\in (t_0,t_1) \\ \phi ^*\bigl |_{t=t_1}&{}=&{} 0, \\ \end{array} \right. \end{aligned}$$
(92)
$$\begin{aligned} \mathcal {H}U= (\beta w -\phi ^*\bigl |_{t=t_0}, \alpha \psi - B^*\phi ^*)^T. \end{aligned}$$
(93)
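Since the model is linear, each application of this Hessian costs exactly one forward solve (91) and one adjoint solve (92), with no second-derivative terms. A minimal sketch under the same conventions as before; `solve_forward`, `solve_adjoint`, the operator B, and the surface trace are hypothetical placeholders:

```python
def hessian_action_sst(w, psi, ops, alpha, beta):
    """Apply the Hessian (91)-(93) to U = (w, psi) for the linear problem.

    Hypothetical interface of `ops`:
      solve_forward(w, f) -- solves (91) with phi|_{t=t0} = w and forcing f
      solve_adjoint(g)    -- solves (92) backward from phi*|_{t=t1} = 0
      B(q), B_adj(p)      -- the control operator (surface flux -> forcing)
                             and its adjoint (field -> surface trace)
      m0, surface(phi)    -- observation indicator and the trace at z = 0
    """
    phi = ops.solve_forward(w, ops.B(psi))                           # problem (91)
    phi_star = ops.solve_adjoint(-ops.B(ops.m0 * ops.surface(phi)))  # problem (92)
    return (beta * w - phi_star.at_initial_time(),                   # first row of (93)
            alpha * psi - ops.B_adj(phi_star))                       # second row of (93)
```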

To illustrate the above-presented theory, we consider the problem of sensitivity of functionals of the optimal solution \(T_0, Q\) to the observations \(T_{obs}\). Let us introduce the following response function:

$$\begin{aligned} G(T)=\int \limits _{t_0}^{t_1} dt \int \limits _{\Omega } k(x, y, t)T(x, y, 0, t)d\Omega , \end{aligned}$$
(94)

where \(k(x, y, t)\) is a weight function related to the temperature field on the sea surface \(z=0\). For example, if we are interested in the mean temperature of a specific region \(\omega \) of the sea at \(z=0\) over the interval \(\bar{t}-\tau \le t\le \bar{t}\), then as k we take the function

$$\begin{aligned} k(x, y, t) =\left\{ \begin{array}{cc} 1\Big /(\tau \text{ mes } \omega ) &{} \text{ if }\; (x, y)\in \omega , \; \bar{t}-\tau \le t\le \bar{t} \\ 0 &{} \text{ else }, \end{array}\right. \end{aligned}$$
(95)

where \(\text{ mes } \omega \) denotes the area of the region \(\omega \). Thus, the functional (94) is written in the form:

$$\begin{aligned} G(T)=\frac{1}{\tau }\int \limits _{\bar{t}-\tau }^{\bar{t}} dt \Biggl (\frac{1}{\text{ mes } \omega }\int \limits _{\omega }T(x,y,0,t)d\Omega \Biggr ). \end{aligned}$$
(96)

Formula (96) represents the mean temperature averaged over the time interval \(\bar{t}-\tau \le t\le \bar{t}\) for a given region \(\omega \). The response functions of this type are of most interest in the theory of climate change (Marchuk (1995), Marchuk et al. (1996)).
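With the weight (95), the discrete evaluation of (96) reduces to a masked space-time average. A short numpy sketch (array names are illustrative):

```python
import numpy as np

def mean_temperature_G(T_surf, mask_omega, cell_area, times, t_bar, tau):
    """Discrete analogue of (96): mean surface temperature over omega
    and the time window [t_bar - tau, t_bar].

    T_surf     -- surface temperature T(x, y, 0, t), shape (nt, ny, nx)
    mask_omega -- boolean mask of the region omega, shape (ny, nx)
    cell_area  -- grid-cell areas, shape (ny, nx)
    times      -- model time levels, shape (nt,)
    """
    in_window = (times >= t_bar - tau) & (times <= t_bar)
    area = cell_area[mask_omega].sum()                        # mes(omega)
    # area-weighted spatial mean over omega at each time level in the window
    spatial_mean = (T_surf[in_window][:, mask_omega]
                    * cell_area[mask_omega]).sum(axis=1) / area
    return spatial_mean.mean()                                # time average
```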

In our notation the functional (94) may be written as

$$ G(T)=\int \limits _{t_0}^{t_1} (Bk,T) dt = (Bk,T)_Y, \;\; Y=L_2(D\times (t_0, t_1)). $$

We are interested in the sensitivity of the response function G(T), obtained for T after data assimilation, with respect to the observation function \(T_{obs}\).

By definition, the sensitivity is given by the gradient of G with respect to \(T_{obs}\):

$$\begin{aligned} \frac{d {G}}{d T_{obs}}=\frac{\displaystyle \partial {G}}{ \displaystyle \partial T}\frac{\displaystyle \partial T}{\displaystyle \partial T_{obs}}. \end{aligned}$$
(97)

Since \(\frac{\displaystyle \partial {G}}{ \displaystyle \partial T}=Bk\), according to the theory presented in Sect. 6, computing the gradient (97) requires the following steps:

1) For k defined by (95) solve the adjoint problem

$$\begin{aligned} \left\{ \begin{array}{rcl} -\frac{\displaystyle \partial \tilde{\phi }^*}{\displaystyle \partial t} +L^*\tilde{\phi }^*&{}=&{}Bk, \quad t\in (t_0,t_1) \\ \tilde{\phi }^*\bigl |_{t=t_1}&{}=&{} 0\\ \end{array} \right. \end{aligned}$$
(98)

and put \( \Phi = (\tilde{\phi }^*\bigl |_{t=t_0}, B^*\tilde{\phi }^*)^T. \)

2) Find \(U=(w,v)^T\) by solving \( \mathcal {H}U= \Phi \) with the Hessian defined by (91)–(93).

3) Solve the direct problem

$$\begin{aligned} \left\{ \begin{array}{rcl} \frac{\displaystyle \partial P_2}{\displaystyle \partial t}+LP_2&{}=&{}Bv, \quad t\in (t_0,t_1) \\ P_2\bigl |_{t=t_0}&{}=&{} w. \end{array} \right. \end{aligned}$$
(99)

4) Compute the gradient of the response function as

$$\begin{aligned} \frac{d {G}}{d T_{obs}}=m_0P_2\bigl |_{z=0}. \end{aligned}$$
(100)

The last formula allows us to estimate the sensitivity of the functionals related to the mean temperature after data assimilation, with respect to the observations on the sea surface.

For the numerical experiments we used the three-dimensional numerical model of the Baltic Sea hydrothermodynamics developed at the INM RAS on the basis of the splitting method Zalesny et al. (2017), supplied with the assimilation procedure Agoshkov et al. (2008) for the surface temperature \(T_{obs}\), with the aim of reconstructing the heat fluxes Q and the initial state \(T_0\).

The parameters of the considered domain of the Baltic Sea and its geographic coordinates are as follows: the \(\sigma \)-grid is \(336\times 394\times 25\) (the latitude, longitude, and depth, respectively); the first point of the "grid C" Zalesny et al. (2017) has the coordinates \(9.406^\circ \) E and \(53.64^\circ \) N. The mesh sizes in x and y are constant and equal to 0.0625 and 0.03125 degrees, and the time step is \(\Delta t = 5\) minutes. The assimilation procedure worked only during certain time windows; to start it, the function \(T^{(0)}\) was taken as the model forecast for the previous time interval.
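For orientation, these angular mesh sizes correspond to a nearly isotropic resolution of about 3.5–4 km at Baltic latitudes; a quick check (assuming a spherical Earth of radius 6371 km and taking x as longitude, y as latitude):

```python
import numpy as np

R = 6371.0                                   # assumed mean Earth radius, km
dx_deg, dy_deg, lat = 0.0625, 0.03125, 56.0  # grid steps and a representative latitude

km_per_deg = 2 * np.pi * R / 360.0
dx_km = dx_deg * km_per_deg * np.cos(np.radians(lat))  # zonal spacing, ~3.9 km
dy_km = dy_deg * km_per_deg                            # meridional spacing, ~3.5 km
print(f"dx = {dx_km:.1f} km, dy = {dy_km:.1f} km")
```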

The Baltic Sea daily-averaged nighttime surface temperature data were used for \(T_{obs}\). These are the data of the Danish Meteorological Institute based on measurements by radiometers (AVHRR, AATSR and AMSRE) and spectroradiometers (SEVIRI and MODIS) Karagali et al. (2012). Data interpolation algorithms Zakharova et al. (2013) were used to transfer the observations onto the computational grid of the numerical model of the Baltic Sea thermodynamics. The mean climatic flux obtained from the NCEP (National Centers for Environmental Prediction) reanalysis was taken for \(Q^{(0)}.\)

Using the hydrothermodynamics model mentioned above, which is supplied with the assimilation procedure for the surface temperature \(T_{obs}\), we have performed calculations for the Baltic Sea area where the assimilation algorithm worked only at certain time moments \(t_0\); in this case \(t_1=t_0+\Delta t\). The aim of the experiment was the numerical study of the sensitivity of functionals of the optimal solution \(T_0, Q\) to observation errors in the interval \((t_0, t_1)\).

We use the discretize-then-optimize approach: for the numerical experiments, all the presented equations are understood in discrete form, as finite-dimensional analogues of the corresponding problems obtained after approximation. This allows us to treat the discrete model equations as a perfect model, free of approximation errors.

Let us present some results of numerical experiments.

The calculation results for \(t_0 = 50\) h (600 time steps of the model) are presented in Fig. 1, which shows the gradient of the response function G(T) defined by (96) and related to the mean temperature after data assimilation, with respect to the observations on the sea surface, computed according to (98)–(100). Here \(\omega = \Omega \), \(\tau = \Delta t\), \(\bar{t} = t_1\), \(\alpha =\beta =10^{-5}\).

Fig. 1 The gradient of the response function G(T)

We can see the sub-areas (in red) in which the response function G(T) is most sensitive to errors in the observations during assimilation. The largest values of the gradient of G(T) correspond to the points (x, y) with a small depth (cf. the sea topography, Fig. 2). Thus, the considered functional G(T) of the optimal solution turned out to be most sensitive to observation errors at surface points near these shallow regions. This result is confirmed by direct computation of the response function G(T) according to (96) after assimilation, with perturbations introduced into the observation data \(T_{obs}\).
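Such a perturbation experiment amounts to a finite-difference check of (100): for a perturbation \(\delta \) of the observations, the change of G must match the inner product of the computed gradient with \(\delta \). A hedged sketch, where `assimilate_and_evaluate_G` stands for the whole (hypothetical) assimilation-plus-evaluation pipeline:

```python
import numpy as np

def check_sensitivity(T_obs, grad_G, assimilate_and_evaluate_G, eps=1e-3, seed=0):
    """Finite-difference verification of the sensitivity (100).

    assimilate_and_evaluate_G(T_obs) -- hypothetical pipeline returning G(T)
    grad_G -- dG/dT_obs computed by steps (98)-(100), same shape as T_obs
    """
    delta = np.random.default_rng(seed).standard_normal(T_obs.shape)
    g0 = assimilate_and_evaluate_G(T_obs)
    g1 = assimilate_and_evaluate_G(T_obs + eps * delta)
    fd = (g1 - g0) / eps                 # finite-difference directional derivative
    an = np.sum(grad_G * delta)          # adjoint-based value, up to the discrete
                                         # inner-product (area-time) weights
    print(f"finite difference: {fd:.6e}, adjoint-based: {an:.6e}")
    return fd, an
```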

Fig. 2 Baltic Sea topography

The above studies allow us to determine the sea sub-areas in which the response function related to the optimal solution is most sensitive to errors in the observations during variational data assimilation.

8 Conclusions

Variational data assimilation is an efficient method for modeling large-scale geophysical flows, the main difficulty being linked to the nonlinearity of the governing equations. This method allows us to combine observational data and model forecasts. From the mathematical point of view, we deal with initial-boundary-value control problems for a nonlinear evolution model governed by partial differential equations. The necessary optimality condition is given by the optimality system, which is based on the gradient of the cost function and involves the forward and adjoint equations. To study the variational data assimilation problem as an optimal control problem and to develop efficient algorithms for its numerical solution, second-order information is needed, namely, the Hessian of the cost functional. To construct the Hessian, one differentiates the optimality system and derives a second-order adjoint problem. The study of the second-order adjoint equations and of the Hessian of the cost function plays an important role in the analysis of the solvability of the variational assimilation problem, in the construction of algorithms for its numerical solution based on modifications of Newton-type methods, and in the identification of model parameters. The Hessian also allows us to study the sensitivity of the optimal solution and of its functionals with respect to observations and uncertainties in parameters.