INTRODUCTION

In recent decades, a number of international research projects on global changes on the Earth have collected, processed, and interpreted data; carried out retrospective analyses; and developed high-quality models of subsystems of the global biogeochemical system, with the aim of creating predictive models that take into account modern models of the global climate system. Currently, a coordinated approach is being developed for the construction and application of global climate system models (including the atmosphere, the oceans, the cryosphere, and the biosphere) and for estimating the sensitivity of climate predictability using such coupled models.

In connection with studies of global changes on Earth, there is increasing interest in the problems of assimilating and processing observational data for retrospective analysis in different branches of knowledge. In problems of geophysical hydrodynamics, in particular, in meteorology and oceanography, mathematical models are used to study and predict hydrodynamic fields. These models are based on the laws of hydrodynamics, which follow from the conservation of mass, momentum, energy, etc., and lead to systems of nonlinear partial differential equations. These equations, although necessary, are insufficient to predict the evolution of the fields: additional information is required, in particular, initial conditions and model parameters. This information can be obtained from observations. Data assimilation methods are used to estimate and predict the state of the flow at the desired time based on all available observations.

In recent decades, significant progress has been made in Earth sciences due to improved observation systems and a better understanding of the laws governing geosystems. A fairly accurate description of the initial conditions is one of the fundamental requirements for successful forecasting in oceanography. Data assimilation aims to obtain the best (in a certain sense) estimate of the state of a physical system from its observations and an adequate mathematical model.

The data assimilation method is widely used in Earth sciences. It is most popular in meteorology and oceanography, where atmospheric and oceanic observations are assimilated into atmospheric and ocean models in order to obtain initial conditions (or other model parameters) for further modeling and forecasting. In recent years, data assimilation methods have also been used to analyze other observations of the geosystem, including the biosphere, the cryosphere, and the soil surface.

Researchers have always wanted not only to know and understand the climatic and current states of hydrodynamic fields in the atmosphere and the ocean, but also to be able to predict them. To make a forecast for the future, it is necessary to estimate the current state, which, in turn, depends on a certain state in the past. The first attempts to estimate the state of the system based on the analysis of observational data were made in meteorology in the middle of the 19th century by Vice Admiral Robert FitzRoy, founder of the British Meteorological Service. Subjective analysis (the simplest data interpolation) was then used by Richardson [1], Charney [2], and Phillips [3]. Eventually, objective analysis replaced manual graphical interpolations of observational data with more rigorous mathematical methods, from polynomial interpolation and sequential estimation algorithms to modern variational methods.

Data assimilation methods were most developed in dynamic meteorology and physical oceanography, as well as in the real-time numerical prediction of atmospheric and oceanic fields. To date, theoretical and practical ideas of data assimilation can be found in technical [4, 5], mathematical [6–9], and geophysical literature [10–13]. The Seventh International Symposium of the World Meteorological Organization on observation data assimilation in meteorology and oceanography (Brazil, September 2017: http://www.cptec.inpe.br/das2017/) showed significant progress in the practical application of modern assimilation methods based on the optimal control approach (variational data assimilation) and on the sequential estimation approach (statistical methods), as well as on a combination of both approaches (hybrid methods).

Currently, intensive studies on the development of information computation systems (ICS) using observation data assimilation procedures (satellite, shipboard, etc.) are being conducted in a number of countries. The development of modern information and computing systems is rightly regarded as an interdisciplinary fundamental problem of computer science, mathematics, physics, and many other areas of science and technology. The development of such ICSs is nowadays necessary from the point of view of the economy, national security, and other needs of the state and society. The most important problems here are the implementation of real-time short-term and long-term weather forecasts, determining areas of high biological productivity, ensuring the safety of navigation and the selection of optimal ship routes, monitoring the ecology of the sea, detecting and monitoring especially dangerous phenomena (such as storm surges and tsunamis), and predicting marine disasters and estimating the possible damage they may cause and the risks arising from them. The problems of monitoring and predicting the state of the environment are of vital importance for human society. New geoinformation technologies, including the technology of variational observation data assimilation, make it possible to develop a unified system for monitoring and forecasting geosystems for global monitoring programs.

In recent years, there have been qualitative changes in measurement systems. The global scientific community is receiving more and more measurements of various characteristics of our geosystem. Therefore, the development of technologies for variational observation data assimilation based on modern approaches is an urgent problem.

In this paper, we review and analyze approaches to data assimilation in problems of geophysical hydrodynamics, from the simplest sequential assimilation schemes to modern variational methods. Special attention is paid to the study of the problem of variational assimilation in a weak formulation, in particular, to the construction of an optimality system and the estimation of the covariance matrices of the optimal solution errors. This is a new direction of research in which the author has obtained some results.

1 METHODS AND APPROACHES TO OBSERVATION DATA ASSIMILATION

1.1 Basic Notation and Formulation of the Problem

Consider a mathematical model that describes the evolution of a hydrodynamic system (atmospheric, oceanic, or coupled) as follows

$$\left\{ \begin{gathered} \frac{{dx}}{{dt}} = M\left( {x,t} \right),\quad t > 0 \hfill \\ {{\left. x \right|}_{{t = 0}}} = {{x}_{0}}, \hfill \\ \end{gathered} \right.$$
((1.1))

where x is the state vector of the model, M is the corresponding dynamic operator of the model, and \({{x}_{0}}\) is the vector of the initial state. In numerical simulation or prediction, dynamic operator M is generally nonlinear and deterministic, while the true flow field differs from (1.1) by a random or systematic error. In geophysical hydrodynamics, (1.1) is usually a system of nonlinear partial differential equations, which is often called a distributed parameter system in mathematical literature. The dependent variable x is called the field.

Observations are given by some vector function \({{y}^{0}}\left( t \right)\), which satisfies the following equation:

$${{y}^{0}}\left( t \right) = H\left( {{{x}^{t}}} \right) + \varepsilon ,$$
((1.2))

where H is the observation operator, \({{x}^{t}}\) is the true flow field, and \(\varepsilon \) is the error function (noise). Function \({{y}^{0}}\left( t \right)\) is assumed to be given, while there is usually no information about function \(\varepsilon \). Operator H, like M, can be nonlinear; it maps the state vector into the observation space.

Strictly speaking, Eqs. (1.1) and (1.2) should be considered in the corresponding function spaces, and in each specific case it is important to investigate questions of solvability and properties of the solution of the problem for the development of numerical algorithms.

When model (1.1) is discretized over time using finite differences, finite elements, or (pseudo) spectral methods, a discrete model describing the transition from time \({{t}_{i}}\) to time \({{t}_{{i + 1}}}\) is often obtained:

$$x\left( {{{t}_{{i + 1}}}} \right) = {{M}_{i}}\left( {x\left( {{{t}_{i}}} \right)} \right),$$
((1.3))

where \(x\left( {{{t}_{i}}} \right)\) is the state vector with a dimension of \(n\), \(i\) is the number of the time step, and \({{M}_{i}}\) is the difference operator of the state vector dynamics. In discrete model (1.3), observations \({{y}^{0}}\) at time \({{t}_{i}}\) are given by the following equation:

$$y_{i}^{0} = {{H}_{i}}\left( {{{x}^{t}}\left( {{{t}_{i}}} \right)} \right) + {{\varepsilon }_{i}},$$
((1.4))

where \({{H}_{i}}\) is the observation operator at time \(t = {{t}_{i}}\), \({{x}^{t}}\) is the true state, and \({{\varepsilon }_{i}}\) is the error function. Vectors \(y_{i}^{0}\) have dimensions \({{p}_{i}}\). In most practical problems, \({{p}_{i}}\) is much smaller than \(n\).

Additional information (for example, initial conditions and unknown parameters of the model, which can be obtained using observational data) is required to predict the evolution of flows in problems of geophysical hydrodynamics. Thus, the data assimilation problem arises: for a given observation function \({{y}^{0}}\left( t \right)\), it is required to find, for example, an unknown a priori initial condition such that the state vector x satisfies problem (1.1) and vector H(x) is, in some sense, close to \({{y}^{0}}\left( t \right)\). The resulting solution x is called a state estimate (or analysis) and is denoted by \({{x}^{a}}\).

1.2 Objective Analysis and Its Generalizations

The first attempt at objective data analysis was made by Panofsky [14], who used two-dimensional (2D) polynomial interpolation of observational data. Later, this approach was developed by Gilchrist and Cressman [15], who introduced an area of influence for each observation and suggested using the so-called initial approximation (background) field, the field from the previous forecast.

The approach of Bergthorsson and Döös [16] is based on an analysis of the difference between the observed data and the initial approximation and on optimization of the weight assigned to each observation. This approach was later modified by Cressman [17], who proposed the successive correction method (SCM), an iterative algorithm for determining the state vector:

$${{x}^{{k + 1}}} = {{x}^{k}} + W\left( {{{y}^{0}} - H\left( {{{x}^{k}}} \right)} \right),\,\,\,k = 0,1,...,$$
((1.5))

where k is the iteration number, \({{x}^{0}} = {{x}^{b}}\) is the initial approximation, W is the weight operator, and H is the observation operator from (1.2). After a sufficiently large number of iterations, \({{x}^{a}} = {{x}^{k}}\) is taken as the state estimate. As was shown in [17], successive iterations fit the observational data on ever smaller scales. The disadvantage of the method is that the iterations reproduce the observational data more and more closely while the errors of these data are not taken into account. Nevertheless, the method is widely used for real-time weather forecasting.
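
As an illustration, the following minimal sketch implements iteration (1.5) on a one-dimensional grid with Cressman-type weights whose influence radius is decreased from pass to pass; the grid, the weight formula, and the synthetic fields are assumptions made for this example rather than part of any operational scheme.

```python
import numpy as np

# Iteration (1.5) on a 1-D grid. The Cressman-type weights, the influence
# radii, and the synthetic truth are illustrative assumptions.
n = 50
grid = np.arange(n, dtype=float)
obs_idx = np.array([5, 20, 35, 48])                   # observation locations
p = len(obs_idx)
H = np.zeros((p, n))
H[np.arange(p), obs_idx] = 1.0                        # pointwise observation operator

x_true = np.sin(2 * np.pi * grid / n)
y0 = H @ x_true                                       # observations (noise-free here)
x = np.zeros(n)                                       # x^0 = x^b, a zero background

for radius in (20.0, 10.0, 5.0):                      # ever smaller scales, cf. [17]
    d2 = (grid[:, None] - grid[obs_idx][None, :]) ** 2
    W = np.maximum(0.0, (radius**2 - d2) / (radius**2 + d2))
    x = x + W @ (y0 - H @ x)                          # the update (1.5)

print(np.abs(H @ x - y0).max())                       # residual at observation points
```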

The nudging method, which consists of adding a term such as (1.5) to the right-hand side of dynamic system (1.1), is a generalization of the method of successive corrections for nonstationary problem (1.1):

$$\left\{ \begin{gathered} \frac{{dx}}{{dt}} = M\left( {x,t} \right) + W({{y}^{0}} - H(x)),\quad t \in \left( {0,T} \right) \hfill \\ {{\left. x \right|}_{{t = 0}}} = {{x}_{0}}. \hfill \\ \end{gathered} \right.$$
((1.6))

This term causes the model solution to approach the observational data as closely as possible. This method was first used in meteorology in [18] and later in oceanography in [19–21]. It is still of interest, and new versions of it have appeared, in particular, the back-and-forth nudging (BFN) algorithm [22].
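
A minimal sketch of nudging (1.6) for a toy two-dimensional linear oscillator is given below; the model, the nudging weight W, and the synthetic observations are assumptions chosen only to show how the relaxation term pulls the model toward the data.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])       # toy dynamics: dx/dt = A x
H = np.array([[1.0, 0.0]])                    # only the first component is observed
W = 2.0 * H.T                                 # nudging weight operator (an assumption)

dt, nsteps = 0.01, 2000
x_true = np.array([1.0, 0.0])                 # the truth generating the observations
x = np.array([0.0, 0.5])                      # model started from a wrong state

for _ in range(nsteps):
    y0 = H @ x_true                           # observation of the true state
    x = x + dt * (A @ x + W @ (y0 - H @ x))   # nudged model step, cf. (1.6)
    x_true = x_true + dt * (A @ x_true)

print(np.linalg.norm(x - x_true))             # the model has been pulled to the truth
```

Although only the first component is observed here, the error obeys \(de/dt = (A - WH)e\), whose eigenvalues are stable for this toy system, so the full state converges to the truth.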

1.3 Statistical Methods, Sequential Assimilation Algorithms

The use of statistical interpolation methods was a very important breakthrough in solving data assimilation problems. This approach goes back to the works of A.N. Kolmogorov (1941) and N. Wiener (1949); in Earth sciences it became known thanks to the monograph by L.S. Gandin [23]. This approach is also called optimal interpolation (OI) [24, 25]. Observations are assigned weights that are associated with observational errors. Moreover, the initial approximation field is no longer merely a first guess for the analysis, as before: it is used, together with its error characteristics, on a par with the other observational data.

Let observation operator H be linear, and let observation function \({{y}^{0}}\) and the first-approximation (background) field \({{x}^{b}}\) be given as follows:

$${{y}^{0}} = H{{x}^{t}} + \varepsilon ,\,\,\,\,{{x}^{b}} = {{x}^{t}} + {{\varepsilon }_{b}},$$
((1.7))

where errors ε and εb are assumed to be random Gaussian vectors with zero expectation and covariance matrices

$$R = E(\varepsilon {{\varepsilon }^{T}}),\quad B = E({{\varepsilon }_{b}}{{\varepsilon }_{b}}^{T}).$$
((1.8))

The problem of optimal interpolation is to find the estimate \({{x}^{a}}\) minimizing deviation \({{x}^{t}} - {{x}^{a}}\) based on data (1.7)–(1.8), for example, in the sense of the minimum trace of the covariance matrix of the analysis:

$${\text{tr}}E[({{x}^{t}} - {{x}^{a}}){{({{x}^{t}} - {{x}^{a}})}^{T}}] \to \min .$$
((1.9))

Then the optimal interpolation method consists of determining the analysis of \({{x}^{a}}\) using the following formula [11, 12]:

$${{x}^{a}} = {{x}^{b}} + BH{\text{*}}{{\left( {HBH{\text{*}} + R} \right)}^{{ - 1}}}\left( {{{y}^{0}} - H{{x}^{b}}} \right),$$
((1.10))

where H* is the operator adjoint to H.

According to (1.10), \({{x}^{a}}\) is computed as the field of the initial approximation \({{x}^{b}}\) plus the correction, which is nothing but the result of the action of some weight operator on the vector \({{y}^{0}} - H{{x}^{b}}\). The latter is called the innovation vector or residual of observations.

It can be seen that the optimal interpolation method in the form of (1.10) is equivalent to an optimal control problem that reduces to finding the minimum of a quadratic functional:

$$\begin{gathered} J(x) = \frac{1}{2}({{B}^{{ - 1}}}(x - {{x}^{b}}),x - {{x}^{b}}) \\ + \,\,\frac{1}{2}({{R}^{{ - 1}}}(Hx - {{y}^{0}}),Hx - {{y}^{0}}). \\ \end{gathered} $$

At the minimum, the first derivative of the functional vanishes:

$$\begin{gathered} J_{x}^{'}(x)\delta x \\ = ({{B}^{{ - 1}}}(x - {{x}^{b}}),\delta x) + ({{R}^{{ - 1}}}(Hx - {{y}^{0}}),H\delta x) = 0. \\ \end{gathered} $$

Hence, we obtain

$$\begin{gathered} {{x}^{a}} = {{({{B}^{{ - 1}}} + H{\text{*}}{{R}^{{ - 1}}}H)}^{{ - 1}}}({{B}^{{ - 1}}}{{x}^{b}} + H{\text{*}}{{R}^{{ - 1}}}{{y}^{0}}) \\ = {{x}^{b}} + BH{\text{*}}{{\left( {HBH{\text{*}} + R} \right)}^{{ - 1}}}\left( {{{y}^{0}} - H{{x}^{b}}} \right). \\ \end{gathered} $$

Optimal interpolation algorithm (1.10) can be divided into the following steps:

$$\left( {HBH{\text{*}} + R} \right)\xi = {{y}^{0}} - H{{x}^{b}},$$
((1.11))
$$\theta = BH{\text{*}}\xi ,$$
((1.12))
$${{x}^{a}} = {{x}^{b}} + \theta ,$$
((1.13))

where Eq. (1.11) is written in the space of observations and Eqs. (1.12)–(1.13) are written in the state space.
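
A minimal sketch of steps (1.11)–(1.13) on a one-dimensional grid is given below; the Gaussian-shaped background covariance B, the observation network, and the noise levels are assumptions made for illustration.

```python
import numpy as np

# Steps (1.11)-(1.13) of optimal interpolation on a 1-D grid. The Gaussian
# background covariance B, the observation network, and noise are assumptions.
n = 40
grid = np.arange(n, dtype=float)
B = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 5.0) ** 2)
obs_idx = np.array([8, 19, 31])
p = len(obs_idx)
H = np.zeros((p, n))
H[np.arange(p), obs_idx] = 1.0
R = 0.1 * np.eye(p)                                   # observation-error covariance

rng = np.random.default_rng(0)
x_true = np.cos(2 * np.pi * grid / n)
x_b = x_true + 0.3 * rng.standard_normal(n)           # background (first guess)
y0 = H @ x_true + np.sqrt(0.1) * rng.standard_normal(p)

xi = np.linalg.solve(H @ B @ H.T + R, y0 - H @ x_b)   # (1.11), observation space
theta = B @ H.T @ xi                                  # (1.12), back to state space
x_a = x_b + theta                                     # (1.13), the analysis
print(H @ x_a - y0)                                   # residuals at observation points
```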

The optimal interpolation method has been used in many operational centers since the late 1970s [24, 26]. Later, this method was developed in the works of Lorenc [27, 28], who used different approximations to solve equations (1.11)–(1.13) and introduced the analysis correction method, which is a “hybrid” of optimal interpolation and successive corrections.

The optimal interpolation method and its modifications have so far been most widely used for real-time data analysis in weather forecasting [27, 29–32], as well as in assimilation of oceanographic data [33–35]. Ensemble optimal interpolation (EnOI) [36, 37], which makes it possible to construct parallel data assimilation algorithms [38], has gained great popularity.

The Kalman filter, which extrapolates dynamic variables and their covariances at each step and then recursively refines the state estimate [39], is a generalization of the optimal interpolation method. The continuous analog of this method is often called the Kalman–Bucy filter [40]. There are different generalizations of this method to the nonlinear case [4]. Currently, the extended Kalman filter (EKF) [41, 42], which uses model linearization near a certain known state, is very popular. A.S. Sarkisyan, V.V. Knysh, G.A. Korotaev, and others [43–47] made a significant contribution to the development of Kalman filter methods and methods of 4D analysis of hydrophysical fields based on dynamic–stochastic models of the ocean. In recent years, the ensemble Kalman filter (EnKF) [48–50], which is based on the Monte Carlo method at every time step, has become very popular.
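
For reference, the following minimal sketch shows one forecast-analysis cycle of the (linear) Kalman filter, i.e., the covariance extrapolation and recursive refinement mentioned above; the model operator, the covariances, and the data are assumptions for illustration.

```python
import numpy as np

n, p = 4, 2
rng = np.random.default_rng(1)
M = np.eye(n) + 0.05 * rng.standard_normal((n, n))  # linear model operator, cf. (1.3)
Q = 0.01 * np.eye(n)                                 # model-error covariance
H = np.zeros((p, n)); H[0, 0] = H[1, 2] = 1.0        # two components observed
R = 0.1 * np.eye(p)                                  # observation-error covariance

x_a, P_a = np.zeros(n), np.eye(n)                    # analysis from the previous step
y0 = rng.standard_normal(p)                          # observations at the new time

x_f = M @ x_a                                        # forecast of the state
P_f = M @ P_a @ M.T + Q                              # extrapolation of the covariance
K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)     # Kalman gain
x_a = x_f + K @ (y0 - H @ x_f)                       # refined state estimate (analysis)
P_a = (np.eye(n) - K @ H) @ P_f                      # analysis-error covariance
print(x_a)
```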

1.4 Variational Methods

The use of variational methods and, in particular, optimal control methods marked significant progress in solving data assimilation problems. The idea of minimizing a certain functional related to observational data on the trajectories (solutions) of the model under consideration turned out to be very productive. Thus, the data assimilation problem is formulated as an optimal control problem. The theoretical foundations for the study and solution of such problems were laid in the classical works of R. Bellman (1957), L.S. Pontryagin (1962), N.N. Krasovsky (1969), J.-L. Lions (1968), G.I. Marchuk (1961), and others. Variational formalism was first used in meteorology by Sasaki [51, 52] and in problems of dynamic oceanography by Provost and Salmon [53].

When solving such minimization problems, it is necessary to calculate the gradient of the original functional. One important step in this direction was the use of the theory of adjoint equations (Marchuk, 1964; Lions, 1968). Starting from the well-known works of Penenko [54], Marchuk and Penenko [55], Le Dimet and Talagrand [56], and Lewis and Derber [57], adjoint equations have been widely used by many researchers for the study and numerical solution of data assimilation problems, including the calculation of the gradient of the functional [58–68].

Three-dimensional variational data assimilation (3D-VAR) was used for real-time analysis for the first time at the National Center for Environmental Prediction (NCEP) [69] and later at the European Centre for Medium-Range Weather Forecasts (ECMWF) [70].

Currently, four-dimensional data assimilation (4D-VAR) is attracting more and more attention; in it, linearized and adjoint models are used to assimilate observational data over a given time interval rather than at a single time. The 4D-VAR system was used for the first time at the ECMWF [71].

Let us describe the formulation of the 4D-VAR data assimilation problem using the example of restoring the initial condition. Consider problem (1.1) on the interval \(\left( {0,T} \right)\):

$$\left\{ \begin{gathered} \frac{{dx}}{{dt}} = M\left( {x,t} \right),\quad t \in \left( {0,T} \right) \hfill \\ {{\left. x \right|}_{{t = 0}}} = {{x}_{0}} \hfill \\ \end{gathered} \right.$$
((1.14))

and introduce the functional of its solution:

$$\begin{gathered} J\left( {{{x}_{0}}} \right) = \frac{1}{2}\left( {{{C}_{1}}\left( {{{{\left. x \right|}}_{{t = 0}}} - x_{0}^{b}} \right),{{{\left. x \right|}}_{{t = 0}}} - x_{0}^{b}} \right) \\ + \,\,\frac{1}{2}\int\limits_0^T {\left( {{{C}_{2}}\left( {Hx - {{y}^{0}}} \right),Hx - {{y}^{0}}} \right)dt,} \\ \end{gathered} $$
((1.15))

where \(H\) is the (linear) observation operator from (1.2), \({{y}^{0}}\) is the observation function, \(x_0^b\) is the given vector, \({{C}_{1}}\), \({{C}_{2}}\) are weight operators, and \(\left( { \cdot , \cdot } \right)\) is the scalar product. Usually, \({{C}_{1}}\), \({{C}_{2}}\) are selected in the following form: \({{C}_{1}} = {{B}^{{ - 1}}}\), \({{C}_{2}} = {{R}^{{ - 1}}}\), where \(B\) and \(R\) are the covariance matrices of the vectors \({{\varepsilon }_{b}} = x_0^b - {{\left. {{{x}^{t}}} \right|}_{{t = 0}}}\) and \(\varepsilon = {{y}^{0}} - H{{x}^{t}}\), respectively: \(B = E({{\varepsilon }_{b}}{{\varepsilon }_{b}}^{T})\), \(R = E(\varepsilon {{\varepsilon }^{T}})\), under the assumption that ε and \({{\varepsilon }_{b}}\) are random Gaussian vectors with zero expectation. Such weight operators (or their approximations) are often used in practical problems [12, 72, 73].

Suppose that initial condition \({{x}_{0}}\) from (1.14) is unknown to us. Then the simplest data assimilation problem is formulated as follows: find \({{x}_{0}}\), \(x\) such that they satisfy (1.14) and functional (1.15) reaches its smallest value on set of solutions (1.14). In other words,

$$\left\{ \begin{gathered} \frac{{dx}}{{dt}} = M\left( {x,t} \right),\quad t \in \left( {0,T} \right) \hfill \\ {{\left. x \right|}_{{t = 0}}} = {{x}_{0}}, \hfill \\ J\left( {{{x}_{0}}} \right) = \mathop {\inf }\limits_{v} J\left( {v} \right). \hfill \\ \end{gathered} \right.$$
((1.16))

The necessary optimality condition [6] reduces this problem to a system for the three unknowns \({{x}_{0}}\), \(x\), x*:

$$\left\{ \begin{gathered} \frac{{dx}}{{dt}} = M\left( {x,t} \right),\quad t \in \left( {0,T} \right) \hfill \\ {{\left. x \right|}_{{t = 0}}} = {{x}_{0}}, \hfill \\ \end{gathered} \right.$$
((1.17))
$$\left\{ \begin{gathered} - \frac{{dx{\text{*}}}}{{dt}}\, = \,\left( {M{{'}}{\kern 1pt} \left( {x,t} \right)} \right){\text{*}}x{\text{*}}\, - \,H{\text{*}}{{C}_{2}}\left( {Hx\, - \,{{y}^{0}}} \right),\,\,t\, \in \,(0,T) \hfill \\ {{\left. {x{\text{*}}} \right|}_{{t = T}}} = 0, \hfill \\ \end{gathered} \right.$$
((1.18))
$${{C}_{1}}\left( {{{x}_{0}} - x_{0}^{b}} \right) - {{\left. {x{\text{*}}} \right|}_{{t = 0}}} = 0,$$
((1.19))

where \(\left( {M{{'}}\left( {x,t} \right)} \right){\text{*}}\) is the operator adjoint to the derivative of the dynamic operator of model M. System (1.17)–(1.19) is called the optimality system and plays an important role in the study and numerical solution of data assimilation problems. This system can also be obtained from the Pontryagin maximum principle formulated for problem (1.16) [61] or by the Lagrange multipliers method [74].

The solvability of nonlinear data assimilation problems and the rigorous justification of numerical methods for their solution are not simple problems. Sufficiently complete results concerning the solvability of linear optimal control problems of form (1.16)–(1.19) were obtained by Lions using the Hilbert Uniqueness Method (HUM) that he developed. Some results on the solvability of weakly nonlinear data assimilation problems were obtained in [60, 75, 67]. Further generalizations and new applications have been proposed in recent years [76, 77, 68].

Problems of form (1.16) are currently numerically solved by well-known optimization algorithms developed in classical works. A number of new iterative algorithms for solving data assimilation problems using adjoint equations were proposed in [6063, 67, 68], etc.

To construct a numerical algorithm for solving the data assimilation problem, one can apply known minimization methods to problem (1.16) or solve optimality system (1.17)–(1.19). In the numerical solution, it is often necessary to calculate the gradient of original functional J. This can be done using a suitably chosen adjoint problem. In the case under consideration, the gradient of the functional is calculated as follows: for a given \(v\), we successively find the solutions of the direct and adjoint problems

$$\left\{ \begin{gathered} \frac{{dx}}{{dt}} = M\left( {x,t} \right),\,\,\,\,t \in \left( {0,T} \right) \hfill \\ {{\left. x \right|}_{{t = 0}}} = {v}, \hfill \\ \end{gathered} \right.$$
((1.20))
$$\left\{ \begin{gathered} - \frac{{dx{\text{*}}}}{{dt}}\, = \,\left( {M{{'}}{\kern 1pt} \left( {x,t} \right)} \right){\kern 1pt} {\text{*}}x{\text{*}} - H{\text{*}}{{C}_{2}}\left( {Hx\, - \,{{y}^{0}}} \right),\,\,\,t\, \in \,(0,T) \hfill \\ {{\left. {x{\text{*}}} \right|}_{{t = T}}} = 0 \hfill \\ \end{gathered} \right.$$
((1.21))

and put

$$J{{'}}\left( {v} \right) = {{C}_{1}}\left( {{v} - x_{0}^{b}} \right) - {{\left. {x{\text{*}}} \right|}_{{t = 0}}}.$$
((1.22))

In the works of many authors, much attention is paid to the numerical construction of adjoint model (1.21), which can be obtained either by the discretization of continuous problem (1.21) [66, 78, 79] or by the direct transposition of the code of the discrete linearized problem [68, 80, 81]. In the latter case, automatic differentiation methods are often used [82, 83, 74]. These two approaches to the construction of a discrete adjoint problem were compared, for example, in [78, 84].
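
The following minimal sketch illustrates the gradient computation (1.20)–(1.22) for a linear toy model integrated by the forward Euler scheme; here the discrete adjoint is simply the transposed scheme, and the model, weights, and observations are assumptions. The finite-difference check at the end verifies that the adjoint-based gradient matches the directly differenced cost.

```python
import numpy as np

n, N, dt = 3, 100, 0.01
rng = np.random.default_rng(2)
A = 0.5 * rng.standard_normal((n, n))      # linear toy model: dx/dt = A x
H = np.eye(n)
C1 = np.eye(n)
C2 = np.eye(n)
y0 = rng.standard_normal((N + 1, n))       # "observations" at each time step
x_b = np.zeros(n)

def cost_and_grad(v):
    xs = [v.copy()]                        # forward problem (1.20), forward Euler
    for _ in range(N):
        xs.append(xs[-1] + dt * (A @ xs[-1]))
    J = 0.5 * (C1 @ (v - x_b)) @ (v - x_b)
    J += 0.5 * dt * sum((C2 @ (H @ x - y)) @ (H @ x - y) for x, y in zip(xs, y0))
    # adjoint problem (1.21): the transposed scheme, integrated backward in time
    xstar = -dt * H.T @ C2 @ (H @ xs[N] - y0[N])
    for i in range(N - 1, -1, -1):
        xstar = (np.eye(n) + dt * A).T @ xstar - dt * H.T @ C2 @ (H @ xs[i] - y0[i])
    return J, C1 @ (v - x_b) - xstar       # gradient by formula (1.22)

v = rng.standard_normal(n)
J, g = cost_and_grad(v)
e = np.zeros(n); e[0] = 1.0
eps = 1e-6
print((cost_and_grad(v + eps * e)[0] - J) / eps, g[0])   # should nearly coincide
```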

Along with the study of solvability and the development and justification of algorithms for the numerical solution of variational data assimilation problems, the properties of the optimal solution itself play an important role. The question of the sensitivity of optimal solutions of variational assimilation problems to the errors of observational data and model errors is extremely important. Until recently, this question was little studied; however, a number of results have been obtained using control operators over the last few years [85–92]. Equations for the optimal solution error in terms of the errors of the observational data were obtained and investigated for the problem of restoring the initial condition. The sensitivity of the optimal solution was investigated using singular vectors of control operators. It turned out that fundamental control functions, which are singular vectors of response operators, play an important role in the study of errors [85, 88, 90, 92].

Currently, 4D data assimilation algorithms [10, 11, 13] seem to be the most effective. In recent years there have been many studies comparing the ensemble Kalman filter (EnKF) and variational data assimilation [93–99]. In addition, a so-called hybrid approach has appeared: Hybrid 4D-VAR, which combines the ensemble Kalman filter and variational data assimilation [100–104], as well as the ensemble method of 4D-VAR assimilation, 4DEnVar [105–110].

2 COVARIANCE MATRICES OF OPTIMAL SOLUTION ERRORS

The a posteriori covariance matrix is an important characteristic of the optimal solution obtained from the optimality system of the variational data assimilation problem. This section is devoted to the development of algorithms for studying the covariance operators of the errors of optimal solutions of variational data assimilation problems using the Hessian of the cost functional. The theoretical foundations of the algorithms were laid in [85, 111–113].

Consider the variational data assimilation problem by the example of initialization problem (1.14), for which optimality system (1.17)–(1.19) is valid. It is assumed that the input data are given with errors: \(x_{0}^{b} = x_{0}^{t} + {{\varepsilon }_{b}},{{y}^{0}} = H{{x}^{t}} + \varepsilon ,\) where \({{\varepsilon }_{b}}\sim N(0,B),\)\(\varepsilon \sim N(0,R),\) and \({{x}^{t}}\) is the exact solution of problem (1.14) for \({{x}_{0}} = {{x}_{0}}^{t}\):

$$\left\{ \begin{gathered} \frac{{d{{x}^{t}}}}{{dt}} = M\left( {{{x}^{t}},t} \right),\quad t \in \left( {0,T} \right) \hfill \\ {{\left. {{{x}^{t}}} \right|}_{{t = 0}}} = x_{0}^{t}. \hfill \\ \end{gathered} \right.$$
((2.1))

Here, \(\varepsilon \sim N(0,R)\) means that the random variable \(\varepsilon \) is distributed according to the Gaussian law with zero expectation and covariance matrix \(R.\) We will investigate the influence of errors \({{\varepsilon }_{b}},\varepsilon \) on the optimal solution \({{x}_{0}}\) obtained by solving (1.17)–(1.19) and formulate algorithms for calculating the covariance operators of optimal solution errors through the cost functional Hessian.

System (1.17)–(1.19) with three unknowns \(x,x{\text{*}},{{x}_{0}}\) can be considered one operator equation of the following form:

$$F(U,{{U}_{d}}) = 0,$$
((2.2))

where \(U = (x,x*,{{x}_{0}}),{{U}_{d}} = (x_{0}^{b},{{y}^{0}}).\) A similar equation holds for the following exact solution:

$$F(\bar {U},{{\bar {U}}_{d}}) = 0,$$
((2.3))

where \(\bar U = ({{x}^{t}},x{{{\text{*}}}^{t}},x_{0}^{t}),{{\bar U}_{d}} = (x_{0}^{t},H{{x}^{t}}),x{{{\text{*}}}^{t}} = 0.\) System (2.3) is a necessary condition for the optimality of the following minimization problem: find \(u\) and \(\phi \) such that

$$\left\{ \begin{gathered} \frac{{d\phi }}{{dt}} = M\left( {\phi ,t} \right),\quad t \in \left( {0,T} \right) \hfill \\ {{\left. \phi \right|}_{{t = 0}}} = u, \hfill \\ \bar {J}\left( u \right) = \mathop {\inf }\limits_{v} \bar {J}\left( {v} \right), \hfill \\ \end{gathered} \right.$$

where

$$\begin{gathered} \bar {J}\left( u \right) = \frac{1}{2}\left( {{{C}_{1}}\left( {u - x_{0}^{t}} \right),u - x_{0}^{t}} \right) \\ + \,\,\frac{1}{2}\int\limits_0^T {\left( {{{C}_{2}}\left( {H\phi - H{{x}^{t}}} \right),H\phi - H{{x}^{t}}} \right)dt.} \\ \end{gathered} $$

From (2.2)–(2.3), we have \(F(U,{{U}_{d}}) - F(\overline U ,{{\overline U }_{d}}) = 0.\) Let \(\delta U = U - \bar {U}\), \(\delta {{U}_{d}} = {{U}_{d}} - {{\bar {U}}_{d}}\). Then

$$F(\overline U + \delta U,{{\overline U }_{d}} + \delta {{U}_{d}}) - F(\overline U ,{{\overline U }_{d}}) = 0.$$
((2.4))

Let \(\delta x = x - {{x}^{t}},\delta {{x}_{0}} = {{x}_{0}} - x_{0}^{t},\) and then δU = \((\delta x,x*,\delta x_{0}^{{}}),\delta {{U}_{d}} = ({{\varepsilon }_{b}},\varepsilon ).\) Assuming that operator M is sufficiently smooth, there exists \(\underline x = {{x}^{t}} + \tau (x - {{x}^{t}})\), \(\tau \in [0,1]\), such that \(M(x,t) - M({{x}^{t}},t) = M{{'}}(\underline x ,t)\delta x.\) Then Eq. (2.4) is equivalent to the following system:

$$\left\{ \begin{gathered} \frac{{d\delta x}}{{dt}} = M{{'}}\left( {\underline x ,t} \right)\delta x,\,\,\,\,t \in \left( {0,T} \right) \hfill \\ \delta {{\left. x \right|}_{{t = 0}}} = \delta {{x}_{0}}, \hfill \\ \end{gathered} \right.$$
((2.5))
$$\left\{ \begin{gathered} - \frac{{dx{\text{*}}}}{{dt}} = \left( {M{{'}}\left( {x,t} \right)} \right){\text{*}}x{\text{*}} \hfill \\ - \,\,H{\text{*}}{{C}_{2}}\left( {H\delta x - \varepsilon } \right),\quad t \in (0,T) \hfill \\ {{\left. {x{\text{*}}} \right|}_{{t = T}}} = 0, \hfill \\ \end{gathered} \right.$$
((2.6))
$${{C}_{1}}\left( {\delta {{x}_{0}} - {{\varepsilon }_{b}}} \right) - {{\left. {x{\text{*}}} \right|}_{{t = 0}}} = 0.$$
((2.7))

Since \(\underline x = {{x}^{t}} + \tau \delta x\) and \(x = {{x}^{t}} + \delta x\), under the assumption that \(\delta x\) is small and M is smooth, in (2.5)–(2.7) we can suppose

$$M{{'}}(\underline x ,t) \approx M{{'}}({{x}^{t}},t),\quad (M{{'}}(x,t)){\text{*}} \approx (M{{'}}({{x}^{t}},t)){\text{*}}.$$
((2.8))

Then (2.5)–(2.7) reduces to the following system

$$\left\{ \begin{gathered} \frac{{d\delta x}}{{dt}} = M{{'}}\left( {{{x}^{t}},t} \right)\delta x,\quad t \in \left( {0,T} \right) \hfill \\ \delta {{\left. x \right|}_{{t = 0}}} = \delta {{x}_{0}}, \hfill \\ \end{gathered} \right.$$
((2.9))
$$\left\{ \begin{gathered} - \frac{{dx{\text{*}}}}{{dt}} = \left( {M{{'}}\left( {{{x}^{t}},t} \right)} \right){\text{*}}x{\text{*}} \hfill \\ - \,\,H{\text{*}}{{C}_{2}}\left( {H\delta x - \varepsilon } \right),\,\,\,\,t \in (0,T) \hfill \\ {{\left. {x{\text{*}}} \right|}_{{t = T}}} = 0, \hfill \\ \end{gathered} \right.$$
((2.10))
$${{C}_{1}}\left( {\delta {{x}_{0}} - {{\varepsilon }_{b}}} \right) - {{\left. {x{\text{*}}} \right|}_{{t = 0}}} = 0.$$
((2.11))

System (2.9)–(2.11) is the linear data assimilation problem. For a fixed \({{x}^{t}}\), this is a necessary condition for the optimality of the following minimization problem: find \(u\) and \(\phi \) such that

$$\left\{ \begin{gathered} \frac{{d\phi }}{{dt}} = M{{'}}\left( {{{x}^{t}},t} \right)\phi ,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. \phi \right|}_{{t = 0}}} = u, \hfill \\ {{J}_{1}}\left( u \right) = \mathop {\inf }\limits_{v} {{J}_{1}}\left( {v} \right), \hfill \\ \end{gathered} \right.$$

where

$$\begin{gathered} {{J}_{1}}\left( u \right) = \frac{1}{2}\left( {{{C}_{1}}\left( {u - {{\varepsilon }_{b}}} \right),u - {{\varepsilon }_{b}}} \right) \\ + \,\,\frac{1}{2}\int\limits_0^T {\left( {{{C}_{2}}\left( {H\phi - \varepsilon } \right),H\phi - \varepsilon } \right)dt.} \\ \end{gathered} $$
((2.12))

Consider the Hessian ℌ of functional (2.12). It is defined on v by the successive solution of problems

$$\left\{ \begin{gathered} \frac{{d\delta x}}{{dt}} = M{{'}}\left( {{{x}^{t}},t} \right)\delta x,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. {\delta x} \right|}_{{t = 0}}} = {v}, \hfill \\ \end{gathered} \right.$$
((2.13))
$$\left\{ \begin{gathered} - \frac{{dx{\text{*}}}}{{dt}}\, = \,\left( {M{{'}}\left( {{{x}^{t}},t} \right)} \right){\text{*}}x{\text{*}}\, - \,H{\text{*}}{{C}_{2}}H\delta x,\,\,\,t\, \in \,(0,T) \hfill \\ {{\left. {x{\text{*}}} \right|}_{{t = T}}} = 0, \hfill \\ \end{gathered} \right.$$
((2.14))
$$\mathfrak{H}{v} = {{C}_{1}}{v} - {{\left. {x{\text{*}}} \right|}_{{t = 0}}}.$$
((2.15))
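
A minimal sketch of the Hessian-vector product (2.13)–(2.15) for a linear toy model (the same forward Euler discretization as in the sketch above; all settings are assumptions) is given below; assembling ℌ column by column confirms that it is symmetric and, in this toy case, positive definite.

```python
import numpy as np

n, N, dt = 3, 100, 0.01
rng = np.random.default_rng(3)
A = 0.5 * rng.standard_normal((n, n))
H = np.eye(n)
C1 = np.eye(n)
C2 = np.eye(n)

def hessian_vec(v):
    dxs = [v.copy()]                       # tangent-linear problem (2.13)
    for _ in range(N):
        dxs.append(dxs[-1] + dt * (A @ dxs[-1]))
    xstar = -dt * H.T @ C2 @ (H @ dxs[N])  # adjoint problem (2.14), backward in time
    for i in range(N - 1, -1, -1):
        xstar = (np.eye(n) + dt * A).T @ xstar - dt * H.T @ C2 @ (H @ dxs[i])
    return C1 @ v - xstar                  # formula (2.15)

# assembling the Hessian column by column to check its properties
Hmat = np.column_stack([hessian_vec(e) for e in np.eye(n)])
print(np.allclose(Hmat, Hmat.T), np.linalg.eigvalsh(Hmat).min() > 0)
```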

Let us introduce the auxiliary operators \({{R}_{1}},{{R}_{2}}.\) Let \({{R}_{1}} = {{C}_{1}}\), and let the operator \({{R}_{2}}\) be defined on the functions g by the formula \({{R}_{2}}g = {{\left. {\theta {\text{*}}} \right|}_{{t = 0}}},\) where θ* is the solution of the adjoint problem

$$\left\{ \begin{gathered} - \frac{{d\theta {\text{*}}}}{{dt}} = \left( {M{{'}}\left( {{{x}^{t}},t} \right)} \right){\text{*}}\theta {\text{*}} + H{\text{*}}{{C}_{2}}g, \hfill \\ {{\left. {\theta {\text{*}}} \right|}_{{t = T}}} = 0. \hfill \\ \end{gathered} \right.$$
((2.16))

From (2.13)–(2.16), we conclude that system (2.9)–(2.11) is equivalent to the equation for the error \(\delta {{x}_{0}}\):

$$\mathfrak{H}\delta {{x}_{0}} = {{R}_{1}}{{\varepsilon }_{b}} + {{R}_{2}}\varepsilon .$$
((2.17))

Hessian ℌ is by definition a symmetric nonnegative definite operator. We will assume that ℌ is positive definite and, thus, invertible. Then Eq. (2.17) can be written as follows:

$$\delta {{x}_{0}} = {{T}_{1}}{{\varepsilon }_{b}} + {{T}_{2}}\varepsilon ,$$
((2.18))

where \({{T}_{i}} = {{\mathfrak{H}}^{{ - 1}}}{{R}_{i}}\), \(i = 1,2\).

We assume that errors \({{\varepsilon }_{b}},\varepsilon \) are random, normally distributed with zero mean, and uncorrelated among themselves while, as was mentioned above, \({{\varepsilon }_{b}}\sim N(0,B),\varepsilon \sim N(0,R).\) Then it follows from (2.18) that error \(\delta {{x}_{0}}\) is also normally distributed with zero expectation. Let P denote the covariance operator of the error of the optimal solution: \(P = E[\delta {{x}_{0}}{{(\delta {{x}_{0}})}^{T}}].\) From (2.18), we obtain

$$P = E[\delta {{x}_{0}}{{(\delta {{x}_{0}})}^{T}}] = {{T}_{1}}BT_{1}^{*} + {{T}_{2}}RT_{2}^{*},$$
((2.19))

where \(T_{i}^{*}\) are the operators adjoint to \({{T}_{i}},i = 1,2.\) To construct operator P, it is necessary to find the operators \({{T}_{1}}BT_{1}^{*}\) and \({{T}_{2}}RT_{2}^{*}\). Consider operator \({{T}_{1}}BT_{1}^{*}.\) Since \({{T}_{1}} = {{\mathfrak{H}}^{{ - 1}}}{{R}_{1}} = {{\mathfrak{H}}^{{ - 1}}}{{C}_{1}}\) and \(T_{1}^{*} = {{C}_{1}}{{\mathfrak{H}}^{{ - 1}}}\), then \({{T}_{1}}BT_{1}^{*} = {{\mathfrak{H}}^{{ - 1}}}{{C}_{1}}B{{C}_{1}}{{\mathfrak{H}}^{{ - 1}}}\). Moreover, if \({{C}_{1}} = {{B}^{{ - 1}}}\), then

$${{T}_{1}}BT_{1}^{*} = {{\mathfrak{H}}^{{ - 1}}}{{C}_{1}}B{{C}_{1}}{{\mathfrak{H}}^{{ - 1}}} = {{\mathfrak{H}}^{{ - 1}}}{{B}^{{ - 1}}}{{\mathfrak{H}}^{{ - 1}}}.$$
((2.20))

Thus, the algorithm for calculating \(w = {{T}_{1}}BT_{1}^{*}{v}\) is as follows:

(1) solve equation ℌp = v,

(2) calculate C1p,

(3) solve equation ℌw = C1p.

As a result, Eq. (2.20) gives the contribution of error \({{\varepsilon }_{b}}\) to covariance operator P.
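
A minimal sketch of these three steps is given below; a small explicit symmetric positive definite matrix stands in for the Hessian and a diagonal matrix for \({{C}_{1}}\) (both are assumptions), whereas in practice steps (1) and (3) are iterative solves that use only Hessian-vector products.

```python
import numpy as np

# Three-step algorithm for w = T1 B T1* v. A small explicit SPD matrix stands
# in for the Hessian (2.13)-(2.15) and a diagonal matrix for C1 = B^{-1};
# both are assumptions for illustration.
Hess = np.array([[2.0, 0.3, 0.0],
                 [0.3, 1.5, 0.2],
                 [0.0, 0.2, 1.8]])
C1 = np.diag([1.0, 2.0, 0.5])

v = np.array([1.0, 0.0, 0.0])
p = np.linalg.solve(Hess, v)      # step (1): solve  Hess p = v
q = C1 @ p                        # step (2): compute C1 p
w = np.linalg.solve(Hess, q)      # step (3): solve  Hess w = C1 p
print(w)
```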

Consider now operator \({{T}_{2}}RT_{2}^{*}.\) Since \({{T}_{2}} = \)–1R2, then \({{T}_{2}}RT_{2}^{*} = \)–1R2R\(R_{2}^{*}\)–1. Let us consider the scalar product \(({{R}_{2}}g,p)\) for fixed g and p to find \(R_{2}^{*}\). We have from (2.16) that (R2g, p) = \(({{\left. {\theta {\text{*}}} \right|}_{{t = 0}}},p)\) = \(\int_0^T {(H{\text{*}}{{C}_{2}}g,\varphi )} dt = (g,R_{2}^{*}p),\) where \(R_{2}^{*}p = {{C}_{2}}H\varphi ,\) and \(\varphi \) is the solution of problem (2.13) for v = p. Thus, operator \({{T}_{2}}RT_{2}^{*}\) is determined by successively solving the following problems (for a given v):

$$\mathfrak{H}p = {v},$$
((2.21))
$$\left\{ \begin{gathered} \frac{{d\varphi }}{{dt}} = M{{'}}\left( {{{x}^{t}},t} \right)\varphi ,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. \varphi \right|}_{{t = 0}}} = p, \hfill \\ \end{gathered} \right.$$
((2.22))
$$\left\{ \begin{gathered} - \frac{{d\theta {\text{*}}}}{{dt}}\, = \,\left( {M{{'}}\left( {{{x}^{t}},t} \right)} \right){\text{*}}\theta {\text{*}}\, - \,H{\text{*}}{{C}_{2}}R{{C}_{2}}H\varphi ,\,\,\,t\, \in \,(0,T) \hfill \\ {{\left. {\theta {\text{*}}} \right|}_{{t = T}}} = 0, \hfill \\ \end{gathered} \right.$$
((2.23))
$$\mathfrak{H}w = {{\left. {\theta {\text{*}}} \right|}_{{t = 0}}}.$$
((2.24))

Then \({{T}_{2}}RT_{2}^{*}{v} = w.\) If \({{C}_{2}} = {{R}^{{ - 1}}}\), then \(H{\text{*}}{{C}_{2}}R{{C}_{2}}H = \)\(H{\text{*}}{{C}_{2}}H\), and from (2.22)–(2.23) we have that \({{\left. {\theta {\text{*}}} \right|}_{{t = 0}}} = \mathfrak{H}p - {{C}_{1}}p\), where ℌ is the Hessian given by formulas (2.13)–(2.15). Then \({{R}_{2}}RR_{2}^{*} = \mathfrak{H} - {{C}_{1}}\) and

$${{T}_{2}}RT_{2}^{*} = {{\mathfrak{H}}^{{ - 1}}}\left( {\mathfrak{H} - {{C}_{1}}} \right){{\mathfrak{H}}^{{ - 1}}}.$$
((2.25))

Thus, the algorithm for calculating \(w = {{T}_{2}}RT_{2}^{*}{v}\) is as follows:

(1) solve equation ℌp = v,

(2) calculate (ℌ – C1)p,

(3) solve equation ℌw = (ℌ – C1)p.

As a result, Eq. (2.25) gives the contribution of error \(\varepsilon \) to covariance operator P.

To calculate the total contribution of the uncorrelated errors \({{\varepsilon }_{b}},\varepsilon \), Eqs. (2.20) and (2.25) should be added. Then

$$P = {{T}_{1}}BT_{1}^{*} + {{T}_{2}}RT_{2}^{*} = {{\mathfrak{H}}^{{ - 1}}}{{C}_{1}}{{\mathfrak{H}}^{{ - 1}}} + {{\mathfrak{H}}^{{ - 1}}}\left( {\mathfrak{H} - {{C}_{1}}} \right){{\mathfrak{H}}^{{ - 1}}},$$

from which we obtain the following:

$$P = {{\mathfrak{H}}^{{ - 1}}}.$$
((2.26))

The latter formula expresses covariance operator P through the Hessian defined by Eqs. (2.13)–(2.15). To compute the inverse Hessian, one can use the rule for multiplying ℌ by a given function v according to (2.13)–(2.15), or the BFGS method [114, 115], which yields an approximation of \({{\mathfrak{H}}^{{ - 1}}}\) in the course of iterations.
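
The following minimal sketch shows the matrix-free route: a hand-written conjugate gradient solver applies \(P = {{\mathfrak{H}}^{{ - 1}}}\) to a vector using only Hessian-vector products, which is legitimate because ℌ is symmetric positive definite. The operator G below, standing in for the tangent-linear model composed with the observation operator, is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
G = rng.standard_normal((4, n))    # stand-in for "tangent-linear model, then observe"
C1 = np.eye(n)

def hess_vec(v):
    """Hessian-vector product Hv = C1 v + G^T G v; SPD by construction."""
    return C1 @ v + G.T @ (G @ v)

def apply_P(rhs, tol=1e-12, maxiter=500):
    """Solve H x = rhs by conjugate gradients, i.e., apply P = H^{-1}."""
    x = np.zeros_like(rhs)
    r = rhs - hess_vec(x)
    p = r.copy()
    for _ in range(maxiter):
        Hp = hess_vec(p)
        alpha = (r @ r) / (p @ Hp)
        x = x + alpha * p
        r_new = r - alpha * Hp
        if np.linalg.norm(r_new) < tol:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x

print(apply_P(np.eye(n)[0]))       # the first column of the covariance operator P
```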

3 WEAK FORMULATION OF VARIATIONAL DATA ASSIMILATION

The formulation of the problem of variational data assimilation in the form of (1.17)–(1.19) is often called a strong formulation or a formulation with strong constraints. The strong constraints are the equations of model (1.17), which must be strictly satisfied when functional (1.15) is minimized. A significant shortcoming of formulation (1.17) is that the model is assumed to be exact and model errors are not taken into account. Model errors can be associated with discretization, with an inaccurate description of physical processes, and with errors in the input data. To take possible model errors into account, the so-called weak formulation, or formulation with weak constraints, is considered [51, 52, 116–125].

In the weak formulation, the model equations are no longer required to hold exactly; instead, they are included in the cost functional. Thus, instead of minimizing functional (1.15) on the solutions of (1.14), we consider the problem of minimizing the following functional:

$$\begin{gathered} J = \frac{1}{2}\left( {{{C}_{1}}\left( {{{{\left. x \right|}}_{{t = 0}}} - x_{0}^{b}} \right),{{{\left. x \right|}}_{{t = 0}}} - x_{0}^{b}} \right) \\ + \,\,\frac{1}{2}\int\limits_0^T {\left( {{{C}_{2}}\left( {Hx - {{y}^{0}}} \right),Hx - {{y}^{0}}} \right)dt} \\ + \,\,\frac{1}{2}\int\limits_0^T {\left( {{{C}_{3}}\left( {\frac{{dx}}{{dt}} - M(x,t)} \right),\frac{{dx}}{{dt}} - M(x,t)} \right)dt,} \\ \end{gathered} $$
((3.1))

where \({{C}_{1}},{{C}_{2}},{{C}_{3}}\) are weight operators: \({{C}_{1}} = {{B}^{{ - 1}}}\), \({{C}_{2}} = {{R}^{{ - 1}}}\), \({{C}_{3}} = {{Q}^{{ - 1}}}\), and \(B,R,Q\) are covariance matrices of vectors \({{\varepsilon }_{b}},\varepsilon ,\xi \), respectively: \(B = E[{{\varepsilon }_{b}}\varepsilon _{b}^{T}],\)\(R = E[\varepsilon {{\varepsilon }^{T}}],\)\(Q = E[\xi {{\xi }^{T}}].\) Error vectors \({{\varepsilon }_{b}},\varepsilon \) were introduced earlier, and \(\xi \) is the model error: \(\xi = \frac{{d{{x}^{t}}}}{{dt}} - M({{x}^{t}},t)\). Let us introduce the notation \(f = \frac{{dx}}{{dt}} - M(x,t)\). Then it can be seen that the problem of minimizing functional (3.1) over x can be formulated in an equivalent form; namely, it is necessary to find \({{x}_{0}},f\) such that

$$\left\{ \begin{gathered} \frac{{dx}}{{dt}} = M\left( {x,t} \right) + f,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. x \right|}_{{t = 0}}} = {{x}_{0}}, \hfill \\ J\left( {{{x}_{0}},f} \right) = \mathop {\inf }\limits_{{v},g} J\left( {{v},g} \right), \hfill \\ \end{gathered} \right.$$
((3.2))

where

$$\begin{gathered} J({{x}_{0}},f) = \frac{1}{2}\left( {{{C}_{1}}\left( {{{x}_{0}} - x_{0}^{b}} \right),{{x}_{0}} - x_{0}^{b}} \right) \\ + \,\,\frac{1}{2}\int\limits_0^T {\left( {{{C}_{2}}\left( {Hx - {{y}^{0}}} \right),Hx - {{y}^{0}}} \right)dt + } \frac{1}{2}\int\limits_0^T {\left( {{{C}_{3}}f,f} \right)dt.} \\ \end{gathered} $$

Thus, functional minimization problem (3.1) again reduces to a problem with strong constraints, but here the unknowns (controls) are not only the initial condition \({{x}_{0}}\) but also the right-hand side \(f\).
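
A minimal sketch of evaluating the weak-constraint cost functional from (3.2) for a linear toy model is given below; the controls are the initial condition x0 and the per-step forcing f, and the model, weights, and data are assumptions for illustration.

```python
import numpy as np

n, N, dt = 3, 50, 0.02
rng = np.random.default_rng(4)
A = 0.5 * rng.standard_normal((n, n))
H = np.eye(n)
C1 = np.eye(n)
C2 = np.eye(n)
C3 = 10.0 * np.eye(n)                      # C3 = Q^{-1}, penalizing model error
y0 = rng.standard_normal((N + 1, n))
x_b = np.zeros(n)

def cost(x0, f):
    """Weak-constraint cost: f, of shape (N, n), is the model-error control."""
    x = x0.copy()
    J = 0.5 * (C1 @ (x0 - x_b)) @ (x0 - x_b)
    J += 0.5 * dt * (C2 @ (H @ x - y0[0])) @ (H @ x - y0[0])
    for i in range(N):
        x = x + dt * (A @ x + f[i])        # the model with the extra control f
        r = H @ x - y0[i + 1]
        J += 0.5 * dt * ((C2 @ r) @ r + (C3 @ f[i]) @ f[i])
    return J

print(cost(np.zeros(n), np.zeros((N, n))))
```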

To construct the optimality system, we calculate the gradients of J with respect to \({{x}_{0}}\) and \(f\). By definition of the gradient,

$$\begin{gathered} J_{{{{x}_{0}}}}^{'}({{x}_{0}},f)\delta {{x}_{0}} = \left( {{{C}_{1}}\left( {{{x}_{0}} - x_{0}^{b}} \right),\delta {{x}_{0}}} \right) \\ + \,\,\int\limits_0^T {\left( {{{C}_{2}}\left( {Hx - {{y}^{0}}} \right),H\delta {{x}_{1}}} \right)dt,} \\ \end{gathered} $$
$$\begin{gathered} J_{f}^{'}({{x}_{0}},f)\delta f \\ = \int\limits_0^T {\left( {{{C}_{2}}\left( {Hx - {{y}^{0}}} \right),H\delta {{x}_{2}}} \right)dt + } \int\limits_0^T {\left( {{{C}_{3}}f,\delta f} \right)dt,} \\ \end{gathered} $$

where \(\delta {{x}_{1}}\), \(\delta {{x}_{2}}\) are the solutions of the following problems:

$$\left\{ \begin{gathered} \frac{{d\delta {{x}_{1}}}}{{dt}} = M{{'}}\left( {x,t} \right)\delta {{x}_{1}},\quad t \in \left( {0,T} \right) \hfill \\ {{\left. {\delta {{x}_{1}}} \right|}_{{t = 0}}} = \delta {{x}_{0}}, \hfill \\ \end{gathered} \right.$$
((3.3))
$$\left\{ \begin{gathered} \frac{{d\delta {{x}_{2}}}}{{dt}} = M{{'}}\left( {x,t} \right)\delta {{x}_{2}} + \delta f,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. {\delta {{x}_{2}}} \right|}_{{t = 0}}} = 0. \hfill \\ \end{gathered} \right.$$
((3.4))

It can be seen that their sum \(\delta x = \delta {{x}_{1}} + \delta {{x}_{2}}\) satisfies the system in variations:

$$\left\{ \begin{gathered} \frac{{d\delta x}}{{dt}} = M{{'}}\left( {x,t} \right)\delta x + \delta f,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. {\delta x} \right|}_{{t = 0}}} = \delta {{x}_{0}}. \hfill \\ \end{gathered} \right.$$
((3.5))

To construct the gradients explicitly, we introduce the adjoint problem for (3.5) in form (1.18). Using the well-known duality relation [9], we obtain the following gradients

$$J{{_{{{{x}_{0}}}}^{'}}_{{}}}({{x}_{0}},f)\delta {{x}_{0}} = \left( {{{C}_{1}}\left( {{{x}_{0}} - x_{0}^{b}} \right) - {{{\left. {x{\text{*}}} \right|}}_{{t = 0}}},\delta {{x}_{0}}} \right),$$
$$J_{f}^{'}({{x}_{0}},f)\delta f = \int\limits_0^T {\left( {{{C}_{3}}f - {x}{\text{*}},\delta f} \right)dt,} $$

which should be set equal to zero for any \(\delta {{x}_{0}},\delta f.\) Thus, the necessary optimality condition reduces the problem to a system for the four unknowns \({{x}_{0}},f,x,x{\text{*}}\):

$$\left\{ \begin{gathered} \frac{{dx}}{{dt}} = M\left( {x,t} \right) + f,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. x \right|}_{{t = 0}}} = {{x}_{0}}, \hfill \\ \end{gathered} \right.$$
((3.6))
$$\left\{ \begin{gathered} - \frac{{dx{\text{*}}}}{{dt}}\, = \,\left( {M{{'}}{\kern 1pt} \left( {x,t} \right)} \right){\text{*}}x{\text{*}}\, - \,H{\text{*}}{{C}_{2}}\left( {Hx\, - \,{{y}^{0}}} \right),\,\,\,t\, \in \,(0,T) \hfill \\ {{\left. {x{\text{*}}} \right|}_{{t = T}}} = 0, \hfill \\ \end{gathered} \right.$$
((3.7))
$${{C}_{1}}\left( {{{x}_{0}} - x_{0}^{b}} \right) - {{\left. {x{\text{*}}} \right|}_{{t = 0}}} = 0,$$
((3.8))
$${{C}_{3}}f - x{\text{*}} = 0.$$
((3.9))

Note that adjoint problem (3.7) in the obtained optimality system coincides with adjoint problem (1.18), and condition (3.8) coincides with (1.19), the condition for equality of the gradient in \({{x}_{0}}\) to zero.

Let us construct covariance matrix P of optimal solution errors \(\delta u = {{(\delta {{x}_{0}},\delta f)}^{T}},\) where \(\delta {{x}_{0}} = {{x}_{0}} - {{\left. {{{x}^{t}}} \right|}_{{t = 0}}},\)\(\delta f = f - {{f}^{t}},\) which is determined by formula \(P = E[\delta u{{(\delta u)}^{T}}].\) For this purpose, we write the optimality system for errors, which under assumption (2.8) reduces to the following system, similarly to (2.5)–(2.7):

$$\left\{ \begin{gathered} \frac{{d\delta x}}{{dt}} = M{{'}}\left( {{{x}^{t}},t} \right)\delta x + \delta f,\quad t \in \left( {0,T} \right) \hfill \\ \delta {{\left. x \right|}_{{t = 0}}} = \delta {{x}_{0}}, \hfill \\ \end{gathered} \right.$$
((3.10))
$$\left\{ \begin{gathered} - \frac{{dx{\text{*}}}}{{dt}}\, = \,\left( {M{{'}}\left( {{{x}^{t}},{\kern 1pt} {\kern 1pt} t} \right)} \right){\text{*}}x{\text{*}}\, \hfill \\ - \,\,H{\text{*}}{{C}_{2}}\left( {H\delta x\, - \,\varepsilon } \right),\,\,\,t\, \in \,(0,\,T) \hfill \\ {{\left. {x{\text{*}}} \right|}_{{t = T}}} = 0, \hfill \\ \end{gathered} \right.$$
((3.11))
$${{C}_{1}}\left( {\delta {{x}_{0}} - {{\varepsilon }_{b}}} \right) - {{\left. {x{\text{*}}} \right|}_{{t = 0}}} = 0,$$
((3.12))
$${{C}_{3}}\left( {\delta f - \xi } \right) - x{\text{*}} = 0.$$
((3.13))

System (3.10)–(3.13) is nothing more than a condition of optimality for the following linear data assimilation problem: find \(\delta {{x}_{0}},\delta f\) such that

$$\left\{ \begin{gathered} \frac{{d\delta x}}{{dt}} = M{{'}}\left( {{{x}^{t}},t} \right)\delta x + \delta f,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. {\delta x} \right|}_{{t = 0}}} = \delta {{x}_{0}}, \hfill \\ {{J}_{2}}\left( {\delta {{x}_{0}},\delta f} \right) = \mathop {\inf }\limits_{{v},g} {{J}_{2}}\left( {{v},g} \right), \hfill \\ \end{gathered} \right.$$
((3.14))

where

$$\begin{gathered} {{J}_{2}}(\delta {{x}_{0}},\delta f) = \frac{1}{2}\left( {{{C}_{1}}\left( {\delta {{x}_{0}} - {{\varepsilon }_{b}}} \right),\delta {{x}_{0}} - {{\varepsilon }_{b}}} \right) \\ + \,\,\frac{1}{2}\int\limits_0^T {\left( {{{C}_{2}}\left( {H\delta x - \varepsilon } \right),H\delta x - \varepsilon } \right)dt} \\ + \,\,\frac{1}{2}\int\limits_0^T {\left( {{{C}_{3}}(\delta f - \xi ),\delta f - \xi } \right)dt.} \\ \end{gathered} $$

Let us introduce the Hessian ℌ of the functional J2; it is defined on \(u = {{({v},g)}^{T}}\) by the sequential solution of problems:

$$\left\{ \begin{gathered} \frac{{d\delta x}}{{dt}} = M{{'}}\left( {{{x}^{t}},t} \right)\delta x + g,\quad t \in \left( {0,T} \right) \hfill \\ {{\left. {\delta x} \right|}_{{t = 0}}} = {v}, \hfill \\ \end{gathered} \right.$$
((3.15))
$$\left\{ \begin{gathered} - \frac{{dx{\text{*}}}}{{dt}}\, = \,\left( {M{{'}}\left( {{{x}^{t}},t} \right)} \right){\text{*}}x{\text{*}}\, - \,H{\text{*}}{{C}_{2}}H\delta x,\,\,\,t\, \in \,(0,T) \hfill \\ {{\left. {x{\text{*}}} \right|}_{{t = T}}} = 0, \hfill \\ \end{gathered} \right.$$
((3.16))
$$\mathfrak{H}u = {{({{C}_{1}}{v} - {{\left. {x{\text{*}}} \right|}_{{t = 0}}},{{C}_{3}}g - x{\text{*}})}^{T}}.$$
((3.17))

Then it can be seen that system (3.10)–(3.13) is equivalent to the equation for the error \(\delta u = {{(\delta {{x}_{0}},\delta f)}^{T}}\):

$$\mathfrak{H}\delta u = {{\Re }_{1}}\Xi + {{\Re }_{2}}\varepsilon ,$$
((3.18))

where \(\Xi \, = \,{{({{\varepsilon }_{b}},\xi )}^{T}},{{\Re }_{1}}\Xi \, = \,{{({{C}_{1}}{{\varepsilon }_{b}},{{C}_{3}}\xi )}^{T}},{{\Re }_{2}}\varepsilon \, = \,{{({{\left. {\theta {\text{*}}} \right|}_{{t = 0}}},\theta {\text{*}})}^{T}},\) and θ* is the solution of adjoint problem (2.16). Assuming that operator ℌ from (3.18) is invertible, we have

$$\delta u = {{\Im }_{1}}\Xi + {{\Im }_{2}}\varepsilon ,$$
((3.19))

where \({{\Im }_{i}} = {{\mathfrak{H}}^{{ - 1}}}{{\Re }_{i}}\), \(i = 1,2.\) The latter equation can be used to construct covariance operator P for the optimal solution errors: \(P = E[\delta u{{(\delta u)}^{T}}].\) From (3.19), we obtain the following

$$P = {{\Im }_{1}}{{V}_{\Xi }}\Im _{1}^{*} + {{\Im }_{2}}R\Im _{2}^{*},$$
((3.20))

where \({{V}_{\Xi }} = E[\Xi {{\Xi }^{T}}] = \left( {\begin{array}{*{20}{c}} B&0 \\ 0&Q \end{array}} \right)\), assuming that errors \({{\varepsilon }_{b}},\varepsilon ,\xi \) are uncorrelated. Following the scheme of the proof of (2.20)–(2.25), it can be shown that

$$\begin{gathered} {{\Im }_{1}}{{V}_{\Xi }}\Im _{1}^{*} = {{\mathfrak{H}}^{{ - 1}}}V_{\Xi }^{{ - 1}}{{\mathfrak{H}}^{{ - 1}}}, \\ {{\Im }_{2}}R\Im _{2}^{*} = {{\mathfrak{H}}^{{ - 1}}}\left( {\mathfrak{H} - V_{\Xi }^{{ - 1}}} \right){{\mathfrak{H}}^{{ - 1}}}, \\ \end{gathered} $$
((3.21))

where ℌ is the Hessian defined by formulas (3.15)–(3.17). Then, from (3.21) we conclude:

$$P = {{\mathfrak{H}}^{{ - 1}}}.$$
((3.22))

Thus, we obtain a result similar to (2.26) only with a different operator ℌ.

Operator ℌ, defined by formulas (3.15)–(3.17), can be written in matrix form as follows:

$$\mathfrak{H} = \left( {\begin{array}{*{20}{c}} {{{\mathfrak{H}}_{{11}}}}&{{{\mathfrak{H}}_{{12}}}} \\ {{{\mathfrak{H}}_{{21}}}}&{{{\mathfrak{H}}_{{22}}}} \end{array}} \right),$$

where \({{\mathfrak{H}}_{{ij}}}\) are combinations of derivatives with respect to \(x\) and \(f\). Thus, the dimension of ℌ in this case increases by an order of magnitude compared with (2.26), because in problem (3.2) it is necessary to find not only the initial condition \({{x}_{0}}\) but also the right-hand side \(f\), which depends on the time and space variables. Other ways of accounting for model errors, based on reducing the dimension of the problem, were considered in [119–121, 124, 125].

4 COMPARATIVE ANALYSIS OF 4D-VAR AND ENSEMBLE KALMAN FILTER

As follows from the review presented above, the new generations of assimilation schemes are based on 4D-VAR data assimilation and the ensemble Kalman filter (EnKF). Each of these modern approaches has its advantages and disadvantages, and quite a lot of work has been devoted to their comparative analysis (see, for example, [93–99]).

In the case of a linear model, a linear observation operator, and Gaussian errors, the 4D-VAR methods and the Kalman filter give identical results at the end of the assimilation “window” if model errors are not taken into account [27]. Under the same assumptions and with a sufficiently large number of ensemble members, the EnKF method approximates the Kalman filter well [36]. Nonlinearities of the model and the observation operator (and, as a consequence, the non-Gaussianity of errors) are a potential cause of discrepancies between the results of 4D-VAR and EnKF [95]. If the errors of the observations and of the initial approximation remain Gaussian while the dynamics model is nonlinear, the 4D-VAR method gives a maximum likelihood estimate, the mode of the posterior conditional probability density [126]. At the same time, in general, it is not clear how the search for such a mode is related to the result of the EnKF method [95].

In most problems of geophysical hydrodynamics, the dimension of the state vector of the system is so large that it is necessary to find a compromise between computational capabilities and theoretically optimal approaches. For example, the EnKF method has sampling errors due to the limited size of the ensemble, while in the 4D-VAR method the large dimension forces one to use approximations of the background-error covariance matrices, which also leads to errors that are difficult to estimate by comparative analysis.

As we saw above, 4D-VAR data assimilation (4D‑VAR) in form (1.17)–(1.19) uses direct and adjoint models to estimate the state of the system, which reproduces the observed data as accurately as possible at a given time interval in the sense of minimizing cost functional (1.15). It should be noted that problems (1.16) and (1.17)–(1.19) are solved at once over the entire time interval (0, T) in the 4D-VAR method.

The EnKF method assimilates observations sequentially, unlike 4D-VAR. For given observations \(y_i^0\) at time \({{t}_{i}}\), this method requires an ensemble of state vectors \(x_i^f\) forecast from the previous step \({{t}_{{i - 1}}}\). The EnKF method consists of constructing a correction for the expectation (the ensemble mean) \(\bar {x}_{i}^{f}\) according to the following formula:

$$\bar {x}_{i}^{a} = \bar {x}_{i}^{f} + \tilde {K}\left( {y_{i}^{0} - H\left( {\bar {x}_{i}^{f}} \right)} \right),$$
((4.1))

where \(\bar {x}_{i}^{a}\) is the ensemble mean state estimate (analysis) at time \({{t}_{i}}\), \(H\) is the observation operator, and \(\tilde {K}\) is a generalization of the Kalman gain matrix:

$$K = P_{i}^{f}H{\text{*}}{{\left( {HP_{i}^{f}H{\text{*}} + R} \right)}^{{ - 1}}},$$
((4.2))

where \(P_{i}^{f}\) is the covariance matrix of state errors at time \({{t}_{i}}\). To obtain \(\tilde {K}\), the covariance matrices in definition (4.2) are replaced by sample covariance matrices computed from the ensemble. Thus, the EnKF method constructs corrections to \(\bar {x}_{i}^{f}\) taking into account the uncertainties in the observational data \(y_i^0\). This scheme gives the state ensemble at time \({{t}_{i}}\), which later serves as the initial condition for the ensemble at time \({{t}_{{i + 1}}}\).
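
A minimal sketch of this analysis step, in the common “perturbed observations” variant of EnKF, is given below; the ensemble, the observation operator, and all numbers are assumptions for illustration.

```python
import numpy as np

n, p, m = 5, 2, 40                               # state, observation, ensemble sizes
rng = np.random.default_rng(5)
H = np.zeros((p, n)); H[0, 0] = H[1, 3] = 1.0    # observation operator
R = 0.1 * np.eye(p)                              # observation-error covariance

Xf = rng.standard_normal((n, m))                 # forecast ensemble at time t_i
y0 = np.array([1.0, -0.5])                       # observations at time t_i

anom = (Xf - Xf.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)
Pf = anom @ anom.T                               # sample covariance, stands for P_i^f
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # sample Kalman gain, cf. (4.2)
Ys = y0[:, None] + np.sqrt(0.1) * rng.standard_normal((p, m))  # perturbed observations
Xa = Xf + K @ (Ys - H @ Xf)                      # analysis ensemble, cf. (4.1)
print(Xa.mean(axis=1))                           # the ensemble-mean analysis
```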

Thus, the difference between the two approaches is already built into formulations (1.16) and (4.1)–(4.2): the 4D-VAR method minimizes the functional \(J(x)\) at once over the entire time interval \((0,T)\), while the EnKF method assimilates the observations sequentially at each specific time. Unlike in 4D-VAR, the forecast-error covariance matrix \(P_i^f\) plays a key role in assimilation by the EnKF method, and the estimates of these matrices are refined at each time step.

The processing of the covariance matrices in the EnKF method becomes a serious computational problem if the dimension of the state vector is large. The use of a limited number of ensemble members leads to a deterioration in the approximation of the Kalman filter. On the other hand, in the 4D-VAR method, it is necessary to construct and solve linearized direct and adjoint problems using iterative gradient methods, which is often a big problem for complex geophysical models. Thus, the construction of an adjoint model for the well-known NEMOVAR data assimilation system required a great deal of work [73].

Numerical comparisons of 4D-VAR and EnKF [93–99] showed that these methods often give similar results. The EnKF provides more accurate results for short time intervals. 4D-VAR leads to smaller errors than EnKF for observations with gaps in the data, when ensemble perturbations grow nonlinearly and become non-Gaussian [94]. However, EnKF is preferable from the point of view of parallelization of computations, because the computations for each member of the ensemble can be carried out independently [99].

Errors in the models describing real physical systems, such as the atmosphere and the ocean, occur due to inaccurate forcing (right-hand sides or boundary conditions), the parametrization of subgrid processes, low resolution, etc. The errors can be systematic or random; there can also be errors in model parameters or physical parameterizations. In this case, 4D-VAR is used in the weak formulation (weak constraints) [116] described in Section 3. On the one hand, this approach places high demands on computing systems due to the high dimension of the state vector of the system, which reaches \( \sim {{10}^{9}}\) in modern numerical weather forecast models. On the other hand, this approach can improve the accuracy of the forecast and increase the “window” of data assimilation by considering the extended cost functional in form (3.1).

In the EnKF method, a weak formulation arises naturally: model errors are added to the formulation of the problem, and the estimate of their covariance matrix is likewise refined at each step [93]. This indicates the need for a further comparison of the EnKF and 4D-VAR methods in a weak formulation.

A broad discussion comparing the 4D-VAR and EnKF methods [93–99] led to the recognition of the need to develop data assimilation approaches combining the best features of both [103]. This is how the Hybrid 4D-VAR approach, which combines the ensemble Kalman filter and variational data assimilation [100–104], and the ensemble method of 4D-VAR data assimilation, 4DEnVar [105–110], appeared.

5 CONCLUSIONS

In this paper, we reviewed and analyzed methods for solving data assimilation problems developed in recent decades. The development of data assimilation systems began in meteorology and was dictated by the need to improve weather forecasts. In addition to the modern complex meteorological data assimilation systems, these methods are increasingly used in oceanography and other areas.

Qualitative changes in measurement systems are occurring along with the progress in solving data assimilation problems. Recent years have been marked by a continuous increase in the number of measurements of various characteristics of our geosystem. Therefore, the development of technologies for solving data assimilation problems based on modern approaches and taking into account recent advances in this direction is an urgent task.

Variational data assimilation methods are among the most modern and effective. Thus, special attention should be paid to research on the numerical solution of problems of variational observation data assimilation for the dynamics of oceans and seas. The weak formulation of the variational data assimilation problem makes it possible to take into account possible model errors and thus leads to a more accurate solution of the problem. The algorithms formulated in Sections 2 and 3 can be used to calculate the covariance matrix of the errors of the optimal solution and the individual contributions to it associated with the errors of the input data. These algorithms make it possible to investigate the sensitivity of optimal solutions of variational data assimilation problems using the Hessian of the cost functional.