
12.1 Introduction

Four-dimensional variational data assimilation (4D-Var) is an estimation technique which finds a model state \(\mathbf{x}(t_{0})\), at initial time \(t_{0}\), that minimizes a quadratic objective function: the sum of the distance between the initial state \(\mathbf{x}(t_{0}) \in {R}^{n}\) and a prior estimate (the so-called background field) \({\mathbf{x}}^{b} \in {R}^{n}\), and the distance between a real-valued vector of observations \(\mathbf{y} \in {R}^{m}\) and measurements \(\mathcal{H}(\mathbf{x})\) of the trajectory \(\mathbf{x}(t)\) obtained by integration of a dynamical model from \(\mathbf{x}(t_{0})\). The objective function \(\mathcal{J}\) is written

$$\begin{array}{ll} \mathcal{J} [\mathbf{x}(t_{0})]\,=\,&{(\mathbf{x}(t_{0}) -{\mathbf{x}}^{b})}^{T}{\mathbf{B}}^{-1}(\mathbf{x}(t_{0}) -{\mathbf{x}}^{b}) \\ & + {[\mathbf{y} -\mathcal{H}(\mathbf{x})]}^{T}{\mathbf{R}}^{-1}[\mathbf{y} -\mathcal{H}(\mathbf{x})], \end{array}$$
(12.1)

where B and R are estimates of the background and observation error covariance matrices, respectively, and the observations, \(\mathbf{y} =\{ y_{i}\}_{i=1}^{m}\), are nonlinear functions of the initial state,

$$y_{i} = \mathcal{H}_{i}[\mathcal{M}(t_{i},t_{0})\mathbf{x}(t_{0})] +\delta _{i}.$$
(12.2)

Here we assume that \(\mathcal{M}(t_{i},t_{0})\) propagates the model state from \(t_{0}\) to \(t_{i}\), \(\mathcal{H}_{i}\) is the i-th observation operator, and \(\delta _{i}\) is the observation error. Note that if the initial condition and observation errors are Gaussian distributed with covariances B and R, if the observation errors are unbiased, and if the background field \({\mathbf{x}}^{b}\) is equal to the statistical mean of \(\mathbf{x}(t_{0})\), then the minimizer of \(\mathcal{J}\) is the maximum likelihood estimate of \(\mathbf{x}(t_{0})\).

In addition to errors in the initial conditions, it is clear that oceanic and atmospheric models contain other sources of error which must be considered. Specifically, there are errors in model inhomogeneities such as boundary conditions and radiative forcing. Weak-constraint four-dimensional variational data assimilation (W4D-Var) is a generalization of 4D-Var which permits one to estimate these additional inhomogeneities, denoted f. Assuming that prior or background values of the forcing fields are available, f b, then the above objective function naturally generalizes to

$$\begin{array}{ll} \mathcal{J} [\mathbf{x}(t_{0}),\mathbf{f}]\,=\,&{(\mathbf{f} -{\mathbf{f}}^{b})}^{T}{\mathbf{F}}^{-1}(\mathbf{f} -{\mathbf{f}}^{b}) \\ & + {(\mathbf{x}(t_{0}) -{\mathbf{x}}^{b})}^{T}{\mathbf{B}}^{-1}(\mathbf{x}(t_{0}) -{\mathbf{x}}^{b}) \\ & + {[\mathbf{y} -\mathcal{H}(\mathbf{x})]}^{T}{\mathbf{R}}^{-1}[\mathbf{y} -\mathcal{H}(\mathbf{x})],\end{array}$$
(12.3)

where it should be understood that the model propagator \(\mathcal{M}\) now depends on both the space-time-dependent inhomogeneities, f, and the initial conditions, x(t 0).

In the incremental formulation (Courtier et al. 1994), the dynamics and measurement operators are linearized around a background trajectory \(\overline{\mathbf{x}}\), and an incremental objective function is defined in terms of \(\delta \mathbf{x} = \mathbf{x} -\overline{\mathbf{x}}\). Of course, if the model dynamics and observation operator are linear, the extremum of the incremental objective function corresponds to an extremum of the original objective function. When nonlinearity is present, the incremental objective function is used to build an iterative solver for the original, nonlinear, data assimilation problem. In this article we assume that some linearization strategy has been selected, e.g., the tangent linearization proposed in Courtier et al. (1994) or the bounded iterate strategy of Bennett and Thorburn (1992), so that the so-called inner loop solver must minimize a strictly quadratic objective function. Henceforth, we shall restrict our attention to the objective function,

$$\begin{array}{ll} \mathcal{J} [\mathbf{x}(t_{0}),\mathbf{f}]\,=\,&{(\mathbf{f} -{\mathbf{f}}^{b})}^{T}{\mathbf{F}}^{-1}(\mathbf{f} -{\mathbf{f}}^{b}) \\ & + {(\mathbf{x}(t_{0}) -{\mathbf{x}}^{b})}^{T}{\mathbf{B}}^{-1}(\mathbf{x}(t_{0}) -{\mathbf{x}}^{b}) \\ & + {(\mathbf{y} -\mathbf{H}\mathbf{x})}^{T}{\mathbf{R}}^{-1}(\mathbf{y} -\mathbf{H}\mathbf{x}),\end{array}$$
(12.4)

where the matrix H ∈ R m ×n is a linear approximation to the operator \(\mathcal{H}\), and inhomogeneities resulting from the linearization have been absorbed into x b, f b, and y.

There are practical considerations which make the implementation of W4D-Var considerably more complex than 4D-Var for realistic models. The first issue is the dimensionality of the unknown vectors, which has consequences for the design and implementation of solvers for minimizing \(\mathcal{J}\). Assuming the state vector x(t) is of dimension n, the model forcing f may be as large as T ×n, where T is the number of time levels in the interval under consideration. The dimension of the space-time covariance matrix F is formally the square of this. The second key issue is scientific, and relates to the determination of the error covariances B and F. Quantitative estimation of these objects requires vast amounts of data which are rarely available; in practice they are often parameterized in terms of a spatially- or temporally-varying variance function, and a set of correlation scales for the orthogonal coordinate directions.

Here we review recent developments associated with the application of representer-based solvers (Bennett 1992) to 4D-Var and W4D-Var problems, an approach which is the foundation for the so-called dual form of variational data assimilation (Courtier 1997). Recall that the minimizer of the objective function is the solution to \(\frac{1} {2}\nabla \mathcal{J} (\mathbf{x}) = 0\); applying this to (12.1), with the linearized observation operator H, yields

$$({\mathbf{B}}^{-1} +{ \mathbf{H}}^{T}{\mathbf{R}}^{-1}\mathbf{H})\mathbf{x} ={ \mathbf{H}}^{T}{\mathbf{R}}^{-1}\mathbf{y} +{ \mathbf{B}}^{-1}{\mathbf{x}}^{b},$$
(12.5)

where uniqueness is assured provided that B is of full rank. Equivalently, the solution can be expressed as the sum of the background and a linear combination of representer functions \(\mathbf{x} ={ \mathbf{x}}^{b} + \mathbf{B}{\mathbf{H}}^{T}\hat{\mathbf{x}}\), yielding the equation for the dual variables \(\hat{\mathbf{x}}\),

$$(\mathbf{H}\mathbf{B}{\mathbf{H}}^{T} + \mathbf{R})\hat{\mathbf{x}} = \mathbf{y} -\mathbf{H}{\mathbf{x}}^{b}.$$
(12.6)

In this dual formulation the unknown vector \(\hat{\mathbf{x}}\) lies in \({R}^{m}\), whereas x lies in \({R}^{n}\). Also, the expansion in terms of representer functions is valid even in the continuum limit of the discretized dynamics, in which case (12.5) becomes the Euler-Lagrange equations for the extremum of the objective functional. The columns of the matrix \(\mathbf{B}{\mathbf{H}}^{T}\), which are approximations to the representer functions in the continuum limit, span the space of observable increments; i.e., they are exactly the m degrees of freedom which are determined by the measurements (Bennett 1992).
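
As a concrete illustration of the equivalence between the primal normal equations (12.5) and the dual system (12.6), the following sketch in Python/NumPy builds a small synthetic problem (random H, synthetic covariances; none of the names or values correspond to a real assimilation system) and verifies that both forms return the same analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 10                                   # synthetic state and observation dimensions

A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)                     # synthetic background error covariance
R = np.diag(rng.uniform(0.5, 2.0, m))           # synthetic observation error covariance
H = rng.standard_normal((m, n))                 # linearized observation operator
xb = rng.standard_normal(n)                     # background state
y = rng.standard_normal(m)                      # observations

# Primal form (12.5): (B^-1 + H^T R^-1 H) x = H^T R^-1 y + B^-1 x^b.
Bi, Ri = np.linalg.inv(B), np.linalg.inv(R)
x_primal = np.linalg.solve(Bi + H.T @ Ri @ H, H.T @ Ri @ y + Bi @ xb)

# Dual form (12.6): (H B H^T + R) x_hat = y - H x^b, then x = x^b + B H^T x_hat.
x_hat = np.linalg.solve(H @ B @ H.T + R, y - H @ xb)
x_dual = xb + B @ H.T @ x_hat

print(np.allclose(x_primal, x_dual))            # True: both forms give the same analysis
```

Note that the dual solve involves an m × m system, which is the basis for the observation below that the approach scales with the number of observations rather than with the state dimension.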

The dual formulation and representer expansion have by now been utilized in many data-assimilative modeling studies of the ocean and atmosphere. Because the dimension of the vector of unknowns is m in either the 4D-Var or the W4D-Var case, there is no intrinsic limitation of the method in the latter case. In order to fix the notation so that a single system describes both 4D-Var and W4D-Var, consider the following augmented vectors and covariance matrices:

$$\displaystyle\begin{array}{rcl} \mathbf{x}' = \left (\begin{array}{c} \mathbf{x}(t_{0})\\ \mathbf{f} \end{array} \right ),\quad \mathbf{B}' = \left (\begin{array}{ll} \mathbf{B}&0\\ 0 &\mathbf{F} \end{array} \right ),\quad \mathbf{H}' = \left (\begin{array}{l} \mathbf{H}\\ 0\end{array} \right ),\quad \mathbf{R}' = \mathbf{R},\quad \mathbf{y}' = \mathbf{y}.& &\end{array}$$
(12.7)

Henceforth, we drop primes and simply write the objective function as

$$\begin{array}{ll} \mathcal{J} [\mathbf{x}] =&{(\mathbf{x} -{\mathbf{x}}^{b})}^{T}{\mathbf{B}}^{-1}(\mathbf{x} -{\mathbf{x}}^{b}) \\ & + {(\mathbf{y} -\mathbf{H}\mathbf{x})}^{T}{\mathbf{R}}^{-1}(\mathbf{y} -\mathbf{H}\mathbf{x}),\end{array}$$
(12.8)

noting that the extremal conditions (12.5) and dual formulation (12.6) are formally unchanged.

Recent advances for representer-based variational assimilation have been connected with technologies for solving (12.6), e.g., preconditioners and iterative solvers, and with developing justifiable error models for the background and model forcing errors, B and F.

In the next section, recent technological developments for solving (12.6) are discussed, and we share our experience concerning the primal and dual forms of the variational data assimilation algorithms, as has been the focus of recent papers (El Akkraoui and Gauthier 2010; El Akkraoui et al. 2008; Gratton and Tshimanga 2009). Following that, recent work on covariance modeling is described. The latter developments are not unique to representer-based approaches.

12.2 Solver Improvements

Several considerations have led to improvements in representer-based solvers for variational data assimilation.

First, it has been noted that iterative solvers for (12.6) may yield a non-monotonic sequence of \(\mathcal{J} (\mathbf{x}_{p})\) values, where \(\mathbf{x}_{p}\) represents the approximate solution at step p of the iterative solver (El Akkraoui et al. 2008). This phenomenon has been observed with the Physical-space Statistical Analysis System (PSAS, Cohn et al. 1998), which employs the conjugate-gradient algorithm applied to (12.6) using \({\mathbf{R}}^{-1/2}\) as preconditioner, and it was also displayed in Zaron (2006) with a non-preconditioned solver. The non-monotonic reduction in the value of the objective function makes it problematic to establish an acceptable stopping criterion for the iterative solver. In spite of the fact that \(m \ll n\), data sets are frequently large enough that executing the full set of m iterations, the worst-case iteration count for conjugate-gradient-type linear solvers in exact arithmetic, is prohibitive.

Another issue which arises in practice is that the huge condition number of the covariance matrices, together with asymmetry between the linearized model and its approximate adjoint, may cause \(\mathbf{R} + \mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\) to fail to be symmetric positive-definite. Experience with idealized problems, where the operators can be explicitly constructed as matrices, shows that the lack of monotonic convergence discussed in the previous paragraph is exacerbated by symmetry errors and lack of positive-definiteness in the \(\mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\) matrix.

A final consideration in the development of new solvers is the availability of diagnostic data to assess the progress of the iteration or to evaluate the quality of the state variable which is obtained.

Recent experience has shown that the generalized conjugate residual (GCR) method (de Sturler 1994, 1996) addresses all the above-mentioned points. GCR is a general-purpose Krylov method for solving non-symmetric systems, Ax = b, which builds matrices U and C in \({R}^{m\times p}\) such that AU = C. The columns of both U and C are in the span of the Krylov subspace \(K = Span\{\mathbf{b},\mathbf{A}\mathbf{b},\ldots ,{\mathbf{A}}^{p-1}\mathbf{b}\}\), and C is orthogonal, such that \({\mathbf{C}}^{T}\mathbf{C} = \mathbf{I}\). The GCR algorithm shown in Fig. 12.1 computes \(\mathbf{x}_{p} \in K\) to minimize \(\|\mathbf{A}\mathbf{x}_{p} -\mathbf{b}\|_{2}\), which is similar to the minimum residual algorithm suggested by El Akkraoui and Gauthier (2010). Although the GCR algorithm can fail when either the residual is orthogonal to the Krylov subspace or when b is an eigenvector of \({\mathbf{A}}^{p}\), neither of these situations has occurred in practice.

Fig. 12.1

The GCR algorithm for solving Ax = b
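
Since Fig. 12.1 is not reproduced here, the following Python/NumPy sketch outlines a GCR iteration of the kind described above; the function name gcr and the small test matrix are purely illustrative, and in an operational setting the matrix-vector product would be applied via tangent-linear and adjoint model integrations rather than an explicit matrix.

```python
import numpy as np

def gcr(matvec, b, maxiter=100, tol=1e-10):
    """Generalized conjugate residual iteration for A x = b (A need not be symmetric).
    Returns the approximate solution x_p together with U and C, where A U = C and
    C^T C = I, as described in the text."""
    x = np.zeros_like(b)
    r = b.copy()
    U, C = [], []
    for _ in range(maxiter):
        u = r.copy()                       # new search direction starts from the residual
        c = matvec(u)
        for uj, cj in zip(U, C):           # orthogonalize A u against previous directions
            beta = cj @ c
            u -= beta * uj
            c -= beta * cj
        nrm = np.linalg.norm(c)
        if nrm == 0.0:                     # breakdown (see the failure modes noted above)
            break
        u /= nrm
        c /= nrm
        alpha = c @ r                      # minimizes ||b - A x||_2 over the enlarged subspace
        x += alpha * u
        r -= alpha * c
        U.append(u)
        C.append(c)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x, np.column_stack(U), np.column_stack(C)

# Small non-symmetric test; operationally A v = H B H^T v + R v and b = y - H x^b.
A = np.array([[4.0, 1.0],
              [0.5, 3.0]])
b = np.array([1.0, 2.0])
x, U, C = gcr(lambda v: A @ v, b)
print(np.allclose(A @ x, b), np.allclose(A @ U, C), np.allclose(C.T @ C, np.eye(C.shape[1])))
```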

Figure 12.2 shows the progress of \(\mathcal{J} (\mathbf{x}_{p})\) for a data-assimilative three-dimensional ocean model with approximately \(n = 400 \times 300 \times 30 \times 5 = 18 \times 1{0}^{6}\) state variables and \(m = 17 \times 1{0}^{4}\) observations (see Zaron et al. 2009 for a similar application in a smaller computational domain). The figure shows that the decrease in the cost function is not monotonic, and increases can occur. This behavior does not occur in smaller, exactly symmetric problems, and the working hypothesis is that the non-monotonicity is caused by asymmetry or lack of positive-definiteness in either the adjoint model or the background covariance. Pointwise tests of the symmetry of B and \(\mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\) indicate that the former is symmetric to machine precision, while the latter contains symmetry errors of 10 % of the diagonal elements. The computational cost of evaluating Ax is approximately 100 cpu-hours, so there is a substantial need for computational efficiency.

Fig. 12.2

Reduction of \(\mathcal{J} (\mathbf{x})\) using GCR. The performance of the GCR solver, as measured by the value of the objective function, is shown for an ocean data assimilation problem. \(\mathcal{J} (\mathbf{x}_{p})\) is computed using (12.14) and (12.15) in the text. The application involves the assimilation of satellite altimetry data into a three-dimensional primitive-equation ocean model encompassing the Hawaiian Ridge, with the goal of estimating the tidal circulation around the Ridge

Further diagnostic information is available from the GCR iterates as well. Qualitative assessment of the solution in the state space is available since the solution \(\mathbf{x}_{p}\) is computed at each iterate. Because AU = C, with C orthogonal, the singular values λ(U) of U approximate the singular values of \({\mathbf{A}}^{-1}\) (Golub and Van Loan 1989). Knowledge of the singular spectrum and orthogonal decomposition of U may be used to better precondition subsequent outer iterations (Giraud et al. 2006; Parks et al. 2006).

Assuming the observation error is uncorrelated and constant, R = σ I, one can approximate the singular spectrum of the so-called representer matrix \(\mathcal{R} = \mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\) (Bennett 1992) with \(\lambda (\mathcal{R}) \approx \lambda {(\mathbf{U})}^{-1}-\sigma\). Here the notation \(\lambda (\mathbf{U}) =\{\lambda _{i}(\mathbf{U})\}_{i=1}^{p}\) denotes the ordered singular spectrum, the set of nonzero singular values of the matrix \(\mathbf{U} \in {R}^{m\times p}\), where \(\lambda _{i+1}(\mathbf{U}) \leq \lambda _{i}(\mathbf{U})\) and \(p \leq m\) are assumed, and the inverse of the singular spectrum \(\lambda {(\mathbf{U})}^{-1}\) is defined as the set of reciprocals of the singular values. This singular spectrum is useful when assessing the observing array or covariance model, since it establishes a criterion for counting the number of degrees of freedom effectively constrained by the data (Bennett 1985, 1992). When the observation error is not a constant it is advantageous to transform with the change of variables, \(\hat{\mathbf{v}} ={ \mathbf{R}}^{-1/2}\hat{\mathbf{x}}\).

The singular spectrum can be used to develop a stopping criterion for the iterative solver in terms of the predicted percent of variance explained. Recall that the representer matrix \(\mathcal{R}\) can be interpreted as a covariance matrix, the trace of which is the total amount of variance expected in the observations exclusive of measurement noise (Bennett 2002). Recall also that the degrees of freedom associated with singular vectors may be classified as either smoothed or interpolated by the data assimilation, according to whether \(\lambda _{i}(\mathcal{R}) <\sigma\) or \(\lambda _{i}(\mathcal{R}) >\sigma\), respectively (Bennett 2002). Let k denote the mode number whose singular value is comparable to the measurement error, i.e., \(\lambda _{k}(\mathcal{R}) >\sigma \geq \lambda _{k+1}(\mathcal{R})\); then

$$S =\displaystyle\sum _{ i=1}^{k}\lambda _{ i}(\mathcal{R})$$
(12.9)

is the expected total observed variance explainable by the given data assimilation system. In practice \(\lambda (\mathcal{R})\) is not known exactly, but its approximation \(\hat{\lambda }(\mathcal{R}) =\lambda {(\mathbf{U})}^{-1}-\sigma\) is available from the orthogonal decomposition of U. An approximation to S can be made by extrapolating \(\hat{\lambda }(\mathcal{R})\) out to i = k. Letting \({\hat{\lambda }}^{e}(\mathcal{R})\) denote this approximate spectrum, then the fraction of S explained by stopping at iterate p may be estimated as

$$f = \left (\displaystyle\sum _{i=1}^{p}\hat{\lambda }_{ i}(\mathcal{R})\right ){\left (\displaystyle\sum _{i=1}^{k}\hat{\lambda }_{ i}^{e}(\mathcal{R})\right )}^{-1}.$$
(12.10)
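
The following Python/NumPy sketch indicates how these spectral diagnostics might be computed from the GCR matrix U. It assumes R = σ I and positive fitted values, and the power-law extrapolation over the middle 50 % of the spectrum follows the procedure used for Fig. 12.3 below; the function name and the default extrapolation length are illustrative.

```python
import numpy as np

def variance_fraction(U, sigma, extrap_len=None):
    """Estimate the representer-matrix spectrum lambda(R) ~ lambda(U)^{-1} - sigma
    from the GCR matrix U, extrapolate it with a power law fitted to the middle
    50% of the values, and return the fraction f of explainable variance,
    cf. (12.9)-(12.10).  Assumes R = sigma*I and positive fitted values."""
    p = U.shape[1]
    sv = np.linalg.svd(U, compute_uv=False)          # singular values of U (descending)
    lam_hat = np.sort(1.0 / sv)[::-1] - sigma        # approximate spectrum, descending

    i = np.arange(1, p + 1)
    sel = slice(p // 4, 3 * p // 4)                  # middle 50% of the computed spectrum
    slope, logc = np.polyfit(np.log(i[sel]), np.log(lam_hat[sel]), 1)

    i_ext = np.arange(1.0, (extrap_len or 10 * p) + 1)
    lam_ext = np.exp(logc) * i_ext ** slope          # power-law extrapolation
    k = int(np.sum(lam_ext > sigma))                 # modes above the noise level, (12.9)
    f = lam_hat.sum() / lam_ext[:k].sum()            # fraction of S explained, (12.10)
    return lam_hat, k, f
```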

Figure 12.3 shows an application of these ideas with the data-assimilative ocean model described in Zaron et al. (2009). The estimated spectrum \(\hat{\lambda }(\mathcal{R})\) is computed for iterates p = 10, 20, 40 (gray) and for the final iterate p = 58 (black). The extrapolated spectrum \({\hat{\lambda }}^{e}(\mathcal{R})\) is computed from a power-law fit to the middle 50 % of the singular values, and one sees that the extrapolated spectrum and the data error variance intersect at approximately k = 200; thus, one expects that approximately 142 additional iterates would be necessary to minimize \(\mathcal{J} (\mathbf{x})\). Applying (12.10) to compute the fraction of variance explained, one finds f = 88 %. In other words, the solution obtained by stopping the solver at p = 58 accounts for 88 % of the explainable observed variance. Note that the variance associated with modes i > k is un-explainable with the covariance model B, and it is ascribed to observation error. While the details are certainly problem-dependent, we have found that \(\hat{\lambda }(\mathcal{R})\) adequately approximates the true spectrum when judged against the uncertainty in B. Experience with idealized, low-dimensional, data assimilation problems suggests that these methods are applicable in realistic systems, where complete knowledge of the spectra cannot be obtained.

Fig. 12.3

Spectral Diagnostics from GCR. The estimated spectrum \(\hat{\lambda }(\mathcal{R})\) of the representer matrix \(\mathcal{R} = \mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\) is shown by the dark solid line corresponding to the last GCR iterate (p = 58) in Fig. 12.2. Solid gray lines show \(\hat{\lambda }(\mathcal{R})\) based on iterates p = 10, 20, and 40, for comparison. The data variance is σ, where R = σ I. The extrapolated spectrum is computed from a linear fit to \((log(i),log(\lambda _{i}(\mathcal{R})))\) in the range \(p/4 \leq i \leq 3p/4\)

Finally, the two components of \(\mathcal{J} (\mathbf{x}_{p})\) due to the background and observations may be obtained as diagnostic information from the GCR iterates. Substituting \(\mathbf{x}_{p} ={ \mathbf{x}}^{b} + \mathbf{B}{\mathbf{H}}^{T}\hat{\mathbf{x}}_{p}\) in (12.8), and writing y for the innovation \(\mathbf{y} -\mathbf{H}{\mathbf{x}}^{b}\), one obtains

$$\begin{array}{ll} \mathcal{J} (\hat{\mathbf{x}}_{p})& = {\mathcal{J}}^{B}(\hat{\mathbf{x}}_{p}) + {\mathcal{J}}^{R}(\hat{\mathbf{x}}_{p}) \\ & =\hat{ \mathbf{x}}_{p}^{T}\mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\hat{\mathbf{x}}_{p} \\ &\quad + {(\mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\hat{\mathbf{x}}_{p} -\mathbf{y})}^{T}{\mathbf{R}}^{-1}(\mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\hat{\mathbf{x}}_{p} -\mathbf{y}).\end{array}$$
(12.11)

Because the GCR solver computes the residual r p at each iterate, one has

$$(\mathbf{H}\mathbf{B}{\mathbf{H}}^{T} + \mathbf{R})\hat{\mathbf{x}}_{ p} = \mathbf{y} -\mathbf{r}_{p}.$$
(12.12)

Assuming that \(\mathbf{R}\hat{\mathbf{x}}_{p}\) can be computed on demand, then

$$\mathbf{H}\mathbf{B}{\mathbf{H}}^{T}\hat{\mathbf{x}}_{ p} = \mathbf{y} -\mathbf{r}_{p} -\mathbf{R}\hat{\mathbf{x}}_{p},$$
(12.13)

and all terms in the expression for the objective function are computable. The contribution from the background term is

$${\mathcal{J}}^{B}(\mathbf{x}_{ p}) = {(\hat{\mathbf{x}}_{p})}^{T}(\mathbf{y} -\mathbf{r}_{ p} -\mathbf{R}\hat{\mathbf{x}}_{p}),$$
(12.14)

while the contribution from the observations is

$${\mathcal{J}}^{R}(\mathbf{x}_{ p}) = {(\mathbf{r}_{p} + \mathbf{R}\hat{\mathbf{x}}_{p})}^{T}{\mathbf{R}}^{-1}(\mathbf{r}_{ p} + \mathbf{R}\hat{\mathbf{x}}_{p}).$$
(12.15)
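
As a minimal sketch of (12.13)-(12.15), the helper below evaluates the two components of the objective function from quantities available during the dual-space iteration; it assumes a diagonal R stored as a vector of error variances, and, as in (12.11)-(12.13), y denotes the innovation. The function name is illustrative.

```python
import numpy as np

def objective_terms(x_hat, r, y, R_diag):
    """Split J(x_p) into background and observation parts, (12.14)-(12.15), using
    only the dual iterate x_hat, the GCR residual r, the innovation y, and the
    action of a diagonal R (stored as the vector R_diag of error variances)."""
    Rx = R_diag * x_hat                 # R x_hat, computed on demand
    HBHt_x = y - r - Rx                 # H B H^T x_hat, from (12.13)
    J_B = x_hat @ HBHt_x                # background term, (12.14)
    misfit = r + Rx                     # equals y - H B H^T x_hat
    J_R = misfit @ (misfit / R_diag)    # observation term, (12.15)
    return J_B, J_R
```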

In summary, the GCR algorithm has been found useful for data assimilation solvers based on the representer expansion. Being applicable to non-symmetric linear systems, the solver is more tolerant of symmetry errors in the adjoint model, such as are present when the continuous adjoint equations are discretized. The GCR solver is currently being used for a variety of weak-constraint ocean data assimilation problems, and it has been implemented within the IOM data assimilation software system (Bennett et al. 2008; Muccino et al. 2008).

12.3 Diagnosis of Error Variances

The preceding analysis of the solver performance and its interpretation in terms of explained variance is contingent upon having correct descriptions of the model and observation error covariances. Validation of B and R is thus of paramount importance. This section outlines the posterior diagnosis strategy of Desroziers and Ivanov (2001) for validating the error covariances B and R, with application to a large-scale operational weather analysis system, the Naval Research Laboratory Atmospheric Variational Data Assimilation System-Accelerated Representer (NAVDAS-AR; Xu et al. 2005; Rosmond and Xu 2006).

12.3.1 Notation and Background Materials

First, recall some established results using the notation employed here. It may be shown (Lorenc 1986) that the analysis x a, the minimizer of the objective function (12.8), is given by

$${ \mathbf{x}}^{a} ={ \mathbf{x}}^{b} + \mathbf{K}(\mathbf{y} -\mathbf{H}{\mathbf{x}}^{b}),$$
(12.16)

where K denotes the so-called Kalman gain,

$$\mathbf{K} = \mathbf{B}{\mathbf{H}}^{T}{(\mathbf{H}\mathbf{B}{\mathbf{H}}^{T} + \mathbf{R})}^{-1}.$$
(12.17)

At this optimum, the value of the objective function \(\mathcal{J}\) is given by Bennett (1992),

$$\mathcal{J} ({\mathbf{x}}^{a}) ={ \mathbf{d}}^{T}{\mathbf{D}}^{-1}\mathbf{d},$$
(12.18)

where \(\mathbf{D} = \mathbf{H}\mathbf{B}{\mathbf{H}}^{T} + \mathbf{R}\) denotes the stabilized representer matrix, and \(\mathbf{d} = \mathbf{y} -\mathbf{H}{\mathbf{x}}^{b}\) denotes the innovation vector. If the background and observation errors are correctly modeled by B and R, it may be shown that the minimum value of \(\mathcal{J}\) is a chi-squared random variable with m degrees of freedom (Bennett 1992),

$$E\{\mathcal{J} ({\mathbf{x}}^{a})\} = E\{\chi _{ m}\} = m,$$
(12.19)

where it is recalled that m is the number of observations, and E{} denotes the expected value of its argument. Furthermore, Bennett et al. (2000) notes that the expected values of parts \({\mathcal{J}}^{B}\) and \({\mathcal{J}}^{R}\) of the objective function \(\mathcal{J}\) are

$$E\{{\mathcal{J}}^{B}({\mathbf{x}}^{a})\} = \mathit{Tr}(\mathbf{H}\mathbf{B}{\mathbf{H}}^{T}{\mathbf{D}}^{-1}),$$
(12.20)

and

$$E\{{\mathcal{J}}^{R}({\mathbf{x}}^{a})\} = \mathit{Tr}(\mathbf{R}{\mathbf{D}}^{-1}),$$
(12.21)

where Tr(A) denotes the trace of the matrix argument A. These results may be further specialized to compute the expected value of subsets of terms in \({\mathcal{J}}^{B}\) and \({\mathcal{J}}^{R}\) (Talagrand 1999; Desroziers and Ivanov 2001). Define \(\boldsymbol{\Pi }_{l}^{B}\) as a projection operator such that \(\mathbf{x}_{l} =\boldsymbol{ \Pi }_{l}^{B}\mathbf{x}\), then the expected value of \(\mathcal{J}_{l}^{B}\) associated with x l a is given by Desroziers and Ivanov (2001)

$$E\{\mathcal{J}_{l}^{B}({\mathbf{x}}^{a})\} = \mathit{Tr}(\boldsymbol{\Pi }_{ l}^{B}\mathbf{H}\mathbf{B}{\mathbf{H}}^{T}{\mathbf{D}}^{-1}{\boldsymbol{\Pi }_{ l}^{B}}^{T}).$$
(12.22)

Likewise, define the projection operator \(\boldsymbol{\Pi }_{k}^{R}\) so that \(\mathbf{y}_{k} =\boldsymbol{ \Pi }_{k}^{R}\mathbf{y}\); then the expected value of the component \(\mathcal{J}_{k}^{R}\) of \({\mathcal{J}}^{R}\) is

$$E\{\mathcal{J}_{k}^{R}({\mathbf{x}}^{a})\} = \mathit{Tr}(\boldsymbol{\Pi }_{ k}^{R}\mathbf{R}{\mathbf{D}}^{-1}{\boldsymbol{\Pi }_{ k}^{R}}^{T}).$$
(12.23)
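
These expectations are easy to verify on a small synthetic problem. The Monte Carlo check below draws innovations with covariance D and compares sample means of \(\mathcal{J}\) and \({\mathcal{J}}^{B}\) against (12.19)-(12.21), and also evaluates a projected trace of the form (12.23); all matrices and dimensions are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, trials = 40, 15, 20000

A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)                      # synthetic background covariance
R = np.diag(rng.uniform(0.5, 2.0, m))            # synthetic observation covariance
H = rng.standard_normal((m, n))
D = H @ B @ H.T + R                              # stabilized representer matrix
Di = np.linalg.inv(D)

# Innovations d = y - H x^b with covariance D, consistent with B and R.
d = np.linalg.cholesky(D) @ rng.standard_normal((m, trials))

J = np.einsum('it,ij,jt->t', d, Di, d)                       # J(x^a) = d^T D^-1 d
JB = np.einsum('it,ij,jt->t', d, Di @ H @ B @ H.T @ Di, d)   # J^B(x^a)

print(J.mean(), m)                                # (12.19): E{J(x^a)} = m
print(JB.mean(), np.trace(H @ B @ H.T @ Di))      # (12.20): E{J^B} = Tr(H B H^T D^-1)
print(np.trace(R @ Di))                           # (12.21): E{J^R} = Tr(R D^-1)

Pk = np.eye(m)[:5]                                # projector onto the first 5 observations
print(np.trace(Pk @ R @ Di @ Pk.T))               # (12.23): expected value of J_k^R
```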

12.3.2 Validation of Error Variances by Posterior Diagnosis

Desroziers and Ivanov (2001) utilize the above relations (12.22) and (12.23) to validate the error variances in the objective function based on a posterior diagnosis of the assimilation system. They demonstrate how to produce realistic error variances for simulated observations in a cost-effective manner. This approach was further evaluated and developed by Chapnik et al. (2004, 2006) and Sadiki and Fischer (2005) for operational data assimilation systems. Following Chapnik et al. (2004), the objective function (12.8) is rewritten as

$$\mathcal{J} (\mathbf{x}) =\displaystyle\sum _{ l=1}^{{\nu }^{B} }{ \mathcal{J}_{l}^{B}(\mathbf{x}) \over \mathbf{s}_{l}^{B}} +\displaystyle\sum _{ k=1}^{{\nu }^{R} }{ \mathcal{J}_{k}^{R}(\mathbf{x}) \over \mathbf{s}_{k}^{R}} ,$$
(12.24)

where s l B and s k R are scalar tuning parameters for the ν B and ν R components of the background and the observations, respectively. The analysis x a(s) is now a function of the tuning parameter vector \(\mathbf{s} = (\mathbf{s}_{l}^{B},\mathbf{s}_{k}^{R})\) (Chapnik et al. 2004),

$${ \mathbf{x}}^{a}(\mathbf{s}) ={ \mathbf{x}}^{b} + \mathbf{K}(\mathbf{s})(\mathbf{y} -\mathbf{H}{\mathbf{x}}^{b}),$$
(12.25)

where the tuned Kalman gain, K(s), takes the form

$$\mathbf{K}(\mathbf{s}) = \mathbf{B}(\mathbf{s}){\mathbf{H}}^{T}{[\mathbf{H}\mathbf{B}(\mathbf{s}){\mathbf{H}}^{T} + \mathbf{R}(\mathbf{s})]}^{-1} = \mathbf{B}(\mathbf{s}){\mathbf{H}}^{T}\mathbf{D}{(\mathbf{s})}^{-1},$$
(12.26)

with \(\mathbf{B}(\mathbf{s}) =\sum _{ l=1}^{{\nu }^{B} }\mathbf{s}_{l}^{B}{\boldsymbol{\Pi }_{l}^{B}}^{T}\mathbf{B}_{l}\boldsymbol{\Pi }_{l}^{B}\) and \(\mathbf{R}(\mathbf{s}) =\sum _{ k=1}^{{\nu }^{R} }\mathbf{s}_{k}^{R}{\boldsymbol{\Pi }_{k}^{R}}^{T}\mathbf{R}_{k}\boldsymbol{\Pi }_{k}^{R}\). The reduced values for the sub-parts \(\mathcal{J}_{l}^{B}\) and \(\mathcal{J}_{k}^{R}\) of the objective function \(\mathcal{J} (\mathbf{s})\) are

$$\mathcal{J}_{l}^{B}({\mathbf{x}}^{a}(\mathbf{s})) ={ \mathbf{d}}^{T}\mathbf{D}{(\mathbf{s})}^{-1}\mathbf{H}{\boldsymbol{\Pi }_{ l}^{B}}^{T}\mathbf{B}(\mathbf{s})\boldsymbol{\Pi }_{ l}^{B}{\mathbf{H}}^{T}\mathbf{D}{(\mathbf{s})}^{-1}\mathbf{d},$$
(12.27)

with expected value

$$E\{\mathcal{J}_{l}^{B}({\mathbf{x}}^{a}(\mathbf{s}))\} = \mathbf{s}_{ l}^{B}\mathit{Tr}[\boldsymbol{\Pi }_{ l}^{B}\mathbf{H}\mathbf{B}(\mathbf{s}){\mathbf{H}}^{T}\mathbf{D}{(\mathbf{s})}^{-1}{\boldsymbol{\Pi }_{ l}^{B}}^{T}],$$
(12.28)

and

$$\begin{array}{ll} \mathcal{J}_{k}^{R}({\mathbf{x}}^{a}(\mathbf{s}))& = {[\boldsymbol{\Pi }_{k}^{R}(\mathbf{y} -\mathbf{H}{\mathbf{x}}^{a}(\mathbf{s}))]}^{T}\mathbf{R}{(\mathbf{s})}^{-1}[\boldsymbol{\Pi }_{k}^{R}(\mathbf{y} -\mathbf{H}{\mathbf{x}}^{a}(\mathbf{s}))] \\ & ={ \mathbf{d}}^{T}\mathbf{D}{(\mathbf{s})}^{-1}{\boldsymbol{\Pi }_{k}^{R}}^{T}\mathbf{R}(\mathbf{s})\boldsymbol{\Pi }_{k}^{R}\mathbf{D}{(\mathbf{s})}^{-1}\mathbf{d}, \end{array}$$
(12.29)

with expected value

$$E\{\mathcal{J}_{k}^{R}({\mathbf{x}}^{a}(\mathbf{s}))\} = \mathbf{s}_{ k}^{R}\mathit{Tr}[\boldsymbol{\Pi }_{ k}^{R}\mathbf{R}(\mathbf{s})\mathbf{D}{(\mathbf{s})}^{-1}{\boldsymbol{\Pi }_{ k}^{R}}^{T}].$$
(12.30)

The criterion for the tuning parameters is that the relations

$$\mathbf{s}_{l}^{B} ={ \mathcal{J}_{l}^{B}({\mathbf{x}}^{a}(\mathbf{s})) \over \mathit{Tr}[\boldsymbol{\Pi }_{l}^{B}\mathbf{H}\mathbf{B}(\mathbf{s}){\mathbf{H}}^{T}\mathbf{D}{(\mathbf{s})}^{-1}{\boldsymbol{\Pi }_{l}^{B}}^{T}]}$$
(12.31)

and

$$\mathbf{s}_{k}^{R} ={ \mathcal{J}_{k}^{R}({\mathbf{x}}^{a}(\mathbf{s})) \over \mathit{Tr}[\boldsymbol{\Pi }_{k}^{R}\mathbf{R}(\mathbf{s})\mathbf{D}{(\mathbf{s})}^{-1}{\boldsymbol{\Pi }_{k}^{R}}^{T}]}$$
(12.32)

are exactly satisfied. Desroziers and Ivanov (2001) proposed an iterative approach (fixed-point algorithm) to solve (12.31) and (12.32), namely,

$$\mathbf{s}_{l\ i+1}^{B} ={ \mathcal{J}_{l}^{B}({\mathbf{x}}^{a}(\mathbf{s}_{ i})) \over \mathit{Tr}[\boldsymbol{\Pi }_{l}^{B}\mathbf{H}\mathbf{B}(\mathbf{s}_{i}){\mathbf{H}}^{T}\mathbf{D}{(\mathbf{s}_{i})}^{-1}{\boldsymbol{\Pi }_{l}^{B}}^{T}]}$$
(12.33)
$$\mathbf{s}_{k\ i+1}^{R} ={ \mathcal{J}_{k}^{R}({\mathbf{x}}^{a}(\mathbf{s}_{ i})) \over \mathit{Tr}[\boldsymbol{\Pi }_{k}^{R}\mathbf{R}(\mathbf{s}_{i})\mathbf{D}{(\mathbf{s}_{i})}^{-1}{\boldsymbol{\Pi }_{k}^{R}}^{T}]} ,$$
(12.34)

noting that the first iteration of the fixed-point algorithm already gives a good estimate of the converged result.
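
A dense, small-scale sketch of this fixed-point iteration is given below, simplified to a single background component (\(\boldsymbol{\Pi }^{B} = \mathbf{I}\)) and block-diagonal observation components; the function name, projectors, and variances are illustrative, and in an operational setting the traces would be estimated rather than formed exactly (see the next subsection).

```python
import numpy as np

def tune_variances(d, H, B, R_parts, PR, n_iter=2):
    """Fixed-point iteration (12.33)-(12.34) of Desroziers and Ivanov (2001),
    simplified to a single background component (Pi^B = I) and several observation
    components.  R_parts[k] is the k-th contribution to R, embedded in the full
    observation space, and PR[k] is the matching (square) selection projector;
    d is the innovation vector y - H x^b.  Everything is dense, so this sketch is
    only usable for small problems."""
    sB, sR = 1.0, np.ones(len(R_parts))
    for _ in range(n_iter):
        Bs = sB * B                                        # B(s)
        Rs = sum(s * Rk for s, Rk in zip(sR, R_parts))     # R(s)
        Ds_inv = np.linalg.inv(H @ Bs @ H.T + Rs)          # D(s)^-1
        w = Ds_inv @ d                                     # D(s)^-1 d
        sB = (w @ H @ Bs @ H.T @ w) / np.trace(H @ Bs @ H.T @ Ds_inv)   # (12.33)
        sR = np.array([(w @ P @ Rs @ P @ w) / np.trace(P @ Rs @ Ds_inv @ P)
                       for P in PR])                                     # (12.34)
    return sB, sR

# Illustrative use: the assumed observation variance (0.8) is too small for the
# first subset and too large for the second; the tuning factors reflect this.
rng = np.random.default_rng(4)
n, m = 30, 40
G = rng.standard_normal((n, n)); B = G @ G.T + n * np.eye(n)
H = rng.standard_normal((m, n))
P1 = np.diag(np.r_[np.ones(20), np.zeros(20)]); P2 = np.eye(m) - P1
R_parts = [0.8 * P1, 0.8 * P2]                             # prior observation variances
R_true = 1.6 * P1 + 0.4 * P2                               # actual observation variances
d = np.linalg.cholesky(H @ B @ H.T + R_true) @ rng.standard_normal(m)
print(tune_variances(d, H, B, R_parts, [P1, P2]))          # expect sR[0] > 1 > sR[1]
```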

12.3.3 Practical Implementation and Application to NAVDAS-AR

Computation of the tuning parameters requires the evaluation of the traces of large matrices, \(\mathit{Tr}[\boldsymbol{\Pi }_{l}^{B}\mathbf{H}\mathbf{B}(\mathbf{s}){\mathbf{H}}^{T}\mathbf{D}{(\mathbf{s})}^{-1}{\boldsymbol{\Pi }_{l}^{B}}^{T}]\) and \(\mathit{Tr}[\boldsymbol{\Pi }_{k}^{R}\mathbf{R}(\mathbf{s})\mathbf{D}{(\mathbf{s})}^{-1}{\boldsymbol{\Pi }_{k}^{R}}^{T}]\). Because the matrices HBH T and D(s) − 1 are not explicitly formed (Chua and Bennett 2001), the traces are computed using the randomized trace estimator (Girard 1989; Hutchinson 1989), which was used by Wahba et al. (1995) for adaptive tuning of parameters in a numerical weather prediction application.
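
A minimal sketch of such a randomized trace estimator (Rademacher probe vectors, as in Hutchinson 1989) is given below; the synthetic matrices merely stand in for operator products such as \(\mathbf{R}\mathbf{D}{(\mathbf{s})}^{-1}\), which operationally are available only through matrix-vector products.

```python
import numpy as np

def randomized_trace(matvec, m, n_samples=100, rng=None):
    """Estimate Tr(A) as the sample mean of z^T A z over random +/-1 probe vectors z.
    Only matrix-vector products with A are required, so A never has to be formed."""
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=m)
        total += z @ matvec(z)
    return total / n_samples

# Example: estimate Tr[R D^-1] (cf. (12.21)) for a small synthetic problem.
rng = np.random.default_rng(2)
m = 30
R = np.diag(rng.uniform(0.5, 2.0, m))
G = rng.standard_normal((m, m))
Di = np.linalg.inv(G @ G.T + R)                  # stand-in for D^-1
print(randomized_trace(lambda z: R @ (Di @ z), m, 500, rng), np.trace(R @ Di))
```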

It is the randomized trace technique which makes feasible the posterior analysis of Desroziers and Ivanov (2001) for large-scale data assimilation, and this approach has been applied to the NAVDAS-AR. The forecast model associated with the NAVDAS-AR system is the United States Navy Operational Global Atmospheric Prediction System (NOGAPS). NOGAPS is a global spectral numerical weather prediction model (Hogan and Rosmond 1991) with 42 vertical levels and T239 spectral horizontal resolution.

The research version of NAVDAS-AR routinely assimilates conventional in situ observations (including radiosondes and pibals, and surface observations from land and sea) and satellite observations (including geostationary rapid-scan and feature-tracked winds; winds from QuikScat, WindSat, ASCAT, ERS-2, AVHRR, MODIS, SSM/I and SSMIS; and total precipitable water from WindSat, SSM/I and SSMIS). NAVDAS-AR also assimilates remotely-sensed microwave and infrared sounder radiances from AMSU-A, SSMIS, AIRS and IASI. The representation of the background error covariance matrix B (in (12.7)) is based on the NAVDAS 3D-Var analysis system (Daley and Barker 2001), and the observation error covariance matrix R is diagonal. Because the space-time error covariance F (in (12.7)) is set to zero, the current system is 4D-Var, rather than the W4D-Var targeted for the future.

Figure 12.4 shows the behavior of the NAVDAS-AR system based on the diagnostics \(\mathcal{J} ({\mathbf{x}}^{a})/m\), s B and s R. The values are computed over a 7-day period from 23 to 29 November 2008, with all available observations assimilated. If the background and observation errors are correctly modeled, one would expect \(\mathcal{J} ({\mathbf{x}}^{a})/m\,=\,{\mathbf{s}}^{B}\,=\,{\mathbf{s}}^{R}\,\approx \,1\). The figure shows that \(\mathcal{J} ({\mathbf{x}}^{a})/m\) varies from 0.4 to 0.6 and is smaller than the expected value of 1. Also, the background errors are underestimated and the observation errors are overestimated, as shown by values of s B varying from 1.8 to 2.4, and values of s R varying from 0.4 to 0.6, nearly overlapping the values of \(\mathcal{J} ({\mathbf{x}}^{a})/m\). The diagnostics also indicate that the analysis system is sensitive to the number of observations (more radiosonde observations at 0 and 12 UTC than at 6 and 18 UTC), with stable values over the observation period.

Fig. 12.4

NAVDAS-AR posterior error diagnostics. The reduced value of the objective function divided by the number of observations is consistently smaller than unity (\(\mathcal{J} ({\mathbf{x}}^{a})/m < 1\); solid line), its expected value if both background and observation errors are correctly scaled (12.19). Analysis of the separate background and observation errors, s B (12.31) and s R (12.32), respectively, shows that the background error variance is under-estimated (s B > 1; solid line, square markers) and the observation error variance is over-estimated (s R < 1; dashed-line, circle markers). The sawtooth (up-down) pattern in these curves is due to the twice-daily timing of radiosonde observations, resulting in twice-daily changes in the number of observations assimilated.

The observation error tuning coefficient s R may be further broken down to diagnose the observation error variances for different types of observations. Table 12.1 shows the components for temperature, wind velocity, wind speed, moisture, total precipitable water, and satellite radiances. The values indicate that the temperature standard errors should be kept unchanged, but the standard error of the zonal and meridional components of wind should be slightly reduced. Likewise, the standard error for wind-speed, total precipitable water, and radiances should be adjusted downward. In contrast, the standard error for moisture data should be increased.

Table 12.1 Tuning coefficients

12.4 Summary

Variational data assimilation systems based on representer-based solution methods are being used to perform analyses and prediction in the ocean and atmosphere. One such weather prediction system, NAVDAS-AR, is currently in operational use (Xu et al. 2005; Rosmond and Xu 2006).

The inner iterative linear solvers at the core of these systems may display non-monotonic convergence in the norm defined by the primal objective function, and this behavior makes problematic the development of practical stopping criteria. One approach to this problem has been described, namely, using an inner solver that permits more diagnostics of the solution progress and objective function to be computed during the minimization. The generalized conjugate residual (GCR) algorithm provides these diagnostics, at the cost of some additional complexity compared with the conjugate gradient algorithm, but it performs reliably when the approximate adjoint of the model is used.

The analysis produced by any data assimilation system is always limited by the quality of the prior covariance models for the background, model forcings, and observations. In Sect. 12.3 it was shown how the posterior error analysis of Desroziers and Ivanov (2001) can be applied to calibrate these covariance models in variational data assimilation systems using representer-based solvers. These methods have been applied to diagnose the observation errors in NAVDAS-AR, which utilizes many sources of atmospheric data, each with unique error characteristics.