
16.1 Introduction

The impact of supplemental observations on the forecast error reduction depends on: (a) the size of the forecast error at the location where the observation is taken, (b) the assumptions used in the data assimilation scheme about the strength of the correlation between errors in forecasts of the observed variable and errors in all other variables defining the model state, (c) the actual correlation between errors of the observed variable and the model state variables, and (d) the growth and movement of the change in the estimated state imparted by the supplemental observations. In many applications, there is a special region called a verification region and a special time called a verification time. One often wishes to collect and use supplemental observations at an earlier observation time to minimize the forecast error variance within the verification region at the verification time. The problem of identifying the best locations for deploying mobile observation platforms is often called the adaptive sampling or targeting problem. The importance of this problem has been heightened in oceanic applications by the advent of Autonomous Underwater Vehicles (AUVs) and underwater gliders. These observing platforms need to be told where to go and when. Since one must decide where to take the supplemental observations well before the targeting time, it is critical to solve the adaptive sampling problem in an accurate and timely manner. Here, a technique based on the Ensemble Transform Kalman Filter (ETKF) is used to provide adaptive sampling guidance for supplemental ocean observations.

The ETKF uses an ensemble forecast initialized at an initialization time to quickly obtain the prediction error covariance matrix associated with a particular deployment of observations by solving a low-rank Kalman filter equation. The technique can quickly assess the ability of a large number of feasible future sequences of observations to reduce the forecast error variance. The ETKF was developed by Bishop et al. (2001) and first used to provide the optimal flight tracks along which Global Positioning System (GPS) dropwindsondes were released during the Winter Storm Reconnaissance (WSR) program (Szunyogh et al. 2000) to improve the 24–72 h forecasts over the continental United States (Majumdar et al. 2002). It was also used for medium-range forecasts with a single-model ensemble (Buizza et al. 2003; Sellwood et al. 2008) and a multi-model ensemble (Majumdar et al. 2010), as well as for tropical cyclone prediction (Majumdar et al. 2011). While the ETKF technique is increasingly used for atmospheric adaptive sampling, there are relatively few applications in ocean adaptive sampling.

In this study, the ETKF ocean adaptive sampling technique is applied to the glider data collected during the AOSN II field campaign that took place in the Monterey Bay in August 2003. The goal of the month-long field experiment was to build a fundamental understanding of upwelling and relaxation processes as well as their biological (ecosystem productivity) and chemical (nutrient fertilization) impacts in the Monterey Bay. To achieve this goal, it was important to develop strategies for commanding sophisticated robotic vehicles to the locations where the observations they collect would be most useful (AOSN 2003). Multiple AUVs and underwater gliders were deployed during the field campaign to collect data that could be assimilated into ocean forecast models to improve model performance.

The ocean ensemble and adaptive sampling technique presented here continues the ocean model verification effort of Hong et al. (2009a). The deterministic run in Hong et al. (2009a) is used as the control run of the ensemble simulation in this study; consequently, the model, model configuration, and domain setting are exactly the same in both studies. The ocean model is the Navy Coastal Ocean Model (NCOM, Martin 2000) with the multivariate analysis of the Navy Coupled Ocean Data Assimilation (NCODA, Cummings 2005). The atmospheric forcing is obtained from a deterministic operational forecast using the Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS, Doyle et al. 2008). Figure 16.1 shows the domain settings for the NCOM/NCODA ocean component and for the nested atmospheric component of COAMPS. The ocean domain lies within the innermost nested domain of the atmospheric component of COAMPS.

Fig. 16.1 (a) The NCOM and NCODA domain; (b) the COAMPS nested domains

The rest of the Chapter is organized as follows. In Sect. 16.2, the description of the ETKF adaptive sampling is provided. Section 16.3 contains the discussion of the atmospheric forcing ensemble generation. Section 16.4 presents the results from the ocean ensemble forecast. Section 16.5 illustrates the application of the ocean adaptive sampling for the AOSN II glider observations. Summary and discussion are presented in Sect. 16.6.

16.2 Ocean Adaptive Sampling Technique

In ETKF adaptive sampling, the observations are divided into: (1) non-adaptive or routine observations, such as satellite- and buoy-observed SST, satellite altimetry, mooring-observed ocean profiles, and high-frequency radar-observed surface currents, and (2) adaptive observations, such as aircraft-observed SST and observations collected by autonomous underwater gliders. The first step is to estimate the analysis error covariance matrix associated with the routine ocean observations. The second step is to estimate the reduction in forecast error variance due to the supplemental adaptive ocean observations.

16.2.1 Analysis Error Covariance for the Routine Observations with the ET Technique

To be consistent with the ET technique of ensemble generation, we need to utilize a guess of the analysis error covariance matrix \(\mathbf{P}_{g}^{a}\) associated with the routine observational network. Let the columns of the \(n \times K\) matrices \({\mathbf{X}}^{o}\) and \({\mathbf{X}}^{v}\) list the raw ensemble perturbations at the observation and verification times, respectively, of the ensemble forecast initialized at the initialization time.

The forecast perturbations \({\mathbf{X}}^{o}\) can be transformed into a set of perturbations \({\mathbf{X}}^{r}\) that are consistent with \(\mathbf{P}_{g}^{a}\) using

$${ \mathbf{X}}^{r} ={ \mathbf{X}}^{o}\mathbf{T}$$
(16.1)

where

$$\mathbf{T} = \mathbf{B}\,\Lambda _{-}^{-1/2}\,{\mathbf{B}}^{T}$$
(16.2)

and where \(\mathbf{B} = \left [\mathbf{b}_{1},\mathbf{b}_{2},\ldots ,\mathbf{b}_{K}\right ]\) is a K × K orthogonal matrix containing the eigenvectors of the symmetric matrix \(\left ({\mathbf{X}}^{\mathit{oT}}\mathbf{P}_{g}^{a-1}{\mathbf{X}}^{o}/N\right )\). In other words,

$$\frac{{\mathbf{X}}^{\mathit{oT}}\mathbf{P}_{g}^{a-1}{\mathbf{X}}^{o}} {N} = \mathbf{B}\,\Lambda _{K\times K}\,{\mathbf{B}}^{T}$$
(16.3)

where \(\Lambda \,=\,\mathit{diag}\left (\lambda _{11},\lambda _{22},\ldots ,\lambda _{KK}\right )\) is a K × K diagonal matrix listing the eigenvalues of \(\left ({\mathbf{X}}^{\mathit{oT}}\mathbf{P}_{g}^{a-1}{\mathbf{X}}^{o}/N\right )\). Since the sum of the forecast perturbations is equal to zero, one of these eigenvalues will be equal to zero. Consequently, provided each ensemble contains K − 1 linearly independent perturbations, Λ can be written in the form

$$\Lambda _{K\,\,\times \,\,K} = \left [\begin{array}{ll} \Lambda _{(K-1)\,\,\times \,\,(K-1)} & 0 \\ 0 &0\\ \end{array} \right ]$$
(16.4)

where \(\Lambda _{(K-1)\,\,\times \,\,(K-1)}\) is a (K − 1) × (K − 1) diagonal matrix whose diagonal elements are all greater than zero. The \(\Lambda _{-}\) used in (16.2) is obtained from Λ by setting its zero eigenvalue equal to 1; in other words,

$$\Lambda _{-} = \left [\begin{array}{ll} \Lambda _{(K-1)\,\,\times \,\,(K-1)} & 0 \\ 0 &1\\ \end{array} \right ]$$
(16.5)

Note that while \(\Lambda _{-}\) has an inverse, the inverse of Λ does not exist. This adjustment of the eigenvalue matrix is permissible because it does not affect the sample covariance matrix of initial perturbations implied by (16.3). To see this, first note that pre- and post-multiplying (16.3) by the eigenvector \(\mathbf{b}_{K}\) corresponding to the zero eigenvalue \(\lambda _{KK} = 0\) shows that

$$\frac{\mathbf{b}_{K}^{T}{\mathbf{X}}^{\mathit{oT}}\mathbf{P}_{g}^{a-1}{\mathbf{X}}^{o}\mathbf{b}_{K}} {N} = 0,\quad \mbox{ and consequently}\quad \left \vert {\mathbf{X}}^{o}\mathbf{b}_{ K}\right \vert = 0.$$
(16.6)

Second, note that if \(\lambda _{ii}\) and \(\lambda _{-ii}\) denote the diagonal elements of Λ and \(\Lambda _{-}\), respectively, we may deduce that the perturbation ensemble sample covariance matrix \(\mathbf{P}_{e}^{r}\) associated with the transformed ensemble perturbations is given by

$$\displaystyle\begin{array}{rcl} \mathbf{P}_{e}^{r}& =& \frac{{\mathbf{X}}^{r}{\mathbf{X}}^{rT}} {K - 1} = \frac{{\mathbf{X}}^{o}{\mathbf{TT}}^{T}{\mathbf{X}}^{\mathit{oT}}} {K - 1} = \frac{{\mathbf{X}}^{o}\mathbf{B}\,\Lambda _{-}^{-1}\,{\mathbf{B}}^{T}{\mathbf{X}}^{\mathit{oT}}} {K - 1} \\ & =& \frac{1} {K - 1}\displaystyle\sum \limits _{i=1}^{K}\frac{\left ({\mathbf{X}}^{o}\mathbf{b}_{i}\right ){\left ({\mathbf{X}}^{o}\mathbf{b}_{i}\right )}^{T}} {\lambda _{-ii}} = \frac{1} {K - 1}\displaystyle\sum \limits _{i=1}^{K-1}\frac{\left ({\mathbf{X}}^{o}\mathbf{b}_{i}\right ){\left ({\mathbf{X}}^{o}\mathbf{b}_{i}\right )}^{T}} {\lambda _{ii}} \end{array}$$
(16.7)

where \(\mathbf{b}_{i}\) is the ith column of B. Equation 16.7 shows that because \({\mathbf{X}}^{o}\mathbf{b}_{K} = 0\), \(\mathbf{P}_{e}^{r}\) is entirely independent of the value assigned to the Kth eigenvalue. Throughout this discussion we will assume that every ensemble contains K − 1 linearly independent ensemble perturbations.
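
As a concrete illustration, the following Python sketch carries out the transformation (16.1), using the eigendecomposition (16.3) and the adjusted eigenvalue matrix (16.5), on synthetic data. The state dimension, ensemble size, and the identity placeholder for \(\mathbf{P}_{g}^{a-1}\) are illustrative assumptions only, not values from this study.

```python
# A minimal sketch of the ET transform (16.1)-(16.5) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 20                      # assumed state dimension and ensemble size
N = K                              # normalization used in (16.3)

# Raw ensemble perturbations at the observation time; columns sum to zero.
Xo = rng.standard_normal((n, K))
Xo -= Xo.mean(axis=1, keepdims=True)

Pg_a_inv = np.eye(n)               # placeholder guess of P_g^{a-1}

# Eigendecomposition (16.3) of the K x K matrix X^oT P_g^{a-1} X^o / N.
lam, B = np.linalg.eigh(Xo.T @ Pg_a_inv @ Xo / N)

# Build Lambda_- (16.5) by setting the (numerically) zero eigenvalue to 1.
lam_minus = lam.copy()
lam_minus[np.argmin(np.abs(lam))] = 1.0

# Transformation matrix (16.2) and transformed perturbations (16.1).
T = B @ np.diag(lam_minus ** -0.5) @ B.T
Xr = Xo @ T

# Per (16.7), the sample covariance X^r X^rT / (K - 1) is unaffected by the
# value assigned to the zero eigenvalue, since X^o b_K is (numerically) zero.
Pe_r = Xr @ Xr.T / (K - 1)
```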

16.2.2 Signal Variance and Forecast Error Variance Reduction for Adaptive Observation with the ETKF Technique

If the true analysis error covariance at the observation time after assimilating all routine observations were given by \(\mathbf{P}_{e}^{r} = \frac{{\mathbf{X}}^{r}{\mathbf{X}}^{rT}} {K-1}\), then the posterior analysis error covariance \(\mathbf{P}_{i}^{a}\) after assimilating the ith feasible deployment of adaptive observations \(\mathbf{y}_{i}^{a}\), in addition to the routine observations, would be given by

$$\mathbf{P}_{i}^{a} = \mathbf{P}_{ e}^{r} -\mathbf{P}_{ e}^{r}\tilde{\mathbf{H}}_{ i}^{\mathit{aT}}{\left (\tilde{\mathbf{H}}_{ i}^{a}\mathbf{P}_{ e}^{r}\tilde{\mathbf{H}}_{ i}^{\mathit{aT}} + \mathbf{I}\right )}^{-1}\mathbf{\tilde{H}}_{ i}^{a}\mathbf{P}_{ e}^{r}$$
(16.8)

where \(\mathbf{\tilde{H}}_{i}^{a}\) describes the mapping from the model state vector to the observation vector normalized by the inverse square root of the observation error covariance \(\mathbf{R}_{i}^{-1/2}\) associated with the ith feasible deployment; in other words,

$$\mathbf{\tilde{H}}_{i}^{a}{\mathbf{x}}^{t} = \mathbf{R}_{ i}^{-1/2}\mathbf{y}_{ i}^{t}$$
(16.9)

where \({\mathbf{x}}^{t}\) denotes the true model state and \(\mathbf{y}_{i}^{t}\) denotes the true value of the observed variable. As shown in Bishop et al. (2001), if

$$\mathbf{P}_{i}^{a} = \frac{\mathbf{X}_{i}^{a}\mathbf{X}_{ i}^{\mathit{aT}}} {K - 1}$$
(16.10)

where \(\mathbf{X}_{i}^{a}\) is an n × K matrix, then

$$\mathbf{X}_{i}^{a} ={ \mathbf{X}}^{r}\mathbf{C}_{ i}{\left (\Gamma _{i} + \mathbf{I}\right )}^{-1/2}\mathbf{C}_{ i}^{T}$$
(16.11)

where the K × K orthonormal matrix \(\mathbf{C}_{i}\) and the K × K diagonal matrix \(\Gamma _{i}\) are given by the eigenvector decomposition

$$\frac{{\mathbf{X}}^{rT}\mathbf{H}_{i}^{\mathit{aT}}\mathbf{R}_{i}^{-1}\mathbf{H}_{i}^{a}{\mathbf{X}}^{r}} {K - 1} = \mathbf{C}_{i}\Gamma _{i}\mathbf{C}_{i}^{T}.$$
(16.12)

The columns of \(\mathbf{X}_{i}^{a}\) may be interpreted as transformed ensemble perturbations whose covariance gives the analysis error covariance at the observation time assuming that the ith deployment of adaptive observations had been assimilated. To see the impact of the adaptive observations at the verification time, one needs to be able to propagate each of the columns of \(\mathbf{X}_{i}^{a}\) through time in a manner consistent with the governing dynamical equations. A computationally expensive way of doing this would be to define a tangent linear model \(\mathbf{M}\) such that

$$M\left (\mathbf{x}_{c}^{o} + \mathbf{x}_{ ji}^{a}\right ) - M\left (\mathbf{x}_{ c}^{o}\right ) \approx \mathbf{Mx}_{ ji}^{a}$$
(16.13)

where M is the non-linear dynamics propagator that maps state vectors from the observation time to the verification time, \(\mathbf{x}_{c}^{o}\) is the control forecast at the observation time, and \(\mathbf{x}_{ji}^{a}\) is the jth column of \(\mathbf{X}_{i}^{a}\). If one had this operator in hand, then the forecast error covariance matrix given the ith deployment of observations, \(\mathbf{P}_{i}^{v}\), would be given by

$$\mathbf{P}_{i}^{v} = \frac{\mathbf{MX}_{i}^{a}{\left (\mathbf{MX}_{ i}^{a}\right )}^{T}} {K - 1}$$
(16.14)

However, using (16.11) and (16.1) in (16.14) gives

$$\mathbf{MX}_{i}^{a} = \left ({\mathbf{MX}}^{o}\right )\mathbf{TC}_{i}{\left (\Gamma _{i} + \mathbf{I}\right )}^{-1/2}\mathbf{C}_{i}^{T}.$$
(16.15)

Now \({\mathbf{MX}}^{o}\) represents a tangent linear approximation to the propagation of the raw untransformed ensemble perturbations from the observation time to the verification time. Of course, the non-linear equations map the observation-time raw perturbations \({\mathbf{X}}^{o}\) to the verification-time perturbations \({\mathbf{X}}^{v}\). These are directly available from the raw ensemble without any additional computational expense. Hence, a computationally inexpensive way of computing \(\mathbf{P}_{i}^{v}\) that is more accurate than that given by (16.14) is

$$\mathbf{P}_{i}^{v} = \frac{\mathbf{X}_{i}^{v}\mathbf{X}_{ i}^{vT}} {K - 1} ,\mathrm{where}\quad \mathbf{X}_{i}^{v} ={ \mathbf{X}}^{v}\mathbf{TC}_{ i}{\left (\Gamma _{i} + \mathbf{I}\right )}^{-1/2}\mathbf{C}_{ i}^{T}.$$
(16.16)

Equation 16.16 gives the forecast error covariance of the model variables given the ith deployment of adaptive observations. Often the controller of adaptive observational resources will want to use them to minimize the error variance of some q-vector function \({\mathbf{f}}^{v}\) of some subset(s) of the forecasted variables.

A perfect raw ensemble would provide K draws from the distribution of verifying functions given the forecast. In particular, the jth ensemble member gives

$$\displaystyle\begin{array}{rcl} \mathbf{f}_{j}^{v}& =& {H}^{v}\left (\mathbf{x}_{ j}^{v}\right ) = {H}^{v}\left [{\mathbf{\bar{x}}}^{v} + \left (\mathbf{x}_{ j}^{v} -{\mathbf{\bar{x}}}^{v}\right )\right ] \\ & \simeq & {H}^{v}\left ({\mathbf{\bar{x}}}^{v}\right ) +{ \mathbf{H}}^{v}\left (\mathbf{x}_{ j}^{v} -{\mathbf{\bar{x}}}^{v}\right ) \end{array}$$
(16.17)

where \({\mathbf{\bar{x}}}^{v}\) is the mean of the ensemble forecast, \({H}^{v}\) is the non-linear function of interest, and \({\mathbf{H}}^{v}\) is the derivative of the non-linear function with respect to the model variables evaluated at the ensemble mean \({\mathbf{\bar{x}}}^{v}\). Thus, the estimate of the q × q forecast error covariance matrix of the vector function f, upon which targeting decisions are made, is given by

$$\displaystyle\begin{array}{rcl} \left \langle \left (\mathbf{f} -{\mathbf{f}}^{t}\right ){\left (\mathbf{f} -{\mathbf{f}}^{t}\right )}^{T}\right \rangle & \simeq & \frac{1} {K - 1}\displaystyle\sum \limits _{j=1}^{K}\left [{H}^{v}\left (\mathbf{x}_{ j}^{v}\right ) -\overline{{H}^{v}\left (\mathbf{x}_{ j}^{v}\right )}\right ]{\left [{H}^{v}\left (\mathbf{x}_{ j}^{v}\right ) -\overline{{H}^{v}\left (\mathbf{x}_{ j}^{v}\right )}\right ]}^{T} \\ & \simeq & \frac{1} {K - 1}\displaystyle\sum \limits _{j=1}^{K}{\mathbf{H}}^{v}\left (\mathbf{x}_{ j}^{v} -{\mathbf{\bar{x}}}^{v}\right ){\left (\mathbf{x}_{ j}^{v} -{\mathbf{\bar{x}}}^{v}\right )}^{T}{\mathbf{H}}^{vT} \\ & =& \frac{{\mathbf{H}}^{v}{\mathbf{X}}^{v}{\left ({\mathbf{H}}^{v}{\mathbf{X}}^{v}\right )}^{T}} {K - 1}.\qquad \quad \end{array}$$
(16.18)

where \(\overline{{H}^{v}\left (\mathbf{x}_{j}^{v}\right )}\) denotes the mean of the ensemble of vector functions. Using (16.18) and (16.16) leads to the following estimate of the forecast error covariance matrix \(\left \langle \left (\mathbf{f} -{\mathbf{f}}^{t}\right ){\left (\mathbf{f} -{\mathbf{f}}^{t}\right )}^{T}\right \rangle _{i}\) for the vector function f given the routine observations and the ith deployment of adaptive observations:

$$\displaystyle\begin{array}{rcl} \left \langle \left (\mathbf{f} -{\mathbf{f}}^{t}\right ){\left (\mathbf{f} -{\mathbf{f}}^{t}\right )}^{T}\right \rangle _{ i}& \approx & \frac{{\mathbf{H}}^{v}\mathbf{X}_{i}^{v}\mathbf{X}_{i}^{vT}{\mathbf{H}}^{vT}} {K - 1} \\ & =& \frac{{\mathbf{H}}^{v}{\mathbf{X}}^{v}\mathbf{TC}_{i}{\left (\Gamma _{i} + \mathbf{I}\right )}^{-1}\mathbf{C}_{i}^{T}{\mathbf{T}}^{T}{\mathbf{X}}^{vT}{\mathbf{H}}^{vT}} {K - 1} \\ & \approx & \frac{\left [{H}^{v}\left ({\mathbf{X}}^{v}\right )\right ]\mathbf{TC}_{i}{\left (\Gamma _{i} + \mathbf{I}\right )}^{-1}\mathbf{C}_{i}^{T}{\mathbf{T}}^{T}{\left [{H}^{v}\left ({\mathbf{X}}^{v}\right )\right ]}^{T}} {K - 1} \qquad \qquad \ \end{array}$$
(16.19)

where the q × K matrix \(\left [{H}^{v}\left ({\mathbf{X}}^{v}\right )\right ]\) is given by

$$\displaystyle\begin{array}{rcl} & & \left [{H}^{v}\left ({\mathbf{X}}^{v}\right )\right ] \\ & & \quad = \left [\left ({H}^{v}\left (\mathbf{x}_{ 1}^{v}\right ) -\overline{{H}^{v}\left ({\mathbf{x}}^{v}\right )}\right ),\left ({H}^{v}\left (\mathbf{x}_{ 2}^{v}\right ) -\overline{{H}^{v}\left ({\mathbf{x}}^{v}\right )}\right ),\ldots ,\left ({H}^{v}\left (\mathbf{x}_{ K}^{v}\right ) -\overline{{H}^{v}\left ({\mathbf{x}}^{v}\right )}\right )\right ].\end{array}$$
(16.20)

Thus, the ETKF allows non-linear cost functions without the need for the first derivative (Jacobian) of the non-linear verification time functions of interest.

Equation 16.19 gives the forecast error covariance of the user-specified functions of interest for the ith deployment of adaptive observations. Often, users will reduce the information in this matrix to a single cost function by, for example, evaluating the trace of the matrix. To find which of all feasible deployments of adaptive observations minimizes the user-specified cost function, one simply evaluates (16.19) for all feasible deployments and chooses the deployment that minimizes the cost. Since the transformation matrix (16.2) associated with the routine observational network and the \(\left [{H}^{v}\left ({\mathbf{X}}^{v}\right )\right ]\) matrix only need to be evaluated once, the main computational expense associated with each deployment is the K × K eigenvector decomposition (16.12). For ensemble sizes smaller than 100, this is a trivial expense on today's CPUs, and thousands of networks can be evaluated in a matter of minutes on moderate computing resources, as the sketch below illustrates.
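
A hedged sketch of this evaluation loop is given below; the function name, array shapes, and random placeholder operators are hypothetical, and HvXvT stands for the product \(\left [{H}^{v}\left ({\mathbf{X}}^{v}\right )\right ]\mathbf{T}\), which needs to be formed only once.

```python
# A schematic sketch of scoring candidate deployments by the trace of (16.19);
# all inputs are synthetic stand-ins, not AOSN II data.
import numpy as np

rng = np.random.default_rng(1)
n, K, q = 50, 20, 10
Xr = rng.standard_normal((n, K))            # transformed perturbations X^r
Xr -= Xr.mean(axis=1, keepdims=True)
HvXvT = rng.standard_normal((q, K))         # stands in for [H^v(X^v)] T
Ht_list = [rng.standard_normal((3, n)) for _ in range(1000)]  # R^{-1/2} H per deployment

def predicted_error_variance(Ht, Xr, HvXvT, K):
    """Trace of (16.19) for one deployment with normalized operator Ht."""
    # K x K eigendecomposition (16.12): the main cost per deployment.
    gamma, C = np.linalg.eigh((Ht @ Xr).T @ (Ht @ Xr) / (K - 1))
    gamma = np.clip(gamma, 0.0, None)
    # Trace of (16.19) is the squared Frobenius norm of HvXvT C (Gamma+I)^{-1/2}.
    W = HvXvT @ C / np.sqrt(gamma + 1.0)
    return np.sum(W * W) / (K - 1)

costs = [predicted_error_variance(Ht, Xr, HvXvT, K) for Ht in Ht_list]
best = int(np.argmin(costs))                # deployment minimizing the cost
```

Because only the K × K decomposition depends on the deployment, the cost of the loop grows linearly with the number of candidate networks.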

To highlight and predict the impact of the targeted observations, it is also of interest to predict the covariance of the distribution of changes to the forecast that would be imparted by the ith observational network given an infinite sampling of the distributions of observation and forecast. As shown in Bishop et al. (2001), at the observation time this covariance is given by

$$\displaystyle\begin{array}{rcl} \left \langle \left (\mathbf{x}_{i}^{o} -\mathbf{x}_{ r}^{o}\right ){\left (\mathbf{x}_{ i}^{o} -\mathbf{x}_{ r}^{o}\right )}^{T}\right \rangle & =& \mathbf{P}_{ e}^{r}\mathbf{\tilde{H}}_{ i}^{\mathit{aT}}{\left (\mathbf{\tilde{H}}_{ i}^{a}\mathbf{P}_{ e}^{r}\mathbf{\tilde{H}}_{ i}^{\mathit{aT}} + \mathbf{I}\right )}^{-1}\mathbf{\tilde{H}}_{ i}^{a}\mathbf{P}_{ e}^{r} \\ & =& \frac{{\mathbf{X}}^{o}\mathbf{TC}_{i}\Gamma _{i}{\left (\Gamma _{i} + \mathbf{I}\right )}^{-1}\mathbf{C}_{i}^{T}{\mathbf{T}}^{T}{\mathbf{X}}^{\mathit{oT}}} {K - 1} \qquad \qquad \end{array}$$
(16.21)

where \(\mathbf{x}_{r}^{o}\) represents the minimum error variance state estimate at the observation time given routine observations, while \(\mathbf{x}_{i}^{o}\) represents the minimum error variance state estimate at the observation time given routine observations and the ith deployment of adaptive observational resources. Thus, (16.21) represents the covariance of changes to the state estimate due to the adaptive observations. The changes due to the adaptive observations are called signals, and the covariance of these changes is called the signal covariance. The expression for the signal covariance at the verification time is

$$\left \langle \left (\mathbf{x}_{i}^{v} -\mathbf{x}_{ r}^{v}\right ){\left (\mathbf{x}_{ i}^{v} -\mathbf{x}_{ r}^{v}\right )}^{T}\right \rangle = \frac{{\mathbf{X}}^{v}\mathbf{TC}_{ i}\Gamma _{i}{\left (\Gamma _{i} + \mathbf{I}\right )}^{-1}\mathbf{C}_{ i}^{T}{\mathbf{T}}^{T}{\mathbf{X}}^{vT}} {K - 1}$$
(16.22)

As can be seen by comparing (16.21) with (16.8) and as was discussed in Bishop et al. (2001), for an optimal data assimilation scheme, the signal variance is precisely equal to the reduction in forecast error variance due to the observations that created the signals. Comparison of geographical plots of the diagonal elements of (16.21) and (16.22) with actual changes in forecasts due to targeted observations can give a good indication of whether the ETKF signal variance predictions are reasonable or not.
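
A minimal sketch of evaluating the diagonal of (16.22), i.e. the predicted signal variance at each model grid point for a single deployment, might look as follows; Xr and XvT stand for the transformed perturbations \({\mathbf{X}}^{r}\) and \({\mathbf{X}}^{v}\mathbf{T}\), and all inputs are synthetic placeholders.

```python
# A sketch of the verification-time signal variance (16.22) for one deployment.
import numpy as np

rng = np.random.default_rng(2)
n, K = 50, 20
Xr = rng.standard_normal((n, K));   Xr -= Xr.mean(axis=1, keepdims=True)
XvT = rng.standard_normal((n, K));  XvT -= XvT.mean(axis=1, keepdims=True)
Ht = rng.standard_normal((3, n))    # R^{-1/2} H for three adaptive observations

# Eigendecomposition (16.12) for this deployment.
gamma, C = np.linalg.eigh((Ht @ Xr).T @ (Ht @ Xr) / (K - 1))
gamma = np.clip(gamma, 0.0, None)

# Factor X^v T C [Gamma (Gamma + I)^{-1}]^{1/2}: its squared row sums give
# the diagonal of (16.22), the predicted signal variance at each grid point.
F = XvT @ C * np.sqrt(gamma / (gamma + 1.0))
signal_variance = np.sum(F * F, axis=1) / (K - 1)
```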

16.3 Atmospheric Forcing Ensemble Generation

Based on the theory that model forecast errors are often well described in terms of shifting and timing errors (Hoffman et al. 1995), the uncertainty of atmospheric forcing can be represented by adding perturbations to the surface fields of a single deterministic atmospheric forecast through spatial and temporal deformation. The amplitude of the perturbations is chosen to be small enough to ensure that the perturbed field lies within the error bounds of the forecast. To control the amplitude and horizontal correlation length scale of the random perturbations, the covariance matrix of the shift vector \(\delta \mathbf{t}\) of shifts at a given time is specified as

$$\left \langle \delta \mathbf{t}\delta {\mathbf{t}}^{T}\right \rangle = \mathbf{DE}\,\Lambda \,{\mathbf{E}}^{T}\mathbf{D}$$
(16.23)

where D is a diagonal matrix of the standard deviations we wish to assign to the random process at each grid point and \(\mathbf{E}\,\Lambda \,{\mathbf{E}}^{T}\) defines a correlation matrix whose diagonal values are all equal to 1. For simplicity, we chose the columns of E to be the two-dimensional sinusoids and cosinusoids that define a basis for the two-dimensional domain upon which the ocean state is defined. Let a be a random normal vector with zero mean and covariance \(\left \langle {\mathbf{aa}}^{T}\right \rangle \,=\,\Lambda \). Now consider random vectors y obtained using y = Ea. Note that since the columns of E are the sinusoidal basis used in the inverse Fourier transform, the operation Ea is simply an inverse Fourier transform. To ensure that the random perturbations satisfy (16.23), we generate each perturbation using

$$\delta \mathbf{t} = \mathbf{DEa},\quad \mathrm{where}\ \left \langle \mathbf{a}\right \rangle = 0\ \mathrm{and}\ \left \langle {\mathbf{aa}}^{T}\right \rangle = \Lambda $$
(16.24)

In other words, a random perturbation is created by

  1. Creating a vector b of n independent, identically distributed normal random numbers, each with a mean of zero and a variance of 1.

  2. Letting \(\mathbf{a} = {\Lambda }^{1/2}\mathbf{b}\).

  3. Performing the inverse Fourier transform implied by Ea.

  4. Performing the operation \(\delta \mathbf{t} = \mathbf{DEa}\).

To see that this process creates random perturbations that satisfy (16.23), note that

$$\displaystyle\begin{array}{rcl} \left \langle \delta \mathbf{t}\delta {\mathbf{t}}^{T}\right \rangle & =& \left \langle {\mathbf{DEaa}}^{T}{\mathbf{E}}^{T}\mathbf{D}\right \rangle \\ & =& \mathbf{DE}\left \langle {\mathbf{aa}}^{T}\right \rangle {\mathbf{E}}^{T}\mathbf{D}\ \mathrm{because}\ \mathbf{E}\ \mathrm{and}\ \mathbf{D}\ \mbox{ are constant} \\ & =& \mathbf{DE}\,\Lambda \,{\mathbf{E}}^{T}\mathbf{D},\ \mathrm{because}\ \left \langle {\mathbf{aa}}^{T}\right \rangle = \Lambda \end{array}$$
(16.25)

The scales and magnitudes of the random perturbations are thus determined by the user's specification of D and Λ. Here, we chose D = α I, so that the constant α gives the standard deviation at each point, and let the diagonal elements \(\lambda _{ii}\) of Λ be given by a Gaussian function of the total wavenumber to which they pertain,

$$\lambda _{ii}(k,l) = C \times \exp \left (\frac{-({k}^{2} + {l}^{2})} {{L}^{2}} \right )$$
(16.26)

where k and l are non-dimensional wavenumbers (associated with the indexing of grid points in the FFT routine) and L (a non-dimensional length scale) controls the horizontal correlation length scale in spectral space. Decreasing L increases the spatial scale of the random fields produced by (16.24). The scale C is an amplitude factor used to ensure that the diagonal elements of \(\mathbf{E}\,\Lambda \,{\mathbf{E}}^{T}\) are equal to unity, and hence that \(\mathbf{E}\,\Lambda \,{\mathbf{E}}^{T}\) is a valid correlation matrix. The values of C, L, and α used in our experiments are 0.5, 10, and 0.5 h, respectively. With these parameters, (16.24) produces a spatially correlated field of time shifts with a standard deviation of α = 0.5 h.
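
Steps 1–4 above amount to filtering white noise with the spectrum (16.26). A minimal Python sketch, with an assumed grid size and with the normalization role of C played by an explicit rescaling of the sample standard deviation, is:

```python
# A sketch of generating one spatially correlated time-shift field; the FFT
# plays the role of the sinusoidal basis E, and the grid size is assumed.
import numpy as np

rng = np.random.default_rng(3)
ny, nx = 64, 96        # generation grid, larger than the ocean domain so the
                       # shifts seen by the model are aperiodic
L, alpha = 10.0, 0.5   # spectral length scale; standard deviation in hours

# Step 1: white noise b with zero mean and unit variance at each point.
b = rng.standard_normal((ny, nx))

# Steps 2-3: impose the Gaussian spectrum (16.26) in wavenumber space,
# i.e. scale the Fourier coefficients by Lambda^{1/2}, then invert.
k = np.fft.fftfreq(nx) * nx        # non-dimensional wavenumbers
l = np.fft.fftfreq(ny) * ny
lam = np.exp(-(l[:, None] ** 2 + k[None, :] ** 2) / L ** 2)
y = np.real(np.fft.ifft2(np.fft.fft2(b) * np.sqrt(lam)))

# Step 4: normalize to unit variance (the role of C) and apply D = alpha I,
# giving a time-shift field with a standard deviation of 0.5 h.
dt = alpha * y / y.std()
```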

To create a time-shift vector δ t(t) that varies in time as well as space, we used (16.24) to create two entirely independent time-shift vectors δ t(t i ) and δ t(t i + 1) corresponding to the discrete times t i and t i + 1. These two times might be 24 or 72 h apart, depending on the perceived decorrelation time of atmospheric forcing errors (in our study, independent fields were generated every 24 h). To ensure that the time-shift vector varies smoothly between these two times, we set

$$\delta \mathbf{t}(t) =\delta \mathbf{t}(t_{i})\cos \left [ \frac{\pi } {2}\left ( \frac{t - t_{i}} {t_{i+1} - t_{i}}\right )\right ] +\delta \mathbf{t}(t_{i+1})\sin \left [ \frac{\pi } {2}\left ( \frac{t - t_{i}} {t_{i+1} - t_{i}}\right )\right ]$$
(16.27)

Equation 16.27 implies that the evolution of the covariance of time shifts is given by

$$\displaystyle\begin{array}{rcl} \left \langle \delta \mathbf{t}\left (t\right )\delta \mathbf{t}{\left (t\right )}^{T}\right \rangle & =& \mathbf{D}_{ i}\mathbf{E}\Lambda _{i}{\mathbf{E}}^{T}\mathbf{D}{_{ i}\cos }^{2}\left [ \frac{\pi } {2}\left ( \frac{t - t_{i}} {t_{i+1} - t_{i}}\right )\right ] \\ & & +\,\mathbf{D}_{i+1}\mathbf{E}\Lambda _{i+1}{\mathbf{E}}^{T}\mathbf{D}{_{ i+1}\sin }^{2}\left [ \frac{\pi } {2}\left ( \frac{t - t_{i}} {t_{i+1} - t_{i}}\right )\right ]\qquad \qquad \end{array}$$
(16.28)

This formulation allows both the scale and magnitude of the deformations to be functions of time. Note also that in the special case where \(\mathbf{D}_{i+1}\mathbf{E}\Lambda _{i+1}{\mathbf{E}}^{T}\mathbf{D}_{i+1} = \mathbf{D}_{i}\mathbf{E}\Lambda _{i}{\mathbf{E}}^{T}\mathbf{D}_{i}\), the trigonometric identity \({\cos }^{2}\theta {+\sin }^{2}\theta = 1\) ensures that the covariance of the time shifts given by (16.27) and (16.28) is constant even though each individual time shift evolves smoothly through time.
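
A short sketch of the blending rule (16.27), with all times in hours, might read:

```python
# A sketch of the smooth temporal blending (16.27) between two independent
# shift fields generated (here) 24 h apart.
import numpy as np

def blend_shifts(dt_i, dt_ip1, t, t_i, t_ip1):
    """Time-shift field at time t, varying smoothly from dt_i to dt_ip1."""
    phase = 0.5 * np.pi * (t - t_i) / (t_ip1 - t_i)
    return dt_i * np.cos(phase) + dt_ip1 * np.sin(phase)

# Example: halfway between the generation times, the two independent fields
# contribute with equal weights cos(pi/4) = sin(pi/4); cos^2 + sin^2 = 1
# keeps the implied covariance constant when the two fields share statistics.
dt_12h = blend_shifts(np.zeros((64, 96)), np.ones((64, 96)), 12.0, 0.0, 24.0)
```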

For the experiments reported in this Chapter, the eigenvector matrix E comprised the set of sinusoidal basis functions spanning a two-dimensional plane. By making the domain on which the time shifts δ t were generated larger than that of the regional ocean model, it was possible to produce aperiodic time shifts.

The temporally shifted fields include surface wind, air temperature, relative humidity, precipitation, sea-level pressure, and short- and long-wave radiation. Each randomly shifted field is used to compute the surface wind stress and heat fluxes for each ensemble member. The NCOM-predicted SST is interactively fed back into the surface latent and sensible heat fluxes using the drag coefficient from the standard bulk formulas of Kondo (1975) (Martin and Hodur 2003; Hong et al. 2007, 2009b). The surface salt flux for NCOM is calculated from the computed latent heat flux and the COAMPS precipitation.
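
The Chapter does not spell out how a field is evaluated at the locally shifted time; the sketch below assumes hourly forcing snapshots and simple linear interpolation in time at each grid point, which is one plausible realization of the temporal deformation.

```python
# A hedged sketch of applying a time-shift field to one forcing variable,
# assuming hourly snapshots stacked along the first axis.
import numpy as np

def shift_in_time(field, dt, t_index):
    """field: (nt, ny, nx) forcing series; dt: (ny, nx) shifts in hours;
    t_index: the integer hour at which the shifted field is wanted."""
    nt, ny, nx = field.shape
    t = np.clip(t_index + dt, 0.0, nt - 1.0)    # shifted, fractional time
    i0 = np.floor(t).astype(int)                # bracketing snapshots
    i1 = np.minimum(i0 + 1, nt - 1)
    w = t - i0                                  # interpolation weight
    jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    return (1.0 - w) * field[i0, jj, ii] + w * field[i1, jj, ii]
```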

Figure 16.2 shows the u- and v-components of the 10-m surface wind from a single COAMPS deterministic forecast, a time-shifted field, and the difference between the original and shifted fields. The high-resolution COAMPS atmospheric forecast shows strong northwesterly winds, which are favorable for coastal upwelling in the Monterey Bay during the AOSN II field campaign (Doyle et al. 2008). The northwesterly lasts from 7 to 19 August and induces an upwelling period. The perturbed atmospheric forcing fields for a particular ensemble member and forecast lead time are smooth over the entire domain. The northwesterly wind is preserved in the perturbed fields, so that upwelling is induced in each ocean ensemble forecast even with the inclusion of atmospheric forcing uncertainty. The difference between the original and shifted fields shows perturbation maxima at varying locations, reflecting the randomness introduced by the space and time shifting.

Fig. 16.2 Original (first column), shifted (second column), and difference between original and shifted (third column) u-component (upper panels) and v-component (lower panels) of 10-m surface wind from the COAMPS forecast for the AOSN II domain (Monterey Bay)

16.4 Ocean Ensemble Forecast

Ocean ensemble generation is based on the ET technique, which has been used for atmospheric ensemble generation (Bishop et al. 2009) and for coupled atmosphere/ocean ensemble generation (Holt et al. 2011). The ET technique provides initial perturbations that (1) have an initial variance consistent with the best available estimates of initial condition error variance, (2) are dynamically conditioned by a process similar to that used in the breeding technique (Toth and Kalnay 1993, 1997), (3) sum to zero at the initial time, (4) are quasi-orthogonal and equally likely, and (5) partially respect mesoscale balance constraints by ensuring that each initial perturbation is a linear sum of forecast perturbations from the preceding forecast. The analysis error variance is used to constrain the magnitude of the initial perturbations, which are transformations or linear combinations of ensemble forecast perturbations, so-called ET perturbations (Bishop and Toth 1999; Bishop et al. 2009). The analysis error variance used in this study is scaled from the NCODA ocean analysis to correct unrealistically large values arising from the sparse ocean observations. A complete description of the ET technique and the detailed steps for creating an ET ensemble can be found in Bishop et al. (2009).

The ocean ensemble, with 20 members, is initialized from a set of perturbations derived from a control deterministic NCOM run for the month of August 1–31, 2003. The NCOM monthly run is performed in a sequential incremental update cycle with an update interval of 24 h and produces a 72 h forecast at each analysis update time (Hong et al. 2009a). The differences between every 12 h forecast (up to 24 h) and the monthly mean generate 62 perturbations, which provide a database from which the initial ensemble perturbations are randomly selected.

From August 7–19, the winds are upwelling-favorable north/northwesterlies (Doyle et al. 2008) and induce strong upwelling, with two upwelling centers developing off Point Ano Nuevo and Point Sur (Hong et al. 2009a). The ensemble means display stronger upwelling at the two upwelling centers than the control run and provide features more comparable with the observations (Fig. 16.3). Stronger horizontal SST gradients occur between the upwelled cold water and the offshore warm water. The seaward advection in the ensemble mean is more consistent with the observations on August 12 (upper panel in Fig. 16.3). Later in the upwelling period, a cold tongue of upwelled water off Point Ano Nuevo is advected southward across the mouth of the Monterey Bay and joins the upwelled cold water from Point Sur, resulting in a large cold-water region located just off the coast in both the ensemble mean and the observations. These results indicate that the ensemble means are closer to the observed MCSST than the control run.

Fig. 16.3 SST from the NCOM control run, the ensemble mean, and NOAA POES AVHRR HRPT (courtesy NWS and NOAA Coastwatch). The model outputs are 18 h forecasts valid at 18Z August 12, 2003 (upper panel) and 18Z August 15, 2003 (lower panel)

The ensemble spread increases with forecast lead time, as shown for the SST forecast in Fig. 16.4. The region of large ensemble spread moves southward with time, reflecting the movement of the upwelled cold water. This indicates that the forecasted transport of upwelled cold water across the mouth of the Monterey Bay during the upwelling period is highly uncertain.

Fig. 16.4 Ensemble spread for the 24, 48, and 72 h forecasts initialized from August 12, 2003

16.5 Adaptive Sampling for the AOSN II Glider Observation

The underwater vehicle network during the AOSN II field campaign featured a fleet of up to 15 autonomous underwater gliders. Underwater gliders are small, relatively simple and inexpensive, winged, buoyancy-driven submersibles, and they are ideal platforms for collecting scientific data for ocean adaptive sampling. Glider deployment is efficient and effective because the vehicles can change plans on-line in response to the state and environmental measurement needs, on a daily time scale or faster (on the order of every two hours) (Leonard and Robinson 2003). With the ability to update the glider plan frequently, the decision-making time for optimal glider deployment can be shorter than for other types of platform deployment, such as aircraft equipped with GPS dropwindsondes for upstream observation of significant weather events (Majumdar et al. 2002).

Key times involved in the decision-making process for the adaptive sampling application to the AOSN II glider observations are illustrated in Fig. 16.5. The goal of the adaptive sampling is to use an available ensemble forecast to identify the future glider path that would maximally reduce the forecast error variance in the verification region at the verification time. As an example, consider the ensemble forecast initialized at 00 UTC Aug 12th. A new forecast will be initialized at the targeting time of 00 UTC Aug 13th using the targeted observations. The decision time is the time by which one must decide the location to which the glider should be sent in order to minimize the error norm of the forecast to be initialized at 00 UTC Aug 13th. The verification time selected here is 00 UTC Aug 14th, in order to verify the forecast error reduction for the upwelled cold water transport across the mouth of the Monterey Bay.

Fig. 16.5 Illustration of the ensemble initialization time, decision time, targeting time, and verification time for the adaptive sampling used in this study

For a group of adaptive observations, the signal variance, which would equal the reduction in forecast error variance in an optimal system, is used to identify the best deployment location. The verification region is placed where the ensemble variance is large at the verification time. This choice of verification region increases the chances that the targeted observations will result in a significant reduction in forecast error (Bishop et al. 2006). Figure 16.4b illustrates that, for a verification time 48 h after the ensemble initialization at 00 UTC Aug 12th, there is a large ensemble spread across the mouth of the Monterey Bay due to the uncertainty in the southward transport of upwelled cold water from Point Ano Nuevo. The verification region selected to enclose part of this high-spread region is shown by the ellipse in Fig. 16.4b. Possible locations for optimal adaptive deployment can be tested in the two areas where the ensemble spread is significant at the targeting time. As shown in Fig. 16.4a, there are two such locations: one off the mouth of the Monterey Bay (location #1) and another to the south, off the Point Sur coast (location #2).

Nine adjacent “test” observations of surface temperature are placed at each of these two locations, centered at 36.7°N, 122.5°W and 36.2°N, 122.1°W, respectively, and used to calculate the signal variance at the targeting and verification times (Fig. 16.6). Both locations show high signal variance at the targeting time, with larger values at location #1 (Fig. 16.6a) than at location #2 (Fig. 16.6c) owing to the larger ensemble spread there at the targeting time. At the verification time, the signal variance within the verification region is larger for location #1 (Fig. 16.6b) than for location #2 (Fig. 16.6d). This suggests that a deployment at the first location is more likely to improve the forecast than one at the second location.

Fig. 16.6 Signal variance for nine observations centered at 36.7°N, 122.5°W for (a) targeting time 00 UTC 13 Aug 2003 and (b) verification time 00 UTC 14 Aug 2003. Signal variance for nine observations centered at 36.2°N, 122.1°W for (c) the same targeting time as (a) and (d) the same verification time as (b). The black ellipse indicates the verification region

Figure 16.7a depicts the predicted reduction in forecast error variance at the verification time due to a surface temperature observation taken at the targeting time at the location indicated by the white cross. By integrating this field over the verification region, we obtain a prediction of the reduction in forecast error variance due to an observation at the white cross. Figure 16.7b plots this mean reduction in forecast error variance as a function of the location of the test observation. We refer to maps like Fig. 16.7b as “summary maps”.
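
A summary map of this kind can be assembled by looping a single test observation over the model grid and averaging the predicted verification-time signal variance over the verification region. The sketch below does this on synthetic arrays; the observation error standard deviation and the verification mask are illustrative assumptions.

```python
# A schematic sketch of building a summary map from single test observations.
import numpy as np

rng = np.random.default_rng(4)
n, K = 50, 20                       # n grid points, K ensemble members
Xr = rng.standard_normal((n, K));   Xr -= Xr.mean(axis=1, keepdims=True)
XvT = rng.standard_normal((n, K));  XvT -= XvT.mean(axis=1, keepdims=True)
obs_err = 0.2                       # assumed observation error std dev
verif = np.zeros(n, dtype=bool);  verif[30:40] = True  # verification region

summary = np.zeros(n)
for p in range(n):                  # one surface test observation per point
    HtX = Xr[p:p + 1, :] / obs_err                # R^{-1/2} H X^r, 1 x K
    gamma, C = np.linalg.eigh(HtX.T @ HtX / (K - 1))
    gamma = np.clip(gamma, 0.0, None)
    F = XvT @ C * np.sqrt(gamma / (gamma + 1.0))  # factor of (16.22)
    sig_var = np.sum(F * F, axis=1) / (K - 1)
    summary[p] = sig_var[verif].mean()            # average over the region

best_point = int(np.argmax(summary))  # most informative observation location
```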

Fig. 16.7 (a) Signal variance at the verification time for the feasible deployment of adaptive observations indicated by the white crosses; (b) summary map of average signal variance over the verification area at the verification time as a function of the location of a single temperature observation; (c) bar chart of signal variance for the eight glider tracks displayed in (b)

If gliders are available for adaptive sampling, summary bar charts can be used to choose among several feasible glider paths. At a particular location, a glider must be told which direction to head. To demonstrate how a signal variance summary bar chart can be used, assume that from a particular location a glider can follow one of eight possible tracks (red lines in Fig. 16.7b). The predicted reduction in forecast error variance within the verification region at the verification time for each of the eight possible glider paths is plotted as a bar chart (Fig. 16.7c). Each bar gives the ETKF prediction of the reduction of forecast error variance within the verification region associated with a particular glider track. Given knowledge of where a glider is at the beginning of the targeting time, these bar charts can be used to direct the glider along the path predicted to have the maximum impact on the forecast error reduction. The signal variance in the bar chart thus suggests that track seven is the best of these eight glider deployments.

During the AOSN II field campaign, up to 15 different gliders were crisscrossing the Monterey Bay at any given time. For example, thirteen gliders were deployed on Aug 13, 2003 during a 24 h observation time window, each taking the path indicated in Fig. 16.8a. As a test of the targeting technique, it is of great interest to see which of these 13 glider paths would have been the best choice if one were only going to assimilate observations from a single glider. Figure 16.8b gives the ETKF-predicted reduction in forecast error variance in the verification region at the verification time as a function of glider track. It shows that, according to our implementation of the ETKF, path 6 would have been the best, followed by paths 2 and 11.

Fig. 16.8 (a) Glider tracks on Aug 13, 2003; (b) signal variance summary bar chart for the 13 glider tracks shown in (a)

16.6 Summary

The purpose of this Chapter is to illustrate the development, and present preliminary results, of an ETKF ocean adaptive sampling system that incorporates three distinct techniques: (1) a time-shifting technique that enables an ensemble of very high resolution atmospheric forecasts to be generated from a single high resolution deterministic forecast, (2) an ET ensemble generation technique for generating the ocean ensemble, and (3) an ETKF technique for ocean adaptive sampling. The system is applied to the Monterey Bay area during the AOSN II field campaign in the month of August 2003.

The atmospheric forcing from the COAMPS AOSN II forecast is shifted smoothly in time to transform a single deterministic forecast into an ensemble for the ocean ensemble forecast. The shifted atmospheric forcing fields preserve the important aspects of the atmospheric features, so that each ocean ensemble member is forced with an approximation to a realization of the true atmospheric state given previous observations.

The NCOM ensemble mean is found to give a better representation of the upwelling features than the single deterministic run during the upwelling period. Two upwelling centers are found, one near the coast of Point Ano Nuevo and the other near Point Sur. The ensemble mean is also found to be closer to the features in the satellite observations than the control forecast. Furthermore, the ensemble mean is closer to the observed seaward movement of cold water and its transport across the mouth of the Monterey Bay during the earlier and later parts of the upwelling period, respectively. The ensemble spread is found to be largest near the upwelled cold water transport across the mouth of the Monterey Bay.

An ocean adaptive sampling system based on the ETKF technique is illustrated using the data collected during the AOSN II field campaign. For a large number of possible adaptive observations, a signal variance summary map provides an overview of the predicted reduction in forecast error variance within the verification region as a function of the location of a plausible future observation. The predicted reduction in forecast error variance for a large number of possible glider tracks is summarized and displayed in a bar chart for each feasible deployment. The real glider tracks from the AOSN II field campaign are used to derive a signal variance bar chart with 13 possible glider deployments. The ETKF adaptive sampling singles out one path, with a large summary signal variance, near the verification area. The use of this path, in our view, would have been most likely to reduce the forecast error within the verification region.

As discussed in Majumdar et al. (2002), quantitative assessment of the accuracy of ETKF signal variance predictions requires a large number of events. Unfortunately, the limited number of events during AOSN II does not provide enough cases for such a quantitative assessment. Nevertheless, the aforementioned experiment indicates that the adaptive sampling locations selected using the technique presented here are, at the very least, consistent with the group velocity of wave packets of ocean forecast errors, which are unlikely to propagate very far over a 24 h period in the ocean. For future work, we hope to use a large number of cases to quantitatively measure the accuracy of the ETKF prediction of forecast error variance reduction in ocean prediction.