1 Probabilistic Forecasting

Up to this point, we have described deterministic weather models. These models are governed by the initial state, and the errors in this state grow as the model advances in time, since the models are unstable systems characterized by nonperiodicity. The accuracy of the forecast therefore depends on the initial state, which is uncertain.

This relationship between the initial state and the deterministic prediction was discovered by Edward Lorenz and is discussed in his book “The Essence of Chaos” [24]. In 1962 [22], he simulated the evolution of the atmospheric state using the geostrophic form of the two-layer baroclinic model proposed in [21], consisting of 12 ordinary differential equations in 12 variables. It used a linear regression from the output of a model. When he ran the simulations, he found that some solutions were drastically different. Analyzing the results, he realized that, in some experiments, he had truncated the model output to three digits of accuracy while the original values had a precision of six digits. Just this small change led to significant differences in the forecast results. These differences imply that a precision of three decimal places in the observations is not enough to guarantee a reliable forecast.

This result prompted the scientific community to look for a procedure to determine the best forecast of the atmospheric state according to the available data. Nowadays, several meteorological agencies worldwide run their own numerical weather prediction (NWP) models, each one different from the others. The results from these models are consistent with the observed data, but they differ from one another, so we cannot say which model is the “correct one”. Instead, we can think of each forecast as a member of an ensemble of atmospheric states that are consistent with the observations.

With this idea, Epstein realized that the atmosphere is deterministic, since it obeys the fundamental laws of hydrodynamics, but that its state can only be known in a probabilistic way. Therefore, in [11], he proposed a “stochastic dynamic” (SD) approach consisting of applying the continuity equation for probability [14] to the observation data. He compared the results of the SD model with the results of a deterministic model that used as the initial condition the ensemble mean from the Monte Carlo method.

The problem with SD is that it is expensive; the number of equations for SD prediction is equal to the number of spectral components raised to the power of the number of moments. Philip Thompson [34] proposed a more efficient model by using variances directly instead of covariances; this way the number of equations was reduced.

With the advent of parallel machines, researchers developed different approaches to deal with the uncertainty of the initial state. Murphy [26] ran an experiment using the hemispheric version of the Meteorological Office (UKMO) five-level general circulation model. Initial conditions were obtained by perturbing a given state. Seven individual perturbations were used, and the ensemble forecast consisted of their integration.

To obtain the perturbed initial states, Murphy considered two different methods: random perturbation and lagged-average forecasting. The random perturbation method generates the seven initial states by adding independent perturbations to the known initial state. These perturbations are consistent with the expected analysis errors. This random perturbation method is similar to the Monte Carlo method. The lagged-average forecast method uses past observations to generate each member of the ensemble.

The Monte Carlo (or random perturbation) approach has some limitations; for example, the perturbed parameters can lead to imbalances in the atmospheric state. Another issue is that the perturbations in the Monte Carlo approach are random, while the perturbations should have certain preferred directions. With these ideas, different strategies for perturbing dynamical prediction models were studied. The two most used methods nowadays are the Singular Vector decomposition (SV) [1, 10, 28], used by the European Centre for Medium-Range Weather Forecasts (ECMWF) [19], and the Breeding Vector technique (BV), used by the National Centers for Environmental Prediction (NCEP) [36]. A comparison between the two methods using the ECMWF Integrated Forecast System is described in [25].

These advances, along with more powerful parallel machines, and improvements in deterministic forecasting [33], led to the birth of Ensemble Prediction Systems (EPSs). EPSs are operational systems that provide probabilistic forecasts based on ensemble members. The method to create these ensemble members is different between systems. The Meteorological Service of Canada (MSC) uses a Monte Carlo approach, and, as said previously, ECMWF uses SV, and NCEP uses BV [4].

More recently, a new approach to ensemble forecasting has been developed: the multimodel ensemble forecast. This approach uses forecasts from different models as ensemble members. The ensemble may be composed of deterministic forecasts or of forecasts from ensemble prediction systems (called a superensemble). The idea is to combine the strengths and weaknesses of each model and obtain a more reliable prediction [9, 16].

The THORPEX Interactive Grand Global Ensemble (TIGGE) [3] is a multimodel ensemble system that combines the predictions of the following models: ECMWF, UK Met Office (UKMO), National Centre for Medium Range Weather Forecasting, India (NCMRWF), CMA, Japan Meteorological Agency (JMA), National Centers for Environmental Prediction (NCEP-USA), Meteorological Service of Canada (CMC), Bureau of Meteorology Australia (BOM), Centro de Previsao Tempo e Estudos Climaticos Brazil (CPTEC), Korea Meteorological Administration (KMA), and MeteoFrance (MF) global models. Apart from this global initiative, there is the North American Ensemble Forecasting System (NAEFS) [7], which combines the systems from the Canadian Meteorological Centre (CMC) and the National Centers for Environmental Prediction (NCEP), and a European initiative: the Development of a European Multimodel Ensemble System for Seasonal to Interannual Prediction project (DEMETER) [27].

If the reader is interested in these developments, John M. Lewis [20] wrote a more thorough review of the history of ensemble models.

1.1 Initial State Perturbation Methods

In this subsection, we describe the two most used methods to perturb the initial state. A simple way of converting a deterministic forecast into a probabilistic forecast would be to modify the deterministic result using a probability distribution constructed from previous forecast errors. This strategy would not work because the underlying dynamical equations are nonlinear, so the errors in the initial state do not relate directly to the errors in the predicted result. We therefore need to perturb the initial state. The Monte Carlo approach creates random perturbations of the initial state according to its known error characteristics. However, this leads to underdispersive forecast ensembles [5], because many sources of uncertainty are not explicitly represented in a Monte Carlo forecast.

For this reason, new techniques were required to represent the nonlinearity of the dynamical equations in the ensemble predictions. Two of the most common techniques are discussed in this section: the Singular Vector decomposition and the Breeding Vector technique.

1.1.1 Singular Vector Decomposition

The main idea behind Singular Vector decomposition is the singular value decomposition of the forward tangent linear operator. This can be physically interpreted as the fastest growing perturbations. Therefore, SVs give information about the direction and dynamics of rapidly growing instabilities and perturbations.

The method was devised by Lacarra and Talagrand in [18] where they were interested in identifying the perturbations that lead to the maximum difference between the simulated state and a reference one. They defined \(\mathbf {x}(0)\) as the vector containing the initial state information. The model is defined as \({\mathbf {M}}:\mathbb {R}^n\rightarrow \mathbb {R}^n\). Therefore the state at time t is defined as

$$\begin{aligned} \mathbf {x}(t) = {\mathbf {M}}(\mathbf {x}(0)) \end{aligned}$$
(1)

Since they were interested in the perturbations that depart most from a reference state, they needed to know how the state evolves. For this reason, they wrote the time evolution of the state as

$$\begin{aligned} {\mathbf {F}}(\mathbf {x}) = \frac{d\mathbf {x}}{dt} \end{aligned}$$
(2)

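If the perturbed initial state is defined as \((\mathbf {x}(0) + \mathbf {\chi }(0))\), then the time evolution of the perturbed state can be written as a Taylor expansion of \({\mathbf {F}}\) around \(\mathbf {x}\)

$$\begin{aligned} \frac{d(\mathbf {x} + \mathbf {\chi })}{dt} = {\mathbf {F}}(\mathbf {x} + \mathbf {\chi }) = {\mathbf {F}}(\mathbf {x}) + \frac{\partial {\mathbf {F}}}{\partial \mathbf {x}}\mathbf {\chi } + O\left( ||\mathbf {\chi }||^2\right) \end{aligned}$$
(3)

For small perturbations, the second-order term can be neglected, and, subtracting Eq. (2), the derivative of \(\mathbf {\chi }\) is

$$\begin{aligned} \frac{d\mathbf {\chi }}{dt} = \frac{\partial {\mathbf {F}}}{\partial \mathbf {x}}\bigg |_{\mathbf {x}(t)}\mathbf {\chi } \end{aligned}$$
(4)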

This linear system of equations is called the tangent linear system of \({\mathbf {M}}\) in the vicinity of the particular solution \(\mathbf {x}(t)\). It describes the temporal evolution of the perturbation \(\mathbf {\chi }(t)\), to first order with respect to the initial perturbation \(\mathbf {\chi }(0)\). The solution of Eq. (4) can be written as

$$\begin{aligned} \chi (t) = {\mathbf {L}}(0,t)\mathbf {\chi }(0) \end{aligned}$$
(5)

where the operator \({\mathbf {L}}(0,t)\) is the forward tangent linear operator or the linear propagator. So the perturbations that will maximize the difference can be found using the singular value decomposition of \({\mathbf {L}}(0,t)\).

$$\begin{aligned} {\mathbf {L}} = {\mathbf {W}}\varLambda {\mathbf {Y}}^* \end{aligned}$$
(6)

where \(\varLambda \) is a diagonal matrix with the singular values of \({\mathbf {L}}\) (\(\lambda _1\text {, }\lambda _2\text {, } \dots \)). \({\mathbf {Y}}^*\) is the conjugate transpose of \({\mathbf {Y}}\). The columns of \({\mathbf {Y}}\) correspond to the initial (or right) singular vectors. The columns of \({\mathbf {W}}\) are the evolved (or left) singular vectors.

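The initial singular vectors of \({\mathbf {L}}\) are the eigenvectors of \({\mathbf {L}}^*{\mathbf {L}}\), and the evolved singular vectors are the eigenvectors of \({\mathbf {L}}{\mathbf {L}}^*\). Specifically, the columns \(\mathbf {y}_i\) of \({\mathbf {Y}}\) and \(\mathbf {w}_i\) of \({\mathbf {W}}\) satisfy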

$$\begin{aligned} {\mathbf {L}}^*{\mathbf {L}}\mathbf {y}_i = \lambda _i^2\mathbf {y}_i \end{aligned}$$
(7)
$$\begin{aligned} {\mathbf {L}}{\mathbf {L}}^*\mathbf {w}_i = \lambda _i^2\mathbf {w}_i \end{aligned}$$
(8)
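
Equations (6) to (8) can be checked numerically on a small example. The following is a minimal sketch, assuming a random 4×4 real matrix as a stand-in for the propagator \({\mathbf {L}}\) (in an operational model the propagator is never formed explicitly); the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 real propagator L(0, t), used only for illustration.
L = rng.standard_normal((4, 4))

# numpy returns L = W @ diag(lam) @ Yh, where Yh is the conjugate transpose of Y
# and the singular values lam are sorted in decreasing order.
W, lam, Yh = np.linalg.svd(L)
Y = Yh.conj().T

# Eq. (6): the factorization reproduces L.
assert np.allclose(W @ np.diag(lam) @ Yh, L)

# Eq. (7): the initial singular vectors are eigenvectors of L* L.
assert np.allclose(L.conj().T @ L @ Y, Y * lam**2)

# Eq. (8): the evolved singular vectors are eigenvectors of L L*.
assert np.allclose(L @ L.conj().T @ W, W * lam**2)

# Amplification of the leading initial singular vector in the Euclidean norm.
growth = np.linalg.norm(L @ Y[:, 0]) / np.linalg.norm(Y[:, 0])
```

In this sketch, `growth` coincides with the largest singular value, which is why the leading initial singular vector is interpreted as the fastest growing perturbation.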

To find the perturbations with the maximum amplitude growth, we need a measure of their amplitude. To this end, we can use any norm \(||\cdot ||_E\)

$$\begin{aligned} ||\chi ||_E^2 = \langle \chi , {\mathbf {E}}\chi \rangle \end{aligned}$$
(9)

where \({\mathbf {E}}\) is a matrix operator that defines the inner product.

For a linear operator \({\mathbf {L}}\), there exists an adjoint \({\mathbf {L}}^*\) such that \(\langle \chi , {\mathbf {L}}y\rangle = \langle {\mathbf {L}}^*\chi , y\rangle \). It is possible to choose different norms at the initial and the final time

$$\begin{aligned} ||\chi (t_0)||_{E_0}^2 = \langle \chi (t_0), E_0\chi (t_0)\rangle \end{aligned}$$
(10)
$$\begin{aligned} ||\chi (t)||_{E_t}^2 = \langle \chi (t), E_t\chi (t)\rangle \end{aligned}$$
(11)

The objective is to maximize the growth rate, or amplification factor, defined as

$$\begin{aligned} \begin{aligned} \lambda ^2&= \frac{||\chi (t)||_{E_t}^2}{||\chi (t_0)||_{E_0}^2} = \frac{\langle \chi (t), E_t\chi (t)\rangle }{\langle \chi (t_0), E_0\chi (t_0)\rangle } \\&= \frac{\langle {\mathbf {L}}\chi (t_0), E_t{\mathbf {L}}\chi (t_0)\rangle }{\langle \chi (t_0), E_0\chi (t_0)\rangle } = \frac{\langle {\mathbf {L}}^*E_t{\mathbf {L}}\chi (t_0), \chi (t_0)\rangle }{\langle \chi (t_0), E_0\chi (t_0)\rangle } \end{aligned} \end{aligned}$$
(12)

To maximize \(\lambda ^2\), we solve the following eigenvalue problem:

$$\begin{aligned} \left( {\mathbf {L}}^*E_t{\mathbf {L}} \right) y_i(t_0) = \lambda _i^2E_0y_i(t_0) \end{aligned}$$
(13)

We can rewrite this equation using the variable transformation \(y_i(t_0) = E_0^{-\frac{1}{2}}\gamma _i(t_0)\):

$$\begin{aligned} \left( E_0^{-\frac{1}{2}}{\mathbf {L}}^*E_t{\mathbf {L}}E_0^{-\frac{1}{2}}\right) \gamma _i(t_0) = \lambda _i^2\gamma _i(t_0) \end{aligned}$$
(14)

This equation has the same form as Eq. (7); comparing them we can conclude that the eigenvectors of \(E_0^{-\frac{1}{2}}{\mathbf {L}}^*E_t{\mathbf {L}}E_0^{-\frac{1}{2}} = \left( E_0^{-\frac{1}{2}}{\mathbf {L}}^*E_t^{\frac{1}{2}}\right) \left( E_t^{\frac{1}{2}}{\mathbf {L}}E_0^{-\frac{1}{2}}\right) = {\mathbf {L}}_s^*{\mathbf {L}}_s\) are the initial singular vectors of \({\mathbf {L}}_s\); and they represent the perturbations with a maximum amplification factor in the time interval \((t_0, t)\).
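
The change of variables in Eqs. (13) and (14) can be illustrated with a short numerical sketch. It assumes a random 5×5 real propagator and diagonal, positive norm matrices \(E_0\) and \(E_t\) with made-up entries, so that the matrix square roots are trivial; none of these choices comes from an operational system.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
L = rng.standard_normal((n, n))           # hypothetical propagator L(t0, t)

# Assumed diagonal, positive-definite norm matrices (illustrative values only).
E0 = np.diag(rng.uniform(0.5, 2.0, n))    # norm at the initial time
Et = np.diag(rng.uniform(0.5, 2.0, n))    # norm at the final time

E0_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(E0)))
Et_sqrt = np.diag(np.sqrt(np.diag(Et)))

# Scaled propagator L_s = E_t^{1/2} L E_0^{-1/2}.
Ls = Et_sqrt @ L @ E0_inv_sqrt
_, lam, Gh = np.linalg.svd(Ls)
gamma = Gh.T                               # columns: initial singular vectors of L_s
y = E0_inv_sqrt @ gamma                    # back-transform y_i = E_0^{-1/2} gamma_i

# Eq. (13): (L* E_t L) y_i = lam_i^2 E_0 y_i, checked for all i at once.
assert np.allclose(L.T @ Et @ L @ y, (E0 @ y) * lam**2)

def amplification(chi0):
    """Amplification factor of Eq. (12) for an initial perturbation chi0."""
    chit = L @ chi0
    return (chit @ Et @ chit) / (chi0 @ E0 @ chi0)
```

Evaluating `amplification(y[:, 0])` returns the square of the largest singular value of \({\mathbf {L}}_s\), and no other initial perturbation exceeds it.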

When used in real numerical weather prediction models, the calculation of the singular vector is difficult because the definition of the model \({\mathbf {M}}\) has to be computed analytically. In operational ensemble prediction systems, this calculation is made using tangent linear and adjoint models and an iterative Lanczos algorithm [6, 12]. A review of the method with applications to El Niño as well as decadal forecasting is presented in [29]. Also, Diaconescu and Laprise [8] review the applications such as forecast error estimation, ensemble forecasting, target adaptive observations, predictability studies and growth arising from instabilities.

1.1.2 Breeding Vector

This method is the least computationally expensive [38]. There are two different versions of this method: the simple breeding [35] and the masked breeding [36].

The main idea of the method is that the choice of the initial perturbation has to cover all the space of possible analysis errors. In an operational NWP, the perturbation of the initial state is reduced by the use of observations. Therefore, the most important errors are those associated with the evolution of the model. The breeding method modifies the perturbation using the difference between the perturbed and the unperturbed forecast. Using this technique, all random perturbations develop into the structure of the leading local (time-dependent) Lyapunov vectors (LLVs; see [37]) of the atmosphere after a transient period.

Toth and Kalnay [36] describe the main steps of the breeding method as follows:

  1. Add a small, arbitrary perturbation to the atmospheric analysis (initial state) at a given day \(t_0\).

  2. Integrate the model from both the perturbed and unperturbed initial conditions for a short period \(t_1\).

  3. Subtract one forecast from the other.

  4. Scale down the difference field so that it has the same norm as the initial perturbation.

  5. Add this difference into the analysis corresponding to the following period \(t_1\).
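
The steps above can be sketched on a toy problem. The example below is only an illustration under strong assumptions: the three-variable Lorenz (1963) system replaces the atmospheric model, the unperturbed run itself plays the role of the next analysis (no data assimilation), and the perturbation size, cycle length, and number of cycles are arbitrary choices.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) system, used here as a toy model."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def integrate(x, n_steps, dt=0.01):
    """Advance the state n_steps with a fourth-order Runge-Kutta scheme."""
    for _ in range(n_steps):
        k1 = lorenz63(x)
        k2 = lorenz63(x + 0.5 * dt * k1)
        k3 = lorenz63(x + 0.5 * dt * k2)
        k4 = lorenz63(x + dt * k3)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

def breed(analysis, perturbation, n_cycles, steps_per_cycle=8):
    """Run breeding cycles; `perturbation` is the arbitrary start of step 1."""
    size = np.linalg.norm(perturbation)
    for _ in range(n_cycles):
        control = integrate(analysis, steps_per_cycle)                   # step 2
        perturbed = integrate(analysis + perturbation, steps_per_cycle)  # step 2
        difference = perturbed - control                                 # step 3
        perturbation = difference * size / np.linalg.norm(difference)    # step 4
        analysis = control   # step 5: the control run stands in for the new analysis
    return perturbation

rng = np.random.default_rng(0)
x0 = integrate(np.array([1.0, 1.0, 1.0]), 1000)   # spin-up onto the attractor
start_a = 1e-3 * rng.standard_normal(3)
start_b = 1e-3 * rng.standard_normal(3)
bred_a = breed(x0, start_a, n_cycles=300)
bred_b = breed(x0, start_b, n_cycles=300)

# Alignment of the two bred vectors, ignoring their sign.
cosine = abs(bred_a @ bred_b) / (np.linalg.norm(bred_a) * np.linalg.norm(bred_b))
```

In this toy setting, `cosine` is expected to approach 1, meaning that two independent random perturbations end up pointing in the same direction up to sign.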

By construction, this method “breeds” the nonlinear perturbations that grow fastest. Therefore, independent perturbations will converge to the same perturbation after enough time steps. This perturbation is related to the LLVs, which have been used to characterize the behavior of dynamical systems. The Lyapunov exponents (\(\lambda _i\)) are defined as

$$\begin{aligned} \lambda _i = \lim _{t\rightarrow \infty }\frac{1}{t}\log _2\left( \frac{p_i(t)}{p_i(0)} \right) \end{aligned}$$
(15)

where p is a linear perturbation spanning the phase space of the system with orthogonal vectors.
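
As a simple worked illustration of Eq. (15), consider a toy linear system \(dp/dt = Ap\) with a diagonal matrix \(A\), which is an assumption made here only because its exponents are known in closed form. Each component grows as \(p_i(t) = p_i(0)e^{a_it}\), so the exponent in base 2 is \(a_i/\ln 2\), that is, the number of doublings per unit time.

```python
import numpy as np

# Assumed toy system dp/dt = A p with A = diag(a); it is not an atmospheric model.
a = np.array([0.5, 0.0, -1.0])   # growth rates per unit time

def finite_time_exponents(t):
    """Evaluate (1/t) * log2(p_i(t) / p_i(0)) for the diagonal toy system."""
    p0 = np.ones_like(a)
    pt = p0 * np.exp(a * t)      # exact solution of each decoupled component
    return np.log2(pt / p0) / t

exponents = finite_time_exponents(50.0)
```

For this system the finite-time value equals the limit for any \(t\); a positive exponent marks a growing direction, zero a neutral one, and a negative exponent a decaying one.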

Each Lyapunov exponent can be associated with a perturbation vector. The vector associated with the largest exponent has the property that any random perturbation introduced an infinitely long time earlier develops into it. Lorenz [23] described this property; he noted that initially random perturbations had a strong similarity after 8 days of integration. The breeding method converges to these LLVs after 3 or 4 days of integration.

The masked breeding is the same as the simple breeding described before, but taking into account the geographically dependent uncertainty.

1.2 Multimodel Ensemble Methods

The rationale behind multimodel ensemble methods is that collective information is better than single information, especially for complex processes. In the concrete case of short- and medium-range weather forecasting, Sanders and others demonstrated that combining different forecasts could be beneficial [2, 15, 32]. Fritsch et al. [13] suggested that the superiority of the combined forecast relied on the variations in model physics and numerics between models, which play a substantial role in generating the full spectrum of possible solutions.

However, we should note that varying model physics and numerics is not enough; another source of uncertainty is the initial state of the atmosphere. This kind of uncertainty is handled by Ensemble Prediction Systems using a technique to perturb the initial state (such as the ones described in Sect. 1.1). So, a good idea could be to combine both approaches. Palmer et al. [27] developed a European multimodel ensemble system known as DEMETER.

When developing a multimodel ensemble system, there are several choices to be made. For example, we can consider all the individual forecasts equal, so we just combine them with the same weight. However, more complex methods of optimally combining the single-model output have been described [17, 30, 31]. Another aspect is how the initial state is perturbed: is it better to use the same perturbation in all models, or should we use the default perturbation technique of each model?
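
The difference between equal and unequal weighting can be sketched as follows. The inverse mean-square-error weighting used here is only an assumed, simple example of unequal weights and is not claimed to be one of the methods of [17, 30, 31]; the forecast and observation values are made up.

```python
import numpy as np

def combine(forecasts, weights=None):
    """Weighted average of model forecasts with shape (n_models, n_points)."""
    forecasts = np.asarray(forecasts, dtype=float)
    if weights is None:
        weights = np.ones(forecasts.shape[0])   # equal weights for all models
    weights = np.asarray(weights, dtype=float)
    return weights @ forecasts / weights.sum()

def inverse_mse_weights(past_forecasts, past_observations):
    """Assumed rule: weight each model by 1 / MSE computed over past cases."""
    errors = np.asarray(past_forecasts, dtype=float) - np.asarray(past_observations, dtype=float)
    return 1.0 / np.mean(errors**2, axis=1)

# Made-up wind speeds (m/s) from three hypothetical models at four points.
forecasts = np.array([[5.0, 6.0, 7.0, 8.0],
                      [4.0, 6.5, 7.5, 9.0],
                      [6.0, 5.5, 6.0, 7.5]])
# Made-up verifying observations, used here to score the same forecasts.
observations = np.array([5.2, 6.1, 6.9, 8.1])

equal = combine(forecasts)
weights = inverse_mse_weights(forecasts, observations)
weighted = combine(forecasts, weights)
```

In practice the weights would be trained on past cases that are independent of the forecast being combined; the same data are reused here only to keep the sketch short.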

In the concrete case of DEMETER, for each model except that of the Max Planck Institute (MPI), uncertainties in the initial state are represented through an ensemble of nine different ocean initial conditions. These are obtained from three different ocean analyses: a control ocean analysis forced with momentum, heat, and mass flux data from the ECMWF 40-yr Reanalysis (ERA-40), and two perturbed ocean analyses created by adding daily wind stress perturbations to the ERA-40 momentum fluxes. The wind stress perturbations are randomly taken from a set of monthly differences between two quasi-independent analyses. Also, to represent the uncertainty in SSTs, four SST perturbations are added and subtracted at the start of the hindcasts. As in the case of the wind perturbations, the SST perturbations are based on differences between two quasi-independent SST analyses. Atmospheric and land surface initial conditions are taken directly from ERA-40.

Palmer et al. [27] conclude that the multimodel ensemble is a viable, pragmatic approach to the problem of representing model uncertainty in seasonal-to-interannual prediction, and that it leads to a more reliable forecasting system than one based on any single model.

A study of the superiority of multimodel ensemble systems has been done by Hagedorn et al. in [9, 16].

2 Ensemble Model for Diagnostic Wind Field

Given the importance of introducing the uncertainties in the prediction of the wind field, in this chapter, we describe a simple ensemble method designed for Wind3D, the diagnostic wind model presented in Chap. 4. In the same spirit as Wind3D, the ensemble approach described in this section is a fast procedure designed for the microscale.

Schematically, in any NWP, the main sources of uncertainty are the observations, the model parameters, the data assimilation procedures, and the boundary conditions.

In the wind model described in Chap. 4, we have identified the parameters with the most uncertainty, namely: the Gauss moduli parameter (\(\alpha \)), the roughness length (\(z_0\)), and the displacement height (d). If we classify these uncertain parameters into the four categories defined above, \(\alpha \) belongs to the model parameters, while \(z_0\) and d belong to the boundary conditions. An evolutionary algorithm has been presented to characterize these parameters. However, it has been noted that even the “best estimation” has some uncertainty; in Sect. 4.2 several evolutionary algorithms have been run, leading to different parameter estimations.

Another source of uncertainty in Wind3D comes from the observations. Recall that these observations can originate from measurement stations or from the forecast of a deterministic NWP. In the case of the measurement data, the errors are related to the instrument and the daily conditions, whereas in the case of the deterministic NWP forecast, we are using the “best forecast” provided by the NWP, but we have already seen that this forecast may be inaccurate. Moreover, due to the differences in horizontal resolution between the local scale diagnostic wind model and the NWP, the terrain height at the grid points can be inconsistent between models. In this case, we do not know whether these points are reliable for Wind3D. So, we may ask ourselves “Which are the reliable NWP forecast points?”

Since the method described is an ensemble forecast system, the wind model is used in conjunction with an NWP to provide predictive capability. In this case, to be able to estimate the parameters, we need two different sets of data: the set used to run the wind model and the set of observations the results are compared against. Immediately another question arises: “How do we generate these sets?”

The ensemble model described here tries to answer the two questions raised above. The model chooses the valid NWP points based on the terrain height difference: when the difference between the NWP height and the height in the diagnostic model is lower than a threshold, the point is valid. Once we have chosen the valid points, we construct the two subsets (model observations and validation data) using a random selection. Once the two subsets are created, we estimate the best values for \(\alpha \), \(\varepsilon \), \(z_0\), and d using the memetic algorithm discussed in Sect. 4.2. Figure 1 shows the diagram of the method.

Fig. 1 Diagram of the ensemble system

This method can also be used with forecasts from several NWP models, emulating a multimodel ensemble. For example, we can have some ensemble members from the ECMWF model, other members from NCEP, and the rest from AROME–HARMONIE.

3 Numerical Experiment

In this section, we present an application of the presented methodology. The application is in Gran Canaria island. The ensemble forecast is generated from AROME–HARMONIE forecast with a horizontal resolution of 2.5 km. The ensemble model is validated against measured data from the AEMET network stations. The day of the simulation is February 20, 2010.

Fig. 2 Terrain discretization

The mesh for this application is generated with the Meccano method (Chap. 3) from a digital terrain model of Gran Canaria island. The height of the domain is 10,000 m, and the resulting mesh has 251,808 nodes and 1,090,366 tetrahedra (Fig. 2).

Figure 3 shows the terrain height in the Meccano mesh and in the AROME–HARMONIE grid. We can observe the differences between the heights considered by Wind3D and AROME–HARMONIE. The maximum height is around 1,000 m in the AROME–HARMONIE discretization and 2,000 m in the Wind3D discretization. This large height difference indicates that, at some points, the AROME–HARMONIE 10 m velocity may not be appropriate. For this reason, instead of using all the 10 m data, we have selected a subset of points according to a height difference criterion.

Once a set of points has been chosen, we randomly divide them into two different subsets. One subset is used as observations in Wind3D, and the other subset is used by the evolutionary algorithm to compute the fitting function. The fitting function is the Root Mean Square Error (RMSE) between the values forecast by Wind3D and the data in the second subset. In this case, we have selected the points whose height difference is less than 50 m. These selected points are shown in Fig. 4 (left). The two randomly generated subsets can be seen in Fig. 4 (right); green points are used as observations for Wind3D, and red points are used to compute the RMSE.
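
The selection and splitting procedure can be sketched as follows. This is a minimal illustration and not the Wind3D implementation: the function names, the 50/50 split fraction, and the terrain heights are assumptions made for the example; only the 50 m threshold and the use of the RMSE come from the text.

```python
import numpy as np

def select_valid_points(nwp_height, mesh_height, threshold=50.0):
    """Indices of NWP points whose height differs from the mesh height by less than threshold (m)."""
    return np.flatnonzero(np.abs(nwp_height - mesh_height) < threshold)

def random_split(indices, rng, fraction=0.5):
    """Randomly split the valid points into model observations and validation data."""
    shuffled = rng.permutation(indices)
    n_obs = int(round(fraction * len(shuffled)))   # assumed split fraction
    return shuffled[:n_obs], shuffled[n_obs:]

def rmse(predicted, reference):
    """Root mean square error, used as the fitting function."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))

rng = np.random.default_rng(42)
# Made-up terrain heights (m) at six NWP grid points and at the same locations in the mesh.
nwp_height = np.array([10.0, 250.0, 600.0, 900.0, 40.0, 300.0])
mesh_height = np.array([30.0, 400.0, 620.0, 1500.0, 45.0, 340.0])
valid = select_valid_points(nwp_height, mesh_height)

# Each ensemble member corresponds to a different random split of the valid points.
members = [random_split(valid, rng) for _ in range(5)]
```

For each member, the first subset would feed the wind model as observations, and `rmse` would be evaluated on the second subset inside the parameter estimation loop.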

Fig. 3 Terrain heights (m)

Fig. 4 AROME–HARMONIE points used in simulation

Now we have generated all the members of the ensemble. Then, we estimate the best values of \(\alpha \), \(\varepsilon \), \(z_0\) and d, and with these best values, we compute the forecast wind using the Wind3D model.

Finally, to validate the method, we compare the ensemble forecast results with the observed data measured in the AEMET network of automatic stations. Each station provides two values: the average and the maximum wind velocity of the last 10 min. Their UTM coordinates are summarized in Table 1, and their position on a map is shown in Fig. 5.

Table 1 Location of measurement stations (UTM coordinates)

Fig. 5 Location of the AEMET measurement stations

Figure 6 shows the comparison of measured data and the ensemble box plot forecast. We show the most representative comparisons from four stations. The first thing that we can notice is that, in general, the mean value of the ensemble forecast is reasonably similar to the measured wind velocity. In some cases, the forecasted velocity is close to the maximum (C625O, C639Y), in some others, it is close to the average velocity (C619X), and sometimes it is in between (C635B).

Fig. 6 Comparison of the average and maximum measured data and the ensemble box plot forecast

Another observation is that the variation of the mean value of the ensemble forecast is smoother than the measured velocity. In contrast, the measured data exhibits abrupt changes among time steps. These abrupt changes are not captured by any member of the ensemble.

A more detailed inspection of the comparisons reveals some interesting features. For example, the ensemble forecast in station C619X has many outliers in all time steps. C639Y also has some of them, but they are close to the mean values. However, C635B and C625O do not have outliers in all the time steps. These outliers can sometimes provide interesting information; for example, in station C619X from 0 to 7 h they capture the total variation between the average data and the maximum.

Station C625O deserves a special mention. Looking carefully, we can observe that, between 11 h and midnight, the difference between the maximum and the average measured data increases. This increase is captured in the ensemble forecast by the higher dispersion of the box plot. This agreement between ranges shows that the resulting ensemble probability can be useful in predicting the uncertainty of the wind velocity.

4 Conclusions

In this chapter, we have seen the necessity of a probabilistic approach to numerical weather prediction. It is introduced with a brief review of the progress made in this area: the discovery of the need for a probabilistic approach and the development of these techniques. Then we go into more detail with the description of two of the most used methods to perturb the initial state: Singular Vector decomposition and Breeding Vectors. To finish the introduction, we describe the basis of a multimodel ensemble method.

Next, we describe an ensemble forecast method specially designed for the microscale. This method is based on the estimation of the uncertain parameters using an evolutionary algorithm. The uncertain parameters are both model parameters, i.e., \(\alpha \) and \(\varepsilon \), and physical parameters, namely the roughness length (\(z_0\)) and the displacement height (d). The evolutionary algorithm minimizes the error between the wind field predicted by a microscale wind model and the forecast of an NWP. The NWP forecast provides both the input data of the model and the control data used to compute the fitting function of the evolutionary algorithm. The selection of these two subsets is random and generates the different members of the ensemble system.

Finally, to illustrate the methodology and validate the model, we present a numerical experiment. In this experiment, we use the microscale model Wind3D described in Chap. 4 coupled with the AROME–HARMONIE model described in Chap. 5. The experiment is located on Gran Canaria island on February 20, 2010. The results have shown that, at any predicted time and station, the forecast ensemble probability lies between the average and the maximum velocity, usually closer to the maximum. Also, the range of the forecast increases when the difference between the maximum and average velocity rises, providing a tool to predict variability in the wind field.