1 Introduction

Scientific models of environmental systems are based on accepted physical, chemical and biological principles of energy and mass transfer. The goal of a science-based environmental system model is to approximate selected states of the system, and science-based computational models are often used to forecast system states. Weather prediction, climate studies, and analyses of groundwater flow and transport provide examples too numerous to list. The accuracy of model forecasts must be assessed when management and policy decisions are based on them. In general, forecasts are assessed by comparing predicted states to observations taken over given periods, locations, or both. A wide range of evaluation measures has been used for assessment, including correlations between observed states and forecasts (Epstein and Murphy 1989; Murphy 1989), anomaly correlation coefficients (Wilks 2005), the ranked probability score (Epstein 1969; Murphy 1971), the area under the receiver operating characteristic curve (Swets 1973), the Peirce skill score (Peirce 1884), potential predictability (Boer 2004), various information criteria (Neuman 2003; Ye et al. 2008), the odds ratio skill score (Thornes and Stephenson 2001), and mean-square error (Wilks 2005), which is the basis of the square-error skill scores used in weather forecasting (Murphy 1989, 1996) and is also the basis for the analysis in this paper.

Assessing model forecasts is complicated by the existence of alternative models that produce different estimates of system states. The question of how to accommodate differences among models naturally arises, and one response has been to average model forecasts, the idea being that an average can achieve a consensus among individuals that emphasizes their points of agreement. In both climate (Palmer et al. 2004; Latif et al. 2006) and groundwater applications (Ye et al. 2004, 2005, 2008; Beven 2006; Poeter and Anderson 2005), the (weighted) average of a collection of models seems to produce better forecasts than any individual model when evaluated by standard measures of skill. For example, the average of models is in general the “… ‘best’ model in its ability to simulate current climate, at least in terms of typical second order measures such as mean square differences, spatial correlation, and the ratio of variances” (Boer 2004). Nonetheless, the reasons for that are not completely understood (Latif et al. 2006).

We use observed mean-square error,

$$ {\text{MSE}}(X,Y) = {\frac{1}{T}}\sum\limits_{t = 1}^{T} {(X_{t} - Y_{t} )^{2} } $$
(1)

to compare the relative skill of individual models and their averages because it is a standard measure of forecast skill (e.g., Murphy 1989, 1996; Epstein and Murphy 1989; Boer 2004; Palmer et al. 2004; Wilks 2005; Poeter and Anderson 2005; Ye et al. 2008). Observed MSE is a strong measure of skill in the sense that it directly compares model forecasts, \( X_{t} \in (X_{1} , \ldots ,X_{T} ), \) to observed state variables, \( Y_{t} \in (Y_{1} , \ldots ,Y_{T} ), \) obtained over an interval of length T. The observed variables may be direct observations of the system state or a related variable, for instance, some combination of the principal components of the observed state and/or model. From here on, "skill" means observed square-error skill.
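
As a concrete illustration of Eq. 1 (this sketch is ours, not part of any cited model; the array names and the NumPy dependency are assumptions), the observed MSE of a forecast series against an observation series can be computed as:

```python
import numpy as np

def mse(x, y):
    """Observed mean-square error of forecasts x against observations y (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    assert x.shape == y.shape  # both series must cover the same T points
    return np.mean((x - y) ** 2)

# Hypothetical 5-point forecast and observation series (T = 5).
forecast = [1.2, 0.8, 1.5, 0.9, 1.1]
observed = [1.0, 1.0, 1.3, 1.0, 1.2]
print(mse(forecast, observed))  # 0.028
```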

Our goal is to compare the skill of forecasts of a system state at t made by \( m = 1, \ldots ,M \) individual models, \( X_{t}^{(m)} , \) to the skill of their average,

$$ \bar{X}_{t} = \sum\limits_{m = 1}^{M} {w_{m} X_{t}^{(m)} } ,\quad{\text{where}}\;\sum\limits_{m} {w_{m} } = 1\;{\text{and}}\;w_{m} \ge 0\quad{\text{for}}\;{\text{all}}\;m. $$
(2)

In many climate applications, e.g., Meehl et al. (2007), model weights are uniform, \( w_{m} = 1/M , \) and we use uniform weights in our climate example (Sect. 3). However, that is not necessary. In geohydrology, e.g., Neuman (2003), Ye et al. (2004, 2005, 2008), weights are often chosen by Bayesian methods. We make no assumptions about weights, beyond those just stated in (2), to derive our general results (Sect. 4), which are therefore independent of the method used to choose weights. In some approaches, for instance Palmer et al. (2004), the values \( X_{t}^{(m)} \) are themselves the result of stochastic averaging, but that also does not affect our analysis. Krishnamurti and his colleagues have considered weighted ensembles in which some of the weights may be negative, but that approach raises several questions that go beyond the scope of this paper, and so we consider only nonnegative weightings.
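
For readers who prefer code, the weighted average in Eq. 2 can be sketched as follows (a minimal illustration assuming NumPy; the uniform default mirrors the climate example in Sect. 3, and the non-uniform weights shown are purely hypothetical):

```python
import numpy as np

def model_average(X, w=None):
    """Weighted model average (Eq. 2).

    X : array of shape (M, T); row m holds model m's forecasts X_t^(m).
    w : M non-negative weights summing to 1; defaults to uniform weights 1/M.
    """
    X = np.asarray(X, dtype=float)
    M = X.shape[0]
    w = np.full(M, 1.0 / M) if w is None else np.asarray(w, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
    return w @ X  # Xbar_t = sum_m w_m X_t^(m)

# Hypothetical example: M = 3 models, T = 4 forecast times.
X = np.array([[1.0, 1.2, 0.9, 1.1],
              [0.8, 1.1, 1.0, 1.0],
              [1.3, 0.9, 1.2, 0.8]])
print(model_average(X))                     # uniform weights, w_m = 1/3
print(model_average(X, w=[0.5, 0.3, 0.2]))  # e.g., weights chosen by some other scheme
```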

We show by example (Sect. 3) and analysis (Sect. 4) that, first, a forecast produced by averaging outputs from a collection of M models can be more skillful than the forecast of any individual model, \( X^{(m)} , \) only if the models in the collection do not correspond too closely; in other words, for the average to be more skillful than any \( X^{(m)} , \) the collection must include a diverse set of distinctly different forecasts. Second, if the forecasts of the individual models are too similar, the average will produce worse forecasts than the most skillful individual model; intuitively, averaging in this case dilutes the best forecast with other, similar forecasts that are not as good.

2 Components of MSE(\( \bar{X},\;Y \))

Using w m  ≥ 0 for all m, the MSE of the average of a collection of models,

$$ {\text{MSE}}(\bar{X},Y) = {\frac{1}{T}}\sum\limits_{t} {(\bar{X}_{t} - Y_{t} )^{2} } = \sum\limits_{m} {w_{m}^{2} \;{\text{MSE}}(X^{(m)} ,Y)} + \mathop {\sum {\sum {} } }\limits_{{m \ne m^{\prime}}} w_{m} w_{{m^{\prime}}} R_{{m,m^{\prime}}} , $$
(3)

depends on the MSEs of the individual models, as well as the correspondences between models,

$$ R_{{m,m^{\prime}}} = {\frac{1}{T}}\sum\limits_{t} {(X_{t}^{(m)} - Y_{t} )(X_{t}^{{(m^{\prime})}} - Y_{t} )} . $$
(4)

The correspondence has an obvious geometrical interpretation,

$$ R_{{m,m^{\prime}}} = \sqrt {{\text{MSE}}(X^{(m)} ,Y)\;{\text{MSE}}(X^{{(m^{\prime})}} ,Y)} \cos \theta_{{m,m^{\prime}}} , $$
(5)

due to its dependence on the angle, \( \theta_{{m,m^{\prime}}} , \) between the vectors

$$ \vec{Z}^{(m)} = \left( {X_{1}^{(m)} - Y_{1} , \ldots , \, X_{T}^{(m)} - Y_{T} } \right) $$
(6)

and \( \vec{Z}^{{(m^{\prime})}} \) in \( \Re^{T} \). We use this to motivate our general results in Sect. 4. If the models are unbiased, i.e., \( E[X_{t}^{(m)} ] = Y_{t} , \) the correspondence is proportional to the correlation between models and has similar mathematical properties. It should be emphasized, however, that we treat the model results as a fixed set of outcomes and do not assume additional probability structure in this problem. From now on we write \( {\text{MSE}}(\vec{Z}^{(m)} ) = {\text{MSE}}(X^{(m)} ,Y) \) when it is convenient.
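
The decomposition in Eq. 3 and the geometric identity in Eq. 5 are easy to verify numerically. The following sketch (synthetic data and uniform weights, chosen only for illustration) checks that the weighted MSEs plus the cross-model correspondences reproduce MSE(\( \bar{X},Y \)), and that \( R_{{m,m^{\prime}}} \) equals the scaled inner product of the error vectors in Eq. 6:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 4, 50
Y = rng.normal(size=T)                      # synthetic "observations"
X = Y + rng.normal(scale=0.5, size=(M, T))  # synthetic model forecasts
w = np.full(M, 1.0 / M)                     # uniform weights, sum to 1

Z = X - Y                                   # error vectors Z^(m) (Eq. 6)
mse_m = np.mean(Z ** 2, axis=1)             # MSE(X^(m), Y) for each model
R = (Z @ Z.T) / T                           # correspondences R_{m,m'} (Eq. 4)
off = ~np.eye(M, dtype=bool)                # mask selecting the m != m' pairs

# Eq. 3: MSE of the weighted average equals weighted MSEs plus cross terms.
lhs = np.mean((w @ X - Y) ** 2)
rhs = np.sum(w ** 2 * mse_m) + np.sum(np.outer(w, w)[off] * R[off])
print(np.isclose(lhs, rhs))                 # True

# Eq. 5: R_{m,m'} = sqrt(MSE_m MSE_m') cos(theta_{m,m'}), with theta the angle
# between the error vectors Z^(m) and Z^(m').
norms = np.linalg.norm(Z, axis=1)
cos_theta = (Z @ Z.T) / np.outer(norms, norms)
print(np.allclose(R, np.sqrt(np.outer(mse_m, mse_m)) * cos_theta))  # True
```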

3 North American climate example

To illustrate these relationships, we consider the departures,

$$ \vec{Z}^{(m)} = \left( {Z_{\text{WNA}}^{(m)} ,\;Z_{\text{CNA}}^{(m)} ,\;Z_{\text{ENA}}^{(m)} } \right) $$
(7)

produced by 19 global climate models (m = 1,…,19) when estimating normal winter temperature for Western North America (WNA), Central North America (CNA) and Eastern North America (ENA). The normal is defined for the period 1970–1999, and winter consists of December, January and February. The data are normalized by averaging over the winter months and over the normal period 1970–1999,

$$ Z_{i,j}^{\left( m \right)} = {\frac{1}{3}}\left( {Z_{{i,j,{\text{Dec}}}}^{(m)} + Z_{{i,j,{\text{Jan}}}}^{(m)} + Z_{{i,j,{\text{Feb}}}}^{(m)} } \right)\quad{\text{and}}\;Z_{i}^{(m)} = {\frac{1}{30}}\sum\limits_{j = 1970}^{1999} {Z_{i,j}^{(m)} } . $$
(8)

Here we use i = WNA, CNA, or ENA instead of t to emphasize that the data are normalized and range over regions.
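
A minimal sketch of the normalization in Eq. 8 is shown below; the array layout (winters × regions × months) and the random numbers are purely illustrative stand-ins for one model's monthly departures, not the actual archive processing:

```python
import numpy as np

# Hypothetical monthly departures for one model:
# 30 winters (1970-1999) x 3 regions (WNA, CNA, ENA) x 3 months (Dec, Jan, Feb).
rng = np.random.default_rng(1)
z_monthly = rng.normal(scale=0.5, size=(30, 3, 3))

z_winter = z_monthly.mean(axis=2)  # Z_{i,j}^(m): winter mean over Dec, Jan, Feb
z_region = z_winter.mean(axis=0)   # Z_i^(m): mean over the 30-year normal period
print(z_region)                    # one normalized departure per region
```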

The model results are taken from the coordinated modeling effort supporting the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (Meehl et al. 2007). The model output fields were re-gridded to a common 5° grid and compared to the observational data set from the Climatic Research Unit (CRU), University of East Anglia, and the Hadley Centre, UK Met Office (Jones et al. 1999). More details of these data and a global analysis of this multi-model sample can be found in Tebaldi et al. (2005), Tebaldi and Knutti (2007), Jun et al. (2008), and Knutti et al. (2008). The regions WNA, CNA, and ENA are a subset of the Giorgi and Mearns (2002) divisions.

The model departures from regional normals and MSEs are shown in Table 1 where models correspond to rows and regions to columns. The last two rows are average departures over (1) the full suite of models, \( \bar{Z}_{19} , \) including the most skillful, and (2) the suite of models with the most skillful left out, \( \bar{Z}_{18} . \) As noted, models are uniformly weighted in this section, w m  = 1/M for all m, where M = 18 or 19 depending on which average is being considered.

Table 1 Departures (°C) and MSE for North America

Looking at the suite of all 19 models first, the average, \( \bar{X}_{19} , \) is not as skillful as the best model, m = 16, for these data: MSE(\( X^{{(\min_{19} )}} , \) Y) = MSE(\( X^{(16)} , \) Y) = 0.08, while MSE(\( \bar{X}_{19} , \) Y) = 0.39. When model 16 is removed from the set of models, MSE(\( X^{{(\min_{18} )}} , \) Y) = MSE(\( X^{(2)} , \) Y) = 0.47, while the average of the reduced set, \( \bar{X}_{18} , \) yields MSE(\( \bar{X}_{18} , \) Y) = 0.45. These two cases illustrate the sensitivity of model evaluations to the set of models. Other factors, such as the specific observation interval and the choice of regions, may also affect these results; however, our analysis does not address those points.

Additional insight can be gained by decomposing

$$ \begin{aligned} {\text{MSE}}\left( {\bar{Z}} \right) - {\text{MSE}}\left( {\vec{Z}^{(\min )} } \right) & = \sum\limits_{m} {w_{m}^{2} \left( {{\text{MSE}}(\vec{Z}^{(m)} ) - {\text{MSE}}(\vec{Z}^{(\min )} )} \right)} \\ & \quad + \mathop {\sum {\sum {} } }\limits_{{m \ne m^{\prime}}} w_{m} w_{{m^{\prime}}} \left( {R_{{m,m^{\prime}}} - {\text{MSE}}(\vec{Z}^{(\min )} )} \right) \\ & = m^{2} + r, \\ \end{aligned} $$
(9)

which makes clear that the model average is more skillful than the best model when \( r \le - m^{2} , \) i.e., when the models do not correspond too much. Values of r and \( - m^{2} \) corresponding to the full suite of nineteen models with model 16 included are r = 0.16 and \( - m^{2} = - 0.15 \) (Table 2). This is a case where the best model is so much more skillful than all others that averaging only dilutes its individual skill. When model 16 is removed, there is then enough disagreement among the remaining models for the average to perform better than any of the rest: r = −0.17 and \( - m^{2} = - 0.15 \) (Table 2).
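
The bookkeeping behind Eq. 9 and Table 2 can be sketched as follows (a hypothetical helper acting on departure vectors; the published Table 1 values are not reproduced here). It returns the \( m^{2} \) and r terms, whose sum is MSE(\( \bar{Z} \)) − MSE(\( \vec{Z}^{(\min )} \)):

```python
import numpy as np

def average_vs_best(Z, w=None):
    """Split MSE(Zbar) - MSE(Z^(min)) into the m^2 and r terms of Eq. 9.

    Z : array of shape (M, T) of model departures Z^(m) = X^(m) - Y.
    w : optional non-negative weights summing to 1 (uniform by default).
    """
    Z = np.asarray(Z, dtype=float)
    M, T = Z.shape
    w = np.full(M, 1.0 / M) if w is None else np.asarray(w, dtype=float)

    mse = np.mean(Z ** 2, axis=1)      # MSE of each individual model
    best = mse.min()                   # MSE(Z^(min))
    R = (Z @ Z.T) / T                  # correspondences (Eq. 4)
    off = ~np.eye(M, dtype=bool)

    m_sq = np.sum(w ** 2 * (mse - best))               # "m^2" term of Eq. 9
    r = np.sum(np.outer(w, w)[off] * (R[off] - best))  # "r" term of Eq. 9
    return m_sq, r, m_sq + r           # averaging helps exactly when m_sq + r <= 0
```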

Table 2 Effect of model collection on performance

4 General results

Our general results consist of a sufficient condition for the model average to perform less skillfully than the best model, and a necessary condition for the average to perform better. Each result only requires that the weights be non-negative and sum to 1 (Eq. 2).

4.1 Result 1

If the models correspond too closely, the average is less skillful than the best model. Intuitively, this is because the other models dilute the performance of the best model. Referring to Eq. 9, it is clear that MSE(\( \bar{Z} \)) > MSE(\( \vec{Z}^{(\min )} \)) if \( R_{{m,m^{\prime}}} > {\text{MSE}}(\vec{Z}^{(\min )} ) \) for all \( m \ne m^{\prime} \): both terms on the right-hand side of Eq. 9 are then non-negative, and the second is strictly positive whenever the weights mix at least two models.

Another way for the best model to be more skillful than the average is for the best to be much more skillful than all the other models, i.e., \( {\text{MSE}}(\vec{Z}^{(\min )} ) \ll {\text{MSE}}(\vec{Z}^{(m)} ) \) for all m ≠ min, as was the case in the climate example. In that case, \( {\text{MSE}}(\bar{Z}) - {\text{MSE}}(\vec{Z}^{(\min )} ) \cong {\text{MSE}}(\bar{Z}) \ge 0 \). However, Result 1 shows that it is not necessary for one model to be much more skillful than the rest for the average to be less skillful; it is enough for all the models to correspond at a level that depends only on MSE(\( \vec{Z}^{(\min )} \)).
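
A sketch of the sufficient condition in Result 1 (a hypothetical helper using the same quantities as Eq. 9): if every cross-model correspondence exceeds MSE(\( \vec{Z}^{(\min )} \)), no weighted average that mixes two or more models can beat the best individual model.

```python
import numpy as np

def averaging_cannot_help(Z):
    """Result 1 check: True if R_{m,m'} > MSE(Z^(min)) for all m != m'.

    When this holds, both terms of Eq. 9 are non-negative for any non-negative
    weights summing to 1, and the cross term is positive whenever the weights
    mix at least two models, so MSE(Zbar) > MSE(Z^(min)).
    """
    Z = np.asarray(Z, dtype=float)
    M, T = Z.shape
    R = (Z @ Z.T) / T                     # correspondences (Eq. 4)
    best = np.mean(Z ** 2, axis=1).min()  # MSE(Z^(min))
    off = ~np.eye(M, dtype=bool)
    return bool(np.all(R[off] > best))
```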

4.2 Result 2

The average is more skillful than the best model only if some individual models do not correspond too much. This follows from Result 1 by contraposition (modus tollens).

To emphasize the role of geometry, we illustrate the results in \( \Re^{T} \), the T-dimensional vector space of model forecast errors (Figs. 1, 2). Letting \( \bar{Z} = (\bar{Z}_{1} , \ldots ,\bar{Z}_{T} ) \) and taking \( \vec{Z}^{(m)} \) from Eq. 6, the MSEs are equivalent to scaled squared lengths in \( \Re^{T} \), \( {\text{MSE}}(\bar{Z}) = {\frac{1}{T}}\left\| {\bar{Z}} \right\|^{2} \) and \( {\text{MSE}}(\vec{Z}^{(m)} ) = {\frac{1}{T}}\left\| {\vec{Z}^{(m)} } \right\|^{2} , \) while \( R_{{m,m^{\prime}}} = \sqrt {{\text{MSE}}(\vec{Z}^{(m)} )\;{\text{MSE}}(\vec{Z}^{{(m^{\prime})}} )} \cos \theta_{{m,m^{\prime}}} \) is the scaled inner product \( {\frac{1}{T}}\vec{Z}^{(m)} \cdot \vec{Z}^{{(m^{\prime})}} . \) In the figures, T = 3 for convenience of illustration, so each \( \vec{Z}^{(m)} = (Z_{1}^{(m)} ,Z_{2}^{(m)} ,Z_{3}^{(m)} ) \). The thin vectors represent the errors of different models, while the thick vector is the average, \( \bar{Z} ; \) their squared lengths are proportional to MSE(\( \vec{Z}^{(m)} \)) and MSE(\( \bar{Z} \)), respectively.

Fig. 1 Example of Result 1. Collinearity places \( \bar{Z} \) beyond \( \vec{Z}^{(\min )} \)

Fig. 2 Example of Result 2. Weak correspondence allows MSE(\( \vec{Z}^{(\min )} \)) ≥ MSE(\( \bar{Z} \))

Figure 1 shows a typical case of Result 1, leading to MSE(\( \bar{Z} \)) > MSE(\( \vec{Z}^{(\min )} \)). The models all vary similarly about \( \vec{Y}, \) as evidenced by their approximate collinearity, but \( \vec{Z}^{(\min )} \) performs much better than the others. The effect of averaging is to “stretch” \( \bar{Z} \) away from \( \vec{Z}^{(\min )} \) in the general direction of the other models. Figure 2 illustrates Result 2, a case in which MSE(\( \vec{Z}^{(\min )} \)) ≥ MSE(\( \bar{Z} \)). The role of the anti-correspondence requirement,

$$ \cos \theta_{{m,m^{\prime}}} < {\frac{{{\text{MSE}}(Z^{(\min )} )}}{{\sqrt {{\text{MSE}}(Z^{(m)} )\;{\text{MSE}}(Z^{{(m^{\prime})}} )} }}}\quad{\text{for}}\;{\text{some}}\;m \ne m^{\prime}, $$

is clear: MSE(\( \vec{Z}^{(\min )} \)) ≥ MSE(\( \bar{Z} \)) because the two sets of vectors “pull” against each other to produce a reduced \( \bar{Z}. \)
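
In code, the anti-correspondence requirement amounts to checking whether some pair of error vectors satisfies \( R_{{m,m^{\prime}}} < {\text{MSE}}(\vec{Z}^{(\min )} ) \), which is equivalent to the cosine condition displayed above. The helper below is hypothetical and mirrors the Result 1 check:

```python
import numpy as np

def some_pair_anticorresponds(Z):
    """Necessary condition of Result 2: for some pair m != m',
    cos(theta_{m,m'}) < MSE(Z^(min)) / sqrt(MSE(Z^(m)) MSE(Z^(m'))),
    i.e., R_{m,m'} < MSE(Z^(min)).

    If no pair satisfies this, Result 1 applies and the average cannot be
    more skillful than the best model.
    """
    Z = np.asarray(Z, dtype=float)
    M, T = Z.shape
    R = (Z @ Z.T) / T                     # correspondences (Eq. 4)
    best = np.mean(Z ** 2, axis=1).min()  # MSE(Z^(min))
    off = ~np.eye(M, dtype=bool)
    return bool(np.any(R[off] < best))
```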

5 Summary and discussion

Alternative science-based computational models of a given environmental system always forecast system states that differ somewhat, and sometimes forecast states that differ considerably. Model averaging has been proposed as a means of dealing with differences among model forecasts, and it has been noted that in some cases the average of a collection of models produces “better” forecasts than any individual model in the collection (e.g., Boer 2004). We have compared models and their averages on the basis of the mean-square-difference skill score, which is a standard assessment measure in weather forecasting, climate studies and groundwater hydrology. In addition to being ubiquitous, mean-square error has straightforward metric properties that other measures of skill lack: it is the squared, normalized distance between observations and forecasts.

We investigated a North American climate example with two collections of models to compare the skill of the average of a collection with that of the most skillful model (the “best”) in the collection. The best model in the first collection was so much more skillful than any other model that the average simply could not perform as well as the best. When the best model was removed from the first collection, the average performed better than any individual model in the reduced collection. In that case, none of the remaining climate models performed well enough to dominate the others. Furthermore, the models disagreed, thus giving an example of the general result (Result 2) that the average can be more skillful than the best model only if some models make markedly different forecasts. The example also illustrated the sensitivity of skill assessments to the collection of models, but we did not go more deeply into that point.

Our general results give a sufficient condition for the best model to be more skillful than the average (Result 1) and a necessary condition for the average to be more skillful than the best individual (Result 2). In general, (1) the average is less skillful than the best individual if the forecasts of the individual models correspond closely to one another relative to the skill of the best model, and (2) the average is more skillful than the best model only if the forecasts of some individuals do not correspond. These results depend only on simple geometric properties of the collection of models, and are independent of (i) how the models make their forecasts and (ii) the weighting scheme used to derive their average (except that the weights must be non-negative and sum to 1).

These results, and the example, suggest that a certain amount of caution should be applied when making strong claims that model averages are more skillful than individual models, at least if those claims are based on second-order performance measures like observed square-error skill. At the same time, the results should not be over-interpreted. In the first place, Result 2 does not indicate that collections of models should be assembled with the idea of maximizing the differences among individuals. The goal is to make skillful forecasts, not merely to have a collection of models whose average is more skillful on a set of observations than any individual model. Likewise, Result 1 does not imply that an average is never useful when models agree. A collection of good models (models based on reasonable physical assumptions and estimates of system parameters) can be expected to produce forecasts that correspond strongly with each other, but differ in their details. An average might be useful in that setting even if it is not as skillful as the best model on a set of observations.

The discussion of the relative skill of science-based environmental models and their averages has so far taken place in the absence of statistical tests. The complex probability distributions of environmental forecasts in realistic settings are one reason for that. What is needed is a statistical test (or tests) to evaluate hypotheses about the means of forecasts made by complex physics models whose uncertain physical parameters are not Gaussian, or even necessarily unimodal (cf. Rubin 1995; Gomez-Hernandez and Wen 1998; Winter and Tartakovsky 2000, 2002; Christakos 2003; Guadagnini et al. 2003; Neuman and Wierenga 2003). In the absence of such tests, the results in this paper indicate that the apparent forecasting skill of model averages has a simple geometric explanation in some cases.