Introduction

Progress in forecasting the impacts of climate change on forest ecosystems (stand growth, succession sequences) and disturbance regimes (insect and disease outbreaks, fire frequencies) involves the ability to simulate extreme weather events. Extreme events are often sequences ranging from a few to several successive days of unusually hot, cold, dry or wet weather. Many researchers are developing climate-sensitive models of ecological processes that require as input daily or even hourly time series of temperature and precipitation (Prentice et al. 1993; Kimball et al. 1997; Friend 1998; Lasch et al. 2002; Régnière and Nealis 2002; Logan et al. 2003; Gray 2004). Typically, however, climate change scenarios are generated from single runs of Global Circulation Models (CGMs) with daily output at coarse spatial resolution. Because this output represents a single realization from a large set of possible outcomes, the possibility of statistical treatment (replication) is severely limited (Oelschlägel 1995). However, the output of CGMs can be summarized as monthly statistics (normals) or as “anomalies” (deviations from base-line normals) and disaggregated through stochastic daily weather generators (Bürger 1997; Hansen and Ines 2005). The low spatial resolution of GCMs can be increased through regional models or other down-scaling approaches, and monthly statistics can be input into ecological models to take into account climate change scenarios (Fowler et al. 2005; Qian et al. 2005).

The major issue in the development of stochastic daily weather generators is to obtain a sequence of daily weather that conforms not only to observed (or expected) mean values, but also to natural variability, extremes, and time series characteristics (Wallis and Griffiths 1995). Several approaches to generation of daily weather information have been developed, and most are based on the simulation of alternating dry- and wet-day series using Markov-chain transition probabilities (Richardon 1981; Racsko et al. 1991; Hutchinson 1995a; Katz 1996; Wilks 1999). Many of these models, however, require considerable amounts of input information and often must be calibrated for application in specific geographical areas (Richardson and Wright 1984; Racsko et al. 1991; Semenov and Barrow 1997; Bürger 1997; Semenov et al. 1998). There has also been considerable attention focused on adding interannual variation in these generators for ecological applications (Mearns et al. 1997; Hansen and Mavromatis 2001; Mavromatis and Hansen 2001; Dubrovský et al. 2004).

Régnière and Bolstad (1994) developed an algorithm (TEMPGEN) for simulation of daily minimum and maximum air temperature based on normals (30-year monthly average and extreme minimum and maximum temperatures). This algorithm generates realistic distributions, autocorrelation and cross-correlation patterns of daily minimum, maximum temperatures and ranges. It has been quite useful in studying the responses of models driven by daily weather information under “normal” weather conditions and has been used in many applications through the BioSIM system (Régnière 1996). However, this generator has three weaknesses: (1) it does not take into account the interannual variation of monthly average temperatures, and therefore generates temperature traces that are not as variable as they are in nature; (2) it does not generate daily precipitation; and (3) it was parameterized from data from the Atlantic coastal area of the United States, and as such is not applicable in other areas.

The objective of the work reported here was to remedy these three weaknesses by generalizing and validating the Régnière and Bolstad (1994) algorithm for the entire North American continent (north of Mexico) and adding daily precipitation to its output. We sought a model that would require minimal input (in the form of local monthly normal statistics) would need no for calibration, and would generate realistic daily times series of minimum and maximum air temperature and precipitation, exhibiting levels of variability and statistical distributions close to those of observed time series.

Methods

Algorithm structure

Throughout this document, temperatures are expressed in °C and precipitation in millimeters of water. The symbol ∼ above a term designates a 30-year normal while ¯ represents an average of some other type. Inputs of this model are 11 sets of monthly statistics extracted over a standard normal period (SNP) of 30 years of daily observations from a given location. The first two sets of normals are mean daily minimum (\( \widetilde{\mu }_{n} \)) and maximum (\( \widetilde{\mu }_{x} \)) temperatures. From these, normal daily minimum and maximum temperatures are obtained by linear interpolation between the means of successive months, adjusted to correspond to the values occurring at mid-month. Seven additional monthly normals are calculated from the differences between observed daily minimum and maximum temperatures and these daily normals: their standard deviations (\( \widetilde{\sigma }_{\delta } \) for minimum and \( \widetilde{\sigma }_{\varepsilon } \) for maximum temperature); 2nd order autoregressive terms describing their time-series characteristics (\( \widetilde{A}_{\delta } \), \( \widetilde{B}_{\delta } \) for minimum and \( \widetilde{A}_{\varepsilon } \), \( \widetilde{B}_{\varepsilon } \) for maximum temperatures); and their cross-correlation (\( \widetilde{r}_{{\delta \varepsilon }} \)). Two additional monthly statistics describe the characteristics of precipitation: average monthly total (\(\widetilde{P}\)); and the standard deviation (\( \widetilde{\sigma }_{P} \)) of the ratio of observed monthly total to normal precipitation, given by:

$$\widetilde{\sigma }_{P} = {\sqrt {{{\sum\limits_i {{\left( {\frac{{P_{m} }}{{\widetilde{P}}} - 1} \right)}^{2} } }} \mathord{\left/ {\vphantom {{{\sum\limits_i {{\left( {\frac{{P_{m} }}{{\widetilde{P}}} - 1} \right)}^{2} } }} {{\left( {n - 1} \right)}}}} \right. \kern-\nulldelimiterspace} {{\left( {n - 1} \right)}}} }$$
(1)

where P m is a month’s total precipitation in year m and n=30 years.

Daily minimum and maximum temperatures are generated with the algorithm of Régnière and Bolstad (1994):

$$ \begin{array}{*{20}c} {T_{n} {\left( t \right)} = \widetilde{T}_{n} {\left( t \right)} + \delta {\left( t \right)}} \\ {T_{x} {\left( t \right)} = \widetilde{T}_{x} {\left( t \right)} + \varepsilon {\left( t \right)}} \\ \end{array} $$
(2)

where \(\widetilde{{T_{n} }}\)(t) and \(\widetilde{{T_{x} }}\)(t) are daily normal minimum and maximum temperatures (obtained by linear interpolation as described above), and δ(t) and ɛ(t) are auto-correlated (2nd order) and cross-correlated random numbers with means of 0 and variances \( \widetilde{\sigma }^{2}_{\delta } \) and \( \widetilde{\sigma }^{2}_{\varepsilon } \). The autocorrelation and cross-correlation are generated through general linear stochastic processes expressed as:

$$\begin{array}{*{20}c} {\delta {\left( t \right)} = {\sum\limits_i {\alpha _{i} \gamma {\left( {t - i} \right)}} }} \\ {\varepsilon {\left( t \right)} = \frac{{\widetilde{\sigma }_{\varepsilon } }}{{\widetilde{\sigma }_{\delta } }}{\sum\limits_i {\beta _{i} } }{\left[\,\, {\rho \gamma {\left( {t - i} \right)} + {\left( {1 - {\left| \rho \right|}} \right)}\zeta {\left( {t - i} \right)}} \right]}} \\ \end{array} $$
(3)

where α i and β i are infinite series of coefficients such that:

$$ \begin{array}{*{20}c} {\alpha _{1} = 1,\alpha _{2} = \widetilde{A}_{\delta } ,{\text{and }}\alpha _{i} = \widetilde{A}_{\delta } \alpha _{{i - 1}} + \widetilde{B}_{\delta } \alpha _{{i - 2}} {\text{for i $>$ 2}}} \\ {\beta _{1} = 1,\beta _{2} = \widetilde{A}_{\varepsilon } ,{\text{and }}\beta _{i} = \widetilde{A}_{\varepsilon } \beta _{{i - 1}} + \widetilde{B}_{\varepsilon } \beta _{{i - 2}} {\text{for i $>$ 2}}} \\ \end{array} $$
(4)

and \( \gamma \in {\text{Nor}}{\left( {0,\sigma ^{2}_{\gamma } } \right)} \), \( \zeta \in {\text{Nor}}{\left( {0,\sigma ^{2}_{\zeta } } \right)} \). The formulation of Eq. 3 is a generalization of Régnière and Bolstad’s (1994) model 2, where ρ can be in the range [−1,1] and ɛ can have any variance regardless of the values of \( \widetilde{A}_{\delta } \), \( \widetilde{B}_{\delta } \), \( \widetilde{A}_{\varepsilon } \), \( \widetilde{B}_{\varepsilon } \) and ρ. The variances \( \widetilde{\sigma }^{2}_{\gamma } \) and \( \widetilde{\sigma }^{2}_{\zeta } \) are given by:

$$ \sigma ^{2}_{\gamma } = \frac{{\widetilde{\sigma }^{2}_{\delta } }} {{{\sum\limits_i {\alpha ^{2}_{i} } }}} $$
(5)

and

$$ \sigma ^{2}_{\zeta } = \frac{{\widetilde{\sigma }^{2}_{\delta } }} {{{\left( {1 - {\left| \rho \right|}} \right)}^{2} }}{\left( {\frac{1} {{{\sum\limits_i {\beta ^{2}_{i} } }}} - \frac{{\rho ^{2} }} {{{\sum\limits_i {\alpha ^{2}_{i} } }}}} \right)} $$
(6)

and the value of ρ is:

$$ \rho = \widetilde{r}_{{\delta \varepsilon }} \frac{{{\sum\limits_i {\alpha ^{2}_{i} } }}} {{{\sum\limits_i {{\left( {\alpha _{i} \beta _{i} } \right)}} }}} $$
(7)

(see Régnière and Bolstad 1994, Appendix 2, for the derivation of Eqs. 57). The sums \( {\sum {\alpha ^{2} } } \), \( {\sum \beta }^{2} \) and \( {\sum {{\left( {\alpha \beta } \right)}} } \) converge within the first 20 terms. In the original model of Régnière and Bolstad (1994), \( \widetilde{A}_{\delta } \), \( \widetilde{B}_{\delta } \), \( \widetilde{A}_{\varepsilon } \), \( \widetilde{B}_{\varepsilon } \) and \( r_{{\delta \varepsilon }} \) were calculated from station latitude and time of year. Observed extreme minimum and maximum observations were also needed to calculate \( \widetilde{\sigma }_{\delta } \) and \( \widetilde{\sigma }_{\varepsilon } \), based on the analysis of data from the Atlantic area of the United States. In the present model, they are input station normals.

Contrary to most daily weather generators where daily precipitation is the basis for simulation of temperature values (Katz 1996), in this model air temperature is generated first, and precipitation is computed on the basis of daily temperatures, with the underlying assumption that precipitation is more likely on days with smaller temperature ranges. Total monthly precipitation (P) is generated by drawing a multiplicative random deviation ΔP ≥ 0 from a family of Weibull probability distribution functions entirely parameterized from \( \widetilde{\sigma }_{P} \):

$$ P = \Delta P \cdot \widetilde{P} $$
(8)

The probability of daily precipitation p(t) is generated from total monthly precipitation P, its variance \( \sigma ^{2}_{P} \) and the daily temperature range \( R{\left( t \right)} = T_{x} {\left( t \right)} - T_{n} {\left( t \right)} \). The value of p(t) is then compared with a uniformly distributed random number \( \psi \in \mathcal{U}{\left( {0,1} \right)} \). Whenever p(t) < ψ, p(t) is reset to 0. For days where p(t) > 0, the amount of daily precipitation P(t) is calculated as follows. The proportion x(t) of total monthly precipitation falling on day t is obtained from a Beta distribution specified entirely from the month’s \( \widetilde{\sigma }_{P} \) and that day’s precipitation is given by:

$$ P(t) = x(t)P $$
(9)

where \( {\sum\limits_t {x{\left( t \right)}} } = 1 \) within a month.

Data sources

Monthly normals \( {\left( {\widetilde{\mu }_{n} ,\widetilde{\mu }_{x} ,\widetilde{\sigma }_{\delta } ,\widetilde{\sigma }_{\varepsilon } ,\widetilde{A}_{\delta } ,\widetilde{B}_{\delta } ,\widetilde{A}_{\varepsilon } ,\widetilde{B}_{\varepsilon } ,\widehat{r}_{{\delta \varepsilon }} ,\widetilde{P}\;{\text{and}}\;\widetilde{\sigma }_{P} } \right)} \) for stations in North America were calculated over the 1961–1990 SNP from daily records contained in two databases. For Canada, data from Atmospheric Environment Service Daily Climate Data CD-ROMs http://www.msc-smc.ec.gc.ca/) were used. For the United States (excluding Alaska, Atlantic and Pacific Island states), the NCDC COOP Summary of the Day TD3200 database http://www.ncdc.noaa.gov/) was used. Only stations with at least 20 years of data for each month during the SNP were retained. For these stations, a month’s data were retained only if it contained at least 20 days of non-missing temperature or precipitation data. Total precipitation in months with missing data was adjusted (P=P unadjusted ×days in month/non-missing days in months).

Variation of monthly mean temperature and total precipitation

Annual variation of monthly mean temperature and precipitation was analyzed using data from four clusters of weather stations representing a wide range of North American climates: 156 in Quebec, 129 in British Columbia, 118 in California and 108 in Tennessee and North Carolina. Deviations of monthly mean temperature (μ) and total precipitation (P) from 30-year normals \(\left[ {\widetilde\mu = {{\left( {\widetilde\mu _n + \widetilde\mu _x } \right)} \mathord{\left/ {\vphantom {{\left( {\widetilde\mu _n + \widetilde\mu _x } \right)} 2}} \right. \kern-\nulldelimiterspace} 2}} \right]\;\text{and}\;\widetilde P\) were calculated:

$$ \begin{array}{*{20}l} {{\Delta \mu = \mu - \widetilde{\mu }} \hfill} \\ {{{\Delta P = P} \mathord{\left/ {\vphantom {{\Delta P = P} {\widetilde{P}}}} \right. \kern-\nulldelimiterspace} {\widetilde{P}}} \hfill} \\ \end{array} $$
(10)

To detect month-to-month autocorrelation, values of Δμ and ΔP were averaged by year and month over all stations in each cluster. This provided eight 360-month time series: four of Δμ and four of ΔP. Autocorrelation in each of these time series was calculated for lags of 1–12 months. Cross-correlations (without lag) between deviations in temperature and precipitation were also tested for significance.

The distribution of monthly precipitation deviations ΔP changed consistently from month to month, and the resulting patterns were different between station clusters. The Weibull probability distribution function (see Hastings and Peacock 1975) was chosen as a suitably flexible model to describe these patterns:

$$ p{\left( {\Delta P \leqslant X} \right)} = 1 - e^{{ - {\left[ {{{\left( {X + 0.05} \right)}} \mathord{\left/ {\vphantom {{{\left( {X + 0.05} \right)}} \lambda }} \right. \kern-\nulldelimiterspace} \lambda } \right]}^{\eta } }} $$
(11)

where X ≥ 0. The value 0.05 was added to X to ensure that months with no precipitation could have a non-zero probability of occurring (p > 0 when X = 0).

Equation 11 was fitted to the cumulative distribution of ΔP for each month, pooled separately over stations from each cluster. Parameters η and λ were estimated for the 48 distributions (12 months, for stations in Quebec, British Columbia, California and Tennessee/North Carolina) by non-linear least squares regression (SAS release 8.04, PROC NLIN). Parameter η was highly correlated to the monthly values of \( \overline{\sigma } _{P} = \tfrac{1} {n}{\sum {\widetilde{\sigma }_{P} } } \) averaged over all stations (by cluster and month). The relationship was expressed as:

$$\widehat{\eta } = a_{\eta } + b_{\eta } /\overline{\sigma } _{P} $$
(12)

where \(\widehat{\eta }\) is an estimate of η given \( \overline{\sigma } _{P} \). Parameters a η and b η were estimated by linear regression. There was also a strong relationship between the values of η and λ, which we modeled with:

$$ \widehat{\lambda } = a_{\lambda } {\left( {1 - e^{{ - b_{\lambda } \widehat{\eta }}} } \right)}^{{c_{\lambda } }} $$
(13)

where \( \widehat{\lambda } \) is an estimate of λ; a λ , b λ and c λ are parameters that were estimated by non-linear least squares regression. The probability distribution of total monthly precipitation for a given station can be determined from \(\widetilde{P}\) and \( \widetilde{\sigma }_{P} \) with Eqs. 11, 12 and 13.

Generation of daily precipitation

For this analysis, daily air temperature and precipitation records were compiled from the four clusters of weather stations described above. Daily probabilities of precipitation were related through a purely empirical model to three variables: total monthly precipitation P, its interannual variability \( \widetilde{\sigma }_{P} \), and daily temperature range R(t). The distinction between wet and dry days is made on the basis of these probabilities. Total monthly precipitation is partitioned among wet days through a family of Beta distributions.

The probability of daily precipitation p(t) on day t was modeled with:

$$ g{\left( p \right)} = k_{0} + k_{1} \log _{{10}} {\left( {P + 1} \right)} + k_{2} \log _{{10}} W + k_{3} \widetilde{\sigma }_{P} $$
(14)

where

$$ g{\left( p \right)} = \ln {\left[ {p \mathord{\left/ {\vphantom {p {{\left( {1 - p} \right)}}}} \right. \kern-\nulldelimiterspace} {{\left( {1 - p} \right)}}} \right]} $$
(15)

is the logistic transform of p, and W is a Weibull probability density function of daily temperature range R(t):

$$ W = \frac{{\omega _{2} }} {{\omega _{1} }}{\left( {\frac{{R{\left( t \right)} + 1}} {{\omega _{1} }}} \right)}^{{\omega _{2} - 1}} e^{{ - {\left( {\frac{{R(t) + 1}} {{\omega _{1} }}} \right)}^{{\omega _{2} }} }} $$
(16)

with four parameters:

$$ \omega _1 = k_4 + k_5 \,\widetilde\sigma _p \;\text{and}\;\omega _2 = k_6 + k_7 \;\widetilde\sigma _P . $$
(17)

Equations 1417 were fitted to observations in the following manner. Days with precipitation exceeding 0.1 mm were assigned a precipitation index of 1 (otherwise 0). Precipitation indices were averaged, providing values of p(t) for classes of P, R(t) and \( \widetilde{\sigma }_{P} \). P values were grouped into 5 mm/month classes from 0 to ≥250 mm/month. R(t) values were grouped into 1°C classes from ≤3 to ≥25°C. \( \widetilde{\sigma }_{P} \) values were grouped into 14 classes: 0–0.3, 0.3–0.4, ..., 0.9–1.0, 1.0–1.5, ..., 3.0–3.5, >3.5. Classes containing fewer than 10 observations were dropped, leaving a total of 9,821 distinct combinations of the independent variables. Parameters k 0k 7 were estimated simultaneously by non-linear logistic regression analysis.

The distribution of the proportion of total monthly precipitation falling on a given rainy day t, the ratio P(t)/P where P(t) > 0.1 mm, was approximated by a family of Beta distributions Beta(v,w) where shape parameters v and w are estimated from the mean and uncorrected variance of the P(t)/P ratio (see Hastings and Peacock 1975). The data set was subdivided into 14 groups according to the values of \( \widetilde{\sigma }_{P} \) (classes listed above), v and w were calculated by month within each class and estimates of w were related to \( \widetilde{\sigma }_{P} \) (the average \( \widetilde{\sigma }_{P} \) within each group) with:

$$ w = m_{0} + \frac{{m_{1} }} {{{\left( {\overline{\sigma } _{P} + 1} \right)}^{{m_{2} }} }} $$
(18)

Parameters m 0m 2 were estimated by non-linear least-squares regression. There was also a curvi-linear relationship between v and w expressed with:

$$ v = {m_{3} + m_{4} w + m_{5} } \mathord{\left/ {\vphantom {{m_{3} + m_{4} w + m_{5} } w}} \right. \kern-\nulldelimiterspace} w $$
(19)

where m 3m 5 are parameters estimated by linear regression. From relationships in Eqs. 18 and 19, a Beta distribution of the P(t)/P ratio can be assigned for any value of \( \widetilde{\sigma }_{P} \).

Model validation

Data from one station in each of the four clusters used in model parameterization were used for a detailed comparison of observed and simulated patterns of monthly variation and distributions. These stations were selected because they had the least missing data: Summerland BC (49°34′N, 119°39′W), Quebec QC (45°48′N, 71°23′W), Button Willow CA (35°24’N, 119°28W) and Knoxville TN (35°49′N, 83°59′W) (Fig. 1). For each station, ten observed and simulated monthly statistics were compared: extreme and mean minimum and maximum temperature, the standard deviations of mean temperature and total precipitation, average daily precipitation and probability of precipitation, and the duration of droughts and hot spells. Droughts were defined as periods of successive days with precipitation < 0.1 mm. Hot spells were defined as periods of successive days with maximum temperature ≥30°C. Seven observed and simulated frequency distributions were compared: daily minimum and maximum temperatures, daily temperature range, frost-free period, duration of droughts and hot spells, and amount of daily precipitation.

Fig. 1
figure 1

Map of Canada and the United States. ◊ Data sources for detailed model output analysis, • data sources for simulator validation

Normals and daily data over the entire 1961–1990 SNP were obtained for a set of 708 North American weather stations. This set was selected at random by a thinning algorithm eliminating neighboring stations within 100 km of each other, so that selected stations were distributed as evenly as possible across Canada and the continental United States, conditional on station density. Station density was very low in northern Canada (Fig. 1). From the monthly normals of each station (\(\widetilde{\mu }_{n} ,\widetilde{\mu }_{x} ,\widetilde{\sigma }_{\delta } ,\widetilde{\sigma }_{\varepsilon } ,\widetilde{A}_{\delta } ,\widetilde{B}_{\delta } ,\widetilde{A}_{\varepsilon } ,\widetilde{B}_{\varepsilon } ,\widetilde{r}_{{\delta \varepsilon }} ,\widetilde{P}\;{\text{and}}\;\widetilde{\sigma }_{P} \)), daily temperature and precipitation data (300 years) were generated with the daily weather generator and simulator-generated monthly normals for each station were compiled. Six additional observed and simulated monthly statistics were calculated by station: mean extreme minimum and maximum temperatures, average daily temperature range, average daily precipitation and probability of precipitation, and the standard deviation of mean temperature, designated \( \sigma _{\mu } \). Observed and simulated annual frost-free periods were compiled. Five extreme events were extracted from observed and simulated frequency distributions compiled by station over the entire observation (30 years) or simulation (300 years) period: the coldest and hottest temperatures (5th percentile lowest minimum and 95th percentile highest maximum temperatures), the heaviest daily precipitation events (95th percentile), the longest droughts (95th percentile) and the longest hot spells (95th percentile).

Results

Data analysis

Autocorrelation

There was a weak but significant 1-month lag autocorrelation in the monthly mean deviations from normals temperature (Δμ) among stations from British Columbia (r = 0.229, P < 0.001), California (r = 0.159, P = 0.003), Tennessee/North Carolina (r = 0.131, P = 0.014), but not from Quebec (r = 0.067, P  = 0.201). There was no significant autocorrelation in deviations of total monthly precipitation from normals (ΔP) among stations of any cluster normals at any time lag. Thus, the simulator was designed without a built-in autocorrelation structure in monthly deviations from monthly mean temperature or total precipitation. There was a weak correlation between deviations from monthly mean temperatures and deviations from monthly total precipitation among time series from British Columbia (r = −0.149, P  < 0.001), Quebec (r = 0.045, P  < 0.001) and California (r = −0.130, P  < 0.001), but none from Tennessee/North Carolina (r = −0.004, P  = 0.454). Because of the weakness and ambiguity of this association, no attempt was made to correlate average monthly temperature and total precipitation in the simulator.

Precipitation

Equations 12 and 13 described precisely the relationships between η and \( \overline{\sigma } _{P} \) (Fig. 2a), and λ and η (Table 1; Fig. 2b). The family of Weibull distribution functions (Eq. 11) generated by these relationships approximated quite well the patterns of monthly deviations of total precipitation from normals, ΔP, among data from stations from the four clusters: British Columbia (Fig. 3a), Quebec (Fig. 3b), California (Fig. 3c) and Tennessee/North Carolina (Fig. 3d).

Fig. 2a, b
figure 2

Variation of parameters η and λ of Eq. 11 describing the distribution of \( {\Delta P = P_{i} } \mathord{\left/ {\vphantom {{\Delta P = P_{i} } {\widetilde{P}}}} \right. \kern-\nulldelimiterspace} {\widetilde{P}} \), relative monthly precipitation for stations in (○) Quebec, (•) British Columbia, (□) California and (▪) Tennessee/North Carolina. a Relationship between η and the standard deviation of monthly precipitation \( \overline{\sigma } _{P} \) (line: Eq. 12). b Relationship between λ and η (line: Eq. 13)

Table 1 List of parameter estimates for Eqs. 1219 of the simulator
Fig. 3
figure 3

Cumulative distribution of \( {\Delta P = P_{i} } \mathord{\left/ {\vphantom {{\Delta P = P_{i} } {\widetilde{P}}}} \right. \kern-\nulldelimiterspace} {\widetilde{P}} \), the ratio of monthly precipitation on normal precipitation, by month, for stations in a Quebec, b British Columbia, c California and d Tennessee/North Carolina. ○ Observations, ― Weibull distribution function (Eq. 11)

There was a clear diminishing-return relationship between the daily probability of precipitation p(t) and total monthly precipitation P (Fig. 4a). There was also a strong relationship between p(t) and daily temperature range R(t) (Fig. 4b), confirming the initial underlying assumption that the likelihood of precipitation is inversely proportional to R(t). These relationships were modulated strongly by variability of monthly precipitation (\( \widetilde{\sigma }_{P} \)). Equation 14 provided a good description of these relationships (Table 1).

Fig. 4
figure 4

Relationships between the daily probability of precipitation p(t) and a total monthly precipitation P or b daily temperature range R(t) for different values of precipitation variability \( \widetilde{\sigma }_{P} \) . • Averaged observations, ― Eqs. 1417

The Beta distribution captured well the changes in distribution of the ratio P(t)/P, the proportion of a month’s total precipitation falling on a given day, with respect to \( \widetilde{\sigma }_{P} \). Equations 18 and 19 accurately described the relationships between the v and w parameters of the Beta(v, w) distribution and \( \widetilde{\sigma }_{P} \) (Table 1; Fig. 5).

Fig. 5a–c
figure 5

Variation of the parameters v and w of the family of Beta functions describing the distribution of daily precipitation. a Eq. 18 relating w to \( \widetilde{\sigma }_{P} \). b Eq. 19 relating v to w. c Cumulative distribution of the amount of daily precipitation P(t) relative to total monthly precipitation P, for five classes of precipitation variability \( \widetilde{\sigma }_{P} \). ― expected values, • observations

Model validation

Simulated and observed monthly temperature fluctuations (\( \widetilde{\mu }_{n} ,\widetilde{\mu }_{x} \) and mean extreme minimum and maximum) were nearly identical at the four stations selected for detailed examination (Summerland BC, Quebec QC, Knoxville TN and Button Willow CA; Fig. 6a). Seasonal fluctuations of the variance of mean monthly temperature and total precipitation (\( \sigma _{\mu } \) and \( \widetilde{\sigma }_{P} \)) were also well mimicked by the model (Fig. 6b). However, the simulated values of \( \sigma _{\mu } \) tended to be lower than observed (closed circles, Fig. 6b). Simulated and observed daily precipitation averages were nearly identical (Fig. 6c). Monthly variations of the probability of simulated daily precipitation varied in good general agreement with observations (Fig. 6d). The simulated seasonal patterns of drought and hot-spell duration were very similar to observations, although simulated droughts tended to be shorter than observed in Button Willow CA (Fig. 6e,f).

Fig. 6a–f
figure 6

Comparison of simulated and observed monthly statistics for four selected stations. Columns from left to right: 1 Summerland BC, 2 Quebec QC, 3 Button Willow CA, 4 Knoxville TN. Symbols observations, lines model. a Extreme and mean minimum and maximum temperatures. b Standard deviation of mean temperature (―•―) and precipitation (---○---). c Average daily precipitation. d Daily probability of precipitation. e Drought duration. f Hot-spell duration

The distributions of observed and simulated daily minimum and maximum temperatures and of daily ranges were nearly identical (Fig. 7a,b). The distribution of the frost-free period was well simulated by the model in three of the four stations (Fig. 7c). In Button Willow CA, observed frost-free periods were about 10 days longer than simulated. The distribution of hot spell duration was well mimicked by the model in all four locations (Fig. 7d). While the distributions of simulated and observed single-day precipitation amounts (Fig. 7e) and droughts (Fig. 7f) were quite similar in three of the four stations, both simulated distributions tended toward higher values than observed in Button Willow CA.

Fig. 7a–f
figure 7

Comparison of cumulative distributions of simulated and observed statistics for four selected stations. Columns from left to right: 1 Summerland BC, 2 Quebec QC, 3 Button Willow CA, 4 Knoxville TN. Symbols observations, lines model. a Daily minimum (○) and maximum (•) temperatures. b Daily temperature range. c Frost-free period. d Hot-spell duration. e Amount of daily precipitation. f Drought duration

Over the 708 stations in the validation data set, 9 of the 11 monthly normals calculated from model output were for all practical purposes identical to observed normals in terms both of means and variances (\(\widetilde{\mu }_{n} ,\widetilde{\mu }_{x} ,\widetilde{\sigma }_{\delta } ,\widetilde{\sigma }_{\varepsilon } ,\widetilde{A}_{\delta } ,\widetilde{B}_{\delta } ,\widetilde{A}_{\varepsilon } ,\widetilde{B}_{\varepsilon } ,\widetilde{r}_{{\delta \varepsilon }} \); Table 2). Simulated total monthly precipitation \(\widetilde{P}\) was slightly lower than observed, with a mean error (ME) of 0.3 ± 2.7 mm/month and a mean absolute error (MAE) of 1.7 ± 1.9 mm/month. However, these differences were very small (0.5% on average) in comparison to the observed average monthly precipitation of 63.6 ± 47.8 mm/month and the correlation between observed and average total monthly precipitation was very high (r = 0.998). While the ME and MAE between observed and simulated variation of monthly total precipitation \( \widetilde{\sigma }_{P} \) were very small (Table 2), the correlation between them was not as high as for other normal statistics (r = 0.984). This indicates that the model realistically mimics this variation, but does not do this exactly, especially for higher values of \( \widetilde{\sigma }_{P} \) (Fig. 8a).

Table 2 Comparison of observed and simulated statistics from 708 randomly-selected weather stations
Fig. 8a–h
figure 8

Comparison of selected observed (ordinate) and simulated (abscissa) statistics at 708 randomly selected weather stations, either a single annual value (n = 708) or 12 monthly values per station (n = 8496) (see Fig. 1 for geographical distribution). Lines: equality. a Standard deviation of monthly precipitation (n = 9,496 records). b Mean extreme maximum (n = 8,496). c Standard deviation of monthly mean (n = 8,496). d Average daily probability of precipitation (n = 8,496). e Frost-free period (n = 708). f Extreme (95th percentile) hot spell duration (n = 708). g Extreme (95th percentile) daily precipitation (n = 708). h Extreme (95th percentile) drought duration (n = 708)

The model generated accurate average monthly mean extreme minimum temperatures (Table 2) that were nearly unbiased (ME:−0.3 ± 0.9°C; MAE: 0.7 ± 0.7°C) and were highly correlated with observations (r = 0.998). Simulated average monthly maximum temperature was clearly biased (ME:1.1 ± 1.0°C, MAE 1.2 ± 1.2°C), despite a high correlation between observed and simulated values (r = 0.996; Fig. 8b). To explain this bias, the distributions of observed and simulated deviations of maximum temperature from daily averages, ɛ(t) in equation [2], were compared for the four stations selected for detailed analysis. While the distributions of simulated deviations were nearly normal, observed deviations had lower than expected frequencies at the higher extremes (> 98th percentile), in all four stations and throughout the year (Fig. 9). This difference in distributions between simulated and observed ɛ(t) is the cause of the bias in mean monthly extreme maximum temperatures output by the model. Limiting the values of ɛ(t) to the range \( \pm 1.96\,\widetilde{\sigma }_{\varepsilon } \) in the calculation of T x (t) (Eq. 2) corrects this bias very efficiently without impacting other output characteristics.

Fig. 9a–h
figure 9

Comparison of the distribution of observed and simulated daily deviations of maximum temperature from their means at four selected weather stations. Rows: 1 Quebec QC, 2 Summerland BC, 3 Button Willow CA, 4 Knoxville TN. ad January. eh July. ○ Observed, –---- simulated, ― normal distribution. Arrows indicate the systematic shortage of very high values among observations

The standard deviations of mean monthly temperature \( \sigma _{\mu } \) generated by the model were smaller than observed (ME:−0.3 ± 0.3; MAE:0.3 ± 0.3; Table 2), although the correlation between observed and simulated values was high (r = 0.966). This bias represents a 15% difference on average (Fig. 8c) and was most pronounced at higher values of \( \sigma _{\mu } \) (Fig. 8c). In the model, this variation results entirely from the variances \( {\left( {\widetilde{\sigma }_{\varepsilon } ,\widetilde{\sigma }_{\delta } } \right)} \), autocorrelation \( {\left( {\widetilde{A}_{\delta } ,\widetilde{B}_{\delta } ,\widetilde{A}_{\varepsilon } ,\widetilde{B}_{\varepsilon } } \right)} \) and cross-correlation \( {\left( {\widetilde{r}_{{\delta \varepsilon }} } \right)} \) structure of the stochastic processes δ and ɛ in Eq. 3. There was a strong spatial pattern in observed values of \( \sigma _{\mu } \) over the North American continent, with the higher values occurring in the Prairie Provinces and around Hudson’s Bay in Canada (Fig. 10a). The differences between model output and observations being proportional to \( \sigma _{\mu } \), they had a very similar spatial pattern (Fig. 10b).

Fig. 10a–h
figure 10

Maps of selected statistics from 708 randomly selected weather stations (see Fig. 1 for geographical distribution). Columns: Left Observed values, right simulated-observed differences. a, b \( \sigma _{\mu } \). c, d Daily probability of precipitation. e, f Extreme (95th percentile) daily precipitation. g, h Extreme (95th percentile) drought duration

The average daily ranges and daily precipitation generated by the model were nearly identical to observations, with no bias and very high correlation coefficients (Table 2). However, average simulated daily probability of precipitation was lower than observed (ME: 0.020 ± 0.083; MAE: 0.067 ± 0.052; Table 2). Because of the high variability of daily precipitation probability averaged by month and station, the correlation between simulated and observed means was only 0.84 (Fig. 8d). The daily probability of precipitation in North America is strongly patterned: it is highest in the Pacific Northwest and the Northeast (especially northern Ontario and Quebec), and lowest in the Southwest (Fig. 10c). Differences between model output and observations were also patterned spatially (Fig. 10d). The largest negative differences (simulated < observed) were located in the mountains of the West, the Canadian Prairies and at the latitude of James Bay in eastern Canada. Positive differences were concentrated in the eastern half of the continent, south of the Great Lakes, and along the Southern half of Canada’s Atlantic Coast.

The model generated highly realistic distributions of extreme temperatures (5th percentile lowest daily minimum and 95th percentile highest daily maximum). The values of these two statistics generated by the model were highly correlated with observations, with low ME and MAE (Table 2). Average frost-free periods were also quite well predicted, with no bias and a small MAE of 4.9 ± 4.5 days (Fig. 8e). The model predicted reasonably well the duration of extreme hot spells (95th percentile longest) with an MAE of only 2.6 ± 3.5 days, relative to the observed average of 15.0 ± 18.8 days. The correlation between simulated and observed extreme hot spells was high (r = 0.979; Fig. 8f). However, the model did not perform so well in predicting extreme daily precipitation (95th percentile highest values) or extreme drought durations (95th percentile longest). The model generated extreme precipitation events that were only slightly larger than observed on average (ME: 2.7 ± 9.5 mm/day), but the the MAE was quite high (6.9 ± 7.0 mm/day) relative to the observed average of 25.0 ± 12.0 mm/day (Fig. 8g). Extreme precipitation events have a strong spatial pattern in North America, and are largest in the Pacific Northwest and around the Gulf of Mexico (Fig. 10e). The model tended to predict heavier extreme precipitation than observed in the western half of the United States, and lighter than observed in the Southeast (Fig. 10f). Average predicted extreme drought duration was unbiased (ME: 0.1 ± 7.5 days) but absolute differences between model output and observations were high (MAE: 4.8 ± 5.8 days) compared to the observed average (18.1 ± 13.5 days; Fig. 8h). The model underestimated the extreme summer droughts observed in the southwestern corner of the continent, but overestimated those over the high-elevation terrain of much of the western United States (Fig. 10g,h).

Discussion

Two of the objectives of this modification of Régnière and Bolstad’s (1994) TEMPGEN algorithm were to generalize it to the entire North American continent, north of Mexico, and to generate realistic interannual variability in monthly means. These objectives were achieved in two ways: (1) by replacing empirical relationships between station latitude and the variance, autocorrelation and cross correlation of daily minimum and maximum temperature time series by input station-specific monthly statistics \({\left( {\widetilde{\sigma }_{\delta } ,\widetilde{\sigma }_{\varepsilon } ,\widetilde{A}_{\delta } ,\widetilde{B}_{\delta } ,\widetilde{A}_{\varepsilon } ,\widetilde{B}_{\varepsilon } ,\,\widetilde{r}_{{\delta \varepsilon }} } \right)}\); and (2) by generalizing the stochastic processes (Eq. 3) that generate the daily fluctuations around normal monthly (mean) temperatures \( {\left( {\widetilde{\mu }_{n} ,\widetilde{\mu }_{x} } \right)} \). Comparison of the characteristics of observed and simulated daily temperature time series indicates that these objectives have been met successfully in terms of both seasonality and accuracy. The model also produces highly realistic frost-free periods and extreme events such as very low or very high temperatures and hot spells.

The model-generated interannual variation of monthly mean temperatures, which results entirely from the stochastic processes generating daily minimum and maximum temperature time series, is 15% smaller than observed variability. This suggests that other sources of variability are at play in observed temperature fluctuations, such as low-frequency fluctuations including the El Niño/Southern Oscillation (Philander 1990), the North Atlantic Oscillation (Watanabe and Nitta 1999) and the Pacific Decadal Oscillation (Latif and Barnett 1994). No attempt has been made at this point to include such synoptic low-frequency sources of temperature variability in TEMPGEN. Yet low-frequency interannual variability can have important consequences in ecological systems and needs to be taken into consideration in the generation of daily weather data (Hansen and Mavromatis 2001; Mavromatis and Hansen 2001; Dubrovský et al. 2004). One simple way to achieve this without recourse to models of those complex climate oscillation processes would be to apply a suitably small amount of stochastic annual variation simultaneously to monthly mean minimum and maximum temperatures \( {\left( {\widetilde{\mu }_{n} ,\widetilde{\mu }_{x} } \right)} \).

The third objective of this work was to add daily precipitation to TEMPGEN’s output. Many of the daily weather generators developed since the 1950s (Bruhn 1980; Richardon 1981; Racsko et al. 1991; Johnson et al. 1996; Kimball et al. 1997; Friend 1998; Skiles and Richardson 1998; Jones and Thornton 2000; Yu 2003; Hansen and Ines 2005) are based on the simulation of precipitation using Markov-chain transitions between wet and dry states (Chin 1977), and Alpha distributions of the daily amount of precipitation. The TEMPGEN generator here is not based on a distinction between wet and dry days. It approaches precipitation on the basis of an empirical relationship between probability of precipitation and daily temperature range. Rather than describing transitions between wet and dry periods, TEMPGEN’s wet and dry spells are the result of the autocorrelation and cross-correlation structure of minimum and maximum daily temperatures. The distribution of dry spell durations is an emergent property, conditioned by expected total monthly rainfall and its interannual variability, both station-specific model inputs \( {\left( {\widetilde{P},\widetilde{\sigma }_{P} } \right)} \). The variability \( \widetilde{\sigma }_{P} \) is used to generate stochastic values of total monthly precipitation. From these and daily temperature ranges, the daily probability of precipitation is calculated. The proportion of total monthly precipitation falling on any given day is a Beta-distributed random variable with parameters entirely specified by the input value of \( \widetilde{\sigma }_{P} \). In very dry climates (high \( \widetilde{\sigma }_{P} \)), such as in the southwestern United States in summer, most monthly precipitation falls within a few days. In less variable climates (low \( \widetilde{\sigma }_{P} \)), precipitation is more evenly distributed. TEMPGEN generates realistic amounts and seasonal patterns of monthly precipitation. The average daily amount of precipitation is also accurately simulated. However, the daily probability of precipitation generated by the model is slightly lower than observed (by about 2%). The model generates fairly accurate drought durations, except in the driest climates where it underestimates extreme drought durations. The extreme daily precipitation events generated by the model are unbiased, but their correlation with observations is low (r = 0.0677). It must be noted, however, that these extreme precipitation events are rare and that observations used for comparison with model output were based on only 30 years of data per station.

Many authors have used daily weather generators as a means of down-scaling the outputs of Global Circulation Models (CGMs) for climate change impact studies (Oelschlägel 1995; Bürger 1997; Dubrovský 1997; Semenov and Barrow 1997). This involves summarizing CGM outputs and generating daily weather from the resulting distributions (Qian et al. 2005), or altering monthly statistics by fixed amounts (Mearns et al. 1997). One of the problems with these approaches is that CGM outputs often do not reflect many of the statistical characteristics of actual daily weather fluctuations, especially precipitation patterns at regional scales. This is a particularly problematic issue with respect to dry-wet day models, as discussed by Katz (1996). Many daily weather generators require calibration for use at specific locations (Soltani and Hoogenboom 2003; Fowler et al. 2005). The TEMPGEN algorithm has the main advantage of requiring no input data other than the 11 sets of station-specific monthly normals. It can be used to generate daily minimum and maximum temperature and precipitation time series for any location in Canada or the continental United States when provided with local climate normals (whether actual or as a synthesis of CGM output). To provide such normals for locations not close to a weather station or at a finer spatial scale than CGM output, several spatial disaggregation methods are available, such as application of regional climate models or spatial interpolation on the basis of a network of weather station records and topography (Hutchinson 1991, 1995b; Gignac 2000; Price et al. 2000).