1 Introduction

Assessment of vulnerability of local and regional water supply protection and management projects to climate change is currently an active area of research. In this context, many impact related studies tend to rely on outputs from Global Climate Models (GCMs). However, these outputs cannot be applied directly at local and regional scales primarily due to the spatial resolution of GCMs, which is much coarser than that typically required for many impact assessment studies (Fowler et al. 2007; Maraun et al. 2010). Also, some investigators have expressed doubts about the reliability and local scale utility of some GCM outputs (e.g., precipitation) that are critically dependent on sub-grid scale processes such as those involving clouds and topography (Huth 2002; Cavazos and Hewitson 2005; Dibike et al. 2008). These limitations lead to a scale mismatch between the information that GCMs at the moment are able to provide and that which is desired in many impact assessment studies (e.g., Zorita and von Storch 1997).

To circumvent the above-mentioned shortcomings, techniques based on dynamical and statistical downscaling have emerged. Dynamical downscaling techniques use Regional Climate Models (RCMs) to predict finer-scale climate variables when these models are driven by GCM outputs at their boundaries (Giorgi 2006). Though on the rise, direct application of RCM outputs for regional impact assessment is often restricted because of the high computational cost involved and/or bias partly originating from the driving GCM. Alternatively, statistical downscaling aims at relating large scale atmospheric covariates to local scale surface variables (Wilby and Wigley 1997; Turco et al. 2011; Gutiérrez et al. 2013; D’Onofrio et al. 2014). One such technique is the use of weather generators (WGs) for simulating realistic random sequences of weather variables of any length that are consistent with a given climatology. A detailed review of these techniques, which generally fall under the category of stochastic modelling tools, can be found in Maraun et al. (2010).

Stochastic modelling of weather variables at the daily or sub-daily scale is particularly challenging due to the intermittence that is inherent in, for example, precipitation at such scales. In some studies, precipitation has been modeled by a two-stage process involving separate models for precipitation occurrence and amounts when wet (Todorovic and Woolhiser 1975; Katz 1977; Buishand 1978; Stern and Coe 1984). Daily precipitation occurrence is often modeled using a two-state Markov process corresponding to wet and dry states (e.g., Richardson 1981; Wilks 1998; Katz et al. 2003), while the gamma distribution has commonly been used to model precipitation amounts (Katz 1977; Stern and Coe 1984). Elsewhere, exponential and mixed exponential distributions (Richardson 1981), as well as mixtures of different continuous distributions (Hundecha et al. 2009) have been used. Generalized linear models (GLMs) (McCullagh and Nelder 1989) offer a framework that unites and extends many of the existing approaches that have been proposed to model precipitation occurrences. These models have been utilized successfully for modelling precipitation sequences (e.g., Chandler and Wheater 2002; Furrer and Katz 2007).

Chandler and Wheater (2002) used a logistic regression to model the probability of rain on a given day at stations in Ireland, with the observed North Atlantic Oscillation being the predictor explicitly representing the large scale atmospheric structure in addition to an indicator of precipitation occurrence on the previous day. They found such models to provide a good representation of the organized structures in the precipitation data in addition to satisfying their distributional assumptions. Kenabatho et al. (2012) explored GLMs to model daily rainfall data from 13 stations located in the Limpopo basin in Botswana. Although their results showed quite high uncertainty, they recommended GLMs for modelling rainfall sequences in semi-arid climates. In the Peruvian Andes, Bergin et al. (2012) modeled daily rainfall using GLMs and concluded that rainfall statistics were satisfactorily reproduced by the models particularly in relatively small catchments. In the context of multisite daily rainfall downscaling in Australia (Frost et al. 2011), the performance of the GLM-based WG was found quite satisfactory compared to other state-of-the-art techniques. Recently, Chun et al. (2013) performed a comparative single-site downscaling of daily precipitation at four selected locations in western Canada using the LARS-WG (Long Ashton Research Station weather generator) (Semenov and Stratonovitch 2010) and GLM approaches. Although both approaches were able to reproduce most of the statistical properties of the historical precipitation records, the GLM-based WG out-performed the LARS-WG in terms of simulating characteristics of extreme events as well as inter-annual variability of precipitation sequences.

Most of the WGs focus on individual sites (e.g., Rajagopalan and Lall 1999; Wilby et al. 2002) and are therefore unable to represent the spatial structure of the observed climatic variables. Although these models can generate time series at more than one site when applied separately, the series so generated would not be spatially consistent, due to neglecting inter-station correlations (Wilks 1998; Mehrotra and Sharma 2007; Jeong et al. 2012). However, for many water resources design and management related projects, particularly in large river basins, it is important to model simultaneous sequences of multiple variables (e.g., precipitation and temperature) over large heterogeneous areas, while maintaining physically plausible spatial, temporal and inter-variable relationships. Several approaches have been developed for simultaneous multisite multivariate generation of climate variables (Apipattanavis et al. 2007; Steinschneider and Brown 2013). However, as noted by Maraun et al. (2010), multisite generation offers many significant challenges primarily due to the need to model the joint distribution of, for example, precipitation simultaneously at all sites and inter-variable and inter-site dependence structures. The GLM-based Rglimclim software package (Chandler 2014) provides a flexible framework for accomplishing such tasks within the R programming environment (R Development Core Team 2014).

This study seeks to investigate the suitability of GLMs for multisite multivariate modelling of precipitation and temperature fields in the Canadian Prairie Provinces, with the aim of using these models for downscaling GCM outputs for climate change impact analysis. This region comprises 47 diverse watersheds including the Saskatchewan, Athabasca, Peace and Churchill River Basins, which serve various needs of the communities ranging from agricultural to domestic usage and fulfilling rapidly expanding requirements of the industrial sector. Apart from regional inhomogeneity and a paucity of ground-based observations, this region of Canada is also characterized by a highly variable hydro-climate with recurrent floods and multi-year droughts.

The paper is organized as follows: Sect. 2 describes the study area and datasets used. The methodology for multisite multivariate modelling of precipitation and temperature sequences based on the Rglimclim software package is described in Sect. 3. Results of the study are presented and discussed in Sect. 4, while a summary and conclusions are given in Sect. 5.

2 Study area and data

The study area comprises the Canadian Prairie Provinces of Alberta, Saskatchewan and Manitoba (Fig. 1) with a total surface area of 1,960,681 km². The ecosystems of this region depend heavily on precipitation amount and its timing (Hogg et al. 2000). Apart from the moderating effects due to regional changes in topography, atmospheric circulation also controls precipitation patterns (Borchert 1950). Annual average precipitation is approximately 454 mm, rather less than the Canada-wide average of 535 mm (Phillip 1990). The major inflows to the Saskatchewan River Basin, the largest river system in the region, originate from the Rocky Mountains (Wheater and Gober 2013). Characterized by a highly variable hydro-climate and diminishing water resources (Bonsal et al. 2012), southern parts of this region support a vibrant agro-based economy that was hard-hit by the most severe and prolonged droughts of 1988 and 1999–2005, as well as severe floods of 2011, 2013 and 2014.

Fig. 1

Study area and observation stations (black dots and red squares) considered in the study. Precipitation is observed at all stations, while temperature is recorded only at stations indicated as black dots. Forty-seven watersheds spanning the study area, including the provinces of Alberta, Saskatchewan and Manitoba (left to right), are also shown. The inset shows the location of the study area in Canada

The datasets used in this study include daily total precipitation, and minimum and maximum temperatures for the 1961–2005 period from a network of 120 stations (Fig. 1, and Table 1 in the Appendix), obtained from Environment Canada (http://www.ec.gc.ca). Temperature is recorded at 96 of the 120 stations. These datasets have been quality controlled and adjusted to account for known changes in recording practice (see Vincent et al. 2009; Mekis and Vincent 2011).

Standardized daily values of large scale atmospheric covariates are derived for the 1961–2005 period from the National Centers for Environmental Prediction and the National Center for Atmospheric Research (NCEP/NCAR) Reanalysis-I (Kalnay et al. 1996) over a spatial domain encompassing latitudes 40°N to 70°N and longitudes 130°W to 70°W. In total, 21 large scale covariates (wind speed at 10-m, 500- and 850-hPa; U-component and V-component at 10-m, 500- and 850-hPa; vertical velocity, geo-potential height, specific humidity, and relative humidity at 850- and 500-hPa; total cloud cover, mean sea level pressure, precipitable water and 2-m air temperature) are explored. Monthly indices of teleconnection patterns, such as the Pacific Decadal Oscillation (PDO), Pacific North American mode (PNA) and Arctic Oscillation (AO), are sourced from the Joint Institute for the Study of the Atmosphere and Ocean, University of Washington (http://jisao.washington.edu/analyses0302/).

It is important to note that the above mentioned observed temperature and precipitation datasets, large scale atmospheric covariates and indices of PDO, PNA and AO were used in Asong et al. (2015) to partition the study area into five homogeneous precipitation regions on which most of the analyses presented herein are based. The partitioning was done using the same set of atmospheric covariates as are used in the present study.

3 Methodology

This section provides methodological background of the GLM framework for modelling daily precipitation and temperature variables. In addition, other important topics ranging from selection of covariates, spatial–temporal dependence structure to model calibration and validation procedures are also discussed. The methodology is described as implemented in the Rglimclim software package of Chandler (2014), which is used for this study.

3.1 GLM for daily precipitation

A two-stage approach involving separate amount and occurrence models has been used previously to model precipitation sequences (Coe and Stern 1982; Chandler and Wheater 2002; Chandler 2005; Furrer and Katz 2007). In a GLM, the n observations \(y_{1} , \ldots ,y_{n}\) are considered to be the realized values of the random variables \(\varvec{Y} = (Y_{1} , \ldots ,Y_{n} )^{'}\) with a mean vector \(\varvec{\mu} = (\mu_{1} , \ldots ,\mu_{n} )^{'}\), where \(\mu_{i}\) is related to the values of a row vector \(\varvec{x}_{\varvec{i}}\) of predictors such that:

$$g(\mu_{i} ) = \varvec{x}_{\varvec{i}}\varvec{\beta} = \eta_{i}$$
(1)

where g(·) is a monotonic transformation known as the link function, \(\eta_{i}\) is the linear predictor and \(\varvec{\beta}\) is a column vector of coefficients, one for each predictor in \(\varvec{x}_{\varvec{i}}\). The precipitation occurrence process (i.e. the pattern of wet and dry days) is modelled using logistic regression and the precipitation amounts (i.e. intensity) process on wet days is modelled using the gamma distribution. The precipitation occurrence process takes the form:

$$\ln \left( {\frac{{p_{i} }}{{1 - p_{i} }}} \right) = \varvec{x}_{\varvec{i}}\varvec{\beta}$$
(2)

where \(p_{i}\) is the probability of precipitation for the ith case in the dataset conditional on a covariate row vector \(\varvec{x}_{\varvec{i}}\) with coefficient column vector \(\varvec{\beta}\). Subsequently, for a potentially different covariate vector \(\varvec{\xi}_{\varvec{i}}\), the precipitation intensity process for the ith wet day is modelled as gamma-distributed with mean \(\mu_{i}\) and shape parameter ν, where

$$\ln (\mu_{i} ) =\varvec{\xi}_{\varvec{i}} \varvec{\varphi }$$
(3)

with the shape parameter ν assumed to be constant (e.g., Yang et al. 2005) for all observations at all sites, and \(\varvec{\varphi }\) is a column vector of coefficients. The coefficient vectors \(\varvec{\beta}\) and \(\varvec{\varphi }\) are estimated using the maximum likelihood method assuming that the observations from different sites are independent (Chandler 2005; Chandler and Bate 2007), with subsequent adjustments for inter-site dependence that is generally present.
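As an illustration of how Eqs. (2) and (3) combine into a two-stage generator, the following minimal Python sketch draws one site-day at a time. It is not the Rglimclim implementation: the function names, the two-element covariate vectors (an intercept and a previous-day wet indicator) and all coefficient values are hypothetical, and the adjustment for inter-site dependence is omitted.

```python
import math
import random


def wet_probability(x, beta):
    """Eq. (2): invert the logit link to obtain p_i from x_i . beta."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))


def mean_intensity(xi, phi):
    """Eq. (3): log link, so the wet-day mean is exp(xi_i . phi)."""
    return math.exp(sum(a * b for a, b in zip(xi, phi)))


def simulate_day(x, beta, xi, phi, nu, rng):
    """Draw occurrence first, then a gamma amount if the day is wet."""
    if rng.random() >= wet_probability(x, beta):
        return 0.0  # dry day
    mu = mean_intensity(xi, phi)
    # A gamma variable with shape nu and scale mu / nu has mean mu.
    return rng.gammavariate(nu, mu / nu)


# Hypothetical covariates: intercept and previous-day wet indicator (= 1).
rng = random.Random(1)
beta = [-1.0, 1.5]   # assumed occurrence coefficients
phi = [1.0, 0.3]     # assumed amounts coefficients
nu = 0.8             # assumed common shape parameter
amounts = [simulate_day([1.0, 1.0], beta, [1.0, 1.0], phi, nu, rng)
           for _ in range(5000)]
wet_fraction = sum(a > 0.0 for a in amounts) / len(amounts)
```

With these assumed values the simulated wet-day fraction should be close to the probability implied by Eq. (2). The sketch ignores the wet-day threshold used later in the paper.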

3.2 GLM for daily temperature

Khalili et al. (2013) developed a statistical downscaling approach to model daily minimum (Tmin) and maximum (Tmax) temperatures at 10 different locations in Ontario and Quebec. Their approach consists of a combination of a linear regression component to describe the linkage between predictors and temperature values, and a stochastic component based on a spatial moving-average process to reproduce the observed spatial dependence between the values at different sites. Several other approaches also exist in the literature. For example, regression-based methods and artificial neural networks were used by Schoof and Pryor (2001), while first-order trivariate auto-regression that is conditional on precipitation occurrence, as implemented in the Weather GENerator (WGEN) by Richardson and Wright (1984), has also been applied extensively. Elsewhere, Chen et al. (2012) developed the MulGETS WG wherein a first-order auto-regression was used to model temperature, while Furrer and Katz (2007) modelled both precipitation and temperature at multiple sites using GLMs. Standard linear regression methods assume constant variance for daily time series, \(\varvec{Y}_{st}\), at each site s on a given day t. However, the assumption of constant variance is often violated when analyzing temperature series at the daily time scale (Chandler 2014). Therefore, following Chandler (2005), the method used here includes a two-stage approach whereby separate mean and variance components are developed within a normal-heteroscedastic framework in which the mean (\(\mu_{st}\)) and variance (\(\sigma^{2}\)) of \(\varvec{Y}_{st}\) depend on possibly different covariate vectors. As suggested by Chandler (2014), the preferred approach for modelling Tmin and Tmax would be to model the mean of the two variables using a normal distribution, and then the difference between them using a gamma distribution. This would guarantee that Tmax is always greater than Tmin in the simulated sequences. However, in this study, we modelled Tmin and Tmax directly.
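The alternative suggested by Chandler (2014), which is not the approach adopted in this study, can be sketched in a few lines of Python. The sketch assumes that "the mean of the two variables" is the daily average (Tmin + Tmax)/2 and that the gamma-distributed difference is split symmetrically around it; the function name and all parameter values are hypothetical, and covariate effects are omitted.

```python
import random


def simulate_tmin_tmax(mean_mu, mean_sd, range_mu, range_shape, rng):
    """Draw the daily mean from a normal and the Tmax - Tmin range from a gamma."""
    t_mean = rng.gauss(mean_mu, mean_sd)
    # Gamma with shape k and scale range_mu / k has mean range_mu and is positive.
    t_range = rng.gammavariate(range_shape, range_mu / range_shape)
    return t_mean - 0.5 * t_range, t_mean + 0.5 * t_range


rng = random.Random(7)
# Assumed values: mean temperature 2 degC (sd 8), mean diurnal range 10 degC.
pairs = [simulate_tmin_tmax(2.0, 8.0, 10.0, 4.0, rng) for _ in range(2000)]
```

Because the simulated range is positive, every simulated pair satisfies Tmax > Tmin, which is the property that direct modelling of the two variables cannot guarantee.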

3.3 Selection of probable candidate predictors

Selection of significant candidate predictors is the most important factor that could affect the accuracy of the estimated predictands (Wilby and Wigley 2000). Recently, Asong et al. (2015) studied spatio-temporal relationships between various precipitation characteristics and the predictors described in Sect. 2. Principal component and canonical correlation analyses were used to screen the large scale covariates. They found the following eight atmospheric predictors, together with the PDO and PNA indices, to influence the precipitation characteristics significantly in both space and time: 2-m air temperature, 850-hPa relative humidity, 500-hPa specific humidity, 850-hPa geo-potential height, mean sea level pressure, horizontal wind components (850-hPa meridional and 10-m zonal wind), and vertical velocity (i.e. omega at 500-hPa). The selected predictors reflect information about the thickness, circulation and moisture content of the atmosphere. Subsequently, for modelling precipitation, Tmax and Tmin, the statistical significance of the covariates is assessed simultaneously using likelihood ratio tests, adjusted for inter-site dependence following the approach described in Chandler and Bate (2007), whenever a model is extended by adding more covariate terms in the GLM framework. This ensures parsimony and reduces the artefacts resulting from over-fitting.

3.4 Spatial–temporal dependence structure

Daily weather sequences often exhibit a high level of temporal and spatial autocorrelation (Wilks 1998). The GLM framework allows for modelling of marginal distributions. However, the flexible approach of Rglimclim offers an opportunity for incorporation of several inter-site dependence models. Given that most weather sequences at different sites tend to be correlated, potentially as a result of being produced by similar large scale weather systems, it is possible to construct a joint distribution of precipitation or temperature at all sites which respects marginal distributions from at-site GLMs. A meaningful GLM for generating multisite multivariate weather sequences must therefore preserve the spatial coherence. This requires a computationally tractable representation of inter-site dependence. This feature is incorporated by transforming the precipitation amounts to Gaussianity and then studying inter-site correlations on the transformed scale (see Yang et al. 2005 for details). For temperature, inter-site dependence is specified directly via correlations between the standardized residuals. The software also offers various options for modelling temporal autocorrelation structure mostly as a function of lagged values and a ‘persistence indicator’. Intervariable relationships are represented as functions of concurrent/simultaneous and lagged values of other variables. An advantage of using a spatial correlation model is that it provides the opportunity to simulate weather sequences at ungauged locations which is an important consideration for the current study area due to the sparse network of observation stations.

Multisite simulation of precipitation occurrence in a large study area with marked convective activity during summer makes the incorporation of spatial dependence into binary sequences a very challenging task. Yang et al. (2005) reviewed related techniques in the context of daily rainfall generation and found that none of the approaches was suitable for their study case. Their main difficulty was that the study area was relatively small compared to the synoptic weather systems affecting it. As our study area is very large and the precipitation production processes (e.g. convective cells) are highly localized, we adopt the same approach as in Ambrosino et al. (2014). Suppose that it is required to generate a vector \(\varvec{Y} = (Y_{1} , \ldots ,Y_{st} )^{'}\) of correlated binary variables and that Eq. (2) gives the probability of precipitation at site s on day t as \(p_{st}\). A conceptually simple approach is to start by generating a set of correlated Gaussian variables \(\varvec{Z} = (Z_{1} , \ldots ,Z_{st} )\) and then to define a threshold (to handle treatment of “small” values) chosen so that \(P(Y_{st} = 1) = p_{st}\), as required by the logistic regression model in Eq. (2); the threshold therefore depends on the probabilities derived from the occurrence model.
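The thresholding idea can be illustrated for two sites with a short Python sketch. It is a simplified stand-in rather than the procedure of Ambrosino et al. (2014) or Rglimclim: the latent correlation `rho`, the wet-day probabilities and the function name are assumed for illustration, and the threshold is taken as the (1 − p) quantile of the standard normal distribution.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the threshold quantile


def correlated_occurrence(p1, p2, rho, rng):
    """Return wet/dry indicators at two sites from correlated Gaussians."""
    z1 = rng.gauss(0.0, 1.0)
    # Build z2 so that corr(z1, z2) = rho while keeping unit variance.
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Site is wet when Z exceeds the (1 - p) quantile, so P(Y = 1) = p.
    y1 = 1 if z1 > STD_NORMAL.inv_cdf(1.0 - p1) else 0
    y2 = 1 if z2 > STD_NORMAL.inv_cdf(1.0 - p2) else 0
    return y1, y2


rng = random.Random(3)
p1, p2, rho = 0.3, 0.5, 0.7  # assumed probabilities and latent correlation
days = [correlated_occurrence(p1, p2, rho, rng) for _ in range(20000)]
freq1 = sum(y1 for y1, _ in days) / len(days)
freq2 = sum(y2 for _, y2 in days) / len(days)
joint = sum(y1 * y2 for y1, y2 in days) / len(days)
```

The marginal wet frequencies reproduce the probabilities from the occurrence model, while the joint wet frequency exceeds the product `p1 * p2` that independent sites would give.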

3.5 Model fitting and evaluation: calibration and validation

The primary stage in model building is to decide on an appropriate class of models to represent the variable(s) of interest, which is addressed in Sects. 3.1 and 3.2 above in the context of the GLM framework. In this study, GLMs are fitted separately to precipitation and temperature fields (i.e. Tmin and Tmax) considering the entire study area as a single region and using observations from the 1971–2000 period. Herein, a day is defined as wet if the recorded amount of precipitation exceeds 0.5 mm. First, for the precipitation case, models are fitted using data from all 120 sites. Subsequently, Tmin and Tmax from 96 of the 120 stations are modeled separately and intervariable relationships are accounted for by using simultaneous and lagged values of precipitation as covariates to model temperature. This approach is refined further based on smaller homogeneous partitions of the study domain. The first step in the calibration is the development of ‘initial’ GLMs consisting of a constant term and basic factors influencing weather variability such as seasonality, autocorrelation and geographical attributes (site effects). Subsequently, daily values of NCEP-based covariates and monthly values of teleconnection indices (see Sect. 3.3) are incorporated as external covariates. The case for adding successive predictors to the existing model is assessed by evaluating the predictive performance, dependence-adjusted log-likelihood and the residual structure of each fitted model.

It is possible, for example, that climate variability in the Canadian Prairie Provinces is linked with the PDO and PNA phenomena, especially during winter months. Therefore, the coefficient of the PDO in a GLM should vary by season of the year. Instead of fitting separate models for each month of the year, the coefficient of the PDO can be represented as a linear combination of covariates explaining seasonality. This is achieved within the GLM framework via interactions (Chandler and Wheater 2002; Chandler and Scott 2011). The software provides a wide range of residual-based diagnostics to check that the fitted models are able to reproduce the systematic structure in the observations, as well as the distributional assumptions (e.g., precipitation intensities follow gamma distributions) and the assumed inter-site correlation structure (see Yang et al. 2005 for further details). For example, to check that the underlying structure has been captured by the fitted model, we define Pearson residuals as:

$$r_{i}^{(P)} = \frac{{Y_{i} - \mu_{i} }}{{\sigma_{i} }}$$
(4)

where \(Y_{i}\) is the observed response for case i, and \(\mu_{i}\) and \(\sigma_{i}\) are the modeled mean and standard deviation. If the fitted model is correct, all of the Pearson residuals have expected value zero and variance 1. In addition to Pearson residuals, Anscombe residuals (Eq. 5) for the gamma distribution are defined for the amounts model to ensure that the probability structure of the fitted models is correct.

$$r_{i}^{(A)} = \left( {\frac{{Y_{i} }}{{\mu_{i} }}} \right)^{1/3}$$
(5)
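Equations (4) and (5) can be checked numerically with a small Python sketch. It assumes a gamma amounts model with mean μ and shape ν, for which the standard deviation is μ/√ν; the function names and parameter values are illustrative and do not correspond to Rglimclim routines.

```python
import math
import random


def gamma_sd(mu, nu):
    """Standard deviation of a gamma variable with mean mu and shape nu."""
    return mu / math.sqrt(nu)


def pearson_residual(y, mu, sigma):
    """Eq. (4): observed minus modeled mean, scaled by the modeled sd."""
    return (y - mu) / sigma


def anscombe_residual(y, mu):
    """Eq. (5): cube-root transform of the ratio y / mu."""
    return (y / mu) ** (1.0 / 3.0)


# Simulate from a 'correct' model and verify the Pearson residual moments.
rng = random.Random(42)
mu, nu = 5.0, 0.8  # assumed mean (mm) and shape
draws = [rng.gammavariate(nu, mu / nu) for _ in range(20000)]
pearson = [pearson_residual(y, mu, gamma_sd(mu, nu)) for y in draws]
res_mean = sum(pearson) / len(pearson)
res_var = sum((r - res_mean) ** 2 for r in pearson) / (len(pearson) - 1)
```

When the data are generated from the assumed model, `res_mean` and `res_var` should be close to 0 and 1, respectively; systematic departures in real residuals grouped by site, month or year point to structure the model has missed.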

The suitability of the calibrated models for generating weather sequences independent of the calibration period is tested by validating the models on the pre- and post-calibration periods (i.e., 1961–1970 and 2001–2005). To simulate weather sequences, the parameters of the fitted models are constrained using external covariates from the corresponding validation periods. For comparing simulated statistics with observed ones, it is important to assess the uncertainty resulting from missing observations. For this purpose, 39 imputations are carried out using predictors from the respective calibration and validation periods, whereby missing values at gauged and ungauged sites are sampled from their conditional distributions given the available observed data (see Chandler 2014, page 64 for details); these imputations define the 95 % uncertainty interval for the true value. Selected statistics, such as the mean (Mean), standard deviation (Std), lag-1 autocorrelation function (ACF(1)), proportion of wet days (\(P_{W}\)), conditional mean (\(\text{Mean}_{\text{cond}}\)) and conditional standard deviation (\(\text{Std}_{\text{cond}}\)), are computed for each of the resulting imputed data sets. The variability in the resulting statistics is indicative of the historical uncertainty due to missing values. Conditional statistics are computed for precipitation only, based on exceedances of the 0.5 mm threshold. Using the fitted models, 100 realizations are obtained for the calibration and validation periods. In each case, predictors for the first year are used to initialize simulations. Subsequently, the same selected statistics are computed from the simulated sequences and compared with the corresponding observed values. Model performance is first evaluated by region and then by site.
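For concreteness, the summary statistics listed above can be computed from a single complete daily series as in the following Python sketch. The function name and the dictionary keys are hypothetical, sample (n − 1) standard deviations are an assumption, and the handling of missing values by imputation is not reproduced.

```python
import math


def summary_statistics(series, wet_threshold=0.5):
    """Mean, Std, ACF(1), wet-day proportion and wet-day conditional moments."""
    n = len(series)
    mean = sum(series) / n
    ss = sum((v - mean) ** 2 for v in series)
    std = math.sqrt(ss / (n - 1))
    # Lag-1 autocorrelation: lagged cross-products over the total sum of squares.
    lag = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    acf1 = lag / ss if ss > 0 else float("nan")
    wet = [v for v in series if v > wet_threshold]
    p_wet = len(wet) / n
    mean_cond = sum(wet) / len(wet) if wet else float("nan")
    if len(wet) > 1:
        std_cond = math.sqrt(sum((v - mean_cond) ** 2 for v in wet) / (len(wet) - 1))
    else:
        std_cond = float("nan")
    return {"Mean": mean, "Std": std, "ACF1": acf1,
            "PW": p_wet, "MeanCond": mean_cond, "StdCond": std_cond}


# Toy daily precipitation series (mm), for illustration only.
stats = summary_statistics([0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 6.0, 0.0])
```

Applying the same function to each imputed data set and to each simulated realization gives the two sets of values that are compared in the evaluation.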

3.5.1 Additional assessments

It is likely that changes in the seasonal and extreme precipitation characteristics will have important implications for managing regional water resources related projects in the study area (Mladjic et al. 2011; Khaliq et al. 2014). Therefore, in addition to the above mentioned statistics, the ability of the GLMs to reproduce observed distributions of seasonal extremes is also assessed. For this purpose, seasonal maxima (minima) of daily Tmax (Tmin) are derived from observed data as well as from simulated sequences for the calibration and the two validation periods. In like manner, seasonal maxima of daily precipitation amounts are obtained from the observed and simulated data. For example, for the 120 sites for the calibration period (1971–2000), 100 simulations of precipitation per site are made, and then for each season, the maximum value per year is extracted for each simulation and for a given site. This gives 30 seasonal maxima (or minima, for Tmin), one per year, for each simulation. Then, the 95th percentile of these values is computed for each simulation, resulting in one value per simulation. Subsequently, the 95th (5th) percentile of observed precipitation and Tmax (Tmin) extremes is compared with the 100 values of the 95th (5th) percentile obtained from the 100 simulations. It is worth noting that the model performance during the two validation periods has been evaluated using data for 5 and 10 years only. Thus, it is difficult to compare scientifically the simulated distribution of 95th percentiles of precipitation and temperatures with the observed value, since the 95th percentile of a sample of 5 or 10 observations is almost meaningless and difficult to interpret. A more robust approach would be to use long records of data, but in the present case the data are insufficient to carry out such analyses.
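The extraction of seasonal maxima and per-simulation percentiles can be sketched as follows in Python. The percentile is computed here by linear interpolation between order statistics, which is an assumption since the estimator is not specified in the text; the function names and the toy data are also hypothetical.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) by linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def seasonal_maxima(daily_by_year):
    """One maximum per year from the daily values of a given season."""
    return [max(days) for days in daily_by_year]


def simulated_p95(simulations):
    """95th percentile of the yearly seasonal maxima, one value per simulation."""
    return [percentile(seasonal_maxima(sim), 95.0) for sim in simulations]


# Toy example: 2 simulations x 3 years x 4 days of one season (mm).
sims = [
    [[1.0, 5.0, 2.0, 0.0], [0.0, 3.0, 1.0, 0.0], [7.0, 0.0, 0.0, 2.0]],
    [[0.0, 0.0, 4.0, 1.0], [6.0, 2.0, 0.0, 0.0], [1.0, 1.0, 1.0, 9.0]],
]
p95_values = simulated_p95(sims)
```

In the setting described above, each simulation would contribute 30 yearly maxima per site and season, and the observed percentile would be compared against the spread of the 100 resulting values.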

In addition, seasonal values of commonly used climate indices, i.e., mean wet spell length (pwsav), mean dry spell length (pdsav), maximum number of consecutive dry days (pxcdd), maximum number of consecutive wet days (pxcwd), and extreme hot and cold temperature spells (i.e. the 90th percentile heat wave duration, txhw90, and the 10th percentile cold wave duration, tncw10), are investigated. These indices have been selected from a set of 27 different indices suggested by Goodess (2003) in order to develop a set of harmonized indices across the globe. Specifically, for txhw90, let \(Tx_{ij}\) be the daily maximum temperature on day i of period j and let \(Txq90_{inorm}\) be the calendar day 90th percentile calculated for a 5-day window centered on each calendar day during a specified period. Then txhw90 is the maximum number of consecutive days per period for which \(Tx_{ij} > Txq90_{inorm}\). Similarly, for tncw10, let \(Tn_{ij}\) be the daily minimum temperature on day i of period j and let \(Tnq10_{inorm}\) be the calendar day 10th percentile calculated for a 5-day window centered on each calendar day during a specified period. Then tncw10 is the maximum number of consecutive days per period for which \(Tn_{ij} < Tnq10_{inorm}\). Further details on the computation of other indices can be found at http://www.cru.uea.ac.uk/projects/stardex/deis/Diagnostic_tool.pdf.
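The spell-based indices reduce to run-length calculations on daily indicator sequences, as the following Python sketch shows. It follows the definitions given in the text rather than the STARDEX diagnostic software; the helper names are hypothetical, the 0.5 mm wet-day threshold is taken from Sect. 3.5, and the calendar-day percentile thresholds are assumed to have been computed beforehand.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def mean_spell_length(flags):
    """Average length of the runs of consecutive True values."""
    spells, current = [], 0
    for flag in flags:
        if flag:
            current += 1
        elif current:
            spells.append(current)
            current = 0
    if current:
        spells.append(current)
    return sum(spells) / len(spells) if spells else 0.0


def spell_indices(precip, wet_threshold=0.5):
    """pwsav, pdsav, pxcwd and pxcdd for one period of daily precipitation."""
    wet = [p > wet_threshold for p in precip]
    dry = [not w for w in wet]
    return {"pwsav": mean_spell_length(wet), "pdsav": mean_spell_length(dry),
            "pxcwd": longest_run(wet), "pxcdd": longest_run(dry)}


def txhw90(tmax, txq90):
    """Longest run of days with Tmax above its calendar-day 90th percentile."""
    return longest_run([t > q for t, q in zip(tmax, txq90)])


def tncw10(tmin, tnq10):
    """Longest run of days with Tmin below its calendar-day 10th percentile."""
    return longest_run([t < q for t, q in zip(tmin, tnq10)])


indices = spell_indices([0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 3.0, 0.0])
```

For the toy series above, the wet spells have lengths 2 and 1 and the dry spells have lengths 1, 3 and 1, which gives the values stored in `indices`.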

4 Results and discussion

This section contains results of various components of the study, ranging from preliminary diagnostics to model calibration and validation, as well as associated discussions. Though all components are presented and discussed in separate sections, graphical outputs of the validation part of the study are presented alongside the calibration results for ease of comparison.

4.1 Preliminary diagnostics, inferences, and calibration of GLMs

We start by fitting GLMs to precipitation sequences from all 120 sites, by considering the entire study domain as a single region, and then diagnose Pearson residuals, classified by site, month and year, for the presence or absence of unexplained spatiotemporal structures. Following this approach, the spatial distributions of “mean residuals by site” obtained from the amounts and occurrence models for all sites are shown in Fig. 2. In the presence of any systematic regional variations that are not accounted for by the fitted model, the sites with positive mean residuals will tend to cluster together and the same will be the case for negative mean residuals. In Fig. 2, a discernible spatial trend in the pattern of residuals is evident. For example, to the southeast and in western parts of the study domain, clusters of positive-only residuals (unfilled circles) can be seen. Likewise, in the south-central region, groupings of negative-only residuals are evident. Additional results of the residual analysis by month and year for the same amounts and occurrence models (figures are not shown) suggest that a single model for the entire region is not adequate for describing daily precipitation sequences because the pattern of residuals does not satisfy the underlying distributional assumptions. Moreover, it was also noted that most of the selected statistics and inter-variable correlations were not satisfactorily reproduced.

Fig. 2

Bubble map showing spatial distribution of mean Pearson residuals at each site from the fitted precipitation a amounts and b occurrence models. The bubble maps were obtained from the GLMs fitted by considering the entire study domain as a single region. The size of the circle is proportional to the standardized mean residual. Description of the regions A to E corresponding to different colors is provided in Fig. 3

Having gained insights from the results discussed above, GLMs are fitted separately to each of the five pre-defined statistically and climatologically homogeneous partitions/regions of the study area, identified recently in Asong et al. (2015) (Fig. 3). These regions were delineated using principal component and canonical correlation analyses and Fuzzy C-Means clustering of the feature vectors derived from large scale atmospheric covariates and geophysical attributes. The pattern of residuals in Fig. 2 shows some similarity with the geographical extent of these homogeneous regions. Therefore, the rest of the analyses for the precipitation case presented hereafter are based on models fitted separately to each of these regions. Evaluation of the residuals from the fitted models for each region indicated a good fit, when assessed on the basis of 95 % confidence intervals (see supplementary material).

Fig. 3

Statistically and climatologically homogeneous regions (A, B, C, D and E), along with the spatial distribution of respective defuzzified precipitation gauges from Asong et al. (2015)

For the temperature field, Tmin and Tmax are modeled separately considering the entire study domain as one region. Based on the residual plots, distributional features of both Tmin and Tmax are relatively better described by the GLMs compared to the precipitation field when the entire domain is considered as one region. To develop a joint model for precipitation and temperature, we use concurrent and/or lagged precipitation values in each homogeneous region as a covariate to model temperature.

The influence of teleconnections on regional precipitation and temperature patterns is also examined. The PDO and PNA are found to be the dominating teleconnection indices modulating regional and seasonal precipitation patterns. Spatially, the PDO is found to influence significantly precipitation processes in the western and northeastern parts of the study area, while the PNA showed dominance in the southeast (region A in Fig. 3). Temporally, the PDO and PNA are found to have a substantial time-lag for precipitation occurrence and intensity processes for up to 3 years for most parts of the study area. However, a simultaneous response is found between the PDO and variance of Tmin and Tmax. Given that no simultaneous response is found between precipitation and teleconnection indices, it is likely that the atmospheric patterns delivering precipitation over the study region are not closely associated with the atmospheric patterns that control PDO and PNA variations.

4.1.1 Evaluation of spatial dependence and distributional assumptions

The ability of the GLMs to preserve the probability structure of the observed precipitation and temperature fields is assessed through Q–Q plots of standardized Anscombe’s residuals under the fitted amounts models. In addition, the relationship between the observed and modeled inter-site correlations and inter-site distance, calculated from the sites’ latitudes and longitudes, is examined. A powered exponential correlation function with decreasing correlation at large distances (Chandler 2014) is found suitable for modelling inter-site dependence of the conditional precipitation intensity process and of temperature values. Figure 4 shows the fitted correlation models for each region, alongside Q–Q plots of the residuals pooled over all sites in each region. For all regions, the residuals correspond to the theoretical values very well (Fig. 4a). Figure 4b shows observed inter-site correlations, overlain by the fitted models. The exponentially decaying behavior of observed correlations is well described by the assumed theoretical models, so inter-site correlations for all regions are well captured. In Fig. 5, results of the spatial dependence analysis for temperature are shown. The upper row corresponds to Tmin while the lower one shows plots for Tmax. The fitted inter-site correlations are generally in good agreement with those observed and are judged to be satisfactory for additional analyses. However, slight discrepancies at the lower end of the distribution can be noted.
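To make the distance-based dependence check concrete, the following minimal Python sketch computes great-circle distances from site coordinates and fits a powered exponential correlation function to correlation-distance pairs. It is an illustration only: the parameterization rho(d) = exp(-(phi * d)^kappa), the coarse grid search, the function names and the synthetic data are all assumptions and are not taken from the fitting procedure used in this study.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def powered_exponential(d, phi, kappa):
    """Assumed parameterization: rho(d) = exp(-(phi * d) ** kappa)."""
    return math.exp(-((phi * d) ** kappa))

def fit_powered_exponential(dists, corrs, phis, kappas):
    """Coarse grid search minimising the sum of squared errors.

    Returns the (phi, kappa) pair from the candidate grids that best
    matches the supplied correlation-distance pairs.
    """
    best = None
    for phi in phis:
        for kappa in kappas:
            sse = sum((c - powered_exponential(d, phi, kappa)) ** 2
                      for d, c in zip(dists, corrs))
            if best is None or sse < best[0]:
                best = (sse, phi, kappa)
    return best[1], best[2]

# Synthetic, noise-free "observed" correlations (hypothetical values).
dists = [25.0 * k for k in range(1, 41)]            # 25 km to 1000 km
corrs = [powered_exponential(d, 0.004, 0.8) for d in dists]

phis = [0.001 * k for k in range(1, 11)]            # candidate decay rates
kappas = [0.1 * k for k in range(5, 16)]            # candidate powers
phi_hat, kappa_hat = fit_powered_exponential(dists, corrs, phis, kappas)
```

With real data, `dists` would come from `haversine_km` applied to every pair of gauges and `corrs` from the corresponding pairwise residual correlations; a proper optimizer would normally replace the grid search.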

Fig. 4

a Q–Q plots of standardized Anscombe residuals pooled over all sites in each region, for the fitted precipitation amounts model; b Observed inter-site correlations and the fitted correlation model (red line)

Fig. 5

Inter-site correlations (grey dots) that decay exponentially with distance for daily a minimum and b maximum temperatures. Q–Q plots of standardized Anscombe’s residuals from the jointly fitted mean and variance model for daily c minimum and d maximum temperatures, respectively

4.1.2 Simulated characteristics of daily, seasonal and extreme values of precipitation and temperature

Figures 6 and 7 show regionally pooled (i.e., over all sites in a region) simulated values of selected statistics (see Sect. 3.5) of daily precipitation, together with simulated distributions obtained from 39 imputations in lieu of missing observations. In Fig. 6, there is generally a good agreement between the simulated and observed values for each month of the year, with few exceptions. The observed values (i.e., grey) of P W are slightly overestimated for nearly all regions, particularly for the summer months. The performance of the models for P W appears to be sensitive to the choice of the threshold used for defining a wet day, because the values of P W are relatively well reproduced when a 1 mm (instead of 0.5 mm) threshold is used (results are not shown). Though with a wider simulated distribution, the ACF(1) values are also satisfactorily reproduced for all regions and months, except for the month of December, and the behavior of Mean cond is about the same. Based on the analyses performed for other sites (not shown due to space constraints), the GLMs were able to reproduce the systematic regional variations and spatial structures of both mean and extreme weather states at the majority of the 120 sites.
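The sensitivity of P W to the wet-day threshold can be illustrated with a short Python sketch that computes the four statistics for a single daily series. The function names, the `>=` convention for defining a wet day and the toy data are assumptions made for illustration; the exact definitions used in this study are those of Sect. 3.5.

```python
import statistics

def wet_day_stats(precip, threshold=0.5):
    """Return (P_W, conditional mean, conditional std) for a daily series.

    A day counts as wet when precip >= threshold (mm); the choice of >=
    rather than > is an assumption of this sketch.
    """
    wet = [p for p in precip if p >= threshold]
    p_w = len(wet) / len(precip)
    mean_cond = statistics.mean(wet) if wet else 0.0
    std_cond = statistics.stdev(wet) if len(wet) > 1 else 0.0
    return p_w, mean_cond, std_cond

def acf1(series):
    """Lag-1 autocorrelation of a series (0.0 for a constant series)."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - m) * (series[t + 1] - m) for t in range(n - 1))
    return num / denom

# Toy daily precipitation amounts in mm (hypothetical values).
daily = [0.0, 0.6, 0.0, 2.0, 0.8, 0.0, 0.0, 5.0, 0.7, 0.0]

pw_05, mean_05, std_05 = wet_day_stats(daily, threshold=0.5)
pw_10, mean_10, std_10 = wet_day_stats(daily, threshold=1.0)
lag1 = acf1(daily)
```

In this toy series, raising the threshold from 0.5 mm to 1 mm removes the three light-precipitation days from the wet-day count, which lowers P W and raises the conditional mean.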

Fig. 6

Comparison of observed and simulated values of selected statistics—lag-1 autocorrelation function (ACF(1)), proportion of wet days (P W ), conditional mean (Mean cond), and conditional standard deviation (Std cond) of precipitation sequences—for all regions for the calibration period (1971–2000), together with distributions obtained from 39 imputations of observed data. Thick grey band is the 95 % interval for the imputed values. The pink, green and black lines indicate respectively the 2.5th, 50th and 97.5th percentiles, while the blue line represents the minimum and the red line represents the maximum values of the simulated precipitation amounts

Fig. 7

Comparison of observed and simulated values of spring (MAM), summer (JJA), winter (DJF) and autumn (SON) daily precipitation pooled over all sites in a region. Results are shown for three selected regions A, B and C and remaining convention is the same as in Fig. 6

For some applications such as water balance studies, it is important to reproduce observed variations in precipitation totals over monthly or longer time scales. Moreover, simulating the inter-annual variability is particularly important as it indicates that the model is correctly reproducing the predictor–predictand relationships. This feature provides some additional confidence that changes in the predictors under climate-change conditions will produce correct changes in the predictands (Haylock et al. 2006) when these models are used in that context. Figure 7 shows simulated values of seasonal mean daily precipitation and the corresponding observed values for three selected regions A, B and C. Apart from regions A and C, where there is a slight tendency for the model to overestimate the monthly precipitation totals in spring and summer, the GLM framework is able to preserve the observed inter-annual variability. This feature of the GLM framework is also discussed in Chun et al. (2013). For most of the years, the observed precipitation values are found to be within the 2.5th and 97.5th percentiles of the simulated distribution. For observed precipitation, the imputation range (i.e., the thick grey band) also points to substantial uncertainty due to missing values, for example in region B in winter. The behavior of the seasonal mean precipitation for the remaining two regions D and E was about the same as discussed above for region B.
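The statement that observed values fall within the 2.5th and 97.5th percentiles for most years can be quantified as a coverage fraction. The Python sketch below is a minimal illustration under assumed conventions: the linear-interpolation percentile, the function names and the toy numbers are hypothetical and are not part of the evaluation reported here.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def envelope_coverage(observed, simulated, lower=2.5, upper=97.5):
    """Fraction of years whose observed value lies inside the simulated envelope.

    observed  : one value per year
    simulated : list of realizations, each with one value per year
    """
    inside = 0
    for year, obs in enumerate(observed):
        ensemble = [run[year] for run in simulated]
        if percentile(ensemble, lower) <= obs <= percentile(ensemble, upper):
            inside += 1
    return inside / len(observed)

# Toy seasonal mean daily precipitation in mm/day (hypothetical values).
observed = [1.0, 2.0, 10.0]
simulated = [
    [0.5, 1.5, 2.0],
    [0.8, 1.8, 2.5],
    [1.0, 2.0, 3.0],
    [1.2, 2.2, 3.5],
    [1.5, 2.5, 4.0],
]
coverage = envelope_coverage(observed, simulated)
```

Here the third year lies far above every realization, so two of the three years are covered.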

The next assessment is for the probability distribution of monthly precipitation amounts. One way to assess the ability of a model in simulating the probability distribution of observed monthly precipitation totals is by plotting quantiles of simulated and observed amounts. Figure 8a shows Q–Q plots of observed and simulated monthly precipitation totals averaged over the number of sites in each region. These plots indicate a good correspondence between observed and simulated monthly precipitation totals for all regions for the calibration period.
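As a hedged illustration of how such a Q–Q comparison can be assembled, the Python sketch below pairs matching quantiles of observed and simulated monthly totals and reports the largest departure from the 1:1 line. The quantile convention, the function names and the toy totals are assumptions for illustration only.

```python
def quantile(values, p):
    """Linear-interpolation quantile (p in [0, 1]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * p
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def qq_pairs(observed, simulated, n_quantiles=11):
    """Return (observed quantile, simulated quantile) pairs on a common grid."""
    probs = [i / (n_quantiles - 1) for i in range(n_quantiles)]
    return [(quantile(observed, p), quantile(simulated, p)) for p in probs]

def max_departure(pairs):
    """Largest absolute distance of a Q-Q point from the 1:1 line."""
    return max(abs(sim - obs) for obs, sim in pairs)

# Toy monthly precipitation totals in mm (hypothetical values).
observed = [10, 20, 30, 40, 50]
simulated = [12, 22, 32, 42, 52]

pairs = qq_pairs(observed, simulated, n_quantiles=5)
departure = max_departure(pairs)
```

Points close to the 1:1 line across the whole range indicate that the simulated distribution matches the observed one; a constant offset, as in this toy case, would point to a systematic bias.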

Fig. 8

Q–Q plots of observed and simulated monthly precipitation totals (in mm) pooled over the number of sites in each region for the calibration (1971–2000) and two validation (1961–1970 and 2001–2005) periods

Figures 9a and 10a show monthly statistics of Tmax and Tmin, respectively. Unlike precipitation, most statistical properties of both temperature fields are well reproduced by the GLMs, except ACF(1), which is underestimated for summer months. The last two columns in these figures show inter-variable correlations. For Tmax, the correlation between its selected percentiles and those of the precipitation field is not well reproduced, especially in summer. However, this issue can probably be resolved, for example, by including in the model definition interaction terms between precipitation covariates and seasonality. Also, the use of Convective Available Potential Energy (CAPE) and Convective Inhibition (CIN), which play a dominant role in convective precipitation (both its genesis and intensity), as additional external covariates in the GLMs can be explored in future studies. Unlike inter-variable correlations of precipitation and temperature fields, the correlation between Tmax and Tmin is fairly well captured for most months, except the winter months. Compared to the correlation between precipitation and Tmax, the correlation between precipitation and Tmin is satisfactorily reproduced.

Fig. 9

Comparison of observed and simulated values of selected statistics of Tmax, namely the mean, standard deviation (Std), lag-1 autocorrelation function (ACF(1)), and the correlations between maximum temperature and precipitation (cor(Tmax,Precip)) and between minimum and maximum temperatures (cor(Tmax,Tmin)), for the calibration and two validation periods, together with distributions obtained from 39 imputations of observed data. Remaining convention is the same as in Fig. 6

Fig. 10

Same as in Fig. 9 but for Tmin

Figure 11 (left column) shows distributions (i.e. boxplots) of winter and summer seasonal maxima of daily precipitation amounts for Hudson Bay (GG89), Fort McMurray (GG35), Saskatoon (GG77), Edmonton (GGG4) and Medicine Hat (GG20), selected respectively from regions A–E (Table 1, “Appendix”). For each location, the boxplots represent distributions of 95th percentile values derived from 100 simulations, each consisting of one seasonal maximum or minimum per year. The observed values for all locations are well simulated for both seasons. For most cases, the observed value lies within the inter-quartile range of the simulated distribution. For the case of temperature, seasonal maxima of Tmax are evaluated to illustrate simulation of summer extremes, while seasonal minima of Tmin are evaluated to show simulation of winter extremes. The results of this evaluation are shown in Fig. 12 (first column). As for the case of precipitation, it is evident that the GLMs are also able to satisfactorily simulate upper and lower tail behavior of temperature extremes.
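The extreme-value check described above can be sketched in Python as follows: for each realization, take one seasonal maximum per year, summarize these maxima by their 95th percentile, and test whether the observed summary lies within the inter-quartile range of the simulated summaries. The percentile convention, the function names and the synthetic exponential "precipitation" series are assumptions for illustration and do not reproduce the data or the code used in this study.

```python
import random

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def p95_seasonal_max(daily_by_year):
    """95th percentile of the per-year seasonal maxima of one daily series."""
    return percentile([max(days) for days in daily_by_year], 95)

def observed_in_iqr(observed_by_year, simulations):
    """Compare the observed summary with the IQR of the simulated summaries."""
    obs = p95_seasonal_max(observed_by_year)
    sims = [p95_seasonal_max(run) for run in simulations]
    q1, q3 = percentile(sims, 25), percentile(sims, 75)
    return obs, q1, q3, q1 <= obs <= q3

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def fake_season(n_years=30, n_days=90):
    """Synthetic stand-in for one season of daily amounts per year (mm)."""
    return [[rng.expovariate(0.2) for _ in range(n_days)] for _ in range(n_years)]

observed = fake_season()
simulations = [fake_season() for _ in range(100)]  # 100 realizations
obs, q1, q3, inside = observed_in_iqr(observed, simulations)
```

For winter Tmin extremes, the same structure would apply with `min` in place of `max`.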

Fig. 11

Evaluation of GLM performance for simulating a winter and b summer extremes of precipitation amounts for the calibration (1971–2000) and two validation (1961–1970 and 2001–2005) periods for Hudson Bay, Fort McMurray, Saskatoon, Edmonton, and Medicine Hat, located respectively in each of the five homogeneous regions A–E. In each boxplot, the box corresponds to the interquartile range, the line in the middle of the box to the median value and the whiskers to either the maximum or minimum value of the simulated distribution. Red dots indicate observed values. Boxplots represent distributions of 95th percentile values derived from 100 simulations, each consisting of one seasonal maximum per year

Fig. 12

Evaluation of GLM performance for simulating extreme values of a winter Tmin and b summer Tmax for the calibration (1971–2000) and two validation (1961–1970 and 2001–2005) periods for Hudson Bay, Fort McMurray, Saskatoon, Edmonton, and Medicine Hat, located respectively in each of the five homogeneous regions A–E. Boxplots represent distributions of 95th percentile values derived from 100 simulations, each consisting of one seasonal maximum per year in the case of Tmax and one seasonal minimum per year in the case of Tmin. Remaining convention is the same as in Fig. 11

For evaluating simulations of selected climate indices (see Sect. 3.5.1), we concentrated on the same selected locations as for the precipitation and temperature extremes presented above. Detailed graphical results are omitted for the calibration period, but a summary of the observations made is presented below. For temperature related indices (i.e. tncw10 and txhw90), the GLMs simulate the observed median values well for both winter and summer, given that the observed values were within the inter-quartile range of the simulated distribution for most of the cases. For precipitation related indices, a comparison of observed and simulated frequency-based indices (pxcdd; pxcwd) and mean length of wet/dry spells (pwsav; pdsav) suggested satisfactory performance of the GLMs. In general, GLMs performed relatively better in summer than in winter. Overall, temperature related indices were better reproduced than the precipitation related indices.
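For readers unfamiliar with spell-based indices, the Python sketch below computes them from a single daily series. It assumes that pxcdd and pxcwd denote the maximum number of consecutive dry and wet days, that pwsav and pdsav denote mean wet and dry spell lengths, and that a wet day has at least 0.5 mm of precipitation; the authoritative definitions are those of Sect. 3.5.1, and the function names and toy data are hypothetical.

```python
def spell_lengths(precip, threshold=0.5, wet=True):
    """Lengths of consecutive wet (wet=True) or dry (wet=False) runs."""
    lengths, run = [], 0
    for p in precip:
        is_wet = p >= threshold
        if is_wet == wet:
            run += 1          # current day extends the spell
        elif run:
            lengths.append(run)  # spell just ended
            run = 0
    if run:
        lengths.append(run)      # spell still open at the end of the series
    return lengths

def spell_indices(precip, threshold=0.5):
    """Spell-based indices under the assumed definitions stated above."""
    wet = spell_lengths(precip, threshold, wet=True)
    dry = spell_lengths(precip, threshold, wet=False)
    return {
        "pxcwd": max(wet, default=0),
        "pxcdd": max(dry, default=0),
        "pwsav": sum(wet) / len(wet) if wet else 0.0,
        "pdsav": sum(dry) / len(dry) if dry else 0.0,
    }

# Toy daily precipitation amounts in mm (hypothetical values).
season = [0.0, 0.0, 1.2, 3.4, 0.0, 0.6, 0.0, 0.0, 0.0, 2.0]
indices = spell_indices(season)
```

Applying `spell_indices` to each of the 100 realizations and to the observed series would give the simulated distribution and the observed value compared in the boxplots.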

4.2 Validation of GLMs

The calibrated models are evaluated by generating 100 realizations of daily precipitation and temperature fields for the pre- and post-calibration periods (i.e. 1961–1970 and 2001–2005). In summary, for both validation periods, ACF(1), Mean cond, and Std cond of simulated precipitation sequences are satisfactorily reproduced except P W , which is overestimated by the models. Figures are omitted as the results were very similar to those for the calibration period. Figure 8b, c shows a comparison of observed and simulated monthly precipitation totals. On the regional level, monthly precipitation totals are well reproduced for most of the regions except region E, where the observed values are underestimated for the 2001–2005 period (Fig. 8c). For this region, which corresponds to the Rocky Mountains, the spatial structure of precipitation is probably more complex than in other regions of the study area. Jiang (2003) noted that modelling of precipitation in mountainous areas is particularly challenging because of the multiscale nature of the complex terrain, interactions between terrain and airflow, the complex role of latent heating/cooling, and the complexity of cloud physics. The results of comparison of observed and simulated statistics of Tmax and Tmin are shown in Figs. 9b, c and 10b, c, respectively. These results suggest that the model performance is very similar to that discussed for the calibration period.

Next, the evaluation of the GLM framework in reproducing seasonal precipitation and temperature extremes is discussed. For the case of precipitation extremes, it is evident from columns two and three of Fig. 11 that the winter and summer extremes are satisfactorily reproduced. The results for temperature related extremes are shown in Fig. 12 (columns two and three) for winter and summer seasons. Again, the GLMs are able to capture both lower and upper tail behavior of the observed distributions for the two validation periods. As noted for the calibration case, temperature related extremes are better reproduced than the precipitation related extremes.

Figure 13 shows boxplots of selected climate indices for winter and summer seasons for the 1961–1970 period only. For each location, the boxplot represents the distribution of an index derived from 100 realizations. For temperature related indices (i.e. tncw10 and txhw90), observed values lie within the inter-quartile range of the simulated values for most cases. For the case of precipitation, frequency-based indices (i.e. pxcdd and pxcwd) and mean length of wet/dry spells (pwsav; pdsav) are satisfactorily captured in both seasons. The performance of the GLMs is generally better in summer than in winter. Similar results were obtained for the other validation period (2001–2005) for most of the regions, except region E, as discussed above. A probable explanation for these discrepancies is that this region experienced severe drought conditions in the 2001–2005 period compared to the 1971–2000 period used for calibration. This study is, most probably, the first attempt at multisite multivariate downscaling of GCM outputs to station scale in this region of Canada. Apart from exploring the significance of additional terms such as interactions in the model definition, we recommend the use of other reanalysis products (e.g., ERA40, ERA-Interim) in model fitting to improve the tools for downscaling of climate extremes in this region of Canada. Also, other downscaling models should be tested to see whether the historical extremes (e.g. droughts), which are important to inform climate change and adaptation strategies, are better reproduced.

Fig. 13

Evaluation of GLM performance for simulating climate indices in a winter and b summer for the 1961–1970 validation period for Hudson Bay, Fort McMurray, Saskatoon, Edmonton, and Medicine Hat, located respectively in each of the five homogeneous regions A–E. The remaining convention is the same as in Fig. 11

5 Summary and conclusions

The main goal of this study is to explore the suitability of GLMs for modelling multisite precipitation and temperature sequences in the Canadian Prairie Provinces using large-scale atmospheric fields from NCEP reanalysis-I and the PDO and PNA as exogenous covariates. The logistic regression approach is used to model precipitation occurrences, while the two-parameter gamma distribution is used to model precipitation amounts. A jointly fitted model comprising the mean and dispersion components is used to model daily minimum and maximum temperatures. The suitability of the fitted GLMs for characterizing precipitation and temperature fields in terms of (a) their mean values at the daily, monthly and seasonal scales, (b) characteristics of extreme values, (c) inter-variable relationships and (d) selected climate indices is investigated using independent observations from pre- and post-calibration periods. The following conclusions can be drawn from the various analyses presented:

  1. Based on residual analysis, it is found that a single model for precipitation sequences could not be justified for the entire study area. Therefore, separate models are developed on the basis of five pre-defined homogeneous regions covering the study area. Following this approach, residual plots for each region show significant improvement in the performance of the fitted GLMs.

  2. For both calibration and validation periods, there is generally good agreement between the simulated and observed values of various precipitation and temperature characteristics for each month of the year. Most of the statistical features are generally well reproduced, except the proportion of wet days, which is slightly overestimated. The observed characteristics lie generally within the 2.5th and 97.5th percentiles of the simulated values. The uncertainty bands due to missing observed values are found to be quite large, especially for the winter season. In general, the simulated values of precipitation characteristics are more variable than those of temperature fields.

  3. The fitted GLMs are able to capture spatial and inter-variable dependence structure. Distance-based inter-site correlations are well reproduced by the GLMs. The temporal correlations between precipitation and Tmin are well captured by the models, whereas the temporal dependence between summer precipitation and Tmax is generally underestimated. This issue can probably be resolved by including in the model definition interaction terms between precipitation covariates and seasonality.

  4. The fitted models are also assessed for robustness in terms of their ability to reproduce characteristics of extreme events and some of the commonly used climate indices. The observed values for most indices and the 95th percentiles of the winter and summer seasonal extremes fall mostly within the inter-quartile range of the simulated values, with generally better model performance in summer than in winter. Overall, hot and cold temperature related indices and characteristics of temperature extremes are better reproduced than the precipitation related indices and characteristics of extreme precipitation amounts.

Finally, it can be concluded that, apart from a few limitations (such as overestimation of the proportion of wet days), the GLM framework has the potential for multisite multivariate modelling of daily precipitation and temperature fields. This framework is able to describe mean and extreme climate characteristics satisfactorily using NCEP reanalysis-I predictors and teleconnection indices. So far, we have not come across any plausible weather generator that can reasonably be applied to a region as large and clearly inhomogeneous as the one studied in this paper. The next phase of this study is to use the fitted models for downscaling climate projections from state-of-the-art GCMs participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5) of the Intergovernmental Panel on Climate Change. Such analyses will furnish additional opportunities for further evaluation of the GLM framework, in particular, the validity of the key assumptions of statistical downscaling, including temporal invariance, discussed in Wilby et al. (2004).