Improvement of multiple linear regression method for statistical downscaling of monthly precipitation

Pahlavan, H. A.; Zahraie, B.; Nasseri, M.; Mahdipour Varnousfaderani, A.

doi:10.1007/s13762-017-1511-z

Original Paper
Published: 05 September 2017

Volume 15, pages 1897–1912, (2018)

International Journal of Environmental Science and Technology

H. A. Pahlavan ORCID: orcid.org/0000-0003-3964-6223¹,
B. Zahraie²,
M. Nasseri^1,3 &
…
A. Mahdipour Varnousfaderani¹

Abstract

This article aims at proposing an improved statistical model for statistical downscaling of monthly precipitation using multiple linear regression (MLR). The proposed model, namely Monthly Statistical DownScaling Model (MSDSM), has been developed based on the general structure of Statistical DownScaling Model (SDSM). In order to improve the performance of the model, some statistical modifications have been incorporated including bias correction using variance correction factor (VCF) to improve the computed variance pattern. We illustrate the effectiveness of the proposed model through its application to 288 rain gauge stations scattered in different climatic zones of Iran. Comparison between the results of SDSM and the proposed MSDSM has indicated superiority of the proposed model in reproducing long-term mean and variance of monthly precipitation. We found that the weakness of MLR method in estimating variance has been considerably improved by applying VCF. We showed that the proposed model provides a promising alternative for statistical downscaling of precipitation at monthly time scale. An investigation of the effects of climate change in different climatic zones of Iran by the use of Representative Concentration Pathways (RCPs) has shown that the most significant change is an increase in precipitation in fall and that the largest share of this increase belongs to arid climate.

Introduction

The coarse spatiotemporal resolution of general circulation models (GCMs) is one of the major problems in using their outputs to assess the impacts of climate change on water resources systems (Willems and Vrac 2011). In order to extract regional-scale meteorological variables from GCM outputs, downscaling methods have been widely implemented in climate change studies. Statistical downscaling methods, which establish statistical relationships between large-scale atmospheric information and local-scale meteorological data, have been used by many researchers mainly because of the limited data they need and their computational simplicity (Wilby et al. 2002, Olsson et al. 2004, Hessami et al. 2008, Raje and Mujumdar 2011).

Among statistical downscaling techniques, regression-based methods have received more attention in recent years because they are computationally less demanding, simple to apply, and statistically efficient (Semenov et al. 1998, Dibike and Coulibaly 2005, Hashmi et al. 2011). Regression-based methods correlate the regional-scale state of the atmospheric variables (predictors) to the local-scale meteorological variables (predictands) such as precipitation, temperature, and streamflow. The statistical models can be calibrated and validated based on historical data, assuming that the transfer from the predictors to the predictands does not significantly change even under altered climatic conditions (Willems et al. 2012).

Current literature on the development and application of statistical downscaling mostly focuses on daily or (rarely) sub-daily time scales (Willems and Vrac 2011), whereas in some areas particularly in arid regions, daily or sub-daily information might not be accessible (Prudhomme et al. 2002, Sachindra et al. 2014b). Furthermore, many of the monthly downscaling methods introduced in the literature are focused on spatial downscaling of the climate model outputs to relatively coarse scales (predominantly to 1/8-degree spatial resolution), not providing any information at finer spatial scales (Prudhomme et al. 2002, Wood et al. 2004, Tripathi et al. 2006, Anandhi et al. 2008, Maurer and Hidalgo 2008, Ojha et al. 2010, Goyal et al. 2011, Hashmi et al. 2013).

Applications of data mining and regression analysis techniques such as artificial neural networks (ANNs) (Fistikoglu and Okkan 2010), support vector machine (SVM) (Najafi et al. 2011), adaptive-network-based fuzzy inference system (ANFIS) (Najafi et al. 2011), and multiple linear regression (MLR) (Huth and Kyselý 2000, Hellstrom et al. 2001, Najafi et al. 2011, Goly et al. 2014, Sachindra et al. 2014a) have been reported in the literature for statistical downscaling of precipitation and temperature. Most of the proposed methods have shown significant weakness in reproduction of monthly precipitation variance values (Hessami et al. 2008, Nasseri et al. 2013, Tavakol-Davani et al. 2013).

Among various statistical methods that use MLR, the Statistical DownScaling Model (SDSM) has been widely used in studies focusing on climate change impact assessment. MLR methods in general, and SDSM in particular, usually estimate the mean of local meteorological predictands accurately, but their performance in estimating variance and extreme values is sometimes substantially weaker (Wilby et al. 2004, Hessami et al. 2008). Huth and Kyselý (2000) used MLR in downscaling of monthly precipitation and temperature. Their model showed low skill levels for downscaling monthly precipitation. Goly et al. (2014) compared different statistical downscaling models that use MLR, including positive coefficient regression (PCR), stepwise regression (SR), and support vector machine (SVM) techniques, for estimating monthly precipitation amounts. They found that the models are able to preserve monthly mean values but not the variances. The models they tested failed to downscale highly variable monthly rainfalls in the wet season. Sachindra et al. (2014a) developed two statistical models based on the MLR method using two sets of regenerated data by National Center for Environmental Prediction (NCEP) and HadCM3 models for downscaling monthly precipitation. They found that both models tend to underestimate the high monthly precipitation values.

Considering the small number of monthly data that aggravate the limited ability of MLR methods in reproducing variance of predictands, in this study, we utilize the main structure of SDSM as the platform to develop a Monthly Statistical DownScaling Model (MSDSM). We propose application of variance correction factor (VCF) in MSDSM for increasing accuracy of variance estimation in the monthly statistical downscaling of precipitation. We evaluate the performance of MSDSM through comparing its results in downscaling precipitation in 288 rain gauge stations scattered in different climatic zones of Iran with SDSM results. In this paper and for the first time, the future variations of projected precipitation under different RCP scenarios over a large part of Iran have been estimated and presented.

"Materials and methods" section of this paper describes the local- and large-scale datasets utilized in this study. "Methodology" section gives an overview of the proposed methodology for downscaling. "Results and discussion" section thereafter presents the results of the case study, and finally, a set of concluding remarks is presented in "Conclusion" section.

Materials and methods

Local dataset

Iran is located in the southwest of Asia, with a complex orography and a wide latitudinal extent between 25°N and 40°N, resulting in high variability of precipitation in both space and time. The north and west of Iran are surrounded by the Alborz and Zagros Mountains, respectively, which play a key role in triggering precipitation on their windward sides and act as barriers to moisture transfer to the arid and semiarid regions of central and eastern Iran. The southern coast of the Caspian Sea and the northwestern coast of the Persian Gulf are considered complex climatic areas because they lie between the sea and high mountains. The elevations range from 32 m below sea level up to 5600 m, with a national average of 1200 m (Fig. 1a).

Low precipitation and its severe fluctuations at the daily, seasonal, and annual time scales are intrinsic characteristics of Iran's climates (Khalili et al. 2016). The annual rainfall varies from 1800 mm in the north to <100 mm in the central and eastern arid regions of Iran. Based on the modified de Martonne climate classification, there are three different climatic zones in Iran: humid, mediterranean, and arid (Fig. 1b).

Two hundred and eighty-eight rain gauge stations scattered in different climatic zones of Iran have been used to evaluate the performance and the applicability of the proposed downscaling method. Figure 1b shows the locations of the rain gauge stations. As can be seen, the selected rain gauge stations are scattered mostly in the western, central, and northern parts of the country, which are characterized by arid to humid climates. The number of stations in each of the three climatic zones and the average of their statistical characteristics are presented in Table 1. The Iran Water Resources Management holding company of the Ministry of Energy provided the daily precipitation records utilized in this study. Time series of observed precipitation in the rain gauge stations are considered as the predictand in this study.

Table 1 Basic characteristics of the rain gauge stations used in this study

Selection of the rain gauge stations has been done based on the availability of observed daily precipitation (containing <30% missing values) and passing various homogeneity tests including the Standard Normal Homogeneity Test (SNHT), Pettitt test (PT), Buishand range (BR), and runs tests (RTs). Outliers of precipitation time series have been detected based on their distance from the average of the series and either replaced by a threshold (Barnett and Lewis 1974) or removed and treated as missing data, using a methodology developed by Dixon (1953). After outlier elimination, homogeneity of the time series has been checked. If the time series of observed precipitation in a station has been classified as non-homogeneous by none or only one of the tests and as homogeneous by the other tests at the significance level of 5%, it has been classified as a homogeneous dataset and has been used in this study.

The reference period (1971–2005) has been partitioned into two sets: we use the first 70% of the data (1971–1995) to calibrate the downscaling models and the rest of the data (1996–2005) for validation.

Large-scale datasets

In this paper, we use the output from the second-generation Canadian Earth System Model (CanESM2). Canadian Centre for Climate Modeling and Analysis (CCCma) developed CanESM2 within the framework of CMIP5 (Coupled Model Intercomparison Project Phase 5), which contributed results to the Fifth Assessment Report (AR5) of the IPCC. In CMIP5, the historical run is forced by observed atmospheric composition changes reflecting both anthropogenic and natural sources, and the projections of climate change are forced with specified emission scenarios or concentrations referred to as Representative Concentration Pathways (RCPs) (Taylor et al. 2012).

For CMIP5, four RCPs represent a range of projections of future population growth, technological advancements, and societal responses. The labels for the RCPs provide an estimate of the radiative forcing in the year 2100 relative to preindustrial conditions (Taylor et al. 2012).

The datasets of CanESM2 are extracted for 59 grid points over and around Iran for the period of 1971–2100. These grids are uniformly distributed with horizontal resolution of roughly 2.8125°. CanESM2 includes a fourth-generation atmospheric general circulation model (CanCM4), a physical ocean component (OGCM4), the Canadian Model of Ocean Carbon (CMOC), and a process-based dynamic vegetation model, known as the Canada Terrestrial Ecosystem Model (CTEM) (Arora and Boer 2010, 2014).

We used CanESM2 model projections for RCPs 2.6, 4.5, and 8.5 to project the future climate conditions (2006–2100), while 35 years (1971–2005) of reanalysis data from National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) were used as the model predictors in the reference period (Kalnay et al. 1996). These daily datasets were extracted from the Canadian Climate Data and Scenarios (CCDS) website (http://ccds-dscc.ec.gc.ca). In order to make a fair comparison between the results of the proposed monthly model (MSDSM) and the daily model (SDSM), these daily datasets were converted into monthly data and used in the monthly model. These datasets contain twenty-six different atmospheric variables as listed in Table 2.
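The conversion from daily to monthly series can be sketched in a few lines of Python. This is only an illustration: the function name `monthly_aggregate` is hypothetical, and the choice of summing precipitation while averaging other variables is an assumption, since the text does not state which aggregation was applied to each variable.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_aggregate(daily, how):
    """Aggregate (date, value) pairs into {(year, month): value}.

    how="sum" adds the daily values (assumed here for precipitation);
    how="mean" averages them (assumed here for atmospheric predictors).
    """
    groups = defaultdict(list)
    for day, value in daily:
        groups[(day.year, day.month)].append(value)
    if how == "sum":
        return {key: sum(vals) for key, vals in sorted(groups.items())}
    if how == "mean":
        return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}
    raise ValueError("how must be 'sum' or 'mean'")


# Toy series: 1.0 mm on each of the first 59 days of 1971 (January and February).
start = date(1971, 1, 1)
daily = [(start + timedelta(days=i), 1.0) for i in range(59)]
monthly_total = monthly_aggregate(daily, "sum")
monthly_mean = monthly_aggregate(daily, "mean")
```

With this toy input, the monthly totals equal the number of days in each month, which is an easy way to check the grouping.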

Table 2 NCEP predictors

Methodology

Statistical Downscaling Model (SDSM)

The proposed model, MSDSM, has been developed based on the existing SDSM model in the MATLAB environment; thus, we first describe the platform of SDSM in this section of the paper.

Multiple linear regression downscaling model (MLRDM) is the mathematical base of SDSM software (Wilby et al. 2002). SDSM outputs are the average of several weather ensembles that are the results of applying linear regression models with stochastic terms of bias correction. Because of the linear structure of SDSM, predictor selection is usually handled through linear (and/or partial) correlation analysis between a predictand and predictors. The weights of the predictors are calculated via ordinary least squares or dual simplex methods.

SDSM contains two separate sub-models: one to determine the occurrence of conditional meteorological variables (discrete variables) including precipitation, and the other to estimate the amount of conditional and/or unconditional variables (continuous variables) such as temperature or evaporation. A comprehensive description of SDSM and its modules can be found in Wilby et al. (1999, 2002). The general platform of SDSM is shown in Fig. 2. The major steps when utilizing SDSM are as follows:

1. Feature selection: SDSM provides statistical analysis tools for selecting the best predictor(s). In SDSM, predictors should have acceptable unconditional and conditional correlations with the predictand. In addition, partial correlation, P value and explained variance of the predictors can be checked while using SDSM.
2. Amount and occurrence modeling: MLR is utilized to simulate the occurrence or estimate the amounts of climatic variables. This model can be calibrated by two different methods, namely ordinary least square and dual simplex methods. An autoregressive term can also be added to this model. For conditional meteorological variables such as precipitation, for each day in an ensemble, a uniformly distributed random number between 0 and 1 is generated. If the generated random number is less than the output of the occurrence model in that day, the event occurs. Otherwise, it does not occur. Utilizing SDSM, different conditional or unconditional models can be calibrated for each of the 12 months of the year.
3. Bias correction and variance inflation: In SDSM, bias correction (b in Eq. (1)) and variance inflation factor (VIF in Eq. (2)) actions can be applied on the results of each of the monthly models to achieve acceptable ensemble results both in the calibration and in the validation periods (Hessami et al. 2008):

$${\text{b}} = \text{Mean}_{{\text{obs}}} - \text{Mean}_{{\text{mod}}}$$

(1)

$$\text{VIF} = \frac{12\left( \text{Var}_{\text{obs}} - \text{Var}_{\text{mod}} \right)}{\text{Ste}^{2}}$$

(2)

where Mean_obs and Mean_mod are the mean values of the observed and modeled predictand, respectively. Var_obs and Var_mod are the variances of observed and modeled predictands for the calibration period and Ste is the standard error in the same period. b − 1 is added to the amount of predictand in each day, and √(VIF/12) is multiplied to the standard deviation of modeling error. While the downscaling model is calibrated using NCEP dataset, in estimating VIF and bias correction, variables with the subscript mod are estimated using downscaling model outputs based on GCM simulations. This approach allows the modeler to take into account the bias of GCM results in the downscaling process (Nasseri et al. 2013, Tavakol-Davani et al. 2013).

Finally, in order to achieve a single downscaled time series from all projected ensembles, their arithmetic mean is calculated. In this study, we use SDSM reproduced in MATLAB environment (Nasseri et al. 2013, Tavakol-Davani et al. 2013).
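The occurrence rule, Eqs. (1) and (2), and the ensemble averaging described above can be illustrated with a short Python sketch. It is not the SDSM or MATLAB code used in the study: the function names, the toy numbers, and the use of the sample variance (n − 1 denominator) are assumptions made for illustration.

```python
import random
import statistics


def simulate_occurrence(wet_probabilities, rng):
    """One ensemble member: the event occurs on a day when a uniform
    random number in [0, 1) is less than the occurrence-model output."""
    return [rng.random() < p for p in wet_probabilities]


def bias_correction(observed, modeled):
    """Eq. (1): b = Mean_obs - Mean_mod."""
    return statistics.fmean(observed) - statistics.fmean(modeled)


def variance_inflation_factor(observed, modeled, standard_error):
    """Eq. (2): VIF = 12 (Var_obs - Var_mod) / Ste^2.
    Assumption: sample variances are used."""
    var_obs = statistics.variance(observed)
    var_mod = statistics.variance(modeled)
    return 12.0 * (var_obs - var_mod) / standard_error ** 2


def ensemble_mean(ensembles):
    """Arithmetic mean across ensemble members, time step by time step."""
    return [statistics.fmean(values) for values in zip(*ensembles)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
occurrence = simulate_occurrence([0.0, 1.0, 0.3], rng)

observed = [2.0, 4.0, 6.0, 8.0]  # toy predictand values
modeled = [3.0, 4.0, 5.0, 6.0]   # toy model output
b = bias_correction(observed, modeled)
vif = variance_inflation_factor(observed, modeled, standard_error=2.0)
single_series = ensemble_mean([[1.0, 2.0], [3.0, 4.0]])
```

A day with an occurrence-model output of 0 never produces an event and a day with an output of 1 always does, which matches the rule stated in step 2.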

Proposed methodology for MSDSM application

We modified SDSM platform to make it applicable to monthly time scale and added some statistical post-processing tools, which are introduced in the following sections of the paper.

Variance correction factor (VCF)

Downscaling methods developed based on MLR have a tendency to underestimate variance (Fowler et al. 2007). The results of the previous studies have shown that even by applying VIF, SDSM performance in reproducing long-term variance can still be poor (Hessami et al. 2008, Hashmi et al. 2011). Therefore, in order to improve the consistency between variance of the model results and the observed records of predictand, we used variance correction factor (VCF) inspired by the work of Chen et al. (2013). VCF is a type of bias correction approach in which it is assumed that the model biases stay constant over time, and the relationship between the probability distributions of the observed and modeled predictands is the same for the reference and the future periods.

When applying VCF, in order to calculate the relationship between the probability distributions of the observed and modeled predictands, values are compared based on the exceedance probabilities. In other words, variance correction factors are obtained per month after sorting the observed monthly predictand, and the model results in the reference period in descending order:

$$\varphi \left( p \right) = \frac{Q_{\text{obs}} \left( p \right)}{Q_{\text{mod}} \left( p \right)}$$

(3)

where φ(p), Q_obs(p), and Q_mod(p) are the VCF and the observed and modeled monthly predictand values, respectively, all associated with the exceedance probability p. The value of p is calculated with the Weibull plotting position formula as follows:

$$p\left( k \right) = \frac{k}{n + 1}$$

(4)

where k is the rank of each value after sorting the data in descending order and n is the sample size. The procedure is the same in order to apply the obtained variance correction factors to the modeled distribution in the future period. That is, after calculating exceedance probability for monthly predictand in the future period, the corresponding factors are applied to them.
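As a minimal sketch of Eqs. (3) and (4), the following Python code computes rank-wise correction factors for one calendar month and applies them to a modeled series of the same length. The function names and toy values are hypothetical, and the neutral factor of 1.0 for a zero modeled value is an assumption; the text does not say how that case is handled.

```python
def exceedance_probabilities(n):
    """Eq. (4): p(k) = k / (n + 1) for ranks k = 1..n (descending order)."""
    return [k / (n + 1) for k in range(1, n + 1)]


def variance_correction_factors(observed, modeled):
    """Eq. (3): phi(p) = Q_obs(p) / Q_mod(p), rank by rank, after sorting
    both series in descending order."""
    if len(observed) != len(modeled):
        raise ValueError("series must have the same length")
    q_obs = sorted(observed, reverse=True)
    q_mod = sorted(modeled, reverse=True)
    # Assumption: a zero modeled value receives a neutral factor of 1.0.
    return [o / m if m != 0 else 1.0 for o, m in zip(q_obs, q_mod)]


def apply_factors(modeled, factors):
    """Multiply each modeled value by the factor of its own rank,
    keeping the original time order of the series."""
    order = sorted(range(len(modeled)), key=lambda i: modeled[i], reverse=True)
    corrected = [0.0] * len(modeled)
    for rank, index in enumerate(order):
        corrected[index] = modeled[index] * factors[rank]
    return corrected


# Toy values for one calendar month over four years.
observed = [10.0, 40.0, 20.0, 80.0]
modeled = [30.0, 25.0, 35.0, 20.0]
probabilities = exceedance_probabilities(len(observed))
factors = variance_correction_factors(observed, modeled)
corrected = apply_factors(modeled, factors)
```

Applied to the reference period itself, the corrected series reproduces the observed distribution exactly while keeping the modeled ordering in time, which is why the variance matches after correction.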

When the lengths of the reference and future periods are different, the factors estimated for the closest empirical exceedance probability are applied to the future predictions. Suppose that the reference and future periods contain n_r and n_f values in a month, respectively. The VCF for any value in the future period is then the one whose reference rank k_r (an integer in the range 1 ≤ k_r ≤ n_r) minimizes the following expression:

$$z = \left| {\frac{{k_{r} }}{{n_{r} + 1}} - \frac{{k_{f} }}{{n_{f} + 1}}} \right|$$

(5)

where k_f is the rank number of the future monthly predictand.
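The rank matching of Eq. (5) can be sketched as follows. The function name is hypothetical, and breaking ties in favor of the smaller reference rank is an assumption. The example uses 35 reference values and 95 future values per calendar month, matching the lengths of the 1971–2005 and 2006–2100 periods.

```python
def closest_reference_rank(k_f, n_f, n_r):
    """Return the reference rank k_r in 1..n_r that minimizes Eq. (5):
    z = |k_r / (n_r + 1) - k_f / (n_f + 1)|.
    Assumption: ties go to the smaller rank (min keeps the first minimum)."""
    p_future = k_f / (n_f + 1)
    return min(range(1, n_r + 1),
               key=lambda k_r: abs(k_r / (n_r + 1) - p_future))


n_r, n_f = 35, 95
rank_for_largest = closest_reference_rank(1, n_f, n_r)
rank_for_median = closest_reference_rank(48, n_f, n_r)
rank_for_smallest = closest_reference_rank(95, n_f, n_r)
```

The factor applied to a future value of rank k_f is then the reference-period factor stored at the returned rank.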

k-fold cross-validation

The historical records of predictands are limited in many meteorological stations. Considering the low number of available monthly records compared with daily records, it is not reasonable to set aside a part of dataset in order to perform validation. To address this issue, k-fold cross-validation technique has been utilized in this study for application of MSDSM.

A typical validation is a statistical method for evaluating the performance of a model by dividing the dataset into two mutually exclusive subsets, one of which is used to calibrate the model while the other is used to validate it (Kohavi 1995, Refaeilzadeh et al. 2009). In typical cross-validation, the calibration and validation subsets must cross over in successive rounds.

One of the popular approaches of cross-validation is k-fold approach in which the dataset is randomly split into k segments (the folds) of approximately equal size. Subsequently, the model is calibrated and validated k times such that each time a different fold of the data is left out for validation, whereas the remaining k − 1 folds are used for calibration; thus, each of the k segments is used exactly once as the validation dataset. In the proposed model, k value is optional and can be set between two and the number of years of available data records. When k is equal to the number of years of observation, the k-fold cross-validation is exactly the same as leave-one-out cross-validation, which has been widely used when limited data are available (Refaeilzadeh et al. 2009).
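A minimal Python sketch of this splitting scheme is given below. Treating whole years as the units that are assigned to folds is an assumption suggested by the stated upper limit on k; the function name and the fixed seed are likewise illustrative.

```python
import random


def k_fold_splits(years, k, seed=0):
    """Yield (calibration_years, validation_years) pairs for k-fold
    cross-validation. Years are shuffled and dealt into k folds of
    approximately equal size; each fold is left out exactly once."""
    if not 2 <= k <= len(years):
        raise ValueError("k must be between 2 and the number of years")
    shuffled = list(years)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        calibration = [year for j, fold in enumerate(folds) if j != i
                       for year in fold]
        yield sorted(calibration), sorted(validation)


years = list(range(1971, 2006))  # the 35-year reference period
five_fold = list(k_fold_splits(years, k=5))
leave_one_out = list(k_fold_splits(years, k=len(years)))
```

Setting k to the number of years gives one validation year per round, which is the leave-one-out case described above.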

Selection of the predictors

Three main branches of feature selection (or selection of predictors, here in this manuscript) are embedded-, wrapper-, and filter-based approaches (Tan et al. 2006). The wrapper- and filter-based methods are the most well-known procedures in the realm of feature selection (Guyon and Elisseeff 2003). To select the suitable subsets of the probable inputs, wrapper methods evaluate the model performance for nearly all possible subsets of input variables based on their calibration performance (Liu and Yu 2005).

Filter-based techniques are model-free approaches that utilize statistical indicators to find the existing dependencies between the probable input and output variables. The linear correlation coefficient is a popular criterion for measuring dependencies in these techniques. It has been shown that effectiveness of linear correlation coefficient in detecting the relationship between predictors and predictand is mostly linked to the interaction of noise and data transformation during the procedure of feature selection, so it is not recommended for feature selection in real nonlinear systems (Battiti 1994). Mutual Information (MI) index is another filter-based method for feature selection. It is a dimensionless statistical indicator and describes the reduction in amount of uncertainty in estimation of one parameter when another is available (Liu et al. 2009). This statistical indicator is a robust and nonlinear approach and recently has been found to be an appropriate statistical criterion in feature or predictor selection problems in hydrology (Füssel et al. 2003, Bowden et al. 2005a, b, May et al. 2008a, b, Jeong et al. 2012, Nasseri et al. 2013, Fu et al. 2016). Achieving the best subset of input predictors in downscaling problems is complicated and challenging because of the large number of meteorological predictors while considering the interactions of model parameters and its structure. In this study, we selected the filter-based feature selection approach using MI indicator for choosing the best predictors for the proposed downscaling model.
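To make the MI criterion concrete, the sketch below estimates mutual information from a joint histogram. The paper does not state which estimator was used, so the equal-width binning, the number of bins, and the natural-logarithm units are all assumptions, and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Histogram estimate of I(X; Y) in nats using equal-width bins.
    Assumption: this simple estimator stands in for whichever MI
    estimator was actually used."""
    def discretize(values):
        low, high = min(values), max(values)
        width = (high - low) / bins or 1.0  # constant series: one bin
        return [min(int((v - low) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) p(j))).
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# A predictor identical to the predictand carries maximal information.
series = [float(v) for v in range(8)]
mi_identical = mutual_information(series, series, bins=4)

# Two series whose binned values are independent share no information.
mi_independent = mutual_information([0.0, 0.0, 1.0, 1.0],
                                    [0.0, 1.0, 0.0, 1.0], bins=2)
```

In a predictor-screening setting, such a score would be computed for every candidate predictor and the candidates ranked from highest to lowest.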

Performance evaluation

We illustrate the effectiveness of the presented approach through a comparison between MSDSM and SDSM results. For this purpose, at first, the daily outputs of SDSM have been converted into monthly time scale, and then, the performance of the models has been evaluated by comparing their results with observed data in the validation period.

The performance of the models in estimating monthly mean and monthly variance is assessed using the following equation:

$$\theta = \frac{{\mathop \sum \nolimits_{m = 1}^{12} \left| {\left( {Y_{m}^{\rm SDSM} - Y_{m}^{\rm obs} } \right)} \right| }}{{\mathop \sum \nolimits_{m = 1}^{12} \left| {\left( {Y_{m}^{\rm MSDSM} - Y_{m}^{\rm obs} } \right)} \right|}}$$

(6)

where m is the month number (m = 1, …, 12), Y^SDSM_m and Y^MSDSM_m are the values modeled by SDSM and MSDSM, respectively, and Y^obs_m is the observed value for either monthly mean or monthly variance in each station. The closer θ is to one, the more similar the performance of the two models. A value of θ significantly larger than one implies superiority of MSDSM over SDSM.
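Eq. (6) translates directly into code. The Python sketch below uses hypothetical names and toy values in which the SDSM error is twice the MSDSM error in every month.

```python
def theta(sdsm, msdsm, observed):
    """Eq. (6): sum of absolute SDSM errors over the 12 months divided
    by the sum of absolute MSDSM errors."""
    numerator = sum(abs(s - o) for s, o in zip(sdsm, observed))
    denominator = sum(abs(m - o) for m, o in zip(msdsm, observed))
    return numerator / denominator


# Toy monthly statistics (for example, long-term monthly means in mm).
observed = [50.0, 45.0, 40.0, 30.0, 15.0, 5.0,
            2.0, 1.0, 3.0, 20.0, 35.0, 48.0]
sdsm = [value + 4.0 for value in observed]   # off by 4 mm in every month
msdsm = [value + 2.0 for value in observed]  # off by 2 mm in every month
theta_value = theta(sdsm, msdsm, observed)
```

Here θ equals 2, meaning the total absolute error of SDSM is twice that of MSDSM.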

For further assessment of the models’ performances, the results of the models have been compared based on absolute relative error (RE) in estimating monthly mean and monthly variance values:

$$\text{RE}_{m} = \frac{\left| Y_{m}^{\text{mod}} - Y_{m}^{\text{obs}} \right|}{Y_{m}^{\text{obs}}}$$

(7)

where m is the month number (m = 1, …, 12), Y^mod_m is the modeled value and Y^obs_m is the observed value for either monthly mean or monthly variance. After calculating RE for both monthly mean and monthly variance in all months, error improvement (EI) is calculated by the following equation:

$${\text{EI}}_{m} = {\text{RE}}_{m}^{\text{SDSM}} - {\text{RE}}_{m}^{\text{MSDSM}}$$

(8)

Positive EI indicates better performance of MSDSM and negative EI implies that SDSM works better.
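Eqs. (7) and (8) can be sketched in Python as follows. The function names and the two-month toy example are hypothetical, and the values are left as fractions rather than percentages.

```python
def relative_error(modeled, observed):
    """Eq. (7): RE_m = |Y_mod - Y_obs| / Y_obs for each month."""
    return [abs(m - o) / o for m, o in zip(modeled, observed)]


def error_improvement(sdsm, msdsm, observed):
    """Eq. (8): EI_m = RE_m(SDSM) - RE_m(MSDSM) for each month."""
    re_sdsm = relative_error(sdsm, observed)
    re_msdsm = relative_error(msdsm, observed)
    return [a - b for a, b in zip(re_sdsm, re_msdsm)]


# Toy example with two months only.
observed = [100.0, 50.0]
sdsm = [120.0, 45.0]   # relative errors 0.2 and 0.1
msdsm = [110.0, 40.0]  # relative errors 0.1 and 0.2
ei = error_improvement(sdsm, msdsm, observed)
```

The first month gives a positive EI (MSDSM is closer to the observation) and the second a negative EI (SDSM is closer).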

Results and discussion

To apply SDSM and MSDSM models, selection of appropriate predictors from the pool of meteorological predictors is the first step. For this purpose, the average MI index values have been calculated for all combinations of predictors and precipitation to select suitable predictors from all grid boxes (i.e., 26 × 59 MI values for each station). In order to pick out different predictors, the first five dissimilar predictors with highest MI values have been selected for each rain gauge station. MI values have been calculated based on daily time scale of the available dataset without considering any time lag.

Table 3 shows the top six selected predictors in different climatic zones. The numbers in this table represent the percentage of stations (located in a specific climatic zone) in which each predictor has been selected. The most selected among the predictors are 500 hPa geopotential height, total precipitation, near-surface specific humidity, 850 hPa meridional velocity, 500 hPa geostrophic air flow velocity, and 2 m mean temperature. It can be seen from the table that the selected variables are consistent and do not change significantly between different climates. It should be mentioned, although the proposed model is applicable to all meteorological variables, here we just present and discuss the downscaling results for precipitation.

Table 3 The top six selected predictors in different climatic zones (the numbers are the percentage of stations in which the predictor has been selected)

As mentioned earlier, to evaluate the performance of MSDSM and compare its results with SDSM, the same datasets have been used for both models with different time resolutions. The daily time series of predictors and precipitation (predictand) have been used in SDSM application, and then, the downscaled values obtained from SDSM have been converted to monthly values in order to be compared with MSDSM outputs. The monthly time series of predictors and precipitation (predictand) have been used in MSDSM application. The number of generated ensembles in each downscaling simulation is set to 100, and mean values of the ensembles are presented. It should be noted, since cross-validation cannot be done by SDSM, we have not used this ability in MSDSM at the first part of this section. Thus, the first set of results (i.e., Figs. 3, 4; Table 4) are outputs of the models in the validation period (1996–2005). In this set of results, VCF has been applied to MSDSM outputs. Nonetheless, in the second part (i.e., Figs. 5, 6; Table 5), we have investigated the effectiveness of VCF method by comparing the results of MSDSM before and after applying VCF. It enables us to use the cross-validation technique, and thus, these results have been obtained using the whole reference period (1971–2005).

Table 4 Averaged θ for all of the stations in different climatic zones in the validation period

Table 5 Averaged θ for all of the stations in different climatic zones in the reference period

The percentage of the stations in which calculated θ values (for monthly mean and monthly variance) in the validation period are larger than 1.10 or smaller than 0.90 is presented in Fig. 3. If θ falls in [0.90, 1.10], the results of the models are considered to be relatively similar. It is apparent from Fig. 3a that MSDSM performs significantly better in simulating monthly mean in each of the three different climatic zones. In particular, θ is larger than 1.10 in more than 70% of the stations located in the arid climate zone. For the majority of the stations (about 63%), θ is larger than 1.10, while it is smaller than 0.90 for only 21% of the stations, which indicates the superiority of MSDSM in simulating monthly mean of the precipitation. Averaged θ of monthly mean for all of the stations is 1.40 (Table 4).

It can be concluded from Fig. 3b that SDSM is better in simulating variance in stations located in humid and mediterranean climates, while MSDSM performs better in the stations located in the arid region. In total, the number of stations in which θ is smaller than 0.90 is approximately the same as the number of stations in which θ is larger than 1.10 (about 40%). It is worth mentioning that very large values of θ have been observed for some of the stations such that the average θ of monthly variance is still about 1.0 in the regions with humid and mediterranean climates, whereas it is larger than 2 in the arid climate zone (Table 4).

We estimated the error improvements (EIs) for all the stations and calculated their average based on the different climatic zones (Fig. 4). In this figure, the primary axis is EI in monthly mean and the secondary axis is EI in monthly variance. The EIs in monthly mean are slightly negative only in April to July for the stations located in the humid climate zone, in June for the stations located in the mediterranean climate zone, and in October for the stations located in the arid climate zone. When EI is averaged for all of the 288 stations, it is marginally negative only in October (Fig. 4d). These results again emphasize the noticeably better performance of MSDSM in estimating monthly mean.

The EIs in monthly variance are highly variable from month to month in each of the three climatic zones. In the humid climate zone, SDSM demonstrates a better performance, while in the mediterranean and arid climates, we can see a better consistency of MSDSM results with the observed monthly variance particularly in March and April. In Fig. 4d, the average values of EIs in monthly variance for all stations are negative only in October and June, revealing the superiority of MSDSM.

Overall, Figs. 3 and 4 and Table 4 show that MSDSM has been able to improve or sustain SDSM's level of accuracy in reproducing mean and variance of the precipitation observed in rain gauge stations scattered in various climatic zones. VCF application in MSDSM has been helpful in keeping the variance of the downscaled monthly series close to that of the daily downscaled precipitation series obtained from SDSM. In order to evaluate the effectiveness of the VCF method, we have compared the results of MSDSM before and after applying VCF using Eq. 6. When θ is larger than 1.10, it demonstrates the positive effect of the VCF method, and when it is smaller than 0.90, it indicates the probable adverse effect of the VCF modification factor. It is worth mentioning again that in this part of the paper, the cross-validation ability of MSDSM has been used and leave-one-out cross-validation has been carried out.

It is apparent from Fig. 5a that $\theta$ is smaller than 0.90 in more than 60% of the stations in each of the three climates. $\theta$ exceeds 1.10 in only 10% of the 288 stations, and its overall average is 0.83 (Table 5). This means that MSDSM estimates mean values better before the VCF method is applied. This result was expected because MLR methods are well suited to simulating the long-term mean of the predictands.

The significant effect of VCF on model performance lies in simulating monthly variance. At all stations in each of the three climate zones, θ values are considerably larger than 1.10 (Fig. 5b). The average of θ over all stations is 5.95, and the highest value of θ occurs in the humid climate zone (Table 5), where the variances of observed monthly precipitation are high (see Table 1). This demonstrates the significant improvement of MSDSM in reproducing monthly variance after the VCF method is applied.

The average EI values for each of the climatic zones are presented in Fig. 6. In this figure, a positive EI indicates better performance of MSDSM after applying VCF, while a negative EI implies that MSDSM works better without VCF. The pattern is approximately the same in all climates. After applying VCF, the error of the model in estimating the monthly mean increases slightly, while its performance in simulating monthly variance improves significantly. The most negative values of EI in monthly mean (down to −7) coincide with the highest values of EI in monthly variance (up to 80) between May and October. The difference in the orders of magnitude of the primary and secondary axes in Fig. 6 shows that the significant improvement in variance estimation with VCF-enabled MSDSM has come at the cost of a much smaller reduction in the accuracy of mean estimation, and that this reduction is concentrated in months with little precipitation. It also implies that, without VCF, MSDSM is not a suitable tool for downscaling monthly precipitation.

Figure 7 shows the average share of monthly precipitation at all rain gauge stations in the three climate zones. As can be seen in this figure, approximately 21, 19, and 9% of the total precipitation occurs between May and October in the humid, mediterranean, and arid climate zones, respectively. This implies a much larger share of precipitation in January through April and in December, months in which Fig. 6 shows that the accuracy of mean estimation by MSDSM with and without VCF is almost the same.
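The share of annual precipitation falling in a given set of months is a simple ratio. The sketch below shows the arithmetic for the May to October period; the function name `period_share` and the monthly totals are hypothetical and are not taken from the study's data.

```python
def period_share(monthly_totals, months):
    """Fraction of the annual total that falls in the given months.

    monthly_totals maps month number (1-12) to mean monthly precipitation.
    """
    annual = sum(monthly_totals.values())
    if annual == 0:
        raise ValueError("annual total is zero; the share is undefined")
    return sum(monthly_totals[m] for m in months) / annual


# Hypothetical mean monthly precipitation (mm) for one station.
monthly = {1: 30, 2: 30, 3: 30, 4: 30, 5: 5, 6: 5,
           7: 5, 8: 5, 9: 5, 10: 5, 11: 25, 12: 25}
may_to_october = range(5, 11)
share = period_share(monthly, may_to_october)
```

For these made-up totals, 30 mm out of 200 mm falls between May and October, so the share is 15%.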

We have also investigated the effects of climate change on precipitation in the different climatic zones of the study area. For this purpose, RCP scenarios were downscaled to the stations using MSDSM for the 2011–2040 period. The change in mean values from the reference period (1971–2005) to the future period (2011–2040), Δ, is calculated using the following equation:

$$\Delta = \frac{Y_{\text{mod}} - Y_{\text{obs}}}{Y_{\text{obs}}}$$

(9)

where $Y_{\text{mod}}$ is the downscaled value (2011–2040) and $Y_{\text{obs}}$ is the observed value (1971–2005) of the monthly mean precipitation. Since climate change is expected to affect seasons differently, we calculated the average of Δ for each season (Table 6). The table shows that all the scenarios project a decrease in winter precipitation in all of the climatic zones. The changes in spring precipitation are not significant, while summer precipitation increases considerably in the arid climate zone (up to 27%). The most significant increase in precipitation occurs in fall, and the largest share of this increase belongs to the arid climate, which is in accordance with the results reported by Karandish et al. (2016). They used the outputs of 15 GCMs under three SRES scenarios (A1B, A2, and B1) and analyzed seasonal precipitation in different climatic zones of Iran. They also showed that the lowest change occurs in humid regions, which is consistent with the results of this study (Table 6).
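As a rough illustration of Eq. 9 and the seasonal averaging, the sketch below computes Δ for each month and then averages the monthly values within each season. The grouping of months into seasons (December to February as winter, and so on) and the choice to average monthly Δ values rather than compute Δ from seasonal totals are assumptions of this sketch; the function names and the sample data are hypothetical.

```python
# Assumed month-to-season grouping; the study may define seasons differently.
SEASONS = {
    "winter": (12, 1, 2),
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "fall": (9, 10, 11),
}


def relative_change(y_mod, y_obs):
    """Eq. 9: (Y_mod - Y_obs) / Y_obs for one monthly mean."""
    if y_obs == 0:
        raise ValueError("observed mean is zero; relative change is undefined")
    return (y_mod - y_obs) / y_obs


def seasonal_delta(mod_monthly, obs_monthly):
    """Average the monthly relative changes within each season.

    Both arguments map month number (1-12) to monthly mean precipitation.
    """
    result = {}
    for season, months in SEASONS.items():
        deltas = [relative_change(mod_monthly[m], obs_monthly[m]) for m in months]
        result[season] = sum(deltas) / len(deltas)
    return result


# Hypothetical monthly means (mm) for a single station, for illustration only.
obs = {m: 40.0 for m in range(1, 13)}
mod = dict(obs)
mod.update({12: 36.0, 1: 36.0, 2: 36.0, 9: 50.0, 10: 50.0, 11: 50.0})
delta = seasonal_delta(mod, obs)
```

With these made-up values, winter precipitation falls by 10% and fall precipitation rises by 25%, while spring and summer are unchanged. Months with an observed mean of zero are rejected explicitly, since the relative change in Eq. 9 is undefined there.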

Table 6 Values of Δ in different seasons in each of the climatic zones under RCP scenarios

To further investigate the variations in the spatial distribution of precipitation, the calculated Δ values were interpolated using the inverse distance weighting (IDW) method (Fig. 8). Although the density of rain gauge stations, particularly in the central and eastern regions, is not sufficient for interpolation in ungauged regions, the maps generated by the IDW method can still be informative. As expected, Fig. 8 shows a decrease in winter precipitation in all the climates except in the southwest. Under RCP 2.6, the largest decrease occurs in the arid climate, while under RCP 8.5, the mediterranean climate experiences the main reduction.
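A minimal sketch of IDW interpolation is given below. It assumes planar coordinates, Euclidean distance, and a power parameter of 2; none of these settings is stated in the text, and the function name `idw` and the station values are hypothetical.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at the point (x, y).

    stations is a list of (x_i, y_i, value_i) tuples in planar coordinates.
    The power of 2 is a common default, assumed here for illustration.
    """
    if not stations:
        raise ValueError("at least one station is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the target point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical stations with relative changes of -10% and +20%.
stations = [(0.0, 0.0, -0.10), (2.0, 0.0, 0.20)]
midpoint = idw(stations, 1.0, 0.0)
at_station = idw(stations, 0.0, 0.0)
```

At the midpoint the two stations are equidistant, so the estimate is their simple mean; at a station location the estimate equals the station value. Because every estimate is a weighted average of station values, sparse networks produce smooth maps that cannot show extremes between stations, which is consistent with the caution expressed about the central and eastern regions.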

Compared with the other seasons, changes in winter precipitation vary the most across scenarios, with spatial correlations below 0.7. Conversely, in summer, the changes under RCPs 4.5 and 8.5 show the highest spatial correlation (0.97). Spring precipitation decreases on the southern coast of the Caspian Sea under all scenarios, as does summer precipitation in the northwest. It is apparent from the figure that the whole country experiences more precipitation in fall, with the highest increase in the arid climate under the RCP 4.5 scenario. Changes in winter precipitation under RCP 2.6 and in fall precipitation under RCP 8.5 show the strongest negative spatial correlation (−0.35), which represents the opposite effects of climate change in different seasons.
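The text does not state how spatial correlation is computed. One plausible reading, assumed in the sketch below, is the Pearson correlation between two fields of Δ sampled at the same locations (stations or grid cells). The function name `spatial_correlation` and the sample fields are hypothetical.

```python
import math


def spatial_correlation(field_a, field_b):
    """Pearson correlation between two change fields.

    Both fields are sequences of values at the same locations, in the
    same order. Using Pearson correlation is an assumption of this sketch.
    """
    if len(field_a) != len(field_b) or len(field_a) < 2:
        raise ValueError("fields must have equal length of at least 2")
    n = len(field_a)
    mean_a = sum(field_a) / n
    mean_b = sum(field_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(field_a, field_b))
    var_a = sum((a - mean_a) ** 2 for a in field_a)
    var_b = sum((b - mean_b) ** 2 for b in field_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical relative changes at four locations under two scenarios.
scenario_1 = [-0.10, -0.05, 0.00, 0.05]
scenario_2 = [-0.20, -0.10, 0.00, 0.10]
opposite = [0.10, 0.05, 0.00, -0.05]
same_pattern = spatial_correlation(scenario_1, scenario_2)
reversed_pattern = spatial_correlation(scenario_1, opposite)
```

Under this reading, a value near 1 means two maps share the same spatial pattern even if the magnitudes differ, and a negative value means that regions with increases in one map tend to show decreases in the other.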

Conclusion

In this paper, a statistical method has been proposed for improving monthly downscaling of precipitation. The results of the proposed model, MSDSM, at 288 rain gauge stations in Iran have been compared with the results of SDSM. MSDSM (with VCF applied) was found to perform satisfactorily in downscaling monthly precipitation, and it can be a useful alternative to other downscaling models such as SDSM at monthly time resolution. MSDSM (with VCF) estimated the monthly mean more accurately than SDSM at 63% of all stations in the three climate zones and at 70% of the stations located in the arid climate zone. Monthly mean values were estimated less accurately at just 21% of the rain gauge stations.

MSDSM (with VCF) that uses k-fold cross-validation has also performed significantly better than the original MSDSM. After applying VCF, the results indicate a slight increase in error in estimating the monthly mean from May through October, a period in which roughly 9–21% of the annual precipitation occurs, depending on the climate zone. In the remaining months, the performance of MSDSM in mean estimation, with and without VCF, has been almost the same. With VCF applied, the reduction in the relative error of variance estimation in the MSDSM results has been between 40 and 80%, depending on the month. Overall, the results show that MSDSM combined with VCF can be a suitable replacement for SDSM. In other words, in studies in which monthly time resolution is sufficient for assessing climate change impacts, the combination of MSDSM and VCF can be used as a suitable downscaling technique.

In projecting future precipitation variations under different RCPs, we found that the lowest change occurs in humid regions, while the most significant increase takes place in fall, with the largest share of this increase belonging to the arid climate (Table 6). We also found the strongest negative spatial correlation (−0.35) between changes in winter precipitation under RCP 2.6 and in fall precipitation under RCP 8.5, which represents the opposite effects of climate change in different seasons (Fig. 8).

MSDSM is an unconditional model, so there is no restriction on using it to downscale other variables, including temperature, evaporation, and streamflow. Future studies can focus on assessing MSDSM performance in downscaling these variables. Furthermore, the results of the paper confirmed that the choice of the downscaling method introduces additional uncertainty. Future work might consider the assessment of uncertainties in MSDSM and SDSM structures and outputs.

References

Anandhi A, Srinivas V, Nanjundiah RS, Nagesh Kumar D (2008) Downscaling precipitation to river basin in India for IPCC SRES scenarios using support vector machine. Int J Climatol 28:401–420
Arora V, Boer G (2010) Uncertainties in the 20th century carbon budget associated with land use change. Glob Change Biol 16:3327–3348
Arora V, Boer G (2014) Terrestrial ecosystems response to future changes in climate and atmospheric CO2 concentration. Biogeosciences 11:4157
Barnett V, Lewis T (1974) Outliers in statistical data. Wiley, Hoboken
Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Networks 5:537–550
Bowden GJ, Dandy GC, Maier HR (2005a) Input determination for neural network models in water resources applications. Part 1—background and methodology. J Hydrol 301:75–92
Bowden GJ, Maier HR, Dandy GC (2005b) Input determination for neural network models in water resources applications. Part 2. Case study: forecasting salinity in a river. J Hydrol 301:93–107
Dibike YB, Coulibaly P (2005) Hydrologic impact of climate change in the Saguenay watershed: comparison of downscaling methods and hydrologic models. J Hydrol 307:145–163
Dixon W (1953) Processing data for outliers. Biometrics 9:74–89
Fistikoglu O, Okkan U (2010) Statistical downscaling of monthly precipitation using NCEP/NCAR reanalysis data for Tahtali River basin in Turkey. J Hydrol Eng 16:157–164
Fowler H, Blenkinsop S, Tebaldi C (2007) Linking climate change modelling to impacts studies: recent advances in downscaling techniques for hydrological modelling. Int J Climatol 27:1547–1578
Fu Q, Lin L, Huang J, Feng S, Gettelman A (2016) Changes in terrestrial aridity for the period 850–2080 from the community earth system model. J Geophys Res Atmos 121:2857–2873
Füssel H-M, Toth FL, van Minnen JG, Kaspar F (2003) Climate impact response functions as impact tools in the tolerable windows approach. Clim Change 56:91–117
Goly A, Teegavarapu RSV, Mondal A (2014) Development and evaluation of statistical downscaling models for monthly precipitation. Earth Interact 18:1–28
Goyal MK, Ojha C, Burn DH (2011) Nonparametric statistical downscaling of temperature, precipitation, and evaporation in a semiarid region in India. J Hydrol Eng 17:615–627
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Hashmi MZ, Shamseldin AY, Melville BW (2011) Comparison of SDSM and LARS-WG for simulation and downscaling of extreme precipitation events in a watershed. Stoch Env Res Risk Assess 25:475–484
Hashmi MZ, Shamseldin AY, Melville BW (2013) Statistically downscaled probabilistic multi-model ensemble projections of precipitation change in a watershed. Hydrol Process 27:1021–1032
Hellstrom C, Chen D, Achberger C, Raisanen J (2001) Comparison of climate change scenarios for Sweden based on statistical and dynamical downscaling of monthly precipitation. Clim Res 19:45–55
Hessami M, Gachon P, Ouarda TB, St-Hilaire A (2008) Automated regression-based statistical downscaling tool. Environ Model Softw 23:813–834
Huth R, Kyselý J (2000) Constructing site-specific climate change scenarios on a monthly scale using statistical downscaling. Theoret Appl Climatol 66:13–27
Jeong D, St-Hilaire A, Ouarda T, Gachon P (2012) Comparison of transfer functions in statistical downscaling models for daily temperature and precipitation over Canada. Stoch Env Res Risk Assess 26:633–653
Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J (1996) The NCEP/NCAR 40-year reanalysis project. Bull Am Meteor Soc 77:437–471
Karandish F, Mousavi SS, Tabari H (2016) Climate change impact on precipitation and cardinal temperatures in different climatic zones in Iran: analyzing the probable effects on cereal water-use efficiency. Stoch Environ Res Risk Assess. doi:10.1007/s00477-016-1355-y
Khalili K, Tahoudi MN, Mirabbasi R, Ahmadi F (2016) Investigation of spatial and temporal variability of precipitation in Iran over the last half century. Stoch Env Res Risk Assess 30:1205–1221
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI 14:1137–1145
Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17:491–502
Liu H, Sun J, Liu L, Zhang H (2009) Feature selection with dynamic mutual information. Pattern Recogn 42:1330–1339
Maurer EP, Hidalgo HG (2008) Utility of daily vs. monthly large-scale climate data: an intercomparison of two statistical downscaling methods. Hydrol Earth Syst Sci 12:551–563
May RJ, Dandy GC, Maier HR, Nixon JB (2008a) Application of partial mutual information variable selection to ANN forecasting of water quality in water distribution systems. Environ Model Softw 23:1289–1299
May RJ, Maier HR, Dandy GC, Fernando TG (2008b) Non-linear variable selection for artificial neural networks using partial mutual information. Environ Model Softw 23:1312–1326
Najafi M, Moradkhani H, Wherry S (2011) Statistical downscaling of precipitation using machine learning with optimal predictor selection. J Hydrol Eng 16:650–664
Nasseri M, Tavakol-Davani H, Zahraie B (2013) Performance assessment of different data mining methods in statistical downscaling of daily precipitation. J Hydrol 492:1–14
Ojha C, Goyal MK, Adeloye A (2010) Downscaling of precipitation for lake catchment in arid region in India using linear multiple regression and neural networks. Open Hydrol J 4:122–136
Olsson J, Uvo C, Jinno K, Kawamura A, Nishiyama K, Koreeda N, Nakashima T, Morita O (2004) Neural networks for rainfall forecasting by atmospheric downscaling. J Hydrol Eng 9:1–12
Prudhomme C, Reynard N, Crooks S (2002) Downscaling of global climate models for flood frequency analysis: where are we now? Hydrol Process 16:1137–1150
Raje D, Mujumdar P (2011) A comparison of three methods for downscaling daily precipitation in the Punjab region. Hydrol Process 25:3575–3589
Refaeilzadeh P, Tang L, Liu H (2009) Cross-validation. In: Liu L, Özsu MT (eds) Encyclopedia of database systems. Springer, New York
Sachindra D, Huang F, Barton A, Perera B (2014a) Statistical downscaling of general circulation model outputs to precipitation—part 1: calibration and validation. Int J Climatol 34:3264–3281
Sachindra D, Huang F, Barton A, Perera B (2014b) Statistical downscaling of general circulation model outputs to precipitation—part 2: bias-correction and future projections. Int J Climatol 34:3282–3303
Semenov MA, Brooks RJ, Barrow EM, Richardson CW (1998) Comparison of the WGEN and LARS-WG stochastic weather generators for diverse climates. Climate Res 10:95–107
Tan P-N, Steinbach M, Kumar V (2006) Classification: basic concepts, decision trees, and model evaluation. Introduction to data mining, vol 1, pp 145–205
Tavakol-Davani H, Nasseri M, Zahraie B (2013) Improved statistical downscaling of daily precipitation using SDSM platform and data-mining methods. Int J Climatol 33:2561–2578
Taylor KE, Stouffer RJ, Meehl GA (2012) An overview of CMIP5 and the experiment design. Bull Am Meteor Soc 93:485–498
Tripathi S, Srinivas V, Nanjundiah RS (2006) Downscaling of precipitation for climate change scenarios: a support vector machine approach. J Hydrol 330:621–640
Wilby RL, Hay LE, Leavesley GH (1999) A comparison of downscaled and raw GCM output: implications for climate change scenarios in the San Juan River basin, Colorado. J Hydrol 225:67–91
Wilby RL, Dawson CW, Barrow EM (2002) SDSM—a decision support tool for the assessment of regional climate change impacts. Environ Model Softw 17:145–157
Wilby R, Charles S, Zorita E, Timbal B, Whetton P, Mearns L (2004) Guidelines for use of climate scenarios developed from statistical downscaling methods. Supporting material of the Intergovernmental Panel on Climate Change, available from the DDC of IPCC TGCIA, 27
Willems P, Vrac M (2011) Statistical precipitation downscaling for small-scale hydrological impact investigations of climate change. J Hydrol 402:193–205
Willems P, Arnbjerg-Nielsen K, Olsson J, Nguyen VTV (2012) Climate change impact assessment on urban rainfall extremes and urban drainage: methods and shortcomings. Atmos Res 103:106–118
Wood AW, Leung LR, Sridhar V, Lettenmaier D (2004) Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. Clim Change 62:189–216

Acknowledgements

This study was carried out at the Water Institute of the University of Tehran. The authors appreciate the support of the institute for this research. The authors also acknowledge and appreciate the constructive comments of the anonymous reviewers.

Author information

Authors and Affiliations

School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran
H. A. Pahlavan, M. Nasseri & A. Mahdipour Varnousfaderani
Center of Excellence on Infrastructure Engineering and Management, School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran
B. Zahraie
Department of Environment, Climate Change Office, Tehran, Iran
M. Nasseri

Corresponding author

Correspondence to H. A. Pahlavan.

Additional information

Editorial responsibility: M. Abbaspour.

About this article

Cite this article

Pahlavan, H.A., Zahraie, B., Nasseri, M. et al. Improvement of multiple linear regression method for statistical downscaling of monthly precipitation. Int. J. Environ. Sci. Technol. 15, 1897–1912 (2018). https://doi.org/10.1007/s13762-017-1511-z

Received: 13 April 2016
Revised: 01 July 2017
Accepted: 07 August 2017
Published: 05 September 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s13762-017-1511-z
