1 Introduction

There is a mismatch between the spatial resolution of general circulation model (GCM) outputs and the scales of interest in climate impact studies, and many downscaling methods have been proposed to bridge this gap. Downscaling assumes that large-scale climate characteristics strongly influence local-scale hydroclimatology, whereas the effects of the catchment scale on the continental scale are negligible (Maraun et al. 2010). Downscaling models can be divided into two classes: dynamical downscaling and statistical downscaling (Wilby and Wigley 1997; Xu 1999; Fowler et al. 2007; Sachindra et al. 2014). Dynamical downscaling is associated with high computational costs, whereas statistical downscaling is widely used to produce climate information at point or local scales because of its simplicity and effectiveness (Hu et al. 2013).

Various methods have been developed to obtain the relationship between the predictors and the predictand of interest in statistical downscaling. These techniques include weather classification (Shao and Li 2013), classical regression models (Meenu et al. 2013), and more sophisticated methods such as support vector machines (Ghosh and Katkar 2012), artificial neural networks (Tolika et al. 2008), and generalized additive models (Tisseuil et al. 2010). Classical regression methods are the most widely used in statistical downscaling, for example in the Generalized Linear Model for daily CLImate (GLIMCLIM; Chandler 2002) and the Statistical DownScaling Model (SDSM; Wilby et al. 2002). However, these methods have some drawbacks. The ordinary linear regression model (OLS) assumes that the relationship between the simulated values and the influencing factors is the same everywhere, so it cannot handle spatial non-stationarity. Unlike OLS, the geographically weighted regression (GWR) method allows the regression parameters to vary as continuous functions over space and is suitable for modeling precipitation with large gradients (Brunsdon et al. 1996). GWR can provide more detailed information on the relationships between variables that may be lost in a conventional linear regression model (Kamarianakis et al. 2008). Moreover, traditional statistical downscaling relies solely on regression, so it is restricted by the choice of predictors and the number of gauges.

In this study, we propose a new statistical downscaling method based on GWR and the high accuracy surface modeling method (HASM). We first use GWR to build the transfer function, which allows the spatial non-stationarity of the relationship between precipitation and the predictors to be addressed. We then apply HASM to interpolate the residuals of the regression and build the anomaly surface. In the proposed method, the local regression is established from GCM outputs rather than historical ground observations, and the residual is corrected using station observations. The downscaled precipitation estimate is subsequently validated on annual and seasonal scales using two independent datasets from precipitation gauges in China and in Jiangxi Province. Finally, scenarios of future change at a resolution of 1 km × 1 km are produced with the proposed statistical downscaling method.

2 Study area and data

The study area, China, is located between 3° 51′ and 53° 33′ N and between 73° 33′ and 135° 05′ E in East Asia. It covers an area of 9,600,000 km² with elevations ranging from −152 to 8682 m. The topography of China consists roughly of high mountains and plateaus in the southwest, inhospitable deserts in the northwest, and low, fertile plains in the east (Fig. 1). Mountains cover about one third of its landmass. The climate in China shows great variation (Domroes and Peng 1998). In the north, the summers are hot and dry, and the winters are freezing cold. The southern regions have semi-tropical summers and cool winters with plenty of precipitation. The seasonal change of precipitation is pronounced and is determined mainly by the winter and summer monsoon systems. Annual total precipitation ranges from less than 20 mm in the northwest to more than 2500 mm in the southeast because of the monsoon circulation and the effects of topography (Wei et al. 2015). Figure 2 gives scatter diagrams of the annual and seasonal mean precipitation against elevation for the period 1976–2005. On the whole, precipitation decreases with elevation. The difference in seasonal mean precipitation among the stations ranges from 276 mm (dry season, winter) to 1680 mm (wet season, summer). Summer precipitation accounts for about 50 % of annual precipitation, whereas only about 10 % of annual precipitation occurs in the winter months.

Fig. 1 Distribution map of meteorological stations and DEM in China

Fig. 2 Precipitation of the meteorological stations plotted against elevation for (a) annual mean precipitation, (b) mean precipitation in spring, (c) mean precipitation in summer, (d) mean precipitation in autumn, and (e) mean precipitation in winter

Historical precipitation data for the period 1976–2005 were collected from 732 national meteorological stations through the China Meteorological Data Sharing Service System and were subjected to strict quality control procedures. About 10 % of the meteorological stations were set aside to validate the results, and the remaining 90 % were used for the downscaling calculations. Precipitation gauge measurements in the Jiangxi area were also used to test the results (Fig. 1). Datasets from the fifth phase of the Coupled Model Intercomparison Project (CMIP5) were used at a resolution of 1° × 1° (Moss et al. 2008). The outputs from a 21-member ensemble of CMIP5 GCMs include both twentieth-century climate simulations and twenty-first-century climate projections for the IPCC low emission (RCP2.6), medium emission (RCP4.5), and high emission (RCP8.5) scenarios (Vuuren et al. 2011).

3 Methodology

Statistical downscaling methods establish transfer functions that relate coarse-scale variables to fine-scale variables. The relationship between the predictand and the predictor can be written as

$$ D=f\left(x,\beta \right) $$
(1)

where D stands for the predictand, x is the predictor, and β represents the regression coefficient. The transfer function f is usually developed from past gauge measurements or reanalysis datasets (Khan et al. 2006; Yue 2011; Fan et al. 2012). In this paper, we employ the GWR method to form the regression function using the outputs of CMIP5 for the period 1976–2005. We then use station data to correct the residuals with HASM.

Before constructing the transfer function f, we first transform the precipitation to avoid extreme values in the final simulations, which can arise from the large gradients in precipitation.

$$ \overline{\overline{ \Pr {e}_i}}=\frac{ \Pr {e}_i}{ \max \left\{ \Pr {e}_i|i=1,\cdots, n\right\}} $$
(2)

where \( \Pr {e}_i \) stands for the CMIP5 outputs, \( \overline{\overline{ \Pr {e}_i}} \) is the transformed value, and n is the number of grid cells.

Then, the Box-Cox transformation is applied to \( \overline{\overline{ \Pr {e}_i}} \) to adjust the skew of the distribution and thus obtain modified simulations (Bartczak et al. 2014):

$$ \overline{ \Pr {e}_i}=\left\{\begin{array}{ll}\ln \overline{\overline{ \Pr {e}_i}}, & \delta =0\\ {}\frac{{\overline{\overline{ \Pr {e}_i}}}^{\delta }-1}{\delta }, & \delta \ne 0\end{array}\right. $$
(3)

where \( \overline{ \Pr {e}_i} \) is the transformed data and δ is a parameter chosen so that \( \overline{ \Pr {e}_i} \) is close to normally distributed and meets the requirement of GWR (Harris et al. 2010). Here, δ = 4. This process can avoid negative values in precipitation simulation (Yue et al. 2013).
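The two transformation steps in Eqs. (2) and (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample values are hypothetical, and the back-transform `inverse_box_cox` is an assumption added for completeness, since the text does not spell out how the transformed values are mapped back to millimetres.

```python
import math

def normalize_by_max(values):
    """Eq. (2): divide every value by the maximum of the series."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("maximum must be positive")
    return [v / peak for v in values]

def box_cox(values, delta):
    """Eq. (3): log transform when delta == 0, power transform otherwise."""
    if delta == 0:
        return [math.log(v) for v in values]
    return [(v ** delta - 1.0) / delta for v in values]

def inverse_box_cox(values, delta):
    """Assumed back-transform: algebraic inverse of Eq. (3)."""
    if delta == 0:
        return [math.exp(v) for v in values]
    return [(delta * v + 1.0) ** (1.0 / delta) for v in values]

# Hypothetical annual precipitation values (mm) on four grid cells.
pre = [20.0, 400.0, 1200.0, 2500.0]
scaled = normalize_by_max(pre)
transformed = box_cox(scaled, delta=4)

# Undo both steps to return to millimetres.
recovered = [v * max(pre) for v in inverse_box_cox(transformed, delta=4)]
```

Because the normalized values lie in (0, 1], the back-transformed values stay non-negative for any positive δ, which is consistent with the remark that the procedure avoids negative precipitation.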

We apply GWR to give the formulation of f:

$$ f=\alpha Y, $$
(4)

where f is the downscaled value at a resolution of 1 km × 1 km, Y is the covariate matrix containing several independent covariates, and α is a vector of unknown parameters that is a function of longitude and latitude. Different predictor choices result in different performance. The independent variables are selected from longitude, latitude, elevation, sky view factor, and impact coefficient of aspect (Yue 2011) based on the determination coefficient of GWR. For the annual mean precipitation, elevation, longitude, latitude, and impact coefficient of aspect are selected as the most influential factors, with \( R^2 \) equal to 0.92. On the seasonal scale, we take summer (June, July, August) and winter (December, January, February) as examples, since summer is the main rainy season and winter is the driest season in China. For summer precipitation, longitude, latitude, elevation, and sky view factor are chosen, with \( R^2 \) equal to 0.95. For winter, the most significant explanatory variables are longitude, latitude, elevation, and impact coefficient of aspect, with \( R^2 \) equal to 0.96.
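To make the local regression in Eq. (4) concrete, the following Python sketch fits a weighted least-squares model at a single target location. It is a minimal illustration under stated assumptions, not the implementation used in this study: the Gaussian kernel, the fixed `bandwidth`, the added intercept, the function name `gwr_predict`, and the synthetic data are all assumptions, since the text does not specify the kernel or bandwidth.

```python
import numpy as np

def gwr_predict(coords, Y, pre, target_coord, target_y, bandwidth):
    """Fit a locally weighted least-squares model at one target location.

    coords: (n, 2) locations of the calibration points
    Y:      (n, p) covariates at those points (no intercept column)
    pre:    (n,) transformed precipitation at those points
    Returns the local prediction and the local coefficient vector alpha.
    """
    # Gaussian distance-decay weights (the kernel choice is an assumption).
    d2 = np.sum((coords - target_coord) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)

    # Weighted least squares: scale rows by sqrt(w), then solve with lstsq.
    X = np.column_stack([np.ones(len(pre)), Y])
    sw = np.sqrt(w)
    alpha, *_ = np.linalg.lstsq(X * sw[:, None], pre * sw, rcond=None)

    # Local prediction f = alpha . [1, covariates] at the target location.
    value = float(np.concatenate([[1.0], target_y]) @ alpha)
    return value, alpha

# Synthetic example: an exactly linear relationship, so any weighting recovers it.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
Y = rng.uniform(0, 1, size=(50, 2))
pre = 2.0 + 3.0 * Y[:, 0] - 1.0 * Y[:, 1]
value, alpha = gwr_predict(coords, Y, pre,
                           np.array([5.0, 5.0]), np.array([0.2, 0.4]),
                           bandwidth=3.0)
```

Repeating the fit at every 1 km × 1 km cell yields a coefficient vector that varies with location, which is what distinguishes GWR from a single global OLS fit.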

Using only GCM-based predictors is inadequate because local details may be missed. The residuals produced by GWR, which describe fluctuations about the mean, are therefore interpolated by HASM. The formulation of HASM is (Zhao and Yue 2014)

$$ W{x}^{n+1}={v}^n $$
(5)

where W is a symmetric positive definite matrix representing the first fundamental coefficients of a surface, which carry the local information of the surface, and v stands for the second fundamental coefficients and represents the macro information. The diagonal preconditioned conjugate gradient method is applied to solve Eq. (5) and obtain the modified residual x. Thus, the final downscaling result is

$$ \Pr {e}_{\mathrm{downscale}}=f+x $$
(6)
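The linear solve behind Eq. (5) can be illustrated with a short Python sketch of the conjugate gradient method with a diagonal (Jacobi) preconditioner. The sketch solves a single system W x = v; the construction of W and v from the fundamental coefficients, and the outer iteration presumably indicated by the superscripts in Eq. (5), are not reproduced. The function name `diagonal_pcg` and the small tridiagonal test matrix are hypothetical stand-ins, not the HASM system.

```python
import numpy as np

def diagonal_pcg(W, v, tol=1e-10, max_iter=1000):
    """Solve W x = v for symmetric positive definite W with Jacobi-preconditioned CG."""
    inv_diag = 1.0 / np.diag(W)          # preconditioner: inverse of the diagonal of W
    x = np.zeros_like(v, dtype=float)
    r = v - W @ x                        # initial residual
    z = inv_diag * r                     # preconditioned residual
    p = z.copy()                         # first search direction
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(v):
            break
        Wp = W @ p
        step = rz / (p @ Wp)
        x += step * p
        r -= step * Wp
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p        # next conjugate direction
        rz = rz_new
    return x

# Stand-in system: a small symmetric positive definite tridiagonal matrix.
n = 6
W = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
v = np.arange(1.0, n + 1.0)
x = diagonal_pcg(W, v)
```

In the setting of Eq. (6), the solution x would play the role of the modified residual that is added to the GWR estimate f.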

For the future climate change scenarios, it is assumed that the established predictor-predictand relationship remains valid (Fowler et al. 2007). Under this assumption, precipitation in the future scenarios is given by

$$ \Pr {e}_{\mathrm{downscale}\_\mathrm{future}}\left(x,y,{t}_k\right)= \Pr {e}_{\mathrm{downscale}}\left(x,y,{t}_0\right)+\mathrm{HASM}\left( \Pr {e}_{\mathrm{CMIP}5}\left(x,y,{t}_k\right)- \Pr {e}_{\mathrm{CMIP}5}\left(x,y,{t}_0\right)\right) $$
(7)

where \( \Pr {e}_{\mathrm{downscale}\_\mathrm{future}}\left(x,y,{t}_k\right) \) is the downscaling result for the future scenarios, \( \Pr {e}_{\mathrm{downscale}}\left(x,y,{t}_0\right) \) is the downscaling result for the period 1976–2005 (\( {t}_0 \)), \( \Pr {e}_{\mathrm{CMIP}5}\left(x,y,{t}_0\right) \) is the CMIP5 result for the period 1976–2005 (\( {t}_0 \)), and \( \Pr {e}_{\mathrm{CMIP}5}\left(x,y,{t}_k\right) \) is the output of CMIP5 under the different RCP scenarios. The index k = 1, 2, 3 denotes the periods 2011–2040 (\( {t}_1 \)), 2041–2070 (\( {t}_2 \)), and 2071–2100 (\( {t}_3 \)).
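The structure of Eq. (7) is a change-factor update: the coarse CMIP5 change between the baseline and a future period is interpolated to the fine grid and added to the baseline downscaled field. The Python sketch below shows only this structure. HASM itself is not reproduced; the nearest-neighbour replication in `upsample_nearest` is a deliberately crude stand-in, and all names and numbers are hypothetical.

```python
import numpy as np

def upsample_nearest(coarse, factor):
    """Stand-in for HASM: replicate each coarse cell onto factor x factor fine cells."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

def future_downscale(pre_downscale_t0, cmip5_tk, cmip5_t0, factor,
                     interpolate=upsample_nearest):
    """Eq. (7): add the interpolated CMIP5 change to the baseline downscaled field."""
    change = interpolate(cmip5_tk - cmip5_t0, factor)
    if change.shape != pre_downscale_t0.shape:
        raise ValueError("interpolated change does not match the fine grid")
    return pre_downscale_t0 + change

# Hypothetical 2 x 2 coarse grids (mm) and a 4 x 4 fine baseline field.
cmip5_t0 = np.array([[500.0, 800.0], [900.0, 1500.0]])
cmip5_t1 = np.array([[510.0, 790.0], [930.0, 1560.0]])
baseline = np.full((4, 4), 1000.0)
future = future_downscale(baseline, cmip5_t1, cmip5_t0, factor=2)
```

Passing a different `interpolate` callable is how a smoother interpolator would be substituted for the stand-in.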

4 Results and discussion

4.1 Comparison with another statistical downscaling method

For comparison, the proposed method described above is named Pre downscale1, while the method it is compared with in this section is termed Pre downscale2. The major difference between the methods is the way the data are used. In Pre downscale1, meteorological station information is used to modify the residual, while CMIP5 outputs are used to establish the transfer function f. In Pre downscale2, meteorological station information is used to produce the regression function f, and CMIP5 outputs are employed to correct the residual. Pre downscale2 has been used in climate impact studies over the past years (Fan et al. 2012; Wang et al. 2012; Yue 2011).

The two methods are first compared on the annual scale in Table 1. About 10 % of the observations are randomly chosen from the dataset for validation, and the remaining data are used for the interpolation calculation. The process is repeated ten times. The accuracy of the predictions is determined by comparing the root mean square error (RMSE), mean absolute error (MAE), and mean relative error (MRE) between the observations and the downscaled values:

$$ \mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{k=1}^N{\left( \Pr {e}_{sim,k}- \Pr {e}_{obs,k}\right)}^2},\quad \mathrm{MAE}=\frac{1}{N}\sum_{k=1}^N\left| \Pr {e}_{sim,k}- \Pr {e}_{obs,k}\right|,\quad \mathrm{MRE}=\frac{1}{N}\sum_{k=1}^N\left|\frac{ \Pr {e}_{sim,k}- \Pr {e}_{obs,k}}{ \Pr {e}_{obs,k}}\right| $$

where \( \Pr {e}_{sim,k} \) and \( \Pr {e}_{obs,k} \) are the downscaled and observed precipitation at the kth validation station, and N is the number of validation stations.

Table 1 Performance of the two downscaling methods on the annual scale
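The three error indexes can be computed directly from paired simulated and observed values. The short Python sketch below follows the definitions of RMSE, MAE, and MRE given above; the function name `error_indexes` and the sample numbers are hypothetical and are not taken from Table 1.

```python
import math

def error_indexes(sim, obs):
    """Return (RMSE, MAE, MRE) between simulated and observed values."""
    if len(sim) != len(obs) or not sim:
        raise ValueError("sim and obs must be non-empty and of equal length")
    n = len(sim)
    diffs = [s - o for s, o in zip(sim, obs)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    # MRE divides each error by the observation, so observations must be non-zero.
    mre = sum(abs(d / o) for d, o in zip(diffs, obs)) / n
    return rmse, mae, mre

# Hypothetical downscaled and observed precipitation (mm) at three stations.
sim = [110.0, 190.0, 420.0]
obs = [100.0, 200.0, 400.0]
rmse, mae, mre = error_indexes(sim, obs)
```

RMSE squares the errors and therefore penalizes large individual errors more heavily than MAE, which is why the two indexes can rank interpolators differently.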

The results show that Pre downscale1 is much better than Pre downscale2 for both datasets according to the three error indexes. Pre downscale2 performs worse than the raw CMIP5 outputs in the Jiangxi area.

The downscaling results are displayed in Fig. 3. Large errors are obvious in the CMIP5 outputs (Fig. 3a), and the largest error occurs in the southeast of the Tibetan Plateau. The clear similarity between Fig. 3a and Fig. 3c indicates that Pre downscale2 does not correct the errors in the CMIP5 outputs. Pre downscale2, which uses station data to establish the regression function and the CMIP5 outputs to correct the residual, may therefore not give the best performance. This may be due to the limited station information available for establishing the regression relationship, compared with the relatively richer information provided by the CMIP5 outputs. Pre downscale1 shows a pattern of precipitation increasing from northwest to southeast in China, which is consistent with reality. Clearly, the way the data are used is critically important for the model output. The station data are necessary for correcting the local details with HASM. Moreover, the CMIP5 outputs cannot be used directly because of their large uncertainty.

Fig. 3 Downscaling results on the annual scale. (a) CMIP5 output. (b) Pre downscale1. (c) Pre downscale2

For the seasonal scale, Table 2 gives the errors of the two downscaling methods and shows that Pre downscale1 is much better than both Pre downscale2 and the CMIP5 outputs. The advantage of Pre downscale1 is most evident for the validation dataset covering China, because the predictand-predictor relationship was established for the whole area of China. The simulation accuracy in Jiangxi Province might be higher if the influencing factors specific to this local region were considered. For both summer and winter, the results of Pre downscale2 are worse than the original CMIP5 outputs.

Table 2 Comparison of two downscaling methods on the seasonal scale

Figure 4 displays the downscaling results. There are large errors in the southwest of China in both summer and winter, which appear in the CMIP5 outputs (Fig. 4a, d) and in the results of Pre downscale2 (Fig. 4c, f). The distributions of precipitation are similar for CMIP5 and Pre downscale2 (Fig. 4a, c in summer and Fig. 4d, f in winter), indicating that Pre downscale2 does not correct the errors that occur in the CMIP5 outputs. The results of Pre downscale1 are in accordance with the actual situation (Fig. 4b, e).

Fig. 4 Downscaling results on the seasonal scale. (a) CMIP5 output in summer. (b) Pre downscale1 in summer. (c) Pre downscale2 in summer. (d) CMIP5 output in winter. (e) Pre downscale1 in winter. (f) Pre downscale2 in winter

4.2 Comparison of different interpolators

We further compare different residual modification methods, namely HASM, the inverse distance weighted method (IDW), Kriging, and Spline, to identify the one that gives the best results. These methods are implemented ten times in ArcGIS 10.1. For comparison, we also give the downscaling result produced by GWR alone. GWR-HASM in Table 3 is Pre downscale1. The averages over the ten runs show that GWR-HASM performs best for both verification datasets. The worst result comes from GWR alone, which shows the importance of the residual modification process in precipitation simulation. GWR-Spline is better than GWR but worse than the others. According to MAE and MRE, GWR-Kriging outperforms GWR-IDW in China, while GWR-IDW outperforms GWR-Kriging in Jiangxi Province. According to RMSE, GWR-Kriging is better than GWR-IDW, indicating that IDW is sensitive to outliers. These results indicate that including residual modification leads to a significant reduction in the error statistics and that HASM is the best of the residual correction methods tested.
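As an illustration of what a residual interpolator does in this step, the following Python sketch implements a generic inverse distance weighted estimate of station residuals at target locations. It is not the ArcGIS implementation used in this study: the power parameter of 2, the use of all stations for every target, the function name `idw`, and the sample data are assumptions made for the example.

```python
import numpy as np

def idw(station_xy, station_res, target_xy, power=2.0):
    """Inverse distance weighted estimate of residuals at target points."""
    # Pairwise distances, shape (n_targets, n_stations).
    d = np.linalg.norm(target_xy[:, None, :] - station_xy[None, :, :], axis=2)
    out = np.empty(len(target_xy))
    for i, row in enumerate(d):
        hit = row == 0.0
        if hit.any():
            # A target that coincides with a station takes the station value.
            out[i] = station_res[hit][0]
        else:
            w = 1.0 / row ** power
            out[i] = np.sum(w * station_res) / np.sum(w)
    return out

# Hypothetical residuals (observation minus GWR estimate) at two stations.
station_xy = np.array([[0.0, 0.0], [2.0, 0.0]])
station_res = np.array([1.0, 3.0])
target_xy = np.array([[1.0, 0.0], [0.0, 0.0]])
res_at_targets = idw(station_xy, station_res, target_xy)
```

The interpolated residual surface is then added to the GWR estimate, as in Eq. (6); only the interpolator changes between GWR-IDW, GWR-Kriging, GWR-Spline, and GWR-HASM.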

Table 3 Comparison of different interpolators in the residual modification process on the annual scale

We also use Pre downscale1 to downscale seasonal mean precipitation with different interpolators. Table 4 gives the simulation errors in summer. GWR-HASM gives the best results in both China and Jiangxi Province. For the validation dataset in China, the worst method is GWR, which indicates the importance of residual modification in summer precipitation simulation. Spline is worse than the other interpolators in China, and GWR-Spline performs worst in Jiangxi Province, indicating that Spline tends to produce extreme values. GWR-Kriging performs better than GWR-IDW in China, while GWR-IDW is slightly better than GWR-Kriging according to MAE. In winter, the residual modification process is critically important, as shown in Table 5. GWR gives the worst performance in both China and Jiangxi Province. GWR-HASM is more accurate than the classical interpolation methods and is followed by GWR-Kriging.

Table 4 Comparison of different interpolators in the residual modification process for summer
Table 5 Comparison of different interpolators in the residual modification process for winter

Each interpolation method gives different results when the temporal and spatial scales differ. Because of the complicated terrain and elevation in China, precipitation systems are non-stationary, and the residual modification process is necessary because of the various influencing factors. For the annual mean precipitation, GWR-HASM performs better than the other methods in the case of China. For the Jiangxi region, however, GWR-HASM, GWR-IDW, and GWR-Kriging give similar results (Table 3). In summer, GWR-HASM, GWR-IDW, and GWR-Kriging give clearly different results in both China and the Jiangxi region (Table 4). In winter, GWR-HASM and GWR-Kriging produce similar results (Table 5). For all cases tested, GWR-HASM shows better overall performance than the other commonly used interpolation methods. A possible reason is that HASM is initialized with a driving field produced by other interpolators and then iterated by introducing station data, so it improves on those interpolators. However, the advantage of HASM varies with the temporal and spatial scale of the simulated precipitation. This is possibly because of the finite difference method applied to the differential equations of HASM: the difference scheme in HASM is more consistent with the distribution pattern of annual mean precipitation in China (Zhao and Yue 2014).

4.3 Simulation of future precipitation under different RCP scenarios

Climate change is receiving more attention than ever. In this section, under the hypothesis that the predictor-predictand relationship remains valid in the future, we use the proposed downscaling technique GWR-HASM to predict precipitation for the periods 2011–2040 (T1), 2041–2070 (T2), and 2071–2100 (T3) in China under the RCP2.6, RCP4.5, and RCP8.5 scenarios using Eq. (7). Figures 5, 6, and 7 give the distributions of precipitation under the different scenarios. The distributions show precipitation increasing from northwest to southeast over China. The difference is obvious under RCP4.5 and RCP8.5, especially on the Tibetan Plateau. Detailed information is given in Tables 6 and 7. Precipitation increases from T1 to T2 and from T2 to T3 under the different RCP scenarios. The most significant increase occurs under RCP8.5 from T2 to T3. The smallest increase is found under RCP2.6 from T2 to T3, where precipitation increases by 2.12 mm. On the whole, the most notable change occurs under the RCP8.5 scenario.

Fig. 5 Prediction of precipitation under RCP2.6 in (a) 2011–2040, (b) 2041–2070, and (c) 2071–2100

Fig. 6 Prediction of precipitation under RCP4.5 in (a) 2011–2040, (b) 2041–2070, and (c) 2071–2100

Fig. 7 Prediction of precipitation under RCP8.5 in (a) 2011–2040, (b) 2041–2070, and (c) 2071–2100

Table 6 Simulation information of precipitation in each period under different RCP scenarios
Table 7 Precipitation change under different RCP scenarios

5 Conclusion

This paper proposes a new downscaling method that combines a local regression method, GWR, with a recently developed interpolator, HASM, and makes effective use of both the CMIP5 outputs and the observed climate records. Different ways of using the CMIP5 results and the station data are compared using datasets distributed randomly across China and the Jiangxi area. Four widely used interpolation methods are also compared to find the best way of modifying the residuals produced by the local regression method. Future climate change scenarios are then simulated with the proposed method. The results indicate that GCM outputs cannot be applied directly in local-scale studies. The technique that builds the transfer function from ground observations produces large uncertainties in the final results. The best result is obtained when the method uses station observations and model results effectively. The comparison of the four interpolators indicates that HASM performs better than Kriging, IDW, and Spline. We also find that precipitation simulation is strongly improved by residual correction using the meteorological observations. The success of the proposed technique lies in the effective use of the datasets and in the modification of the residual by HASM, which can fuse the results of other interpolators with point data effectively. Simulations of future precipitation show an overall increasing trend from T1 to T2 and from T2 to T3 under the different RCP scenarios. The most significant increase occurs under RCP8.5 from T2 to T3, while the smallest rise is found under RCP2.6 from T2 to T3. Choosing optimal predictors is important for predicting future scenarios with statistical downscaling methods. Further research will focus on the choice of explanatory variables for different temporal and spatial scales. Moreover, the local linear relationship in the GWR method should be extended to a nonlinear one because of the heterogeneity of precipitation, especially in mountain areas.