Introduction

Precipitation, as a fundamental component of the global water cycle, is a key parameter in ecology, hydrology and meteorology (Goovaerts 2000; Langella et al. 2010; Li and Shao 2010; Antonellini et al. 2014; Samper et al. 2014). Understanding and quantifying the spatial variability of precipitation is of key importance in hydrological studies, as precipitation drives most hydrological, environmental and agricultural processes. However, strong precipitation gradients over short distances are difficult to capture with point measurements from meteorological stations. Stations are generally located in areas that are readily accessible, and station density is usually too low for conventional spatial interpolation techniques (Celleri et al. 2007; Ward et al. 2011). In recent years, the development of remote sensing and geographic information technology has provided new methods of precipitation observation (Michaelides et al. 2009). Satellite precipitation data have been widely evaluated and shown to perform well (Dinku et al. 2007), and have been used in many applications such as hydrological modeling (Li et al. 2012; Su et al. 2008; Swenson and Wahr 2009), flood prediction (Li et al. 2009), land cover (Cho et al. 2014), rainfall erosivity estimation (Vrieling et al. 2010) and climatological studies (Islam and Uyeda 2007). However, studies show that different remote sensing datasets perform differently in China (Gao and Liu 2013; Kan et al. 2013). Moreover, the temporal coverage of remote sensing data is too short to resolve decadal trends and variability in China.

In the World Climate Research Programme (WCRP), different global climate models (GCMs) participate in the Coupled Model Intercomparison Project Phase 5 (CMIP5). The CMIP5 datasets have been used for the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change. These GCM simulations have demonstrated the ability to generally replicate the precipitation trend over the second half of the twentieth century, and can provide precipitation over longer time scales. However, GCMs have so far been too coarse to resolve regional-scale detail. A number of studies have been carried out to connect climate change at the large scale with climate change at the regional scale. The most straightforward approach is to interpolate, linearly or with more sophisticated methods, between the large-scale grid points closest to the region in order to infer regional-scale values. This approach has attracted a lot of criticism, since the model resolution is considered too coarse and the model performance too poor to allow interpolation of the results. To overcome the problems with direct interpolation, the approach termed downscaling can be pursued. It is based on the understanding that the large-scale information provided by standard coarse-grid GCMs may be postprocessed together with regional information to specify the regional details of the present climate and its sensitivity to changes in atmospheric composition or other external anomalies. Downscaling methods are usually classified as either dynamical or statistical.
Dynamical downscaling involves the use of high-resolution, limited-area climate models within the domain of interest, whereas statistical downscaling uses relatively simple statistical models to represent the link between atmospheric circulation variables, presumably well simulated by the GCMs, and local weather variables such as precipitation and temperature (Wilby and Wigley 1997; Fowler et al. 2007; Tareghian and Rasmussen 2013; Duan and Mei 2014). Statistical downscaling is widely used because it is easy and fast to apply (Fowler et al. 2007; Haylock et al. 2006; Barfus and Bernhofer 2014). It is a two-step process consisting of (1) the development of statistical relationships between local climate variables and large-scale predictors and (2) the application of these relationships to the output of GCM experiments to simulate local climate characteristics in the future. The two main challenges in statistical downscaling are the determination of the functional relationship and the identification of the predictor variables that convey the most relevant information about the predictand and the climate change signal.

Although there is a large body of literature in which different downscaling methods have been compared (Mehrotra et al. 2004; Diaz-Nieto and Wilby 2005; Frost et al. 2011; Liu et al. 2012), very few of these studies have compared downscaling methods in terms of how the input data are used. Here, we present two statistical downscaling methods and compare them to identify the better one for China. We use meteorological station information over China to downscale the CMIP5 simulation output. Both statistical downscaling methods used here involve two steps: (1) determining a local linear model by the geographically weighted regression (GWR) method for every location in the prediction domain, and (2) using the High Accuracy Surface Modeling (HASM) method to modify the residual produced by the first step. The major difference between them is the data used in the two steps. We then use a separate dataset in Jiangxi province and 10 % of the national-scale data to validate the results. Conclusions are given in the final section.

Study area and data

China is located in East Asia and is the third largest country on Earth. China’s topography varies enormously from high mountainous regions to inhospitable desert zones and flat, fertile plains. It is a predominantly mountainous country with a very distinct structural pattern. The extremely varied landforms of China affect the climate conditions in various ways. Precipitation over China exhibits complex structures in space and time. Large interannual variability causes local precipitation to fluctuate from year to year, and floods and droughts often occur in the same season of a year over different regions. Precipitation over China has distinct seasonal characteristics and is largely controlled by the monsoon circulation. Traditionally, the time from mid-May to the end of August has been defined as the East Asian summer monsoon season, which results in remarkably variable precipitation for the whole region (Wang and Li 2007).

The historical precipitation data of 752 stations across China for the period 1976–2005 were obtained from the national meteorological network and subjected to quality control. The sampling periods of the meteorological stations are not synchronous. Only 712 stations with more than 20 complete years were selected, with the exception of 30 locations in the west of China that have between 15 and 25 complete years. We withheld 10 % of the selected stations from the downscaling calculations and used them to validate the results. We also used the meteorological stations in Jiangxi province to validate the results (Fig. 1). The WCRP’s Coupled Model Intercomparison Project Phase 5 (CMIP5) multi-model datasets (Moss et al. 2008) were used for the period 1976–2005 at a resolution of \( 1^{\circ} \times 1^{\circ} \). Output from 21 climate models was selected for the climate change projections in China under the Representative Concentration Pathways (RCP) scenarios. The selected models include both twentieth century climate simulations and twenty-first century climate projections under the RCP2.6, RCP4.5, and RCP8.5 scenarios.
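
The withholding of 10 % of the stations for validation can be sketched as follows. This is only an illustration: the text does not describe how the authors drew the validation stations, so the random shuffle, the seed and the function name `split_stations` are assumptions.

```python
import random


def split_stations(station_ids, holdout_fraction=0.1, seed=0):
    """Randomly split station identifiers into fitting and validation sets.

    The shuffle, the seed and this function name are illustrative
    assumptions; the paper does not state how the 10 % was selected.
    """
    ids = list(station_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_holdout = round(len(ids) * holdout_fraction)
    # First n_holdout shuffled ids are withheld; the rest are used for fitting.
    return ids[n_holdout:], ids[:n_holdout]


# 712 selected stations, identified here by hypothetical integer ids.
fit_ids, validation_ids = split_stations(range(712))
```

With 712 stations and a 10 % holdout, this split leaves 641 stations for fitting, which matches the number of observations reported for the second method in the results.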

Fig. 1 Spatial distribution of the meteorological network in China

Method

The statistical downscaling method used in this study can be summarized as,

$$ {\text{Pre}}_{\text{sim}} = {\text{Pre}}_{\text{downscale}} + {\text{Pre}}_{\text{res}} $$
(1)

where \( \text{Pre}_{\text{sim}} \) is the final result, \( {\text{Pre}}_{\text{downscale}} \) is the downscaling result obtained by the GWR method, and \( \text{Pre}_{\text{res}} \) is the residual produced by GWR, which is interpolated by HASM. The two downscaling methods differ in the data used for \( {\text{Pre}}_{\text{downscale}} \), and thus for \( \text{Pre}_{\text{res}} \). We denote the result of the first method by \( \text{Pre}_{\text{sim1}} \) and that of the second by \( \text{Pre}_{\text{sim2}} \). For \( \text{Pre}_{\text{sim1}} \), we use the CMIP5 output to form the regression function and obtain \( {\text{Pre}}_{\text{downscale1}} \), and employ station data to modify the residual and obtain \( {\text{Pre}}_{\text{res1}} \). For \( \text{Pre}_{\text{sim2}} \), we use meteorological station information to establish a statistical transfer function, with latitude, longitude, elevation, and impact coefficient of aspect as independent variables, to produce \( {\text{Pre}}_{\text{downscale2}} \), and employ the CMIP5 results to modify the residual and obtain \( {\text{Pre}}_{\text{res2}} \). The second method \( \text{Pre}_{\text{sim2}} \) has been widely used in climate change research in recent years (Yue 2011; Wang et al. 2012; Fan et al. 2012).

Geographically weighted regression method

Due to the large gradients in precipitation means and variances in China, it is common practice to transform observed precipitation first:

$$ \overline{\overline{{\text{Pre}}_{i}}} = \frac{{\text{Pre}}_{i}}{\max \left\{ {\text{Pre}}_{k} : k = 1, \ldots ,n \right\}}, $$
(2)

where \( {\text{Pre}}_{i} \) is the CMIP5 simulation value in the first method or the station data in the second method, \( \overline{\overline{{{\text{Pre}}_{i} }}} \) is the transformed data, \( n \) is the number of grids of CMIP5 results or the number of stations. This process can limit extreme values in the results.
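
A minimal sketch of the scaling in Eq. (2), assuming the input is a plain list of positive precipitation values; the function name `max_normalize` and the sample values are hypothetical.

```python
def max_normalize(values):
    """Divide every value by the sample maximum, as in Eq. (2)."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("the maximum must be positive to scale precipitation")
    return [value / peak for value in values]


# Hypothetical annual precipitation values in mm, not data from the study.
annual_precip_mm = [120.0, 640.0, 1600.0]
scaled = max_normalize(annual_precip_mm)
```

After scaling, all values lie in the interval (0, 1] when the inputs are positive, so the largest value maps to 1.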

Then, we apply the Box–Cox transform to \( \overline{\overline{{{\text{Pre}}_{i} }}} \), which can give a more nearly normal distribution and/or improved predictions (Box and Cox 1964; Sakia 1992). The transformation is

$$ \overline{{\text{Pre}}_{i}} = \left\{ \begin{array}{ll} \ln \overline{\overline{{\text{Pre}}_{i}}}, & \delta = 0 \\ \dfrac{\left( \overline{\overline{{\text{Pre}}_{i}}} \right)^{\delta} - 1}{\delta}, & \delta \ne 0 \end{array} \right. $$
(3)

where \( \overline{{{\text{Pre}}_{i} }} \) is the Box–Cox transformed data and \( \delta \) is a parameter selected so that \( \overline{{{\text{Pre}}_{i} }} \) is approximately normally distributed and thus satisfies the assumption of the GWR method (Fotheringham et al. 2002). In this paper, \( \delta = 0.4 \) in the first method and \( \delta = 0.48 \) in the second method. Studies have shown that this process avoids negative values in the results and is necessary for precipitation interpolation (Yue et al. 2013).
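
The transform in Eq. (3) can be sketched as follows. The forward function follows the equation directly; the inverse is not given in the text and is included here as an assumption about how results would be mapped back to the scaled precipitation. Function names are hypothetical.

```python
import math


def box_cox(value, delta):
    """Box-Cox transform of a strictly positive value, following Eq. (3)."""
    if value <= 0:
        raise ValueError("Box-Cox requires a strictly positive input")
    if delta == 0:
        return math.log(value)
    return (value ** delta - 1.0) / delta


def inverse_box_cox(transformed, delta):
    """Map a transformed value back (assumed step, not stated in the paper)."""
    if delta == 0:
        return math.exp(transformed)
    base = delta * transformed + 1.0
    if base <= 0:
        raise ValueError("value lies outside the range of the transform")
    return base ** (1.0 / delta)


delta = 0.4  # value reported for the first method
scaled_precip = 0.25  # hypothetical scaled value from Eq. (2)
transformed = box_cox(scaled_precip, delta)
recovered = inverse_box_cox(transformed, delta)
```

One plausible reading of the remark about negative values is that, whenever the back-transform is defined, it returns a non-negative number, so predictions made in the transformed space cannot map back to negative precipitation.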

It is inappropriate to assume that the same linear relationship holds everywhere, especially where orographic enhancement occurs. Unlike the ordinary linear regression model, GWR (Brunsdon et al. 1996; Loader 2004) was developed to deal with non-stationarity in the regression context, which is especially important for characterizing highly variable precipitation within China. The GWR method has been successfully used in precipitation research (Brunsdon et al. 2001), and its formulation can be written as

$$ {\text{Pre}}_{\text{downscale}} = d_{0,0} (x_{i} ,y_{j} ) + \sum\limits_{i,j = 1}^{N} {a_{i,j} d_{i,j} (x_{i} ,y_{j} )} $$
(4)

where \( {\text{Pre}}_{\text{downscale}} \) is the downscaled value at grid box \( (i,j) \) on the finer scale; \( d_{0,0} \left( {x_{i} ,y_{j} } \right) \) is the intercept; \( a_{i,j} \) is the explanatory variable and \( d_{i,j} \left( {x_{i} ,y_{j} } \right) \) is the corresponding coefficient, which is a function of position; and \( x_{i} \) and \( y_{j} \) are the longitude and latitude, respectively. We select the independent variables from latitude, longitude, elevation, impact coefficient of aspect and sky view factor according to the adjusted \( R^{2} \) of the GWR fit. In this study, the most influential factors are latitude, longitude, elevation and impact coefficient of aspect, with \( R^{2} \) equal to 0.92 for the first method and 0.91 for the second method.
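
The idea of position-dependent coefficients in Eq. (4) can be illustrated with a small weighted least-squares sketch. It is not the authors' implementation: the Gaussian kernel, the fixed `bandwidth`, the Euclidean distance in coordinate units, the function names and the sample data are all assumptions made for illustration.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular local system; try a wider bandwidth")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[row][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def gwr_predict(target_xy, target_predictors, sample_xy,
                sample_predictors, sample_values, bandwidth):
    """Fit a local linear model around target_xy and evaluate it there."""
    # Gaussian distance-decay weights (assumed kernel; the paper names none).
    weights = [math.exp(-0.5 * (math.dist(target_xy, xy) / bandwidth) ** 2)
               for xy in sample_xy]
    design = [[1.0] + list(p) for p in sample_predictors]  # intercept column
    m = len(design[0])
    # Weighted normal equations: (X^T W X) beta = X^T W y.
    xtwx = [[sum(w * row[a] * row[b] for w, row in zip(weights, design))
             for b in range(m)] for a in range(m)]
    xtwy = [sum(w * row[a] * y
                for w, row, y in zip(weights, design, sample_values))
            for a in range(m)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0] + sum(b * p for b, p in zip(beta[1:], target_predictors))


# Hypothetical samples: coordinates, one predictor (e.g. elevation), values.
sample_xy = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
sample_predictors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
sample_values = [2.0, 5.0, 8.0, 11.0, 14.0]  # exactly 2 + 3 * predictor
estimate = gwr_predict((0.2, 0.3), [1.5], sample_xy,
                       sample_predictors, sample_values, bandwidth=1.0)
```

Because nearby samples receive larger weights, the fitted coefficients change from one target location to another, which is how the method accommodates non-stationary relationships. In this toy example the data are exactly linear, so every local fit recovers the same line.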

HASM

As an innovative surface modeling method (Yue 2011), HASM is based on the fundamental theorem of surfaces, which ensures that a surface is uniquely defined by its first and second fundamental coefficients. The first fundamental coefficients reflect the local details of the surface, and the second fundamental coefficients capture its macro-scale information. The equation of HASM is the following symmetric positive definite linear system (Zhao and Yue 2014),

$$ Wx^{n + 1} = v^{n} $$
(5)

where \( W = A^{T} A + B^{T} B + C^{T} C + \lambda^{2} S^{T} S \), \( v = A^{T} d + B^{T} q + C^{T} p + \lambda^{2} S^{T} k \), and \( \lambda \) is a suitable parameter. The preconditioned conjugate gradient method can be used to solve Eq. (5), and the solution \( x \) is the simulated value of the residual \( {\text{Pre}}_{\text{res}} \) in Eq. (1).
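
The structure of Eq. (5) can be illustrated by assembling \( W \) and \( v \) from small matrices and solving the system with a conjugate gradient iteration. The matrices \( A, B, C, S \) and vectors \( d, q, p, k \) are not defined in this section, so the tiny values below are hypothetical stand-ins rather than HASM's actual discretization. The Jacobi (diagonal) preconditioner is likewise an assumption, since the text does not say which preconditioner is used.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def assemble(blocks, n):
    """Return W = sum(w * M^T M) and v = sum(w * M^T rhs) over the blocks."""
    W = [[0.0] * n for _ in range(n)]
    v = [0.0] * n
    for M, rhs, weight in blocks:
        cols = list(zip(*M))  # columns of M
        for i in range(n):
            v[i] += weight * dot(cols[i], rhs)
            for j in range(n):
                W[i][j] += weight * dot(cols[i], cols[j])
    return W, v


def jacobi_pcg(W, v, tol=1e-10, max_iter=200):
    """Conjugate gradients with a diagonal preconditioner for SPD systems."""
    x = [0.0] * len(v)
    r = list(v)  # residual v - W x for the zero initial guess
    inv_diag = [1.0 / W[i][i] for i in range(len(v))]
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Wp = matvec(W, p)
        alpha = rz / dot(p, Wp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, Wp)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical stand-ins for the HASM matrices on three unknowns.
A, d = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]], [1.0, 1.0]
B, q = [[1.0, -2.0, 1.0]], [0.0]
C, p_vec = [[-1.0, 0.0, 1.0]], [2.0]
S, k = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [5.0, 7.0]
lam = 10.0
W, v = assemble([(A, d, 1.0), (B, q, 1.0), (C, p_vec, 1.0),
                 (S, k, lam ** 2)], n=3)
solution = jacobi_pcg(W, v)
```

The stand-in values are chosen to be mutually consistent, so the solver should return a vector close to (5, 6, 7). For a real grid, \( W \) is large and sparse, and a sparse matrix representation would be needed instead of nested lists.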

Results and discussion

We first compare the two methods in Table 1, where \( \text{Pre}_{\text{ms}} \) denotes the CMIP5 output.

Table 1 Comparison of two downscaling methods

Three indices, the mean absolute error (MAE), mean relative error (MRE) and root mean square error (RMSE), were calculated from the station value and the downscaled value at each validation site:

$$ {\text{MAE}} = \frac{1}{N}\sum\limits_{k = 1}^{N} {\left| {{\text{Pre}}_{{\text{sim}},k} - {\text{Pre}}_{{\text{obs}},k} } \right|} ,\;{\text{MRE}} = \frac{1}{N}\sum\limits_{k = 1}^{N} {\left| {\frac{{{\text{Pre}}_{{\text{sim}},k} - {\text{Pre}}_{{\text{obs}},k} }}{{{\text{Pre}}_{{\text{obs}},k} }}} \right|} ,\;{\text{RMSE}} = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {\left( {{\text{Pre}}_{{\text{sim}},k} - {\text{Pre}}_{{\text{obs}},k} } \right)^{2} } } , $$

where \( N \) is the number of validation sites, and \( {\text{Pre}}_{{\text{sim}},k} \) and \( {\text{Pre}}_{{\text{obs}},k} \) are the downscaled and observed (station) values at site \( k \).
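
The three indices can be computed as in the following sketch, which assumes paired lists of simulated and observed values with non-zero observations; the function name `error_metrics` and the sample numbers are hypothetical.

```python
import math


def error_metrics(simulated, observed):
    """Return (MAE, MRE, RMSE) for paired simulated and observed values."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [s - o for s, o in zip(simulated, observed)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    # MRE divides by each observation, so observations must be non-zero.
    mre = sum(abs(e / o) for e, o in zip(errors, observed)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, mre, rmse


# Hypothetical validation values in mm, not results from the study.
mae, mre, rmse = error_metrics([110.0, 180.0], [100.0, 200.0])
```

MAE and RMSE are in the units of precipitation, while MRE is dimensionless, which makes it easier to compare wet and dry regions.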

The results show that, for both validation datasets, the first method is much better than the second on all three error indices. Based on the validation dataset in Jiangxi province, the accuracy of the downscaling method \( \text{Pre}_{\text{sim2}} \) is even worse than that of the CMIP5 output. Scatter plots of observed against predicted precipitation (Fig. 2) suggest that the first downscaling method estimates the annual mean precipitation quite reliably, as shown in Fig. 2a and c. With the second method, many simulated points lie relatively far from the line \( y = x \): \( \text{Pre}_{\text{sim2}} \) clearly underestimates precipitation at the national-scale points and overestimates it in Jiangxi province (see Fig. 2b, d). The correlation coefficients between predicted and observed values are 0.97 for \( \text{Pre}_{\text{sim1}} \) and 0.73 for \( \text{Pre}_{\text{sim2}} \) for the withheld 10 % of stations across China, and 0.75 and 0.71, respectively, in Jiangxi province.

Fig. 2 Observed and estimated precipitation using different methods: a \( \text{Pre}_{\text{sim1}} \) for 10 % points from China, b \( \text{Pre}_{\text{sim2}} \) for 10 % points from China, c \( \text{Pre}_{\text{sim1}} \) for points from Jiangxi province, d \( \text{Pre}_{\text{sim2}} \) for points from Jiangxi province

Figure 3 illustrates the downscaling results. Because of the large errors in the original CMIP5 output (Fig. 3a), especially in the southeast of the Tibetan Plateau, the second method, which uses the CMIP5 output to modify the residual, performs worse than the first. The spatial patterns in Fig. 3a and c are similar, which shows that the second downscaling method did not correct the errors in the CMIP5 output. In contrast, Fig. 3b, which is produced by the first downscaling method, agrees well with the real situation; this is attributable to the meteorological station information. The accuracy of the results depends mainly on the first step in the downscaling process. For \( \text{Pre}_{\text{sim1}} \), about 969 CMIP5 output points, distributed evenly across China, are used. For \( \text{Pre}_{\text{sim2}} \), 641 meteorological observations are used for downscaling, and these are distributed extremely unevenly across China. The station density is higher in eastern China than in western China, so the stations do not reflect the characteristics of precipitation in China well. The number and distribution of the stations therefore limit the accuracy of the downscaling results. The evenly distributed CMIP5 points and the local regression method, GWR, which accounts for the non-stationarity of precipitation, ensure the accuracy of the downscaling results in the first step. Furthermore, the original CMIP5 output is not accurate enough to be used directly, which means that station data must be introduced to correct the local details, a task carried out by HASM. The comparison of the two downscaling methods also reveals that the second step in the downscaling process, that is, the residual correction, is critically important for improving accuracy.

Fig. 3 Comparison of the two downscaling methods: a original CMIP5 output, b the first method \( \text{Pre}_{\text{sim1}} \), c the second method \( \text{Pre}_{\text{sim2}} \)

Conclusion

Precipitation, as a fundamental component of the global water cycle, is a key parameter in ecology, hydrology and meteorology. Accurate precipitation data with high spatial resolution are crucial for improving our understanding of basin-scale hydrology. In this study, we compare two statistical downscaling methods using two validation datasets: one is scattered randomly across the whole of China and the other is located in Jiangxi province. As expected, the results show that GCM output cannot be used directly in climate change impact studies. In China, the second method \( \text{Pre}_{\text{sim2}} \), which establishes the regression model from the station data, tends to overestimate or underestimate the real values. The first method, which fuses the model data and station data effectively, has a clear advantage. The results also show the importance of the meteorological station data in the residual modification step. Because China is such a vast area and its precipitation is affected by many geographical and topographical factors, more accurate results may be obtained by using different explanatory variables in different regions, especially for short time scales. Beyond the variables considered in this study, further research should examine additional explanatory variables to obtain more accurate results.