1 Introduction

Northwest China accounts for one third of the national territory but holds only 5 % of China's water resources. Water shortage is one of the most important factors restricting economic development and driving ecological deterioration in the Northwest region. Against the background of continuing global warming, the trend of water resources in the arid Northwest must be carefully considered when formulating regional economic development strategies and managing water resources, and it has attracted considerable attention from the research community (Lan et al. 2005). The Heihe River Basin (HRB), the second largest inland river basin in Northwest China, is located at 98–101.5° E, 38–42° N, with a drainage area of about 130 km² and a total length of about 810 km. The land types of the HRB can be divided into three categories: mountain, oasis, and desert (Cheng et al. 2006). Water resources are at the core of research on the Heihe River and link its ecological and economic systems. Rainfall is one of the most important parameters of the hydrological system and the main source of water resources in the HRB. Therefore, rainfall data with high spatial resolution are important for guiding the sustainable development of society, the economy, and the environment, and for hydrological research in Northwest China. Global climate models (GCMs) or reanalysis data can be used to simulate rainfall scenarios. However, GCMs cannot provide reliable information at the regional scale. At the same time, ground observations are sparse and unevenly distributed, with only 19 meteorological stations in the HRB. Hence, downscaling methods can be applied to compensate for these deficiencies (Gao et al. 2008).

Downscaling techniques are often used to derive high spatial resolution rainfall datasets from GCMs or reanalysis data. They can be broadly divided into two approaches: dynamical downscaling (e.g., Druyan et al. 2002; Lenderink et al. 2007; Denis et al. 2002) and statistical downscaling (e.g., Wilby et al. 1997, 2000; von Storch et al. 1993; Maraun et al. 2011). Dynamical downscaling uses GCMs or reanalysis data as boundary and initial conditions to drive a regional climate model (RCM), which includes numerous physical processes and does not depend on observations. Numerous dynamical downscaling models have been used to simulate rainfall in the USA (e.g., Harding et al. 2014), West Africa (e.g., Siegmund et al. 2014), Southwest Asia (e.g., Xu et al. 2012), and Europe (e.g., Murphy 1999; Schmidli et al. 2007). In the HRB, some RCMs have been applied to predict regional climate change (e.g., Pan et al. 2012; Gao et al. 2006, 2007; Liu et al. 2008). However, these RCMs adopted international standard parameters without considering the effects of the complex terrain and surface features (Xiong et al. 2013).

Statistical downscaling establishes statistical relationships between large-scale GCM or reanalysis variables (predictors) and local-scale meteorological variables (predictands) and then applies these relationships to obtain time series of the predictands from the predictors. Numerous statistical downscaling methods have been applied to simulate rainfall in many regions. Von Storch et al. (1993) used canonical correlation analysis (CCA) to construct a simple statistical regression model to simulate wintertime rainfall over the Iberian Peninsula in Europe. The analog method was used to simulate rainfall in the USA (Zorita et al. 1994). Principal component regression models were used to simulate winter rainfall in Southern Australia (Li et al. 2008). However, statistical downscaling has not yet been applied to the study of the hydrological cycle in the HRB, where meteorological observations are sparse and unevenly distributed.

Numerous studies have compared the skill of statistical and dynamical downscaling methods in simulating rainfall in many regions (e.g., Murphy 1999; Mearns et al. 1999; Mehrotra et al. 2014). Rainfall in the HRB is mainly concentrated in the rainy season, from May to September (Zhang et al. 2008). Therefore, in this paper, a dynamical downscaling model (DDM) with a high resolution of 3 km was built based on the Regional Integrated Environmental Model System (RIEMS 2.0). The observed monthly rainfall in the HRB (predictands) and 14 reanalysis variables (predictors) were used to establish a statistical downscaling model (SDM) by the stepwise regression method. Monthly rainfall in the rainy season in the HRB over the period 2003–2012 was simulated by SDM and DDM and compared with the observed datasets (OBS). The main goals are to systematically compare the capability of the two downscaling models to simulate monthly rainfall and to explore their advantages and disadvantages in the HRB.

2 Data and methods

2.1 Data

2.1.1 Observation station datasets

Monthly rainfall from 10 meteorological observation stations, selected from the 19 stations, some of which have missing values, is used as the predictand to fit the statistical model (Fig. 1, Table 1). The data from 1971 to 2012 were obtained from the Chinese Meteorological Data Sharing Service System (http://cdc.cma.gov.cn).

Fig. 1 Spatial distribution of meteorological stations in the HRB

Table 1 Coordinates and altitudes of stations

2.1.2 Predictor selection

In this study, sea level pressure (SLP); wind speed and direction at 850, 700, and 500 hPa (U/V850, U/V700, U/V500); geopotential height at 1000, 850, 700, and 500 hPa (H1000, H850, H700, H500); and specific humidity at 850, 700, and 500 hPa (S850, S700, S500) are selected as predictors following Wetterhall et al. (2006). These 14 predictors, with a resolution of 2.5° × 2.5°, are obtained from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis project (Kalnay et al. 1996).

2.2 Statistical downscaling model

First, monthly rainfall over the period 1971–2002 is used as the training sample to establish the statistical model. The predictors and predictands are standardized as follows:

Y = (X - b)/c, (1)

where Y is the standard value; X is a predictor (SLP, U/V850, U/V700, U/V500, H1000, H850, H700, H500, S850, S700, or S500) or a predictand (monthly rainfall); b is the mean monthly value of X; and c is the standard deviation of X.
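As a minimal sketch of Eq. (1), the snippet below standardizes a monthly series against the mean and standard deviation of its own calendar month. The function name `standardize_by_month`, the sample values, the per-calendar-month grouping, and the use of the population standard deviation are assumptions made for illustration; the text does not specify which estimator was used.

```python
from statistics import mean, pstdev


def standardize_by_month(values, months):
    """Apply Y = (X - b) / c, with b and c computed per calendar month."""
    # Group the values by calendar month so each month gets its own b and c.
    by_month = {}
    for v, m in zip(values, months):
        by_month.setdefault(m, []).append(v)
    stats = {m: (mean(vs), pstdev(vs)) for m, vs in by_month.items()}

    out = []
    for v, m in zip(values, months):
        b, c = stats[m]
        # A constant series has c == 0; return 0.0 instead of dividing by zero.
        out.append((v - b) / c if c > 0 else 0.0)
    return out


# Hypothetical July and August rainfall totals (mm) for three years.
rain = [30.0, 50.0, 40.0, 20.0, 20.0, 50.0]
month = [7, 7, 7, 8, 8, 8]
z = standardize_by_month(rain, month)
```

After this step each calendar month has zero mean and unit variance, so predictors with very different units (hPa, m, kg/kg) enter the later steps on a common scale.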

Then, principal component analysis (PCA) is applied to each of these 14 predictors. The general aim of PCA is to simplify a spatial–temporal dataset by transforming it into spatial patterns of variability and temporal projections of these patterns. The spatial patterns are the empirical orthogonal functions (EOFs), and the associated temporal projections, the principal components (PCs), are the temporal coefficients of the EOF patterns. PCA helps to reduce the number of variables in a dataset without losing much of the information, which is achieved by retaining only the first few PCs (Ruping et al. 2002). Therefore, the first four PCs of every predictor were selected to establish the statistical models.
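The PCA step can be sketched with a singular value decomposition. This is only an illustration under stated assumptions: the function name `leading_pcs`, the random input field, and the 5 × 6 grid are hypothetical, and the paper does not say which PCA implementation or area weighting was used.

```python
import numpy as np


def leading_pcs(field, n_pcs=4):
    """Return the first n_pcs PCs and EOFs of a (time, space) array."""
    # Remove the time mean at every grid point so PCA describes anomalies.
    anomalies = field - field.mean(axis=0)
    # SVD: rows of vt are the EOF patterns; u * s gives the PC time series.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    eofs = vt[:n_pcs]
    # Fraction of total variance carried by each retained component.
    explained = s[:n_pcs] ** 2 / np.sum(s ** 2)
    return pcs, eofs, explained


# Hypothetical standardized predictor: 32 time steps on a flattened 5 x 6 grid.
rng = np.random.default_rng(0)
field = rng.standard_normal((32, 30))
pcs, eofs, explained = leading_pcs(field)
```

Applying this to each of the 14 predictors and keeping four PCs from each would give the 56 candidate variables used in the regression step.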

Finally, the stepwise regression method is used to develop the model as follows:

\( Y(t)=\sum_{n=1}^{N}\alpha_n X_n(t)+\varepsilon_t \), (2)

where Y is the predictand (monthly rainfall), \( X_n \) is the nth predictor variable (N = 56, i.e., the first four PCs of each of the 14 predictors), \( \alpha_n \) is the regression coefficient, and \( \varepsilon_t \) is the residual not described by the statistical model.
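The paper does not describe the entry or removal criteria of its stepwise regression, so the following is only a simplified forward-selection sketch of Eq. (2), not the authors' procedure. The function name `forward_stepwise`, the F-to-enter threshold, and the synthetic data are assumptions; inputs are taken to be standardized, so no intercept is fitted.

```python
import numpy as np


def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection for y = sum_n alpha_n * X_n + eps (no intercept)."""
    n, p = X.shape
    selected = []
    rss = float(y @ y)  # residual sum of squares of the empty model
    while len(selected) < min(p, n - 1):
        # Try each unused column and keep the one giving the lowest RSS.
        best_j, best_rss = None, rss
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            alpha, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ alpha
            rss_j = float(resid @ resid)
            if rss_j < best_rss:
                best_j, best_rss = j, rss_j
        if best_j is None:
            break
        # Partial F statistic of the best candidate; stop when it is too small.
        dof = n - len(selected) - 1
        f_stat = (rss - best_rss) / max(best_rss / dof, 1e-12)
        if f_stat < f_enter:
            break
        selected.append(best_j)
        rss = best_rss
    if selected:
        alpha, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    else:
        alpha = np.array([])
    return selected, alpha


# Synthetic example: 56 candidate predictors, only columns 3 and 10 matter.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 56))
y = 0.8 * X[:, 3] - 0.6 * X[:, 10] + 0.05 * rng.standard_normal(200)
selected, alpha = forward_stepwise(X, y, f_enter=10.0)
```

With 56 candidates and a short training record, a low entry threshold tends to admit spurious predictors, which is why the threshold here is set fairly high; a full stepwise scheme would also re-test previously entered variables for removal.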

For simplicity, only the selected predictors for the month of July at 10 stations are shown in Table 2. It can be seen that S700 and S850 are the most important predictors for rainfall in the HRB.

Table 2 Variables selected for the month of July at 10 stations

2.3 Dynamical downscaling model

A high-resolution RCM was built for the HRB. The key land surface parameter datasets provided by the Environmental and Ecological Science Data Center for West China, National Natural Science Foundation of China, which include soil water content, soil water potential, soil hydraulic conductivity, field capacity, and wilting point moisture content, were recalibrated (Xiong et al. 2013). Initial and lateral boundary conditions for wind, temperature, water vapor, and surface pressure were extracted from ERA-Interim reanalysis data (Dee et al. 2011), downloaded from the European Centre for Medium-Range Weather Forecasts data server. The simulation domain encompassed the entire HRB and was centered at 40.30° N, 99.50° E; the horizontal mesh consisted of 181 and 221 grid points in the longitudinal and latitudinal directions, respectively, with a horizontal resolution of 3 km. The high-resolution RCM simulated rainfall in the HRB for the period from January 2003 to December 2012 (Xiong et al. 2013).

3 Results

3.1 Spatial distribution of rainy-season rainfall

The annual rainfall in the HRB is 400–700 mm in the upstream Qilian Mountains, 100–200 mm in the midstream irrigated oases, and 15–50 mm in the downstream desert. The HRB is divided into three subregions according to the characteristics of rainfall: upstream (Yongchang and Tuole stations), midstream (Gaotai, Alashanyouqi, Jiuquan, and Zhangye stations), and downstream (Dingxin, Jinta, Ejinaqi, and Guaizihu stations) (Cheng et al. 2006).

Rainy-season rainfall in the three subregions over the period 2003–2012 is simulated by SDM and DDM. Table 3 shows that the observed rainfall in the upstream is 249.38 mm. Rainfall simulated by SDM is 219.25 mm with a bias of −12.09 %, and the root mean square error (RMSE) and mean absolute error (MAE) between SDM and OBS are 41.30 and 33.26 mm, respectively, whereas rainfall simulated by DDM is 225.92 mm with a bias of −9.42 %, and the RMSE and MAE between DDM and OBS are 42.09 and 35.52 mm, respectively. In terms of bias, DDM performs better than SDM in the upstream, although its RMSE and MAE are slightly larger.

Table 3 OBS and rainy-season rainfall simulated by SDM and DDM, bias, and RMSE and MAE between OBS and two models in the subregions of HRB

The observed rainfall in the midstream is 95.61 mm. Rainfall simulated by SDM is 99.35 mm with bias of 3.91 %, and RMSE and MAE between SDM and OBS are 20.14 and 18.10 mm, respectively; whereas rainfall simulated by DDM is 73.42 mm with bias of −23.22 %, and RMSE and MAE between DDM and OBS are 37.74 and 32.78 mm, respectively. Rainfall simulated by SDM is better than that by DDM in the midstream.

The observed rainfall in the downstream is 43.49 mm. Rainfall simulated by SDM is 38.75 mm with a bias of −10.89 %, and the RMSE and MAE between SDM and OBS are 14.44 and 12.60 mm, respectively, whereas rainfall simulated by DDM is 39.36 mm with a bias of −9.50 %, and the RMSE and MAE between DDM and OBS are 12.29 and 9.87 mm, respectively. There is little difference between the rainfall simulated by SDM and DDM in the downstream.

In general, both methods reasonably reproduce the rainy-season rainfall in the HRB. DDM has a smaller bias than SDM in the upstream (−9.42 % versus −12.09 %); SDM performs better than DDM in the midstream, with biases of 3.91 and −23.22 %, respectively; and there is little difference between SDM and DDM in the downstream, with biases of −10.89 and −9.50 %, respectively. Each method has its own advantages.
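The error measures used in this comparison can be sketched as follows. The formulas are assumptions, since the paper does not define them explicitly: bias is taken as the relative difference between simulated and observed totals, and RMSE and MAE are computed over paired values. The function names and the short `sim`/`obs` series are hypothetical; the upstream totals are the ones quoted in the text.

```python
import math


def percent_bias(sim_total, obs_total):
    """Relative bias (%) of a simulated total against the observed total."""
    return 100.0 * (sim_total - obs_total) / obs_total


def rmse(sim, obs):
    """Root mean square error over paired values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def mae(sim, obs):
    """Mean absolute error over paired values."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


# Upstream rainy-season totals quoted in the text (mm).
obs_up, sdm_up, ddm_up = 249.38, 219.25, 225.92
bias_sdm = percent_bias(sdm_up, obs_up)
bias_ddm = percent_bias(ddm_up, obs_up)

# Hypothetical paired series, only to exercise rmse() and mae().
sim = [1.0, 2.0, 3.0]
obs = [1.0, 2.0, 5.0]
example_rmse = rmse(sim, obs)
example_mae = mae(sim, obs)
```

Under this definition the quoted upstream totals give biases of roughly −12.1 % for SDM and −9.4 % for DDM, which agree with the −12.09 % and −9.42 % reported with Table 3 to within rounding of the totals.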

Figure 2 shows the time series of rainy-season rainfall. The observed rainfall in the upstream ranges from 1.40 to 1.98 mm/day. Rainfall simulated by SDM ranges from 1.19 to 1.63 mm/day, with a maximum bias of −28.64 % in 2011, whereas rainfall simulated by DDM ranges from 1.07 to 1.92 mm/day, with a maximum bias of −32.20 % in 2003.

Fig. 2 The time series of rainy-season rainfall (mm/day) in the subregions of HRB: a upstream, b midstream, and c downstream

The observed rainfall in the midstream ranges from 0.46 to 0.82 mm/day. Rainfall simulated by SDM ranges from 0.48 to 0.79 mm/day, and all of its biases are within ±50 %, whereas rainfall simulated by DDM ranges from 0.19 to 0.75 mm/day, with biases exceeding 50 % in magnitude in 2003–2005 and 2011.

The observed rainfall in the downstream ranges from 0.18 to 0.41 mm/day. Rainfall simulated by SDM ranges from 0.13 to 0.38 mm/day, with a positive bias larger than 50 % in 2006, whereas rainfall simulated by DDM ranges from 0.12 to 0.53 mm/day, with biases between −38.95 and 36.04 %.

In general, most biases of the two models are within ±50 %, which is consistent with the IPCC TAR (Houghton et al. 2001); thus, both SDM and DDM simulate the time series of rainy-season rainfall in the HRB well.

3.2 Monthly rainfall in the rainy season

Figure 3 shows monthly rainfall in the HRB. The observed rainfall in the upstream ranges from 0.94 (May) to 2.50 (July) mm/day. Rainfall simulated by SDM ranges from 0.81 (May) to 2.06 (July) mm/day, with biases between −19.87 and 1.84 %, whereas rainfall simulated by DDM ranges from 1.00 (May) to 2.17 (July) mm/day, with biases between −23.67 and 11.03 %. By month, rainfall simulated by DDM is better than that simulated by SDM in May and July.

Fig. 3 Monthly rainfall (mm/day) in the rainy season in the subregions of HRB: a upstream, b midstream, and c downstream

The observed rainfall in the midstream ranges from 0.35 (May) to 0.91 (July) mm/day. Rainfall simulated by SDM ranges from 0.38 (May) to 0.98 (July) mm/day, with biases between −24.78 and 15.65 %, whereas rainfall simulated by DDM ranges from 0.14 (May) to 1.03 (September) mm/day and tends to increase from month to month, with biases between −63.81 and 35.62 %, including negative biases larger than 50 % in magnitude in May and June. By month, rainfall simulated by SDM is closer to OBS than that simulated by DDM except in August.

The observed rainfall in the downstream ranges from 0.13 (May) to 0.47 (July) mm/day. Rainfall simulated by SDM ranges from 0.08 (May) to 0.50 (July) mm/day, with biases between −41.54 and 15.30 %, whereas rainfall simulated by DDM ranges from 0.07 (May) to 0.43 (September) mm/day, with biases between −47.43 and 33.29 %. By month, rainfall simulated by SDM is closer to OBS than that simulated by DDM except in August and September.

In general, most biases of monthly rainfall simulated by SDM and DDM are within ±50 %, which is consistent with the IPCC TAR. Therefore, both methods reasonably reproduce the monthly rainfall in the rainy season in the different subregions. Rainfall simulated by DDM is better than that simulated by SDM in May and July in the upstream, whereas rainfall simulated by SDM is closer to OBS except in August in the midstream and except in August and September in the downstream. In addition, the maximum and minimum values of OBS in the three subregions occur in July and May, respectively. However, DDM does not capture this pattern in the midstream and downstream.

3.3 Comparison with stations

Tuole meteorological observation station, a typical upstream station, is located at 98.42° E, 38.80° N. Its average annual rainfall exceeds 300 mm over the period 1971–2012. Figure 4 compares the July rainfall obtained by SDM and DDM with OBS. The correlation coefficient between DDM and OBS is 0.43, which is significant at the 1 % level, whereas that between SDM and OBS is 0.23.
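For readers who want to reproduce this kind of comparison, the sketch below computes a Pearson correlation coefficient and the Student t statistic commonly used to test it against zero. It is illustrative only: the paper does not state which significance test or sample size lies behind the reported 1 % level, and the function names and rainfall values here are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Student t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical July rainfall (mm/day): observed versus simulated.
obs = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5]
sim = [2.0, 3.0, 3.1, 3.6, 2.7, 2.9]
r = pearson_r(obs, sim)
t = t_statistic(r, len(obs))
```

The resulting t value would be compared with the critical value of the t distribution for n − 2 degrees of freedom at the chosen significance level; the same r can be significant or not depending on how many samples enter the comparison.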

Fig. 4 Rainfall (mm/day) in July in Tuole station

Figure 5 shows the monthly rainfall in the rainy season at Tuole, Zhangye, and Dingxin stations. The observed rainfall at Tuole station in the upstream ranges from 1.13 (May) to 3.33 (July) mm/day. Rainfall simulated by SDM ranges from 1.00 (May) to 2.56 (July) mm/day, with biases between −32.56 and 13.96 %, whereas rainfall simulated by DDM ranges from 0.80 (September) to 3.13 (July) mm/day, with biases between −38.72 and 3.54 %. Biases of DDM are smaller than those of SDM in May, June, and July.

Fig. 5 Monthly rainfall (mm/day) in a Tuole station, b Zhangye station, and c Dingxin station

The observed rainfall at Zhangye station in the midstream ranges from 0.50 (May) to 0.93 (September) mm/day. Rainfall simulated by SDM ranges from 0.36 (May) to 1.27 (September) mm/day, with biases between −28.39 and 44.87 %, whereas rainfall simulated by DDM ranges from 0.18 (May) to 1.19 (September) mm/day, with biases between −64.18 and 27.43 %, including significant underestimates in May and June with biases larger than 50 % in magnitude. Furthermore, as in the midstream as a whole, rainfall at Zhangye station simulated by DDM tends to increase from month to month. Biases of SDM are smaller than those of DDM in May, June, and September.

The observed rainfall at Dingxin station in the downstream ranges from 0.12 (May) to 0.58 (July) mm/day. Rainfall simulated by SDM ranges from 0.05 (May) to 0.58 (July) mm/day, with biases between −63.79 and 0.04 %, including biases larger than 50 % in magnitude in May and August, whereas rainfall simulated by DDM ranges from 0.08 (June) to 0.54 (September) mm/day, with biases between −68.59 and 40.23 %, including an underestimate with a bias larger than 50 % in magnitude in June. Biases of DDM are smaller than those of SDM in May, August, and September.

The multi-year average rainy-season rainfall at the three stations is listed in Table 4. The observed rainfall at Tuole station is 313.85 mm. Rainfall simulated by SDM is 272.55 mm with a bias of −13.16 %, and the RMSE and MAE between SDM and OBS are 56.41 and 43.01 mm, respectively, whereas rainfall simulated by DDM is 274.94 mm with a bias of −12.40 %, and the RMSE and MAE between DDM and OBS are 59.42 and 44.01 mm, respectively. There is little difference between the rainfall simulated by DDM and SDM.

Table 4 OBS and rainy-season rainfall simulated by SDM and DDM, bias, and RMSE and MAE between OBS and two models in three stations

The observed rainfall in Zhangye station is 113.19 mm. Rainfall simulated by SDM is 129.06 mm with bias of 14.02 %, and RMSE and MAE between SDM and OBS are 37.31 and 30.54 mm, respectively; whereas rainfall simulated by DDM is 96.70 mm with bias of −14.60 %, and RMSE and MAE between DDM and OBS are 46.08 and 37.82 mm, respectively.

The observed rainfall at Dingxin station is 52.97 mm. Rainfall simulated by SDM is 35.62 mm with a bias of −32.75 %, and the RMSE and MAE between SDM and OBS are 26.08 and 22.21 mm, respectively, whereas rainfall simulated by DDM is 41.92 mm with a bias of −19.34 %, and the RMSE and MAE between DDM and OBS are 25.24 and 20.72 mm, respectively. Rainfall at Dingxin station is reproduced better by DDM than by SDM.

In general, for multi-year mean rainy-season rainfall at individual stations, there is little difference between SDM and DDM at Tuole station in the upstream, with biases of −13.16 and −12.40 %, respectively; at Zhangye station, SDM overestimates rainfall with a bias of 14.02 %, whereas DDM underestimates it with a bias of −14.60 %; at Dingxin station, DDM performs better than SDM, with biases of −19.34 and −32.75 %, respectively. By month, biases of DDM are smaller than those of SDM in May, June, and July at Tuole station and in May, August, and September at Dingxin station, whereas biases of SDM are smaller than those of DDM in May, June, and September at Zhangye station.

4 Discussion and conclusion

Statistical and dynamical downscaling methods were used to simulate monthly rainfall in the rainy season in three subregions of the HRB. The results show the following: (1) Both methods reasonably reproduced the spatial pattern of rainy-season rainfall in the HRB. DDM had a smaller bias than SDM in the upstream (−9.42 % versus −12.09 %); SDM performed better than DDM in the midstream, with biases of 3.91 and −23.22 %, respectively; and there was little difference between SDM and DDM in the downstream, with biases of −10.89 and −9.50 %, respectively. By month, DDM performed better than SDM in May and July in the upstream, whereas SDM was closer to OBS except in August in the midstream and except in August and September in the downstream. (2) For multi-year mean rainy-season rainfall at individual stations, there was little difference between SDM and DDM at Tuole station in the upstream, with biases of −13.16 and −12.40 %, respectively; at Zhangye station, SDM overestimated rainfall with a bias of 14.02 %, whereas DDM underestimated it with a bias of −14.60 %; at Dingxin station, DDM performed better than SDM, with biases of −19.34 and −32.75 %, respectively. By month, biases of DDM were smaller than those of SDM in May, June, and July at Tuole station and in May, August, and September at Dingxin station, whereas biases of SDM were smaller than those of DDM in May, June, and September at Zhangye station. The correlation coefficient between DDM and OBS rainfall at Tuole station in July was 0.43, which is significant at the 1 % level.

Statistical downscaling methods in existing studies have been implemented in regions with sufficient meteorological observation stations and good rainfall records. SDM therefore has shortcomings when applied to regions that lack stations, such as the HRB. Future work should consider using satellite remote sensing data to assess the capability of SDM and DDM to simulate rainfall in the HRB.