1 Introduction

Simulations with general circulation models (GCMs) have shown, within the assumptions and limitations under which both the models and the underlying scenarios were developed, that the climate will continue to change in the twenty-first century to an extent unparalleled in recent human history. Global temperatures are projected to increase almost everywhere in the world (NRC 2010; Solomon et al. 2007), while projected precipitation changes differ regionally in the direction of their trends (Solomon et al. 2007). Changes in these two foremost drivers of the hydrological cycle can jeopardize an adequate supply of good-quality water to meet increasing demands. At the very least, they make careful management of this renewable but limited natural resource indispensable.

The coarse resolution of GCMs limits their predictive ability at regional and local scales. GCMs do not resolve the relief of the region of interest, and they often show bias at smaller scales (Segui et al. 2010). Therefore, studies of the hydrological cycle at the watershed scale require GCM simulation results to be downscaled to the local scale of interest. Dynamical downscaling with higher-resolution regional climate models (RCMs) is one alternative (Sunyer et al. 2012), but it is computationally intensive. Statistical downscaling methods impose a comparatively smaller computational burden and are widely used (Benestad et al. 2008; Dibike and Coulibaly 2005; Jones and Thornton 2013; Timbal et al. 2009).

Statistical downscaling is defined as the process of establishing a link between variables representing a large scale and variables representing a smaller scale (Benestad et al. 2008). Statistical downscaling methods include multiple regression, artificial neural networks, and empirical orthogonal function analysis, among others (Prudhomme et al. 2002). Wilby and Wigley (1997) review downscaling methods and their limitations. Hamlet et al. (2010) describe two commonly used statistical downscaling approaches and develop a third, hybrid approach. They also discuss the strengths and limitations of these approaches for various water planning applications. Wilby et al. (2004) present guidelines for applying climate scenarios developed with statistical downscaling methods.

Multiple regression methods are widely used. A software package, SDSM (statistical downscaling model), has been developed as a decision support tool for regional climate change studies (Wilby and Dawson 2007; Wilby et al. 2002). An automated regression-based statistical downscaling tool, ASD (automated statistical downscaling), is also available. ASD selects predictors automatically using backward stepwise regression and partial correlation coefficients (Hessami et al. 2008).

Statistical downscaling based on multiple regression uses large-scale atmospheric variables as predictors, such as mean sea level pressure, atmospheric circulation, stability, and moisture content. The predictands are the local variables of interest, such as temperature and precipitation, which are principal inputs to hydrological models. Predictor selection is of crucial importance and accounts for most of the work in statistical downscaling (Wilby and Dawson 2007). Once suitable regression equations are established, they can be applied to GCM outputs to obtain local estimates for different future climate scenarios.
In this process, weather or scenario generators produce ensembles of synthetic future data for the variables of local interest. Downscaling tools such as SDSM incorporate weather and scenario generators, which are used after the regression equations have been established (Wilby and Dawson 2007). Validation of the generated time series is as important as predictor selection. The regression equations need to be tested against a different data set to establish whether they remain applicable beyond the period over which they were calibrated. This process involves comparing the generated time series with observations, using statistical tools that compare means and variances (Khan et al. 2006a, b).

In this study, SDSM and ASD were used to downscale seven variables to five meteorological stations situated in and around the Porsuk Stream watershed in western inner Anatolia, Turkey. The variables were temperature (minimum, average, and maximum), precipitation, cloudiness, relative humidity, and wind speed. Predictors were taken from the National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis and the ERA-40 reanalysis. The predictors for the temperature and precipitation time series were analyzed with respect to their relative magnitudes and their spatial distribution within the watershed.

The scenario generator of SDSM was then used to generate projections into the future for three scenarios from the Special Report on Emissions Scenarios (SRES; A1B, A2, and B2). The projections were based on GCM simulation results from the Canadian Centre for Climate Modelling and Analysis (CGCM3.1(T63)) and the Met Office Hadley Centre (HadCM3) (Canadian Centre for Climate Modelling and Analysis 2012; Met Office Hadley Centre 2012; ECMWF ERA-40 data 2012; Kalnay et al. 1996; Kistler et al. 2001).

2 Material and methods

2.1 Study area and meteorological stations

The study area is the Porsuk Stream watershed, situated in the western part of the inner Anatolia Region of Turkey (Fig. 1). The watershed has a surface area of 5800 km², and its elevation ranges from 674 to 1766 m above sea level. The Porsuk Stream, after which the watershed is named, is 280 km long and has six comparatively shorter tributaries. The stream is impounded by the Porsuk Reservoir in the western part of the watershed. Land use in the watershed is principally agricultural, followed by mining. Irrigation water is withdrawn from the Porsuk Stream. The two large cities in the watershed, Eskişehir and Kütahya, have a combined population of around one million and are centers of urbanization and industrialization (Albek et al. 2011; Gungor and Goncu 2013).

Fig. 1 The location of the watershed and the meteorological stations

The watershed has a transitional climate. The western part is under the influence of a maritime climate and is around 1 °C warmer than the eastern part. The western areas also receive more precipitation, as they are closer to the moisture-supplying coastal areas. Moreover, the northwestern edge of the watershed lies at the end of a long valley system that draws moisture from the Black Sea and therefore shows higher humidity. The western part also records higher cloud cover than the eastern part, which is under continental influence.

The meteorological stations whose data were used in this study are large climatic stations operated by the Turkish State Meteorological Service. All major meteorological measurements are taken there at regular intervals, quality checked, and made available on the Service's website (TUMAS 2012). The locations and elevations of the stations, together with information about the data obtained from them, are listed in Table 1.

Missing values spanning a short period of a few days were filled by interpolation. A single missing day was filled by interpolating between neighboring values. Gaps longer than 1 day were filled by interpolating between neighboring values together with values from nearby meteorological stations. Where gaps spanned weeks or months, the corresponding month and year were omitted from subsequent statistical analysis. Missing values occurred mostly in the first years of the data period.

Table 1 Meteorological stations and observational data

2.2 GCM data and SRES scenarios

This study used four GCM and scenario combinations, hereafter referred to simply as scenarios. Two come from CGCM3.1(T63) of the Canadian Centre for Climate Modelling and Analysis, driven by SRES A1B and A2. The other two come from UKMO-HadCM3 of the Hadley Centre for Climate Prediction and Research/Met Office, driven by SRES A2 and B2 (Canadian Centre for Climate Modelling and Analysis, 2012; Met Office 2012; Nakicenovic et al. 2001).

The predictor time series for downscaling come from two sources. The first is the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) reanalysis (NCAR/NCEP Reanalysis 2012a, b), obtained through the Canadian Centre for Climate Modelling and Analysis website. The second is the ERA-40 reanalysis project of the European Centre for Medium-Range Weather Forecasts (Uppala et al. 2005). The reanalysis data set is thus a combination of two data sets. The precipitation data come from the ERA-40 project, and the remaining 24 predictors come from the NCEP/NCAR reanalysis (Table 2). This combination is hereafter referred to simply as reanalysis.

Bilinear interpolation was applied to the gridded GCM and reanalysis data to obtain values at the exact meteorological station coordinates. The predictands for downscaling were taken from the local meteorological stations (Table 1): minimum, average, and maximum temperature; precipitation; cloudiness; relative humidity; and wind speed.
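The interpolation step can be illustrated with a short sketch that interpolates a gridded field bilinearly to a point from the four surrounding grid nodes. This is a minimal sketch assuming a regular longitude-latitude grid. It is not the routine actually used in the study, and the function name, grid cell, station location, and corner values are all hypothetical.

```python
def bilinear(lon, lat, lon0, lon1, lat0, lat1, f00, f10, f01, f11):
    """Bilinearly interpolate a gridded field to (lon, lat) inside the cell
    [lon0, lon1] x [lat0, lat1].

    f00 is the value at (lon0, lat0), f10 at (lon1, lat0),
    f01 at (lon0, lat1), and f11 at (lon1, lat1).
    Assumes a regular longitude-latitude grid (hypothetical example).
    """
    if not (lon0 <= lon <= lon1 and lat0 <= lat <= lat1):
        raise ValueError("point lies outside the grid cell")
    tx = (lon - lon0) / (lon1 - lon0)  # fractional position in longitude
    ty = (lat - lat0) / (lat1 - lat0)  # fractional position in latitude
    return ((1 - tx) * (1 - ty) * f00 + tx * (1 - ty) * f10
            + (1 - tx) * ty * f01 + tx * ty * f11)


# Hypothetical 2.5-degree grid cell and station location
value = bilinear(31.0, 39.0, 30.0, 32.5, 37.5, 40.0, 172.5, 177.5, 180.0, 185.0)
print(round(value, 6))
```

Bilinear interpolation reproduces exactly any field that is linear in longitude and latitude. The hypothetical corner values above follow such a plane, so the interpolated station value equals that plane's value, apart from floating-point rounding.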

Table 2 General circulation model (GCM) and reanalysis data used in downscaling

The three SRES scenarios used differ in how they depict the conditions likely to prevail in the twenty-first century. The A1B scenario assumes a market-oriented world with the fastest economic growth among the scenarios and strong interaction between regions. The A2 scenario, like A1B, emphasizes economic development, but it is set in a strongly heterogeneous world. In this world, new technologies for abating the buildup of carbon dioxide in the atmosphere develop and spread less rapidly. The B2 scenario, like A2, emphasizes local solutions, while new technologies develop at a rate between those of the foregoing scenarios (Arnell et al. 2004; Nakicenovic et al. 2001).

None of the three scenarios takes into account worldwide initiatives to control greenhouse gas emissions. Moreover, it has been argued that the scenarios neglect supply-side limitations on fossil fuels, which are likely to constrain carbon dioxide emissions (Vernon et al. 2011). Each scenario implies different greenhouse gas emissions and ultimate atmospheric concentrations. As a result, the temperature increases projected by the general circulation models they drive also differ. The A2 scenario leads to the highest projected increase in average near-surface temperature, 3.4 °C over a century, followed by A1B at 2.8 °C and B2 at 2.4 °C (Nakicenovic et al. 2001).

2.3 Statistical downscaling with ASD and SDSM

The automated statistical downscaling (ASD) tool was used to find the optimal set of predictors for the subsequent downscaling with SDSM. ASD obtains the predictor set by backward stepwise regression and partial correlation coefficients (Hessami et al. 2008). Predictor selection begins with all terms in a multiple regression model. The least significant terms are then removed one at a time until all remaining terms are statistically significant. The partial F test decides whether a term is removed and is applied to every predictor at every step of the stepwise regression.

SDSM is a decision support tool for assessing local climate impacts using a statistical downscaling technique (Wilby and Dawson 2007; Wilby et al. 2002). For each predictand, multiple linear regression (MLR) equations are formed with the predictor variables, and an optimization algorithm finds the predictor coefficients. The predictor-predictand model structure may be monthly, seasonal, or annual. In this study, a monthly model structure was chosen, and MLR equations were constructed for every month.

SDSM supports two kinds of models. Unconditional models assume a direct link between the predictors and the predictand. Conditional models assume an intermediate process between the large-scale forcing and the local climatological variable. In this study, temperatures (daily minimum, average, and maximum), relative humidity, and wind velocity were treated as unconditional processes. Precipitation was modeled as a conditional process, because local precipitation amounts depend on wet-dry day sequences, which themselves depend on large-scale atmospheric variables (Kilsby et al. 2007; Wetterhall et al. 2009). Cloudiness was also modeled as a conditional process, because cloud formation depends on local pressure systems and humidity.
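The backward elimination loop described above can be sketched as follows. This is a simplified stand-in, not ASD itself. It refits ordinary least squares by solving the normal equations. It then computes the partial F statistic (one numerator degree of freedom) for dropping each remaining predictor, and removes the weakest predictor while its F value falls below a fixed F-to-remove threshold. The threshold of 4.0 is an assumed rule of thumb, not ASD's exact significance criterion. ASD's additional use of partial correlation coefficients is not reproduced, and all names are hypothetical.

```python
def ols_rss(X, y):
    """Residual sum of squares of an OLS fit. X is a list of rows (intercept included)."""
    p = len(X[0])
    # Normal equations (X'X) b = X'y
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
                c[r] -= f * c[col]
    beta = [c[i] / A[i][i] for i in range(p)]
    residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return sum(e * e for e in residuals)


def backward_stepwise(predictors, y, f_to_remove=4.0):
    """Backward elimination with partial F tests.

    predictors: list of predictor columns, each of length n.
    f_to_remove: assumed F threshold (rule of thumb, not ASD's exact criterion).
    Returns the indices of the retained predictors.
    """
    n = len(y)
    kept = list(range(len(predictors)))

    def design(cols):
        # Intercept plus the selected predictor columns, row by row
        return [[1.0] + [predictors[c][t] for c in cols] for t in range(n)]

    while kept:
        rss_full = ols_rss(design(kept), y)
        mse_full = rss_full / (n - len(kept) - 1)
        # Partial F (1 numerator degree of freedom) for dropping each predictor
        f_stats = {
            c: (ols_rss(design([k for k in kept if k != c]), y) - rss_full) / mse_full
            for c in kept
        }
        weakest = min(f_stats, key=f_stats.get)
        if f_stats[weakest] >= f_to_remove:
            break  # every remaining predictor is significant
        kept.remove(weakest)
    return kept


# Hypothetical example: y depends only on the first two predictors
h1 = [1, -1, 1, -1, 1, -1, 1, -1]
h2 = [1, 1, -1, -1, 1, 1, -1, -1]
h3 = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [0.1, 0.1, 0.1, 0.1, -0.1, -0.1, -0.1, -0.1]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(h1, h2, noise)]
print(backward_stepwise([h1, h2, h3], y))
```

The example columns are mutually orthogonal, and the noise is orthogonal to all of them. The irrelevant third predictor therefore has a partial F of essentially zero and is dropped first, while the two real predictors are retained.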
Temperature was treated as an autoregressive process. The precipitation sequence, in contrast, showed no such structure when tested with the autocorrelation function. Winds in the low-relief valley system of the Porsuk Stream watershed generally persist for days. They are caused by large-scale atmospheric disturbances rather than by small-scale systems such as mountain-valley winds, so the wind speed time series were also treated as an autoregressive process. Cloudiness also behaved as an autoregressive process. However, it is highly correlated with precipitation during precipitation events and not at other times, so it was not treated as autoregressive.
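A minimal sketch of the autocorrelation check follows. It assumes the usual sample autocorrelation estimator and approximate 95 % bounds of ±1.96/√n for a white-noise series. The study does not state which bounds were used, and the function names and example series are hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (biased estimator, r_0 = 1)."""
    n = len(x)
    mean_x = sum(x) / n
    d = [v - mean_x for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


def significant_lags(x, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95 % white-noise band."""
    bound = 1.96 / math.sqrt(len(x))
    r = acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Hypothetical smooth, persistent (temperature-like) series
persistent = [math.sin(t / 10.0) for t in range(200)]
print(significant_lags(persistent, 3))
```

A series with a significant lag-1 autocorrelation, like the persistent example, would be a candidate for an autoregressive term. A series whose autocorrelations stay inside the band would not.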

SDSM also allows variance inflation, which increases or decreases the white noise, and bias correction, which compensates for a tendency to over- or underestimate the mean (Wilby and Dawson 2007). Both enable calibration of the MLR equations (Easterling 1999; Khan et al. 2006a, b). The most appropriate coefficient set for a particular multiple regression equation was determined with statistical tools built into SDSM, such as quantile-quantile plots, together with variance inflation and bias correction.

Von Storch (1999) questioned the validity of inflation, which rests on the assumption that all local variability can be traced back to large-scale variability. Hu et al. (2013) found that variance inflation and bias correction did not improve downscaled precipitation. Liu et al. (2011), in contrast, obtained the best agreement between observed and downscaled results by applying these statistical measures together with the event threshold.

For each predictand at every station, 100 ensemble members were generated. Ensemble generation is made possible by the stochastic component (white noise) of the MLR equations. CGCM3 time series do not include leap days, so these were filled in by simple averaging. HadCM3 time series treat every year as having 360 days. The missing days were therefore allocated randomly across each year and filled with the average of the preceding and succeeding days. Spreading the missing days in this way avoided a possible seasonal bias of 5 or 6 days. The downscaling procedure is shown schematically in Fig. 2.
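The 360-day calendar adjustment for HadCM3 can be sketched as follows. The missing days are inserted at randomly chosen positions inside the year and filled with the mean of the original preceding and succeeding days. Three features of this sketch are assumptions made for illustration: the function name, the seeded random generator, and never inserting a day at the very start or end of the year.

```python
import random


def expand_360_day_year(values, target_len, rng):
    """Pad a 360-day model year to target_len days (365 or 366).

    Missing days are inserted at distinct, randomly chosen gaps between
    existing days and filled with the mean of the original preceding and
    succeeding days, so no single season absorbs all the extra days.
    """
    n_missing = target_len - len(values)
    if n_missing < 0:
        raise ValueError("target_len is shorter than the input year")
    out = list(values)
    # Gap g lies between values[g - 1] and values[g]; the ends of the year are excluded
    gaps = rng.sample(range(1, len(values)), n_missing)
    for g in sorted(gaps, reverse=True):  # insert from the end so indices stay valid
        out.insert(g, (values[g - 1] + values[g]) / 2.0)
    return out


# Hypothetical 360-day year of daily values
year = [float(d) for d in range(360)]
padded = expand_360_day_year(year, 365, random.Random(42))
print(len(padded))
```

Inserting from the highest gap downward keeps the lower gap indices aligned with the original series, so each inserted value is always the mean of two original neighbors.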

Fig. 2 The downscaling procedure

Five derived annual time series were prepared from the daily time series and added to the annual aggregate variables:

- Heat wave (5T1): the maximum temperature sustained for at least five consecutive days with a return period of 1 year, calculated from the daily maximum temperature series.
- Dry days: the number of days in a year without precipitation.
- Maximum precipitation: the maximum of the daily precipitation series in a particular year.
- Dry spell duration: the longest uninterrupted run of days without precipitation in a particular year, expressed in days.
- Wet spell duration: the longest uninterrupted run of days with precipitation on every day in a particular year, expressed in days.
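The five derived annual quantities can be computed from one water year of daily data, as sketched below. Two readings in this sketch are assumptions. First, a day counts as dry when its precipitation does not exceed a threshold of 0 mm. Second, the heat wave 5T1 is taken as the largest value that daily maximum temperature equals or exceeds on five consecutive days, that is, the maximum of the 5-day running minimum. Other definitions, such as a 5-day running mean, are possible. The function names and data are hypothetical.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def derived_annual_series(tmax, precip, window=5, dry_threshold=0.0):
    """Derived quantities for one water year of daily data.

    Assumptions: a day is dry if precip <= dry_threshold (0 mm here), and the
    heat wave 5T1 is the largest 5-day running minimum of daily maximum temperature.
    """
    dry = [p <= dry_threshold for p in precip]
    heat_wave = max(min(tmax[i:i + window]) for i in range(len(tmax) - window + 1))
    return {
        "heat_wave_5T1": heat_wave,
        "dry_days": sum(dry),
        "max_precipitation": max(precip),
        "dry_spell": longest_run(dry),
        "wet_spell": longest_run([not d for d in dry]),
    }


# Hypothetical 9-day excerpt (a real water year has 365 or 366 days)
tmax = [10, 30, 31, 32, 33, 34, 20, 35, 36]
precip = [0, 0, 1.2, 3.0, 0, 0, 0, 5.5, 0]
print(derived_annual_series(tmax, precip))
```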

The aggregates were computed over a water year, which begins on October 1 of a given year and ends on September 30 of the following year. This convention keeps each winter season within a single year, so that no aggregate mixes data from two consecutive winters.
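The water-year convention can be sketched as follows: each date is assigned to the water year that starts on 1 October. The study does not say whether a water year is labeled by its starting or ending calendar year. The sketch assumes the starting year, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def water_year(d: date) -> int:
    """Water year (1 October to 30 September) containing date d.

    Assumption: a water year is labeled by the calendar year in which it starts.
    """
    return d.year if d.month >= 10 else d.year - 1


def aggregate_by_water_year(dates, values, func=sum):
    """Group daily values by water year and apply an aggregate function."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[water_year(d)].append(v)
    return {wy: func(vs) for wy, vs in sorted(groups.items())}


# Hypothetical daily values around two water-year boundaries
dates = [date(2000, 9, 30), date(2000, 10, 1), date(2001, 1, 15), date(2001, 10, 1)]
print(aggregate_by_water_year(dates, [1, 2, 3, 4]))
```

Because a whole winter (December to February) always falls inside one water year, winter precipitation is never split between two annual aggregates.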

2.3.1 Validation with GCM predictors

The calibration and subsequent construction of the regression equations used data from water years 1976 to 2003 (Fig. 2). Calibration covered the whole period of the available reanalysis data set, so that the calibration period (28 years) approached the 30-year climatological averaging period. The downscaled time series were validated for the period 1976 to 2009 against local meteorological data. The independent data set for this validation consisted of the predictands produced by the SDSM Weather Generator driven by GCM predictors.

The common practice is to divide the reanalysis data set into two periods, using the first for calibration and the second for validation. This practice was not followed here, in order to avoid short calibration and validation periods. Brands et al. (2011) applied such a procedure, termed suboptimal validation, and discussed its limitations and implications. Hessami et al. (2008) compared downscaling results and detected no systematic decrease in performance when GCM predictors were used instead of reanalysis predictors.

A primary concern in using GCM predictors is that the uncertainties of the GCM results are transferred into the validation statistics. The uncertainties introduced by reanalysis and GCM predictors are discussed in Sect. 3.2. This additional uncertainty can be partly offset by validating with ensembles over a long time period, as was done here. Moreover, as discussed in Sect. 3.5, the change in climate over the period 1961–2100 was analyzed by taking differences between projection periods of the same GCM scenario, not between projections and observations. The period differences therefore carry only GCM-level uncertainties.

Nonparametric statistical tests were used in validation because parametric tests have much lower power than their nonparametric equivalents when the underlying distributions are nonnormal (Conover 1971; Helsel 1987). Even when the data are normally distributed, nonparametric methods are often almost as powerful as parametric ones (Tanizaki 1997). Normality of the time series was checked with the Shapiro-Wilk test (Shapiro and Wilk 1965; Shapiro et al. 1968), and 20 % of the series were found to be nonnormal. Nonparametric methods were therefore used for all time series to attain higher power.

Checking for location: the Wilcoxon rank sum test

The Wilcoxon rank sum test is the nonparametric equivalent of the parametric t test for equal means of two independent data sets. In its most general form, the test determines whether one data set tends to produce larger values than the other (Helsel and Hirsch 2002; Wilcoxon 1945). The test makes no assumption about the distribution of either data set and is therefore well suited to nonnormal data.

When projections are made with downscaled general circulation model results, one-to-one comparisons are not meaningful, for example comparing monthly or yearly values of scenario projections with observations. The statistical downscaling procedure is not designed to create one-to-one correspondences. Instead, it produces time series that resemble the observations, within a prescribed statistical significance, in three respects:

- location parameters (means or medians);
- scale parameters (variances or nonparametric equivalents such as the interquartile range);
- trends over time.

Tests for independent data sets are therefore applicable to validating downscaled results against observations. The Wilcoxon rank sum test is a suitable test for location, because it can be used, for this more specific purpose, to determine whether two data sets differ in their location parameter. In nonparametric analysis, this location parameter is the median (Helsel and Hirsch 2002).

The test was applied to seven monthly and seven annual ensembles of predictands: the three temperature variables, precipitation, cloudiness, relative humidity, and wind speed. It was also applied to five additional derived annual ensembles: heat waves, dry days in a year, maximum precipitation, dry spell length, and wet spell length. Each ensemble was compared against observations from the local meteorological stations at a 95 % confidence level. There are four scenarios: CGCM3 A1B, CGCM3 A2, HadCM3 A2, and HadCM3 B2. For each scenario, each time series, and each station, there are therefore 100 comparisons, each ending in a decision on whether the null hypothesis of equal medians is accepted. The percentage of null hypothesis acceptances, termed the acceptance level, was used as a measure of agreement between predictands and observations.

A Hodges-Lehmann type of difference estimator was used to quantify how much the predictands differed from the observations. This estimator (Eq. (1)) computes all possible differences between the observations and the predictands and then takes their median (Esterby 1996).

$$ HL=\operatorname{median}\left(O_{j}-P_{i,k}\right)\quad \text{for all pairs } (j,k) \text{ and all ensemble members } i $$
(1)

where $HL$ is the estimated difference for a specific scenario, $O_j$ is the observation for year or month $j$, and $P_{i,k}$ is the predictand for year or month $k$ and ensemble member $i$. The indices $j$ and $k$ run through all the years or months in a particular data set.
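The validation statistics for location can be sketched in Python as follows. The sketch assumes the large-sample normal approximation of the rank sum statistic with a tie-corrected variance and no continuity correction; the study does not state the exact implementation it used. `hodges_lehmann` follows Eq. (1). `acceptance_level` returns the percentage of ensemble members for which the null hypothesis of equal medians is not rejected at the 5 % level. All function names and example data are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected variance)."""
    n, m = len(x), len(y)
    big_n = n + m
    pooled = list(x) + list(y)
    w = sum(average_ranks(pooled)[:n])  # rank sum of the first sample
    mean_w = n * (big_n + 1) / 2.0
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n * m / 12.0 * ((big_n + 1) - ties / (big_n * (big_n - 1)))
    if var_w == 0:  # all values identical
        return 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2.0))


def hodges_lehmann(obs, ensemble):
    """Eq. (1): median of O_j - P_{i,k} over all pairs (j, k) and members i."""
    return median(o - p for member in ensemble for o in obs for p in member)


def acceptance_level(obs, ensemble, alpha=0.05):
    """Percentage of ensemble members for which equal medians is not rejected."""
    accepted = sum(rank_sum_p_value(obs, member) >= alpha for member in ensemble)
    return 100.0 * accepted / len(ensemble)


# Hypothetical example: three ensemble members compared with a 10-year record
obs = [12.1, 11.8, 12.6, 12.0, 12.9, 12.4, 11.9, 12.7, 12.2, 12.5]
ensemble = [[v + s for v in obs] for s in (-0.1, 0.3, 2.0)]
print(acceptance_level(obs, ensemble), hodges_lehmann(obs, ensemble))
```

As in Eq. (1), a negative Hodges-Lehmann value means that the predictands tend to exceed the observations.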

Checking for scale: the squared ranks test

Several nonparametric tests are available for comparing scale parameters between data sets; this study used the squared ranks test (Conover 1971; Conover and Iman 1976; Miller 1991). The test assumes that the samples are independent. The null hypothesis of the two-tailed test states that the two data sets are identically distributed, except possibly for different location parameters. The test uses the sample means to compute absolute deviations, but it does not require normality and can therefore be considered robust against departures from normality. As with the Wilcoxon rank sum test, the observations are compared against 100 ensemble members, and the acceptance level serves as a measure of agreement between the scale parameters.
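A sketch of the squared ranks test follows, in the large-sample form described by Conover (1971). It uses absolute deviations from each sample's own mean and a normal approximation that remains valid with ties. Exact small-sample tables are not used, and the function names and example data are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def squared_ranks_p_value(x, y):
    """Two-sided squared ranks test for equal scale (normal approximation, ties allowed)."""
    n, m = len(x), len(y)
    big_n = n + m
    mx, my = mean(x), mean(y)
    # Absolute deviations from each sample's own mean
    deviations = [abs(v - mx) for v in x] + [abs(v - my) for v in y]
    r = average_ranks(deviations)
    r2 = [ri ** 2 for ri in r]
    t = sum(r2[:n])  # sum of squared ranks of the first sample
    r2_bar = sum(r2) / big_n
    var_t = (n * m / (big_n * (big_n - 1)) * sum(ri ** 4 for ri in r)
             - n * m / (big_n - 1) * r2_bar ** 2)
    if var_t <= 0:
        return 1.0
    z = (t - n * r2_bar) / math.sqrt(var_t)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical example: a narrow and a wide series with equal centers
narrow = [i * 0.01 for i in range(-10, 11)]
wide = [i * 10.0 for i in range(-10, 11)]
print(squared_ranks_p_value(narrow, wide))
```

Applying the test to each of the 100 ensemble members and counting the non-rejections yields the acceptance level for scale, in the same way as for location.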

Testing for the equality of trend slopes

Time series may come from distributions with the same location and scale parameters and still differ in slope if a trend exists over time. To test for equality of trend slopes, each ensemble member was subtracted from the observed time series (Eq. (2a)). The Sen slope (Esterby 1996) of the resulting differences was then calculated as the median of all possible pairwise slopes within the difference series (Eq. (2b)).

$$ D_{j}=O_{j}-P_{i,j}\quad \text{for all ensemble members } i \text{ and years or months } j $$
(2a)
$$ S=\operatorname{median}\left(\frac{D_{m}-D_{n}}{m-n}\right)\quad \text{for all pairs } m, n \text{ with } m>n $$
(2b)

where $D_j$ are the differences between the observations and an ensemble member in one-to-one correspondence, $m$ and $n$ are indices indicating positions in time within a data set, and $S$ is the Sen slope. The significance of the slope was tested by calculating the tie-corrected Mann-Kendall tau (Helsel and Hirsch 2002). The null hypothesis states that there is no trend (i.e., tau = 0), and the alternative hypothesis states that tau is not equal to zero.

If the null hypothesis is accepted, the slopes of the two time series do not differ at a 95 % confidence level, and their trends are statistically indistinguishable. If it is rejected, the slopes differ in direction, in magnitude, or both. Unequal trend slopes can be a serious problem when the MLR equations are used for future projections, because diverging slopes may produce very large deviations, especially in the far future.
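The trend-equality check of Eqs. (2a) and (2b) can be sketched for one ensemble member as follows. The significance test is carried out on the Mann-Kendall S statistic, using a tie-corrected variance and a continuity correction. Testing S = 0 is equivalent to testing tau = 0. The normal approximation, the continuity correction, the function names, and the example data are assumptions of this sketch.

```python
import math
from collections import Counter
from statistics import median


def residual_series(obs, member):
    """Eq. (2a): D_j = O_j - P_{i,j} for one ensemble member (same time index j)."""
    return [o - p for o, p in zip(obs, member)]


def sen_slope(d):
    """Eq. (2b): median of (D_m - D_n) / (m - n) over all pairs with m > n."""
    return median((d[m] - d[n]) / (m - n) for m in range(len(d)) for n in range(m))


def mann_kendall_p_value(d):
    """Two-sided Mann-Kendall test with tie-corrected variance and continuity correction."""
    size = len(d)
    s = sum((d[m] > d[k]) - (d[m] < d[k]) for m in range(size) for k in range(m))
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(d).values())
    var_s = (size * (size - 1) * (2 * size + 5) - ties) / 18.0
    if s == 0 or var_s == 0:
        return 1.0
    z = (s - math.copysign(1, s)) / math.sqrt(var_s)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical example: the observations drift upward relative to a flat member
obs = [0.5 * t for t in range(30)]
member = [0.0] * 30
d = residual_series(obs, member)
print(sen_slope(d), mann_kendall_p_value(d))
```

A p-value at or above 0.05 would count as an acceptance of equal trend slopes for that ensemble member.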

3 Results and discussion

3.1 Predictors for the temperature and precipitation time series

Fig. 3 shows the predictors used for temperature, their annual relative magnitudes (based on the average of the monthly regression coefficients), and their geographical distribution over the watershed by meteorological station. At all stations, the autoregressive component and the mean temperature at 2 m together account for 60 to 75 % of the variation in the downscaled temperature.

Three western stations, 17702, 17155, and 17123, are more strongly affected by the maritime climate. At these stations, near-surface specific humidity ranks third, at between 16 and 18 %. At all stations, mean sea level pressure contributes to the downscaled temperature at similar levels, between 6 and 8 %. The 500-hPa variables are represented only at station 17123 in the middle of the watershed, and only the 500-hPa geopotential height contributes at an appreciable level (14 %). At this station, the 850-hPa variables contribute only marginally (1.7 %), and only through the 850-hPa geopotential height. At the westernmost stations (17155 and 17702), the 850-hPa variables contribute at the 6 and 11 % levels.

The two easternmost stations (17726 and 17728) show almost identical predictors at identical levels. Unlike the three western stations, the eastern stations receive a 6 % contribution from the near-surface meridional velocity component.

Most of the monthly predictors for the downscaled temperature at station 17123 vary systematically through the year (Fig. 4). The two pressure predictors, mean sea level pressure and 500-hPa geopotential height, behave in contrasting ways. Their maxima and minima coincide in December and January and in April and May. December and January are the coldest months of the year, and April and May are the months in which spring showers due to convective air movements are most frequent.

The mean temperature predictor closely follows the annual cycle of the 500-hPa geopotential height. Near-surface specific humidity affects temperature only minimally in the dry summer months. The autoregressive component dips in June and therefore has minimal effect on temperature in that month. June is also the month with the most vigorous atmospheric mixing in the middle Porsuk Stream watershed, as short-term, mostly afternoon showers also indicate.

Fig. 3 The predictors for the annual temperature by station and their relative magnitudes

Fig. 4 Monthly predictors for the temperature at station 17123

More near-surface, 500-hPa, and 850-hPa variables are represented among the precipitation predictors than among the temperature predictors (Fig. 5). Mean sea level pressure appears only at the easternmost station (17728). Two features stand out: the close association of the 500- and 850-hPa geopotential heights throughout the year (Fig. 6), and the relation between the near-surface variables and mean sea level pressure. The effect of temperature on downscaled precipitation diminishes in late fall and early winter and rises again in spring and early summer. When the temperature influence is low, precipitation is mainly due to large-scale frontal disturbances. When the influence is relatively higher, precipitation occurs predominantly through local vertical mixing.

Fig. 5 The predictors for the annual precipitation by station and their relative magnitudes

Fig. 6 Monthly predictors for the precipitation at station 17123

3.2 Uncertainties in the downscaled time series

For every annual time series, each of the three statistical tests was applied 2000 times (4 scenarios × 5 stations × 100 ensemble members). Likewise, 2000 median differences were calculated with Eq. (1). Because every predictand has an ensemble, the uncertainties in the downscaled results can be estimated. Figs. 7 and 8 display these uncertainties as box and whisker plots for station 17123. The statistics describing uncertainty are:

- the median, as the location parameter;
- the interquartile range (the difference between the third and first quartiles), as the scale parameter;
- the minimum and the maximum.

For a particular scenario at a station, these statistics were obtained from 3400 values (100 ensemble members × 34 years used for validation). For predictands from reanalysis predictors, they were obtained from 2800 values (100 ensemble members × 28 years used for calibration).

Fig. 7 Box and whisker plots for annual time series (temperature species, cloudiness, relative humidity, and wind velocity) of observations and predictands from reanalysis predictors and scenarios at station 17123. The box limits are the first and third quartiles and the whiskers are the minimum and maximum values of the data sets

Fig. 8 Box and whisker plots for annual time series (heat wave, precipitation, dry days in a year, maximum precipitation, dry spell duration, and wet spell duration) of observations and predictands from reanalysis predictors and scenarios at station 17123. The box limits are the first and third quartiles and the whiskers are the minimum and maximum values of the data sets

To compare the uncertainties induced by reanalysis and GCM predictors, a nonparametric equivalent of the coefficient of variation (NCV) was calculated for the downscaled predictands by dividing the interquartile range by the median. Because it normalizes spread by central location, the NCV allows the spreads of data sets with differing central locations to be compared. For each of the 12 time series at a particular station, two NCVs were calculated. The RNCV was calculated for the results downscaled from reanalysis predictors. The GNCV was calculated for all scenarios taken together. By pooling all scenario results from GCM predictors (400 ensemble members), the GNCV preserves the combined spread of the individual scenarios while smoothing out their medians.

At station 17123, the RNCVs and GNCVs for average temperature, precipitation, and wind velocity differed by less than 5 %, indicating that reanalysis and GCM predictors perform similarly. The NCVs for maximum temperature differed by 8 %. Among the derived time series, dry days in a year and maximum precipitation showed comparable NCVs. The NCVs of the remaining time series differed by less than 25 %, with one exception: for wet spell duration, the GNCV was 33 % higher than the corresponding RNCV. Similar results, not shown here, were obtained for the other stations and for the monthly time series.
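The NCV comparison can be sketched as follows. Quartiles are computed with Python's `statistics.quantiles` using the "inclusive" method. This choice is an assumption, because the study does not state its quartile definition; a nonzero median is also assumed. The function names and example data are hypothetical.

```python
from statistics import median, quantiles


def ncv(values):
    """Nonparametric coefficient of variation: interquartile range / median.

    Assumes a nonzero median; quartiles use the 'inclusive' method (an assumption).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / median(values)


def gncv(scenario_ensembles):
    """NCV of all scenarios pooled together (e.g., 4 scenarios x 100 members)."""
    pooled = [v for scenario in scenario_ensembles for v in scenario]
    return ncv(pooled)


def percent_difference(rncv, gncv_value):
    """Relative difference of the GNCV from the RNCV, in percent."""
    return 100.0 * (gncv_value - rncv) / rncv


# Hypothetical downscaled values from reanalysis and from two pooled scenarios
reanalysis = [1, 2, 3, 4, 5, 6, 7, 8, 9]
scenarios = [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10]]
print(percent_difference(ncv(reanalysis), gncv(scenarios)))
```

Pooling the scenarios before computing quartiles keeps the combined spread of all scenarios. In contrast, averaging the per-scenario NCVs would hide differences between the scenario medians.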

3.3 Validation of the downscaled time series: annual aggregate and derived time series

The acceptance levels for the statistical tests in Table 3 are each based on 2000 comparisons (100 ensemble members × 4 scenarios × 5 stations) and are expressed as percentages. The median difference in the table is the median of all possible differences between the observations and all scenarios at all five stations (Eq. (1)). It represents the overall bias between observations and scenarios.

Table 3 Validation of annual and derived time series for all scenarios and meteorological stations

The downscaled and derived time series show varied behavior in terms of acceptance level. An acceptance level of 50 % was taken as the dividing line between good and poor performance. This percentage corresponds to the central half of a data set, obtained by trimming 25 % from each tail, that is, by excluding values above the upper quartile and below the lower quartile. Four of the 12 annual time series met this criterion. The heat wave performed very poorly. Only 10 % of the 2000 ensemble members passed the Wilcoxon test of equal medians, and acceptance levels were consistently low at all stations.

For equality of spread, as measured by the squared ranks test, 3 of the 12 time series fell below the 50 % acceptance level. Among the primary time series, relative humidity and wind velocity performed poorly. Only one time series, wind velocity, failed the test for trend slope equality.

The negative overall median differences indicate that the downscaled time series mostly overestimated the observations, although individual differences among scenarios and stations exist. The two principal drivers of the hydrological cycle, temperature and precipitation, performed well in all three statistical tests, precipitation in particular. For temperature, however, the 68 % acceptance level in the Wilcoxon test points to a relatively larger bias between observations and scenarios.

3.4 Validation of the downscaled time series: monthly aggregate time series

Validation results for the monthly time series are presented in Table 4 as acceptance percentages of the statistical tests, together with the median differences (as determined by Eq. (1)). The monthly downscaled time series performed differently across seasons and predictands. For the Wilcoxon rank sum test, 38 out of 84 time series showed acceptance levels below 50 %. For the squared ranks and trend slope tests, the corresponding numbers were 16 and 9 out of 84, respectively. Unequal trend slopes were encountered only for wind velocity, as in the annual time series. The temperature series were overestimated in the winter months and slightly underestimated in the summer months. The median differences were also larger in the winter months. These contrasting seasonal deviations partly average out in the annual aggregates. The larger winter differences still lead to the overestimation in the annual series, a shortcoming that should be taken into account when interpreting projections into the future. Cloudiness did not show large differences. It was not found to differ statistically from the observations in any month, with high acceptance levels, as in the annual comparisons. Relative humidity was overestimated in 9 months, which is also reflected in the annual comparisons. Wind velocity and precipitation, like cloudiness, did not show a well-defined seasonal pattern. For precipitation, larger differences are encountered in the summer months, when precipitation is lowest and least predictable.
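The equality-of-spread check can be sketched in Python, assuming the squared ranks test is Conover's version. That version ranks the absolute deviations of each sample from its own mean and sums the squared ranks of the first sample. The sketch uses the large-sample normal form of the statistic, which also handles ties. The significance threshold and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist, fmean


def midranks(values):
    """Ranks 1..N; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def squared_ranks_p(x, y):
    """Two-sided p-value of the squared ranks test for equal spread (large-sample form)."""
    n, m = len(x), len(y)
    big_n = n + m
    mean_x, mean_y = fmean(x), fmean(y)
    # Absolute deviations of each sample from its own mean, ranked jointly.
    deviations = [abs(v - mean_x) for v in x] + [abs(v - mean_y) for v in y]
    sq = [r * r for r in midranks(deviations)]
    t = sum(sq[:n])  # sum of squared ranks of the first sample
    mean_sq = sum(sq) / big_n
    var = (n * m / (big_n * (big_n - 1)) * sum(s * s for s in sq)
           - n * m / (big_n - 1) * mean_sq ** 2)
    z = (t - n * mean_sq) / math.sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical monthly values: observations versus a member with too little spread.
observed = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
member = [34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0]
print(squared_ranks_p(observed, member))
```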

Table 4 Validation of monthly aggregate time series for all scenarios and meteorological stations

3.5 Projections into the future

The SDSM Scenario Generator was used to make projections until the end of the twenty-first century. As is common practice, the results are presented as differences between the water year periods 1962–1991 and 2071–2100. For the predictands in the 1962–1991 period, the values from the SDSM Weather Generator were used, as mentioned before. Period differences were computed between predictands of the same scenario. Differences between future projections of the scenarios and present observations were not considered. For the comparisons, one hundred ensemble members were used for each predictand, and a Hodges-Lehmann type estimator (Eq. (3)) was applied.

$$ D=P_{i,m}-P_{j,n}\quad \text{for all } i,j \text{ and } m,n\quad \left(m=2071,\dots,2100;\ n=1962,\dots,1991\right) $$
(3)

where i and j are ensemble members, and m and n are year indices from the two time periods, respectively. P denotes the projections (2071–2100) and predictions (1962–1991), and D their difference. Each period contains 3000 values (30 years × 100 ensemble members). This gives 3000 × 3000 = nine million differences between the periods, from which the statistical properties are subsequently calculated. The monthly period differences are treated in the same way as the annual ones.
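Eq. (3) and the median period difference can be sketched in Python. Every value of the future period, across all members and years, is paired with every value of the reference period. The function names and toy ensembles are hypothetical. For the full 3000 × 3000 case, a plain Python list of nine million floats uses several hundred megabytes, so an array library would be preferable in practice. The sketch favors readability.

```python
import statistics


def period_differences(future, present):
    """All differences P[i][m] - P[j][n] of Eq. (3).

    future[i][m]:  ensemble member i, year m of 2071-2100
    present[j][n]: ensemble member j, year n of 1962-1991
    """
    future_values = [v for member in future for v in member]
    present_values = [v for member in present for v in member]
    # Every future value is paired with every present value (all i, j, m, n).
    return [a - b for a in future_values for b in present_values]


def median_period_difference(future, present):
    return statistics.median(period_differences(future, present))


# Hypothetical toy ensembles: 2 members x 3 years in each period.
future = [[12.9, 13.4, 13.1], [13.6, 12.8, 13.3]]
present = [[10.2, 10.6, 10.4], [10.1, 10.8, 10.5]]
print(median_period_difference(future, present))
```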

The period differences are presented (in plots or tables) as median differences (the median of the nine million differences), together with median absolute deviation (MAD) values. The MAD is a robust estimate of the spread of a data set (Eq. (4)).

$$ \mathrm{MAD}=\mathrm{median}\left(\left|D_i-\mathrm{median}\left(D_i\right)\right|\right)\quad \text{for } i=1,\dots,9\,000\,000 $$
(4)

where D_i are the differences given by Eq. (3). Because the MAD is calculated from the differences of Eq. (3), it is an overall measure of the uncertainties arising from variations both within and across ensemble members.
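Eq. (4) translates directly into Python. The sketch below is unscaled, as in the equation. It omits the factor of about 1.4826 sometimes applied to make the MAD comparable to a standard deviation. The function name and sample values are illustrative.

```python
import statistics


def mad(differences):
    """Median absolute deviation of Eq. (4): median of |D_i - median(D)|, no scale factor."""
    center = statistics.median(differences)
    return statistics.median(abs(d - center) for d in differences)


# Hypothetical period differences with one extreme value.
print(mad([2.1, 2.4, 2.6, 2.9, 3.0, 9.5]))
```

A single extreme difference barely moves the MAD, whereas it would inflate a standard deviation. This robustness is why the MAD serves as the uncertainty measure here.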

For average temperature and precipitation, trends over the period 1962–2100 were calculated in addition to the period differences. The trends are given as the Theil slope, the slope of the Kendall-Theil line. This is a nonparametric regression line fitted through the time series, and its slope is calculated in the same manner as the Sen slope in Eq. (2b).
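The sketch below computes the Theil slope in Python, assuming that Eq. (2b), which is not reproduced in this section, is the usual Sen estimator. That estimator takes the median of all pairwise slopes. The sketch also assumes an annual series of consecutive years without gaps. The function names are hypothetical.

```python
import statistics


def theil_slope(series, start_year):
    """Median of all pairwise slopes (x_k - x_j) / (t_k - t_j) for j < k."""
    years = list(range(start_year, start_year + len(series)))
    slopes = [
        (series[k] - series[j]) / (years[k] - years[j])
        for j in range(len(series))
        for k in range(j + 1, len(series))
    ]
    return statistics.median(slopes)


def total_trend(series, start_year):
    """Change implied by the slope over the span of the series (138 years for 1962-2100)."""
    return theil_slope(series, start_year) * (len(series) - 1)


# Hypothetical annual values with one outlier; the slope ignores it.
print(theil_slope([1.0, 3.0, 100.0, 7.0, 9.0], 1962))
```

Multiplying by the full 138-year span is why the Theil-based total trends exceed the period differences, which cover a shorter interval between period midpoints.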

3.5.1 Annual period differences

The annual period differences for average temperature show, without exception for all stations and scenarios, that temperatures are increasing (Fig. 9). Each scenario behaves very similarly across the stations, but the scenarios differ from one another. The H3_A2 scenario projects higher increases than its counterpart C3_A2. In increasing order, the scenarios line up as C3_A1B, C3_A2, H3_B2, and H3_A2. As a result of regionally different responses to the climatic forcing, this order runs contrary to the world-average pattern (Sect. 2.2), in which the B2 scenario shows lower increases than A1B. The MAD values, shown as error bars above the medians in the plots, are very small compared with the medians. They demonstrate that the increase lies well above the uncertainty levels in the series. Precipitation (Fig. 9) mostly decreases, more strongly in the C3 scenarios. In four cases (H3_B2 at 17123, 17726, and 17728 and H3_A2 at 17123), the changes are comparable to the uncertainties expressed as MAD and should not be interpreted as a change signal. The Theil slopes confirm this (Table 5 for station 17123). Only 1 and 11 ensemble members had significant slopes for H3_A2 and H3_B2, respectively. The Theil slopes give larger total trends because they are calculated over a period of 138 years (1962 to 2100). The period differences, by contrast, reflect changes over roughly 110 years between the midpoints of the two differencing periods (Fig. 10).

Fig. 9

Annual temperature (upper plot) and precipitation (lower plot) period median differences and MAD (median absolute deviation) for all stations and scenarios. Median differences are shown as bars and MADs as error bars above each bar

Table 5 Trends for the temperature and precipitation time series for station 17123
Fig. 10

Monthly temperature and precipitation period median differences and MAD (median absolute deviation) for station 17123 and all scenarios. Median differences are shown as bars and MADs as error bars above each bar

Figs. 11 and 12 show the climate change over the watershed in terms of temperature and precipitation more clearly. The isotherms and isohyets are based on representative samples from the ensembles. For temperature, the ensemble average was used. For precipitation, it was the ensemble member whose central location and scale parameters most closely match the corresponding statistics of the overall ensemble. The period differences are then calculated with Eq. (3) for this single representative series. Finally, the median differences calculated at the stations were interpolated between the stations with the Kriging algorithm.
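The choice of representative series can be sketched in Python. The text does not say which location and scale statistics were matched or how closeness was measured. The median, the interquartile range, and the summed relative mismatch used below are therefore hypothetical choices, as are the function names. The temperature case is simply the year-by-year ensemble mean.

```python
import statistics


def ensemble_average(ensemble):
    """Year-by-year mean over members (the representative series for temperature)."""
    return [statistics.fmean(year_values) for year_values in zip(*ensemble)]


def location_scale(values):
    """Median and interquartile range as location and scale (hypothetical choice)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


def representative_member(ensemble):
    """Index of the member whose location and scale best match the pooled ensemble."""
    pooled = [v for member in ensemble for v in member]
    loc0, scale0 = location_scale(pooled)

    def mismatch(member):
        loc, scale = location_scale(member)
        # Sum of relative deviations of both statistics (hypothetical criterion).
        return abs(loc - loc0) / abs(loc0) + abs(scale - scale0) / scale0

    return min(range(len(ensemble)), key=lambda i: mismatch(ensemble[i]))


# Hypothetical annual precipitation totals (mm) of three members.
members = [[380, 410, 395, 430], [300, 310, 290, 320], [390, 420, 400, 440]]
print(representative_member(members), ensemble_average(members))
```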

Fig. 11

Average temperature period median differences (based on representative sample) for the four scenarios at the five stations interpolated over the watershed with the Kriging algorithm

Fig. 12

Precipitation period median differences (based on representative sample) for the four scenarios at the five stations interpolated over the watershed with the Kriging algorithm

The eastern parts of the watershed are farther from maritime influences. Over the course of 110 years, they experience larger temperature increases, around 0.5 °C more than at the western stations. The C3_A1B scenario projects a lower temperature increase over the watershed. The H3_A2 scenario lies at the upper end of the scale, which runs from 2.5 to 6.5 °C. The ranges of the C3_A2 and H3_B2 scenarios overlap considerably.

These results agree with projections made with the RegCM3 model over the whole of Turkey (Apak and Ubay 2007), driven by SRES A2 scenario results from NASA's Finite Volume General Circulation Model. Projections with the PRECIS (Providing REgional Climates for Impacts Studies) model of the Hadley Centre also show average temperature increases of 5 to 6 °C under the A2 scenario in the inner regions of Turkey (Demir et al. 2008).

For precipitation, the C3_A1B and C3_A2 scenarios project appreciable decreases almost everywhere, as the ranges of the color bars indicate. The western portion of the watershed experiences larger reductions in precipitation. Studies with the PRECIS model showed precipitation decreases reaching 30–40 % in the inner Anatolia Region of Turkey (Demir et al. 2008). Taking 400 mm as a representative twentieth-century average for the watershed, these percentages correspond to decreases of 120–160 mm. Such reductions lie at the upper extremes of the H3_A2 and H3_B2 scenarios and also encompass the C3_A1B and C3_A2 scenarios.

Maximum and minimum temperatures follow the pattern of the average temperature and show consistent increases at all stations and in all scenarios (Table 6). The median differences are well above the range of uncertainties induced by the ensembles and can be treated as a significant climate change signal. Cloudiness decreases homogeneously over the watershed and across scenarios, again with median differences exceeding the median absolute deviations. Relative humidity also decreases in the twenty-first century at most stations, but with spatial differences. Station 17155 lies in the southwestern part of the watershed, closer to the Aegean coast. Maritime influences are felt more strongly there than at the other stations, and it shows increases in relative humidity. At the two eastern stations (17726 and 17728), relative humidity decreases appreciably more than at the remaining two stations (17123 and 17702). This points towards drying in the parts of the watershed influenced by a continental climate. Small increases in wind velocity are projected, but the downscaled wind velocity time series had validation problems in all three statistical tests. In addition, the overall median difference between downscaled results and observations equaled −0.15, which is of the same order as the observations and downscaled results (Table 3).

Table 6 Median period differences and median absolute deviations for annual aggregate time series in the five stations and among the scenarios

Heat wave episodes also tend to increase, in accordance with the maximum temperature series from which they are derived. Again, however, this time series performed very poorly in validation (Table 3). Heat waves are a feature of the summer months, and in these months the maximum temperature also performed poorly (Table 4), which affects the heat wave series. Intuitively, however, increasing temperatures should also lead to higher temperatures during heat wave periods, so at least a qualitative agreement can be reached. The number of dry days in a year increases well above the uncertainties. The increase is homogeneous over the watershed, with the H3_B2 scenario projecting the smallest increase. Dry spell durations also increase, more so at the eastern stations, reflecting the pattern shown by relative humidity. Wet spell durations, by contrast, decrease only slightly, and the median differences are of the same order of magnitude as the uncertainties. Maximum precipitation behaves heterogeneously among both stations and scenarios. Its median differences are of the same order as the uncertainties, and sometimes smaller, so no meaningful conclusion about an increase or decrease can be drawn.
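Derived series such as dry days and spell durations are counted from daily precipitation, as the Python sketch below illustrates. The study's own definitions are given in an earlier section not reproduced here. The 1 mm/day dry-day threshold and the use of the longest spell as the annual statistic are placeholders, not the study's values. The function names are hypothetical.

```python
def spell_lengths(daily_precip, threshold=1.0, dry=True):
    """Lengths of consecutive runs of dry (< threshold) or wet (>= threshold) days.

    threshold (mm/day) is a placeholder; the study's definition may differ.
    """
    lengths, run = [], 0
    for p in daily_precip:
        if (p < threshold) == dry:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def dry_days(daily_precip, threshold=1.0):
    """Number of days below the (placeholder) dry-day threshold."""
    return sum(p < threshold for p in daily_precip)


# Hypothetical daily precipitation (mm) for a short period.
days = [0.0, 0.0, 5.2, 0.3, 2.1, 3.4, 0.0, 0.0, 0.0]
print(dry_days(days), max(spell_lengths(days)), max(spell_lengths(days, dry=False)))
```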

3.5.2 Monthly period differences

Monthly period differences broadly follow the patterns of the annual period differences while preserving the seasonal variations characteristic of particular climatic parameters and of the region. Interstation differences exist, as discussed in the preceding section. Because of the large volume of results, the discussion is confined to one station (17123, in the middle of the watershed). Results for temperature and precipitation are presented in Fig. 10. For temperature, the seasonal differences replicate the increase seen in the annual time series, and the order of the scenarios is unchanged. Three features stand out among the seasonal variations. Firstly, the median differences are mostly well above the uncertainties, except in January and December, where only scenario H3_A2 remains well above them. Secondly, in three scenarios (C3_A1B, C3_A2, and H3_A2), March shows a larger increase than its neighbors. No comparable disruption of order within seasons is encountered elsewhere. In the H3_B2 scenario, May takes over the role of March. Thirdly, the already warm summer months experience larger temperature increases. In addition, the average temperatures in the summer months are underestimated (Table 4), and these months also performed poorly in the Wilcoxon rank sum test.

Precipitation differences vary considerably among months, with the scenarios agreeing in some respects and showing irregularities in others. All scenarios agree on large decreases in the early summer months (May, June) and on moderate or no change later in the summer, when precipitation is already scant at present. The C3 scenarios disagree with the H3 scenarios on winter precipitation: the C3s project decreases, while the H3s show increases.

Monthly differences for the maximum and minimum temperatures closely follow those of the average temperature (Table 7). Monthly cloudiness differences show a marked seasonality, with decreases in the summer and autumn months and increases elsewhere. Irregularities such as small decreases in winter and small increases in late autumn exist. They do not disturb the overall picture, however, especially the large decreases in midsummer. Relative humidity replicates the cloudiness pattern: increases in winter offset the larger decreases only slightly, resulting in a negative annual balance. For wind velocity, there are increases in early and summer and decreases in the cooler and colder months. In some months, however, the uncertainties are of the same order as the median differences.

Table 7 Median period differences and median absolute deviations for monthly aggregate time series in station 17123 for the four scenarios

4 Conclusions

Downscaling makes it possible to generate meteorological time series for predicting the future. Such series can serve two purposes. The first is their direct use to gain a scientifically sound picture of what the future will look like and to plan precautions that alleviate adverse future conditions. The second is their use in hydrological models to predict changes in the hydrological cycle. Especially in the twenty-first century, national actions and international collaborations have intensified. Data from GCMs, used directly or downscaled to represent local conditions, had shown a grim picture of the climate towards the end of the century. These efforts led to actions and plans to reduce the greenhouse gases responsible for global warming and the consequent climate change. Individual nations have prepared plans to abate the adverse impacts of changing climatic conditions. Some of these impacts, in the form of unprecedented natural hazards, have already cost many lives.

Downscaling is a modeling process and thus requires validation so that the resulting estimates can be used with confidence. As this study exemplifies, not all of the meteorological time series required in hydrological models can be modeled with equal success. The algorithms used in downscaling are designed to find the best agreement between predictors and predictands. In some cases, however, this is not sufficient to pass validation against independent data sets. The reason lies largely in the inability of the predictors to reduce the unexplained variation in the predictand. Uncertainties in the observations and GCM scenarios and the presence of uncorrectable measurement errors also contribute to validation failures.

As noted and discussed in Sect. 2.3.1, the validation has been carried out with GCM predictors. This approach has its limitations, but it has the advantage of allocating the whole data period to calibration. Moreover, the use of ensembles reveals the uncertainties introduced by climate model biases and allows them to be compared with statistical procedures. However, the approach cannot be called a strict validation procedure. The results should be interpreted in light of the trade-off between its advantages and disadvantages.

In this study, predictor sets were examined with respect to their areal differences over the watershed. The relatively small watershed covered by five meteorological stations enabled such an examination, and relationships were investigated over the spatial domain and over the months. The monthly changes in predictor variables for temperature and precipitation revealed some persistent trends which, considered within this context, might be seen as an indication that the multiple regression equations do not act completely as black box models but reveal, when treated as a whole, underlying mechanisms.

Allowing for the uncertainties involved, projections towards the end of the twenty-first century depict a much warmer watershed, especially in summer. They also show decreased precipitation and drier summers with clearer skies.

As such, the projections point to future climatic conditions that will seriously endanger the sustainability of many activities in the Porsuk Stream watershed. The economy and agriculture depend largely on the Porsuk Stream. The stream is already heavily burdened by pollution from point sources (industry) and diffuse sources (agriculture and mining). The use of the Porsuk Stream as a source of irrigation, industrial, and domestic water depends, among other factors, on climatic conditions. In 2008, decreased precipitation coincided with low water levels in the Porsuk Reservoir, caused by high releases in the preceding year. Together they led to dry stretches in the lower reaches of the stream that year. Subsequent years with higher-than-normal precipitation prevented a sustained water crisis. However, with ever-increasing demand for water, it is evident that even worse shortages might be encountered in the future. The reservoir lies in the western part of the watershed, where large precipitation decreases are projected. Increases in the number of dry days in a year, and especially in the duration of dry spells, are also projected. To protect water resources and ensure adequate supplies, it is imperative to prepare management plans that take into account, among other things, the prospect of climate change.

Both general circulation model results and downscaled time series show temperature increases over a very short time span that is unequaled in the history of civilization. Smaller-scale climatic fluctuations in the past had important effects on the welfare of nations. They triggered events such as migrations, which are unthinkable in today's overpopulated world. The world has evolved into a very advanced technological state. Even so, it seems improbable that future society will survive temperature increases of the magnitude projected here and elsewhere without extensive damage.