1 Introduction

Comprehensive flood prevention plans have recently been established for large basins in South Korea to overcome the limitations of conventional flood control systems that focus on river embankments. The plans are designed to maximize the flood mitigation capacity of a basin, so that it can cope with the abnormal floods of recent years, by enlarging flood storage and operating integrated flood control systems. Flood control structures and systems are usually designed for the hydrologic conditions expected 10–20 years from the present (hereafter, the design reference year). Therefore, the design flood should be computed taking into account (1) the change in the probabilistic behavior of rainfall due to climate change and/or variability, and (2) the change in land use due to basin development and urbanization.

This study mainly deals with the change of probabilistic behavior of rainfall in the future based on statistical analyses of extreme rainfall events in a non-stationary context. The conventional rainfall frequency analysis, which estimates rainfall depths for a given recurrence period, assumes the stability of climate and the stationarity of extreme rainfall data series (Strupczewski et al. 2001a). However, it cannot be reliably applied in cases when significant non-stationarity is found in practice (Cunderlik and Burn 2003). Even if a climate system has had stationary dynamics for a long time, a finite period may exhibit apparent non-stationarity in terms of the statistics (He et al. 2006). In addition, there is mounting evidence to suggest that the assumption of stationarity in rainfall frequency analysis is hardly ever met in reality (Katz et al. 2002; Ekström et al. 2005; Fowler et al. 2005; Khaliq et al. 2006; El Adlouni et al. 2007).

There are numerous studies dealing with the design and control of water systems under non-stationary conditions. They are based on extreme values simulated under stationary conditions for various climate change scenarios, from which the probability distributions of hydrological variables relevant to the design of water resources projects are derived (Strupczewski et al. 2001a). However, the uncertainties of climate scenarios and Global Circulation Model outputs are too large (Kuusisto et al. 1994; Dooge et al. 1998; Strupczewski et al. 2001a) for them to be directly incorporated into hydrologic frequency analysis.

In South Korea, due to the lack of flood observations, design floods are usually determined by simulating rainfall-runoff models with rainfall depth-frequency curves constructed from rainfall frequency analysis. The comprehensive flood prevention plans usually employ design rainfalls estimated under the assumption that rainfall is stationary, because no estimator is available that can account for non-stationarity in rainfall observations. Design rainfalls estimated from a stationary frequency analysis cannot effectively protect against the abnormal floods that may occur in the near future due to climate change and/or variability. Therefore, when establishing a comprehensive flood prevention plan that requires design rainfalls in the design reference year, the presence of non-stationarity in rainfall observations must be incorporated into the rainfall frequency analysis.

Recent studies have suggested various statistical analyses of extreme hydrologic events in a non-stationary context (Strupczewski et al. 2001b; Katz et al. 2002; Cunderlik and Burn 2003; Khaliq et al. 2006; Renard et al. 2006; El Adlouni et al. 2007; Park et al. 2010). Building on these previous studies, this study proposes the non-stationary Gumbel model (NSGM) to incorporate the observed non-stationarity into the construction of rainfall depth-frequency curves in a design reference year. After non-stationarity is detected in rainfall observations, various statistical relationships between rainfall statistics and distribution parameters are evaluated in order to estimate the distribution parameters in the design reference year. Then, based on the probability function of rainfall, the rainfall depths corresponding to given occurrence probabilities are estimated, and rainfall depth-frequency curves are constructed for hydrologic design in the comprehensive flood prevention plans.

2 Non-stationarity in rainfall observations

This study collected rainfall data measured from 56 rain gauging stations in South Korea as shown in Fig. 1. The gauging stations are managed by the Korean Meteorological Administration (http://www.kma.go.kr) and possess annual maximum daily rainfalls over 30 years measured by self-registering rain gauges. They also provide primary data for establishing the comprehensive flood prevention plans in South Korea.

Fig. 1 Rainfall gauging stations in South Korea

The conventional rainfall frequency analysis is usually performed using the annual maximum data series under the assumption that rainfall is stationary during the design life of a hydraulic structure. However, recent works have detected linear trends in hydrologic observations. A linear trend is one of the non-stationary characteristics that can provide evidence for judging whether a time series is stationary or not. For instance, Fig. 2 shows the annual maximum rainfall (AMR) series with 24-h duration collected at Geochang. The increasing trend in the observations, indicated by the fitted straight line in Fig. 2, is very clear and may affect the statistical behavior of rainfall in the future.

Fig. 2 AMR at Geochang station (duration = 24 h)

We performed statistical tests, namely the Mann–Kendall test, the Hotelling–Pabst test, and Spearman’s rho test, to detect a linear trend in the AMR series for all stations in Fig. 1. More details on the tests are found in Conover (1971), Hamilton (1992), and McCuen (2002). At the 5% significance level, the Mann–Kendall, Hotelling–Pabst, and Spearman’s rho tests rejected the hypothesis of stationarity at seven stations, as shown in Table 1. In addition, after fitting the AMR data to a linear function of time, the p-values indicate that the slope parameter is significantly different from zero at the 5% significance level at six stations, as shown in Table 2. When a significant linear trend is identified in the AMR series, conventional rainfall frequency analyses cannot be applied in practice, because the distribution parameters change over time, as illustrated in Fig. 3. As the distribution parameters of the AMR change over time, the occurrence probability of extreme rainfalls in the future changes as well.
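As a minimal sketch of the first of these tests, the Mann–Kendall statistic can be computed as follows. The sketch assumes the usual normal approximation with a continuity correction and no correction for tied values; the function name `mann_kendall` and the sample series are hypothetical and are not taken from the stations analyzed here.

```python
import math


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test (normal approximation, no tie correction)."""
    n = len(series)
    # S is the number of increasing pairs minus the number of decreasing pairs (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (ties ignored).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected standard normal test statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value


# Hypothetical annual maximum rainfall series (mm), for illustration only.
amr = [112.0, 98.5, 130.2, 121.7, 145.0, 139.4, 160.8, 152.3, 171.9, 188.6]
s, z, p = mann_kendall(amr)
print(f"S = {s}, Z = {z:.3f}, p = {p:.4f}, trend at 5% level: {p < 0.05}")
```

The hypothesis of no trend is rejected at the 5% significance level when the returned p-value is below 0.05.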

Table 1 Statistics of trend tests for rainfall observations (duration = 24 h)
Table 2 Results of hypothesis testing for slope of regression line (duration = 24 h)
Fig. 3 Change of pdf of rainfall at Geochang station (duration = 24 h)

According to Jung et al. (2011), one reason for the increasing trends in summer rainfall in South Korea is the temporal change of the Changma front, which causes heavy precipitation from mid-June to the end of July. It is also expected that the change in the probability distribution of AMR associated with global warming and/or climate variability results in severe floods, which are confronted with empirical findings in several countries (Strupczewski et al. 2001b). For a comprehensive flood prevention plan in South Korea, it is very important to determine how long the detected trend will last and to estimate design rainfalls in a design reference year. Therefore, the conventional rainfall frequency analysis, which assumes that the AMR series is stationary, does not provide appropriate design rainfalls for a site with a significant rainfall trend.

3 Methodology

A design rainfall is a design standard that is determined from the rainfall depth-frequency curves. The rainfall depth-frequency curves provide the rainfall depths corresponding to various exceedance probabilities in a design reference year. However, as shown in Fig. 3, the probability density function (pdf) of AMR varies over time. To ensure hydrologic safety against floods during the lifetime of a hydraulic structure, a new statistical scheme is required that incorporates observed trends in rainfall data into the construction of rainfall depth-frequency curves.

Recent works have described the parameters of the probability function in a time-dependent form (Strupczewski et al. 2001a; Strupczewski and Kaczmarek 2001; Katz et al. 2002; Ramesh and Davison 2002; Cunderlik and Burn 2003; Park et al. 2010). Although some studies showed good performance of the generalized extreme value (GEV) distribution for extreme rainfall (Katz et al. 2002; Ekström et al. 2005; Park et al. 2010), empirical studies on non-stationary models indicate that it is preferable to represent the trend in both the location and scale parameters (Strupczewski and Kaczmarek 2001; El Adlouni et al. 2007). In addition, this study focuses on developing a statistical scheme for non-stationary rainfall analysis and on meeting the practical need to update the Korean Design Rainfall Maps (MOCT 2000), for which the Gumbel distribution was chosen as the best-fitted distribution. We therefore mainly consider the Gumbel distribution, which has two distribution parameters.

The NSGM has two distribution parameters, each of which can be expressed as a function of time, as shown in Eq. 1.

$$ {\text{f}}\left( {{\text{x}}_{\text{i}} ;{\text{t}}} \right) = \frac{1}{{\sigma \left( {\text{t}} \right)}}\exp \left\{ { - \frac{{{\text{x}}_{\text{i}} - \mu \left( {\text{t}} \right)}}{{\sigma \left( {\text{t}} \right)}} - \exp \left[ { - \frac{{{\text{x}}_{\text{i}} - \mu \left( {\text{t}} \right)}}{{\sigma \left( {\text{t}} \right)}}} \right]} \right\} $$
(1a)
$$ {\text{F}}\left( {{\text{x}}_{i} ;{\text{t}}} \right) = \exp \left\{ { - { \exp }\left[ { - \frac{{x_{\text{i}} - \mu \left( {\text{t}} \right)}}{\sigma \left( {\text{t}} \right)}} \right]} \right\} $$
(1b)

where \( {\text{f}}\left( {{\text{x}}_{\text{i}} ; {\text{t}}} \right) \) and \( {\text{F}}\left( {{\text{x}}_{\text{i}} ; {\text{t}}} \right) \) refer to the pdf and the cumulative distribution function (cdf), respectively. \( \mu \left( {\text{t}} \right) \) and \( \sigma \left( {\text{t}} \right) \) are the location and the scale parameter of the NSGM at time t, respectively. The conventional frequency analysis [i.e., the stationary Gumbel model (SGM)] assumes that the parameters are constant. However, the parameters change over time, as shown in Fig. 3. From the exploratory data analyses (for example, Table 3), it was found that the parameters may be functions of covariates such as time (in years) and statistics (mean, sum, and standard deviation) of rainfall. According to the classification of Khaliq et al. (2006), the NSGM proposed in this study falls into the method of time-varying moments.
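To make Eqs. 1a and 1b concrete, the following minimal sketch evaluates the Gumbel pdf and cdf for given values of the location and scale parameters at a particular time. The function names and the parameter values for the two years are hypothetical and serve only to illustrate how a shift in \( \mu \left( {\text{t}} \right) \) and \( \sigma \left( {\text{t}} \right) \) changes the exceedance probability of a fixed rainfall depth.

```python
import math


def gumbel_pdf(x, mu_t, sigma_t):
    """Gumbel pdf of Eq. 1a for location mu_t and scale sigma_t at time t."""
    z = (x - mu_t) / sigma_t
    return math.exp(-z - math.exp(-z)) / sigma_t


def gumbel_cdf(x, mu_t, sigma_t):
    """Gumbel cdf of Eq. 1b for location mu_t and scale sigma_t at time t."""
    z = (x - mu_t) / sigma_t
    return math.exp(-math.exp(-z))


# Hypothetical parameter values (mm) for two different years, for illustration only.
parameters_by_year = {1990: (110.0, 35.0), 2008: (125.0, 42.0)}
depth = 250.0  # rainfall depth (mm) whose exceedance probability is compared

for year, (mu_t, sigma_t) in parameters_by_year.items():
    exceedance = 1.0 - gumbel_cdf(depth, mu_t, sigma_t)
    print(f"{year}: P(X > {depth} mm) = {exceedance:.4f}")
```

With constant parameters the two functions reduce to the SGM; passing year-specific parameter values gives the time-dependent form.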

Table 3 Correlations among statistics of rainfall and parameters of Gumbel distribution at Seoul station (duration = 24 h)

The following section verifies the proposed method for constructing rainfall depth-frequency curves in a design reference year, which takes into account both a linear trend in the AMR and abnormal hydrologic events that may occur in the future.

4 Verification of NSGM

One of the primary objectives of this study is to predict the time-dependent distribution parameters of rainfall in a design reference year. Unfortunately, for the stations that have a linear trend in rainfall data, as listed in Tables 1 and 2, it is difficult to verify the proposed model because of their relatively short record lengths (1969–2008). For example, as shown in Fig. 3, there are only 17 data sets for the time-dependent distribution parameters after setting the initial data period of 20 years. Therefore, in order to verify the proposed NSGM, we employ the AMR at Seoul station. Although the linear trend in the AMR at Seoul is not statistically significant, the series shows an increasing linear trend and has the longest record (1961–2008). Using the rainfalls observed from 1961 to 1994, and assuming that the current year is 1994, the NSGM was applied to construct the rainfall depth-frequency curve in 2008 (i.e., taken as the design reference year), as shown in Fig. 4.

Fig. 4 AMR at Seoul station (duration = 24 h)

In this study, the NSGM was proposed to incorporate the possible shift of the pdf due to the non-stationarity of data into the construction of rainfall depth-frequency curves. From the exploratory data analyses, it was found that the distribution parameters have significant relationships with rainfall statistics such as the mean, sum, and standard deviation, as shown in Table 3. This study investigated the performance of the NSGM for various combinations of covariates used to estimate the distribution parameters in a design reference year. The pdf and the cdf of the NSGM, whose parameters are functions of the covariates, are given in Eqs. 2a and 2b, respectively.

$$ {\text{f}}\left( {{\text{x}}_{\text{i}} ; {\text{t}}} \right) = \frac{ 1}{{{\text{b}}_{ 0} + {\text{b}}_{ 1} \times \omega \left( {\text{t}} \right)}}{ \exp }\left\{ { - \frac{{{\text{x}}_{\text{i}} - \left( {{\text{a}}_{ 0} + {\text{a}}_{ 1} \times \theta \left( {\text{t}} \right)} \right)}}{{{\text{b}}_{ 0} + {\text{b}}_{ 1} \times \omega \left( {\text{t}} \right)}} - { \exp }\left[ { - \frac{{{\text{x}}_{\text{i}} - \left( {{\text{a}}_{ 0} + {\text{a}}_{ 1} \times \theta \left( {\text{t}} \right)} \right)}}{{{\text{b}}_{ 0} + {\text{b}}_{ 1} \times \omega \left( {\text{t}} \right)}}} \right]} \right\} $$
(2a)
$$ {\text{F}}\left( {{\text{x}}_{\text{i}} ; {\text{t}}} \right) = { \exp }\left\{ { - { \exp }\left[ { - \frac{{{\text{x}}_{\text{i}} { - }\left( {{\text{a}}_{ 0} + {\text{a}}_{ 1} \times \theta \left( {\text{t}} \right)} \right)}}{{{\text{b}}_{ 0} + {\text{b}}_{ 1} \times \omega \left( {\text{t}} \right)}}} \right]} \right\} $$
(2b)

where \( \theta \left( {\text{t}} \right) \) and \( \omega \left( {\text{t}} \right) \) are covariates for \( \mu \left( {\text{t}} \right) \) and \( \sigma \left( {\text{t}} \right) \), respectively. \( {\text{a}}_{0} \), \( {\text{a}}_{1} \), \( {\text{b}}_{0} \), and \( {\text{b}}_{1} \) are regression parameters, that is, \( \mu \left( {\text{t}} \right) = {\text{a}}_{0} + {\text{a}}_{1} \theta \left( {\text{t}} \right) \) and \( \sigma \left( {\text{t}} \right) = {\text{b}}_{0} + {\text{b}}_{1} \omega \left( {\text{t}} \right) \).

The maximum likelihood (ML) method was used to estimate the distribution parameters in Eq. 2. The ML method is more straightforward to apply in the presence of covariates (Katz et al. 2002; Sankarasubramanian and Lall 2003; Zhang et al. 2004; Park et al. 2010). Assuming the independence of the data, the likelihood function is the product of the assumed densities for the observations \( {\text{x}}_{1}, {\text{x}}_{2}, \ldots, {\text{x}}_{\text{n}} \).

$$ {\text{L}}\left( {{\text{x}}_{\text{i}} ; {\text{t}}} \right) = \prod\limits_{{{\text{i}} = 1}}^{\text{n}} {\frac{ 1}{{{\text{b}}_{ 0} + {\text{b}}_{ 1} \times \omega \left( {\text{t}} \right)}}{ \exp }\left\{ { - \frac{{{\text{x}}_{\text{i}} { - }\left( {{\text{a}}_{ 0} + {\text{a}}_{ 1} \times \theta \left( {\text{t}} \right)} \right)}}{{{\text{b}}_{ 0} + {\text{b}}_{ 1} \times \omega \left( {\text{t}} \right)}} - { \exp }\left[ {{ - }\frac{{{\text{x}}_{\text{i}} - \left( {{\text{a}}_{ 0} + {\text{a}}_{ 1} \times \theta \left( {\text{t}} \right)} \right)}}{{{\text{b}}_{ 0} + {\text{b}}_{ 1} \times \omega \left( {\text{t}} \right)}}} \right]} \right\}} $$
(3a)
$$ \ln {\text{L}}\left( {{\text{x}}_{\text{i}} ;{\text{t}}} \right) = - \sum\limits_{{{\text{i}} = 1}}^{\text{n}} {\left\{ {\ln \left( {{\text{b}}_{0} + {\text{b}}_{1} \times \omega \left( {\text{t}} \right)} \right) + \frac{{{\text{x}}_{\text{i}} - \left( {{\text{a}}_{0} + {\text{a}}_{1} \times \theta \left( {\text{t}} \right)} \right)}}{{{\text{b}}_{0} + {\text{b}}_{1} \times \omega \left( {\text{t}} \right)}} + \exp \left[ { - \frac{{{\text{x}}_{\text{i}} - \left( {{\text{a}}_{0} + {\text{a}}_{1} \times \theta \left( {\text{t}} \right)} \right)}}{{{\text{b}}_{0} + {\text{b}}_{1} \times \omega \left( {\text{t}} \right)}}} \right]} \right\}} $$
(3b)

The ML method finds the parameters that maximize the log-likelihood function in Eq. 3b. The maximum of the log-likelihood function was searched for directly with a gradient method provided by the Matlab Optimization Toolbox (Mathworks 2010).
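The objective that such an optimizer works on can be sketched as follows. This is a minimal illustration, not the Matlab routine used in this study: it returns the negative of the log-likelihood in Eq. 3b, so that maximizing the likelihood corresponds to minimizing the returned value. It assumes that each observation is paired with the covariate values of its own year, which Eq. 3 does not state explicitly; the function name and argument layout are hypothetical.

```python
import math


def nsgm_neg_log_likelihood(params, x, theta, omega):
    """Negative log-likelihood of the NSGM (negative of Eq. 3b).

    params: (a0, a1, b0, b1), the regression parameters of Eq. 2
    x:      observed annual maximum rainfalls
    theta:  covariate values for the location parameter, one per observation (assumed)
    omega:  covariate values for the scale parameter, one per observation (assumed)
    """
    a0, a1, b0, b1 = params
    total = 0.0
    for x_i, theta_i, omega_i in zip(x, theta, omega):
        mu = a0 + a1 * theta_i
        sigma = b0 + b1 * omega_i
        if sigma <= 0.0:
            # The scale parameter must be positive; reject such parameter sets.
            return math.inf
        z = (x_i - mu) / sigma
        total += math.log(sigma) + z + math.exp(-z)
    return total
```

Any general-purpose minimizer can then be applied to this function over (a0, a1, b0, b1); returning infinity for a non-positive scale parameter keeps the search inside the admissible region.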

This study first estimated the covariates by a linear fit with time. A non-linear (i.e., polynomial) fit can also be applied. However, the polynomial fit has considerable uncertainty about the prediction beyond the range of observations (Ramesh and Davison 2002). After estimating the covariates in a design reference year, the rainfall quantiles were estimated corresponding to return periods of 10, 30, 50, 100, 150, and 200 years. In practice, since the estimated design rainfalls are directly used in hydrologic planning and design, it is desirable to evaluate the accuracy of estimated design rainfalls in the design reference year. Table 4 compares the performance of various combinations of covariates for the NSGM based on the mean of relative error (MRE) calculated by Eq. 4

$$ {\text{MRE}} = \frac{1}{\text{n}}\sum\limits_{{{\text{i}} = 1}}^{\text{n}} {\frac{{\widehat{{{\text{P}}_{\text{i}} }} - {\text{P}}_{\text{i}} }}{{{\text{P}}_{\text{i}} }}} \times 100 $$
(4)

where the \( \widehat{{{\text{P}}_{\text{i}} }} \)’s represent the design rainfalls estimated by the NSGM or the SGM for the design reference year, i.e., 2008 in this section, using data during 1961–1994. The \( {\text{P}}_{\text{i}} \)’s are the design rainfalls for 2008 using data during 1961–2008, which are the target values assumed in this section.
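The quantile estimation and the error measure can be sketched as follows. The quantile expression is obtained by inverting the Gumbel cdf in Eq. 1b for a non-exceedance probability of 1 − 1/T, and the MRE is assumed here to be expressed in percent. The function names and all parameter values are hypothetical; they are not the estimates obtained for Seoul.

```python
import math

RETURN_PERIODS = (10, 30, 50, 100, 150, 200)  # years, as used in this section


def gumbel_quantile(mu, sigma, return_period):
    """Rainfall depth with non-exceedance probability 1 - 1/T (inverse of Eq. 1b)."""
    p = 1.0 - 1.0 / return_period
    return mu - sigma * math.log(-math.log(p))


def design_rainfalls(mu, sigma, return_periods=RETURN_PERIODS):
    """Design rainfalls for each return period, given location and scale parameters."""
    return [gumbel_quantile(mu, sigma, t) for t in return_periods]


def mean_relative_error(estimated, target):
    """Mean of relative error in percent (Eq. 4, assuming a factor of 100)."""
    n = len(target)
    return 100.0 / n * sum((e - t) / t for e, t in zip(estimated, target))


# Hypothetical parameter sets (mm): a model fitted to the shorter record and the target fit.
estimated = design_rainfalls(mu=118.0, sigma=40.0)
target = design_rainfalls(mu=125.0, sigma=44.0)
print(f"MRE = {mean_relative_error(estimated, target):.2f} %")
```

A negative MRE indicates that the design rainfalls are under-estimated relative to the target values on average.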

Table 4 MRE calculated for 2008 at Seoul station (duration = 24 h)

The MRE for the SGM is −8.95. The SGM is the model usually applied in practice: it estimates the design rainfalls from the AMR during 1961–1994, i.e., the data available up to the current year, under the assumption that rainfall is stationary. This result means that the conventional stationary model considerably under-estimated the design rainfalls in the design reference year. Figure 5 compares the rainfall depth-frequency curves constructed by the NSGM and the SGM. As shown in Table 4 and Fig. 5, the NSGM estimates design rainfalls more accurately than the conventional analysis when appropriate covariates are introduced. In particular, when the sum and the mean are used as covariates for the location and scale parameters, respectively, the NSGM (Sum, Mean) performs best, although it still slightly under-estimates. Before constructing the NSGM, it was expected that the NSGM (Mean, Std) would perform best because its covariates have the highest correlations with the distribution parameters, i.e., 0.96 and 0.95, respectively, as shown in Table 3. However, since the ML method produced a quite low scale parameter and the \( R^{2} \) for the estimation of the standard deviation is quite low (≈0.091), the NSGM (Mean, Std) did not perform better than the NSGM with alternative combinations; for example, \( {\text{MRE}}_{\text{NSGM (Mean, Std)}} = -4.19 \), whereas \( {\text{MRE}}_{\text{NSGM (Sum, Mean)}} = -2.04 \).

Fig. 5 Rainfall depth-frequency curves at Seoul station in 2008 (duration = 24 h). SGM (2008) and SGM (1994) mean the SGM with data during 1961–2008 and 1961–1994, respectively. NSGM (Sum, Mean) indicates the NSGM with sum and mean as covariates for the location and scale parameters, respectively

5 Applications and discussions

This study applied the NSGM with the sum and the mean of the AMR as covariates for the location and scale parameters, respectively, to the six stations at which a significant linear trend was found in rainfall observations, as discussed in Sect. 2. At each station, the sum of AMR (SAMR) and the mean of AMR (MAMR) were fitted to a linear function with time as the predictor. Then, the parameters of the NSGM were determined by maximizing the log-likelihood function in Eq. 3b. That is, in Eq. 3b, \( \theta \left( {\text{t}} \right) \) and \( \omega \left( {\text{t}} \right) \) become the estimates of SAMR and MAMR for the design reference year. The performance of the method partly depends on how well the linear function explains the variability of the SAMR and MAMR. As shown in Table 5, the \( R^{2} \) values for the SAMR are higher than 0.99, which is very promising for this study, while the \( R^{2} \) values for the MAMR are higher than 0.8 except for Yeongju.
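The covariate fitting step can be sketched with an ordinary least-squares line and its coefficient of determination. This is a minimal illustration: the helper names and the sample covariate series are hypothetical, and the series stands in for a covariate such as the MAMR evaluated year by year.

```python
def fit_line(t, y):
    """Least-squares fit y = intercept + slope * t; returns (intercept, slope, r_squared)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((t_i - t_mean) ** 2 for t_i in t)
    sxy = sum((t_i - t_mean) * (y_i - y_mean) for t_i, y_i in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Coefficient of determination: share of the variance explained by the line.
    ss_res = sum((y_i - (intercept + slope * t_i)) ** 2 for t_i, y_i in zip(t, y))
    ss_tot = sum((y_i - y_mean) ** 2 for y_i in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def predict(intercept, slope, year):
    """Extrapolate the fitted line to a design reference year."""
    return intercept + slope * year


# Hypothetical covariate series (mm), for illustration only.
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008]
covariate = [131.0, 132.4, 133.1, 135.0, 135.8, 137.9, 138.5, 140.2, 141.6]

intercept, slope, r_squared = fit_line(years, covariate)
for design_year in (2020, 2030, 2040, 2050):
    print(design_year, round(predict(intercept, slope, design_year), 1), round(r_squared, 3))
```

The extrapolated values play the role of \( \theta \left( {\text{t}} \right) \) and \( \omega \left( {\text{t}} \right) \) in the design reference year, and a low \( R^{2} \) signals that the extrapolation is poorly supported by the data.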

Table 5 \( R^{2} \) of the estimates for SAMR and MAMR at non-stationary stations (duration = 24 h)

The rainfall depth-frequency curves were constructed for several design reference years, i.e., 2020, 2030, 2040, and 2050, at the six non-stationary stations, as shown in Fig. 6. For Geochang, Jecheon, and Seonsan, the rainfall depth-frequency curves increase but remain within the upper confidence limit of SGM (2008). The trend detected at these stations may not significantly affect the design rainfalls. In fact, standard engineering practices can account for the uncertainty in the estimation of design rainfalls by introducing freeboard for hydraulic structures (Stakhiv 2010). For Inje, Mungyeong, and Yeongju, however, the design rainfalls are expected to exceed the upper confidence limit, especially for 2040 and 2050. At these stations, the rainfall depth-frequency curves are considerably influenced by the trend embedded in rainfall observations, which should be accommodated in design practices to cope with extreme floods.

Fig. 6 Rainfall depth-frequency curves at non-stationary stations in various design reference years (duration = 24 h). The dotted lines refer to the 95% confidence intervals of SGM (2008)

This study also estimated the design rainfall increase rate (i.e., the MRE with respect to 2008) at the non-stationary stations, as shown in Fig. 7. Compared with the design rainfalls estimated in 2008, the design rainfalls increase on average by around 15, 18, 20, and 23% for 2020, 2030, 2040, and 2050, respectively. In particular, for 2050, the NSGM estimated the design rainfalls to increase by 15–30%. Since the covariates were estimated from a linear function of time, the rate increases by about 3% every decade. For comparison, the British Government provides sensitivity ranges of design rainfall intensity for assessing the impacts of climate change, in which peak rainfall intensity increases by up to 30% (Communities and Local Government 2010).

Fig. 7 Design rainfall increase rate (%) at non-stationary stations (duration = 24 h)

The sample stations have experienced severe flood damage from extreme storm events since 1990. The results of the trend tests in Tables 1 and 2 reflect these experiences, which are embedded in recent rainfall observations. Without a proper approach to incorporate the non-stationarity in rainfall observations, the design rainfall is usually determined from observations up to the present under stationary conditions. However, the design rainfalls are generally expected to increase, as shown in Figs. 6 and 7. The assumption of time-independent storm characteristics, which is typical in practice, might lead to a serious under-estimation of the occurrence probability of severe storms in the future. The results achieved in this study could serve as a basis for new standards for effective flood prevention plans.

6 Conclusions

The rainfall frequency analysis, which is a practical statistical method for estimating extreme values of rainfall for a given recurrence period, cannot be reliably applied when a significant non-stationarity is detected in observations. Recent climate variability calls for new approaches that allow the probabilistic behavior of rainfall to be predicted at a specific future time.

This study suggested a practical and robust scheme to incorporate the linear trend in rainfall observations into the construction of rainfall depth-frequency curves in a design reference year for the comprehensive flood prevention plan in South Korea. It permits the estimation of the distribution parameters, even when they shift in the future, by using linear relationships between rainfall statistics and distribution parameters. The results showed that ignoring even a weakly significant non-stationarity in rainfall observations may introduce serious bias in the quantile estimation of extreme rainfall in the future. In addition, the results illustrated that incorporating the non-stationarity in rainfall observations provides more reasonable quantile estimates of extreme rainfall in the future than the conventional approach. However, the method suggested in this study can be applied only in cases where the statistics of extreme rainfall are properly fitted by a linear function. Therefore, if the linearity of extreme rainfall and/or the linear relationship between rainfall statistics and distribution parameters is not clear, the performance of this approach might be limited.

The implicit premises of the analysis scheme proposed in this study are that the extreme events are independent of each other and that the trend embedded in the observations persists in the near future, in particular until the end of the hydrologic design period. Future work will focus on identifying linkages between non-stationarity in extreme storm events and climatic variability, and on improving the reliability of predicting extreme rainfalls under climate change, which can eventually lead to the establishment of a long-term flood prevention policy.