1 Introduction

Floods are among the most disastrous natural events, and each year they result in substantial environmental, social, and economic losses worldwide. Major examples in recent decades include the floods in central Europe in 2002 and 2005 and on the East Coast of the United States in 2011 and 2012 (Kron 2005; Zahmatkesh et al. 2015). In general, the historical evidence suggests that the magnitude and frequency of floods have increased over the last century (Milly et al. 2002; Zahmatkesh et al. 2014). The rapid growth of population and infrastructure in flood-prone areas has, in turn, motivated efforts to better understand the key drivers responsible for these increases. Climate change, for example, has been shown to adversely impact the hydrologic cycle (Milly et al. 2002; Goharian et al. 2015; Zahmatkesh et al. 2015).

Considering the vulnerability of coastal areas to extreme water levels (which are representative of coastal floods), regional frequency analysis (FA) is performed to estimate design floods (Karamouz et al. 2014). Frequency analysis is concerned with determining the probability of occurrence of extreme events (Gilroy and McCuen 2012). Utilizing a suitable FA, an optimal design can be achieved (Saf 2008). Exceedance probabilities are generally calculated through extreme value analysis of a sample of historical observations (Mudersbach and Jensen 2010). For hydrologic extremes such as water levels, extreme value theory (EVT) provides a sound theoretical framework (Hawkes et al. 2008). FA for extreme water levels is usually based on two approaches: annual maximum series (AMS), also called block maxima, and peak-over-threshold (POT). While the AMS method fits a probability distribution to the maxima of water levels within defined time intervals (blocks), POT selects the extreme water levels and models all the independent data above a certain threshold (Lang et al. 1999; Khaliq et al. 2006; Silva et al. 2014). The choice of threshold in the POT method is critical and depends on the problem of interest (Li et al. 2012).

Under the assumptions that the extreme water levels are independent (i.e., the probability distribution parameters are estimated from independent and identically distributed observations) and that the data are stationary (the driving forces that affect the extremes do not change in the future), the generalized extreme value (GEV) distribution and the generalized Pareto distribution (GPD) are used as the probability distributions for values selected with the AMS and POT methods, respectively (Katz et al. 2002; Hawkes et al. 2008; AghaKouchak and Nasrollahi 2010; Li et al. 2015). The POT approach, along with the GPD, has been widely used in extreme value analysis (Coles 2001; Mackay et al. 2001; Pandey et al. 2001; Ribatet et al. 2007).

Although the assumption of independence for the extreme water levels can be justified by the temporal resolution of the data, the assumption of stationarity is often questionable because of trends in the mean and/or the variability of the observations (Khaliq et al. 2006). Non-stationarity in coastal floods generated by severe storms can be attributed to climate change and variability, land use modification, and watershed regulations, acting separately or together (Katz et al. 2002; Xiong and Guo 2004; Milly et al. 2005, 2008; Gilroy and McCuen 2012; Salas and Obeysekera 2013; Salas and Obeysekera 2014; Vasiliades et al. 2015). Before probability distributions are fitted to the data, the time series should be checked for trends to evaluate the presence of non-stationarity (Hawkes et al. 2008; Eregno et al. 2014; Bayazit 2015; Jaiswal et al. 2015). If there is significant non-stationarity in the data, it can be removed to obtain a stationary time series (Salas 1993). Alternatively, it is recommended to account for non-stationarity within the FA itself; one advantage of non-stationary analysis is that the observed data can be used without de-trending. Methods of time-varying moments can be employed for non-stationary flood FA (Strupczewski et al. 2001a, b; Strupczewski and Kaczmarek 2001; Coles 2001; Katz et al. 2002; Khaliq et al. 2006; Villarini et al. 2009; Salas and Obeysekera 2014).

In non-stationary analysis, time-dependent parameters are used in the distribution function, so the results of the extreme value analysis are expected to vary with time. The non-stationary GEV model, for example, is an efficient tool for incorporating the dependence of the extreme values on time or other covariates (El Adlouni et al. 2007; Cannon 2010). To account for non-stationarity in the GEV model, different expressions have been proposed for the parameters (Tramblay et al. 2013; López and Francés 2013; Ribereau et al. 2008). Generally, the shape parameter is assumed to be constant (Katz et al. 2002; Aissaoui-Fqayeh et al. 2009; López and Francés 2013), while the other two parameters, i.e., location and scale, are considered to be time- or covariate-dependent. Non-stationary distributions have been used to avoid the need to repeatedly update stationary-model assessments and to provide more accurate results. Many researchers (e.g., Katz et al. 2002 and Tramblay et al. 2013) have shown that non-stationary models outperform the classical stationary models in some climate settings. Non-stationary EVT has been widely applied and substantially improved over the last decade (Coles 2001; Khaliq et al. 2006; Mendez et al. 2007; Ribereau et al. 2008).

This paper aims to provide a framework for non-stationary analysis of water levels for coastal flooding. To this end, historical extreme water levels in a coastal area of New York City (NYC), southern Manhattan, are used. Common statistical methods are applied to the extreme data to detect potential trends. The GEV distribution and the GPD are described and employed to estimate future design water levels. Various methods for selecting an appropriate threshold for the POT method are also discussed and investigated. Finally, the conventional design levels are compared with the time-dependent ones.

The structure of the paper is as follows. In the next section, the study area is introduced. Then, the proposed methodology is described, followed by the results. Finally, a summary and conclusions are given.

2 Case Study

New York is among the most populous and economically important states in the United States. Manhattan is the most densely populated borough of NYC, with an area of 87.46 km² and a population of 1.626 million in 2013. The borough is surrounded by bays and rivers and is threatened by coastal flooding; the most recent major recorded events are Hurricane Irene and Superstorm Sandy. Hurricane Irene, which occurred in 2011, was a destructive storm with high-speed landfalls. Superstorm Sandy, which occurred in 2012, was the largest storm event ever documented in the Atlantic Basin. Based on data recorded at the Battery Park station at the southern tip of Manhattan (Fig. 1), the water level from Sandy at this station was 17.2 ft (relative to the Station Datum (STD)). Trend detection and frequency analysis are performed on water level data for Manhattan from 1920 to 2015, recorded at the Battery Park station.

Fig. 1 Location of the Battery Park station in lower Manhattan (www.shutterstock.com)

3 Methodology

FA results are employed to design coastal structures for a given return period or to define flood scenarios for floodplain delineation. For this purpose, analyses are performed on the extreme water level data as representative of coastal flooding. The objective of this study is to examine various probability distribution functions for water level FA considering data non-stationarity. Different distributions are used, and water levels for the 2-, 5-, 10-, 25-, 50-, 100-, 200-, 500-, 700-, 1000- and 1300-year return periods are estimated. Figure 2 shows the flowchart of the proposed methodology. The suggested framework includes four main steps: (1) data collection and analysis, (2) extreme value analysis, (3) frequency analysis, and (4) comparison and interpretation of the results.

Fig. 2 Flowchart of the proposed methodology for non-stationary frequency analysis of extreme water levels

3.1 Step 1 Data Collection and Analysis

Historical instantaneous maximum daily and annual water level data for the Battery Park station were obtained from http://tidesandcurrents.noaa.gov. From the observed data, two time series were constructed: (1) daily maximum instantaneous water levels (one value for each day) and (2) annual maximum water levels (one value for each year). The existence of a trend in the water level time series was then assessed. The non-parametric Mann-Kendall test (Mann 1945; Kendall 1976) was conducted to examine variations in the data mean. After the existence of a trend was confirmed, the Augmented Dickey–Fuller (ADF) (Said and Dickey 1984) and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests were employed to check the stationarity of the data.
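
For illustration, the two series can be constructed in base R as follows; this is a minimal sketch in which the data frame `obs` and its columns `date` and `level` are hypothetical names standing in for the downloaded records.

```r
# Minimal sketch of constructing the two series, assuming a data frame `obs`
# with (hypothetical) columns `date` and `level`.
obs$day  <- as.Date(obs$date)
obs$year <- as.integer(format(obs$day, "%Y"))
daily_max  <- aggregate(level ~ day,  data = obs, FUN = max)$level  # one value per day
annual_max <- aggregate(level ~ year, data = obs, FUN = max)$level  # one value per year
```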

3.1.1 Testing the Data for Stationarity

The Mann–Kendall Test

The Mann-Kendall (M-K) test is a statistical method for detecting trends in a time series. It identifies whether the median of the series changes over time. In the M-K test, the null hypothesis (H0) and the alternative hypothesis (H1) correspond to a time series without and with a trend, respectively. The following relationships are used:

$$ S={\sum}_{i=1}^{n-1}{\sum}_{j=i+1}^n \operatorname{sgn}\left({Y}_j-{Y}_i\right) $$
(1)
$$ \operatorname{sgn}\left({Y}_j-{Y}_i\right)=\begin{cases} +1 & \text{if } {Y}_j-{Y}_i>0 \\ 0 & \text{if } {Y}_j-{Y}_i=0 \\ -1 & \text{if } {Y}_j-{Y}_i<0 \end{cases} $$
(2)
$$ \operatorname{Var}(S)=\frac{1}{18}\left[n\left(n-1\right)\left(2n+5\right)-{\sum}_{p=1}^q{t}_p\left({t}_p-1\right)\left(2{t}_p+5\right)\right] $$
(3)
$$ {Z}_M=\begin{cases} \frac{S-1}{\sqrt{\operatorname{Var}(S)}} & \text{if } S>0 \\ 0 & \text{if } S=0 \\ \frac{S+1}{\sqrt{\operatorname{Var}(S)}} & \text{if } S<0 \end{cases} $$
(4)

where Z_M is the test statistic, n is the number of data points, Y_i and Y_j are the ith and jth observations, q is the number of tied groups (groups of identical values) with more than two members, and t_p is the number of data points in the pth group. Negative values of Z_M represent a decreasing trend, while positive values indicate an increasing trend in the time series. H0 is rejected when the test statistic differs significantly from zero at the 5% significance level, i.e., if |Z_M| > 1.96, then H0 is rejected, which means a trend has been detected in the time series.
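
For illustration, a minimal base-R sketch of Eqs. 1–4 (omitting the tie-correction term of Eq. 3) could be written as follows, assuming `y` holds the annual maximum series:

```r
# Minimal sketch of the M-K test from Eqs. 1-4, without the tie-correction
# term of Eq. 3; `y` is assumed to be the annual maximum water level series.
mann_kendall <- function(y) {
  n <- length(y)
  # S statistic: sum of signs over all pairs i < j (Eqs. 1-2)
  s <- sum(sapply(1:(n - 1), function(i) sum(sign(y[(i + 1):n] - y[i]))))
  # Variance of S for untied data (Eq. 3 without the tie term)
  var_s <- n * (n - 1) * (2 * n + 5) / 18
  # Standardized statistic Z_M (Eq. 4)
  z <- if (s > 0) (s - 1) / sqrt(var_s) else if (s < 0) (s + 1) / sqrt(var_s) else 0
  list(S = s, Z_M = z, p_value = 2 * (1 - pnorm(abs(z))))
}
# A trend is detected at the 5% significance level when |Z_M| > 1.96.
```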

The Augmented Dickey–Fuller (ADF) Test

The ADF test is utilized to test for stationarity in the data. To obtain the coefficients of a model, the test uses the ordinary least squares method. The significance of the coefficients is assessed through the modified Dickey-Fuller (i.e., Student's t) statistic and compared with the corresponding critical value; when the test statistic is less than the critical value, H0 is rejected. For a variable Y (Y_1, Y_2, …, Y_n), ADF considers three differenced autoregressive (AR) models to detect the presence of a unit root:

$$ \Delta {Y}_t=\gamma {Y}_{t-1}+{\sum}_{j=1}^p{\delta}_j\Delta {Y}_{t-j}+{e}_t $$
(5)
$$ \Delta {Y}_t=\alpha +\gamma {Y}_{t-1}+{\sum}_{j=1}^p{\delta}_j\Delta {Y}_{t-j}+{e}_t $$
(6)
$$ \Delta {Y}_t=\alpha +\beta t+\gamma {Y}_{t-1}+{\sum}_{j=1}^p{\delta}_j\Delta {Y}_{t-j}+{e}_t $$
(7)
$$ {e}_t\sim N\left(0,{\sigma}^2\right) $$
(8)

where Y_t is the variable at time t, α is an intercept (drift) constant, β is the time-trend coefficient, γ is the coefficient of the lagged level Y_{t-1} (the process-root coefficient), p is the lag order of the autoregressive process, and e_t is the residual error. The differences among Eqs. 5–7 lie in the presence of the deterministic drift (α) and the linear time trend (βt). The test focuses on determining whether γ equals zero, which would mean that the time series has a unit root (Said and Dickey 1984).
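
In R, the test is available, for example, in the tseries package; a minimal sketch assuming `wl` holds the water level series:

```r
# Minimal sketch of the ADF test using the tseries package; `wl` is an
# assumed vector holding the water level series. The H0 of adf.test is
# the presence of a unit root (non-stationarity).
library(tseries)
adf.test(wl, alternative = "stationary")
# H0 is rejected (series stationary) when the statistic falls below the
# critical value, i.e., when the reported p-value is small.
```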

The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) Test

H0 in this test states that the time series is stationary around the mean or around a linear trend, while H1 assumes that the time series is non-stationary owing to the existence of a unit root. A time series Y_1, Y_2, …, Y_n can be decomposed into a deterministic trend (βt), a random walk (r_t), and a stationary error (ε_t):

$$ {Y}_t={r}_t+\beta t+{\varepsilon}_t,\kern1em t=1,\dots, T $$
(9)
$$ {r}_t={r}_{t-1}+{U}_t,\kern1em {U}_t\sim N\left(0,{\sigma}_U^2\right) $$
(10)

For a trend-stationary time series Y_t, H0 is expressed as \( {\sigma}_U^2=0 \), which means that the random-walk component reduces to a fixed intercept, against the alternative of a positive \( {\sigma}_U^2 \). In this case, the residual error is e_t = ε_t, where \( {e}_t={Y}_t-\overline{Y} \) and Y_t = r_0 + βt + ε_t under H0. H1 states that e_t = r_t + ε_t, which means the process has a unit root. The general form of the KPSS statistic is:

$$ KPSS=\frac{1}{T^2}{\sum}_{t=1}^T{s}_t^2/{\widehat{\sigma}}_{\infty}^2 $$
(11)
$$ {s}_t={\sum}_{j=1}^t{e}_j; $$
(12)
$$ {\widehat{\sigma}}_{\infty}^2={\lim}_{T\to \infty}\frac{1}{T}\operatorname{Var}\left({\sum}_{t=1}^T{e}_t\right) $$
(13)

The test statistic is denoted KPSS_μ (for testing stationarity around the mean) or KPSS_T (for testing stationarity around a trend). If the computed statistic is greater than the critical value at the given significance level, H0 (i.e., stationarity) is rejected (Syczewska 2010). Critical values of the KPSS test are provided in Kwiatkowski et al. (1992).
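
Correspondingly, a minimal sketch with the tseries package, with `wl` as before:

```r
# Minimal sketch of the KPSS test (tseries package); `wl` is the assumed
# water level series.
library(tseries)
kpss.test(wl, null = "Level")  # KPSS_mu: stationarity around the mean
kpss.test(wl, null = "Trend")  # KPSS_T: stationarity around a linear trend
# H0 (stationarity) is rejected when the statistic exceeds the critical value.
```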

3.2 Step 2 Extreme Value Analysis (EVA)

EVA is a method for studying the tail (i.e., extreme) values of a distribution. In contrast to standard approaches, which are usually concerned with the average behavior of the data, EVA methods aim to capture the behavior of low-frequency, high-impact events. While AMS theory provides one way of modeling extreme values, the POT approach models exceedances of a chosen high value (the excesses above a high threshold). It should be noted that several distribution functions, such as the Weibull, log-normal, GEV and GPD, were fitted to the data; however, considering previous studies in the study area (e.g., Karamouz et al. 2015), the GEV distribution and the GPD are used as the probability distributions for the AMS and POT methods, respectively.

3.2.1 Model for Maxima: Annual Maximum Series (AMS)

In AMS frequency analysis, annual extreme values are used to obtain the parameters of a probability distribution.

The Generalized Extreme Value (GEV) Distribution

The GEV distribution parameters, namely location (μ), scale (σ), and shape (ξ), specify the center of the distribution, the deviation around the mean, and the tail behavior of the distribution, respectively. These parameters can be expressed as linear or quadratic functions of time covariates. In the classical stationary GEV model, all of the parameters are constant. The general form of the non-stationary GEV model is:

$$ {Y}_t\sim GEV\left({\mu}_t,{\sigma}_t,{\xi}_t\right) $$
(14)
$$ GEV\left({\mu}_t,{\sigma}_t,{\xi}_t\right)=\begin{cases} \exp \left(-{\left[1+\xi \left(\frac{Y-\mu}{\sigma}\right)\right]}^{-\frac{1}{\xi}}\right) & \text{if } \xi \ne 0 \\ \exp \left(-{e}^{-\left(\frac{Y-\mu}{\sigma}\right)}\right) & \text{if } \xi =0 \end{cases} $$
(15)

In the non-stationary GEV, one or more parameters can be functions of time. For example, the location parameter can be a linear function of a covariate (i.e., μ_t = β_1 + β_2 Y_t), or, in another form, a quadratic function of a temporal covariate (i.e., \( {\mu}_t={\beta}_1+{\beta}_2{Y}_t+{\beta}_3{Y}_t^2 \)).
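
As an illustration, a minimal sketch of stationary and non-stationary GEV fits with the ismev package, assuming `ams` holds the annual maximum series and using the time index as the covariate:

```r
# Sketch of stationary and non-stationary GEV fits with ismev; `ams`
# (the annual maximum series) and the linear time covariate are
# assumptions for illustration.
library(ismev)
tim <- matrix(seq_along(ams), ncol = 1)       # time index as covariate
fit_st <- gev.fit(ams)                        # stationary: constant mu, sigma, xi
fit_ns <- gev.fit(ams, ydat = tim, mul = 1)   # non-stationary: mu_t = beta1 + beta2*t
```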

3.2.2 Model for Exceedance: Peak over Threshold (POT)

This method analyzes the distribution of exceedances above a chosen high threshold. In POT, the GPD is used as an approximation of the distribution of the tail (the extremes). The three-parameter GPD is expressed as:

$$ {G}_{u,\sigma, \xi}(Y)=\begin{cases} 1-{\left(1+\frac{\xi}{\sigma}\left(Y-u\right)\right)}^{-1/\xi} & \text{if } \xi \ne 0 \\ 1-{e}^{-\left(Y-u\right)/\sigma} & \text{if } \xi =0 \end{cases} $$
(16)

where u is the threshold, ξ is the shape parameter, and σ is the scale parameter. The shape parameter is associated with the heaviness of the tail. For observations x_i, i = 1, …, n and threshold u, a GPD is fitted to the exceedances above u through maximum likelihood, and accordingly the shape and scale parameters are determined. The following equation relates the GEV and GPD models:

$$ {G}_{u,\sigma, \xi}(Y)=1+\log \left({GEV}_{\mu, \sigma, \xi}(Y)\right) $$
(17)

One advantage of POT is that it offers control over the number of events included in the analysis through the choice of threshold. In this method, a proper threshold is first determined, and the GPD is then fitted to the data above it; the distribution parameters can be obtained by maximizing the likelihood function (Chen 2014). The following methods are utilized for estimating the optimal threshold (Parent and Bernier 2003).

Mean Residual Life Plot

If Y has a GPD with scale parameter σ and shape parameter ξ, then:

$$ E(Y)=\frac{\sigma}{1-\xi},\kern1em \xi <1 $$
(18)

Assume that the GPD is a valid model for the excesses of a threshold u_0 generated by the time series x_1, x_2, …, x_n. For the random variable x:

$$ E\left(x-{u}_0\mid x>{u}_0\right)=\frac{\sigma_{u_0}}{1-\xi} $$
(19)

where \( {\sigma}_{u_0} \) is the scale parameter corresponding to the excesses of the threshold u_0. If the GPD is valid for the excesses of u_0, it should also be valid for all thresholds u > u_0, subject to the appropriate change of scale from \( {\sigma}_{u_0} \) to σ_u. Therefore, for u > u_0:

$$ E\left(x-u\mid x>u\right)=\frac{\sigma_u}{1-\xi}=\frac{\sigma_{u_0}+\xi u}{1-\xi} $$
(20)

For u > u_0, E(x − u | x > u) is thus a linear function of u. Furthermore, E(x − u | x > u) is simply the mean of the excesses of the threshold u, for which the sample mean of the excesses provides an empirical estimate. According to Eq. 20, these estimates are expected to change linearly with u at levels of u for which the GPD model is appropriate. This leads to the set of points:

$$ \left\{\left( u,\frac{1}{n_u}{\sum}_{i=1}^{n_u}\left({x}_{(i)}- u\right)\right): u<{x}_{max}\right\} $$
(21)

where \( {x}_{(1)},\dots, {x}_{\left({n}_u\right)} \) are the n_u observations that exceed u, and x_max is the largest of the x_i. Eq. 21 defines the mean residual life plot. Above a threshold u_0 at which the GPD provides a valid approximation to the excess distribution, the mean residual life plot should be approximately linear in u.
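
As an illustration, the plot can be produced with ismev; a minimal sketch assuming `daily_max` as before:

```r
# Sketch of the mean residual life plot (Eq. 21) with ismev; `daily_max`
# is the assumed daily maximum series. The threshold is chosen where the
# curve becomes approximately linear.
library(ismev)
mrl.plot(daily_max, conf = 0.95)   # dashed lines are the 95% confidence bounds
```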

Plots of Parameter Estimates at Various Thresholds for POT Modeling

When the mean excess function for values greater than a given threshold is a line with a positive slope, the data follow a GPD with a positive shape parameter (heavy tail). The mean excess function is a horizontal line for exponentially distributed data, and a line with a negative slope for data with a light-tailed distribution.
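
A minimal sketch of such parameter plots with ismev, assuming `daily_max` as before; the candidate threshold range is an assumption for illustration:

```r
# Sketch of GPD parameter estimates over a range of candidate thresholds
# with ismev; `daily_max` as before, and the 6-11 ft range is an assumed
# bracket around the candidate threshold.
library(ismev)
gpd.fitrange(daily_max, umin = 6, umax = 11, nint = 30)
# Estimates should be roughly constant (within uncertainty) above a valid u.
```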

Plot of the Hill Estimate

Suppose there are losses (extreme values) from a distribution whose tail coefficient (ξ) is to be estimated. The following steps are taken (Makarov 2007):

  1. Choose a threshold u > 0.

  2. Select all losses greater than or equal to u and subtract u from them.

  3. Fit the shifted Pareto distribution to the selected losses by a maximum likelihood estimator.

If x_1, …, x_n denote the losses greater than or equal to u, the probability density function of the shifted Pareto distribution is given by:

$$ {\hat{p}}_{\xi, u}(x)=\frac{d}{dx}\left(1-{\left(1+\frac{x}{u}\right)}^{-\frac{1}{\xi}}\right)=\frac{1}{\xi u}{\left(1+\frac{x}{u}\right)}^{-\frac{1}{\xi}-1} $$
(22)

The log-likelihood function is:

$$ {\sum}_{i=1}^n\log \left[{\hat{p}}_{\xi, u}\left({x}_i-u\right)\right]={\sum}_{i=1}^n\left[\left(-\frac{1}{\xi}-1\right)\log \left(\frac{x_i}{u}\right)-\log \left(\xi u\right)\right] $$
(23)

Taking the derivative of the log-likelihood function with respect to ξ:

$$ \frac{d}{d\xi}{\sum}_{i=1}^n\log \left[{\hat{p}}_{\xi, u}\left({x}_i-u\right)\right]={\sum}_{i=1}^n\left[\frac{1}{\xi^2}\log \left(\frac{x_i}{u}\right)-\frac{1}{\xi}\right] $$
(24)

Setting the derivative equal to zero, the estimator for the tail coefficient ξ is obtained as follows:

$$ \hat{\xi}=\frac{1}{n}{\sum}_{i=1}^n\log \left(\frac{x_i}{u}\right) $$
(25)

This estimate is known as Hill's estimator (Hill 1975).
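
A minimal base-R sketch of Eq. 25, with `x` and `u` as assumed inputs:

```r
# Minimal sketch of Hill's estimator (Eq. 25); `x` is an assumed numeric
# vector of observations and `u` a positive threshold.
hill_estimate <- function(x, u) {
  exceed <- x[x >= u]          # losses at or above the threshold
  mean(log(exceed / u))        # ML tail coefficient of the shifted Pareto
}
# Evaluating hill_estimate over decreasing u (or over the order statistics)
# yields the Hill plot used to confirm the threshold choice.
```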

3.3 Step 3 Frequency Analysis

Three R packages were used for FA: extRemes/ismev, POT, and texmex. Only the first can be used for the block maxima method, while all three are appropriate for the threshold method. The parameter estimation method is maximum likelihood for extRemes/ismev and penalized maximum likelihood for POT, whereas texmex can use both estimation methods. These packages support non-stationary analysis as well as selecting and changing the values of the distribution model parameters (Gilleland et al. 2013).

The ismev package allows for the incorporation of non-stationarity (Gilleland and Katz 2011; Stephenson 2011). It also comprises functions for assessing the quality of the fitted distributions, as well as functions to assist in selecting an appropriate threshold for the threshold models. The extRemes package is an interface to the ismev package, but it contains more functionality for estimating the GEV and GP distributions in stationary FA, and it also provides methods for threshold selection. texmex (Southworth and Heffernan 2010) implements the conditional multivariate extreme value modeling method of Heffernan and Tawn (2004) and includes GPD modeling as well.
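
For orientation, minimal sketches of comparable GPD fits in the three packages are given below; `daily_max` and the 9.5 ft threshold are carried over as assumptions, and the calls follow each package's documented interface:

```r
# Sketch of comparable GPD fits in the three packages; `daily_max` and the
# 9.5 ft threshold are assumptions carried over from the study.
library(extRemes); library(POT); library(texmex)
fit_ext <- fevd(daily_max, threshold = 9.5, type = "GP")     # maximum likelihood
fit_pot <- fitgpd(daily_max, threshold = 9.5, est = "mple")  # penalized ML
fit_tmx <- evm(daily_max, th = 9.5, family = gpd)            # texmex GPD model
```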

3.3.1 Distribution Parameters’ Estimation

To incorporate non-stationarity, location and scale parameters were allowed to vary with relevant covariates:

$$ \mu (t)={\beta}_{0,\mu}+{\beta}_{1,\mu}{Y}_1+\dots +{\beta}_{n,\mu}{Y}_n $$
(26)
$$ \sigma (t)={\beta}_{0,\sigma}+{\beta}_{1,\sigma}{Y}_1+\dots +{\beta}_{n,\sigma}{Y}_n $$
(27)

where β_0, β_1, …, β_n are the coefficients and the Y_i's represent the covariates. The shape parameter is considered constant. To estimate the distribution parameters under both stationary and non-stationary assumptions, likelihood-based approaches, namely maximum likelihood and penalized maximum likelihood, are used.
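
A minimal sketch of Eqs. 26–27 with the extRemes package, assuming a hypothetical data frame `df` with columns `level` and `year`:

```r
# Sketch of Eqs. 26-27 in extRemes: GEV with time-varying location and
# scale; the data frame `df` with columns `level` and `year` is assumed.
library(extRemes)
fit_ns <- fevd(level, data = df,
               location.fun = ~ year,  # mu(t)    = beta_0 + beta_1 * year
               scale.fun    = ~ year,  # sigma(t) = beta_0 + beta_1 * year
               type = "GEV")           # shape parameter held constant
```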

Maximum Likelihood Estimation (MLE)

The likelihood is expressed as follows:

$$ l\left(\theta \right)={\prod}_{\mathrm{t}=1}^{\mathrm{n}}\mathrm{g}{\left({Y}_t|\theta \right)}_{+} $$
(28)

where g is the probability density function of the variable Y_t, n is the number of observations, and θ denotes the set of parameters with trend (e.g., μ_t, σ_t and ξ_t). The plus-sign (+) subscript indicates that the argument must be positive (Obeysekera and Salas 2014). The MLE of θ is the value of θ that maximizes l(θ).
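
For illustration, a minimal sketch of direct ML estimation for a stationary GEV by numerical optimization, assuming `ams` as before (the dedicated packages perform this internally):

```r
# Minimal sketch of ML estimation for a stationary GEV by numerical
# minimization of the negative log-likelihood; `ams` is the assumed
# annual maximum series.
gev_nll <- function(par, y) {
  mu <- par[1]; sigma <- par[2]; xi <- par[3]
  if (sigma <= 0) return(Inf)
  z <- 1 + xi * (y - mu) / sigma
  if (any(z <= 0)) return(Inf)   # outside the distribution's support
  sum(log(sigma) + (1 + 1 / xi) * log(z) + z^(-1 / xi))
}
mle <- optim(c(mean(ams), sd(ams), 0.1), gev_nll, y = ams)  # mle$par = (mu, sigma, xi)
```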

Penalized Maximum Likelihood Estimation (PMLE)

The MLE does not necessarily offer a unique solution for the parameters of a distribution (Ketterer 2011), and it often fails for the GPD when the observed sample is small (Hosking et al. 1985). To obtain a feasible distribution, a penalized (modified) likelihood function is used by applying penalty functions to the distribution parameters (ξ is penalized); quadratic penalization is typically utilized for this purpose. PMLE corrects some of the issues associated with the MLE. In this method, instead of maximizing the log-likelihood l(θ), the following expression is maximized:

$$ {l}_p\left(\theta \right)= l\left(\theta \right)- Q $$
(29)

where Q is a non-negative penalty term. The value of θ that maximizes l_p(θ) is called the PMLE.

3.4 Step 4 Comparison of the Fitted Distributions

Akaike Information Criterion (AIC)

AIC measures the goodness of fit of a model, penalized by its complexity (Akaike 1974):

$$ AIC=-2 l+2 K $$
(30)

where l is the maximized log-likelihood of the fitted model and K is the number of model parameters. Higher-ranked models have lower AIC scores.
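
As an illustration, assuming ismev-style fit objects (which store the negative log-likelihood in `$nllh`), the comparison of Eq. 30 reduces to:

```r
# Sketch of an AIC comparison (Eq. 30); ismev fit objects store the
# negative log-likelihood in `$nllh`, so AIC = 2*nllh + 2*K.
aic <- function(fit, k) 2 * fit$nllh + 2 * k
aic(fit_st, k = 3)  # stationary GEV: mu, sigma, xi
aic(fit_ns, k = 4)  # non-stationary GEV: two location betas, sigma, xi
```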

4 Results

4.1 Test of Stationarity for Extreme Water Level Data

Stationarity was tested by checking the water level time series against a linear trend, whose significance was investigated using the M–K test. The M–K variable τ measures the strength of the relationship between time and the water level variable; τ was estimated as 0.249. The absolute value of Z_M is 2.5, which is greater than Z_{α/2} (i.e., 1.96), and the p-value (0.0123) is less than the significance level of α = 0.05. The null hypothesis of no trend is therefore rejected and the alternative hypothesis is accepted, i.e., the trend is statistically significant at the 5% significance level.

The water level time series for the period 1920–2015 was tested for stationarity using the ADF method. The critical values of the ADF test at three significance levels were obtained, and it was checked whether the ADF statistic was lower than the critical values, i.e., whether the null hypothesis of a unit root could be rejected. The ADF test statistic was 0.528, higher than all the critical values, and the p-value was 0.828; the null hypothesis of a unit root therefore could not be rejected, and the time series was assessed as non-stationary.

For the KPSS test, in the case of regression with drift (α in Eq. 6), the p-value was 0.0192; the null hypothesis of stationarity is therefore rejected at the 5% and 10% significance levels, while it cannot be rejected at the 1% level. In the case of regression with drift and trend, the p-value was lower than 0.01, so the null hypothesis of stationarity was rejected at all considered significance levels. To sum up, considering the aforementioned tests, the data time series was assessed to be non-stationary.

4.2 Peak over Threshold Analysis for Water Level Data

The mean residual life plot for the maximum daily water level data is shown in Fig. 3a. The upper and lower dashed lines in this figure are the 95% confidence intervals. We look for the point beyond which the plot becomes linear with an upward slope. The plot starts with a linear behavior but then shows considerable curvature in the range 8.5 ft ≤ u ≤ 10 ft; for u > 9.5 ft the plot is reasonably linear relative to the confidence intervals. The interpretation of a mean residual life plot is not easy; however, based on these considerations, the threshold value of 9.5 ft was selected as appropriate.

Fig. 3 (a) Mean residual life plot, (b) location, scale and shape parameter estimates against the threshold for the GPD model, and (c) Hill estimate against the number of observations (order statistics) with the POT method, for maximum daily water level data

The parameter-stability plot can also be used to assess the threshold. Figure 3b shows the estimated location, shape and scale parameters of the GP distribution against the threshold for the maximum water level data; it is a plot of the maximum likelihood estimates of the GPD model parameters. Based on Fig. 3b, the selection of u = 9.5 ft as the threshold is reasonable, as the plots are nearly horizontal up to this value. It should be mentioned that the selection of the threshold is somewhat arbitrary: as long as the threshold lies within a proper range, such that the exceedances above it follow the GPD model, the results can be considered stable.

Figure 3c presents the Hill estimates against the threshold and the number of observations for the water level time series. At the turning point, the Hill estimator deviates relatively strongly from the fitted straight line; moreover, the order of the turning point is smaller than n/10, where n is the number of observations (Loretan and Phillips 1994). The turning point is the last point that fulfills these conditions. Based on the Hill estimate, the turning point of 9.5 ft confirms the threshold value identified previously.

4.3 Stationary and Non-stationary Extreme Value Analysis

Ten stationary and non-stationary GEV models (GEV0, GEV1, …, GEV9) were fitted to the data (Table 1). Table 2 shows the results of the frequency analysis based on both the annual maxima and peak-over-threshold methods. According to this table, GEV9 produced the highest water level estimates at the different return periods.

Table 1 Stationary and non-stationary GEV models fitted to the annual water level and daily maximum instantaneous water level data
Table 2 Analysis for water levels in different return periods based on stationary and non-stationary GEV models using AMS and POT methods

It can be seen that when using annual maximum data, for both the stationary (GEV0 vs. GEV3) and non-stationary (GEV1 vs. GEV4 and GEV2 vs. GEV5) models, there is no significant difference between water levels of the same return period derived from the ML and PML estimators. The same holds when using POT data under the stationarity assumption (GEV6 vs. GEV8). However, for the non-stationary POT models (GEV7 vs. GEV9), the difference between water levels becomes more significant for return periods greater than 100 years.

4.4 Comparison of the Models

Table 3 shows the AIC values for the proposed distributions. Based on the results, the extRemes and texmex models with a threshold value of 9.5 ft (i.e., the GEV7 and GEV9 models), with non-stationary location and shape parameters, have the lowest AIC values and are therefore the most appropriate distributions for frequency analysis of water level data in the study region. The estimated parameters for these distribution models are also shown in Table 3.

Table 3 AIC test results for distribution models of extreme water levels and estimated parameters for the models with the lowest values of AIC

Figure 4 compares the results of the FA for the selected distribution models. The figure indicates that non-stationary FA with the POT method results in significantly higher water levels at the different return periods. It also shows that even with the same assumption (i.e., non-stationarity of parameters) and the same method (POT), applying different parameter estimation methods (ML vs. PML) can result in different water level values at various return periods.

Fig. 4 Extreme water levels at different return periods based on the selected distribution models

5 Summary and Conclusion

Driving forces such as climate change have been shown to significantly affect the stationarity of hydrologic data. Non-stationarity cannot be ignored, especially in studies of coastal areas, since it can change the design floods estimated from frequency analysis of observed extreme water level data. In the present study, different stationary and non-stationary distribution models were fitted to the historical extreme water level data for lower Manhattan, NYC. In developing the models, the two approaches of annual maximum series and peak over threshold were used, the AIC was used to compare the models, and water levels for different return periods were then estimated. Using the ADF and KPSS stationarity tests, it was concluded that the data time series was non-stationary. Data for FA were selected based on the AMS and POT approaches, and ten stationary and non-stationary GEV models were fitted to the selected data. It was shown that, when using annual maximum data, under both stationary and non-stationary assumptions there is no significant difference between water levels of the same return period derived from the ML and PML estimators; the exception is the case of non-stationary POT data. It was also shown that, based on the AIC, the extRemes and texmex models applied to POT data with a threshold value of 9.5 ft and with non-stationary location and shape parameters performed best (lowest AIC) and are therefore the most appropriate distributions for frequency analysis of water level data in the study region.

In general, the findings indicate that applying different distributions for frequency analysis of extreme water levels yields a range of potential water level values for different return periods. The choice of distribution, and of the method used to select the data to be analyzed, significantly affects the results. In this study, it was shown that frequency analysis of water level data under the stationary and non-stationary assumptions produces significantly different results. Moreover, a comparison was made between parameter estimation methods to investigate how different estimation techniques can affect the FA results. Given the non-stationarity caused by natural and/or anthropogenic activities at local or global scales, non-stationary frequency analysis is recommended for estimating hydrologic design values for different return periods.