1 Introduction

In recent years there has been considerable research in the development of time series models with seasonal or periodic properties in hydrology and water resources (Vecchia 1985a). These models are important for the planning and the design of water management policies. They aim at providing accurate forecasts to the river manager for allocating the resource between interest conflicting users for instance. These forecasts can be obtained using single-variable approach which doesn’t need to model the physical processes of the hydrologic system (Hipel and McLeod 1994). When such a hydrologic model exists, time series models can be used to analyse the difference between the observed data and the forecasts of the hydrological model as in Madsen and Skotner (2005) and Gragne et al. (2014). In both cases, there is a need to find automatic procedure for the identification and the estimation of the more relevant time series models.

Seasonal time series models like the seasonal autoregressive integrated moving average (SARIMA) model developed originally by Box and Jenkins (1970, chap. 9) have been extensively studied in the literature of hydrologic models. Mishra and Desai (2005) successfully applied ARIMA and seasonal ARIMA models to forecast droughts based on the usual stages of model development: identification, estimation and diagnostic checking. Fernandez et al. (2008) applied a multiplicative SARIMA model to forecast monthly streamflow in a small watershed in North West Spain. Durdu (2010) used a SARIMA model to predict drought in the Büyük Menderes river basin using the standardized precipitation index (SPI) as drought index. However, as pointed out by  McLeod (1993), river flows for a particular season of the year may be statistically similar from year to year, but may depend intrinsically on the season. This feature cannot be captured by SARMA models which represent a class of stationary models with large lag autocorrelations that are invariant with respect to the season. Moreover it turns out that many seasonal time series cannot be filtered to achieve second-order stationarity due to the correlation structure of these time series with the season (Vecchia 1985b). As the majority of river flow time series satisfy the property of periodic stationarity, meaning that their mean and covariance functions are periodic with respect to time, the use of periodic autoregressive (PAR) modeling in water resources is justify. Clearly, a PAR model encapsulates a separate autoregressive model for each season of the year, which are potentially different. The PAR model may be also seen as particular case of the PAR moving average (PARMA) class of models (Jimenez et al. 1989).

The method of moments based on Yule–Walker equations (McLeod 1994) and the least squares (LS) method in the univariate case (Franses and Paap 2004) are efficient to estimate PAR models. As mentioned by Hipel and McLeod (1994), when the seasonal data and the model for each season are used rather than the annual data and the associated model, significant gain in parameter efficiency can be achieved. However the main problem in PAR modeling relies on the number of parameters to estimate which varies with the choice of the season for river-flows. Moving from monthly surveys to quarter-monthly river flow data, increases both the number of models and the number of parameters to be estimated. To obtain parsimonious models, it is of interest to study situations in which linear constraints on the parameters of a given season are introduced (Ursu and Duchesne 2009). It also justifies the use of genetic algorithm with Bayes information criterion (BIC) to identify the optimal order of the PAR model. Tools such as genetic algorithm (Koutroumanidis et al. 2009), support vector machine and genetic programming (Wang et al. 2014) or artificial neural network (Aksoy and Dahamsheh 2009) appear as some of the numerous modern methods of simulation and optimization in hydrology. An overview of these simulation and optimization methods used in reservoir operation is presented in Fayaed et al. (2013).

A second problem in the parameter estimation of time series models occurs with the presence of outliers that may imply serious problems. Hau and Tong (1989) presents a new approach for outlier detection in autoregressive models with real data from water ressources. They also present two types of outliers which are generally discussed in the time series literature: innovation and additive outliers. As in Ursu and Pereau (2014), this paper focuses on additive outliers since their impact on the parameter estimates are known to be more important than innovation outliers (Li 2004, chap. 4). In particular the sensitivity of the LS estimation method to outliers requires the use of robust approaches (Denby and Martin (1979) for autoregressive models of first order; Ben et al. (1999) for vector autoregressive moving average (VARMA) models; Shao (2007) for univariate PAR models). Based on the robust scale estimator developed by Ma and Genton (2000), Sarnaglia et al. (2010) analyses its properties in PAR models. In periodic vector autoregressive (PVAR) models,  Ursu and Pereau (2014) implement a robust estimation method based on residual autocovariances (RA) to deal with additive outliers. An extensive review on the impact of outliers in time series models can be found in Maronna et al. (2006, chap. 8).

Model identification is typically the most difficult aspect of the model-building procedure. The examination of the plots of the periodic partial autocorrelation function as a tool to identify a periodic model requires a high level of user experience (see, e.g. Jimenez et al. 1989). Another approach is to use an automatic selection criterion, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) when all possible models are examined. A possible difficulty with this procedure is the large number of models to be investigated. The contribution of the paper to the theoretical literature is to provide an automatic methodology able to select and estimate PAR models in the presence of outliers. Moreover from an operational point of view, this paper shows that such a methodology can be applied successfully at a quarter-monthly temporal scale showing that one-step forecasts with the robust model are always better than the unrobust approach model various efficiency criteria. This approach shows that the number of parameters to be estimated is also lower with the robust model.

This article is organized as follows. In Sect. 2, the PAR model is introduced and least squares estimators are computed. In Sect. 3, a robust estimation in the presence of outliers is developed. Section 4 illustrates the results for the case study of the Garonne river. Section 5 offers some concluding remarks.

2 Periodic models

The class of PAR models extends the class of autoregressive (AR) models by allowing the autoregressive parameters to vary with the seasons. It is worth pointing out that a PAR model is formed by defining a different AR model for each season of the year. A PAR model with 12 periods can be associated with 12 AR models. It should be noted that, when the number of periods is 1, PAR model becomes AR model.

Let \(Y = \{ Y_t, t \in {\mathbb {Z}} \}\) be a PAR stochastic process given by

$$Y_{ns+\nu } = \sum _{k=1}^{p(\nu )} \phi _k(\nu ) Y_{ns+\nu -k} + \epsilon _{ns+\nu }.$$
(1)

For fixed \(\nu\) and predetermined value s, the random variable \(Y_{ns+\nu }\) represents the realization during the \(\nu\)th season, with \(\nu \in \{1,\ldots ,s\}\), at year n, \(n \in {\mathbb {Z}}\). With monthly data the value \(\nu =12\) is naturally selected, whereas that for quarter-monthly data \(\nu =48\). The autoregressive model order at season \(\nu\) is given by \(p(\nu )\), whereas \(\phi _k(\nu )\), \(k=1,\ldots ,p(\nu )\), represent the autoregressive model coefficients during season \(\nu\), \(\nu = 1,\ldots ,s\). The error process \(\epsilon = \{ \epsilon _t, t \in {\mathbb {Z}} \}\) in Eq. (1) corresponds to a periodic white noise, with \(E(\epsilon _t) = 0\) and \({\text {var}}(\epsilon _{ns+\nu }) = \sigma ^2(\nu )>0\), \(\nu = 1,\ldots ,s\). The random process \(Y_{t}\) in (1) is supposed to have zero mean.

Unless otherwise stated we assume that PAR models are stationary in the periodic sense. Periodic stationarity is discussed in Gladyshev (1961). Typically, the periodic models used in water resources and environmental systems are stationary, in the sense that they do not need to be differenced to achieve stationarity (or, otherwise put, data do not have unit roots). In applications, seasonal means are first removed from the time series.

2.1 Identification and estimation for PAR models

This section summarizes without proofs the relevant material on identification, and parameter estimation for PAR models. References that provide detailed proofs are included in the text.

Several estimation techniques are available for periodic models, namely the least-square method (Franses and Paap 2004; Lütkepohl 2005), the method of moments based on Yule–Walker estimation (Pagano 1978; Hipel and McLeod 1994), the maximum likelihood estimation (Vecchia 1985a; Eshete and Vandewiele 1992) and the Kalman filter (Jimenez et al. 1989).

Consider the time series data \(Y_{ns+\nu }\), \(n=0,1,\ldots ,N-1\), \(\nu = 1,\ldots ,s\) with sample size \(n=Ns\). Let

$$\begin{aligned} {\mathbf {z}}(\nu )= & {} \left( Y_{\nu }, Y_{s+\nu }, \ldots , Y_{(N-1)s+\nu } \right) ^{\top }, \\ {\mathbf {e}}(\nu )= & {} \left( \epsilon _{\nu }, \epsilon _{s+\nu }, \ldots , \epsilon _{(N-1)s+\nu } \right) ^{\top },\\ {\mathbf {X}}(\nu )= & {} \left[ \begin{array}{cccc} Y_{\nu -1} &{} Y_{\nu -2} &{} \ldots &{} Y_{\nu -p(\nu )} \\ Y_{s+\nu -1} &{} Y_{s+\nu -2} &{} \ldots &{} Y_{s+\nu -p(\nu )} \\ \vdots &{} &{} \ddots &{} \vdots \\ Y_{(N-1)s+\nu -1} &{} Y_{(N-1)s+\nu -2} &{} \ldots &{} Y_{(N-1)s+\nu -p(\nu )} \\ \end{array} \right] , \end{aligned}$$

be \(N \times 1\), \(N \times 1\) and \(N \times p(\nu )\) random matrices. By defining the \(p(\nu ) \times 1\) vector \({\varvec{\beta }}(\nu )\) of the parameters as:

$${\varvec{\beta }}(\nu ) = \left( \phi _1(\nu ), \ldots , \phi _{p(\nu )}(\nu ) \right) ^{\top },$$
(2)

the PAR model can be written in the following form:

$${\mathbf {z}}(\nu ) = {\mathbf {X}}(\nu ){\varvec{\beta }}(\nu ) + {\mathbf {e}}(\nu ), \; \nu = 1,\ldots ,s.$$
(3)

From Eq. (3), the least squares estimators (unconstrained and constrained) of \({\varvec{\beta }}(\nu )\) can be easily found. For more details we refer the reader to Ursu and Turkman (2012, Sect. 2). Once the estimates \(\hat{{\varvec{\beta }}}(\nu ) = \left( \hat{\phi }_1(\nu ), \ldots , \hat{\phi }_{p(\nu )}(\nu ) \right) ^{\top }\) are obtained, the model residuals are introduced:

$$\hat{\mathbf {\epsilon }}_{ns+\nu } = \left\{ \begin{array}{ll} Y_{ns+\nu } - \displaystyle \sum _{k=1}^{p(\nu )} \hat{\phi }_k(\nu ) Y_{ns+\nu -k}, & ns+\nu > p(\nu ), \\ 0, & ns+\nu \le p(\nu ), \end{array} \right.$$

which are well-defined for \(n=0,1,\ldots ,N-1\).

Various selection criteria using AIC or BIC can be used for PAR model identification. One possible way is to use the BIC selection criterion separately for each of the seasonal components:

$$BIC(\nu ) = \log \hat{\sigma }^2(\nu ) + \frac{\log (N)}{N} p(\nu ),$$
(4)

where \(\hat{\sigma }(\nu )\) stands for the least squares estimators of \(\sigma (\nu )\), and \(p(\nu )\) represents the number of autoregressive parameters in season \(\nu\) McLeod (1994).

Even if this method reduces the number of models to be investigated, the number of possible models remains very high. The large number of possible solutions for the PAR selection model suggests that genetic algorithms (GA) can be useful to an efficient examination of the space of solutions and selection of the combination of parameters that corresponds to the best model. The GA combined with BIC criterion is a reliable and easy way to identify PAR models (Ursu and Turkman 2012).

We briefly summarize our GA procedure for subset PAR modeling.

  • String representation Each subset AR model is encoded as a string, each locus in the string is filled with 1 if the parameter is free, and with 0 if the parameter is constrained to zero. Since a maximum search order has to be selected, every string has the same length L. For example, if we take \(s=12\), \(\nu =1\) and \(p(1)=15\), and the model

    $$Y_{12n+1} = \phi _6(1)Y_{12n-5} + \phi _7(1)Y_{12n-6} + \epsilon _{12n+1}$$

    then, the string representing our model is

    $$000001100000000.$$

    Note that in this case, the number of all possible models is \(12 \times 2^{15} = 393216\).

  • Initial population An arbitrarily population of chromosomes of size \(N_p\) is generated. Each chromosome is encoded as a binary string of length L as described above. The population size \(N_p\) and the length of the chromosome L are chosen by the investigator.

  • A fitness function Each chromosome is evaluated by means of a positive real-valued function called fitness function. Since the \(BIC(\nu )\) may be negative, a natural candidate for the fitness function is an exponential transformation

    $$f_j(\nu ) = \exp {\{BIC_j(\nu )/d\}},$$

    where \(BIC_j(\nu )\) stands for the \(BIC(\nu )\) value for the jth chromosome in the current population and d is a scaling constant. For yet another appropriate fitness function we refer the reader to Gaetan (2000).

  • Generating a new population A new population of potential chromosomes is created, using evolutionary operators as: selection, crossover and mutation. This cycle continues until the maximum number of generations \(N_g\) is attained, or until a stop condition is reached.

For many variations of the basic GA and detailed explanations, see Goldberg (1989), Mitchell (1996), Sivanandam and Deepa (2008).

2.2 Forecasting with PAR model

Forecasting with PAR models proceeds in the same way than standard AR models. The objective is to obtain a forecast with the lowest possible error, leading to the minimum mean squared error forecast (MMSE). The MMSE forecast is given by its conditional expectation (Hipel and McLeod 1994).

Assuming that observations and innovations are known up to the n-th year and \(\nu\)-th season, one takes the conditional expectation of eq. (1) to obtain the MMSE forecast \(\hat{Y}_{ns+\nu }(l)\), where \(\hat{Y}_{ns+\nu }(l)\) is interpreted as the l-step ahead forecast at the forecast origin \(t=ns+\nu\). For example, the 1-step ahead forecast made at the origin \(t=ns+\nu\) is

$$\begin{aligned} \hat{Y}_{ns+\nu }(1)=\, & E\,\left[ Y_{ns+\nu +1}|Y_{ns+\nu },Y_{ns+\nu -1},\ldots \right] \\=\, & \phi _1\,(\nu )Y_{ns+\nu } + \phi _2(\nu )Y_{ns+\nu -1}+\cdots \\ &+\phi _p(\nu )Y_{ns+\nu -p+1} \end{aligned}$$

The causal representation of PAR models (Uesu and Duchesne 2009, Eq. 5) can be use to compute confidence intervals for forecasts but this is beyond the scope of this paper. The best general reference for confidence intervals in periodic models are Hipel and McLeod (1994, chap. 15) and Anderson et al. (2013). The forecasting performance of several time series models used in river flow analysis is presented in Noakes et al. (1985). Results suggest that the PAR models provide the most accurate forecasts.

3 Robust modeling of PAR models

As it is well-known, estimation methods may be seriously affected in the presence of additive outliers that are the most recording errors  (Bustos and Yohai 1986; Shao 2007). Additive outliers refer to a PAR process with probability \(1-\omega\) and a PAR process plus an error with probability \(\omega\). The occurrence of outliers is generally small (\(\omega \le 0.05\)).

Robust estimators based on robust autocovariances for ARMA models were proposed by Bustos and Yohai (1986). Their methodology was extended for multivariate PAR models by Ursu and Pereau (2014). The system of equations obtained in Ursu and Pereau (2014, Eqs. (6) and (9)) can be easily adapted for PAR processes.

Therefore, in order to reduce the influence of the residuals suspected to be outliers, the residuals \(\hat{\epsilon }_{ns+\nu }\) defined in Sect. 2 are replaced by their modified residuals \(\tilde{\epsilon }_{ns+\nu }\) defined as:

$$\tilde{\epsilon }_{ns+\nu } = \psi \left( \frac{\hat{\epsilon }_{ns+\nu }}{\hat{\sigma }(\nu )}\right) ,$$
(5)

where \(\psi\) stands for an odd and bounded function and \(\hat{\sigma }(\nu )\) is an robust estimator for \(\sigma (\nu )\). An usual choice for the \(\psi\) function is the Huber function:

$$\psi _{H,k}(x) = {\text {sgn}}{(x)}\min {\{|x|,k\}},$$

where k is a constant and \({\text {sgn}}{(x)}\) is the signum function. Generally, an iterative algorithm is proposed for ARMA models in Bustos and Yohai (1986) and for PAR models in Ursu and Pereau (2014).

In all the simulations, we did not experienced any problems of convergence with the iterative algorithm as in Ursu and Pereau (2014). Other recent works on robustness in periodic time series include the estimator of PAR models proposed by Sarnaglia et al. (2010). Also, a robust estimation for PAR models was discussed in Shao (2007).

4 Case study: the Garonne river

The PAR model is applied to the average monthly river flows and average quarter-monthly river flows of the Garonne river located in the southwest of France. This river goes down from its source in Spain to the Atlantic Ocean over 647 km. It is the main contributor to the Gironde Estuary which is the major European fluvial-estuarine system. The Garonne river is the third largest river in France in terms of flow, with a catchment area of 51,500 km\(^{2}\) at Tonneins, the outlet of the watershed.

The Garonne River is managed by the Adour-Garonne Water Agency in charge of the planning and design of water policies related to various issues as urban water supply, hydropower, irrigation management, flood and drought control, pollution, protection of endangered fishes, wetland and habitats conservation  (Baker and Vervier 2004; Muylaert et al. 2009; Oeurng et al. 2011; Maire et al. 2013). River authorities aim at controlling the quantitative management of water to achieve a good water status according the ecological requirements imposed by the European Water Framework Directive. To ensure a good functioning of the economic and ecological system, the Adour-Garonne Agency sets at different gauging stations of the river some minimal threshold values measured in cubic meter per second (m\(^{3}\)/s) for the river-flows. These thresholds have to be reached every year during low water periods between the 1st of July–October 31st. To comply with these flow objectives, water storage policies consisting in water saving measures and water releases from dams and reservoirs have to be implemented upstream the gauging stations. In a context of higher water pressure increased by the climate change  (Caballero et al. 2007, Boe et al. 2009),  Tisseuil et al. (2010) show significant reduction in river flows at different gauging station of the Garonne River and in particular at Tonneins. Based on a rainfall-runoff model coupled with a reservoir management system,  Hendrickx and Sauquet (2013) show that earlier filling of reservoirs will be necessary in winter to ensure that water releases during the warm periods will comply the flow objectives.

Flow measures are recorded at the latest gauging station of the Garonne river at Tonneins. Data are obtained from daily discharge measurements in cubic meter per second (m\(^{3}\)/s) from January 1959 to December 2010 (DIREN-Banque Hydro, French water monitoring). Daily data flows are then transformed in monthly data, respectively quarter-monthly data, consisting in flows averaged for one month, respectively from the 1st to the 7th, from the 8th to the 15th, from the 16th to the 22nd, and from the 23rd to the end of the month as in  Hipel and McLeod (1994). At Tonneins, the threshold value of the daily flow is equal to 110 m\(^{3}\)/s for the dry season between the 1st of July to October 31st which corresponds to periods 25 to 40.

Figure 1 plots the annual flows of the Garonne river between 1959–2010. It shows that several episodes of severe drought occurred in 1989–1990 and in 2005. This figure also shows that annual flows remained below the mean of 600 m\(^{3}\)/s for several years during the last decade.

Fig. 1
figure 1

Plot of average annual flows of Garonne river at Tonneins between 1959 and 2010 in cubic meter per second (m\(^{3}\)/s)

To capture the periodic pattern of river flows, average monthly and quarter-monthly flow series are analysed. Tables 1 and 2 show respectively the sample mean, median and standard deviation for each flow series. A partial plot of monthly and quarter-monthly surveys between 1980 and 2000 is given in Fig. 2.

Table 1 Sample mean, median and standard deviation of the average monthly flow series
Table 2 Sample mean, median and standard deviation of the average quarter-monthly flow series
Fig. 2
figure 2

Partial plot of average monthly and quarter-monthly flows of Garonne river at Tonneins between 1980 and 2000 in cubic meter per second (m\(^{3}\)/s)

Figure 2 shows that a periodic fluctuation in the means and variances is clearly displayed. For monthly data, river flows are higher in February and much lower in August. For quarter-monthly data, most peaks occurred during the 4th, 5th, 6th and 7th period and most troughs occurred in the 30th, 31th, 32th and 33th period. Maximum mean quarter-monthly discharge was observed at 48th period in 1959 as 4059 m\(^3\)/s, while a minimum mean quarter-monthly discharge of 55.86 m\(^3\)/s was recorded at 27th period in 2003. The driest periods for monthly and quarter-monthly data are in August (8th month) and in period 31 respectively while February (2nd month) and the 4th period are the wettest period for monthly and quarter monthly data.

Figures 3 and 4 display the pattern of the series over the period 1959–2010. For the driest periods, we observe that the 10-year moving average has increased from 1959 to 1975, then it was stabilized between 1975 to 1980 followed by a decrease from 1980 to 1990. By 1990, the 10-year moving average was around 110 m\(^3\)/s and seems to remain stable. A similar behavior is observed for the wettest periods. Changes in the average hydrological conditions in the Garonne river flows can be explained by several factors related to natural changes after the severe droughts of 1989–1990 or human activities with increased irrigated agriculture and urban growth. Since the mid of the 1990s, the stabilization of the 10-year moving average around the value of 110 m\(^3\)/s can be explained by the implementation of water management policies.

Fig. 3
figure 3

The driest monthly (August) and quarter-monthly flows (31th period) of Garonne river at Tonneins from 1959 to 2010. The long term mean (for August) is 161.27 m\(^{3}\)/s and 150.34 m\(^{3}\)/s for period 31.

Fig. 4
figure 4

The wettest monthly (February) and quarter-monthly flows (4th period) of Garonne river at Tonneins from 1959 to 2010. The long term mean for February is 984.7 m\(^{3}\)/s and 1022.39 m\(^{3}\)/s for period 4

Figure 5 shows how many periods the quarter-monthly flows remain below the threshold value of 110 m\(^{3}\)/s. Between 1959 and 2010, it occurs more frequently during periods 28 to 33 with a maximum of 20 times for period 31.

Fig. 5
figure 5

Histogram of driest quarter-monthly flows of Garonne river at Tonneins between 1959 and 2010

The last year of the data set (12 months or 48 observations) has been omitted for the selection and the estimation of the periodic model. To ensure that residuals of the fitted model are approximately normally distributed and homoscedastic, a Box–Cox transformation has been used (Eshete and Vandewiele 1992; McLeod and Gweon 2013). The Box–Cox analysis shows that residuals are more symmetrical when flow data are transformed in natural logarithms.

4.1 Unrobust modeling of river flows

In the unrobust case modeling, data have been centered by subtracting the seasonal means of the logarithmic series. The PAR model is fitted to the truncated series. The number of AR models used in PAR model is equal to the number of season associated to the choice of data set. It gives 12 AR models for monthly data and 48 models for quarter-monthly data. Note that with the monthly model, the number of possible models to be estimated is approximately \(4 \times 10^5\) and it increases to \(16 \times 10^5\) with quarter-monthly data. This large number of possible solutions suggest the use of GA techniques to reduce the space of solutions and to select the combination of parameters which gives the best model. For each season, the parameters of the identified AR model are estimated by using the least squares with linear constraints.

Using GA methods, only 29 and 112 parameters have been estimated for the 12 and 48 different AR models, respectively. The most complicated model for quarter-monthly logarithmic flows is obtained for the 48th season where an AR model with nine parameters was identified. With monthly logarithmic flows, the number of estimated parameters is six for the month of November.Footnote 1 A parsimonious PAR model may be obtained by using a single model for all seasons in a given group and therefore the number of parameters in a PAR models is decreasing (Hipel and McLeod 1994, chap. 14).

The proposed models for Garonne are then used to generate one-step-ahead forecasts for both logarithmic flow series. To obtain the forecasts in the untransformed domain we take the inverse Box–Cox transformation of the forecasts in the transformed domain. Figure 7 shows forecast and the observed data for the last year of the data set (2010). Figure 1 shows that the river flow in 2010 is about 400 m\(^{3}\)/s which is lower than the mean of the whole sample. It could be considered as a dry year.

4.2 Robust modeling of river flows

Figure 6 shows that several observations can be identified as outliers in the seasonal boxplots (according to the one and half inter-quartile range rule). Outliers appear to be more numerous in quarter-monthly logarithmic flows than in monthly logarithmic flows in the wettest period (from periods 3 to 7), but also at the end of the driest periods (from periods 35 to 42).

Fig. 6
figure 6

Box-plots of the monthly and quarter-monthly logarithmic flows of Garonne river at Tonneins

Contrary to the unrobust case, data have been centered by subtracting the seasonal medians of the logarithmic series instead of seasonal means. As indicated by Shao (2007), the seasonal medians are preferred due to the lower impact of outliers on the medians. The robust procedure described in Sect. 3 is then applied. As mentioned in Sect. 2, the best approach for identifying the AR parameters required in each season for the PAR model is to use the genetic algorithm techniques. It is worth emphasizing that the number of estimated parameters for monthly and quarter-monthly logarithmic flows were 23 and 108 respectively. The most complicated AR model (for one period) implies five parameters (in November) and six parameters (27th period) for the logarithmic monthly and quarter-monthly data, respectively.

The robust models for Garonne data are used to generate one-step-ahead forecasts for the average flow series. Once again the inverse Box–Cox transformation has been applied. Robust forecasts and the observed data for 2010 are shown in Fig. 7.

Fig. 7
figure 7

The observed data for the first 51 years appear in solid line (in red), robust forecast are in dashed line (in blue) and unrobust forecast are in dotted line (in black). The observed data for the next year (52) have not be used in the forecast. In a forecasts are for the 12 months while, b deals with forecasts over the 48 quarters. Flows are in m\(^{3}\)/s

The forecast accuracy of the unrobust and robust models is evaluated with respect to the following measures: the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE). These measures are explicitly defined in Hyndman and Koehler (2006). The MAE and the RMSE between the proposed model and the observed data are calculated in the same units of the observed data. A smaller value indicates a better model performance. The MAPE measure is based on percentage errors. These criteria have to be interpreted only as an indication to which model performs better, but no statement can be made from this comparison. To test the null hypothesis of no difference in the accuracy of the proposed models, a Wilcoxon signed rank test for paired data may be used (Noakes et al. 1985). Besides RMSE, MAE and MAPE measures, we provide the relative index of agreement (rd), the coefficient of determination (\(R^2\)) and the Nash-Sutcliffe efficiency (NSe) for better comparison. These measures are defined in Krause et al. (2005). The rd index varies between 0 and 1. A value of 1 indicates a perfect match, and 0 indicates no agreement at all. The range of NSe lies between \(-\infty\) to 1. Essentially, the closer to 1, the more accurate model is. The range of \(R^2\) lies between 0 and 1 and a higher coefficient is an indicator of a better model. All the measures are calculated using the hydroGOF and forecast packages in R. For another measures of forecast accuracy we refer to Hyndman and Koehler (2006), Krause et al. (2005).

Table 3 Accuracy of one-step forecasts of Garonne river flows

Another monthly predictions can be derived from the aggregation of the quarter-monthly predictions. They are obtained by taking the average over the four periods of each month. Table 3 gives the measures RMSE, MAE, MAPE, rd, \(R^2\) and NSe for monthly, quarter-monthly and aggregated quarter-monthly predictions. Results show that the robust model is better with respect to all criteria for the quarter-monthly and aggregated quarter-monthly data. On the contrary, the unrobust model seems to perform better for monthly data. This can be explained by the low number of outliers (equals to 11) but even in this case the robust model may be preferred due to the principle of parsimony. Results show that the number of estimates is lower in the robust case. Moreover the behaviour of the robust estimates was in most cases reasonable even if the outliers are missing as shown by Ursu and Pereau (2014) using simulation.

5 Conclusions

Accurate forecasting of river flows is one of the most important applications in hydrology, especially for the management of reservoir systems. The accuracy of the forecast analysis is important for the river management authorities to achieve minimal flow objectives during the driest seasons. Based on a robust modeling approach for the identification and the estimation of PAR time series model, this paper provides an application to the Garonne River over the period 1959–2010. To deal with the problem of large number of parameters needed to be estimated especially with quarter-monthly models, an automatic method using genetic routines has been developed. Results show that detection of outliers is higher with quarter-monthly flow data than monthly data, implying better robust estimators than the least square (unrobust) estimators. Results show that 1-year forecasts are better in quarter-monthly robust models rather than unrobust models. The aggregated quarter-monthly forecasts also shows better performance than the monthly forecasts in the robust analysis.

Our analysis also suggests to integrate the evolutionary algorithm for optimization of reservoir operation. As shown by  Wang et al. (2014), the use of more sophisticated methods as genetic algorithm or support vector machine may improve the quality of hydrologic prediction over the classical autoregressive models. Future research dealing with periodic threshold models should be done to improve river flow fitting and forecasting. In particular, data measures suggest a structural change in the hydrological regime around the year 1990. This requires to develop a new PAR modeling approach with endogenous switching points as in Koutroumanidis et al. (2009).

To address the question of the homoscedastic properties of streamflow residuals which is an important question in hydrological modeling, future research is need to implement a robust portmanteau test in PAR models. Such a robust test exists for ARMA time series models (Li 1988) but not for PAR models.  McLeod (1994) has developed a non robust test in PAR models. The residuals properties can also be better captured by combining periodic models with the extreme value theory as in Tesfaye et al. (2006).