Introduction

Hydrological modeling offers a multitude of mathematical, statistical, or stochastic models (black box), which may be conceptual, lumped, or distributed. In this work, we opted for the class of lumped conceptual hydrological models, such as the GR4J model (Génie Rural à 4 paramètres Journalier) developed by CEMAGREF (Perrin 2002; Perrin et al. 2003). The interest of this model is that it does not require a detailed description of the watershed: the main input data are simple measurements of precipitation and evapotranspiration, and only a few parameters need to be calibrated. Its main advantage is its continuous operation, which maintains throughout the year an exact balance between the water arriving on the basin and the water leaving it, and keeps track of the overall moisture level of the basin. However, Andréassian et al. (2001) emphasized that optimization with the Excel solver function is not satisfactory and advised developing other optimization methods to improve the performance of the model.

The aim of this work was to use new mathematical optimization methods (a genetic algorithm and the Gauss-Newton method) to determine the characteristic parameters of the GR4J model for a gauged watershed in a semi-arid context. These parameters will then be used, through the model outputs, to assess the evolution of water resources in other ungauged watersheds with the same climatic context.

Presentation of the studied area

The Ourika watershed is a subbasin of the large Tensift watershed in the Atlas, in the Marrakech region of Morocco (Fig. 1, produced in ArcGIS from a 30-m digital elevation model, or MNT, "Modèle Numérique de Terrain"). It lies between longitudes 7°35′ and 7°53′ West and latitudes 31°04′ and 31°20′ North. It covers an area of 503 km² at the Aghbalou outlet. Because of its size and relief, the climate of the Ourika basin varies strongly from one area to another. Aridity indices show that the Ourika watershed is located in a semi-arid to sub-humid zone.

Fig. 1 Location map of the Ourika watershed (High Atlas, Morocco)

The Ourika region is known for its pronounced, high relief: 75% of the basin area lies between 1600 and 3200 m. Rainfall in this area is often convective, characterized by short duration, high intensity, and spatial heterogeneity (Said et al. 2006, 2010); the annual average rainfall is 532 mm at the Aghbalou station.

Material and methods

Database

The daily input data of the GR4J model are precipitation and potential evapotranspiration (PET); the output data are the flows. The rainfall data come from the Tensift river basin agency (ABHT) for the period 1970–2010, with measurements performed at eight rainfall stations (Fig. 1). The flows are those of the Aghbalou station at the outlet of the Ourika watershed for the same period. These daily data are obtained from the water level read on the staff gauge (limnimetric scale) and converted using empirical rating curves. The choice of the rating curve is adjusted by means of control gaugings made every month or after a flood event.

Because the data needed to calculate the potential evapotranspiration (PET) and to assess its influence on the model are lacking in the study area, we used the PET calculated at the Lalla Takerkoust station, which has a climate similar to that of the Ourika basin.

Oudin et al. (2005) evaluated 27 formulations of PET for rainfall-runoff modeling applications. This evaluation led to the development of a simple and efficient PET formula that gives better results than all the formulations tested in terms of flow simulation. The formulation, detailed by Oudin et al. (2005), uses only air temperature as input, together with the extraterrestrial radiation calculated as described by Morton (1983).

This PET is used as input to the hydrological model and is calculated with the formula established by Oudin et al. (2005):

$$ \text{PET}\,=\, \frac{Re}{\lambda \times \rho}\,\frac{T_{\text{moy}}(j)+5}{100}\quad \text{if}\quad T_{\text{moy}}(j) + 5 >0; \quad \text{else} \quad \text{PET}\,=\,0, $$
(1)

where λ is the latent heat of vaporization of water (2.25 MJ kg−1), ρ is the density of water (1000 kg m−3), Re is the extraterrestrial radiation (MJ m−2 d−1), and T moy(j) is the mean air temperature in the basin (°C) on day j.
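As an illustration, Eq. 1 can be written as a short Python function. This is a minimal sketch and not the Matlab code used in the study: the function name `oudin_pet`, its argument names, and the final conversion from metres to millimetres per day are assumptions made for this example. The extraterrestrial radiation Re is taken as a given input rather than computed.

```python
def oudin_pet(re_mj_m2_day, t_mean_c, latent_heat=2.25, rho=1000.0):
    """Daily PET (mm/day) following Eq. 1.

    re_mj_m2_day : extraterrestrial radiation Re (MJ m-2 d-1), supplied by the caller
    t_mean_c     : mean air temperature of the day (deg C)
    latent_heat  : latent heat of vaporization (MJ kg-1), value quoted in the text
    rho          : density of water (kg m-3)
    """
    if t_mean_c + 5.0 <= 0.0:
        return 0.0
    # Re / (lambda * rho) has units of m/day; the temperature term is dimensionless.
    pet_m_per_day = re_mj_m2_day / (latent_heat * rho) * (t_mean_c + 5.0) / 100.0
    # Assumed unit conversion for convenience: metres to millimetres.
    return 1000.0 * pet_m_per_day


# Hypothetical input values, chosen only to exercise the function.
print(oudin_pet(30.0, 20.0))
print(oudin_pet(30.0, -8.0))
```

The temperature test comes first so that cold days return zero without evaluating the radiation term.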

Errors exist in the raw precipitation and flow measurements. These uncertainties come primarily from the following:

  • temporal resolution, maintenance of the device, and exposure;

  • the effects of wind and obstruction by external objects;

  • the measured height of rain;

  • the location of the measuring point.

However, the major uncertainties come from the data collection protocol, in terms of data quality rather than quantity.

Description of the GR4J model (version of Perrin et al. 2003)

The GR4J model (Génie Rural à 4 paramètres Journalier) is a lumped rainfall-runoff model. It aims to provide rainfall-runoff simulations that are robust and reliable enough for water resource management applications. The development of this model was initiated at CEMAGREF in the early 1980s, and several versions were proposed successively by Edijatno and Michel (1989), Edijatno (1991), Nascimento (1995), Edijatno et al. (1999), Perrin and Michel (2002), Perrin (2002), Perrin et al. (2003), Michel and Mailhol (1989), Michel et al. (2003), and Oudin et al. (2005). These versions have contributed to the gradual improvement of the model's performance (Morton 1983). The transformation of rain into flow in the GR4J model is carried out by means of two reservoirs, one for production and one for routing (Fig. 2).

Fig. 2 Architecture of the GR4J model (Perrin et al. 2003)

Before applying the genetic algorithm to optimize the model parameters, we first programmed the GR4J model in Matlab, including all of its nonlinear equations. This program is used to calculate the flows at the outlet and serves as the basis for the optimization algorithms.

Optimization methods

In the parameter optimization phase, we combined two methods programmed in Matlab: a genetic algorithm (GA) followed by the Gauss-Newton method. The result of the GA (a probabilistic method) is used as the starting point for a local optimization procedure, carried out with the Gauss-Newton method (a deterministic method), to determine the optimal parameter values in the calibration of the rainfall-flow model.

Genetic algorithms

GAs are optimization methods based on techniques derived from genetics and from the mechanisms of natural evolution: crossover, mutation, and selection (Filho et al. 1994; Goldberg 1989; Sawadogo et al. 2015).

  • Coding and creation of the initial population

    The first step in the functioning of the GA is the generation of an initial population. Each member of this population encodes a possible solution to the problem. Here, we subdivided the admissible domain into several sub-domains. The initial population was then seeded randomly using the uniform law in each sub-domain, which keeps the population diversified and helps accelerate convergence when the user has knowledge of the search space. Throughout the paper, n denotes the size of the population; as a final step, we create a table of n individuals.

  • Selection

    We rely on one of the most widely used ordinal selection schemes, the tournament selection introduced by Goldberg (1989). The parents are selected according to their performance. At the selection stage, a vector f of fitness values of size n × 1 is assumed to have been computed. The probability p i of solution i being selected is then defined as follows:

    $$ p_{i}=\frac{f_{i}}{\sum\limits_{j=1}^{n} f_{j}}. $$
    (2)
  • Crossing operation

    After the reproduction phase, the population is enriched with better individuals: reproduction makes clones of good strings but does not create new ones. This is the role of the recombination operation, or crossover, which is the key to the power of the genetic algorithm. In this method, we need to obtain a child population of size n, so we use barycentric crossover, which selects two genes P 1(i) and P 2(i), one from each parent, at the same position i, and defines two new genes C 1(i) and C 2(i) by linear combination:

    $$\begin{array}{@{}rcl@{}} C_{1}(i)=aP_{1}(i)+(1-a)P_{2}(i); \end{array} $$
    (3)
    $$\begin{array}{@{}rcl@{}} C_{2}(i)=(1-a)P_{1}(i)+aP_{2}(i), \end{array} $$
    (4)

    where a ∈ ]0,1[. We then cross the whole parent population to finally obtain the child population of size n.

  • Mutation operation

    Mutation is one of the most important factors creating genetic load. The general rule for mutation operators is that they only mutate; this means that an independent copy must be made prior to mutating an individual if the original has to be kept. To apply mutation (here, a Gaussian mutation) to an individual x, we proceed as follows. First, a random number p is drawn for x. If p is lower than the mutation probability p m , centered, normally distributed noise is added to x; that is, x is replaced by x + ε, where ε denotes a random value drawn from the Gaussian law. The newly created individual then replaces the former one if it is better and lies in the admissible domain.
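The operators described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab program used in the study. Selection draws parents with the probabilities of Eq. 2, crossover follows Eqs. 3 and 4, and mutation is Gaussian with acceptance only when the candidate is admissible and better. The function names, the fitness transform 1/(1 + cost), the default settings (n, number of generations, p m, σ), the bookkeeping of the best individual, and the toy cost function are all hypothetical choices made for the example.

```python
import numpy as np


def selection_probabilities(fitness):
    """Eq. 2: p_i = f_i / sum_j f_j (fitness values must be positive)."""
    fitness = np.asarray(fitness, dtype=float)
    return fitness / fitness.sum()


def barycentric_crossover(p1, p2, a):
    """Eqs. 3-4: two children as linear combinations of two parents."""
    return a * p1 + (1.0 - a) * p2, (1.0 - a) * p1 + a * p2


def gaussian_mutation(x, cost, lower, upper, p_m, sigma, rng):
    """With probability p_m, try x + eps; keep it only if admissible and better."""
    if rng.random() >= p_m:
        return x
    candidate = x + rng.normal(0.0, sigma, size=x.shape)
    admissible = np.all(candidate >= lower) and np.all(candidate <= upper)
    return candidate if admissible and cost(candidate) < cost(x) else x


def run_ga(cost, lower, upper, n=40, generations=100, p_m=0.2, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Initial population: split each parameter range into n sub-intervals,
    # draw one uniform value per sub-interval, then shuffle each column.
    edges = np.linspace(lower, upper, n + 1)
    pop = rng.uniform(edges[:-1], edges[1:])
    for k in range(pop.shape[1]):
        pop[:, k] = rng.permutation(pop[:, k])
    best_x, best_cost = None, np.inf
    for generation in range(generations + 1):
        costs = np.array([cost(x) for x in pop])
        if costs.min() < best_cost:  # remember the best individual seen so far
            best_x, best_cost = pop[costs.argmin()].copy(), costs.min()
        if generation == generations:
            break
        # Assumed transform from a non-negative cost to a positive fitness.
        probs = selection_probabilities(1.0 / (1.0 + costs))
        parents = pop[rng.choice(n, size=n, p=probs)]
        children = parents.copy()
        for k in range(0, n - 1, 2):
            a = rng.uniform(0.0, 1.0)  # weight of the linear combination
            children[k], children[k + 1] = barycentric_crossover(
                parents[k], parents[k + 1], a)
        pop = np.array([gaussian_mutation(x, cost, lower, upper, p_m, sigma, rng)
                        for x in children])
    return best_x, float(best_cost)


# Toy cost with a known minimum at (1, -2); it stands in for the model error.
def toy_cost(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_cost = run_ga(toy_cost, lower=[-5.0, -5.0], upper=[5.0, 5.0])
print(best_x, best_cost)
```

Because barycentric crossover only produces points between the two parents, the children stay inside the admissible domain; new regions are reached through mutation alone. The result of such a run is only a coarse estimate, which is consistent with its use as a starting point for a local method.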

GAs are distinguished from other optimization techniques by four characteristics (Filho et al. 1994; Goldberg 1989):

  • they use a coding of parameters and not parameters themselves;

  • they are applied on a population of individuals (or solutions);

  • they use only the function values to be optimized, not the derivative, or other auxiliary knowledge;

  • they use probabilistic transition rules rather than deterministic ones.

The algorithm used by the GA is as follows:

figure a

The application of the GA in our study is based on an objective function f = f(X 1, ..., X n ) of the model parameters X i . The purpose is to determine the optimal set of parameters (X i ) that minimizes f, which represents the difference between the observed and calculated flows within a given parameter space (Fig. 3).

Fig. 3 The process of optimization by genetic algorithm

The Gauss-Newton method

In mathematics, the Gauss-Newton algorithm is a method for solving nonlinear least-squares problems. In our case, each term g i represents the difference between an observed flow and the corresponding calculated flow. The function to be minimized usually takes the following form:

$$ g(x)=\frac{1}{2} \sum \limits_{i=1}^{m} g_{i} (x)^{2}. $$
(5)

Local methods iteratively apply a strategy that starts from a point in parameter space, here the parameters obtained by the genetic algorithm, and moves in a direction that continuously improves the value of the criterion function until no further improvement is obtained. The parameters found correspond to the optimum of the function.

The algorithm used was as follows:

figure b
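To illustrate the local search described in this section, a minimal Gauss-Newton iteration for the criterion of Eq. 5 can be sketched in Python. It is only a sketch under stated assumptions and not the Matlab implementation used in the study: the Jacobian is approximated by forward finite differences, no step-size control is applied, and the function names, tolerances, and the exponential toy problem are hypothetical. The starting point plays the role of the parameters returned by the genetic algorithm.

```python
import numpy as np


def numerical_jacobian(residuals, x, h=1e-6):
    """Forward-difference approximation of the Jacobian of the residual vector."""
    r0 = residuals(x)
    jac = np.empty((r0.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h * max(1.0, abs(x[k]))
        jac[:, k] = (residuals(x + step) - r0) / step[k]
    return jac


def gauss_newton(residuals, x0, max_iter=50, tol=1e-10):
    """Minimize g(x) = 0.5 * sum_i g_i(x)^2 starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = residuals(x)
        jac = numerical_jacobian(residuals, x)
        # Gauss-Newton step: least-squares solution of jac @ dx = -r.
        dx, *_ = np.linalg.lstsq(jac, -r, rcond=None)
        x = x + dx
        if np.linalg.norm(dx) < tol * (1.0 + np.linalg.norm(x)):
            break
    return x, 0.5 * float(np.sum(residuals(x) ** 2))


# Toy problem: recover (a, b) in y = a * exp(b * t) from noise-free data.
t = np.linspace(0.0, 4.0, 20)
y_obs = 2.0 * np.exp(-0.5 * t)


def toy_residuals(x):
    return x[0] * np.exp(x[1] * t) - y_obs


x_opt, g_opt = gauss_newton(toy_residuals, x0=[1.5, -0.3])
print(x_opt, g_opt)
```

Without step-size control, the iteration is reliable only when it starts close to the optimum, which is one reason for coupling it with a global search.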

Model evaluation criteria

The performances of our model have been evaluated by the following numerical criteria:

  • Nash criterion: the dimensionless criterion proposed by Nash and Sutcliffe (1970) is defined as follows:

    $$ \text{Nash}=\left(1-\frac{\sum \limits_{i=1}^{n}(Q_{\text{obs}}-Q_{\text{sim}} )^{2} }{\sum \limits_{i=1}^{n} (Q_{\text{obs}}-\overline{Q_{\text{obs}}} )^{2} }\right)\times 100, $$
    (6)

    where Q obs and Q sim are respectively the observed and calculated flow rates during the calibration period and \(\overline {Q_{\text {obs}}} \) is the average of the observed flows. If Nash ≥ 70%, the fit is good; if, on the other hand, Nash < 70%, the flow calculated by the model is a poor estimate of the observed flow.

  • RMSE: the root mean squared error:

    $$ \text{RMSE} = \sqrt{\frac{1}{n} \sum \limits_{i=1}^{n} (Q_{\text{obs}} - Q_{\text{sim}})^{2}}. $$
    (7)

    The lower the RMSE, the smaller the error in the simulated flow.

  • Bilan (balance): this criterion compares the performance of the model from one period to another.

    $$ \text{Bilan} = \frac{\sum\limits_{i=1}^{n} Q_{\text{sim}}}{\sum \limits_{i=1}^{n} Q_{\text{obs}}}. $$
    (8)

    A value of one indicates a perfect balance. A value greater than one indicates that the model overestimates the total flow, and a value lower than one indicates that it underestimates it.
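The three criteria of Eqs. 6 to 8 translate directly into a few lines of Python. The sketch below is illustrative only; the function names and the sample flow values are assumptions made for the example and are not data from the study.

```python
import numpy as np


def nash(q_obs, q_sim):
    """Eq. 6: Nash criterion, in percent."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    num = np.sum((q_obs - q_sim) ** 2)
    den = np.sum((q_obs - q_obs.mean()) ** 2)
    return (1.0 - num / den) * 100.0


def rmse(q_obs, q_sim):
    """Eq. 7: root mean squared error, in the units of the flows."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    return float(np.sqrt(np.mean((q_obs - q_sim) ** 2)))


def bilan(q_obs, q_sim):
    """Eq. 8: ratio of total simulated flow to total observed flow."""
    return float(np.sum(q_sim) / np.sum(q_obs))


# Hypothetical flows, used only to exercise the three functions.
q_obs = [1.0, 2.0, 3.0, 4.0]
q_sim = [1.1, 1.9, 3.2, 3.8]
print(nash(q_obs, q_sim), rmse(q_obs, q_sim), bilan(q_obs, q_sim))
```

In this example the errors cancel in the sums, so Bilan is one even though the simulation is not perfect; this is why the balance is read together with the Nash criterion and the RMSE.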

Results and interpretations

Treatment of model input data

An important part of the work was devoted to data acquisition, processing, and analysis. The quality and availability of the data pose some problems for the application of the model. The data were compiled by examining the correlation between the flow rate and the precipitation at each station on the one hand and, on the other hand, the spatial rainfall obtained by taking the maximum rainfall recorded at the various stations (Fig. 4).

Fig. 4 Variation of flow and precipitation at the different stations of the Ourika basin during the period 1989–2010

We notice that the spatial rainfall computed with the weighted Thiessen polygon method is too low to generate the flows observed at the outlet. This is also the case for the individual stations, except Aghbalou. For the latter, precipitation follows the variations of flow more closely, indicating that the station can be considered representative of the basin; however, the flows generated by this precipitation are underestimated compared to the observed flows. This prompted us to carry out several model calibration tests with the Aghbalou rainfall increased by 10 to 40% in steps of 5%. We found that the input data best represent our study area when the Aghbalou precipitation is increased by 20%.

The Aghbalou flow record shows several periods with flow recording problems, which are taken into account in the calibration of the model. The following problems were encountered during those periods:

  • Periods following floods, when the measurement at the station is disturbed either by scouring or by filling of the section with material from the upstream basin. In this case, the measurements usually remain disturbed until the section is repaired.

  • Periods showing peaks of instantaneous flow that rise and fall without any precipitation recorded at any of the stations.

Calibration of GR4J model

The optimization of the GR4J model parameters was performed on the years selected earlier. The results are summarized in Table 1. The calibrated parameters for most years are very close to each other and give good results (each precipitation peak corresponds to a peak in the simulated flow; see, for example, Fig. 5), except for the years affected by the problems already mentioned. The Nash criterion for most years is higher than 70%, and the Bilan values show only small relative errors with respect to 1.

Table 1 Results of GR4J model calibration in the Ourika basin
Fig. 5 Variation of the observed and simulated flows from the calibration of the GR4J model for the years 1970−1971 and 2003−2004

Model validation

Validation of the model, also known as the control phase, is performed on a period different from that of the calibration phase. If the model satisfies the quality criterion over this period, it is considered a satisfactory interpolation tool to represent the dynamics of the basin, assuming stationary behavior in the absence of long-term climate change. Validation is verified by comparing the simulated and observed flows through a quality criterion. Model evaluation is an integral part of the model development process: it helps to find the model that best represents our data and indicates how well the chosen model will work in the future. To evaluate model performance, we used the hold-out method, in which a large dataset is randomly divided into three subsets: a training set, a validation set, and a test set.

According to the calibration results, the years 1970–1971 and 2003–2004 were the best years for validating their parameters on other years. Table 2 shows the validation results for both sets of calibrated parameters. Similarly, Fig. 6 shows some graphical examples of model validation.

Table 2 Validation results of the GR4J model using the parameters of the years 1970−1971 and 2003−2004 in the Ourika basin
Fig. 6 Examples of validation of the GR4J model in the Ourika basin

Based on these results, we note that in the majority of years, the simulated flow is underestimated compared to the observed flow during the period from March to July. This offset corresponds to the snowmelt period, which is not taken into account in the GR4J model. The introduction of a snow model is therefore needed, which prompted us to complement the model with the CemaNeige model.

Simulation with CemaNeige model

CemaNeige is a two-parameter snow model developed by Valéry (2010). The purpose of using this model is to calculate the depth of water produced by snowmelt during the period from March to July. The calculation loops over the elevation zones; for each zone, the temperature and precipitation of the basin are extrapolated using optimized altitudinal corrections. The amount of water calculated is added to the input rainfall of the GR4J model. Thus, we can better evaluate the changes brought by the snow module.

In our watershed, the main station that measures snowfall is Oukaimden, where temperature and precipitation time series are only available after 2003 (the model is therefore applied only to years after 2003, according to data availability). Table 3 and Fig. 7 clearly show the improvement of the model when snowmelt is added to the rainfall input. This remarkable gain in the evaluation criteria after the introduction of the snow module confirms the strong influence of the temporary snow cover on the functioning of the basin.

Table 3 Validation results before and after the incorporation of the snow module
Fig. 7 Validation of the GR4J model before and after the introduction of snowmelt for 2002–2003 and 2005–2006 in the Ourika basin
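The elevation-zone calculation described in this section can be illustrated with a deliberately simplified Python sketch. It uses a plain degree-day melt rule and does not reproduce the CemaNeige equations of Valéry (2010). The temperature lapse rate, the precipitation gradient, the melt factor, the melt threshold, the assumption of zones of equal area, and all names are hypothetical values and choices made for the example.

```python
import numpy as np


def liquid_input_with_snow(precip, temp, z_station, zone_elevations,
                           temp_lapse=-0.0065, precip_gradient=0.0004,
                           melt_factor=4.0, t_melt=0.0):
    """Basin-average rain plus snowmelt (mm/day), zone by zone.

    precip, temp    : daily precipitation (mm) and temperature (deg C) at the station
    z_station       : station elevation (m)
    zone_elevations : median elevation (m) of each zone, zones assumed of equal area
    temp_lapse      : assumed temperature correction (deg C per m)
    precip_gradient : assumed exponential precipitation correction (per m)
    melt_factor     : assumed degree-day factor (mm per deg C per day)
    t_melt          : assumed rain/snow and melt threshold (deg C)
    """
    precip = np.asarray(precip, dtype=float)
    temp = np.asarray(temp, dtype=float)
    n_zones = len(zone_elevations)
    liquid = np.zeros(precip.size)
    for z in zone_elevations:  # loop over elevation zones
        dz = z - z_station
        t_zone = temp + temp_lapse * dz                   # altitudinal correction of T
        p_zone = precip * np.exp(precip_gradient * dz)    # altitudinal correction of P
        snowpack = 0.0
        for day in range(precip.size):
            if t_zone[day] <= t_melt:
                snowpack += p_zone[day]   # precipitation falls as snow
                rain = 0.0
            else:
                rain = p_zone[day]
            # Degree-day melt, limited by the snow available in the zone.
            melt = min(snowpack, melt_factor * max(t_zone[day] - t_melt, 0.0))
            snowpack -= melt
            liquid[day] += (rain + melt) / n_zones
    return liquid


# Hypothetical three-day series with a single zone at the station elevation.
print(liquid_input_with_snow([10.0, 0.0, 0.0], [-2.0, 1.0, 5.0],
                             z_station=2000.0, zone_elevations=[2000.0]))
```

The returned series would replace the raw rainfall as input to the rainfall-runoff model, so that water stored as snow reaches the model only when it melts.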

Validation of the model over a long period

Although we did not calibrate the model over a long period, we applied the validation over a period of 4 years, as permitted by the available data. The results obtained were very satisfactory (Table 4 and Fig. 8).

Table 4 Validation results of the GR4J model over long periods
Fig. 8 Variation of observed and simulated flow with the GR4J model over long periods in the Ourika basin

An earlier study of rainfall-flow modeling with the GR4J model in the Ourika watershed was conducted by Faouzi (2006). That study, based on the Excel version of the model provided by CEMAGREF, in which the optimization relies on the "solver" function, gave very weak Nash criteria, with a maximum value of 48%. In our case, programming the model with mathematical optimization methods (genetic algorithm and Gauss-Newton method) allowed us to obtain Nash criteria above 70%, which shows a clear improvement in reliability.

Conclusion

Modeling the Ourika basin, which is characterized by spatio-temporal heterogeneity of its hydro-climatic properties, requires a robust and simple tool with few parameters. For this reason, we opted for the GR4J model, a lumped conceptual reservoir model operating at a daily time step, which we combined with optimization methods (GA and Gauss-Newton method). Given the complex climatic conditions in this basin, which influence the quality and quantity of input data, we find that these optimization methods clearly show their reliability compared with the "solver" supplied with the GR4J model. Similarly, this coupling appears able to offer a very satisfactory simulation of flow rates, provided that the snow component is integrated into the model. The GR4J model can thus be a useful decision-support tool for water resources management in the basin, and its calibrated parameters also seem suitable for extrapolation to ungauged watersheds, where calibration is a delicate, even impossible, task.