1 Introduction

The prediction of lake water level is a fundamental requirement in water management activities, such as irrigation and drinking water supply management and decision-making, fishing, tourism, railroad transportation and many other forms of recreational and socio-economic activity (Altunkaynak and Şen 2007). Recently, a number of artificial intelligence models have shown a strong ability to model lake, river and streamflow water levels without the need for experimental apparatus or complex hydro-physical models based on physical principles and mathematical equations. A review of the existing literature shows that these investigations include applications of the artificial neural network model (Altunkaynak and Şen 2007; Çimen and Kisi 2009; Kişi 2009a, b; Yarar et al. 2009; Kakahaji et al. 2013; Buyukyildiz et al. 2014; Khatibi et al. 2014; Deo and Şahin 2016; Yaseen et al. 2016; Prasad et al. 2017), gene expression programming and chaos theory (Khatibi et al. 2014), the triple diagram model (Altunkaynak et al. 2003), support vector machines (Çimen and Kisi 2009; Hipni et al. 2013; Buyukyildiz et al. 2014), adaptive neuro-fuzzy inference systems (Yarar et al. 2009; Hipni et al. 2013; Buyukyildiz et al. 2014), the local linear neuro-fuzzy model (Kakahaji et al. 2013), neural networks trained with the particle swarm optimization algorithm (Buyukyildiz et al. 2014), neuro-wavelet models (Kişi 2009b), feedforward backpropagation neural networks, generalized regression neural networks (Vaheddoost et al. 2016), radial basis function neural networks (Buyukyildiz et al. 2014; Vaheddoost et al. 2016) and the fuzzy logic model (Altunkaynak and Şen 2007). The non-decimated maximum overlap discrete wavelet transform (moDWT) integrated with artificial neural networks (Prasad et al. 2017), and extreme learning machines compared with an artificial neural network model (Deo and Şahin 2016), have also been validated for river water level and river discharge prediction. Despite these successful applications, the reported forecasting accuracies vary widely with the geographic features of the tested sites, as shown in earlier studies (Deo and Şahin 2016; Prasad et al. 2017) that compared models at sites with different climatic and hydrological patterns. Moreover, no universal model currently exists for all types of climates and regions, so the search for a ‘one-size-fits-all’ forecasting model for hydrological problems remains open (Mishra and Singh 2011; Deo et al. 2017a). A motivating question for hydrological modelers working in water resources management and lake hydrology is therefore: does a hybrid intelligent model integrated with an optimization algorithm boost predictive performance?

In seeking answers to this question, a significant challenge faced by hydrological modelers is the proper selection of input combinations for model development, which is known to govern the accuracy of the resulting predictive model (Galelli and Castelletti 2013; Abbot and Marohasy 2014; Deo and Şahin 2016; Quilty et al. 2016). Moreover, how the information in these inputs, for different and geographically diverse sites, is extracted by the optimal black-box model structure remains a challenging problem for engineering and environmental applications. Modelers have often fed all available data into a standalone model without applying an optimization scheme to tune the feature extraction process. Including an optimization scheme in a predictive model can help determine the most suitable model parameters (i.e., weights and biases), and can act as a primary tool for improving the performance of the final model (Sedki and Ouazar 2010; Long and Meesad 2013; Quilty et al. 2016).

The Firefly Algorithm (FFA) applied in this paper is a promising optimization method, since it has been shown to enhance the performance of artificial intelligence-based models in a number of recent studies, particularly in hydrological modelling (Ghorbani et al. 2017a, b; Raheli et al. 2017; Yaseen et al. 2017). In its principal form, as originally proposed by Yang (2010b), the FFA is inspired by the flashing behavior of fireflies. Beyond hydrological modelling, the FFA has been applied successfully in many other fields, for example electrical load forecasting (Kavousi-Fard et al. 2014), the optimization of soil water characteristics (Fu et al. 2015), soil moisture prediction (Ghorbani et al. 2017b), river water quality forecasting (Raheli et al. 2017), the selection of data features to be used as model inputs (Emary et al. 2015) and other engineering design and optimization problems (Yang 2010a; Nascimento et al. 2013; Kazemzadeh-Parsi 2014; Talatahari et al. 2014). In many of these problems, the FFA led to a significant improvement in the models used to solve the respective prediction problem. In a recent study (Garousi-Nejad et al. 2016), a modified FFA was evaluated against a range of other intelligent optimization algorithms on multiple reservoir operation problems, and it performed very effectively in comparison with the other optimization schemes. However, to the best of the authors’ knowledge, no previous study has investigated the capability of a hybrid Multilayer Perceptron-FFA model for lake water level prediction.

In this paper, the Multilayer Perceptron (MLP), a popular ANN-based model, has been integrated with the Firefly Algorithm (FFA) to construct a hybrid predictive model for lake water level. The standalone MLP model (where no ‘add-in’ optimizer is used) is one of the most frequently applied artificial neural network tools, and the literature reports many successful applications of MLP-based and related FFA-based hybrid models. These include the prediction of suspended sediment (Afan et al. 2015), the prediction of significant wave heights using the geno-multilayer perceptron (Altunkaynak 2013), pan evaporation prediction using the hybrid MLP-FFA model (Ghorbani et al. 2017a), uncertainty assessment with the hybrid MLP-FFA model for predicting biochemical oxygen demand and dissolved oxygen concentration (Raheli et al. 2017), rainfall prediction using a hybrid ANFIS-FFA modelling framework (Yaseen et al. 2017) and wind speed prediction (Deo et al. 2017b). Other studies have used the standalone MLP model for environmental problems such as the estimation of pan evaporation (Kişi 2009a; Tabari et al. 2010), comparison with soil and water assessment tools for watershed regions (Singh et al. 2012), the prediction of daily outflow with log-logistic and tangent sigmoid activation functions (Zadeh et al. 2010) and the forecasting of urban water demand with a wavelet-based MLP model (Adamowski and Karapataki 2010; Tiwari and Adamowski 2013). For lake water level prediction, Kişi (2009b) developed an ANN model and a wavelet-ANN conjunction model for predicting monthly water level fluctuations of a lake in Turkey, whereas Çimen and Kisi (2009) compared two data-driven techniques (SVM and ANN) for modeling lake water level in Turkey.
The latter study found the SVM model to be more accurate than the ANN model. Güldal and Tongal (2010) applied a Recurrent Neural Network (RNN) and an ANFIS model to predict water levels in Lake Egirdir, Turkey. While these and other studies have demonstrated the efficacy of artificial intelligence-based models for lake water level prediction, an integrated Multilayer Perceptron-FFA model has not yet been applied and validated for this purpose.

To fill this gap in the current literature, this study investigates the applicability of a hybrid MLP-FFA model, constructed by integrating the standalone MLP algorithm with the Firefly optimizer, against a standalone MLP model for predicting lake water level time series. The overall purpose is to evaluate the accuracy of the hybrid model and the contribution of the optimizer (FFA), using data for Lake Egirdir over the study period 1961–2016. The rest of the paper is structured as follows. Section 2 outlines the theoretical principles of the predictive model and the optimizer algorithm and describes the research methodology, Sect. 3 presents and discusses the results, explaining the application and benefits of the integrated MLP-FFA model for predicting lake water levels, and Sect. 4 summarizes the findings and limitations of this research.

2 Methodology

2.1 Multilayer Perceptron Neural Network (MLP) model

Artificial neural networks (ANNs) date back to the 1940s (McCulloch and Pitts 1943) and have since become a popular class of data-driven models. An ANN is a parallel information processing system composed of a set of interconnected neurons arranged in layers (McClelland and Rumelhart 1989). The Multilayer Perceptron (denoted as ‘MLP’) is a widely used form of the conventional ANN, consisting of a three-layer framework: an input layer (where the data are fed into the model), a hidden layer (where features are extracted from the data) and an output layer (where the predictions are generated). The Levenberg–Marquardt backpropagation algorithm is a popular training algorithm for the MLP and has been adopted in this study to learn the weights that extract the features contained in the model inputs. Figure 1 shows a schematic view of the MLP modelling framework. During training, the neurons in each layer are connected to the neurons in the adjacent layers by weights that are adjusted iteratively. In this study, the sigmoid and the linear activation functions, which are commonly used for feature extraction and modelling purposes (Deo and Şahin 2016; Deo et al. 2017c; Fahimi et al. 2017), have been used in the hidden and the output layers, respectively. For more details on the MLP model structure and components, readers can consult Ghorbani et al. (2013).
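
The forward pass of the three-layer structure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes a single hidden layer, the 4-3-1 layout reported in Sect. 3 and randomly initialised (untrained) weights, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, used here as the hidden-layer activation."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_forward(X, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP.

    X  : (n_samples, n_inputs) lagged water levels scaled to [0, 1]
    W1 : (n_inputs, n_hidden) input-to-hidden weights, b1 : (n_hidden,)
    W2 : (n_hidden, 1) hidden-to-output weights,        b2 : (1,)
    """
    H = sigmoid(X @ W1 + b1)        # hidden layer: sigmoid activation
    return (H @ W2 + b2).ravel()    # output layer: linear (identity) activation


# Illustrative 4-3-1 network with random, untrained weights (hypothetical values)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
X = rng.random((5, 4))              # 5 samples x 4 lagged inputs
y_hat = mlp_forward(X, W1, b1, W2, b2)
print(y_hat.shape)                  # (5,)
```

Training, whether by Levenberg–Marquardt or by the FFA, only changes how W1, b1, W2 and b2 are chosen; the forward pass stays the same.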

Fig. 1 A configuration of the multi-layer perceptron (MLP) neural network model (Ghorbani et al. 2013)

2.2 Hybrid MLP-FFA model

The prediction of lake water level in this paper is based on a hybrid model in which the Firefly Algorithm (FFA) is integrated with the Multilayer Perceptron (MLP) as an add-in tool for training the MLP. The nature-inspired FFA was originally developed by Yang (2010a, b) as a swarm intelligence optimization technique based on the movement of fireflies. In this approach, each candidate solution of the problem is treated as an agent (a firefly) that glows in proportion to its quality. Each brighter firefly attracts its partners, regardless of their sex, which makes the exploration of the search space more efficient (Łukasik and Żak 2009). As fireflies are attracted towards light, the swarm moves towards the brightest firefly, and this behaviour can be used to solve the optimization problem in a predictive model. The attractiveness of a firefly is proportional to its brightness, which in turn depends on the light intensity of the agent (Kayarvizhy et al. 2014), i.e., on the value of the objective function. A major challenge in applying the FFA is therefore the formulation of the objective function and the variation of light intensity. The primary variables of the FFA formulation are the light intensity I(r), the attractiveness \(\left( \beta \right)\), and the Cartesian distance \(r_{ij}\) between any two fireflies i and j located at \(x_{i}\) and \(x_{j}\), respectively, which can be expressed as (Yang 2010a, b):

$$I\left( r \right) = I_{O} \exp \left( { - \gamma r^{2} } \right)$$
(1)
$$\beta \left( r \right) = \beta_{O} \exp \left( { - \gamma r^{2} } \right)$$
(2)
$$r_{ij} = \left\| {x_{i} - x_{j} } \right\| = \sqrt {\mathop \sum \limits_{k = 1}^{d} \left( {x_{i,k} - x_{j,k} } \right)^{2} }$$
(3)

where \(x_{i,k}\) is the kth component of the spatial coordinate \(x_{i}\) of the ith firefly, \(\gamma\) is the light absorption coefficient, d is the dimensionality of the problem, I(r) and \(I_{O}\) are the light intensity at distance r from a firefly and the initial light intensity, respectively, and \(\beta \left( r \right)\) and \(\beta_{O}\) are the attractiveness at distance r and at r = 0, respectively.

The movement of firefly i towards a brighter firefly j, from iteration t to t + 1, can be represented as (Yang 2010a, b):

$$x_{i}^{t + 1} = x_{i}^{t} + \Delta x_{i}$$
(4)
$$\Delta x_{i} = \beta_{O} {\text{e}}^{{ - \gamma r_{ij}^{2} }} \left( {x_{j} - x_{i} } \right) + \alpha \epsilon_{i}$$
(5)

In Eq. (5), the first term represents attraction and the second term represents randomization: \(\alpha\) is the randomization parameter, with values between 0 and 1, and \(\epsilon_{i}\) is a vector of random numbers drawn from a Gaussian distribution (Ch et al. 2014).
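
Equations (2)–(5) translate into a short optimization loop. The following is a minimal NumPy sketch, not the authors' implementation: it assumes minimization (a lower objective means a brighter firefly), box bounds on the search space, a geometric decay of \(\alpha\) (a common practice that the text does not specify) and hypothetical parameter defaults. In this sketch the current brightest firefly is left in place, whereas some variants let it take a random step.

```python
import numpy as np


def firefly_minimize(f, dim, n_fireflies=20, n_iter=100, beta0=1.0, gamma=1.0,
                     alpha=0.2, alpha_decay=0.97, bounds=(-1.0, 1.0), seed=0):
    """Minimise f on a box with firefly moves (Eqs. 2-5); lower f = brighter."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_fireflies, dim))       # initial swarm
    cost = np.array([f(xi) for xi in x])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:                       # firefly j is brighter
                    r2 = np.sum((x[i] - x[j]) ** 2)         # squared distance, Eq. (3)
                    beta = beta0 * np.exp(-gamma * r2)      # attractiveness, Eq. (2)
                    step = beta * (x[j] - x[i]) + alpha * rng.normal(size=dim)  # Eq. (5)
                    x[i] = np.clip(x[i] + step, lo, hi)     # Eq. (4), kept in bounds
                    cost[i] = f(x[i])
        alpha *= alpha_decay                                # shrink the random walk
    best = int(np.argmin(cost))
    return x[best], cost[best]


# Usage: minimise a simple 3-D sphere function (a stand-in objective)
best_x, best_f = firefly_minimize(lambda v: float(np.sum(v ** 2)), dim=3)
print(best_x, best_f)
```

In a hybrid MLP-FFA setting, f would be the training error of the MLP evaluated at the weight vector encoded by each firefly's position.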

In this paper, the weights of the MLP model have been optimized by the FFA, so that the final model’s weights are tuned to the features present in the training dataset. Figure 2 shows the process used to obtain the optimal weights of the MLP model integrated with the FFA. First, the input variables were determined using the Average Mutual Information (AMI) between two random variables (i.e., the present and the historical lake water level) as a measure of their mutual dependence, thereby exploiting the memory of the series to aid the prediction of lake water level. Here, the AMI was applied to quantify the “amount of information” that the historical water levels in the training data carry about the future water level. Second, after the best input combinations were selected according to their congruence with the target variable (i.e., lake water level), the data were fed into the Firefly Algorithm, whose objective function was the root mean square error in the training phase, to be minimized. The selected inputs were then used in the integrated MLP-FFA hybrid model to generate predicted lake water levels for the independent testing dataset.
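
The coupling between the two algorithms can be sketched as follows: each firefly's position is one flat vector holding all MLP weights and biases, and its brightness is derived from the training RMSE. This is a minimal illustration, assuming the 4-3-1 architecture reported in Sect. 3 and a flat parameter layout of our own choosing; the function names are hypothetical and the synthetic arrays merely stand in for the normalized water level series.

```python
import numpy as np

N_IN, N_HID = 4, 3                               # 4-3-1 architecture
N_PARAMS = N_IN * N_HID + N_HID + N_HID + 1      # 19 weights and biases


def unpack(theta):
    """Split one firefly position (flat vector) into MLP weights and biases."""
    i = 0
    W1 = theta[i:i + N_IN * N_HID].reshape(N_IN, N_HID)
    i += N_IN * N_HID
    b1 = theta[i:i + N_HID]
    i += N_HID
    W2 = theta[i:i + N_HID]
    i += N_HID
    b2 = theta[i]
    return W1, b1, W2, b2


def predict(theta, X):
    """MLP output for the weights encoded in theta."""
    W1, b1, W2, b2 = unpack(theta)
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))     # sigmoid hidden layer
    return H @ W2 + b2                            # linear output neuron


def training_rmse(theta, X_train, y_train):
    """Objective for the FFA: RMSE on the training set (lower = brighter)."""
    return float(np.sqrt(np.mean((predict(theta, X_train) - y_train) ** 2)))


# Usage with synthetic stand-in data (not the Lake Egirdir series)
rng = np.random.default_rng(1)
X_train, y_train = rng.random((50, N_IN)), rng.random(50)
theta = rng.uniform(-1.0, 1.0, N_PARAMS)          # one firefly's position
print(training_rmse(theta, X_train, y_train))
```

Any firefly optimizer can then search the 19-dimensional space by calling `training_rmse(theta, X_train, y_train)` as its objective.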

Fig. 2 A flowchart of the hybrid multi-layer perceptron-Firefly algorithm (MLP-FFA) model structure

2.3 Finding the optimum lag time series for the model

Before the data are fed into the integrated MLP-FFA model, the historical lake water level data must be analyzed to determine the optimum time lags for the best predictor-target matrix. This was performed using the Mutual Information (MI) approach, a method widely applied in linear and nonlinear correlation analysis and variable selection (Wang et al. 2010; Lee and Kim 2015). Given a time series \(\left\{ {x_{1} , x_{2} , \ldots ,x_{t} , \ldots ,x_{n} } \right\},\) the mutual information measures the amount of information that the state \(x_{t}\) contains about the future state \(x_{t + \tau }\). The AMI at lag τ is defined by the coefficient I(τ):

$$I\left( \tau \right) = \mathop \sum \limits_{t = 1}^{N - \tau } P\left( {x_{t} ,x_{t + \tau } } \right)\log \left( {\frac{{P\left( {x_{t} ,x_{t + \tau } } \right)}}{{P\left( {x_{t} } \right)P\left( {x_{t + \tau } } \right)}}} \right)$$
(6)

In Eq. (6), \(P\left( {x_{t} } \right)\) is the probability density of the series \(x_{t}\), and \(P\left( {x_{t} ,x_{t + \tau } } \right)\) is the joint probability density of the original series \(x_{t}\) and the lagged series \(x_{t + \tau }\). The lag at which \(I\left( \tau \right)\) reaches its first local minimum is then used to select the lagged time series that serve as inputs to the model (De Domenico et al. 2013).
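
The lag selection described above can be sketched with a simple histogram (plug-in) estimator of Eq. (6). This is an illustration under stated assumptions: the number of bins, the natural logarithm and the synthetic series are our choices rather than the study's, and the sum is evaluated over histogram cells of binned value pairs. The function names are hypothetical.

```python
import numpy as np


def average_mutual_information(x, tau, bins=16):
    """Histogram (plug-in) estimate of I(tau) between x_t and x_{t+tau}, in nats."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-tau], x[tau:]                       # pairs (x_t, x_{t+tau})
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                     # joint probabilities
    p_a = p_ab.sum(axis=1, keepdims=True)          # marginal of x_t
    p_b = p_ab.sum(axis=0, keepdims=True)          # marginal of x_{t+tau}
    nz = p_ab > 0                                  # empty cells contribute 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a * p_b)[nz])))


def first_local_minimum(ami_values):
    """Lag (1-based) of the first local minimum; ami_values[0] is I(1)."""
    for k in range(1, len(ami_values) - 1):
        if ami_values[k] < ami_values[k - 1] and ami_values[k] <= ami_values[k + 1]:
            return k + 1
    return None


# Usage on a synthetic monthly series (not the Lake Egirdir data)
rng = np.random.default_rng(0)
t = np.arange(658)
series = np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=t.size)
ami = [average_mutual_information(series, tau) for tau in range(1, 13)]
print(first_local_minimum(ami))
```

Histogram estimates are sensitive to the bin count and carry a small positive bias for short series, so the location of the minimum is worth checking against more than one bin setting.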

2.4 Study area, data and performance evaluation criteria

In this paper, historical monthly lake water levels of Lake Egirdir, Turkey, have been selected as a case study to implement the proposed hybrid MLP-FFA model. Lake Egirdir is located in south-western Turkey and has long been used as a primary source of water for agriculture and drinking (Fig. 3). It has an average depth of 9 m and an average surface area of 470 km², lies 916 m above sea level, and extends for about 50 km towards the northern part of the Egirdir County region. In this study, a total of 658 monthly records of the lake’s water level between October 1961 and July 2016 have been used to construct the predictive models.

Fig. 3 The location of Lake Egirdir and the characteristics of data modelled in this study

Figure 3 displays the basic statistical characteristics of these data in the training phase, the testing phase and the whole dataset for Lake Egirdir, together with the variation of the water level time series between October 1961 and July 2016. Except for the standard deviation, skewness and flatness (kurtosis) factors, the statistical values were nearly identical for the training and testing datasets and matched those of the full dataset. The measured data from October 1961 to January 2000 (a total of 460 records, or 70% of the full dataset) were used in the training phase, and the data from February 2000 to July 2016 (a total of 198 records, or 30% of the full dataset) were allocated to the testing phase.
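
The record counts of the chronological split can be checked with a few lines of standard-library Python. This sketch simply enumerates calendar months; the helper name is hypothetical.

```python
def monthly_index(start, end):
    """Consecutive (year, month) tuples from start to end, inclusive."""
    (year, month), months = start, []
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


months = monthly_index((1961, 10), (2016, 7))
cut = months.index((2000, 1)) + 1          # training ends with January 2000
train, test = months[:cut], months[cut:]
print(len(months), len(train), len(test))  # 658 460 198
print(round(100 * len(train) / len(months), 1))  # 69.9
```

Splitting by date rather than at random keeps the testing period strictly after the training period, which matches how a forecasting model is used in practice.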

To evaluate the performance of the hybrid and the standalone modeling approaches, the following statistical score metrics have been used:

  1.

    Nash–Sutcliffe efficiency coefficient (NSE) (Nash and Sutcliffe 1970) expressed as:

    $$NSE = 1 - \left[ {\frac{{\sum\nolimits_{i = 1}^{N} {\left( {Q_{i} - P_{i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{N} {\left( {Q_{i} - \bar{Q}} \right)^{2} } }}} \right], \quad - \infty < NSE \le 1$$
    (7)

    The NSE, widely used for evaluating hydrological models, improves on the correlation coefficient because it is sensitive to differences between the observed and predicted means and variances. Its value ranges from − ∞ to 1.0 (perfect fit). A negative efficiency indicates that the mean of the observed time series would have been a better predictor than the model being tested (ASCE 1993; Krause et al. 2005).

  2.

    Root mean square error (RMSE) expressed as (Chai and Draxler 2014):

    $$RMSE = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {P_{i} - Q_{i} } \right)^{2} } }$$
    (8)

    Because the errors are squared before averaging, the RMSE is more sensitive to large errors than to small ones. It is expressed in the original units of the modelled data without any normalization, and a perfect model would have an RMSE of zero.

  3.

    Mean absolute error (MAE) expressed as (Chai and Draxler 2014):

    $$MAE = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {\left( {P_{i} - Q_{i} } \right)} \right|}$$
    (9)

    The MAE is the average absolute difference between the observed and predicted data in the testing phase. It is a non-negative number with no upper bound, and it is zero for a perfect model (ASCE 1993; Krause et al. 2005).

    Following the recommendation of Chai and Draxler (2014), the RMSE and the MAE should be used together. The RMSE, a quadratic scoring rule (which satisfies the triangle inequality required of a distance metric), is more appropriate when the model errors in the tested dataset are expected to be approximately normally distributed, whereas the MAE, the average error magnitude in a set of predictions (irrespective of their direction), is more appropriate when the error distribution is approximately uniform. Using both metrics exploits the fact that the RMSE penalizes large errors more heavily, whereas the MAE weights all errors equally and is more suitable when the overall error, rather than the magnitude of individual errors, is of interest. In agreement with the extensive analysis of both error metrics in the earlier study (Chai and Draxler 2014), this paper analyses both to provide complementary insights into the performance of the hybrid MLP-FFA model in the testing phase.

  4.

    Willmott’s Index of agreement (WI) (Willmott 1981, 1982) expressed as:

    $$WI = 1 - \left[ {\frac{{\sum\nolimits_{i = 1}^{N} {\left( {Q_{i} - P_{i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{N} {\left( {\left| {P_{i} - \mathop Q\limits^{\_\_} } \right| + \left| {Q_{i} - \mathop Q\limits^{\_\_} } \right|} \right)^{2} } }}} \right],\quad 0 \le WI \le 1$$
    (10)

    The WI is regarded as an improvement over the NSE; its value varies from 0 to 1, with higher values indicating better agreement between the modelled and observed data in the testing phase. However, Willmott (1981) points out that relatively high WI values may be obtained even for poorly fitted models, and outliers in the tested data can adversely affect this performance measure.

    In Eqs. (7)–(10), \(Q_{i}\), \(P_{i}\) and \(\bar{Q}\) are the observed, predicted and mean observed monthly water levels, respectively, and N is the number of observed data in the training or the testing set.

  5.

    Taylor Diagrams (Taylor 2001)

    Since no single metric can fully evaluate a model’s accuracy, which depends on the tested data, the estimation problem and the models being considered (Legates and McCabe 1999; Krause et al. 2005; Dawson et al. 2007), we have also used the Taylor diagram introduced by Taylor (2001) to evaluate the hybrid MLP-FFA and the standalone MLP models for lake water level prediction. The Taylor diagram summarizes several aspects of model performance simultaneously: the correlation coefficient (R) between the observed and predicted lake water levels, the centred root mean square difference and the standard deviation. Because it compares different models within the same framework relative to a common observation point, it complements the statistical metrics in Eqs. (7)–(10). Each model is displayed as a point on a polar plot, where the azimuthal angle corresponds to the correlation coefficient between the predicted and observed data and the radial distance from the origin represents the normalized standard deviation (SD), i.e., the ratio of the SD of the predictions to that of the observations.
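
Equations (7)–(10) and the three Taylor-diagram statistics can be computed in a few lines. The following is a minimal NumPy sketch, assuming one-dimensional arrays of observed (`obs`) and predicted (`pred`) water levels and population (ddof = 0) standard deviations; the function names are hypothetical. With these conventions, the centred RMS difference E′, the standard deviations σp and σo and the correlation R satisfy E′² = σp² + σo² − 2σpσoR, the geometric relation that lets all three be read from a single point on the diagram.

```python
import numpy as np


def nse(obs, pred):
    """Nash-Sutcliffe efficiency, Eq. (7): 1 is perfect, < 0 is worse than the mean."""
    return 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rmse(obs, pred):
    """Root mean square error, Eq. (8), in the units of the data."""
    return np.sqrt(np.mean((pred - obs) ** 2))


def mae(obs, pred):
    """Mean absolute error, Eq. (9)."""
    return np.mean(np.abs(pred - obs))


def willmott_index(obs, pred):
    """Willmott's index of agreement, Eq. (10), bounded by [0, 1]."""
    qbar = obs.mean()
    denom = np.sum((np.abs(pred - qbar) + np.abs(obs - qbar)) ** 2)
    return 1 - np.sum((obs - pred) ** 2) / denom


def taylor_stats(obs, pred):
    """Correlation, normalised SD and centred RMS difference for a Taylor diagram."""
    r = np.corrcoef(obs, pred)[0, 1]                  # azimuth = arccos(r)
    sd_ratio = pred.std() / obs.std()                 # radial distance
    crmsd = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return r, sd_ratio, crmsd


# Usage with a toy series offset by a constant bias of 0.1 m
obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = obs + 0.1
print(nse(obs, pred), rmse(obs, pred), mae(obs, pred), willmott_index(obs, pred))
print(taylor_stats(obs, pred))
```

The toy example also illustrates why the metrics complement each other: a constant bias lowers NSE and raises RMSE and MAE, yet leaves R, the normalized SD and the centred RMS difference unchanged.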

3 Results and discussion

The selection of input variables is one of the most important issues in developing a robust forecasting model, because the lagged correlations present in the historical data determine which past values are useful predictors and hence govern the model’s accuracy. As with other hydrological data (e.g., streamflow) (Deo and Şahin 2016; Yaseen et al. 2016), lake water level is expected to display a high degree of serial correlation (or hydrological persistence) arising from evaporation, transpiration, recharge and other hydro-physical processes. This persistence gives the lake water level a memory of several months, so the initial hydrological state of the catchment can act as a source of predictability (Chiew et al. 1998). The mutual information function has therefore been applied to the training data to determine the number of statistically significant lagged series that can be used for prediction, as advocated in earlier studies (Khatibi et al. 2011; Wen et al. 2016).

Figure 4 shows that the Average Mutual Information (AMI) has a well-defined first minimum at a time lag of 4 months. This indicates that the information in the training data from the past 4 months can be used to predict the future lake water level (i.e., the memory of the series is exploited). Accordingly, four input combinations were designed, incorporating lagged lake water levels of up to 4 months, for forecasting the lake water level 1 month ahead.
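
The predictor-target matrices for these input combinations can be built by shifting the series. The following sketch assumes that combination n uses lags 1 to n; Table 1 is not reproduced here, so this mapping is an assumption, except for the four-lag combination, which the results identify as Lt-1, Lt-2, Lt-3 and Lt-4. The helper name and the placeholder series are hypothetical.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Inputs [L(t-1), ..., L(t-n_lags)] and the 1-month-ahead target L(t)."""
    s = np.asarray(series, dtype=float)
    # Column k holds the series shifted back by k months
    X = np.column_stack([s[n_lags - k:len(s) - k] for k in range(1, n_lags + 1)])
    y = s[n_lags:]
    return X, y


levels = np.arange(10.0)            # placeholder series, not real water levels
for n in range(1, 5):               # hypothetical combinations: lags 1..n
    X, y = make_lagged(levels, n)
    print(n, X.shape, y.shape)
```

Each additional lag removes one month from the start of the usable record, since the earliest targets lack a complete history.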

Fig. 4 The average mutual information (AMI) function for lake water level time series

The auto-correlation function (ACF) and the partial auto-correlation function (PACF) of the monthly water level data are displayed in Fig. 5. The ACF shows a roughly exponential decay up to a lag of about 36 months (i.e., approximately 3 years). This finite memory indicates significant persistence in the lake water level time series (Vitanov et al. 2008). The PACF plot confirms strong persistence at lags of 1–3 months, and also suggests that lags of up to 14 months may have a significant influence on the future lake water level. However, a 14-month lag is unlikely to be the optimum choice for the lake water level prediction problem considered here.
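
Sample ACF and PACF values of the kind plotted in Fig. 5 can be sketched as follows, assuming the usual biased sample autocorrelation and the Durbin–Levinson recursion for the PACF (equivalent to Yule–Walker estimates); the ±1.96/√n band is the common large-sample approximation for white noise. The AR(1) series is synthetic and only demonstrates that the PACF cuts off after lag 1; the function names are hypothetical.

```python
import numpy as np


def acf(x, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


def pacf(x, max_lag):
    """Partial autocorrelation via the Durbin-Levinson recursion."""
    rho = acf(x, max_lag)
    out = np.ones(max_lag + 1)              # out[0] = 1 by convention
    phi = np.array([])                      # AR coefficients of order k - 1
    for k in range(1, max_lag + 1):
        num = rho[k] - np.dot(phi, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi, rho[1:k])
        phi_kk = num / den
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        out[k] = phi_kk
    return out


# Usage: synthetic AR(1) series with coefficient 0.8
rng = np.random.default_rng(42)
e = rng.normal(size=20000)
x = np.zeros_like(e)
for t in range(1, len(e)):
    x[t] = 0.8 * x[t - 1] + e[t]
band = 1.96 / np.sqrt(len(x))               # approximate 95% white-noise bounds
print(np.round(pacf(x, 5), 3), round(band, 4))
```

Lags whose PACF falls outside the band are the ones usually read as significant, which is how statements such as "lags of 1–3 months" are obtained from a PACF plot.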

Fig. 5 (a) Auto-correlation and (b) partial auto-correlation coefficients for lake water level time series

In general, the Average Mutual Information can be considered a nonlinear counterpart of the autocorrelation function (ACF): whereas the mutual information characterizes both the linear and the nonlinear dependence within the system being considered, the ACF captures only the linear component (Vitanov et al. 2008). In the present study, the AMI method has therefore been adopted for further modelling of the lake water level because of its greater ability to measure nonlinear associations between the target variable and its historical behaviour (e.g., Jayawardena 2014). Before designing the hybrid MLP-FFA and standalone MLP models, the data were normalized to the range [0, 1] so that the algorithms treat all attributes on a common scale, regardless of whether they represent extremely low or extremely high values.
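
A minimal min-max scaling sketch follows. It assumes that the minimum and maximum are taken from the training data only and reused for the testing data, a common precaution against information leakage that the text does not state explicitly; testing values outside the training range then map slightly outside [0, 1]. The inverse transform is needed to report errors in metres, as Table 2 does. The function names and values are hypothetical.

```python
import numpy as np


def fit_minmax(train):
    """Return (min, max) computed from the training data only."""
    return float(np.min(train)), float(np.max(train))


def scale(x, lo, hi):
    """Map x linearly so that lo -> 0 and hi -> 1."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)


def unscale(z, lo, hi):
    """Invert the scaling, e.g. to return predictions to metres."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo


# Usage with illustrative values (not the Lake Egirdir data)
lo, hi = fit_minmax([2.0, 4.0, 6.0])
print(scale([2.0, 4.0, 6.0, 7.0], lo, hi))   # the last value lies outside the training range
```

Because the transform is linear, RMSE and MAE computed on unscaled predictions are simply the scaled values multiplied by (hi − lo).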

Table 1 lists the standalone (non-optimized) MLP and the optimized hybrid MLP-FFA models developed for lake water level prediction, each constructed with four different input combinations.

Table 1 The input combinations used for constructing the hybrid MLP-FFA model for the lake water level prediction

For both MLP-based models, the Levenberg–Marquardt (LM) backpropagation algorithm, considered fast and efficient compared with other training algorithms (e.g., Adamowski et al. 2012; Deo and Şahin 2015b, 2016; Deo et al. 2017b), has been adopted. The benefits of the LM algorithm are a more accurate fit to the input-target data and a faster training speed: it behaves like a second-order method, yet it does not require explicit computation of the Hessian matrix, which it approximates from the Jacobian. In this paper, the MLP network has been trained for a total of 1000 epochs with a learning rate of 0.0013 and a momentum coefficient of 0.84. The optimal number of neurons in the hidden layer was identified by trial and error, selecting the model with the lowest root mean square error, following earlier studies (Deo and Şahin 2015a, b; Deo et al. 2017c; Raheli et al. 2017).
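
To illustrate the point about the Hessian, the sketch below implements a basic Levenberg–Marquardt loop for a generic least-squares problem: each update solves (JᵀJ + μI)δ = −Jᵀr, so only the Jacobian J of the residuals r is needed. It is a toy example (fitting a two-parameter exponential curve) with a simple accept/reject rule for the damping factor μ. It is not the MLP training routine used in the study, and the learning rate and momentum reported above have no counterpart in it. The function name is hypothetical.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, theta, mu=1e-2, n_iter=100, tol=1e-10):
    """Minimise 0.5 * ||r(theta)||^2 with Levenberg-Marquardt damping."""
    cost = 0.5 * np.sum(residual(theta) ** 2)
    for _ in range(n_iter):
        r, J = residual(theta), jacobian(theta)
        A = J.T @ J                          # Gauss-Newton approximation of the Hessian
        g = J.T @ r                          # gradient of the cost
        step = np.linalg.solve(A + mu * np.eye(len(theta)), -g)
        new_cost = 0.5 * np.sum(residual(theta + step) ** 2)
        if new_cost < cost:                  # accept: behave more like Gauss-Newton
            theta, cost, mu = theta + step, new_cost, mu / 10
            if np.linalg.norm(step) < tol:
                break
        else:                                # reject: behave more like gradient descent
            mu *= 10
    return theta


# Usage: recover a = 2.0 and b = -0.5 in y = a * exp(b * x) from noise-free data
x = np.linspace(0.0, 4.0, 30)
y = 2.0 * np.exp(-0.5 * x)
res = lambda p: p[0] * np.exp(p[1] * x) - y
jac = lambda p: np.column_stack([np.exp(p[1] * x), p[0] * x * np.exp(p[1] * x)])
theta_hat = levenberg_marquardt(res, jac, np.array([1.0, 0.0]))
print(theta_hat)
```

For an MLP, theta would hold all weights and biases and J would be obtained by backpropagating each residual, which is why LM suits small networks with few parameters, such as the 4-3-1 model.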

According to Table 2, the best neuronal architecture, judged by the highest NSE and WI and the lowest RMSE and MAE, has 4 input neurons, 3 hidden neurons and 1 output neuron (denoted 4-3-1). Among the standalone MLP models, MLP-4 is the optimum model, with NSE = 0.993, RMSE = 0.083 m, MAE = 0.064 m and WI = 0.994 for the training dataset. For the testing dataset, the standalone MLP-4 model yielded NSE = 0.954, RMSE = 0.102 m, MAE = 0.081 m and WI = 0.988, which also exceeded the performance of the other standalone MLP models constructed with different input combinations. Notably, using the four lagged water level inputs (i.e., Lt-1, Lt-2, Lt-3 and Lt-4) led to a significant improvement in the MLP model’s performance in both the training and the testing phases. This indicates that selecting the time-lagged inputs with the Average Mutual Information approach (Fig. 4) was appropriate for attaining a very good level of predictive accuracy.

Table 2 The neuronal architecture and the comparative performance of the standalone and hybrid models in the training and testing phases where the optimal predictive model has been boldfaced

Next, we investigate how the Firefly optimizer improves the accuracy of the standalone MLP model (Table 2). The MLP-FFA models outperform the standalone MLP models for the different input combinations in terms of NSE, RMSE, MAE and WI in both the training and the testing phases. Among all models that used the Firefly Algorithm, the hybrid MLP-FFA4 model has the smallest RMSE (≈ 0.029 m) and MAE (≈ 0.024 m) and the highest NSE (≈ 0.996) and WI (≈ 0.999) in the testing phase. Interpreting both error metrics, the lowest RMSE of the MLP-FFA4 model shows that the optimized model remains highly accurate even when large errors are penalized more heavily, while the lowest MAE, which weights all prediction errors equally irrespective of their magnitude, also supports its superior accuracy. Since the MAE assigns equal weights to all prediction errors in the tested dataset whereas the RMSE gives extra weight to large errors (Chai and Draxler 2014), the two complementary metrics together confirm the superiority of the hybrid MLP-FFA4 model over the standalone MLP-4 model for lake water level prediction.

Although both the standalone MLP and the hybrid MLP-FFA models performed relatively well in the training and testing phases (NSE and WI above 0.90, with relatively small RMSE and MAE), the hybrid MLP-FFA model outperformed the standalone MLP model for every input combination. This suggests that integrating the Firefly Algorithm as an add-in tool for the MLP model led to better weights and biases being identified during training, and hence to improved performance.
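The idea of letting the Firefly Algorithm search for the network's weights and biases can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the standard Firefly Algorithm (attractiveness β = β0·exp(−γr²) plus a random step of size α) searches directly over the 19 weights and biases of a 4-3-1 MLP, minimizing the training MSE. The tanh activation, the parameter values (population size, β0, γ, α, search bound) and the synthetic series are all assumptions; the study's actual FFA settings, and whether FFA was combined with back-propagation, are not stated in this section.

```python
import math
import random

N_IN, N_HID = 4, 3                            # 4-3-1 architecture
N_PARAMS = N_IN * N_HID + N_HID + N_HID + 1   # weights and biases: 19
BOUND = 1.0                                   # assumed search range per parameter

def mlp_forward(params, x):
    """4-3-1 MLP with tanh hidden units and a linear output neuron."""
    w1 = params[:N_IN * N_HID]                      # input-to-hidden weights
    b1 = params[N_IN * N_HID:N_IN * N_HID + N_HID]  # hidden biases
    w2 = params[N_IN * N_HID + N_HID:-1]            # hidden-to-output weights
    out = params[-1]                                # output bias
    for j in range(N_HID):
        s = b1[j] + sum(w1[j * N_IN + k] * x[k] for k in range(N_IN))
        out += w2[j] * math.tanh(s)
    return out

def mse(params, X, y):
    """Training mean squared error, used as the firefly cost (lower = brighter)."""
    return sum((mlp_forward(params, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)

def firefly_optimize(cost, dim, n_fireflies=12, n_iter=60, beta0=1.0,
                     gamma=0.01, alpha=0.2, alpha_decay=0.97, seed=0):
    """Standard Firefly Algorithm: each firefly moves toward every brighter one."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-BOUND, BOUND) for _ in range(dim)] for _ in range(n_fireflies)]
    light = [cost(p) for p in pop]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] < light[i]:
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)   # attraction fades with distance
                    pop[i] = [min(BOUND, max(-BOUND, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                              for a, b in zip(pop[i], pop[j])]
                    light[i] = cost(pop[i])
        alpha *= alpha_decay                               # shrink random steps over time
    best = min(range(n_fireflies), key=lambda k: light[k])
    return pop[best], light[best]

# Hypothetical normalized monthly series with an annual cycle (not the Egirdir data)
series = [0.5 + 0.3 * math.sin(2 * math.pi * t / 12) for t in range(40)]
X = [[series[t - k] for k in range(1, N_IN + 1)] for t in range(N_IN, len(series))]  # Lt-1..Lt-4
y = series[N_IN:]
best_params, best_cost = firefly_optimize(lambda p: mse(p, X, y), N_PARAMS)
print(f"best training MSE: {best_cost:.5f}")
```

Because a firefly only moves toward a strictly brighter one, the best cost in the population never increases, so the returned weights are at least as good as the best random initialization.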

Overall, the results show that the hybrid MLP-FFA model is a successful predictive tool for forecasting the lake water level 1 month ahead in the considered semi-arid region, being superior to the standalone MLP model for all of the prescribed input combinations (i.e., lagged data series). The improvement can be expressed as a percentage: for the optimal lagged input combination in the testing phase, the hybrid MLP-FFA model reduced the RMSE and MAE by approximately 71% and 70%, respectively, relative to the standalone MLP model. These results are consistent with recent results for lake water level prediction using a radial basis function-based model integrated with the Firefly Algorithm (Soleymani et al. 2016).
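The percentage improvements follow from the testing-phase values reported for MLP4 and MLP-FFA4 as (standalone − hybrid) / standalone × 100. This short check assumes the rounded values quoted in the text (RMSE 0.102 m vs ≈ 0.029 m; MAE 0.081 m vs ≈ 0.024 m), so differences of about one percentage point from the reported figures can arise from that rounding.

```python
def percent_reduction(standalone, hybrid):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (standalone - hybrid) / standalone

# Testing-phase values quoted in the text; the hybrid values are rounded (≈)
rmse_mlp, rmse_hybrid = 0.102, 0.029
mae_mlp, mae_hybrid = 0.081, 0.024
print(f"RMSE reduction: {percent_reduction(rmse_mlp, rmse_hybrid):.1f}%")  # 71.6%
print(f"MAE reduction:  {percent_reduction(mae_mlp, mae_hybrid):.1f}%")    # 70.4%
```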

To visually check the statistical agreement between the observed and predicted lake water levels, Fig. 6 shows scatter plots for the optimal Multilayer Perceptron (MLP4) and the optimal hybrid Multilayer Perceptron-Firefly Algorithm (MLP-FFA4) models in the testing phase. The hybrid MLP-FFA4 predictions lie much closer to the observed lake levels, with a larger coefficient of determination (R² = 0.997 vs. 0.955), reflecting a stronger correlation and a higher degree of statistical agreement between the observed and predicted series than for the MLP4 model. The larger R² of the hybrid MLP-FFA4 model indicates that a larger share of the variance of the observed data is explained by its predictions, i.e., the scatter about the fitted regression line is smaller than for the standalone MLP4 model. This result is consistent with the prediction metrics in Table 2, where the Firefly Algorithm improved the performance of the standalone MLP model. Additionally, the deviation from the ideal 1:1 line (i.e., a perfect model) is considerably smaller for the hybrid MLP-FFA4 model than for the standalone MLP model (Fig. 6).
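The fitted line y = mx + c and the R² shown in each scatter plot can be computed by ordinary least squares. This minimal sketch assumes that the observed levels are on the x-axis and the predicted levels on the y-axis; R² here is the squared Pearson correlation, which equals the coefficient of determination of that fitted line.

```python
def linear_fit_and_r2(observed, predicted):
    """Least-squares fit predicted = m * observed + c, and R² of that fit."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    m = sxy / sxx                  # slope
    c = mean_y - m * mean_x        # y-intercept
    r2 = sxy ** 2 / (sxx * syy)    # squared correlation coefficient
    return m, c, r2

# R² alone does not detect bias: predictions that are exactly twice the
# observations correlate perfectly but lie far from the 1:1 line.
print(linear_fit_and_r2([1, 2, 3, 4], [2, 4, 6, 8]))  # (2.0, 0.0, 1.0)
```

This is why the deviation from the 1:1 line (slope m near 1, intercept c near 0) is a useful complement to R² when reading the scatter plots.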

Fig. 6

The scatterplot of the predicted and the observed lake level time series in the testing period using: a standalone MLP4; and b hybrid MLP-FFA4 model. In each panel, the coefficient of determination (R²) and the linearly fitted equation y = mx + c are included (m = slope; c = y-intercept)

Fig. 7 plots the observed and predicted lake water levels for the testing phase, together with the prediction error at every tested data point. Although the level of agreement between the two datasets appears similar for the standalone MLP4 and the hybrid MLP-FFA4 models, their prediction errors are quite distinct. As evident from Fig. 7, the MLP-FFA4 model has lower errors than the MLP4 model and therefore performs much better in estimating both the high and the low lake levels. This result, which agrees with earlier studies (e.g., Kavousi-Fard et al. 2014; Fu et al. 2015; Olatomiwa et al. 2015), indicates that integrating the standalone MLP model with the FFA as an optimizer can lead to more accurate prediction of the lake water level.

Fig. 7

The comparison of the optimal standalone and the hybrid models (MLP4 & MLP-FFA4) with the observed values and their error plots in the testing period: a MLP4; and b MLP-FFA4 model

A histogram of the ratio of the predicted to the observed lake level in the testing phase was prepared for the standalone MLP4 and the hybrid MLP-FFA4 models to assess how frequently the data points fall within designated error bins. The x-axis gives the ratio bins and the y-axis the number of months falling in each bin, so the histogram shows how often a prediction falls within a given interval around the ideal ratio of 1. Figure 8a, b shows the resulting histograms. The standalone MLP4 model exhibits a wider spread of ratios, i.e., larger deviations from the ideal ratio of 1, than the hybrid MLP-FFA4 model. The ratios of the hybrid MLP-FFA4 model are thus concentrated more closely around 1, indicating that its predictions agree closely with the observed data for most months in the testing phase.
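Tallying such a ratio histogram is straightforward. This minimal sketch assumes half-open bins [e_k, e_k+1) and ignores ratios outside the outermost edges; the bin edges and the example values are hypothetical, since the bins used for Fig. 8 are not stated in the text.

```python
import bisect

def ratio_histogram(observed, predicted, bin_edges):
    """Count months whose predicted/observed ratio falls in each half-open bin."""
    counts = [0] * (len(bin_edges) - 1)
    for o, p in zip(observed, predicted):
        k = bisect.bisect_right(bin_edges, p / o) - 1   # index of the bin containing the ratio
        if 0 <= k < len(counts):                        # ratios outside the edges are ignored
            counts[k] += 1
    return counts

# Hypothetical bins centered on the ideal ratio of 1
edges = [0.90, 0.98, 1.02, 1.10]
print(ratio_histogram([10, 10, 10, 10], [9.5, 10.0, 10.5, 10.8], edges))  # [1, 1, 2]
```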

Fig. 8

Histogram of the ratio of the predicted to the observed lake level in the testing period: a MLP4 and b MLP-FFA4 models

Figure 9 presents a Taylor diagram (Taylor 2001), which graphically summarizes how closely the predictions of a group of models match the observed dataset. The similarity between each predictive model and the observation records is quantified in terms of the correlation coefficient (R) and the standard deviation (SD). In Fig. 9, the hybrid models (MLP-FFA) are compared with the standalone MLP models in terms of these indicators to assess their prediction skill. The distance from the reference point (i.e., the observations) measures the centered RMS difference (Taylor 2001). A perfect predictive model, in full agreement with the observations, would coincide with the reference point, having a correlation coefficient of 1 and the same amplitude of variation (SD) as the observations (Heo et al. 2013). The hybrid MLP-FFA results lie closer to the observation point than the standalone MLP results. Notably, the standard deviation and the correlation coefficient of the hybrid MLP-FFA4 model are close to the reference values derived from the observed dataset. Importantly, the correlation coefficients of all the standalone models without the Firefly Algorithm (MLP-1, MLP-2, MLP-3 and MLP-4) are noticeably lower (≈ 0.95) than those of their FFA-based counterparts (greater than 0.99). Likewise, the SD values differ considerably, especially for the standalone MLP-1 model. Taken together, these results reaffirm the better accuracy of the optimized MLP-FFA model, in which the Firefly Algorithm is integrated into the standalone MLP model, for lake water level prediction.
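The geometry of the Taylor diagram rests on the relation E′² = σp² + σo² − 2σpσoR (Taylor 2001), where σo and σp are the standard deviations of the observations and predictions, R is their correlation and E′ is the centered RMS difference (the RMSE after removing each series' mean). This is why a single point on the diagram encodes all three statistics. The sketch below assumes population (divide-by-n) statistics throughout; the function name is illustrative.

```python
import math

def taylor_statistics(observed, predicted):
    """Return (SD of observations, SD of predictions, R, centered RMS difference)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    r = cov / (sd_o * sd_p)
    # Centered RMS difference: compares anomalies, so any constant bias is removed
    e_centered = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(observed, predicted)) / n)
    return sd_o, sd_p, r, e_centered

sd_o, sd_p, r, e = taylor_statistics([1, 2, 3, 4], [1, 3, 2, 4])
# The law of cosines linking the three plotted quantities holds:
print(math.isclose(e ** 2, sd_p ** 2 + sd_o ** 2 - 2 * sd_p * sd_o * r))  # True
```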

Fig. 9

Taylor diagram of the predicted lake level of Lake Egirdir in the testing period. The arc (angular axis) represents the correlation coefficient (R) and the radial axis shows the standard deviation (SD) of the predicted data

The results presented so far show that the integrated MLP-FFA model retained its superiority over the standalone MLP model for 1-month lead time prediction of the water level of Lake Egirdir in Turkey. Although 1-month lead time prediction is important for several real applications (e.g., irrigation and lake water resource management over moderate time scales), longer forecast horizons, such as 3-, 6- or 12-month lead times, could be more desirable for long-term applications (e.g., seasonal farming, crop management and irrigation systems). However, increasing the forecast horizon is likely to reduce the accuracy of both the MLP-FFA hybrid and the standalone MLP model, as reported in earlier studies. For instance, Kişi (2009a, b) demonstrated that for 1-month lead time lake level forecasting, the neuro-wavelet conjunction model reduced the RMSE and MAE by 87–34% and 86–31% for the Van and Egirdir lakes, respectively, whereas for a 6-month lead time the reductions were only 34–48% and 30–46%, respectively. Furthermore, their ANN model predictions were much worse at the 6-month lead time than at the 1-month lead time. In another study, focusing on a wavelet analysis–artificial neural network (WA-ANN) conjunction model for multi-scale monthly groundwater level forecasting, Wen et al. (2016) showed that their 2- and 3-month-ahead forecasts were worse than their 1-month-ahead forecasts.
In a rather different application (urban water demand forecasting) at shorter forecast horizons, Tiwari and Adamowski (2013) predicted 1-, 3- and 5-day-ahead water demand using a wavelet-bootstrap-ANN model, showing that as the lead time increased, the correlation between the simulated and observed data deteriorated and the model's error in the testing period increased. That study concluded that less information was available for longer lead times, and that the deteriorating performance of the ANN model, particularly for higher values at longer lead times, potentially reflected a weakness of the model structure in forecasting higher water demand values. Based on this literature, the accuracy of the present 1-month forecasts is likely to deteriorate as the forecast horizon is extended (e.g., to 3, 6 or 12 months); a follow-up study could investigate which additional inputs provide the predictive features needed to model lake water level accurately at longer forecast horizons.

4 Conclusions

In this paper, a Multilayer Perceptron (MLP) forecast model integrated with the Firefly Algorithm (FFA) as an optimizer was adopted for forecasting a hydrological time series, namely lake water level. The case study was performed at Lake Egirdir, in a semi-arid region of Turkey. By applying Average Mutual Information to the historical lake water level time series, four input combinations of lagged lake water level data were identified as most appropriate for predicting the lake water level at a 1-month lead time. Standalone MLP and integrated MLP-FFA models were developed using historical data for the period 1961–2016. Evaluation with several statistical score metrics and visual displays showed the better efficiency of the hybrid MLP-FFA model in terms of the correlation between the forecasted and observed water levels, the Nash–Sutcliffe coefficient, the root mean square error and the mean absolute error.
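The use of Average Mutual Information (AMI) to choose lagged inputs can be illustrated with a simple estimator. This minimal sketch assumes an equal-width histogram estimate of the mutual information between L(t) and L(t − τ), in bits; the number of bins, the estimator and the example series are assumptions, and the study's exact AMI procedure and lag-selection rule (its Fig. 4) are not reproduced here. The helper `lagged_matrix` (a hypothetical name) builds input rows [Lt-1, ..., Lt-n] with target Lt.

```python
import math
from collections import Counter

def _bin_index(value, lo, width, n_bins):
    """Equal-width bin index, clamped so the maximum falls in the last bin."""
    return min(max(int((value - lo) / width), 0), n_bins - 1)

def average_mutual_information(series, lag, n_bins=8):
    """Histogram estimate of I(x_t; x_{t-lag}) in bits."""
    if not 1 <= lag < len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0          # guard against a constant series
    bins = [_bin_index(v, lo, width, n_bins) for v in series]
    pairs = list(zip(bins[lag:], bins[:-lag]))  # (bin of x_t, bin of x_{t-lag})
    n = len(pairs)
    joint = Counter(pairs)
    p_now = Counter(a for a, _ in pairs)
    p_past = Counter(b for _, b in pairs)
    ami = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        ami += p_ab * math.log2(p_ab / ((p_now[a] / n) * (p_past[b] / n)))
    return ami

def lagged_matrix(series, n_lags):
    """Input rows [x_{t-1}, ..., x_{t-n_lags}] and targets x_t."""
    X = [[series[t - k] for k in range(1, n_lags + 1)] for t in range(n_lags, len(series))]
    return X, series[n_lags:]

# Usage with a hypothetical monthly series: AMI for candidate lags 1-12
levels = [0.5 + 0.3 * math.sin(2 * math.pi * t / 12) for t in range(120)]
for lag in range(1, 13):
    print(lag, round(average_mutual_information(levels, lag), 3))
X, y = lagged_matrix(levels, 4)   # inputs Lt-1..Lt-4, target Lt
```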

The results showed that the hybrid MLP-FFA4 model, with four lagged inputs of the historical data, was more accurate than its counterparts, indicating the value of the Firefly Algorithm as an optimizer for improving the accuracy of standalone models. The results of this study suggest that the Firefly Algorithm is a useful add-on tool for enhancing the accuracy of models applied to lake water level prediction. This research also provides evidence for the effectiveness of the hybrid model, which could be applied and further investigated in engineering applications where historical data provide the features for developing a predictive model. Despite the very good performance of the MLP-FFA model in this study, there are limitations that remain to be investigated. For example, further improvement in accuracy may be possible by including more informative variables in the learning process of the predictive model.

This study was limited to antecedent (lagged) combinations of lake water level data, exploiting the memory of water level fluctuations in the target variable itself to formulate the forecasting model. To further enhance the results, it is important to include other significant hydro-meteorological variables, such as rainfall, evaporation, temperature and humidity, which may contain a larger set of predictive features to support more accurate prediction of future lake water levels. Such a multivariate predictive modelling framework is expected to yield better and more informative predictions with greater accuracy. Future research could also apply the model to short-term prediction of lake water levels (e.g., daily or hourly), which may yield a more robust model.