Introduction

Pollution control in river systems through the modeling of water quality parameters is one of the primary components that warrant special attention in management planning. Pollution indices of an aquatic system are often evaluated by two terms, namely biochemical oxygen demand (BOD) and dissolved oxygen (DO). BOD, which needs to be estimated accurately, reflects all materials present that can be oxidized by chemical processes and aerobic organisms, as well as the abundance and activity of the oxidizing organisms, and it is deemed one of the primary criteria required for assessing any aquatic system. Since an inverse relationship exists between BOD and DO, a higher value of BOD is symptomatic of a deficiency of dissolved oxygen. Moreover, the concentration of DO in water should be known before measuring the value of BOD. Hence, both criteria should be determined simultaneously. However, determining these values under laboratory conditions is both time-consuming and costly, which warrants indirect methods for predicting them (Singh et al. 2009).

In recent years, a number of statistical and deterministic models have been developed for modeling water quality; however, most of the existing models for water quality parameters are very complex and require a significant amount of field data to support the analysis (Chen et al. 2003; Kurunç et al. 2005). Many statistically based water quality models assume that the relationship between the response variable and the predictor variables is linear and that the data are normally distributed. However, as water quality is affected by many factors that have a complicated nonlinear relation to the forecast variables, traditional data processing methods are no longer efficient enough for solving the problem (Wu et al. 2000; Xiang et al. 2006). Therefore, statistical approaches usually do not achieve high precision. In recent decades, promising results have been reported by several studies that investigated water quality modeling problems using artificial intelligence (AI) techniques (Diamantopoulou et al. 2005; Sengorur et al. 2006; Palani et al. 2008; Hore et al. 2008; Najah et al. 2009; Singh et al. 2009; Dogan et al. 2009; Najah et al. 2011; Kim and Seo 2015; Sarkar and Pandey 2015; Salami and Ehteshami 2015). In most of these studies, monthly water quality parameters have been used for the simulation (Diamantopoulou et al. 2005; Sengorur et al. 2006; Palani et al. 2008; Singh et al. 2009; Dogan et al. 2009; Sarkar and Pandey 2015).

Despite being relatively successful, these studies have reported a wide range of forecasting accuracy, which varied significantly owing to the environmental features of the study locations. Hence, the significance of the input data for forecasting river water quality became evident. So far, a comprehensive model capable of simulating various environments has been out of reach. Yet, an inspiring question for researchers in the area of water resources management is: Does a hybrid intelligent model integrated with an optimization algorithm enhance the predictive model’s precision? (Fahimi et al. 2016).

Different combinations of input (or predictor) data have proven to govern the predictive accuracy of an objective model (Abbot and Marohasy 2014; Deo et al. 2016; Galelli and Castelletti 2013; Quilty et al. 2016). Moreover, a novel methodology that extracts the information concealed in the input dataset is strongly needed in river systems engineering in order to yield accurate artificial intelligence models. Quite often, a stand-alone model lacks a suitable optimization procedure for extracting features from the input data, which is a prerequisite for enhancing the performance of the predictive model (Long and Meesad 2013; Quilty et al. 2016; Sedki and Ouazar 2010). The firefly algorithm (FFA) used in this paper acts as an optimization tool for artificial intelligence models and has recently been applied in the area of predictive modeling (Ghorbani et al. 2017). The conceptual theory of FFA is inspired by the flashing pattern of fireflies, as stated by Yang (2010). FFA has performed favorably in many different fields (Kavousi-Fard et al. 2014; Fu et al. 2015; Emary et al. 2015; Kazemzadeh-Parsi 2014; Nascimento et al. 2013; Talatahari et al. 2014; Yang 2010). In estimation problems, the FFA method has led to considerable improvement in solution procedures, and modified FFA models have been reported to perform very efficiently in comparison with other optimization algorithms.

Considering all the arguments above, different results obtained from AI techniques for all sets of input data can render the optimal evaluation of results to be impossible, thereby causing uncertainty in the output results (Kim and Seo 2015). Consequently, uncertainty in AI-based models is considered to be one of the most important restrictions especially in using the well-established ANN technique applied to develop strategies for appropriate control and management of water quality (Noori et al. 2015b).

Despite numerous publications in the area of uncertainty assessment in qualitative models applied in water research (Canale and Seo 1996; Wagener and Gupta 2005; Gupta et al. 2006; Sin et al. 2009; Cea et al. 2011; Srivastava et al. 2014), the number of studies in uncertainty analyses based on AI techniques (e.g., ANN) is insufficient and thus requires further research (Noori et al. 2015b). Out of the many research works by different authors, uncertainty analysis of AI techniques was also performed by Aqil et al. (2007) who have investigated the uncertainty of output values in a neuro-fuzzy model applied for the prediction of weekly flows of a river. Using a Monte Carlo method, they found that the method was suitable for assessing uncertainty of a neuro-fuzzy model. Noori et al. (2010) investigated the uncertainty within ANN and adaptive neuro-fuzzy inference system (ANFIS) models applied to predict carbon monoxide’s (CO) concentration in the atmospheric region of Tehran. In another work, the uncertainty within ANFIS and ANN models was investigated by Noori et al. (2013a, b) to predict the value of BOD5 in Sefidrood River. Jiang et al. (2013) used a new and efficient model based on an ANN and the Monte Carlo method (ANN-MCS) to analyze the uncertainty in the prediction of COD pollution hazard within the Yellow River located in Lanzhou.

A study by Dehghani et al. (2013) examined the uncertainties within the multilayer feedforward artificial neural network (FFANN) model using the Monte Carlo method and applied the model to predict hydrological drought in Karoon River located in southeast Iran. Antanasijevic et al. (2014) evaluated the uncertainty of the general regression neural network (GRNN) in predicting the DO parameter in Danube River. Noori et al. (2015a) assessed the uncertainty of ANN, support vector machine (SVM) and ANFIS techniques to predict the longitudinal distribution coefficient (LDC) in natural rivers, while Noori et al. (2015b) investigated the uncertainty of the SVM model to estimate the 5-day BOD in Sefidrood River. Recently, Ghorbani et al. (2016) examined uncertainties of an SVM, radial basis function (RBF) and MLP model in predicting the monthly current of Zarrinehrud River.

In this paper, an artificial intelligence model, namely the multilayer perceptron (MLP), is integrated with FFA for river water quality (BOD and DO) modeling. In general, the MLP model is a common artificial neural network architecture (Ghorbani et al. 2017). The aims of this study are: (I) to investigate the applicability of an MLP model for BOD and DO prediction in Langat River, (II) to combine an MLP model with FFA to create a hybrid MLP-FFA model, (III) to assess the predictive precision of MLP-FFA using a number of visual and statistical criteria that compare observed and predicted BOD and DO, and (IV) to assess the uncertainty of the MLP model and compare the results. To the best of the authors’ knowledge, there is no prior research in the literature that has investigated the ability of a multilayer perceptron-based FFA hybrid model for river water quality prediction.

In the current paper, the code used for the MLP model is readily available code, while the codes used for the hybrid MLP-FFA method and for uncertainty assessment have been developed by the authors as their original contribution to this research. The structure of the proposed hybrid MLP-FFA model is shown in Fig. 1. The list of abbreviations used is given at the end of the article.

Fig. 1 Modeling methodology

Methodology

Multilayer perceptron neural network (MLP)

In recent decades, efforts have been undertaken to simulate a natural neuron in a way that captures the characteristics of a biological system and is also easy to implement in NN models. Modeled networks are also known as “paradigms of the NN.” ANN models were first introduced by McCulloch and Pitts (Govindaraju 2000). An ANN is a mathematical structure designed to simulate the data processing of brain neurons (Hinton 1992; Jensen 1994).

In this study, we have applied a very popular and widely used neural network model, known as the multilayer perceptron (MLP). The MLP model is a variant of the classical ANN model and has been widely used in the current era of big data analytics (Gardner and Dorling 1998; Ay and Kisi 2011). The basic MLP model comprises three layers: (I) input layer, (II) hidden layer and (III) output layer. The input layer receives the set of input data, the processing of the features is performed in the hidden layer(s), and the output layer delivers the predicted results. Figure 2 illustrates a sample of a three-layer perceptron neural network.

Fig. 2 A typical multilayer perceptron neural network architecture (Najah et al. 2009)

The most accurate architecture of the ANN-MLP model is identified by a trial-and-error method, and its final structure is determined by the number of hidden layers and the number of neurons in them. In the structure of the MLP model, the inputs to the \( i \)th neuron (\( x_{1} \) to \( x_{n} \)) are multiplied by their assigned weights (\( w_{i1} \) to \( w_{in} \)) and then summed up. A threshold (\( b_{i} \)) is added to this sum to give the net input (\( {\text{Net}}_{i} \)). The weights indicate the strength of the connections between neurons and are optimized through the learning process.

$$ {\text{Net}}_{i} = b_{i} + \mathop \sum \limits_{j = 1}^{n} w_{ij } x_{j} $$
(1)

Subsequently, the activation (transfer) function receives the net input and passes its output to the next layer. Sigmoid functions are often used in artificial neural networks to introduce nonlinearity into the model (Maier and Dandy 2000).

$$ f ({\text{Net}}_{i} ) = \frac{1}{{1 + e^{{ - {\text{Net}}_{i} }} }} $$
(2)
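As a minimal sketch of Eqs. (1) and (2), the following Python snippet computes the net input and the sigmoid output for the neurons of one layer. The weights, biases and inputs are illustrative values chosen here, not values from the study, and the function names are hypothetical.

```python
import math

def net_input(weights, inputs, bias):
    """Eq. (1): Net_i = b_i + sum_j w_ij * x_j for a single neuron i."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def sigmoid(net):
    """Eq. (2): logistic sigmoid, maps any real net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def layer_output(weight_rows, biases, inputs):
    """Apply Eqs. (1) and (2) to every neuron of one layer."""
    return [sigmoid(net_input(w, inputs, b)) for w, b in zip(weight_rows, biases)]

# Illustrative values only (assumed, not taken from the study).
x = [0.5, -1.0, 2.0]
weight_rows = [[0.1, 0.4, -0.2], [0.3, -0.5, 0.25]]
biases = [0.05, -0.1]
hidden = layer_output(weight_rows, biases, x)
print(hidden)
```

Each entry of `hidden` would then serve as an input to the next layer, where the same two steps are repeated with that layer's own weights and thresholds.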

In this research, all datasets were normalized and divided into training and testing sets. To predict water quality, a tangent sigmoid transfer function and a linear transfer function were used for mapping the information from the input layer to the hidden layer and from the hidden layer to the output layer, respectively. The network was trained with the Levenberg–Marquardt algorithm (LMA), which is one of the fastest methods for training feedforward neural networks (Adamowski et al. 2012; Deo and Şahin 2016). LMA is a simple and robust optimization method that solves the minimization problem with respect to the function variables. In the present study, the optimum number of neurons in the hidden layer was obtained by trial and error, varying the number of hidden neurons from 1 to 20.

Hybrid MLP-FFA model

It should be noted that the Levenberg–Marquardt training algorithm, known as a robust training algorithm in terms of speed and efficiency (e.g., Tiwari and Adamowski 2013; Deo and Şahin 2016, 2017), may converge to a local minimum that is not necessarily the global minimum of the error surface. This indicates that there is room for further improvement in the MLP model’s performance. In our study, this has been achieved by applying the FFA as an optimization tool, following our earlier studies (e.g., Ghorbani et al. 2017). In general, the FFA is an optimization algorithm whose better performance is attributed to its ability to search for the global minimum.

In essence, the nature-inspired FFA procedure was introduced by Yang (2010) as an extension of the swarm intelligence optimization method, relying on the movement of fireflies. In this approach, a solution to an optimization problem is regarded as an agent, i.e., a firefly that shines in proportion to its quality. As a result, each brighter firefly attracts its partners, regardless of their sex, which renders the exploration of the search space more effective (Lukasik and Zak 2009).

As fireflies are attracted toward light, the whole swarm moves toward the brightest firefly. The attractiveness of a firefly is proportional to its brightness, and the brightness depends on the light intensity of the agent (Kayarvizhy et al. 2014). The two main issues in the firefly algorithm are the formulation of its objective function and the variation of the light intensity.

The variables of the FFA are the light intensity I(r), the attractiveness \( \left( \beta \right) \), and the Cartesian distance between any two fireflies i and j at \( x_{i} \;{\text{and}}\; x_{j} \), respectively, which can be expressed (Yang 2010) as:

$$ I\left( r \right) = I_{O} \exp \left( { - \gamma r^{2} } \right) $$
(3)
$$ \beta \left( r \right) = \beta_{O} \exp \left( { - \gamma r^{2} } \right) $$
(4)
$$ r_{ij} = \left\| {x_{i} - x_{j} } \right\| = \sqrt {\mathop \sum \limits_{k = 1}^{d} \left( {x_{i,k} - x_{j,k} } \right)^{2} } $$
(5)

where \( x_{i,k} \) is the kth component of the spatial coordinate \( x_{i} \) of the ith firefly, \( \gamma \) is the light absorption coefficient, d is the dimensionality of the given problem, I(r) and \( I_{O} \) are the light intensity at distance r and the initial light intensity of a firefly, respectively, and \( \beta \left( r \right)\; {\text{and}}\; \beta_{O} \) are the attractiveness \( \beta \) at a distance r and at r = 0, respectively.

The next movement of firefly i, from iteration t to iteration t + 1, can be represented as (Yang 2010):

$$ x_{i}^{t + 1} = x_{i}^{t} +\Delta x_{i} $$
(6)
$$ \Delta x_{i} = \beta_{O} {\text{e}}^{{- {\gamma r}^{2}}} \left({x_{j} - x_{i}} \right) + \alpha \epsilon_{i} $$
(7)

Here, the first term of formula (7) represents the attraction, whereas the second term represents the randomization. The parameter \( \alpha \) controls the randomization and ranges between 0 and 1, and \( \epsilon_{\text{i}} \) is a random number drawn from a Gaussian distribution (Ch et al. 2014).
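To make the update rule concrete, the following is a minimal Python sketch of a firefly search built from the attractiveness in Eq. (4), the Euclidean distance between two fireflies, and the move in Eqs. (6) and (7). It is not the authors' code. The parameter defaults, the search bounds, the gradual reduction of \( \alpha \) and the sphere test function are all assumptions made for illustration, and brightness is taken as the negative of the cost being minimized.

```python
import math
import random

def firefly_minimize(objective, dim, n_fireflies=15, n_iter=100,
                     beta0=1.0, gamma=1.0, alpha=0.2,
                     bounds=(-5.0, 5.0), seed=0):
    """Minimal firefly search; lower cost is treated as brighter."""
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    # Squared Euclidean distance r_ij^2 between i and j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # Eq. (4)
                    # Eqs. (6)-(7): attraction term plus Gaussian random term,
                    # clipped to the (assumed) search bounds
                    swarm[i] = [
                        min(hi, max(lo, xi + beta * (xj - xi)
                                    + alpha * rng.gauss(0.0, 1.0)))
                        for xi, xj in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
        alpha *= 0.97  # assumed gradual reduction of the random step
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best], cost[best]

def sphere(x):
    """Toy objective with its global minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_cost = firefly_minimize(sphere, dim=2)
print(best_x, best_cost)
```

In this sketch the currently brightest firefly never moves, so the best cost found cannot get worse from one iteration to the next. Some FFA variants let the brightest firefly take a random step instead.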

In this paper, a novel contribution to the prediction of BOD and DO is made by constructing a new MLP-FFA hybrid model. The MLP-FFA hybrid model was generated by integrating the traditional MLP model with the FFA, a popular optimization tool used in data-driven modeling. Figure 3 shows the procedure of obtaining optimal MLP weights with FFA.

Fig. 3 Flowchart of the MLP-FFA structure

The simulation procedure of the MLP-FFA model involves determining the combination of input parameters with regard to the correlation coefficient between the input and output (target) variables. Afterward, the firefly algorithm is supplied with a selection of the best inputs based on their congruence with the target variable, as assessed by the objective function, and the chosen inputs are utilized in the MLP-FFA model to generate the prediction of BOD and DO.
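One way to picture the coupling between the MLP and the FFA is to treat every firefly position as a flat vector holding all weights and thresholds of the network, with the training mean square error as the cost that the FFA tries to lower. The Python sketch below illustrates this encoding for a network with one hidden layer (tangent sigmoid) and one linear output. The encoding, the function names and the toy data are assumptions for illustration. The source does not specify how the authors' code represents the weights.

```python
import math

def n_parameters(n_in, n_hidden):
    """Length of the flat vector: hidden weights + hidden biases + output weights + output bias."""
    return n_hidden * n_in + n_hidden + n_hidden + 1

def unpack(position, n_in, n_hidden):
    """Split a flat firefly position into the weights and biases of the MLP."""
    w_hidden = [position[h * n_in:(h + 1) * n_in] for h in range(n_hidden)]
    k = n_hidden * n_in
    b_hidden = position[k:k + n_hidden]
    w_out = position[k + n_hidden:k + 2 * n_hidden]
    b_out = position[k + 2 * n_hidden]
    return w_hidden, b_hidden, w_out, b_out

def mlp_predict(position, x, n_in, n_hidden):
    """Forward pass: tangent sigmoid hidden layer, linear output neuron."""
    w_hidden, b_hidden, w_out, b_out = unpack(position, n_in, n_hidden)
    hidden = [math.tanh(b + sum(w * v for w, v in zip(ws, x)))
              for ws, b in zip(w_hidden, b_hidden)]
    return b_out + sum(w * h for w, h in zip(w_out, hidden))

def mse_objective(position, X, y, n_in, n_hidden):
    """Training MSE of the MLP encoded by `position` (the cost of one firefly)."""
    errors = [(mlp_predict(position, x, n_in, n_hidden) - t) ** 2
              for x, t in zip(X, y)]
    return sum(errors) / len(errors)

# Toy data (made up): two samples with two inputs each.
X = [[0.0, 1.0], [1.0, 0.0]]
y = [1.0, 3.0]
zero_position = [0.0] * n_parameters(2, 3)
print(n_parameters(8, 9))                         # size of one firefly for an (8,9,1) network
print(mse_objective(zero_position, X, y, 2, 3))   # all-zero network predicts 0 for every sample
```

A firefly search would then call `mse_objective` for each candidate position and move dimmer (higher-MSE) fireflies toward brighter (lower-MSE) ones.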

Uncertainty analysis

Uncertainty is a result-dependent factor that demonstrates the range of values a modeling result can attain. It also represents the possibility that the measured value may fall into the specified range. This research aims to estimate the uncertainty of the neural network output. Here, we apply the method recommended by Abbaspour et al. (2007) and Noori et al. (2015b), which was used to analyze the uncertainty of river quality prediction. In this method, the percentage of measured data bracketed by the 95 percent predicted uncertainty (95PPU) is considered. To obtain this value, the 2.5% (\( X_{L} \)) and 97.5% (\( X_{U} \)) levels of the empirical cumulative distribution are determined from 1000 predictions.

The appropriate confidence level is the level in which two requirements are met: (1) The 95PPU band brackets “most of the observations” and (2) the average distance between the upper (at 97.5% level) and the lower (at 2.5% level) parts of the 95PPU is “small.” Quantifications of the two requirements are problem dependent to an extent.

Abbaspour et al. (2007) reported that 80–100% of measured data should be in the 95PPU level provided that they are of good quality. In regions where the data are not of good quality, having 50% of the data in the 95PPU level would suffice.

For the second requirement, it is essential that the average distance between the upper and the lower 95PPU be smaller than the standard deviation of the measured data (Abbaspour et al. 2007). We utilize the above two measures to quantify the strength of calibration, accounting for the combined parameter, model, and input uncertainties.

To evaluate the average width of the confidence interval band, the band width indicator was suggested by Abbaspour et al. (2007) as follows:

$$ d\text{-factor} = \frac{{\overline{{d_{X} }} }}{{\sigma_{X} }} $$
(8)

where \( \sigma_{X} \) is the standard deviation of observed data and \( \overline{{d_{X} }} \) is the confidence band’s average width which is defined as follows:

$$ \overline{{d_{X} }} = \frac{1}{k}\mathop \sum \limits_{l = 1}^{k} \left( {X_{U} - X_{L} } \right)_{l} $$
(9)

The percentage of the data within the confidence band of 95% is determined as follows:

$$ {\text{Bracketed}}\;{\text{by}}\;95{\text{PPU }} = \frac{1}{k} {\text{Count}} \left( {j | X_{L}^{l} \le X_{\text{reg}}^{l} \le X_{U}^{l} } \right) \times 100 $$
(10)

where 95PPU indicates 95% predicted uncertainty; \( k \) is the number of observed data; \( l \) is the current month which changes from 1 to \( k \); \( X_{L}^{l} \) and \( X_{U}^{l} \) are, respectively, the lower and the upper bands of uncertainty; and \( X_{\text{reg}}^{l} \) is the current month’s registered data.

Whenever the recorded data for the present month (\( l \)) are placed in the uncertainty range, one unit is added to the counter (\( j \)) and the maximum amount of \( j \) will occur when \( l = k \). If all the recorded amounts are within the lower and the upper band, then the maximum amount of “Bracketed by 95PPU” will be 100.
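The two indices in Eqs. (8) to (10) can be sketched in Python as follows. The snippet assumes that repeated model runs have already produced an ensemble of predictions for every time step. The linear-interpolation percentile, the use of the sample standard deviation and the toy numbers are assumptions made here, since the source does not state these details.

```python
import statistics

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def uncertainty_indices(ensemble, observed):
    """ensemble[l] holds the repeated predictions for time step l;
    observed[l] is the corresponding measurement."""
    lower, upper = [], []
    for preds in ensemble:
        s = sorted(preds)
        lower.append(percentile(s, 2.5))   # X_L
        upper.append(percentile(s, 97.5))  # X_U
    k = len(observed)
    mean_width = sum(u - l for l, u in zip(lower, upper)) / k    # Eq. (9)
    # Eq. (8); sample standard deviation is an assumption
    d_factor = mean_width / statistics.stdev(observed)
    inside = sum(l <= o <= u for l, o, u in zip(lower, observed, upper))
    bracketed = 100.0 * inside / k                               # Eq. (10)
    return d_factor, bracketed

# Toy ensemble (made up): 201 predictions spread evenly within +/-1 of each center.
centers = [1.0, 2.5, 3.0, 4.0]
ensemble = [[c + 0.01 * i for i in range(-100, 101)] for c in centers]
observed = [1.0, 2.0, 3.0, 6.0]
d_factor, bracketed = uncertainty_indices(ensemble, observed)
print(d_factor, bracketed)
```

In this toy case three of the four observations lie inside their bands and the last one does not, so the bracketed share is 75%.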

Model performance criteria

In order to assess the accuracy of the model’s results and the model fitness, the correlation coefficient (r), the root-mean-square error (RMSE) (Willmott and Matsuura 2005), Willmott’s index of agreement (WI) (Willmott 1981, 1984) and the relative root-mean-square error (%RMSE) are used.

The correlation coefficient (r) is defined as the correlation between the observed and modeled data:

$$ r \, = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {P_{i} - \bar{P}} \right) \left( {O_{i} - \bar{O}} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {P_{i} - \bar{P}} \right)^{2} \mathop \sum \nolimits_{i = 1}^{n} \left( {O_{i} - \bar{O}} \right)^{2} } }} $$
(11)

RMSE is defined as:

$$ {\text{RMSE }} = \sqrt {\frac{1}{n} \mathop \sum \limits_{i = 1}^{n} \left( {O_{i} - P_{i} } \right)^{2} } $$
(12)

WI is defined as:

$$ {\text{WI}} = 1 - \left[ {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {O_{i} - P_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\left| {P_{i} - \bar{O}} \right| + \left| {O_{i} - \bar{O}} \right|} \right)^{2} }}} \right] $$
(13)

%RMSE is defined as:

$$ \% {\text{RMSE }} = \frac{RMSE}{{\bar{O}}} \times 100 $$
(14)

where n is the number of data points, \( O_{i} \) and \( P_{i} \) are, respectively, the measured and the predicted values of the \( i \)th element, and \( \bar{O} \) and \( \bar{P} \) are the averages of the measured and predicted values within the testing dataset.

In this study, the optimal model’s accuracy was considered to be excellent when the %RMSE < 10%; good if 10% < %RMSE < 20%; fair if 20% < %RMSE < 30%; and poor if %RMSE > 30% (Heinemann et al. 2012; Jamieson et al. 1991).
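The criteria in Eqs. (11) to (14) and the %RMSE rating above can be written compactly in Python. The sketch below uses made-up observed and predicted values. Because the rating bands are given with strict inequalities, assigning a value that falls exactly on a boundary (10, 20 or 30%) to the next class is an assumption made here.

```python
import math

def performance(observed, predicted):
    """Return r, RMSE, WI and %RMSE following Eqs. (11)-(14)."""
    n = len(observed)
    o_bar = sum(observed) / n
    p_bar = sum(predicted) / n
    cov = sum((p - p_bar) * (o - o_bar) for o, p in zip(observed, predicted))
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    var_o = sum((o - o_bar) ** 2 for o in observed)
    r = cov / math.sqrt(var_p * var_o)                       # Eq. (11)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    rmse = math.sqrt(sse / n)                                # Eq. (12)
    denom = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                for o, p in zip(observed, predicted))
    wi = 1.0 - sse / denom                                   # Eq. (13)
    pct_rmse = 100.0 * rmse / o_bar                          # Eq. (14)
    return {"r": r, "RMSE": rmse, "WI": wi, "%RMSE": pct_rmse}

def rate(pct_rmse):
    """Accuracy class from %RMSE; boundary values go to the next class (assumption)."""
    if pct_rmse < 10:
        return "excellent"
    if pct_rmse < 20:
        return "good"
    if pct_rmse < 30:
        return "fair"
    return "poor"

# Made-up values for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.2, 3.8, 6.2, 7.8]
scores = performance(observed, predicted)
print(scores, rate(scores["%RMSE"]))
```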

Study area and model development data

In this research, monthly water quality datasets from two stations on the Langat River for the period 2001–2010 are utilized. Langat River is one of the most important rivers in Malaysia, and it is located between 2° 40′ 152″ N and 3° 16′ 15″ N latitude and between 101° 19′ 20″ E and 102° 1′ 10″ E longitude. The Langat River catchment is an important catchment that provides water and other facilities for some 1.2 million people. Its total area is 1,815 km2. Major towns that receive their water from this catchment include Cheras, Kajang, Bangi and the government center of Putrajaya. The main river channel is about 141 km long and is located about 40 km east of Kuala Lumpur.

The basin of Langat River is located in the southern and southeastern parts of the state of Selangor Darul Ehsan. Langat River originates at the Pahang–Selangor border, where the highlands are 1500 m above sea level, and it drains westward to the Straits of Malacca. Figure 4 provides detailed geographical features and the water quality control stations of the Langat River. The data are divided into two categories: Monthly data of the first 7 years from 2001 to 2007 (84 sets or 70% of the whole dataset) are used for training, and monthly data of the last 3 years from 2008 to 2010 (36 sets or 30% of the whole dataset) are used for testing.
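The chronological split described above can be reproduced with a few lines of Python. The record layout, a (year, month) pair per monthly sample, and the function name are assumptions made for illustration.

```python
def chronological_split(records, last_training_year):
    """Split monthly records by year, keeping the time order intact."""
    train = [r for r in records if r[0] <= last_training_year]
    test = [r for r in records if r[0] > last_training_year]
    return train, test

# One placeholder record per month for 2001-2010 (120 monthly sets).
records = [(year, month) for year in range(2001, 2011) for month in range(1, 13)]
train, test = chronological_split(records, 2007)
print(len(train), len(test))  # 84 36
```

Splitting by year rather than at random keeps the testing period strictly after the training period, which matches the 2001–2007 and 2008–2010 periods stated above.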

Fig. 4 Location of the study area and the water quality monitoring station

Models proposed by researchers with regard to the prediction of the BOD and DO using various input parameters are summarized in Table 1. Based on Table 1 and parameters utilized by other researchers in their studies, in this study the following parameters are used: chemical oxygen demand (COD), total phosphate (PO4), total solids (TS), potassium (K), sodium (Na), chloride (Cl), electrical conductivity (EC), pH and ammonia nitrogen (NH4-N).

Table 1 Input parameters used in previous studies for the AI models

Monthly data were used mainly because daily or hourly data for the case study are discontinuous and highly challenging to acquire. Besides this, there is enough evidence in the literature to support this choice, as explained in the Introduction and Table 1. It is noteworthy that, owing to insufficient water quality data in the study area, only the aforementioned parameters are utilized for BOD and DO prediction in this research. The statistical parameters of the water quality data at the two considered stations on the Langat River are shown in Table 2.

Table 2 Basic statistics of the measured water quality parameters in Langat River

Results and discussion

Combination of input parameters

Table 3 lists the most suitable combination of input parameters. Using SPSS software, correlation coefficients of parameters, previously mentioned in Table 2, are calculated. Subsequently, in Fig. 5, the correlation map of parameters for both stations was drawn based on a color scale such that the closeness of values to 1 or -1 indicates high correlation. Based on Table 3, all 9 input parameters were used in the first combination. However, in combinations 2, 3 and 4, PO4, pH and TS which had the lowest correlation with BOD and DO were eliminated, respectively (based on Fig. 5), and the best combination was obtained for input parameters.

Table 3 Combinations of model inputs
Fig. 5 Correlation map of river water quality variable dataset for 1L04 and 1L05 stations

It should be noted that BOD and DO indicators are dependent variables, while the rest are independent variables.
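The correlation-based screening of inputs can be illustrated with a short Python sketch that ranks candidate inputs by the absolute value of their Pearson correlation with the target, weakest first. The numbers below are made up and are not the Langat River data. The study itself computed the correlations with SPSS.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rank_inputs(inputs, target):
    """Return input names sorted by |r| with the target, weakest first."""
    scores = {name: abs(pearson(values, target)) for name, values in inputs.items()}
    return sorted(scores, key=scores.get)

# Made-up values for illustration only.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
inputs = {
    "COD": [2.0, 4.0, 6.0, 8.0, 10.5],
    "pH": [7.1, 6.9, 7.0, 7.2, 6.8],
    "TS": [5.0, 3.0, 6.0, 2.0, 7.0],
}
ranking = rank_inputs(inputs, target)
print(ranking)
```

Dropping the first names in such a ranking one at a time yields a sequence of progressively smaller input combinations, in the spirit of the combinations listed in Table 3.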

A total of four models with various input combinations have been developed. Both non-optimized (MLP) and hybrid (MLP-FFA) models were constructed and tested in order to specify the optimum number of nodes in the hidden layer and the transfer functions. Selecting an appropriate number of nodes in the hidden layer is of paramount importance, as too many nodes may result in over-fitting, while too few may not capture the information adequately (Singh et al. 2009). The optimum number of neurons was determined based on the minimum mean square error (MSE) of the training dataset. The network was trained for 1000 epochs with a learning rate of 0.0013 and a momentum coefficient of 0.9. Tables 4 and 5 show the r, RMSE, %RMSE and WI values obtained from the BOD and DO simulations for both the training and testing data, along with the optimal number of neurons. From the four sets of input data, the set with the highest r and WI values and the lowest RMSE and %RMSE during testing is chosen as the best set.

Table 4 Comparative performance of the selected models for monthly river water quality prediction in 1L04 station
Table 5 Comparative performance of the selected models for monthly river water quality prediction in 1L05 station

Tables 4 and 5 also present the best network structures and their respective performance criteria. Based on the results of the non-optimized MLP model (Tables 4, 5), the performance criteria reveal that the models designated as ANN(8,9,1) and ANN(8,6,1) are the best models to predict BOD and DO in 1L04 station, respectively (Table 4), and ANN(7,8,1) and ANN(9,13,1) are the best models to predict BOD and DO in 1L05 station, respectively (Table 5). The structure of ANN(8,9,1) consists of one input layer with eight input variables, one hidden layer with nine nodes and one output layer with one output variable.

A relatively low correlation coefficient between the measured and model output variables (BOD and DO) in the present study, especially at 1L05 station, may be due to the heterogeneous nature of the water quality (input and output) variables as these were measured over a span of 10 years in two sampling sites (as shown in Fig. 4). Moreover, relatively higher correlations between measured and model (NN) computed values of BOD and DO in various aquatic systems (Sengorur et al. 2006; Soyupak et al. 2003; Ying et al. 2007; Dogan et al. 2009) may be ascribed to the limited number of the input variables used. To fix this problem we investigated the model’s precision with respect to firefly optimizer algorithm.

In the MLP-FFA hybrid models, the multilayer perceptron model and firefly algorithm were integrated (Fig. 3). Tables 4 and 5 show the results of this study. It is obvious that the prediction performance of the MLP-FFA-based models in terms of r, RMSE, %RMSE and WI for training and testing periods is higher compared to the non-optimized models. That is, the MLP-FFA model displayed the smallest value of RMSE and %RMSE and the highest value of r and WI in the testing set (Tables 4, 5). In general, based on results, the MLP-FFA model is a powerful tool in predicting the water quality of rivers.

Also based on Tables 4 and 5, and results obtained from both MLP and MLP-FFA models, a relatively better performance (r between measured and computed values) of the BOD model as compared to that of the DO model shows that the selected influential factors (input variables) have relatively greater impact on BOD than on DO. Also, selection of the influential factors might affect the model output considerably (Ying et al. 2007).

Figures 6 and 7 show the comparison between MLP and MLP-FFA results and the observed data for the set of monthly test data. It is clear that the MLP-FFA model results are closer to the observed water quality values compared to MLP model. Moreover, BOD and DO parameters in station 1L04 show a higher correlation than that of the 1L05.

Fig. 6 Comparative plots of observed and predicted monthly river water quality by MLP and MLP-FFA models for testing period 2008–2010 for 1L04 station: a, b BOD, c, d DO

Fig. 7 Comparative plots of observed and predicted monthly river water quality by MLP and MLP-FFA models for testing period 2008–2010 for 1L05 station: a, b BOD, c, d DO

As previously discussed, one of the aims of this study is the uncertainty analysis of the multilayer perceptron neural network using two criteria, namely 95PPU and d-factor. A larger share of observed data within the 95PPU band and a smaller average distance between the upper and lower bands (smaller than the standard deviation of the measured data) indicate a more favorable uncertainty. In this section, the optimal structures of the models discussed in the previous sections are used. The uncertainty indices of 95PPU and d-factor for the testing datasets are given in Table 6. As shown in Figs. 6, 7 and Table 6, about 77.7% and 72.2% of the BOD data and 72.2% and 91.6% of the DO data are bracketed by the 95PPU at stations 1L04 and 1L05, respectively. Furthermore, the d-factor is 1.648 and 2.269 for BOD and 1.892 and 3.480 for DO at stations 1L04 and 1L05, respectively.

Table 6 Uncertainty indices of the MLP model for the testing stage

Based on the values of the 95PPU and d-factor indices obtained at both stations, it can be concluded that more than 50% of the observed data fall within the 95PPU band, and a reasonable extent of uncertainty is achieved in simulating both BOD and DO. Simulation results for BOD are better than those for DO, since the average distance between the upper and lower values of the 95PPU (d-factor) is smaller than the standard deviation (SD) of the measured data (2.269 < 3.55, 1.648 < 4.02) for BOD, while this does not hold for DO. Moreover, the uncertainty of the MLP model in modeling BOD is lower than that in modeling DO, as indicated by the smaller d-factor.

In general, there are three types of uncertainties in all simulation processes. The first type involves the uncertainties associated with the simulator model. The second type involves uncertainties arising from data. The third type involves the local knowledge. Hence, the level of uncertainties varies significantly with the problem type. In this research, the uncertainty originates from the ANN model, local knowledge and the data, which stem from human and machine errors and some other unknown problems.

Sensitivity analysis

To evaluate the effective input parameters, two criteria (r and RMSE) are used to determine the most effective variables on the output. Based on Tables 4 and 5, and the obtained results, owing to relatively better performance, second (8 input variables) and third (7 input variables) combinations for BOD and second (8 input variables) and first (9 input variables) combinations for DO were used for sensitivity analyses in both stations. The analyses consisted of the comparison of overall 9 and 8 networks for BOD and 9 and 10 networks for DO in stations 1L04 and 1L05, respectively. Each one demonstrated to what extent the eliminated parameter would affect the network accuracy.

Obviously, the precision of MLP would become higher if all the suggested parameters were used as the input to the model for the testing dataset. Next, the most influential parameters were selected after determining the networks with reduced accuracy (lower r and higher RMSE) after the elimination of a parameter in testing stage compared to first network (all input parameters). Taking above arguments into consideration along with the results presented in Tables 7 and 8, in both stations the BOD parameter is more sensitive to Na, Cl and NH4-N, while DO parameter is more sensitive to COD, pH and NH4-N in station 1L04, and to K, pH and NH4-N in station 1L05.
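The leave-one-input-out procedure described in this section can be sketched in Python as follows. For brevity, the sketch replaces the MLP with a 1-nearest-neighbor regressor as a stand-in model and reports only the change in RMSE (the study also used r). The data, the input names `x0` and `x1`, and the function names are all hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in model (not an MLP): return the target of the closest training sample."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def drop_column(X, col):
    """Remove one input variable from every sample."""
    return [row[:col] + row[col + 1:] for row in X]

def sensitivity(names, train_X, train_y, test_X, test_y):
    """Rank inputs by the increase in test RMSE when each one is removed."""
    def score(tr, te):
        return rmse(test_y, [nearest_neighbor_predict(tr, train_y, x) for x in te])
    base = score(train_X, test_X)  # network with all inputs
    increase = {name: score(drop_column(train_X, c), drop_column(test_X, c)) - base
                for c, name in enumerate(names)}
    return sorted(increase.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: the target depends on x0 only; x1 is unrelated to it.
X = [[float(i), float((7 * i) % 5)] for i in range(40)]
y = [2.0 * i for i in range(40)]
train_X, train_y = X[0::2], y[0::2]
test_X, test_y = X[1::2], y[1::2]
ranking = sensitivity(["x0", "x1"], train_X, train_y, test_X, test_y)
print(ranking)
```

The input whose removal raises the test error the most is listed first, which mirrors how the most influential parameters are identified from the networks with reduced accuracy.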

Table 7 Results of sensitivity analysis of the MLP model with regard to the simulation of BOD in 1L04 and 1L05 stations
Table 8 Results of sensitivity analysis of the MLP model with regard to the simulation of DO in 1L04 and 1L05 stations

Conclusion

In this paper, a multilayer perceptron (MLP) forecasting model integrated with a firefly (FF) optimizer algorithm (MLP-FFA) was used to forecast river water quality, i.e., BOD and DO. The case study was the Langat River basin in Malaysia. By applying the correlation coefficient to the water quality data, a set of four input combinations was deemed suitable for the prediction of BOD and DO.

Accordingly, a number of forecasting models, including the traditional MLP and the integrated MLP-FFA models, were developed over a 10-year period (2001–2010) for stations 1L04 and 1L05. The results were assessed with several statistical and visual criteria and showed the better efficiency of the MLP-FFA model in terms of the correlation coefficient (r) between forecasted and observed water quality, root-mean-square error (RMSE), % root-mean-square error (%RMSE) and Willmott’s index of agreement (WI). The MLP-FFA model with the (8,9,1) and (8,6,1) structures for predicting BOD and DO at station 1L04, respectively, and the (7,8,1) and (9,13,1) structures for predicting BOD and DO at station 1L05, respectively, was more accurate than its counterparts, which underlines the importance of the firefly algorithm as an optimizer for improving the accuracy of conventional models.
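For reference, the four statistical criteria named above can be computed as in the following Python sketch. The function name `evaluation_criteria` is hypothetical, and expressing %RMSE as the RMSE relative to the mean of the observed values is an assumption made here; WI follows Willmott's standard index of agreement.

```python
import numpy as np


def evaluation_criteria(observed, forecasted):
    """Return r, RMSE, %RMSE and WI for paired observed/forecasted series."""
    o = np.asarray(observed, dtype=float)
    f = np.asarray(forecasted, dtype=float)
    o_mean = o.mean()

    # Pearson correlation coefficient between observed and forecasted values.
    r = float(np.corrcoef(o, f)[0, 1])

    # Root-mean-square error.
    rmse = float(np.sqrt(np.mean((f - o) ** 2)))

    # Assumption: %RMSE is the RMSE as a percentage of the observed mean.
    pct_rmse = 100.0 * rmse / o_mean

    # Willmott's index of agreement (1 indicates perfect agreement).
    denominator = np.sum((np.abs(f - o_mean) + np.abs(o - o_mean)) ** 2)
    wi = float(1.0 - np.sum((f - o) ** 2) / denominator)

    return {"r": r, "rmse": rmse, "pct_rmse": float(pct_rmse), "wi": wi}
```

A more accurate model has r and WI closer to 1 and lower RMSE and %RMSE, which is the sense in which the MLP-FFA model is reported as more efficient.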

The results of this study suggest that the firefly optimizer algorithm is a useful add-on tool for improving the accuracy of forecasting models applied to water quality prediction. This research also gives credence to the effectiveness of the hybrid model, which is applicable to other engineering problems where historical data can provide features for developing a predictive model.

Despite the good performance of the MLP-FFA model in this study, there are limitations that demand further research. Further improvement in accuracy may be possible by including more significant information in the learning process of the predictive model. This study was limited to the data available. For simulation purposes, it is therefore important to include other relevant variables, such as discharge, temperature, T-Alk, T-Hard and NO3-N, as well as datasets containing factors that may help to predict the values of BOD and DO. Future research could apply the model to short-term prediction of water quality (e.g., daily or hourly parameters). Such a study is likely to generate a thorough model for operational usage, but it was beyond the scope of this paper and awaits an independent investigation.

Additionally, the reliability of the MLP model predictions was assessed through an uncertainty estimation. Based on the 95PPU and d-factor values at both stations, it is concluded that the neural network model has an acceptably low degree of uncertainty when applied to BOD and DO simulations. Moreover, a comparison of the uncertainty results for the MLP model showed a lower degree of uncertainty in simulating BOD than DO, as indicated by the smaller d-factor. In future work, the above-mentioned restrictions may be overcome by using other robust methods of uncertainty analysis, which could in turn improve the results and reduce uncertainty.

Finally, the sensitivity analysis of the effective input variables showed that, at both stations, the BOD parameter was more sensitive to the Na, Cl and NH4-N data, while the DO parameter was more sensitive to the COD, pH and NH4-N data at station 1L04 and to the K, pH and NH4-N data at station 1L05.