Abstract
Water pollution is a major global environmental issue, and predicting water quality parameters in advance is of great importance to the normal operation of society. However, existing empirical models exhibit low precision in water quality prediction because of the non-stationarity and non-linearity of water quality series, and the performance of the long short-term memory network (LSTM) integrated with an improved meta-heuristic algorithm remains unclear and worth exploring. In this paper, we propose a hybrid model based on the ensemble learning method that integrates complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and an improved LSTM to model and forecast the variation of water quality parameters. First, the water quality series was denoised by an integrated filter based on CEEMDAN and fuzzy entropy (FE); then, an ant lion optimizer improved with chaos initialization, Cauchy mutation, and opposition-based learning (OBL) (IALO) was used to determine the hyperparameters of the LSTM. Weekly dissolved oxygen (DO) concentration data from January 2010 to December 2016, collected at the Yongjiang River and Beijiang River gauging stations in the Pearl River Basin, were used to validate the effectiveness of the proposed model. The experimental results reveal that the proposed model achieves better predictive accuracy than other data-driven models, with average MAE, MAPE, and RMSE of 0.42, 5.37%, and 0.53 in the test stage, which are 50.24%, 47.66%, and 49.66% lower than those of the baseline LSTM model, respectively.
Introduction
The development of society cannot be separated from the nourishment of the water system. In recent years, intensified human production activities have led to serious environmental pollution in some river basins (Zhai et al. 2014; Wang et al. 2018), which is not conducive to the construction of ecological civilization in watershed areas. Advanced prediction of water quality is a key aspect of watershed water resource management. Therefore, high-accuracy water quality prediction is essential both for preventing medical accidents caused by water pollution and for the comprehensive and effective control of the watershed environment.
Existing research methodology for water quality prediction mainly includes deterministic and statistical methods (Ahmed et al. 2019). Deterministic models require informative source data that are difficult to obtain in practice (Gorai et al. 2016), and in many cases the various parameters that need to be determined are estimated from experience, resulting in limited accuracy. In contrast, statistical methods avoid the complexity of process-based modeling and show good performance because they rely on data-driven statistical modeling techniques (Najah et al. 2014; Khadr and Elshemy 2017).
In recent years, statistical water quality prediction models, such as artificial neural networks, have been developing rapidly (Chen et al. 2020; Aldhyani et al. 2020). Among them, the recurrent neural network (RNN) can capture the backward and forward dependence of time-series data because of its built-in feedback and cyclic structure. However, traditional RNNs suffer from vanishing or exploding gradients and cannot model long-term dependence, which limits their practical applications (Li et al. 2019). As a variant of the RNN, the LSTM can effectively capture long-term dependence among time-series data by introducing memory units into the network structure, which makes it more suitable for time-series prediction problems (Wang et al. 2017; Zhou et al. 2018).
Notably, the water quality prediction performance of machine learning models depends largely on the selection of their hyperparameters (Ahmed et al. 2019), since appropriate hyperparameters can remarkably increase prediction efficiency and decrease prediction cost. However, there is no formal mathematical method for determining the optimal hyperparameters of an LSTM (Hu et al. 2019). To overcome this drawback, meta-heuristic algorithms such as the particle swarm optimization algorithm (PSO), the pigeon-inspired optimization algorithm, and the ant lion optimizer (ALO) can be employed to determine the hyperparameter values. Among them, ALO is a bionic intelligent optimization algorithm proposed by Mirjalili (2015). Because of its few initial parameters and high convergence accuracy, ALO has been widely used in many engineering fields involving parameter optimization, and studies show that its optimization performance is better than that of seven intelligent algorithms, including PSO, the genetic algorithm, and the cuckoo search algorithm (Mani et al. 2018). Therefore, the application of ALO to hyperparameter optimization of water quality prediction models is worth further exploration. However, like many intelligent algorithms, ALO suffers from premature convergence and a tendency to fall into local optima (Banadkooki et al. 2020). To address this problem, an improved ALO (IALO) based on chaotic initialization (Li et al. 2017a, b), Cauchy mutation (Rajaee et al. 2020), and OBL (Sun et al. 2019) is proposed in this paper. First, Sin chaotic mapping was used to initialize the population and adaptively adjust the chaotic search space, improving poorly adapted individuals and thereby the overall fitness of the population and the efficiency of the search.
Second, the variational update of the optimal position, i.e., the target value, was performed by integrating the Cauchy mutation operator and the OBL strategy to improve the ability of the algorithm to jump out of the local optimum.
The rapid development of modern water quality monitoring technology has generated a large amount of data on water quality parameters, the potential laws of which need to be urgently explored. Due to the influence of hydrometeorology, human activities, monitoring equipment, and monitoring methods, the original water quality data, like most time-series, are characterized by certain nonlinearities and nonstationarities, and the data contain noise, which is useless or even detrimental for prediction (Song et al. 2021a, b). To improve the quality of data mining, appropriate noise reduction methods should be selected to reduce the noise of the monitored water quality data. The key to data denoising is to find out the noise components and eliminate them. Therefore, it is necessary to pre-process the original water quality series to eliminate the disturbing information in the original time-series to improve the prediction accuracy.
Currently, the commonly used noise reduction methods for nonlinear signals include the wavelet transform (Zubaidi et al. 2020; Song et al. 2022), empirical mode decomposition (EMD) (Lu and Ma 2020), and ensemble empirical mode decomposition (EEMD) (Li et al. 2017a, b). The denoising effect of the wavelet transform depends mainly on the quality of the wavelet basis (Rajaee and Jafari 2018). EMD is prone to mode mixing during decomposition, which degrades the decomposition quality (Meng et al. 2019). EEMD improves on EMD by adding auxiliary noise to eliminate the mode mixing that occurs in EMD and by offsetting the effects of the added noise through multiple trials. However, after ensemble averaging over a finite number of trials, its reconstructed components still contain residual noise of a certain amplitude; the reconstruction error can be reduced by increasing the number of ensembles, but this increases the computational cost (Yu et al. 2018). CEEMDAN was proposed on the basis of EEMD: it adds adaptive white noise at each stage of the decomposition and obtains each modal component by calculating a unique residual signal. Compared with EEMD, its reconstruction error is almost zero regardless of the number of ensembles (Abba et al. 2020), its decomposition is complete, and it overcomes the low efficiency of EEMD.
The components produced by CEEMDAN differ in frequency and carry different indicative significance. Existing research usually treats the high-frequency components as noise, which lacks a rigorous basis. As an improvement of sample entropy and approximate entropy, FE is also a measure of time-series complexity and has been successfully applied to feature extraction from water quality signals (Liu et al. 2018). In this paper, CEEMDAN and FE were combined: the noise components were identified and eliminated according to their fuzzy entropy, and the denoised water quality series reconstructed from the remaining components was used for water quality prediction.
According to the literature, few studies have applied IALO-LSTM to water quality parameter prediction; therefore, the prediction performance of the improved model is worth exploring and analyzing. The main objectives of the present study are: (1) to combine CEEMDAN and FE for noise reduction of the original time-series; (2) to improve the ALO with Sin chaos initialization, Cauchy mutation, and the OBL strategy; (3) to construct a novel optimal-hybrid model combining IALO and LSTM for water quality prediction. Compared with existing models, our method makes the following major contributions to the literature: (1) a novel data cleaning method is proposed for the time-series; (2) an alternative framework for high-accuracy water quality prediction is established. The remainder of this paper is organized as follows. The section ‘Material and methods’ provides a brief description of the methodology; the section ‘Case study’ introduces the research region, the datasets, and the error evaluation criteria used in this study; the section ‘Results’ presents experimental results comparing the performance of the proposed model with peer algorithms. Discussion of the experimental results is presented in the section ‘Discussion’, and the conclusions and future work are given in the section ‘Conclusion’.
Material and methods
CEEMDAN-FE
CEEMDAN is a time–frequency domain analysis method which further eliminates the modal effects by adding adaptive noise, possesses strong adaptivity and convergence, and is mostly used to deal with nonlinear and non-stationary signals. More details about CEEMDAN can be found in the literature (Torres et al. 2011; Fan et al. 2021).
FE is a measure that describes the complexity of a specified sequence; it is an enhanced version of the sample entropy and approximate entropy methods (de Boves Harrington 2017) and describes the similarity of two adjacent vectors. It is used to measure the uncertainty of random variables and to classify the degree of uncertainty. FE not only retains the advantages of sample entropy and approximate entropy, but also resolves the imprecision that arises when vector similarity is determined from the absolute value difference of the data, which makes the analysis unreliable under small fluctuations and baseline drift. As a result, FE yields a valid entropy value even when the parameter values are very small, and it changes smoothly as the parameters are adjusted. The implementation steps of the FE algorithm can be found in the literature (Singh 2018).
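As an illustrative sketch (the study itself was implemented in MATLAB), the FE computation described above can be written in Python as follows; the parameters m, r, and n are the usual embedding dimension, similarity tolerance, and gradient of the exponential membership function, and the exact normalization follows one common convention rather than the paper's own code:

```python
import numpy as np

def _phi(x, dim, tol, n):
    """Mean fuzzy similarity of all pairs of baseline-removed dim-length vectors."""
    N = len(x)
    vecs = np.array([x[i:i + dim] - np.mean(x[i:i + dim]) for i in range(N - dim)])
    # Chebyshev distance between every pair of embedding vectors
    d = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=2)
    # exponential (fuzzy) membership; exclude self-matches on the diagonal
    D = np.exp(-(d ** n) / tol)
    np.fill_diagonal(D, 0.0)
    return D.sum() / ((N - dim) * (N - dim - 1))

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """FE = ln(phi^m) - ln(phi^(m+1)); higher values mean a more complex series."""
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    return np.log(_phi(x, m, tol, n)) - np.log(_phi(x, m + 1, tol, n))
```

A quick sanity check is that white noise should yield a noticeably higher FE than a smooth sinusoid of the same length, which is exactly why the high-FE IMFs are treated as noise components later in the paper.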
LSTM
LSTM is a special RNN. On the basis of the ordinary RNN, the LSTM gains a long-term memory capability through the addition of a memory unit and can effectively avoid the gradient dispersion problem of RNNs on long-term dependent sequences (Zou et al. 2020). The LSTM model includes four parts: the input gate, the forget gate, the output gate, and the cell state. The input gate determines how much input information is transmitted to the cell state; the forget gate controls how much information in the cell state of the previous period is forgotten and how much is passed on to the current moment; the output gate produces the output based on the cell state updated by the forget gate and the input gate. The cell state records the current input, the hidden-layer state at the previous time, the cell state at the previous time, and the information in the gate structures. The specific steps of the LSTM algorithm can be found in the literature (Zhou 2020).
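The gate interplay described above can be sketched as a single time step with NumPy (an illustrative sketch of the standard LSTM cell equations, not the paper's MATLAB implementation; W, U, and b here stack the parameters of the four transforms):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step over the four parts described in the text."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # stacked pre-activations, shape (4n,)
    i = sigmoid(z[0:n])             # input gate: how much new input is admitted
    f = sigmoid(z[n:2 * n])         # forget gate: how much old cell state survives
    g = np.tanh(z[2 * n:3 * n])     # candidate cell state from the current input
    o = sigmoid(z[3 * n:4 * n])     # output gate
    c = f * c_prev + i * g          # updated cell state
    h = o * np.tanh(c)              # hidden state emitted through the output gate
    return h, c
```

Because the forget gate multiplies the previous cell state rather than repeatedly applying the same weight matrix, gradients along c decay far more slowly than in a plain RNN, which is the mechanism behind the long-term memory noted in the text.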
IALO
For the LSTM model selected in this paper, four parameters have an important impact on performance: the numbers of neurons in the two LSTM hidden layers, L1 and L2 (the numbers of LSTM units in the first and second layers, respectively), the learning rate Lr, and the number of training iterations K. These four key parameters are used as the features of the particle search, and the LSTM model is tuned and optimized using the IALO algorithm.
In the ALO algorithm, the random walk of the ant population around the elite antlions ensures the convergence of the parameter optimization process, and the roulette operation helps improve the global search ability of the ant population to a certain extent. The specific steps of the ALO algorithm can be found in the literature (Heidari et al. 2020; Malik et al. 2020). However, the algorithm still suffers from the following problems: (1) the positions of both the initial ant and antlion populations are generated randomly; such positions are not representative and may leave a large number of solutions concentrated in local areas, with insufficient population diversity in the initial stage of the algorithm. (2) If the elite and roulette-selected individuals of a given generation are not in the globally optimal region, the entire population led by a single elite will reduce the convergence rate of the algorithm. (3) The predation radius of the antlion shrinks in stages as the number of iterations increases, gradually reducing population diversity, and it is difficult for the algorithm to escape once it falls into a local optimum. To solve these problems, this paper adopts the following three strategies.
(1) Sin chaotic initialization
The quality of the initial population can be improved by introducing chaotic mapping sequences during population initialization, which enhances population diversity and prevents premature convergence. Chaotic sequences are often used in optimization search because of their randomness, ergodicity, and sensitivity to initial values. The basic idea is to use a chaotic map to introduce chaotic states into the optimization variables, extend the ergodic range of the chaotic motion to the value range of the optimization variables, and then use the chaotic variables for the search. The Tent, Logistic, and Sin models are common chaos models, but the first two have a finite number of mapping folds, whereas the Sin chaos model has an infinite number. The literature (Yang and E, 2009) pointed out that Sin chaos is superior to other similar models in terms of chaotic characteristics, so Sin chaos is used as the population initialization method of the ALO algorithm. The expression for the one-dimensional self-map of Sin chaos is as follows:
$$x_{n+1} = \sin\!\left(\frac{2}{x_{n}}\right), \quad -1 \le x_{n} \le 1,\ x_{n} \ne 0,\ n = 0, 1, \ldots, N$$
where N is the size of the initial population. The initial value cannot be set to 0, so as to avoid the fixed points and zero points of the map on [−1, 1].
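A minimal Python sketch of this initialization follows (the map form x_{k+1} = sin(2/x_k) and the linear rescaling to the search bounds are assumptions for illustration, since the paper gives the map only as an equation, and the study itself was coded in MATLAB):

```python
import numpy as np

def sin_chaos_init(pop_size, dim, lb, ub, x0=0.35):
    """Initialize a population with Sin chaotic iterates rescaled to [lb, ub]."""
    seq = np.empty((pop_size, dim))
    x = x0  # must be nonzero, and not a fixed point of the map
    for i in range(pop_size):
        for j in range(dim):
            x = np.sin(2.0 / x)   # one Sin chaotic iteration, stays in [-1, 1]
            seq[i, j] = x
    # stretch the chaotic values from [-1, 1] onto the search range
    return lb + (seq + 1.0) / 2.0 * (ub - lb)
```

Compared with uniform random initialization, the chaotic iterates traverse the interval ergodically, which is the diversity argument made in the text.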
(2) Integration of Cauchy mutation and OBL
Swarm intelligence algorithms often apply mutation strategies to improve convergence accuracy. Among them, Gaussian mutation and Cauchy mutation are two of the more common perturbation strategies (Rahnamayan et al. 2012). The Gaussian distribution is a continuous probability distribution, while the Cauchy distribution is a continuous probability distribution with no mathematical expectation. A comparison of the standard Cauchy and Gaussian probability density curves (Lan and Lan 2008) shows that the Cauchy distribution spans a wider range on the x-axis. Consequently, Cauchy mutation expands the range of the solution space searched by the particles, allows mutated particles to reach more and better feasible solutions, and is better able to prevent the algorithm from falling into a local optimum.
The probability density function of Cauchy distribution is as follows:
$$f(x) = \frac{1}{\pi}\,\frac{m}{m^{2} + x^{2}}, \quad -\infty < x < +\infty$$
where m is the scale parameter.
The Cauchy mutation is introduced into the update formula of the optimal antlion position to exploit the perturbation ability of the Cauchy operator, so that the global optimization performance of the algorithm can be improved
where \(R_{E}^{t}\) denotes the random wandering of the elite antlion at the tth iteration, and Cauchy (0,1) is the standard Cauchy perturbed random value.
The Cauchy distribution random variable generating function is
$$\eta = \tan\!\left[\pi\left(\xi - 0.5\right)\right]$$
where ξ is a uniformly distributed random value in the interval (0,1).
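The inverse-transform generator and the perturbation of the elite position can be sketched in Python as follows (an illustrative sketch: the multiplicative form of the elite update below is an assumption standing in for the paper's Eq. (3), and the study itself used MATLAB):

```python
import numpy as np

def cauchy_rand(size, m=1.0, rng=None):
    """Cauchy-distributed values via eta = m * tan(pi * (xi - 0.5)), xi ~ U(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.uniform(0.0, 1.0, size)
    return m * np.tan(np.pi * (xi - 0.5))

def cauchy_mutate(elite, rng=None):
    """Perturb the elite antlion position with heavy-tailed Cauchy steps
    (illustrative form; the paper's exact update is its Eq. (3))."""
    return elite + elite * cauchy_rand(np.shape(elite), rng=rng)
```

The heavy tails of the Cauchy distribution occasionally produce very large steps, which is precisely the escape-from-local-optimum behavior argued for above.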
The concept of OBL was proposed by Tizhoosh (2005); its main idea is to evaluate both the current solution and its opposite solution and to select the better candidate. The literature (Rahnamayan et al. 2008) shows that finding an optimal solution through OBL takes less time than a purely stochastic search, that the opposite solution has a nearly 50% higher probability of being close to the global optimum than the current solution, and that evaluating both the forward and opposite solutions of a candidate helps accelerate convergence. To enable the optimal antlion individual to better find the optimal solution, the OBL strategy is incorporated into the antlion algorithm, mathematically characterized as follows:
where \(R_{E}^{{t^{^{\prime}} }}\) is the inverse solution of the optimal solution in the tth generation, ub, lb are the upper and lower bounds, respectively, r is a 1*d (d is the spatial dimension) matrix of random numbers obeying a (0,1) standard uniform distribution, and b1 denotes the information exchange control parameter with the following equation:
where t is the current iteration, and T denotes the maximum number of iterations.
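A Python sketch of the opposition step is given below. The basic opposite of a solution x in [lb, ub] is lb + ub − x; the randomized per-dimension variant shown here is an illustrative assumption, since the paper's exact Eqs. (5)–(7) also involve the control parameter b1, which is omitted for brevity:

```python
import numpy as np

def opposition(x, lb, ub, rng=None):
    """Randomized opposition-based candidate for a solution x in [lb, ub]."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.uniform(0.0, 1.0, np.shape(x))   # per-dimension random factors
    # reflect x through a randomly weighted midpoint, then clip back into bounds
    return np.clip(r * (lb + ub) - x, lb, ub)
```

With r fixed at 1 this reduces to the classical opposite point lb + ub − x; the random factors spread the candidates over the search range rather than mirroring exactly.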
(3) Integration of dynamic selection and greedy rule
To further improve the optimization performance of the algorithm, this paper draws on the dynamic selection strategy proposed by Mao and Zhang (2020), alternating between the OBL and Cauchy mutation strategies to dynamically update the target position with a certain probability. In the OBL strategy, the opposite solution obtained through the OBL mechanism expands the search field of the algorithm; in the Cauchy mutation strategy, the Cauchy mutation operator applies a perturbation at the optimal solution position to derive a new solution, mitigating the algorithm's tendency to stall in a local region. Which strategy is adopted for the target position update is determined by the selection probability Ps, which is calculated as follows:
where θ is the adjustment parameter, and its value can be taken as 0.05.
The specific strategy is chosen in the following way.
If rand < Ps, the OBL strategy is selected for position update according to Eqs. (5)–(7); otherwise, the Cauchy mutation strategy is selected for target position update according to (3).
Although the two perturbation strategies above enhance the ability of the algorithm to leap out of a local region, it cannot be guaranteed that the new position obtained after perturbation has a better fitness value than the original position. Therefore, after the perturbation update, the greedy rule (Deng et al. 2015) is introduced to decide whether the position should be updated by comparing the fitness values of the old and new positions. The greedy rule is shown in Eq. (9), where f(x) denotes the fitness value of position x
where \(Antlion_{j}^{t}\) denotes the position of the jth antlion in the tth iteration.
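One iteration of the elite update combining dynamic selection and the greedy rule can be sketched in Python as follows (an illustrative sketch: Ps is taken as an input rather than computed from the paper's Eq. (8), and the two perturbation operators are simplified stand-ins for the paper's Eqs. (3) and (5)–(7); the study itself was coded in MATLAB):

```python
import numpy as np

def update_elite(elite, fitness, lb, ub, Ps, rng=None):
    """Dynamic-selection update of the elite position, kept only if it improves."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.uniform() < Ps:
        # OBL strategy: jump to a randomized opposite point within bounds
        candidate = np.clip(rng.uniform(0, 1, elite.shape) * (lb + ub) - elite, lb, ub)
    else:
        # Cauchy mutation: heavy-tailed perturbation around the elite
        step = np.tan(np.pi * (rng.uniform(0, 1, elite.shape) - 0.5))
        candidate = np.clip(elite + elite * step, lb, ub)
    # greedy rule (Eq. (9)): keep whichever of the old and new positions is fitter
    return candidate if fitness(candidate) < fitness(elite) else elite
```

Because of the greedy acceptance in the last line, the elite fitness is monotonically non-increasing, which is the guarantee the text appeals to when introducing the rule.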
The design of the water quality forecasting model based on CEEMDAN-IALO-LSTM
The methodology of the CEEMDAN-IALO-LSTM water quality prediction model is shown in Fig. 1.
(1) CEEMDAN is used to decompose the water quality time-series into a set of modal components and a residual component; the FE of each component is calculated, the two components with the largest FE are removed to achieve noise reduction, and the remaining components are summed to reconstruct the series.
(2) The reconstructed time-series are divided into a training set and a test set and normalized.
(3) The traditional ALO is improved using chaotic initialization, Cauchy mutation, and OBL to build IALO.
(4) The hyperparameters (L1, L2, Lr, K) of the LSTM are determined using IALO to construct the IALO-LSTM hybrid prediction model.
(5) The denoised data are input into the IALO-LSTM model for rolling prediction; the output is obtained after inverse normalization, and the prediction results are analyzed.
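The rolling prediction in step (5) rests on a sliding-window dataset; as an illustrative Python sketch (the 10-week window follows the parameter specification given later in the case study, and the study itself used MATLAB):

```python
import numpy as np

def make_windows(series, window=10):
    """Build (input, target) pairs for one-step-ahead rolling prediction:
    the previous `window` weeks form the input, the following week the target."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y
```

Each row of X is then fed to the trained IALO-LSTM, and the predicted value becomes available to the next window as monitoring advances week by week.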
Case study
Data source
The Pearl River is the largest river in southern China, flowing through six provinces (autonomous regions) and the northeastern part of the Socialist Republic of Vietnam, and is known as one of the seven major rivers in China. The Pearl River basin lies between 21°31′N–26°49′N and 102°14′E–115°53′E, covering a total area of 4.537 × 10⁵ km² (Fig. 2). The basin is rich in water resources; the river has a total length of 2214 km and an average annual runoff of about 3.36 × 10¹¹ m³, ranking second in China.
With the acceleration of industrialization and urban–rural integration, the problems of uncoordinated water resources distribution and economic development, water pollution, water ecological deterioration, and mixed water supply and drainage in the Pearl River are becoming more and more prominent. Investigation results indicate that the water quality of the Pearl River estuary is poor, the main pollution factors being inorganic nitrogen and reactive phosphate. In recent years, following the pollution of the Pearl River, the environmental awareness and supervision of regional governments have been strengthened. By 2020, among the 165 surface water monitoring sections in the Pearl River Basin, the proportion of sections with good water quality reached 92.7%, an increase of 3.0% over 2016, and the proportion of sections with water quality inferior to Class V fell to zero, a decrease of more than 3.6% from 2016.
The datasets used were collected at the Yongjiang River and Beijiang River from 2010 to 2016 at weekly intervals, spanning 365 weeks. Among the monitoring indices, DO was selected as the predicted variable.
The monitoring data of DO at Yongjiang River and Beijiang River are depicted in Fig. 3, and the statistical analysis results of the original time-series of DO concentration are given in Table 1.
Parameter specification
The selected monitoring data can contain abnormal and missing values due to instrument failure, signal interference, etc. The original data were first screened for outliers using the Laida (3σ) criterion (Chen and Chau 2019); outliers were treated as missing values, which were then filled using the Lagrange interpolation method.
In this paper, the denoised data are divided into a training set and a test set at a ratio of 7:3. The sliding window length is 10; that is, the data of the previous 10 weeks are used to predict the next week. To speed up model calibration, the data were normalized by means of the following min-max transformation
$$x_{i}^{\prime} = \frac{x_{i} - x_{i\min}}{x_{i\max} - x_{i\min}}$$
where the ith variable is represented by xi; ximin stands for the minimum value of xi in the training set; ximax is the maximum value of xi in the training set.
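A small Python sketch of this scaling, fitted on the training set only so that the test set is transformed with the same bounds (illustrative; the study itself was implemented in MATLAB):

```python
import numpy as np

def minmax_fit(train):
    """Return forward and inverse min-max transforms fitted on the training set."""
    lo, hi = float(np.min(train)), float(np.max(train))
    scale = lambda x: (np.asarray(x, dtype=float) - lo) / (hi - lo)
    inverse = lambda z: np.asarray(z, dtype=float) * (hi - lo) + lo
    return scale, inverse
```

The inverse transform is what step (5) of the methodology applies to the model outputs before the prediction results are analyzed.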
For CEEMDAN, the parameters are set as follows: the noise standard deviation is 0.2, the number of realizations is 500, and the maximum number of sifting iterations allowed is 5000. The initial parameters of the stand-alone LSTM algorithm are set to L1 = 20, L2 = 20, Lr = 0.005, and K = 20. The parameters of the ALO used to optimize the LSTM are: the initial number of ants is 5 and the maximum number of iterations is 20. IALO uses the same settings as ALO, with the chaotic initialization, Cauchy mutation, and OBL components added. ALO and IALO are used for the hyperparameter search of the LSTM, where L1, L2, and K are searched in the range [1,100] and Lr in the range [0.001,0.01].
To verify the effectiveness and superiority of IALO, the grey wolf optimizer (GWO) and the whale optimization algorithm (WOA) are also used to optimize the LSTM (Wang et al. 2022; Yang et al. 2022). The initial numbers of wolves and whales are 5, the maximum number of iterations is 20, and the value ranges of the four hyperparameters are kept consistent with IALO. The back propagation neural network (BPNN), used as a comparison algorithm, has a single hidden layer with ten hidden nodes, a learning rate of 0.1, a maximum of 1000 iterations, and a target accuracy of 1e−3.
All the algorithms and models applied in this study were implemented by in-house coding on the platform of MATLAB R2020a.
Error evaluation criteria
The selection of reasonable performance criteria is of utmost importance for assessing model performance. In this paper, the mean absolute error (MAE), root-mean-square error (RMSE), mean absolute percentage error (MAPE), correlation coefficient (CC), coefficient of determination (R2), Nash–Sutcliffe efficiency coefficient (NSE), and Willmott index of agreement (IA) are adopted to statistically evaluate the models applied in this paper. The mathematical representations are cited as follows:
where n is the number of test set, \(O_{i}\) and \(P_{i}\) are the actual and predicted values of DO concentration at time i, and \(\overline{O}\) and \(\overline{P}\) are the mean value of the actual and predicted DO concentration.
The R2 measures the degree of agreement between the actual and predicted values, and the CC is often used to evaluate the linear relationship between them. The NSE, introduced by Nash and Sutcliffe (1970), can also be used to assess the predictive power of models. A perfect fit between the predicted and actual values would have MAE = 0, MAPE = 0, RMSE = 0, R2 = 1, CC = 1, NSE = 1, and IA = 1.
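For reference, the core error criteria follow their standard definitions; a Python sketch (illustrative, the study itself used MATLAB):

```python
import numpy as np

def metrics(obs, pred):
    """MAE, RMSE, MAPE (%), and NSE between observed and predicted series."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / obs))          # obs must be nonzero
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - np.mean(obs)) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "NSE": nse}
```

These are the quantities reported in the abstract and in Tables 3–4; NSE compares the model against the trivial predictor that always outputs the observed mean.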
In addition, to analyze the distribution characteristics of the prediction errors of each model, statistical analysis and a Gaussian distribution fitting test were performed on the prediction errors. Assuming that the prediction error sequence W of a model obeys a distribution with location parameter μ and scale parameter σ, the Gaussian probability density f(W) of the prediction error satisfying N(μ,σ2) is
$$f(W) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(W - \mu)^{2}}{2\sigma^{2}}\right)$$
where μ is the expected mean of the model prediction error value; σ is the standard deviation of the model prediction error value.
Noise reduction effect evaluation criteria
This paper adopts three commonly used criteria for evaluating the noise reduction effect, namely the signal-to-noise ratio Rsn (Song et al. 2021a, b), the CC, and the residual energy percentage Esn.
For Esn, the sequence energy is obtained as follows (Song et al. 2021a, b):
$$E = \sum_{i=1}^{n} x_{i}^{2}$$
The Esn of the noise reduced sequence to the original sequence is defined as
where E represents the energy of the original sequence; E0 represents the energy of the noise reduced sequence.
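A Python sketch of the three criteria follows (illustrative: the SNR in decibels and the energy ratio E0/E expressed as a percentage are common conventions assumed here, since the paper defines Esn only through the energies E and E0; the study itself used MATLAB):

```python
import numpy as np

def denoise_criteria(original, denoised):
    """Signal-to-noise ratio (dB), correlation coefficient, and residual
    energy percentage between an original and a denoised sequence."""
    x = np.asarray(original, dtype=float)
    xd = np.asarray(denoised, dtype=float)
    E, E0 = np.sum(x ** 2), np.sum(xd ** 2)               # sequence energies
    rsn = 10.0 * np.log10(E / np.sum((x - xd) ** 2))      # treat the removed part as noise
    cc = np.corrcoef(x, xd)[0, 1]
    esn = 100.0 * E0 / E                                  # retained-energy percentage
    return rsn, cc, esn
```

A good denoiser should give a large Rsn, a CC close to 1, and an Esn close to 100%, which is how Table 2 is read in the Results section.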
Results
Although the datasets of both the Yongjiang and Beijiang monitoring stations were processed in this paper, to avoid repetition the main text focuses on the Yongjiang River. The detailed prediction results for the Beijiang River are given in Table 4.
Comparison of different noise reduction methods
CEEMDAN was used to decompose the Yongjiang water quality parameter series into subseries with different amplitudes and frequencies, comprising eight IMFs and one residual term (Fig. 4). As Fig. 4 shows, each subsequence exhibits obvious periodic variation. IMF1 has the highest frequency and the shortest wavelength; from IMF2 to IMF8, the frequency decreases in turn while the wavelength increases. IMF5 has the largest amplitude, fluctuating within [−0.69, 0.69]. The residual term represents the trend of the DO sequence and shows that the DO content of the Yongjiang River increased gradually from 2010 to 2016.
The FE was calculated for each component of the decomposition, and the results are shown in Fig. 5. The FE of IMF1 and IMF3 is significantly higher than that of the remaining components, so both can be regarded as noise components. After these two components are removed, the remaining components are summed to reconstruct the denoised water quality data.
To test the noise reduction effect of CEEMDAN-FE on water quality signals, the results are compared with those obtained by combining EMD, EEMD, complete ensemble empirical mode decomposition (CEEMD), and variational mode decomposition (VMD) with FE, respectively, and the noise reduction method is also to remove the first two items with higher FE.
Figure 6 shows the reconstructed sequence errors of the different noise reduction methods (EMD-FE, EEMD-FE, CEEMD-FE, etc.). The amplitude of the sequence error after CEEMDAN-FE joint noise reduction fluctuates within [−1.12, 0.56], which is significantly smaller than that of the other methods. Table 2 reports the noise reduction performance of the different methods. The results are basically consistent with Fig. 6: although CEEMDAN-FE is not the best on every one of the three evaluation indicators, its overall performance meets the needs of data noise reduction.
CEEMDAN-IALO-LSTM results
In this paper, the best mean square error (MSE) obtained by the LSTM during training is used as the fitness function, and the four parameters corresponding to the minimum MSE are the hyperparameters obtained from the optimization. The error convergence curve during training is shown in Fig. 7a; the error of CEEMDAN-ALO-LSTM converged quickly as the number of iterations increased. The fitness evolution curve of CEEMDAN-ALO-LSTM reached the required accuracy within 5 iterations and then maintained the optimal fitness value, showing strong learning ability. Compared with CEEMDAN-ALO-LSTM, the initial and final errors of CEEMDAN-IALO-LSTM are one order of magnitude smaller, and the model accuracy is improved substantially. Figure 7b shows the LSTM hyperparameters obtained by IALO optimization, i.e., L1 = 48, L2 = 69, K = 77, and Lr = 0.00865.
Comparison of different meta-heuristic optimization algorithms
To test the effectiveness of the proposed IALO, GWO, WOA, and ALO are used as the benchmark meta-optimization algorithm models in this paper. Table 3 shows the results of LSTM hyperparameters obtained by applying different meta-optimization algorithms.
The LSTM hyperparameters optimized by different meta-optimization algorithms are used for prediction, and the results are shown in Fig. 8. The meta-optimization algorithms play an important role in improving the prediction accuracy of the LSTM algorithm. Compared with LSTM, ALO-LSTM, IALO-LSTM, GWO-LSTM, and WOA-LSTM decreased by 10.58%, 23.38%, 20.60%, and 18.46% in MAE, 11.02%, 20.52%, 18.45%, and 19.80% in RMSE, and 10.57%, 22.09%, 21.95%, and 18.54% in MAPE, respectively. Results show that IALO-LSTM outperforms other models in terms of three error evaluation criteria.
Prediction results of different algorithms
To analyze the prediction-error distribution and evaluate the observation accuracy of each model, a Gaussian error-distribution fitting test was applied. After calculating statistical indicators such as the mean, standard deviation, and variance using Eq. (48), the error distributions were depicted, as shown in Fig. 9a. The prediction errors of the proposed hybrid model are distributed around zero; that is, the probability density is highest in the interval with the smallest errors, indicating high observation accuracy. The LSTM, autoregressive integrated moving average (ARIMA), and IALO-LSTM models show wider error distributions with higher probability density at larger errors, and the proposed model also achieves better observation accuracy than the BPNN, CEEMDAN-LSTM, and CEEMDAN-ALO-LSTM models.
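The Gaussian fitting test reduces to estimating the mean and standard deviation of each model's residuals and comparing the fitted densities. A minimal sketch, assuming hypothetical observed and predicted arrays:

```python
import numpy as np

def gaussian_error_fit(y_true, y_pred):
    """Fit a normal distribution to prediction residuals.

    Returns (mu, sigma, grid, pdf); a model whose residual density
    concentrates near zero has higher observation accuracy.
    """
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    mu, sigma = err.mean(), err.std(ddof=1)
    grid = np.linspace(err.min(), err.max(), 200)
    # Normal probability density evaluated on the residual range
    pdf = np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return mu, sigma, grid, pdf
```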
Figure 9b shows the prediction curves of the various models. The curve obtained by each method is basically consistent with the trend of the actual values, and none of the prediction curves exceeds the 95% confidence interval of the actual monitoring results. On closer local inspection, however, the prediction curve of the CEEMDAN-IALO-LSTM model is closer to the actual monitoring curve and has higher prediction accuracy than the other models. This also indicates that the proposed CEEMDAN-IALO-LSTM model is better suited to accurate short-term prediction of water quality parameters than the comparison models.
A Taylor diagram and a violin plot (Fig. 10) were used to compare the performance of the different prediction methods. According to Fig. 10a, all models achieved a CC between 0.6 and 0.95, and the CC of the proposed model exceeds 0.90. The root-mean-square deviation (RMSD) values are ordered as follows: CEEMDAN-IALO-LSTM < CEEMDAN-ALO-LSTM < BPNN < IALO-LSTM < CEEMDAN-LSTM < LSTM < ARIMA. The CEEMDAN-IALO-LSTM algorithm therefore has the highest accuracy among the peer models and can meet the requirements for short-term prediction of watershed water quality. According to Fig. 10b, the violin plot of the CEEMDAN-IALO-LSTM predictions is the most similar to that of the original measurements, with the median and frequency distribution closest to the actual values, indicating good prediction performance.
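A Taylor diagram positions each model by three linked statistics — the correlation coefficient, the model standard deviation, and the centred RMSD — which satisfy RMSD² = σ_o² + σ_m² − 2·σ_o·σ_m·CC. A sketch of computing them for one model (array names are hypothetical):

```python
import numpy as np

def taylor_stats(obs, mod):
    """Statistics placing one model on a Taylor diagram.

    Returns (cc, std_mod, crmsd): correlation, model standard deviation,
    and centred root-mean-square deviation, linked by
    crmsd^2 = std_obs^2 + std_mod^2 - 2 * std_obs * std_mod * cc.
    """
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    cc = np.corrcoef(obs, mod)[0, 1]
    std_m = mod.std()
    # Centred RMSD: biases removed before differencing
    crmsd = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
    return cc, std_m, crmsd
```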
Table 4 compares the prediction performance of the various models; bold values indicate the best performance among peer models. According to Table 4, CEEMDAN-IALO-LSTM performs best. Compared with the stand-alone LSTM, it reduced MAE, MAPE, and RMSE by 44.27%, 43.16%, and 44.34%, and increased R2, CC, and NSE by 53.77%, 24.00%, and 63.10%, respectively. Its improvement over the stand-alone LSTM was larger than that of the other models, and its error performance also surpassed several recently improved LSTM algorithms (Dong et al. 2019; Xu et al. 2019; Liu et al. 2020). In addition, the DO prediction results for the Yongjiang River are basically consistent with those for the Beijiang River, which demonstrates the practicability and robustness of the proposed model.
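The evaluation criteria reported in Table 4 follow their standard definitions; a compact sketch (assuming the observed series contains no zeros, as MAPE requires):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Standard error and goodness-of-fit criteria for water-quality prediction."""
    y, p = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.mean(np.abs(y - p))
    mape = np.mean(np.abs((y - p) / y)) * 100.0    # percent; y must be nonzero
    rmse = np.sqrt(np.mean((y - p) ** 2))
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    nse = 1.0 - ss_res / ss_tot                    # Nash-Sutcliffe efficiency
    cc = np.corrcoef(y, p)[0, 1]                   # Pearson correlation coefficient
    r2 = cc ** 2
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse,
            "R2": r2, "CC": cc, "NSE": nse}
```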
Discussion
In this paper, we verified the ability of CEEMDAN-IALO-LSTM to address the advanced prediction of water quality parameters. The error statistics for the Yongjiang and Beijiang rivers demonstrate that the proposed model, which combines the strong noise-resistant robustness of CEEMDAN-FE with the nonlinear mapping capability of the LSTM, outperforms the other models in prediction performance. Averaging the error statistics of the two gauging stations shows that, compared with the basic LSTM, CEEMDAN-IALO-LSTM reduced MAE, MAPE, and RMSE by 50.24%, 47.66%, and 49.66%, and increased R2, CC, and NSE by 71.74%, 31.42%, and 172.23%, respectively. The performance improvement of CEEMDAN-IALO-LSTM over the LSTM is significantly larger than that of the other models. The model developed in this study can be further applied in other regions to improve the forecasting of water quality parameters and other environmental indexes.
Because the short-term prediction model is sensitive to the original water quality series, and given the many factors that affect that series, future work should explore more effective noise-reduction pretreatment methods to further enhance algorithm performance. Candidate methods include the synchrosqueezing wavelet transform (SWT) (Zhang et al. 2020), the Savitzky-Golay (SG) filter (Bi et al. 2021), and threshold denoising based on the wavelet packet transform (Lang et al. 2018).
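Of these alternatives, the Savitzky-Golay filter is the most direct to try, since SciPy ships it as `scipy.signal.savgol_filter`. A sketch on a synthetic weekly DO-like series (the window length and polynomial order are illustrative and should be tuned per series):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic weekly DO-like series: a smooth seasonal signal plus noise
rng = np.random.default_rng(0)
t = np.arange(364)
clean = 8.0 + 1.5 * np.sin(2 * np.pi * t / 52.0)   # one-year seasonal cycle, mg/L scale
noisy = clean + rng.normal(0.0, 0.4, t.size)

# Savitzky-Golay smoothing; window_length must be odd and exceed polyorder
smoothed = savgol_filter(noisy, window_length=11, polyorder=3)

rmse_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_smooth = np.sqrt(np.mean((smoothed - clean) ** 2))
```

On this toy series the filter recovers the seasonal signal while preserving its shape, which is the property that makes SG filtering attractive as a pretreatment step.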
In this paper, only three base prediction models (LSTM, BPNN, and ARIMA) are analyzed. Several improvements of these models exist, for example, the seasonal autoregressive integrated moving average model (Wang et al. 2019), the gated recurrent unit (GRU), and the bidirectional LSTM (Chen et al. 2018). Therefore, more base models combined with various denoising methods should be compared to further establish the applicability of the IALO-LSTM model in watershed water quality prediction. In addition, it would be valuable to examine further ensemble learning models (Kim and Seo 2015) in the future.
Conclusion
The high-precision advance prediction of water quality enables early warning of water pollution and provides a reference basis for government decisions. This paper proposed a water quality prediction model based on CEEMDAN-IALO-LSTM, covering the framework design, the determination of model parameters, and the specification of the datasets used. The improved algorithm effectively addresses the traditional ALO algorithm's tendency to fall into local optima, its slow convergence in later iterations, and its premature convergence. The results indicate that the comprehensive performance of the CEEMDAN-IALO-LSTM model outperforms most existing prediction algorithms, with average MAE, MAPE, RMSE, R2, CC, and NSE of 0.42, 5.37%, 0.53, 0.83, 0.91, and 0.82 in the test stage, improvements of 50.24%, 47.66%, 49.66%, 71.74%, 31.42%, and 172.23%, respectively, over the baseline LSTM. In general, this study supports the applicability and robustness of the CEEMDAN-IALO-LSTM algorithm in water quality prediction and, to a certain extent, extends the application of ensemble learning technology.
Parameters used to characterize water quality include DO, chemical oxygen demand (COD), total nitrogen (TN), and total phosphorus (TP). However, only the DO indicator of the Yongjiang and Beijiang rivers was selected in this paper, owing to the completeness of its monitoring data. In future research, to better verify model robustness, more complete monitoring data should be selected and multi-parameter indicators from multiple regions should be included in prediction. Principal component analysis or gray correlation analysis can be used to extract the key indicators.
Although the proposed model achieved acceptable accuracy, other meta-heuristic optimization algorithms and improved base models could be employed and integrated with preprocessing denoising methods such as the SWT, the Savitzky-Golay (SG) filter, and threshold denoising based on the wavelet packet transform. This would yield more accurate models for the prediction of DO and other environmental indicators. To strengthen the validation of the applicability and robustness of the proposed model, we recommend using large data samples with various indicators at different monitoring sites.
Data availability
All data generated or analyzed during this study are included in this published article.
References
Abba SI, Pham QB, Saini G, Linh NTT et al (2020) Implementation of data intelligence models coupled with ensemble machine learning for prediction of water quality index. Environ Sci Pollut Res 27(33):41524–41539. https://doi.org/10.1007/s11356-020-09689-x
Ahmed AN, Othman FB, Afan HA et al (2019) Machine learning methods for better water quality prediction. J Hydrol 578:124084. https://doi.org/10.1016/j.jhydrol.2019.124084
Aldhyani TH, Al-Yaari M, Alkahtani H, Maashi M (2020) Water quality prediction using artificial intelligence algorithms. Appl Bionics Biomech. https://doi.org/10.1155/2020/6659314
Banadkooki FB, Ehteram M, Ahmed AN et al (2020) Suspended sediment load prediction using artificial neural network and ant lion optimization algorithm. Environ Sci Pollut Res 27(30):38094–38116. https://doi.org/10.1007/s11356-020-09876-w
Bi J, Lin Y, Dong Q, Yuan H, Zhou M (2021) Large-scale water quality prediction with integrated deep neural network. Inf Sci 571:191–205. https://doi.org/10.1016/j.ins.2021.04.057
Chen XY, Chau KW (2019) Uncertainty analysis on hybrid double feedforward neural network model for sediment load estimation with LUBE method. Water Resour Manag 33(10):3563–3577. https://doi.org/10.1007/s11269-019-02318-4
Chen X, Feng F, Wu J, Liu W (2018) Anomaly detection for drinking water quality via deep biLSTM ensemble. In: Proceedings of the genetic and evolutionary computation conference companion (pp. 3–4). https://doi.org/10.1145/3205651.3208203
Chen Y, Song L, Liu Y, Yang L, Li D (2020) A review of the artificial neural network models for water quality prediction. Appl Sci 10(17):5776. https://doi.org/10.3390/app10175776
de Boves HP (2017) Support vector machine classification trees based on fuzzy entropy of classification. Anal Chim Acta 954:14–21. https://doi.org/10.1016/j.aca.2016.11.072
Deng W, Wang G, Zhang X (2015) A novel hybrid water quality time series prediction method based on cloud model and fuzzy forecasting. Chemom Intell Lab Syst 149:39–49. https://doi.org/10.1016/j.chemolab.2015.09.017
Dong Q, Lin Y, Bi J, Yuan H (2019) An integrated deep neural network approach for large-scale water quality time series prediction. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) (pp. 3537–3542). IEEE. https://doi.org/10.1109/SMC.2019.8914404
Fan S, Hao D, Feng Y, Xia K, Yang W (2021) A Hybrid model for air quality prediction based on data decomposition. Information 12(5):210. https://doi.org/10.3390/info12050210
Gorai AK, Hasni SA, Iqbal J (2016) Prediction of ground water quality index to assess suitability for drinking purposes using fuzzy rule-based approach. Appl Water Sci 6(4):393–405. https://doi.org/10.1007/s13201-014-0241-3
Heidari AA, Faris H, Mirjalili S, Aljarah I, Mafarja M (2020) Ant lion optimizer: theory, literature review, and application in multi-layer perceptron neural networks. Nat-Inspired Optim. https://doi.org/10.1007/978-3-030-12127-3_3
Hu Z, Zhang Y, Zhao Y, Xie M, Zhong J, Tu Z, Liu J (2019) A water quality prediction method based on the deep LSTM network considering correlation in smart mariculture. Sensors 19(6):1420. https://doi.org/10.3390/s19061420
Khadr M, Elshemy M (2017) Data-driven modeling for water quality prediction case study: the drains system associated with Manzala Lake, Egypt. Ain Shams Eng J 8(4):549–557. https://doi.org/10.1016/j.asej.2016.08.004
Kim SE, Seo IW (2015) Artificial neural network ensemble modeling with conjunctive data clustering for water quality prediction in rivers. J Hydro-Environ Res 9(3):325–339. https://doi.org/10.1016/j.jher.2014.09.006
Lan KT, Lan CH (2008) Notes on the distinction of Gaussian and Cauchy mutations. In: 2008 Eighth International Conference on Intelligent Systems Design and Applications (Vol. 1, pp. 272–277). IEEE. https://doi.org/10.1109/ISDA.2008.237
Lang X, Hu Z, Li P, Li Y, Cao J, Ren H (2018) Pipeline leak aperture recognition based on wavelet packet analysis and a deep belief network with ICR. Wirel Commun Mob Comput. https://doi.org/10.1155/2018/6934825
Li M, Wu W, Chen B, Guan L, Wu Y (2017a) Water quality evaluation using back propagation artificial neural network based on self-adaptive particle swarm optimization algorithm and chaos theory. Comput Water, Energy, Environ Eng 6(03):229. https://doi.org/10.4236/cweee.2017.63016
Li X, Cheng Z, Yu Q, Bai Y, Li C (2017b) Water-quality prediction using multimodal support vector regression: case study of Jialing River, China. J Environ Eng 143(10):04017070. https://doi.org/10.1061/(ASCE)EE.1943-7870.0001272
Li L, Jiang P, Xu H, Lin G, Guo D, Wu H (2019) Water quality prediction based on recurrent neural network and improved evidence theory: a case study of Qiantang River, China. Environ Sci Pollut Res 26(19):19879–19896. https://doi.org/10.1007/s11356-019-05116-y
Liu D, Cheng C, Fu Q et al (2018) Multifractal detrended fluctuation analysis of regional precipitation sequences based on the CEEMDAN-WPT. Pure Appl Geophys 175(8):3069–3084. https://doi.org/10.1007/s00024-018-1820-2
Liu J, Yu C, Hu Z, Zhao Y, Bai Y, Xie M, Luo J (2020) Accurate prediction scheme of water quality in smart mariculture with deep Bi-S-SRU learning network. IEEE Access 8:24784–24798. https://doi.org/10.1109/ACCESS.2020.2971253
Lu H, Ma X (2020) Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere 249:126169. https://doi.org/10.1016/j.chemosphere.2020.126169
Malik A, Tikhamarine Y, Souag-Gamane D, Kisi O, Pham QB (2020) Support vector regression optimized by meta-heuristic algorithms for daily streamflow prediction. Stoch Env Res Risk Assess 34(11):1755–1773. https://doi.org/10.1007/s00477-020-01874-1
Mani M, Bozorg-Haddad O, Chu X (2018) Ant lion optimizer (ALO) algorithm. Advanced optimization by nature-inspired algorithms. Springer, Singapore, pp 105–116
Mao Q, Zhang Q (2021) Improved sparrow algorithm combining Cauchy mutation and opposition-based learning. J Front Comp Sci Technol. https://doi.org/10.3778/j.issn.1673-9418.2010032
Meng E, Huang S, Huang Q et al (2019) A robust method for non-stationary streamflow prediction based on improved EMD-SVM model. J Hydrol 568:462–478. https://doi.org/10.1016/j.jhydrol.2018.11.015
Mirjalili S (2015) The ant lion optimizer. Adv Eng Softw 83:80–98. https://doi.org/10.1016/j.advengsoft.2015.01.010
Najah A, El-Shafie A, Karim OA, El-Shafie AH (2014) Performance of ANFIS versus MLP-NN dissolved oxygen prediction models in water quality monitoring. Environ Sci Pollut Res 21(3):1658–1670. https://doi.org/10.1007/s11356-013-2048-4
Nash JE, Sutcliffe JV (1970) River flow forecasting through conceptual models part I—a discussion of principles. J Hydrol 10(3):282–290. https://doi.org/10.1016/0022-1694(70)90255-6
Rahnamayan S, Tizhoosh HR, Salama MM (2008) Opposition-based differential evolution. IEEE Trans Evol Comput 12(1):64–79. https://doi.org/10.1109/TEVC.2007.894200
Rahnamayan S, Wang GG, Ventresca M (2012) An intuitive distance-based explanation of opposition-based sampling. Appl Soft Comput 12(9):2828–2839. https://doi.org/10.1016/j.asoc.2012.03.034
Rajaee T, Jafari H (2018) Utilization of WGEP and WDT models by wavelet denoising to predict water quality parameters in rivers. J Hydrol Eng 23(12):04018054. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001700
Rajaee T, Khani S, Ravansalar M (2020) Artificial intelligence-based single and hybrid models for prediction of water quality in rivers: a review. Chemom Intell Lab Syst 200:103978. https://doi.org/10.1016/j.chemolab.2020.103978
Singh P (2018) Indian summer monsoon rainfall (ISMR) forecasting using time series data: a fuzzy-entropy-neuro based expert system. Geosci Front 9(4):1243–1257. https://doi.org/10.1016/j.gsf.2017.07.011
Song C, Yao L (2022) Application of artificial intelligence based on synchrosqueezed wavelet transform and improved deep extreme learning machine in water quality prediction. Environ Sci Pollut Res. https://doi.org/10.1007/s11356-022-18757-3
Song C, Yao L, Hua C, Ni Q (2021a) A novel hybrid model for water quality prediction based on synchrosqueezed wavelet transform technique and improved long short-term memory. J Hydrol 603:126879. https://doi.org/10.1016/j.jhydrol.2021.126879
Song C, Yao L, Hua C, Ni Q (2021b) A water quality prediction model based on variational mode decomposition and the least squares support vector machine optimized by the sparrow search algorithm (VMD-SSA-LSSVM) of the Yangtze River, China. Environ Monit Assess 193(6):1–17. https://doi.org/10.1007/s10661-021-09127-6
Sun L, Chen S, Xu J, Tian Y (2019) Improved monarch butterfly optimization algorithm based on opposition-based learning and random local perturbation. Complexity. https://doi.org/10.1155/2019/4182148
Tizhoosh HR (2005) Opposition-based learning: a new scheme for machine intelligence. In: International conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC'06) (Vol. 1, pp. 695–701). IEEE. https://doi.org/10.1109/CIMCA.2005.1631345
Torres ME, Colomina MA, Schlotthauer G, Flandrin P (2011) A complete ensemble empirical mode decomposition with adaptive noise. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4144–4147). IEEE. https://doi.org/10.1109/ICASSP.2011.5947265
Wang Y, Zhou J, Chen K, Wang Y, Liu L (2017) Water quality prediction method based on LSTM neural network. In: 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE) (pp. 1–5). IEEE. https://doi.org/10.1109/ISKE.2017.8258814
Wang Y, Zhou X, Engel B (2018) Water environment carrying capacity in Bosten Lake basin. J Clean Prod 199:574–583. https://doi.org/10.1016/j.jclepro.2018.07.202
Wang J, Zhang L, Zhang W, Wang X (2019) Reliable model of reservoir water quality prediction based on improved ARIMA method. Environ Eng Sci 36(9):1041–1048. https://doi.org/10.1089/ees.2018.0279
Wang J, Xu Y, She C, Xu P, Bagal HA (2022) Optimal parameter identification of SOFC model using modified gray wolf optimization algorithm. Energy 240:122800. https://doi.org/10.1016/j.energy.2021.122800
Xu R, Xiong Q, Yi H, Wu C, Ye J (2019) Research on water quality prediction based on SARIMA-LSTM: a case study of Beilun Estuary. In: 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) (pp. 2183–2188). IEEE. https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00302
Yang HEJ (2009) An adaptive chaos immune optimization algorithm with mutative scale and its application. Control Theory Appl 6(10):1069–1074
Yang W, Xia K, Fan S, Wang L, Li T, Zhang J, Feng Y (2022) A multi-strategy whale optimization algorithm and its application. Eng Appl Artif Intell 108:104558. https://doi.org/10.1016/j.engappai.2021.104558
Yu Y, Zhang H, Singh VP (2018) Forward prediction of runoff data in data-scarce basins with an improved ensemble empirical mode decomposition (EEMD) model. Water 10(4):388. https://doi.org/10.3390/w10040388
Zhai X, Xia J, Zhang Y (2014) Water quality variation in the highly disturbed Huai River Basin, China from 1994 to 2005 by multi-statistical analyses. Sci Total Environ 496:594–606. https://doi.org/10.1016/j.scitotenv.2014.06.101
Zhang Y, Wen X, Jiang L, Liu J, Yang J, Liu S (2020) Prediction of high-quality reservoirs using the reservoir fluid mobility attribute computed from seismic data. J Petrol Sci Eng 190:107007. https://doi.org/10.1016/j.petrol.2020.107007
Zhou Y (2020) Real-time probabilistic forecasting of river water quality under data missing situation: deep learning plus post-processing techniques. J Hydrol 589:125164. https://doi.org/10.1016/j.jhydrol.2020.125164
Zhou J, Wang Y, Xiao F, Wang Y, Sun L (2018) Water quality prediction method based on IGRA and LSTM. Water 10(9):1148. https://doi.org/10.3390/w10091148
Zou Q, Xiong Q, Li Q, Yi H, Yu Y, Wu C (2020) A water quality prediction method based on the multi-time scale bidirectional long short-term memory network. Environ Sci Pollut Res. https://doi.org/10.1007/s11356-020-08087-7
Zubaidi SL, Al-Bugharbee H, Ortega-Martorell S et al (2020) A novel methodology for prediction urban water demand by wavelet denoising and adaptive neuro-fuzzy inference system approach. Water 12(6):1628. https://doi.org/10.3390/w12061628
Acknowledgements
The authors would like to thank the editors and anonymous reviewers for their constructive comments that greatly contributed to improving the manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Cite this article
Song, C., Yao, L. A hybrid model for water quality parameter prediction based on CEEMDAN-IALO-LSTM ensemble learning. Environ Earth Sci 81, 262 (2022). https://doi.org/10.1007/s12665-022-10380-2