Introduction

The development of society cannot be separated from healthy water systems. In recent years, the intensification of human production activities has led to serious environmental pollution in some river basins (Zhai et al. 2014; Wang et al. 2018), which hinders the construction of ecological civilization in watershed areas. Advance prediction of water quality is a key aspect of watershed water resource management. Therefore, high-accuracy water quality prediction is essential both for preventing health incidents caused by water pollution and for the comprehensive and effective control of the watershed environment.

The existing research methodology for water quality prediction mainly includes deterministic and statistical methods (Ahmed et al. 2019). Among them, deterministic models require informative source data that are difficult to obtain in practice (Gorai et al. 2016), and in many cases the various parameters that must be determined are estimated from experience, resulting in limited accuracy. In contrast, statistical methods avoid the complexity and burden of mechanistic modeling and show good performance owing to data-driven statistical modeling techniques (Najah et al. 2014; Khadr and Elshemy 2017).

In recent years, statistical water quality prediction models, such as artificial neural networks, have developed rapidly (Chen et al. 2020; Aldhyani et al. 2020). Among them, the recurrent neural network (RNN) can deal with the backward and forward dependence of time-series data because of its built-in feedback and cyclic structure. However, traditional RNNs suffer from vanishing or exploding gradients and cannot capture long-term dependence, which limits their practical application (Li et al. 2019). As a variant of RNN, the long short-term memory (LSTM) network can effectively capture long-term dependence among time-series data by introducing memory units into the network structure, which makes it more advantageous for time-series prediction problems (Wang et al. 2017; Zhou et al. 2018).

Notably, the water quality prediction performance of machine learning models depends largely on the selection of model hyperparameters (Ahmed et al. 2019), since appropriate hyperparameters can remarkably increase prediction efficiency and decrease prediction cost. However, there is no formal mathematical method for determining the optimal hyperparameters of an LSTM (Hu et al. 2019). To overcome this drawback, meta-heuristic algorithms, such as the particle swarm optimization (PSO) algorithm, the pigeon-inspired optimization algorithm, and the ant lion optimization algorithm (ALO), can be employed to determine the hyperparameter values. Among them, ALO is a bionic intelligent optimization algorithm proposed by Mirjalili (2015). Because of its few initial parameters and high convergence accuracy, ALO has been widely used in many engineering fields involving parameter optimization. Studies show that the optimization performance of ALO is better than that of seven other intelligent algorithms, including PSO, the genetic algorithm, and the cuckoo search algorithm (Mani et al. 2018). Therefore, the application of ALO to hyperparameter optimization of water quality prediction models is worth further exploration. However, like many intelligent algorithms, ALO has disadvantages such as premature convergence and a tendency to fall into local optima (Banadkooki et al. 2020). To address this problem, an improved ALO (IALO) based on chaotic initialization (Li et al. 2017a, b), Cauchy mutation (Rajaee et al. 2020), and opposition-based learning (OBL) (Sun et al. 2019) was proposed in this paper. First, Sin chaotic mapping was used to initialize the population, adaptively adjusting the chaotic search space to improve poorly adapted individuals, raise the overall fitness of the population, and increase search efficiency. Second, the optimal position, i.e., the target value, was updated through mutation by integrating the Cauchy mutation operator and the OBL strategy, which improves the ability of the algorithm to jump out of local optima.

The rapid development of modern water quality monitoring technology has generated a large amount of data on water quality parameters, the potential laws of which need to be urgently explored. Due to the influence of hydrometeorology, human activities, monitoring equipment, and monitoring methods, the original water quality data, like most time-series, are characterized by certain nonlinearities and nonstationarities, and the data contain noise, which is useless or even detrimental for prediction (Song et al. 2021a, b). To improve the quality of data mining, appropriate noise reduction methods should be selected to reduce the noise of the monitored water quality data. The key to data denoising is to find out the noise components and eliminate them. Therefore, it is necessary to pre-process the original water quality series to eliminate the disturbing information in the original time-series to improve the prediction accuracy.

Currently, the commonly used noise reduction methods for nonlinear signals include the wavelet transform (Zubaidi et al. 2020; Song et al. 2022), empirical mode decomposition (EMD) (Lu and Ma 2020), and ensemble empirical mode decomposition (EEMD) (Li et al. 2017a, b). Among them, the denoising effect of the wavelet transform depends mainly on the choice of wavelet basis (Rajaee and Jafari 2018). EMD is prone to mode mixing during decomposition, which degrades the decomposition results (Meng et al. 2019). EEMD improves on EMD by adding auxiliary noise to eliminate the mode mixing that occurs in EMD and by offsetting the effects of the added noise through multiple trials. However, after ensemble averaging over a finite number of trials, its reconstructed components still contain residual noise of a certain amplitude; although the reconstruction error can be reduced by increasing the ensemble size, this increases the computational cost (Yu et al. 2018). Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) was proposed on the basis of EEMD. It adds adaptive white noise at each stage of the decomposition and obtains each modal component by computing a unique residual signal. Compared with EEMD, its reconstruction error is almost zero regardless of the ensemble size (Abba et al. 2020), its decomposition is complete, and it overcomes the low efficiency of EEMD.

The components produced by CEEMDAN differ in frequency, and each component carries different indicative significance. Existing research usually treats the high-frequency components as noise, which lacks a rigorous scientific basis. As an improvement on sample entropy and approximate entropy, fuzzy entropy (FE) is also a measure of time-series complexity and has been successfully applied to feature extraction from water quality signals (Liu et al. 2018). In this paper, CEEMDAN and FE were combined: the noise components were identified and eliminated by fuzzy entropy, and the denoised water quality sequence reconstructed from the remaining components was used for water quality prediction.

According to the literature, there are few studies that apply IALO-LSTM to water quality parameter prediction; therefore, the prediction performance of the improved model is worth exploring and analyzing. The main objectives of the present study are: (1) to combine CEEMDAN and FE for noise reduction of the original time-series; (2) to improve the ALO with Sin chaotic initialization, Cauchy mutation, and the OBL strategy; (3) to construct a novel optimal-hybrid model combining IALO and LSTM for water quality prediction. Compared with existing models, our method makes the following major contributions to the literature: (1) a novel data cleaning method is proposed for the time-series; (2) an alternative framework for water quality prediction with high accuracy is established. The remainder of this paper is organized as follows. The section ‘Material and methods’ provides a brief description of the methodology; the section ‘Case study’ introduces the general information of the research region, the datasets, and the error evaluation criteria used in this study; the section ‘Results’ presents experimental results comparing the performance of the proposed model with peer algorithms. Discussion of the experimental results is presented in the section ‘Discussion’, and the conclusions and future work are given in the section ‘Conclusion’.

Material and methods

CEEMDAN-FE

CEEMDAN is a time–frequency domain analysis method that further suppresses mode mixing by adding adaptive noise. It possesses strong adaptivity and convergence and is mostly used to deal with nonlinear and non-stationary signals. More details about CEEMDAN can be found in the literature (Torres et al. 2011; Fan et al. 2021).

FE is a measure that describes the complexity of a given sequence; it is an enhanced version of the sample entropy and approximate entropy methods (de Boves Harrington 2017) and characterizes the similarity of two adjacent vectors. It is used to measure the uncertainty of random variables and to grade the degree of uncertainty. FE not only retains the advantages of sample entropy and approximate entropy, but also resolves their imprecision under small fluctuations and baseline drift, which arises from determining vector similarity by absolute value differences of the data. As a result, FE yields a valid entropy value even for very small parameter values and changes stably as the parameters are adjusted. The implementation steps of the FE algorithm can be found in the literature (Singh 2018).
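The FE computation described above can be sketched as follows. The function name and parameter defaults (embedding dimension m = 2, tolerance r = 0.2 times the series standard deviation, fuzzy power n = 2) are common illustrative choices, not values specified in this paper.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=None, n=2):
    """Fuzzy entropy of a 1-D series (sketch of the standard FE algorithm).

    m: embedding dimension, r: tolerance (default 0.2 * std), n: fuzzy power.
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)

    def phi(m):
        N = len(x) - m
        # Embedded vectors with their own mean removed (handles baseline drift)
        X = np.array([x[i:i + m] for i in range(N)])
        X = X - X.mean(axis=1, keepdims=True)
        # Chebyshev distance between all vector pairs
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        # Fuzzy membership (exponential similarity), excluding self-matches
        D = np.exp(-(d ** n) / r)
        np.fill_diagonal(D, 0.0)
        return D.sum() / (N * (N - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

A noisier series yields a higher FE, which is the property the component-screening step relies on.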

LSTM

LSTM is a special kind of RNN. Building on the ordinary RNN, LSTM gains long-term memory through the addition of a memory cell and can effectively avoid the vanishing gradient problem that RNNs suffer from in long-term dependent sequence models (Zou et al. 2020). The LSTM model includes four parts: the input gate, the forget gate, the output gate, and the cell state. The input gate determines how much input information is transmitted to the cell state; the forget gate controls how much information from the previous cell state is forgotten and how much is passed to the current moment; the output gate produces the output based on the cell state updated by the forget gate and the input gate. The cell state records the current input, the hidden state at the previous time step, the cell state at the previous time step, and the information in the gate structures. The specific steps of the LSTM algorithm can be found in the literature (Zhou 2020).
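As an illustration of the gate mechanism described above, a single LSTM time step can be sketched in NumPy. All names and the parameter layout are hypothetical; a practical model would use an established deep learning framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative sketch of the gate equations).

    W, U, b each hold the parameters of the four parts in order:
    input gate (i), forget gate (f), candidate state (g), output gate (o).
    """
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # candidate cell state
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate
    c_t = f * c_prev + i * g        # cell state: retain old info, add new
    h_t = o * np.tanh(c_t)          # hidden state output
    return h_t, c_t
```

The forget gate `f` scales the previous cell state, while the input gate `i` scales the new candidate, which is exactly the division of labor described in the text.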

IALO

For the LSTM model selected in this paper, four parameters have an important impact on the performance of the algorithm: the numbers of neurons in the LSTM hidden layers, L1 and L2 (the numbers of LSTM units in the first and second layers), the learning rate Lr, and the number of training iterations K. These four key parameters are used as the search dimensions of the particles, and the LSTM model is tuned and optimized using the IALO algorithm.

In the ALO algorithm, the random wandering of the ant population around the elite antlions ensures the convergence of the parameter optimization process, and the roulette operation helps to improve the global search ability of the ant population to a certain extent. The specific steps of the ALO algorithm can be found in the literature (Heidari et al. 2020; Malik et al. 2020). However, the algorithm still suffers from the following problems: (1) The positions of both the initial ant and antlion populations are generated randomly; the resulting positions are not representative, which may lead to a large number of solutions concentrated in local areas and insufficient population diversity in the initial stage of the algorithm. (2) If the elite and the roulette-selected individuals of a given generation are not in the globally optimal region, the entire population, led by a single elite, converges more slowly. (3) The predation radius of the antlion shrinks in stages as the number of iterations increases, which gradually reduces population diversity; once the algorithm falls into a local optimum, it is difficult to jump out. To solve the above problems, this paper focuses on the following three strategies.

(1) Sin chaotic initialization

The quality of the initial population can be improved by introducing chaotic mapping sequences during population initialization, which enhances population diversity and prevents premature convergence. Chaotic sequences are often used in optimization search problems because of their randomness, ergodicity, and sensitivity to initial values. The basic idea is to use a chaotic map to introduce chaotic states into the optimization variables, stretch the ergodic range of the chaotic motion onto the value range of the optimization variables, and then use the chaotic variables for the search. The Tent, Logistic, and Sin maps are common chaotic models; the first two have a finite number of mapping folds, whereas the Sin chaotic model has an infinite number. The literature (Yang and E, 2009) pointed out that Sin chaos is superior to other similar models in terms of chaotic characteristics, so Sin chaos is used as the population initialization method of the ALO algorithm. The expression for the one-dimensional self-map of Sin chaos is as follows:

$$\left\{ {\begin{array}{*{20}l} {X_{i + 1} = \sin \left( {2/X_{i} } \right)} \hfill & {i = 0,1, \ldots ,N} \hfill \\ { - 1 \le X_{i} \le 1} \hfill & {X_{i} \ne 0} \hfill \\ \end{array} } \right.,$$
(1)

where N is the number of initial individuals. The initial value cannot be set to 0, which prevents the map from generating fixed points and zeros in [− 1,1].
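A minimal sketch of population initialization with the Sin map of Eq. (1) might look as follows; the function name, the nonzero seed value, and the linear mapping from [−1, 1] onto the search bounds are illustrative assumptions.

```python
import numpy as np

def sin_chaos_init(pop_size, dim, lb, ub, x0=0.35):
    """Initialize a population with the Sin chaotic map of Eq. (1) (sketch).

    Chaotic values in [-1, 1] are generated by X_{i+1} = sin(2 / X_i)
    and then linearly mapped onto the search bounds [lb, ub].
    """
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    seq = np.empty(pop_size * dim)
    x = x0  # must be nonzero: x = 0 is excluded by Eq. (1)
    for i in range(seq.size):
        x = np.sin(2.0 / x)
        seq[i] = x
    chaos = seq.reshape(pop_size, dim)
    # Map chaotic values from [-1, 1] into [lb, ub]
    return lb + (chaos + 1.0) / 2.0 * (ub - lb)
```

For the LSTM hyperparameters of this paper, the bounds would be, e.g., lb = [1, 1, 1, 0.001] and ub = [100, 100, 100, 0.01].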

(2) Integration of Cauchy mutation and OBL

Swarm intelligence algorithms often apply mutation strategies to improve convergence accuracy. Among them, Gaussian mutation and Cauchy mutation are two of the more common perturbation strategies (Rahnamayan et al. 2012). The Gaussian distribution is a continuous probability distribution, while the Cauchy distribution is a continuous probability distribution with no mathematical expectation. From the probability density curves of the standard Cauchy and Gaussian distributions (Lan and Lan 2008), the Cauchy distribution covers a wider range on the x-axis than the Gaussian distribution. Consequently, Cauchy mutation expands the solution-space range of the particle search, allowing mutated particles to find more and better feasible solutions, and the Cauchy mutation strategy is better able to prevent the algorithm from falling into local optima.

The probability density function of Cauchy distribution is as follows:

$$f(x) = \frac{1}{{\uppi }} \cdot \frac{m}{{m^{2} + x^{2} }},\quad - \infty < x < + \infty ,$$
(2)

where m is the scale parameter.

The Cauchy mutation is introduced into the update formula of the optimal antlion position to exploit the perturbation ability of the Cauchy operator, so that the global optimization performance of the algorithm can be improved:

$$R_{E}^{t + 1} = R_{E}^{t} + \mathrm{Cauchy}(0,1) \cdot R_{E}^{t} ,$$
(3)

where \(R_{E}^{t}\) denotes the position of the elite antlion after random wandering at the tth iteration, and Cauchy (0,1) is a standard Cauchy-distributed random value.

The Cauchy distribution random variable generating function is

$$\eta = \tan [(\xi - 0.5)\pi ],$$
(4)

where ξ is a uniformly distributed random value in the interval (0,1).
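Eqs. (3) and (4) together can be sketched as follows; the function name is hypothetical, and a standard Cauchy value is drawn per dimension via the inverse-transform formula of Eq. (4).

```python
import numpy as np

def cauchy_mutation(elite, rng=None):
    """Cauchy mutation of the elite antlion position, Eqs. (3)-(4) (sketch).

    A standard Cauchy random value is generated per dimension via
    eta = tan((xi - 0.5) * pi), xi ~ U(0, 1), then applied as
    R^{t+1} = R^t + Cauchy(0,1) * R^t.
    """
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.uniform(0.0, 1.0, size=np.shape(elite))
    eta = np.tan((xi - 0.5) * np.pi)   # standard Cauchy random values
    return elite + eta * elite
```

The heavy tails of the Cauchy distribution mean that occasional large jumps occur, which is exactly the escape behavior motivated above.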

The concept of OBL was proposed by Tizhoosh (2005); its main idea is to evaluate both the current solution and its opposite solution and select the better candidate. The literature (Rahnamayan et al. 2008) shows that finding an optimal solution with OBL takes less time than with a purely stochastic model, that the opposite solution has a nearly 50% higher probability of being close to the global optimum than the current solution, and that evaluating both the current and opposite solutions of a candidate accelerates convergence. To enable the optimal antlion individual to better find the optimal solution, the OBL strategy is incorporated into the antlion algorithm, characterized mathematically as follows:

$$R_{E}^{{t^{^{\prime}} }} = ub + r \cdot \left( {lb - R_{E}^{t} } \right)$$
(5)
$$R_{E}^{t + 1} = R_{E}^{{t^{^{\prime}} }} + b_{1} \cdot \left( {R_{E}^{t} - R_{E}^{{t^{^{\prime}} }} } \right),$$
(6)

where \(R_{E}^{{t^{^{\prime}} }}\) is the opposite solution of the optimal solution in the tth generation; ub and lb are the upper and lower bounds, respectively; r is a 1 × d matrix (d is the spatial dimension) of random numbers obeying the standard uniform distribution on (0,1); and b1 denotes the information exchange control parameter, given by:

$$b_{1} = \left( {\frac{T - t}{T}} \right)^{t} ,$$
(7)

where t is the current iteration, and T denotes the maximum number of iterations.
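A sketch of the OBL update of Eqs. (5)–(7), assuming b1 in Eq. (7) is read as ((T − t)/T)^t; the function name and the per-dimension uniform random draw are illustrative.

```python
import numpy as np

def obl_update(elite, lb, ub, t, T, rng=None):
    """Opposition-based learning update of the elite position, Eqs. (5)-(7) (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    r = rng.uniform(0.0, 1.0, size=np.shape(elite))   # uniform random vector
    opposite = ub + r * (lb - elite)                  # Eq. (5): opposite solution
    b1 = ((T - t) / T) ** t                           # Eq. (7): control parameter
    return opposite + b1 * (elite - opposite)         # Eq. (6): information exchange
```

As t grows, b1 shrinks toward zero, so late iterations rely more heavily on the opposite solution than on the current elite.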

(3) Integration of dynamic selection and greedy rule

To further improve the algorithm's optimization-seeking performance, this paper draws on the dynamic selection strategy proposed by Mao and Zhang (2020) to update the target location by alternating the OBL and the Cauchy mutation strategy to dynamically update the target location under certain probability. In the OBL strategy, the backward solution is obtained through the OBL mechanism to expand the search field of the algorithm, and in the Cauchy mutation strategy, the Cauchy mutation operator is applied to perform a perturbation variation operation at the optimal solution position to derive a new solution, which improves the defect of the algorithm falling into the local region. As for which strategy to adopt for target position update, it is determined by the selection probability Ps, which is calculated as follows:

$$P_{s} = - \exp \left( {1 - t/T} \right)^{20} + \theta ,$$
(8)

where θ is the adjustment parameter, and its value can be taken as 0.05.

The specific strategy is chosen in the following way.

If rand < Ps, the OBL strategy is selected for position update according to Eqs. (5)–(7); otherwise, the Cauchy mutation strategy is selected for target position update according to (3).

Although the two perturbation strategies above enhance the ability of the algorithm to escape local optima, the new position obtained after the perturbation is not guaranteed to have a better fitness value than the original position. Therefore, after the perturbation update, the greedy rule (Deng et al. 2015) is introduced to decide whether the position should be updated by comparing the fitness values of the old and new positions. The greedy rule is shown in Eq. (9), where f(x) denotes the fitness value of position x:

$$\left\{ {\begin{array}{*{20}l} {R_{E}^{t} = \mathrm{Antlion}_{j}^{t + 1} } \hfill & {f\left( {\mathrm{Antlion}_{j}^{t + 1} } \right) < f\left( {R_{E}^{t} } \right)} \hfill \\ {R_{E}^{t} = R_{E}^{t} } \hfill & {f\left( {\mathrm{Antlion}_{j}^{t + 1} } \right) \ge f\left( {R_{E}^{t} } \right)} \hfill \\ \end{array} } \right.,$$
(9)

where \(Antlion_{j}^{t}\) denotes the position of the jth antlion in the tth iteration.
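The dynamic selection between the two strategies and the greedy acceptance of Eq. (9) can be sketched in one routine. Names are hypothetical, fitness is assumed to be minimized, and the formulas follow Eqs. (3)–(9) as written, with b1 of Eq. (7) read as ((T − t)/T)^t.

```python
import numpy as np

def perturb_elite(elite, fitness, lb, ub, t, T, theta=0.05, rng=None):
    """Dynamic strategy selection with greedy acceptance, Eqs. (3)-(9) (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    elite = np.asarray(elite, float)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    ps = -np.exp(1.0 - t / T) ** 20 + theta          # Eq. (8): selection probability
    if rng.uniform() < ps:
        # OBL strategy, Eqs. (5)-(7)
        r = rng.uniform(size=elite.shape)
        opposite = ub + r * (lb - elite)
        b1 = ((T - t) / T) ** t
        candidate = opposite + b1 * (elite - opposite)
    else:
        # Cauchy mutation strategy, Eqs. (3)-(4)
        eta = np.tan((rng.uniform(size=elite.shape) - 0.5) * np.pi)
        candidate = elite + eta * elite
    # Greedy rule, Eq. (9): accept the new position only if fitness improves
    return candidate if fitness(candidate) < fitness(elite) else elite
```

Because of the greedy rule, the returned position never has a worse fitness than the input elite.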

The design of the water quality forecasting model based on CEEMDAN-IALO-LSTM

The methodology of the CEEMDAN-IALO-LSTM water quality prediction model is shown in Fig. 1.

Fig. 1
figure 1

Flowchart of the CEEMDAN-IALO-LSTM algorithm

(1) CEEMDAN is used to decompose the water quality time-series into K modal components and one residual component. The FE of each component is calculated, the two components with the largest FE are removed to achieve noise reduction, and the remaining components are summed for reconstruction.

(2) The reconstructed time-series are divided into a training set and a test set and normalized.

(3) The traditional ALO is improved using Sin chaotic initialization, Cauchy mutation, and OBL to build the IALO.

(4) The hyperparameters (L1, L2, Lr, K) of the LSTM are determined using IALO to construct the IALO-LSTM hybrid prediction model.

(5) The denoised data are input into the IALO-LSTM model for rolling prediction; the output is inverse-normalized, and the prediction results are analyzed.

Case study

Data source

The Pearl River is the largest river in southern China, flowing through six provinces (autonomous regions) and the northeastern part of the Socialist Republic of Vietnam, and is known as one of the seven major rivers in China. The Pearl River basin lies between 21°31'N–26°49'N and 102°14'E–115°53'E, covering a total area of 4.537 × 10^5 km^2 (Fig. 2). The basin is rich in water resources; the river has a total length of 2214 km and an average annual runoff of about 3.36 × 10^8 m^3, ranking second in China.

Fig. 2
figure 2

Location of the gauging stations in the Pearl River Basin

With the acceleration of industrialization and urban–rural integration, the problems of uncoordinated water resources distribution and economic development, water pollution, deterioration of the aquatic ecological environment, and mixed water supply and drainage in the Pearl River have become increasingly prominent. Investigation results indicate that the water quality of the Pearl River estuary is poor, the main pollution factors being inorganic nitrogen and reactive phosphate. In recent years, following the pollution of the Pearl River, the environmental awareness and supervision of regional governments have been strengthened. By 2020, among the 165 surface water monitoring sections in the Pearl River Basin, the proportion of sections with good water quality reached 92.7%, an increase of 3.0% over 2016, and the proportion of sections with water quality inferior to Class V fell to zero, a decrease of more than 3.6% from 2016.

The datasets used were collected at the Yongjiang River and Beijiang River from 2010 to 2016 at weekly intervals, spanning 365 weeks. Dissolved oxygen (DO) was selected as the prediction variable among the monitored indices.

The monitoring data of DO at Yongjiang River and Beijiang River are depicted in Fig. 3, and the statistical analysis results of the original time-series of DO concentration are given in Table 1.

Fig. 3
figure 3

The recorded weekly DO data at Yongjiang River and Beijiang River

Table 1 Statistical analysis of the original time-series of DO concentration

Parameter specification

The selected monitoring data can contain abnormal and missing values due to instrument failure, signal interference, etc. The original data were first screened for outliers using the Laida criterion, i.e., the 3σ rule (Chen and Chau 2019); the outliers were treated as missing values, which were then filled using the Lagrange interpolation method.
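A minimal sketch of this cleaning step, assuming a 3σ (Laida) threshold and Lagrange interpolation over the four nearest valid points; the neighbor count is an illustrative choice, not a value specified in the paper.

```python
import numpy as np

def clean_series(x):
    """Outlier removal (Laida / 3-sigma criterion) plus Lagrange filling (sketch).

    Points farther than 3 standard deviations from the mean (and NaNs) are
    treated as missing and re-estimated by Lagrange interpolation over the
    4 nearest valid neighbors (a hypothetical choice).
    """
    x = np.asarray(x, dtype=float).copy()
    mu, sigma = np.nanmean(x), np.nanstd(x)
    bad = np.isnan(x) | (np.abs(x - mu) > 3 * sigma)
    idx = np.arange(len(x))
    for i in idx[bad]:
        good = idx[~bad]
        # Pick the 4 valid points nearest to position i
        near = good[np.argsort(np.abs(good - i))[:4]]
        # Lagrange interpolation evaluated at position i
        val = 0.0
        for j in near:
            lj = 1.0
            for k in near:
                if k != j:
                    lj *= (i - k) / (j - k)
            val += x[j] * lj
        x[i] = val
    return x
```

Since the Lagrange basis reproduces any cubic exactly, a locally smooth series is restored almost perfectly at isolated outliers.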

In this paper, the denoised data are divided into a training set and a test set at a ratio of 7:3. The sliding window length is 10; that is, the data of the previous 10 weeks are used to predict the next week. To speed up the model calibration process, the data were normalized to the interval [0,1] by the min–max transformation

$$x_{i}^{\prime } = \frac{{x_{i} - x_{i\min } }}{{x_{i\max } - x_{i\min } }},$$
(10)

where the ith variable is represented by xi; ximin stands for the minimum value of xi in the training set; ximax is the maximum value of xi in the training set.
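Eq. (10), fitted on the training set only and paired with the inverse transformation used later for de-normalizing predictions, can be sketched as:

```python
import numpy as np

def minmax_scale(train, test):
    """Min-max normalization of Eq. (10), fitted on the training set only (sketch)."""
    lo, hi = train.min(), train.max()
    scale = lambda x: (x - lo) / (hi - lo)
    inverse = lambda x: x * (hi - lo) + lo   # used for the final inverse normalization
    return scale(train), scale(test), inverse
```

Using the training-set minimum and maximum for the test set avoids leaking test information into calibration; test values may therefore fall outside [0,1].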

For CEEMDAN, the parameters are set as follows: the noise standard deviation is 0.2, the number of realizations is 500, and the maximum number of sifting iterations allowed is 5000. The initial parameters of the stand-alone LSTM algorithm are set to L1 = 20, L2 = 20, Lr = 0.005, and K = 20. The parameters of the ALO used to optimize the LSTM are: the initial number of ants is 5, and the maximum number of iterations is 20. The IALO settings simply add the Cauchy mutation and OBL components to the ALO algorithm; the remaining settings are the same as for ALO. The ALO and IALO are used for the hyperparameter search of the LSTM, where L1, L2, and K are taken in the range [1,100] and Lr in the range [0.001,0.01].

To verify the effectiveness and superiority of IALO, the gray wolf optimization algorithm (GWO) and the whale optimization algorithm (WOA) are also used to optimize the LSTM (Wang et al. 2022; Yang et al. 2022). The number of initial wolves and whales is 5, and the maximum number of iterations is 20. The value ranges of the four hyperparameters are kept consistent with IALO. The back propagation neural network (BPNN), used as a comparison algorithm, has a single hidden layer with ten hidden nodes, a learning rate of 0.1, a maximum of 1000 iterations, and a target accuracy of 1e-3.

All the algorithms and models applied in this study were implemented by in-house coding on the platform of MATLAB R2020a.

Error evaluation criteria

The selection of reasonable performance criteria is of utmost importance for assessing model performance. In this paper, the mean absolute error (MAE), root-mean-square error (RMSE), mean absolute percentage error (MAPE), correlation coefficient (CC), coefficient of determination (R2), Nash–Sutcliffe efficiency coefficient (NSE), and Willmott index of agreement (IA) are adopted to statistically evaluate the models applied in this paper. Their mathematical representations are as follows:

$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {P_{i} - O_{i} } \right|}$$
(11)
$$MAPE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{P_{i} - O_{i} }}{{O_{i} }}} \right|} \times 100\%$$
(12)
$$RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {P_{i} - O_{i} } \right)^{2} } }$$
(13)
$$R^{2} = \frac{{\sum\nolimits_{i = 1}^{n} {\left( {P_{i} - \overline{O} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{n} {\left( {O_{i} - \overline{O} } \right)^{2} } }}$$
(14)
$$CC = \frac{{\sum\nolimits_{i = 1}^{n} {\left( {P_{i} - \overline{P} } \right)} \left( {O_{i} - \overline{O} } \right)}}{{\sqrt {\sum\nolimits_{i = 1}^{n} {\left( {P_{i} - \overline{P} } \right)^{2} } } \sqrt {\sum\nolimits_{i = 1}^{n} {\left( {O_{i} - \overline{O} } \right)^{2} } } }}$$
(15)
$$NSE = 1 - \frac{{\sum\nolimits_{i = 1}^{n} {\left( {O_{i} - P_{i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{n} {\left( {O_{i} - \overline{O} } \right)^{2} } }}$$
(16)
$$IA = 1 - \frac{{\sum\nolimits_{i = 1}^{n} {\left( {P_{i} - O_{i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{n} {\left( {\left| {P_{i} - \overline{O} } \right| + \left| {O_{i} - \overline{O} } \right|} \right)^{2} } }},$$
(17)

where n is the number of samples in the test set, \(O_{i}\) and \(P_{i}\) are the actual and predicted values of DO concentration at time i, and \(\overline{O}\) and \(\overline{P}\) are the mean values of the actual and predicted DO concentration.

The R2 measures the degree of correlation between the actual and predicted values, and the CC is often used to evaluate the linear relationship between the predicted and actual values. The NSE, introduced by Nash and Sutcliffe (1970), can also be used to assess the predictive power of models. A perfect fit between the predicted and actual values would have MAE = 0, MAPE = 0, RMSE = 0, R2 = 1, CC = 1, NSE = 1, and IA = 1.
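The error criteria above can be computed directly; the sketch below implements Eqs. (11)–(13), (16), and (17), and uses the standard Pearson formula for CC.

```python
import numpy as np

def evaluate(obs, pred):
    """Error criteria for observed and predicted DO series (sketch)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = pred - obs
    mae = np.mean(np.abs(err))                                  # Eq. (11)
    mape = np.mean(np.abs(err / obs)) * 100.0                   # Eq. (12)
    rmse = np.sqrt(np.mean(err ** 2))                           # Eq. (13)
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)  # Eq. (16)
    cc = np.corrcoef(obs, pred)[0, 1]                           # Pearson CC
    ia = 1.0 - np.sum(err ** 2) / np.sum(                       # Eq. (17)
        (np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return {'MAE': mae, 'MAPE': mape, 'RMSE': rmse,
            'NSE': nse, 'CC': cc, 'IA': ia}
```

A perfect prediction returns the ideal values listed above (zero errors, unit agreement indices).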

In addition, to study the distribution characteristics of the prediction errors of each model, statistical analysis and a Gaussian distribution fitting test were performed on the prediction errors. Assuming that the prediction error W of a model obeys a Gaussian distribution N(μ, σ²) with location parameter μ and scale parameter σ, its probability density is

$$f(W) = \frac{1}{{\sqrt {2{\uppi }} \sigma }}{\text{e}}^{{ - \frac{{(w - \mu )^{2} }}{{2\sigma^{2} }}}} ,$$
(18)

where μ is the expected mean of the model prediction error value; σ is the standard deviation of the model prediction error value.

Noise reduction effect evaluation criteria

This paper adopts three commonly used noise reduction effect evaluation criteria, namely signal-to-noise ratio Rsn (Song et al. 2021a, b), CC, and residual energy percentage Esn.

For Esn, the sequence energy is obtained as follows (Song et al. 2021a, b):

$$E = \int_{T} | x(t)|^{2} dt = \sum\limits_{t = 1}^{m} | x(t)|^{2} .$$
(19)

The Esn of the noise reduced sequence to the original sequence is defined as

$$E_{sn} = E_{0} /E,$$
(20)

where E represents the energy of the original sequence; E0 represents the energy of the noise reduced sequence.
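Eqs. (19) and (20) can be sketched as follows. The Rsn helper uses a common signal-to-noise-ratio definition (original minus denoised as the noise term), which is an assumption here, since the paper refers to Song et al. (2021a, b) for details.

```python
import numpy as np

def esn(original, denoised):
    """Residual energy percentage of Eqs. (19)-(20) (sketch)."""
    E = np.sum(np.abs(np.asarray(original, float)) ** 2)   # energy of original sequence
    E0 = np.sum(np.abs(np.asarray(denoised, float)) ** 2)  # energy of denoised sequence
    return E0 / E

def rsn(original, denoised):
    """Signal-to-noise ratio in dB (a common definition, assumed here)."""
    original = np.asarray(original, float)
    denoised = np.asarray(denoised, float)
    noise = original - denoised
    return 10.0 * np.log10(np.sum(denoised ** 2) / np.sum(noise ** 2))
```

An Esn close to 1 indicates that denoising preserved most of the sequence energy, while a higher Rsn indicates a cleaner reconstruction.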

Results

Although the datasets of both the Yongjiang and Beijiang monitoring stations were processed in this paper, to avoid repetition the main text focuses only on the Yongjiang River; the prediction results for the Beijiang River are detailed in Table 4.

Comparison of different noise reduction methods

CEEMDAN was used to decompose the Yongjiang water quality parameter series into 9 subseries with different amplitudes and frequencies, comprising 8 IMFs and 1 residual term (Fig. 4). As can be seen from Fig. 4, each subsequence shows obvious periodic variation characteristics. IMF1 has the highest frequency and a short wavelength; from IMF2 to IMF8, the frequency decreases in turn and the wavelength increases. IMF5 has the largest amplitude, with a fluctuation range of [− 0.69,0.69]. The residual term represents the trend of the DO sequence, which shows that the DO content of the Yongjiang River increased gradually from 2010 to 2016.

Fig. 4
figure 4

Decomposition results of CEEMDAN at Yongjiang river

The FE was calculated for each decomposed component, and the results are shown in Fig. 5. According to Fig. 5, the FE of IMF1 and IMF3 is significantly higher than that of the remaining components, so both can be regarded as noise components. After these two components are removed, the remaining components are summed to reconstruct the denoised water quality data.

Fig. 5
figure 5

Fuzzy entropy of each IMF component

To test the noise reduction effect of CEEMDAN-FE on water quality signals, the results are compared with those obtained by combining EMD, EEMD, complementary ensemble empirical mode decomposition (CEEMD), and variational mode decomposition (VMD) with FE, respectively; each method likewise removes the two components with the highest FE.

Figure 6 shows the reconstructed sequence errors of the different noise reduction methods, including EMD-FE, EEMD-FE, and CEEMD-FE. The amplitude of the sequence error after CEEMDAN-FE joint noise reduction fluctuates within [−1.12, 0.56], which is significantly smaller than for the other methods. Table 2 shows the noise reduction performance of the different methods. The results are broadly consistent with Fig. 6: although CEEMDAN-FE does not rank first on all three evaluation indicators, its overall performance meets the needs of data noise reduction.

Fig. 6
figure 6

Reconstructed sequence errors of different noise reduction methods

Table 2 Noise reduction effect of different methods

CEEMDAN-IALO-LSTM results

In this paper, the best mean square error (MSE) obtained by the LSTM during training is used as the fitness function, and the four parameters corresponding to the minimum MSE are the hyperparameters obtained from the optimization. The error convergence curve during training is shown in Fig. 7a; the error of CEEMDAN-ALO-LSTM converged quickly as the number of iterations increased. The fitness evolution curve of CEEMDAN-ALO-LSTM reached the required accuracy within 5 iterations and then maintained the optimal fitness value, showing strong learning ability. Compared with CEEMDAN-ALO-LSTM, the initial and final errors of CEEMDAN-IALO-LSTM are one order of magnitude smaller, and the model accuracy is improved substantially. Figure 7b shows the LSTM hyperparameters optimized by IALO, i.e., L1 = 48, L2 = 69, K = 77, and Lr = 0.00865.

Fig. 7
figure 7

CEEMDAN-IALO-LSTM training process: a error convergent curve and b optimized hyperparameter results

Comparison of different meta-heuristic optimization algorithms

To test the effectiveness of the proposed IALO, GWO, WOA, and the original ALO are used as benchmark meta-heuristic optimization algorithms in this paper. Table 3 shows the LSTM hyperparameters obtained by applying the different meta-heuristic algorithms.

Table 3 Hyperparameter calculation results of different meta-optimization algorithms

The LSTM hyperparameters optimized by the different meta-heuristic algorithms are used for prediction, and the results are shown in Fig. 8. The meta-heuristic algorithms play an important role in improving the prediction accuracy of the LSTM. Compared with the stand-alone LSTM, ALO-LSTM, IALO-LSTM, GWO-LSTM, and WOA-LSTM reduced MAE by 10.58%, 23.38%, 20.60%, and 18.46%, RMSE by 11.02%, 20.52%, 18.45%, and 19.80%, and MAPE by 10.57%, 22.09%, 21.95%, and 18.54%, respectively. The results show that IALO-LSTM outperforms the other models on all three error evaluation criteria.
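The three error criteria and the relative improvements quoted above follow their standard definitions; a small numpy sketch of how such percentages are computed:

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    """Root mean square error."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    """Mean absolute percentage error (y must be nonzero)."""
    return np.mean(np.abs((y - yhat) / y)) * 100.0

def improvement(base, new):
    """Percentage reduction of an error metric relative to a baseline model."""
    return (base - new) / base * 100.0
```

For example, `improvement(mae_lstm, mae_ialo_lstm)` yields the kind of percentage reduction reported for IALO-LSTM over the stand-alone LSTM.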

Fig. 8
figure 8

Error statistics of applied algorithms: a error values and b prediction results

Prediction results of different algorithms

To analyze the prediction error distribution in depth and evaluate the observation accuracy of each model, a Gaussian error distribution fitting test was applied: after calculating statistical indicators such as the mean, standard deviation, and variance using Eq. (48), the error distributions were depicted, as shown in Fig. 9a. The prediction errors of the hybrid prediction model proposed in this paper are distributed around the zero point, i.e., the probability density of the intervals with smaller errors is larger, indicating higher observation accuracy. The LSTM, autoregressive integrated moving average (ARIMA), and IALO-LSTM models have a wider error distribution and a higher probability density at larger errors, and the proposed model also has better observation accuracy than the BPNN, CEEMDAN-LSTM, and CEEMDAN-ALO-LSTM models.
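The Gaussian fitting step above amounts to estimating a normal distribution from each model's residuals and comparing the fitted densities; a minimal sketch (the exact form of Eq. (48) in the paper is assumed to be the standard normal density):

```python
import numpy as np

def gaussian_fit(errors):
    """Fit a normal distribution to prediction errors.
    Returns the mean, the sample standard deviation, and the fitted pdf."""
    errors = np.asarray(errors, dtype=float)
    mu = errors.mean()
    sigma = errors.std(ddof=1)  # sample standard deviation

    def pdf(x):
        # Normal probability density with the fitted parameters
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    return mu, sigma, pdf
```

A model whose fitted mean is near zero and whose sigma is small concentrates its density around zero error, which is exactly the visual criterion applied to Fig. 9a.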

Fig. 9
figure 9

Prediction results of applied algorithms: a frequency distribution histogram and b prediction curve

Figure 9b shows the prediction curves of the various models. The prediction curves obtained by each method are basically consistent with the trend of the actual value curve, and none of them exceeds the 95% confidence interval of the actual monitoring results. However, closer local observation shows that the prediction curve of the CEEMDAN-IALO-LSTM model is closer to the actual monitoring curve, with higher prediction accuracy than the other models. This also indicates that the proposed CEEMDAN-IALO-LSTM model is better suited to the accurate prediction of short-term water quality parameters than the comparison models.

A Taylor diagram and a violin plot (Fig. 10) were chosen to compare and analyze the performance of the different prediction methods. According to Fig. 10a, all applied models achieved a CC between 0.6 and 0.95, and the CC of the proposed model exceeds 0.90. The root-mean-square deviation (RMSD) values are ordered as follows: CEEMDAN-IALO-LSTM < CEEMDAN-ALO-LSTM < BPNN < IALO-LSTM < CEEMDAN-LSTM < LSTM < ARIMA. The CEEMDAN-IALO-LSTM algorithm therefore has the highest accuracy among the peer models and can meet the short-term prediction accuracy requirements of watershed water quality. According to Fig. 10b, the violin plot of the CEEMDAN-IALO-LSTM predictions is most similar to that of the original measurement data, with the median and frequency distribution closest to the actual values, indicating good prediction performance.
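The two statistics that place a model on a Taylor diagram, the correlation coefficient and the centered RMSD, can be computed as follows (a generic sketch, not tied to the paper's plotting code):

```python
import numpy as np

def taylor_stats(obs, pred):
    """Correlation coefficient (CC) and centered root-mean-square deviation
    (RMSD) between observations and predictions, as used in a Taylor diagram."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    cc = np.corrcoef(obs, pred)[0, 1]
    # Centered RMSD: deviations are taken about each series' own mean
    rmsd = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return cc, rmsd
```

A perfect model sits at CC = 1 and RMSD = 0 on the diagram, which is why the ordering of RMSD values above directly ranks the models' accuracy.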

Fig. 10
figure 10

Performance comparison of applied models: a Taylor plot and b Violin plot

Table 4 compares the prediction performance of the various models; the bold values indicate the best performance among the peer models. According to Table 4, CEEMDAN-IALO-LSTM has the best prediction performance among the peer models. Compared with the stand-alone LSTM, the CEEMDAN-IALO-LSTM model reduced MAE, MAPE, and RMSE by 44.27%, 43.16%, and 44.34%, and increased R2, CC, and NSE by 53.77%, 24.00%, and 63.10%, respectively. Its improvement over the stand-alone LSTM was higher than that of the other models, and its error performance was also better than that of some newly improved LSTM algorithms (Dong et al. 2019; Xu et al. 2019; Liu et al. 2020). In addition, the DO prediction results of the Yongjiang river are basically consistent with those of the Beijiang river, which demonstrates the practicability and robustness of the proposed model.
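The two goodness-of-fit criteria quoted alongside CC, R2 and the Nash-Sutcliffe efficiency (NSE), follow their standard definitions; a small numpy sketch:

```python
import numpy as np

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means the model
    is no better than predicting the observed mean."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r2(obs, pred):
    """Coefficient of determination, taken here as the squared
    Pearson correlation between observations and predictions."""
    return np.corrcoef(obs, pred)[0, 1] ** 2
```

Note that NSE penalizes biased predictions even when the correlation is high, which is why large NSE gains (e.g., +63.10%) can accompany more modest CC gains.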

Table 4 Comparison of various models in prediction performance

Discussion

In this paper, we verified the ability of CEEMDAN-IALO-LSTM to address the advanced prediction problem of water quality parameters. The error statistics of the Yongjiang and Beijiang rivers demonstrate that the proposed model, which combines the strong noise-resistant robustness of CEEMDAN-FE with the nonlinear mapping capability of the LSTM, outperforms the other models in prediction performance. Averaging the error statistics of the two gauging stations shows that, compared with the basic LSTM, CEEMDAN-IALO-LSTM reduced MAE, MAPE, and RMSE by 50.24%, 47.66%, and 49.66%, and increased R2, CC, and NSE by 71.74%, 31.42%, and 172.23%, respectively. The performance improvement of CEEMDAN-IALO-LSTM over the LSTM model is significantly higher than that of the other models. The model developed in this study can be further applied in other regions to improve the forecasting of water quality parameters and other environmental indexes.

Because short-term prediction models are sensitive to the original water quality series, and given the influence of various factors on that series, further study should explore and apply more effective noise reduction pretreatment methods for water quality data to enhance the algorithm's performance. Candidate methods include the synchrosqueezing wavelet transform (SWT) (Zhang et al. 2020), the Savitzky-Golay (SG) filter (Bi et al. 2021), and the threshold denoising method based on the wavelet packet transform (Lang et al. 2018).
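As an illustration of one of these candidate preprocessing steps, an SG smoother can be sketched with plain numpy; the window length and polynomial order are illustrative choices, and `scipy.signal.savgol_filter` provides a production implementation:

```python
import numpy as np

def savgol_smooth(x, window=7, order=2):
    """Savitzky-Golay smoothing: fit a local polynomial of the given order
    inside each sliding window and take its value at the window centre.
    Edge samples (less than half a window from the ends) are left as-is."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    t = np.arange(-half, half + 1)  # relative positions within the window
    y = x.copy()
    for i in range(half, len(x) - half):
        coeffs = np.polyfit(t, x[i - half:i + half + 1], order)
        y[i] = np.polyval(coeffs, 0.0)  # polynomial evaluated at the centre
    return y
```

Unlike a plain moving average, this filter preserves local polynomial trends (peaks and slopes) while suppressing high-frequency noise, which is the property that makes it attractive for water quality series.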

In this paper, only three base prediction models (i.e., LSTM, BPNN, and ARIMA) are analyzed. There are other improvements to these models, for example, the seasonal autoregressive integrated moving average (SARIMA) model (Wang et al. 2019), the gated recurrent unit (GRU), and the bidirectional LSTM (Chen et al. 2018). Therefore, more base models combined with various denoising methods should be compared and analyzed to further strengthen the applicability of the IALO-LSTM model in watershed water quality prediction. In addition, it would be valuable to examine the application of more ensemble learning models (Kim and Seo 2015) in the future.

Conclusion

The high-precision advance prediction of water quality enables early warning of water pollution and provides a reference basis for government decisions. This paper proposed a water quality prediction model based on CEEMDAN-IALO-LSTM, covering the framework design of CEEMDAN-IALO-LSTM, the determination of model parameters, and the specification of the datasets used; the improved algorithm effectively addresses the problems of the traditional ALO algorithm, namely its tendency to fall into local optima, slow late-stage convergence, and premature convergence. The results indicate that the comprehensive performance of the CEEMDAN-IALO-LSTM model outperforms most existing prediction algorithms, with average MAE, MAPE, RMSE, R2, CC, and NSE of 0.42, 5.37%, 0.53, 0.83, 0.91, and 0.82 in the test stage, improvements of 50.24%, 47.66%, 49.66%, 71.74%, 31.42%, and 172.23%, respectively, over the baseline LSTM. In general, this study supports the applicability and robustness of the CEEMDAN-IALO-LSTM algorithm in the field of water quality prediction and, to a certain extent, extends the application of ensemble learning technology.

The parameters used to characterize water quality include DO, chemical oxygen demand (COD), total nitrogen (TN), and total phosphorus (TP), among others. However, owing to the completeness of the monitoring data, only the DO indicators of the Yongjiang and Beijiang rivers were selected in this paper. In future research, to better verify model robustness, more complete monitoring data should be selected and multi-parameter indicators from multiple regions should be involved in the prediction. Principal component analysis or grey correlation analysis can be used to extract the key indicators.

Although the proposed model achieved acceptable accuracy, it is possible to employ other meta-heuristic optimization algorithms and improved base models, and to integrate them with preprocessing denoising methods such as the SWT, the Savitzky-Golay (SG) filter, and the threshold denoising method based on the wavelet packet transform. This would yield more accurate models for the prediction of DO and other environmental indicators. To further validate the applicability and robustness of the proposed model, we recommend using large data samples with various indicators at different monitoring sites.