1 Introduction

The level of dissolved oxygen (DO) is one of the primary indicators of the degree of water pollution. It is also vital for ensuring the health and growth of aquaculture fish (Hu et al. 2015; Rahman et al. 2020; Xiao et al. 2019). Sufficient information on the state of water quality is essential for effective management, so appropriate modeling is required to forecast future conditions. Consequently, the uncertainty in water quality data should be carefully evaluated during modeling so that more reliable decisions can be made.

Recently, artificial neural network (ANN) based models have been successfully applied to the point prediction of water quality. These models can learn correlations between inputs and outputs without an explicit understanding of the underlying mechanisms. Wu et al. (2018) used a backpropagation neural network optimized by the particle swarm optimization algorithm (PSO) to forecast the DO of regional groundwater in Xilin Gol League. Li et al. (2021) compared the performance of the recurrent neural network (RNN), the long short-term memory network (LSTM), and the gated recurrent unit (GRU), and determined that the GRU performed best and was better suited to dissolved oxygen prediction. Zhang et al. (2020) presented a model for anticipating dissolved oxygen trends based on a combination of kernel principal component analysis (KPCA) and an RNN.

Most studies consider point prediction with ANN-based models first. However, such predictions may suffer from low reliability when a high level of uncertainty is present in the DO time series. Because point prediction cannot capture modeling uncertainty, it cannot explain the forecast accuracy (Huang et al. 2017). In contrast, modeling the uncertainty makes it possible to quantify the prediction error and the likelihood of correct forecasts, which eases decision-making.

The prediction interval (PI) is a powerful tool for estimating uncertainty to inform managers (Quan et al. 2015). Given a confidence level ((1−α)%), a PI provides bounds expected to capture the observed values (Chatfield 1993; Ma et al. 2020; Voyant et al. 2020). Methods to compute PIs include the delta method, the Bayesian method, the bootstrap, and lower upper bound estimation (LUBE). The delta approach applies a Taylor series expansion to an ANN non-linear regression model (Chryssolouris et al. 1996; Momotaz and Dohi 2016). This technique assumes that the noise in the original data is homogeneous and normally distributed, which is often inconsistent with reality and makes the method questionable. In the Bayesian approach, the ANN's parameters have their own probability distributions instead of single values, so the output of the Bayesian neural network likewise has a distribution conditioned on the observed training set (MacKay 1992). The Bayesian method needs substantial computational power to compute the Hessian matrix required to construct PIs; moreover, the accuracy of the model depends largely on prior knowledge. The bootstrap method is an easy and widely used way to calculate PIs (Efron and Tibshirani 1993; Lu et al. 2020). As a resampling method, the bootstrap requires training several different ANNs. It is easy to implement and makes no assumptions about the data distribution; its main disadvantages are the high computational cost on large datasets and the fact that overall performance depends heavily on the individual ANNs. The LUBE method directly outputs the lower and upper bounds of PIs via an ANN (Lian et al. 2020). Because LUBE cannot be expressed as a supervised learning problem, it requires a heuristic search algorithm to determine the architecture and parameters of the ANN, which consumes substantial computational resources. In summary, each method has its own advantages and disadvantages, and how to select a suitable technique to quantify the uncertainty of dissolved oxygen requires further research.

DeepAR, proposed recently by Salinas et al. (2020), is an autoregressive recurrent neural network with probabilistic forecasting ability. DeepAR combines an RNN with an autoregressive technique to predict scalar time series, learning the model from historical data. Compared with the methods mentioned above, DeepAR does not assume Gaussian noise but can incorporate many likelihood functions. As a supervised learning model, DeepAR can effectively train its parameters using the backpropagation algorithm and directly gives the lower and upper bounds of PIs. Moreover, the method works with little hyperparameter tuning and can be applied to small datasets. In recent years, DeepAR has achieved success in many research fields. For example, Dong et al. (2021) established a model based on DeepAR for deformation trend prediction. Park et al. (2020) used photovoltaic generation data captured at Hadong, Korea, to investigate probabilistic day-ahead prediction of photovoltaic generation with DeepAR; their simulation results show that DeepAR is helpful for efficient grid management. Although DeepAR has a solid ability to account for uncertainty, to the best of our knowledge it has rarely been used for interval prediction of water quality in the existing literature.

In this paper, DeepAR is applied to water quality interval prediction. Water quality data, however, are often subject to various sources of uncertainty, including measurement errors and random events (Nourani et al. 2021), so crucial feature information must be extracted to obtain high-quality water quality data sets. Further, the variational mode decomposition (VMD) technique and the sparrow search algorithm (SSA) are used to improve the model's performance, and a novel water quality interval prediction framework, VMD-DeepAR-SSA, is proposed. VMD can decompose a complicated original time series into a set of band-limited intrinsic mode functions (IMFs), allowing their distinctive characteristics to be analyzed more effectively (Dragomiretskiy and Zosso 2014). SSA is an efficient swarm intelligence optimization algorithm with strong global search capability, good stability and convergence accuracy, few parameters to set, and easy implementation (Xue and Shen 2020). Several independent DeepAR models are built to fit the IMFs obtained by VMD; the prediction interval bounds are then combined with weights, and SSA searches for the best weight combination to obtain the final prediction interval.

2 Methodology

2.1 Variational mode decomposition

VMD is a non-recursive, self-adaptive, multiresolution variational decomposition method with advantages such as high decomposition accuracy, few decomposition layers, and controlled mode mixing. VMD decomposes a signal into several modes with limited bandwidth and estimates the central frequency of each mode (Dragomiretskiy and Zosso 2014).


The VMD process mainly includes two parts: construction of the variational model and its solution. The variational model searches for k finite-bandwidth mode functions while minimizing the sum of the estimated bandwidths of the modes under a reconstruction constraint. This constrained optimization problem can be expressed as:

$$ \mathop {\min }\limits_{{\{ u_{k}\} ,\{ w_{k} \} }} \left\{ {\sum\nolimits_{k} {\left\| {\partial_{t} \left[ {\left( {\delta (t) + \frac{j}{\pi t}} \right)*u_{k} (t)} \right]e^{{ - jw_{k} t}} } \right\|_{2}^{2} } } \right\},\quad s.t.\;\sum\nolimits_{k} {u_{k} = f} $$
(1)

where f is the original input signal, uk = (u1, u2, …, uk) are the mode functions, and wk = (w1, w2, …, wk) are their central frequencies; the one-sided spectrum of each mode function is obtained by the Hilbert transform. To solve the constrained problem, a quadratic penalty parameter α and an augmented Lagrangian function are introduced:

$$ \begin{gathered} L(u_{k} ,w_{k} ,\lambda ) = \alpha \sum\nolimits_{k = 1}^{K} {\left\| {\partial_{t} \left[ {\left( {\delta (t) + \frac{j}{\pi t}} \right)*u_{k} (t)} \right]e^{{ - jw_{k} t}} } \right\|_{2}^{2} } + \hfill \\ \left\| {f(t) - \sum\nolimits_{k = 1}^{K} {u_{k} (t)} } \right\|_{2}^{2} + \left\langle {\lambda (t),\;f(t) - \sum\nolimits_{k = 1}^{K} {u_{k} (t)} } \right\rangle \hfill \\ \end{gathered} $$
(2)

The saddle point of the augmented Lagrangian, which is the optimal solution, is obtained using the alternating direction method of multipliers, realizing the adaptive decomposition of complex modulated signals into K mode functions.
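For illustration, the following is a minimal sketch of this decomposition step in Python, assuming the third-party vmdpy package; the signal and parameter values are placeholders rather than the exact settings used in this paper.

```python
import numpy as np
from vmdpy import VMD  # assumed third-party implementation of VMD

# Stand-in signal; in practice f would be the measured DO content series
t = np.linspace(0, 1, 1000)
f = np.cos(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 24 * t)

alpha = 2000   # quadratic penalty weight (bandwidth constraint)
tau = 0.0      # noise tolerance of the dual ascent
K = 5          # number of modes, matching Section 4
DC = 0         # do not impose a DC mode
init = 1       # initialize center frequencies uniformly
tol = 1e-7     # convergence tolerance of the ADMM iterations

u, u_hat, omega = VMD(f, alpha, tau, K, DC, init, tol)
# u: (K, len(f)) array of IMFs; omega: center-frequency trajectories
```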

2.2 DeepAR model

DeepAR is a probabilistic forecasting RNN architecture that generates Monte Carlo samples for probabilistic forecasting. This paper studies the application of the DeepAR model to DO interval prediction. Denoting \(z_{t}\) as the value of the DO time series at time t, the target is to model the conditional distribution P of the future series of length T given the past:

$$ P(z_{{t_{0} :T}} |z_{{1:t_{0} - 1}} ) = \prod\nolimits_{{t = t_{0} }}^{T} {\ell (z_{t} |\theta (h_{t} ,\Theta ))} $$
(3)
$$ h_{t} = h(h_{t - 1} ,z_{t - 1} ,\Theta ) $$
(4)

where \(\ell\) is the likelihood function and \(\theta\) its parameters; the function h is implemented by a multi-layer recurrent neural network with LSTM cells, and \(\Theta\) are the parameters of the model.

As shown in Fig. 1, in the training phase the input to the network at every time step t includes the preceding time step's target value \(z_{t - 1}\) and the preceding hidden state \(h_{t - 1}\). The model parameters \(\Theta\), comprising the parameters of \(h( \bullet )\) and \(\theta ( \bullet )\), are learned by maximizing the log-likelihood:

$$ L = \sum\nolimits_{t = 1}^{{t_{0} }} {\ln \ell (z_{t} |\theta (h_{t} ))} $$
(5)
Fig. 1
figure 1

Summary of DeepAR model training and prediction process

In the prediction phase, a sample \(\widetilde{z}_{t} \sim \ell ( \bullet |\theta )\) is drawn by Monte Carlo simulation and fed as input to the next time step until the end of the prediction range t = t0 + T, generating one sample trace. Repeating this procedure produces a large number of traces reflecting the joint predictive distribution, from which quantiles or expectations can be calculated.


The architecture of \(\theta (h_{t} )\) depends only on the likelihood function \(\ell (z|\theta )\); in this paper the Gaussian likelihood for real-valued data is considered:

$$ \ell_{g} (z|\mu ,\sigma ) = \frac{1}{{\sqrt {2\pi \sigma^{2} } }}e^{{\frac{{ - (z - \mu )^{2} }}{{2\sigma^{2} }}}} $$
(6)
$$ \mu (h_{t} ) = wh_{t} + b $$
(7)
$$ \sigma (h_{t} ) = \log (1 + \exp (wh_{t} + b)) $$
(8)

where w denotes the weight matrix and b the bias; \(\sigma\) and \(\mu\) are the standard deviation and mean, respectively.
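For illustration, the sketch below shows how a DeepAR model of this kind could be trained and sampled with the open-source GluonTS library; GluonTS is an assumed dependency (the paper does not state its implementation), and the series, epoch count, and quantile levels are placeholder choices.

```python
import numpy as np
import pandas as pd
from gluonts.dataset.common import ListDataset
from gluonts.torch import DeepAREstimator

# Stand-in for one IMF sampled every 10 minutes (21 days of training data)
imf = np.random.randn(21 * 144)
train_ds = ListDataset(
    [{"start": pd.Timestamp("2019-06-01 00:00"), "target": imf}],
    freq="10min",
)

estimator = DeepAREstimator(
    freq="10min",
    prediction_length=144,             # forecast one day ahead
    trainer_kwargs={"max_epochs": 5},  # kept small for the sketch
)
predictor = estimator.train(train_ds)

# Monte Carlo sample traces are drawn internally; quantiles give PI bounds
forecast = next(iter(predictor.predict(train_ds)))
lower, upper = forecast.quantile(0.025), forecast.quantile(0.975)
```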

2.3 Sparrow search algorithm

The sparrow search algorithm simulates the foraging process of a sparrow swarm. SSA follows a producer-scrounger model augmented with an early-warning (investigation) mechanism. Producers, which generally have large energy reserves (high fitness), search for food; scroungers follow the producers to feed; and certain sparrows are additionally chosen as investigators that give early warning of danger. In SSA, each sparrow holds a position representing the food source it has discovered, and a sparrow can act in one of these three roles. The position of each of N sparrows in a D-dimensional solution space is:

$$ X_{i} = (x_{i,1} ,x_{i,2} ,x_{i,3} ,...,x_{i,D} ),i = 1,2,...,N $$
(9)

and the fitness of each position is:

$$ f_{i} = f({\rm X}_{i} ),i = 1,2,...,N $$
(10)

In each generation, the sparrows with the top 20% fitness are designated as producers and the remaining 80% as scroungers. The position of a producer is updated as follows:

$$ x_{i,d}^{t + 1} = \left\{ \begin{gathered} x_{i,d}^{t} \bullet \exp (\frac{ - i}{{\alpha \bullet iter_{\max } }}),R_{2} < ST \hfill \\ x_{i,d}^{t} + Q,R_{2} \ge ST \hfill \\ \end{gathered} \right. $$
(11)

where \(x_{i,d}^{t + 1}\) is the dth dimensional position of the ith sparrow in generation t+1, α ∈ (0, 1] is a uniform random number, and Q is a random number that obeys the standard normal distribution. R2 (R2 ∈ [0, 1]) and ST (ST ∈ [0.5, 1.0]) are the alarm value and the safety threshold, respectively.


Following the producers, the scroungers' positions are updated as follows:

$$ x_{i,d}^{t + 1} = \left\{ \begin{gathered} Q \bullet \exp (\frac{{xw_{i,d}^{t} - x_{i,d}^{t} }}{{i^{2} }}),i > n/2 \hfill \\ xb_{i,d}^{t} + \frac{1}{D}\sum\nolimits_{d = 1}^{D} {(rand[ - 1,1] \bullet (|xb_{i,d}^{t} - x_{i,d}^{t} |)),i \le n/2} \hfill \\ \end{gathered} \right. $$
(12)

where xw is the worst position and xb the best position in the tth generation. Some sparrows act as investigators while foraging; when they sense danger, they abandon their current food and move to a new position, which can be described as follows:

$$ x_{i,d}^{t + 1} = \left\{ \begin{gathered} xb_{i,d}^{t} + \beta \bullet (x_{i,d}^{t} - xb_{i,d}^{t} ),f_{i} \ne f_{g} \hfill \\ x_{i,d}^{t} + K \bullet (\frac{{x_{i,d}^{t} - xw_{i,d}^{t} }}{{|f_{i} - f_{w} | + \varepsilon }}),f_{i} = f_{g} \hfill \\ \end{gathered} \right. $$
(13)

where β is a random number drawn from the normal distribution, K ∈ [− 1, 1] is a uniform random number, and ε is a small number that prevents the denominator from being zero.
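For illustration, the following is a minimal NumPy sketch of the update rules in Eqs. (9)-(13). Boundary handling, the shared alarm value, and the once-per-generation fitness re-evaluation are simplifying assumptions rather than the exact scheme of Xue and Shen (2020).

```python
import numpy as np

def ssa_minimize(f, dim, n=30, iters=100, lb=0.0, ub=1.0,
                 producer_frac=0.2, scout_frac=0.2, st=0.8):
    """Minimize f over [lb, ub]^dim with a simplified sparrow search."""
    X = np.random.uniform(lb, ub, (n, dim))
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        order = np.argsort(fit)            # ascending: best sparrow first
        X, fit = X[order], fit[order]
        n_prod = max(1, int(producer_frac * n))
        xb, fb = X[0].copy(), fit[0]       # best position and fitness
        xw, fw = X[-1].copy(), fit[-1]     # worst position and fitness
        r2 = np.random.rand()              # alarm value for this generation
        for i in range(n_prod):            # producers, Eq. (11)
            if r2 < st:
                a = np.random.rand() + 1e-12
                X[i] = X[i] * np.exp(-(i + 1) / (a * iters))
            else:
                X[i] = X[i] + np.random.randn(dim)
        for i in range(n_prod, n):         # scroungers, Eq. (12)
            if i > n / 2:
                X[i] = np.random.randn() * np.exp((xw - X[i]) / ((i + 1) ** 2))
            else:
                X[i] = xb + np.mean(np.random.uniform(-1, 1, dim)
                                    * np.abs(xb - X[i]))
        scouts = np.random.choice(n, max(1, int(scout_frac * n)), replace=False)
        for i in scouts:                   # investigators, Eq. (13)
            if fit[i] != fb:
                X[i] = xb + np.random.randn() * (X[i] - xb)
            else:
                X[i] = X[i] + np.random.uniform(-1, 1) * (X[i] - xw) \
                       / (abs(fit[i] - fw) + 1e-12)
        X = np.clip(X, lb, ub)
        fit = np.apply_along_axis(f, 1, X)  # re-evaluate once per generation
    best = int(np.argmin(fit))
    return X[best], fit[best]

# Example: recover a known minimum of a quadratic bowl
# w, v = ssa_minimize(lambda x: np.sum((x - 0.3) ** 2), dim=5)
```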

3 The hybrid dissolved oxygen interval prediction model


As shown in Fig. 2, this paper combines the VMD, DeepAR, and SSA techniques to construct prediction intervals efficiently. In the proposed model, the DO content data are first decomposed by VMD into several IMFs with different central frequencies. Then each IMF is used as input to a DeepAR model that constructs its conditional distribution. Finally, the outputs of those DeepAR models are combined with weights to construct the prediction intervals. Notably, the optimal weight combination is searched by SSA by minimizing the coverage width-based criterion (CWC). CWC is calculated from the prediction interval coverage probability (PICP) and the prediction interval normalized average width (PINAW) as follows:

$$ PICP = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {a_{i} } $$
(14)

where N is the number of test data samples, and ai is a binary value calculated as follows:

$$ a_{i} = \left\{ \begin{gathered} 0,y_{i} \notin [y_{i,l} ,y_{i,u} ] \hfill \\ 1,y_{i} \in [y_{i,l} ,y_{i,u} ] \hfill \\ \end{gathered} \right. $$
(15)

where yi is the value of test sample i, and yi,l and yi,u are the estimated lower and upper bounds.

$$ PINAW = \frac{1}{N*W}\sum\nolimits_{i = 1}^{N} {(y_{i,u} - y_{i,l} )} $$
(16)

where W is the range of the target values, used to normalize the average interval width.

Fig. 2
figure 2

The implementation process of the proposed model

$$ CWC = \beta *PINAW\left(1 + \gamma (PICP)\,e^{ - \eta (PICP - \mu )} \right) $$
(17)

where γ is a binary value determined as:

$$ \gamma = \left\{ \begin{gathered} 0,PICP \ge \mu \hfill \\ 1,PICP < \mu \hfill \\ \end{gathered} \right. $$
(18)

In Eq. 18, γ and μ are control parameters for CWC. μ reflects the coverage probability requirement of the PIs and is determined by the predefined confidence level (1−α). η and β are penalty coefficients applied when the obtained PIs fail to satisfy the coverage probability requirement. In this paper, to obtain high-quality PIs, the parameters (β, η, μ) of CWC are set to 1, 1, and 0.95, respectively.
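For illustration, the sketch below computes PICP, PINAW, and CWC following Eqs. (14)-(18), with the paper's settings (β, η, μ) = (1, 1, 0.95) as defaults; taking W as the range of the observed targets is an assumption.

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability, Eqs. (14)-(15)."""
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """Normalized average width, Eq. (16); W taken as the target range."""
    w = float(y.max() - y.min())
    return float(np.mean(upper - lower)) / w

def cwc(y, lower, upper, beta=1.0, eta=1.0, mu=0.95):
    """Coverage width-based criterion, Eqs. (17)-(18); lower is better."""
    p = picp(y, lower, upper)
    gamma = 1.0 if p < mu else 0.0
    return beta * pinaw(y, lower, upper) * (1.0 + gamma * np.exp(-eta * (p - mu)))
```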

4 Results and discussions

4.1 Simulation data

Simulation DO content data were collected from river crab culture farms in Yixing City, Jiangsu Province, China. Thirty daily data sets sampled at 10-min intervals were chosen as experimental data from the summer of 2019. Each data set includes 144 DO content points; the first twenty-one days were used as training data while the remaining nine days were used as testing data.

4.2 VMD processing of dissolved oxygen content data

To extract the characteristics hidden in the water quality data, the DO content data are decomposed by VMD into 5 IMFs with different central frequencies. The decomposition results are shown in Fig. 3.

Fig. 3
figure 3

The IMFs from high frequency to low frequency using the VMD method: (a) original data, (b) IMF1, (c) IMF2, (d) IMF3, (e) IMF4, (f) IMF5

4.3 Simulation results and discussion

After VMD processing, five IMFs are obtained, and a DeepAR model is trained on each IMF. Following the advice of Salinas et al. (2020), DeepAR automatically creates feature time series based on the frequency of the DO content series to facilitate learning time-dependent patterns, such as the peaks and valleys around noon. Accordingly, this paper supplies detailed custom feature time series for the DO content data as far as possible. Table 1 lists the custom time-frequency features, and a hypothetical construction is sketched after the table.

Table 1 The custom time-frequency features of the DO content time series
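For illustration, the hypothetical sketch below builds normalized time-of-day features for a 10-min series of the kind summarized in Table 1; the specific features and their normalization are assumptions, not the paper's exact list.

```python
import numpy as np
import pandas as pd

# 21 training days of 10-min timestamps (start date is a placeholder)
idx = pd.date_range("2019-06-01", periods=21 * 144, freq="10min")

# Scaled to roughly [-0.5, 0.5], a common convention for such features
minute_of_hour = idx.minute.values / 59.0 - 0.5
hour_of_day = idx.hour.values / 23.0 - 0.5
day_of_week = idx.dayofweek.values / 6.0 - 0.5

# Stacked feature matrix to pass alongside the target series
dynamic_feat = np.stack([minute_of_hour, hour_of_day, day_of_week])
```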

Five DeepAR models were built to learn the conditional distributions of the prediction targets of the five IMFs. The upper and lower bounds of those models are simply summed to establish the initial prediction intervals. As shown in Fig. 4, the initial PIs perform well, with a high PICP (PICP = 0.9007), meaning the PIs cover over 90% of the observed values. However, the results also show a high PINAW (PINAW = 0.1574) and a lower-bound coverage rate of only 0.9165. In DO interval prediction, narrower intervals convey more valuable information than wide ones, and the coverage rate of the lower bound deserves particular attention. To further improve the performance of the PIs, a weighting method is used to reconstruct the PIs, and SSA searches for the optimal weight combination, as sketched below.
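For illustration, the hypothetical glue sketch below reconstructs weighted PIs from the per-IMF DeepAR bounds and exposes a CWC objective that the ssa_minimize sketch from Section 2.3 could search; imf_lowers, imf_uppers, and y_test are placeholder arrays, and cwc() is the metrics sketch from Section 3.

```python
import numpy as np

# Placeholder bounds from the five per-IMF DeepAR models and test targets
rng = np.random.default_rng(0)
imf_lowers = rng.normal(size=(5, 200))
imf_uppers = imf_lowers + rng.uniform(0.1, 0.5, size=(5, 200))
y_test = imf_lowers.sum(axis=0) + 0.1

def weighted_pis(weights, lowers, uppers):
    """Weighted sum of per-IMF interval bounds (the reconstruction step)."""
    w = np.asarray(weights)[:, None]
    return (w * lowers).sum(axis=0), (w * uppers).sum(axis=0)

def objective(weights):
    """CWC of the reconstructed interval, to be minimized by SSA."""
    lo, up = weighted_pis(weights, imf_lowers, imf_uppers)
    return cwc(y_test, lo, up)

# best_w, best_val = ssa_minimize(objective, dim=5, n=80, iters=40)
```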

Fig. 4
figure 4

The initial PIs of test sets

For the SSA process, the population size is N = 80, the producers account for 20% of the population, the investigators account for 20% of the population, and the number of iterations is 40. The trend of the fitness value is shown in Fig. 5; it decreases with the number of iterations and finally stabilizes. Figure 5 also shows the changing trend of the weights during optimization, which stabilize after 15 iterations, indicating that SSA quickly converges to the optimal solution. Meanwhile, the weights of the lower-frequency IMFs are larger than those of the higher-frequency ones, which means that the lower-frequency IMFs contain more time-series features and better represent the original DO content data. After optimization and weighted reconstruction, the PIs are shown in Fig. 6; compared with the initial PIs, the proposed method not only improves the PICP (PICP = 0.9503) and the critical lower-bound coverage rate (0.9749) but also effectively decreases the PINAW (0.1324).

Fig. 5
figure 5

The graph of the SSA process: changing trends of the fitness function and of the weights of IMF1-IMF5

Fig. 6
figure 6

The PIs of test sets by the proposed model

4.4 Comparison with typical models

To verify the prediction performance of the proposed model, this section compares it with typical interval prediction models, including the Bayesian, Bootstrap, and LUBE methods. PICP, PINAW, and CWC are the evaluation indicators; CWC is a comprehensive index combining PICP and PINAW. To ensure a fair comparison, the same DO content training and test sets were used in all experiments.

As shown in Table 2, the PICP of the Bayesian model exceeds those of the Bootstrap, LUBE, and proposed models by 7.006%, 3.996%, and 3.504%, respectively. However, the Bayesian method achieves the highest PICP at the cost of wider PIs and a larger PINAW. This may be caused by overestimating the variance of the outputs, which lowers the practical value of the PI information. Moreover, the proposed model, LUBE, and the Bootstrap method have close PINAW values, but the Bootstrap method has the lowest PICP.

Considering that PICP and PINAW are mutually conflicting indexes and that the models have similar PICP and PINAW values, the comprehensive index CWC is used to further assess model performance. As shown in Table 2 and Fig. 7, the Bayesian method has the highest PICP and the Bootstrap method has the lowest PINAW, but their poor performance on the other index leads to a poor CWC value. The LUBE method and the proposed model have close PICP and PINAW values, but the CWC of LUBE is much worse than that of the proposed model because the punishment factors of CWC keep the balance between PICP and PINAW. Although the proposed model is not the best in PICP or PINAW individually, it is the best method for DO content interval prediction according to the comprehensive index.

Fig. 7
figure 7

Results of normalization index

The various PI methods differ in complexity, computational burden, reliability, and required time. One cannot claim that a certain method is superior to all others; each has its advantages and disadvantages when constructing the PIs of DO content. Therefore, a brief comparison of each technique is shown in Table 3, based on the optimal value of each index from Table 2 and Fig. 7. The proposed method is the most suitable in this case. The Bootstrap method is a point prediction model with Monte Carlo simulation, which often cannot accurately reflect the true predictive distribution; moreover, it requires several different neural networks while depending heavily on each individual network's performance, making it computationally burdensome and less reliable. The Bayesian method needs high computational power and a large training dataset to improve its accuracy; setting those requirements aside, it may be acceptable in this case. Compared with the other methods, LUBE constructs the PIs most directly, but it requires substantial computational power to tune the network architecture and parameters, which is a challenge for field equipment.

Table 2 Comparison of different methods with various indexes
Table 3 The brief comparison of different models

5 Conclusions

Forecasting DO content is essential for aquaculture water quality management because it supports decisions in hybrid water quality systems. This paper first proposes adding an interval prediction module to hybrid water quality management systems to improve management ability. In addition, we conducted a detailed and comprehensive analysis, comparison, and summary of prior work on interval prediction, and put forward a novel PI model for dissolved oxygen content prediction. The variational mode decomposition technique is used to extract the periodic characteristics of the original water quality data, and SSA is used to calculate the weights of the various DeepAR outputs. The results demonstrate that the proposed method outperforms the typical Bayesian, Bootstrap, and LUBE methods as measured by PICP, PINAW, and CWC. Furthermore, after weighing the benefits and drawbacks of the various approaches, it is concluded that the proposed model adequately captures the uncertainty of the DO content time series and is a reliable tool for constructing water quality PIs. Prediction is the most popular task in time series analysis because of its importance in industrial, social, and scientific applications. The proposed approach solves the PI construction problem for time series, so it can also be used in other domains such as commodity sales forecasting, weather variables such as wind speed, and even stock market forecasting.

6 Future directions

Time series prediction is a vital tool for DO content management. This paper proposed a novel but complex framework to obtain accurate PIs of DO, whose manual neural architecture design is a tedious trial-and-error process. With the development of neural architecture search (NAS), artificial intelligence is expected to design network structures automatically (Ren et al. 2021), and we hope it can design a more effective and simpler neural architecture for interval prediction. Meanwhile, in addition to the past values of DO content, there are many correlated factors, such as the position of the farm, that may influence the trend of DO content without it being known in advance how they interact. We need to explore more robust features using dimensionality reduction techniques such as linear discriminant analysis (LDA) or spectral clustering (Li et al. 2019, 2018a, b; Yan et al. 2021).