1 Introduction

Environmental quality assessment of rivers is a major issue in China’s water security. The South-North Water Transfer Middle Route, as an important water supply route for the Beijing-Tianjin-Hebei region, is crucial to the region’s economic development and people’s health. An effective water quality prediction method is therefore essential to prevent water quality deterioration and pollution in the South-North Water Transfer Project. The 13 automatic monitoring stations along the middle route have accumulated long and continuous water quality time-series data, which are well suited to training deep learning models for prediction. However, the existing platform is only capable of simple processing of the monitoring data and has the following shortcomings: 1) it cannot effectively use data analysis tools to extract valuable information from complex and massive time-series data; 2) it cannot use the safety thresholds of the monitoring data to raise alarms in advance; and 3) it cannot provide effective and comprehensive decision-making support for the staff. In this paper, based on the time-series data of the 13 automatic monitoring stations, and in view of the temporal, unstable and complex nature of water quality data, we study how to reduce the dimensionality of water quality indicators through rough sets theory, eliminating redundant attributes and extracting core attributes. We then complete water quality prediction with a deep learning model, providing forecasting and early-warning capability for the water supply and transmission system of the middle route. In the following, we introduce some necessary background for the paper.

1.1 A short review of rough sets theory

Water quality data sampled automatically by monitoring stations are characterized by high-dimensional, complex indicators, which calls for effective dimensionality reduction methods. Rough sets theory is a very effective tool for attribute reduction. In 1982, Pawlak [1] proposed rough sets theory by introducing lower and upper approximations through equivalence relations in information systems.

However, classical Pawlak rough sets apply only to discrete data, and discretizing continuous data inevitably causes information loss and fragmentation, so they cannot directly handle the numerical attributes in water quality data [2]. For this reason, based on Pawlak rough sets, Lin [3] proposed neighborhood rough sets in 1988, which can effectively deal with continuous numerical data. Yao [4] in 1998 and Wu [5] in 2002 proposed the concept of neighborhood granules, which differs from the notion of neighborhood in metric spaces. The properties of neighborhood rough sets were discussed by Hu et al. [6] in 2008. Since then many scholars have devoted themselves to the study of neighborhood rough sets. For example, in 2012, Ma [7] proposed a neighborhood rough set by using a fuzzy covering neighborhood. In 2016, Li et al. [8] proposed a decision-theoretic rough sets model based on neighborhood rough sets, and used neighborhood granularity to derive decision rules. In 2018, Miao et al. [9] proposed a three-way decision model with fuzzy neighborhood rough sets. In the same year, Fujita et al. [10] presented a proposal that allows granular computing (GrC) solutions to be systematically integrated into different phases of resilience analysis of critical infrastructures (CIs). In 2022, Yang et al. [11] explored a novel framework of sequential three-way decision for the fusion of mixed data from subjective and objective dynamic perspectives. In 2023, Liu et al. [12] proposed a novel multi-label feature selection method based on label distribution and neighborhood rough sets, known as LDRS. Also in 2023, Yin et al. [13] studied a noise-tolerant fuzzy neighborhood rough set model for multi-label learning and its feature selection strategy.

In terms of applications, a very important use of rough sets theory is attribute reduction. For instance, in 2017, Wang et al. [14] proposed an attribute reduction method based on a neighborhood discrimination index and a significance measure of candidate features. In 2019, Wang et al. [15] proposed another attribute reduction method which combines the advantages of both \(\delta \)-neighborhood and k-nearest neighbor. In 2021, Wan et al. [16] considered the characteristic of interaction in neighborhood rough sets and proposed a novel hybrid feature reduction method. Also in 2021, Sang et al. [17] proposed an attribute reduction method based on a fuzzy dominance neighborhood rough set. In 2022, Yang et al. [18] combined incremental technology and an accelerated strategy in attribute reduction. In the same year, Yang et al. [19] proposed an attribute reduction model based on the Student-t kernel and fuzzy divergence. Hu et al. [20] proposed an attribute reduction model based on weighted neighborhood rough sets in 2021, and in the following year Hu et al. [21] proposed another attribute reduction model based on k-nearest-neighbor rough sets. However, neighborhood rough sets have a non-negligible drawback on complex data: the strict definition of the lower approximation severely limits their applicability and gives them low error tolerance. In 1987, Wong et al. [22] introduced the 0.5-probabilistic rough set model to describe the generalized notion of rough sets (probabilistic approximate classification) through the concept of fuzzy sets. In 2015, Wang et al. [23] proposed an attribute reduction model based on a monotonic uncertainty measure in the probabilistic rough set model. In 2022, Xie et al. [24] proposed an attribute reduction approach based on a weighted neighborhood probabilistic rough set, which makes the application of attribute reduction more extensive.

1.2 A brief introduction of deep learning

Water quality prediction methods have evolved from mechanistic analysis through statistical analysis to deep learning methods. With the improvement of water quality databases and the long-term development of deep learning, such methods can be applied in depth to water quality prediction [25]. The first wave of neural network development began in the 1940s, when McCulloch and Pitts [26] in 1943 and Hebb [27] in 1949 made great progress in biological learning theory. In 1958, Rosenblatt [28] proposed the perceptron, the first neural network model, which enabled the training of individual neurons. The second wave started in the 1980s, when Rumelhart et al. [29] proposed the BP neural network in 1986; however, BP neural networks are prone to vanishing gradients, local optima and overfitting [30]. In 2006, Hinton [31], Bengio [32] and Ranzato [33] showed, both theoretically and in applications, that multilayer neural networks can be used in machine learning frameworks. In 2012, Hinton et al. [34] proposed the Dropout method to alleviate overfitting and reduce training time. Many other approaches have been proposed to solve these problems. For example, Zhang et al. [35] proposed a neural network model optimized by the particle swarm algorithm, and Wang et al. [36] proposed an evolutionary neural network optimized by an evolutionary algorithm.

Introducing rough sets into neural networks not only improves the networks’ ability to handle noisy, redundant or uncertain input patterns, but also exploits rough sets’ efficiency in processing data without prior knowledge. In 2011, Xu et al. [37] established a novel least-squares method based on neural networks. In 2018, Lei [38] combined a rough-set-based attribute reduction method with wavelet neural networks and successfully applied it to stock prediction. In 2021, He et al. [39] utilized rough set theory to overcome the exploding and vanishing gradient problems of GRU neural networks, successfully applying the result to dynamic load prediction of a scraper conveyor. These studies illustrate that the combination of rough set theory and neural network methods can be applied well to classification and prediction problems.

On the other hand, neural networks have also been widely applied to water quality prediction. In 2017, Wang et al. [40] used a Long Short-Term Memory (LSTM) neural network for water quality prediction. In 2019, Ren et al. [41] used a Recurrent Neural Network (RNN) for water quality prediction and early warning. In 2022, Peleato et al. [42] applied neural networks with fluorescence spectral downscaling to the prediction of drinking water disinfection by-products. Also in 2022, Wang et al. [43] used a neural network process-based model to analyze estuarine water quality. In 2023, Wang et al. [44] proposed a new method for water anomaly detection based on fluctuation feature analysis and LSTM. Also in 2023, Tao et al. [45] used Variational Mode Decomposition (VMD) to extract signal fluctuation characteristics and reduce noise interference, which improved the accuracy of LSTM for water quality prediction. Network structures with RNNs as the main body make better use of the historical information in water quality data and better extract the memory information of time-series, and have become the mainstream method for predicting data sets with temporal attributes such as stock prices and water quality.

1.3 Motivation and contribution of this research

In this paper, based on the demand for water quality data prediction in the middle route, we propose a water quality prediction method based on k-nearest-neighbor probability rough sets (KNPRSs) and PSO-LSTM. The KNPRSs are used to eliminate redundant attributes and extract key attributes from water quality indicators whose indicator system is complex and whose mechanisms are difficult to analyze, thereby streamlining the data set. The PSO-LSTM optimizes the corresponding neural network hyperparameters for the water quality of different stations, predicts the target water quality indicators, constructs a model structure that fits the actual situation of each station, and improves the accuracy of water quality prediction.

The paper’s motivations are delineated in the following.

  1. 1.

    For the complexity of the water quality indicators, the k-nearest-neighbor algorithm is introduced to remove the subjectivity involved in selecting a neighborhood threshold, and probabilistic rough sets are constructed to improve the error tolerance of the model. This allows us to complete the attribute reduction for the target water quality characteristics, eliminate redundant attributes, extract the key attributes, and streamline the input dimension of the subsequent neural network prediction model.

  2. 2.

    For the temporal and unstable nature of water quality data, the PSO algorithm is introduced to optimize the LSTM neural network. While the time-series characteristics of water quality data are handled well, the model hyperparameters are adaptively optimized for different indicators at different stations, so that the model achieves good results for the prediction of every indicator at every station along the middle route.

The paper’s contributions are delineated in the following.

  1. 1.

    Facing a complex water quality dataset, and compared with traditional attribute reduction methods, the attribute reduction method based on k-nearest-neighbor probabilistic rough sets ranks the distances between objects. It ensures that the grouped objects are similar while remaining free from the interference of subjective factors and the complexity of the water quality, so it better streamlines the data dimensions and thus improves the accuracy of the subsequent water quality prediction.

  2. 2.

    A PSO-LSTM water quality prediction model is constructed by combining the particle swarm algorithm with the long short-term memory network. The model is based on the LSTM neural network, which copes well with the temporal order of water quality data and selectively retains historical information through the “gate” structure and memory cells. At the same time, the PSO algorithm is used to optimize the hyperparameters of the LSTM neural network, and the model is adaptively optimized for different indicators at different stations, which greatly improves the stability of water quality prediction.

1.4 Layout of this paper

The content of this paper is organized as follows. Section 2 introduces the basic theoretical knowledge involved in the water quality prediction method based on KNPRSs and PSO-LSTM. Section 3 introduces the construction of the attribute reduction model based on KNPRSs, and then describes how the PSO algorithm is used to optimize the hyperparameters of the LSTM, which performs well on long time-series, thus completing the construction of the PSO-LSTM water quality prediction model. In Section 4, we compare our method with three other water quality prediction methods in data experiments to verify the feasibility and superiority of KNPRSs-PSO-LSTM. Section 5 summarizes the work of the entire paper.

2 Basic concepts

This section consists of three subsections, which mainly introduce the basic theoretical knowledge involved in the subsequent construction of KNPRSs and PSO-LSTM: probabilistic rough sets (PRS), Long Short-Term Memory (LSTM) and particle swarm optimization (PSO).

2.1 Probabilistic rough sets

In this subsection, we introduce the basic concept of probabilistic rough sets.

Definition 1

[6, 46] A quadruple \(IS=\langle U,A,V,I \rangle \) is said to be an information system, where \(U=\{x_{1},x_{2},...,x_{n}\}\) is a non-empty finite sample set, also called the universe, \(A=\{a_{1},a_{2},...,a_{m}\}\) is a non-empty finite set of attributes, V stands for the range of attribute values and \(I:U\times A\rightarrow V\) is an information (mapping) function.

For brevity, an information system can be written as \(IS=\langle U,A \rangle \). Specifically, if \(A=C\cup D\), which includes both the conditional attribute set C and the decision attribute set D, \(C\cap D=\varnothing \) and \(D\ne \varnothing \), then we call system \(DS=\langle U,C\cup D \rangle \) a decision information system.

Definition 2

[47] Given an information system \(\langle U, A \rangle \), \(0\le \beta <\alpha \le 1\), then \(\forall X\subseteq U\), the lower and upper approximations of X are defined as

$$\begin{aligned} \underline{Apr^{\alpha }}X=\{x\in U~|~P(X~|~[x]_{R})\ge \alpha \}, \end{aligned}$$
(1)
$$\begin{aligned} \overline{Apr^{\beta }}X=\{x\in U~|~P(X~|~[x]_{R})> \beta \}, \end{aligned}$$
(2)

where \(P(X~|~[x]_{R})=\frac{|[x]_{R} \cap X|}{|[x]_{R}|}\) and \([x]_{R}\) is an equivalence class by x with respect to R, i.e., \([x]_{R}=\{y\in U ~|~(x,y)\in IND(R)\}\) and \(IND(R)=\{(x,y)\in U \times U~|~ \forall a \in R, I(x,a)=I(y,a)\}\).

Meanwhile, from Definition 2, we can get that the universe is divided into three parts, which are the positive region POS(X), the boundary region BND(X) and the negative region NEG(X).

$$\begin{aligned}&POS(X)=\{x\in U~|~P(X~|~[x]_{R})\ge \alpha \},\end{aligned}$$
(3)
$$\begin{aligned}&BND(X)=\{x\in U~|~\beta<P(X~|~[x]_{R})< \alpha \},\end{aligned}$$
(4)
$$\begin{aligned}&NEG(X)=\{x\in U~|~P(X~|~[x]_{R})\le \beta \}. \end{aligned}$$
(5)

2.2 LSTM

LSTM is a kind of recurrent neural network that is specialized in prediction on long time-series data sets and is good at learning long-range dependencies; it is a variant of the recurrent neural network. The core concepts of LSTM are the “gate” structure and the memory cell. The “gate” structure determines what the network learns to save and forget during training, so that historical information can be added and deleted. The memory cell, on the other hand, acts as a pathway for information transfer, allowing historical information to be passed forward through the cell structure. As a result, information from many time steps earlier can still reach the memory cell of the latest time step, which eliminates the effect of short-term memory. This core structure allows the LSTM to selectively forget unimportant information during processing and to retain important information according to learned weights. Thus, the LSTM is able to retain cell states far from the current moment and remember longer sequences [48]. The structure of one hidden layer cell is shown in Fig. 1.

Fig. 1

LSTM neural network structure schematic

Here \(X_{t}\) is the input of the input layer at time t; \(h_{t}\) is the output of the hidden layer cell at time t; \(C_{t}\) is the cell state at time t; \(f_{t}\) is the forget gate state at time t; \(i_{t}\) is the input gate state at time t; \(Ce_{t}\) is the candidate cell state at time t; \(o_{t}\) is the output gate state at time t; and \(\sigma _{1},\sigma _{2},\sigma _{3}\) and tanh are activation functions. The gates and the memory cell work as follows.

  1. 1.

    Forget Gate: decides which information should be kept and which forgotten. The output of the previous hidden layer cell and the input of the current hidden layer cell are fed through the forget gate into the \(\sigma _{1}\) activation function, and the output value \(f_{t}\), between 0 and 1, determines how much of the previous cell state \(C_{t-1}\) is discarded. Here \(f_{t}\) is calculated as

    $$\begin{aligned} f_{t}=\sigma _{1}(W_{f}[h_{t-1},X_{t}]+b_{f}). \end{aligned}$$
    (6)
  2. 2.

    Input Gate: the role of the input gate is to update the cell state. The input gate feeds the output of the previous hidden layer cell and the input of the current hidden layer cell into the \(\sigma _{2}\) activation function, whose output value \(i_{t}\), between 0 and 1, determines which information to update; values closer to 0 mean less important and values closer to 1 mean more important. Next, the output of the previous hidden layer cell and the input of the current hidden layer cell are fed into the tanh function to create a new candidate cell state \(Ce_{t}\). Finally, the output of the \(\sigma _{2}\) activation function is multiplied by the output of tanh, so that \(i_{t}\) determines which parts of \(Ce_{t}\) are important enough to be saved. The equations for \(i_{t}\) and \(Ce_{t}\) are

    $$\begin{aligned}&i_{t}=\sigma _{2}(W_{i}[h_{t-1},X_{t}]+b_{i}),\end{aligned}$$
    (7)
    $$\begin{aligned}&Ce_{t}=tanh(W_{Ce}[h_{t-1},X_{t}]+b_{Ce}). \end{aligned}$$
    (8)
  3. 3.

    Cell Status: the cell state \(C_{t-1}\) of the previous layer is multiplied point by point with the forgetting vector \(f_{t}\). If the result is closer to 0, then it means that the old cellular memory information needs to be forgotten in the new cellular state. This result is then summed point by point with the output value of the input gate, and the newly discovered important information of the cell structure is updated into the cell state. By updating the cell state through the above steps, the new cell state \(C_{t}\) is calculated as

    $$\begin{aligned} C_{t}=f_{t}\cdot C_{t-1}+i_{t}\cdot Ce_{t}. \end{aligned}$$
    (9)
  4. 4.

    Output Gate: the role of the output gate is to form the output value of the current hidden layer cell, which also serves as part of the input to the next hidden layer cell. The output of the previous hidden layer cell and the input of the current hidden layer cell are first fed into the \(\sigma _{3}\) activation function to form the output gate state \(o_{t}\); the updated cell state \(C_{t}\) is passed through the tanh function. Finally, the output of tanh is multiplied by the output of the \(\sigma _{3}\) activation function to form the output of the hidden layer cell and the information to be passed on. The output \(h_{t}\) is used as the output of the current cell and as input information for the next hidden layer cell. The output gate state \(o_{t}\) and the output result \(h_{t}\) are calculated as follows

    $$\begin{aligned}&o_{t}=\sigma _{3}(W_{o}[h_{t-1},X_{t}]+b_{o}),\end{aligned}$$
    (10)
    $$\begin{aligned}&h_{t}=o_{t}\cdot tanh(C_{t}). \end{aligned}$$
    (11)

In the LSTM neural network, information from different historical moments is not equally important to the current moment. By combining the memory cell and the “gate” structure described above, the information of historical moments can be selectively filtered to complete the update of the current state. When the forget vector \(f_{t}\ne 0\) is output from the forget gate, some of the information of the historical moments is preserved and fused with the current state, which allows the LSTM to cope with long-range dependence and to complete long-distance time-series prediction.
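The gate equations (6)-(11) can be summarized in a short forward pass. The NumPy sketch below is illustrative only: the weight shapes, the random initialisation and the toy input sequence are our own assumptions, not the model configuration used later in the paper.

```python
import numpy as np

# Minimal NumPy sketch of one LSTM cell step, following Eqs. (6)-(11).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step; W and b hold the parameters of the four gates."""
    z = np.concatenate([h_prev, x_t])             # [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ z + b["f"])            # forget gate, Eq. (6)
    i_t = sigmoid(W["i"] @ z + b["i"])            # input gate, Eq. (7)
    ce_t = np.tanh(W["c"] @ z + b["c"])           # candidate state, Eq. (8)
    c_t = f_t * c_prev + i_t * ce_t               # cell state update, Eq. (9)
    o_t = sigmoid(W["o"] @ z + b["o"])            # output gate, Eq. (10)
    h_t = o_t * np.tanh(c_t)                      # hidden output, Eq. (11)
    return h_t, c_t

# Toy dimensions: input size 3, hidden size 4.
n_in, n_h = 3, 4
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_h, n_h + n_in)) * 0.1 for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.standard_normal((5, n_in)):          # a short input sequence
    h, c = lstm_step(x, h, c, W, b)
print(h)
```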

2.3 Particle swarm optimization

Particle Swarm Optimization (PSO), also known as the bird-flock foraging algorithm, was first introduced by Eberhart and Kennedy [49] in 1995 after observational studies of the feeding behavior of bird flocks. In a given area there are food targets of different sizes, and the individuals in the flock share the best food found so far, so that the whole flock can search out the best food target in the area. The essence of the PSO algorithm is a simplified model of swarm intelligence based on this kind of group behavior: by sharing the best information found by individuals, the whole group moves in a specific space from disorder to order and finally finds the optimal solution.

Definition 3

[50] Assume that in a d-dimensional target space, the search community consists of n particles, where the i-th d-dimensional particle can be expressed as

$$\begin{aligned} X_{i}=\{x_{i1},x_{i2},...,x_{id}\}. \end{aligned}$$
(12)

The flight speed can be expressed as

$$\begin{aligned} V_{i}=\{v_{i1},v_{i2},...,v_{id}\}. \end{aligned}$$
(13)

The historical best position can be expressed as

$$\begin{aligned} P_{i}=\{p_{i1},p_{i2},...,p_{id}\}. \end{aligned}$$
(14)

As the particle evolves from generation t to generation \(t+1\), its velocity and position are updated as follows

$$\begin{aligned}&v_{ij}(t+1)=wv_{ij}(t)+c_{1}r_{1}(t)(p_{ij}(t)-x_{ij}(t))+c_{2}r_{2}(t)(p_{gj}(t)-x_{ij}(t)),\end{aligned}$$
(15)
$$\begin{aligned}&x_{ij}(t+1)=x_{ij}(t)+v_{ij}(t+1), \end{aligned}$$
(16)

where w represents the inertia weight, which determines the influence of the particle’s previous velocity on the current iteration and balances global and local search; \(P_{g}=\{p_{g1},p_{g2},...,p_{gd}\}\) represents the best position found by the swarm up to the current iteration; \(c_{1}\) and \(c_{2}\) are learning factors that control, respectively, how strongly the particle is drawn toward its own historical best and toward the global best of the swarm; and \(r_{1},r_{2}\) are random numbers in [0, 1]. In addition, upper and lower bounds on \(X_{i}\) are imposed to ensure that the search stays within a given region.

According to Definition 3, we can get the implementation flow of the PSO algorithm as follows:

Step 1. Initialize: randomly generate the velocity and position of each individual in the population;

Step 2. Evaluate the particles: calculate the fitness of each individual. In general, the optimization goal is to minimize a function value, so the smaller the calculated function value, the greater the fitness;

Step 3. Update the best: compare the individual fitness with its historical best, if it is better than the historical best, then update its historical best \(P_{i}\); compare the individual fitness with the population best, if it is better than the population best, then update the population best \(P_{g}\);

Step 4. Update the particle: update the velocity and position of the particle according to (15) and (16);

Step 5. Output the optimal value: determine whether the termination condition is satisfied; if so, output the optimal value, otherwise return to Step 2.
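The update rules (15)-(16) and Steps 1-5 can be condensed into a few lines. The following sketch, with an illustrative sphere objective and arbitrarily chosen parameter values, is one possible realisation rather than the exact implementation used in this paper.

```python
import numpy as np

# Minimal PSO sketch following Eqs. (15)-(16) and Steps 1-5.

def pso(fitness, dim, n_particles=20, n_iter=100,
        w=0.7, c1=2.0, c2=2.0, lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, (n_particles, dim))       # Step 1: positions
    v = np.zeros((n_particles, dim))                  # Step 1: velocities
    p_best, p_val = x.copy(), np.array([fitness(xi) for xi in x])
    g_best = p_best[p_val.argmin()].copy()            # global best P_g
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (15)
        x = np.clip(x + v, lb, ub)                    # Eq. (16) plus bounds
        val = np.array([fitness(xi) for xi in x])     # Step 2: evaluate
        improved = val < p_val                        # Step 3: update bests
        p_best[improved], p_val[improved] = x[improved], val[improved]
        g_best = p_best[p_val.argmin()].copy()        # Step 4 done above
    return g_best, p_val.min()                        # Step 5: output

best_x, best_f = pso(lambda z: float(np.sum(z ** 2)), dim=3)
print(best_x, best_f)
```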

3 A water quality prediction method based on k-nearest-neighbor probability rough sets and PSO-LSTM

First, we establish an attribute reduction approach based on k-nearest-neighbor probability rough sets (KNPRSs) to reduce the many high-dimensional water quality indicators. This pre-processing makes the input dimension of the water quality prediction model more concise, thereby improving both the accuracy and the time efficiency of the prediction model.

3.1 K-nearest-neighbor probability rough sets

First of all, we introduce the parameter k to define the k-nearest-neighbor class and the corresponding k-nearest-neighbor relation on the universe.

Definition 4

Given a decision information system \(DS=\langle U,C\cup D\rangle \). For any \(x\in U\), \(B\subseteq C\), define the k-nearest-neighbor class \(R^{k}_{B}(x)\) of the attribute subset B with respect to the object x as

$$\begin{aligned} R^{k}_{B}(x)=\bigcap \limits _{a\in B}R^{k}_{a}(x), \end{aligned}$$
(17)

where \(R^{k}_{a}(x)\) denotes the set of k objects (including x itself) nearest to x with respect to the attribute a in the universe U and k is a given positive integer.

Besides, \(R^{k}_{B}(x)\) is also called the k-nearest-neighbor information granule of the object x, and its size is controlled by the parameter k. In particular, the granule is smallest when \(k=1\), where \(R^{k}_{B}(x)=\{x\}\), and largest when \(k=|U|\), where \(R^{k}_{B}(x)=U\).

Then, we can define k-nearest-neighbor relation on the universe U.

Definition 5

A certain k-nearest-neighbor relation on the universe U can be defined using the matrix relation \(N^{k}_{B}=(r^{k}_{ij})_{n\times n}\), where

$$\begin{aligned} r^{k}_{ij}={\left\{ \begin{array}{ll} 1 &{} x_{j}\in R^{k}_{B}(x_{i}),\\ 0 &{} \text{ Otherwise. } \end{array}\right. } \end{aligned}$$
(18)

Obviously, \(N^{k}_{B}\) satisfies \(r^{k}_{ii}=1\), but may not satisfy \(r^{k}_{ij}=r^{k}_{ji}\), which may be an asymmetric matrix.
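Definitions 4 and 5 can be implemented directly by ranking the per-attribute distances. The sketch below, on a small synthetic data set, computes \(R^{k}_{B}(x)\) for every object and assembles the relation matrix \(N^{k}_{B}\); the toy data and the use of absolute differences as the per-attribute distance are illustrative assumptions.

```python
import numpy as np

# Sketch of Definitions 4-5: k-nearest-neighbor classes R^k_B(x) and the
# relation matrix N^k_B on synthetic data.

def knn_class(data, B, k):
    """R^k_B(x_i) for every object i, as a boolean membership matrix."""
    n = data.shape[0]
    member = np.ones((n, n), dtype=bool)
    for a in B:                                   # intersect over attributes in B
        dist = np.abs(data[:, a][:, None] - data[:, a][None, :])
        # indices of the k objects closest to x_i w.r.t. attribute a
        idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
        mask = np.zeros((n, n), dtype=bool)
        np.put_along_axis(mask, idx, True, axis=1)
        member &= mask                            # Eq. (17): R^k_B = ∩_a R^k_a
    return member

def knn_relation(data, B, k):
    """N^k_B from Eq. (18): r_ij = 1 iff x_j ∈ R^k_B(x_i)."""
    return knn_class(data, B, k).astype(int)

data = np.array([[0.1, 1.0], [0.2, 0.9], [0.8, 0.2], [0.9, 0.3], [0.5, 0.5]])
N = knn_relation(data, B=[0, 1], k=3)
print(N)            # generally asymmetric, with an all-ones diagonal
```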

It is natural that we can obtain the following property.

Proposition 1

Given a decision information system \(\langle U,C\cup D\rangle \), the following statements hold.

  1. 1.

    If \(B_{1}\subseteq B_{2}\subseteq C\), then \(N^{k}_{B_{1}}\supseteq N^{k}_{B_{2}}\);

  2. 2.

    If \(k_{1}\le k_{2}\), then \(N^{k_{1}}_{B}\subseteq N^{k_{2}}_{B}\).

Proof

  1. 1.

    By Definition 4, it follows that \(R^{k}_{B_{1}\cup B_{2}}(x)=\bigcap \limits _{a\in B_{1}\cup B_{2}}R^{k}_{a}(x)=R^{k}_{B_{1}}(x)\bigcap R^{k}_{B_{2}}(x)\). Thus, one gets that \(N^{k}_{B_{1}\bigcup B_{2}}=N^{k}_{B_{1}}\bigcap N^{k}_{B_{2}}\). Since \(B_{2}=B_{1}\bigcup (B_{2}-B_{1})\), one gets that \(N^{k}_{B_{2}}=N^{k}_{B_{1}}\bigcap N^{k}_{B_{2}-B_{1}}\). Therefore, one concludes that \(N^{k}_{B_{1}}\supseteq N^{k}_{B_{2}}\).

  2. 2.

    Since \(k_{1}\le k_{2}\), one gets that \(\forall a\in B\), \(R^{k_{1}}_{a}(x)\subseteq R^{k_{2}}_{a}(x)\) and \(R^{k_{1}}_{B}(x)\subseteq R^{k_{2}}_{B}(x)\). Thus, one concludes that \(N^{k_{1}}_{B}\subseteq N^{k_{2}}_{B}\).

\(\square \)

In what follows, we give the definition of the k-nearest-neighbor probability rough sets by using the k-nearest-neighbor class.

Definition 6

Suppose that \(\langle U,C\cup D\rangle \) is a decision information system and \(R^{k}_{B}(x)\) is a k-nearest-neighbor class of any \(x\in U\) generated by the conditional attribute set \(B\subseteq C\). Then, for a pair of parameters \(\alpha \) and \(\beta \) with \(0\le \beta <\alpha \le 1\), the lower approximation and upper approximation of any \(X\subseteq U\) with respect to the conditional attribute set B are defined, respectively, as follows:

$$\begin{aligned}&\underline{Apr^{k,\alpha }_{B}}(X)=\{x\in U~|~P(X|R^{k}_{B}(x))\ge \alpha \},\end{aligned}$$
(19)
$$\begin{aligned}&\overline{Apr^{k,\beta }_{B}}(X)=\{x\in U~|~P(X|R^{k}_{B}(x))> \beta \}. \end{aligned}$$
(20)

If \(\underline{Apr^{k,\alpha }_{B}}(X)=\overline{Apr^{k,\beta }_{B}}(X)\), then we say that X is a definable set. Otherwise, X is a rough set.

Assume that the universe U is divided into the crisp equivalence classes \(U/D=\{X_{1},X_{2},...,X_{n}\}\) by the decision attribute D. Then, the upper and lower approximations of D can be defined as follows.

Definition 7

Given a decision information system \(\langle U,C\cup D\rangle \), \(U/D=\{X_{1},X_{2},...,X_{n}\}\), and \(R^{k}_{B}(x)\) is a k-nearest-neighbor class generated by the conditional attribute subset \(B\subseteq C\), with parameters \(0\le \beta <\alpha \le 1\). Then, we can define the lower approximation and upper approximation of D with respect to the conditional attribute set B as

$$\begin{aligned}&\underline{Apr^{k,\alpha }_{B}}(D)=\bigcup \limits _{i=1}^{n}\underline{Apr^{k,\alpha }_{B}}(X_{i}),\end{aligned}$$
(21)
$$\begin{aligned}&\overline{Apr^{k,\beta }_{B}}(D)=\bigcup \limits _{i=1}^{n}\overline{Apr^{k,\beta }_{B}}(X_{i}), \end{aligned}$$
(22)

where

$$\begin{aligned}&\underline{Apr^{k,\alpha }_{B}}(X_{i})=\{x\in U~|~P(X_{i}|R^{k}_{B}(x))\ge \alpha \},\end{aligned}$$
(23)
$$\begin{aligned}&\overline{Apr^{k,\beta }_{B}}(X_{i})=\{x\in U~|~P(X_{i}|R^{k}_{B}(x))> \beta \}. \end{aligned}$$
(24)

At the same time, we can also obtain the decision positive region of D with respect to the conditional attribute set B, which is listed as follows:

$$\begin{aligned} POS_{B}(D)=\underline{Apr^{k,\alpha }_{B}}(D)=\bigcup \limits _{i=1}^{n}\underline{Apr^{k,\alpha }_{B}}(X_{i}). \end{aligned}$$
(25)

The following example is employed to illustrate the novel model of KNPRSs.

Example 1

As shown in Table 1, given a decision information system \(\langle U,C\cup D\rangle \) with \(B\subseteq C\), there are 8 samples \(\{x_{1},x_{2},...,x_{8}\}\), 4 conditional attributes \(\{a_{1},a_{2},a_{3},a_{4}\}\) and a decision attribute d. According to the values of the decision attribute d, U is divided into \(U/D=\{D_{1},D_{2}\}\), where \(D_{1}=\{x_{1},x_{2},x_{3},x_{4}\}\) and \(D_{2}=\{x_{5},x_{6},x_{7},x_{8}\}\). We also divide the conditional attributes into two groups \(B_{1}=\{a_{1},a_{2}\}\) and \(B_{2}=\{a_{3},a_{4}\}\).

By Definition 4, with \(k=4\) and using the Euclidean distance, we get \(R^{4}_{a_{1}}(x_{1})=\{x_{1},x_{2},x_{3},x_{5}\}\), \(R^{4}_{a_{2}}(x_{1})=\{x_{1},x_{2},x_{3},x_{4}\}\), \(R^{4}_{a_{3}}(x_{1})=\{x_{1},x_{2},x_{5},x_{6}\}\) and \(R^{4}_{a_{4}}(x_{1})=\{x_{1},x_{2},x_{3},x_{6}\}\). Therefore, \(R^{4}_{B_{1}}(x_{1})=R^{4}_{a_{1}}(x_{1})\cap R^{4}_{a_{2}}(x_{1})=\{x_{1},x_{2},x_{3}\}\) and \(R^{4}_{B_{2}}(x_{1})=R^{4}_{a_{3}}(x_{1})\cap R^{4}_{a_{4}}(x_{1})=\{x_{1},x_{2},x_{6}\}\). The result is shown in Table 2. Then, we can separately obtain the k-nearest-neighbor relations \(N^{4}_{B_{1}}\) and \(N^{4}_{B_{2}}\) of the attribute subsets \(B_{1}\) and \(B_{2}\) as follows:

$$\begin{aligned} N^{4}_{B_{1}}= \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0\\ 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \end{pmatrix}, \quad N^{4}_{B_{2}}= \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}. \end{aligned}$$

We can find that the k-nearest-neighbor relations \(N^{4}_{B_{1}}\) and \(N^{4}_{B_{2}}\) are asymmetric. Let \(\alpha =0.8\) and \(\beta =0.3\). Then, according to Definition 6, the lower and upper approximations of \(D_{1}\) and \(D_{2}\) with respect to the attribute subsets \(B_{1}\) and \(B_{2}\) can be obtained as follows, respectively,

  1. 1.

    \( \underline{Apr^{4,0.8}_{B_{1}}}(D_{1})=\{x_{1},x_{2},x_{3},x_{4}\}\), \( \underline{Apr^{4,0.8}_{B_{1}}}(D_{2})=\{x_{5},x_{6},x_{7},x_{8}\}\);

  2. 2.

    \( \overline{Apr^{4,0.3}_{B_{1}}}(D_{1})=\{x_{1},x_{2},x_{3},x_{4}\}\), \( \overline{Apr^{4,0.3}_{B_{1}}}(D_{2})=\{x_{5},x_{6},x_{7},x_{8}\}\);

  3. 3.

    \( \underline{Apr^{4,0.8}_{B_{2}}}(D_{1})=\{x_{4}\}\), \( \underline{Apr^{4,0.8}_{B_{2}}}(D_{2})=\{x_{5}\}\);

  4. 4.

    \( \overline{Apr^{4,0.3}_{B_{2}}}(D_{1})=\{x_{1},x_{2},x_{3},x_{4},x_{6},x_{7},x_{8}\}\), \( \overline{Apr^{4,0.3}_{B_{2}}}(D_{2})=\{x_{1},x_{2},x_{3},x_{6},x_{7},x_{8}\}\).

Table 1 Decision information system
Table 2 k-nearest-neighbor class

Now, we present some fundamental properties of KNPRSs.

Proposition 2

Given a decision information system \(\langle U,C\cup D\rangle \) with parameters \(0\le \beta <\alpha \le 1\). Then, for any \(B\subseteq C\), the following statements hold.

  1. 1.

    \(\underline{Apr^{k,\alpha }_{B}}(\emptyset )=\overline{Apr^{k,\beta }_{B}}(\emptyset )=\emptyset \);

  2. 2.

    \(\underline{Apr^{k,\alpha }_{B}}(U)=\overline{Apr^{k,\beta }_{B}}(U)=U\);

  3. 3.

    \(\forall X\subseteq U\), \(\underline{Apr^{k,\alpha }_{B}}(X)\subseteq X\subseteq \overline{Apr^{k,\beta }_{B}}(X)\);

  4. 4.

    \(\forall X_{1},X_{2}\subseteq U\), \(\underline{Apr^{k,\alpha }_{B}}(X_{1}\cup X_{2})\supseteq \underline{Apr^{k,\alpha }_{B}}(X_{1})\cup \underline{Apr^{k,\alpha }_{B}}(X_{2})\);

  5. 5.

    \(\forall X_{1},X_{2}\subseteq U\), \(\overline{Apr^{k,\beta }_{B}}(X_{1}\cup X_{2})\supseteq \overline{Apr^{k,\beta }_{B}}(X_{1})\cup \overline{Apr^{k,\beta }_{B}}(X_{2})\);

  6. 6.

    \(\forall X_{1},X_{2}\subseteq U\), \(\underline{Apr^{k,\alpha }_{B}}(X_{1}\cap X_{2})\subseteq \underline{Apr^{k,\alpha }_{B}}(X_{1})\cap \underline{Apr^{k,\alpha }_{B}}(X_{2})\);

  7. 7.

    \(\forall X_{1},X_{2}\subseteq U\), \(\overline{Apr^{k,\beta }_{B}}(X_{1}\cap X_{2})\subseteq \overline{Apr^{k,\beta }_{B}}(X_{1})\cap \overline{Apr^{k,\beta }_{B}}(X_{2})\).

Proof

We only prove items 4 and 6. The others can be directly obtained by Definitions 6 and 7.

4. \(\forall x\in \underline{Apr^{k,\alpha }_{B}}(X_{1})\cup \underline{Apr^{k,\alpha }_{B}}(X_{2})\), at least one of the following two equations holds

$$\begin{aligned} P(X_{1}~|~R^{k}_{B}(x))=\frac{|X_{1}\cap R^{k}_{B}(x)|}{|R^{k}_{B}(x)|}\ge \alpha ,\\ P(X_{2}~|~R^{k}_{B}(x))=\frac{|X_{2}\cap R^{k}_{B}(x)|}{|R^{k}_{B}(x)|}\ge \alpha . \end{aligned}$$

Since

$$\begin{aligned} (X_{1}\cup X_{2})\cap R^{k}_{B}(x)=(X_{1}\cap R^{k}_{B}(x))\cup (X_{2}\cap R^{k}_{B}(x)). \end{aligned}$$

Then, we can get

$$\begin{aligned} P(X_{1}\cup X_{2}~|~R^{k}_{B}(x))&=\frac{|(X_{1}\cup X_{2})\cap R^{k}_{B}(x)|}{|R^{k}_{B}(x)|}\\&\ge \max \{P(X_{1}~|~R^{k}_{B}(x)),P(X_{2}~|~R^{k}_{B}(x))\}\ge \alpha . \end{aligned}$$

Thus, one concludes that \(\forall X_{1},X_{2}\subseteq U\), \(\underline{Apr^{k,\alpha }_{B}}(X_{1}\cup X_{2})\supseteq \underline{Apr^{k,\alpha }_{B}}(X_{1})\cup \underline{Apr^{k,\alpha }_{B}}(X_{2})\).

6. \(\forall x\in \underline{Apr^{k,\alpha }_{B}}(X_{1}\cap X_{2})\), we can get

$$\begin{aligned} P(X_{1}\cap X_{2}~|~R^{k}_{B}(x))=\frac{|(X_{1}\cap X_{2})\cap R^{k}_{B}(x)|}{|R^{k}_{B}(x)|}\ge \alpha . \end{aligned}$$

Since \((X_{1}\cap X_{2})\cap R^{k}_{B}(x)\) is a subset of \((X_{1}\cap R^{k}_{B}(x))\) and \((X_{2}\cap R^{k}_{B}(x))\), one gets that

$$\begin{aligned} P(X_{1}~|~R^{k}_{B}(x))= & {} \frac{|X_{1}\cap R^{k}_{B}(x)|}{|R^{k}_{B}(x)|}\\\ge & {} \frac{|(X_{1}\cap X_{2})\cap R^{k}_{B}(x)|}{|R^{k}_{B}(x)|}\ge \alpha ,\\ P(X_{2}~|~R^{k}_{B}(x))= & {} \frac{|X_{2}\cap R^{k}_{B}(x)|}{|R^{k}_{B}(x)|}\\\ge & {} \frac{|(X_{1}\cap X_{2})\cap R^{k}_{B}(x)|}{|R^{k}_{B}(x)|}\ge \alpha . \end{aligned}$$

i.e., \(x\in \underline{Apr^{k,\alpha }_{B}}(X_{1})\cap \underline{Apr^{k,\alpha }_{B}}(X_{2})\). Therefore, one concludes that \(\underline{Apr^{k,\alpha }_{B}}(X_{1}\cap X_{2})\subseteq \underline{Apr^{k,\alpha }_{B}}(X_{1})\cap \underline{Apr^{k,\alpha }_{B}}(X_{2})\).

When \(\alpha \) and \(\beta \) take the endpoint values \(\alpha =1\) and \(\beta =0\), the KNPRSs also have the following properties.

Proposition 3

Given a decision information system \(\langle U,C\cup D\rangle \), \(B_{1}\subseteq B_{2}\subseteq C\), the following statements hold.

  1. 1.

    \(\forall X\subseteq U\), \(\underline{Apr^{k,1}_{B_{1}}}(X)\subseteq \underline{Apr^{k,1}_{B_{2}}}(X)\);

  2. 2.

    \(\forall X\subseteq U\), \(\overline{Apr^{k,0}_{B_{1}}}(X)\supseteq \overline{Apr^{k,0}_{B_{2}}}(X)\).

Proof

Since \(B_{1}\subseteq B_{2}\), by Definition 4, we get that \(\forall x\in U\), \(R^{k}_{B_{2}}(x)\subseteq R^{k}_{B_{1}}(x)\).

  1. 1.

    According to Definition 6, we can get

    $$\begin{aligned} \underline{Apr^{k,1}_{B}}(X)&=\{x~|~P(X|R^{k}_{B}(x))\ge 1\}\\&=\{x~|~X\cap R^{k}_{B}(x)=R^{k}_{B}(x)\}. \end{aligned}$$

    So \(\forall x\in \underline{Apr^{k,1}_{B_{1}}}(X)\), we can get

    $$\begin{aligned} X\cap R^{k}_{B_{1}}(x)=R^{k}_{B_{1}}(x)\supseteq R^{k}_{B_{2}}(x). \end{aligned}$$

    Since \(R^{k}_{B_{2}}(x)\subseteq R^{k}_{B_{1}}(x)\subseteq X\), it follows that \(X\cap R^{k}_{B_{2}}(x)=R^{k}_{B_{2}}(x)\), i.e., \(x\in \underline{Apr^{k,1}_{B_{2}}}(X)\). Thus, one concludes that \(\underline{Apr^{k,1}_{B_{1}}}(X)\subseteq \underline{Apr^{k,1}_{B_{2}}}(X)\).

  2. 2.

    According to Definition 6, we can get

    $$\begin{aligned} \overline{Apr^{k,0}_{B}}(X)=\{x~|~P(X|R^{k}_{B}(x))>0\}=\{x~|~X\cap R^{k}_{B}(x)\ne \emptyset \}. \end{aligned}$$

    So \(\forall x\in \overline{Apr^{k,0}_{B_{2}}}(X)\), we can get

    $$\begin{aligned} X\cap R^{k}_{B_{2}}(x)\ne \emptyset . \end{aligned}$$

    Since \(X\cap R^{k}_{B_{2}}(x)\subseteq X\cap R^{k}_{B_{1}}(x)\), one has that \(X\cap R^{k}_{B_{1}}(x)\ne \emptyset \), i.e., \(x\in \overline{Apr^{k,0}_{B_{1}}}(X)\). Therefore, one concludes that \(\overline{Apr^{k,0}_{B_{1}}}(X)\supseteq \overline{Apr^{k,0}_{B_{2}}}(X)\).

\(\square \)

When \(\alpha \) and \(\beta \) take other, non-endpoint values, it can be shown that \(\underline{Apr^{k,\alpha }_{B}}(X)\) is left-continuous with respect to \(\alpha \) and \(\overline{Apr^{k,\beta }_{B}}(X)\) is right-continuous with respect to \(\beta \).

Proposition 4

Given a decision information system \(\langle U,C\cup D\rangle \), for any \(0<\gamma <1\), \(B\subseteq C\) and \(X\subseteq U\), the following statements hold.

  1. 1.

    \(\underset{\alpha \rightarrow \gamma ^{+}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)=\underset{\gamma <\alpha \le 1}{\bigcup }\underline{Apr^{k,\alpha }_{B}}(X)\);

  2. 2.

    \(\underset{\alpha \rightarrow \gamma ^{-}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)=\underset{0<\alpha <\gamma }{\bigcap }\underline{Apr^{k,\alpha }_{B}}(X)=\underline{Apr^{k,\gamma }_{B}}(X)\);

  3. 3.

    \(\underset{\beta \rightarrow \gamma ^{+}}{\lim }\overline{Apr^{k,\beta }_{B}}(X)=\underset{\gamma <\beta \le 1}{\bigcup }\overline{Apr^{k,\beta }_{B}}(X)=\overline{Apr^{k,\gamma }_{B}}(X)\);

  4. 4.

    \(\underset{\beta \rightarrow \gamma ^{-}}{\lim }\overline{Apr^{k,\beta }_{B}}(X)=\underset{0<\beta <\gamma }{\bigcap }\overline{Apr^{k,\beta }_{B}}(X)\).

Proof

  1. 1.

    When \(\alpha \rightarrow \gamma ^{+}\), obviously \(\underset{\alpha \rightarrow \gamma ^{+}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)\subseteq \underset{\gamma <\alpha \le 1}{\bigcup }\underline{Apr^{k,\alpha }_{B}}(X)\) holds. Hence, we only need to show that

    $$\begin{aligned} \bigcup _{\gamma <\alpha \le 1}\underline{Apr^{k,\alpha }_{B}}(X)\subseteq \lim _{\alpha \rightarrow \gamma ^{+}}\underline{Apr^{k,\alpha }_{B}}(X). \end{aligned}$$

    \(\forall x\in \underset{\gamma <\alpha \le 1}{\bigcup }\underline{Apr^{k,\alpha }_{B}}(X)\), \(\exists ~\xi >0\), \(\xi \rightarrow 0\),

    $$\begin{aligned} P(X~|~R^{k}_{B}(x))\ge \gamma +\xi >\alpha . \end{aligned}$$

    Then, we can get \(x\in \underset{\alpha \rightarrow \gamma ^{+}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)\), i.e., \(\underset{\gamma <\alpha \le 1}{\bigcup }\underline{Apr^{k,\alpha }_{B}}(X)\subseteq \underset{\alpha \rightarrow \gamma ^{+}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)\). Therefore, one concludes that \(\underset{\alpha \rightarrow \gamma ^{+}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)=\underset{\gamma <\alpha \le 1}{\bigcup }\underline{Apr^{k,\alpha }_{B}}(X)\).

  2. 2.

    When \(\alpha \rightarrow \gamma ^{-}\), obviously \(\underset{0<\alpha <\gamma }{\bigcap }\underline{Apr^{k,\alpha }_{B}}(X)\subseteq \underset{\alpha \rightarrow \gamma ^{-}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)\). Similarly, we only need to show that

    $$\begin{aligned} \lim _{\alpha \rightarrow \gamma ^{-}}\underline{Apr^{k,\alpha }_{B}}(X)\subseteq \bigcap _{0<\alpha <\gamma }\underline{Apr^{k,\alpha }_{B}}(X). \end{aligned}$$

    \(\forall x\in \underset{\alpha \rightarrow \gamma ^{-}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)\), \(\exists ~\xi >0\), \(\xi \rightarrow 0\),

    $$\begin{aligned} P(X~|~R^{k}_{B}(x))\ge \alpha >\gamma -\xi . \end{aligned}$$

    Then, we can get \(x\in \underset{0<\alpha <\gamma }{\bigcap }\underline{Apr^{k,\alpha }_{B}}(X)\),

    $$\begin{aligned} \lim _ {\alpha \rightarrow \gamma ^{-}}\underline{Apr^{k,\alpha }_{B}}(X)=\bigcap _{0<\alpha <\gamma }\underline{Apr^{k,\alpha }_{B}}(X). \end{aligned}$$

    Since \(\alpha <\gamma \), one gets that \(\underline{Apr^{k,\alpha }_{B}}(X)\supseteq \underline{Apr^{k,\gamma }_{B}}(X)\). Therefore, we can get

    $$\begin{aligned} \lim _{\alpha \rightarrow \gamma ^{-}}\underline{Apr^{k,\alpha }_{B}}(X)\supseteq \underline{Apr^{k,\gamma }_{B}}(X). \end{aligned}$$

    \(\forall x\in \underset{\alpha \rightarrow \gamma ^{-}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)\), when \(\xi \rightarrow 0\), \(P(X~|~R^{k}_{B}(x))\ge \alpha >\gamma \), we can get

    $$\begin{aligned} \lim _{\alpha \rightarrow \gamma ^{-}}\underline{Apr^{k,\alpha }_{B}}(X)\subseteq \underline{Apr^{k,\gamma }_{B}}(X). \end{aligned}$$

    Thus, one concludes that \(\underset{\alpha \rightarrow \gamma ^{-}}{\lim }\underline{Apr^{k,\alpha }_{B}}(X)=\underset{0<\alpha <\gamma }{\bigcap }\underline{Apr^{k,\alpha }_{B}}(X)=\underline{Apr^{k,\gamma }_{B}}(X)\).

Items 3 and 4 can be proved in the same way, so we do not repeat them here.\(\square \)

Then, in order to more precisely express the accuracy of the set, we define the degree of dependence of the decision attribute D on the conditional attribute set B as follows.

Definition 8

Given a decision information system \(\langle U,C\cup D\rangle \), \(B\subseteq C\), k is a given positive integer, and the parameters \(0\le \beta <\alpha \le 1\), then the dependence of the decision D on the conditional attribute set B can be defined as

$$\begin{aligned} \gamma _{B}(D)=\frac{|POS_{B}(D)|}{|U|}. \end{aligned}$$
(26)

The positive region \(POS_{B}(D)=\underline{Apr^{k,\alpha }_{B}}(D)\) abstractly represents all deterministic rules, where the number of elements is the number of all deterministic rules, and the dependence function \(\gamma _{B}(D)\) is the proportion of deterministic rules among all rules, quantifying the consistency of the conditional attribute B with respect to the decision D for a given positive integer k.

According to \(POS_{B}(D)\subseteq U\), we have \(0\le \gamma _{B}(D)\le 1\). Obviously, \(POS_{B}(D)=U\Leftrightarrow \gamma _{B}(D)=1\Leftrightarrow \langle U,C\cup D\rangle \) is a consistent decision information system. Meanwhile, the larger the dependency \(\gamma _{B}(D)\), the greater the role of the conditional attribute B in determining the decision D.

Example 2

(Continued from Example 1) According to \(POS_{B}(D)=\bigcup \limits _{i=1}^{n}\underline{Apr^{k,\alpha }_{B}}(D_{i})\), we can get that \(POS_{B_{1}}(D)=U\) and \(POS_{B_{2}}(D)=\{x_{4}, x_{5}\}\). By Definition 8, we get that \(\gamma _{B_{1}}(D)=1\) and \(\gamma _{B_{2}}(D)=0.25\). From the dependence of KNPRSs, the consistency of \(B_{1}\) is better than that of \(B_{2}\), and the conditional attribute \(B_{1}\) has a greater role in determining the decision D than the conditional attribute \(B_{2}\).
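Definitions 6-8 reduce to a few set operations once the k-nearest-neighbor classes are available. The sketch below takes a Boolean membership matrix (row i encoding \(R^{k}_{B}(x_{i})\), e.g., as produced by the knn_class sketch earlier in this section) and the decision labels, and returns the dependence \(\gamma _{B}(D)\); the toy matrix and labels are illustrative.

```python
import numpy as np

# Sketch of Definitions 6-8: positive region and dependence degree gamma_B(D).
# `member` encodes the k-nearest-neighbor classes; `labels` encodes U/D.

def dependence(member, labels, alpha):
    n = member.shape[0]
    pos = np.zeros(n, dtype=bool)
    for d in np.unique(labels):                    # each decision class X_i
        X = labels == d
        # P(X_i | R^k_B(x)) = |R^k_B(x) ∩ X_i| / |R^k_B(x)| for every x
        p = (member & X).sum(axis=1) / member.sum(axis=1)
        pos |= p >= alpha                          # Eqs. (23) and (25)
    return pos.sum() / n                           # Eq. (26)

member = np.array([[1, 1, 0, 0],                   # R^k_B(x_1) = {x_1, x_2}
                   [1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [0, 1, 1, 1]], dtype=bool)
labels = np.array([0, 0, 1, 1])
print(dependence(member, labels, alpha=0.8))       # -> 0.75 for this toy case
```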

3.2 An attribute reduction approach of water quality indicators based on KNPRSs

From the previous subsection we know that the dependence of the decision attribute D reflects the classification ability of an attribute subset. In this subsection we therefore measure the importance of a single attribute to an attribute subset by the difference between the dependence of D on the original attribute set and its dependence on the attribute set with that attribute removed.

Definition 9

Given a decision information system \(\langle U,C\cup D\rangle \), \(B\subseteq C\), \(a\in B\), define the importance of attribute a for the conditional attribute B under decision D as

$$\begin{aligned} sig(a,B,D)=\gamma _{B}(D)-\gamma _{B-\{a\}}(D). \end{aligned}$$
(27)

From Definition 9, we find that if \(sig(a,B,D)\le 0\), then the elimination of attribute a improves the classification ability of the attribute set or has no effect on the classification ability, and a is said to be unnecessary in B with respect to D and can be eliminated. Conversely, if \(sig(a,B,D)>0\), then the elimination of attribute a has a negative effect on the classification ability of the attribute set, and a is necessary in B with respect to D and should be retained.

Definition 10

Assume that \(\langle U,C\cup D\rangle \) is a decision information system and \(P\subseteq C\). Then, P is said to be an attribute reduction of C with respect to D if there is \(\gamma _{P}(D)\ge \gamma _{C}(D)\) and any of the conditional attributes in P is necessary in P with respect to D.

Based on Definitions 9 and 10, by iterating through all the attributes a in C, we can calculate the importance sig(a, B, D). Whenever \(sig(a,B,D)\le 0\), we eliminate the attribute a, and we continue until every remaining attribute is necessary; in this way we obtain the final core attributes and complete the attribute reduction. The specific decision-making steps and process are illustrated by Algorithm 1.

Algorithm 1

Attribute reduction algorithm based on KNPRSs.

In Algorithm 1, steps 2-8 compute the k-nearest-neighbor classes of the candidate reduct red, with time complexity \(\textrm{O}(|U|^{2}\times |C|)\). The time complexity of steps 10-15 is \(\textrm{O}(|U/D|)\). The time complexity of step 18 is \(\textrm{O}((|C|+1)\times |C|\div 2)\). The time complexity of steps 17-23 is \(\textrm{O}(|U|^{2}\times |C|\times (|C|+1)\times |C|\div 2 )\), which is also the overall time complexity of Algorithm 1.
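Since Algorithm 1 itself is presented only as a figure, the following Python sketch shows one plausible realisation of the backward-elimination procedure implied by Definitions 9-10: attributes with non-positive significance are removed until every remaining attribute is necessary. It is not necessarily the authors' exact procedure; the synthetic data, the per-attribute distance and the safeguard against emptying the attribute set are our own assumptions.

```python
import numpy as np

# Plausible sketch of KNPRSs-based attribute reduction (Definitions 9-10),
# re-implementing compactly the membership and dependence computations
# sketched earlier so that this block is self-contained.

def knn_member(data, B, k):
    """Boolean matrix whose row i encodes R^k_B(x_i) (Definition 4)."""
    n = data.shape[0]
    member = np.ones((n, n), dtype=bool)
    for a in B:
        dist = np.abs(data[:, a][:, None] - data[:, a][None, :])
        idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
        mask = np.zeros((n, n), dtype=bool)
        np.put_along_axis(mask, idx, True, axis=1)
        member &= mask
    return member

def dependence(data, labels, B, k, alpha):
    """gamma_B(D) from Eq. (26)."""
    member = knn_member(data, B, k)
    pos = np.zeros(data.shape[0], dtype=bool)
    for d in np.unique(labels):
        X = labels == d
        pos |= (member & X).sum(axis=1) / member.sum(axis=1) >= alpha
    return pos.mean()

def reduce_attributes(data, labels, k, alpha):
    """Drop attributes with sig(a, B, D) <= 0 until all remaining are necessary."""
    B = list(range(data.shape[1]))
    changed = True
    while changed:
        changed = False
        for a in list(B):
            sig = dependence(data, labels, B, k, alpha) - \
                  dependence(data, labels, [x for x in B if x != a], k, alpha)
            if sig <= 0 and len(B) > 1:        # Definition 9: a is unnecessary
                B.remove(a)
                changed = True
    return B

rng = np.random.default_rng(1)
data = rng.random((60, 5))                      # toy data: 60 samples, 5 attributes
labels = (data[:, 0] + data[:, 2] > 1.0).astype(int)
print(reduce_attributes(data, labels, k=5, alpha=0.8))
```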

Definition 11

Given a water quality indicator system \(\langle U,C\cup D\rangle \), where D is the target water quality indicator to be predicted and \(red \subseteq C\), we call red an attribute reduction of C if and only if \(\gamma _{red}(D)\ge \gamma _{C}(D)\) and every conditional attribute in red is necessary in red with respect to D.

3.3 A water quality prediction approach based on PSO-LSTM

Water quality data are collected mainly through wireless sensor monitoring equipment at the 13 automatic monitoring stations along the route; the equipment automatically collects water quality data every 6 hours to monitor the physical and biochemical characteristics and health of the water. The automatic monitoring stations have been in operation since the opening of the route, so a complete, continuous water quality time-series database is available. This paper therefore uses a recurrent-neural-network deep learning framework for targeted model training and prediction on the relevant water quality data. In this section, we optimize the Long Short-Term Memory (LSTM) network with respect to the temporal, unstable and non-linear characteristics of water quality monitoring data, and construct a model suitable for water quality prediction.

The LSTM neural network performs well on long time-series data thanks to its memory cell and “gate” structure, which suits the long span of data stored at the automatic monitoring stations of the South-North Water Transfer Middle Route. However, the 13 automatic monitoring stations along the middle route have different water quality conditions and different levels of complexity, so it is difficult to set LSTM hyperparameters suitable for every station by manual tuning without a prior analysis of the water quality mechanisms, which affects the accuracy of the LSTM water quality prediction model.

The PSO algorithm is a bionic global optimization algorithm that simulates the foraging process of bird flocks. We use PSO to optimize the initial hyperparameters of the LSTM: through iterative population updates, the optimal combination of hyperparameters is found, and the LSTM water quality prediction model is trained with this combination to achieve the best prediction performance. Therefore, in this paper, while constructing the LSTM network structure, the PSO algorithm is added to optimize the initial hyperparameters, producing a Long Short-Term Memory network based on Particle Swarm Optimization (PSO-LSTM). The specific iterative process is as follows.

Step 1 Initialize: initialize the hyperparameters of the LSTM: time step, number of hidden layer units, learning rate. Initialize the relevant parameters of the population;

Step 2 Input data: input the preprocessed data, after training/test set partitioning, to the LSTM model for training, and output the loss function;

Step 3 Update the particles: input the loss function as the objective function into the PSO algorithm, calculate the fitness value of each individual to evaluate the particles, and update the particle positions and velocities;

Step 4 Judge iteration: if the stopping condition is not satisfied, the current hyperparameters are updated in the LSTM model and Step 2 is repeated;

Step 5 Complete the final LSTM construction: if the stopping condition is satisfied, the optimized hyperparameters are output and the final LSTM is updated;

Step 6 Predict water quality: the final optimized PSO-LSTM is used to predict the water quality results.
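A possible shape of the fitness evaluation in Steps 2-3 is sketched below: each particle encodes the time step, the number of hidden units and the learning rate, and its fitness is the validation MSE of a briefly trained LSTM. The synthetic series, the PyTorch implementation, the two-layer stack and the epoch budget are illustrative assumptions, not the authors' exact training setup; the pso() sketch from Section 2.3 could then search the hyperparameter bounds given later in Section 4.1.3.

```python
import numpy as np
import torch
from torch import nn

# Sketch of a PSO fitness function for PSO-LSTM hyperparameter search.

def make_windows(series, timestep):
    X = np.stack([series[i:i + timestep] for i in range(len(series) - timestep)])
    y = series[timestep:]
    return torch.tensor(X, dtype=torch.float32).unsqueeze(-1), \
           torch.tensor(y, dtype=torch.float32).unsqueeze(-1)

def fitness(particle, series, epochs=10):
    timestep, units, lr = int(particle[0]), int(particle[1]), float(particle[2])
    X, y = make_windows(series, timestep)
    split = int(0.8 * len(X))
    lstm = nn.LSTM(input_size=1, hidden_size=units, num_layers=2, batch_first=True)
    head = nn.Linear(units, 1)
    opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):                       # short training run per particle
        opt.zero_grad()
        out, _ = lstm(X[:split])
        loss = loss_fn(head(out[:, -1, :]), y[:split])
        loss.backward()
        opt.step()
    with torch.no_grad():                         # validation MSE = fitness
        out, _ = lstm(X[split:])
        return loss_fn(head(out[:, -1, :]), y[split:]).item()

# Illustrative stand-in for a water quality series.
series = np.sin(np.linspace(0, 20, 400)) + 0.05 * np.random.randn(400)
print(fitness(np.array([20, 20, 0.002]), series))
```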

Algorithm 2

Water quality prediction method based on PSO-LSTM.

Fig. 2

PSO-LSTM algorithm flowchart

Table 3 Dissolved oxygen data ADF test results of station A

As shown in Fig. 2, the data are first input into the LSTM and the initial loss function is output as the fitness function of the population. Next, the velocity and position of the particles in the population are updated, and the algorithm checks whether the current optimum satisfies the termination condition. If not, the current hyperparameter combination is passed back to the LSTM model, which is retrained and outputs a new loss value; if it is satisfied, the optimal hyperparameter combination is output and the final LSTM is constructed. The specific decision-making steps and process are illustrated by Algorithm 2.

In Algorithm 2, the time complexity of PSO is \(\textrm{O}(n\times m)\), where n represents the population size and m the number of iterations. The time complexity of the LSTM in step 7 is \(\textrm{O}(h\times i + h^{2} + h)\), where h represents the hidden size and i the input size. The overall time complexity of Algorithm 2 is \(\textrm{O}((n\times m)(h\times i + h^{2} + h))\).

4 Data experiment

In this section, we validate the feasibility and superiority of water quality prediction method based on KNPRSs and PSO-LSTM.

The data in this paper are the water quality records of the 13 automatic monitoring stations of the South-North Water Transfer Middle Route in China from January 2019 to June 2021. Each station collects data automatically every 6 hours, giving 3000-3500 records per station. Twelve water quality indicators were monitored: water temperature, pH, conductivity, turbidity, dissolved oxygen (DO), ammonia nitrogen (NH3-N), permanganate index (COD Mn), dissolved organic matter, integrated biological toxicity, total phosphorus (TP), total nitrogen (TN) and chlorophyll a (chla).

4.1 Experiment preparation

4.1.1 Stability of time-series data

Stationarity is a very important statistical property of a time-series: only stationary time-series are valuable for prediction, otherwise the prediction results will suffer from lag and other problems. In this paper, the ADF (Augmented Dickey-Fuller) test is used to check the stationarity of the water quality series, taking the dissolved oxygen data of station A as an example. The results are shown in Table 3: the p-value of the test is 0.5172 and the T-statistic is -1.5328, which is greater than the critical value at the 10% significance level, so the series is non-stationary. After first-order differencing of the data, the p-value is less than 0.0001 and the T-statistic is -12.6644, which is less than the critical value at the 1% significance level, so the differenced series is stationary.

Then, the dissolved oxygen time-series of station A is first-order differenced to obtain a stationary time-series. As shown in Fig. 3, predictions made on the non-stationary water quality data are lagged and have low accuracy.
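For reproducibility, a small sketch of this stationarity check using statsmodels' ADF test is given below; the synthetic random-walk series merely stands in for the dissolved-oxygen record of station A, which is not reproduced here, so the printed statistics will differ from Table 3.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Sketch of the ADF stationarity check and first-order differencing (Section 4.1.1).

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=500)) + 8.0      # a non-stationary random walk

stat, pvalue, *_ = adfuller(series)
print(f"raw series:   ADF statistic = {stat:.4f}, p-value = {pvalue:.4f}")

diff = np.diff(series)                              # first-order difference
stat, pvalue, *_ = adfuller(diff)
print(f"differenced:  ADF statistic = {stat:.4f}, p-value = {pvalue:.4f}")
```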

Fig. 3

Predicted results of the DO of station A

4.1.2 Parameters of KNPRSs

Suppose that the water quality warning class of indicator i at time \(t+1\) is taken as the prediction target, that is, as the decision attribute of the sample at time t.

Each target water quality indicator corresponds to a decision attribute \(d_{i}\), and \(D=\{0,1,2\}\). Among them, \(d_{i}=0\) means the water quality indicator is in the normal range, \(d_{i}=1\) means the water quality indicator is in the yellow warning range, \(d_{i}=2\) means the water quality indicator has reached the red warning range.

Taking the next warning class of the target water quality indicator i as the decision attribute, the attribute reduction model based on KNPRSs (Algorithm 1) yields the reduct red for indicator i, i.e., the core attributes of the target water quality indicator. The value of k is generally set to about 5% of the length of the data set, so this paper takes \(k=150\); \(\alpha \) generally lies in the interval [0.5, 1], and this paper takes the value 0.9.

4.1.3 Parameters of PSO-LSTM

The overall network structure of PSO-LSTM consists of four layers: an input layer, two hidden layers and a fully connected output layer. The number of hidden units in the hidden layer is used as a hyperparameter to be optimized and trained in the PSO algorithm. The number of neurons in the input and output layers is matched to the dimensionality of the input and output and the number of hidden units in the hidden layer.

The time step is also an important parameter, which determines how long ago the cell structure can remember information. Too large a time step can lead to the interference of historical useless information in the prediction, while too small a time step can lead to the lack of sufficient memory information. Therefore, in this paper, the time step is used as a hyperparameter for the optimization of the PSO algorithm.

In this paper, the Adam optimizer is used to update the weights of the LSTM network. If the initial learning rate is set too small, convergence will be too slow; if it is set too large, training may diverge or fall into a local minimum. It is therefore important to choose a suitable initial learning rate, and the initial learning rate is also used as one of the hyperparameters optimized by the PSO algorithm in this paper.

The loss function is the mean square error (MSE), i.e., the average of the squared differences between the true and predicted values, which is convenient to differentiate. The specific formula is as follows.

$$\begin{aligned} MSE=\frac{1}{n}\sum \limits ^{n}_{i=1}(y_{i}-y^{\prime }_{i})^{2} \end{aligned}$$
(28)

The training set is used to train the model and the test set is used to evaluate its performance. Based on several preliminary experiments, the initial values are set to \(unit=20\) hidden units per layer, time step \(timestep=20\), learning rate \(learnrate=0.002\), and a maximum of 200 iterations. The loss function (28) is input to the PSO algorithm as the fitness function.
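Under these settings, the fitness evaluation handed to the PSO could look like the sketch below, which trains an LSTM with a particle's hyperparameters and returns the MSE of formula (28); the windowing helper and the training budget are illustrative assumptions, and build_lstm is the constructor sketched above.

# Minimal sketch of the PSO fitness function.
import numpy as np

def make_windows(data: np.ndarray, timestep: int):
    """Slice a (n_samples, n_features) array into sliding windows; the target
    indicator is assumed to sit in column 0."""
    x = np.stack([data[i:i + timestep] for i in range(len(data) - timestep)])
    y = data[timestep:, 0]
    return x, y

def fitness(particle, train_data, val_data):
    units, timestep, learn_rate = int(particle[0]), int(particle[1]), float(particle[2])
    x_tr, y_tr = make_windows(train_data, timestep)
    x_va, y_va = make_windows(val_data, timestep)
    model = build_lstm(train_data.shape[1], units, timestep, learn_rate)
    model.fit(x_tr, y_tr, epochs=50, batch_size=64, verbose=0)   # illustrative budget
    y_pred = model.predict(x_va, verbose=0).ravel()
    return float(np.mean((y_va - y_pred) ** 2))                   # MSE, formula (28)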

The PSO algorithm needs better global search ability in the early stage and stronger local search ability in the late stage. Therefore, in this paper, the inertia weight w is taken as a linearly decreasing function of time, that is, \(w=w_{max}-(w_{max}-w_{min})\frac{t}{t_{max}}\), where \(w_{max}=1.4\), \(w_{min}=0.4\), t is the current iteration and \(t_{max}\) is the maximum number of iterations.

Similar to the weight w, the learning factors \(c_{1}\) and \(c_{2}\) balance local and global search: the larger the learning factors, the better the convergence of the algorithm and the stronger the local search ability, which is the opposite of the effect of the weight w. In accordance with the chosen weight w, this paper sets \(c_{1}=c_{2}=2\).

Finally, the search area boundaries are set as \(unit\in [10, 30]\), \(timestep\in [10, 30]\) and \(learnrate\in [0, 0.1]\). Following Algorithm 2, the optimized PSO-LSTM model is obtained and the final water quality prediction is carried out.
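The PSO loop itself, with the linearly decreasing weight and the search boundaries given above, can be sketched as follows (reusing the fitness function sketched above; reading the 200 iterations of the previous subsection as \(t_{max}\) and using 10 particles are assumptions).

# Minimal sketch of PSO over (unit, timestep, learnrate) with a linearly
# decreasing inertia weight.
import numpy as np

rng = np.random.default_rng(0)
LOWER = np.array([10.0, 10.0, 0.0])    # lower bounds of unit, timestep, learnrate
UPPER = np.array([30.0, 30.0, 0.1])    # upper bounds
W_MAX, W_MIN, C1, C2 = 1.4, 0.4, 2.0, 2.0
N_PARTICLES, T_MAX = 10, 200           # swarm size and t_max are assumptions

def pso(fitness_fn, train_data, val_data):
    pos = LOWER + rng.random((N_PARTICLES, 3)) * (UPPER - LOWER)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fitness_fn(p, train_data, val_data) for p in pos])
    gbest = pbest[pbest_val.argmin()]
    for t in range(T_MAX):
        w = W_MAX - (W_MAX - W_MIN) * t / T_MAX              # linearly decreasing weight
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, LOWER, UPPER)               # stay inside the search area
        vals = np.array([fitness_fn(p, train_data, val_data) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest                                             # best (unit, timestep, learnrate)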

Fig. 4

Predicted results of the six target water quality indicators of station A

4.2 Experimental procedure and analysis

In order to verify the performance of the water quality prediction method based on KNPRSs and PSO-LSTM (KNPRSs-PSO-LSTM) proposed in this paper, six target water quality indicators (DO, NH3-N, CODMn, TP, TN and chla) at three randomly chosen stations among the 13 automatic monitoring stations along the middle route are used to carry out the prediction task. At the same time, the method is compared with three other water quality prediction models: the Long Short-Term Memory network (LSTM), the combination of neighborhood rough sets and LSTM (NRS-LSTM), and the combination of k-nearest-neighbor probability rough sets and LSTM (KNPRSs-LSTM).

Table 4 Prediction model evaluation results of the six target water quality indicators at station A

Three different evaluation indicators are used to analyze the water quality prediction results so as to comprehensively reflect the performance of the models: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). They are calculated as follows.

$$\begin{aligned}&MAE=\frac{1}{n}\sum \limits ^{n}_{i=1}|y_{i}-y^{\prime }_{i}|,\end{aligned}$$
(29)
$$\begin{aligned}&RMSE=\sqrt{\frac{1}{n}\sum \limits ^{n}_{i=1}(y_{i}-y^{\prime }_{i})^{2}},\end{aligned}$$
(30)
$$\begin{aligned}&MAPE=\frac{1}{n}\sum \limits ^{n}_{i=1}\frac{|y_{i}-y^{\prime }_{i}|}{y_{i}}. \end{aligned}$$
(31)

where \(y_{i}\) represents the true value of the water quality data, \(y^{\prime }_{i}\) represents the predicted value, and n represents the number of samples.
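These metrics follow formulas (29)-(31) directly and can be computed, for instance, as below, assuming y_true and y_pred are NumPy arrays of the real and predicted values.

# Direct implementations of formulas (29)-(31).
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred) / y_true))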

All methods are implemented in Python and run on a machine with an AMD Ryzen 5 5600H CPU (with Radeon Graphics, 3.30 GHz) and 16 GB RAM.

The predicted results of the six target water quality indicators of automatic monitoring station A are shown in Fig. 4. The curve “TRUE” represents the real test set data.

The prediction model evaluation results of the six target water quality indicators of automatic monitoring station A are shown in Table 4.

The corresponding bar graphs of the evaluation results are shown in Fig. 5.

Fig. 5

Predictive model evaluation results of the six target water quality indicators of station A

The predicted results of the six target water quality indicators of automatic monitoring station B are shown in Fig. 6. The curve “TRUE” represents the real test set data.

Fig. 6

Predicted results of the six target water quality indicators of station B

The prediction model evaluation results of the six target water quality indicators of automatic monitoring station B are shown in Table 5.

Table 5 Prediction model evaluation results of the six target water quality indicators at station B
Fig. 7

Predictive model evaluation results of the six target water quality indicators of station B

The corresponding bar graphs of the evaluation results are shown in Fig. 7.

The predicted results of the six target water quality indicators of automatic monitoring station C are shown in Fig. 8. The curve “TRUE” represents the real test set data.

The prediction model evaluation results of the six target water quality indicators of automatic monitoring station C are shown in Table 6.

The corresponding bar graphs of the evaluation results are shown in Fig. 9.

Based on the above experimental results, the following analysis is conducted:

Fig. 8

Predicted results of the six target water quality indicators of station C

  1.

    From an overall view, the four LSTM-based models show high prediction accuracy in water quality prediction: based on MAPE, the prediction accuracy exceeds \(90\%\) except for the NH3-N and TP indicators of station A. This illustrates the advantage of the memory cell and “gate” structure of LSTM in time-series prediction, and shows that the KNPRSs-PSO-LSTM built on LSTM is feasible and effective for the time-series nature of the water quality data in this paper.

  2.

    From panel (c) of Figs. 5, 7 and 9, it can be seen that the performance of LSTM on the target water quality indicators is inconsistent across stations. From the MAPE, it can be found that the prediction of NH3-N and TP is poor at station A, the prediction of TP and chla is poor at station B, and the prediction of NH3-N, TP and chla is poor at station C. This reflects the complexity of the water quality situation: with the original indicator system and unified model hyperparameters, a high prediction accuracy cannot be achieved for all target water quality indicators. After attribute reduction with NRS and KNPRSs, which eliminates the interference of redundant attributes, the prediction accuracy improves, and KNPRSs performs better than NRS, verifying the superiority of the proposed KNPRSs-based attribute reduction model over the traditional attribute reduction method. Further, on the basis of KNPRSs-LSTM, the model hyperparameters are optimized by the PSO algorithm, so that KNPRSs-PSO-LSTM achieves very high prediction accuracy for the six target water quality indicators at all three stations.

  3.

    From Figs. 4, 6 and 8, it can be seen that when the water quality time-series data show good stability, the predictions of all four models fit the true values well. However, for less stable time-series data, and without redundant attributes eliminated, the LSTM with the initial hyperparameters fits poorly. NRS-LSTM and KNPRSs-LSTM, which remove part of the noise interference, only improve the fit to a certain extent. KNPRSs-PSO-LSTM, in contrast, can adaptively find suitable hyperparameters for time-series data with low stability, which greatly improves the fit. This reflects the advantage of the proposed KNPRSs-PSO-LSTM in handling the poor stability of water quality data.

Table 6 Prediction model evaluation results of the six target water quality indicators at station C
Fig. 9

Predictive model evaluation results of the six target water quality indicators of station C

In summary, for the complexity of water quality indicators, this paper screens the feature attributes through KNPRSs, which achieves effective data dimensionality reduction and reduces noise interference; for the instability of water quality data, PSO-LSTM optimizes the hyperparameters and adapts to the data of different indicators in different areas, which improves the stability of the model and greatly increases the prediction accuracy. The above experimental results show that KNPRSs-PSO-LSTM is an effective, reliable and efficient method for water quality prediction.

5 Conclusions

For the water quality data from the 13 automatic monitoring stations along the South-North Water Transfer Middle Route in China, this paper proposes a novel water quality prediction method based on k-nearest-neighbor probability rough sets and PSO-LSTM, addressing the time-series nature, instability and indicator complexity of the data, with the goal of water quality forecasting and early warning. The main work is summarized as follows:

  1.

    First, we propose a water quality attribute reduction method based on the novel KNPRSs model. The water quality data of the South-North Water Transfer Middle Route are characterized by indicator complexity: each automatic monitoring station collects 12 related indicators, the water quality situation varies from station to station, and different target indicators at different stations have different correlated indicators. Fully considering the complexity of the data set and the limitations of traditional neighborhood rough sets, we construct probabilistic rough sets based on the k-nearest-neighbor algorithm. In this way, the attribute reduction of the target water quality attributes is completed, redundant attributes are eliminated and the core attributes are extracted. The input dimension of the subsequent neural network prediction model is thus streamlined, and the accuracy of the water quality prediction is verified through the relevant numerical experiments.

  2.

    Secondly, we study a water quality prediction method based on PSO-LSTM. In view of the time-series nature and instability of water quality data, this paper combines the Long Short-Term Memory network and the Particle Swarm Optimization algorithm to build a PSO-LSTM water quality prediction model. With the LSTM neural network as the main body, the model copes well with the time-series nature of the data, selectively retaining historical information through the “gate” structure and memory cells. At the same time, the PSO algorithm is used to optimize the hyperparameters of the LSTM network, so that the model adapts to different indicators at different stations. The relevant numerical experiments verify that the MAE, RMSE and MAPE are reduced compared with the common LSTM network, and that PSO-LSTM performs better in water quality data prediction.

For the proposed water quality attribute reduction method based on the novel KNPRSs model, the selection of the parameter \(\alpha \) also affects the final reduction results. In the future, we will study how to support the selection of \(\alpha \) according to the characteristics of the data set. In addition, how to improve the training speed of the model while ensuring the prediction accuracy is another main direction for future work.