Predicting Water Quality with Artificial Intelligence: A Review of Methods and Applications

Irwan, Dani; Ali, Maisarah; Ahmed, Ali Najah; Jacky, Gan; Nurhakim, Aiman; Ping Han, Mervyn Chah; AlDahoul, Nouar; El-Shafie, Ahmed

doi:10.1007/s11831-023-09947-4

Review article
Published: 13 June 2023

Volume 30, pages 4633–4652, (2023)
Archives of Computational Methods in Engineering

Dani Irwan¹,
Maisarah Ali¹,
Ali Najah Ahmed ORCID: orcid.org/0000-0002-5618-6663²,
Gan Jacky²,
Aiman Nurhakim²,
Mervyn Chah Ping Han²,
Nouar AlDahoul³ &
…
Ahmed El-Shafie⁴

Abstract

Water is a pivotal source of irrigation for agriculture and is essential to daily human activities such as drinking. Because water quality affects so many aspects of life, this review addresses existing problems in the water quality prediction methods found in the literature. We explore the quality parameters incorporated in the modelling process to measure water quality. Furthermore, we review the commonly adopted artificial intelligence-based models that have been used to forecast water quality. We selected and reviewed 83 studies published from 2009 to 2023 based on their success in modelling and forecasting water quality in multiple regions. We compared these articles in terms of parameters, modelling algorithms, time scale scenarios, and performance measurement indicators. This paper should benefit researchers interested in conducting future studies on water quality forecasting. Additionally, we discuss modelling methods such as deep learning (DL) that have been shown to improve performance compared to traditional machine learning (ML) models. Hybrid-DL models were found to outperform other models such as standalone ML, standalone DL, and hybrid-ML. This study also highlights a significant limitation of data-hungry DL models, which require large datasets for modelling. Hence, at the end of this review, we discuss the potential of methods such as generative adversarial networks (GANs) and attention-based transformers to improve water quality prediction. GANs have shown promising performance in other domains for synthetic data generation. Their use in the water quality domain could overcome the limitation of data scarcity and enhance the performance of the predictive models reviewed in this study. Similarly, the transformer has been found to be a state-of-the-art model for time series prediction and is thus a good candidate for predicting water quality.

1 Introduction

River water is a pivotal source of irrigation for agriculture and is essential to daily human activities such as drinking. It is therefore important to forecast the future quality of river water, and machine learning models are well suited to this task. The Water Quality Index (WQI) of a river depends on a range of quality parameters, many of which are presented in the literature. Researchers have combined these parameters in numerous ways with various machine learning models to forecast river water quality, with promising results. For example, one study used total dissolved solids, chlorophyll a, total suspended solids, turbidity, and blue-green algae phycocyanin with different machine learning models, including extreme learning machine regression, support vector machine regression, Gaussian process regression, linear regression, and partial least-squares regression, to predict these variables [1]. In another study, physicochemical parameters such as the concentrations of Ca²⁺, Mg²⁺, Na⁺, SO₄²⁻ and Cl⁻ were used as inputs to estimate the salinity of river water [2]. Researchers have used different predictive models, namely standalone machine learning (ML), deep learning (DL), and hybrid models, to forecast river water quality. The input data were monitored and obtained from each country’s research centers and were collected at various time scales, such as hourly, every 4 h, daily, and monthly [3,4,5,6,7]. Previous research suggests that a more user-centric approach, with user-friendly tools and an interactive environment, is required to mitigate water quality issues [8]. Previous studies also found that there was no established way of identifying the best network structure for forecasting water quality parameters [9, 10].

Artificial intelligence approaches have been applied in many countries to forecast water quality parameters. ML models are the most commonly used, followed by hybrid models and DL models. Deep learning methods have been used less often for prediction because they require a vast amount of data in the training stage; in other words, the performance of deep learning models is highly dependent on the amount of data. To evaluate the predictive models, performance indicators such as the coefficient of determination (R²), Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), and Mean Absolute Error (MAE) were used. Comparisons between different models for predicting river water quality found that DL models performed better than ML models in research conducted in China [11]. However, other studies showed that hybrid machine learning models were more accurate [2, 12, 13] and that they sometimes outperformed deep learning models [14,15,16].

This study reviews research on water quality forecasting published from 2009 to 2023. A summary of the modelling approaches used in the respective studies is presented, and the performance of the predictive models is compared and evaluated. Additionally, the input water parameters used to train and test the models are examined and validated through the reported performance indicators. The limitations of current water quality prediction methods and directions for future research are highlighted. Moreover, this paper proposes the use of a deep-learning-based generative model, Generative Adversarial Networks (GANs), which has not yet been employed for water quality prediction.

Several contributions are included in this review study as follows:

1. We present 83 recently published studies on water quality prediction from many countries.
2. We explore methods that use various input parameters, such as chemical and meteorological ones, to show the possible combinations of input parameters and their impact on water quality prediction.
3. We describe the modelling algorithms used in the reviewed articles, including machine learning, deep learning, and hybrid models, and highlight their advantages and drawbacks for forecasting water quality outputs.
4. We present the time scale scenarios, such as hourly, daily, weekly, and monthly, that research articles usually use to conduct experiments and analyze results.
5. We describe the performance evaluation metrics used in the reviewed studies, such as RMSE, MSE, MAE, and R², and highlight their advantages and limitations.

2 Research Methodology and Literature Review

2.1 Research Methodology

Google Scholar was used in the preliminary step of this study to search for relevant scientific research articles. The results returned by the search engine were then filtered and analyzed according to their relevance to the keywords, which were “water quality” and any equivalent of the word “prediction”. Only research articles that contained the keywords were considered. The search showed that many research works on this topic have been published in recent years. To the best of our knowledge, however, no comprehensive review of water quality estimation has been published. In this paper, we therefore seek an answer to the open question “What is the best network structure to predict the water quality parameters?” [9]. Hence, it is critical to analyze the most recent predictive models and algorithms, including data pre-processing and prediction.

A search equation for water quality prediction was defined for use in Google Scholar. It was composed from several combinations of keywords:

$$\left( {\text{A1 OR A2}} \right){\text{ AND }}\left( {\text{B1 OR B2 OR B3 OR B4}} \right)$$

where A1 and A2 are “river water quality” and “water quality index”, respectively, and B1, B2, B3, and B4 are “modelling”, “forecasting”, “prediction”, and “machine learning”, respectively. Research articles published from 2009 to 2023 were selected.
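As an illustration, the search equation can be assembled programmatically from the two keyword groups. The sketch below is only an assumption about how such a string might be built; the helper name `build_query` is hypothetical, and the exact syntax entered into Google Scholar is not given in the source.

```python
# Hypothetical helper: compose the boolean search string from keyword groups.
def build_query(group_a, group_b):
    """Return '(A1 OR A2) AND (B1 OR ...)' with each keyword quoted."""
    def or_group(keywords):
        return "(" + " OR ".join(f'"{k}"' for k in keywords) + ")"
    return or_group(group_a) + " AND " + or_group(group_b)

# Keyword groups A and B as defined in the text.
group_a = ["river water quality", "water quality index"]
group_b = ["modelling", "forecasting", "prediction", "machine learning"]

query = build_query(group_a, group_b)
print(query)
```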

Figure 1 illustrates the process of filtering and selecting the articles for this review, where nos stands for number of studies. A total of 801 articles in the database matched the search equation. Of these, 44 were rejected as duplicates and 674 were disregarded because they were not about water quality forecasting or their main findings were not relevant to water quality prediction, leaving 83 articles for review.
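The screening counts reported here are mutually consistent, as a quick check shows (the symbol names are ours, introduced only for this check):

$$n_{\text{selected}} = n_{\text{matched}} - n_{\text{duplicates}} - n_{\text{irrelevant}} = 801 - 44 - 674 = 83$$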

The articles in this review were selected to cover experiments focused specifically on water quality prediction. We found 83 such research articles, as shown in Table 1 and Fig. 1. Most of them were published in the last 5 years, as shown in Fig. 2. We also selected these articles because they used various input parameters to predict water quality, as shown in Table 2, where the input parameters are divided into meteorological and chemical inputs. The country in which each study was located is illustrated in Fig. 3. Furthermore, the articles were selected to cover numerous modelling algorithms, such as traditional machine learning (ML), ensemble learning (EL), deep learning (DL), and hybrid models, as shown in Table 3 and Fig. 7. Finally, the selection also covers various performance metrics, such as RMSE, MAE, and R², and various output parameters predicted at several time scales, including hourly, daily, weekly, and monthly, as can be seen in Table 4.

Table 1 Ranking of selected journals

Table 2 Research works on water quality prediction

Table 3 The methods used in each research article

Table 4 Various scenarios for time scales of the WQI and the evaluation metrics

2.2 Literature Review

The selected articles focused on water quality prediction, the input parameters, and the performance indicators used to evaluate the results. These 83 research articles came from 46 journals. The number of articles selected per journal, from highest to lowest, is shown in Table 1. The largest number of articles came from the Journal of Hydrology (8), followed by Water (7) and Sustainability (5). Environmental Science and Pollution Research and Science of The Total Environment contributed 4 articles each, whereas the Journal of Environmental Management contributed 3. The journals with 2 articles each were Complexity, Environment Pollution, IEEE Access, Marine Pollution Bulletin, Neural Computing and Applications, Water Research and Water Supply. The remaining journals contributed 1 article each. The year 2022 had the most reviewed articles, as shown in Fig. 2. A summary of the research work on water quality prediction is given in Table 2. The table includes the location of each study, the data size (initial and end dates) used to train the predictive model, and the water quality input parameters used. These input parameters can be classified into 2 categories: chemical and meteorological.

Figure 2 shows the reviewed studies that were selected to predict water quality, grouped by the location where the experiments were conducted. As can be seen, the majority of the research on water quality prediction was carried out in China and India.

Figure 3 shows a bar chart of the countries in which the water quality forecasting studies were located. China ranked first, followed by India, Malaysia and Iran. The studies conducted in these 4 countries accounted for more than half of the reviewed articles. The remaining studies came from another 20 countries: Algeria, Australia, Bangladesh, Czech Republic, Germany, Ghana, Greece, Hong Kong, Iraq, Ireland, Italy, Kenya, South Korea, New Zealand, Pakistan, Spain, Taiwan, Turkey, USA, and Vietnam. Figure 4 shows the general framework usually found in the reviewed studies for water quality modelling, including input parameters such as chemical and meteorological ones, pre-processing techniques, and modelling algorithms.

Pre-processing is an important stage before modelling. Water or river data usually have missing values that result from sensor limitations. Identifying and handling these missing values is therefore an important part of cleaning the data for further processing. In the literature, several statistical methods have been used to fill missing values. In addition, some recorded values are unrealistic and far from the actual values. These values are considered outliers and need to be detected at an early stage to avoid errors in the modelling process. Furthermore, the water input parameters do not share the same scale: some take large values while others are bounded. Scaling the input parameters can therefore speed up the modelling process and produce more robust results. Some features or input parameters are correlated with one another, and some play no role in the modelling process, so removing them can improve the prediction. When a large number of parameters is used, selecting only a subset of them is a good way to improve prediction. Features can be either engineered or learned, depending on the modelling algorithm. In conventional ML algorithms, for example, feature engineering is a well-known stage before modelling, whereas deep learning models aim to learn features automatically. Applying general ML methods without pre-processing often degrades prediction performance.
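A minimal sketch of three of the pre-processing steps described above follows: filling missing values, flagging outliers, and scaling. The specific choices (median imputation, the 1.5 × IQR rule, min-max scaling) and the pH readings are illustrative assumptions, not methods attributed to any reviewed study.

```python
import statistics

def impute_median(values):
    """Replace missing readings (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < lower or v > upper]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    vmin, vmax = min(values), max(values)
    return [(v - vmin) / (vmax - vmin) for v in values]

# Made-up pH readings; None marks a missing sensor value.
ph = [7.1, 7.3, None, 7.0, 7.2, 14.9, 7.4, 7.1]

filled = impute_median(ph)            # step 1: fill the gap
outliers = iqr_outliers(filled)       # step 2: flag implausible readings
clean = [v for i, v in enumerate(filled) if i not in outliers]
scaled = min_max_scale(clean)         # step 3: bring values onto a common scale
```

In practice the order and choice of steps depend on the dataset; for example, outliers are sometimes detected before imputation so that they do not distort the fill value.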

3 Classification of Studies

Numerous types of input data can be used to predict water quality indices. Structured data that can be arranged and tabulated were used in every reviewed work. Many of the reviewed publications used chemical inputs for prediction, some used meteorological data, and others combined both chemical and meteorological inputs. The types of input data used to estimate the river water quality index in the research papers are shown in Fig. 5.

3.1 Chemical Inputs

Chemical inputs help in understanding lakes, rivers, oceans, and even groundwater. Chemical analysis also indicates the maximum degree of pollution that a body of water can absorb without harming the aquatic ecosystem, its inhabitants, or anyone drinking the water. Examples of chemical parameters are pH, alkalinity, and chloride. In one study, biochemical oxygen demand (BOD₅), fluoride, salinity, manganese, potassium, calcium, iron, chemicals, sulphate, chloride, silica, magnesium, pH, phosphate, nitrate, and ammonium were among the 25 water quality factors used as inputs to SVM and ANN models [45]. These inputs were used to predict dissolved solids, total solids, and total suspended solids.

3.2 Meteorological Inputs

Meteorological inputs are parameters related to the study of the atmosphere and its phenomena, notably as a way of predicting the weather; examples include relative humidity, temperature, and solar radiation. Because it influences so many other aspects of weather, temperature is the single most influential factor in both meteorology and ecology. In one study, air temperature, humidity, sunshine, and rainfall were used together with a few chemical inputs such as pH, dissolved oxygen, turbidity and electrical conductivity. According to the author, the list of potential predictors was narrowed down by prioritizing parameters that were measurable in the field, had readily accessible statistical data, and had a substantial effect on water quality [48]. Another study predicted water quality characteristics from temperature, dissolved oxygen, pH, total phosphorus, turbidity, and trophic level (position of an organism in the food chain); electrical conductivity (EC), total dissolved solids (TDS), and discharge; and nutrient budget (balance between crop inputs and outputs) [28].

4 Water Quality Index Modeling Techniques

In this section, we discuss various water quality modelling techniques. Water quality index (WQI) prediction modelling methodologies are summarized in Fig. 6.

Because this review concerns the application of machine learning methods to water quality prediction, we considered the various techniques that can be used for water quality forecasting. These techniques are divided into traditional machine learning (ML), ensemble learning (EL), deep learning (DL), and hybrid models. ML methods [90] include decision tree (DT), k-nearest neighbor (KNN), multi-layer perceptron (MLP), support vector machine (SVM), multiple linear regression (MLR), and adaptive neuro-fuzzy inference system (ANFIS). To produce more powerful models, several models are combined under ensemble learning, for example random forest (RF), bagging, gradient boosting (GB), and stacking. Deep learning methods, on the other hand, have been found to produce superior performance when large datasets are available. They include deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN) such as long short-term memory (LSTM). Furthermore, hybrid models have been used to boost performance using various techniques. Algorithms that could not be readily classified into any of these groups were labelled (O). The modelling algorithms used in each research article are summarized in Table 3. Additionally, Fig. 7 shows how often each AI method was employed in research articles published between 2009 and 2023.

The choice of method depends on factors such as data size (small to large), the complexity of the hidden patterns (easy to difficult to learn), and data type (spatial or temporal). For small datasets with few features, traditional ML methods and ensemble learning usually perform well, provided the data contain learnable patterns. As the number of features grows and the patterns become more complex, DNNs or CNNs are needed to learn features before mapping them to a prediction. Time series with sequential data call for RNNs and LSTMs to predict future values.

4.1 Multiple Linear Regression (MLR)

In ML, MLR stands out as one of the most basic and standard algorithms. The idea behind it is straightforward, and the method performs reliably. In one comparison, however, MLR was the least accurate of the models tested, possibly because the inputs and outputs were related in a strongly nonlinear fashion [83]. MLR is a well-established way of modelling a system’s linear interactions and is frequently used as a baseline against which other, non-linear models can be evaluated. In another study, MLR was employed precisely to provide such a baseline for other ML-based methods [1].
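To make the baseline concrete, the sketch below fits an MLR model by ordinary least squares using the normal equations. It is a minimal pure-Python illustration on made-up, exactly linear data; the function names are hypothetical, and a real study would more likely rely on a numerical library.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of feature rows; an intercept column of ones is added here.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

def predict(coef, row):
    """Evaluate the fitted linear model on one feature row."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))

# Made-up data generated from y = 2 + 0.5*x1 - 1.0*x2 (exactly linear).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2 + 0.5 * a - 1.0 * b for a, b in X]
coef = fit_mlr(X, y)
```

Because the data here are exactly linear, the fit recovers the generating coefficients up to rounding; on strongly nonlinear data the same model would leave large residuals, which is why it serves as a baseline.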

4.2 Artificial Neural Network (ANN)

In one study, an artificial neural network was employed as a benchmark. In the intermediate (hidden) layer, the independent variables are multiplied by weights and added to a constant (bias); the hidden layer performs nonlinear processing of the data, while the output layer produces the learning outcome [43]. In an ANN, the signal is transmitted in one direction, while errors are relayed in the other: the output error is “back-propagated”, or sent layer by layer from the output through the hidden layer towards the input layer [83].
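The forward and backward passes described above can be sketched for a network with one hidden layer. The tanh activation, squared-error loss, and all weights below are illustrative assumptions; the cited studies may use different architectures and activations.

```python
import math

def forward(x, W1, b1, w2, b2):
    """One-hidden-layer network: tanh hidden units, linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, output

def backward(x, hidden, output, target, w2):
    """Gradients of 0.5 * (output - target)**2, propagated output -> hidden."""
    err = output - target                      # error at the output layer
    grad_w2 = [err * h for h in hidden]        # gradient for output weights
    # Error relayed back through the tanh units: d tanh(z)/dz = 1 - tanh(z)^2
    delta_h = [err * w * (1 - h * h) for w, h in zip(w2, hidden)]
    grad_W1 = [[d * xi for xi in x] for d in delta_h]
    return grad_w2, grad_W1

# Made-up input, target, and weights for 2 inputs and 2 hidden units.
x, target = [1.0, 2.0], 0.0
W1 = [[0.5, -0.2], [0.1, 0.3]]
b1 = [0.0, 0.1]
w2 = [1.0, -1.0]
b2 = 0.2

hidden, y_hat = forward(x, W1, b1, w2, b2)
grad_w2, grad_W1 = backward(x, hidden, y_hat, target, w2)
```

A training loop would repeat these two passes and subtract a small multiple of each gradient from the corresponding weight.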

One of the main strengths of neural networks is that they can simulate nonlinear relationships with little prior information about those relationships. Several studies advocated neural networks as a reliable method for estimating river water quality and anticipated future applications to improve the understanding of contamination patterns in rivers [5]. For an intelligent early warning system, monitoring and predicting water quality metrics using machine learning models is essential. Optimizing the hyperparameters of ANN models may yield adequate prediction accuracy for dissolved oxygen (DO), but accuracy might be improved further by using additional AI models such as Random Forest and Boosted Tree methods [31].

One study built a model to capture the relationships and dependencies among the parameters and thereby successfully addressed the problem of missing variables. The most important input parameters were determined by a thorough sensitivity study [64]. A variety of MLP models were built and evaluated to find the optimal hidden-layer size and transfer function. The complexity of an MLP grows with the number of hidden layers, which brings more connections and parameters. Like MLPs, RBF networks were used to model nonlinear data, but they were trained in a single stage rather than iteratively [17].

4.3 Support Vector Machine (SVM)

Among supervised machine learning methods, the support vector machine (SVM) family of algorithms is useful for both classification and regression [72]. Although commonly employed for classification, SVMs can also be used for regression [91, 92]. An SVM defines a hyperplane between the classes and chooses it so that the margin, the distance from the hyperplane to the nearest data points, is as large as possible, which reduces the number of misclassifications [21]. Based on the structural risk minimization principle, SVM can overcome the issue of overfitting. The SVM model’s estimates are derived from the support vectors, which are a small subset of the training data [43]. SVMs can also handle multiclass classification. Their primary objective remains to maximize the shortest distance from the hyperplane to the nearest example; in the multiclass case, more parameters and constraints are used to classify or forecast the classes effectively [65].
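The margin idea can be illustrated by computing, for a given hyperplane, the shortest label-signed distance to any training point. The sketch below only evaluates candidate hyperplanes on made-up data; it does not implement the optimization an SVM solver performs to find the best one.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Smallest label-signed distance; positive only if every point is on its correct side."""
    return min(lab * signed_distance(w, b, p) for p, lab in zip(points, labels))

# Made-up, linearly separable 2-D points with labels +1 / -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

m_a = margin([1.0, 1.0], -2.0, points, labels)   # hyperplane x1 + x2 = 2
m_b = margin([1.0, 1.0], -3.5, points, labels)   # hyperplane x1 + x2 = 3.5
```

Both hyperplanes separate the two classes, but the first leaves a wider margin, so a maximum-margin classifier would prefer it.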

4.4 Adaptive Neuro-Fuzzy Inference System (ANFIS)

By fusing the power of neural networks with the flexibility of a fuzzy inference system (FIS), the neuro-fuzzy method can learn and adapt to new situations. An FIS can approximate any real continuous function on a compact set to arbitrary precision. When constructing an ANFIS, it is also important to choose the most suitable membership functions (MFs) carefully [17]. In terms of accuracy and precision, the adaptive neuro-fuzzy inference system (ANFIS) performed admirably. ANFIS is a neural network implementation of the Takagi–Sugeno fuzzy inference system, and it is among the most widely used models for water analysis [54].

As an artificial intelligence model, ANFIS can go beyond the limits of traditional fuzzy inference and ANN alone. Because it combines the strengths of ANN and fuzzy logic, the ANFIS model can deal with complicated non-linear interactions between inputs and outputs. In calculating the WQI, it fared better than the MLR model [6]. To map the input space to the desired output, ANFIS uses neural network learning techniques and fuzzy reasoning across several layers of a feed-forward network. The wavelet de-noising technique ANFIS (WDT-ANFIS) model was introduced to reduce the impact of noise in the data, and it surpassed all the other models in accurately forecasting the water quality metrics [9].
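The inference step of a first-order Takagi–Sugeno system, on which ANFIS is built, can be sketched as follows. The Gaussian membership functions, the two rules, and their parameters are made-up assumptions; ANFIS would additionally learn these parameters from data, which this sketch does not do.

```python
import math

def gauss_mf(x, center, sigma):
    """Gaussian membership function, value in (0, 1]."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_predict(x, rules):
    """First-order Takagi-Sugeno inference for a single input x.

    Each rule is (center, sigma, slope, intercept): its firing strength is the
    membership of x, and its consequent is the linear function slope*x + intercept.
    The output is the firing-strength-weighted average of the consequents.
    """
    strengths = [gauss_mf(x, c, s) for c, s, _, _ in rules]
    outputs = [a * x + b for _, _, a, b in rules]
    total = sum(strengths)
    return sum(w * o for w, o in zip(strengths, outputs)) / total

# Two made-up rules: "x is LOW" and "x is HIGH".
rules = [(0.0, 1.0, 1.0, 0.0),    # if x is LOW  then y = x
         (4.0, 1.0, 0.0, 10.0)]   # if x is HIGH then y = 10

y_mid = sugeno_predict(2.0, rules)   # halfway between the two rule centers
y_low = sugeno_predict(0.0, rules)   # dominated by the LOW rule
```

At the midpoint both rules fire equally, so the output is the plain average of their consequents; near a rule center the output follows that rule almost exclusively.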

4.5 Decision Tree (DT)

Due to its ease of use, DT has gained widespread popularity. A decision tree is a graph of nodes connected by links (called “edges”), in which choices and their consequences are organized hierarchically [55]. Decision trees use this tree-like structure to build models for classification and regression. When a dataset is fed to the model, it is recursively split into smaller subsets. One study using DT proposed a faster and cheaper method for calculating and forecasting WQIs [71]. The outcomes demonstrated the capacity of the proposed model to correctly forecast the WQI class.

The tree is constructed by splitting the input data into inner nodes, which may have descendants, and leaf nodes. Splitting terminates when the subset at a node has the same target output value throughout, or when further splitting adds no value to the prediction [65]. For classifying data, the M5 model tree was reported to outperform other decision tree models. Its emphasis on numerical prediction makes it useful for benchmarking against other models [33]. The strengths of decision-tree-based models lie in their efficiency, versatility, and insensitivity to missing data or features. While other machine learning models may be faster, decision-tree-based models on the whole excel at short-term forecasts [43].
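The splitting and stopping rules described above can be illustrated with a single regression split chosen by variance reduction. This is a generic sketch on made-up data with a hypothetical `best_split` helper; it is not the M5 algorithm or the procedure of any cited study.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(x, y):
    """Return (threshold, weighted child variance) of the best split x <= threshold.

    The threshold is None when no split improves on the unsplit node, which
    covers the stopping case where all target values are identical.
    """
    best = (None, variance(y))
    for t in sorted(set(x))[:-1]:          # splitting at the max leaves one side empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        impurity = (len(left) * variance(left) + len(right) * variance(right)) / len(y)
        if impurity < best[1]:
            best = (t, impurity)
    return best

# Made-up data: turbidity (x) against a WQI-like score (y).
turbidity = [1, 2, 3, 10, 11, 12]
wqi = [90, 88, 92, 40, 42, 38]
threshold, child_variance = best_split(turbidity, wqi)
```

A full tree applies the same search recursively to each child subset until a stopping condition is met.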

4.6 Ensemble Model (EM)

Machine learning ensemble models are used to boost the accuracy of predictions. An ensemble can be built in two ways: independently or in a coordinated fashion. Bagging and random forest (RF) are examples of independent approaches, whereas gradient boosting (GB) models are an example of a coordinated approach [53]. Ensembles have also been applied to the problem of dividing a dataset into many classes. Several decision trees were combined into a forest to perform the classification: the predictions of the individual trees were aggregated, and the class with the most votes was taken as the forest’s output class. It is a quick and adaptable technique, but it does have its limitations [65].
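The independent (bagging) style of ensemble with majority voting can be sketched as below. One-threshold "stumps" stand in for full decision trees, and the data, class names, and number of models are made-up assumptions; a random forest would also randomize the features considered at each split, which is omitted here.

```python
import random
from collections import Counter

def train_stump(x, y):
    """Fit a one-threshold classifier on 1-D data by minimizing training errors."""
    best = None
    for t in sorted(set(x)):
        for low, high in (("good", "poor"), ("poor", "good")):
            errors = sum((low if xi <= t else high) != yi for xi, yi in zip(x, y))
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    _, t, low, high = best
    return lambda xi: low if xi <= t else high

def bagging(x, y, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(x)) for _ in x]
        models.append(train_stump([x[i] for i in idx], [y[i] for i in idx]))
    return models

def vote(models, xi):
    """Majority vote over the individual predictions."""
    return Counter(m(xi) for m in models).most_common(1)[0][0]

# Made-up data: turbidity readings labelled with a WQI class.
turbidity = [1, 2, 3, 4, 10, 11, 12, 13]
labels = ["good"] * 4 + ["poor"] * 4
models = bagging(turbidity, labels)
```

Calling `vote(models, 2)` or `vote(models, 12)` then returns the class chosen by most of the 25 stumps, even if a few stumps trained on unlucky samples disagree.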

Nonetheless, ensemble models based on decision trees, such as Random Forest (RF) and Gradient Boosting (GB), nearly always perform better than an individual decision tree [43]. While both RF and DNN offer great flexibility in accounting for non-linear relationships between drivers and modelled parameters, this flexibility carries a risk of overfitting that increases as more drivers are included in the model. The authors of one study compared the two by gradually introducing new drivers and documenting the resulting performance gain [56]. GB uses an additive model whose performance improves over successive iterations, and it can optimize any differentiable loss function [72].

This recent boosting algorithm has been used in the majority of current competitions; it optimizes a differentiable loss function using an additive model [21]. The effectiveness of the various ML algorithms varies to a greater or lesser extent with the location in question. Consequently, designing a generic ML model for water quality assessment remains a continuing challenge [29].

4.7 Deep Learning (DL)

As a deep learning approach, the long short-term memory (LSTM) model is well suited to predicting time-series data when the size of the time step is uncertain. In the LSTM model of one study, a logistic sigmoid activation function was applied. This WQI forecasting approach did not appear to be widely used in the literature [66]. A deep learning network, which functions similarly to the neural networks of the human brain, can reveal data relationships and hidden patterns through its multiple processing layers.
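To show where the logistic sigmoid enters an LSTM, the sketch below implements one time step of the standard LSTM cell, reduced to a single unit with scalar input and state. The parameter values and input series are made up; this is the textbook cell, not the specific configuration used in [66].

```python
import math

def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with scalar input and state.

    p maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w, u, b = p[name]
        return w * x + u * h_prev + b
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Made-up parameters; a trained model would learn these.
params = {"forget": (0.5, 0.1, 0.0), "input": (0.6, 0.2, 0.0),
          "output": (0.4, 0.3, 0.0), "candidate": (1.0, 0.5, 0.0)}

h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1]:            # a short made-up series of scaled readings
    h, c = lstm_step(x, h, c, params)
```

The sigmoid gates act as soft switches between 0 and 1, which is what lets the cell state carry information across many time steps.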

Another deep learning model is the convolutional neural network (CNN). Each neuron in a CNN is connected to features extracted by a lower layer. A CNN can reduce the number of computations required and help to prevent overfitting. CNNs have thus been used in several studies that analyzed the content of digital photographs [24]. To overcome the mediocre accuracy of earlier approaches, one study developed a prediction model using LSTM deep neural networks, with water quality monitoring data for training and testing [11].

4.8 Hybrid Techniques

Because some algorithms are limited in their ability to process stochastic data, researchers often resort to hybrid modelling strategies. The prediction accuracy of a model may be greatly enhanced by combining two or more algorithms at various phases of the modelling process. In this article, we examine several hybrid machine learning and hybrid deep learning models that have been used for WQI prediction. In one study, with the long short-term memory (LSTM) model as a reference point, a transfer learning and long short-term memory (TL-LSTM) model was used to generate deterministic point predictions when data were unavailable. Based on the findings, the Multivariate Bayesian Uncertainty Processor (MBUP) strategy, which consists of deep learning and post-processing, successfully identified the complicated dependency structure between the model’s output and the observed water quality [22].

Water quality characteristics were predicted using a three-part hybrid neural network model built from one-dimensional residual convolutional neural networks (1-DRCNN) and bi-directional gated recurrent units (BiGRU). The 1-DRCNN-BiGRU hybrid network captured the local direction of change of the three predicted parameters and tracked their actual fluctuations better than the single reference deep learning technique [4]. A hybrid model can be an effective option for predicting water quality data because combining several models makes it possible to capture more of the underlying patterns. In one study, the hybrid model was more accurate than both the Auto Regressive Integrated Moving Average (ARIMA) and the neural network models because it better recognized time series patterns and nonlinear properties [39].

The fundamental premise of the attention-based LSTM (AT-LSTM) model is to improve prediction accuracy by adaptively weighting the components in the neural network’s hidden layers, which minimizes the influence of unimportant factors and amplifies that of significant ones. For predicting water quality, AT-LSTM performed better than the plain LSTM model [14].
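The adaptive weighting idea can be sketched as a softmax over relevance scores followed by a weighted sum of hidden states. The hidden states and scores below are made up; in an attention-based model the scores would be produced by learned parameters, and the exact formulation in [14] may differ.

```python
import math

def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden_states, scores):
    """Weighted sum of hidden states using softmax-normalized relevance scores."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

# Made-up hidden states for three time steps, with made-up relevance scores.
hidden_states = [[0.1, 0.0], [0.9, 0.5], [0.2, 0.1]]
scores = [0.0, 3.0, 0.0]
weights, context = attend(hidden_states, scores)
```

Here the second time step receives most of the weight, so the context vector passed on to the prediction layer is dominated by its hidden state.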

5 Performance Evaluation Metrics

Evaluating the performance of WQI models requires validation data. The data collected for the analysis may be sampled at intervals ranging from minutes up to seasons or longer. The time scale refers to the frequency at which the water parameter data are collected at the stations. The data may be taken daily, weekly, or monthly, depending on the design and purpose of the studies performed by the researchers. Various inputs or independent variables may be used to estimate the water quality index. The most common inputs are temperature, dissolved oxygen, pH, turbidity and total phosphorus. In order to evaluate model performance under various conditions, the models can be designed with varying inputs. Typically, when identifying the best predictive models, the comparison should be based on the scenario in which all inputs were used.

Performance metrics are critical for determining how effectively the proposed models can provide predicted values that are comparable or close to the actual values. It is therefore important to select relevant performance indicators for model evaluation.

A number of performance metrics are available to measure prediction performance in forecasting models. These metrics include the coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE) and Nash–Sutcliffe efficiency coefficient (NSE). Most of the reviewed articles used R², RMSE, and MAE. However, for a deeper evaluation of performance, other metrics were also used, such as the global performance indicator (GPI), correlation coefficient (R), Willmott index of agreement (WI) and more.

The coefficient of determination usually lies between 0 and 1, where a value of 1.0 indicates a perfect fit; computed as in Eq. (1), it can also become negative when the model fits worse than the mean of the observations. R² describes the relationship between the independent and dependent variables and measures how well a statistical model predicts an outcome. A limitation of R² is that it cannot by itself indicate whether a regression model provides a proper fit to the data. In other words, a good model may sometimes have a low R² value. Additionally, it cannot reveal whether the data and predictions are biased.

$$R^{2} = 1 - \frac{RSS}{{TSS}}$$

(1)

where R² = coefficient of determination, RSS = sum of squares of residuals, TSS = total sum of squares.
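As an illustration of Eq. (1), the following minimal sketch computes R² from RSS and TSS with NumPy. The function name `r_squared` and the sample values (labelled as dissolved oxygen) are hypothetical and are not taken from any reviewed study.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - RSS / TSS (Eq. 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y_true - y_pred) ** 2)          # sum of squares of residuals
    tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - rss / tss

# Hypothetical observations and predictions (e.g. dissolved oxygen, mg/L)
observed = [7.1, 6.8, 7.4, 8.0, 7.7]
predicted = [7.0, 6.9, 7.5, 7.8, 7.6]
print(round(r_squared(observed, predicted), 4))   # RSS = 0.08, TSS = 0.90
```

A model that always predicts the mean of the observations gives R² = 0 under this definition, and a model that fits worse than the mean gives a negative value.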

Mean absolute error (MAE) measures the average absolute difference between the model prediction and the target value. A lower MAE indicates a better model. MAE is robust in the sense that it is less sensitive to outliers than squared-error metrics, which is useful if the training data contain outliers. The limitations of MAE are that the absolute value is not differentiable at zero and that it is a scale-dependent measure, so its value depends on the units of the target variable.

$${\text{MAE}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left| {y_{i} - \widehat{{y_{i} }}} \right|}}{n}$$

(2)

where MAE is the mean absolute error, $y_{i}$ is the target value, $\widehat{y_{i}}$ is the predicted value, and $n$ is the number of samples.
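A minimal sketch of Eq. (2) follows, using hypothetical sample values. It also illustrates the scale dependence noted above: expressing the same data in different units changes the MAE by the same factor.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error (Eq. 2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Hypothetical values in mg/L
observed = np.array([7.1, 6.8, 7.4, 8.0, 7.7])
predicted = np.array([7.0, 6.9, 7.5, 7.8, 7.6])

mae_mg = mae(observed, predicted)                  # about 0.12 mg/L
mae_ug = mae(observed * 1000, predicted * 1000)    # same data in ug/L
print(round(mae_mg, 4), round(mae_ug, 1))
```

Because of this dependence on units, MAE values are only comparable between models evaluated on the same variable and scale.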

Root mean square error (RMSE) is the square root of the average squared difference between the values predicted by a model and the actual values. Lower values of RMSE indicate a better fit. In contrast to the mean squared error (MSE), which is expressed in squared units and is strongly inflated by large errors, RMSE is expressed in the units of the target variable and therefore reflects performance more directly when large errors are present. The limitation of RMSE is its sensitivity to outliers.

$${\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \widehat{{y_{i} }}} \right)^{2} }}{n}}$$

(3)

where RMSE is root mean squared error.
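The sensitivity of RMSE to outliers can be seen in a small sketch of Eq. (3). The two sets of predictions below are artificial: both have the same total absolute error, but in the second set the error is concentrated in a single sample.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error (Eq. 3)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

target = np.zeros(4)
pred_even = np.array([1.0, 1.0, 1.0, 1.0])      # four errors of 1
pred_outlier = np.array([0.0, 0.0, 0.0, 4.0])   # one error of 4

# Mean absolute error is 1.0 in both cases
mae_even = np.mean(np.abs(target - pred_even))
mae_outlier = np.mean(np.abs(target - pred_outlier))

print(rmse(target, pred_even))      # 1.0
print(rmse(target, pred_outlier))   # 2.0
```

Reporting MAE alongside RMSE, as most of the reviewed articles do, helps to distinguish many small errors from a few large ones.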

The Nash–Sutcliffe efficiency (NSE) coefficient is a statistic used for assessing the goodness of fit and predictive skill of a model. It is equal to one minus the ratio of the error variance of the modelled time series to the variance of the observed time series. NSE ranges between −∞ and 1.0 (1 inclusive), with NSE = 1 being the optimal value. Values between 0.0 and 1.0 are generally viewed as acceptable levels of performance, whereas values < 0.0 indicate unacceptable performance.

$$NSE = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {q_{o,i} - q_{s,i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {q_{o,i} - \bar{q}_{o} } \right)^{2} }}$$

(4)

where NSE is the Nash–Sutcliffe coefficient, $q_{o,i}$ is the $i$-th observed value, $q_{s,i}$ is the $i$-th simulated value, $\bar{q}_{o}$ is the average of the observed values, and $n$ is the number of samples.
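The NSE formula above can be sketched as follows. The series are hypothetical and were chosen only to show the three regimes described in the text: a value close to 1, a value of 0 when the simulation equals the observed mean, and a negative value.

```python
import numpy as np

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    error_ss = np.sum((observed - simulated) ** 2)
    variance_ss = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - error_ss / variance_ss

obs = [2.0, 4.0, 6.0, 8.0]                     # hypothetical observations
good = nse(obs, [2.5, 3.5, 6.5, 7.5])          # close to the observations
mean_only = nse(obs, [5.0, 5.0, 5.0, 5.0])     # always predicts the mean
poor = nse(obs, [8.0, 6.0, 4.0, 2.0])          # worse than the mean

print(good, mean_only, poor)   # 0.95 0.0 -3.0
```

Note that this expression has the same algebraic form as Eq. (1) when TSS is taken about the mean of the observations, so the two metrics coincide when computed on the same data.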

Table 4 presents a summary of the various scenarios presented in the reviewed studies. These scenarios relate to the time scale of the data used, such as hourly, daily, weekly, and monthly. In the daily scenario, for example, the data collected for one day are used to predict future data. Additionally, the table lists the performance indicators that the reviewed articles used to evaluate the prediction capability of the models.
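One way to read a time-scale scenario is as a supervised learning problem in which earlier samples are used to predict a later one. The sketch below builds such input and target pairs from a single series. The helper `make_lagged_pairs`, the window length `n_lags`, the forecast `horizon`, and the daily WQI values are all assumptions made for illustration and are not taken from the reviewed studies.

```python
import numpy as np

def make_lagged_pairs(series, n_lags=1, horizon=1):
    """Turn a time series into (inputs, target) pairs.

    Each input row holds n_lags consecutive samples; the target is the
    sample that lies `horizon` steps after the last input sample.
    """
    series = np.asarray(series, dtype=float)
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon + 1
    for start in range(last_start):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags + horizon - 1])
    return np.array(inputs), np.array(targets)

# Hypothetical daily WQI values
daily_wqi = [71.0, 73.5, 70.2, 68.9, 72.4, 74.1]
X, y = make_lagged_pairs(daily_wqi, n_lags=2, horizon=1)
print(X.shape, y.shape)   # (4, 2) (4,)
print(X[0], y[0])         # two consecutive days, then the following day
```

The same helper applies to hourly, weekly, or monthly data; only the sampling interval of the input series changes.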

6 Conclusion and Recommendations

The objective of this study was to assess the performance of the predictive models used in water quality prediction with different water parameters, based on the reported results and limitations. This paper has reviewed 83 studies conducted between 2009 and 2023 to predict the water quality index (WQI) using machine learning methods. In this review, we identified and categorized various types of modelling algorithms, input parameters and outputs. It was found that machine learning techniques were effective in simulating and predicting the water quality index in many regions around the world. These methods found the connections between the water quality index and hydrological and meteorological variables without knowledge of the physical characteristics of the modelled system. In other words, when it is difficult to design a knowledge-based model, machine learning techniques are useful because they do not require a physical model of the observed system. For a successful estimate of the water quality index, the studies followed specific steps in the modelling process, such as data preprocessing, dividing the data into training, validation, and testing sets, and selecting suitable predictors.
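The splitting and preprocessing steps mentioned above can be sketched as follows. The 70/15/15 proportions, the chronological ordering, and the min-max scaling are assumptions chosen for illustration; the reviewed studies use a variety of splits and preprocessing methods.

```python
import numpy as np

def chronological_split(data, train_frac=0.7, val_frac=0.15):
    """Split time-ordered samples into training, validation, and testing
    sets without shuffling, so that later samples never leak into training."""
    n = len(data)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

def fit_min_max(train):
    """Learn per-column scaling parameters from the training set only."""
    low = train.min(axis=0)
    high = train.max(axis=0)
    span = np.where(high > low, high - low, 1.0)   # avoid division by zero
    return low, span

def apply_min_max(x, low, span):
    """Scale data with parameters learned from the training set."""
    return (x - low) / span

# 20 hypothetical samples with 2 input parameters each
samples = np.arange(40, dtype=float).reshape(20, 2)
train, val, test = chronological_split(samples)
low, span = fit_min_max(train)
train_scaled = apply_min_max(train, low, span)
test_scaled = apply_min_max(test, low, span)
print(len(train), len(val), len(test))   # 14 3 3
```

Fitting the scaling parameters on the training set alone keeps information from the validation and testing sets out of the model, which is why scaled test values may fall outside the range 0 to 1.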

Advancements in modelling techniques have brought machine learning (ML) and hybrid models into water quality index forecasting. In this study, it was observed that hybrid models improved WQI estimation performance significantly. Additionally, since DL models performed better than ML models in several studies, hybrid-DL methods may also outperform hybrid-ML methods. However, since studies employing hybrid-DL models for WQI estimation were limited, this comparison was not made in this review.

Most of the studies were conducted in the Middle East and Asian countries. Therefore, we recommend more research on water quality index prediction for regions where the availability of surface water is limited, such as the African continent and parts of Europe and South America. Among the modelling techniques used in the reviewed works, ensemble learning methods were rarely applied, even though they are often among the most accurate methods.

When large water quality datasets can be collected, more powerful deep learning models such as the convolutional neural network (CNN), long short-term memory (LSTM), and transformer can take the place of traditional ML methods and produce significant improvements in prediction performance. Recent DL methods, specifically the transformer [86], may open the door to capturing the temporal relations in the history of previously collected water quality samples in order to forecast future quality values with remarkable performance. The transformer uses an attention mechanism that allows the model to focus on specific samples in a data sequence by assigning different weights to the different samples at the input. This technique was found to outperform LSTM in several applications [87,88,89].

Generative Adversarial Networks (GANs) have been discussed and evaluated in several domains, where they improved prediction results. However, research on using GANs for predicting the water quality index, and comparisons with standalone and hybrid models, are still required. GANs may play a significant role in addressing the data-hungry nature of deep learning models by generating synthetic data. Synthetic samples generated by a GAN can help with the cost of data collection and the lack of data from which most applications suffer. A GAN can increase the size of the dataset, which opens the door to using recent deep learning models such as CNN, LSTM, and transformers for water quality prediction. With a GAN, it may also become easier to recover some of the missing values in the water quality records collected in previous years. However, training and running a GAN requires a powerful machine, and its hyperparameters must be fine-tuned to reach the expected performance because GANs are extremely sensitive to hyperparameter settings. The conclusions drawn from this review can serve as guidance for future studies that aim to enhance water quality prediction by combining GAN-generated data with existing state-of-the-art methods.

References

Peterson KT, Sagan V, Sidike P, Hasenmueller EA, Sloan JJ, Knouft JH (2019) Machine learning-based ensemble prediction of water-quality variables using feature-level and decision-level fusion with proximal remote sensing. Photogramm Eng Remote Sensing 85(4):269–280. https://doi.org/10.14358/PERS.85.4.269
Barzegar R, Adamowski J, Moghaddam AA (2016) Application of wavelet-artificial intelligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran. Stoch Env Res Risk Assess 30(7):1797–1819. https://doi.org/10.1007/S00477-016-1213-Y
Alizadeh MJ, Kavianpour MR (2015) Development of wavelet-ANN models to predict water quality parameters in Hilo Bay, Pacific Ocean. Mar Pollut Bull 98(1–2):171–178. https://doi.org/10.1016/J.MARPOLBUL.2015.06.052
Yan J et al (2021) Water quality prediction in the Luan river based on 1-DRCNN and BiGRU hybrid neural network model. Water 13:1273. https://doi.org/10.3390/W13091273
Olyaie E, Banejad H (2011) Application of an artificial neural network model to rivers water quality indexes prediction-a case study. J Am Sci 7(1):1545–1003
Sani Gaya M et al (2020) Estimation of water quality index using artificial intelligence approaches and multi-linear regression. IAES Int J Artif Intell 9(1):126–134. https://doi.org/10.11591/ijai.v9.i1.pp126-134
Pham QB, Mohammadpour R, Linh NT, Mohajane M, Pourjasem A, Sammen SS, Anh DT, Nam VT (2021) Application of soft computing to predict water quality in wetland. Environ Sci Pollut Res 28:185–200
Khan Y, See CS (2016) Predicting and analyzing water quality using machine learning: a comprehensive model. In: 2016 IEEE Long Island systems, applications and technology conference, LISAT 2016. https://doi.org/10.1109/LISAT.2016.7494106
Najah Ahmed A et al (2019) Machine learning methods for better water quality prediction. J Hydrol (Amst) 578:124084. https://doi.org/10.1016/J.JHYDROL.2019.124084
Gao C, Wang Z, Ji X, Wang W, Wang Q, Qing D (2023) Coupled improvements on hydrodynamics and water quality by flowing water in towns with lakes. Environ Sci Pollut Res 30(16):46813–46825. https://doi.org/10.1007/s11356-023-25348-3
Liu P, Wang J, Sangaiah AK, Xie Y, Yin X (2019) Analysis and prediction of water quality using LSTM deep neural networks in IoT environment. Sustainability 11(7):2058. https://doi.org/10.3390/SU11072058
Bui DT, Khosravi K, Tiefenbacher J, Nguyen H, Kazakis N (2020) Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci Total Environ 721:137612. https://doi.org/10.1016/J.SCITOTENV.2020.137612
Tiwari S, Babbar R, Kaur G (2018) Performance evaluation of two ANFIS models for predicting water quality index of river Satluj (India). Adv Civil Eng. https://doi.org/10.1155/2018/8971079
Chen H et al (2022) Water quality prediction based on LSTM and attention mechanism: a case study of the Burnett River Australia. Sustainability 14(20):13231. https://doi.org/10.3390/SU142013231
Sha J, Li X, Zhang M, Wang ZL (2021) Comparison of forecasting models for real-time monitoring of water quality parameters based on hybrid deep learning neural networks. Water 13(11):1547. https://doi.org/10.3390/W13111547
Li L, Jiang P, Xu H, Lin G, Guo D, Wu H (2019) Water quality prediction based on recurrent neural network and improved evidence theory: a case study of Qiantang River, China. Environ Sci Pollut Res 26(19):19879–19896. https://doi.org/10.1007/S11356-019-05116-Y
Emamgholizadeh S, Kashi H, Marofpoor I, Zalaghi E (2014) Prediction of water quality parameters of Karoon River (Iran) by artificial intelligence-based models. Int J Environ Sci Technol 11(3):645–656. https://doi.org/10.1007/S13762-013-0378-X
Asadollah SBHS, Sharafati A, Motta D, Yaseen ZM (2021) River water quality index prediction and uncertainty analysis: A comparative study of machine learning models. J Environ Chem Eng 9(1):104599. https://doi.org/10.1016/J.JECE.2020.104599
Haghiabi AH, Nasrolahi AH, Parsaie A (2018) Water quality prediction using machine learning methods. Water Qual Res J 53(1):3–13. https://doi.org/10.2166/WQRJ.2018.025
Lu H, Ma X (2020) Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere 249:126169. https://doi.org/10.1016/J.CHEMOSPHERE.2020.126169
Ahmed U, Mumtaz R, Anwar H, Shah AA, Irfan R, García-Nieto J (2019) Efficient water quality prediction using supervised machine learning. Water 11(11):2210. https://doi.org/10.3390/W11112210
Zhou Y (2020) Real-time probabilistic forecasting of river water quality under data missing situation: deep learning plus post-processing techniques. J Hydrol (Amst) 589:125164. https://doi.org/10.1016/J.JHYDROL.2020.125164
Hayder G, Kurniawan I, Mustafa HM (2020) Implementation of machine learning methods for monitoring and predicting water quality parameters. Biointerface Res Appl Chem. https://doi.org/10.33263/BRIAC112.92859295
Baek SS, Pyo J, Chun JA (2020) Prediction of water level and water quality using a CNN-LSTM combined deep learning approach. Water 12(12):3399. https://doi.org/10.3390/W12123399
Jin T, Cai S, Jiang D, Liu J (2019) A data-driven model for real-time water quality prediction and early warning by an integration method. Environ Sci Pollut Res 26(29):30374–30385. https://doi.org/10.1007/S11356-019-06049-2
Isiyaka HA, Mustapha A, Juahir H, Phil-Eze P (2019) Water quality modelling using artificial neural network and multivariate statistical techniques. Model Earth Syst Environ 5(2):583–593. https://doi.org/10.1007/S40808-018-0551-9
Liu S, Tai H, Ding Q, Li D, Xu L, Wei Y (2013) A hybrid approach of support vector regression with genetic algorithm optimization for aquaculture water quality prediction. Math Comput Model 58(3–4):458–465. https://doi.org/10.1016/J.MCM.2011.11.021
Ouma YO, Okuku CO, Njau EN (2020) Use of artificial neural networks and multiple linear regression model for the prediction of dissolved oxygen in rivers: case study of hydrographic basin of river Nyando, Kenya. Complexity. https://doi.org/10.1155/2020/9570789
Khoi DN, Quan NT, Linh DQ, Nhi PTT, Thuy NTD (2022) Using machine learning models for predicting the Water Quality Index in the La Buong River, Vietnam. Water 14(10):1552. https://doi.org/10.3390/W14101552
Alqahtani A, Shah MI, Aldrees A, Javed MF (2022) Comparative assessment of individual and ensemble machine learning models for efficient analysis of river water quality. Sustainability 14(3):1183. https://doi.org/10.3390/SU14031183
Ziyad Sami BF et al (2022) Machine learning algorithm as a sustainable tool for dissolved oxygen prediction: a case study of Feitsui Reservoir, Taiwan. Sci Rep 12(1):1–12. https://doi.org/10.1038/s41598-022-06969-z
Izhar Shah M, Alaloul WS, Alqahtani A, Aldrees A, Ali Musarat M, Javed MF (2021) Predictive modeling approach for surface water quality: development and comparison of machine learning models. Sustainability 13(14):7515. https://doi.org/10.3390/SU13147515
Kisi O, Parmar KS (2016) Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution. J Hydrol (Amst) 534:104–112. https://doi.org/10.1016/J.JHYDROL.2015.12.014
Melesse AM et al (2020) River water salinity prediction using hybrid machine learning models. Water 12(10):2951. https://doi.org/10.3390/W12102951
Hameed M, Sharqi SS, Yaseen ZM, Afan HA, Hussain A, Elshafie A (2017) Application of artificial intelligence (AI) techniques in water quality index prediction: a case study in tropical region, Malaysia. Neural Comput Appl 28(1):893–905. https://doi.org/10.1007/S00521-016-2404-7
Ahmed AAM, Shah SMA (2017) Application of adaptive neuro-fuzzy inference system (ANFIS) to estimate the biochemical oxygen demand (BOD) of Surma River. J King Saud Univ Eng Sci 29(3):237–243. https://doi.org/10.1016/J.JKSUES.2015.02.001
Maier PM, Keller S (2018) Machine learning regression on hyperspectral data to estimate multiple water parameters. Workshop Hyperspectral Image Signal Process, Evol Remote Sensing. https://doi.org/10.1109/WHISPERS.2018.8747010
Heddam S, Kisi O (2018) Modelling daily dissolved oxygen concentration using least square support vector machine, multivariate adaptive regression splines and M5 model tree. J Hydrol (Amst) 559:499–509. https://doi.org/10.1016/J.JHYDROL.2018.02.061
Ömer Faruk D (2010) A hybrid neural network and ARIMA model for water quality time series prediction. Eng Appl Artif Intell 23(4):586–594. https://doi.org/10.1016/J.ENGAPPAI.2009.09.015
Sattari MT, Joudi AR, Kusiak A (2016) Estimation of water quality parameters with data-driven model. J Am Water Works Assoc 108(4):E232–E239. https://doi.org/10.5942/JAWWA.2016.108.0012
Abba SI et al (2020) Implementation of data intelligence models coupled with ensemble machine learning for prediction of water quality index. Environ Sci Pollut Res 27(33):41524–41539. https://doi.org/10.1007/S11356-020-09689-X
Yan T, Zhou A, Shen SL (2023) Prediction of long-term water quality using machine learning enhanced by Bayesian optimisation. Environ Pollut 318:120870. https://doi.org/10.1016/J.ENVPOL.2022.120870
Malek NHA, Yaacob WFW, Nasir SAM, Shaadan N (2022) Prediction of water quality classification of the kelantan river basin, Malaysia, using machine learning techniques. Water 14(7):1067. https://doi.org/10.3390/W14071067
Huang M et al (2018) A hybrid fuzzy wavelet neural network model with self-adapted fuzzy c-means clustering and genetic algorithm for water quality prediction in rivers. Complexity. https://doi.org/10.1155/2018/8241342
Rizal NNM, Hayder G, Mnzool M, Elnaim BME, Mohammed AOY, Khayyat MM (2022) Comparison between regression models, support vector machine (SVM), and artificial neural network (ANN) in river water quality prediction. Processes 10(8):1652. https://doi.org/10.3390/PR10081652
Xuan W, Lv J, Xie D (2010) A hybrid approach of support vector machine with particle swarm optimization for water quality prediction. In: ICCSE 2010, 5th International conference on computer science and education, final program and book of abstracts, pp 1158–1163. https://doi.org/10.1109/ICCSE.2010.5593697
Than NH, Ly CD, van Tat P, Thanh NN (2016) Application of a neural network technique for prediction of the Water Quality index in the Dong Nai River, Vietnam. J Environ Sci Eng B 5:363–370. https://doi.org/10.17265/2162-5263/2016.07.007
Singh KP, Basant A, Malik A, Jain G (2009) Artificial neural network modeling of the river water quality—a case study. Ecol Modell 220(6):888–895. https://doi.org/10.1016/J.ECOLMODEL.2009.01.004
Ye Q, Yang X, Chen C, Wang J (2019) River water quality parameters prediction method based on LSTM-RNN model. In: Proceedings of the 31st Chinese control and decision conference, CCDC 2019, pp 3024–3028. https://doi.org/10.1109/CCDC.2019.8832885
Azad A, Karami H, Farzin S, Mousavi SF, Kisi O (2019) Modeling river water quality parameters using modified adaptive neuro fuzzy inference system. Water Sci Eng 12(1):45–54. https://doi.org/10.1016/J.WSE.2018.11.001
Chou JS, Ho CC, Hoang HS (2018) Determining quality of water in reservoir using machine learning. Ecol Inform 44:57–75. https://doi.org/10.1016/J.ECOINF.2018.01.005
Elkiran G, Nourani V, Abba SI (2019) Multi-step ahead modelling of river water quality parameters using ensemble artificial intelligence-based approach. J Hydrol (Amst). https://doi.org/10.1016/J.JHYDROL.2019.123962
Chen K et al (2020) Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res. https://doi.org/10.1016/j.watres.2019.115454
Ly QV et al (2021) Application of machine learning for eutrophication analysis and algal bloom prediction in an urban river: a 10-year study of the Han River, South Korea. Sci Total Environ 797:149040. https://doi.org/10.1016/J.SCITOTENV.2021.149040
Ahmed M, Mumtaz R, Mohammad S, Zaidi H (2021) Analysis of water quality indices and machine learning techniques for rating water pollution: a case study of Rawal Dam, Pakistan. Water Supply. https://doi.org/10.2166/ws.2021.082
Zanoni MG, Majone B, Bellin A (2022) A catchment-scale model of river water quality by machine learning. Sci Total Environ 838:156377. https://doi.org/10.1016/J.SCITOTENV.2022.156377
Uddin MG, Nash S, Rahman A, Olbert AI (2023) Performance analysis of the water quality index model for predicting water state using machine learning techniques. Process Saf Environ Prot 169:808–828. https://doi.org/10.1016/J.PSEP.2022.11.073
Al-Sulttani AO, Al-Mukhtar M, Roomi AB, Farooque AA, Khedher KM, Yaseen ZM (2021) Proposition of New ensemble data-intelligence models for surface water quality prediction. IEEE Access 9:108527–108541. https://doi.org/10.1109/ACCESS.2021.3100490
Gazzaz NM, Yusoff MK, Aris AZ, Juahir H, Ramli MF (2012) Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors. Mar Pollut Bull 64(11):2409–2420. https://doi.org/10.1016/J.MARPOLBUL.2012.08.005
Kouadri S, Elbeltagi A, Islam ARMT, Kateb S (2021) Performance of machine learning methods in predicting water quality index based on irregular data set: application on Illizi region (Algerian southeast). Appl Water Sci 11(12):1–20. https://doi.org/10.1007/S13201-021-01528-9
Anmala J, Venkateshwarlu T (2019) Statistical assessment and neural network modeling of stream water quality observations of Green River watershed, KY, USA. Water Supply 19(6):1831–1840. https://doi.org/10.2166/WS.2019.058
Ma C, Zhao J, Ai B, Sun S, Yang Z (2022) Machine learning based long-term water quality in the turbid pearl river Estuary, China. J Geophys Res Oceans. https://doi.org/10.1029/2021JC018017
Adusei YY, Quaye-Ballard J, Adjaottor AA, Mensah AA (2021) Spatial prediction and mapping of water quality of Owabi reservoir from satellite imageries and machine learning models. Egypt J Remote Sensing Space Sci 24(3):825–833. https://doi.org/10.1016/J.EJRS.2021.06.006
Othman F et al (2020) Efficient river water quality index prediction considering minimal number of inputs variables. Eng Appl Comput Fluid Mech 14(1):751–763. https://doi.org/10.1080/19942060.2020.1760942
Bhoi SK, Mallick C, Mohanty CR (2022) Estimating the water quality class of a major irrigation canal in Odisha, India: a supervised machine learning approach. Nat Environ Pollut Technol. https://doi.org/10.46488/NEPT.2022.v21i02.002
Aldhyani THH, Al-Yaari M, Alkahtani H, Maashi M (2020) Water quality prediction using artificial intelligence algorithms. Appl Bionics Biomech. https://doi.org/10.1155/2020/6659314
Lee HW, Kim M, Son HW, Min B, Choi JH (2022) Machine-learning-based water quality management of river with serial impoundments in the Republic of Korea. J Hydrol Reg Stud 41:101069. https://doi.org/10.1016/J.EJRH.2022.101069
Li J et al (2019) Hybrid soft computing approach for determining water quality indicator: Euphrates River. Neural Comput Appl 31(3):827–837. https://doi.org/10.1007/S00521-017-3112-7
Fijani E, Barzegar R, Deo R, Tziritis E, Konstantinos S (2019) Design and implementation of a hybrid model based on two-layer decomposition method coupled with extreme learning machines to support real-time environmental monitoring of water quality parameters. Sci Total Environ 648:839–853. https://doi.org/10.1016/J.SCITOTENV.2018.08.221
Kumar L, Afzal MS, Ahmad A (2022) Prediction of water turbidity in a marine environment using machine learning: a case study of Hong Kong. Reg Stud Mar Sci 52:102260. https://doi.org/10.1016/J.RSMA.2022.102260
Ho JY et al (2019) Towards a time and cost effective approach to water quality index class prediction. J Hydrol (Amst) 575:148–165. https://doi.org/10.1016/J.JHYDROL.2019.05.016
Koranga M, Pant P, Kumar T, Pant D, Bhatt AK, Pant RP (2022) Efficient water quality prediction models based on machine learning algorithms for Nainital Lake, Uttarakhand. Mater Today Proc 57:1706–1712. https://doi.org/10.1016/J.MATPR.2021.12.334
Uddin MG, Nash S, Mahammad Diganta MT, Rahman A, Olbert AI (2022) Robust machine learning algorithms for predicting coastal water quality index. J Environ Manag 321:115923. https://doi.org/10.1016/J.JENVMAN.2022.115923
Gómez D, Salvador P, Sanz J, Casanova JL (2021) A new approach to monitor water quality in the Menor sea (Spain) using satellite data and machine learning methods. Environ Pollut 286:117489. https://doi.org/10.1016/J.ENVPOL.2021.117489
Zhu X, Guo H, Huang JJ, Tian S, Xu W, Mai Y (2022) An ensemble machine learning model for water quality estimation in coastal area based on remote sensing imagery. J Environ Manag 323:116187. https://doi.org/10.1016/J.JENVMAN.2022.116187
Saberioon M, Brom J, Nedbal V (2020) Chlorophyll-a and total suspended solids retrieval and mapping using Sentinel-2A and machine learning for inland waters. Ecol Indic 113:106236. https://doi.org/10.1016/J.ECOLIND.2020.106236
Xu T, Coco G, Neale M (2020) A predictive model of recreational water quality based on adaptive synthetic sampling algorithms and machine learning. Water Res 177:115788. https://doi.org/10.1016/J.WATRES.2020.115788
Deng T, Chau KW, Duan HF (2021) Machine learning based marine water quality prediction for coastal hydro-environment management. J Environ Manag 284:112051. https://doi.org/10.1016/J.JENVMAN.2021.112051
Al-Adhaileh MH, Alsaade FW (2021) Modelling and prediction of water quality by using artificial intelligence. Sustainability 13:4259. https://doi.org/10.3390/SU13084259
Khullar S, Singh N (2022) Water quality assessment of a river using deep learning Bi-LSTM methodology: forecasting and validation. Environ Sci Pollut Res 29(9):12875–12889. https://doi.org/10.1007/S11356-021-13875-W
Latif SD et al (2022) Development of prediction model for phosphate in reservoir water system based machine learning algorithms. Ain Shams Eng J 13(1):101523. https://doi.org/10.1016/J.ASEJ.2021.06.009
Kogekar AP, Nayak R, Pati UC (2021) A CNN-BiLSTM-SVR based deep hybrid model for water quality forecasting of the river Ganga. In: Proceedings of the 2021 IEEE 18th India council international conference, INDICON 2021. https://doi.org/10.1109/INDICON52576.2021.9691532
Wang S, Peng H, Liang S (2022) Prediction of estuarine water quality using interpretable machine learning approach. J Hydrol (Amst) 605:127320. https://doi.org/10.1016/J.JHYDROL.2021.127320
Garabaghi FH, Benzer S, Benzer R (2022) Performance evaluation of machine learning models with ensemble learning approach in classification of water quality indices based on different subset of features. https://doi.org/10.21203/rs.3.rs-876980/v2
Jiang Y, Li C, Sun L, Guo D, Zhang Y, Wang W (2021) A deep learning algorithm for multi-source data fusion to predict water quality of urban sewer networks. J Clean Prod 318:128533. https://doi.org/10.1016/J.JCLEPRO.2021.128533
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Amanambu AC, Mossa J, Chen Y-H (2022) Hydrological drought forecasting using a deep transformer model. Water 14:3611. https://doi.org/10.3390/w14223611
Méndez M, Montero C, Núñez M (2022) Using deep transformer based models to predict ozone levels. In: Nguyen NT, Tran TK, Tukayev U, Hong TP, Trawiński B, Szczerbicki E (eds) Intelligent information and database systems ACIIDS 2022. Springer, Cham
Xu J, Fan H, Luo M, Li P, Jeong T, Xu L (2023) Transformer based water level prediction in Poyang Lake, China. Water 15:576. https://doi.org/10.3390/w15030576
Roushangar K, Shahnazi S, Azamathulla HM (2023) Sediment transport modeling through machine learning methods: review of current challenges and strategies. In: Pandey M, Azamathulla H, Pu JH (eds) River dynamics and flood hazards disaster. Resilience and green growth. Springer, Singapore
Azamathulla HM, Ghani AA, Chang CK, Hasan ZA, Zakaria NA (2010) Machine learning approach to predict sediment load–a case study. Clean-Soil Air Water 38:969–976
Wu A, Azamathulla HM, Wu FC (2011) Support vector machine approach for longitudinal dispersion coefficients in natural streams. Appl Soft Comput 11(2):2902–2905

Acknowledgements

This research was supported by the Ministry of Education (MOE) through Fundamental Research Grant Scheme (FRGS/1/2021/TK0/UIAM/03/1).

Author information

Authors and Affiliations

Department of Civil Engineering, Kulliyyah of Engineering, International Islamic University of Malaysia, P.O Box 10, 50728, Kuala Lumpur, Malaysia
Dani Irwan & Maisarah Ali
Department of Civil Engineering, College of Engineering, Universiti Tenaga Nasional (UNITEN), 43000, Kajang, Selangor, Malaysia
Ali Najah Ahmed, Gan Jacky, Aiman Nurhakim & Mervyn Chah Ping Han
Computer Science, New York University, Abu Dhabi, UAE
Nouar AlDahoul
Department of Civil Engineering, Faculty of Engineering, University of Malaya (UM), 50603, Kuala Lumpur, Malaysia
Ahmed El-Shafie

Corresponding author

Correspondence to Ali Najah Ahmed.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Irwan, D., Ali, M., Ahmed, A.N. et al. Predicting Water Quality with Artificial Intelligence: A Review of Methods and Applications. Arch Computat Methods Eng 30, 4633–4652 (2023). https://doi.org/10.1007/s11831-023-09947-4

Received: 02 March 2023
Accepted: 25 May 2023
Published: 13 June 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11831-023-09947-4

1 Introduction