Introduction

With growing urbanization and deforestation, groundwater quality is changing invisibly due to various types of pollution, caused by harmful substances such as chemicals, microorganisms, radioactivity or heat energy entering water bodies directly or indirectly. Classifying and predicting water quality is important for various purposes such as drinking and irrigation (Wang et al. 2017; WHO Guidelines for drinking water quality 2004). Recently, interdisciplinary research has gained momentum to study groundwater quality across various parts of the globe (Venkata Vara Prasad et al. 2022; Dogo et al. 2019; Ranjithkumar and Robert 2021). The major focus has been on developing countries, since industrialization and urbanization there most strongly affect groundwater quality (Ground Water Year Book 2018).

Water quality is usually assessed by costly, time-consuming laboratory and statistical analysis (WHO Guidelines for drinking water quality 2004). Several studies (Sahu et al. 2021; Barik and Pattanayak 2019; Ground Water Year Book 2018; Madhav et al. 2020) have carried out water quality analysis focusing only on hydro-chemical processes. In this regard, AI-based approaches can provide quick and reliable analysis, and they have recently been adopted by research communities across the globe to address major water-related issues (Hanoon et al. 2021; Ahmed et al. 2020; Khan and See 2016).

A set of state-of-the-art ML models that have shown good performance on regional water quality datasets from different countries is selected. Motivated by their performance, the present work applies those models and their variants and studies their efficacy on the Odisha (India) and Vietnam water quality datasets.

In Sahu et al. (2021), silicate and halite dissolution and reverse ion exchange processes were studied for the phreatic aquifer, with Odisha as the focus area. In Barik and Pattanayak (2019), the authors used data plot dispositions on Gibb’s diagram to indicate the chemistry of groundwater for irrigation purposes in Rourkela city of Odisha. In Harichandan et al. (2021), an empirical correlation analysis between WQI and physico-chemical parameters was carried out to study the drinkability of water. The authors in Madhav et al. (2020) applied hydro-chemical processes to study drinking and farming cases.

To the best of our knowledge, machine learning based approaches have so far not been applied to water quality analysis in Odisha and Vietnam, although they can make the process more effective and less time-consuming. In this context, some machine learning approaches that have already been applied successfully in other regions of the world are presented below as motivation.

In Wang et al. (2021), stream water quality was predicted for different urban density scenarios using explainable machine learning methods. The authors in Haghiabi et al. (2018) predicted WQI using machine learning methods, namely ANN and SVM. A similar approach was presented in Kouadri et al. (2021) on irregular datasets for the southeast Algerian region using multi-linear regression, random forest, the M5P tree, etc. WQI-based ML methods were also used in Wang et al. (2017) for the Ebinur lake watershed, China.

Supervised learning methods were used in Ahmed et al. (2019) for water-quality analysis of Rawal water lake in Pakistan. In Theyazn et al. (2020), the authors used an AI-based approach with an auto-regressive neural network model named NARNET for water quality analysis and classification. Principal component analysis (PCA) and gradient boosting methods were used in Khan et al. (2021) for water quality prediction and classification. Using hydro-meteorological data, a data-driven model was proposed in Sokolova et al. (2022) for predicting microbial water quality.

In Tiyasha et al. (2021), the authors focused on assessing Klang river water quality using deep learning models. The approach computes a water quality index from six notable water quality parameters and applies random forest, decision tree and deep learning models in two scenarios: “small scale catchment” and “large scale catchment”. In both cases, the deep learning model is reported to perform well on non-linear data.

The authors in Tiyasha et al. (2020) have presented a thorough survey for the last decade on AI based model development for river water quality assessment. The major points focused on in this survey are variability in inputs for river water quality assessment, model architecture, and metrics for evaluation and investigation in different regions.

In Tiyasha et al. (2021), the authors adopted a hybrid tree-based approach for predicting river dissolved oxygen (DO) using satellite and hydro-meteorological data for the Klang river of Malaysia. Different selector algorithms are used for feature selection, namely, Boruta, GA, MARS and XGBoost. In the next phase, tree-based models like random forest, Ranger, and cForest are used to predict the DO. The best-performing models reported are XGBoost and MARS while considering the coefficient of determination as the evaluating parameter.

In the study by Nizal et al. in Nur Najwa Mohd et al. (2022), water quality parameters are predicted using a neural network-based approach with an integrated GUI. The focused region is selected as the Langat River of Malaysia. They adopted a novel approach of including rainfall data to predict water quality. The GUI design takes real-time inputs and can predict different water quality parameters.

Ubah et al. (Ubah et al. 2021) have proposed ANN-based models for analyzing river water quality for irrigation purposes, using data collected for the Ele river at Nnewi in Anambra State, Nigeria. The model is capable of predicting the water quality index for one year. The authors in Venkata Vara Prasad et al. (2021) propose automated analysis of water quality using ML and autoML methodologies; they claim that autoML performs better than conventional ML when binary classes are predicted. The authors in Zhu et al. (2022) have presented an extensive survey on water quality analysis for different environments such as drinking water, surface water and seawater, in which a set of 45 ML algorithms is evaluated for water quality analysis.

In this paper, we make a reasonable attempt to use AI-learning-based models to analyse the water quality for drinking purposes.

We applied XGBoost, a polynomial support vector machine, a decision tree, logistic regression, K-NN, and a CNN in this experiment. XGBoost was found to perform best, with accuracies of 92.67% and 98% on the Odisha and Vietnam datasets, respectively; more detailed results for the other models are provided in Section “Results summary & discussion”.

The datasets considered as inputs for this case study are those collected and published by the Central Ground Water Board, Government of India (Ground Water Year Book 2018), and by the Ministry of Natural Resources and Environment of Vietnam, which provides technical regulations on water resources monitoring. The significant contributions of this research are listed below.

  i. The underlying problem is formulated as a multi-class classification problem for a distinct classification of the drinkability of water.

  ii. State-of-the-art water quality estimation models, including the Water Quality Index (WQI) model and the Water Quality Class (WQC) model as per WHO specifications, are used to carry out a realistic analysis.

  iii. A thorough exploratory data analysis is carried out for better water quality prediction.

  iv. A set of well-known learning models is used for optimal prediction of WQC.

The rest of the paper is organized as follows: Section “Water quality estimation model” describes the water quality estimation model, and Section “Problem formulation and proposed framework” presents the problem formulation and the proposed framework. Section “Proposed strategy” explains the detailed strategy adopted to apply the AI-learning-based approaches. Section “Results summary & discussion” presents the results summary with performance evaluation metrics, and the conclusion is given in Section “Conclusion”.

Water quality estimation model

The water quality index (WQI) is used to measure the quality of water, and it is calculated based on some state-of-the-art parameters (Khan and See 2016; Haghiabi et al. 2018; Wang et al. 2017). To estimate WQI, mostly nine well-known parameters are considered. In our case, out of the fourteen surveyed parameters given in Ground Water Year Book (2018), after performing exploratory data analysis, the 13 most influential parameters are considered: Total Hardness (TH), Total Dissolved Solids (TDS), pH, Sulphate (SO4), Electrical Conductivity (EC), Alkalinity, Magnesium (Mg), Sodium (Na), Potassium (K), Chloride (Cl), Calcium (Ca), Fluoride (F), and Bicarbonate (HCO3). As per WHO guidelines (WHO Guidelines for drinking water quality 2004), the permissible ranges of the different parameters are shown in Table 1. In the case of the Vietnam dataset, out of the twenty-one surveyed parameters given by the Vietnamese authorities, after performing exploratory data analysis, the 12 most influential parameters are considered: Total Dissolved Solids (TDS), pH, Sulphate (SO4), Hardness (general), Hardness (permanent), Magnesium (Mg), Sodium (Na), Potassium (K), Chloride (Cl), Calcium (Ca), Fluoride (F), and Bicarbonate (HCO3). Table 2 shows the permissible ranges of these parameters. Using these parameters and prescribed weights, the WQI and WQC are defined for each sample as given below.

Table 1 Odisha: permissible values of parameters used in calculating WQI
Table 2 Vietnam: permissible values of parameters used in calculating WQI

Water quality index model

To estimate the water quality, a standard parameter named the Water Quality Index (WQI), given by equation (1), is mostly used (Kouadri et al. 2021; Haghiabi et al. 2018). In this study, a total of 13 significant parameters from Odisha and 12 from Vietnam are taken into consideration to calculate the WQI.

$$ WQI = \frac{{\sum}_{i=1}^{N} q_{i} \times w_{i}}{{\sum}_{i=1}^{N} w_{i}} $$
(1)

where N represents the total number of parameters considered for water quality evaluation (In our case, it is 13 in Odisha and 12 in Vietnam). qi is the quality rating scale for the individual parameters, which is computed using equation (2).

$$ q_{i}= 100 \times (\frac{V_{i}-V_{ideal}}{S_{i}-V_{ideal}}) $$
(2)

In the above equation, Vi is the estimated (measured) value of parameter ‘i’, Videal is the ideal value of parameter ‘i’ for water without impurity, and Si is the permissible (standard) value of parameter ‘i’. Further, wi is estimated as given in equation (3).

$$ w_{i} = \frac{K}{S_{i}} $$
(3)

where K is a proportionality constant which is calculated below.

$$ K = \frac{1}{{\sum}_{i=1}^{N} S_{i}} $$
(4)
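For illustration, the WQI computation in equations (1)-(4) can be sketched in Python as below. The parameter names, permissible limits and ideal values in this example are placeholders (the actual limits are listed in Tables 1 and 2); taking Videal as 0 for most parameters and 7 for pH is a common convention assumed here, not a value taken from the text.

```python
# Minimal sketch of the WQI computation in equations (1)-(4).
# Permissible limits S_i and ideal values V_ideal below are placeholders;
# the limits actually used in this study are listed in Tables 1 and 2.

def compute_wqi(sample, limits, ideals=None):
    """sample: {parameter: measured value V_i}
    limits: {parameter: permissible value S_i}
    ideals: {parameter: V_ideal}; defaults to 0 for every parameter."""
    ideals = ideals or {p: 0.0 for p in limits}
    k = 1.0 / sum(limits.values())                                  # equation (4)
    num, den = 0.0, 0.0
    for p, s_i in limits.items():
        w_i = k / s_i                                               # equation (3)
        q_i = 100.0 * (sample[p] - ideals[p]) / (s_i - ideals[p])   # equation (2)
        num += q_i * w_i
        den += w_i
    return num / den                                                # equation (1)

# Hypothetical example with made-up limits (mg/L except pH):
limits = {"TDS": 500, "Cl": 250, "SO4": 200, "pH": 8.5}
sample = {"TDS": 320, "Cl": 110, "SO4": 95, "pH": 7.4}
ideals = {"TDS": 0.0, "Cl": 0.0, "SO4": 0.0, "pH": 7.0}   # assumed ideal values
print(round(compute_wqi(sample, limits, ideals), 2))
```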

Water quality class model

Using WQI values, WHO (WHO Guidelines for drinking water quality 2004) has prescribed a standard range for Water Quality Class (WQC) of drinking water which is given below.

$$ WQC = \begin{cases} Excellent, & ~if~ 0 \le WQI \le 25, \\ Good, & ~if~ 26 \le WQI \le 50, \\ Poor, & ~if~ 51 \le WQI \le 75, \\ Very~Poor, & ~if~ 76 \le WQI \le 100, \\ Undrinkable, & ~if~ WQI > 100 \end{cases} $$
(5)

Problem formulation and proposed framework

This section briefly presents the problem formulation for water quality analysis and drinkability prediction using the above water quality estimation models. After that, a framework is proposed to solve the problem using machine learning based approaches.

Water drinkability as Multi-class classification problem

Based on the WQI value, which can be estimated as in equation (1), and mapping it to the respective class in equation (5), the water quality for drinking purposes can be predicted. Using equation (5), the task is thus formulated as a classification problem with multiple classes defined by ranges of the WQI value. Assuming the training dataset consists of ’n’ labelled samples, each described by a set of water quality attributes, the problem can formally be expressed as:

Each data sample is represented as a pair in \(\{(x_{i},Y_{i})\}_{i=1}^{n}\), where xi is a vector of features and Yi ∈{1,2,..,k} is the respective label representing one of the k classes according to the estimated WQI value. In our research, for a more meaningful classification in terms of drinkability, the water quality classes are modified as given below without loss of generality. The modified water quality class models used in our work are given in equations (6) and (7), with k = 4 classes (Excellent, Good, Poor, Bad) for the Odisha dataset and k = 5 classes (Excellent, Good, Medium, Fair, Poor) for the Vietnamese dataset.

$$ WQC_{Odisha} = \begin{cases} Excellent, & ~if~ 0 \le WQI \le 25, \\ Good, & ~if~ 26 \le WQI \le 50, \\ Poor, & ~if~ 51 \le WQI \le 75, \\ Bad, & ~if~ WQI > 75 \end{cases} $$
(6)
$$ WQC_{Vietnam} = \begin{cases} Poor, & ~if~ 0 \le WQI \le 25, \\ Fair, & ~if~ 25 < WQI \le 50, \\ Medium, & ~if~ 50 < WQI \le 70, \\ Good, & ~if~ 70 < WQI \le 90, \\ Excellent, & ~if~ 90 < WQI \le 100 \end{cases} $$
(7)
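A minimal sketch of mapping a computed WQI value to the class labels of equations (6) and (7) is given below; WQI values falling between the integer bounds of equation (6) are assigned to the next higher class in this sketch.

```python
def odisha_wqc(wqi):
    """Map a WQI value to the Odisha classes of equation (6)."""
    if wqi <= 25:
        return "Excellent"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    return "Bad"

def vietnam_wqc(wqi):
    """Map a WQI value to the Vietnam classes of equation (7)."""
    if wqi <= 25:
        return "Poor"
    if wqi <= 50:
        return "Fair"
    if wqi <= 70:
        return "Medium"
    if wqi <= 90:
        return "Good"
    return "Excellent"

print(odisha_wqc(42), vietnam_wqc(42))   # Good Fair
```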

The proposed machine learning-based framework is depicted in the next section using the above description.

Proposed flow diagram

To solve the water quality analysis and drinkability problem, which is formulated as a multi-class classification problem as mentioned above, the schematic flow diagram of applying machine learning models is shown in Fig. 1.

Fig. 1
figure 1

Proposed flow diagram for water quality analysis and prediction

It broadly goes through three phases: (i) selection of the geographical region for which the groundwater quality needs to be analyzed, followed by collection of the water quality parameter values; (ii) Exploratory Data Analysis (EDA) and pre-processing of the data samples. The EDA process mainly has two sub-steps: (a) correlation analysis and (b) data cleansing & outlier detection. The correlation analysis determines the dependencies between the various parameters and with the WQI value. Subsequently, in the data cleansing & outlier detection step, non-contributing parameters are removed and abnormal data values are discarded as outliers for better prediction. (iii) In the last phase, the data values are normalized for uniform scaling and then fed to the applied ML models. The models are then evaluated using standard performance metrics. This evaluation helps decide the best model for predicting the water quality of the selected geographical region.

Proposed strategy

Following the framework shown in Fig. 1, this section describes the systematic process of predicting water quality with the specified drinkability classes using some AI-learning models.

Region selection and dataset collection

This research uses datasets from Odisha, an eastern state of India, and from the Northern Delta and North Central regions of Vietnam. The hydro-geological map of the state of Odisha is shown in Fig. 2. To monitor the groundwater level and chemical components, the CGWB (Central Ground Water Board), Bhubaneswar, has set up 1600 NHNS (National Hydrograph Network Stations) as open/dug wells and piezometers, as shown in Fig. 3. To observe changes in the chemical components of groundwater, water samples are collected once a year from these NHNS and analysed in the regional water laboratory. This data is studied and reported in the Ground Water Year Book (Ground Water Year Book 2018); the data used here is extracted from the published report of 2018-19 and converted to a CSV file for AI-based analysis. The Odisha dataset includes 13 parameters. The Vietnam dataset includes 12 parameters and is extracted from the national centre for water resources planning and investigation, federation for planning and investigation of water resources in the North.

Fig. 2
figure 2

Hydro-geological map of Odisha

Fig. 3
figure 3

National Hydrograph Network Stations in Odisha

Exploratory data analysis

Exploratory Data Analysis (EDA) is a prerequisite process carried out to investigate the data for pattern discovery, outlier detection and hypothesis testing, and to create summary statistics and represent them graphically. The statistical description of the water sample dataset collected from Odisha is given in Tables 3, 4 and 5, and that of the water samples collected from Vietnam in Tables 6, 7 and 8 below.

Table 3 Statistical description of groundwater dataset of Odisha
Table 4 Statistical description of the groundwater dataset of Odisha (continue)
Table 5 Statistical description of the groundwater dataset of Odisha (continue)
Table 6 Statistical description of the groundwater dataset of Vietnam
Table 7 Statistical description of the groundwater dataset of Vietnam (continue)
Table 8 Statistical description of the groundwater dataset of Vietnam (continue)

The total number of samples collected is 1241 from all 30 districts of Odisha, while 2138 samples were collected in Vietnam. Tables 3 to 8 show some standard statistical metrics for the sample datasets, namely the total number of samples (count), the mean value of each parameter, the standard deviation (std), the minimum (Min.) and maximum (Max.) parameter values, and a percentile-based description at the 25th, 50th and 75th percentiles. This helps in getting an early insight into the nature of the samples. For example, it is observed from Table 3 that the 50th percentile of the Odisha data has a pH value less than or equal to 7.9, i.e., half of the samples are neutral to mildly alkaline. Similarly, Table 6 shows that the 50th percentile of the Vietnamese data has a pH value less than or equal to 7.0, i.e., at or below neutral.

Similarly, it is observed from Table 3 that the 50th percentile of the samples has an EC less than or equal to 550. It is also inferred that the 50th and 75th percentiles of the samples have a Carbonate value of zero, so Carbonate is considered a non-contributing parameter for water quality analysis and is excluded. The summary of all metrics is therefore presented for the thirteen parameters, excluding Carbonate as the non-contributing parameter. Further, since most of the Carbonate values are zero, any non-zero value found in the sample dataset is treated as an outlier.
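A minimal sketch of how such summary statistics can be produced with pandas is given below; the parameter columns and values are synthetic placeholders, not the actual survey data.

```python
import pandas as pd

# Tiny synthetic frame standing in for the survey data; in practice the CSV
# extracted from the Ground Water Year Book would be loaded with pd.read_csv.
df = pd.DataFrame({
    "pH":  [7.2, 7.9, 8.1, 6.9],
    "EC":  [410, 550, 980, 300],
    "TDS": [260, 350, 630, 190],
    "CO3": [0, 0, 0, 12],
})

print(df.describe())   # count, mean, std, min, 25%, 50%, 75%, max per parameter

# Drop a non-contributing parameter such as Carbonate (mostly zero),
# treating its rare non-zero readings as outliers.
if "CO3" in df.columns and (df["CO3"] == 0).mean() >= 0.75:
    df = df.drop(columns=["CO3"])
```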

Data cleansing and outlier detection in the Odisha and Vietnam datasets

The abnormal or outlier data points are removed by following a boxplot analysis, with the maximum threshold set as per the drinkability standards of the Government of India and of the Vietnamese authorities. The boxplots in Fig. 4 show that some Fluoride and pH readings exceed the general margin by a large amount; such naturally occurring extremes can cause serious health problems, and a similar analysis was done for other parameters as well. In contrast, Fig. 5 shows that the outlier data points in the Vietnam dataset are very few and not far from the threshold values.

Fig. 4
figure 4

Outlier Detection in the groundwater dataset Odisha

Fig. 5
figure 5

Outlier Detection in the groundwater dataset Vietnam
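A minimal sketch of the threshold-capped boxplot (IQR) filtering described above is given below, assuming pandas; the data values and maximum limits in this example are hypothetical, and the actual thresholds follow Tables 1 and 2.

```python
import pandas as pd

# Tiny synthetic frame standing in for the survey data.
df = pd.DataFrame({"F":   [0.4, 0.9, 0.7, 6.2],
                   "pH":  [7.2, 7.9, 7.5, 11.0],
                   "TDS": [310, 540, 420, 4200]})

def remove_outliers(frame, max_limits, iqr_factor=1.5):
    """Drop rows lying outside the boxplot whiskers or above the permissible limit."""
    keep = pd.Series(True, index=frame.index)
    for col, limit in max_limits.items():
        q1, q3 = frame[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        upper = min(q3 + iqr_factor * iqr, limit)   # whisker capped by the drinkability limit
        lower = q1 - iqr_factor * iqr
        keep &= frame[col].between(lower, upper)
    return frame[keep]

# Hypothetical maximum limits used only for illustration.
clean_df = remove_outliers(df, {"F": 1.5, "pH": 8.5, "TDS": 2000})
print(clean_df)
```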

Correlation analysis in Odisha and Vietnam Dataset

Correlation analysis identifies chemical parameters that depend on each other, which is needed to predict the presence of a chemical component if data is missing or changes suddenly. Tables 9 and 10 display the correlation analysis parameters used for Odisha and Vietnam. We use the Pearson correlation. The degree of association and the type of relationship (positive or negative correlation) help predict the presence or absence of interdependent chemicals. Figures 6 and 7 show the correlation analysis results.

Table 9 Observations from correlation analysis of different water parameters (Odisha)
Table 10 Observations from correlation analysis of different water parameters (Vietnam)
Fig. 6
figure 6

Odisha correlation analysis of WQI with all other water parameters

Fig. 7
figure 7

Vietnam correlation analysis of WQI with all other water parameters

The following observations are made from the analysis of the Odisha dataset. (i) Calcium and Magnesium are correlated with TH (Total Hardness), since TH is essentially a measure of the dissolved calcium and magnesium salts (such as their chlorides); both Ca and Mg are also needed by the body to remain healthy.

(ii) Sodium and Magnesium are correlated with Chloride, which suggests that in the water Sodium and Magnesium are generally present as Sodium Chloride and Magnesium Chloride; both salts are essential requirements for the body.

The following observations were made on the Vietnam dataset. There were strong correlations (17 in total) between pairs of feature variables. After the bi-variate analysis, one feature from each strongly correlated pair was dropped.

(i) Sodium and Magnesium are correlated with Chloride, which again suggests that in the water they are generally present as Sodium Chloride and Magnesium Chloride; both salts are essential requirements for the body.

It is observed from Fig. 8 that all parameters are positively correlated with WQI, and Potassium shows the highest correlation with WQI.
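A minimal sketch of the Pearson correlation analysis with pandas is given below; the column values are synthetic placeholders, and the 0.9 threshold for flagging strongly correlated feature pairs is an assumption for illustration.

```python
import pandas as pd

# Synthetic stand-in for the parameter table with the computed WQI appended.
df = pd.DataFrame({
    "Ca":  [40, 75, 120, 60, 90],
    "Mg":  [12, 30, 55, 25, 40],
    "TH":  [150, 310, 520, 260, 400],
    "K":   [2, 8, 15, 6, 11],
    "WQI": [22, 48, 90, 41, 70],
})

corr = df.corr(method="pearson")                     # full Pearson correlation matrix
print(corr["WQI"].sort_values(ascending=False))      # each parameter's correlation with WQI

# Strongly correlated feature pairs (|r| above a chosen threshold) are
# candidates for dropping one of the two features.
strong_pairs = (corr.abs() > 0.9) & (corr.abs() < 1.0)
print(strong_pairs)
```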

Fig. 8
figure 8

Loss vs Accuracy for the Vietnam dataset

Data pre-processing

It is essential to check data quality before applying AI-learning algorithms. A normalization technique is applied to bring the values of the numeric columns onto a common scale without losing information or distorting the differences in the ranges of values.

  • Min-max scaler: It is applied to scale each feature value into [0, 1]; in the presence of large outliers, it can compress all the inliers into a narrow sub-range such as [0, 0.005]. The Standard Scaler, on the other hand, does not guarantee balanced feature scales when outliers are present, because the outliers influence the empirical mean and standard deviation, which shrinks the range of the remaining feature values.

$$ X_{sc}=\frac {X-X_{min}}{ X_{max}-X_{min}} $$
(8)

To deal with this issue, feature-wise normalization using Min-Max scaling, as given in equation (8), is used before model fitting.
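A minimal sketch using scikit-learn's MinMaxScaler, which implements equation (8), is shown below; the toy values are placeholders, and the scaler is fitted on the training split only and then reused on the test split.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[7.2, 310.0], [7.9, 540.0], [8.3, 1200.0]])   # toy values
X_test = np.array([[7.5, 800.0]])

scaler = MinMaxScaler()                      # implements equation (8)
X_train_sc = scaler.fit_transform(X_train)   # learn per-feature min/max on the training split
X_test_sc = scaler.transform(X_test)         # reuse the same min/max for the test split
print(X_train_sc, X_test_sc, sep="\n")
```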

Using machine learning models and performance evaluation

This section briefly describes the background of some state-of-the-art machine learning models which are applied for predicting water-quality classes based on drinkability. The models are then evaluated by using standard performance metrics.

Model hyperparameters and model training. Once the data are preprocessed, we define the classifier for a given algorithm, such as XGBoost, and fit it to the training data to build the model. The model is trained on the training split of the data and, once training is complete, its performance is evaluated on the test split. Hyperparameters specific to each algorithm are used to build the model; in most cases, these hyperparameters are kept at their default values.

The models are then evaluated using standard performance metrics; the hyperparameters of a few of the models are listed in Table 11.

Table 11 Hyperparameters of a few models
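A minimal sketch of this fit-and-evaluate workflow is given below, assuming the scikit-learn and xgboost APIs and synthetic stand-in data; hyperparameters are left at their defaults, as stated above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from xgboost import XGBClassifier

# Synthetic stand-in: 13 scaled features and 4 WQC labels (0..3), as in the Odisha setting.
X, y = make_classification(n_samples=500, n_features=13, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=42)

model = XGBClassifier()          # default hyperparameters, as used for most models here
model.fit(X_train, y_train)      # train on the training split

y_pred = model.predict(X_test)   # evaluate on the held-out test split
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```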

Models background

Logistic regression It is a classification algorithm based on supervised learning. It uses the sigmoid (logit) function: the data are fitted to a logit function, which maps a line onto a curve bounded between 0 and 1, with the line acting as an asymptote of the sigmoid curve. The model is mostly used for binary classification problems; to deal with the multiple classes in our problem, a variant known as multinomial logistic regression is used, as in (Theyazn et al. 2020). The most important application of LR is to estimate the probability of the occurrence of an event, given information about predictors that may influence the outcome (Hosmer and Lemeshow 1989; George and Meshack 2019). Logistic regression models are distinguished from ordinary linear regression models, as a class of generalized linear models, by the range of their predicted values, the assumption on the variance of the predicted response, and the distribution of their prediction errors.

K-Nearest Neighbors K-Nearest Neighbors (K-NN) is a supervised, non-parametric classifier (Hmoud Al-Adhaileh and Waselallah Alsaade 2021). It uses the notion of data point proximity, computed through a distance measure, to group similar data points. Although it can be used for both classification and regression, it is mostly used as a classifier. In our case, Euclidean distance is used for determining the closeness of data points, and a new point is classified using the majority vote of its K nearest neighbours. The important and challenging step in the K-NN method is determining the optimal K value; in our case, it is decided by plotting the error against candidate K values, as shown in Figs. 18 and 19, and the selected values of K are 2 and 3, as they give the minimum error. Classification is a crucial task in data science and machine learning, and K-NN is one of the oldest and most accurate algorithms for pattern classification and regression: it is easy to understand and implement, suitable for non-linear data because it makes no assumptions about the underlying distribution, and it naturally handles multi-class cases and performs well given sufficient representative data.
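A minimal sketch of this error-versus-K search is given below, assuming scikit-learn and synthetic stand-in data; the curves in Figs. 18 and 19 correspond to this kind of loop.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled water quality features and class labels.
X, y = make_classification(n_samples=500, n_features=13, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

errors = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean").fit(X_tr, y_tr)
    errors.append(np.mean(knn.predict(X_te) != y_te))   # misclassification rate for this K

best_k = int(np.argmin(errors)) + 1                      # K with the minimum error
print(best_k, errors[best_k - 1])
```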

Decision Tree It is a hierarchical classification and regression model in machine learning. Given an instance from the samples, it traverses the tree, comparing the important features against pre-determined branching conditions. The most important feature is selected as the root, and subsequent levels are generated by splitting on other features (Tiyasha et al. 2021). Decision trees are used to solve classification problems by classifying instances according to their learned characteristics; they can also be used for regression, i.e., to forecast continuous outcomes from unseen data.

Support Vector Machine (SVM) The SVM is widely used as a classifier and is most effective in high-dimensional spaces. Different variants of SVM are available: linear SVM, polynomial SVM, RBF SVM, and sigmoid SVM. All these variants are applied to our problem, and the performance of the best variant is presented. In machine learning, the significance of SVMs has been demonstrated by their ability to handle classification and regression on both linear and non-linear data (Arabgol et al. 2015). Nur Najwa Mohd et al. (2022) adopted an SVM to predict the concentration and distribution of nitrate in groundwater.

AdaBoost AdaBoost, also known as adaptive boosting, is an ensemble technique used in machine learning. Its adaptiveness lies in the reassignment of weights to each instance, with larger weights given to incorrectly classified instances; boosting is used to reduce bias. AdaBoost has the benefits of being quick, easy to operate and simple to program (Tu et al. 2017): except for the number of iterations, no parameter tuning is necessary, and without prior knowledge of the weak learner it can be combined flexibly with any method to seek a weak hypothesis. Given sufficient data and a weak learner of only moderate accuracy, it provides theoretical learning guarantees. AdaBoost is also less susceptible to overfitting because the input parameters are not jointly optimized, and it can enhance the accuracy of weak classifiers (Alaa 2018).

XGBoost XGBoost stands for Extreme Gradient Boosting. It is a tree-based ensemble model that enhances the gradient boosting framework with approximation algorithms and provides a parallel tree-boosting framework applicable to classification, regression and ranking problems. XGBoost offers a few technical advantages over other gradient boosting approaches (Ramraj et al. 2016), including a more direct route to the minimum error, faster convergence in fewer steps, and simplified calculations that improve speed and lower compute costs. In XGBoost, individual trees are built using multiple cores, and the data is organized to minimize lookup times; this decreases the training time of the models and thereby increases performance.

Convolutional neural network (CNN) CNNs have convolutional, pooling, and fully-connected layers; local connections, shared weights, pooling and the use of many layers are their fundamental building blocks. The outputs are obtained from the fully-connected layers, and backpropagation trains the filter weights (O’Shea and Nas 2015). The Adam optimizer is used to optimize network performance. We utilized ReLU for the first and second layers and Softmax for the third; ReLU is a nonlinear function that passes positive inputs through directly and returns zero otherwise. CNN is the most widely used DL model in computer vision, image processing, speech recognition and natural language processing, and has been used for anomaly detection in drinking water with a BiLSTM ensemble technique (Chen et al. 2018). A significant advantage of CNN models is that they do not require human supervision to identify essential features; they are very accurate at image recognition and classification, and weight sharing is another significant advantage.
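The text above gives only a high-level description of the network (ReLU in the first two layers, Softmax output, Adam optimizer). The sketch below is one possible 1-D CNN for the tabular water quality inputs under those assumptions, written with the Keras API; it is not the authors' exact architecture, and the layer sizes and toy data are illustrative.

```python
import numpy as np
import tensorflow as tf

n_features, n_classes = 12, 5          # Vietnam setting; values assumed for illustration

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features, 1)),                  # each sample treated as a 1-D signal
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(n_classes, activation="softmax"), # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy data only; the real inputs are the scaled water quality parameters.
X = np.random.rand(256, n_features, 1).astype("float32")
y = np.random.randint(0, n_classes, size=256)
history = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)
```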

Performance metrics

The standard metrics used for evaluating the models are briefly presented below.

- Accuracy: It is measured as the number of correct predictions made by the model over the total number of observations. The corresponding equation is given in (9), where TP stands for true positive, TN for true negative, FP for false positive, and FN for false negative.

$$ Accuracy =\frac {TP+TN} {TP+TN+FP+FN} $$
(9)

- Precision defines the ratio of correctly classified instances of a given class to the total classified instances of that particular class. It is calculated as given in equation (10).

$$ Precision =\frac {TP}{TP+FP} $$
(10)

- Recall is estimated as given in equation (11),

$$ Recall =\frac {TP}{TP+FN} $$
(11)

Further, precision and recall alone cannot reflect all aspects of a model's performance. Thus, their harmonic mean, the F1-score, is also computed as shown in equation (12). Its value lies between 0 and 1, and a higher F1-score reflects better accuracy.

$$ F1\text{-}score = 2\times \frac {Precision \times Recall}{Precision + Recall} $$
(12)

Finally, a confusion matrix is also computed; it is an N × N matrix for evaluating the performance of a classification model, where N is the number of target classes (4 for the Odisha dataset and 5 for the Vietnam dataset in our case). The matrix compares the actual target values with the values predicted by the model and hence gives a comprehensive view of the model’s performance.
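A minimal sketch of computing the metrics of equations (9)-(12) and the confusion matrix with scikit-learn is given below; the label vectors are toy placeholders, and class-averaged (macro) values are shown.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [0, 1, 2, 3, 1, 2, 0, 3]        # toy labels for the four Odisha classes
y_pred = [0, 1, 2, 2, 1, 2, 0, 3]

print(accuracy_score(y_true, y_pred))                       # equation (9)
print(precision_score(y_true, y_pred, average="macro"))     # equation (10), class-averaged
print(recall_score(y_true, y_pred, average="macro"))        # equation (11), class-averaged
print(f1_score(y_true, y_pred, average="macro"))            # equation (12), class-averaged
print(confusion_matrix(y_true, y_pred))                     # 4 x 4 matrix of actual vs predicted
```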

Performance of CNN

The loss and accuracy as a function of the number of epochs for the Vietnam training and test sets are plotted to see how the network has performed. The average accuracy achieved across classes is 99%.
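A minimal sketch of producing such loss and accuracy curves from a Keras-style training history is given below; the values in the example dictionary are illustrative placeholders, not the reported results.

```python
import matplotlib.pyplot as plt

# `hist` stands in for history.history returned by Keras model.fit; values are placeholders.
hist = {"loss": [1.2, 0.8, 0.5, 0.35, 0.3],
        "val_loss": [1.1, 0.9, 0.6, 0.5, 0.45],
        "accuracy": [0.55, 0.72, 0.84, 0.90, 0.93],
        "val_accuracy": [0.50, 0.68, 0.80, 0.86, 0.90]}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(hist["loss"], label="train loss")
ax1.plot(hist["val_loss"], label="validation loss")
ax1.set_xlabel("epoch"); ax1.set_ylabel("loss"); ax1.legend()
ax2.plot(hist["accuracy"], label="train accuracy")
ax2.plot(hist["val_accuracy"], label="validation accuracy")
ax2.set_xlabel("epoch"); ax2.set_ylabel("accuracy"); ax2.legend()
plt.tight_layout()
plt.show()
```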

Results summary & discussion

The performance of the applied models is summarized with respect to average accuracy in Figs. 9 and 10, which show the average accuracy of classifying the samples into all classes. On the Odisha and Vietnam datasets, XGBoost performs best, with average accuracies of 92.67% and 98%, followed by polynomial SVM with 90.3% and 97%. The average accuracy of the decision tree is 89.89% and 96%. Logistic regression and k-NN achieve average accuracies of 70.51% and 75.09%, respectively, on the Odisha dataset. Poor performance is observed for AdaBoost, with an average accuracy of 54.45% on the Odisha dataset. Using CNN on the Vietnamese dataset, the average accuracy across classes is 97%, while XGBoost performs best with an average accuracy of 98.13%, followed by polynomial SVM with 97.66%. The decision tree’s average accuracy is 96.89%, and logistic regression and k-NN achieve 96.6% and 97.19%, respectively. In contrast to its poor performance on the Odisha dataset (54.45%), AdaBoost performs better on the Vietnamese dataset, with an average accuracy of 96.96%.

Fig. 9
figure 9

Average accuracy of prediction of water quality classes for all models

Fig. 10
figure 10

Vietnam Average accuracy of prediction of water quality classes for all models

Figures 11 and 12 show the water quality class-wise precision comparison of all applied models. It is observed that, for the considered datasets, XGBoost classifies the Excellent and Good classes with the highest precision, with average precisions of 0.9225 and 0.9813. Similar performance is observed for polynomial SVM and decision tree across all classes, with average precisions of 0.9175 and 0.8975, respectively. The average precision for AdaBoost is 0.6375, the lowest among all compared models on the Odisha dataset. The other performance metrics, including precision, F1-score and recall, are compared class-wise.

Fig. 11
figure 11

Precision comparison of different models for Water Quality Classes

Fig. 12
figure 12

Vietnam Precision comparison of different models for Water Quality Classes

The comparison of performance among the six models by precision is shown in Figs. 11 and 12 below.

The comparison of performance among six models by F1-Score is given in Figs. 13 and 14.

Fig. 13
figure 13

F1-Score comparison of different models for Water Quality Classes

Fig. 14
figure 14

F1-Score Vietnam comparison of different models for Water Quality Classes

The Recall values obtained by applying these models are compared in Fig. 15.

Fig. 15
figure 15

Recall comparison of different models for Water Quality Classes Odisha

In Fig. 13, a similar process is followed as in Fig. 11, but it presents the class-wise F1-score comparison among the six models. Averaging the F1-score over all classes, XGBoost achieves average F1-scores of 0.9175 and 0.9938, followed by 0.9025 and 0.9926 for polynomial SVM and 0.8900 and 0.9889 for the decision tree. The average F1-score for AdaBoost is the lowest among all applied models, with a value of 0.4950 on the Odisha dataset, while 0.9877 is obtained on the Vietnam data. Figure 15 shows a recall comparison between the applied models for all classes. Similar observations are made on the average recall value: XGBoost performs best with average recall values of 0.9200 and 0.9975, whereas AdaBoost performs poorly, with average recall values of 0.4650 on the Odisha data and 0.9901 on the Vietnam data. The average values of all performance metrics are shown in Tables 12 and 13; it is observed that XGBoost and polynomial SVM show the best performance.

Table 12 Comparison of the average value of performance metrics of all models Odisha
Table 13 Comparison of the average value of performance metrics of all models Vietnam

Further, the confusion matrices of some selected models, namely XGBoost, RBF SVM, polynomial SVM and decision tree, are presented in Figs. 16 and 17 to give a holistic view of model performance. The results show that the XGBoost and polynomial SVM models accurately classify the water quality classes, with accuracies of 92% and 90%, respectively, on the Odisha data, and 98% and 97%, respectively, on the Vietnam data.

Fig. 16
figure 16

Confusion matrices of some selected models for water quality classes of Odisha

Fig. 17
figure 17

Confusion matrices of some selected models for water quality classes of Vietnam dataset

Performance of logistic regression

Logistic regression achieved an accuracy of 70% on the Odisha data and 96% on the Vietnam data, with the other performance metrics shown in Table 14. The lower accuracy may be caused by weaker correlation between the parameters, which makes it harder for logistic regression to separate the classes.

Table 14 Performance metrics of Logistic regression model

Performance of K-Nearest Neighbor

The supervised K-NN model is also applied for the prediction of the water quality classes, as shown in Table 15. To find the best value of K, we plot the error against the K value and select the K at which the error is minimum. For the Odisha dataset, K = 2 and 3 are found to be the best values, resulting in an accuracy of 75%; the other performance metrics are shown in Table 15. The same procedure is followed for the Vietnamese dataset, where the error rate is found to be minimum at K = 10, resulting in an accuracy of 97%.

Table 15 Performance metrics of K-NN model
Fig. 18
figure 18

Mean error vs. K-value

Fig. 19
figure 19

Mean error vs. K-value

The dependency of the error on the value of K is shown in Figs. 18 and 19.

Performance of support vector machine and its variants

The following variants of SVM are applied, and the results of the best-performing variant are summarized.

Variants of SVM

Linear SVM

This variant is applicable to linearly separable problems. The linear SVM resulted in an accuracy of 75.5% on the Odisha dataset and 97% on the Vietnam dataset.

Polynomial SVM

Polynomial kernel: higher-order polynomial features are added to analyze the data. This mechanism is implemented through the SVC class, and the accuracy obtained is 90.3% on the Odisha dataset and 97% on the Vietnam dataset. Table 16 shows the metrics obtained by the polynomial SVM model for Odisha and Vietnam.

Table 16 Performance metrics of Polynomial SVM

RBF SVM

The RBF kernel function is used in this case with two hyper-parameters: (i) gamma and (ii) C (the regularization parameter). A lower C value is set at the cost of training accuracy, yielding 77.9% on the Odisha dataset and 97% on the Vietnam dataset.

Sigmoid SVM

The sigmoid kernel function is used, giving an accuracy of 27.3%.

Among the four variants of SVM, the polynomial SVM shows the maximum accuracy: 90.3% on the Odisha dataset and 97% on the Vietnam dataset. Table 16 gives the results of the other performance measurements.
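A minimal sketch comparing the four SVC kernels discussed above is given below, assuming scikit-learn and synthetic stand-in data; C, gamma and degree are left at their defaults, so the accuracies it prints are not the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the scaled water quality features and class labels.
X, y = make_classification(n_samples=500, n_features=13, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = make_pipeline(MinMaxScaler(), SVC(kernel=kernel))   # default C, gamma, degree
    clf.fit(X_tr, y_tr)
    print(kernel, round(clf.score(X_te, y_te), 3))            # test accuracy per kernel
```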

Performance of Decision Tree

The decision tree model gives an accuracy of 89% on the Odisha dataset and 96% on the Vietnam dataset; the other performance metrics are shown in Table 17.

Table 17 Performance metrics of Decision Tree

Performance of AdaBoost

The Adaptive Boosting algorithm (AdaBoost) is an ensemble learning method. The accuracy obtained is 54% on the Odisha dataset and 96% on the Vietnam dataset, with the other performance metrics shown in Table 18.

Table 18 Performance metrics of AdaBoost

Performance of XGBoost

XGBoost is a tree-based ensemble machine learning algorithm. The accuracy obtained is 92% on the Odisha dataset and 98% on the Vietnam dataset. Table 19 shows the other performance metrics obtained by applying this model.

Table 19 Performance metrics of XGBoost

Conclusion

Groundwater quality monitoring is an important prerequisite for water management. In this article, a case study on the state of Odisha, India and on Vietnam’s Northern Delta is conducted to predict the water quality for drinking purposes. As a first step, exploratory data analysis is used to analyze the water quality datasets, to remove the non-contributing parameter (CO3), to detect outliers and to find the correlations between the different water quality parameters. The WQI is then computed, and a set of representative supervised AI-learning algorithms is used to predict the water quality class. The water metrics used in this study include pH, EC, Total Phenol, Hardness (permanent), TDS, Turbidity, Chloride, Magnesium, Sodium, Alkalinity, etc. For the classification task, we used different models such as logistic regression, KNN, CNN, AdaBoost, XGBoost, SVM and its variants, and decision tree, where XGBoost and polynomial SVM worked well, with XGBoost achieving accuracies of 92% and 98% on the Odisha and Vietnam datasets, respectively. The average accuracy across classes for CNN on the Vietnam data is 99%.

The present work uses 13 water quality parameters from Odisha and 12 from Vietnam; these could be further reduced so that fewer parameters are used to predict the drinkability class without compromising accuracy. The scope of the work can also be extended by validating the performance of the machine learning models on similar datasets collected from other countries. Further, the WQI ranges can be fuzzified, and a fuzzy inference system can be developed for predicting the water classes.