1 Introduction

Water is an indispensable resource for all life forms on earth, yet approximately 780 million people worldwide lack access to improved water sources (Pal et al. 2018; Schmidt 2014). Global water security is threatened by ecosystem modifications and the discharge of pollutants into rivers (Karthe et al. 2015). Among the various water sources, rivers play a pivotal role in human life, serving domestic, industrial, and agricultural purposes and contributing to transportation, tourism, and recreation (Sengupta et al. 2022). Approximately 187 million individuals depend on surface water for drinking, and this reliance is escalating due to population growth, rapid urbanization, and industrial expansion (Maniam et al. 2022). The increasing worldwide demand for surface water places immense pressure on natural resources and ecosystems. Urbanization alters land use patterns, adversely affecting water quality and diminishing freshwater availability (de Mello et al. 2020; Luo et al. 2020).

Additionally, water quality deterioration in rivers is linked with industrialization, which often releases untreated or partially treated effluents into waterbodies (Sarker et al. 2021). These pollution and encroachment issues jeopardize sustainable access to clean and safe water for the future (Connor 2015; Tzanakakis et al. 2020). Developing countries, in particular, face challenges in providing safe water, as both ground and surface water sources become contaminated due to inadequate wastewater treatment and drainage facilities (Corcoran 2010; Kookana et al. 2020).

Dhaka, a megacity in Bangladesh, is experiencing rapid population growth, with an annual growth rate of around 4.5%, making it one of the most densely populated regions in the world (Mowla 2015; UN 2016; Nashwan et al. 2020). The growing population exerts immense pressure on resources and has resulted in many environmental issues, such as urban warming (Dewan et al. 2021; Dewan and Corner 2012), loss of ecosystem services (Abdullah et al. 2022), and severe water pollution (Chowdhury et al. 2014; Dewan et al. 2012). The Turag River, one of the four rivers surrounding the city, is a lifeline that is experiencing elevated levels of water pollution caused by industrial, residential, and agricultural waste, as well as the use of pesticides and fertilizers (Bhat et al. 2018; Pathak and Bhardwaj 2021). Industrial activities within the Dhaka metro area discharge hazardous pollutants, including pesticides and heavy metals, further exacerbating pollution in downstream sections of the Turag River (Rampley et al. 2020; Uddin and Jeong 2021). The deteriorating water quality of this river raises concerns about the ecosystem’s health and the sustainability of resources, necessitating urgent measures to address and mitigate the impact of water pollution.

Water quality index (WQI) offers a comprehensive and standardized approach to assessing the river’s overall water quality, considering various physicochemical parameters and heavy metals (Iwar et al. 2021; Rahman et al. 2020). Calculating WQI with different mathematical models and machine learning, including deep learning algorithms, offers a versatile approach to evaluating water quality (Omeka et al. 2023; Paul et al. 2022; Sultana and Dewan 2021; Uddin et al. 2023). These diverse methods enable researchers and environmental practitioners to leverage the strengths of each model to obtain accurate, efficient, and adaptive WQI predictions. Using WQI, decision-makers and policymakers can make informed decisions about water resource management and pollution control measures, ensuring sustainable water use in the face of increasing demands. The WQI also enables early detection of water quality degradation, supporting timely interventions and preventing severe environmental damage. Furthermore, it aids in raising public awareness about the health of the river, encouraging community engagement in water conservation efforts (Cheshmehzangi et al. 2019; Sanneh 2018). It is thus a valuable tool for protecting ecosystems and public health, ensuring the wellbeing of millions relying on the river as a vital water source (Almeida et al. 2012; Iqbal et al. 2019; Singh et al. 2022).

Investigating the water quality of the Turag River is of paramount importance for Dhaka and its surrounding areas due to its critical role as a water source for drinking, irrigation, industry, and domestic use. The river’s susceptibility to pollutants from industries, sewage, agriculture, and urbanization necessitates a comprehensive understanding of its pollution sources and their effects on aquatic ecosystems and downstream regions. Beyond being a microcosm of broader urban pollution challenges, the investigation of the Turag guides effective pollution control measures and urban planning. Additionally, the river plays a significant role in navigation, facilitating the movement of goods and people. Ensuring its water quality is essential not only for human health and aquatic life but also for maintaining the river’s navigability. Pollution can lead to sedimentation, reduced water flow, and accumulation of debris, hindering navigation and the economic activities that rely on it (Gani et al. 2023; Thushari and Senevirathna 2020). By understanding the river’s pollution sources and impacts holistically, informed decisions can be made to implement comprehensive interventions that ensure its sustainable use and benefit the environment, communities, and economy.

The study of water quality in the Turag River has far-reaching implications encompassing both local ecology and public health. The river’s pivotal role as a receptor for pollutants originating from various sources underscores the urgency of the problem. The implications extend beyond ecological concerns, as the river’s water is extensively used for drinking, irrigation, and industrial processes, directly impacting public health. Contaminants entering the river can accumulate in the food chain, posing risks to aquatic life and, subsequently, to humans who depend on its waters. Additionally, the current assessment techniques for water quality often lack the depth and accuracy needed to comprehensively address the complex interplay between pollutants and their effects. We address the following research questions: (1) What is the current water quality status of the Turag River? (2) How do various physicochemical parameters and pollutants contribute to the overall water quality of the river? (3) How does the water quality of the river vary spatially and temporally along its course? To address these research questions, we aimed to assess and portray the water quality of the Turag River using a combination of WQI, machine learning algorithms (MLA), and geographic information system (GIS) techniques. The proposed methodology presents a promising solution to bridge the abovementioned gaps. By merging advanced techniques, this methodology offers a nuanced understanding of pollution dynamics, quantifying their impacts on both the river’s ecology and the health of the communities relying on it. This work could, in turn, facilitate informed decision-making for pollution control, environmental management, and sustainable utilization of the river’s resources, ensuring the preservation of local ecosystems and protection of public health.

The selection of specific physicochemical parameters and heavy metals for this study underwent a meticulous process to ensure a comprehensive evaluation of water quality dynamics. The chosen physicochemical parameters encompassed temperature, pH, dissolved oxygen (DO), biochemical oxygen demand (BOD), total dissolved solids (TDS), turbidity, total hardness, and transparency. Each of these parameters was selected based on its role in reflecting different facets of water quality. Temperature, pH, and DO are fundamental indicators of aquatic ecosystem health, providing insights into the water's thermal stability, acidity, and oxygen content. BOD represents organic pollution levels, while TDS provides information about mineral content. Turbidity, total hardness, and transparency contribute to the assessment of water clarity and potential pollutants. Regarding heavy metals, a strategic selection was made to focus on sodium (Na), manganese (Mn), chromium (Cr), arsenic (As), calcium (Ca), lead (Pb), zinc (Zn), copper (Cu), and iron (Fe). These metals were chosen considering their wide-ranging sources, both natural and anthropogenic, and their recognized impacts on water quality and ecological health. Lead (Pb) and arsenic (As) were included because of their significant toxicological implications and relevance in pollution assessment. Throughout the selection process, careful consideration was given to parameters that could collectively capture the varying dimensions of water quality, ranging from basic physicochemical attributes to heavy metal contamination.

The contribution of this work to science and society lies in its comprehensive assessment of the Turag River’s water quality using advanced WQI, MLA, and GIS methodologies. It will provide updated data, reveal pollution patterns, identify sources of pollutants, and raise awareness to inform effective water resource management and pollution control measures, benefiting both the environment and communities reliant on the river.

2 Materials and Methods

2.1 Study Area

The research was conducted in the Turag River, located in Dhaka, Bangladesh (23.53° N to 23.54° N and 90.21° E to 90.23° E, Fig. 1). This river is a crucial waterbody that flows through a densely populated region, serving as a vital water source for various purposes, including domestic, agricultural, and industrial use (Rahman et al. 2021a; Yasmin et al. 2023; Yin et al. 2021). Moreover, its pivotal role in sustaining local livelihoods and economic activities highlights its importance. Ecologically, it supports diverse aquatic species and ecosystems, making its preservation crucial for maintaining environmental balance. However, the river’s exposure to pollution raises concerns about water quality and ecosystem health. Given its interconnectedness with downstream locations, the Turag River’s condition has regional implications, emphasizing the need for comprehensive assessments and effective pollution control measures. Geographically, the Turag basin encompasses varying landscapes and climatic conditions. The topography ranges from flat plains to gently undulating terrain. The study area is characterized by a subtropical monsoon climate, featuring distinct wet and dry seasons. The monsoon rains significantly influence the river’s flow and water quality dynamics. The Turag River was designated as an ecologically critical river by the Department of Environment (DoE) of Bangladesh in 2009 due to its substantial pollution burden (Shawon et al. 2021; Whitehead et al. 2018). This official recognition highlighted the pressing need to assess and address the water quality challenges the river faced. The study area’s selection was thus guided by a commitment to addressing significant environmental concerns, contributing to informed decision-making, and devising effective strategies for water quality management.

Fig. 1 Location of the Turag River and water quality sampling sites

2.2 Data Collection

A water sampling campaign was conducted along the course of the Turag River, during which water samples were collected from thirty sites at various locations. Physicochemical parameters, including temperature, pH, DO, BOD, TDS, turbidity, total hardness, and transparency (Fig. 2), and heavy metals, including Na, Mn, Cr, As, Ca, Pb, Zn, Cu, and Fe (Fig. 3), were derived from these thirty sample sites. The physicochemical measurements comprised temperature, pH, DO to evaluate aquatic life support, BOD as a measure of organic matter decomposition, TDS as a measure of overall dissolved substance concentration, turbidity for water clarity, total hardness reflecting calcium and magnesium content, and transparency as an indicator of visual clarity. In parallel, heavy metals such as sodium, manganese, chromium, arsenic, calcium, lead, zinc, copper, and iron were analyzed to understand their presence and potential impacts on water quality and aquatic ecosystems. This dataset forms the basis for a holistic evaluation of the Turag River’s water quality and for identifying possible pollution sources and mitigation strategies. Incorporating both physicochemical parameters and heavy metal analysis accounts for the various factors that can affect the water’s usability, safety, and ecological balance.

Fig. 2 Maps of physicochemical parameters in the study area

Fig. 3 Heavy metals maps

2.3 Sampling Process

The assessment of water quality involved a systematic and comprehensive sampling strategy. Physicochemical parameters were diligently collected on a monthly basis throughout a one-year period (2022). Each station was subject to three distinct sampling points: discharge point (DP), contamination point (CP), and midpoint (MP), resulting in a total of 1080 samples collected for the evaluation of physicochemical parameters. This approach was aimed at capturing potential variations across different points within the same station and over different months, thus providing a holistic understanding of water quality dynamics.

In parallel, heavy metal analysis was carried out with similar rigor. Sampling was conducted twice during each of the three seasons—pre-monsoon (mid-March to mid-June), monsoon (mid-June to mid-October), and post-monsoon (mid-October to mid-March)—resulting in a total of 180 samples collected for laboratory testing. These samples were meticulously tested for heavy metal content following standardized procedures at the Bangladesh Agricultural Research Institute (BARI, http://www.bari.gov.bd/).
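The reported sample totals can be checked against the sampling design. The check below assumes the 30 stations described in Sect. 2.2, three sampling points per station for the monthly campaign, and a single sample per station on each heavy-metal sampling occasion; the last assumption is inferred from the reported total rather than stated explicitly.

$$N_{\text{physicochemical}} = 30 \times 3 \times 12 = 1080, \qquad N_{\text{heavy metals}} = 30 \times 2 \times 3 = 180$$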

The sampling process involved using clean, 500 ml plastic bottles rigorously cleaned with distilled water before use (Armah et al. 2012; Shekoohiyan and Akbarzadeh 2022; Wood 1976). Grab sampling was employed to obtain water samples, which were promptly sealed and appropriately labeled for accurate identification (Cuffney et al. 1993; Lyons et al. 2015). The pH levels were measured using a pocket-type pH meter, while TDS (total dissolved solids) was assessed using a dedicated TDS meter (Nasir et al. 2019; Talbert 2007). DO (dissolved oxygen) and BOD (biochemical oxygen demand) values were determined through titration methods, ensuring precise measurements (Fathima et al. 2014). Temperature readings were recorded with thermometers, and transparency was evaluated using the Secchi disk depth (Mustapha and Omotoso 2005). Additionally, total hardness values were ascertained by employing EDTA titration, and turbidity levels were measured with a turbidity meter (Ehiagbonare and Ogunrinde 2010; Princela et al. 2021). In the laboratory, the estimation of heavy metals was conducted using both instrumental and manual methods, following the standard procedures for reliable results (Akinyele and Shokunbi 2015; McLaughlin et al. 2000; Nemati et al. 2011).

2.4 Methods

Typically, WQI is developed using a standardized methodology considering various water quality parameters. The study followed a systematic approach in assessing water quality using integrated methods (Fig. 4). First, data were gathered from 30 randomly selected stations and categorized into physicochemical parameters and heavy metals. The second step involved calculating WQI using a weighted arithmetic index method for both types of parameters (see Sect. 2.4.1). Subsequently, a geodatabase was prepared by converting the data into spatial form with inverse distance weighted (IDW) interpolation (Figs. 2, 3). This provided input parameters at the spatial resolution necessary for machine learning model development. In the fourth step, a supervised machine learning model was constructed, using the calculated WQI values as outputs. In the fifth step, the dataset was divided into training and validation sets (50% each). We chose this split for several reasons. First, a 50–50 split provides a balanced evaluation of the machine learning model, allowing a fair assessment of its performance on unseen data. Moreover, limitations in the available dataset may make it challenging to allocate larger proportions for training or testing. Additionally, a balanced split can help address issues related to data imbalance, if present. We prioritized equal representation to ensure the robustness of the model’s generalization capabilities. The sixth step focused on training the machine learning model and predicting WQI values. In the seventh step, the model’s performance was validated using multiple statistical indices. Finally, a comparative study between the proposed models provided essential insights for effective water quality management.
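The IDW interpolation used in the third step can be illustrated with a minimal sketch. The study does not report the distance exponent, search radius, or GIS software settings, so the function name `idw`, the default `power=2.0`, and the station coordinates and values below are all illustrative assumptions rather than the configuration actually used.

```python
import math


def idw(x, y, stations, power=2.0):
    """Inverse distance weighted estimate at location (x, y).

    stations: list of (x_i, y_i, value_i) tuples in planar coordinates.
    power: distance exponent (2.0 is a common default, assumed here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, vi in stations:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            # The query point coincides with a station: return its value.
            return vi
        w = 1.0 / d ** power
        weighted_sum += w * vi
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical stations (arbitrary planar units; values could be BOD in mg/L).
stations = [(0.0, 0.0, 8.0), (2.0, 0.0, 12.0), (0.0, 2.0, 16.0)]
print(idw(1.0, 0.0, stations))
```

Nearby stations dominate the estimate because weights fall off with distance, and the interpolated value always lies between the smallest and largest station values.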

Fig. 4 Flowchart for WQI prediction

2.4.1 Water Quality Index

WQI is a highly effective method and serves as a standard for devising water management strategies. To calculate WQI, eight physicochemical parameters and nine heavy metals were selected. The permissible limits for these parameters were based on the WHO standards for drinking water, and in cases where WHO values were unavailable, the Bangladesh drinking water standards provided by the Department of Environment (DoE) of Bangladesh were used (Table 1). WQI was determined using a weighted arithmetic index method (Bouslah et al. 2017; Teshome 2020), with a sub-index (Qn) computed for each parameter (Zotou et al. 2020). This approach ensures a clear and robust water quality assessment, facilitating better decision-making in water resource management. WQI was calculated in the following steps: (1) gather data on the physicochemical parameters and heavy metals that characterize water quality; (2) compute the proportionality constant “K” using Eq. (1), where “Sn” represents the acceptable standard for the nth parameter; (3) compute the quality rating (Qn) for the nth parameter using Eq. (2); (4) determine the unit weight (Wn) for the nth parameter using Eq. (3); and (5) compute the WQI using Eq. (4).

$$K=\frac{1}{\sum (\frac{1}{{S}_{n}})}$$
(1)

where K is the proportionality constant that normalizes the unit weights of the water quality parameters in the WQI calculation (Dunnette 1979), and Sn is the acceptable standard value of the nth parameter (Rajkumar et al. 2022).

$${Q}_{n}=\left(\frac{{V}_{n}-{V}_{io}}{{S}_{n}-{V}_{io}}\right)\times 100\%$$
(2)

where Qn is the quality rating of the nth parameter, expressing how far its measured value is from the ideal value relative to the standard (Uddin et al. 2021). Vn is the measured value of the nth parameter at the sampling station, and Vio is the ideal value of that parameter in pure water.

$${W}_{n}=\frac{K}{{S}_{n}}$$
(3)

where Wn is the unit weight of the nth parameter (Uddin et al. 2021). It is inversely proportional to the standard value Sn, so parameters with stricter permissible limits carry more weight in the overall WQI calculation.

$$WQI=\frac{\sum ({W}_{n}{Q}_{n})}{\sum {W}_{n}}$$
(4)

where WQI indicates the water quality index of physicochemical parameters and heavy metals.
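Equations (1) to (4) can be combined into a short computation. The sketch below is illustrative only: the function name `weighted_arithmetic_wqi`, the parameter names, and the measured and standard values are hypothetical, and the default ideal value of 0 for parameters without a supplied Vio is an assumption, not a value taken from Table 1.

```python
def weighted_arithmetic_wqi(measured, standards, ideals=None):
    """Weighted arithmetic WQI following Eqs. (1)-(4).

    measured:  {parameter: V_n}, measured values
    standards: {parameter: S_n}, acceptable standard values
    ideals:    {parameter: V_io}; parameters not listed default to 0 (assumption)
    """
    ideals = ideals or {}
    # Eq. (1): proportionality constant K
    k = 1.0 / sum(1.0 / standards[p] for p in measured)
    weighted_sum = 0.0
    weight_total = 0.0
    for p, v_n in measured.items():
        s_n = standards[p]
        v_io = ideals.get(p, 0.0)
        q_n = (v_n - v_io) / (s_n - v_io) * 100.0  # Eq. (2): quality rating
        w_n = k / s_n                              # Eq. (3): unit weight
        weighted_sum += w_n * q_n
        weight_total += w_n
    return weighted_sum / weight_total             # Eq. (4)


# Hypothetical measurements and standards (mg/L), for illustration only.
measured = {"BOD": 10.0, "TDS": 500.0}
standards = {"BOD": 5.0, "TDS": 1000.0}
print(weighted_arithmetic_wqi(measured, standards))
```

Because Wn = K/Sn, the parameter with the stricter standard (here the hypothetical BOD limit) dominates the index, and a sample in which every parameter sits exactly at its standard yields a WQI of 100.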

Table 1 Descriptive statistics of physicochemical parameters and heavy metals

2.4.2 Machine Learning Algorithm

Artificial neural network (ANN) is a computational model inspired by the human brain’s neural network structure (Kasabov 2019; Shanmuganathan 2016). It consists of interconnected nodes, called neurons, organized in layers. ANN has been widely used for WQI calculation because it can process large datasets and capture complex relationships between input variables and WQI output (Hameed et al. 2017). This study used three optimizers commonly employed in training ANNs: Levenberg–Marquardt, Bayesian Optimization, and Scaled Conjugate Gradient.

(1) Levenberg–Marquardt (LM) optimizer: The LM algorithm is an iterative method for nonlinear optimization problems (Lourakis 2005; Yanis et al. 2023). In the context of ANN training for WQI calculation, the LM optimizer adjusts the neural network’s weights and biases during training to minimize the difference between predicted WQI values and actual values from the dataset. The LM optimizer is efficient in cases where the cost function has multiple local minima and is especially useful for converging quickly toward the optimal solution (Ozyildirim and Kiran 2021).

(2) Bayesian Optimization (BO) optimizer: BO is a probabilistic global optimization technique for finding the optimal hyperparameters for training an ANN (Yanis et al. 2023). In the context of WQI calculation, the BO optimizer efficiently explores the hyperparameter space, such as learning rates, the number of hidden layers, and neuron counts, to identify the configuration that maximizes the ANN’s performance (Yanis et al. 2023). The advantage of BO is that it can handle noisy and non-convex objective functions, which are common in complex ANN models.

(3) Scaled Conjugate Gradient (SCG) optimizer: The SCG algorithm is an optimization technique that updates the neural network’s weights and biases during training based on the conjugate gradient method. It is well-suited for large-scale and highly nonlinear problems, making it suitable for ANN-based WQI calculation. The SCG optimizer converges efficiently to the optimal solution without requiring manually set learning rates, reducing the complexity of the training process (Costa et al. 2020).

The selection of specific optimization techniques for the ANN model was driven by a combination of their demonstrated effectiveness in similar studies and their suitability for addressing the complexities of water quality prediction. Each algorithm was chosen for its unique strengths and compatibility with the objectives of this study. These algorithms have been successfully applied in diverse fields, including hydrology, environmental monitoring, and water quality assessment, indicating their adaptability to complex and dynamic systems like water bodies. Moreover, their potential to provide accurate predictions aligned with the objectives of the study further solidified their suitability for the Turag River.

2.4.2.1 Data Preparation for the Machine Learning Algorithm

In this study, a dataset comprising physicochemical parameters and heavy metals served as input data, and the corresponding WQI calculated using a weighted arithmetic index method was the output. To build the model, 50% of the total dataset was used for training, while the remaining 50% was utilized for validation (Xu and Goodacre 2018).
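A 50/50 partition of this kind can be sketched as follows. The study does not state whether the split was random, stratified, or made at the station or raster-cell level, so the shuffled split, the fixed seed, the function name `split_half`, and the use of station identifiers as records are all assumptions made for illustration.

```python
import random


def split_half(records, seed=0):
    """Shuffle records reproducibly and split them 50/50 into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical record identifiers: one per sampling station.
station_ids = list(range(1, 31))
train, valid = split_half(station_ids)
print(len(train), len(valid))
```

Fixing the seed keeps the partition reproducible, so the three optimizers can be compared on identical training and validation sets.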

2.4.2.2 Performance Assessment of the Machine Learning Algorithm

Various statistical indices, including R² (coefficient of determination), RMSE (root mean squared error), MSE (mean squared error), and MAE (mean absolute error), together with the Taylor diagram, were employed to validate the model’s performance (Abdalla and Mohammed 2022). These indices collectively offer insights into accuracy, precision, and overall predictive capability. R² quantifies the proportion of variance explained by the model, RMSE and MSE measure error magnitude with larger errors penalized more heavily, and MAE gives the average absolute deviation. The Taylor diagram summarizes multiple metrics in a single visual overview.
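The four numerical indices can be computed directly from observed and predicted WQI values. The sketch below is a minimal illustration: the function name `regression_metrics` and the sample values are hypothetical, and R² is computed as one minus the ratio of residual to total sum of squares, which is an assumption because the study does not specify which definition of R² was used.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MSE, and MAE for paired observed and predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae}


# Hypothetical WQI values for illustration only.
observed = [60.0, 85.0, 110.0, 95.0]
predicted = [62.0, 83.0, 108.0, 97.0]
print(regression_metrics(observed, predicted))
```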

3 Results and Discussion

3.1 Onsite and Laboratory Experiments for Computing WQI

Table 1 presents minimum, maximum, and mean values of various physicochemical parameters measured in a waterbody. They include temperature (ranging from 26.45 to 31.81 °C), pH (ranging from 6.10 to 7.63), dissolved oxygen (DO) levels (ranging from 3.61 to 6.09 mg/L), and biochemical oxygen demand (BOD) values (ranging from 7.32 to 17.26 mg/L). Additionally, the table shows the total dissolved solids (TDS) ranging from 114.00 to 820.00 mg/L, turbidity from 1.39 to 9.95 mg/L, and total hardness from 10.18 to 100.75 mg/L. The transparency of the water, measured in inches, varies from 2.08 to 6.98 inches.

These values provide essential information about the water quality of the studied river. The temperature, pH, DO, BOD, TDS, turbidity, total hardness, and transparency are critical parameters used to assess water quality and its suitability for various purposes. The heavy metals and their corresponding concentrations include sodium (Na), with values ranging from 19.54 to 62.33 mg/L; manganese (Mn), ranging from 0.00 to 0.88 mg/L; chromium (Cr), ranging from 0.00 to 0.01 mg/L; ranging from 5.09 to 31.47 mg/L; arsenic (As), ranging from 0.00 to 0.02 mg/L; calcium (Ca), ranging from 20.09 to 71.69 mg/L; lead (Pb), ranging from 0.00 to 0.22 mg/L; zinc (Zn), ranging from 0.00 to 0.70 mg/L; copper (Cu), ranging from 0.00 to 0.85 mg/L; and iron (Fe), ranging from 0.00 to 0.32 mg/L (Table 1).

The observed values in water quality parameters suggest a dynamic interplay of natural and anthropogenic factors. Temperature fluctuations may be due to seasonal variations or thermal pollution, while pH levels could reflect agricultural runoff or urban influences. Dissolved oxygen levels may be affected by organic matter decomposition and pollution sources, with elevated BOD values indicating organic pollution. Total dissolved solids might be influenced by geological, industrial, or agricultural factors, and turbidity could result from sediment runoff. Total hardness may relate to geological conditions while water transparency can be affected by suspended particles and pollutants. The variations in heavy metal concentrations may be linked to local industrial activities or the riverbed’s geological composition. Seasonal changes and weather patterns may contribute to fluctuations in physicochemical parameters like temperature and turbidity. Precise causative factors warrant further investigation to understand their implications for the Turag River’s ecological balance.

These results provide crucial insights into the heavy metal concentrations in the water body, which are essential for assessing water quality and potential environmental impacts (Li et al. 2023; Lv et al. 2022). The minimum and maximum values indicate the range within which the heavy metal concentrations vary, while the mean value represents the average concentration (Keshav Krishna and Rama Mohan 2016; Panghal et al. 2021). Researchers and policymakers can determine the potential risks of heavy metal pollution by comparing these concentrations to established water quality standards or guidelines (Egbueri et al. 2021; Li et al. 2021).

3.2 Water Quality Index by the Weighted Arithmetic Index

In this study, a weighted arithmetic index model was employed to calculate WQI values. For physicochemical parameters, WQI ranged from 56.31 to 117.53 with an average of 87.28 (Fig. 5a); for heavy metals, it ranged from 1.84 to 196.12 with an average of 76.06 (Fig. 5c). The water quality evaluation utilized the classification technique developed by Brown et al. (1972). Based on the WQI values, water quality status for physicochemical parameters was categorized into three types, “poor,” “very poor,” and “unfit for consumption” (Table 2), while WQI values for heavy metals were classified into five categories, “excellent,” “good,” “poor,” “very poor,” and “unfit for consumption” (Table 3). Notably, no stations were classified as “excellent” or “good” for physicochemical parameters, while nine stations (30%) fell under the “excellent” category and three stations (10%) under the “good” category for heavy metals. In addition, five stations (16.67%) for physicochemical parameters and 12 stations (40%) for heavy metals fell under the category of “unfit for consumption.” These findings shed light on the varying water quality conditions in the area and highlight the significance of continuous monitoring and appropriate management strategies to enhance and preserve the Turag River’s water quality.
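The mapping from a WQI value to a status class can be sketched as a simple threshold lookup. The class boundaries used below (25, 50, 75, and 100) are those commonly quoted for the weighted arithmetic WQI and are an assumption here, since the boundaries are not stated in this section; they are, however, consistent with the ranges and categories reported above. The function name `classify_wqi` is hypothetical.

```python
def classify_wqi(wqi):
    """Map a WQI value to a status class using assumed boundaries of 25, 50, 75, and 100."""
    if wqi <= 25:
        return "excellent"
    if wqi <= 50:
        return "good"
    if wqi <= 75:
        return "poor"
    if wqi <= 100:
        return "very poor"
    return "unfit for consumption"


# Minimum, mean, and maximum WQI reported for physicochemical parameters.
for value in (56.31, 87.28, 117.53):
    print(value, classify_wqi(value))
```

Under these assumed boundaries, the reported physicochemical range spans exactly the three classes listed in Table 2, while the heavy-metal range from 1.84 to 196.12 can reach all five classes.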

Fig. 5 Water quality index for physicochemical parameters (a, b) and heavy metals (c, d). a, c A weighted arithmetic index method; b, d machine learning algorithm

Table 2 Water quality index of the study area for physicochemical parameters, based on a weighted arithmetic index method
Table 3 Water quality index of the study area for heavy metals, based on a weighted arithmetic index method

The findings of this study indeed highlighted the significant impact of intensified human activities on the deterioration of water quality along the Turag River. Several human activities were strongly correlated with a decline in water quality, pointing to potential pollution sources that warrant attention. Industrial activities emerged as one of the major contributors. Industries, particularly those discharging untreated or poorly treated effluents, can introduce heavy metals, organic pollutants, and other contaminants into the river. Industries situated near the river's course, including manufacturing plants, chemical processing units, and textile factories, have the potential to significantly influence water quality negatively. Urban areas, characterized by dense populations and various anthropogenic activities, also showed a strong correlation with declining water quality. Urban runoff from roads, residential areas, and commercial spaces can carry pollutants like oil, heavy metals, litter, and nutrients into the river. Additionally, inadequate sewage treatment systems in some urban areas can lead to the discharge of untreated sewage and organic matter, further contributing to water quality degradation. Agricultural practices, including the use of fertilizers and pesticides, also played a role in influencing water quality. Runoff from agricultural fields can carry excess nutrients, sediment, and chemicals into the river, impacting water clarity, nutrient levels, and potentially promoting algal blooms.

3.3 ML-Based WQI

In this study, WQI calculation employed the ANN machine-learning algorithm with three optimizers: LM, BO, and SCG. These optimizers were utilized to train the ANN model and predict the WQI values. Among the three optimizers, the LM optimizer demonstrated superior performance compared to BO and SCG.

In summary, LM’s convergence speed, noise resilience, capacity to handle non-linearity, and compatibility with regression problems collectively contributed to its superior performance, making it the preferred optimizer for calculating WQI and improving water quality assessment and management strategies (Kadam et al. 2019; Yaseen et al. 2019). Consequently, the final water quality index was developed based on the output of the LM optimizer when used with the ANN algorithm. Selecting the most effective optimizer in this way supports accurate and reliable WQI predictions, which are crucial for water quality assessment and management decisions (Barzegar et al. 2020; Salem and Elwakil 2023). Figures 5b and 5d depict the machine learning output based on the LM optimizer. According to the spatial distribution of the resulting map, the WQI values ranged between 50.07 (low) and 115.65 (high) for physicochemical parameters. Furthermore, the spatial distribution of WQI for heavy metals showed that the WQI values ranged from -9.40 (low) to 186.59 (high) in the study area. The higher WQI observed at the location very close to the industrial zone can be explained by the potential impact of industrial activities on water quality. This finding aligns with earlier studies (Haghnazar et al. 2022; Ukah et al. 2020), which reported similar observations, indicating the influence of industrial zones on water quality degradation. Industries often release pollutants and waste into nearby water bodies, increasing contaminant loads and reducing water quality (Barasarathi et al. 2022; Kanu and Achi 2011; Rajaram and Das 2008). Proximity to the industrial zone likely exposes this location to a higher concentration of pollutants, resulting in a higher WQI value compared to other locations farther away from industrial influences (Medeiros et al. 2017; Srivastava et al. 2011; Wilbers et al. 2014).

3.4 Correlation Matrix of Physicochemical Parameter and Heavy Metals

According to the heat map representing the correlation matrix for physicochemical parameters, WQI is most strongly correlated with biochemical oxygen demand (BOD), with a high coefficient value of 0.91 (Fig. 6a). On the other hand, temperature has the weakest association with WQI, with a coefficient value of 0.11. Ordering the coefficient values from lowest to highest gives the following sequence: temperature (0.11) < dissolved oxygen (DO) (0.19) < turbidity (0.35) < pH (0.36) < total dissolved solids (TDS) (0.38) < total hardness (0.43) < transparency (0.57) < BOD (0.91). This analysis provided valuable insights into the relative significance of each physicochemical parameter in calculating WQI.

Fig. 6 Heat map of Pearson correlation

The strong positive correlation between BOD and the WQI can be attributed to the nature of BOD as a crucial indicator of water pollution (Subramaniam et al. 2023; Taher et al. 2021; Umwali et al. 2021): it represents the oxygen that microorganisms consume while decomposing organic matter in water. Higher BOD levels indicate the presence of significant organic pollution, which negatively impacts water quality and the overall ecosystem (Carr and Neary 2008; Kanu and Achi 2011; Shehab et al. 2021). As a result, higher BOD values lead to higher WQI scores, reflecting poorer water quality. On the other hand, the weak correlations observed with parameters like temperature (0.11) and DO (0.19) can be explained by their indirect effects on the aquatic environment. Warmer temperatures may accelerate the metabolic activity of organisms, including microorganisms responsible for decomposition. This increased metabolic activity can lead to a higher BOD and, consequently, a higher WQI. Similarly, low dissolved oxygen levels can indicate poor oxygen saturation in water, affecting the survival of aquatic organisms and contributing to poorer water quality (Meshesha et al. 2020; Pandit et al. 2023; Rahman et al. 2021b).

The correlation matrix for heavy metals revealed their respective influences on the construction of the WQI (Fig. 6b). Among the heavy metals, Pb exhibits the highest positive correlation with a coefficient value of 0.89, indicating a significant impact on the WQI. On the other hand, Cu has the lowest positive correlation, with a coefficient value of 0.39. Ordering the heavy metals by Pearson’s correlation coefficient from lowest to highest yields the following sequence: Cu (0.39) < Mn (0.64) < Zn (0.67) < Cr (0.68) < Na (0.78) < As (0.85) < Ca (0.87) < Pb (0.89). This analysis provided an overview of the relative impact of each heavy metal on the calculated water quality index, offering valuable insights for assessing and managing water quality concerning heavy metal contamination (Akhtar et al. 2021; Bhuiyan et al. 2016; Ouyang et al. 2018).
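
The ranking step described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `pearson` and `rank_by_correlation` and all sample readings are hypothetical, and only two parameters are shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(parameters, wqi):
    """Return (name, r) pairs ordered from weakest to strongest correlation."""
    scores = {name: pearson(values, wqi) for name, values in parameters.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical readings at five sampling points (illustrative values only).
wqi = [52.0, 61.0, 75.0, 90.0, 110.0]
parameters = {
    "BOD": [3.0, 4.1, 5.2, 6.8, 8.5],
    "temperature": [27.0, 29.5, 26.5, 28.0, 28.5],
}

ranking = rank_by_correlation(parameters, wqi)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

Sorting on the signed coefficient matches the lowest-to-highest sequences reported above; sorting on the absolute value instead would be the natural choice if any coefficient were negative.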

3.5 Validation

Table 4 presents the performance evaluation of three machine learning models for WQI calculation: LM-ANN, BO-ANN, and SCG-ANN. For the training dataset, the LM-ANN model exhibited the highest R² value (0.997), indicating a strong correlation between predicted and actual WQI. It also achieved the lowest RMSE (2.5), MSE (9.05), and MAE (1.52), indicating minimal prediction errors. BO-ANN performed well with an R² of 0.980 and relatively low error metrics, while SCG-ANN showed a lower R² of 0.920 and higher error values. This finding is consistent with previous studies (Deshpande et al. 2022; Hassan et al. 2021; Kadam et al. 2019), which reported similar observations.

Table 4 Performance of three machine learning models for WQI calculation

For the testing dataset, LM-ANN again performed best, with an R² of 0.999, showcasing the model’s strong predictive capabilities. It achieved a reasonably low RMSE (3.6), MSE (11.01), and MAE (1.78), confirming its accuracy. BO-ANN also demonstrated good performance with an R² of 0.989 and relatively low error metrics. SCG-ANN performed adequately but had a lower R² of 0.940 and higher error values than the other models.
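
The four metrics reported in Table 4 can be computed as in the sketch below. It assumes the usual definitions (R² as the coefficient of determination, RMSE as the square root of MSE); the study may have used the squared Pearson coefficient for R² instead. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return R2, RMSE, MSE and MAE for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n                             # mean squared error
    rmse = math.sqrt(mse)                        # root mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                   # coefficient of determination
    return {"R2": r2, "RMSE": rmse, "MSE": mse, "MAE": mae}


# Hypothetical actual and predicted WQI values (illustrative only).
actual = [50.0, 60.0, 80.0, 110.0]
predicted = [52.0, 59.0, 79.0, 112.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```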

Additionally, a Taylor diagram was constructed to further evaluate the performance of the different machine learning models for WQI calculation (Fig. 7). The diagram visualizes multiple statistical metrics at once, namely the correlation coefficient (r), RMSE, and standard deviation (SD), allowing for comprehensive model comparison (Asadollah et al. 2021; Kumar and Pratap 2023; Rezaie-Balf et al. 2020).

Fig. 7 Taylor diagrams representing the relationship between actual and predicted results for WQI using machine learning algorithms

The Taylor diagram analysis reaffirmed the excellent performance of the LM-ANN model. It exhibited higher “r” and lower RMSE values than the other models. Moreover, the standard deviation ratio indicated that the LM-ANN model’s predictions were more consistent and closer to the observed values than those of BO-ANN and SCG-ANN (Deshpande et al. 2022; Miah et al. 2020; Zounemat-Kermani 2012).
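
The three statistics that a Taylor diagram combines can be sketched as below. The assumption here is the standard construction, in which the plotted error is the centered RMSE (computed after removing each series' mean) and the standard deviations are population values; the study does not state which variant it used. The function name `taylor_statistics` and the sample values are hypothetical.

```python
import math


def taylor_statistics(observed, predicted):
    """Return the statistics summarised by a Taylor diagram."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    sd_o = math.sqrt(sum(d * d for d in dev_o) / n)   # population SD, observed
    sd_p = math.sqrt(sum(d * d for d in dev_p) / n)   # population SD, predicted
    r = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * sd_o * sd_p)
    # Centered RMSE: error between the two series after removing their means.
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return {"r": r, "sd_obs": sd_o, "sd_pred": sd_p,
            "sd_ratio": sd_p / sd_o, "crmse": crmse}


# Hypothetical observed and predicted WQI values (illustrative only).
observed = [50.0, 60.0, 80.0, 110.0]
predicted = [52.0, 59.0, 79.0, 112.0]
stats = taylor_statistics(observed, predicted)

# Law-of-cosines identity that lets one point encode all three statistics.
identity_rhs = (stats["sd_obs"] ** 2 + stats["sd_pred"] ** 2
                - 2.0 * stats["sd_obs"] * stats["sd_pred"] * stats["r"])
print(stats, identity_rhs)
```

A model whose point lies close to the reference point on the diagram has a standard deviation ratio near 1, a correlation near 1, and a centered RMSE near 0.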

3.6 Comparative Analysis

Both maps (developed from the weighted arithmetic index and the LM-ANN model) were classified into five categories: excellent (1), good (2), poor (3), very poor (4), and unfit for consumption (5) (Brown et al. 1972). The two-dimensional matrix multiplication method was then applied to calculate similarities and dissimilarities between the maps (Islam and Sado 2000; Rahman et al. 2021c). In assessing the physicochemical parameters, this study observed that 88.13% of the areas shared the same WQI class in both maps, as represented by the diagonal elements of the matrix (class no.: pixel values, 1:1395, 4:7002, and 9:362, Fig. 8). The remaining 11.87% of places showed dissimilarities, as noted in the non-diagonal values in the matrix. In addition, the minimum WQI for physicochemical parameters was above 50 in both analyses, so the observed water quality categories ranged from “poor” to “unfit for consumption.” The transition of water quality from “excellent” at the source to “unfit for consumption” downstream along the Turag River is a useful observation that sheds light on the varying levels of pollution and anthropogenic influences. This transition can be attributed to a combination of diverse pollution sources that progressively impact water quality as the river flows downstream. The potential sources of pollution contributing to this deterioration include industrial discharges, municipal sewage, agricultural runoff, and urban effluents. Industries release various pollutants, including heavy metals and organic compounds, into the river, which accumulate as the river flows downstream. Municipal sewage introduces organic matter and nutrients, contributing to elevated BOD and nutrient levels. Similarly, agricultural runoff can carry pesticides, fertilizers, and sediment, affecting water clarity and nutrient content. Urban effluents from populated areas can introduce a range of contaminants, including pathogens and pollutants from household activities.
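
The five-class scheme can be sketched as a simple threshold lookup. The class boundaries below (25, 50, 75, 100) are an assumption: they are the limits commonly used with the weighted arithmetic WQI and are not stated in this section, although they are consistent with a minimum WQI just above 50 falling in the “poor” class. The names `CLASS_LIMITS` and `classify_wqi` are hypothetical.

```python
# Assumed upper limits for the first four classes (not given in this section).
CLASS_LIMITS = [
    (25.0, 1, "excellent"),
    (50.0, 2, "good"),
    (75.0, 3, "poor"),
    (100.0, 4, "very poor"),
]


def classify_wqi(wqi):
    """Map a WQI value to a (class number, label) pair."""
    for upper_limit, class_number, label in CLASS_LIMITS:
        if wqi <= upper_limit:
            return class_number, label
    # Anything above the last limit falls in the worst class.
    return 5, "unfit for consumption"


# Extremes of the physicochemical WQI range reported for the LM-ANN map.
print(classify_wqi(50.07))
print(classify_wqi(115.65))
```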

Fig. 8 Comparison of WQI, developed using a weighted arithmetic index method with WQI output from the machine learning algorithm (e.g., LM-ANN)

Similarly, in the comparative evaluation of heavy metals, this study revealed that 83.65% of the areas had identical WQI classes in both maps, represented by the diagonal elements of the matrix (class no.: pixel values, 1:1071, 4:1376, 9:1447, 16:1016, and 25:3404, Fig. 8). Conversely, 16.35% of places exhibited dissimilarities, showing variations in WQI class across the two maps. This analysis provides valuable insights into the degree of agreement and discrepancies between the two maps concerning water quality assessments based on physicochemical parameters and heavy metals (Islam and Sado 2000). The differences between the models can be attributed to variations in data variability, model complexity, and selection of input features (Chia et al. 2022; Deshpande et al. 2022). Additionally, differences in optimization methods and the presence of outliers may also contribute to divergent predictions.
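
The agreement percentages can be illustrated with the sketch below. Two points are assumptions. First, it uses a plain cross-tabulation of class pairs rather than the authors' matrix multiplication method, because it keeps each class pair distinct. Second, the total of 9,939 pixels is inferred here, not reported: it is the value for which the reported diagonal counts reproduce both published percentages. The function names and the small sample maps are hypothetical.

```python
from collections import Counter


def agreement_table(map_a, map_b):
    """Count pixels for every (class in map A, class in map B) pair."""
    if len(map_a) != len(map_b):
        raise ValueError("both maps must contain the same number of pixels")
    return Counter(zip(map_a, map_b))


def percent_agreement(table):
    """Share of pixels (in percent) whose class is identical in both maps."""
    total = sum(table.values())
    same = sum(count for (a, b), count in table.items() if a == b)
    return 100.0 * same / total


# Tiny hypothetical maps, flattened to lists of class numbers.
index_map = [3, 3, 4, 5, 5, 4, 3, 5]
ann_map = [3, 3, 4, 5, 4, 4, 3, 5]
sample_agreement = percent_agreement(agreement_table(index_map, ann_map))

# Diagonal pixel counts as reported in the text.
physicochemical_diagonal = [1395, 7002, 362]
heavy_metal_diagonal = [1071, 1376, 1447, 1016, 3404]
ASSUMED_TOTAL_PIXELS = 9939  # inferred, not stated in the source

physicochemical_pct = 100.0 * sum(physicochemical_diagonal) / ASSUMED_TOTAL_PIXELS
heavy_metal_pct = 100.0 * sum(heavy_metal_diagonal) / ASSUMED_TOTAL_PIXELS
print(sample_agreement, round(physicochemical_pct, 2), round(heavy_metal_pct, 2))
```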

The high percentage of similarities (88.13% for physicochemical parameters and 83.65% for heavy metals) between the machine learning model’s WQI and the traditional index or empirical formula highlights its potential as a viable alternative for WQI calculation (Chia et al. 2022; Hassan et al. 2021). This suggests that the machine learning approach offers a reliable and consistent means of assessing water quality, demonstrating its capability to replace or complement traditional methods with improved accuracy and precision (Tung and Yaseen 2020; Yaseen et al. 2019). These findings support adopting machine learning models as practical tools for water quality assessment and decision-making in resource management.

3.7 Implications and Practical Applications

The findings of this work hold far-reaching implications across the spheres of environmental management, policy-making, and public health, intricately influenced by the integration of the weighted arithmetic index, machine learning algorithms, and GIS techniques. Incorporation of a GIS enabled us to integrate geospatial data, conduct spatial analysis, and visualize complex water quality patterns in a spatial context. This approach offered valuable insights that might not have been as evident through traditional analysis methods. The unique strength of GIS lies in its ability to uncover spatial trends and relationships that might be missed by analyzing data in isolation. We identified specific regions where pollution sources, such as industries or urban areas, were contributing to localized water quality degradation. Moreover, maps helped us understand the spatial variations in water quality parameters, revealing gradients of contamination and highlighting areas of concern that demanded targeted interventions.

In terms of environmental management, the study’s holistic assessment of water quality serves as a valuable guide for resource allocation and targeted interventions. The identification of pollution sources and their spatial distribution empowers environmental managers to prioritize mitigation efforts effectively. Additionally, the quantification of the impact of diverse parameters and heavy metals on overall water quality informs the design of customized strategies to combat pollution challenges.

In the domain of policy-making, the results of this study offer a data-rich foundation for crafting evidence-based regulations. The synergy of the weighted arithmetic index, machine learning predictions, and GIS visualizations allows policymakers to comprehend the intricate relationships among factors influencing water quality. This knowledge aids in the creation of contextually relevant standards and guidelines, facilitating the implementation of practical pollution control measures. Furthermore, the predictive capabilities of the machine learning model support proactive policy development, enabling preemptive measures against evolving water quality scenarios. Concurrently, the assessment’s ramifications extend to public health, where the direct implications of water quality on human well-being are profound. The evaluation’s capacity to quantify pollutants and potential health risks equips health authorities to provide timely advisories and implement protective measures. This proactive approach can mitigate the risk of waterborne illnesses, safeguarding communities dependent on these water resources. In essence, the integrated methodology not only enriches the understanding of water quality but also supports a shift toward sustainable water resource management, policy formulation, and public health safeguards.

Furthermore, these results are instrumental in raising public awareness about the importance of water quality and the associated health risks. An informed citizenry is more likely to advocate for stringent pollution control measures and hold responsible parties accountable for maintaining clean water resources. These insights also have implications for long-term planning, facilitating the implementation of monitoring programs and adaptive management strategies. Such efforts support the continual improvement of water quality over time and contribute to the overarching goal of preserving public health, ecological equilibrium, and the sustainable utilization of the Turag River’s waters.

3.8 Limitations of this Study

Despite the significance of the Turag River, the study has certain limitations. These include constraints on the temporal scope of data collection and the availability of historical data, which may limit the ability to capture long-term trends fully. Additionally, the study focuses on physicochemical parameters and heavy metals, while other potential pollutants may warrant future investigation.

In addition, the classification of water quality as “unfit for consumption” for both physicochemical parameters and heavy metals in certain areas raised significant concerns about potential health risks associated with exposure to polluted water. The presence of pollutants beyond permissible limits suggested that the water might contain contaminants that could have adverse effects on human health. Substances such as heavy metals, pathogens, and organic pollutants can pose risks through direct contact, ingestion, or even inhalation of contaminated water vapors. However, it is important to note that while the study identified areas with compromised water quality, a direct assessment of the public health impacts was not conducted as part of this research. Quantifying the precise health risks would require a comprehensive public health impact assessment, considering factors like the population exposed, the duration of exposure, and the specific health effects linked to the identified pollutants. For a more accurate evaluation of health risks, future studies could consider collaborating with public health experts to conduct health risk assessments. These assessments would involve quantifying exposure levels, understanding potential pathways of exposure, and analyzing health outcomes associated with specific pollutants. By quantifying health risks, policymakers and relevant authorities can make informed decisions regarding water quality management and develop appropriate measures to safeguard public health and wellbeing.

Moreover, the integration of hydrodynamic models with water quality models could offer insights into pollutant transport and fate under varying flow conditions. Exploring the socio-economic impacts of water quality degradation and assessing the effectiveness of implemented pollution control measures are essential for comprehensive decision-making. Further studies on emerging contaminants like pharmaceuticals and microplastics can address modern pollution challenges. Engaging local communities and stakeholders in collaborative research endeavors can also yield insights into traditional practices and community needs, fostering a more holistic approach to pollution management. By exploring these avenues, future research can deepen the understanding of the complexities surrounding water quality in the Turag River. These endeavors would contribute to more effective pollution management strategies, informed policy decisions, and a sustainable approach to preserving the health of the river ecosystem and the well-being of the communities dependent on it.

4 Conclusions

This study focused on developing and applying machine learning algorithms to calculate WQI in the context of physicochemical parameters and heavy metals using LM, BO, and SCG optimizers with ANN models for prediction. The models’ performance evaluation and comparative analysis revealed that the LM-ANN model consistently outperformed the other two algorithms, exhibiting high accuracy and predictive capabilities (R² of 0.997 for training and 0.999 for testing). The Taylor diagrams demonstrated the reliability of the LM-ANN model, reinforcing its suitability for water quality assessment. Moreover, the analysis of similarities and dissimilarities between the machine learning model and traditional methods indicated a high percentage of agreement (physicochemical parameters: 88.13% and heavy metals: 83.65%), affirming the machine learning model’s potential as an alternative and robust approach for WQI calculation. In addition, WQI is notably influenced by BOD, with a correlation coefficient of 0.91. Among the heavy metals studied, Pb emerges as the dominant factor, with a positive correlation coefficient of 0.89, signifying its pivotal role in the overall water quality assessment. Overall, the findings of this study contribute valuable insights to the field of water quality assessment and highlight the effectiveness of machine learning models in supporting water resource management and environmental decision-making processes.

Future studies could explore the integration of more advanced machine learning algorithms and comprehensive datasets with real-time sensor data to improve the accuracy and predictive capabilities of WQI calculations. Additionally, investigating the potential of explainable AI techniques and integrating machine learning models with emerging technologies, such as IoT and remote sensing, could lead to intelligent and autonomous water quality monitoring systems for more efficient water resource management.