Introduction

Life on Earth depends on water. The two main components of water resources are groundwater and surface water. In recent times, the pressure on groundwater resources (comprising boreholes and hand-dug wells) has increased because surface water bodies are easily contaminated and polluted (Egbueri and Agbasi 2022a). Nevertheless, many surface water resources are free of pollution and remain useful for meeting human needs. Groundwater is used to meet municipal, agricultural, and recreational needs in both urban and rural regions. It is also used for industrial and commercial purposes in many nations across the world. At present, groundwater is an alternative supply of drinking water in regions where surface water is scarce or extremely polluted (Wang et al. 2020; Omeka et al. 2022). However, there is considerable evidence of increasing groundwater pollution across the globe (Li and Wu 2019; Adimalla 2020; Agbasi and Egbueri 2022). The continuous pollution of groundwater, which had been regarded as a safer source of water, has become a global threat to homes, communities, industries, and the agricultural sector. In arid and semi-arid areas, groundwater is an essential supply of water for domestic, industrial, and agricultural purposes (Ukah et al. 2019; Wu et al. 2017; Li et al. 2017a). Around the world, groundwater is used to produce around 40% of food and about 30% of drinking water (Amiri et al. 2021a; Nickson et al. 2005). However, rapid economic and population growth raises the demand for freshwater and degrades water quality.

Numerous pollutants are introduced into the soil and water environments (which are in steady interaction), endangering both ecological stability and human health (Amiri et al. 2021b; Egbueri and Agbasi 2022b). Nonpoint sources of pollution, including septic tank leaks, household wastes, animal manure, and urban runoff, contaminate both groundwater and surface water resources (Karunanidhi et al. 2020a). When sources of groundwater recharge are contaminated, the contaminants seep through the earth into the groundwater, rendering the water unsafe for human use (Karunanidhi et al. 2020a). Increases in human activities, such as agricultural and industrial practices, as a result of growing populations, cause a series of changes to the environment. These human-induced changes cause severe water emergencies and gradually deteriorate water quality over time, which leads to a number of environmental problems as well as human health issues (Ruidas et al. 2022). Increases in water use and contamination are directly correlated with dense population and rapid, unplanned urbanization in urban areas. Due to overuse and rapid contamination of groundwater, the availability of groundwater that is safe for consumption has given rise to major concerns in many regions of the world (Subba Rao and Chaudhary 2019; Al-Abadi 2017). As a result, the safety of the groundwater supply depends on the quality of the groundwater (Wei et al. 2021). Access to sufficient and safe water is currently a challenge for many nations. Global climate change, the growth of the human population, and declining precipitation contribute to an ongoing rise in groundwater demand. Therefore, one of the most crucial steps required to maintain a safe water supply and protect the public health of consumers is the quantitative and qualitative management of natural water resources (Amiri et al. 2021b; Egbueri 2020). Sustainable management of water resources is primarily based on comprehensive water quality research.

Undoubtedly, the management and conservation of water resources can be accelerated and improved with the help of water quality studies. For various reasons, water evaluation and management in Nigeria and other regions of the world have been investigated by numerous authors. Information regarding water types, classification, and suitability for various uses, including irrigation, drinking, bathing, and industry, can be found in hydrogeochemical data (Snousy et al. 2021; Egbueri et al. 2021). The geochemical processes that occur in aquifer materials affect the chemistry of groundwater. When groundwater flows from recharge to discharge locations, its physical and chemical properties are altered, making the water unfit for human use (Farid et al. 2017; Ali et al. 2018). Surface processes, on the other hand, impact surface water bodies, which are the ready recharge systems of groundwater aquifers. Additionally, hydrogeochemical analysis reveals the water’s chemistry in relation to source, climate, geologic, and anthropogenic inputs (Karmakar et al. 2021; Omeka et al. 2022). The weathering of rock-forming minerals and secondary factors like unrestricted water use, a lack of water recharge, wastewater discharge, industrial and agricultural activities, excessive fertilizer use, significant evaporation, and little rainfall have recently put both surface and underground water resources in danger (Egbueri et al. 2019; Karmakar et al. 2021; Ruidas et al. 2022). For determining the hydrogeochemical mechanisms and assessing water quality, the measurement of water physicochemical parameters, such as temperature, pH, main ions, and electrical conductivity, is crucial. In recent years, aside from the hydrogeochemical assessment and modeling of water resources, there has also been a lot of interest in researching the effects of water quality on human health (Li and Wu 2019; Snousy et al. 2021; Egbueri and Mgbenu 2020; Ayejoto et al. 2022; Agbasi and Egbueri 2022).

Access to safe drinking water is a recognized human right (regardless of wealth, age, tribe, and race) and promotes public health. The World Health Organization (WHO) reports that water demand and extraction from various sources have grown dramatically in recent decades, impacting water quality and raising health risks. Due to the toxicity and carcinogenic character of the pollutants, a large number of people are exposed to a range of health concerns. It has been reported that pollutants in water are responsible for 70% of diseases and 20% of cancer cases worldwide (Ruidas et al. 2022). While many trace elements necessary for human health are present in groundwater, excessive consumption of these elements can have negative consequences for the body, including reproductive issues, neurological disorders, breathing issues, lung illness, cancer, and hypertension (Ukah et al. 2019; Egbueri 2020). For instance, nitrate (NO3) is a crucial compound for plant growth; however, excessive amounts of it in water can result in methemoglobinemia (Wongsanit et al. 2015; Egbueri 2020). The health of several human populations is often put at risk by the lack of effective testing and mitigation strategies for drinking water supplies. However, accurate water testing and human health risk evaluation can be very helpful in identifying problematic locations. Human health risk assessment is a key tool that can link a number of hydrogeochemical characteristics to human health and pinpoint an area’s potential health risk zones. To evaluate the health hazards posed by pollution, hazard quotient, hazard index, and total hazard quotient values are frequently utilized (Baig et al. 2011; Egbueri 2020; Ukah et al. 2019; Wei et al. 2021; Ayejoto et al. 2022).

Several other methods have been used by researchers to study the contamination and management of natural water resources. Many scientists classify water by its degree of contamination using a variety of indexing techniques. Numerous researchers have used diverse strategies up to this point (such as water quality indices, multivariate statistical techniques, and data-driven artificial intelligence techniques) (Islam et al. 2017; Wu et al. 2014, 2020; Liu and Han 2020; Botheju et al. 2021; Egbueri and Agbasi 2022a, b, c). The use of data-driven artificial intelligence (which is emerging as a capable and promising multi-functional approach to all research fields) is relatively new in water research, and the interpolation method is still utilized in water quality evaluation. Due to their improved performance over conventional statistical analysis, various data-driven artificial intelligence and machine learning techniques (including regression models, artificial neural networks, support vector machines, Monte Carlo simulation, etc.) have been used widely during the past three decades to evaluate the quality of water resources (Onjia et al. 2022; Ozel et al. 2020; Egbueri and Agbasi 2022a, b, c). Due to their high precision in predictive modeling, many artificial intelligence and machine learning algorithms are widely applicable. These algorithms have been rated as reliable, economical alternative approaches for water quality monitoring and assessment (Egbueri 2021; Ozel et al. 2020).

In Nigeria, several investigations on water quality have revealed that surface water and groundwater sources have been polluted and deemed unsuitable for human use (Ighalo et al. 2020; Ighalo and Adeniyi 2020). However, the application of data-driven artificial intelligence and machine learning algorithms in water quality studies across the country is very limited. The available research studies have primarily concentrated on investigating the hydrogeochemistry and health impacts of potentially toxic elements/metals on humans and ecosystems. Overall, the studies that unraveled hydrogeochemical processes and evolution are far more numerous than those that calculated the risks posed to human health by contaminated water. Although several studies have assessed water quality in Nigeria, few have addressed human health risk based on the physical and ionic properties of water. NO3-contaminated water resources can pose a serious threat to people through two exposure routes: the first is through drinking water or other oral channels, and the second is through cutaneous (dermal) contact (Adimalla 2020; Adimalla and Qian 2019; Rahman et al. 2021). Several studies across the globe have calculated the health risks of NO3-contaminated water (Adimalla and Qian 2021; Karunanidhi et al. 2019; Chen et al. 2017a, b). However, only a few studies have done so in Nigeria (Unigwe et al. 2022). On a global scale, studies that have investigated the health risks of sulfate (SO4) and total hardness (TH) of water resources are scarce, despite the fact that these physicochemical parameters are often tested by water quality researchers during hydrogeochemical studies. Generally, in Nigeria and many other parts of the world, there is a dearth of water quality studies that address the human health risks due to NO3, SO4, and TH.

Therefore, the aim of this comprehensive research is to identify the key contamination mechanisms of NO3, SO4, and TH in water and establish their connections with human health risks by developing several investigative graphical, conceptual, and computational models for the Ojoto Region (in southeastern Nigeria). In this study, health hazard appraisal models were used to assess the toxicity of these contaminants based on the pathways through which the water is used and the rate at which each contaminant is ingested by various population groups. The specific objectives are to (1) combine descriptive and computational approaches in assessing the toxicity and human health risks of NO3, SO4, and TH; (2) explore the efficacy of graphical methods, Varimax-rotated factor analysis, and the R-mode hierarchical clustering algorithm for unraveling their contamination mechanisms; (3) develop a conceptual diagrammatic model that explains the relationships between geochemical processes, anthropogenic perturbations, water quality, and human health risks; and (4) develop computational predictive multiple linear regression (MLR) and radial basis function neural network (RBF-NN) models for estimating NO3, SO4, and TH in the water resources. Most previous studies did not compute hazard index values for the different exposure pathways as done in the present paper. This study also introduces a cumulative hazard index for holistic nitrate risk assessment. Moreover, this research is the first of its kind to combine MLR and RBF-NN in the prediction of NO3, SO4, and TH. Researchers and policymakers would be able to draw on the baseline information provided in this paper. This study would also serve as a theoretical foundation for the prevention and management of unsafe, contaminated water resources by local decision-makers.

Materials and methods

Study area description

The present research was conducted in Ojoto and its environs within Anambra State in the southeastern part of Nigeria (Fig. 1). Onitsha, a major industrial and commercial hub in Nigeria, lies a few kilometers from the study area. Agricultural lands, commercial and industrial zones, and human settlements are common land use patterns in the region. Parts of the study area are now home to businessmen and industrialists from Onitsha city, accelerating the area’s population expansion. Among the rising sectors in this region are those related to food, water, automobile parts, and petrochemicals. Water is typically distributed in the research region using a variety of plumbing materials, including steel, iron, polyvinyl chloride, and plastic, in homes, industries, and commercial venues (including markets). The research area has two main seasons: the wet season (which usually spans April to October) and the dry season (which usually spans November to March). In the dry season, temperatures can reach 32 °C or more (Egbueri et al. 2019). In rainy seasons, annual rainfall is between 1500 and 2000 mm, although the driest months in this season might see little precipitation.

Fig. 1 Maps showing the study area location and the dominant underlying geologic formations

The research region is underlain by two lithostratigraphic units (Fig. 1): the Nanka Formation (Eocene in age) and the Ogwashi Formation (Oligocene–Miocene in age) (Nwajide 2013). Friable sands, light-colored mudrocks, sandy shale, thin limestone beds, and ironstones are the dominant lithologies within the Nanka Formation (Nwajide 2013). Over 80% of the research region is covered by the Nanka Formation, which is a regressive deposit made up of the Abakaliki Anticlinorium’s folding-related parts of the 2nd stage of the Campanian-Eocene compressive tectonic motion (Nwajide 2013; Nwachukwu 1972). The Ogwashi Formation was deposited as a result of the 3rd stage of the tectonic motion, which is usually thought to have started at the end of the Eocene. The Ogwashi Formation is mostly made up of bands of limestone, ironstones, and lignite seams, as well as coarse-textured, weakly to moderately cemented sandstone units (Nwajide 2013). Additionally, it has been determined that these two geological formations host productive aquiferous units with different depths to the aquifer systems (Akpoborie et al. 2011; Okoro et al. 2010a, b; Nfor et al. 2007). The typical depth to the Nanka Sandstone aquifer has been estimated to be over 20 m in some areas of Anambra State (Nfor et al. 2007), whereas that of the Ogwashi Formation is estimated to be around 7 m (Okoro et al. 2010a, b). However, there is a dearth of literature providing more current estimates of the aquifer depths across the study area. Future hydrogeologic studies may need to address this gap.

Water sampling, physicochemical testing, and data processing

To meet the study’s objectives, twenty-eight water samples were randomly collected from springs, hand-dug wells, streams, and boreholes situated within the study area. The coordinates of each sample location were captured using a portable Global Positioning System (GPS) device, and the spatial distribution of the sampling stations is shown in Fig. 1. The water samples were collected in 1-L plastic bottles and preserved prior to laboratory testing. At every station, the sample was collected a few minutes after pumping from the well had started, to ensure the freshness of the samples. A sufficient quantity of water was collected from each station. The standard operating procedures of the American Public Health Association (APHA 2005) were adhered to during collection, sample preservation, and analysis in order to guarantee data quality and consistency. Before laboratory testing, the sample bottles were pre-washed with distilled water and treated with weak hydrochloric acid. Acidification with 1 ml of concentrated HNO3 was used to avoid cationic precipitation. Prior to physicochemical investigation, all of the collected samples were kept under refrigeration. A 0.45-µm Millipore cellulose acetate filter was used to filter the water samples before the analysis. The sample preservation methods employed were designed to stop any unintended chemical reactions or biological growth in the water before the physicochemical investigation.

Several variables were tested in the water samples for this investigation: pH, temperature (T), electrical conductivity (EC), K+, Na+, Ca2+, Mg2+, Cl, HCO3, TH, SO42−, and NO3. A handheld analysis kit (HM Digital COM-100) was used to assess the pH and EC at the sampling locations. The TH was tested using the ethylenediaminetetraacetic acid (EDTA) titrimetric method. NO3 concentration was tested using a spectrophotometric technique. The HCO3 was measured using H2SO4-based water titration, whereas the Cl was measured using an AgNO3-based titration. The concentrations of SO42− were determined using the turbidity technique with BaCl2. The Ca2+ and Mg2+ were estimated using volumetric techniques (0.05N and 0.01N EDTA), whereas the K+ and Na+ contents were measured using a Systronics Flame Photometer. The laboratory analysis also followed the standards and guidelines recommended by APHA (2005) for the analysis of each parameter. The validity of the physicochemical data was checked using the ion balance error (IBE; Eq. 1). IBE values ≤ 5% were obtained.

$$\mathrm{IBE}=\frac{\sum \mathrm{cations} - \sum \mathrm{anions}}{\sum \mathrm{cations} + \sum \mathrm{anions}} \times 100$$
(1)
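
As an illustration of Eq. 1, the ion balance check can be sketched in a few lines of Python. This is a minimal sketch, not the procedure used in the study: the function name, the assumption that all concentrations are expressed in meq/L, and the sample values are hypothetical.

```python
def ion_balance_error(cations_meq, anions_meq):
    """Return the ion balance error in percent, following Eq. 1.

    Both arguments are iterables of ion concentrations, assumed here
    to be expressed in meq/L so that charges can be summed directly.
    """
    total_cations = sum(cations_meq)
    total_anions = sum(anions_meq)
    return (total_cations - total_anions) / (total_cations + total_anions) * 100.0


# Hypothetical sample (meq/L); these are not values from the study.
cations = [0.10, 0.50, 1.20, 0.60]  # K+, Na+, Ca2+, Mg2+
anions = [0.70, 1.30, 0.25, 0.05]   # Cl-, HCO3-, SO4 2-, NO3-

ibe = ion_balance_error(cations, anions)
# The magnitude is compared with the 5% limit (an assumption about
# how the limit is applied when anions exceed cations).
print(f"IBE = {ibe:.2f}%, within 5% limit: {abs(ibe) <= 5.0}")
```

A sample whose absolute IBE exceeds 5% would normally be flagged for re-analysis before being used in further computations.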

Health risk assessment

Health risk due to nitrate

The comprehensive technique used in this study for the assessment of human health risk was initially proposed by the US Environmental Protection Agency (USEPA 1989, 1997). In this study, three categories of the exposed population, including children, females, and males, were assessed for risk.

The following formulas were used to calculate the non-carcinogenic health risk from oral ingestion of NO3 in the water resources (Adimalla and Qian 2021; Karunanidhi et al. 2019; Chen et al. 2017a, b; Unigwe et al. 2022).

$$\mathrm{CDI}= \frac{\mathrm{C\times EF \times ED \times IR }}{\mathrm{ABW \times AET}}$$
(2)
$${\mathrm{HQ}}_{oral}= \frac{\mathrm{CDI}}{\mathrm{RfD}}$$
(3)

The explanations of the parameters in Eqs. 2 and 3 are presented in Table 1.

Table 1 Values and units of parameters for nitrate health risks assessment (oral and dermal routes)
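
For readers who wish to reproduce the oral pathway calculation, Eqs. 2 and 3 can be sketched as follows. The symbol meanings in the comments follow the usual USEPA nomenclature and are assumed here; all numerical inputs, including the reference dose, are illustrative placeholders and are not the Table 1 values.

```python
def chronic_daily_intake(c, ef, ed, ir, abw, aet):
    """Eq. 2: CDI = (C * EF * ED * IR) / (ABW * AET)."""
    return (c * ef * ed * ir) / (abw * aet)


def hazard_quotient(dose, rfd):
    """Eq. 3: HQ = dose / RfD (dimensionless)."""
    return dose / rfd


# Placeholder inputs for a single adult receptor (NOT the Table 1 values).
params = dict(
    c=10.0,       # nitrate concentration in water, mg/L
    ef=365.0,     # exposure frequency, days/year
    ed=30.0,      # exposure duration, years
    ir=2.0,       # ingestion rate, L/day
    abw=70.0,     # average body weight, kg
    aet=10950.0,  # average exposure time, days (here ED * 365)
)
rfd_nitrate = 1.6  # assumed oral reference dose, mg/kg/day

cdi = chronic_daily_intake(**params)
hq_oral = hazard_quotient(cdi, rfd_nitrate)
print(f"CDI = {cdi:.4f} mg/kg/day, HQ_oral = {hq_oral:.4f}")
```

The same two functions can be called once per population group (children, females, males) by swapping in the group-specific body weight, ingestion rate, and exposure duration.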

The following formulas were used to determine the non-carcinogenic health risk associated with cutaneous contact of NO3 in the water resources (Adimalla and Qian 2021; Karunanidhi et al. 2019; Chen et al. 2017a, b; Unigwe et al. 2022).

$$\mathrm{DAD}= \frac{\mathrm{C }\times \mathrm{ TC }\times {\mathrm{K}}_{\mathrm{i}} \times \mathrm{ CF }\times \mathrm{ EV }\times \mathrm{ ED }\times \mathrm{ EF }\times \mathrm{ SSA}}{\mathrm{ABW }\times \mathrm{ AET}}$$
(4)
$${\mathrm{HQ}}_{dermal}= \frac{\mathrm{DAD}}{\mathrm{RfD}}$$
(5)
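
The dermal pathway of Eqs. 4 and 5 can be sketched in the same way. The symbols are taken directly from Eq. 4; the interpretations given in the comments (for example, TC as contact time and SSA as skin surface area) are assumptions based on common usage, and the numbers are placeholders rather than the Table 1 values.

```python
def dermal_absorbed_dose(c, tc, ki, cf, ev, ed, ef, ssa, abw, aet):
    """Eq. 4: DAD = (C*TC*Ki*CF*EV*ED*EF*SSA) / (ABW*AET)."""
    numerator = c * tc * ki * cf * ev * ed * ef * ssa
    return numerator / (abw * aet)


def hazard_quotient_dermal(dad, rfd):
    """Eq. 5: HQ_dermal = DAD / RfD (dimensionless)."""
    return dad / rfd


# Placeholder inputs (NOT the Table 1 values).
dad = dermal_absorbed_dose(
    c=10.0,       # nitrate concentration, mg/L
    tc=0.4,       # contact time, h/event (assumed meaning)
    ki=0.001,     # dermal adsorption parameter, cm/h (assumed meaning)
    cf=0.001,     # unit conversion factor, L/cm^3 (assumed meaning)
    ev=1.0,       # bathing events per day (assumed meaning)
    ed=30.0,      # exposure duration, years
    ef=365.0,     # exposure frequency, days/year
    ssa=16600.0,  # skin surface area, cm^2 (assumed meaning)
    abw=70.0,     # average body weight, kg
    aet=10950.0,  # average exposure time, days
)
hq_dermal = hazard_quotient_dermal(dad, rfd=1.6)  # assumed RfD, mg/kg/day
print(f"DAD = {dad:.3e} mg/kg/day, HQ_dermal = {hq_dermal:.3e}")
```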

The HIoral of nitrate was calculated using Eq. 6 whereas the HIdermal of nitrate was obtained using Eq. 7. It is pertinent to mention that most previous studies did not compute HI values for the different exposure pathways.

$${\mathrm{HI}}_{oral}= {\sum_{i=1}^{n}}\left({\mathrm{HQ}}_{\mathrm{oral\;}(\mathrm{child})}+ {\mathrm{HQ}}_{\mathrm{oral\;}(\mathrm{female})}+ {\mathrm{HQ}}_{\mathrm{oral\;}(\mathrm{male})}\right)$$
(6)
$${\mathrm{HI}}_{dermal}= {\sum_{i=1}^{n}}\left({\mathrm{HQ}}_{\mathrm{dermal }(\mathrm{child})}+ {\mathrm{HQ}}_{\mathrm{dermal }(\mathrm{female})}+ {\mathrm{HQ}}_{\mathrm{dermal }(\mathrm{male})}\right)$$
(7)

The total hazard index of nitrate in the water resources for children, females, and males was computed using Eq. 8.

$${\mathrm{HI}}_{Total\;(nitrate)}= {\sum\nolimits_{i=1}^{n}}\left({\mathrm{HQ}}_{\mathrm{oral}}+\;{\mathrm{HQ}}_{\mathrm{dermal}}\right)$$
(8)

The hazard index (HI) indicates the non-carcinogenic health risk. An HI value less than 1 implies an acceptable level of human health risk, while an HI value greater than 1 indicates a potential risk to human health from nitrate contamination (USEPA 1997, 2002).
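
The way the oral and dermal hazard quotients are combined and interpreted can be sketched as below. This is a simplified illustration in which the total index of each population group is the sum of its two quotients; the hazard quotient values are hypothetical, and treating an HI of exactly 1 as a potential risk is an assumption, since the criterion only distinguishes values below and above 1.

```python
def total_hazard_index(hq_oral, hq_dermal):
    """Sum the oral and dermal hazard quotients for one receptor."""
    return hq_oral + hq_dermal


def classify_risk(hi):
    """Apply the HI criterion: below 1 is acceptable, otherwise flagged."""
    return "acceptable" if hi < 1.0 else "potential risk"


# Hypothetical hazard quotients for one sampling point (not study results).
hq_by_group = {
    "children": {"oral": 1.20, "dermal": 0.010},
    "females": {"oral": 0.80, "dermal": 0.005},
    "males": {"oral": 0.70, "dermal": 0.004},
}

hi_by_group = {
    group: total_hazard_index(hq["oral"], hq["dermal"])
    for group, hq in hq_by_group.items()
}

for group, hi in hi_by_group.items():
    print(f"{group}: HI = {hi:.3f} ({classify_risk(hi)})")
```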

Health risk due to sulfate and total hardness

In this study, the human health risks due to sulfate (SO4) and TH were evaluated using a qualitative and descriptive approach rather than a computational one. This is because standard health-based guidelines and mathematical definitions of the key parameters needed for a computational assessment of SO4 and TH health risks are lacking. However, some research works in the literature have investigated and reported the health concerns associated with these parameters (Sharma and Kumar 2020; Peterson 1951; Morris and Levy 1983; Cocchetto and Levy 1981).

Computational algorithms for unraveling contamination mechanisms

R-mode hierarchical clustering

Like factor analysis and Pearson’s correlation analysis, R-mode agglomerative hierarchical clustering has been found to offer insights into the degree of correlation between examined water quality variables. In several previous studies, researchers have used this clustering technique effectively to identify water contamination mechanisms and to apportion sources (Egbueri 2018). In this modeling, the Ward (1963) linkage approach and squared Euclidean distance were applied. Additionally, the datasets were normalized using the Z-score transformation technique (Eq. 9). Water quality variables in the same cluster are therefore assumed to have similar or identical anthropogenic or geological influences or closely related sources of pollution. In this investigation, only one R-mode dendrogram was used to identify the contamination mechanisms and sources. All of the tested physicochemical variables for the water quality assessment were used in the R-mode hierarchical clustering. IBM SPSS (v. 22) was used for this investigation.

$${\mathrm{Z}}_{z}= \frac{\mathrm{X} - {\upmu }_{x}}{{\upsigma }_{x}}$$
(9)

where X represents the individual variable value, \({\upmu }_{x}\) represents the average value from the row, and \({\upsigma }_{x}\) represents the standard deviation of variable values in the row.
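
The Z-score transformation of Eq. 9 and the squared Euclidean distance on which the clustering relies can be sketched as follows. The study itself used IBM SPSS; this standard-library Python version is only an illustration. The use of the sample standard deviation (n − 1 denominator) and the example concentrations are assumptions.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one variable's values following Eq. 9."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumed choice)
    return [(x - mu) / sigma for x in values]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two standardized variables."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


# Hypothetical concentrations of two variables at four stations.
nitrate = [2.0, 4.0, 6.0, 8.0]
sulfate = [1.0, 2.5, 2.0, 4.5]

z = z_scores(nitrate)
z_sulfate = z_scores(sulfate)

# In R-mode clustering, distances are computed between variables
# (columns), not between samples (rows).
print(squared_euclidean(z, z_sulfate))
```

After standardization every variable has zero mean and unit standard deviation, so variables measured in different units contribute comparably to the distance.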

Varimax-rotated factor analysis

Datasets such as the water quality dataset are frequently intricate and multifaceted in character. As a result, it is often challenging to understand such datasets immediately. Dimensionality reduction machine learning tools have frequently proven helpful for a more understandable evaluation and interpretation of water quality. For the purpose of dimensionality reduction and interpretation of the water quality dataset, principal components were extracted by Varimax-rotated factor analysis. This analysis was done with SPSS (v. 22) to complement the result of the R-mode hierarchical cluster analysis. First, the dataset’s appropriateness for principal component analysis (PCA) was evaluated using Bartlett’s test of sphericity (BTS), which tests whether the correlation matrix is an identity matrix, and the Kaiser–Meyer–Olkin (KMO) score, which assesses sampling adequacy. For PCA to be effective, studies have stated that the KMO sampling adequacy for the set of variables must be greater than 0.50 (Liu et al. 2003). If the KMO is greater than 0.5, factor analysis is appropriate and can provide a significant reduction in data size.

The KMO value obtained in the present study was 0.673, confirming sampling adequacy. For this water quality investigation, the BTS (χ2) was associated with a p(Sig.) of 0.0001 (i.e., BTS ≈ 0). According to these results, the water quality dataset is valid and the PCA-based Varimax-rotated factors can be implemented (Ustaoğlu and Tepe 2019; Aguirre et al. 2019; Long and Luo 2020). Based on eigenvalues > 1, significant factor classes crucial for the interpretation of the water quality dataset were identified. Twelve factor classes were identified in this study. However, only the five classes with eigenvalues > 1 were selected, as illustrated in the scree plot of Fig. 2. It is important to note that the factor loadings were divided into three groups. Loadings below 0.50 were regarded as weak and inconsequential, loadings between 0.50 and 0.75 were regarded as moderate and substantial, and loadings greater than 0.75 were regarded as strong. Strong loadings are typically considered to be very significant and to explain more specifics about a particular dataset (Egbueri and Agbasi 2022a).
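
The eigenvalue criterion and the three loading classes described above can be sketched as follows. The function names and the eigenvalues are hypothetical (chosen so that five of twelve components exceed 1), and taking the absolute value of a loading before classifying it is an assumption.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Return the 1-based indices of components with eigenvalue > threshold."""
    return [i + 1 for i, ev in enumerate(eigenvalues) if ev > threshold]


def loading_strength(loading):
    """Classify a factor loading as weak, moderate, or strong."""
    magnitude = abs(loading)  # sign is ignored (assumed convention)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    return "weak"


# Hypothetical eigenvalues for twelve components (not the study's results).
eigenvalues = [3.9, 2.1, 1.6, 1.3, 1.1, 0.7, 0.5, 0.3, 0.2, 0.15, 0.1, 0.05]

kept = retained_components(eigenvalues)
print("Retained components:", kept)
print("Loading 0.82 is", loading_strength(0.82))
print("Loading -0.61 is", loading_strength(-0.61))
```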

Fig. 2 Scree plot showing that only five components (factor classes) with eigenvalue > 1 are valid for this study

Computational algorithms for predictive modeling of TH, SO4, and NO3

Radial basis function neural networks

In an RBF-NN, learning happens in two stages: the hidden layer is trained first, followed by the output layer. The parameters of the hidden layer are determined by the distribution of the inputs rather than by the mapping from input to output (Emami et al. 2020). The output layer, on the other hand, uses supervised learning to determine its parameters. Each input unit feeding the hidden layer consists of a single variable (Niroobakhsh et al. 2012). The hidden layer is where the RBF-NN and the multilayer perceptron neural network (MLP-NN) differ most significantly. The differences are both structural and functional: the radial functions of the neurons in the hidden layer account for the functional difference, while the training procedure and the number of neurons account for the structural difference (Ozel et al. 2020). Unlike the MLP-NN, the RBF-NN does not need parameter learning; instead, a linear weight adjustment is carried out for the radial functions. A trial-and-error approach is used to determine the appropriate number of neurons in the hidden layer and the corresponding spread values (Ozel et al. 2020).

The architecture of an RBF-NN differs from that of other neural networks. In recent decades, RBF-NN has seen increasing use in load forecasting, pattern recognition, signal processing, and water quality prediction, among other applications, owing to its straightforward structural design and training efficiency (Panda et al. 2004; Egbueri and Agbasi 2022c). In comparison to back-propagation neural networks, RBF-NN provides a number of benefits, including a guaranteed learning procedure and a high processing speed because it has just one hidden layer (Panda et al. 2004). Additional hidden layers can be employed in the RBF-NN architecture to carry out the error back-propagation function (Panda et al. 2004); these layers, however, are optional and rarely used. Another advantage of RBF-NN over back-propagated networks is that it is built by curve-fitting approximation in a high-dimensional space to provide the best fit to the training data (Panda et al. 2004). Using exponentially decaying nonlinear functions, RBF-NN creates local approximations to nonlinear input–output mapping, whereas MLP-NN can only produce a wider approximation. Thus, the mapping function of an RBF-NN is primarily composed of Gaussians (Eq. 10) rather than sigmoids, as is the case in an MLP-NN (Ozel et al. 2020; Niroobakhsh et al. 2012; Egbueri and Agbasi 2022c). Localized approximation in input–output mapping is always preferred in RBF-NN (Haykin 1999). Net convergence is guaranteed in an RBF-NN but not in an MLP-NN. As a result, the RBF-NN is growing in acceptance despite the complexity of its computation and the difficulty of finding ideal parameters (Panda et al. 2004).

$$f(x)=\exp\left(-\frac{{\Vert x-{c}_{i}\Vert }^{2}}{{\sigma }_{i}^{2}}\right),\quad i=1,2,\dots ,k$$
(10)

where x represents the input variable; c_i represents the center of the ith Gaussian function, so that ‖x − c_i‖ is the distance of the input from that center; and σ_i represents the spread variable, which substantially influences the performance of the neural network.
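
To make Eq. 10 concrete, the activation of a Gaussian hidden unit and the linear output stage of a small RBF network can be sketched as follows. This is a minimal forward pass only and is not the SPSS implementation used in the study; the centers, spreads, weights, and function names are hypothetical.

```python
import math


def gaussian_rbf(x, center, spread):
    """Eq. 10 for one hidden unit: exp(-||x - c||^2 / sigma^2)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / spread ** 2)


def rbf_forward(x, centers, spreads, weights, bias=0.0):
    """Linear output layer applied to the Gaussian hidden activations."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centers, spreads)]
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Hypothetical network with two inputs and two hidden units.
centers = [[0.0, 0.0], [1.0, 1.0]]
spreads = [1.0, 1.0]
weights = [2.0, -1.0]

y_hat = rbf_forward([0.0, 0.0], centers, spreads, weights)
print(f"Prediction at the first center: {y_hat:.4f}")
```

Because each Gaussian decays quickly away from its center, an input only activates the hidden units whose centers lie nearby, which is the localized approximation discussed above.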

Three layers make up a standard RBF-NN: input, hidden, and output layers. These layers are all connected by feedforward arcs in their entirety (Panda et al. 2004). For a specific number of tuned units that are completely coupled to a linear output layer, an RBF-NN’s structure consists of a single hidden layer. Literature searches in the course of the present study indicated that RBF-NN has not been used to model the three water quality parameters (TH, SO4, and NO3) before. Using the methodological instructions listed in Table 2, the RBF-NN modeling was used in this study to simulate and predict the concentrations of TH, SO4, and NO3. For this purpose, three feedforward RBF-NN models were created.

Table 2 Essential methodological commands for the development of the RBF-NN models

Multiple linear regression models

MLR modeling is a common algorithmic technique for parameter prediction. The concept of soft computing has been heavily utilized in several academic disciplines, and it has also been used to forecast parameters and indices linked to water quality (Egbueri and Agbasi 2022a, b, c). MLR modeling was included in this work to support and complement the RBF-NN modeling and to forecast TH, SO4, and NO3 by focusing on the linear relationships between the input and output variables. Combining the RBF-NN and MLR models also reduces the biases that could arise when a single (standalone) predictive model is used (Egbueri 2022). Moreover, MLR is suitable for this study because of its capability to shed light on the contribution of each input parameter in predictive research (Egbueri et al. 2022). Compared with other soft computing methods, MLR is also quite easy to implement, which suggests that even someone without specialized artificial intelligence knowledge may use it to assess and forecast water quality. MLR requires few computational resources, so it can be run even on basic, low-memory systems. In addition to being straightforward, MLR is adaptable: it allows several input parameters to participate in regression modeling at once. The mathematical formulation of the MLR model is expressed as Eq. 11 (Weisberg 1985; Gaya et al. 2020; Egbueri and Agbasi 2022a, b, c).

$$y= {b}_{0}+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\dots +{b}_{i}{x}_{i}+\varepsilon$$
(11)

where xi represents the value of the ith predictor, b0 represents the regression constant, and bi represents the ith predictor’s coefficient. Finally, ε represents the error for each unique i (Agbasi and Egbueri 2022; Chen and Liu 2015).
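
How the coefficients of Eq. 11 are obtained can be illustrated with an ordinary least-squares fit. The study used IBM SPSS; the sketch below instead solves the normal equations directly in standard-library Python, and the function names and the small dataset are hypothetical.

```python
def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan
    elimination with partial pivoting and returns [b0, b1, ..., bk].
    Collinear predictors give a zero pivot and raise ZeroDivisionError.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict_mlr(coeffs, x):
    """Evaluate Eq. 11 (without the error term) for one observation."""
    return coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]

coeffs = fit_mlr(X, y)
print([round(b, 6) for b in coeffs])
print(round(predict_mlr(coeffs, [3, 2]), 6))
```

The fitted coefficients show directly how much each input contributes to the prediction, which is the interpretability advantage of MLR noted above.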

In this modeling, the three focal parameters (TH, SO4, and NO3) were predicted, and three MLR models were consequently produced, one for each of the water quality parameters. The models were all developed using IBM SPSS (v. 22).

Verification of RBF-NN and MLR models

Residual error, sum of the square errors (SSE), relative error (RE), and R2 (determination coefficient) were used in this work to evaluate the accuracy and reliability of the RBF-NN models. Equations 12 through 15 provide mathematical expressions for each of these statistical model performance indicators. The three RBF-NN frameworks created in this study would give water managers and hydrogeologists the ability to choose the best analytical parameters for sustainable management of the drinking water resources using the principles of artificial intelligence.

$$\mathrm{Residual\; error}= {X}_{measured\; value}- {X}_{predicted\; value}$$
(12)
$$\mathrm{SSE}={\sum_{i=1}^{n}}{\left({X}_{measured\; value}- {X}_{predicted\; value}\right)}^{2}$$
(13)
$$\mathrm{RE}=\frac{\left({X}_{measured\; value} - {X}_{predicted\; value}\right)}{{X}_{measured \;value}}$$
(14)
$${\mathrm{R}}^{2}=1- \frac{\sum_{i=1}^{n}{\left({X}_{measured\; value} - {X}_{predicted\; value}\right)}^{2}}{\sum_{i=1}^{n}{\left({X}_{measured\; value} - {X}_{average\; value}\right)}^{2}}$$
(15)

where n represents the number of data points and Xaverage value is the mean of the measured values.

Furthermore, several statistical methods were also used to evaluate the accuracy, validity, and reliability of the three MLR models developed in this study. They include R2 (Eq. 15), adjusted R-square (R2adj, Eq. 16), standard error of estimate (SEE, Eq. 17), and multiple correlation coefficient (R, Eq. 18). It is important to mention that the SEE is the same thing as the root mean square error.

$${R}_{\mathrm{adj}}^{2}=1- \left[\frac{\left(1- {R}^{2}\right) \left(n-1\right)}{n-k-1}\right]$$
(16)

where n represents the number of data points; k represents the number of input parameters.

$$\mathrm{SEE}= \sqrt{\frac{\sum_{i=1}^{n}{\left({X}_{measured\; value} - {X}_{predicted\; value}\right)}^{2}}{n}}$$
(17)
$$R=\frac{n\sum \left(y \cdot {y}^{\prime}\right) -\left(\sum y\right) \left(\sum {y}^{\prime}\right) }{\sqrt{\left[n\left(\sum {y}^{2}\right)-{\left(\sum y\right)}^{2}\right] \left[n\left(\sum {y}^{\prime 2}\right)-{\left(\sum {y}^{\prime}\right)}^{2}\right]}}$$
(18)

where y represents the measured value; y’ represents the predicted value; n represents the number of data points.

The R, R2, and/or R2adj statistics can be used to represent the variance and correlation between the predicted and measured variables. The closer the R, R2, and R2adj values are to 1, the higher the model’s performance, accuracy, and dependability (Egbueri and Agbasi 2022a; Menard 2000). The difference between the predicted and measured variable values of the RBF-NN models is described by the residual errors, which are frequently visualized as residual plots. The accuracy of an RBF-NN prediction increases as its residual approaches 0; thus, a residual of 0 indicates a perfect prediction. Conversely, the larger the absolute values on the y-axis of the residual error plot, the worse the model performs. The SSE conveys the accumulated magnitude of the error related to model prediction. It is the sum of the squared differences between the predicted and measured variable values and is therefore never negative. Typically, the lower the SSE value, the better, more accurate, and more reliable the model. The RE can also be used to convey how closely the predicted values match the measured values. As with the SSE, the lower the RE (in magnitude), the higher the model performance and reliability. Further, the SEE gives an idea of how well an MLR model fits a given set of data. Statistically, higher SEE values indicate poorer MLR models, while lower SEE values (closer to 0) indicate better models.
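The performance indicators of Eqs. 12 through 18 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names and the sample values are hypothetical, R2 is computed in the conventional way (total sum of squares taken about the mean of the measured values), and the SEE is computed as the root mean square error, as stated in the text.

```python
import math


def residuals(measured, predicted):
    """Eq. 12: residual error for each data point."""
    return [m - p for m, p in zip(measured, predicted)]


def sse(measured, predicted):
    """Eq. 13: sum of the squared errors."""
    return sum(r * r for r in residuals(measured, predicted))


def relative_errors(measured, predicted):
    """Eq. 14: relative error per data point (undefined if a measured value is 0)."""
    return [(m - p) / m for m, p in zip(measured, predicted)]


def r_squared(measured, predicted):
    """Eq. 15, assuming the conventional total sum of squares about the measured mean."""
    mean = sum(measured) / len(measured)
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - sse(measured, predicted) / ss_tot


def adjusted_r_squared(r2, n, k):
    """Eq. 16: n data points, k input parameters."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def see(measured, predicted):
    """Eq. 17: standard error of estimate, computed here as the RMSE."""
    return math.sqrt(sse(measured, predicted) / len(measured))


def multiple_r(measured, predicted):
    """Eq. 18: correlation between measured (y) and predicted (y') values."""
    n = len(measured)
    sy, sp = sum(measured), sum(predicted)
    syp = sum(m * p for m, p in zip(measured, predicted))
    sy2 = sum(m * m for m in measured)
    sp2 = sum(p * p for p in predicted)
    return (n * syp - sy * sp) / math.sqrt((n * sy2 - sy ** 2) * (n * sp2 - sp ** 2))


# Hypothetical measured and predicted concentrations (mg/L), for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

r2 = r_squared(measured, predicted)
r2_adj = adjusted_r_squared(r2, n=len(measured), k=1)
```

For these illustrative values the SSE is 18 and R2 is 0.964, so the adjusted R2 (0.946 for a single input parameter) is slightly lower, reflecting the penalty for model size in Eq. 16.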

Results and discussion

Statistical distributions and general water quality characteristics

The traditional method for determining a general understanding of the quality of water resources involves the analysis of a number of physicochemical parameters. Understanding the statistical distribution and the peculiar characteristics of water resources is paramount to effective and sustainable management of water. The physicochemical compositions of the water in the research region and the WHO (2017) permissible limits of the physicochemical parameters are summarized in Table 3. Box-whisker plots were created to explain the statistical distributions and variations of the physicochemical variables. As can be seen from Fig. 3, there are different levels of variances in the concentrations of these physical and chemical parameters across the water samples. The SO42− content, in particular, showed the most striking variance between the sampling stations (Fig. 3). The research area’s pH ranged from 4.00 to 6.30, as shown in Table 3. Thus, it is adjudged that all the samples are categorized as weakly acidic water resources. This suggests that they may have a tendency to corrode metals, water distribution routes, and aesthetics (Egbueri 2022). Additionally, the acidic waters may have a negative impact on a person’s health, including gastrointestinal problems and degeneration of the mucous membranes (Egbueri et al. 2020). With respect to the information obtained from Fig. 3 a, it was noticed that the pH of the water resources had the lowest statistical variation in the study area. Humic acid from vegetative (organic) decays, as well as other geochemical processes, are the most likely influencers of the waters’ acidity (Egbueri et al. 2020).

Table 3 Water quality parameters for the present investigation
Fig. 3 Box and whisker plots showing the statistical distribution and variation of a pH, T, EC, and TH; b Na+, K+, Ca2+, and Mg2+; c Cl, HCO3, and NO3

Water temperature (T), which ranged from 23 to 29 °C (Table 3), indicated that the analyzed water resources were in warm conditions. This could be a favorable state for some geochemical reactions to take place in the water environment. In general, cool drinking water is considered more palatable than warm water (WHO 2017). Warm water is said to speed up the dissolution of minerals and promote the growth of potentially dangerous bacteria, which could affect the water’s color, taste, odor, and corrosivity (Egbueri 2019). Additionally, investigations have found that warmer waters typically hold less oxygen, indicating anaerobic conditions (Egbueri 2019). In such a scenario, nitrate-reducing microorganisms could convert nitrates to nitrites and increase the water’s iron concentration, which could then result in the discoloration of aerated pipe-borne water (Egbueri 2019). Water’s EC typically serves as a measure of the ions and dissolved solids that are present. Owing to the low level or rate of cationic dissolution in the water resources, the EC was well below the SON (2015) and WHO (2017) benchmarks, ranging between 10.00 and 102.00 μS/cm (Table 3). The statistical information presented in Fig. 3 a indicates that the EC had the second highest level of statistical variation across the samples. The low EC values indicate limited mineralization of the water by geogenic and anthropogenic processes. This is consistent with the water resources’ relatively low concentrations of salt-forming ions (particularly sodium and chloride ions), which are below their SON and WHO benchmarks (Table 3). According to the low EC values, all of the analyzed samples are excellent freshwaters for drinking (Langenegger 1990). Moreover, the inhabitants are not at risk due to high water conductivity, since the EC values are well within acceptable limits and standards.
The values of TH of the local water resources, which frequently reflects the levels of Ca2+ and Mg2+ and the ability of a water sample to neutralize soap (Wu et al. 2017) during laundry, ranged from 6.00 to 52.00 mg/L (Table 3). All of the obtained TH values in this investigation were found to be within the national and international standards (SON 2015; WHO 2017).

Na+, K+, Ca2+, and Mg2+ values in the area ranged from 8.00 to 38.00 mg/L, 0.00 to 12.00 mg/L, 2.00 to 25.00 mg/L, and 0.00 to 16.00 mg/L, respectively. Based on their maximum concentrations (in mg/L), the relative abundance of the principal cations in the water resources follows the order Na+  > Ca2+  > Mg2+  > K+. The box plots of the cations, which show their graphical-statistical distributions and variations, are presented in Fig. 3 b. Na+ has the highest average value (16.79 mg/L) among the major cations, which is well below the 200 mg/L permissible level recommended by the SON (2015) and WHO (2017) for drinking water. According to several studies, sewage irrigation may be the cause of a water sample’s high Na+ concentration (Wu et al. 2013, 2017). However, the lower concentration of Na+ in the water in the current investigation is thought to be partly due to minimal water evaporation (Wei et al. 2021). The maximum K+ concentration is just 12 mg/L, which is likely due to limited mobility brought on by biological activities (Wu et al. 2017). While sodium is a crucial mineral supplement for the human diet, calcium is crucial for healthy bone and teeth growth (Egbueri 2019; Karunanidhi et al. 2020a). However, high sodium levels are linked to an increased risk of hypertension, heart problems, and nervous system disorders. Similarly, drinking water with K+ concentrations over the recommended 12 mg/L (WHO 2017) (as in sample S16) can cause nervous system diseases (Egbueri et al. 2020; Karunanidhi et al. 2020a).

For the major anions, Cl, SO42−, HCO3, and NO3 had concentration ranges of 2.00–62.00 mg/L, 7.00–130.00 mg/L, 0.00–5.00 mg/L, and 0.00–18.48 mg/L, respectively (Table 3). The study’s findings show that all of the samples’ anion concentrations were within permissible ranges (SON 2015; WHO 2017). Therefore, consumers of the water resources are not at risk of health problems from excessive intake of the major anions. The order of the main anions, measured in mg/L, is SO42−  > Cl  > NO3  > HCO3. The average SO42− concentration in the region is 40.64 mg/L, which is far lower than the national and international standards for the quality of drinking water (250 mg/L for WHO (2017) and 100 mg/L for SON (2015)). SO42− in drinking water could give it a bad taste (SON 2015; Egbueri and Mgbenu 2020). Additionally, too much SO42− in water can clog pipes and cause diarrhea. Also, too much HCO3 can affect the acidity of water. However, the extremely low quantities of HCO3 may be a sign that little soil organic matter is decomposing in the area to release CO2, which is a significant source of HCO3 in water. Concentrations of Cl and SO42− are related to geogenic processes and evaporation rates. Overall, it was discovered that the major ions in the study region include SO42−, Cl, Na+, and Ca2+, which may be the result of mixed geochemical processes occurring in the aqueous environment of the study area. This is consistent with a previous study conducted in the region by Egbueri et al. (2019), which reported that the research area is predominantly characterized by Na–Ca–SO4 water type and Na–SO4 water facies. A variety of human activities, however, have also been well documented to have a significant impact on the quality of water suitable for human use.
For instance, despite the comparatively low NO3 and Cl concentrations in the study area, it is likely that domestic, agricultural, and industrial activities play contributory roles in the NO3 contamination (Egbueri and Agbasi 2022a, b; Wei et al. 2021). The WHO’s (2017) and SON’s (2015) safe drinking water standard for nitrate is 50 mg/L. When it occurs in excess of this value, nitrate can threaten and impact the health conditions of both children and adults (males and females). Nevertheless, studies have shown that children are the human population mostly affected by nitrate contamination and pollution (Unigwe et al. 2022; Panneerselvam et al. 2021; Egbueri and Mgbenu 2020).

Toxicity and health risk assessments of NO3, SO4, and TH

It is evident that variations in geochemical processes, anthropogenic activities, and contamination of water resources are related to changes in human (and even animal) health. All over the world (especially in under-developed and developing semi-arid and arid regions), these stand out as some of the biggest challenges people, researchers, and policymakers continuously have to deal with. The toxicity and human health risks of NO3, SO4, and TH were evaluated and described in this study. These three are the focus parameters in the current investigation. Although by comparison to their respective standard limits the water resources were generally said to be pollution-free, the contamination levels in some samples are a threat to human health. Line graphs and box plots were used to demonstrate the statistical trends and variations of the NO3, SO4, and TH (Fig. 4). It was noticed that the concentrations of these parameters across the samples varied in trends and distributions. However, as seen in Fig. 4, SO4 and TH had similar trends in some samples whereas NO3 had a striking variation in trend.

Fig. 4 Depiction of the statistical trends and variations of the focus parameters, TH, SO4, and NO3, using a line graphs and b box and whisker plots

Toxicity and health risk due to nitrate

Since nitrogen compounds are crucial for plant growth, agricultural productivity, and food security, they are important constituents of water. But nitrate (NO3), a well-known environmental pollutant, is produced both naturally and as a result of a number of human activities. In the majority of natural water environments, NO3 contributes significantly to the ionic charges. International guidelines and regulations therefore set limits for NO3 ions because of their negative effects on people of different sexes and age groups (Torres-Martínez et al. 2020). Prolonged exposure to high nitrate concentrations can result in major health risks for adults as well as children, including methemoglobinemia and stomach cancer (SON 2015; WHO 2017; Datta et al. 1997; Unigwe et al. 2022; Karunanidhi et al. 2019; Panneerselvam et al. 2021). Results of the computed human health risks due to nitrate-contaminated water ingestion and dermal contact are presented in Tables 4 and 5, respectively. The computed total health hazard index (THI) of NO3 in the study area is summarized in Table 6. The CDI, HQ, and HI values for oral and dermal health risks and the THI were calculated for males, females, and children in the study area. Bar charts showing the demarcation of the water quality status, with respect to the NO3 HI (risk due to oral and dermal pathways) and THI, are presented in Fig. 5 a and b, respectively.

Table 4 Human health risk results due to nitrate ingestion by children, females, and males
Table 5 Human health risk results due to nitrate dermal contact by children, females, and males
Table 6 Total hazard indexes for infants, females, and males due to nitrate contamination
Fig. 5 Charts showing water resources in safe or unsafe category based on nitrate, a HI assessed for oral/dermal, b THI assessed for various age groups

Traditionally, HQ and HI values < 1 indicate that the analyzed water is safe and poses a low, negligible non-carcinogenic health risk (Ayejoto et al. 2022; Agbasi and Egbueri 2022). On the other hand, HQ and HI values > 1 indicate substantial human health risk due to use of NO3-contaminated water (Unigwe et al. 2022; Panneerselvam et al. 2021; Karunanidhi et al. 2019; USEPA 1997, 2002). Children’s HQ values for the health risk due to ingestion (oral route) varied from 0.00 to 0.7700, with a mean of 0.0818 (Table 4). The average HQ value for females was 0.0557, with values ranging from 0.00 to 0.5250 (Table 4). Males’ HQ values varied from 0.00 to 0.1770, with an average of 0.0189 (Table 4). A careful analysis indicated that the HQ results for children, females, and males were all < 1 and within USEPA’s (1997, 2002) acceptable limits. However, higher HQ values were observed in S16 and S26 (Table 4). This indicates that they were possibly the most predisposed water sources, receiving the highest impact from anthropogenic activities and related geochemical reactions that affect nitrate enrichment. According to the HQ values, the order of non-carcinogenic risk of exposure in the research area is children > females > males. Similar trends have been reported by previous researchers from different parts of the world (Unigwe et al. 2022; Panneerselvam et al. 2021; Chen et al. 2017a, b; Adimalla and Qian 2021; Karunanidhi et al. 2019), indicating that children are the most vulnerable. The HI for the oral pathway was computed by summing the oral HQs for the three populations. The HI result is also presented in Table 4 and graphically illustrated in Fig. 5 a. The oral pathway’s HI value ranged from 0.00 to 1.4727, with an average value of 0.1564 (Table 4). The HI values for water samples S16 and S26 were greater than 1, indicating that they are unsuitable due to their higher NO3 content (Fig. 5a).
These samples represent the greatest long-term risk to the area’s water resource consumers.

For the human health risk owing to absorption (dermal pathway), the HQ values for the children population varied from 0.00 to 0.00495, with a mean of 0.00053 (Table 5). For females, the HQ values ranged from 0.00 to 0.00187, with an average of 0.0002 (Table 5). The mean HQ value for males was 0.00017, and the range for HQ values was 0.00 to 0.00158 (Table 5). Similar to the oral health risk results, a careful analysis indicated that the dermal HQ results for children, females, and males were < 1 and within the acceptable limits. Moreover, higher HQ values were also observed in S16 and S26 (Table 5). Meanwhile, the risk due to the dermal route is far lower than that due to the oral route. Nevertheless, with regard to the dermal HQ values, the chronic risk level for the exposed people in the study region is still in the following descending order: children > females > males. Similar observations were also reported by previous studies (Unigwe et al. 2022; Panneerselvam et al. 2021; Chen et al. 2017a, b; Karunanidhi et al. 2019). The HI for the dermal pathway was computed by summing the dermal HQs of the three populations, and the result is also shown in Table 5 and graphically illustrated in Fig. 5 a. The average HI value for the dermal route is 0.0009, ranging from 0.00 to 0.0084 (Table 5). Water samples S16 and S26 had the highest HI values (Table 5; Fig. 5a), indicating that they are the sources most likely to become unfit for domestic, laundry, and sanitary purposes should effective remedial measures against nitrate pollution not be implemented in the study area.

The THI was computed by summing the oral and dermal health risks, to provide a more detailed view of the nitrate health risk for each group in the study region. The THI results for children, females, and males are presented in Table 6 and graphically depicted in Fig. 5 b. The THI values for the population of children ranged between 0.00 and 0.7750, with a mean of 0.0823 (Table 6). The average THI for females was 0.0559, while the range for THI values was 0.00 to 0.5269 (Table 6). With a mean of 0.0190, the THI values for males varied from 0.00 to 0.1793 (Table 6). It is seen from Table 6 and Fig. 5 b that all three population groups had THI values < 1. However, cumulative THI values were computed by simple summation of the THI values for the three population groups. This was done to generate an overall, holistic view of the health risk of nitrate in all the water resources in the study area. From the THIcumulative result (Table 6), it was realized that, overall, S16 and S26 had values > 1. This is consistent with the HIoral result. Meanwhile, it was also noticed from Pearson’s correlation analysis that the results of the HQ, HI, and THI maintained perfect agreements in their trends (Table 7), even though varied input parameters were used for the health risk computations for the three population groups. Additionally, simple linear regression analysis performed on the health parameters (graphs not presented in this paper) was consistent with and confirmed the findings of the Pearson’s correlation matrix. The R2 values of the linear regression models were all equal to 1.
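The summations described above (pathway HI across the three populations, THI per population, and the cumulative THI) can be sketched as follows. This is a minimal illustration rather than the study's actual computation: the function and variable names are hypothetical, and the inputs are the maximum HQ values reported for Tables 4 and 5, under the assumption that they belong to the same water sample.

```python
def hazard_index(hq_by_group):
    """Sum the hazard quotients of all population groups for one exposure pathway."""
    return sum(hq_by_group.values())


def is_safe(index):
    """Values below 1 are treated as a negligible non-carcinogenic risk."""
    return index < 1.0


# Maximum HQ values quoted in the text (assumed here to come from one sample).
oral_hq = {"children": 0.7700, "females": 0.5250, "males": 0.1770}
dermal_hq = {"children": 0.00495, "females": 0.00187, "males": 0.00158}

hi_oral = hazard_index(oral_hq)      # pathway HI, oral route
hi_dermal = hazard_index(dermal_hq)  # pathway HI, dermal route

# THI per population group: oral plus dermal contribution.
thi = {group: oral_hq[group] + dermal_hq[group] for group in oral_hq}

# Cumulative THI: simple summation over the three population groups.
thi_cumulative = sum(thi.values())

for group, value in thi.items():
    print(f"{group}: THI = {value:.4f}, safe = {is_safe(value)}")
print(f"cumulative THI = {thi_cumulative:.4f}, safe = {is_safe(thi_cumulative)}")
```

With these inputs each group-level THI stays below 1 while the cumulative value exceeds 1, which mirrors the pattern reported for the most affected samples.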

Table 7 Pearson’s correlation matrix showing the perfect agreements in trends of the key health risk parameters for nitrate

So far, varied degrees of nitrate contamination have been recorded in the study area. Thus, the users of the water resources are also predisposed to varied levels of toxicity and health risk. Elevated NO3 contents in water are linked to methemoglobinemia, also known as “blue baby syndrome,” a potentially lethal condition (Egbueri and Mgbenu 2020). The condition arises when infants consume excessive amounts of nitrate-contaminated water relative to adults (Tian and Wu 2019; Egbueri and Agbasi 2022b). It typically affects infants younger than 6 months, and it results in bluish skin because nitrate reduces hemoglobin’s ability to transport oxygen through the blood (Skold and Klein 2011). The body weights of babies and children may also decrease significantly due to the elevated NO3 concentration in samples S16 and S26 (Tables 3 and 4; Fig. 5). Moreover, the presence of NO3 in these samples can encourage the growth of microorganisms, which could have toxic effects on human health (Egbueri 2020). High infant mortality, rashes, severe cyanosis, thyroid issues, goiter, congenital deformities, chromosomal anomalies, hypertension, and headaches are some other health consequences linked to excessive long-term nitrate intake through water (Rahman et al. 2021; Egbueri 2020). Unfortunately, the non-carcinogenic risk of NO3 in drinking water is almost unavoidable in many regions of the world where a sizable population relies on a variety of water sources without first checking for suitability and health-related risks (Rahman et al. 2021).

Toxicity and health risk due to sulfate

The minerals barite (BaSO4), gypsum (CaSO4·2H2O), and epsomite (MgSO4·7H2O) all naturally contain sulfates. The mineral concentration of many drinking waters is influenced by these dissolved minerals. Sulfate is virtually always present in natural water, where it can be found in varied quantities, ranging from a few tenths of a mg/L to several thousand mg/L (WHO 2004). Freshwater bodies can become salinized by sulfate, which is why it is regarded as a pressing environmental issue (Cañedo-Argüelles et al. 2013). However, the problems associated with sulfates in water are largely ignored in many countries of the world; therefore, the impact has remained understated. According to Sharma and Kumar (2020), research on sulfate contamination in water systems is limited at the global scale. Although no health-based guideline values have been proposed for SO42− in drinking water, as concentrations in the water rise above 500 mg/L, there is a greater chance that complaints may arise from a discernible bitter taste (WHO 2004). In this study, the health-related concerns of SO42− in drinking water resources are evaluated. As mentioned earlier, the concentrations of SO42− in the water resources ranged between 7.00 and 130.00 mg/L (Table 3). In this area, it was noticed that SO4 ions were above the national SON (2015) standard limit of 100 mg/L in some water samples (S06 and S24). It was also observed that samples S3, S19, and S28 had elevated SO42− concentrations close to the 100 mg/L benchmark. Meanwhile, all of the samples were found to be within the acceptable limit of 250 mg/L set by WHO (2017).

The analyzed water resources, which have been contaminated with SO42–, may leave a bad taste when ingested. However, the USEPA and other recognized bodies do not employ organoleptic qualities, such as taste, odor, and color, when creating primary water standards. Organoleptic characteristics can, however, be employed in the creation of secondary water standards for drinking purpose. Based on taste qualities, the USEPA (1984) set a secondary sulfate standard of 250 mg/L for drinking water. The taste threshold concentrations for various common sulfate salts have been reported, notwithstanding the dearth of genuine experimental evidence on the taste threshold for sulfate (WHO 2004; Zoeteman 1980; NAS 1977; Whipple 1907). Sulfate may also aggravate concrete deterioration and distribution system corrosion (WHO 2004; Sharma and Kumar 2020; Egbueri 2022; Omeka et al. 2022). Scaling in water pipes from sulfate can lead to pipes with reduced diameter (i.e., clogged pipes). Cleaning fabrics/cloths becomes more challenging when sulfate is combined with chlorine bleach. As a result, excessive sulfate in water supplies causes issues for both materials and human health.

With respect to the Nigerian standard limit, it is adjudged that the water samples with elevated SO42− could pose some health issues to people who use the water. The most typical side effect of being exposed to high SO42− concentrations is a laxative effect (Moore 1952; WHO 2004; Sharma and Kumar 2020). Sulfate’s laxative effects could be more dangerous for infants. As a result, water containing sulfates should not be used for baby purposes, such as drinking and feeding (Sharma and Kumar 2020). Acute SO42− exposures can cause laxative effects, although their severity may depend on the sulfate salt and on how the SO42− is taken into the human system. For instance, Na2SO4 has a weaker laxative effect than MgSO4 (Morris and Levy 1983). However, research by Peterson (1951) revealed that the SO42− concentration connected to a laxative reaction decreased as the Mg2+ content in water increased. Although SO42− is not generally known to contribute to toxicity, it can lead to a number of health problems such as dehydration, catharsis, and diarrhea when consumed in larger amounts through water ingestion or food absorption. When the osmolality of the intestinal contents is higher than that of the interstitial fluids, osmotic diarrhea (soft stool) is the most likely outcome. Osmotically induced diarrhea is frequently associated with poorly absorbed dissolved substances such as MgSO4, sorbitol, or lactulose (Stipanuk 2000). The presence of Mg2+ and SO42− in the water supply under study would exacerbate the diarrheal response because both ions have osmotically active properties. Also, sulphaemoglobin and methaemoglobin levels in the human body system may vary as a result of increased SO4 absorption from the water resources (Sharma and Kumar 2020). The amount of SO42− consumed affects how much of it is absorbed from the human intestine.
According to Morris and Levy (1983) and Cocchetto and Levy (1981), the kind of cation associated with the SO42− may also affect absorption, with MgSO4 being less readily absorbed than Na2SO4. High dosages of SO42− that cause catharsis, however, have also been documented to frequently surpass intestinal absorption capacity, with the excess eliminated through human feces (WHO 2004). The primary determinants of the amount of SO42− reaching the human colon include the nature of the diet and intestinal absorption (WHO 2004), which takes place mostly in the upper digestive system (Florin et al. 1991). With the continuous ingestion of the sulfate-contaminated water in the study area, inorganic SO4 may be readily distributed in human blood, but not in tissues (WHO 2004). This is because the majority of SO4 present in human tissues is organic and biosynthetically integrated into macromolecules. However, studies have also shown that, over time, humans may adjust to increasing SO42− amounts (USEPA 1985).

Toxicity and health risk due to total hardness

The phrase “water hardness” refers to the quantity of ions, particularly sulfate and/or carbonate salts of Mg and Ca, that are present in water. The human body uses hardness minerals (e.g., Ca, Mg, Cl, SO4) for a number of vital metabolic processes or functions. Thus, some experts who study water quality disagree that water hardness poses a health risk. But because of mineral buildup on plumbing fixtures and poor soap efficacy, hard water is often considered a nuisance. It is therefore crucial to investigate this property before water is put to industrial, drinking, or public use. The water TH in this study ranged between 6.00 and 52.00 mg/L (Table 3). A number of TH classification systems have been reported in the literature. The most popular classification system seems to be the one reported by Sawyer and McCarthy (1967), McGowan (2000), and Botheju et al. (2021), illustrated in Fig. 6. Based on this classification scheme, all the water samples were classified as soft water resources. The U.S. Department of Interior and the Water Quality Association reported a scheme for classifying TH with five categories, as follows: soft water (TH = 0–17.1 mg/L), slightly hard water (TH = 17.1–60 mg/L), moderately hard water (TH = 60–120 mg/L), hard water (TH = 120–180 mg/L), and very hard water (TH > 180 mg/L). With respect to this system, 67.86% of the water samples were grouped as soft waters whereas 32.14% were grouped as slightly hard waters. Ca and Mg bicarbonate salts, as opposed to chloride, sulfate, nitrate, phosphate, and silicate salts, are what give water its temporary hardness (Rapant et al. 2020; Botheju et al. 2021). However, samples of hard water have higher concentrations of Ca and Mg chloride, sulfate, nitrate, phosphate, and silicate salts. Also, Rapant et al. (2020) reported a hardness classification as follows: soft water (Mg2+ 6.05 mg/L and Ca2+ 20.70 mg/L) and hard water (Mg2+ 26.40 mg/L and Ca2+ 70.12 mg/L).
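The five-category TH scheme quoted above can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and because the quoted class limits share their boundary values (e.g., 17.1 mg/L closes one class and opens the next), assigning each boundary value to the lower class is an assumption made here.

```python
def classify_hardness(th_mg_per_l):
    """Classify total hardness (mg/L) using the five-category scheme.

    Boundary values are assigned to the lower class (an assumption; the
    quoted scheme does not say which class a boundary value belongs to).
    """
    if th_mg_per_l < 0:
        raise ValueError("total hardness cannot be negative")
    if th_mg_per_l <= 17.1:
        return "soft"
    if th_mg_per_l <= 60:
        return "slightly hard"
    if th_mg_per_l <= 120:
        return "moderately hard"
    if th_mg_per_l <= 180:
        return "hard"
    return "very hard"


# The minimum and maximum TH values reported in this study (Table 3).
for th in (6.00, 52.00):
    print(f"TH = {th:.2f} mg/L -> {classify_hardness(th)}")
```

Applied to the reported TH range of 6.00 to 52.00 mg/L, the lookup returns only the soft and slightly hard classes, in line with the two groups identified in the text.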

Fig. 6 A TH classification system showing that the water samples are soft waters

Because there are no clear primary standards for health-related concerns of water hardness, most earlier studies have interpreted water hardness based on secondary standards, including the color, odor, taste, corrosivity, staining, and foaming qualities of water. Nevertheless, several studies have investigated the positive and negative human health implications of water hardness for heart disease, stroke, respiratory tract, gastrointestinal, and cerebrovascular diseases, and cancers. Over 60 years ago, the connection between drinking water hardness and the occurrence of and death due to cardiovascular disease was first reported (Kobayashi 1957). Numerous international studies have corroborated this association throughout the last few decades (Rubenowitz–Lundin and Hiscock 2005; Catling et al. 2008; Gianfredi et al. 2017; Jiang et al. 2016; Rapant et al. 2017, 2020). Recent epidemiological research by Rapant et al. (2020) examined the ecological type and connection between drinking water hardness, public health status, and average life expectancy of Slovak Republic residents. Additionally, numerous research works have discussed the beneficial impacts of Ca and Mg in water on the occurrence of and mortality from metabolic syndrome or diabetes mellitus (Yang et al. 1999; Joslyn et al. 1990; Naumann et al. 2017). The current study has indicated that the analyzed water resources fall into two main categories (soft water and slightly hard water) in terms of hardness values. Insights drawn from previous studies indicate that both soft water and hard water have their human health implications. For instance, the inhabitants who use the waters and are diagnosed with cancer could have their ailment impacted by the soft drinking waters (i.e., the low hardness waters) analyzed in this study (Ahn et al. 2007; Yang et al. 1998, 2000; Butler et al. 2010; Rapant et al. 2020).
Moreover, the soft waters could influence amyotrophic lateral sclerosis, high blood pressure, neurological disturbances, preeclampsia in pregnant women (Rosborg and Kozisek 2020; Rapant et al. 2020), bone/teeth development issues in children, and bone fracturing (Huang et al. 2018, 2019; Dahl et al. 2013).

On the other hand, the analyzed water resources classified in the slightly hard water category could provide more health benefits than the soft waters. Previous studies have indicated that natural waters with elevated hardness could improve human health more than low hardness natural waters, in terms of cardiovascular, digestive tract, oncological, and respiratory system well-being (Rapant et al. 2020). It has also been reported that longer life expectancy is associated with drinking water of elevated hardness rather than low hardness (Rapant et al. 2020). This discrepancy could be linked to the deficiency of Ca2+ and Mg2+ mineral salts in soft waters (Rapant et al. 2020). Previously, Mukate et al. (2019) reported that not only is an excess of physicochemical parameters in water hazardous to human health, but so is the deficiency of some mineral elements. With this perspective, they developed an integrated water quality index that considers both the excessive and the deficient concentration levels of the elemental and mineral constituents of water.

Computational and graphical methods for unraveling contamination mechanisms

R-mode hierarchical clustering

Various contamination sources influence the suitability of the drinking water resources in the study area. The R-mode dendrogram plotted in this study shows four main contamination clusters (Fig. 7). The first cluster (C I) comprises TH, Ca, EC, and Mg. This physicochemical cluster represents a group predominantly influenced by geogenic processes. The strong association between these variables indicates that the bivalent cations (Ca2+ and Mg2+) possibly originated from the weathering and dissolution of silicate rocks and minerals in the study area. Similar observations have been reported in the literature (Omeka and Egbueri 2022; Subba Rao and Chaudhary 2019; Mukate et al. 2019). While Ca and Mg are known to influence the TH of the water resources, the Ca and Mg ions and the TH seem to influence the water conductivity (Egbueri and Agbasi 2022a). Nevertheless, as indicated by the two sub-clusters formed in the first cluster (Fig. 7), the Ca2+ concentration is thought to influence the TH and EC more than the Mg2+ concentration does. pH and HCO3 constituted the second cluster (C II), indicating that HCO3 might be influencing the water acidity. The interaction between pH and HCO3 in this cluster could also favor carbonic acid and silicate weathering and the elevation of the water hardness in terms of Ca(HCO3)2, CaCO3, and MgCO3 (Eqs. 19–21).

Fig. 7 R-mode hierarchical dendrogram showing different clusters of the physicochemical parameters

$${\mathrm{Ca},\mathrm{ Mg}\left({\mathrm{CO}}_{3}\right)}_{2}+2{\mathrm{H}}_{2}\mathrm{O}+2{\mathrm{CO}}_{2} \to {\mathrm{Ca}}^{2+}+ {\mathrm{Mg}}^{2+}+ {4\mathrm{HCO}}_{3}^{-}$$
(19)
$${\mathrm{Ca}\left({\mathrm{HCO}}_{3}\right)}_{2} \to {\mathrm{CaCO}}_{3 (\mathrm{ppt})}+ {\mathrm{H}}_{2}\mathrm{O}+ {\mathrm{CO}}_{2}$$
(20)
$${\mathrm{Mg}\left({\mathrm{HCO}}_{3}\right)}_{2} \to {\mathrm{MgCO}}_{3 (\mathrm{ppt})}+ {\mathrm{H}}_{2}\mathrm{O}+ {\mathrm{CO}}_{2}$$
(21)

The third cluster (C III) is made up of K and NO3 (Fig. 7). This cluster typically represents human-induced and nutrient enrichment processes. The possible origins of this cluster include organic wastes (e.g., sewage) and agricultural inputs (e.g., inorganic NPK fertilizers) from the inhabitants of the research area. Previous studies reported that human-dominated areas are particularly prone to high levels of nitrate in water, derived from sources such as animal waste, industrial waste, residential wastewater, septic tanks, and nitrogen-rich fertilizers (He et al. 2018; Egbueri 2020; Egbueri and Agbasi 2022b; Li et al. 2017b; Karunanidhi et al. 2020b). Other anthropogenic sources of nitrate in water include fossil fuel combustion (which reaches the water through atmospheric deposition) and nitrogen-fixing crops in naturally vegetated areas (Ward et al. 2018; Gutiérrez et al. 2018). Owing to the association observed in this cluster, the formation of KNO3(aq) is anticipated in the water environment. Nitrification processes could also play a contributory role in NO3 formation. Ammonia and ammonium from sewage and certain inorganic wastes could also react to facilitate the production of NO2, HNO3(aq), and NO3 in the water under oxic conditions (Eqs. 22–26).

$${2\mathrm{NH}}_{3}+{3\mathrm{O}}_{2} \to {2\mathrm{NO}}_{2}^{-}+ {2\mathrm{H}}_{2}\mathrm{O}+ {2\mathrm{H}}^{+}$$
(22)
$${\mathrm{NH}}_{3}+{\mathrm{O}}_{2}+ {2\mathrm{H}}^{+}+ {2e}^{-}\to {\mathrm{NH}}_{2}\mathrm{OH}+ {\mathrm{H}}_{2}\mathrm{O}$$
(23)
$${\mathrm{NH}}_{2}\mathrm{OH}+ {\mathrm{H}}_{2}\mathrm{O }\to {\mathrm{NO}}_{2}^{-}+ {5\mathrm{H}}^{+}+ {4e}^{-}$$
(24)
$${2\mathrm{HNO}}_{2}+{2\mathrm{O}}_{2} \to {2\mathrm{HNO}}_{3 (\mathrm{aq})}+ {\mathrm{O}}_{2 (\mathrm{g})}$$
(25)
$${\mathrm{NH}}_{4}^{+}+{2\mathrm{O}}_{2} \to {\mathrm{NO}}_{3}^{-}+ {\mathrm{H}}_{2}\mathrm{O}+ {2\mathrm{H}}^{+}$$
(26)

The fourth cluster (C IV) is composed of water temperature (T), SO42−, Na+, and Cl− (Fig. 7). This association is typically geogenic and is divided into two sub-clusters: (T–SO42−) and (Na+–Cl−). Sub-cluster 1 is apparently indicative of the role T plays in sulfate enrichment processes within the water environment. For instance, it is well known that higher temperature accelerates most chemical reactions. It has also been indicated in this study that SO42− is the most abundant anion, whereas Na+ and Ca2+ are the predominant cations. The addition of SO42− to the water from varied sources (such as sulfide oxidation, biogeochemical degradation of organic matter, and acid precipitation from humic soils in the area) could be facilitated by increasing T values. Also, the formation of Na2SO4 and CaSO4 salts and their dissociation to form Na–SO4 and Ca–SO4 water types in the area (Egbueri et al. 2019) could have been strongly affected by T. For instance, the reversible reaction (Eq. 27) linking CaSO4 and water could proceed more readily at warmer temperatures.

$${\mathrm{CaSO}}_{4}.{2\mathrm{H}}_{2}\mathrm{O} \leftrightarrow {\mathrm{Ca}}^{2+}+ {\mathrm{SO}}_{4}^{2-}+ {2\mathrm{H}}_{2}\mathrm{O}$$
(27)

Sub-cluster 2 of C IV (represented as Na+–Cl−) suggests that salinization reactions take place in the water environment. The association of T, SO42−, Na+, and Cl− in the fourth cluster also indicates that T and SO42− influence the water salinization to a substantial extent (Cañedo-Argüelles et al. 2013). Moreover, ion exchanges between SO42− and Cl−, in relation to the cations (Na+, Ca2+, and Mg2+), could also be affected by T.

It is important to recall that the pH values indicated that all the samples are acidic. NOx and SO2 emissions in the area could also be caused by rising fossil fuel burning. Because of this, acidic particulate matter deposition and acid rain in the form of HNO3 and H2SO4 could be frequent in the studied area (Han and Liu 2006; Liu and Han 2020; Ding et al. 2013; Wu et al. 2012). With excessive HNO3 and H2SO4 in the water environment, the breakdown of rocks and minerals can be facilitated (Wu et al. 2012; Xu and Liu 2007; Han et al. 2019), which could result in the transfer and accumulation of greater loads of ions and other substances in the water systems.
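The R-mode clustering discussed in this subsection can be illustrated with a minimal Python sketch. It is not the workflow used in this study: the distance measure (correlation distance), the linkage rule (average linkage), the function name `r_mode_clusters`, and the synthetic data are all assumptions made for illustration. The key idea is only that, in R-mode, the variables rather than the samples are clustered, which amounts to transposing the data matrix before clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def r_mode_clusters(data, names, n_clusters):
    """Group variables (columns of `data`) by hierarchical clustering."""
    # R-mode: transpose so that each variable, not each sample, is one observation.
    variables = np.asarray(data, dtype=float).T
    # Correlation distance (1 - Pearson r) is scale-free, so no z-scoring is needed.
    dist = pdist(variables, metric="correlation")
    tree = linkage(dist, method="average")  # assumed linkage rule
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(int(label), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Synthetic demo data (not the study's measurements): two independent drivers.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
geogenic, anthropogenic = np.sin(t), np.cos(t)
names = ["Ca", "Mg", "TH", "K", "NO3"]
demo = np.column_stack([
    geogenic + 0.05 * rng.standard_normal(50),        # Ca
    geogenic + 0.05 * rng.standard_normal(50),        # Mg
    2.5 * geogenic + 0.05 * rng.standard_normal(50),  # TH
    anthropogenic + 0.05 * rng.standard_normal(50),   # K
    anthropogenic + 0.05 * rng.standard_normal(50),   # NO3
])
print(r_mode_clusters(demo, names, n_clusters=2))
```

In this synthetic case, the variables driven by the same underlying process fall into the same group, which mirrors how the dendrogram clusters are read as shared contamination sources.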

Varimax-rotated factor analysis

In this research, five factor classes were generated following the transformation of the hydrogeochemical dataset using the orthogonal Varimax rotation with Kaiser normalization. The five factor classes had a cumulative variance of 82.792%. Factor classes F1, F2, F3, F4, and F5 accounted for 26.440%, 18.542%, 13.960%, 12.833%, and 11.017% of the variance, respectively (Table 8(a)). Table 8(b) presents the principal component transformation matrix showing the associations of the five factor classes, while Fig. 8 is a 3D component plot depicting the spatial associations between the physicochemical parameters in the various factor classes. Overall, the results of the Varimax-rotated factor analysis are consistent with those of the R-mode hierarchical dendrogram. The observed consistencies are summarized as follows:

  • The first factor class (F1) has high loadings on EC, TH, Ca, and Mg (Table 8(a); Fig. 8). F1 is a typical representation of C I in the hierarchical dendrogram (Fig. 7). In line with C I, the water contamination mechanism represented by F1 is mainly influenced by geogenic processes.

  • The second factor class (F2) has high loadings on T and SO4 (Table 8(a); Fig. 8). F2 typically represents sub-cluster 1 of C IV in the hierarchical dendrogram (Fig. 7). In line with sub-cluster 1 of C IV, the water contamination mechanism represented by F2 is mainly influenced by geogenic processes. With reference to the common lithologies in the study area, it suffices to say that carbonate and evaporitic weathering exert little control on SO4 addition to the waters. Also, since Na is the most common cation and is more soluble and reactive than Ca, the predominance of Na2SO4 over CaSO4 and of their corresponding water types is expected in the study area (Egbueri et al. 2019).

  • The third factor class (F3) has high loadings on pH and HCO3 (Table 8(a); Fig. 8). F3 typically represents C II in the hierarchical dendrogram (Fig. 7). In line with C II, the water contamination mechanism represented by F3 is mainly influenced by geogenic processes.

  • The fourth factor class (F4) has high loadings on K and NO3 (Table 8(a); Fig. 8). F4 is a typical representation of C III in the hierarchical dendrogram (Fig. 7). In line with C III, the water contamination mechanism represented by F4 is mainly influenced by anthropogenic processes and inputs. Moreover, F4 was observed to plot close to Cl (Fig. 8). This suggests that Cl in the water could also be influenced by anthropogenic sources such as industrial waste and agricultural inputs (Egbueri 2018; Ge et al. 2021).

  • The fifth factor class (F5) has high loadings on Na and Cl (Table 8(a); Fig. 8). F5 typically represents sub-cluster 2 of C IV in the hierarchical dendrogram (Fig. 7). In line with sub-cluster 2 of C IV, the water contamination mechanism represented by F5 is mainly influenced by geogenic processes such as salinization and weathering of sodic minerals in the underlying geologic units. However, because F5 has the lowest percentage of variance and there are no known evaporite deposits in the study area, it is reasoned that evaporitic processes do not control much of the Na+ and Cl− addition to the water resources.

Table 8 Factor classes, their extractions and transformation matrix

Fig. 8 3D component plot showing the spatial association between the physicochemical parameters

Nevertheless, it is important to mention that, with respect to the percentages of variance of the five factor classes, the importance of the contamination mechanisms in the study area increased in the order F5 < F4 < F3 < F2 < F1. This order also represents the order in which the factor classes explain the information in the hydrogeochemical data. So far, it has been indicated that geogenic (natural) processes are much more predominant in the study area than anthropogenic influences. This fact is reflected in the concentration levels of the physicochemical parameters shown in Table 3. In a scenario where human-induced activities predominate over geogenic processes, the concentrations of these variables could be much higher. Also, it is thought that the anthropogenic influences reflected by C III and F4 are not yet intense in the study area. However, with continuous increases in urbanization and population growth rates, plus a lack of effective water resources management strategies in the study area, the imprints of human activities could become far more pronounced in the area’s hydrogeochemistry. As the concentrations of these variables increase, their toxicity and human health risks would increase accordingly.
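For readers unfamiliar with the rotation step, the following Python sketch shows one common way to carry out a Varimax rotation with Kaiser normalization. It is an illustrative implementation of the standard SVD-based iteration, not the routine of the statistical package used in this study; the function name `varimax`, its parameters, and the small unrotated loading matrix are hypothetical. The last lines simply confirm that the variance percentages reported above add up to the reported cumulative variance.

```python
import numpy as np


def varimax(loadings, kaiser=True, max_iter=500, tol=1e-10):
    """Return (rotated loadings, orthogonal rotation matrix)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    # Kaiser normalization: scale each variable (row) to unit communality first.
    h = np.sqrt((L ** 2).sum(axis=1)) if kaiser else np.ones(p)
    A = L / h[:, None]
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        B = A @ R
        # Gradient of the varimax criterion; its SVD gives the updated rotation.
        G = A.T @ (B ** 3 - B * (B ** 2).sum(axis=0) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        new_crit = s.sum()
        if new_crit <= crit * (1 + tol):
            break
        crit = new_crit
    # Undo the row scaling so communalities match the unrotated solution.
    return (A @ R) * h[:, None], R


# Hypothetical unrotated loadings (6 variables, 2 factors), for illustration only.
unrotated = np.array([
    [0.70, 0.50], [0.65, 0.55], [0.72, 0.45],
    [0.60, -0.55], [0.62, -0.50], [0.58, -0.60],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))

# Consistency check on the variance percentages reported for F1 to F5.
reported_variance = [26.440, 18.542, 13.960, 12.833, 11.017]
print(round(sum(reported_variance), 3))
```

Because the rotation is orthogonal, it redistributes variance among the factors without changing each variable's communality, which is why the cumulative variance is unaffected by the rotation.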

Bivariate/multivariate graphical plots

As shown in Fig. 9, ten graphical models were used to express and unravel the relationships between the analyzed hydrogeochemical variables and to provide further insights into the water contamination mechanisms of the study area. Linear equations for the considered variable sets are also presented in each graphical model. Careful observation across the graphs confirms that multiple and mixed processes are responsible for the contamination and evolution of the natural water resources. The common hydrogeochemical processes indicated by the graphical plots are silicate rock weathering and mineral dissolution (Fig. 9a–h); carbonic acid reactions and silicate hydrolysis (Fig. 9d, e, f); neutralization, redox reactions, and ion exchange processes (Fig. 9a, b, d, e–h); and human-induced activities in the study area (Fig. 9i, j).

Fig. 9 Graphical bivariate/multivariate plots for unravelling the trends between the parameters (measured in mg/L) and water contamination mechanisms
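The linear equations attached to bivariate plots of this kind can be obtained with an ordinary least-squares line fit. The sketch below is only illustrative: the helper name `linear_trend` and the paired Ca and TH values are hypothetical and are not taken from the study's dataset.

```python
import numpy as np


def linear_trend(x, y):
    """Fit y = slope * x + intercept and return (slope, intercept, R2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(r ** 2)


# Hypothetical paired concentrations in mg/L (illustration only).
ca = [4.0, 6.5, 8.0, 10.5, 12.0, 15.0]
th = [14.0, 21.0, 26.5, 33.0, 38.5, 47.0]
slope, intercept, r2 = linear_trend(ca, th)
print(f"TH = {slope:.3f} * Ca + {intercept:.3f} (R2 = {r2:.3f})")
```

A steep, well-correlated trend between two variables is the kind of pattern that is read as a shared source or process, whereas a scattered plot suggests mixed controls.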

Conceptual diagrammatic model of processes affecting contamination mechanisms

A diagrammatic model of the processes affecting the contamination mechanisms of the water resources is shown in Fig. 10, based on the information drawn from the R-mode hierarchical cluster analysis, Varimax-rotated factor analysis, and graphical plots. A careful observation made from Fig. 10 confirms that natural geogenic processes are the most common in the area. The interactions between the various variables and processes are indicated in the model.

Fig. 10 Conceptual diagrammatical model of the main processes affecting the water contamination

Computational algorithms for predictive modeling of TH, SO4, and NO3

Radial basis function neural networks

Due to their ease of use and effectiveness in predicting important traits and variables in situations with limited datasets, ANNs (such as the RBF-NN) and MLR are usually regarded as cost-effective methods in prediction research (Ozel et al. 2020; Egbueri 2022). As a result, such modeling can cover a large number of data points at considerably lower cost. The performance summary of the RBF-NN modeling of the TH, SO4, and NO3 contents in the water resources is contained in Table 9. Scatterplots showing the relationships between the measured and predicted variables, together with their R2 values, are illustrated in Fig. 11. The SSE and RE values are shown in Table 9, whereas the residual error plots are depicted in Fig. 12. All the statistical error metrics showed that the RBF-NN predictive modeling generally had low errors. However, the predictions of NO3 and TH marginally outperformed that of SO4 (Table 9). Overall, the predictions performed very well, with all R2 values above 0.800. Additionally, sensitivity analysis was carried out to understand how the input parameters affected the performance of the RBF-NN models. For TH prediction, Ca, EC, and Mg (Ca > EC > Mg) were the most sensitive inputs that affected the TH RBF-NN structure and performance (Fig. 13a). This observation is consistent with the previous findings of the hierarchical cluster analysis and factor analysis, which suggested that the TH is mainly influenced by Ca, EC, and Mg.

Table 9 Statistical metrics for validating the RBF-NN and MLR models

Fig. 11 Scatterplots and R2 values for the RBF-NN prediction of a TH, b SO4, and c NO3

Fig. 12 Residual error plots for the RBF-NN prediction of a TH, b SO4, and c NO3

Fig. 13 Bar charts illustrating the sensitivities of input variables on the RBF-NN prediction of a TH, b SO4, and c NO3

The SO4 prediction, architecture, and performance were mainly influenced by T and Ca (Fig. 13b). This finding is also consistent with the results of the cluster and factor analyses. T has been shown to influence the presence of SO42− in the water resources. Based on the obvious association between the SO4 RBF-NN and Ca, the formation of CaSO4 could also be affected by T. CaSO4 has also been indicated to influence water TH (Fig. 10). Finally, as graphically illustrated in Fig. 13c, the NO3 prediction and its model performance were mostly determined by K, Mg, HCO3, SO4, Ca, and TH (K > Mg > HCO3 > SO4 > Ca > TH). Again, the results of the cluster and factor analyses are consistent with this sensitivity analysis. K+ was identified as having the highest influence on the NO3 RBF-NN prediction, confirming that the two are very closely associated. Nevertheless, the NO3 RBF-NN predictive model indicated that other active variables could be linked with the presence of NO3 in the water resources. For instance, nitrate, in the form of Ca(NO3)2, has also been indicated to influence the water TH (Fig. 10). The association seen between NO3 and SO4 in this RBF-NN model could be indicative of similar origins such as organic wastes and agricultural inputs (Fig. 10).
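To make the structure of an RBF-NN concrete, a minimal Python sketch is given below. It is not the network configuration used in this study. It assumes Gaussian hidden units centred on randomly selected training samples, a single shared width set to the mean distance between centres, and output weights obtained by linear least squares. The function names (`rbf_design`, `fit_rbf`, `predict_rbf`, `r_squared`) and the synthetic inputs are hypothetical.

```python
import numpy as np


def rbf_design(X, centers, width):
    """Hidden-layer outputs: a bias column plus one Gaussian unit per centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), phi])


def fit_rbf(X, y, n_centers=8, seed=0):
    """Pick centres from the data, set a shared width, solve the output weights."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=n_centers, replace=False)]
    diffs = centers[:, None, :] - centers[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    width = dists[np.triu_indices(n_centers, k=1)].mean()  # assumed heuristic
    weights, *_ = np.linalg.lstsq(rbf_design(X, centers, width), y, rcond=None)
    return centers, width, weights


def predict_rbf(model, X):
    centers, width, weights = model
    return rbf_design(X, centers, width) @ weights


def r_squared(y, y_hat):
    return 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()


# Synthetic, pre-scaled inputs standing in for three predictors (illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(40, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + np.sin(3.0 * X[:, 2])
model = fit_rbf(X, y)
r2 = r_squared(y, predict_rbf(model, X))
print(f"training R2 = {r2:.3f}")
```

A sensitivity analysis like the one summarized in Fig. 13 could be approximated on such a model by perturbing one input at a time and recording the change in prediction error, although the exact procedure used in this study may differ.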

Multiple linear regression models

Table 9 also includes the statistical indicators (metrics) used to summarize the performance of the MLR models. Figure 14 shows the R2 parity graphs for the models. Moderate to high R, R2, and R2adj values were achieved, notably for TH and SO4, demonstrating that the MLR models were effective in the parameter prediction. The obtained R, R2, and R2adj values demonstrated that the performance of the MLR models followed the order TH > SO4 > NO3 (Table 9). Mathematically, the MLR models created in this study are represented by Eqs. 28–30.

Fig. 14 Scatterplots and R2 values for the MLR prediction of a TH, b SO4, and c NO3

$${y}_{TH}= 26+2.171pH+\left(-1.441T\right)+0.152EC+\left(-0.076Na\right)+0.117K+1.719Ca+\left(-0.349Mg\right)+0.002Cl+0.019{SO}_{4}+\left(-0.206{HCO}_{3}\right)+0.099{NO}_{3}$$
(28)
$${y}_{{SO}_{4}}= -244.658+\left(-3.713pH\right)+10.972T+0.843EC+0.599Na+0.182K+\left(-0.295Ca\right)+\left(-0.875Mg\right)+\left(-0.033Cl\right)+1.094{HCO}_{3}+\left(-0.626{NO}_{3}\right)+ 0.346TH$$
(29)
$${y}_{{NO}_{3}}=11.728+\left(-1.779pH\right)+\left(-0.071T\right)+\left(-0.053EC\right)+\left(-0.098Na\right)+0.801K+\left(-0.059Ca\right)+0.107Mg+\left(-0.004Cl\right)+0.232{HCO}_{3}+ 0.071TH+(-0.025{SO}_{4})$$
(30)

These equations show how the MLR predictions respond to each of the input parameters. Based on Eq. 28, it was realized that pH, Ca, and EC influenced the model more than the other variables. The linkages between TH, Ca, and EC are again reflected in this MLR model. For the SO4 prediction, T, HCO3, and EC influenced the MLR model, with T having the most impact (Eq. 29). This result also reflects the consistency between the hierarchical clustering, factor analysis, and RBF-NN in linking SO4 and T more closely than other parameters. Lastly, the NO3 MLR prediction (Eq. 30) shows that the model was affected by K, HCO3, and Mg, with K having the highest impact on the water NO3. Interestingly, this result also demonstrates the consistency of the RBF-NN, factor analysis, and hierarchical clustering in linking NO3 and K more closely than other parameters.
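Equations 28–30 can be evaluated directly for any sample. The Python sketch below transcribes the published coefficients into a lookup table and applies them to a hypothetical sample. The sample values are invented for illustration, and the inputs are assumed to be in the same units as the study's dataset. The names `MLR` and `predict` are not from the original work.

```python
# Coefficients transcribed from Eqs. 28-30: (intercept, {input: coefficient}).
MLR = {
    "TH": (26.0, {"pH": 2.171, "T": -1.441, "EC": 0.152, "Na": -0.076,
                  "K": 0.117, "Ca": 1.719, "Mg": -0.349, "Cl": 0.002,
                  "SO4": 0.019, "HCO3": -0.206, "NO3": 0.099}),
    "SO4": (-244.658, {"pH": -3.713, "T": 10.972, "EC": 0.843, "Na": 0.599,
                       "K": 0.182, "Ca": -0.295, "Mg": -0.875, "Cl": -0.033,
                       "HCO3": 1.094, "NO3": -0.626, "TH": 0.346}),
    "NO3": (11.728, {"pH": -1.779, "T": -0.071, "EC": -0.053, "Na": -0.098,
                     "K": 0.801, "Ca": -0.059, "Mg": 0.107, "Cl": -0.004,
                     "HCO3": 0.232, "TH": 0.071, "SO4": -0.025}),
}


def predict(target, sample):
    """Evaluate the MLR equation for `target` on one sample (dict of inputs)."""
    intercept, coefs = MLR[target]
    return intercept + sum(coef * sample[name] for name, coef in coefs.items())


# Hypothetical sample, for illustration only (not a measurement from the study).
sample = {"pH": 5.8, "T": 28.0, "EC": 120.0, "Na": 12.0, "K": 3.0, "Ca": 8.0,
          "Mg": 2.5, "Cl": 15.0, "SO4": 40.0, "HCO3": 20.0, "NO3": 10.0,
          "TH": 30.0}
for target in MLR:
    print(target, round(predict(target, sample), 3))
```

Note that the raw coefficients depend on the units and ranges of the inputs, so comparing the influence of inputs strictly requires standardized coefficients or a separate sensitivity analysis.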

Comparison of the performances of the RBF-NN and MLR models

In order to lower the cost of monitoring and assessing water quality, it is essential to develop trustworthy models that can predict the characteristics of interest. The creation of precise predictive models is also essential since they serve as crucial decision-making tools for sustainable water quality management for a variety of uses. The RBF-NN and MLR models created for this study performed well, as shown thus far by the validation metrics. Nevertheless, there were some differences between the modeling approaches. The quality of the MLR forecasts for the water quality parameters followed the order TH > SO4 > NO3. The quality of the RBF-NN forecasts, on the other hand, followed the order NO3 > TH > SO4. Based on R2 values and other model error measures, the model performance order is RBF-NN > MLR (Table 9). This may be because the two modeling methodologies have different sensitivity levels to the physicochemical input parameters used in the predictive modeling. Compared to the MLR, the RBF-NN appears to be more sensitive. A similar observation was reported by Egbueri and Agbasi (2022c), who predicted irrigation water quality indices using MLR and RBF-NN. Regardless of the performance variations seen in this study, it is important to recall that there are significant agreements in their predictions and sensitivity analyses. This suggests that both models will be useful for water quality monitoring and assessment in the study area. The models also provide useful insights for national and international researchers.
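The comparison above rests on validation metrics of the kind listed in Table 9. A minimal sketch of how such metrics can be computed is given below. It assumes that SSE is the sum of squared errors and that RE is the relative error defined as SSE divided by the total sum of squares about the mean; the study's software may define RE differently. The function name `validation_metrics` and the numbers in the example are hypothetical.

```python
import numpy as np


def validation_metrics(y_true, y_pred, n_predictors):
    """Return SSE, relative error (assumed SSE/SST), R2, and adjusted R2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    sse = float(((y_true - y_pred) ** 2).sum())
    sst = float(((y_true - y_true.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    # Adjusted R2 penalizes the number of predictors used by the model.
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"SSE": sse, "RE": sse / sst, "R2": r2, "R2adj": r2_adj}


# Hypothetical measured and predicted values (illustration only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = validation_metrics(measured, predicted, n_predictors=1)
print({name: round(value, 3) for name, value in metrics.items()})
```

Computing the same metrics for both model types on the same samples is what allows an ordering such as RBF-NN > MLR to be stated.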

Conclusions

For the comprehensive assessment and predictive modeling of the contamination mechanisms, toxicity, and human-related health risks of water quality in the Ojoto region of southeast Nigeria, multiple data-driven indexical, graphical, and soft computational models have been successfully integrated and implemented. A descriptive study approach was also adopted. The study revealed that the majority of the physicochemical parameters tested were within their SON (2015) and WHO (2017) satisfactory limits for natural drinking water and other purposes. However, TH, SO4, and NO3 concentrations in some water locations were not within the stipulated satisfactory limits. The nitrate health risk assessment indicated that some locations pose substantial chronic health risk, through water ingestion, to the children, females, and males in the study area. However, insignificant health risk was recorded for the dermal absorption route for all the populations. Overall, the health risk due to ingestion was higher than the risk due to absorption. Also, the human populations in the area are predisposed to nitrate health risk in the following order: children > females > males. The newly introduced cumulative nitrate health risk maintained the same trend. SO4 concentrations in some locations were above the 100 mg/L limit set by SON (2015); thus, the affected water resources could have some negative health effects on the water users. Low TH values were recorded in most of the samples. This also predisposes the water users to some health defects. R-mode hierarchical clustering, Varimax-rotated factor analysis, and bivariate/multivariate graphical plots consistently revealed that there are several, mixed water contamination mechanisms in the study area. Nevertheless, their findings suggest that geogenic mechanisms predominate over human-related mechanisms. Moreover, these findings were used to develop a composite diagrammatic model that showcases the prevalent contamination sources and mechanisms. Stoichiometric equations also aided the assessment of the contamination sources and mechanisms. Furthermore, the predictive RBF-NN and MLR models produced in this study accurately predicted the TH, SO4, and NO3, with the RBF-NN models performing better than the MLR models. The RBF-NN and MLR models also provided insights that helped validate the findings from the cluster, factor, and graphical analyses. In conclusion, the outcomes of this research satisfied the research objectives and can be very helpful in identifying areas with health risks and the corresponding actions needed to control and reduce these risks in the research region. This research also provides theoretical foundations for the effective management and prevention of unsafe, contaminated water supplies by local decision-makers. It will encourage sustainable development of high-quality natural water resources, while promoting ecological protection in the area. The models and results provided by this research are therefore important for local and regional authorities seeking to take the necessary precautions to reduce health risks through sustainable water management.

Limitations of study, perspectives for future research, and recommendations

There are some noted shortcomings that could be addressed in subsequent research, despite the fact that the current study was founded on strong scientific principles and took into account a number of water quality parameters. Also provided are some vital recommendations:

  • The USEPA provided the majority of the values for the health risk parameters that were employed, which could have a negative impact on the nitrate health risk model’s accuracy. Therefore, more research is required to produce data that is particular to the Nigerian context.

  • In the current modeling, the seasonal variations in water quality were not taken into account. Thus, future studies in the region are needed to investigate that.

  • In the research area, anthropogenic activities generally correlate with the nitrate content of the studied water resources. The potential for non-carcinogenic health impacts on humans could increase dramatically when the water is used for drinking, cooking, bathing, and laundry in the affected areas. The same is true for sulfate and water hardness. Thus, the poor-quality water resources identified in parts of the area should be appropriately treated to make them suitable for human use.

  • Future research could make use of sophisticated deep learning and hybrid predictive modeling tools to enhance or validate the results of the current study.

  • The water management strategy should be improved. This can be achieved by encouraging public education about the effects of nitrates, sulfates, and hardness on water quality. Such awareness may help to prevent water contamination in the first place.

  • Effective legislation and enforcement strategies, related to anthropogenic activities that threaten the environmental hygiene and water quality, should be introduced as this would help in ensuring that further quality deterioration is minimized and water resources protected.