Introduction

Owing to its high crop yields, the middle Gangetic plain is considered an important agricultural region of India. Agricultural activities in the area leave contaminant residues in the environment. These residues are leached during precipitation and irrigation, altering the physicochemical characteristics of the environment. Groundwater is one of the most important water resources for agricultural, domestic, and industrial activities in the region. In such settings, nitrate contamination of groundwater has become a major concern across many similar regions of the world (Kim et al. 2015; Su et al. 2013; Zhang et al. 2018). It is an especially important consideration in rural agricultural areas where groundwater is shallow and used primarily for domestic supply (Gu et al. 2013; He and Wu 2018; He et al. 2018; Zhang et al. 2013). Nitrate (\( {\mathrm{NO}}_3^{-} \)) contamination in groundwater is mainly governed by livestock activity, fertilizer application to farmland, and wastewater disposal, including septic tanks (Chesnaux and Allen 2008; Gu et al. 2013; Kaown et al. 2009; WHO 2008; Zhang et al. 2013). Three forms of inorganic nitrogen (\( {\mathrm{NO}}_3^{-} \), \( {\mathrm{NO}}_2^{-} \), and \( {\mathrm{NH}}_4^{+} \)) exist in soil, of which \( {\mathrm{NO}}_3^{-} \) and \( {\mathrm{NH}}_4^{+} \) are the forms readily available to plants. In groundwater systems, \( {\mathrm{NH}}_4^{+} \) and \( {\mathrm{NO}}_2^{-} \) concentrations are very low because both are easily converted to \( {\mathrm{NO}}_3^{-} \) under oxidizing conditions. Its physicochemical properties (e.g., high solubility, high mobility, and a relatively stable oxidation state) make nitrate a potential contaminant (Arabgol et al. 2016; WHO 2008; Zhai et al. 2017a).

From the viewpoint of human health risk assessment, exposure to high nitrate concentrations through ingestion can cause adverse health effects (Wu and Sun 2016). Methemoglobinemia is the predominant health effect in bottle-fed infants who consume water with high nitrate concentrations. In this disorder, nitrite (formed by reduction of nitrate under reducing conditions) reacts with hemoglobin to form methemoglobin, which cannot carry oxygen, so the oxygen-carrying capacity of the blood is depleted. Methemoglobin levels above 10% can result in cyanosis (blue-baby syndrome) (Gao et al. 2012). Methemoglobinemia has also been associated with gastrointestinal infections and dehydration (Almasri 2007). Beyond these effects, researchers have linked elevated nitrate in water to nitrosamine formation (Elisante and Muzuka 2016), non-Hodgkin lymphoma (NHL) (Chang et al. 2009), and multiple sclerosis (Fabro et al. 2015). The World Health Organization guidelines (WHO 2011) recommend 50 mg l−1 as the acceptable level of \( {\mathrm{NO}}_3^{-} \) in drinking water, whereas the Bureau of Indian Standards (BIS) recommends a maximum acceptable limit of 45 mg l−1.

Source identification using statistical techniques has gained popularity in recent years. Source identification essentially relates geological and climatic attributes of the environment, such as groundwater depth, groundwater flow direction, and land use activities (Behera and Das 2018; Chen et al. 2005; Gu et al. 2013), to groundwater quality parameters. Liao et al. (2016) used hierarchical cluster analysis to identify nitrate sources in groundwater and showed that agricultural fertilizers are the main source of nitrate pollution in shallow groundwater. Kim and Park (2016) and Zhai et al. (2017b) used principal component analysis to identify sources of nitrate contamination in groundwater in agricultural areas, and Wang et al. (2018) used factor analysis (FA), a widely used statistical tool, to investigate the sources of nitrate in groundwater. This study applied a more advanced statistical tool, sparse principal component analysis (SPCA), to investigate the sources of nitrate pollution in groundwater and compared it with factor analysis to assess the suitability of the method for water quality assessment. In addition, an empirical Bayesian kriging (EBK) interpolation model was used to estimate nitrate concentrations at unsampled locations in the study area.

Factor analysis is a multivariate dimension reduction technique that can effectively reduce numerous variables to a few factors (Wu et al. 2014). It is used to analyze the interrelationships among a set of variables (Cao et al. 2016). Factor analysis expresses a large number of variables as linear functions of a few factors, based on the variance explained by each attribute (Davis 1973). The contribution of an individual variable to a factor is called its loading. Factor analysis helps extract unobservable (latent) underlying processes that cannot be seen in the raw dataset. In this study, an attempt has been made to extract latent factors from hydrogeochemical data and to classify the original data using factor analysis. Factor analysis can find hidden patterns, separate overlapping patterns, and describe input features in terms of their loadings. Different orthogonal rotations give different loadings for the individual input variables. Orthogonal rotations produce a clear loading pattern in which the first factor is the axis along which the explained variance is maximized (Adams et al. 2001; Aiuppa et al. 2003; Behera and Das 2018; Jayakumar and Siraz 1997), and the second factor explains the maximum remaining variability after orthogonal rotation. In this research, factor analysis is used to extract the pollution factors and the geochemical attributes involved in them. Further details on factor analysis can be found in Jolliffe (1986).
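A minimal NumPy sketch of how loadings and an orthogonal rotation can be computed is shown below. It rests on simplifying assumptions: the unrotated loadings come from the eigen-decomposition of the correlation matrix (eigenvectors scaled by the square root of their eigenvalues, i.e., a principal component extraction rather than an iterative factor extraction), and the rotation is the standard varimax criterion. The function names are hypothetical, and this is not the psych package routine used in this study.

```python
import numpy as np


def pc_loadings(X, n_factors):
    """Unrotated loadings: correlation-matrix eigenvectors scaled by sqrt(eigenvalue)."""
    R = np.corrcoef(X, rowvar=False)              # variables are columns
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_factors]   # indices of the largest eigenvalues
    return eigvecs[:, top] * np.sqrt(eigvals[top])


def varimax(L, max_iter=100, tol=1e-8):
    """Varimax rotation of a p x k loading matrix; returns (rotated loadings, rotation)."""
    p, k = L.shape
    rot = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        Lr = L @ rot
        # Gradient of the varimax criterion with respect to the rotation
        G = L.T @ (Lr ** 3 - Lr @ np.diag(np.sum(Lr ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        rot = U @ Vt                              # closest orthogonal matrix to G
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break                                 # criterion has stopped improving
        crit_old = crit
    return L @ rot, rot


# Synthetic stand-in for 156 samples x 6 attributes (not the study's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(156, 6))
unrotated = pc_loadings(X, n_factors=2)
rotated, rotation = varimax(unrotated)
```

Because varimax applies an orthogonal matrix, each variable's communality (the row sum of squared loadings) is unchanged by the rotation; only the distribution of loadings across factors changes, which is what makes the rotated pattern easier to interpret.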

Sparse principal component analysis (SPCA) has emerged as a powerful technique in multivariate analysis. It provides simple, interpretable modes with localized spatial support. SPCA is an extension of principal component analysis whose basic framework is parsimonious decomposition. Parsimonious decomposition uses prior information, in the form of sparsity, to regularize the loadings of the input variables: each weight vector has only a few “active” (nonzero) loadings, and the loadings of all other variables are constrained to zero. The resulting sparse principal components (PCs) are sparsely weighted linear combinations of the input variables. Sparsity is obtained through an L1 penalty, a regularization term used to avoid overfitting (as in lasso regression). The L1 penalty limits the size of the coefficients and sets some of them exactly to zero, yielding a model with few nonzero coefficients. In SPCA, the loadings are therefore found by maximizing the variance of the input variables along each PC axis subject to the L1 constraint. In this study, SPCA is used to extract the pollution factors and the components that control the process. A brief review of the concept can be found in Zou et al. (2006).
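Zou et al. (2006) obtain sparse loadings through an elastic-net regression formulation, which is what the R elasticnet package implements. To illustrate only how an L1 penalty drives some loadings exactly to zero, the sketch below uses a simpler, related technique instead: power iteration on the correlation matrix with a soft-thresholding step after each multiplication. The penalty `lam`, the function names, and the synthetic data are hypothetical, and the results will not match the elasticnet package.

```python
import numpy as np


def soft_threshold(v, lam):
    """L1 proximal step: shrink entries toward zero, setting small ones exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_pc(X, lam, n_iter=200, tol=1e-10):
    """First sparse loading vector of X (n samples x p variables), unit length."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    S = Z.T @ Z / (Z.shape[0] - 1)                      # correlation matrix
    v = np.linalg.eigh(S)[1][:, -1]                     # start from the dense first PC
    w = v
    for _ in range(n_iter):
        w = soft_threshold(S @ v, lam)                  # power step, then L1 shrinkage
        norm = np.linalg.norm(w)
        if norm == 0.0:
            return w                                    # penalty too strong: all zero
        w = w / norm
        if np.linalg.norm(w - v) < tol:
            break                                       # converged
        v = w
    return w


# Synthetic data: variables 0-2 share a latent signal, variables 3-5 are pure noise
rng = np.random.default_rng(42)
f = rng.standard_normal(156)
X = np.column_stack([f + 0.3 * rng.standard_normal(156) for _ in range(3)]
                    + [rng.standard_normal(156) for _ in range(3)])
v = sparse_pc(X, lam=0.8)   # loadings on the noise variables are driven to exactly 0
```

Increasing `lam` zeroes more loadings, while `lam = 0` leaves the ordinary first principal component direction (up to sign) unchanged.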

Assessing the geographical distribution of any feature requires spatial geochemical data. Collecting samples from every location is expensive, time-consuming, and laborious. To overcome this, various geostatistical models are available that predict values at intermediate locations with an associated uncertainty. All interpolation methods rely on a function of the distance between the sampled points and the unknown location. Classical kriging methods use a single semivariogram model to predict values at unknown locations, whereas EBK uses many semivariogram models instead of one (Gribov and Krivoruchko 2012). EBK exploits spatial autocorrelation to produce a prediction surface with minimum uncertainty, based on the lowest mean square error. In addition, the algorithm estimates local trends where data coverage is good; where coverage is poor, it draws on a prior background mean (Krivoruchko 2012; Krivoruchko and Butler 2013) to avoid distorting the derived semivariogram. Here, EBK was used to interpolate nitrate concentrations at unsampled locations, generate a raster map, and identify areas with elevated nitrate in the study region.
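Every variogram-based interpolator, EBK included, starts from the empirical semivariogram, γ(h) = (1/2N(h)) Σ [z(xi) − z(xj)]², averaged over the N(h) pairs separated by lag h. The minimal NumPy sketch below assumes planar coordinates and equal-width lag bins (the bin count and the half-maximum-distance cutoff are hypothetical choices). EBK itself goes further, re-estimating semivariograms on many simulated datasets within local subsets, which this sketch does not attempt.

```python
import numpy as np


def empirical_semivariogram(coords, z, n_bins=10, max_dist=None):
    """Binned empirical semivariogram: mean of 0.5*(z_i - z_j)^2 per lag bin."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)                 # each unordered pair once
    d = np.linalg.norm(coords[i] - coords[j], axis=1)   # pair separation distances
    sq = 0.5 * (z[i] - z[j]) ** 2                       # semivariance of each pair
    if max_dist is None:
        max_dist = d.max() / 2.0                        # common rule of thumb
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    lags, gammas, counts = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (d > lo) & (d <= hi)
        if mask.any():                                  # skip empty bins
            lags.append(d[mask].mean())
            gammas.append(sq[mask].mean())
            counts.append(int(mask.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)


# Four points on a line with a linear trend in the measured value
lags, gamma, n_pairs = empirical_semivariogram(
    [[0, 0], [1, 0], [2, 0], [3, 0]], [0, 1, 2, 3], n_bins=3, max_dist=3.0)
```

A semivariogram model (e.g., exponential) is then fitted to these binned points and used to derive the kriging weights.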

The objective of this study was to investigate the sources of nitrate using factor analysis and to compare the results with sparse principal component analysis. EBK modeling was used to investigate the distribution of nitrate pollution in the study area, and the USEPA assessment method was adopted to assess the human health risk from exposure to nitrate-contaminated groundwater. The results are expected to provide evidence supporting the adoption of these multivariate statistical tools for hydrogeochemical data and to inform future research.

Study area and regional hydrogeology

The study was conducted in Gaya district of Bihar (Fig. 1), which falls within the Punpun sub-basin of the Ganga basin. Bihar state is situated in the middle Gangetic plain of India. Gaya district covers an area of 4976 km2 and lies between latitudes 24° 30′ and 25° 06′ N and longitudes 84° 24′ and 85° 30′ E. The drainage pattern of the study area is controlled primarily by four parallel streams, the Morhar, the Phalgu, the Paimar, and the Dhadhar, all of which originate on the plateau of Jharkhand to the south and flow north and northeast (CGWB 2013). The whole study area experiences a continental monsoon climate with extreme temperatures, ranging from 46 °C in summer, when hot westerly winds (known as “loo”) blow, to as low as 4 °C. The monsoon sets in at the end of June, and annual rainfall ranges between 568.5 mm and 1109 mm (CGWB 2013).

Fig. 1 Study area with sampling locations

The soils of the region consist of younger alluvial, sandy, red, and yellow soils. The sandy, red, and yellow soils are restricted to the southern and northeastern parts of the area and contain small amounts of nitrogen and organic matter. The younger alluvial soil, a loam with a small proportion of sand and clay (locally called “Kewal”), is restricted to the mid-north and northwest parts and is rich in nitrogen and calcium. The groundwater aquifers of the region are categorized as unconfined to semi-confined. Groundwater movement in these aquifers is controlled primarily by joints, fissures, and other planes of weakness in granite gneisses, mica-schist, quartzite, and other associated rocks of Pre-Cambrian age (CGWB 2013). Exploratory drilling shows good prospects for groundwater development down to 100 m below ground level. The semi-unconfined aquifer contains a sandy horizon that gradually increases eastward (CGWB 2013). During the pre-monsoon period, the groundwater level lies between 5 and 10 m below ground level over most of the district, whereas deeper levels (> 10 m) occur in the eastern and western parts (CGWB 2013).

Methods and materials

Water sampling and analysis

A total of 156 representative groundwater samples were collected from underground sources (mainly bore wells and hand pumps) across the study area (Fig. 1). All sources were freely accessible to the community for consumption. Sampling sites and their geographic positions (latitude and longitude) were recorded using a Global Positioning System (GPS) receiver (model: Garmin GPS 72H). Groundwater samples were collected after flushing for 2–5 min. At each location, 1-l high-density polyethylene sample bottles (Thermo Scientific, 1131000BPC), pre-treated overnight with dilute nitric acid and rinsed with ultrapure water (Milli-Q Ultrapure Water Purification System, Model Z00QSVC01), were used. Bottles were rinsed thoroughly with water from the sampled bore well or hand pump before the samples were collected. Samples were collected during the pre-monsoon season from the marked locations (Fig. 1). Immediately after collection, in situ parameters such as pH and conductivity were measured in the field. Samples were then filtered through 0.45-μm membrane syringe filters. To avoid deposition on the container walls, a portion of each sample was acidified after filtration. To avoid matrix decomposition, the unacidified samples were stored at 4 °C in the dark until analysis was complete (APHA 2005).

Before analysis with a spectrophotometer (Thermo Scientific Evolution 201), water samples were filtered through 0.45-μm cellulose syringe filters. Analytical-grade reagents (Merck Millipore) and highest-purity redistilled Millipore water were used for solution preparation and dilution. The in situ parameters pH and electrical conductivity (EC) were measured immediately after sampling using a Thermo Scientific Orion VERSA STAR pH/ISE/Conductivity/RDO/Dissolved Oxygen Multiparameter Meter Kit. The instrument was calibrated with a three-point calibration using pH buffers (4, 7, and 10) and a two-point calibration using potassium chloride conductivity standards (1413 μS, 0.1 mS). The instrument accuracy was ± 0.002 pH units for pH and 0.5% of reading ± 1 digit (> 3 μS) for EC. Chloride was determined by the standard AgNO3 titration method, and sulfate and phosphate by the turbidimetric and colorimetric methods, respectively, using the UV spectrophotometer (Adimalla et al. 2018b). The nitrate (\( {\mathrm{NO}}_3^{-} \)) concentration was determined following the method adopted by Ganesh et al. (2012). Calcium and magnesium were determined by the standard EDTA titration method (Adimalla et al. 2018b). All analytical procedures followed the standard methods for the examination of water and wastewater (APHA 2005).

Human health risk assessment

The USEPA health risk assessment method was used to assess the health risk to three age groups (infants, children, and adults) in the study area who are exposed to nitrate-contaminated groundwater. According to the International Agency for Research on Cancer (IARC), nitrate (\( {\mathrm{NO}}_3^{-} \)) is categorized as a group D carcinogenic pollutant; therefore, only non-carcinogenic risks were calculated in this study. The health risk from exposure to nitrate-contaminated water was evaluated for two pathways (oral and dermal). Oral and dermal exposure doses were estimated using Eqs. (1), (3), and (4), and the corresponding hazard quotients using Eqs. (2) and (5). The hazard index (HI) was then calculated as the sum HQoral + HQdermal (Li et al. 2017; Li et al. 2016).

$$ {\mathrm{CDI}}_{\mathrm{oral}}={c}_w\times \mathrm{IR}\times \mathrm{EF}\times \mathrm{ED}/\left(\mathrm{BW}\times \mathrm{AT}\right) $$
(1)
$$ {\mathrm{HQ}}_{\mathrm{oral}}={\mathrm{CDI}}_{\mathrm{oral}}/{\mathrm{RfD}}_{\mathrm{oral}} $$
(2)
$$ {\mathrm{CDI}}_{\mathrm{dermal}}=I\times {A}_{\mathrm{SD}}\times \mathrm{EF}\times \mathrm{FE}\times \mathrm{ED}/\left(\mathrm{BW}\times \mathrm{AT}\times f\right) $$
(3)
$$ I={10}^{-3}\times k\times {c}_w\times \mathrm{ET} $$
(4)
$$ {\mathrm{HQ}}_{\mathrm{dermal}}={\mathrm{CDI}}_{\mathrm{dermal}}/{\mathrm{RfD}}_{\mathrm{dermal}} $$
(5)

where CDIoral and CDIdermal represent the average daily exposure doses per unit body weight through the oral and dermal pathways, respectively (mg kg−1 day−1); cw is the concentration of nitrate in the groundwater (mg l−1); IR is the intake rate of groundwater for drinking (l day−1); EF and ED are the exposure frequency (day a−1) and exposure duration (a), respectively; BW is body weight (kg); AT is the averaging time for non-carcinogenic effects (day); FE is the daily frequency of dermal contact events (times day−1); ASD is the body surface area (cm2); I is the exposure dose per event (mg cm−2); k is the skin permeability coefficient (cm h−1); and ET is the contact time per shower (h day−1). The reference doses for \( {\mathrm{NO}}_3^{-} \), RfDoral and RfDdermal, are 1.6 and 0.8 mg kg−1 day−1, respectively. The input values used to calculate the health risk for the three age groups are presented in Table 1.

Table 1 Health risk parameter values

An HI below 1 indicates that the risk is within the acceptable limit, whereas HI > 1 indicates that the risk exceeds the acceptable level.
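Equations (1) to (5) translate directly into code. The Python sketch below is a minimal version under stated assumptions: the parameter values in the example are hypothetical placeholders (the Table 1 values are not reproduced here), and the factor f in Eq. (3), which is not defined in the text, is passed through as an assumed parameter defaulting to 1.

```python
from dataclasses import dataclass

RFD_ORAL = 1.6     # mg kg-1 day-1, nitrate oral reference dose (from the text)
RFD_DERMAL = 0.8   # mg kg-1 day-1, nitrate dermal reference dose (from the text)


@dataclass
class ExposureParams:
    """Per-age-group inputs for Eqs. (1)-(5); fill from Table 1."""
    ir: float        # drinking-water intake rate, l day-1
    ef: float        # exposure frequency, day a-1
    ed: float        # exposure duration, a
    bw: float        # body weight, kg
    at: float        # averaging time for non-carcinogenic effects, day
    k: float         # skin permeability coefficient, cm h-1
    et: float        # contact time per shower, h day-1
    a_sd: float      # body surface area, cm2
    fe: float        # dermal contact events per day
    f: float = 1.0   # ASSUMPTION: factor f in Eq. (3) is not defined in the text


def nitrate_hazard(cw, p):
    """Return (HQ_oral, HQ_dermal, HI) for nitrate concentration cw in mg l-1."""
    cdi_oral = cw * p.ir * p.ef * p.ed / (p.bw * p.at)                              # Eq. (1)
    hq_oral = cdi_oral / RFD_ORAL                                                    # Eq. (2)
    dose_per_event = 1e-3 * p.k * cw * p.et                                          # Eq. (4)
    cdi_dermal = dose_per_event * p.a_sd * p.ef * p.fe * p.ed / (p.bw * p.at * p.f)  # Eq. (3)
    hq_dermal = cdi_dermal / RFD_DERMAL                                              # Eq. (5)
    return hq_oral, hq_dermal, hq_oral + hq_dermal                                   # HI


# Hypothetical infant-like inputs for illustration only (NOT the Table 1 values)
infant = ExposureParams(ir=1.0, ef=365, ed=1, bw=10, at=365,
                        k=0.001, et=0.4, a_sd=4000, fe=1)
hq_oral, hq_dermal, hi = nitrate_hazard(16.0, infant)
```

With these placeholder inputs, 16 mg l−1 of nitrate gives an HQoral of about 1 while the dermal term is several hundred times smaller, the same ordering seen in the reported HQ ranges.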

Results and Discussion

General groundwater chemistry

Table 2 shows the descriptive statistics of groundwater quality in the study area. The pH ranged from 5.8 to 7.6, indicating that the groundwater of the study region is slightly acidic to near neutral. EC is a basic water quality parameter that reflects the total dissolved ions in the groundwater samples; EC values ranged from 311.4 to 3891 μS cm−1. The wide variation in EC may be due to local variations in hydrogeology. Bicarbonate, chloride, and sulfate were generally below the BIS acceptable limits (BIS 2012). Apart from bicarbonate, the major anions in the groundwater were chloride (3rd quartile = 88.9 mg l−1), followed by sulfate (3rd quartile = 47.3 mg l−1), nitrate (3rd quartile = 30.1 mg l−1), and phosphate (3rd quartile = 0.5 mg l−1). The high chloride concentrations reflect its conservative behavior in the groundwater system: chloride is not lost through physical or chemical processes such as sorption and precipitation, so it reaches higher concentrations than other water quality attributes (Gascoyne 1989). The shallow groundwater level (< 200 ft) also favors the calcium-magnesium-sulfate-bicarbonate-type groundwater of this region.

Table 2 Descriptive statistics of groundwater samples across middle Gangetic plains of Gaya district

Nitrate concentrations ranged from 0.7 to 54.9 mg l−1 with a mean of 19.9 mg l−1; 12% of the groundwater samples exceeded the BIS acceptable limit (45 mg l−1), and 5% exceeded the WHO acceptable limit (50 mg l−1). Threshold concentrations of 1 mg l−1 and 3 mg l−1 were also used to indicate possible anthropogenic contamination of groundwater from surface sources (Madison 1984; Nolan and Hitt 2006). By these thresholds, 99% and 94.87% of the samples exceeded 1 mg l−1 and 3 mg l−1, respectively, indicating the onset of nitrate contamination from anthropogenic activities. Gaya is an important grain-producing area of South Bihar, where extensive agriculture is the primary land use and groundwater is the main source of irrigation water. Anthropogenic activities such as fertilizer and manure application and nutrient-rich irrigation water may be responsible for the elevated nitrate in this region. Furthermore, the high chloride content of the groundwater (10% of samples exceeded the BIS acceptable limit) also points to anthropogenic sources of contamination.
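The exceedance percentages above are simple counts over the samples. A minimal sketch of the calculation, using the thresholds named in the text (1, 3, 45, and 50 mg l−1), is shown below; the example concentrations are made up for illustration and are not the study's data. Whether a value exactly at a threshold counts as an exceedance is a convention; the sketch assumes a strict “greater than”.

```python
import numpy as np

# Thresholds named in the text (mg l-1)
THRESHOLDS = {"1 mg/l": 1.0, "3 mg/l": 3.0, "BIS 45 mg/l": 45.0, "WHO 50 mg/l": 50.0}


def exceedance_percent(no3, thresholds=THRESHOLDS):
    """Percentage of samples whose nitrate is strictly above each threshold."""
    no3 = np.asarray(no3, dtype=float)
    return {name: 100.0 * np.mean(no3 > limit) for name, limit in thresholds.items()}


# Made-up concentrations for illustration, not the study's 156 samples
example = [0.7, 2.5, 12.0, 19.9, 30.1, 46.0, 51.2, 54.9]
pct = exceedance_percent(example)
```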

Figure 3 shows the Spearman’s correlation coefficients between the water quality attributes of the groundwater samples. The histograms and bell-shaped density curves show that pH, calcium, and magnesium were approximately normally distributed. In contrast, the nitrate and phosphate data were right-skewed, indicating the presence of anomalous values in the dataset, and other attributes such as EC, chloride, sulfate, and magnesium were strongly right-skewed, approximately following a lognormal distribution (Fig. 3). Nitrate (\( {\mathrm{NO}}_3^{-} \)) was poorly correlated (r < 0.50) with the other water quality attributes. The correlation of nitrate with chloride suggests the onset of nitrate contamination through anthropogenic inputs such as fertilizer application, percolation of domestic sewage, or nutrient-rich irrigation water (Kim et al. 2015). The insignificant correlation (r < 0.18) of nitrate with sulfate might be due to sulfide oxidation. Sulfate showed weak but significant pairwise correlations with calcium and magnesium (p < 0.05; r = 0.39 and 0.23, respectively). These reflect either slow dissolution of calcium-bearing minerals such as calcite and dolomite or congruent dissolution of magnesium-bearing silicate and carbonate minerals as a result of water–rock interaction in the study area (Kumar et al. 2006). Thus, the elevated nitrate in groundwater is mainly controlled by external sources.
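Spearman’s coefficient is Pearson’s correlation computed on ranks, and its significance for n samples can be judged approximately with t = r·sqrt((n − 2)/(1 − r²)) on n − 2 degrees of freedom. The NumPy-only sketch below assigns tied values their average rank; where SciPy is available, `scipy.stats.spearmanr` returns the same coefficient together with a p-value. With n = 156, r = 0.23 gives t ≈ 2.9, above the two-sided 5% critical value of roughly 1.98, which is how a weak correlation can still be significant.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                   # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def spearman_t(r, n):
    """Approximate t statistic for correlation r from n samples (n - 2 df)."""
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))
```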

Source identification of nitrate in groundwater

The cumulative frequency curve (CFC) for nitrate concentration is shown in Fig. 4. Cumulative frequency is the running total of frequencies (the number of times an event occurs within a given scenario). The CFC shows two inflection points, at around 20 and 35 mg l−1, indicating that different factors control the anomalous behavior of nitrate concentrations in the groundwater system of this region (Matschullat et al. 2000). Factor analysis was used to understand the underlying but unobservable relationships among the attributes that drive nitrate contamination. The analysis was performed in the free, open-source R software with the “psych” package (Revelle 2011).
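The cumulative frequency curve is the empirical cumulative distribution function (ECDF) of the nitrate concentrations. A minimal NumPy sketch is shown below (the function name is hypothetical); inflection points such as those reported near 20 and 35 mg l−1 are read from a plot of this curve rather than computed by it.

```python
import numpy as np


def ecdf(values):
    """Sorted values and their cumulative relative frequency (rank / n)."""
    x = np.sort(np.asarray(values, dtype=float))
    cum_frac = np.arange(1, len(x) + 1) / len(x)   # running total of frequencies / n
    return x, cum_frac


# Illustrative concentrations (mg l-1), not the study's data
x, F = ecdf([19.9, 0.7, 54.9, 30.1])
```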

A scree plot was drawn to determine the number of factors (Fig. 5). The blue line represents the actual data, and the two red lines (on top of each other) represent the simulated and resampled data estimated by parallel analysis (Revelle 2011). The scree plot suggests that two to four factors would be a good choice for determining the factor loadings. This study selected four factors, which together explain about 70% of the total variance of the dataset (Table 3). An oblique rotation was preferred for obtaining the factor loadings because the factors were assumed to be somewhat correlated, as observed in the Spearman’s correlations. The extraction method derives solutions through iterative eigenvalue decomposition without assuming a normal distribution, which matters because, as noted earlier, only pH, calcium, and magnesium among the geochemical attributes were approximately normally distributed.
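The parallel-analysis rule behind the scree plot can be sketched as follows: retain the leading components whose observed correlation-matrix eigenvalues exceed the mean eigenvalues from random normal data of the same size. This is Horn’s PCA-based version, a simplification of the psych routine (which also reports factor-analysis eigenvalues and resampled data), so the retained count can differ; the number of simulations, seed, and synthetic data are hypothetical.

```python
import numpy as np


def parallel_analysis(X, n_sim=100, seed=0):
    """Horn's parallel analysis on correlation-matrix eigenvalues.

    Returns (number to retain, observed eigenvalues, mean random eigenvalues).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, p))
    for s in range(n_sim):
        noise = rng.standard_normal((n, p))          # uncorrelated data, same shape
        sims[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    random_mean = sims.mean(axis=0)
    # Count leading eigenvalues that beat chance, stopping at the first that does not
    above = observed > random_mean
    n_keep = p if above.all() else int(np.argmin(above))
    return n_keep, observed, random_mean


# Synthetic data with two latent factors, each loading on four of eight variables
rng = np.random.default_rng(1)
f = rng.standard_normal((156, 2))
load = np.zeros((2, 8))
load[0, :4] = 0.9
load[1, 4:] = 0.9
X = f @ load + 0.4 * rng.standard_normal((156, 8))
n_keep, observed, random_mean = parallel_analysis(X)
```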

Table 3 Total variance explained based on selected factors

Factor 1 explains 24% of the total variance in the dataset, with significant contributions from EC followed by chloride, sulfate, and phosphate (Table 4). Therefore, factor 1 can be considered a “mixed factor” reflecting both natural and anthropogenic influences. Factor 2 explains 18% of the total variance and is primarily contributed by pH and bicarbonate; because these two attributes are associated with the redox environment of the groundwater system, factor 2 can be considered a redox factor. Factor 2 is dominated by \( {\mathrm{HCO}}_3^{-} \) (pH < 8.5). Factor 3 explains 15% of the total variance, with contributions from magnesium followed by sulfate and chloride. The positive loadings of sulfate and chloride suggest a common origin, likely domestic and industrial sewage discharge. Factor 4 shows a significant association of nitrate with calcium and can be considered the “nitrate” factor; it explains 13% of the total variance of the geochemical data. The association of nitrate with chloride and calcium indicates that nitrate is controlled by anthropogenic inputs. This finding is supported by the factor correlation matrix, in which factors 1 and 4 were significantly correlated (Table 5).

Table 4 Factor loading matrix of four factors of groundwater dataset
Table 5 Factor correlation matrix

To complement the factor analysis, SPCA was applied using the “elasticnet” package (Friedman et al. 2009) in the open-source R software. Four principal components were selected based on the scree plot (Fig. 5). The first principal component (PC) was dominated by EC followed by sulfate. Therefore, similar to factor 1 of the factor analysis, the first PC represents a “salinity factor” that also indirectly reflects the total ions present in the groundwater samples. The second and third PCs contributed little to explaining the underlying processes, consistent with the factor analysis. The fourth PC represents the “nitrate” factor and the water quality attributes responsible for elevated nitrate in the groundwater system. Nitrate loaded most strongly together with calcium, followed by chloride and sulfate. The positive loadings of nitrate and chloride suggest that nitrate shares its sources with chloride, which may be domestic and industrial sewage. Furthermore, the association of nitrate with sulfate suggests that fertilizer application contributes to the elevated nitrate in the groundwater system (Cao et al. 2016). All of these are external sources, indicating that anthropogenic activity is the likely cause of the elevated nitrate.

Table 6 SPCA loading matrix of four components in the groundwater dataset

To examine the variability of groundwater samples across different regions of the study area, the scores of factors 1 and 4 were interpolated using the inverse distance weighted (IDW) method (Fig. 6). Samples in the NE, middle, west, and SW parts of the study area with factor 1 scores > 0 indicate high salinity and anthropogenic pollution contributed by chloride, sulfate, and phosphate (Fig. 6a). Samples in the N to SE and the middle of the SW parts of the study area with factor 4 scores > 0 indicate nitrate enrichment, which may be due to the association of chloride, sulfate, and calcium as a result of anthropogenic inputs (Fig. 6b). A large portion of the study area is covered by recent alluvium enriched with clay and silt (Fig. 2), which limits the connectivity between the unsaturated soil zone and the atmosphere. This condition favors microbial reduction of nitrate to nitrite and its leaching into groundwater, resulting in elevated nitrite concentrations in groundwater.
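IDW estimates the value at an unsampled location as a weighted mean of the sampled values, with weights proportional to 1/d^p. The NumPy sketch below assumes planar coordinates, a power p = 2 (a common default; the power used for the maps is not stated in the text), and uses all samples rather than a search neighborhood. The function name is hypothetical.

```python
import numpy as np


def idw(sample_xy, values, target_xy, power=2.0):
    """Inverse distance weighted estimates at target points (planar coordinates)."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    values = np.asarray(values, dtype=float)
    target_xy = np.atleast_2d(np.asarray(target_xy, dtype=float))
    # Distance from every target (rows) to every sample (columns)
    d = np.linalg.norm(target_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    out = np.empty(len(target_xy))
    for t in range(len(target_xy)):
        exact = d[t] == 0.0
        if exact.any():                      # target coincides with a sample
            out[t] = values[exact][0]
        else:
            w = 1.0 / d[t] ** power
            out[t] = np.sum(w * values) / np.sum(w)
    return out


# Two samples; estimate at the midpoint, at a sample, and nearer the first sample
est = idw([[0, 0], [2, 0]], [10.0, 20.0], [[1, 0], [0, 0], [0.5, 0]])
```

Unlike kriging, IDW provides no uncertainty estimate, which is one reason the study used EBK for the nitrate surface itself.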

Fig. 2 Hydrogeological map of Gaya district (source: CGWB 2013)

Fig. 3 Spearman’s correlation coefficients between measured parameters of groundwater samples

Fig. 4 The empirical cumulative frequency plot for nitrate

Fig. 5 Scree plots (selection of the number of factors based on eigenvalues)

Fig. 6 Spatial distribution of (a) factor 1 and (b) factor 4 scores

Fig. 7 Spatial prediction of nitrate using EBK modeling

Fig. 8 Spatial distribution map of total hazard index of nitrate in groundwater of Gaya district using IDW interpolation

Modeling of nitrate using empirical Bayesian kriging

A semivariogram-based model was used to estimate nitrate (\( {\mathrm{NO}}_3^{-} \)) concentrations at unsampled locations. Different EBK models were tested to identify the best-fitting semivariogram for the nitrate dataset. There is no fixed rule for selecting the best model for a dataset; selection is a trade-off, and the model with the lowest root mean square error (RMSE) is preferred. The exponential semivariogram gave the best fit for the nitrate data with a subset size of 70, an overlap factor of 1.5, a major semi-axis of 0.123, and 500 simulations. The final model was selected on the basis of the lowest RMSE (Table 7). The nitrate distribution map was then plotted to locate areas of elevated nitrate concentration in the study area.
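For reference, one common form of the exponential semivariogram (nugget \( c_0 \), partial sill \( c \), range parameter \( a \)) and the cross-validation RMSE used to rank candidate models are given below. This parameterization is illustrative: some software scales the range differently (e.g., using \( 3h/a \) so that \( a \) is the practical range), so the symbols need not match the EBK implementation used here. The RMSE assumes a leave-one-out scheme, where \( \hat{z}_{-i}(x_i) \) is the prediction at \( x_i \) made without sample \( i \).

$$ \gamma (h)={c}_0+c\left[1-\exp \left(-\frac{h}{a}\right)\right],\kern1em h>0,\kern1em \gamma (0)=0 $$

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^n{\left[{\hat{z}}_{-i}\left({x}_i\right)-z\left({x}_i\right)\right]}^2} $$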

Table 7 Results of cross-validation for evaluating the performance of six EBK models for nitrate datasets

The nitrate distribution map shows the highest concentrations in the mid-north portion of the study region (Fig. 7). The shallow groundwater and younger alluvial soil of this area favor nitrate contamination of groundwater through interaction with surface water. The raster map shows that the predicted nitrate concentrations are well below the acceptable limits in this study region (BIS 2012; WHO 2011).

Human health risk assessment

Health risks via two pathways (oral and dermal) were assessed for three age groups. Hazard quotients for oral and dermal exposure to nitrate in groundwater were calculated using Eqs. (2) and (5). HQoral ranged from 0.037 to 3.08 for infants, 0.02 to 1.90 for children, and 0.0136 to 1.1476 for adults. HQdermal ranged from 2 × 10−4 to 1.8 × 10−2 for infants, 1.6 × 10−4 to 1.3 × 10−2 for children, and 1.3 × 10−4 to 1.1 × 10−2 for adults. Table 8 shows the descriptive statistics of HI for the different age groups (infants, children, and adults). If HI is less than 1, no adverse health effects are expected from nitrate exposure. HIinfants, HIchildren, and HIadults ranged from 0.04 to 3.10, 0.02 to 1.92, and 0.01 to 1.16, respectively. The mean HI for infants slightly exceeded 1, whereas the mean values for children and adults were well below 1. This investigation shows that \( {\mathrm{NO}}_3^{-} \) contributes to health risk for infants and needs to be examined further. Infants and children are usually at higher health risk from contaminants than adults, as widely reported by Wu and Sun (2016), Li et al. (2018), Li et al. (2014), and Adimalla et al. (2018a).

Table 8 Descriptive statistics of risk assessment in terms of total hazard quotient

To understand the spatial variability of risk (in terms of HI) for the different groups, the IDW interpolation method was used (Fig. 8). Residents in the northwest and southwest regions face health risk. At the same level of pollution in drinking water, infants were the most vulnerable group, followed by children. Adults were mostly safe across the study area except in a few pockets with elevated values. These findings can help the population of the region take precautions regarding fertilizer use and improve groundwater quality. In addition, water resource management at the local scale is needed to improve groundwater quality by preventing nitrate leaching into the groundwater system.

Conclusions

  • This investigation reveals that the groundwater of this region is slightly acidic. The dominant ions in the groundwater are \( {{\mathrm{HCO}}_3}^{-}>{\mathrm{Cl}}^{-}>{{\mathrm{SO}}_4}^{2^{-}}>{{\mathrm{PO}}_4}^{3^{-}} \) and Ca2+ > Mg2+, indicating that the samples fall within a weakly mineralized zone.

  • The measured nitrate concentration in 95% of the water samples from the middle Gangetic plains is below the acceptable limit of WHO (2008), whereas 88% of samples were below the BIS acceptable limit.

  • Factor analysis identified several underlying processes that control the nitrate concentration in groundwater. Two of the identified factors were used to interpret nitrate behavior. Factor 1, the mixed factor, indicated external sources of ions influencing regional hydrochemistry. The nitrate factor (factor 4) showed an association of nitrate with chloride and calcium, indicating that nitrate concentrations are controlled by anthropogenic inputs.

  • The groundwater of the region was of Ca2+–Cl− (\( {\mathrm{NO}}_3^{-} \) + \( {{\mathrm{SO}}_4}^{2^{-}} \)) type, with elevated levels of \( {\mathrm{NO}}_3^{-} \).

  • EBK modeling shows that nitrate concentrations in this region varied from 9.07 to 25.91 mg l−1, well within the acceptable levels of BIS and WHO.

  • The health risk assessment for nitrate in drinking water shows that the risk was highest for infants, followed by children and adults. Drinking water quality management at the local scale should therefore place particular emphasis on protecting infants.