Introduction

Water scarcity has emerged in many regions of the world as the demand for water has increased over time. In rural India, groundwater provides 80 percent of the domestic water (Malik & Bhagwat, 2021; Soujanya et al., 2020; Aravindan & Shankar, 2011; Foster & Chilton, 2003). About 30 to 84 percent of the total groundwater consumed in various states of India is brackish (Krishan et al., 2020a; Prusty & Farooq, 2020). Yao and Lund (2021) pointed out that groundwater salinization is increasing globally and poses a major challenge to both water and agricultural management.

Salinity is considered a major natural source of groundwater pollution (Heydarirad et al., 2019). Groundwater salinity is controlled by many factors such as rainfall, the composition of aquifer material, topography, hydrologic fluctuation, and climate, and it varies spatially and temporally as a result of these intricate interactions (Mohapatra et al., 2011; Venkateswaran et al., 2012; Kawo & Karuppannan, 2018; Panneerselvam et al., 2020b; Li et al., 2020; Haji et al., 2021a, b; Muthusamy et al., 2022). The aquifer’s variability is related to a variety of biological, physical, and chemical processes, including weathering, aerobic respiration, mineral dissolution and precipitation, cation exchange, and anthropogenic inputs (Aravinthasamy et al., 2021; Panneerselvam et al., 2020a; Mohapatra et al., 2011; Panneerselvam et al., 2020c, d, 2021, 2022; Ravi et al., 2020). Thus, to identify the main causes of increased groundwater pollution, different source apportionment techniques are employed, such as chemical characterization (Panneerselvam et al., 2020b; Li et al., 2022; Liu et al., 2019; Ravi et al., 2020), tracers (Haji et al., 2021a, b; Krishan et al., 2021a, 2022; Panda et al., 2022), statistical techniques (Panneerselvam et al., 2020a; El-Kholy et al., 2022), and modeling. Techniques like tracers and modeling can only be used successfully for contaminant plume detection when the source is already known and the selected tracer changes its characteristics in accordance with the changing nature of the pollutant. Many researchers have clustered areas showing similar pollution on the basis of chemical characterization (El-Kholy et al., 2022; Krishan et al., 2021b), but this clustering does not reveal the dominant factor responsible for the existing physicochemical characteristics of groundwater.

To give a quantifiable measure of the correspondence between water quality metrics and to identify the underlying human and natural processes in groundwater, multivariate statistical techniques such as principal component analysis (PCA) have been utilized (Bonelli & Manni, 2019; Chandrasekar et al., 2021; Elemile et al., 2021; Karunanidhi et al., 2021; Panneerselvam et al., 2020c). Principal component analysis identifies the key features or parameters that determine the data structure and groups observations based on sample similarities. Multivariate analysis approaches such as PCA have proven effective for identifying significant water quality concerns and potential contamination sources/processes by evaluating and interpreting groundwater quality data sets (Kavitha et al., 2019a, b; Rao et al., 2010; Shankar & Kawo, 2019; Singh et al., 2004, 2005, 2009).

Mewat (Nuh) district is one of the seven districts of Haryana, India, that register very high groundwater salinity. Most of the groundwater wells in the study area show a high TDS of 35,000 mg/l (Krishan, 2019; Krishan et al., 2020c), making the water unfit for drinking and agricultural uses. Therefore, to ensure sustainable development, it becomes essential to understand the relationship between the physicochemical parameters and to identify the regional processes which determine the groundwater quality. Though various methods for source identification like chemical characterization (Krishan et al., 2021b) and isotopic analysis (Krishan et al., 2020b) have been applied in the study area, no attempt has been made to understand the dominant phenomena of salinization through a statistical method like PCA. Thus, the present study applies PCA to the physicochemical parameters of groundwater to extract the key mechanisms influencing groundwater quality and salinity control in Mewat. Identification of the salinity sources in the study area will be the first step towards achieving SDG-6, improving water quality, and reducing pollution.

Study area

Location, climate, and geomorphology

The Mewat region of Haryana state is located between latitudes 27°40′00′′N and 28°20′00′′N and longitudes 76°50′00′′E and 77°20′00′′E (Fig. 1) and has a population of 1,089,263. Firozpur Jhirka, Nuh, Nagina, Tauru, and Punahana are the five blocks that cover this region (Krishan et al., 2021a, b). The daily maximum temperature in the study region is 40 °C (in May and June) and the daily minimum temperature is 5.1 °C (in January).

Fig. 1 Study area showing sampling locations

The elevation of the study area ranges from 100 m to ~ 550 m (Fig. 2a). Higher elevations are found in the Aravalli ranges. Two major soil types are found in the study area, vertisols and solonchaks. These soil types generally have medium-textured loamy sand. Due to the recurring lack of rainfall during the monsoon season and the scarcity of freshwater resources, farmers are forced to plant crops that require less water, such as wheat, millet, and mustard. The district has a rolling topography with an urn-shaped structure. The geomorphology of the study area is shown in Fig. 2b. Geologically, the study area is mainly made up of alluvium of the Quaternary and Paleoproterozoic age groups. The most dominant is the Quaternary alluvium, with a polycyclic sequence composed of sand, silt, and clay with kankar. The alluvium thickness varies from 90 to 300 m bgl, and the depth to water level varies from 2 to 32 m bgl. The major lithology is quartzite, schist, and phyllites of the Alwar and Ajabgarh groups. The weathering of these Proterozoic mobile belts forms the overlying younger Quaternary deposits of the Pleistocene-Holocene period, deposited as aeolian deposits in the form of loose oxidized sand and silt. The district has no natural drains, lakes, or ponds.

Fig. 2 a Digital Elevation Model. b Geomorphology of the Mewat district

Land use and land cover

Mewat district consists of 64 percent agricultural land, 12 percent built-up area (human settlements including the residential sites), about 6 percent fallow land, 1 percent water bodies, and 18 percent hills (Fig. 3). The land use and land cover show that agricultural activities occupy the largest share of the land; since the available water sources do not fulfill the need for irrigation, the study area’s current and future water usage is stressed.

Fig. 3 Landuse/land cover map of the Mewat district

Data collection and analysis

Groundwater samples from each of the 20 selected sites were collected in March–April of 2018 and 2019 from hand pumps, open wells, and bore wells with a depth range of 4–92 m. The water level of the open wells, in meters, was recorded using a water-level indicator. GPS readings were taken to record latitude and longitude. Samples were taken in 125 mL acid-washed Tarson bottles. The samples collected were analyzed for cations (Ca2+, Mg2+, Na+, and K+) and anions (\({\mathrm{HCO}}_{3}^{-}\), \({\mathrm{SO}}_{4}^{2-}\), Cl−, and \({\mathrm{NO}}_{3}^{-}\)) at the water quality laboratory of the groundwater division of NIH Roorkee, as per standard methodology (APHA, 2005).

Electrical conductivity (EC) and pH were measured using a portable handheld Hach HQ30d meter, and total dissolved solids (TDS) were estimated from the measured EC values and expressed in mg/l. Ca2+ and Mg2+ were determined titrimetrically using standard EDTA. Cl− concentrations were determined by the argentometric method, in which potassium dichromate was used as the indicator and silver nitrate solution as the titrant. \({\mathrm{HCO}}_{3}^{-}\) was determined by titration with HCl. Na+ and K+ were measured by flame photometry at wavelengths of 589 nm and 766 nm, respectively, and \({\mathrm{SO}}_{4}^{2-}\) and \({\mathrm{NO}}_{3}^{-}\) by spectrophotometric turbidimetry using a HACH spectrophotometer. All concentrations are expressed in milligrams per liter (mg/l).
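The estimation of TDS from EC can be illustrated with a minimal sketch. The conversion factor used in this study is not stated; the function name `tds_from_ec` and the default factor of 0.64 are assumptions made only for illustration (factors between roughly 0.55 and 0.75 are commonly quoted for natural waters), and the appropriate value for highly saline samples may differ.

```python
def tds_from_ec(ec_us_cm: float, factor: float = 0.64) -> float:
    """Estimate TDS (mg/l) from EC (microsiemens/cm) by a linear factor.

    The default factor is an assumed, commonly quoted value and is NOT
    taken from the study; it should be calibrated for the water type.
    """
    if ec_us_cm < 0:
        raise ValueError("EC cannot be negative")
    return factor * ec_us_cm


# Hypothetical reading of 1500 microsiemens/cm
print(tds_from_ec(1500.0))
```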

PCA analysis

Principal component analysis (PCA) is used to create new variables called principal components, which illustrate the relationships among different chemical variables. The new variables are linear combinations of the original ones. In general, two or three principal components are utilized, which explain a reasonable percentage of the variance based on the eigenvalues. Before applying PCA, however, it is important to determine whether the selected sampling points are adequate for evaluating the underlying relationships. Thus, the Kaiser–Meyer–Olkin (KMO) test is performed. A KMO value of more than 0.5 indicates that the sample is adequate for factor analysis. Furthermore, to ensure that the F-test does not remain liberal in determining the relationship, Bartlett’s test of sphericity is performed. It tests the null hypothesis that the correlation matrix is an identity matrix, i.e., that the variables are uncorrelated; if the significance value is less than 0.05, the null hypothesis is rejected and PCA is valuable for the data. Rejection of the null hypothesis indicates that there are significant relationships among the variables.
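For reference, the two adequacy tests can be written in their usual textbook form. The expressions below are a sketch of the standard definitions and are not taken from this study, which reports only the SPSS output. Here \(r_{ij}\) denotes the correlation coefficient between variables \(i\) and \(j\), \(a_{ij}\) the corresponding partial correlation coefficient, \(\mathbf{R}\) the correlation matrix, \(n\) the number of samples, and \(p\) the number of variables.

$$\mathrm{KMO}=\frac{\sum_{i\neq j}r_{ij}^{2}}{\sum_{i\neq j}r_{ij}^{2}+\sum_{i\neq j}a_{ij}^{2}}$$

$$\chi^{2}=-\left(n-1-\frac{2p+5}{6}\right)\ln\left|\mathbf{R}\right|,\qquad \mathrm{df}=\frac{p(p-1)}{2}$$

KMO approaches 1 when the partial correlations are small relative to the simple correlations. Under these definitions, a data set with \(p=12\) variables gives \(\mathrm{df}=66\) for Bartlett’s test.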

In the present study, three principal components were utilized for each year, and the selection was based on eigenvalues greater than unity (Kaiser, 1958). Varimax rotation was applied to obtain score plots. The principal component scores describe the intensity of the chemical processes in relation to each principal component. Lithologically and anthropogenically controlled variables were identified from the principal component scores, and the influence of these factors at the sampling locations was then examined through two sets of plots: (a) scatter plots of the principal component loadings and (b) scatter plots of the principal component scores obtained through varimax rotation.

Results and discussion

General

Twenty samples were collected from different sites in the study area in March–April of 2018 and 2019 and analyzed according to the prescribed methods, and the resulting data set was used to conduct PCA using SPSS.

The pH varies from 6.8 to 8.3 in 2018 and from 6.98 to 8.7 in 2019, indicating increased groundwater alkalinity. The concentration of TDS is in the range of 291 mg/l to 20,904 mg/l for the year 2018 and 52.93 mg/l to 13,393 mg/l for the year 2019. Among the cations (Ca2+, Mg2+, Na+, and K+), Na+ (2.4 mg/l to 10,286 mg/l for 2018 and 31.68 mg/l to 11,635 mg/l for 2019) is dominant. It is followed by Ca2+ (64 mg/l to 1340 mg/l) and Mg2+ (6 mg/l to 878 mg/l) for the year 2018, whereas the order of Ca2+ and Mg2+ is reversed for the year 2019, with Mg2+ (23 mg/l to 1030 mg/l) followed by Ca2+ (80 mg/l to 1280 mg/l). Among the anions (\({\mathrm{HCO}}_{3}^{-}\), \({\mathrm{SO}}_{4}^{2-}\), Cl−, and \({\mathrm{NO}}_{3}^{-}\)), chloride (135 mg/l to 8041 mg/l for 2018 and 333 mg/l to 17,042 mg/l for 2019) predominates in both years, followed by \({\mathrm{HCO}}_{3}^{-}\) (34 mg/l to 540 mg/l for 2018 and 109 mg/l to 437 mg/l for 2019), \({\mathrm{NO}}_{3}^{-}\) (1 mg/l to 11 mg/l for 2018 and 1 mg/l to 20 mg/l for 2019), and \({\mathrm{SO}}_{4}^{2-}\) (13 mg/l to 450 mg/l for 2018 and 40 mg/l to 4280 mg/l for 2019). Thus, the groundwater is generally characterized as sodium chloride-type water.

KMO and Bartlett’s test

To ensure that the interpretation made through the principal components remains rigorous, two pre-PCA tests, KMO and Bartlett’s, were performed; the results are presented in Table 1.

Table 1 KMO and Bartlett’s test for the samples collected in the year 2018 and 2019

High values of the KMO test (close to 1.0) generally indicate that a PCA is useful for the data. If the value is < 0.50, the results of the PCA are not considered valid (Field, 2013). In 2018, the KMO value is more than 0.50 (KMO = 0.623) (Table 1); therefore, the results of the PCA can be considered valid. For Bartlett’s test, a significance value of less than 0.05 rejects the null hypothesis and indicates that PCA is valuable for the data, which is the case for 2018 (significance = 0.00 < 0.05). Similarly, for March–April of 2019, the KMO value and the significance of Bartlett’s test of sphericity were 0.528 and 0.000, respectively (Table 1). Therefore, in both years, the data can be considered valid for PCA.

Table 2 Varimax orthogonal rotated factor loadings from PCA of standardized water quality data set 2018

Principal component analysis

Tables 2 and 3 present the eigenvalues and percentage variances from which the PCA solution is obtained for the samples of 2018 and 2019, respectively. The final loading matrices for 2018 and 2019, based on 12 chemical variables, indicate three principal components which explain 79.58% and 85.08% of the total variation, respectively. Principal component-I (F1) contributed 57.299% in 2018 and 61.7% in 2019; principal component-II (F2) contributed 13.11% and 13.63% for 2018 and 2019, respectively; and principal component-III (F3) contributed 9.17% in 2018 and 9.75% in 2019. Thus, by examining the loadings, each principal component can be used to identify the specific hydrogeochemical processes and the processes which undergo temporal variations.
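The reported percentages can be cross-checked with a short arithmetic sketch. It assumes that each percentage is a share of the total variance of 12 standardized variables, so that the variance carried by a component equals its percentage times 12/100; the helper names are hypothetical, and because the values come from rotated loadings they may differ slightly from the unrotated eigenvalues.

```python
N_VARIABLES = 12  # number of chemical variables stated in the text

# Percentage of variance per component as reported in the text
variance_pct = {
    2018: [57.299, 13.11, 9.17],
    2019: [61.7, 13.63, 9.75],
}


def cumulative_pct(pcts):
    """Total variance explained by the retained components (percent)."""
    return round(sum(pcts), 2)


def implied_variances(pcts, n_vars=N_VARIABLES):
    """Variance carried by each component, assuming the total equals n_vars."""
    return [round(p / 100.0 * n_vars, 3) for p in pcts]


for year, pcts in variance_pct.items():
    values = implied_variances(pcts)
    # Kaiser criterion: retain components whose variance exceeds unity
    print(year, cumulative_pct(pcts), values, all(v > 1.0 for v in values))
```

Under this assumption, the three components of each year sum to the stated totals and each carries a variance greater than unity, consistent with the Kaiser criterion used for selection.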

Table 3 Varimax orthogonal rotated factor loadings from PCA of standardized water quality data set 2019

Principal component loadings

To examine the variation in parameter loadings from positive to negative, scatter plots of the principal component loadings for PC-1 vs PC-2, PC-1 vs PC-3, and PC-2 vs PC-3 are shown for the year 2018 in Fig. 4 and for 2019 in Fig. 5.

Fig. 4 Scatter plot of principal component loadings for the year 2018 a PC-1 vs PC-2, b PC-1 vs PC-3, and c PC-2 vs PC-3

Fig. 5 Scatter plot of principal component loadings for the year 2019 a PC-1 vs PC-2, b PC-1 vs PC-3, and c PC-2 vs PC-3

Table 4. Distribution of the groundwater samples in the positive principal component score for the years 2018 and 2019

Principal component-1

For the year 2018, PC-1 is dominated by TDS, Na+, Cl−, \({\mathrm{SO}}_{4}^{2-}\), and K+, whereas for 2019, PC-1 is dominated by TDS, Na+, Cl−, \({\mathrm{SO}}_{4}^{2-}\), Mg2+, TH, and K+. The dominance of these variables is also evident from their close plotting in the positive region of PC-1, as shown in Figs. 4(a, b) and 5(a, b), respectively. Thus, PC-1 is considered a salinity factor. For the year 2018, the salinity is directly controlled by the Na+ and Cl− concentrations, whereas for the year 2019, salinity is directly controlled by the concentrations of Na+, Mg2+, Cl−, and \({\mathrm{SO}}_{4}^{2-}\). The geology of the region is dominated by silicate-bearing rocks (feldspathic, gritty quartzite, amphibole, phyllite, and schist) but lacks sodalite and chlorapatite minerals, which are the main source of Cl− and \({\mathrm{SO}}_{4}^{2-}\). Thus, Na+, Cl−, and \({\mathrm{SO}}_{4}^{2-}\) are contributed by anthropogenic sources (Krishan et al., 2021a), whereas the Mg2+ controlling PC-1 in 2019 is contributed by rock-water interaction. The change in controlling variables between the two years can be associated with changing rainfall patterns. The year 2017 was a monsoon-deficit year, whereas 2018 was a year of normal rainfall. Since sampling in both years was performed in the pre-monsoon season, the samples carry an impression of the previous year’s rainfall, which is reflected in the extent of rock-water interactions.

Furthermore, TDS (salinity) shows a close positive relationship with Na+, Cl−, \({\mathrm{SO}}_{4}^{2-}\), and K+ in the year 2018 and with Na+, Cl−, \({\mathrm{SO}}_{4}^{2-}\), Mg2+, TH, and K+ in the year 2019; thus, the variation in TDS of the groundwater samples can be used to evaluate the temporal and regional influence of the salinity factor and its modification. Out of the 20 samples taken in each year, 14 had TDS > 1000 mg/l in 2018 and 15 had TDS > 1000 mg/l in 2019. Four samples showing a freshwater environment (TDS < 1000 mg/l) in both years were taken from open wells located at Kotla, Bhoond, Ghata Basai, and Raja ka pul. The sample which shows a transition was taken from a bore well located at Khedi Khurd; thus, it is the rock-water interaction that resulted in increased salinity. Nine out of 14 wells in 2018 and 12 out of 15 wells in 2019 have TDS > 2000 mg/l. Thus, the three wells which are vulnerable to salinity are Naharika, Kalakheda, and Ghagas village. Six samples in 2018 and eight samples in 2019 are brackish (TDS > 3500 mg/l). These brackish samples were taken from points located in the Delhi-Alwar subgroup. Since 2018 is a deficit year causing an accelerated rate of groundwater extraction, deep groundwater carries an impression of rock-water interaction, which is the leading cause of salinity in these regions. Thus, the two samples which are vulnerable to salinity on account of groundwater extraction are Ghagas SF and Naharika. Overall, rock-water interactions and agricultural return flow govern PC-1.
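The threshold counts above can be reproduced with a minimal sketch. The limits of 1000, 2000, and 3500 mg/l are those quoted in the text; the function names, the label "intermediate" for values between the fresh and brackish limits, and the sample values are assumptions for illustration only (the smallest and largest values echo the reported 2018 range, the rest are invented).

```python
FRESH_LIMIT = 1000     # mg/l, below this the text calls the water fresh
BRACKISH_LIMIT = 3500  # mg/l, above this the text calls the water brackish


def classify_tds(tds_mg_l):
    """Label a sample using the two limits quoted in the text."""
    if tds_mg_l < FRESH_LIMIT:
        return "fresh"
    if tds_mg_l > BRACKISH_LIMIT:
        return "brackish"
    return "intermediate"  # assumed label, not used in the text


def count_above(tds_values, threshold):
    """Number of samples whose TDS exceeds the threshold."""
    return sum(1 for v in tds_values if v > threshold)


# Hypothetical TDS values in mg/l, NOT the measured data set
hypothetical_tds = [291, 850, 1500, 2600, 4200, 20904]

for limit in (1000, 2000, 3500):
    print(limit, count_above(hypothetical_tds, limit))
print([classify_tds(v) for v in hypothetical_tds])
```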

Principal component-2

For both years, the second principal component has higher positive loadings for pH and \({\mathrm{HCO}}_{3}^{-}\), as shown in Figs. 4(b, c) and 5(b, c), whereas Ca2+ plots in the negative region, indicating that these variables are governed by the same process. PC-2 represents the process of alkalinity, in which CO2 dissolution plays an important role.

Only 3 samples had pH > 8 in the year 2018, whereas 8 samples had pH > 8 in the year 2019. A total of 16 out of 20 samples had a pH between 7 and 8 in 2018, whereas 8 out of 20 samples had a pH between 7 and 8 in 2019. Only one sample, taken from Kansali, has a pH of 6.8. The increase in pH and bicarbonate in 2019 over 2018 can be associated with the rainfall pattern. There was normal rainfall in 2018 compared with the deficit rainfall of 2017, which translates into the recharge reflected in the 2019 and 2018 samples, respectively. The decay of organic matter and the respiration of roots in the soil enhance the CO2 pressure; the CO2 combines with the recharging water and forms bicarbonate and excess hydrogen ions, which elevates the pH as shown in Eqs. (1) and (2).

$${\mathrm{CO}}_2+{\mathrm H}_2\mathrm O\rightarrow{\mathrm H}_2{\mathrm{CO}}_3\tag{1}$$

$${\mathrm H}_2{\mathrm{CO}}_3\rightarrow\mathrm H^++\mathrm{HCO}_3^-\tag{2}$$

Furthermore, the aquifer study reveals that Mewat is associated with clay and kankar (CaCO3). The leaching of CaCO3 from kankar increases the Ca2+ and \({\mathrm{HCO}}_{3}^{-}\) concentrations and increases the pH, whereas the precipitation of CaCO3 decreases both Ca2+ and \({\mathrm{HCO}}_{3}^{-}\) and consequently decreases the pH. Thus, aquifer properties are an important controlling factor for PC-2, which is therefore termed a lithologically controlled factor.
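The leaching and precipitation of kankar described above correspond to the standard carbonate equilibrium. It is written here only as an illustrative sketch; the reaction is not given in the original text and is left unnumbered:

$${\mathrm{CaCO}}_3+{\mathrm{CO}}_2+{\mathrm H}_2\mathrm O\rightleftharpoons\mathrm{Ca}^{2+}+2\,\mathrm{HCO}_3^-$$

The forward direction represents dissolution of CaCO3 by CO2-charged recharge water, which adds Ca2+ and \({\mathrm{HCO}}_{3}^{-}\), while the reverse direction represents precipitation, which removes both ions.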

Principal component-3

The third principal component has a loading of 0.61 and 0.70 for nitrate for the years 2018 and 2019, respectively, as shown in Figs. 4(c) and 5(c). Thus, nitrate acts as an independent variable representing the pollution seeping into the groundwater. In 2018, 3 samples registered a \({\mathrm{NO}}_{3}^{-}\) concentration > 10 mg/l, whereas only one sample did so in 2019. This indicates that in the case of deficit rainfall, groundwater recharge takes place through drains or other sources contaminated with agricultural or municipal waste. Since there is no source of nitrate pollution other than agricultural return flows or sewage waste, principal component-3 can be termed the non-lithological controlling factor.

Principal component scores

To identify the regional influences responsible for salinity, the principal component scores were plotted. For the year 2018, 6 samples have positive scores in PC-1 and the remaining have negative scores; 11 samples have positive scores in PC-2 and 7 samples in PC-3. For the year 2019, 6 samples have positive scores in PC-1 and the remaining have negative scores; 10 samples have positive scores in PC-2 and 11 samples in PC-3. The plot between PC-1 and PC-2 has 3 positive values for 2018 and 4 positive values for 2019. The plot between PC-2 and PC-3 has 3 positive values for 2018 and 5 positive values for 2019, and the plot between PC-1 and PC-3 has 2 positive values for each year. The distribution of groundwater samples in the positive principal component scores is shown in Figs. 6 and 7 and in Table 4.
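The counting of positive scores can be sketched as follows. The site labels and score values are hypothetical and are not the study data, the function name is illustrative, and a "positive value" in a two-component plot is assumed here to mean a sample that scores positively on both components.

```python
from itertools import combinations


def positive_counts(scores):
    """Count samples with positive scores per component and per component pair.

    scores: dict mapping a site label to a tuple of component scores.
    Returns (list of per-component counts, dict of per-pair counts).
    """
    n_components = len(next(iter(scores.values())))
    single = [
        sum(1 for s in scores.values() if s[k] > 0)
        for k in range(n_components)
    ]
    pairs = {
        (i, j): sum(1 for s in scores.values() if s[i] > 0 and s[j] > 0)
        for i, j in combinations(range(n_components), 2)
    }
    return single, pairs


# Hypothetical (PC-1, PC-2, PC-3) scores for four made-up sites
example_scores = {
    "site_A": (1.2, -0.3, 0.5),
    "site_B": (-0.4, 0.8, 0.1),
    "site_C": (0.6, 0.9, -0.7),
    "site_D": (-1.1, -0.2, -0.4),
}

single, pairs = positive_counts(example_scores)
print(single)
print(pairs)
```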

Fig. 6 Score plots for principal component loadings for the year 2018 a PC-1 vs PC-2, b PC-1 vs PC-3, and c PC-2 vs PC-3

Fig. 7 Score plots for principal component loadings for the year 2019 a PC-1 vs PC-2, b PC-1 vs PC-3, and c PC-2 vs PC-3

Thus, it can be concluded that rainfall plays a significant role in determining the regional factors which influence the salinity. In 2018, the salinity of Kansali, Jaitko, and Akhleempur was determined by both rock-water interaction and anthropogenic factors, but in 2019, apart from these factors, alkalinity also contributes to increased salinity. The salinity at Ghata Basin and Basai in both years is primarily determined by alkalinity. For Naharika, Kalakheda, and Mohmmad, alkalinity dominates the high TDS value in the year 2018; however, pollution is added to the groundwater of these sites in the year 2019. Kotla and Kameda, freshwater sources in the year 2018, became saline in 2019 because of enhanced groundwater pollution. Dhadoli is the only sampling point that contains fresh water in both years. Pollution is the main cause of salinity at Ghagas SF and Pat Khori.

The groundwater quality at Nangli Bhudli is mainly determined by rock-water interaction; however, alkalinity and pollution become determining factors depending on their relative strength. Doha, which registered fresh groundwater in 2018, suffered from both salinity and alkalinity in 2019 under the influence of deficit rainfall and increased recharge. Bhond experienced both alkalinity and pollution in the year 2018, but its groundwater shows a major improvement in 2019, with all parameters within permissible limits. Khedi Khurd and Raja ka pul come under the influence of alkalinity and pollution. At Naglashahpur, rock-water interaction is the main cause of salinity, but the site also experienced alkalinity in 2018 and pollution in 2019. At Saral, salinity is on account of rock-water interactions, but the site came under the influence of pollution in the year 2018. Thus, principal component analysis indicates the factors determining the groundwater quality at a specified location and helps identify the wells that are vulnerable to contamination.

Conclusion

To assess the groundwater quality of Mewat district, Haryana, 20 samples were collected and analyzed for Cl−, TDS, TH, \({\mathrm{HCO}}_{3}^{-}\), EC, Na+, \({\mathrm{SO}}_{4}^{2-}\), Mg2+, and K+ for the years 2018 and 2019. Based on these analyses, the following conclusions are drawn:

  • 14 out of 20 samples in the year 2018 and 15 out of 20 samples in the year 2019 have TDS > 1000 mg/l. Thus, groundwater salinity and its expansion are evident. To identify the sources of groundwater contamination, principal component analysis is employed.

  • The KMO and Bartlett’s tests were found valid for both years, and PCA is found to be suitable for the study area. Three principal components were selected based on the eigenvalues; they explain 79.58% and 85.08% of the total variation in the years 2018 and 2019, respectively.

  • The first principal component (PC-1) is identified with salinity and is governed by rock-water interactions and agricultural return flow. The second principal component (PC-2) is identified with alkalinity, and the third principal component (PC-3) describes the pollution.

  • In the yearly comparison, the samples collected in 2019 were found to have increased salinity compared to 2018, which shows an increased vulnerability of the Mewat aquifer on account of the decline in rainfall recharge. It was also evident that declining rainfall recharge triggered recharge from other sources, and thus pollution. The impact of pollution is more pronounced in 2019 than in 2018.