Introduction

Groundwater is the primary source of water for a large share of the world's population and underpins the stable socio-economic growth of societies. In recent decades, the sustainability of this essential resource has become a concern in many parts of the world because of overexploitation and increasing demand (Abulibdeh et al., 2021; Sahour et al., 2020). About 60% of the world's population lives in coastal regions, and rapid population growth there leads to excessive use of groundwater to meet rising water requirements (Motevalli et al., 2017). Groundwater quality in coastal zones has been degraded by seawater intrusion (Arslan, 2013; Arslan & Demir, 2013; Heydarirad et al., 2019) and by several other factors, such as anthropogenic activities, breakdown of aquifer materials, pollution and salinity from irrigation, and agricultural operations (Ferchichi et al., 2018; Jayathunga et al., 2020). Identifying the various sources of groundwater pollution is a basic need, because overexploitation and multiple pollution sources seriously degrade the drinking quality of groundwater (Li et al., 2021). Routine monitoring is therefore required to safeguard groundwater quality for drinking. However, analyzing many water quality variables is expensive, time-consuming, and tedious. Moreover, measuring all variables at regular intervals is not always necessary, as it may not provide additional information about water quality (Gulgundi & Shetty, 2018). Troudi et al. (2020) noted that monitoring groundwater quality and identifying the multiple sources of pollution and the origin of the parameters influencing contamination are prerequisites for solving groundwater quality problems.
A systematic and accurate evaluation of groundwater status, together with a reliable forecast of groundwater quality, is necessary to determine the optimal action plan for regional water resources management (Gholami et al., 2016).

As the number of parameters used in groundwater quality analysis has grown, several statistical techniques have been introduced to examine and interpret such data accurately (Taşan et al., 2022), since water quality is commonly expressed in terms of many parameters. Principal component analysis (PCA) can reduce a complex set of multivariate variables to a small, manageable number of components without losing essential detail (Banda & Kumarasamy, 2020). PCA is widely used in data analysis and provides a reliable interpretation of multicomponent measurements, allowing a better understanding of the structure of groundwater quality classes (Tripathi & Singal, 2019). Conventional methods that present groundwater quality data through ordinary charts and graphs may not simultaneously reveal similarities between variables or samples. To overcome this limitation, factor analysis (FA) was introduced as an efficient way to identify such similarities between samples or parameters (Patil et al., 2019). Several studies have used PCA/FA to determine the sources of contamination that are likely to degrade groundwater quality; for example, Bouteraa et al. (2019) and Nguyen et al. (2020) effectively applied PCA/FA to identify contamination sources and the parameters influencing groundwater quality.

The main problem with groundwater is that once it is contaminated, its quality is very difficult to restore (Chen et al., 2018). Conventional approaches, such as the analysis of individual parameters, are not readily interpretable, because each parameter may indicate a different quality class, which makes the results very difficult to interpret when many parameters are involved in the assessment (Solangi et al., 2019). Accurately assessing the variety and extent of water pollution is difficult, and the main challenge in monitoring water quality is the complexity of examining many variables. For this reason, the water quality index (WQI) approach is considered a very effective tool (Valentini et al., 2021). WQI converts large volumes of water quality data into a single index score that directly and rationally describes water quality (Awachat & Salkar, 2017), and the method has been used effectively to assess groundwater quality for drinking. Understanding water quality and assessing it through WQI are essential to water quality management and control (Mahapatra et al., 2012). A WQI model can be developed using a statistical method such as multiple linear regression (MLR), considering the variables that actually affect the water quality of a particular source. Including fewer variables in the WQI makes future WQI evaluations in a given region more cost-effective (Valentini et al., 2021). When two or more parameters must be considered simultaneously, MLR can be used to quantify their relationships, and it also helps identify the variable with the greatest impact on WQI. MLR models have been applied to predict water quality for monitoring purposes (Wu et al., 2020).

Several recent studies have identified pollution sources and key water quality parameters and have forecast water quality using different approaches. Zhang et al. (2020) and Li et al. (2021) identified the sources of groundwater pollution using positive matrix factorization (PMF) and PCA-based absolute principal component score multiple linear regression (APCS-MLR) models in their respective studies. Later, Mu et al. (2023) investigated potential pollutants and their variation in the Malian River using the PCA-APCS-MLR model. Haghnazar et al. (2022) compared the PMF and PCA-MLR receptor models and identified the main contributors to groundwater pollution. These studies effectively identified pollution sources and their impact on water quality in their respective study areas. Singha et al. (2021) achieved 91.7% accuracy in predicting WQI using an artificial neural network model. Alam and Singh (2023) assessed groundwater quality by combining software-based multivariate statistics with WQI and identified the major variables affecting water quality.

Communities in the Contai area of India rely heavily on groundwater for drinking and other activities, but its quality has been steadily deteriorating (Halder et al., 2021; Maity et al., 2017, 2018). After examining 6 years of groundwater quality data, Chakraborty et al. (2020) reported that the levels of groundwater variables such as pH, total hardness (TH), turbidity (Tur), electrical conductivity (EC), total dissolved solids (TDS), manganese (Mn), iron (Fe), and chloride (Cl) have been consistently increasing in the Contai area, making groundwater in that zone unfit for drinking and domestic use. More recently, Das et al. (2022) examined the interrelationships between selected parameters using WQI, cluster, and regression analyses for the Dulalpur panchayat in Contai. Halder et al. (2021) stated that groundwater quality monitoring is urgently required to manage this vital resource.

This research therefore focuses on groundwater quality issues and the management of essential water resources in coastal areas such as Contai, where the concentrations of several parameters are steadily increasing and degrading water quality, by applying statistical approaches together with the groundwater quality index (GWQI). To date, however, the sources of groundwater pollution, the most influential parameters, and a suitable index model for water quality monitoring have not been identified for water resource management in this coastal region. It is therefore necessary to identify the probable sources of pollution and the degree of water quality, and to use feasible techniques to monitor groundwater quality so that current water resources can be managed for the existing communities. The purpose of this analysis is to determine the likely sources of groundwater contamination and the most influential parameters at specific locations using PCA/FA, and to assess the drinking quality of groundwater using GWQI. This study also aims to develop new mathematical models for GWQI using MLR analysis, to identify the parameters that most influence water quality, and to select an appropriate model for monitoring groundwater quality in this coastal zone.

Research area

Contai I, also known as the Kanthi I block, is a community development block that forms an administrative division of the Contai subdivision of Purba Medinipur district in the Indian state of West Bengal. It is located at 21.75° N and 87.65° E, at an elevation of 3 m (~ 10 ft), and covers a total area of 155.27 km². According to the 2011 census, the block has a population of 170,894. Based on the availability of water quality data, Badalpur panchayat (BP) and Raipur Paschimbarh panchayat (RP) within this block were selected for the study (Fig. 1). The selected area is referred to as the BPRP area.

Fig. 1

Location map and sampling points of the research area

The area is essentially coastal and consists of stratified layers formed over many years by the gradual deposition of mud, boulders, gravel, and cobblestones by rivers that continuously deposit alluvium. Soil salinity is high in the coastal region, where soils contain less than 60% clay. Agricultural zones generally lie on the outskirts of the main town. The main food crop is paddy rice, and the district ranks second in the state in paddy rice production.

The land is highly productive, and agriculture is therefore a key driver of the economy of middle-class families in the area. However, agriculture is sometimes affected by recurring floods caused by persistent monsoon rains and by cyclones originating from depressions in the Bay of Bengal. Saltwater intrusion is currently a threat to residents. Because of limited rainfall and insufficient surface water, people here depend heavily on groundwater to meet their essential requirements. The soil is mainly of the fresh alluvial type. The rivers flowing through the area cannot meet the necessary water demand, and demand is increasing steadily because of urbanization, rising population density, and industrial development. To meet these demands, groundwater is over-extracted, which causes saltwater to move inland into the region. Figure 1 shows the location map of the research area and the sampling locations. In the notation used, BPi (i = 1 to 18) and RPj (j = 1 to 20) denote the sampling points in the respective panchayats.

Methodology

Sampling, analysis, data accumulation, and parameter selection

In August 2019, 38 groundwater samples were collected from five Mouzas of Badalpur panchayat (BP = 18) and Raipur Paschimbarh panchayat (RP = 20) by Contai division specialists (CDS) of the Public Health Engineering Directorate (PHED). Samples were taken at approximate depths of 52 m (~ 170 ft) and 27 m (~ 90 ft) in the Badalpur and Raipur Paschimbarh panchayats, respectively. CDS followed the APHA (2017) standard methods for sample collection, preservation, and analysis. A plastic bottle was first rinsed with purified water; the sample was then drawn into the bottle from the tubewell after 10 min of pumping, sealed tightly with inner and outer caps to prevent air entry, properly labeled, and dispatched to the laboratory. Each sample was cooled to 4–6 °C to prevent microbiological degradation. The samples were analyzed for pH, TH, Tur, EC, TDS, Mn, Fe, and Cl in the CDS PHED water quality testing laboratory at Manoharchak. pH and Tur were measured by the electrometric method; EC, TDS, TH, and Cl by the electrometric, gravimetric, EDTA titrimetric, and argentometric methods, respectively; and Fe and Mn using a spectrophotometer. Proper quality assurance (QA) and quality control (QC) of the samples were assumed to ensure the reliability of the analytical data. Standard operating procedures were followed for the day-to-day operation of the QA programme and for QC purposes, with particular attention to sample collection and preservation, reagent standardization, equipment calibration, and blanks, as specified in the standard methods (APHA, 2017). All water quality data used in this study were obtained from CDS PHED, and the 38 samples and eight parameters analyzed here reflect the data available from PHED.

All eight parameters, which are addressed in IS 10500 (2012) and WHO (2011), are important in water quality assessment because concentrations above the permissible limits can significantly affect the quality of drinking water. The study area lies in the coastal zone and is subject to seawater encroachment, as identified by Chakraborty et al. (2020). EC, TDS, and Cl are key parameters because they reflect the major dissolved constituents of seawater (Liu et al., 2003). Wang et al. (2020a) suggested that seawater intrusion influences Fe levels in groundwater. Gibrilla et al. (2011), Gulgundi and Shetty (2018), Wu et al. (2020), and Ram et al. (2021) considered most of these parameters, and Das et al. (2022) included all eight in their study. Therefore, based on regulatory standards, the literature, and data availability, these eight parameters, which are frequently reported by other researchers, were used for groundwater quality assessment.

Descriptive statistical analysis

The groundwater quality data were analyzed statistically using the Statistical Package for the Social Sciences (SPSS, version 26). Data files were first prepared in Microsoft Excel and then imported into SPSS to compute descriptive statistics of the quality parameters, which provide background information on the water quality data (Papaioannou et al., 2010). The statistics computed were (a) minimum (min.), (b) maximum (max.), (c) mean, (d) standard deviation (Std. dev.), and (e) variance of the measured concentrations of the groundwater quality variables.
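As a minimal sketch of the descriptive statistics listed above, the following Python function (a hypothetical helper, not the SPSS procedure itself) computes the same five quantities with NumPy. It assumes the sample (n − 1) form of the standard deviation and variance, and the chloride values are made up for illustration.

```python
import numpy as np


def describe(values):
    """Return min, max, mean, standard deviation and variance of one variable.

    The sample (n - 1) form is assumed for the standard deviation and variance.
    """
    x = np.asarray(values, dtype=float)
    return {
        "min": float(x.min()),
        "max": float(x.max()),
        "mean": float(x.mean()),
        "std": float(x.std(ddof=1)),
        "variance": float(x.var(ddof=1)),
    }


# Hypothetical chloride concentrations (mg/l), for illustration only
cl = [11.0, 56.0, 139.0, 42.0, 63.0]
stats = describe(cl)
print({name: round(value, 2) for name, value in stats.items()})
```

Applying the function column by column to the full data matrix would reproduce the layout of a descriptive statistics table.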

Principal component/factor analysis (PCA/FA)

PCA/FA was used to identify sources of contamination and to select the key water quality variables associated with these sources. PCA was performed on the standardized dataset to extract the principal components, which were then subjected to varimax rotation to obtain varifactors (VFAs). The suitability of the data for PCA was first examined using the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity, which respectively verify sampling adequacy and test whether the parameters are mutually independent (Yang et al., 2020). PCA was performed after Z-score standardization of all parameters, which removes the influence of differing measurement units and magnitudes and renders the data dimensionless (Gibrilla et al., 2011). Z-score standardization was carried out in MS Excel using Eq. (1).

$$Z = \frac{{c - \overline{c}}}{{{\text{SD}}}}$$
(1)

where c, \(\overline{c}\), and SD are the concentration, mean concentration, and standard deviation of each variable, respectively.
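A minimal sketch of Eq. (1) in Python is shown below; it standardizes every column of a samples × parameters matrix. The use of the sample standard deviation (ddof=1) is an assumption, since the text does not say which form of SD was used in Excel, and the two-parameter data are hypothetical.

```python
import numpy as np


def zscore_columns(data):
    """Apply Eq. (1), Z = (c - mean) / SD, to each column (parameter).

    Rows are samples and columns are parameters; SD is the sample
    standard deviation (ddof=1), which is an assumption here.
    """
    X = np.asarray(data, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical pH and TDS (mg/l) values for three samples
data = [[6.8, 319.0],
        [7.2, 400.0],
        [7.9, 735.0]]
Z = zscore_columns(data)
print(Z.round(3))
```

After standardization every column has mean 0 and SD 1, so parameters measured in different units (pH units, mg/l, μS/cm) contribute on an equal footing to the correlation matrix used by PCA.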

In PCA, eigenvalues are generally used to determine the number of principal components to retain (Zeinalzadeh & Rezaei, 2017). Principal components (PCs) with eigenvalues exceeding 1.0 were retained (Jayathunga et al., 2020). An eigenvalue indicates the importance of its principal component, so a component with a larger eigenvalue is considered more influential (Li et al., 2019). The first PC accounts for the largest share of the variance in the dataset, the second PC for the next largest share, and so on. Components are commonly rotated to achieve an optimal distribution of variance among them (Mahapatra et al., 2012), and varimax rotation was used for this purpose. Varimax rotation can minimize the contribution of insignificant variables in groundwater quality analysis because it simplifies the loadings by rotating the component axes, making the results easier to interpret (Islam et al., 2018). Here, varimax rotation with Kaiser normalization was performed to obtain the rotated factor loadings, and the sources of contamination were identified from the pattern of these loadings. The rotated factor-loading matrix shows how strongly each variable is associated with each factor (Li et al., 2019). Factor scores (F1, F2, and F3) indicate the contribution of each factor at each sampling location (Narany et al., 2014); they were computed here and also help identify the most active variables at a specific sampling site (Taşan et al., 2022). Equation (2) is used to compute the principal components (Li et al., 2019). PCA and FA were performed in SPSS on the groundwater quality dataset from the 38 sampling sites.

$$P_{IJ} = A_{I1} X_{1J} + A_{I2} X_{2J} + A_{I3} X_{3J} + \cdots + A_{IM} X_{MJ}$$
(2)

where A, P, and X denote the component loading, component score, and measured parameter value, respectively; I is the component index, J is the sample index, and M is the total number of parameters.
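The PCA steps described above (eigen-decomposition, the eigenvalue > 1 rule, varimax rotation with Kaiser normalization, and component scores per Eq. (2)) can be sketched with NumPy as follows. This is an illustrative re-implementation, not the SPSS algorithm: the data are synthetic, the varimax routine follows the common SVD-based formulation, loadings are taken as eigenvectors scaled by the square root of their eigenvalues, and the helper names are hypothetical.

```python
import numpy as np


def pca_kaiser(Z):
    """PCA on standardized data Z (samples x parameters).

    Eigen-decomposes the correlation matrix, sorts components by eigenvalue,
    and keeps those with eigenvalue > 1 (Kaiser criterion).
    """
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    pct_var = 100.0 * eigvals / eigvals.sum()
    return eigvals, pct_var, loadings


def varimax(L, max_iter=100, tol=1e-6):
    """Varimax rotation of loadings L (parameters x components) with Kaiser normalization."""
    h = np.sqrt((L ** 2).sum(axis=1, keepdims=True))  # square roots of communalities
    A = L / h                                          # Kaiser normalization of rows
    p, k = A.shape
    T = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        B = A @ T
        G = A.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                                     # orthogonal rotation matrix
        d = s.sum()
        if d_old > 0 and d < d_old * (1 + tol):        # stop when the criterion stalls
            break
        d_old = d
    return (A @ T) * h                                 # undo the normalization


# Synthetic stand-in for 38 samples x 8 parameters with three latent sources
rng = np.random.default_rng(0)
latent = rng.normal(size=(38, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.5 * rng.normal(size=(38, 8))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, pct_var, L = pca_kaiser(Z)
L_rot = varimax(L)
# Eq. (2): scores[j, i] = sum over m of loading A_im times standardized value X_mj
scores = Z @ L
k = L.shape[1]
print("retained components:", k)
print("cumulative % variance:", round(pct_var[:k].sum(), 2))
```

Because varimax is an orthogonal rotation, it redistributes variance among the retained components but leaves each parameter's communality (row sum of squared loadings) unchanged.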

Groundwater quality index (GWQI) estimation

The WQI approach evaluates the contribution of each water quality variable to the overall groundwater health risk and translates complex water quality datasets into a single value that is widely understood and usable (Asare et al., 2021). Here, GWQI was calculated at each sampling site using pH, TH, Tur, EC, TDS, Mn, Fe, and Cl, in four consecutive steps.

In the first step, a weight (wti) was assigned to each of the eight water quality variables according to its relative importance for drinking water quality, on a scale from five for the most detrimental variables to one for the least detrimental. The highest weight of five was assigned to TDS; a weight of four to pH, Fe, Mn, Cl, and EC; a weight of three to Tur; and a weight of two to TH (Aminiyan et al., 2018; Boateng et al., 2016). In the second step, the relative weight (RWTi) of each variable was calculated using Eq. (3).

$${\text{RWT}}_{i} = \frac{{\text{wt}}_{i}}{\sum\limits_{i = 1}^{n} {\text{wt}}_{i}}$$
(3)

where n is the number of variables (eight in this study). In the third step, the quality rating (QRi) was calculated using Eq. (4), where mi is the measured concentration and STi is the prescribed standard value of variable i. The assigned weight, relative weight, and standard value of each variable are given in Table 1.

$${\text{QR}}_{i} = \left( {\frac{{m_{i} }}{{{\text{ST}}_{i} }}} \right) \times 100$$
(4)
Table 1 Weight, relative weight, and the standard value of each parameter

In the last stage, the GWQI was calculated using Eq. (5).

$$G_{{{\text{WQI}}}} = \sum\limits_{i = 1}^{n} {{\text{RWT}}_{i} } \times {\text{QR}}_{i}$$
(5)

The class of water according to the index score can be designated as class “A” or excellent (GWQI < 50), class “B” or good (GWQI lies between 51 and 100), class “C” or poor (GWQI lies between 101 and 200), class “D” or very poor (GWQI lies between 201 and 300), and class “E” or unacceptable (GWQI > 300) for drinking (Haghnazar et al., 2022).
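The four GWQI steps can be sketched in Python as follows. The weights are those given in the text, but the standard values are placeholders only (Table 1 is not reproduced here), the sample is hypothetical, and the handling of scores that fall in the gaps between the quoted class ranges (for example, between 50 and 51) is an assumption.

```python
# Weights wt_i from the text: TDS = 5; pH, Fe, Mn, Cl, EC = 4; Tur = 3; TH = 2
WEIGHTS = {"TDS": 5, "pH": 4, "Fe": 4, "Mn": 4, "Cl": 4, "EC": 4, "Tur": 3, "TH": 2}

# PLACEHOLDER standard values ST_i for illustration only; replace with Table 1
PLACEHOLDER_STANDARDS = {"TDS": 500.0, "pH": 8.5, "Fe": 0.3, "Mn": 0.1,
                         "Cl": 250.0, "EC": 1500.0, "Tur": 5.0, "TH": 200.0}


def relative_weights(weights):
    """Eq. (3): RWT_i = wt_i / sum(wt_i)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def gwqi(measured, standards, weights=WEIGHTS):
    """Eqs. (4) and (5): GWQI = sum of RWT_i * (m_i / ST_i) * 100."""
    rwt = relative_weights(weights)
    return sum(rwt[name] * (measured[name] / standards[name]) * 100.0 for name in weights)


def gwqi_class(score):
    """Water class from the GWQI score.

    Ranges follow the text; values in the gaps between the quoted ranges
    are assigned here by assumption (e.g. 50 -> B, 100 -> B, 200 -> C).
    """
    if score < 50:
        return "A (excellent)"
    if score <= 100:
        return "B (good)"
    if score <= 200:
        return "C (poor)"
    if score <= 300:
        return "D (very poor)"
    return "E (unacceptable)"


# Hypothetical sample, for illustration only
sample = {"TDS": 420.0, "pH": 7.3, "Fe": 0.6, "Mn": 0.03,
          "Cl": 60.0, "EC": 700.0, "Tur": 9.0, "TH": 260.0}
score = gwqi(sample, PLACEHOLDER_STANDARDS)
print(round(score, 2), gwqi_class(score))
```

Because the relative weights sum to one, a sample whose every parameter equals its standard value scores exactly 100, the upper limit of class B.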

Multiple linear regression analysis

In this work, the variables used for the MLR analysis are the same as those used for GWQI estimation. The samples were randomly divided into 30 samples (≈ 80%) for model development and eight samples (≈ 20%) for model validation. MLR describes the relationship between a dependent variable and several independent variables (Wu et al., 2020) and is expressed by Eq. (6).

$$R = B_{0} + B_{1} P_{1} + B_{2} P_{2} + \cdots + B_{M} P_{M} + R_{E}$$
(6)

where R is the response (dependent) variable, P1, …, PM are the predictor variables, B0, …, BM are the regression coefficients, and RE is the random error. Stepwise MLR was performed in IBM SPSS to assess the relationship between the groundwater quality parameters (independent variables) and GWQI (dependent variable) and to obtain the best-fitting models. GWQI scores were then calculated with the selected models. Finally, a paired samples t-test was used to determine whether the original GWQI and the new model-based GWQI differ significantly.
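The modelling steps (random 30/8 split, stepwise selection for Eq. (6), and the paired samples t-test) can be sketched with NumPy and SciPy as below. This is not SPSS's implementation: the partial-F entry and removal thresholds (p ≤ 0.05 to enter, p ≥ 0.10 to remove) are assumed, the helper names are hypothetical, and the data are synthetic.

```python
import numpy as np
from scipy import stats


def ols(X, y):
    """Least-squares fit of Eq. (6) with an intercept; returns coefficients and RSS."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(((y - A @ coef) ** 2).sum())
    return coef, rss


def partial_f_pvalue(X, y, base, cand):
    """p-value of the partial F test for adding column `cand` to the columns in `base`."""
    _, rss0 = ols(X[:, base], y)
    _, rss1 = ols(X[:, base + [cand]], y)
    df_resid = len(y) - len(base) - 2
    f_stat = (rss0 - rss1) / (rss1 / df_resid)
    return stats.f.sf(f_stat, 1, df_resid)


def stepwise(X, y, p_enter=0.05, p_remove=0.10, max_steps=50):
    """Forward entry and backward removal by partial F tests (thresholds assumed)."""
    selected = []
    for _ in range(max_steps):
        changed = False
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if remaining:
            p_in = {j: partial_f_pvalue(X, y, selected, j) for j in remaining}
            best = min(p_in, key=p_in.get)
            if p_in[best] <= p_enter:
                selected.append(best)
                changed = True
        if len(selected) > 1:
            p_out = {j: partial_f_pvalue(X, y, [s for s in selected if s != j], j)
                     for j in selected}
            worst = max(p_out, key=p_out.get)
            if p_out[worst] >= p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


# Synthetic stand-in: 38 samples, 8 predictors, GWQI driven by columns 0 and 3
rng = np.random.default_rng(42)
X = rng.normal(size=(38, 8))
y = 80 + 15 * X[:, 0] + 8 * X[:, 3] + rng.normal(scale=2.0, size=38)

idx = rng.permutation(38)
train, test = idx[:30], idx[30:]                 # 30 for calibration, 8 for validation
sel = stepwise(X[train], y[train])
coef, _ = ols(X[train][:, sel], y[train])
y_hat = coef[0] + X[test][:, sel] @ coef[1:]     # model-based GWQI for validation samples
t_stat, p_val = stats.ttest_rel(y[test], y_hat)  # paired samples t-test
print("selected columns:", sel, "paired t-test p =", round(float(p_val), 3))
```

A non-significant paired t-test (p above the chosen level) would indicate no systematic difference between the original and model-based GWQI on the validation samples.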

Results and discussion

Exploring water quality through descriptive statistics of variables

Descriptive statistics of the water quality variables sampled in the research area are given in Table 2, and their variation across the BPRP area is illustrated by box plots (Fig. 2a–h). Basic statistics were calculated for the entire groundwater dataset to gain an overall view and to recognize variations in the data (Patil et al., 2019). Groundwater quality variables are the essential basis for characterizing the nature, quality, and variability of groundwater. According to IS 10500 (2012), the permissible pH range for drinking water is 6.5–8.5. pH reflects the balance between the concentrations of hydrogen ions [H+] and hydroxyl ions [OH−] in water and indicates whether the water is acidic or basic. The groundwater pH in the BPRP area ranges from 6.80 to 7.90, with a mean of 7.24, indicating slightly acidic to alkaline water. Tur ranges from 1.34 to 64.90 NTU, with a mean of 8.72 NTU, which is above the permissible value prescribed by IS 10500 (2012); 11 of the 38 samples (29%) exceed the permitted limit. The minimum, maximum, and mean TH values are 172, 328, and 257.32 mg/l, respectively. Water with a TH value of 150–300 mg/l is classified as hard and above 300 mg/l as very hard (Todd & Mays, 2005), so groundwater in the BPRP area is hard to very hard. Most samples exceed the permissible TH limit for drinking (200 mg/l) specified by IS 10500 (2012); only sampling points BP6 and BP15 have acceptable TH values.

Table 2 Descriptive statistics of groundwater quality variables (total observations N = 38)
Fig. 2

Box plots of groundwater quality parameters: a pH, b turbidity, c TH, d TDS, e EC, f Fe, g Mn, and h Cl

The minimum, maximum, and mean TDS concentrations in groundwater in the BPRP area are 319, 735, and 397.95 mg/l, respectively. Most sampling points have TDS values within the standard limit; only RP1, RP7, RP12, RP13, and RP20 exceed the permissible TDS for drinking. Groundwater at all sites is classified as fresh because the TDS concentration at each site is below 1000 mg/l (Todd & Mays, 2005). The minimum, maximum, and mean electrical conductivity (EC) values in the BPRP area are 534, 1223, and 669.21 μS/cm, respectively. EC values below 1500 μS/cm at all sites indicate limited saltwater intrusion (Das et al., 2022), and EC in all groundwater samples is below the recommended drinking-water value. Fe is a crucial element for human health, and both excess and deficient Fe intake can cause several diseases (Wang et al., 2020a). Fe concentrations in the BPRP area range from 0.10 to 2.20 mg/l, with a mean of 0.45 mg/l, and 50% of the samples exceed the permissible Fe limit for drinking given by IS 10500 (2012). Manganese occurs naturally in groundwater, chiefly under anaerobic conditions; its concentration depends on the chemistry of rainfall, the lithology of the aquifer, and groundwater flow paths, among other factors (Ram et al., 2021).

Mn concentrations range from 0.002 to 0.08 mg/l, with a mean of 0.026 mg/l, and are within the acceptable level for drinking at every sampling point. Cl concentrations range from 11 to 139 mg/l, with a mean of 56.34 mg/l, and are below the drinking-water limit at every sampling point.
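The classification rules and exceedance percentages used in this subsection are simple enough to check programmatically. The sketch below encodes only the thresholds quoted above (Todd & Mays, 2005); the function names are hypothetical, and the exceedance example uses made-up values and a made-up limit rather than the study data.

```python
def hardness_class(th_mg_l):
    """TH class using the Todd & Mays (2005) ranges quoted in the text."""
    if th_mg_l > 300:
        return "very hard"
    if th_mg_l >= 150:
        return "hard"
    return "below 150 mg/l (range not quoted in the text)"


def tds_class(tds_mg_l):
    """Fresh water if TDS < 1000 mg/l (Todd & Mays, 2005)."""
    return "fresh" if tds_mg_l < 1000 else "not fresh (TDS >= 1000 mg/l)"


def exceedance_share(values, limit):
    """Number and percentage of samples whose value exceeds `limit`."""
    n_exceed = sum(v > limit for v in values)
    return n_exceed, 100.0 * n_exceed / len(values)


# Range end-points reported above
print(hardness_class(172), hardness_class(328))   # minimum and maximum TH
print(tds_class(319), tds_class(735))              # minimum and maximum TDS

# Hypothetical values: 11 of 38 samples above a hypothetical limit of 5
n_exceed, pct = exceedance_share([1.0] * 27 + [10.0] * 11, limit=5.0)
print(n_exceed, round(pct, 1))
```

For instance, 11 exceedances out of 38 samples correspond to 28.9%, which rounds to the 29% reported for turbidity.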

Spotting the pollution sources through PCA/FA

A KMO value above 0.5 and a Bartlett's test significance level below 0.05 indicate that water quality data are suitable for PCA (Yang et al., 2020). A dataset with a KMO value below 0.5, between 0.5 and 0.7, or above 0.7 is considered unsatisfactory, adequate, or good for PCA, respectively (Ustaoglu et al., 2020). In this work, the KMO value (0.636) indicates that the dataset is adequate for PCA, and the significance level of Bartlett's test (0.00) confirms that the parameters are significantly correlated.
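Both suitability checks can be reproduced outside SPSS. The sketch below uses the textbook formulas: Bartlett's statistic −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO as the ratio of squared correlations to squared correlations plus squared partial correlations. The data are synthetic and the function names hypothetical; results may differ slightly from SPSS output.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix of X (samples x parameters) is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2.0
    return chi2, stats.chi2.sf(chi2, df)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)            # partial (anti-image) correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off_diag] ** 2).sum()
    q2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + q2)


def kmo_label(value):
    """Suitability bands quoted from Ustaoglu et al. (2020)."""
    if value < 0.5:
        return "unsatisfactory"
    if value <= 0.7:
        return "adequate"
    return "good"


# Synthetic correlated data standing in for 38 samples x 8 parameters
rng = np.random.default_rng(1)
X = rng.normal(size=(38, 2)) @ rng.normal(size=(2, 8)) + 0.3 * rng.normal(size=(38, 8))
chi2, p_value = bartlett_sphericity(X)
k = kmo(X)
print(round(chi2, 1), p_value < 0.05, round(k, 3), kmo_label(k))
```

Under these bands, the reported KMO of 0.636 falls in the "adequate" range.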

Eigenvalues of the eight water quality parameters and their variances were calculated using SPSS. The first three components have eigenvalues greater than one, and the remaining components have eigenvalues less than one. The first three components were therefore retained, and their variances are shown in Fig. 3 together with a scree plot showing how the eigenvalues change across components.

Fig. 3

Scree plot (pink line) and percent of variance (blue bars) corresponding to each component

The eigenvalues of components 1, 2, and 3 are clearly above 1. The inflection point of the scree curve (Fig. 3) occurs at the third component, so the first three components were retained as the principal components (PC1, PC2, and PC3).

In this study, the first three PCs explain 80.61% of the total variance, indicating that they adequately represent the eight water quality variables. The eigenvalues and corresponding contributions of the first (PC1), second (PC2), and third (PC3) principal components are shown in Fig. 3. Factor loadings express the relationship between each PC and the selected parameters, and the loadings with the highest positive or negative values identify the largest contributions (Arslan, 2013). Absolute loading values greater than 0.75, between 0.5 and 0.75, and between 0.3 and 0.5 are referred to as strong, medium, and weak factor loadings, respectively (Zhang et al., 2020).

Dutta et al. (2018) noted that the minimum factor loading used to identify decisive parameters differs among studies. Here, factor loadings with absolute values above 0.6 (shown in bold in Table 3) were used to interpret the results because they are significant for characterizing the components; the sign of a loading (Table 3) indicates the direction of the effect. Thus, a high negative loading indicates that the parameter is strongly and inversely associated with the factor.
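As a small illustration, the two rules above (the strength bands of Zhang et al. (2020) and the |loading| > 0.6 cut-off used here) can be applied to the VFA1 and VFA3 loadings reported in this study. The function name and the "negligible" label for loadings below 0.3 are assumptions, since the text quotes no band below 0.3.

```python
def loading_strength(loading):
    """Strength class of a factor loading (bands of Zhang et al., 2020)."""
    a = abs(loading)
    if a > 0.75:
        return "strong"
    if a >= 0.5:
        return "medium"
    if a >= 0.3:
        return "weak"
    return "negligible"   # below 0.3: no band is quoted in the text


# Rotated loadings reported for VFA1 and VFA3 (other entries omitted)
vfa1 = {"Tur": 0.826, "Mn": 0.872, "Cl": 0.860, "TH": 0.688}
vfa3 = {"pH": 0.839, "Fe": -0.762}

for name, factor in (("VFA1", vfa1), ("VFA3", vfa3)):
    salient = [p for p, v in factor.items() if abs(v) > 0.6]   # |loading| > 0.6 cut-off
    print(name, {p: loading_strength(factor[p]) for p in salient})
```

Taking the absolute value first is what lets the negative Fe loading on VFA3 count as strong.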

Table 3 Factor loading and factor score coefficient of parameters after varimax rotation

The PCA/FA results show that three key sources of groundwater contamination correspond to three varimax factors (VFAs) (Table 3), which together explain 80.61% of the total variance of the eight groundwater quality variables. The first varimax factor (VFA1) explains 34.35% of the total variance and is considered the most important factor; the second (VFA2) explains 28.14%, and the third (VFA3) explains 18.12%.

VFA1 has strong positive loadings on Tur (0.826), Mn (0.872), and Cl (0.860) and a medium loading on TH (0.688). VFA2 has strong positive loadings on TDS (0.972) and EC (0.969). VFA3 has a strong positive loading on pH (0.839) and a strong negative loading on Fe (− 0.762). According to Liu et al. (2003), Cl, TDS, and EC are the main indicators of seawater. Salinity is a very widespread form of groundwater pollution, particularly in coastal aquifers, and is measured by the rise in TDS (Rao et al., 2013). Significant positive loadings of Cl, TDS, TH, and EC indicate mixing of saline water with fresh groundwater (Akshitha et al., 2021). These variables can cause salinity in groundwater, and Cl is recognized as an indicator of saline water intrusion into groundwater resources (Taşan et al., 2022). Significant positive TH and Tur loadings suggest an origin in rock weathering and leaching (Boateng et al., 2016). The strong pH loading may be driven by organic or biogenic activities (Reghunath et al., 2002). The strong Fe loading is probably caused by the dissolution of lithogenic or non-lithogenic sources by percolating water (Patil et al., 2019). Mn in groundwater likely originates from the weathering of manganese minerals in aquifers and may also be introduced by industrial wastewater and landfill leachate (Zhang et al., 2020); Mn from mineral sources can also be released by chemical breakdown of the parent material (Bodrud-Doza et al., 2016). In summary, VFA1 is influenced by Tur, Mn, Cl, and TH; VFA2 by TDS and EC; and VFA3 by pH and Fe. Groundwater associated with VFA1 may therefore be affected by saltwater together with weathering and leaching processes, VFA2 reflects mixing of seawater with fresh groundwater, and VFA3 may reflect a combination of organic biogenic activities and lithogenic or non-lithogenic inputs carried by percolating water.

Factor score coefficients (Table 3) were also determined for all variables; the resulting factor scores express the influence of a given factor at each sampling location (Patil et al., 2019). The factor scores of the 38 sampling sites are shown in Fig. 4. A score greater than +1 on a component indicates that the location is strongly affected by the parameters associated with that component, a score less than −1 indicates that it is largely unaffected, and a score close to zero indicates a moderate effect of that factor (Senthilkumar et al., 2008).
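A minimal sketch of how factor scores and their interpretation could be computed is given below. The text does not state which scoring method was selected in SPSS; the regression method (score coefficients B = R⁻¹Λ, scores F = ZB) is assumed here, the synthetic data and random loading matrix are for shape checking only, and the ±1 reading rule follows the paragraph above. The function names are hypothetical.

```python
import numpy as np


def regression_factor_scores(Z, rotated_loadings):
    """Factor scores by the regression method: B = R^-1 * Lambda, F = Z * B.

    Z is standardized data (samples x parameters); Lambda is the rotated
    loading matrix (parameters x factors). Returns scores and coefficients.
    """
    R = np.corrcoef(Z, rowvar=False)
    B = np.linalg.solve(R, rotated_loadings)      # factor score coefficient matrix
    return Z @ B, B


def interpret_score(score, threshold=1.0):
    """Reading of a factor score following Senthilkumar et al. (2008)."""
    if score > threshold:
        return "largely affected"
    if score < -threshold:
        return "largely unaffected"
    return "moderately affected"   # |score| <= 1 treated as moderate (assumption)


# Scores reported in the text for four RP sites on the first factor
f1 = {"RP2": 3.013, "RP6": 3.192, "RP9": 2.719, "RP11": 1.106}
print({site: interpret_score(s) for site, s in f1.items()})

# Shape check on synthetic data with a hypothetical 8 x 3 loading matrix
rng = np.random.default_rng(7)
Z = rng.normal(size=(38, 8))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
F, B = regression_factor_scores(Z, rng.uniform(-1, 1, size=(8, 3)))
print(F.shape, B.shape)
```

With real data, Λ would be the varimax-rotated loading matrix, and the columns of F would correspond to F1, F2, and F3.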

Fig. 4

Factor scores at different locations based on groundwater data

Sampling points RP2, RP6, RP9, and RP11, with factor scores of 3.013, 3.192, 2.719, and 1.106, respectively, are mostly affected by Cl, Tur, Mn, and TH (VFA1), since their scores exceed +1.

Sampling points RP1, RP7, RP12, RP13, and RP20, with factor scores of 2.239, 2.256, 2.193, 2.632, and 2.397, respectively, are affected by EC and TDS (VFA2). Sampling points RP3, RP4, RP5, RP8, RP10, and RP12, with factor scores of 1.542, 1.687, 1.784, 1.733, 1.560, and 1.008, respectively, are most affected by pH and Fe (VFA3).

Biplots are informative graphical tools for presenting the results of statistical analyses (Banda & Kumarasamy, 2020). Here, three-dimensional biplots were constructed with the first three PCs as axes; they show the relationships among the parameters and the PCs. Biplots reveal groups of highly correlated parameters through an approximation of the true multivariate space (Gradilla-Hernández et al., 2020). The 3D biplot for this study, showing the association between the highly correlated parameters and the first three PCs, is shown in Fig. 5.

Fig. 5 3D biplot showing the association between the highly correlated parameters and the three PCs

The PCA/FA results indicate that water quality in the study zone is affected by saline water, as reported in earlier studies (Chakraborty et al., 2020; Halder et al., 2021; Maity et al., 2017, 2018), together with several other processes: rock weathering and leaching, organic or biogenic activity, and lithogenic and non-lithogenic sources mobilized by percolating water. The analysis also identifies the influencing parameters and their probable sources at the selected sites, so groundwater quality here can be protected by controlling those contamination sources. Studies such as Bouteraa et al. (2019), Patil et al. (2019), Nguyen et al. (2020), and Wu et al. (2020) also used PCA/FA to investigate potential sources of water pollution, which supports the reliability and wide acceptance of this technique. This approach therefore provided valuable information by identifying the probable sources of groundwater pollution and the likely controlling water quality parameters at a specific site, which can help decision-makers limit water pollution in the area.

Water quality rating by GWQI and development of a new MLR-based GWQI model

The water quality class of each sample was determined from its GWQI score. The calculated GWQI scores range from 52.33 to 204.84, and the water quality status across the BPRP area is shown in Fig. 6. Of the total samples, 87%, 11%, and 2% fall into the good, poor, and very poor categories for drinking, respectively, and none of the sampling sites fall into class A or E. A stepwise MLR approach was then applied to derive new GWQI equations. The analysis produced eight models, which are summarized in Table 4, with their estimated coefficients in Table 5. The new GWQI equations are given as Eqs. (7), (8), and (9), respectively.
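The grade assignment can be sketched as below. The class boundaries (A below 50, B 50 to 100, C 100 to 200, D 200 to 300, E 300 and above) are an assumption: they follow a widely used GWQI classification and agree with the reported results, in which the lowest score (52.33) is good and the highest (204.84) is very poor, but the paper's own thresholds are defined outside this section. The helper names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: class boundaries of a commonly used GWQI scheme; they are not
# restated in this section but agree with the grades reported for BPRP.
GWQI_CLASSES = [
    (50.0, "A", "excellent"),
    (100.0, "B", "good"),
    (200.0, "C", "poor"),
    (300.0, "D", "very poor"),
]


def gwqi_grade(score: float) -> tuple[str, str]:
    """Return (grade, label) for one GWQI score."""
    if score < 0:
        raise ValueError("GWQI scores are non-negative")
    for upper, grade, label in GWQI_CLASSES:
        if score < upper:
            return grade, label
    return "E", "unsuitable for drinking"


def grade_shares(scores: list[float]) -> dict[str, float]:
    """Percentage of samples in each grade (only grades that occur)."""
    counts = Counter(gwqi_grade(s)[0] for s in scores)
    return {grade: 100.0 * n / len(scores) for grade, n in sorted(counts.items())}


if __name__ == "__main__":
    print(gwqi_grade(52.33), gwqi_grade(204.84))  # extremes reported in the text
```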

Fig. 6 Groundwater quality status at different locations according to GWQI

Table 4 Overview of models from the MLR analysis
Table 5 Estimated models with coefficients

Three of these regression models, Mod6, Mod7, and Mod8, were considered further on the basis of the significance of their variables (Table 5). The coefficient of determination (R²) was preferred over the multiple correlation coefficient (R) as a measure of the strength of the relationship because it is easier to interpret: R² equals the proportion of the total variance in the dependent variable that can be attributed to the predictor variables (Mondal et al., 2010).

All three models, Mod6, Mod7, and Mod8, have adjusted R² values of 1.0 (Table 4), indicating that each explains virtually all of the variance in GWQI.
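For reference, R² = 1 − SS_res/SS_tot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples used for fitting and p the number of predictors (p = 6 for Mod6). The sketch below implements both in plain Python; the function names are illustrative.

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: share of variance in `observed` explained."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0.0:
        raise ValueError("observed values have zero variance")
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Penalize R^2 for the number of predictors p given n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

Because the penalty term is multiplied by (1 − R²), a model whose R² is essentially 1 keeps an adjusted R² of essentially 1 whatever p is, which is why adjusted R² alone cannot separate Mod6, Mod7, and Mod8 and the significance of the coefficients is used instead.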

Model Mod6 has six independent parameters that are significant (significance level < 0.05) (Mustapha et al., 2012) in explaining the variation of water quality classes in the BPRP area, and it is expressed by Eq. (7). In Mod6, Tur has the highest standardized beta coefficient (0.820) among the variables selected by the stepwise MLR approach (Boateng et al., 2016; Uyanik & Güler, 2013), which means that Tur makes the largest contribution to overall groundwater quality in the BPRP area. Fe has the second-highest beta coefficient (0.404), followed by EC (0.164), Mn (0.075), TH (0.035), and Cl (0.033). Cl makes the smallest contribution to this model, with the lowest beta coefficient (0.033).
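A standardized beta rescales the raw regression coefficient b by the ratio of the predictor's and the response's standard deviations, β = b·s_x/s_y, so that predictors measured in different units can be compared. The sketch below shows that conversion and ranks the Mod6 predictors by the betas reported above. The standard deviations needed to reproduce those betas from Eq. (7) are not given in this section, so the conversion function is shown without inputs from this dataset; the function names are hypothetical.

```python
def standardized_beta(b: float, sd_x: float, sd_y: float) -> float:
    """Convert an unstandardized coefficient b to a standardized beta."""
    if sd_y == 0.0:
        raise ValueError("response standard deviation must be non-zero")
    return b * sd_x / sd_y


def rank_contributors(betas: dict[str, float]) -> list[str]:
    """Order predictors by the magnitude of their standardized beta."""
    return sorted(betas, key=lambda name: abs(betas[name]), reverse=True)


# Standardized betas reported for Mod6.
MOD6_BETAS = {"Tur": 0.820, "Fe": 0.404, "EC": 0.164, "Mn": 0.075, "TH": 0.035, "Cl": 0.033}

if __name__ == "__main__":
    print(rank_contributors(MOD6_BETAS))
```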

The regression model Mod7 has seven significant predictor parameters (significance level < 0.05) and is expressed by Eq. (8). Its standardized beta coefficients show that Tur, with the largest beta coefficient (0.827), makes the greatest contribution to overall groundwater quality in the BPRP area, followed by Fe (0.407), TDS (0.129), Mn (0.076), Cl (0.039), EC (0.037), and TH (0.033). TH, with the smallest beta coefficient (0.033), is the smallest contributor in this model.

$$G_{\text{WQI}} = 10.989 + 1.973\,\text{Tur} + 43.594\,\text{Fe} + 0.028\,\text{EC} + 142.940\,\text{Mn} + 0.039\,\text{TH} + 0.047\,\text{Cl} \pm 1.588$$
(7)
$$G_{\text{WQI}} = 10.666 + 1.990\,\text{Tur} + 43.978\,\text{Fe} + 0.006\,\text{EC} + 143.562\,\text{Mn} + 0.036\,\text{TH} + 0.055\,\text{Cl} + 0.037\,\text{TDS} \pm 0.844$$
(8)
$$G_{\text{WQI}} = 0.009 + 2.000\,\text{Tur} + 44.442\,\text{Fe} + 0.009\,\text{EC} + 133.360\,\text{Mn} + 0.033\,\text{TH} + 0.053\,\text{Cl} + 0.033\,\text{TDS} + 1.568\,\text{pH} \pm 0.015$$
(9)
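Equation (7) can be applied directly to new measurements. The sketch below is a hypothetical helper, `predict_gwqi_mod6`, that evaluates the point estimate. It assumes the inputs use the same units as the data the model was fitted on, which are not restated in this section, and it treats the trailing ± 1.588 as the standard error of the estimate rather than a term to add.

```python
# Hypothetical helper evaluating Eq. (7) (model Mod6).
# ASSUMPTION: inputs use the same units as the data the model was fitted on.
MOD6_INTERCEPT = 10.989
MOD6_COEFFS = {
    "Tur": 1.973,
    "Fe": 43.594,
    "EC": 0.028,
    "Mn": 142.940,
    "TH": 0.039,
    "Cl": 0.047,
}


def predict_gwqi_mod6(sample: dict[str, float]) -> float:
    """Point estimate of GWQI from Eq. (7); the +/- term is not added."""
    missing = sorted(set(MOD6_COEFFS) - set(sample))
    if missing:
        raise KeyError(f"missing parameters: {', '.join(missing)}")
    return MOD6_INTERCEPT + sum(coef * sample[name] for name, coef in MOD6_COEFFS.items())
```

Extra keys in `sample`, such as TDS or pH, are ignored, since Mod6 does not use them.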

The regression model Mod8 contains eight predictor parameters and is expressed by Eq. (9). Its standardized beta coefficients show that Tur makes the largest contribution to overall groundwater quality in the BPRP area, with the highest beta coefficient (0.831), consistent with Uyanik and Güler (2013), followed by Fe (0.412), TDS (0.115), Mn (0.070), EC (0.051), Cl (0.038), TH (0.030), and pH (0.012). pH contributes least to water quality in this model, with the lowest beta value (0.012).

The MLR approach thus yielded a suitable model for predicting GWQI in the BPRP area. The predictor variables in all three models are significant. In Mod8, however, the inclusion of TDS and pH reduces the significance of the constant, which indicates that Mod8 carries the greatest uncertainty in its constant (Wu et al., 2020); Mod8 is therefore less suitable than Mod6 and Mod7. In Mod6, unlike in Mod7, all predictor variables and the constant are significant (p = 0.00). From the MLR analysis, Mod6, with Tur, Fe, EC, Mn, TH, and Cl as independent variables, is therefore the most reliable model and is proposed for predicting GWQI (Eq. (7)) in this study area. To validate the proposed model, the original and predicted GWQI values for 20% of the samples were compared using a t-test. The GWQI values from the new equation do not differ significantly from the original values (p = 0.873 > 0.05), which supports the effectiveness of the new GWQI equation.
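The validation step can be sketched as a paired t-test on original versus predicted GWQI values for the validation samples. The text does not say which t-test variant was used, so a paired test is an assumption here, as are the function names. To keep the example dependency-free, the p-value is obtained by integrating the Student t density with Simpson's rule; in practice a library routine such as SciPy's `scipy.stats.ttest_rel` would normally be used.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t|
    return min(1.0, max(0.0, 1.0 - 2.0 * central_half))


def paired_t_test(observed: list[float], predicted: list[float]) -> tuple[float, float]:
    """Paired t statistic and two-sided p-value for observed - predicted."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two values")
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    if sd_d == 0.0:
        # All differences identical: no difference at all, or a constant offset.
        return (0.0, 1.0) if mean_d == 0.0 else (math.copysign(math.inf, mean_d), 0.0)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, two_sided_p(t, n - 1)
```

A p-value above 0.05, as with the reported 0.873, means that no significant difference between original and predicted values was detected; it does not by itself prove that the two are equivalent.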

GWQI condenses all the parameters into a single value, an approach that is widely accepted. Among all the samples, RP16 has the best GWQI score, indicating water that is good for drinking, although its TH level is slightly above the permissible limit. In contrast, RP6 has the highest GWQI score, indicating water that is very poor for drinking, although all parameters other than Tur, TH, and Fe are within permissible limits. Samples graded poor to very poor require treatment before consumption. Earlier, Maity et al. (2017) and Chakraborty et al. (2020) analyzed individual parameters in this zone and proposed that groundwater quality has been progressively declining because of rising concentrations of the selected parameters. However, this conventional approach does not convey overall water quality, because each parameter has its own classification, and assessing many parameters separately can give confusing results. GWQI therefore provides a more comprehensive interpretation of overall water quality in the selected stretches.

The study also indicates that the new MLR-based GWQI model, which uses relatively few variables, offers economic benefits, reduces the eclipsing effect, predicts water quality with good accuracy, and identifies Tur as the main contributor to groundwater pollution. The model should therefore be a practical tool for monitoring and managing existing water resources for communities in this area. Wang et al. (2020b), Banda and Kumarasamy (2020), Wu et al. (2020), and Valentini et al. (2021) also predicted WQI using MLR models and identified the most dominant water quality parameter, supporting the value and feasibility of this approach. Together, those studies and the present one suggest that MLR is a useful tool for predicting water quality and that a model with fewer variables may perform better than one with more parameters.

The combined PCA/FA, GWQI, and MLR analysis provides substantial information on groundwater quality in BPRP. However, only eight water quality variables at 38 sampling points were considered, based on the available dataset. Analyzing more parameters would help characterize other properties of the water, and more sampling points would give a better understanding of water quality processes and trends. Because the analysis relied on a limited set of parameters, the reliability of the statistical approaches could be improved by including additional parameters.

Conclusions

The WQI, along with statistical approaches, was used to provide a comprehensive picture of groundwater conditions in selected coastal areas where groundwater quality is declining. The PCA/FA method was used to identify potential sources of groundwater contamination, and GWQI was used to determine the current groundwater class in the BPRP area. MLR analysis was used to develop a new mathematical model for estimating GWQI for groundwater quality monitoring. This study shows that the groundwater is fresh, hard to very hard, and slightly acidic to alkaline.

Three principal components were extracted in the PCA/FA, together explaining 80.61% of the total variance. PCA with FA revealed that VFA1 may be affected by saline water together with weathering and leaching processes, VFA2 by saline water, and VFA3 by a combination of organic or biogenic activity and lithogenic or non-lithogenic processes. The study also shows that sampling points RP2, RP6, RP9, and RP11 are mostly affected by Cl, Tur, Mn, and TH; sampling points RP1, RP7, RP12, RP13, and RP20 by EC and TDS; and sampling points RP3, RP4, RP5, RP8, RP10, and RP12 by pH and Fe.

GWQI was evaluated for 38 sites, and the calculated index scores ranged from 52.33 to 204.84. Of the samples, 87% fall in grade B, 11% in grade C, and 2% in grade D; none fall in grades A or E. Groundwater at most sampling locations can therefore be considered for drinking, although samples graded poor or very poor require treatment before consumption. The MLR model with Tur, Fe, EC, Mn, TH, and Cl as predictors is the most reliable and is proposed for predicting GWQI. The proposed model also shows that Tur is the largest contributor to overall groundwater quality. Because fewer variables need to be measured, the model also offers cost savings.

The WQI, together with the statistical models, proved to be a useful approach for assessing groundwater quality because it condenses the dataset into single numerical index scores. Assessing groundwater quality with statistical and index approaches is therefore not merely a routine exercise: it also allows the most unsafe areas, where the groundwater class can adversely affect human health, to be delineated. This study can help planners and officials monitor groundwater contamination and select remedies. This type of analysis is therefore valuable for the improvement and sustainable development of groundwater resources for the general public.