Introduction

Metal contamination of groundwater is a serious concern for human life owing to the toxicity, persistence, and extensive bioaccumulation of metals. Groundwater is an important resource for the agricultural, industrial, and other economic sectors in Bangladesh. Rapid urbanization and agricultural and industrial activities are increasingly affecting groundwater quality. A wide range of public health problems, such as cancer, hypertension, hyperkeratosis, peripheral vascular disease, restrictive lung disease, and gangrene, occur due to the consumption of contaminated water (Smith et al. 2000). Approximately 17 % of groundwater in Bangladesh exhibits arsenic (As) concentrations beyond the acceptable limit (10 μg/L) of DoE (1997) for drinking water.

Water quality data sets are difficult to interpret for pollution evaluation from elemental concentrations alone (Nimic and Moore 1991). Water quality indices (WQIs) offer considerable scope for condensing such data sets into a more interpretable measure of pollution. A wide range of WQIs exists; the choice depends upon the input variables and the desired results (Handa 1981; Zou et al. 1988; Sahu et al. 1991; Li et al. 2009). Because WQIs have limitations of their own, they provide better results when used together with chemometric techniques. It has been found that chemometric methods are the most reliable approaches for data mining of matrices from environmental quality assessment (Astel et al. 2007, 2008). Among the available chemometric methods, multivariate statistical analysis has been widely used for source apportionment of metals in soil and water in different parts of the world (e.g., Singh et al. 2005; Halim et al. 2010; Bhuiyan et al. 2010; Li et al. 2013; Machiwal and Jha 2015).

Geostatistical methods, in turn, have gained importance for evaluating the spatial distribution of pollutants in soils and water. They are also an important tool for assessing spatial dependence (autocorrelation) among the sampling points. This type of information is important for estimating the migration history of a pollutant and its spatial distribution at different sites. An integrated approach combining multivariate analysis and geostatistics may therefore provide a holistic view of a complex pollution system.

As the spatial distribution of heavy metal contamination in groundwater is controlled by geological/geochemical heterogeneity, spatial interpolation techniques have been used to estimate concentrations at unmeasured locations and to map groundwater contamination (Webster and Oliver 2001). Detailed and extensive explanations of geostatistical methods have been reported in the literature (Isaaks and Srivastava 1989; Goovaerts 1997; Webster and Oliver 2001). Cross-validation results from geostatistics indicate that the ordinary kriging technique can predict spatial variability more accurately. The ordinary kriging method deals with spatial correlations between the sample points and has been widely used for mapping the spatial variability of elements. An assessment of drinking and irrigation water quality is essential for understanding the suitability of groundwater for different purposes. Only limited work has been conducted on groundwater quality in the study area. Hence, the integrated application of different chemometric methods is considered an important tool for pollution evaluation in this area. Considering all these aspects, Lakshimpur district of Bangladesh has been selected for a comprehensive study using the integrated approaches of multivariate analysis and geostatistical methods.

Materials and methods

Study area

Lakshimpur Sadar upazila (a subdistrict, a small administrative unit), located in southeastern Lakshimpur district of Bangladesh, has been selected for this study. Geographically, the study area is positioned between 22°49′–22°03′N and 90°43′–92°00′E (Fig. 1). It is bounded by Raipur, Ramganj, and Chatkhil upazilas on the north; Daulatkhan, Kamalnagar, and Noakhali Sadar upazilas on the south; Begumganj and Sonaimuri upazilas on the east; and Raipur upazila and the Meghna River on the west. Lakshimpur upazila has an area of 514.78 sq km and a total population of 575,278 (Banglapedia 2006). The sites are chosen mainly on the basis of their proximity to suspected pollution sources and their ecological and environmental importance. The Dalal Bazar, Parbatinagar, Dattapara, Hajipara, and Jacksinhat sites are densely populated areas. The groundwater quality at Mazuchowdhurihat and Shackchar (western part of the area) is strongly influenced by the Meghna River. Physiographically, the area is a coastal floodplain that regularly experiences tidal action. About 74 % of this area is under water supply coverage provided by several local and national NGOs.

Fig. 1 Location map showing the sample sites in the study area

Sample collection and preparation

Groundwater samples are collected from 70 preselected sampling points in the Sadar upazila of Lakshimpur district of Bangladesh (Fig. 1). The sampling locations are recorded with a GPS device (Explorist model: 200). Information regarding well depths is collected from the records preserved by the well owners and local government offices. Three types of tubewells, namely (1) shallow wells (10–60 m depth), (2) deep wells (80–375 m depth), and (3) dug wells (7–318 m), have been selected on the basis of their availability in the study area. Samples are collected in pre-washed high-density polypropylene (HDPP) bottles following the standard method of APHA-AWWA-WEF (2005). For metal ion and dissolved organic carbon (DOC) analysis, water samples are preserved following the standard procedures of Rahman and Gagnon (2014). All analytical procedures for the groundwater samples follow the standard methods (Table 1). The accuracy and precision of the analysis are tested by running duplicate analyses on selected samples, and the average results of all analyses are used to represent the data.

Table 1 List of chemical elemental analysis, methods, and equipment

Groundwater pollution evaluation indices

Groundwater quality index (GWQI)

The groundwater quality index (GWQI) method reflects the composite influence of the different water quality parameters on the suitability of water for drinking purposes (Sahu and Sikdar 2008). The groundwater quality has been measured using the following equation for GWQIs (Vasanthavigar et al. 2010) with respect to WHO (2011) and Bangladesh standards (1997).

$${\text{GWQI}} = \sum\limits_{i = 1}^{n} {{\text{SI}}_{i} } = \sum\limits_{i = 1}^{n} {(W_{i} \times q_{i} )} = \sum\limits_{i = 1}^{n} {\left( {\frac{{w_{i} }}{{\sum\nolimits_{j = 1}^{n} {w_{j} } }} \times \frac{{C_{i} }}{{S_{i} }} \times 100} \right)}$$

where \(C_{i}\) is the concentration of each parameter, \(S_{i}\) is the limit value, \(w_{i}\) is the weight assigned according to the relative importance of the parameter in the overall quality of water for drinking purposes (Table 2), \(q_{i}\) is the water quality rating, \(W_{i}\) is the relative weight, and \({\text{SI}}_{i}\) is the subindex of the ith parameter.

The heavy metal pollution index (HPI) method has been developed by assigning a rating or weightage (\(W_{i}\)) to each chosen parameter (As, Pb, Fe, Mn, Ni, Sb, Ba, Mo, Al, Zn) and selecting the groundwater parameters on which the index is to be based (Bhuiyan et al. 2010). The rating is a value between zero and one, and its selection reflects the significance of each water quality parameter. The index is based on the monitored values, ideal values, and recommended standard values of the studied parameters. The weightage can be defined as inversely proportional to the recommended standard (\(S_{i}\)) for each parameter (Horton 1965; Reddy 1995; Mohan et al. 1996). The concentration limits, i.e., the highest permissible value for drinking water (\(S_{i}\)) and the maximum desirable value (\(I_{i}\)) for each parameter, are taken from the Indian drinking water specification standards of 2012 (BIS 2012) for this study. The HPI is computed using the following expression (Mohan et al. 1996; Bhuiyan et al. 2015):

$${\text{HPI}} = \frac{{\sum\nolimits_{i = 1}^{n} {W_{i} Q_{i} } }}{{\sum\nolimits_{i = 1}^{n} {W_{i} } }}$$

where \(Q_{i}\) is the subindex of the ith parameter, \(W_{i}\) is the unit weight of the ith parameter, and n is the number of parameters. The subindex \(Q_{i}\) is computed by

$$Q_{i} = \frac{{\left| {M_{i} - I_{i} } \right|}}{{S_{i} - I_{i} }} \times 100$$

where \(M_{i}\), \(I_{i}\), and \(S_{i}\) stand for the monitored, ideal, and standard values of the ith parameter, respectively. The absolute value denotes the numerical difference between the two values, ignoring the algebraic sign.
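
As an illustration only, the GWQI and HPI expressions above can be sketched in Python. The function names and all numerical values below are hypothetical and are not taken from Table 2 or from the measured data; the unit weight is assumed to be \(W_{i} = 1/S_{i}\), which is one common reading of a weight that is inversely proportional to the recommended standard.

```python
def gwqi(conc, limit, weight):
    """GWQI = sum_i (w_i / sum_j w_j) * (C_i / S_i * 100)."""
    total_w = sum(weight.values())
    return sum((weight[k] / total_w) * (conc[k] / limit[k] * 100.0) for k in conc)


def hpi(monitored, ideal, standard):
    """HPI = sum(W_i * Q_i) / sum(W_i), with Q_i = |M_i - I_i| / (S_i - I_i) * 100."""
    num = 0.0
    den = 0.0
    for k in monitored:
        w = 1.0 / standard[k]  # assumption: unit weight W_i = 1 / S_i
        q = abs(monitored[k] - ideal[k]) / (standard[k] - ideal[k]) * 100.0
        num += w * q
        den += w
    return num / den


# Hypothetical concentrations, limits (ug/L), and weights, for illustration only.
conc = {"As": 20.0, "Mn": 200.0}
limit = {"As": 10.0, "Mn": 400.0}
weight = {"As": 5, "Mn": 4}

print(round(gwqi(conc, limit, weight), 2))
print(round(hpi(conc, {"As": 0.0, "Mn": 0.0}, limit), 2))
```

Because both indices are weighted averages of concentration-to-standard ratios, a parameter far above its limit (As in this made-up example) dominates the result.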

Table 2 List of parameters, weight factors, and limit values for the water quality index after Vasanthavigar et al. (2010)

The heavy metal evaluation index (HEI) method provides an insight into the overall quality of the groundwater with respect to heavy metals and metalloids (As, Pb, Fe, Mn, Ni, Sb, Ba, Mo, Al, Zn) (Edet and Offiong 2002). It is calculated following Prasad and Jaiprakas (1999) as follows:

$${\text{HEI}} = \sum\limits_{i = 1}^{n} {\frac{{H_{c} }}{{H_{\text{mac}} }}}$$

where \(H_{c}\) is the monitored value and \(H_{\text{mac}}\) is the maximum admissible concentration (MAC) of the ith parameter.
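
A minimal sketch of the HEI sum, assuming hypothetical monitored values and MAC limits (neither is taken from the study data):

```python
def hei(monitored, mac):
    """HEI = sum over parameters of (monitored value / maximum admissible concentration)."""
    return sum(monitored[k] / mac[k] for k in monitored)


# Hypothetical monitored values and MAC limits (ug/L), for illustration only.
sample = {"As": 20.0, "Pb": 5.0, "Mn": 200.0}
mac = {"As": 10.0, "Pb": 10.0, "Mn": 400.0}

print(hei(sample, mac))
```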

The degree of contamination (\(C_{d}\), also denoted CD) has been adopted from Backman et al. (1997). Following Prasad and Bose (2001), it evaluates the combined effects of several quality parameters that are considered detrimental to household water. It is determined by:

$$C_{d} = \sum\limits_{i = 1}^{n} {C_{{f_{i} }} }$$

where

$$C_{{f_{i} }} = \frac{{C_{ai} }}{{C_{ni} }} - 1$$

\(C_{f_{i}}\) is the contamination factor, \(C_{ai}\) is the analytical value, and \(C_{ni}\) is the upper permissible concentration for the ith component, where the subscript n indicates the normative value. Here, \(C_{ni}\) is taken as the maximum admissible concentration (MAC).
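
The contamination factor and the degree of contamination can be sketched as below. The sample values and MAC limits are hypothetical, and the function names are ours.

```python
def contamination_factors(analytical, mac):
    """C_f_i = C_a_i / C_n_i - 1 for each parameter."""
    return {k: analytical[k] / mac[k] - 1.0 for k in analytical}


def degree_of_contamination(analytical, mac):
    """C_d = sum of the contamination factors over all parameters."""
    return sum(contamination_factors(analytical, mac).values())


# Hypothetical analytical values and MAC limits (ug/L), for illustration only.
sample_cd = {"As": 30.0, "Pb": 5.0, "Mn": 600.0}
mac_cd = {"As": 10.0, "Pb": 10.0, "Mn": 400.0}

print(contamination_factors(sample_cd, mac_cd))
print(degree_of_contamination(sample_cd, mac_cd))
```

As the formula is written, \(C_{d}\) equals the sum of the concentration-to-MAC ratios minus the number of parameters, so it differs from an HEI computed with the same MAC values only by the constant n.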

Multivariate statistical analysis

Principal component analysis (PCA) reduces the dimensionality of data by a linear combination of original data to generate new latent variables which are orthogonal and uncorrelated to each other (Nkansah et al. 2010). It extracts the eigenvalues and eigenvectors from the covariance matrix of original variables (Chabukdhara and Nema 2012). The eigenvalues of the PCs are the measure of their associated variance, the participation of original variables in the PCs is given by the loadings, and the coordinates of the objects are called scores (Helena et al. 2000; Wunderlin et al. 2001; Heberger et al. 2005). PCA provides an objective way of finding indices of this type so that the variation in the data can be accounted for as concisely as possible (Sarbu and Pop 2005). PCA has been performed to extract principal components (PC) from the sampling points and to evaluate spatial variations and possible sources of heavy metals in groundwater.

Factor analysis is similar to PCA except for the preparation of observed correlation matrix for extraction and the underlying theory (Tabachnick and Fidell 2007; Bhuiyan et al. 2011a). The goal of FA can be achieved by rotating the axis defined by PCA, according to the well-established rules, and constructing new variables, also called varifactors (VF) (Shrestha and Kazama 2007).

The correlation coefficient matrix measures how well the variance of each constituent can be explained by relationship with each other (Liu et al. 2003). According to the approach of Liu et al. (2003), the terms “strong,” “moderate,” and “weak” are applied to factor loadings and refer to absolute loading values as >0.75, 0.75–0.50, and 0.50–0.30, respectively.
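
The loading classes of Liu et al. (2003) quoted above translate into a simple lookup. The treatment of the exact boundary values (0.75, 0.50, 0.30) and the label used below 0.30 are our assumptions, since the text does not specify them.

```python
def classify_loading(loading):
    """Label a factor loading by its absolute value."""
    a = abs(loading)
    if a > 0.75:
        return "strong"
    if a >= 0.50:
        return "moderate"
    if a >= 0.30:
        return "weak"
    return "negligible"  # assumption: the text gives no label below 0.30


for value in (-0.80, 0.60, 0.30, 0.10):
    print(value, classify_loading(value))
```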

Cluster analysis (CA) is applied to identify groups or clusters of similar sites on the basis of similarities within a class and dissimilarities between different classes (Lattin et al. 2003). Hierarchical cluster analysis (HCA) examines the distances between the parameters of samples. The most similar points are grouped to form one cluster, and the process is repeated until all points belong to one cluster (Danielsson et al. 1999; Birth 2003). The result is shown in a 2D plot called a dendrogram. In this study, Ward’s method with squared Euclidean distances is used. The experimental groundwater data are subjected to statistical analysis using SPSS software (version 22.0). Pearson’s correlation matrix is used to identify the relationships among the pairs of parameters.

Geostatistical modeling

Ordinary kriging (OK) and semivariogram models are applied to model the spatial distribution of groundwater parameters, following their established use in hydrological studies. These interpolation techniques are well documented in the recent literature (e.g., Masoud and Atwia 2010; Ahmed et al. 2011; Masoud 2014; Marko et al. 2014). Kriging is one of the most popular and robust interpolation techniques. It integrates both spatial correlation and dependence in the prediction of a variable. Estimations of nearly all spatial interpolation methods can be represented as weighted averages of sampled data. The equation can be written as follows (Delhomme 1978):

$$\hat{z}(x_{o} ) = \sum\limits_{i = 1}^{n} {\lambda_{i} z(x_{i} )}$$

where \(\hat{z}\) is the estimated value of an attribute at the point of interest \(x_{o}\), z is the observed value at the sampled point \(x_{i}\), \(\lambda_{i}\) is the weight assigned to the sampled point, and n represents the number of sampling points used for the estimation (Webster and Oliver 2001). The attribute is usually called the primary variable, especially in geostatistics. The semivariance can be estimated from groundwater data by the following equation:

$$\gamma (h) = \frac{1}{2n}\sum\limits_{i = 1}^{n} {\left[ {z(x_{i} ) - z(x_{i} + h)} \right]^{2} }$$

where \(n\) is the number of pairs of sample points separated by the distance called lag h (Burrough and McDonnell 1998), and \(z(x_{i} )\) is the value of variable z at location \(x_{i}\). Variogram modeling and estimation are important for structural analysis and spatial interpolation. Among the different kriging techniques, OK has been used in this study because of its ease of calculation and its prediction accuracy compared to the other kriging methods (Gorai and Kumar 2013). Different semivariogram models such as linear, exponential, and spherical models are widely used for spatial analysis of geochemical data sets (Kitanidis 1997; Elogne et al. 2008; Varouchakis and Hristopulos 2013). In this study, circular, spherical, exponential, and Gaussian models have been used to measure spatial autocorrelation or dependence of the groundwater data. The best fit theoretical semivariogram models are selected on a trial-and-error basis. Predictive performances of the fitted models are checked on the basis of cross-validation tests (Gorai and Kumar 2013). The mean error (ME), mean standardized error (MSE), root mean square error (RMSE), average standard error (ASE), and root mean square standardized error (RMSSE) values are assessed to establish the performance of the fitted models. Hu et al. (2004) have discussed several criteria for using error measurements to judge the performance of spatial interpolation methods. A model is considered the best fit when ME and MSE are close to zero, RMSE and ASE are as small as possible, and RMSSE is close to unity (ESRI 2009). The errors are estimated by the following equations:

$${\text{ME}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {(p_{i} - o_{i} )}$$
$${\text{RMSE}} = \sqrt {\left[ {\frac{1}{n}\sum\limits_{i = 1}^{n} {(p_{i} - o_{i} )^{2} } } \right]}$$
$${\text{MSE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {(ps_{i} - os_{i} )}$$
$${\text{RMSSE}} = \sqrt {\left[ {\frac{1}{n}\sum\limits_{i = 1}^{n} {(ps_{i} - os_{i} )^{2} } } \right]}$$
$${\text{ASE}} = \sqrt {\left[ {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {p_{i} - \frac{1}{n}\sum\limits_{j = 1}^{n} {p_{j} } } \right)^{2} } } \right]}$$

where n is the number of observation points or samples; \(o_{i}\) and \(p_{i}\) are the observed and predicted (estimated) values at location i; \(os_{i}\) is the standardized observed value; and \(ps_{i}\) is the standardized predicted value. After completing the cross-validation process, kriging offers a graphical representation of the distribution of groundwater quality. In this study, ArcGIS (version 10.2) has been used for this interpolation technique.
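
The five cross-validation statistics can be computed directly from paired predicted and observed values. The sketch below follows the expressions exactly as given above; the function name and the input numbers are hypothetical, and the standardized values are assumed to be supplied by the caller rather than derived here.

```python
import math


def cross_validation_errors(pred, obs, pred_std, obs_std):
    """Return ME, RMSE, MSE, RMSSE, and ASE as defined in the equations above."""
    n = len(obs)
    me = sum(p - o for p, o in zip(pred, obs)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)
    # Standardized counterparts use the standardized predicted/observed values.
    mse = sum(ps - os_ for ps, os_ in zip(pred_std, obs_std)) / n
    rmsse = math.sqrt(sum((ps - os_) ** 2 for ps, os_ in zip(pred_std, obs_std)) / n)
    # ASE as written above: spread of the predictions around their own mean.
    mean_p = sum(pred) / n
    ase = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    return {"ME": me, "RMSE": rmse, "MSE": mse, "RMSSE": rmsse, "ASE": ase}


# Hypothetical predicted/observed values, for illustration only.
errors = cross_validation_errors(
    pred=[1.0, 2.0, 3.0],
    obs=[1.0, 2.0, 4.0],
    pred_std=[0.0, 0.5, -0.5],
    obs_std=[0.0, 0.0, 0.0],
)
for name, value in errors.items():
    print(name, round(value, 4))
```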

Results and discussion

Groundwater quality

GWQI values are calculated using different International Standards and BMAC values to determine the suitability of groundwater for drinking purposes (Table 3). The GWQI values range from 24.73 to 430.94 with a mean of 113.56. The critical limit (100) for drinking water purposes has been proposed by Vasanthavigar et al. (2010). The results in Table 4 show that 50 % of samples exceed the critical GWQI limit (100). Among the total samples, 17 % have excellent water quality, 33 % good, 40 % poor, and 7 % very poor, and the remaining 3 % are unsuitable for drinking purposes (Table 4).

Table 3 Descriptive statistics of physicochemical parameters and heavy metals in the study area
Table 4 Pollution potential of groundwater samples of the study area based on different quality indices

The degree of contamination (CD) has been used for estimating the extent of metal pollution (Al-Ami et al. 1987; Bhuiyan et al. 2010). Table 5 shows that the range and mean of CD for the groundwater samples are 0.17–66.23 and 11.17, respectively. CD may be categorized (Backman et al. 1997; Edet and Offiong 2002) as follows: low (CD < 1), medium (CD = 1–3), and high (CD > 3). Among the total samples, 77 % fall above the critical value, i.e., CD > 3. According to the classification of Edet and Offiong (2002), most of the samples are therefore considered highly polluted water. Besides the CD values, the water samples are further analyzed by the HPI and HEI methods for comparison with the CD results. The heavy metal pollution indices (HPI) are computed using the International Standard (BIS 2012) values of metal content in groundwater samples. The range and mean values of HPI in the groundwater samples are 2.19–59.01 and 10.41, respectively. However, HPI and HEI values for all sample locations fall below the critical values prescribed for drinking purposes (Table 5). The heavy metal evaluation index (HEI) is used to synchronize the criteria for the various pollution indices. The HEI criteria for groundwater samples are classified as low (HEI < 40), medium (HEI = 40–80), and high (HEI > 80). On this basis, groundwater in the study area exhibits a low level of pollution. The results of the GWQI, CD, HPI, and HEI methods show more or less similar trends for most of the samples (Fig. 2). The GWQI values show higher spatial variation, whereas the HEI values show lower variation. We have also assessed the relationship between metal concentrations and the computed indices (GWQIs, CD, HPI, and HEI in Table 5). The GWQIs show significant positive correlations with CD, HPI, and HEI. As, Fe, Mn, and Ni show strong positive correlations (Table 6) with the index values, indicating that these metals are the major contributors to pollution in this region.
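
The CD and HEI classes quoted above can be expressed as two small functions. The thresholds come from the text; how the exact boundary values (1, 3, 40, 80) are assigned is our assumption.

```python
def classify_cd(cd):
    """Low (CD < 1), medium (1-3), high (CD > 3)."""
    if cd < 1:
        return "low"
    if cd <= 3:
        return "medium"
    return "high"


def classify_hei(hei_value):
    """Low (HEI < 40), medium (40-80), high (HEI > 80)."""
    if hei_value < 40:
        return "low"
    if hei_value <= 80:
        return "medium"
    return "high"


# Minimum, mean, and maximum CD reported in the text.
for cd in (0.17, 11.17, 66.23):
    print(cd, classify_cd(cd))
```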

Table 5 Groundwater pollution indices
Fig. 2 Spatial variation in groundwater quality index values in the study area

Table 6 Correlation coefficient matrix for indices values and metal concentration

Source identification of groundwater pollutants

Principal component analysis (PCA) is used for source identification of heavy metals following the standard procedures (Dragovíc et al. 2008; Franco-Uría et al. 2009). Varimax rotation is used to maximize the sum of variance of the factor coefficients which better explains the possible groups/sources that influence water systems (Gotelli and Ellison 2004). Six factors with eigenvalues >1 are extracted for groundwater data sets which represent 80.61 % of the total variance. The scree plot is used to identify the number of PCs to be retained to understand the underlying parameters’ structure (Fig. 3a). The calculated factor loadings together with cumulative percentage and percentages of variance are explained by each factor as listed in Table 7. The positive and negative scores in PCA indicate that most of the water samples are either essentially affected or unaffected by the presence of extracted loads on a specific factor/component, respectively. About 58.38 % of the total variance is represented in the first three loading factors (Fig. 3b). In this study, PC1, PC2, PC3, PC4, PC5, and PC6 explain more than 26, 17, 14, 8, 7, and 6 % of the total variance, respectively.
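
The retention rule used above (eigenvalues > 1) and the percentage of variance explained can be sketched as follows. The eigenvalues are hypothetical and are not those of Table 7; the sketch assumes that the eigenvalues of all components are supplied, so that their sum equals the total variance.

```python
def retain_components(eigenvalues, threshold=1.0):
    """Keep components with eigenvalue > threshold and report explained variance (%)."""
    total = sum(eigenvalues)
    kept = [ev for ev in sorted(eigenvalues, reverse=True) if ev > threshold]
    explained = [100.0 * ev / total for ev in kept]
    return kept, explained, sum(explained)


# Hypothetical eigenvalues for a six-variable data set, for illustration only.
kept, explained, cumulative = retain_components([3.0, 2.0, 1.5, 0.8, 0.5, 0.2])
print(kept)
print(explained)
print(cumulative)
```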

Fig. 3 a Scree plot of the characteristic roots (eigenvalues) of principal component analysis. b Component plot in rotated space of principal component analysis

Table 7 Varimax rotated principal component analysis (R-mode) for groundwater samples

The first principal component (PC1) of the groundwater data set explains 26.87 % of the total variance. It is loaded with Na, EC, Cl, B, Mg, K, Ca, SO4, and HCO3. This factor is attributed to the natural hydrogeochemical evolution of groundwater through groundwater–geological medium interaction (Omo-Irabor et al. 2008). PC2, explaining 17 % of the total variance, is loaded with P, As, DOC, NH4–N, Mo, Fe, and HCO3 (Table 7). PC2 reflects the leaching of materials, mainly trace elements, from the soil horizon to the aquifer; these are regarded as nonpoint pollutant sources, along with partial natural weathering processes. Arsenic comes under this category, i.e., geogenic processes augmented by human-induced activities such as fertilizer use and animal feeding (Bhuiyan et al. 2011b). The R-mode results thus reflect the influence of both geogenic and anthropogenic activities. Excessive NH4–N has been found to result from the use of chemical fertilizer in agricultural fields (Backman et al. 1998). PC3, accounting for 14 % of the total variance, is associated with Fe, Ba, and Si, indicating geogenic factors. PC4 is moderately loaded with Mn, accounts for 8.5 % of the total variance, and is related to geogenic factors. Generally, Mn occurs naturally as a mineral in sedimentary rocks or originates from mining and industrial waste products (Bhuiyan et al. 2010). Bacterial activities are also responsible for releasing Fe and Mn (Bromfield 1978; Bhattacharya et al. 2002; Naujokas et al. 2013; Islam et al. 2013). PC5 has strong loadings of Al and Pb with 8 % of the total variance, indicating anthropogenic pollution from domestic and agricultural sources. Natural causes such as geogenic processes, along with anthropogenic activities (such as industrialization), control these enrichments, as shown by the high PC5 scores of samples S12 and S63. PC6 has moderate to strong loadings of Ni and Sb, which are linked to releases from stainless steel and alloy product industries.

Spatial similarities and sampling site grouping

In this study, the CA results strongly agree with the PCA results. The R-mode CA performed on the groundwater samples retains four main clusters for the analyzed parameters (Fig. 4a). Elements belonging to the same cluster are likely to have originated from a common source. Cluster 1 includes EC, Na, Cl, Mg, SO4, K, HCO3, and B, which may be explained by a combination of nonpoint sources and leaching of fertilizers from the soil horizon to the aquifer. Cluster 2 consists of Mn, Zn, Ca, Ni, Fe, and Ba, reflecting the influence of domestic and agricultural pollution (Omo-Irabor et al. 2008). Cluster 3 includes As, DOC, NH4–N, P, Mo, and pH and is explained by the dissolution of minerals under basic conditions. Cluster 4 includes Pb, Al, Si, and Sb, which represent both anthropogenic and geogenic activities. Although there are some variations between the results of CA and PCA, good agreement between the two statistical techniques is evident in all the data sets analyzed.

Fig. 4 a Dendrogram showing the hierarchical clusters of analyzed parameters. b Dendrogram showing the hierarchical clusters of analyzed sample sites. Dashed lines in the dendrograms represent Phenon lines

Q-mode CA has been applied to detect the spatial similarities and site grouping among the sampling points (Table 8). Samples that share similar characteristics with respect to the analyzed parameters are clustered in the same group. The 70 sampling sites fall into four clusters (Fig. 4b). Cluster 1 consists of 41 sampling points: S4–S12, S15–S18, S20, S23, S24, S27, S33, S35, S38–S41, S43–S49, S51–S55, S58, S62, S63, S65, S66, and S69. Among these sites, strong correlations exist between the element pairs Ca and Cl (r = 0.679, p < 0.01) and Mg and Na (r = 0.786, p < 0.01). It is worth noting that moderate concentrations of Ca, Mg, K, SO4, Cl, HCO3, EC, B, and Na have been observed at most of the stations. Cluster 2 contains only four samples: S19, S56, S64, and S67. Cluster 3 includes the following 12 sample sites: S1–S3, S14, S22, S25, S26, S30, S31, S34, S36, and S37. Cluster 4 consists of 13 sites: S13, S21, S28, S29, S32, S42, S50, S57, S59, S60, S61, S68, and S70. The relationships among the analyzed parameters are also visualized in the factor loading plots of PC1 versus PC2 and PC1 versus PC3 (Figs. 5, 6). Five major clusters are obtained for all parameters from the plot of PC1 versus PC2 (Fig. 5a). Cluster 1 contains As, P, DOC, Mo, and Fe. Cluster 2 consists of F, Al, pH, Ni, Zn, Mn, Ba, Sb, and Pb. Cluster 3 includes B, Mg, Be, Na, Cl, K, and SO4. NH4–N, HCO3, and Ca belong to cluster 4, whereas Si remains alone in cluster 5. Nearly the same groupings of parameters are observed in the plot of PC1 versus PC3 (Fig. 5b), with the exception of the groupings of Si and pH. For the sampling sites, the PC1 versus PC2 and PC1 versus PC3 plots indicate common sources and show four distinct clusters (Fig. 6a, b), which differ from the clustering shown in Fig. 5a, b.

Table 8 Varimax rotated principal component (Q-mode) analysis for groundwater samples
Fig. 5 Plots of first three principal component loadings. a PC1 versus PC2 and b PC1 versus PC3 for all analyzed parameters

Fig. 6 Plots of first three principal component loadings. a PC1 versus PC2 and b PC1 versus PC3 for all analyzed sampling sites

Correlation matrix analysis

Pearson’s correlation coefficient matrix is generated in order to identify the relations among the parameters and the sources of groundwater pollution (Table 9). The correlation matrix shows that the inter-parameter relationships agree with the results obtained from PCA. It also shows some new associations between the parameters that are not adequately reported in the previous sections. Strong (p < 0.01) and significant (p < 0.05) correlations are observed among most of the parameters of the groundwater samples. EC has strong positive correlations at p < 0.01 with Na+ (r = 0.959), K+ (r = 0.682), Mg2+ (r = 0.847), Ca2+ (r = 0.645), NH4–N (r = 0.538), SO4 2− (r = 0.685), HCO3 (r = 0.608), Cl (r = 0.939), and B (r = 0.842), and these parameters are also positively correlated with each other (Table 9). These associations indicate mixed sources of geogenic/anthropogenic origin, as described in the PC1 section. Na+, Cl, Mg2+, and K+ are the main constituents of groundwater as a result of interaction with minerals or trapped saline fluids in aquifers and chemical weathering of catchment rocks. The partially acidic nature of groundwater is due to leaching of altered rocks and anthropogenic sources. DOC shows significant correlations with As (r = 0.617, p < 0.01), Mo (r = 0.479, p < 0.05), Fe (r = 0.414, p < 0.05), NH4–N (r = 0.422, p < 0.05), and P (r = 0.583, p < 0.01). These results are similar to those for PC2 and are attributed to geogenic sources from the basement rocks. Harvey et al. (2002) have reported that DOC in groundwater in Bangladesh is positively correlated with As and have concluded that the mobility of As is closely related to the recent inflow of carbon or to desorption of As by carbonate rock. pH shows significant negative correlations with Fe (r = −0.488, p < 0.05) and Si (r = −0.696, p < 0.01). The metal pairs Fe–Ni, Fe–Ba, and Fe–P show significant correlations (at p < 0.05) with correlation coefficients (r) of 0.493, 0.609, and 0.458, respectively, indicating similar sources as mentioned in the PC3 section. Similar observations are made by Chapagain et al. (2010) for deep groundwater quality in Kathmandu, Nepal, where the occurrence of heavy metals is possibly influenced by redox levels and the nature of the underlying sediment (i.e., mineral composition and organic matter content). A poor correlation exists between As and Fe (r = 0.305, p < 0.05) and between As and Mn (r = 0.142, p < 0.05), whereas As shows significant correlations with P (r = 0.773, p < 0.01) and Mo (r = 0.543, p < 0.01). The weak correlations of As with Fe and Mn in groundwater are attributed to the precipitation of dissolved Fe and Mn as siderite (FeCO3) and rhodochrosite (MnCO3) solids under reducing conditions (Reza et al. 2010a). High As, low Fe, and low Mn in groundwater of the Meghna flood plain in the southeastern part of Bangladesh are reported by Reza et al. (2010b). However, their findings differ from the observations in this study, where very high concentrations of Fe, Mn, and As are observed. A strong positive correlation is observed between Pb and Al (r = 0.843, p < 0.01), indicating similar sources as observed in PC5. Ni exhibits a significant correlation with Sb (r = 0.480, p < 0.05), as observed in the PC6 section.
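
For reference, the Pearson coefficient reported throughout this section is the covariance of two parameters divided by the product of their standard deviations. A minimal sketch with hypothetical paired values (not the study data) is given below; significance testing (the p values) is not covered.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements of two parameters, for illustration only.
param_a = [1.0, 2.0, 3.0, 4.0]
param_b = [1.0, 3.0, 2.0, 4.0]
print(round(pearson_r(param_a, param_b), 3))
```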

Table 9 Pearson’s correlation matrix among physicochemical parameters and metals in the analyzed samples

Geostatistical modeling

Semivariogram models are employed after normalizing the data using ArcGIS (version 10.2). Among other models, OK (ordinary kriging) is applied in this study. The nugget, sill, and range values of the best fit semivariogram models for the quality parameters are shown in Table 10. Extremely low nugget effects and the absence of variability in groundwater elevation at short distances demonstrate that small-scale variability and measurement error are insignificant. Thus, the fitted semivariograms represent the spatial structure of these variables in the groundwater very well. The error statistics are estimated to ascertain how well the observed values are reproduced by the theoretical semivariogram models and the developed spatial maps (Table 10). The best fit semivariogram model is chosen based on the ME, MSE, RMSE, RMSSE, and ASE criteria. A model is considered robust and accurate when ME and MSE are close to zero, RMSE and ASE are small, and RMSSE is close to 1 (Adhikary et al. 2010). The range of a semivariogram is a measure of the distance over which autocorrelation exists (Li et al. 2009). The distance at which the model first flattens out is known as the range, which varies among the groundwater quality indices. The ranges of the semivariograms for all the variables lie between 0.5 and 22 km, with the lowest range for PC1 and the highest range for GWQIs.
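
The experimental semivariance to which these models are fitted can be sketched from the estimator \(\gamma(h)\) defined in the geostatistical modeling section: half the mean squared difference between values at points separated by lag h. The coordinates, values, and the lag tolerance `tol` below are hypothetical; ArcGIS applies its own binning, which this sketch does not reproduce.

```python
import math


def empirical_semivariance(points, values, lag, tol):
    """Half the mean squared difference over point pairs whose separation is within lag +/- tol."""
    squared_sum = 0.0
    n_pairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            distance = math.dist(points[i], points[j])
            if abs(distance - lag) <= tol:
                squared_sum += (values[i] - values[j]) ** 2
                n_pairs += 1
    if n_pairs == 0:
        return None  # no pairs fall in this lag bin
    return squared_sum / (2 * n_pairs)


# Hypothetical sampling points (km) and index values, for illustration only.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [1.0, 2.0, 4.0, 7.0]
for h in (1.0, 2.0):
    print(h, empirical_semivariance(pts, vals, lag=h, tol=0.1))
```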

Table 10 Indices and PCs of best fit semivariogram models for groundwater parameters and their variance

The nugget/sill ratio represents the spatial dependence of groundwater quality parameters (Nayanaka et al. 2010). Three classes are used for model interpretation: a ratio of less than 25 % shows strong spatial dependence; a ratio between 25 and 75 % indicates moderate spatial dependence; and a ratio of more than 75 % represents weak spatial dependence. Figures 7 and 8 show experimental semivariograms (binned sign) around the omnidirectional semivariogram model (blue line) and the average of semivariogram models (plus sign). The best fit semivariogram models for the different groundwater quality parameters are shown in Figs. 7 and 8 and Table 10. The Gaussian semivariogram model is identified as the best fit model for CD, HPI, PC3, PC4, PC5, and PC6, while the circular semivariogram model fits best for GWQIs. The exponential semivariogram model is the best fit model for HEI and PC2, whereas the spherical model fits well for PC1. The results show that most of the variables are weakly spatially dependent, except PC1 and HEI. PC1 shows strong spatial dependence, whereas HEI shows moderate spatial dependence (Fig. 7). The weak spatial dependence is mostly expressed as a large nugget effect in the semivariogram shape (Fig. 8) and is attributed to the high variability of the topography of groundwater, which varies across agricultural, residential, and industrial areas.
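
The three-class rule above can be written as a short function. The nugget and sill values are hypothetical, `sill` is assumed to be the total sill (nugget plus partial sill), and the assignment of the exact boundary values (25 and 75 %) is our assumption.

```python
def spatial_dependence(nugget, sill):
    """Classify spatial dependence from the nugget/sill ratio expressed in percent."""
    ratio = 100.0 * nugget / sill
    if ratio < 25:
        return "strong"
    if ratio <= 75:
        return "moderate"
    return "weak"


# Hypothetical nugget/sill pairs, for illustration only.
for nugget, sill in ((1.0, 10.0), (5.0, 10.0), (9.0, 10.0)):
    print(nugget, sill, spatial_dependence(nugget, sill))
```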

Fig. 7

Best fit semivariogram models of groundwater quality evaluation indices in the study area. a GWQI, b CD/C_d, c HPI, and d HEI

Fig. 8

Best fit semivariogram models of groundwater parameters in the study area. a PC1, b PC2, c PC3, d PC4, e PC5, and f PC6

The OK interpolation technique is applied to develop spatial distribution maps of the groundwater data set (n = 70) for each groundwater pollution index and each PC score. After cross-validation of the best fit models, maps of kriged estimates are generated, which provide a visual representation of the distribution across the groundwater samples. GWQIs, CD, HPI, and HEI exhibit broadly similar distribution patterns, with scores increasing from the southeast to the northwest of the study area (Fig. 9). In general, metal contamination of groundwater is attributed to anthropogenic sources. In the study area, however, both geogenic and anthropogenic sources are likely to contribute to the high scores of GWQIs, CD, HPI, and HEI. High scores are observed in the northern and northwestern parts of the sampling area and low scores in the southeastern part, suggesting the existence of similar point sources. The strong significant correlations among GWQIs, CD, HPI, and HEI found earlier are consistent with these similar distribution patterns.
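To make the interpolation step concrete, the following Python sketch solves the ordinary kriging system for a single target location using only the standard library. It is an illustration under stated assumptions, not the ArcGIS implementation used in the study. The spherical model parameters, the coordinates, and the values are hypothetical, and the function names are introduced here for illustration only.

```python
import math

def spherical(h, nugget, partial_sill, rng):
    """Spherical semivariogram; gamma(0) is 0 by definition."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Return (estimate, kriging variance, weights) at `target`."""
    n = len(points)
    # Semivariances between samples, bordered by the unbiasedness constraint.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights

# Hypothetical index scores at the corners of a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
gamma = lambda h: spherical(h, nugget=0.1, partial_sill=1.0, rng=5.0)
estimate, variance, weights = ordinary_kriging(points, values, (0.5, 0.5), gamma)
print(round(estimate, 3), round(variance, 3), [round(w, 3) for w in weights])
```

Because the target sits at the center of the square, symmetry gives the four samples equal weight, so the estimate is the plain average of the values. Moving the target toward one corner shifts weight to the nearest samples, which is the behavior the kriged maps rely on.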

Fig. 9

Maps showing the spatial distribution of the four index scores obtained from the quality evaluation indices of the groundwater samples. a GWQI, b CD/C_d, c HPI, and d HEI. Here, both HPI and HEI are plotted considering As and other metals

The spatial distribution of PC1 scores in Fig. 10a reveals a complex pattern in which high scores (i.e., values from 0.42 to 3.93) are mostly observed in the southern parts and low scores are found in the northern and central parts. The higher PC1 scores are probably due to rock–water interaction and ion exchange in groundwater (Guler et al. 2012). Similar findings have been obtained in recent studies conducted in various regions of the world (e.g., Fernandes et al. 2008; Guler et al. 2012). The spatial distribution of PC2 scores shows high scores (i.e., values from 0.02 to 3.52) in the northern part and low scores (i.e., values from −1.26 to −0.38) in the southern part (Fig. 10b). This variation is influenced by the dominance of nonpoint sources such as fertilizer and sewage discharges. For example, high NH4 and P in PC2 of the groundwater samples may be attributed to the use of fertilizers and urban runoff. Figure 10c shows the distribution of PC3 scores, where moderately higher scores (i.e., values from 0.23 to 1.91) are observed in the northern corner of the study area and lower scores are found at the southern edge. PC3 contains very high scores of Fe, Ba, and Si, which are likely of geogenic origin. Figure 10d shows that the distribution of PC4 scores reverses the pattern of PC3: higher scores (i.e., values from 0.12 to 2.18) are observed in the southern part and lower scores in the northern part. PC4 comprises a high positive score of Mn and a negative score of F, reflecting a geogenic origin. The distribution of PC5 scores in Fig. 10e shows higher scores (i.e., values from 0.12 to 2.18) in the eastern part and lower scores in the western part of the study area. Overall, the map shows a decreasing trend toward the east. In PC5, the high score of Al is strongly correlated with Pb, indicating an anthropogenic origin. Figure 10f shows the spatial distribution of PC6 scores. The highest scores (i.e., values from 1.00 to 7.25) are found in the central part, and this component comprises high scores of Sb and Ni. It is attributed to wastewater discharge from urban areas.

Fig. 10

Maps showing the spatial distribution of six PC scores. a PC1: Na, K, Ca, Mg, Cl, SO4, HCO3, EC, and B; b PC2: P, Mo, As, NH4N, DOC, and Fe; c PC3: Fe, Ba, and Si; d PC4: Mn and Fe; e PC5: Pb and Al; and f PC6: Ni and Sb

Among the geostatistical models, OK is considered an effective tool to support preliminary decision making in groundwater quality management in southeastern Bangladesh. The best fit semivariogram models are selected based on the ME, MSE, RMSE, RMSSE, and ASE criteria, and the selected model varies for each PC and index. The Gaussian semivariogram model is the best fit for CD, HPI, PC3, PC4, PC5, and PC6, while the circular semivariogram model fits best for GWQIs. The exponential semivariogram model is the best fit for HEI and PC2, whereas the spherical model fits PC1 well. Most of the variables have weak spatial dependence, except PC1 and HEI: PC1 shows strong spatial dependence, whereas HEI shows moderate spatial dependence. The weak spatial dependence indicates high variability of topography in the groundwater system (e.g., across agricultural, residential, and industrial areas). The cross-validation indicates that the kriging method is the most accurate interpolation technique for the study. The spatial distribution maps generated with the OK method show the variation in the different indices across the study area. From these maps, six PC groups (PC1–PC6) and four indices (GWQIs, CD, HPI, and HEI) with different spatial variability patterns can be distinguished (Figs. 9, 10). The index scores exhibit broadly similar distribution patterns, increasing from the southeast to the northwest, with high scores of GWQIs, CD, HPI, and HEI in the northern and northwestern parts of the study area (Fig. 9a–d). Low scores are found in the southeastern part, which suggests the existence of similar point sources. The spatial analysis of GWQIs also indicates that water quality is poor in the western and northwestern parts of the study area. The different distribution patterns of the PC scores imply the existence of different sources. The distribution maps of PC1 to PC6 show some anomalies relative to the mean scores of the groundwater variables (Fig. 10a–f). In particular, the salinity indicated by PC1 in the northwestern part of the study area is alarming, as households depend on groundwater for domestic purposes.

Conclusions

Integrated multivariate statistical and geostatistical techniques are employed in this study to evaluate the variations in groundwater quality in the Lakshimpur district of Bangladesh. Cluster analysis groups the 70 sampling points into four clusters of similar water quality characteristics. Based on this information, future surveys of the study area can be designed more easily, with optimal sampling strategies that reduce the number of sampling sites and the associated costs. Principal component analysis helps to identify the factors or sources responsible for water quality variations. This study illustrates the usefulness of multivariate statistical and geostatistical techniques for the analysis and interpretation of complex data sets, the identification of pollution sources, and the understanding of spatial variations in water quality for effective groundwater management. Moreover, the chemometric analyses reveal similarities and differences among the examined variables that are not evident from an examination of the analytical data shown in the tables. Ordinary kriging is an effective tool to support preliminary decision making in groundwater quality management in southeastern Bangladesh. Except for As, geogenic pollutants are not particularly alarming for groundwater consumption in the study area. However, some anthropogenic processes are a cause for concern.

The results of principal component/factor analysis indicate that anthropogenic sources (agrogenic, surface runoff, and domestic sewage) and natural/geogenic sources (weathering of source rock) are responsible for the variation in physicochemical parameters and metal contents in groundwater systems in the southeastern coastal region of Bangladesh. The resulting spatial distribution maps provide a helpful and robust visual tool for researchers and policy makers in defining adaptive measures. The study provides background information on physicochemical parameters, harmful metals, possible sources, and spatial variation in groundwater systems in the Lakshimpur district of Bangladesh.