Introduction

Heavy metal pollution of groundwater is an important environmental problem. Some heavy metals that are essential for human health can also cause various health problems when present at concentrations exceeding permissible limits. Heavy metal levels must therefore be evaluated to determine whether groundwater is suitable for drinking, agricultural, and industrial use. Like pesticides, heavy metals are problematic because they are generally non-biodegradable (Eugenia et al. 1996). Numerous studies have examined heavy metal contamination of groundwater and surface water in different parts of the world (Alam and Umar 2013; Kanmani and Gandhimathi 2013; Madhulakshmi et al. 2012; Tiwari et al. 2013). For example, Haloi and Sarma (2012) examined groundwater in the Barpeta District of Assam, India, and found Fe, Mn, and Pb concentrations above permissible limits for drinking water. Kumari et al. (2014) evaluated groundwater quality in industrial areas of Ghaziabad, India, and found that 12 elements exceeded prescribed standards. Similarly, Yılmaz et al. (2010) found that heavy metal contamination of groundwater in Turkey posed a significant risk for both drinking and irrigation.

Interpolation methods, which estimate unknown values from data observed at known locations, can successfully map the spatial distribution of various hydrochemical parameters (Quyang et al. 2013; Shan et al. 2013). Techniques such as inverse distance weighting (IDW), ordinary kriging (OK), and radial basis functions (RBF) have been used extensively in groundwater quality and pollution studies (Rabah et al. 2011). Yao et al. (2014) used eight different interpolation methods to investigate the spatial distribution of groundwater levels and found that kriging methods produced the best models. Noori et al. (2013) used several spatial interpolation methods (kriging, cokriging, RBF, IDW) to map the spatial distribution of groundwater levels in Iran, evaluating the performance of each method by root-mean-square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). Hua et al. (2009) compared IDW and OK estimates of groundwater depths in the Shule River Basin, China, and concluded that OK was more efficient than IDW because it yielded lower RMSE values.

Multivariate techniques are very useful for identifying relationships between variables. Methods such as cluster analysis (CA) and factor analysis (FA) have been used increasingly in groundwater measurement and monitoring studies. CA can be used to illustrate the overall similarity of variables in a data set and to link similar groups (Massart and Kaufmann 1983). FA can identify the underlying factors that explain correlations among variables, as well as the factors that contribute most to the data structure. CA and FA have been widely used to study heavy metals in groundwater (Brahman et al. 2013; Monjerezi et al. 2012). In a study evaluating groundwater quality in coastal areas of China using FA, Huang et al. (2013) identified four factors affecting groundwater quality: seawater intrusion and As contamination; water-rock interaction, surface water recharge, and acidic precipitation; heavy metal pollution from industry; and agricultural pollution and sewage intrusion.

Several recent studies have combined multivariate analysis and GIS techniques to evaluate groundwater (Lu et al. 2012; Arslan 2013). In addition to measuring heavy metal concentrations in the Çarşamba Plain in northern Turkey, the present study evaluated three interpolation methods and used the optimal method to map the spatial distribution of each heavy metal in the study area. Multivariate statistical analysis was then used to classify the study area into groups according to groundwater pollution by heavy metals and to identify the probable sources of these pollutants.

Materials and methods

Study area

The study was conducted in the Çarşamba Plain, one of Turkey’s largest plains, in the Middle Black Sea Region of northern Turkey (41° 11′–41° 23′ N, 36° 30′–37° 00′ E) (Fig. 1). Elevation ranges from 0 to 18 m above sea level, with slope gradients of approximately 1 %. The climate is semi-humid, with an average annual precipitation of about 700 mm and an average annual temperature of 17 °C. Cropping patterns in the study area vary considerably, with hazelnut as the dominant crop. Wheat, tomatoes, and rice are grown in the irrigation season, cabbage and leek are grown in the rainy season, and corn is grown as both a primary and a secondary crop. Several industrial facilities, including copper and fertilizer factories, are located in the study area.

Fig. 1 Study area and sample locations

The Çarşamba Plain was formed by alluvium deposited by the Yeşilırmak River. The aquifer system lies within Quaternary deposits under unconfined conditions. The unconfined aquifer occupies a wedge of unconsolidated gravels and sands that thickens toward the coast (Fig. 2). The fill of the Çarşamba Plain consists of detrital Quaternary sediments of sand, silt, and clay with little vertical or lateral continuity and frequent lateral facies changes. The fill is 20–110 m thick and thickens from south to north. The aquifer is composed of alluvial deposits and Eocene volcanic rock units. Available discharges from the aquifer range from 2 to 32 l/s. The hydraulic gradient varies across the aquifer and reaches a maximum of 0.0025. Groundwater depths range from about 3 to 10 m below ground surface (DSI 1994).

Fig. 2 Geological units of Çarşamba Plain aquifer (modified from DSI 1994)

Groundwater sampling and analysis

Groundwater samples were collected from 78 water wells in July 2012. The sampled wells were distributed homogeneously over the study area so as to represent the entire site. Well locations are shown in Fig. 1. Each sampling location was recorded with a global positioning system (GPS) receiver. Pumps were operated for 15 min before sample collection. Water samples were analyzed for 17 heavy metals [lead (Pb), zinc (Zn), chromium (Cr), manganese (Mn), iron (Fe), copper (Cu), cadmium (Cd), cobalt (Co), nickel (Ni), aluminum (Al), arsenic (As), molybdenum (Mo), selenium (Se), boron (B), titanium (Ti), vanadium (V), barium (Ba)]. The analyses used an Agilent 7500a inductively coupled plasma mass spectrometer (ICP-MS) at the General Directorate of Hydraulic Works (DSI), Department of Technical Research and Quality Control.

Interpolation methods for heavy metals

Spatial interpolation refers to the estimation of values of a particular attribute at unsampled locations using existing information from other observation points. Converting point observations into continuous fields allows the spatial patterns of sampled measurements to be compared with the spatial patterns of other entities. The three most widely used interpolation methods are inverse distance weighting (IDW), radial basis functions (RBF), and ordinary kriging (OK) (Sun et al. 2009). IDW estimates the value at an unsampled point as a linear combination of the values at sampled points. Each value is weighted by an inverse function of its distance from the point of interest:

$$ Z = \frac{\sum_{i=1}^{n} Z_i / d_i^{m}}{\sum_{i=1}^{n} 1 / d_i^{m}} \tag{1} $$

where \( Z \) is the estimated value, \( Z_i \) is the measured sample value at point \( i \), \( d_i \) is the distance between the estimation point and point \( i \), and \( m \) is the weighting power. The weighting power defines how quickly the weights decrease with \( d_i \); typical values of \( m \) are 1–5 (Keshavarzi and Sarmadian 2012). RBF comprises a family of exact interpolation methods in which the estimated surface must pass through every measured sample value. Five basis functions are commonly used: thin-plate spline (TPS), spline with tension (SPT), completely regularized spline (CRS), multiquadric function (MQ), and inverse multiquadric function (IMQ) (Xie et al. 2011). Ordinary kriging is the most common interpolation method in geostatistical studies (Guler et al. 2014). It resembles IDW in that it estimates the value at an unknown point as a weighted linear combination of the values at known points. Other types of kriging include simple kriging (SK), universal kriging (UK), and indicator kriging (IK). In OK, the weights are determined from the spatial correlation of the measured values, and the estimate is obtained using the following formula:

$$ Z = \sum_{i=1}^{n} \lambda_i \, Z(x_i) \tag{2} $$

where \( Z \) is the estimated value, \( Z(x_i) \) is the measured value at location \( x_i \), and \( \lambda_i \) is the weight assigned to \( Z(x_i) \). In OK the weights are constrained to sum to 1 so that the estimate is unbiased. Finally, \( n \) is the number of measured points in the search neighborhood.
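To make Eq. (1) concrete, the following minimal Python sketch computes an IDW estimate for a few hypothetical wells. Several details are illustrative assumptions: the function name `idw_estimate`, the planar coordinates, and the rule of returning the sample value when the target coincides with a well. This is not the ArcGIS implementation used in the study.

```python
import math


def idw_estimate(points, values, target, m=2):
    """Inverse distance weighting estimate at `target`, following Eq. (1).

    points: list of (x, y) sample coordinates in planar units (assumed)
    values: measured values Z_i at those points
    target: (x, y) location to estimate
    m:      weighting power (this study tested m = 1, 2, and 3)
    """
    num = 0.0  # running sum of Z_i / d_i^m
    den = 0.0  # running sum of 1 / d_i^m
    for (x, y), z in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # 1 / d^m is undefined at a sampled location, so return the
            # measured value there (IDW is an exact interpolator).
            return z
        w = 1.0 / d ** m
        num += w * z
        den += w
    return num / den


# Hypothetical wells and Fe concentrations (mg/l), for illustration only.
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
fe = [0.2, 0.8, 1.4]
for power in (1, 2, 3):
    print(f"IDW-{power}:", round(idw_estimate(wells, fe, (0.2, 0.1), m=power), 4))
```

Raising \( m \) concentrates the weight on the nearest wells. As a result, IDW-1, IDW-2, and IDW-3 can produce noticeably different surfaces from the same data.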

This study compared IDW (with weighting powers of 1, 2, and 3), RBF (with five basis functions), and OK for estimating heavy metal concentrations and the scores of each factor. The best method was then used to map the spatial distribution of each heavy metal and of the factor scores in the study area. Analyses were performed in ArcGIS 10.0 with the Geostatistical Analyst extension.
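The OK weights \( \lambda_i \) in Eq. (2) come from solving the kriging system built from a semivariogram model. The sketch below assumes a spherical model with hypothetical nugget, sill, and range values; in practice these would be fitted to the empirical semivariogram. A Lagrange multiplier row forces the weights to sum to 1. `spherical` and `ordinary_kriging` are illustrative names, not ArcGIS functions.

```python
import numpy as np


def spherical(h, nugget, sill, vrange):
    """Spherical semivariogram gamma(h); parameter values are assumptions."""
    h = np.asarray(h, dtype=float)
    ratio = h / vrange
    inside = nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)
    gamma = np.where(h < vrange, inside, sill)
    return np.where(h == 0.0, 0.0, gamma)  # gamma(0) = 0 by definition


def ordinary_kriging(xy, z, target, nugget=0.0, sill=1.0, vrange=5.0):
    """Return the OK estimate of Eq. (2) and the weights lambda_i."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Semivariances between every pair of sampled points.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = spherical(d, nugget, sill, vrange)
    a[n, n] = 0.0  # Lagrange row/column enforce sum(lambda) = 1
    # Semivariances between each sampled point and the target.
    b = np.ones(n + 1)
    d0 = np.linalg.norm(xy - np.asarray(target, dtype=float), axis=1)
    b[:n] = spherical(d0, nugget, sill, vrange)
    solution = np.linalg.solve(a, b)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    return float(weights @ z), weights


# Hypothetical log-concentrations at four wells (illustration only).
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
log_fe = [-1.2, -0.3, 0.4, -0.6]
estimate, lam = ordinary_kriging(wells, log_fe, (0.4, 0.3))
print(round(estimate, 4), lam.round(3), round(lam.sum(), 6))
```

If kriging is applied to log-transformed concentrations, as the semivariance analysis in this study suggests, the estimates are in log units. They would need back-transforming before mapping.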

Cross validation

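Cross-validation was used to identify the best interpolation method (Yao et al. 2014). In cross-validation, each measured value is withheld in turn and estimated from the remaining observations, and the estimates are then compared with the measured values. Several statistics can be used to judge the agreement between observed and predicted values and to select the best method (Li and Heap 2011).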
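Below is a minimal plain-Python sketch of leave-one-out cross-validation, the variant in which each sample is withheld in turn. It is applied to IDW at the three weighting powers tested here. The helper names `idw` and `loo_predictions` and the Mn values are hypothetical. The error summary uses RMSE as defined in this section.

```python
import math


def idw(points, values, target, m):
    """Compact IDW estimator (Eq. 1); returns the sample value when d = 0."""
    num = den = 0.0
    for (x, y), z in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z
        w = d ** -m
        num += w * z
        den += w
    return num / den


def loo_predictions(points, values, m):
    """Withhold each sample in turn and predict it from all the others."""
    preds = []
    for i in range(len(values)):
        rest_pts = points[:i] + points[i + 1:]
        rest_vals = values[:i] + values[i + 1:]
        preds.append(idw(rest_pts, rest_vals, points[i], m))
    return preds


# Hypothetical wells and Mn concentrations (mg/l).
wells = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
mn = [0.05, 0.30, 0.12, 0.45, 0.20]
for m in (1, 2, 3):
    preds = loo_predictions(wells, mn, m)
    err = math.sqrt(sum((p - o) ** 2 for p, o in zip(preds, mn)) / len(mn))
    print(f"IDW-{m}: RMSE = {err:.4f}")
```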

Root-mean-square error (RMSE) and mean absolute error (MAE) were used to evaluate the predictive performance of different techniques, with the smallest RMSE and MAE indicating the most accurate predictions. MAE and RMSE are calculated using the following formulas:

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Z_i - Z \right| \tag{3} $$

and

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( Z_i - Z \right)^2}{n}} \tag{4} $$

where \( Z_i \) is the predicted value at point \( i \), \( Z \) is the corresponding observed value, and \( n \) is the number of observations.
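Eqs. (3) and (4) translate directly into code. In the sketch below, `mae` and `rmse` are hypothetical helper names, and the values are illustrative, not data from this study.

```python
import math


def mae(predicted, observed):
    """Mean absolute error, Eq. (3)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root-mean-square error, Eq. (4)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


pred = [0.30, 0.90, 1.10, 0.40]  # hypothetical cross-validation predictions
obs = [0.25, 1.00, 1.40, 0.40]   # hypothetical measured values
print(f"MAE = {mae(pred, obs):.4f}, RMSE = {rmse(pred, obs):.4f}")
```

For the same set of errors, RMSE is always at least as large as MAE. The gap widens when a few large errors dominate, so reporting both indicates whether a method fails mainly at a few wells.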

Multivariate statistical analysis

Multivariate statistical analysis is widely used to identify and evaluate surface water and groundwater data (Chidambaram et al. 2012; Narany et al. 2014; Vieira et al. 2012). Multivariate techniques make it possible to simplify, organize, and classify data in order to extract meaningful information.

Cluster analysis is used to group data into hierarchies based on similarities or dissimilarities. Yadav et al. (2013) and Hossain et al. (2013) successfully used cluster analysis to classify groundwater samples according to their similarities. In the present study, CA was applied to group groundwater samples according to their Pb, Zn, Cr, Mn, Fe, Cu, Cd, Co, Ni, Al, As, Mo, Se, B, Ti, V, and Ba concentrations.
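The text does not state the linkage rule or distance measure used in SPSS. The sketch below therefore assumes z-scored variables, Euclidean distance, and average linkage; Ward's method is a common alternative in hydrochemical studies. The hierarchy is built bottom-up by repeatedly merging the two closest clusters, stopping at three groups, the number used in this study. The function name and the two-metal data are hypothetical.

```python
import numpy as np


def average_linkage_clusters(data, n_clusters):
    """Naive agglomerative (average-linkage) clustering of the rows of `data`.

    Columns are z-scored first so that metals with large ranges (e.g. Fe)
    do not dominate the Euclidean distance. Returns one label per row.
    """
    x = np.asarray(data, dtype=float)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    x = (x - x.mean(axis=0)) / sd
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(x))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into a
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Hypothetical wells x (Fe, Mn) in mg/l; the study used all 17 metals.
samples = [[0.20, 0.05], [0.25, 0.06], [0.22, 0.04],
           [1.50, 0.90], [1.60, 0.95], [1.55, 0.85],
           [5.00, 2.40]]
print(average_linkage_clusters(samples, n_clusters=3))
```

This naive version is written for readability and scales poorly with the number of wells. Library routines such as SciPy's hierarchical clustering module are the practical choice for real data sets.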

Factor analysis is used primarily to reduce the contribution of less significant variables and thereby simplify the data structure. For example, in a study of groundwater in a region of Taiwan affected by Blackfoot disease, Liu et al. (2003) classified factor loadings as “strong,” “moderate,” and “weak,” corresponding to absolute values of >0.75, 0.50–0.75, and 0.30–0.50, respectively. Factors with eigenvalues >1 explain more of the total variation in the data than any individual groundwater quality variable, whereas factors with eigenvalues <1 explain less. Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS), version 20.
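The steps described above can be sketched with NumPy: extract factors, keep those with eigenvalues >1, rotate, and label the loadings. The sketch assumes principal-component extraction from the correlation matrix followed by varimax rotation. It is not SPSS's implementation, and its factors may differ in sign or order. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def extract_factors(data):
    """Principal-component factors from the correlation matrix.

    Keeps factors with eigenvalue > 1 and returns
    (eigenvalues, percent of total variance, unrotated loadings).
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)       # ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    percent = 100.0 * eigvals[keep] / corr.shape[0]  # total variance = no. of variables
    return eigvals[keep], percent, loadings


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (standard SVD-based iteration)."""
    p, k = loadings.shape
    rot = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        lam = loadings @ rot
        target = lam ** 3 - lam @ np.diag((lam ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rot = u @ vt
        new_crit = s.sum()
        if new_crit < crit * (1 + tol):  # stop when the criterion stalls
            break
        crit = new_crit
    return loadings @ rot


def loading_class(value):
    """Loading labels of Liu et al. (2003), based on the absolute value."""
    a = abs(value)
    if a > 0.75:
        return "strong"
    if a >= 0.50:
        return "moderate"
    if a >= 0.30:
        return "weak"
    return "negligible"


# Synthetic example: 6 variables driven by 2 hypothetical latent sources.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
data = np.repeat(latent, 3, axis=1) + rng.normal(scale=0.3, size=(200, 6))
eig, pct, load = extract_factors(data)
rotated = varimax(load)
print(eig.round(2), pct.round(2))
print([[loading_class(v) for v in row] for row in rotated])
```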

Results and discussion

Heavy metals in groundwater

Descriptive statistics for groundwater heavy metal concentrations are given in Table 1. The coefficient of variation (CV) was used to describe the variability of groundwater properties. Variability was classified as “low” (CV ≤ 15 %), “moderate” (CV 15–35 %), or “high” (CV > 35 %) (Wilding 1985). All heavy metals in the study area exhibited high variability.
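A short sketch of the CV classification follows. It assumes the sample standard deviation, since the text does not say which was used. `cv_percent` and `wilding_class` are hypothetical names, and the Fe values are illustrative.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def wilding_class(cv):
    """Variability classes of Wilding (1985)."""
    if cv <= 15.0:
        return "low"
    if cv <= 35.0:
        return "moderate"
    return "high"


fe = [0.17, 0.45, 0.90, 1.20, 5.38]  # hypothetical Fe concentrations, mg/l
cv = cv_percent(fe)
print(f"CV = {cv:.1f} % -> {wilding_class(cv)} variability")
```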

Table 1 Descriptive statistics for heavy metal concentrations in groundwater in the study area

Groundwater heavy metal concentrations were compared with the WHO (2011) guidelines for drinking water. Zinc (Zn), chromium (Cr), copper (Cu), cobalt (Co), molybdenum (Mo), vanadium (V), titanium (Ti), and selenium (Se) concentrations were far below the WHO drinking water limits. Nickel (Ni), boron (B), and barium (Ba) concentrations in all samples were also below the established limits. Lead (Pb) concentrations ranged from 0.00 to 0.0093 mg/l (mean, 0.0013 mg/l). These are also below the WHO drinking water limit, although some samples approached it. Manganese (Mn) concentrations ranged from 0.0044 to 2.4820 mg/l (mean, 0.3396 mg/l), with many sites exceeding WHO limits for drinking water. Iron (Fe) concentrations varied considerably, from 0.1711 to 5.3790 mg/l (mean, 0.9978 mg/l), and many samples exceeded the WHO limit of 0.3 mg/l. Similarly, Al concentrations at some wells greatly exceeded the WHO recommended drinking water limit of 0.2 mg/l. Arsenic (As) and cadmium (Cd) concentrations also exceeded guidelines in some areas.

Comparison of interpolation methods

Maps were prepared only for the heavy metals whose concentrations were close to or above the recommended limits for drinking and irrigation water (Fe, Mn, Al, As, Cd, B, and Pb). Their spatial variability was assessed using IDW with powers of 1, 2, and 3; radial basis functions (CRS, SPT, MQ, IMQ, and TPS); and ordinary kriging. Kriging performs best when data are normally distributed (Xie et al. 2011). However, a Kolmogorov-Smirnov test showed that groundwater heavy metal concentrations in the study area were not normally distributed (P < 0.05). Values were therefore log-transformed before the semivariance was calculated.
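The semivariance step can be illustrated with the classical (Matheron) estimator applied to log-transformed values. This sketch is an assumption about how such an empirical semivariogram is computed, not a description of ArcGIS internals. The lag width, number of lags, function name, and synthetic data are hypothetical. Zero concentrations (Pb had a minimum of 0.00 mg/l) cannot be log-transformed directly and would need special handling, which the text does not describe.

```python
import numpy as np


def empirical_semivariogram(xy, z, lag_width, n_lags):
    """Matheron estimator: gamma(h) = sum over pairs of (z_i - z_j)^2 / (2 N(h)).

    Pairs are binned by separation distance into `n_lags` bins of width
    `lag_width`; returns (mean lag distance, semivariance, pair count) per bin.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)  # each unordered pair once
    h = np.linalg.norm(xy[i] - xy[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    bins = np.floor(h / lag_width).astype(int)
    out = []
    for b in range(n_lags):
        mask = bins == b
        if mask.any():
            n_pairs = int(mask.sum())
            out.append((h[mask].mean(), sq[mask].sum() / (2 * n_pairs), n_pairs))
    return out


# Hypothetical skewed concentrations, log-transformed first as in the text.
rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(40, 2))
conc = rng.lognormal(mean=-1.0, sigma=0.8, size=40)
for lag, gamma, npairs in empirical_semivariogram(xy, np.log(conc), 1.0, 8):
    print(f"h = {lag:.2f}  gamma = {gamma:.3f}  pairs = {npairs}")
```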

RMSE and MAE were used to compare the performance of the interpolation methods (Table 2), and the method yielding the smallest RMSE and MAE was considered best. When the method with the lowest RMSE did not also have the lowest MAE, the method with the lowest RMSE was selected. When RMSE values were equal, MAE was used to choose between methods.
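The selection rule can be stated compactly: rank methods by RMSE and use MAE only to break RMSE ties. Python compares tuples element by element, so minimizing over an (RMSE, MAE) pair implements exactly this rule. The scores below are hypothetical, not values from Table 2, and `best_method` is an illustrative name.

```python
def best_method(scores):
    """Pick the method with the lowest RMSE, using MAE to break RMSE ties.

    scores maps a method name to an (rmse, mae) tuple; tuples compare
    element by element, so min() applies the rule directly.
    """
    return min(scores, key=lambda name: scores[name])


# Hypothetical cross-validation results for one metal.
example = {
    "IDW-1": (0.210, 0.150),
    "IDW-2": (0.205, 0.160),
    "RBF-IMQ": (0.205, 0.140),
    "OK": (0.230, 0.130),
}
print(best_method(example))  # RBF-IMQ: ties IDW-2 on RMSE but has lower MAE
```

OK has the lowest MAE in this example but is not chosen, because its RMSE is higher. In practice, ties occur when RMSE values agree at the reported precision, so values would be rounded consistently before comparison.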

Table 2 Accuracy of methods used to interpolate groundwater heavy metal content

IDW-1 had the lowest RMSE values for Mn, Al, and B, whereas Pb was best estimated using IDW-2, Cd using IDW-3, As using RBF-IMQ, and Fe using OK. Xie et al. (2011) found that the greater the weighting power of IDW, the greater the RMSE of interpolation. As Table 2 shows, RBF-TPS had the highest RMSE and MAE values for all heavy metals and should be considered unsuitable for interpolating heavy metal groundwater concentrations.
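RBF-IMQ gave the best As surface, so a minimal sketch of inverse multiquadric interpolation may help. It solves a linear system so that the surface passes exactly through every sample, the property described for RBF methods above. Several elements are assumptions: the kernel form \( \phi(r) = 1/\sqrt{r^2 + c^2} \), the shape parameter value, the function names, and the data. ArcGIS's RBF implementation may parameterize the kernel differently.

```python
import numpy as np


def imq_rbf_fit(xy, z, c=1.0):
    """Fit an inverse multiquadric interpolant phi(r) = 1 / sqrt(r^2 + c^2).

    Solves Phi w = z so that the surface passes exactly through each sample.
    The shape parameter c is a hypothetical choice; software usually tunes it.
    """
    xy = np.asarray(xy, dtype=float)
    r = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    phi = 1.0 / np.sqrt(r ** 2 + c ** 2)
    weights = np.linalg.solve(phi, np.asarray(z, dtype=float))
    return xy, weights, c


def imq_rbf_predict(model, targets):
    """Evaluate the fitted interpolant at one point or an (m, 2) array."""
    xy, weights, c = model
    t = np.atleast_2d(np.asarray(targets, dtype=float))
    r = np.linalg.norm(t[:, None, :] - xy[None, :, :], axis=-1)
    return (1.0 / np.sqrt(r ** 2 + c ** 2)) @ weights


# Hypothetical wells and As concentrations (mg/l).
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
arsenic = [0.004, 0.012, 0.008, 0.020, 0.015]
model = imq_rbf_fit(wells, arsenic, c=1.0)
print(imq_rbf_predict(model, [(0.25, 0.75), (0.9, 0.1)]).round(4))
```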

Spatial distribution of heavy metals

The best-fit methods (Fig. 3a–g) were used to interpolate the spatial distribution of heavy metals, and problematic areas for Fe, Mn, Al, As, Cd, B, and Pb were mapped using geographic information systems (GIS). Other recent studies have also used GIS to identify the spatial and temporal distribution of groundwater and soil properties (Sun et al. 2009; Varouchakis and Hristopulos 2013).

Fig. 3 Interpolated maps for heavy metals: (a) Fe using OK, (b) Mn using IDW-1, (c) Al using IDW-1, (d) As using RBF-IMQ, (e) Cd using IDW-3, (f) B using IDW-1, (g) Pb using IDW-2

Using OK, Fe was mapped into three categories for drinking and irrigation (Fig. 3a). Fe concentrations were above 0.17 mg/l throughout the study area (Table 1), with the highest concentrations mainly in the eastern part. Iron concentrations in drinking water should normally be below 0.3 mg/l. However, as Fig. 3a indicates, only 1 % of the study area had Fe concentrations below 0.3 mg/l, and the remaining 99 % should be considered unsuitable for drinking because of high Fe concentrations. For irrigation, the recommended maximum concentration of soluble iron in irrigation water is 5 mg/l. Concentrations between 0.1 and 1.5 mg/l represent a moderate clogging hazard for drip irrigation emitters, and concentrations above 1.5 mg/l represent a severe hazard (FAO 1994). Fe concentrations ranged from 0.17 to 1.5 mg/l in close to 88 % of the study area, a moderate hazard for drip irrigation. Close to 12 % of the study area had Fe concentrations above 1.5 mg/l, a severe hazard. These areas therefore require another type of irrigation system, such as sprinkler or furrow irrigation. Finally, the iron concentration at one well (site 38) exceeded 5 mg/l, making its water unsuitable for any irrigation use.
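Area percentages such as these can be obtained by counting interpolated grid cells in each concentration class, assuming a regular grid of equal-area cells. In a GIS the same step would typically be a reclassification of the interpolated raster. The function `class_shares` and the cell values below are hypothetical; the class limits follow the FAO (1994) drip-clogging thresholds quoted above.

```python
def class_shares(grid_values, breaks, labels):
    """Percentage of grid cells in each concentration class.

    breaks are ascending upper bounds (a value equal to a bound goes to the
    lower class); labels has one more entry than breaks, and the last label
    catches everything above the final bound. Assumes equal-area cells.
    """
    counts = {label: 0 for label in labels}
    for v in grid_values:
        for bound, label in zip(breaks, labels):
            if v <= bound:
                counts[label] += 1
                break
        else:
            counts[labels[-1]] += 1
    total = len(grid_values)
    return {label: 100.0 * n / total for label, n in counts.items()}


labels = ["<=0.1", "0.1-1.5", ">1.5"]  # Fe drip-clogging classes, mg/l
fe_cells = [0.25, 0.40, 0.80, 1.20, 1.60, 2.10, 0.95, 1.45, 5.20, 0.70]  # hypothetical
print(class_shares(fe_cells, [0.1, 1.5], labels))
```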

Using IDW-1, Mn was mapped into two categories for drinking and irrigation (Fig. 3b). The permissible limits for Mn are 0.1 mg/l for drinking water and 2 mg/l for irrigation water. Accordingly, water in 97 % of the study area was found unsafe for drinking, whereas only one well (site 7) was found unsafe for irrigation. Mn concentrations in most of the study area were below the level of plant toxicity (2 mg/l). However, groundwater in 97 % of the area had Mn concentrations above 0.1 mg/l, a moderate hazard for drip irrigation systems (FAO 1994).

Using IDW-1, Al was mapped into three categories for drinking (Fig. 3c). As the map shows, groundwater in part of the central study area, covering 7 % of the total area, is unsuitable for drinking.

Using RBF-IMQ, As was mapped into three categories for drinking (Fig. 3d). Areas in different parts of the plain, together covering 26 % of the study area, exceeded the WHO limit of 0.01 mg/l for drinking water.

Using IDW-3, Cd was mapped into two categories for drinking (Fig. 3e). For the majority of the area, cadmium does not represent a problem; however, 2 % of the study area registered levels above the 0.001 mg/l WHO guideline.

Using IDW-1, B was mapped into four categories for drinking and irrigation (Fig. 3f). Boron is an essential micronutrient for plants, but it becomes toxic at relatively low concentrations. Boron in irrigation water should not exceed 0.7 mg/l, and concentrations from 0.7 to 3.0 mg/l are considered to represent a slight to moderate hazard for irrigation (FAO 1994). Excessive boron concentrations were observed in only one well (site 15), in the northern part of the study area.

Using IDW-2, Pb was mapped into three categories for drinking (Fig. 3g). Pb concentrations in all samples were below the permissible limits, with the highest Pb value found at site 21.

Factor analysis

Correlations among hydrochemical constituents of the groundwater samples were examined by factor analysis (Li and Zhang 2010; Guler et al. 2013) using each of the heavy metals tested as variables. Five factors were found to explain 73.39 % of the total variance in the data set. Eigenvalues, percentages of variance, and cumulative percentages for the five factors identified are given in Table 3.

Table 3 Rotated factor pattern of five factors after varimax rotation

Three weighting powers of IDW, five basis functions of RBF, and ordinary kriging were tested for interpolating the factor scores. IDW-1 had the lowest RMSE and MAE values for factors 1, 3, and 5, whereas factors 2 and 4 were best estimated using IDW-2. The best-fit methods were used to interpolate the spatial distribution of the factor scores. The distributions for factors 1–5 are shown in Fig. 4a–e, respectively; lighter colors indicate lower values and darker colors indicate higher values.

Heavy metals in groundwater originate from a variety of sources. Cu, Mo, and Cd are common components of pesticides, insecticides, herbicides, and fertilizers used in the study area (Kumbur et al. 2008). Cd originates primarily from anthropogenic sources (e.g., agriculture or industry). As and Pb may originate from urban and industrial activities such as energy production, mining, manufacturing, and waste incineration. Ba, Fe, Co, Mn, Ni, and Cr may originate from pedogenic sources (Li and Zhang 2010; Demirel 2007).

Fig. 4 Distribution of scores for the five factors (a–e)

Factor 1 explained 26.55 % of the total variance (eigenvalue, 4.51). It included strong positive loadings on Se, Ti, and Cr and a moderate loading on Mo. This factor was ascribed predominantly to anthropogenic and industrial sources. Cr is a marker of paint and metal industry waste, and high Mo concentrations may also be related to industrial activities (Demirel 2007). As Fig. 4a shows, factor 1 scores are generally highest in the central part of the region.

Factor 2 explained 15.53 % of the total variance (eigenvalue, 2.64), with strong loadings on Ni and Mn and moderate loadings on Co and Ba. This factor was therefore attributed to mixed origins, including pedogenic (Mn, Ni, Co) and geogenic (Ba) sources (Krishna et al. 2009). As Fig. 4b shows, the highest factor 2 score was at site 7.

Factor 3 accounted for 12.36 % of the total variance (eigenvalue, 2.10), with strong loadings on Pb and Cd. Pb is mainly an indicator of agrochemical and industrial waste (Li et al. 2008). This factor can be interpreted as relating to variations in agrochemical sources. As Fig. 4c shows, the highest score for factor 3 was at site 21.

Factor 4 explained 11.32 % of the total variance (eigenvalue, 1.93). It had a strong loading on B, moderate loadings on V and Cu, and a moderate negative loading on Fe. Huang and Jin (2008) showed that long-term use of chemical fertilizers contributes to the accumulation of Cu in agricultural soil. This factor was interpreted as relating to agricultural run-off and industrial effluents. As Fig. 4d shows, the highest factor 4 score was at site 15.

Factor 5 accounted for 7.63 % of total variance (eigenvalue, 1.30), with a strong positive loading on As and moderate loading on Zn. This factor was interpreted as influenced by industrial effluents. As Fig. 4e shows, excessive arsenic represented the greatest problem in the northeastern part of the study area (sites 4, 38, and 41) and the smallest problem in the south, west, and southeastern regions (sites 28, 56, and 34).
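This check assumes, as is usual for principal-component extraction, that the factors were derived from the correlation matrix of the 17 standardized variables, so that the total variance equals 17. Under that assumption, each factor's share of the variance follows from its eigenvalue \( \lambda_k \) (or, after rotation, its sum of squared loadings). The cumulative figure is the sum of the five shares, and the values reported above are consistent with this:

$$ \text{variance explained}_k = \frac{\lambda_k}{17} \times 100\,\%, \qquad \frac{4.51}{17} \times 100\,\% \approx 26.5\,\% $$

$$ 26.55 + 15.53 + 12.36 + 11.32 + 7.63 = 73.39\,\% $$

Small differences, such as 26.53 % versus the reported 26.55 %, reflect rounding of the eigenvalues to two decimals.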

Cluster analysis

Cluster analysis (CA) was applied to the hydrochemical data in the 17-dimensional space defined by the heavy metal variables, producing the dendrogram shown in Fig. 5. The groundwater wells were divided into three groups according to significant differences in their heavy metal concentrations (Table 4). Evaluated against existing drinking water guidelines, clusters 1, 2, and 3 could be classified, respectively, as “polluted,” “highly polluted,” and “very highly polluted.” IDW, RBF, and OK were each tested for mapping the spatial distribution of the groups. IDW-2 yielded the lowest RMSE and MAE values, so the spatial distribution map of the groups was created using IDW-2.

Fig. 5 Dendrogram showing the clustering of groundwater heavy metal parameters

Table 4 Mean heavy metal concentrations (mg/l) for each group

Cluster 1 comprised 49 monitoring wells with the lowest Fe concentrations and moderate Mn concentrations. In general, cluster 1 had the lowest groundwater heavy metal concentrations in the study area and included wells located near the outer limits of the area (Fig. 6).

Fig. 6 Spatial distribution of groups formed by cluster analysis

Cluster 2 comprised 26 wells located in the central part of the Çarşamba Plain, with moderate Fe and Mn concentrations and high As concentrations.

Cluster 3 comprised the three wells (sites 7, 21, and 78) with the worst groundwater quality in the study area. These wells, located in different parts of the plain, had the highest Fe, Mn, Cd, and Al concentrations, which are thought to result from agricultural and industrial activities. Controlling pesticide and fertilizer use in these areas would therefore be useful.

Conclusion

This study demonstrated the usefulness of interpolation methods and multivariate statistical analysis for assessing heavy metals in groundwater. The optimal interpolation method for the Çarşamba Plain in northern Turkey varied by metal: IDW-1 for Mn, Al, and B; IDW-2 for Pb; IDW-3 for Cd; OK for Fe; and RBF-IMQ for As. Maps showing problematic areas for drinking and drip irrigation were drawn using the best interpolation method for each heavy metal. Anthropogenic activities, including agriculture, have led to high levels of groundwater pollution. The main pollutants in the study area are Fe, Mn, Al, As, and Cd, while B and Zn are potential pollutants. As, Al, and Cd concentrations exceed permissible limits at certain locations. More alarmingly, Fe and Mn concentrations exceed drinking water limits in approximately 99 % and 97 % of the study area, respectively. Data from the 78 groundwater wells were also subjected to multivariate statistical analysis. Cluster analysis grouped the wells into three clusters (“polluted,” “highly polluted,” and “very highly polluted”). Factor analysis/principal component analysis identified five main factors explaining 73.39 % of the total variance of the 17 variables tested. The results indicate that groundwater in the study area cannot be used as drinking water because of high levels of Fe, Mn, and As. It is also unsuitable for drip irrigation because of high levels of iron and manganese.