Introduction

Groundwater is considered a major source of freshwater supply worldwide, especially in arid zones. Low rainfall, together with intensive extraction of groundwater from shallow aquifers, reduces the freshwater budget and creates local depressions in the aquifer, which threatens groundwater reserves. Furthermore, the increasing application of chemical fertilizers has been considered an additional cause of groundwater pollution during the last decades. In arid regions, in addition to severe water scarcity, water resources are characterized by significant spatio-temporal variability (Zekai 2008).

Studies undertaken in arid and semi-arid regions have shown the importance of water assessment and management in any integrated development strategy. Accordingly, the results of such assessments can serve as a scientific basis for sustainable land use planning and groundwater management in the region considered.

The combination of statistics and geostatistics in the context of groundwater resources management has already been explored by many authors. The combination of principal component analysis (PCA) and kriging was originally proposed by Espinosa et al. (1993) to characterize anomalies in soil geochemical composition. The same approach was later used to characterize groundwater quality in a variety of situations (e.g. Sanchez-Martoz et al. 2001; Jiang et al. 2009). Kriging can also be used directly to map groundwater quality (e.g. Uyan and Cay 2010), or in combination with techniques other than PCA, such as cluster analysis (e.g. Yidana et al. 2010). Kriging, co-kriging and semi-variance analysis have also been applied to map spatio-temporal fluctuations of groundwater (GW) levels in arid and semi-arid regions (e.g. Ta’any et al. 2009).

Based on the previous overview, the combined use of PCA (and factor analysis) and kriging in hydrochemical analysis is well documented in the literature. In all these cases, ordinary kriging was used to interpolate factor scores, which represented the weight of the respective processes affecting the hydrochemistry in the various basins. In this case study, PCA and geostatistics were combined and evaluated for GW quality mapping across an unconfined aquifer in the Hajeb Elyoun–Jelma (HJ) basin, under arid climate conditions and with a limited database of 22 wells out of the 35 wells tapping the aquifer. In addition, the evaluation was made with boreholes unevenly distributed across the study area. The selected wells have complete data sets and an ionic balance error below 5 %.

The study is based on both the hydrogeochemical evolution in the aquifer and its physicochemical characteristics. From this complex multivariate information, PCA is used to establish a series of factorial variables that summarize all the hydrogeochemical information. A geostatistical study of these derived variables allows working in a reduced multivariate space and establishing their spatial distribution throughout the aquifer by calculating variograms and performing cross-validation. Likewise, the spatial development of the principal processes acting on groundwater quality is identified by mapping these factorial variables with the ordinary kriging (OK) technique. OK is the most popular kriging method; it is based on the assumption of intrinsic stationarity and ergodicity of the data, and on the availability of sufficient data to model spatial autocorrelation (Yidana et al. 2010). The aim of this case study is therefore to verify whether these new variables make it possible to locate the zones where various physicochemical processes are superimposed, considering the hydrogeochemical and geological parameters.

Study area

Located in the northern part of Central Tunisia (Fig. 1), the HJ basin is characterized by an arid climate with large temperature and rainfall variations. The mean annual temperature and rainfall are about 19 °C and 230 mm, respectively (Saidi et al. 2009). In addition, it is characterized by high evaporation of about 1,200 mm (Dassi et al. 2005). The precipitation/evaporation ratio (P/E) is about 0.19, which classifies the study area as an arid region (Maliva and Missimer 2012). The dry climate and low precipitation accentuate the drawdown of water resources and can also affect water quality through salinization (Maoui et al. 2009). In addition, groundwater quality is strongly influenced by increasing exploitation, hydrogeological conditions, and human impacts (Saidi et al. 2009; Abid et al. 2010).

Fig. 1 Location and geological map

The HJ basin is part of the Tunisian Atlas domain and extends over 1,350 km². It shows a complex geological setting characterized by numerous NE–SW and E–W anticlines associated with Triassic salt intrusions (J. Hamra and Kodiet El Halfa). The hydrogeologic basin is characterized by a NE–SW synclinal structure developed during the major compressive movement (Atlasic event). The surrounding mountains are made up mainly of Cretaceous outcrops. The Eocene sediments are encountered only at the southern and north-eastern limits of the study area. In the central part of the region, the Quaternary crust (1–2 m), which covers the whole region, constitutes the superficial formation (Burrolet 1956; Zammouri 1988; Zouari and Mamou 1998; Koshel 1980).

Groundwater in the HJ synclinal basin is hosted in three main reservoirs, namely the Cretaceous, the Middle Miocene, and the Mio–Plio-Quaternary aquifers. The second one is the most important: it provides irrigation and drinking water for the whole region and for part of the town of Sfax (900,000 inhabitants), located 170 km from these resources. Total groundwater abstraction in 2006 was estimated at 20 Mm³.

The cross-section AA’ (Figure S1 ESM only) shows hydrostratigraphic units consisting of the three aquifer layers. The Middle Miocene Aquifer (MMA), with a thickness ranging between 10 and 300 m, is made up of coarse- to medium-grained sandstone. The MMA groundwater (MMAGW) flows from the northwestern and northern highlands (Mghilla and Labeidh Mountains), where the Middle Miocene (MM) crops out, to the southeast (Fig. 2).

Fig. 2 Piezometric map (year 2006) and location of sampled boreholes

Materials and methods

Hydrochemical data

Hydrochemical data were obtained from a sampling network of 22 wells tapping the MMA. The density of wells is slightly higher in Hajeb and Jelma zones (East and South) than in Baten El Ghzel and Mghilla (Fig. 2). Sampling was carried out during the high-water period (May 2006). Electrical conductivity, temperature and pH were determined directly in situ using a field conductivity meter. Major elements were analysed in the LARSEN Laboratory at the National School of Engineers of Sfax. A list of analytical methods used in this study is reported in Table 1.

Table 1 Analytical methods used

Methodology

Principal component analysis

PCA transforms a set of correlated variables into a smaller number of components which, ideally, can be interpreted as independent factors underlying the phenomenon (Espinosa et al. 1993; Wackernagel 1995; Sanchez-Martoz et al. 2001; Hachicha et al. 2008; Maoui et al. 2009; Hamzaoui et al. 2009). Here, it is used to distinguish the contributions of natural processes and anthropogenic impacts to the chemical composition of MMAGW. PCA can provide information on the most meaningful parameters, which describe the whole data set with minimal loss of original information (Wunderlin et al. 2001). The mathematics of PCA is detailed in many references (e.g. Davis 1986; Sanchez-Martoz et al. 2001; Yidana et al. 2010).

Data were standardized to their corresponding z scores (Eq. 1). Data standardization is essential in PCA because, in the computation of the Euclidean distances, the parameters with the highest variances tend to have a greater influence than those with lower variances (Güler et al. 2002; Cloutier et al. 2008).

$$z = \frac{x - \mu }{\sigma } . $$
(1)

where x is the measured value, and μ and σ are, respectively, the mean and standard deviation of the corresponding variable.
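As a minimal illustration of Eq. 1, the following Python sketch standardizes each column of a small data matrix. The sample values, the function name `standardize`, and the use of the sample standard deviation (`ddof=1`) are assumptions made for this example only; they are not taken from the study data.

```python
import numpy as np

def standardize(data):
    """Column-wise z-scores, z = (x - mu) / sigma (Eq. 1)."""
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)               # mean of each variable
    sigma = data.std(axis=0, ddof=1)     # sample standard deviation (assumption)
    return (data - mu) / sigma

# Hypothetical values: rows are wells, columns are (EC in uS/cm, pH)
samples = np.array([[280.0, 7.7],
                    [1400.0, 8.0],
                    [2460.0, 8.3]])
z = standardize(samples)
```

After standardization every variable has zero mean and unit standard deviation, so variables measured on large scales (such as EC) no longer dominate those measured on small scales (such as pH).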

Moreover, the PCA was performed with Varimax rotation and Kaiser normalisation. Varimax rotation applies an orthogonal matrix to the factor matrix in order to maximize the differences among the factors, which facilitates interpretation of the results (Yidana et al. 2010). This method is frequently applied to increase the participation of the variables with higher contributions and to reduce that of the variables with lesser contributions. In this study, the PCA was subjected to raw Varimax rotation. The first factor is related to the largest eigenvalue and explains the greatest amount of variance in the data set. The second factor (orthogonal to, and uncorrelated with, the first one) explains most of the remaining variance, and so forth (Jiang et al. 2009).

Kaiser’s criterion (Kaiser 1960) is applied in this study. This criterion is widely used to reduce the number of factors included in the final factor model: only factors with eigenvalues of at least 1 are retained (Ogasawara 1999; Yidana et al. 2010). The PCA was carried out with SPSS software. After establishing the associations between the physico-chemical variables of the MMAGW, the PCA factors were analysed by geostatistics for interpolation and mapping.
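The steps described above (eigen-decomposition of the correlation matrix, retention of components with eigenvalues of at least 1, and Varimax rotation with Kaiser normalisation) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the study used SPSS, the function names `pca_loadings` and `varimax` are hypothetical, and the data are synthetic stand-ins rather than the MMAGW measurements.

```python
import numpy as np

def pca_loadings(data, min_eigenvalue=1.0):
    """Eigenvalues and loadings of the components kept by the eigenvalue >= 1 rule."""
    corr = np.corrcoef(data, rowvar=False)       # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= min_eigenvalue
    return eigvals, eigvecs[:, keep] * np.sqrt(eigvals[keep])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation with Kaiser (row) normalisation."""
    p, k = loadings.shape
    h = np.sqrt((loadings ** 2).sum(axis=1, keepdims=True))  # root communalities
    A = loadings / h                                          # Kaiser normalisation
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = A @ R
        u, s, vt = np.linalg.svd(A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p))
        R = u @ vt                                            # updated orthogonal rotation
        new_crit = s.sum()
        if new_crit < crit * (1 + tol):
            break
        crit = new_crit
    return (A @ R) * h                                        # undo the normalisation

# Synthetic stand-in: 22 samples of 5 variables driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(22, 2))
mixing = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.1, 1.0], [0.0, 0.9]])
data = latent @ mixing.T + 0.3 * rng.normal(size=(22, 5))

eigvals, loadings = pca_loadings(data)
rotated = varimax(loadings)
explained = 100 * eigvals / eigvals.sum()   # percentage of variance per component
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the distribution of the loadings among the retained components changes.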

Geostatistics components

All steps, from variogram analysis through kriging and cross-validation to mapping, were performed with a single integrated program, GS+ software (1988) (Ta’any et al. 2009; Yidana et al. 2010). The theoretical bases of geostatistics are described in several papers and textbooks (Matheron 1970; Goovaerts 1997; Ahmadi and Sedghamiz 2008). In addition, these techniques are widely used in hydrochemical studies (Sanchez-Martoz et al. 2001; Cambardella et al. 1994; Harmouzi 2010; Ta’any et al. 2009; Yidana et al. 2010).

Semi-variance analysis is used to measure spatial dependence (autocorrelation), depending on the distribution of the data, and to check isotropy (Yidana et al. 2010; Ahmadi and Sedghamiz 2008). Kriging provides a means of interpolating values at points that were not physically sampled, using knowledge of the underlying spatial relationships in a data set. In this study we used OK, which is considered the most commonly used method (Lefohn et al. 2005; Ta’any et al. 2009; Harmouzi 2010; Yidana et al. 2010). OK is a spatial estimation method in which the error variance is minimized. The latter is called the kriging variance. It depends only on the configuration of the data and on the variogram, not on the data values used to make the estimate; hence it is homoscedastic. Yamamoto (2005) proved that the ordinary interpolation variance is a better measure of the accuracy of the kriging estimate. Cross-validation analysis is a means of evaluating effective parameters for kriging interpolations. In cross-validation analysis, each measured point in a spatial domain is individually removed from the domain and its value is estimated via kriging as though it were never there. In this way, a graph can be constructed of the estimated vs. actual values for each sample location in the domain.
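For illustration, the experimental semi-variance can be computed with the classical estimator, γ(h) = Σ[z(xᵢ) − z(xⱼ)]² / (2N(h)), where the sum runs over the N(h) pairs of points separated by a distance falling in the lag class h. The Python sketch below is an assumption-laden example: the study used GS+, the function name `experimental_semivariogram` is hypothetical, and the three sample points are invented.

```python
import numpy as np

def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Isotropic experimental semivariogram using the classical estimator."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)            # every pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    lags, gammas, counts = [], [], []
    for b in range(n_lags):
        mask = (dist >= b * lag_width) & (dist < (b + 1) * lag_width)
        if mask.any():                                    # skip empty lag classes
            lags.append(dist[mask].mean())                # mean separation distance
            gammas.append(0.5 * sqdiff[mask].mean())      # semi-variance of the class
            counts.append(int(mask.sum()))                # number of pairs N(h)
    return np.array(lags), np.array(gammas), np.array(counts)

# Hypothetical example: three collinear points, 1 distance unit apart
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
values = [0.0, 1.0, 3.0]
lags, gammas, counts = experimental_semivariogram(coords, values, lag_width=1.5, n_lags=2)
```

With few wells, the number of pairs per lag class is small, so the lag width has to be chosen wide enough for each class to contain a reasonable number of pairs.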

In other words, the basic process of cross-validation involves: (1) making estimates at all sampling locations one at a time, assuming no data value at the given location (i.e., ignoring its data value), and (2) comparing those estimates to the actual, known data values. In general, the two key goals of cross-validation are to compare the effectiveness and accuracy of two or more spatial estimation methods, and to evaluate the performance of any given spatial dependence model and/or search strategy in a kriging analysis.

The reliability of the semi-variogram and the goodness of the kriging interpolation are evaluated by the regression coefficient of the linear regression between estimated and actual values obtained from cross-validation, which should be close to 1.
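The procedure described above can be sketched as follows: an ordinary kriging system written in terms of the semivariogram, and a leave-one-out cross-validation that removes each point in turn and regresses the estimates on the actual values. This is a minimal sketch, not the GS+ implementation; the spherical model, its parameters, the synthetic data and all function names are assumptions chosen for illustration, and the regression coefficient is taken here as the slope of estimated versus actual values.

```python
import numpy as np

def spherical(h, nugget, sill, a):
    """Spherical semivariogram with range a; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    g = np.where(h >= a, sill, g)
    return np.where(h == 0, 0.0, g)

def ordinary_kriging(coords, values, target, model):
    """Return the OK estimate and the kriging variance at one target location."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = model(d)                     # semi-variances between data points
    A[n, n] = 0.0                            # Lagrange multiplier block
    b = np.ones(n + 1)
    b[:n] = model(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]                  # weights sum to 1 by construction
    return w @ values, w @ b[:n] + mu

def loo_cross_validation(coords, values, model):
    """Leave-one-out estimates and slope of estimated vs. actual values."""
    est = np.empty(len(values))
    for k in range(len(values)):
        keep = np.arange(len(values)) != k   # drop point k, estimate it from the rest
        est[k], _ = ordinary_kriging(coords[keep], values[keep], coords[k], model)
    slope = np.polyfit(values, est, 1)[0]
    return est, slope

# Synthetic stand-in for 22 wells (coordinates in km, arbitrary variable)
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(22, 2))
values = np.sin(coords[:, 0] / 3.0) + 0.1 * coords[:, 1] + 0.05 * rng.normal(size=22)
model = lambda h: spherical(h, nugget=0.01, sill=1.0, a=6.0)

estimates, slope = loo_cross_validation(coords, values, model)
est0, var0 = ordinary_kriging(coords, values, coords[0], model)
```

Note that the kriging variance returned by `ordinary_kriging` is computed from the weights and semi-variances only, which illustrates why it does not depend on the data values themselves.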

Results and discussion

After analysis of the hydrochemical parameters, the ion balance error between cations and anions is less than 5 % for all samples. According to WHO guidelines (1993), the range of pH values prescribed for drinking purposes is 6.5–8.5. The pH values of groundwater in the study area varied between 7.68 and 8.35, indicating slightly basic water. These pH values were within the accepted range. The exceptional value of 8.82 may originate from a measurement error.
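The ion balance check can be illustrated with the commonly used charge balance error, 100 × (Σcations − Σanions) / (Σcations + Σanions), with all concentrations in meq/L. The source does not state which formula was applied, so this definition is an assumption, as are the function name `ion_balance_error` and the sample concentrations below.

```python
# Equivalent weights in g/eq (molar mass divided by ionic charge)
EQ_WEIGHT = {"Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
             "Cl": 35.45, "SO4": 48.03, "HCO3": 61.02, "NO3": 62.00}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("Cl", "SO4", "HCO3", "NO3")

def ion_balance_error(sample_mg_per_l):
    """Charge balance error in percent for concentrations given in mg/L."""
    meq = {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_per_l.items()}
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)

# Hypothetical sample (mg/L), not taken from the study data
sample = {"Ca": 90.0, "Mg": 40.0, "Na": 110.0, "K": 6.0,
          "Cl": 170.0, "SO4": 230.0, "HCO3": 190.0, "NO3": 20.0}
error = ion_balance_error(sample)
accepted = abs(error) < 5.0   # acceptance threshold used in this study
```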

Electrical conductivity (EC) reflects the concentration of ionized substances in water. According to WHO, the maximum permissible value of EC for drinking water is 1,400 μS/cm. EC values in MMAGW range between 280 and 2,460 μS/cm. The lowest values are located in the northern and north-eastern parts of the study area.

Descriptive statistics of chemical composition

The ten hydrochemical parameters monitored in 22 boreholes are summarized in Table 2. The standard deviations reflect a moderate to high variability of the variables among samples. The Piper diagram (Fig. 3) and the Schoeller–Berkaloff diagram (Figure S2 ESM only) indicate that the water has a mixed composition. Indeed, the water of this aquifer shows several types of facies, with gradation between Mg–Ca–SO4, Na–Mg and Cl–SO4, and Na–Cl types. The enrichment in bicarbonate is due to leaching of the Cretaceous limestones. The enrichment in sulphates may be due to a contribution of the Triassic evaporitic rocks (southern part of the basin). To provide a better understanding of the mineralisation mechanisms and their spatial distribution, a combined PCA-geostatistics approach was implemented.

Table 2 Descriptive statistics of chemical composition (N = 22)

Fig. 3 Piper diagram of MMA groundwater

PCA results

Geostatistical methods are optimal when data are normally distributed and stationary (mean and variance do not vary significantly in space). Significant deviations from normality and stationarity can cause problems. The first step of this study was therefore to plot the histogram of the data to check normality and to post the data values in space to check for significant trends. Subsequently, PCA was applied to the ten normalized variables (TDS, EC, Ca2+, Mg2+, Na+, K+, Cl−, SO42−, HCO3−, and NO3−). According to their eigenvalues (6.860 and 1.669, respectively), two principal components, PC1 and PC2, were selected, which explain 68.603 and 16.69 % of the total variance, respectively (Table 3). Applying the Varimax rotation led to an increase in PC1 and a reduction in PC2.
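The link between the eigenvalues and the percentages of explained variance can be made explicit. Assuming that the PCA was carried out on the correlation matrix of the p = 10 standardized variables, so that the total variance equals p, the share of variance explained by component k with eigenvalue λ_k is:

$$\frac{\lambda_k}{p} \times 100, \qquad \frac{6.860}{10} \times 100 \approx 68.6\,\%, \qquad \frac{1.669}{10} \times 100 \approx 16.7\,\%, \qquad \frac{6.860 + 1.669}{10} \times 100 \approx 85.3\,\%.$$

The two retained components therefore account together for about 85 % of the total variance.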

Table 3 Loadings of principal component (PCA)

The scatter plot (Fig. 4) and the correlation matrix (Table 4) indicate that the most relevant variables defining water quality are those related to dissolved salts (Ca2+, Mg2+, Na+, K+, Cl−, SO42−) and to EC and TDS, and that the less relevant ones are HCO3− and NO3−.

Fig. 4 Scatter plot (PCA)

Table 4 Correlation matrix

PC2 (16.695 % of the variance) is mainly driven by HCO3− and NO3− (loadings > 0.77). PC1, which explains 68.603 % of the total variance, is mainly driven by EC, Ca2+, Mg2+, Na+, K+, Cl− and SO42− (loadings > 0.8) (Table 5), all of which are related to dissolved salts. PC2 may be related to the evolution of the bicarbonates, to contamination of water by organic fertilizers and manure, and to the transfer of pollution from domestic septic tanks. In fact, PC2 is highly correlated with NO3− (r = 0.788). PC1 may be related to common natural processes of dissolution of geological rock components, as follows:

Table 5 Components matrix after rotation

  1. The high correlation between Mg and Ca (r = 0.882) can be related to dolomitisation;

  2. The high correlations between Mg–Na and Mg–Cl (r = 0.936 and r = 0.927, respectively) may be related to ionic exchange through interaction with clay levels. This interrelationship indicates that the water hardness is permanent in nature (Hamzaoui et al. 2009);

  3. The high Mg–SO4 correlation (r = 0.959) may result from the weathering of a magnesium sulphate mineral;

  4. The high correlations between Na–Cl and K–Cl (r = 0.972 and r = 0.835, respectively) may result from the simultaneous dissolution of halite and sylvite;

  5. The calcium and magnesium sulphate type is produced by the presence of evaporitic formations in J. Hamra;

  6. The calcium and magnesium bicarbonate type is due to the leaching and dissolution of dolomitic limestone (J. Roua) and sandstone in the recharge area.

From the above, two main processes control the chemical composition of the groundwater: (1) dissolution of saline materials; and (2) anthropogenic contamination, both by organic fertilizers and by the transfer of pollution from domestic septic tanks.

Components selected: definitions of the new variables

Based on the two components (PC1, PC2), two new variables (V1, V2) were established using the values of the principal component scores of the samples, which project the n = 22 observations onto the two principal components (Table S1 ESM only). These new variables were used for the geostatistical study, which enabled an analysis in an orthogonal multivariate space of lower dimension than that of the ten original variables.

Mapping groundwater quality

The geostatistical study was performed with GS+ software. The principal aim of this step is to establish the spatial distribution of the two new variables V1 and V2 in the studied zone. For this, the spatial variability of V1 and V2 over the aquifer was defined by calculating their experimental isotropic variograms (Figs. 5, 6). The variogram parameters are shown in Table 6: these parameters are used to classify spatial dependence (sill and range) and to estimate recording errors (nugget effect) (Ta’any et al. 2009; Cambardella et al. 1994). The range values (A) indicate that the spacing between wells was suitable (Ta’any et al. 2009). The presence of a nugget effect (Co = variance at zero distance) implies inherent variability at distances shorter than the spacing between observation wells.

Fig. 5 Experimental isotropic variogram of V1

Fig. 6 Experimental isotropic variogram of V2

Table 6 V1 and V2 variogram’s parameters and validation correlation coefficient (Rc)
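As an illustration of how variogram parameters can be used to classify spatial dependence, the sketch below applies the nugget-to-sill ratio with the thresholds commonly attributed to Cambardella et al. (1994): at most 25 % for strong, 25 to 75 % for moderate, and above 75 % for weak spatial dependence. The thresholds as coded, the function name `spatial_dependence` and the numerical values are assumptions for this example; they are not the values of Table 6.

```python
def spatial_dependence(nugget, sill):
    """Return the nugget-to-sill ratio (%) and a spatial dependence class."""
    if sill <= 0:
        raise ValueError("sill must be positive")
    ratio = 100.0 * nugget / sill
    if ratio <= 25.0:
        label = "strong"
    elif ratio <= 75.0:
        label = "moderate"
    else:
        label = "weak"
    return ratio, label

# Hypothetical parameters: nugget Co = 0.1, sill Co + C = 1.0
ratio, label = spatial_dependence(nugget=0.1, sill=1.0)
```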

In addition, the anisotropy of the aquifer was checked using directional experimental semivariograms. They were constructed in the four main directions E–W, NE–SW, N–S, and NW–SE for V1 and V2. Only a minor degree of anisotropy was detected. It was therefore disregarded and, accordingly, the isotropic models were adopted.
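A directional semi-variance can be sketched by restricting the pairs of points to those whose azimuth lies within an angular tolerance of the chosen direction. The tolerance of ±22.5°, the function name `directional_semivariance` and the sample points are assumptions for this illustration; the settings used in GS+ for the study are not reported.

```python
import numpy as np

# Azimuths in degrees, measured clockwise from north
DIRECTIONS = {"N-S": 0.0, "NE-SW": 45.0, "E-W": 90.0, "NW-SE": 135.0}

def directional_semivariance(coords, values, azimuth_deg, tol_deg=22.5, max_dist=np.inf):
    """Semi-variance and pair count for pairs aligned with one direction."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)
    dx = coords[j, 0] - coords[i, 0]
    dy = coords[j, 1] - coords[i, 1]
    dist = np.hypot(dx, dy)
    az = np.degrees(np.arctan2(dx, dy)) % 180.0      # direction folded into [0, 180)
    diff = np.abs(az - azimuth_deg)
    diff = np.minimum(diff, 180.0 - diff)            # angular distance modulo 180
    mask = (diff <= tol_deg) & (dist <= max_dist)
    if not mask.any():
        return np.nan, 0
    return 0.5 * np.mean((values[i][mask] - values[j][mask]) ** 2), int(mask.sum())

# Hypothetical points (x = easting, y = northing) and values
coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
values = [0.0, 2.0, 4.0]
results = {name: directional_semivariance(coords, values, az)
           for name, az in DIRECTIONS.items()}
```

Comparing the semi-variances obtained in the four directions at similar lag distances gives a simple indication of anisotropy: similar values support the use of an isotropic model.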

A cross-validation test was then carried out. It indicates the reliability of the adopted models and, accordingly, of the kriging estimates. The regression coefficient obtained after cross-validation should be close to 1, which reflects low uncertainty in the kriging interpolation (Ta’any et al. 2009). Regression coefficients Rc1 = 0.95 and Rc2 = 0.62 were obtained for V1 and V2, respectively (Table 6). Both regression coefficients are above 0.5, and Rc1 > Rc2, so variable V1 is more spatially significant than variable V2. Subsequently, V1 and V2 were mapped by means of the point kriging method.

According to the V1 map (Fig. 7), which corresponds to the dissolution of saline materials, the highest values are recorded in the southeastern zone of the catchment, corresponding to the dissolution of Triassic evaporitic material in J. Hamra. Mineralisation is also mobilized by groundwater flow from the north and northwest to the south of the HJ aquifer.

Fig. 7 Map of estimated V1 with point kriging method

The region characterised by lower values corresponds to J. Labeidh, situated in the upstream part of the aquifer.

The map of variable V2 (Fig. 8) shows the highest values in Jelma and Mghilla. These zones are characterised by agricultural and domestic activities. Moreover, the high weight of V2 may be due to intense exploitation from greater depth in these regions.

Fig. 8 Map of estimated V2 with point kriging method

Conclusions

The aim of this paper is to evaluate PCA and geostatistical techniques for improving the assessment of the quality of the detrital MMAGW in the HJ basin. First, based on a PCA explaining 85 % of the variance, two factors have been defined, which are associated with (1) natural processes (lithology, rock-water interactions) leading to enrichment mainly in Na, Cl, Ca and SO4; and (2) anthropogenic activities (agriculture, industry, urban development): dissolution of the bicarbonates, contamination of water by organic fertilizers and manure, and the transfer of pollution from domestic septic tanks. Second, these two components have enabled two new variables to be defined, V1 and V2. Geostatistical techniques constitute a useful tool for studying the spatial variability of these variables, which reflect hydrogeochemical processes in the MMAGW of the HJ basin, since they enable the distribution of each variable throughout the aquifer to be analysed via its experimental isotropic variogram followed by a cross-validation test. The variogram parameters indicate that the spacing between wells was suitable. The regression coefficients from cross-validation are above 0.5. The subsequent kriging interpolation enables the mapping of variables V1 and V2, associated with the two principal processes involved. Map 1 (for V1) shows that mineralisation in MMAGW is derived from saline materials, mainly the Triassic evaporitic sediments of J. Hamra in the southern zone, and is also mobilized by groundwater flow from the north and north-west to the south of the HJ basin.

Map 2 (for V2) shows that contamination by agricultural and domestic activities is locally high in the Jelma and Mghilla zones.

Finally, results suggest that both natural and anthropogenic processes contribute to MMAGW quality. Moreover, in terms of spatial distribution, natural impact is the most significant.

It is worth noting that the results are satisfactory and demonstrate that the combination of PCA and geostatistics can be applied in cases where the aquifer is complex, the database is limited, and the information is unevenly distributed in space over the studied field.

However, it must be recognized that these methods may not be applicable in every situation. A minimum number of sample locations is required for any geostatistical analysis.