Introduction

Over the last two decades, many regions worldwide, and the African continent in particular, have been severely affected by extended periods of drought caused by changing climatic conditions. According to Nyenjie and Batelaan (2009), groundwater resources are the first to be affected by global warming. This environmental problem is evidenced by the decline in groundwater levels and the continuous deterioration of groundwater quality (Kamel et al. 2010).

Located on the southern shore of the Mediterranean, Tunisia is a country in which most of the territory has an arid or semi-arid climate and where water resources are scarce and unevenly distributed in time and space (Majdoub et al. 2012). These characteristics limit the socioeconomic development of the country, especially in the agricultural sector (Hamzaoui-Azaza et al. 2013; Hassen et al. 2016; Halimi et al. 2016). Furthermore, salinization has been a major cause of the deterioration of water quality for several decades (Farid et al. 2013). It is one of the negative effects of climate change on the availability of groundwater resources and severely threatens their quality. Safeguarding these resources for future generations has therefore become a major concern for Tunisia.

Various quantitative methods of hydrochemical groundwater data evaluation have recently been developed to provide decision-making tools that would aid the agencies responsible for the management of water resources (Farid et al. 2013; Hajd Ammar et al. 2014; Bouzourra et al. 2015; Alssane et al. 2015; Fadili et al. 2016).

Multivariate methods are generally used for the hydrochemical analysis of groundwater samples. They provide a quantitative insight that can highlight relationships that are difficult to elucidate by the direct analysis of the data. Moreover, these methods have the advantage of providing a concise description of the main information resulting from the input data.

Several studies have demonstrated the relevance of multivariate statistical and geostatistical approaches in groundwater data evaluation. For example, Jiang et al. (2009) and Belkhiri et al. (2010) used multivariate methods to analyse the variation in the hydrochemical composition of groundwater. Cloutier et al. (2008) combined statistical tools with hydrochemical methods to study the independent factors controlling the quality of groundwater in Quebec. Yidana et al. (2012) used factor analysis to identify the origin of fluorides in a volcanic aquifer in Ghana. Their results showed that the high fluoride concentrations originate from the dissolution of silicates and highlighted a close relationship between the geology of the aquifer and the water chemistry. Kim et al. (2012) combined chemometric analyses and kriging to identify groundwater contamination sources and origins in the Masan coastal area in Korea. They found that the main sources of contamination are seawater, nitrate and iron. Gbolo and Gerla (2013) used multivariate analysis to characterize nutrients from an abandoned feedlot, and Mikhailov et al. (2007) used it as a tool for the ecological assessment and characterization of landfills. Venkatramanan et al. (2016) also used geostatistical techniques to evaluate groundwater contamination and its sources in Miryang City, Korea.

The unconfined aquifer of Sidi El Hani is situated between the sabkha Sidi El Hani and the sabkha Cherita in central-eastern Tunisia. It is under multiple constraints because surface water is scarce and limited to the sabkhas and the wadi Cherita. Population growth and the expansion of agricultural land use in the area have had a negative impact on the groundwater quality (M’nassri et al. 2016). The present paper focuses on the hydrochemical investigation of the groundwater of the Sidi El Hani aquifer to identify the main factors currently affecting water quality. Multivariate statistical methods including a hierarchical cluster analysis (HCA), a factor analysis (FA) and a principal component analysis (PCA) were used. The HCA was used to group the hydrochemical data into homogeneous classes, leading to a better understanding of the salinity distribution. The FA and PCA enabled an examination of the independent factors responsible for the hydrochemical modification of the groundwater quality. Geostatistical modelling was then used to provide a comprehensive view of the spatial distribution of the salinity and to summarize the structural information related to the zone of influence of a regionalized chemical variable.

Materials and methods

Study area

The unconfined aquifer of Sidi El Hani is located in the Ouled Chamekh Plain in central-eastern Tunisia (Fig. 1). This plain covers an area of 346 km² and belongs to the Mahdia region. The Ouled Chamekh Plain is bordered by the sabkha of Sidi El Hani to the north and northeast, by the wadi Cherita to the east, by the sabkha of Cherita and the Ktitir Mountain to the south, and by the El Guessat Mountain to the west. The sabkha Sidi El Hani is a northwest-southeast elongated depression with an average altitude of 33 m, covering an area of approximately 370 km². This saline lake drains the sabkha Cherita, which covers an area of 70 km², through the hydrographic network of the wadi Cherita. The altitude of the Cherita depression is 60 m (Essefi et al. 2013). The climate of the area is semi-arid, characterized by an irregular and relatively low annual rainfall of approximately 270 mm. The average temperature is 9 °C in winter and 30 °C in summer. The potential evapotranspiration is estimated to be approximately 1500 mm/year.

Fig. 1 Location of the study area

Geological and hydrogeological setting

According to Castany (1948) and Maamri and Rabhi (2003), the studied aquifer is formed by a basin filled with sediments of Mio-Plio-Quaternary age comprising the Holocene (Qh), Upper Pleistocene (Qp3), Middle Pleistocene (Qp2) and Lower Pleistocene (Qp1), as shown in Fig. 2. These formations are primarily composed of continental deposits of clay, marl, clayey sand, gypsum, silt and sandy clay. The structure of the study area is strongly controlled by the Ktitir, Hajeb Layoun and Sidi El Hani faults, which affect the Mio-Plio-Quaternary deposits (Khomsi et al. 2006). These faults have NW-SE and NNE-SSW to E-W orientations (Essefi et al. 2013).

Fig. 2 Geological summary map of the study area: 1 Holocene; 2 Upper Pleistocene; 3 Lower Pleistocene; 4 Mio-Pliocene; 5 anticline axis; 6 fault; A, Ktitir Fault; B, Hajeb Layoun Fault; C, Sidi El Hani Fault (M’nassri et al. 2018)

The unconfined aquifer of Sidi El Hani mainly comprises sands, sandy clays and clayey sands. The groundwater flows towards the sabkhas of Sidi El Hani and Cherita and the wadi Cherita (Fig. 3). The potentiometric surface map was interpolated in Geographic Information System (GIS) software from groundwater elevation data measured in 15 wells. The interpolation used the kriging method, which provides a linear estimate with minimum variance. The water table is located at a depth of 5–30 m. The transmissivity varies between 2 × 10⁻⁴ and 5 × 10⁻³ m²/s, and the hydraulic gradients calculated in a regional flow model are on the order of 0.3–0.45%. These values were obtained from previous numerical studies of the regional groundwater flow (Dridi et al. 2013).

Fig. 3 a Potentiometric surface map (groundwater levels in meters; from Dridi et al. 2013) and b hydrological cross-section of the Sidi El Hani aquifer (from M’nassri et al. 2019)

Sample collection and analytical procedure

Groundwater sampling was performed during March and April 2015. Forty-nine samples were collected from shallow wells with depths varying from 30 to 50 m (Fig. 4). Following the standard procedures introduced by Eaton et al. (1995), the water samples were collected and preserved as follows. All samples were taken after pumping the corresponding wells for 15–20 min to obtain values representative of ambient aquifer conditions. Each sample was collected in a new 1 L polyethylene bottle and then stored at a temperature below 4 °C until analysis in the laboratory. The temperature (T), electrical conductivity (EC) and hydrogen ion activity (pH) of each groundwater sample were measured in the field during sampling. For the pH measurements, the electrode was calibrated at each location with reference buffer solutions of pH 4, 7 and 8.

Fig. 4 Location of water samples

The chemical analyses listed in Table 1 were carried out in accordance with the standard methods of the American Public Health Association (APHA 1995). Potassium (K⁺) and sodium (Na⁺) were measured by flame photometry (Rodier 1996). Sulfate (SO₄²⁻) concentrations were measured by colorimetric spectrophotometry using BaCl₂. Chloride (Cl⁻) was determined by standard AgNO₃ titration (Rodier 1996). Bicarbonate (HCO₃⁻) was determined by titration with H₂SO₄ (Rodier 1996), and calcium (Ca²⁺) and magnesium (Mg²⁺) concentrations were determined titrimetrically using standard ethylenediaminetetraacetic acid (EDTA) (Rodier 1996). The accuracy of the chemical analyses was checked by repeated analyses of samples and then by calculating the percent charge balance error (CBE) as defined by Freeze and Cherry (1979):

$${\text{CBE}} = \frac{\sum z\,m_{\text{c}} - \sum z\,m_{\text{a}}}{\sum z\,m_{\text{c}} + \sum z\,m_{\text{a}}} \times 100,$$
(1)

where z is the absolute value of the ionic charge, mc is the molality of the cationic species and ma is the molality of the anionic species.

Table 1 Analytical methods used for the analysis of chemical elements

The results of the analysis were judged to be acceptable when the CBE was within ±5%. No samples in the database had a CBE outside this range.
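The acceptance check based on Eq. 1 can be illustrated with a minimal sketch. It assumes that concentrations are reported in mg/L and converted to milliequivalents per litre, so that each ion is weighted by its charge. The function names, the equivalent-weight table and the sample values are hypothetical and are not taken from the study's dataset.

```python
# Equivalent weights (g/eq) = molar mass / |charge|, from standard atomic masses.
EQ_WEIGHT = {
    "Ca": 40.078 / 2, "Mg": 24.305 / 2, "Na": 22.990, "K": 39.098,
    "Cl": 35.453, "SO4": 96.06 / 2, "HCO3": 61.017,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("Cl", "SO4", "HCO3")


def to_meq(conc_mg_l):
    """Convert a dict of concentrations in mg/L to meq/L."""
    return {ion: c / EQ_WEIGHT[ion] for ion, c in conc_mg_l.items()}


def charge_balance_error(conc_mg_l):
    """Percent charge balance error, with both sums taken in meq/L."""
    meq = to_meq(conc_mg_l)
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


# Hypothetical sample (mg/L), for illustration only.
sample = {"Ca": 400, "Mg": 150, "Na": 800, "K": 20,
          "Cl": 1300, "SO4": 1400, "HCO3": 200}
cbe = charge_balance_error(sample)
print(f"CBE = {cbe:.2f}% -> {'accept' if abs(cbe) <= 5 else 'reject'}")
```

A sample would be kept only when the absolute value returned by `charge_balance_error` does not exceed 5.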

TDS was measured by evaporating a pre-filtered sample to dryness. The reproducibility of the chemical analyses was checked by repeated analyses of three samples, and the mean value of the repeated analyses was retained.

The analyses of total dissolved solids (TDS) and major elements such as Ca²⁺, Mg²⁺, Na⁺, K⁺, Cl⁻, SO₄²⁻ and HCO₃⁻ were conducted in the hydrological laboratory of the Higher Institute of Agronomy of Chott Mariem and in the laboratory of water analysis at the National Institute for Research in Rural Engineering, Water and Forest (Tunisia).

Multivariate statistical analysis and geostatistical analysis

In our study, the multivariate statistical analysis was based on HCA, FA and PCA. It was performed with the Statistical Package for the Social Sciences (SPSS, version 18) and applied to a subgroup of the whole hydrochemical dataset comprising 49 samples and 11 parameters (T, pH, TDS, EC, Ca²⁺, Mg²⁺, Na⁺, K⁺, Cl⁻, HCO₃⁻, SO₄²⁻), grouped on the basis of their similarities (Belkhiri et al. 2010; Hamzaoui-Azaza et al. 2009; Belkhiri and Mouni 2014; Bouzourra et al. 2015). This stage was preceded by a standardization of the data to z scores, which gives each parameter a zero mean and unit variance so that all parameters carry comparable weight regardless of their units and magnitudes (Guler et al. 2002; Cloutier et al. 2008; Kolsi-Hajji et al. 2013).
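The z-score step can be sketched as follows. This is a minimal illustration that assumes the sample standard deviation (n − 1 denominator) is used. The function name and the data values are hypothetical.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column of a list of equal-length rows to z scores."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [stdev(col) for col in columns]  # sample standard deviation (n - 1)
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


# Hypothetical values: columns are TDS (mg/L), pH and Cl (mg/L).
data = [
    [3900.0, 7.2, 1300.0],
    [6500.0, 7.0, 1900.0],
    [2400.0, 7.5, 730.0],
    [5100.0, 7.1, 1500.0],
]
z = zscore_columns(data)
```

After this transformation every column has zero mean and unit standard deviation, so a parameter measured in thousands of mg/L no longer dominates one measured in pH units.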

The hierarchical cluster analysis was performed to divide the dataset into hierarchical groups based on the similarity between the samples and to explain the causes of the hydrochemical variation from one location to another. According to Alberto et al. (2001) and Yidana et al. (2010), this is known as Q-mode classification, in which samples are grouped using Euclidean distances (the straight-line distance between two points in the c-dimensional space defined by c variables). The Ward agglomeration method was used as the linkage criterion in this analysis (Ward 1963). This combination was selected because it yields the most distinctive sample associations. The most similar samples were thus clustered together in the same group (Fig. 5).
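The clustering idea described above can be sketched with a naive agglomerative procedure. It assumes standardized input and uses Ward's criterion, which merges at each step the two clusters whose union gives the smallest increase in within-cluster sum of squares. This is an illustrative stand-in and not the SPSS implementation. All names and sample values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Mean point of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_clusters(samples, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    Returns a sorted list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a = [samples[k] for k in clusters[i]]
                b = [samples[k] for k in clusters[j]]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


# Hypothetical z-scored samples with two variables each.
points = [[-1.0, -0.9], [-0.8, -1.1], [1.0, 0.9], [0.9, 1.2], [-0.95, -1.0]]
groups = ward_clusters(points, 2)
```

With these values the three low-salinity-like samples (indices 0, 1 and 4) end up in one group and the remaining two in the other, which mirrors how a dendrogram is cut into classes.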

Fig. 5 Hierarchical cluster analysis of the groundwater of Sidi El Hani

A factor analysis was applied to study the variance observed in the dataset. It derives a small set of uncorrelated variables called factors (Cloutier et al. 2008; Kolsi-Hajji et al. 2013). The adequacy of the factor solution was verified by the Kaiser–Meyer–Olkin (KMO) index and the Bartlett sphericity test (Kaiser 1960). If the KMO is greater than 0.5, the factor analysis is appropriate and provides a significant reduction in data size. The factors with an eigenvalue greater than 1 establish a pattern of variation among the variables and reduce the large dataset to a few factors that are easy to interpret (Jiang et al. 2009; Yidana et al. 2012). The first factor, which has the highest eigenvalue, represents the most important source of variation in the dataset, whereas the last factor represents the least important process contributing to the chemical variation (Maoui et al. 2009; Hachicha et al. 2008).
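The eigenvalue-greater-than-1 rule can be sketched as follows. The sketch assumes standardized variables, for which the eigenvalues of the correlation matrix sum to the number of variables, so each eigenvalue divided by that sum gives the share of variance explained. The function name and the eigenvalues are hypothetical.

```python
def kaiser_selection(eigenvalues):
    """Keep factors with eigenvalue > 1 and report their share of variance.

    Returns a list of (rank, eigenvalue, percent_of_variance) tuples.
    """
    total = sum(eigenvalues)
    ranked = sorted(eigenvalues, reverse=True)
    return [(k + 1, lam, 100.0 * lam / total)
            for k, lam in enumerate(ranked) if lam > 1.0]


# Hypothetical eigenvalues for 11 standardized variables (they sum to 11).
eigs = [4.7, 1.7, 0.95, 0.9, 0.8, 0.6, 0.5, 0.4, 0.25, 0.15, 0.05]
selected = kaiser_selection(eigs)
for rank, lam, pct in selected:
    print(f"F{rank}: eigenvalue = {lam:.2f}, variance = {pct:.1f}%")
```

Only the factors whose eigenvalue exceeds 1 are retained, since each of them explains more variance than a single standardized variable.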

The principal component analysis (PCA) is far more commonly used than factor analysis (FA) and yields components that reflect both the common and the unique variance of the variables. PCA may be seen as a variance-focused approach that reproduces both the total variance of the variables, when all components are retained, and their correlations.

In our study, PCA was used to extract the factor loadings, that is, the correlation coefficients between the variables and each factor, and to interpret them after varimax rotation, which maximizes the variance of the loadings within each factor. Note that varimax rotation is the application of an orthogonal matrix to the factor matrix (Sachez-Martoz et al. 2001; Kim et al. 2009). By convention, only variables with a loading of at least 50% are considered to contribute significantly to the final factor model (Belkhiri et al. 2010). The factors selected in the previous stage are then expressed as vectors, which represent the principal processes responsible for the variation of the hydrochemical composition of the groundwater. Each vector is then modelled in the geostatistical stage by a characteristic vector (Sachez-Martoz et al. 2001; Kim et al. 2009) using the program GS+ version 7 (Yidana et al. 2010). The geostatistical analysis establishes the spatial distribution of the selected vectors by computing their experimental isotropic variograms (Louati et al. 2015). The experimental variograms \(\gamma \left( h \right)\) were fitted by the Gaussian model of Eq. 2. A cross-validation test based on ordinary kriging (OK) (Goovaerts 1997) was then performed to assess the reliability of both the adopted model and the spatially interpolated data. In the last stage, kriging maps were produced to clarify the spatial distribution of the factors. The Gaussian model is written as:

$$\gamma \left( h \right) = C_{0} + C\left[ 1 - \exp \left( - \frac{h^{2}}{A^{2}} \right) \right]$$
(2)

where C0 is the nugget effect (m2), (C0 + C) (m2) is the sill of the variogram, A (m) is the range and h (m) is the distance between two sampling points.
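The Gaussian model of Eq. 2 can be evaluated with a short sketch. The function name and the parameter values are hypothetical, and the sketch takes A as the range parameter exactly as it is defined above.

```python
import math


def gaussian_variogram(h, nugget, partial_sill, range_a):
    """Gaussian model: gamma(h) = C0 + C * (1 - exp(-(h / A)^2)).

    At h = 0 the expression returns the nugget C0, i.e. the value at which
    the fitted curve meets the y-axis.
    """
    if h < 0:
        raise ValueError("lag distance h must be non-negative")
    return nugget + partial_sill * (1.0 - math.exp(-((h / range_a) ** 2)))


# Hypothetical parameters: C0 = 0.04, C = 0.5, A = 3000 m.
for lag in (0.0, 1500.0, 3000.0, 9000.0):
    print(lag, round(gaussian_variogram(lag, 0.04, 0.5, 3000.0), 4))
```

The curve starts at the nugget C0 and levels off at the sill C0 + C for lags that are large compared with A.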

Results and discussion

Hierarchical cluster analysis

The dendrogram obtained from the HCA revealed the existence of two classes of groundwater in the unconfined aquifer of Sidi El Hani (Fig. 5). The first class, C1, encompassed 53% of the water samples. It was characterized by a fairly high salinity ranging between 2400 and 6620 mg/L, with an average and a standard deviation of approximately 3881 and 976 mg/L, respectively (Table 2). The results of the chemical analyses of the major elements show the predominance of chloride, sodium and sulphate. They varied from 726 to 2130 mg/L, from 411 to 1153 mg/L and from 3404 to 1275 mg/L, respectively, whereas potassium was almost absent from the groundwater. The second class, C2, represented 47% of the total samples, with a high salinity that exceeded 6000 mg/L. Like C1, C2 was marked by a predominance of chloride, sodium and sulphate, but at higher concentrations. The sodium concentrations varied between 590 and 1462 mg/L with an average of 1079 mg/L, and the sulphate contents ranged between 1920 and 5472 mg/L. The chloride concentrations varied between 1088 and 2667 mg/L with an average and a standard deviation of approximately 1852 and 365 mg/L, respectively.

Table 2 Statistical analysis of hydrochemical parameters of the clusters

According to the classes identified by HCA, the spatial distribution of the salinity shows that high salinity was recorded in the vicinity of the sabkha Sidi El Hani and the wadi Cherita (Fig. 6). The presence of evaporites in this area, including gypsum, anhydrite and halite, may be directly linked to the higher salinity of these samples compared with the samples located in the western part. The lower salinity in the west can be explained by the presence of permeable limestone, which allows a high infiltration rate of rainwater into the groundwater and thus contributes to the dilution of dissolved salts (Bekkoussa et al. 2013).

Fig. 6 Spatial interpolation of measured salinity

Furthermore, the water table near the mountain of El Guessat is rather deep, which may explain why the groundwater there is protected against anthropogenic contamination. Plotted on the Piper diagram (Piper 1944), the water of class C1 shows a mixed Cl–SO4–Na–Ca facies, whereas C2 is characterized by a sodium chloride facies (Fig. 7). The two types suggest the dissolution of minerals, particularly halides and sulphate-bearing rocks, and/or contamination of the water by anthropogenic activities (Amadou et al. 2014; Najib et al. 2017).

Fig. 7 Piper diagram of Sidi El Hani groundwater

Factor analysis and principal component analysis

The FA was applied to 49 samples and 11 variables, including T, pH, EC, TDS, Ca, Mg, Na, K, Cl, HCO3 and SO4. The analysis should distort the initial configuration of the individuals and variables as little as possible. This was confirmed by a KMO index higher than 0.5 and by a Bartlett sphericity test with a significance level lower than 0.001 (Table 3). The eigenvalue of a factor measures the variance it captures: the higher the eigenvalue, the more variance the factor explains (Awadallah and Yousry 2012).

Table 3 KMO index and Bartlett’s test of the factor analysis

As a result, two factors with eigenvalues greater than 1 were retained after varimax rotation. The first factor, F1, explained about 42.6% of the total variance, whereas the second factor, F2, explained 15.9% (Table 4). In this study, the factor loading was classified as moderate because the cumulative variance (about 58.5%) lay between 50 and 75% (Belkhiri and Narany 2015).

Table 4 Percentage of variance of selected factors (using the principal component analysis extraction method)

The first factor, shown in Table 5, represents the most important factor controlling the evolution of the groundwater. It was strongly correlated with TDS (0.91), EC (0.89), Cl (0.86), Na (0.84), SO4 (0.69) and Mg (0.67). This confirms the grouping of these elements with the electrical conductivity along axis F1 (Fig. 8). The second factor (F2), on the other hand, was correlated with Ca (0.81) and K (0.62).

Table 5 Factor analysis of the physicochemical parameters
Fig. 8 Principal component analysis: scatter plot of F1 versus F2

The correlation matrix given in Table 6 shows significant correlations between the major elements. Chloride is strongly correlated with sodium (0.799) and magnesium (0.665), and sulphate is correlated with calcium (0.577). These correlations indicate that similar phenomena are responsible for the release of these ions into the groundwater. The dissolved salts, in turn, are related primarily to sodium (0.774), chloride (0.756) and sulphate (0.621). These elements are thus the most relevant variables defining the salinization of the groundwater (Kharroubi et al. 2012). The correlation between Ca and SO4 may be related to the dissolution of gypsum, whereas the strong correlation between Na and Cl may reflect the dissolution of halite.
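Each entry of such a correlation matrix is a Pearson coefficient between two parameters. A minimal sketch is given here, with a hypothetical function name and hypothetical concentrations that are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Na and Cl concentrations (mg/L) for five samples.
na = [410.0, 650.0, 820.0, 1100.0, 1460.0]
cl = [730.0, 1200.0, 1350.0, 1900.0, 2600.0]
r_na_cl = pearson_r(na, cl)
```

A coefficient close to 1 for Na and Cl would be consistent with a common source such as halite, whose dissolution releases both ions in equal molar amounts.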

Table 6 Correlation matrix of the physicochemical parameters

Factor F1 can be interpreted as reflecting natural mineralization (geology and evaporation). It appears to be the most important mechanism in the acquisition of ions by the water: the majority of the variables were grouped with the electrical conductivity around axis F1 (see Fig. 8), indicating the influence of water–rock interaction on natural mineralization. F1 thus represents the natural factor driven by evaporite and carbonate dissolution. It suggests that natural processes are the most important mechanism controlling the ions in the groundwater. These elements originate from the long-lasting contact of water with the aquifer material, either in the saturated zone or while crossing the unsaturated zone. This mechanism reflects both rock weathering over time and the ions supplied by the dissolution of evaporites, particularly halite, gypsum and anhydrite, and carbonates, such as calcite and dolomite (Trabelsi et al. 2007; Fadili et al. 2016). As shown in the study of M’nassri et al. (2019), the contamination of the Sidi El Hani groundwater is due to the dissolution of halite, cation exchange, the precipitation of carbonate minerals such as calcite and dolomite coupled with the dissolution of gypsum, and evaporation.

The correlation of F2 with calcium and potassium suggests that F2 is mainly related to anthropogenic activities. The calcium and potassium likely originate from the infiltration of surface waters such as rain and irrigation water, which dissolve and leach the salts concentrated in the soil or in the unsaturated zone (Yermani et al. 2003; Bekkoussa et al. 2013). In addition, the Gibbs diagram (Fig. 9) indicates that evaporation is the primary mechanism responsible for the increase in concentration of all species present in the groundwater. Furthermore, evaporation leads to the precipitation of the least soluble minerals. Hence, salt could accumulate in the topsoil. This salt may then be leached by rainfall or irrigation and transferred to deeper soil layers (M’nassri et al. 2018).

Fig. 9 Gibbs plot of water samples of the Sidi El Hani basin: total dissolved solids (TDS) as a function of the weight ratios a Na/(Na + Ca) and b Cl/(Cl + HCO3) (M’nassri et al. 2018)
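The two abscissae of the Gibbs plot are simple weight ratios. A minimal sketch is given here, assuming concentrations in mg/L. The function name and the sample values are hypothetical.

```python
def gibbs_ratios(na, ca, cl, hco3):
    """Return (Na/(Na + Ca), Cl/(Cl + HCO3)) from concentrations in mg/L."""
    if na + ca <= 0 or cl + hco3 <= 0:
        raise ValueError("concentration sums must be positive")
    return na / (na + ca), cl / (cl + hco3)


# Hypothetical sample (mg/L); each ratio is plotted against TDS on a log axis.
cation_ratio, anion_ratio = gibbs_ratios(na=800.0, ca=400.0, cl=1300.0, hco3=200.0)
print(f"Na/(Na+Ca) = {cation_ratio:.2f}, Cl/(Cl+HCO3) = {anion_ratio:.2f}")
```

Samples that combine high TDS with ratios approaching 1 plot in the part of the diagram usually attributed to evaporation.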

Geostatistical modelling of factors F1 and F2

Semivariograms computed for the OK model describe the spatial structure of vectors V1 and V2, associated with F1 and F2, respectively. The best-fit semivariogram model parameters are shown in Table 7. Both semivariograms show a nugget effect: the fitted model intersects the y-axis at 0.036 and 0.22 for V1 and V2, respectively. Moreover, coefficients of determination \(r^2_1\) = 0.75 and \(r^2_2\) = 0.62 were obtained for V1 and V2, respectively. Additionally, the normalized sill [(C0 + C)/C] is 49.5% and 60% for V1 and V2, respectively. These values are lower than 75%, indicating a moderate spatial dependence between the sampling points (Belkhiri and Narany 2015). The semivariograms of the studied parameters are shown in Fig. 10, in which the black lines represent the fitted model and the scatter points represent the experimental semivariances.

Table 7 Variogram model parameters obtained for V1 and V2 and the correlation coefficients of the validity of the Gaussian model
Fig. 10 Experimental isotropic variogram of vectors V1 and V2

Subsequently, V1 and V2 were mapped by point kriging. The map of V1, which correlates with the concentrations of Na, Cl and SO4, reveals that the highest values were recorded near the sabkha Sidi El Hani. This may be due to the dissolution of salt rocks (such as halite and gypsum) combined with the up-coning of saline waters trapped in the sedimentary series in the catchment area of the wells. In addition, this zone is characterized by a rather low hydraulic gradient, corresponding to a very slow groundwater flow that may increase the concentration of dissolved salts. The map of V2, which reflects the presence of dissolved calcium and potassium in the groundwater, shows that the highest values were recorded in the central plain of Ouled Chamekh and in the vicinity of the sabkha Cherita. Thus, the V2 vector is mainly associated with agricultural activities, such as the use of fertilizers.

Conclusions

A combined approach of multivariate statistical methods (HCA, FA and PCA) and geostatistical modelling was used to characterize the hydrochemical state of the groundwater of Sidi El Hani and to identify the main factors responsible for its evolution. The results based on HCA highlight two classes of water samples: C1 and C2. The C1 class, composed of 53% of the samples, is characterized by a fairly high salinity and a Cl–Ca–Mg–SO4 facies. The C2 class comprises the remaining 47% of the water points, where the salinity exceeds 6 g/L, and shows a sodium chloride facies.

The results of the FA and PCA demonstrated that the hydrochemical evolution of the groundwater resulted from two independent factors. The first factor (F1) was represented primarily by chloride (Cl⁻), sodium (Na⁺), sulphate (SO₄²⁻) and magnesium (Mg²⁺), whereas the second factor (F2) was characterized by calcium (Ca²⁺) and potassium (K⁺). The plot in the factor space showed that F1 was associated with the water samples having a high electrical conductivity, for which the acquisition of ions was controlled by natural factors such as rock weathering. F2, on the other hand, gathered the water samples for which the salinity was controlled by anthropogenic factors such as the return flow of irrigation water loaded with the fertilizers used in agriculture.

The geostatistical modelling gave a comprehensive view of the influence of vectors V1 and V2, associated with the two identified factors F1 and F2, respectively. The mapping of the variables shows that two major areas can be defined at the regional level. The chemical evolution of the sampling points close to the sabkha of Sidi El Hani is strongly controlled by natural factors (originating from evaporite materials, mainly from the Holocene). The rest of the plain is under the influence of anthropogenic factors related mainly to the intensive use of fertilizers. However, the heterogeneity of salinization observed in the groundwater of Sidi El Hani underlines the difficulty of identifying the origin of the major elements present.

The combination of statistical and geostatistical tools seems to produce a better picture of the main factors influencing the groundwater quality than former studies, but it does not provide detailed information on the physicochemical processes involved. To further specify the origin of each chemical element and the current state of its flux in the aquifer, numerical studies of reactive transport should be performed to supplement the knowledge of the functioning of the aquifer and to provide answers concerning the transport processes involved. The statistical and geostatistical study of groundwater samples can thus support efforts to prevent the salinization of groundwater and to improve water quality.