1 Introduction

Demand for freshwater in Yemen has increased greatly over the last few decades because of rapid population growth, creating an imbalance between available water resources and demand. As a result, the gap between water supply and demand is widening with time. Groundwater is the main source of freshwater in Yemen. This resource is being depleted at high rates, and many groundwater basins in Yemen, such as Sana’a, show declining groundwater levels and increasing salinity, both strong signs of depletion. The expansion of urbanization, irrigation, and industrialization has a significant effect on groundwater resources. Accordingly, conserving and monitoring this important resource is essential. The aquifer in Qa’Jahran basin consists of two layers (a sedimentary unit and a fractured volcanic unit), and because of this vulnerability, the probability of water contamination from human or natural sources is very high. Many contaminants in the study area, such as wastewater, oils, construction waste, and municipal solid waste, are disposed of randomly. Large quantities of human and industrial waste pose a serious threat to this valuable resource. These contaminants originate within the study area itself, and they affect the quality of water resources used by about 137,804 people. According to a WHO report, water is the main cause of 80% of all human diseases. Because in most cases it is impossible to stop pollutants at the source, regular monitoring of groundwater is required to protect it (Al-hadithi, 2012).

Manual wells, tube wells, and springs are the main water sources for domestic, industrial, and irrigation uses in many areas, but few of them remain in their original condition because of intensive human activities (Zhao et al., 2011). Today, the shortage of freshwater sources is a great concern in many countries (Alberto et al., 2001; Singh et al., 2004; Qadir et al., 2008). Municipal and industrial wastewater discharge is a constant source of pollution, whereas surface runoff is a seasonal phenomenon that is largely affected by climate (Singh et al., 2004). Because of these complexities, decision-makers and water quality specialists often face significant challenges in controlling water contamination (Elhatip et al., 2008). Protecting water quality is essential to keep water sources usable, minimize water treatment costs, and satisfy regulatory limits (Chowdhury et al., 2007; WHO, 2017; Canada, 2017).

Statistical analyses of water chemistry data are widely used in the interpretation of water analyses. Complex water quality data matrices can be better interpreted and understood by applying multivariate statistical techniques such as factor analysis (FA) and principal component analysis (PCA). These techniques help identify the factors that may influence water quality, which supports rapid solutions to pollution problems and reliable management of water resources (Lee et al., 2001; Reghunath et al., 2002). Voza et al. (2015) demonstrated the utility of multivariate statistical methods, namely FA, PCA, and cluster analysis (CA), for the analysis and interpretation of complex water quality data sets. Many studies have shown that multivariate statistical analysis is useful for assessing spatial variations in water quality (Simeonov et al., 2003; Zeilhofer et al., 2006; Zhang et al., 2009a, 2009b; Olsen et al., 2012; Marinović & Ruždjak, 2015; Phung et al., 2015; Muangthong & Shrestha, 2015). Multivariate statistical techniques can be used to characterize and evaluate freshwater quality and are also useful for verifying temporal and spatial variations (Singh et al., 2005; Chowdhury & Al-Zahrani, 2014).

Environmental data management is one of the important applications of Geographic Information Systems (GIS). Using GIS, managers and scientists can be provided with different scenarios for predicting future trends and the spatial distribution of data, which helps to avoid possible environmental crises. GIS is widely used in pollution and water resource management because of the critical role it plays in these fields (Zeilhofer et al., 2007; Igboekwe & Akankpo, 2011; Yan et al., 2015).

Nasher and El-Sagheer (2012) studied bacteriological, chemical, and physical parameters of water resources in the Bura’a natural protected area, Hodeidah, Yemen, to evaluate the water quality for agricultural and drinking purposes. They found that the water is suitable for irrigation but unsuitable for drinking. Saleh et al. (2018) assessed the suitability of groundwater in rural areas (Zabid, Hodeidah) for drinking using the water quality index (WQI). They found that the groundwater quality generally ranges from good to poor. Aqeel et al. (2017) used GIS to create a geospatial distribution map of fluoride concentrations and identified three risky zones in the Al-Howban Basin, Taiz, Yemen. Mohamed and Hassane (2016) studied the impact of urbanization on groundwater resources in Al Ain city, UAE, through a hydrochemical assessment of groundwater quality. Nasher et al. (2013) identified and evaluated the hydrogeochemical processes relevant to the management of the groundwater resources of the lower part of Wadi Siham, Tihama plain, Yemen. Nazzal et al. (2015) investigated the Saq aquifer in the northwestern part of KSA using a combined hydrochemical–statistical analysis to assess groundwater resources. Toumi et al. (2015) studied the water quality and hydrochemical characteristics of the Al-Ula Region, Saudi Arabia, and found that lithological influences control the water chemistry. Gebrehiwot et al. (2011) evaluated the suitability of groundwater in the El Khairat aquifer, Enfidha, Tunisia, for drinking purposes. They concluded that GIS is an effective tool for creating maps showing the spatial distribution of water quality parameters. Eslami et al. (2013) used GIS interpolation techniques to evaluate the groundwater quality of the Mianab plain, Iran, and found that the groundwater in their study area is not suitable for drinking or agricultural uses. They also confirmed the importance of selecting an appropriate interpolation technique and illustrated the effectiveness of geostatistical techniques in groundwater management systems. Al-Omran et al. (2018) used different techniques, such as the water quality index (WQI) and Durov diagrams, to assess the groundwater quality of the Al-Hasa region, Saudi Arabia, for drinking and irrigation purposes, and found that the water is excellent for irrigation and poor to unsuitable for drinking.

The main objective of this study is to evaluate groundwater quality variations in Qa’Jahran basin in Yemen. The study uses the Geographical Information System (GIS) and multivariate statistics to understand the quality of the water resource and to generate a water quality index map for the first time in this area.

2 Study Area and Data Acquisition

The Qa’Jahran basin is located in the middle western part of Yemen, in the Central Highland Plains in Jahran district. It represents the main catchment of the Jahran plain (Fig. 1). The basin lies between longitudes 44° 12′ 20″ and 44° 22′ 30″ E and latitudes 14° 38′ 11″ and 14° 57′ 30″ N, covers a total area of 406 km², and is about 25 km north of Dhamar city. The Qa’Jahran basin is one of the largest and most completely flat areas of its size in the Highland Plains of Yemen. The catchment area of the basin is affected by soil erosion. Intensive drilling for water in the catchment, together with water scarcity related to human interference with water storage, has lowered groundwater levels, and the basin is threatened by drought due to groundwater depletion. Manual and tube wells are the main water sources in Qa’Jahran basin, and the people in the area depend mainly on them for their water requirements. Thirty water wells at different locations in Qa’Jahran basin were selected for the study, taking into consideration hydrologic, social, and economic variation. All statistical and mathematical computations were made using JMP, MINITAB™, SPSS Statistics, XLSTAT, and Microsoft Office Excel 2016.

Fig. 1 Location map of the groundwater samples from the Qa’Jahran basin, Dhamar, Yemen

2.1 Scatterplots and Pairwise Correlations

Scatterplots and pairwise correlations were used to reveal linear or nonlinear relationships and possible correlations among the parameters. In this study, they were produced for the Qa’Jahran basin data using JMP and the MINITAB™ statistical package.
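As an illustration of the pairwise correlation step, the following minimal Python sketch computes Pearson correlation coefficients for every pair of parameters. It is not the JMP or MINITAB™ workflow used in the study; the function names and the sample values are hypothetical and serve only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance_sum / denominator

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of parameters."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}

# Hypothetical values for illustration only, not measurements from the study.
samples = {
    "EC": [120.0, 340.0, 560.0, 810.0, 1005.0],
    "TDS": [78.0, 221.0, 364.0, 526.5, 653.25],  # constructed as 0.65 * EC
    "pH": [9.1, 8.6, 8.4, 8.0, 7.9],
}
matrix = correlation_matrix(samples)
```

Because the hypothetical TDS column is an exact multiple of EC, the two are perfectly correlated in this toy data set, while pH decreases as EC increases and therefore has a negative coefficient.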

2.2 Principal Component Analysis (PCA)

PCA converts the original, correlated variables into new, linearly uncorrelated variables called principal components. PCA provides information on the most meaningful parameters, which describe most of the data set (Helena et al., 2000). Because PCA reduces the dimensionality of the data, a smaller number of transformed variables can represent the bulk of the variation in the data (Jackson, 1991; Wold et al., 1987). The principal component (PC) can be expressed as:

$$Z_{ij} = a_{i1} x_{1j} + a_{i2} x_{2j} + a_{i3} x_{3j} + \cdots + a_{im} x_{mj}$$

where Z is the component score, i is the component number, j is the sample number, a is the component loading, x is the measured value of variable, and m is the total number of variables.
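The component score equation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the variables are standardized to z-scores before weighting, the loadings are supplied by the caller rather than extracted from a correlation matrix, and all names and numbers are hypothetical.

```python
import math

def standardize(column):
    """Convert raw values to z-scores (mean 0, sample standard deviation 1)."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((v - mean) ** 2 for v in column) / (n - 1)
    if variance == 0:
        raise ValueError("cannot standardize a constant column")
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in column]

def component_scores(columns, loadings):
    """Score of one component for each sample: sum over variables of a_k * x_kj."""
    names = list(loadings)
    z = {name: standardize(columns[name]) for name in names}
    n_samples = len(z[names[0]])
    return [
        sum(loadings[name] * z[name][j] for name in names)
        for j in range(n_samples)
    ]

# Hypothetical data and loadings for illustration only.
columns = {
    "EC": [200.0, 400.0, 600.0, 800.0],
    "Cl": [10.0, 30.0, 20.0, 60.0],
}
loadings = {"EC": 0.9, "Cl": 0.8}
scores = component_scores(columns, loadings)
```

Since every standardized variable has zero mean, the scores of a component also sum to zero across the samples, which is a convenient sanity check.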

The transformed data from PCA can be simplified further by factor analysis (FA), which constructs new variables, called varifactors (VFs), by reducing the contribution of less significant variables. A VF includes hypothetical, unobservable, latent variables, while a PC is a linear combination of observed water quality variables (Helena et al., 2000; Vega et al., 1998). Normalized variables are used in PCA to extract the PCs of major significance and to reduce the contribution of those of minor significance (Zhang et al., 2009a, 2009b; Bu et al., 2010). As a result, a smaller number of variables provides a similar amount of information to the much larger set of original variables. Factor analysis (FA) can be expressed as:

$$Z_{ji } = a_{f1} f_{1i} + a_{f2} f_{2i} + a_{f3} f_{3i} + \cdots + a_{fm} f_{mi} + e_{fi}$$

where Z is the measured variable, i is the sample number, a is the factor loading, f is the factor score, m is the total number of factors, and e is the residual term accounting for errors or other sources of variation. In this study, fifteen parameters of the Qa’Jahran water quality data set were subjected to FA/PCA.

2.3 Water Quality Index (WQI)

The water quality index (WQI) is a rating that reflects the composite influence of different water quality parameters on the overall water quality. The main objective of computing the WQI is to obtain usable and understandable information from complex water quality data. Each parameter is selected based on its impact on health and on the overall quality of the water. The WQI can be computed in the following three steps.

First step: Assign weight (wi) to the selected water parameters (e.g., TDS, pH, TH, HCO3, SO4, Cl, NO3, Fe, etc.) according to their relative importance in the overall water quality for drinking purposes (weight may be from 1 to 5).

Second step: Compute a relative weight (Wi) of the chemical parameter using the following equation:

$$W_{i} = \frac{w_{i}}{\sum\limits_{i = 1}^{n} w_{i}}$$

where Wi is the relative weight, ‘n’ is the number of parameters, and wi is the weight of each parameter.

Third step: Assign a quality rating scale (qi) for each parameter, as below:

$$q_{i} = \left( {C_{i} \, / \, S_{i} \, } \right) \times 100$$

where qi is the quality rating, Si is the guideline value/desirable limit as given in Yemeni drinking water standard, and Ci is the concentration of each chemical parameter in mg/l. For the computation of WQI, firstly the subindex (SI) is determined for each water quality parameter, as given below:

$$\begin{aligned} {\text{SI}}_{i} & = W_{i} \times q_{i} \\ {\text{WQI}} & = \sum\limits_{i = 1}^{n} {\text{SI}}_{i} \\ \end{aligned}$$

where Wi is relative weight of the parameter, SIi is the subindex of the parameter, and qi is the rating based on concentration of the parameter.
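The three steps and the subindex summation can be combined in a short Python sketch. The function names are illustrative, and the weights, guideline values, and concentrations below are placeholders chosen for easy arithmetic; they are not the Yemeni standard values or the weights used in this study.

```python
def relative_weights(weights):
    """Step 2: W_i = w_i / sum of all w_i."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}

def water_quality_index(concentrations, standards, weights):
    """Steps 2 and 3 plus the summation: WQI = sum of W_i * (C_i / S_i) * 100."""
    relative = relative_weights(weights)
    wqi = 0.0
    for name, relative_weight in relative.items():
        quality_rating = concentrations[name] / standards[name] * 100.0  # q_i
        wqi += relative_weight * quality_rating  # SI_i = W_i * q_i
    return wqi

# Placeholder inputs (hypothetical, for illustration only).
weights = {"TDS": 5, "Cl": 3, "NO3": 2}                     # step 1: w_i from 1 to 5
standards = {"TDS": 1000.0, "Cl": 250.0, "NO3": 50.0}       # S_i in mg/l (assumed)
concentrations = {"TDS": 500.0, "Cl": 100.0, "NO3": 25.0}   # C_i in mg/l (assumed)

wqi = water_quality_index(concentrations, standards, weights)
# For these placeholders: 0.5 * 50 + 0.3 * 40 + 0.2 * 50, i.e. about 47.
```

The relative weights always sum to one, so a sample whose concentrations all equal their guideline values has a WQI of 100 regardless of how the weights are assigned.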

2.4 Classification of Water

The water can be classified into five types based on computed WQI, as given below:

Excellent water (<50); good water (50–100); poor water (100–200); very poor water (200–300); and water unsuitable for drinking (>300).
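The classification can be expressed as a simple lookup. Because the class ranges share their boundary values (for example, 100 appears in both the good and poor ranges), the sketch below assumes that a boundary value belongs to the better class, except that a WQI of exactly 50 is treated as good because the excellent class is defined as strictly below 50. That tie-breaking rule and the function name are assumptions, not part of the original scheme.

```python
def classify_wqi(wqi):
    """Map a WQI value to one of the five water classes.

    Assumption: upper bounds are inclusive (100 -> good, 200 -> poor,
    300 -> very poor); the source ranges do not say how ties are resolved.
    """
    if wqi < 0:
        raise ValueError("WQI cannot be negative")
    if wqi < 50:
        return "excellent"
    if wqi <= 100:
        return "good"
    if wqi <= 200:
        return "poor"
    if wqi <= 300:
        return "very poor"
    return "unsuitable for drinking"

# Hypothetical WQI values for illustration only.
classes = [classify_wqi(value) for value in (32.5, 75.0, 150.0, 250.0, 310.0)]
```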

3 Results and Discussion

Descriptive statistics of the water quality parameters (WQPs) were calculated for the 30 Qa’Jahran wells, as shown in Table 1. Distribution fitting was performed using maximum likelihood estimation and tested with the Kolmogorov–Smirnov and chi-square methods. The gamma (2) distribution fits the data best, as shown in Table 2. The water quality data sets of Qa’Jahran basin were then subjected to the multivariate statistical techniques FA and PCA.

Table 1 Statistical analysis of water quality parameters (WQPs) of Qa’Jahran wells
Table 2 Distribution fitting of the data analysis

3.1 Correlation and Scatterplots

The correlations between the different water quality parameters are shown in Table 3. The pH has a strong positive correlation with carbonate (r = 0.857) and strong negative correlations with hardness (r = −0.632) and calcium (r = −0.639). Total dissolved solids is perfectly correlated with conductivity (r = 1). Total dissolved solids and conductivity show strong correlations with chloride and bicarbonate (r = 0.881), total alkalinity (r = 0.868), sodium (r = 0.803), and sulfate (r = 0.743), and moderate correlations with magnesium (r = 0.597) and potassium (r = 0.475). Sodium has strong correlations with total alkalinity, bicarbonate, and chloride (r = 0.845, 0.822, and 0.628, respectively) and is moderately correlated with fluoride (r = 0.484). The strong positive correlation of sodium with chloride, and of both with conductivity, reflects a common source of these ions. Potassium is moderately correlated with chloride (r = 0.52). Hardness shows strong correlations with calcium (r = 0.945) and magnesium (r = 0.76) and moderate correlations with chloride, sulfate, and nitrate (r = 0.474, 0.533, and 0.494, respectively), while it has a moderate negative correlation with carbonate (r = −0.534). Magnesium has a strong correlation with chloride (r = 0.705) and moderate correlations with sulfate and calcium (r = 0.544 and 0.51, respectively). Total alkalinity is strongly correlated with bicarbonate (r = 0.996) and moderately correlated with chloride (r = 0.598). Bicarbonate has a strong correlation with chloride (r = 0.624) and a moderate correlation with sulfate (r = 0.473). Carbonate shows a moderate correlation with calcium (r = 0.575), and calcium is likewise moderately correlated with nitrate (r = 0.503). Strong correlations were found between chloride and sulfate (r = 0.632) and between nitrate and fluoride (r = 0.851).

Table 3 Correlation of different parameters of chemical analysis for water samples in Qa’Jahran basin

3.2 Principal Component Analysis (PCA)

PCA was performed on the Qa’Jahran basin data. The number of significant principal components (PCs) was determined from the eigenvalue-greater-than-one criterion and the scree plot (Fig. 2 and Table 4). PCs are regarded as significant if their eigenvalues are greater than one. Four significant PCs with eigenvalues > 1 were extracted in this study; they explained 40.137%, 26.277%, 11.943%, and 6.863% of the variance, respectively. The findings indicate that the first four PCs explain 85.22% of the total variance associated with the WQPs in Qa’Jahran basin.
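The eigenvalue criterion and the cumulative variance can be illustrated with a short Python sketch. The eigenvalues in the example are hypothetical because the individual eigenvalues appear only in Table 4; the final lines simply re-add the four percentages reported in the text.

```python
def significant_components(eigenvalues, threshold=1.0):
    """Keep components whose eigenvalue exceeds the threshold.

    Returns a list of (rank, eigenvalue, percent_variance, cumulative_percent).
    """
    total = sum(eigenvalues)
    kept = []
    cumulative = 0.0
    for rank, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        if value <= threshold:
            break
        share = 100.0 * value / total
        cumulative += share
        kept.append((rank, value, share, cumulative))
    return kept

# Hypothetical eigenvalues for a five-variable data set (illustration only).
example = significant_components([2.5, 1.5, 0.6, 0.3, 0.1])

# Consistency check on the percentages reported in the text.
reported_percentages = [40.137, 26.277, 11.943, 6.863]
reported_total = round(sum(reported_percentages), 2)
```

For standardized data the eigenvalues sum to the number of variables, so an eigenvalue above one means that the component explains more variance than any single original variable.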

Fig. 2 Scree plot of water samples in Qa’Jahran basin

Table 4 Total variance explained of water samples in Qa’Jahran wells

3.3 Loading and Scatterplots

Understanding the contribution of the original WQPs to the major PCs is essential for deciding which parameters are important and which are not. Figures 3 and 4 show the scatterplots and loadings on the major PCs for Qa’Jahran basin. The WQPs EC, TDS, TA, HCO3, Cl, SO4, Mg, and Na exceeded the threshold limit of 0.55 on PC1 (Fig. 3) and therefore had significant loadings on PC1. PC2 had significant loadings from turbidity, pH, TH, CO3, Ca, Na, NO3, and Fe. In PC3, significant loadings came from turbidity, CO3, and Fe. PC4 obtained a significant loading from fluoride (F) (Table 5). Figure 4 shows the scatterplots of the loadings. The plot of PC1 × PC2 shows that nitrate and iron were significantly associated with PC2 only, while conductivity, TDS, alkalinity, chloride, bicarbonate, magnesium, sulfate, and sodium were significantly associated with PC1. To represent water quality, one or two parameters from each cluster may be required. In general, the four PCs were mainly associated with conductivity, TDS, alkalinity, chloride, bicarbonate, magnesium, sulfate, sodium, nitrate, and iron.

Fig. 3 Scatterplots and loading on the major PCs for Qa’Jahran basin

Fig. 4 Scatterplots of the (1) turbidity, (2) pH, (3) conductivity, (4) TDS, (5) hardness, (6) alkalinity, (7) bicarbonate, (8) carbonate, (9) chloride, (10) sulfate, (11) fluoride, (12) calcium, (13) magnesium, (14) sodium, (15) potassium, (16) nitrate, and (17) iron

Table 5 Factor loadings on the major PCs for Qa’Jahran basin

The corresponding VFs were obtained for Qa’Jahran basin (Table 5). The factor loadings were classified according to their absolute values as strong (> 0.75), moderate (0.75–0.50), and weak (0.50–0.30) (Liu et al., 2003). VF1 explained 40.137% of the total variance and had strong positive loadings from alkalinity, bicarbonate, sulfate, chloride, total dissolved solids, and conductivity, while hardness, sodium, magnesium, and potassium had moderate loadings on VF1. VF2 explained 26.277% of the total variance and had a strong positive loading from nitrate and moderate loadings from turbidity, hardness, and calcium. VF3 explained 11.943% of the total variance and had moderate loadings from iron, turbidity, and carbonate. VF4 explained 6.863% of the total variance and had a weak negative loading from sulfate.
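The loading classification of Liu et al. (2003) described above can be written as a small helper. The label for loadings below 0.30 and the treatment of the exact boundary values are assumptions made for this sketch, and the loadings in the example are hypothetical rather than taken from Table 5.

```python
def loading_strength(loading):
    """Classify a factor loading by its absolute value.

    strong: > 0.75, moderate: 0.50 to 0.75, weak: 0.30 to 0.50.
    Values below 0.30 are labelled "negligible" (an assumed label).
    """
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"

# Hypothetical loadings on one varifactor (illustration only).
example_loadings = {"TDS": 0.93, "Na": 0.61, "SO4": -0.35, "Fe": 0.12}
strengths = {name: loading_strength(value) for name, value in example_loadings.items()}
```

The sign is ignored when classifying, so a loading of −0.35 is weak in the same way as +0.35; the sign only indicates the direction of the association.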

3.4 Data Reduction

The loadings of the original parameters on the major VFs were assessed to investigate the possibility of data reduction. Past studies reported that the original parameters contribute most to temporal variation if they have loadings greater than 0.7 or 0.75 on the major VFs (Liu et al., 2003; Varol et al., 2012). The four major VFs explain 85.22% of the variance and had strong loadings from sulfate, sodium, chloride, magnesium, calcium, total dissolved solids, hardness, bicarbonate, conductivity, iron, total alkalinity, and nitrate, as shown in Table 5. Because TDS, conductivity, chloride, hardness, magnesium, sodium, and sulfate are strongly correlated, as shown in Table 3, TDS can be selected as the significant parameter among them. Accordingly, a total of twelve parameters (sulfate, chloride, sodium, magnesium, calcium, hardness, total dissolved solids, conductivity, bicarbonate, total alkalinity, nitrate, and iron), approximately 80% of the 15 parameters, are needed to explain 85.22% of the data variance (Table 5).

3.5 Water Quality Parameters

The minimum and maximum values of each parameter were identified, and the spatial distributions of sixteen parameters were determined for the study area. Iron (Fe) varies from 0.01 to 1.08 mg/l, with the highest concentration in the Al-ulaib well in the eastern part of the study area (Fig. 5a). Nitrate (NO3) ranges from 0.8 to 101, with the highest concentration found only in the DJ2-W-9 well (Fig. 5b). Potassium is low throughout the study area, with values between 0.4 and 5.4 mg/l (Fig. 5c). Sodium varies from 7.5 to 190.6, with the highest concentration in the DJ1-W8 well (Fig. 5d). Magnesium varies from 0.0 to 37 mg/l (Fig. 5e). Calcium varies from 0.0 to 150, with the highest concentration in the DJ2-W-9 well. Fluoride varies from 0.2 to 3.7, with high concentrations in the northern part and low concentrations only in the central part of the study area (Fig. 5f). Sulfate (SO4) varies from 1 to 102 mg/l, with low concentrations in the northern and southern parts of the study area (Fig. 5g). Chloride varies from 7.8 to 110.6 (Fig. 5h). Bicarbonate varies from 42.6 to 332.5 and is low, within the permissible range, throughout the study area except at DJ1-W33 (Fig. 5i). The pH ranges between 7.9 and 9.2 (Fig. 5k), which shows that the water is alkaline in most of the study area. TDS varies from 76.5 to 653.9 mg/l, indicating that TDS in the groundwater is low (Fig. 5l). Total hardness (TH) varies from 0 to 192, and most of the study area has low TH (Fig. 5n). Total alkalinity varies from 38 to 278 (Fig. 5o), and turbidity varies from 0.04 to 4.8 (Fig. 5p).
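Spatial distribution maps of this kind are produced by interpolating the well measurements over the study area. The text does not state which interpolation method was used, so the sketch below assumes inverse distance weighting (IDW) purely for illustration; the function name, coordinates, and values are hypothetical.

```python
import math

def idw_estimate(target, wells, power=2.0):
    """Inverse-distance-weighted estimate at `target` (x, y).

    `wells` is a list of ((x, y), value) pairs. IDW is an assumed method;
    the study does not specify its interpolation technique.
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in wells:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # the target coincides with a sampled well
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one well is required")
    return numerator / denominator

# Hypothetical wells: planar coordinates in km and a concentration in mg/l.
wells = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0), ((0.0, 2.0), 30.0)]
estimate = idw_estimate((1.0, 1.0), wells)
```

An IDW estimate is a weighted average, so it always lies between the smallest and largest measured values; other methods, such as kriging, do not share this property.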

Fig. 5 Water quality parameters (spatial distribution maps) of Qa’Jahran basin

3.6 Water Quality Index (WQI)

The WQI is a widely used tool for assessing water quality. The relative weight of each water quality parameter was determined according to its relative importance, as shown in Table 6. The water quality index was calculated for each well, as shown in Table 7 and Fig. 6. About 72% of the samples fall in the excellent water category and 28% fall in the good water category, which reflects good-quality water in Qa’Jahran basin.

Table 6 Relative weight for each water quality parameter (Gebrehiwot et al., 2011)
Table 7 Water quality index for the Qa’Jahran wells
Fig. 6 Spatial distribution of water quality index in Qa’Jahran basin

4 Conclusions

This study evaluated the spatial variability of the water quality parameters (WQPs) in Qa’Jahran basin. Several techniques, including pairwise correlations, FA, and PCA, were applied to the data collected in 2019. The possibility of data reduction was investigated by assessing the loadings of the parameters on the major PCs and VFs together with their correlation structure. About 80% of the data in Qa’Jahran basin can be reduced at the expense of 20% of the overall variance. This research indicates that data reduction is possible in the monitoring program. Approximately 48% can be reduced at the expense of 14.78% of the overall variance. A total of nine parameters out of 17 (conductivity, TDS, hardness, total alkalinity, bicarbonate, chloride, sulfate, calcium, nitrate) explained 85.22% of the data variance.

According to WHO and G.Y.L., most of the chemical elements in the study area fall within the permissible limits. The WQI and GIS approaches were integrated to evaluate water quality in the study area and identify potentially risky zones. The water quality index map shows that the water in the study area is mostly excellent to good; on that basis, it is recommended for domestic use and human drinking.