Introduction

The Jianghan Plain, located in the central Yangtze River drainage basin in central China, is rich in groundwater resources (Zeng 1996). However, groundwater in the plain is widely affected by geogenic arsenic (As), Fe, Mn and ammonium contamination, which poses significant problems as water demand increases (Du et al. 2017; Duan et al. 2015; Gan et al. 2014; Li et al. 2018; Zhou et al. 2012). Furthermore, anthropogenic activities (e.g., lake reclamation, over-exploitation, sewage discharge and fertilizer application) have caused water quality deterioration and other environmental geological problems (e.g., wetland degradation; Cui et al. 2013; Niu et al. 2017; Wang et al. 2006; Xie et al. 2017). Sustainable groundwater management therefore requires integrating the available data and analyzing them thoroughly to better understand the complex groundwater flow patterns and hydrogeochemical characteristics of the plain.

Groundwater flow systems are mainly controlled by physiographical factors (topography and climate), geological factors (lithologic structure) and anthropogenic factors (Liang et al. 2015). The extremely complex surface-water system and micro-topography of the Jianghan Plain make its groundwater flow system very complicated (Huang et al. 2017). In addition, insufficient data and intensive anthropogenic activities, such as pumping, irrigation and the construction of possible groundwater flow barriers (e.g., levees and dams), introduce considerable uncertainty into current groundwater flow models.

Many studies have shown that hydrochemical characteristics can effectively indicate groundwater recharge (or mixing) and geochemical evolution processes (Awaleh et al. 2017; Barbieri et al. 2005; Liu et al. 2017; Pilla et al. 2006; Zheng et al. 2017; Zhu et al. 2007; Zhu et al. 2008). As a result, hydrochemical analysis has become an accepted method for tracing groundwater flow paths. The conventional approach is to divide samples into hydrochemical facies by graphical methods (e.g., the Piper diagram) and then to analyze the reactions responsible for the systematic variations among facies (Guler et al. 2002). Information about the groundwater flow system is then inferred from the variations (in reaction intensity or type) along the flow paths. Unfortunately, these graphical methods use only a subset of the available data (often only the major ions), and they often fail to produce distinct groups (Guler et al. 2002). This limitation becomes even more serious for large data sets.

Compared with traditional graphical or qualitative methods, multivariate statistical techniques are quantitative and semi-objective approaches that can use any combination of chemical (major, minor and trace constituents), physical (e.g., temperature) and other related (e.g., elevation and precipitation) parameters (Cloutier et al. 2008; Farnham et al. 2003; Guler et al. 2002; Zhu et al. 2017). Hierarchical cluster analysis (HCA) and principal component or factor analysis (PCA or FA) are two well-proven multivariate methods used in many research fields. In hydrogeochemical studies, HCA classifies samples into a set of representative clusters (also known as hydrochemical facies, water types or water groups; Guler and Thyne 2004), whereas PCA or FA reduces the dimensionality of large data sets and identifies the underlying factors that control groundwater quality in an area. Numerous studies have shown that HCA and PCA (or FA) are useful for investigating hydrochemical patterns, determining the processes that control the temporal and spatial hydrochemical evolution of groundwater, deciphering the origin and mobility of both geogenic and anthropogenic pollutants, and defining and understanding complex groundwater flow systems (Cloutier et al. 2008; Demlie et al. 2007; Guler et al. 2012; Guler and Thyne 2004; Halim et al. 2010; He et al. 2015; Helena et al. 2000; Helstrup et al. 2007; Huang et al. 2013; Krishna et al. 2009; Moeck et al. 2016; Moya et al. 2015; Newman et al. 2016; Owen and Cox 2015).

Previous studies in the Jianghan Plain mainly used traditional graphical methods to classify and interpret the hydrogeochemical characteristics of groundwater, and they found little variation in water type (mainly HCO3-Ca or HCO3-Ca-Mg type; Gan et al. 2014; Niu et al. 2017; Yu et al. 2017; Zhou et al. 2012). Because these water types show no systematic transition, the classifications produced by traditional schemes have been of little value for improving the model of the groundwater flow system in the Jianghan Plain.

In this study, multivariate statistical analysis was applied to investigate the hydrogeochemical evolution of groundwater in a large alluvial aquifer system. The main objectives were to: (1) test the validity of multivariate methods for identifying hydrochemical facies over a large area; (2) identify and distinguish the main factors controlling groundwater chemistry; and (3) evaluate the applicability of this approach for understanding the groundwater flow patterns in the Jianghan Plain.

Study area

Location and physiography

The Jianghan Plain is a semi-enclosed basin in Hubei Province, central China, covering an area of about 40,000 km². The plain lies within the central Yangtze River basin; it is bounded by mountains or hills (elevation 100–2,000 m above sea level) to the north, west and east, and adjoins Dongting Lake to the south (Fig. 1a). The Yangtze River and the Han River (a major tributary of the Yangtze River) flow across and converge in the Jianghan Plain, supplying its alluvial sediments.

Fig. 1

a Hydrogeological map of the Jianghan Plain and the location of the study area, b elevation map of the study area, and c a typical hydrogeological section (A–A′) across the study area

The geomorphology of the Jianghan Plain can be divided into two categories: hilly areas along the margins and low plain areas in the center (elevations of about 40–170 and 20–30 m above sea level, respectively). Elevations gradually decline from the north and west toward the south and southeast. The low plain areas have a very gentle slope of 1/20,000–1/30,000 from west to east (Zhou et al. 2012).

The study area, approximately 1,500 km², lies in the low plain areas of the south-eastern Jianghan Plain and extends from the Yangtze River to the Han River. Elevations in the study area range from 16 to 40 m above sea level (Fig. 1b). Because of natural or artificial levees, areas along the large rivers (e.g., the Yangtze, Han and Dongjing rivers) are typically 2–6 m higher than the areas between rivers.

The study area has a subtropical monsoon climate, with an annual average precipitation of about 1,164 mm (71.6% of which falls from June to August) measured during 1957–2008 (Luo et al. 2011). The annual average temperature and evaporation are 16.7 °C and 1,379 mm, respectively.

Geological and hydrological settings

Geologically, the Jianghan Basin is located in the Yangtze Block and was primarily shaped by the Mesozoic Yanshanian orogeny, during which the surrounding orogenic belts and massifs underwent intensive uplift (Wu et al. 2017). The basin subsided and received sediments simultaneously from the Cretaceous to the Quaternary. The Cretaceous–Neogene strata (mainly clastic rocks) are generally 3,000–4,000 m thick, reaching 5,000–6,000 m at the center of subsidence. Alluvial-lacustrine Quaternary sediments were deposited on top of the bedrock; their thickness decreases from about 160–280 m in the center to about 15–90 m at the margins of the plain. In the study area, the Quaternary sediments south of the Xiangyin fault are only 50 m thick. The Quaternary stratigraphy typically consists of a sandy layer (interbedded with clay layers) overlain by a clayey layer about 20 m thick (Fig. 1c).

The Quaternary aquifer system in the Jianghan Plain can be vertically divided into three aquifer groups (Zhang et al. 2017; Fig. 1c). The first is the phreatic aquifer (0–20 m depth), composed of Holocene (Q4) and upper late Pleistocene (Q3) clay, silty clay, clayey silt and silt; the depth to the water table in this aquifer generally ranges from 0.5 to 2 m. The second is the middle-confined aquifer (20–100 m depth), composed of late Pleistocene (Q3) and middle Pleistocene (Q2) sand and sandy gravel. Discontinuous silty clay and clay lenses 5–10 m thick within this aquifer act as local aquitards. This aquifer is currently the main exploited aquifer, with piezometric levels mainly ranging from 20 to 35 m above sea level. Pumping tests at two boreholes in the study area indicated hydraulic conductivities of 0.075 to 1.26 m/day for this aquifer (Chen et al. 2017). The third group is the deep-confined aquifer (> 100 m depth), composed of early Pleistocene (Q1) silt, sand and sandy gravel. A continuous clay aquitard about 10 m thick separates the middle-confined aquifer from the deep-confined aquifer. Almost all groundwater samples for this study were collected from the phreatic and middle-confined aquifers.

The water levels of surface water, phreatic groundwater and confined groundwater showed similar seasonal changes in response to precipitation (Fig. 2). H/O stable isotope characteristics suggested that local precipitation is the fundamental source of surface water and groundwater in the Jianghan Plain (Du et al. 2017; Gan et al. 2014; Yu et al. 2017). In the study area, groundwater is recharged by vertical infiltration of meteoric water, lateral inflow from adjacent aquifers, leakage from rivers and drainage channels, and irrigation return flow. Groundwater discharges mainly to surface water and adjacent aquifers, and to a lesser extent through modest artificial extraction. Analysis of major ion chemistry and H/O stable isotopes suggested that the phreatic aquifer probably serves as a mixing pathway between the confined aquifer and surface water (Du et al. 2017).

Fig. 2

Monthly variations (2013) in precipitation and in the water levels of surface water (Dongjing River) and of phreatic and middle-confined groundwater (GW) in the study area. Groundwater levels were measured in 39 monitoring wells within a 10-km² field site in the middle of the study area (between the Dongjing and Tongshun rivers). Precipitation data were collected from Xiantao Observatory

Controlled by the regional topography, groundwater in the Jianghan Plain flows regionally from north-west to south-east and discharges to the Yangtze River (Fig. 1a). The study area lies in the transition and discharge zones of the regional groundwater flow; however, local undulations of the water table distort the groundwater flow paths. Micro-topography created by natural processes and anthropogenic activities leads to spatial diversity of flow paths, while seasonal changes in precipitation and surface-water levels induce temporal variations (or even reversals) of the groundwater flow paths (Duan et al. 2015; Huang et al. 2017; Schaefer et al. 2016).

Seasonal hydrochemical variations in groundwater

Monthly monitoring over 2 years (Duan et al. 2015; Schaefer et al. 2016) at a 10-km² field site in the middle of the study area showed that the concentrations of major ions (Ca, Mg, Na, K, HCO3) remained relatively stable (relative standard deviations < 20%) in both phreatic and confined groundwater. Concentrations of Cl (average < 15 mg/L) and SO4 (average < 9 mg/L) were low. Although redox-sensitive parameters (e.g., Fe and As) fluctuated dramatically with season in some wells, all wells that showed such changes followed a similar trend in each year. In other words, seasonal effects would not alter the spatial hydrochemical patterns in the study area. Moreover, all samples for this study were collected in the wet season (July and August), which should minimize seasonal bias in the analytical results.
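The stability criterion above (relative standard deviation < 20%) can be illustrated with a short sketch. It assumes the RSD is the sample standard deviation (ddof = 1) divided by the mean, expressed in percent; the source does not state which estimator was used, and the monthly series below is invented for illustration.

```python
import numpy as np


def relative_std_percent(series):
    """Relative standard deviation (%) of a concentration time series."""
    x = np.asarray(series, dtype=float)
    return 100.0 * x.std(ddof=1) / x.mean()


def is_stable(series, threshold=20.0):
    """True if the RSD is below the threshold (20% in the monitoring study)."""
    return relative_std_percent(series) < threshold


# Hypothetical monthly HCO3 series (mg/L) for one well
hco3 = [410, 425, 398, 440, 415, 430, 405, 420, 435, 412, 428, 418]
print(round(relative_std_percent(hco3), 1), is_stable(hco3))
```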

Methodology

Sample collection

In August 2014 and July 2015, 474 groundwater and 33 surface-water samples were collected in the study area. Most groundwater samples were collected from domestic tube wells less than 50 m deep; several were pumped from deep boreholes up to 180 m deep. Surface-water samples were collected from the main rivers and drainage canals in the study area. The selection of samples for analysis is discussed in section ‘Data screening’.

Wells were purged by pumping for 5–10 min before field measurements and sampling. Samples were collected in 50-mL HDPE bottles after the bottles had been rinsed three times with the extracted water, and they were immediately filtered through 0.45-μm membrane filters (Sartorius Minisart). Samples for cation and arsenic analysis were acidified in the field to pH < 2 with concentrated HNO3 and HCl, respectively. All samples were immediately stored in a cool box with ice packs, then transported to the laboratory and refrigerated at 4 °C until analysis.

Field and laboratory measurements

In the field, temperature (T, ±0.1 °C), pH (±0.01), electrical conductivity (EC, ±0.1 μS/cm) and oxidation-reduction potential (ORP, ±0.1 mV, measured relative to an Ag/AgCl reference electrode and later converted to Eh for data analysis) were measured using a Hach HQ40D multi-meter. Ammonium concentrations of most samples were measured on site using a Hach 2800 portable spectrophotometer and Hach reagent kits. Alkalinity (as HCO3) of all samples was determined within 24 h by acid–base titration.
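Two routine conversions mentioned above can be sketched as follows. The reference-electrode offset is an assumption: about +197 mV is a commonly quoted potential of an Ag/AgCl electrode in saturated KCl at 25 °C, but the offset used in this study is not reported and varies with filling solution and temperature. The alkalinity function assumes titration to the HCO3 endpoint with an acid of known normality; the function names and example readings are hypothetical.

```python
def orp_to_eh(orp_mv, reference_offset_mv=197.0):
    """Convert an ORP reading (mV vs. Ag/AgCl) to Eh (mV vs. SHE).

    ASSUMPTION: +197 mV approximates an Ag/AgCl electrode in saturated KCl
    at 25 °C; the offset actually applied in the study is not stated.
    """
    return orp_mv + reference_offset_mv


def alkalinity_as_hco3_mg_l(acid_volume_ml, acid_normality, sample_volume_ml):
    """Alkalinity (mg/L as HCO3) from an acid titration to the HCO3 endpoint.

    mL of acid times eq/L gives meq of acid; dividing by the sample volume in
    litres gives meq/L, and 1 meq of HCO3 weighs 61.017 mg.
    """
    meq_per_l = acid_volume_ml * acid_normality / (sample_volume_ml / 1000.0)
    return meq_per_l * 61.017


print(orp_to_eh(-101.0))                           # hypothetical field reading
print(alkalinity_as_hco3_mg_l(10.0, 0.02, 50.0))   # hypothetical titration
```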

Major anions (Cl, SO4 and NO3) were determined by ion chromatography (IC, 761 Compact IC, Metrohm AG) with a detection limit of 0.01 mg/L. Total concentrations of five major elements (Ca, Mg, Na, K and Si) and four trace elements (Ba, Fe, Mn and Sr) were determined by inductively coupled plasma atomic emission spectrometry (ICP-AES, IRIS Intrepid II XSP, Thermo Electron Co.) with a detection limit of 0.001 mg/L. Arsenic concentrations were measured using a hydride generation atomic fluorescence spectrometer (HG-AFS, 930, Titan, China) with a detection limit of 0.05 μg/L. All analyses were performed at the Analysis Center of the Geological Survey, China University of Geosciences.

Data screening

Data screening was used to examine and improve data quality before the hydrogeochemical and statistical analyses. In the initial screening, 13 samples with charge balance errors above 10% (84% of samples had errors < 5%), 5 samples with K concentrations higher than Na concentrations, and 12 samples that deviated severely from the otherwise well-defined trend line between EC and calculated total dissolved solids (TDS) were rejected. These three checks are commonly used to verify the reliability of hydrochemical data (Cloutier et al. 2008; Guler et al. 2002; Moya et al. 2015; Shen et al. 1993).
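The first two screening rules can be expressed as a minimal sketch. It assumes the charge balance error is computed as 100 × (Σcations − Σanions)/(Σcations + Σanions) in meq/L over the eight major ions listed, and that the K > Na rule compares mg/L concentrations; the source specifies neither detail. The EC–TDS trend check is not shown, and the function names and example sample are hypothetical.

```python
# Equivalent weights in mg per meq (molar mass / |charge|)
EQUIVALENT_WEIGHT = {
    "Ca": 40.078 / 2, "Mg": 24.305 / 2, "Na": 22.990, "K": 39.098,
    "HCO3": 61.017, "Cl": 35.453, "SO4": 96.06 / 2, "NO3": 62.004,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "Cl", "SO4", "NO3")


def charge_balance_error(sample_mg_l):
    """Charge balance error (%) from major-ion concentrations in mg/L."""
    meq = {ion: sample_mg_l.get(ion, 0.0) / w for ion, w in EQUIVALENT_WEIGHT.items()}
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def passes_basic_checks(sample_mg_l, max_abs_cbe=10.0):
    """|CBE| <= 10% and K not above Na (mg/L comparison is an assumption)."""
    if abs(charge_balance_error(sample_mg_l)) > max_abs_cbe:
        return False
    return sample_mg_l.get("K", 0.0) <= sample_mg_l.get("Na", 0.0)


# Hypothetical HCO3-Ca type sample (mg/L)
sample = {"Ca": 90.0, "Mg": 20.0, "Na": 15.0, "K": 2.0,
          "HCO3": 380.0, "Cl": 10.0, "SO4": 5.0, "NO3": 1.0}
print(round(charge_balance_error(sample), 2), passes_basic_checks(sample))
```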

Cases with unusual or extreme values, known as outliers, can distort statistical results. Graphical methods (e.g., histograms, box plots, probability plots and scatter plots) and the Mahalanobis distance were used to detect univariate and multivariate outliers (Mertler and Reinhart 2016; Tabachnick and Fidell 2014). After further examination and comparison, 20 extreme outliers were removed from the data set. These were probably caused by mixing with geothermal groundwater (three wells located close to the fault in the geothermal area), serious pollution, analytical errors or incorrect data entry.
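A minimal sketch of the multivariate outlier check, assuming the common criterion of flagging cases whose squared Mahalanobis distance exceeds the chi-square critical value at p < 0.001, with degrees of freedom equal to the number of variables (the threshold used in the study is not stated). Because a single extreme case inflates the covariance matrix, distances are best recomputed after removing flagged cases and combined with the graphical checks described in the text. The data are synthetic.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(X, alpha=0.001):
    """Flag multivariate outliers by squared Mahalanobis distance.

    X: (n_samples, n_variables) array. A case is flagged when its squared
    distance from the centroid exceeds the chi-square critical value with
    df = n_variables at significance level alpha.
    """
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centred, inv_cov, centred)
    critical = chi2.ppf(1.0 - alpha, df=X.shape[1])
    return d2, d2 > critical


rng = np.random.default_rng(42)
data = rng.normal(size=(200, 3))
data = np.vstack([data, [50.0, 50.0, 50.0]])   # one artificial extreme case
d2, flagged = mahalanobis_outliers(data)
print(np.flatnonzero(flagged))
```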

After data screening, 457 samples were retained in the data set for subsequent analysis, which included 436 groundwater and 21 surface-water samples. Among the groundwater samples, 91 and 345 samples were collected from the phreatic aquifer and confined aquifer, respectively.

Multivariate statistical analysis

Variables and data transformations

Following several similar studies (Cloutier et al. 2008; Guler et al. 2012; Tabachnick and Fidell 2014), 11 variables (Ca, Mg, Na, HCO3, Cl, SO4, Si, Fe, Ba, Sr and As) were selected for the multivariate statistical analysis. The following were excluded: parameters with additive characteristics (i.e., aggregates of the individual ion concentrations), such as TDS and EC; parameters with small regional variation, such as pH and temperature; parameters with more than 5% missing values, such as Eh and NH4; and parameters with low loadings and communalities in the PCA, such as K, NO3 and Mn.

For multivariate statistics, values reported as “zero” or as “below the detection limit” must be replaced. In this study, about half of the samples had zero or below-detection-limit values for SO4; these were replaced by 0.55 times the detection limit (Guler et al. 2002). Because samples with missing values are automatically excluded from the multivariate statistical procedures, four missing arsenic values were estimated by averaging the values at nearby sampling sites in the same aquifer, so that those samples could be retained.
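The two replacement steps can be sketched as follows. The 0.55 × detection-limit substitution follows the text; for the missing As values, the number of neighbours (`k`) and the use of straight-line distance between site coordinates are hypothetical choices, since the source says only that values from nearby sites in the same aquifer were averaged.

```python
import numpy as np


def substitute_censored(values, detection_limit, factor=0.55):
    """Replace zero / below-detection-limit values with factor * detection limit.

    NaN entries (truly missing values) are left untouched.
    """
    v = np.asarray(values, dtype=float).copy()
    censored = ~np.isnan(v) & (v < detection_limit)
    v[censored] = factor * detection_limit
    return v


def fill_from_neighbours(xy, values, aquifer, k=3):
    """Fill NaNs with the mean of the k nearest non-missing sites in the same aquifer.

    k is a hypothetical choice; only originally measured values act as donors.
    """
    xy = np.asarray(xy, dtype=float)
    original = np.asarray(values, dtype=float)
    aquifer = np.asarray(aquifer)
    filled = original.copy()
    for i in np.flatnonzero(np.isnan(original)):
        donors = np.flatnonzero(~np.isnan(original) & (aquifer == aquifer[i]))
        dist = np.linalg.norm(xy[donors] - xy[i], axis=1)
        nearest = donors[np.argsort(dist)[:k]]
        filled[i] = original[nearest].mean()
    return filled


so4 = substitute_censored([0.0, 0.004, 3.2, 12.5], detection_limit=0.01)
print(so4)
```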

Normality is a general assumption of multivariate statistical analysis. Although PCA and FA do not strictly require normally distributed variables, the solution is improved if the variables are normally distributed (Mertler and Reinhart 2016; Tabachnick and Fidell 2014). In this study, three variables with substantial skewness and kurtosis (Cl, SO4 and As) were log-transformed to improve the normality of their distributions. All 11 variables were then standardized to z-scores with zero mean and unit standard deviation. Standardization ensures that variables with very different standard deviations are weighted equally in the statistical analysis. Log-transformation and standardization are commonly applied to hydrochemical data before multivariate statistical analysis (Cloutier et al. 2008; Demlie et al. 2007; Guler et al. 2002; Moeck et al. 2016; Owen and Cox 2015; Zhu et al. 2017).
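A minimal sketch of the transformation step, assuming base-10 logarithms (the base is not stated; changing it only rescales the transformed variable, which has no effect after standardization) and z-scores computed with the sample standard deviation. The positivity check is the reason censored SO4 values must first be replaced by a positive substitute. The data are synthetic.

```python
import numpy as np


def skewness(x):
    """Sample skewness (moment estimator)."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


def log_and_standardize(X, log_columns):
    """log10-transform selected columns, then convert every column to z-scores."""
    X = np.asarray(X, dtype=float).copy()
    if np.any(X[:, log_columns] <= 0):
        raise ValueError("log-transformed variables must be strictly positive")
    X[:, log_columns] = np.log10(X[:, log_columns])
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


rng = np.random.default_rng(1)
cl = rng.lognormal(mean=2.0, sigma=1.0, size=500)   # right-skewed, like Cl
ca = rng.normal(loc=80.0, scale=15.0, size=500)     # roughly symmetric
X = np.column_stack([ca, cl])
Z = log_and_standardize(X, log_columns=[1])
print(round(skewness(cl), 2), round(skewness(Z[:, 1]), 2))
```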

Statistical procedures

In this study, three multivariate methods were applied to the surface-water and groundwater chemistry data using SAS (version 9.4 for Windows) and IBM SPSS Statistics (version 23): principal component analysis (PCA), factor analysis (FA) and hierarchical cluster analysis (HCA).

PCA and FA are very useful for reducing a large number of variables to a few uncorrelated components (PCA) or factors (FA), and they have proven powerful for analyzing high-dimensional hydrochemical data sets (Huang et al. 2013; Moya et al. 2015; Zhu et al. 2017). The resulting components or factors are thought to reflect the underlying processes that create the correlations among variables. In this study, principal component analysis was used as the factor extraction method. The number of components and factors was determined from the total explained variance, the scree plot and the number of eigenvalues greater than 1 (Mertler and Reinhart 2016). In FA, varimax rotation was then performed to make the factor solution more interpretable without altering its underlying mathematical structure. Factor scores were calculated by the regression method. Because PCA and FA were applied to the same data set with the same extraction method, the only difference between them in this study was the rotation.
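The extraction, retention and rotation steps can be sketched with NumPy. This is an illustration under stated assumptions, not the SAS or SPSS implementation: components are extracted from the correlation matrix, retained by the eigenvalue > 1 rule only (the scree-plot judgement is not automated), rotated with the standard varimax algorithm using Kaiser row normalization, and scored by the regression method F = Z R⁻¹ L. The synthetic data with two latent processes are hypothetical.

```python
import numpy as np


def pca_loadings(Z):
    """PCA on the correlation matrix of z-scored data Z (n_samples x n_vars).

    Returns the correlation matrix, all eigenvalues (descending) and the
    loadings of the components retained by the eigenvalue > 1 rule.
    """
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.sum(eigvals > 1.0))                  # Kaiser criterion
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    return R, eigvals, loadings


def varimax(L, max_iter=100, tol=1e-6):
    """Varimax rotation with Kaiser row normalization (orthogonal rotation)."""
    p, k = L.shape
    h = np.sqrt(np.sum(L**2, axis=1))               # square roots of communalities
    Ln = L / h[:, None]
    T = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        Lr = Ln @ T
        grad = Ln.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return (Ln @ T) * h[:, None]


def regression_scores(Z, R, loadings):
    """Regression-method factor scores: F = Z R^-1 L."""
    return Z @ np.linalg.solve(R, loadings)


rng = np.random.default_rng(0)
f = rng.normal(size=(300, 2))                       # two hypothetical latent processes
W = np.array([[0.9, 0.0], [0.8, 0.1], [0.85, 0.0],
              [0.0, 0.9], [0.1, 0.8], [0.0, 0.85]])
X = f @ W.T + 0.4 * rng.normal(size=(300, 6))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R, eigvals, L = pca_loadings(Z)
L_rot = varimax(L)
scores = regression_scores(Z, R, L_rot)
explained = 100.0 * np.sum(L_rot**2, axis=0) / Z.shape[1]   # % variance per factor
communalities = np.sum(L_rot**2, axis=1)
print(np.round(explained, 2), np.round(communalities, 2))
```

Because varimax is orthogonal, the communalities and the total explained variance are unchanged by rotation; only the split of variance among factors changes. This is consistent with the rotated percentages reported later in the paper (28.46, 24.99 and 21.18%) summing to the 74.63% explained by the three unrotated components.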

Q-mode HCA was performed to classify surface-water and groundwater samples into coherent clusters. Euclidean distance was used to measure the similarity (or dissimilarity) between samples, and Ward’s method was used to merge clusters. The number of clusters was determined by inspecting the hierarchical tree diagram (dendrogram) together with three statistics: the pseudo F statistic, the pseudo t² statistic and the cubic clustering criterion (CCC; Johnson 2004). Scatter plots of factor scores were used to assess the continuity or overlap of clusters (Guler et al. 2002). In particular, to minimize repeated contributions to the distance measure from highly correlated variables (multicollinearity), the first three principal component scores (from the PCA, unstandardized), rather than the raw data values, were used as the input variables for HCA (Johnson 2004).
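A sketch of the clustering step using SciPy's Ward linkage on Euclidean distances. `pc_scores` returns unstandardized principal component scores (their variances equal the eigenvalues when Z is standardized with ddof = 1), and `pseudo_f` computes the pseudo F (Calinski–Harabasz) statistic used to compare partitions; the pseudo t² statistic and the CCC reported by SAS are not reproduced here. The function names are hypothetical and the blob data in the usage example are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pc_scores(Z, n_components=3):
    """Unstandardized principal component scores of z-scored data Z."""
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]
    return Z @ eigvecs[:, order]


def pseudo_f(X, labels):
    """Pseudo F (Calinski-Harabasz) statistic for a given partition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = len(X), len(groups)
    grand_mean = X.mean(axis=0)
    between = within = 0.0
    for g in groups:
        members = X[labels == g]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - grand_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))


def ward_clusters(X, n_clusters):
    """Ward linkage on Euclidean distances, cut into n_clusters groups."""
    tree = linkage(X, method="ward", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust"), tree


rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(30, 3)) for c in centres])
# In the study's workflow, X would instead be pc_scores(Z) for the z-scored data.
for n_clusters in (2, 3, 4):
    labels, _ = ward_clusters(X, n_clusters)
    print(n_clusters, round(pseudo_f(X, labels), 1))
```

A local peak in pseudo F across candidate numbers of clusters is one of the signals, alongside the dendrogram, for choosing where to cut the tree.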

Results and discussion

Hydrochemical characteristics

The hydrochemical characteristics of the surface-water, phreatic groundwater and confined groundwater samples are presented in Table 1. Consistent with previous studies in the Jianghan Plain (Du et al. 2017; Duan et al. 2015; Yu et al. 2017; Zhou et al. 2012), almost all samples were of HCO3-Ca-(Mg) type, except one confined groundwater sample (HCO3-Cl-Ca type, with the maximum TDS and Cl values) and five surface-water samples (HCO3-Cl-Ca and HCO3-SO4-Ca types). HCO3 and Ca were the predominant anion and cation, respectively, in both surface water and groundwater. However, groundwater generally had higher TDS than surface water (groundwater 483 ± 95.4 mg/L; surface water 187 ± 46.8 mg/L), as well as higher EC, HCO3, Ca, Mg and Sr.

Table 1 Statistical summary of the hydrogeochemical data of the surface water, phreatic groundwater and confined groundwater samples in the Jianghan Plain. SD standard deviation

Compared with confined groundwater, surface-water and phreatic groundwater samples typically had higher Cl, SO4 and NO3 and lower Si. Median Fe, As and Ba decreased in the order confined groundwater > phreatic groundwater > surface water, whereas median Eh followed the reverse order; the confined aquifer was generally under strongly reducing conditions (Eh 96 ± 63 mV). In all, 66.8% of the groundwater samples had As concentrations above the World Health Organization (WHO) guideline value of 10 μg/L. Phreatic groundwater usually had higher Mn than surface-water and confined groundwater samples. Median NH4 decreased in the order confined groundwater > surface water > phreatic groundwater.

Result of multivariate statistical analysis

PCA, FA and HCA were performed on 11 variables for a data set of 21 surface-water and 436 groundwater samples. PCA was first used to estimate the number of factors and to compute the input variables (principal component scores) for HCA.

Principal component and factor analysis (PCA and FA)

Factor analysis was used to identify the underlying factors influencing groundwater chemistry. The output of the FA comprises two matrices (the principal component matrix and the rotated factor matrix; Table 2) and the varimax factor scores (shown in Fig. 4). Three components were extracted by PCA, together explaining 74.63% of the total variance of the data set. After rotation, the three factors account for 28.46, 24.99 and 21.18% of the total variance, respectively (Table 2). Communality represents the proportion of a variable's variance that is explained by the factor solution (Mertler and Reinhart 2016). Except for Na (0.51) and Cl (0.60), all variables had communalities above 0.70, indicating that the factor solution explains most of the information in the original data set.

Table 2 Results of PCA and FA (varimax rotated) for surface water and groundwater samples in the Jianghan Plain (n = 457)

The interpretation and naming of factors depend on the meaning of the particular combination of observed variables that correlate strongly with each other. The correlations between variables and factors are given by the factor loadings (Tabachnick and Fidell 2014). Factor 1 was characterized by high positive loadings on Ca, Mg and HCO3, with high loadings also on Na and Sr (Table 2; Fig. 3). Factor 2 was clearly characterized by high positive loadings on As, Fe and Ba, whereas factor 3 was characterized by high negative loadings on Cl and SO4 and a high positive loading on Si.

Fig. 3

Bivariate plots of the first three factor loadings (varimax rotated): a factor 1 vs. factor 2, and b factor 1 vs. factor 3. The loadings of Si in both plots were reversed (multiplied by −1) for clarity of illustration

A descriptive name was assigned to each factor based on its characteristic loadings (Fig. 3). Because the variables associated with factor 1 (Ca, Mg, HCO3, Na and Sr) mainly originate from the natural weathering of sedimentary or evaporitic rocks, factor 1 was named “water–rock interaction”. Factor 2 was named “redox conditions” and reflects geogenic Fe and As contamination. Because of the anthropogenic input of Cl and SO4 in the Jianghan Plain (Niu et al. 2017; Zhou et al. 2012), factor 3 was named “anthropogenic activities”.

Hierarchical cluster analysis (HCA)

In this study, grouping into five clusters (named C1–C5) gave the most satisfactory set of hydrochemically distinct clusters. Scatter plots of the first three factor scores (Fig. 4) show that the five clusters are relatively clearly separated from each other, despite minor overlap.

Fig. 4

Plots of the first three factor scores (varimax rotated) showing the distribution of HCA-derived clusters: a factor score 1 vs. factor score 2, b factor score 2 vs. factor score 3

The dendrogram (Fig. 5) indicates the degree of similarity between clusters. Samples from C1 and C2 were linked to the other clusters at a large distance, indicating that they are hydrochemically distinct from the samples in the other three clusters. Among these three clusters, C5 was the least similar, being linked to C3 and C4 at a large distance. The low linkage distance between C3 and C4 indicates that the hydrochemistry of their samples is similar.

Fig. 5

Dendrogram of HCA for surface water and groundwater samples from the Jianghan Plain, showing the division into five clusters with different characteristics (GW: groundwater)

The characteristics of each cluster are summarized in Table 3 and Fig. 5. Samples from C1 had the highest Cl, SO4 and NO3 and the lowest TDS (median 187 mg/L). Samples from C2 also had elevated Cl and SO4, but their TDS and other major ion (HCO3, Ca and Mg) concentrations were much higher than those of C1. Compared with C1 and C2, samples from C3–C5 had very low Cl, SO4 and Eh and elevated As, Fe and Ba; however, As, Fe and Ba were much higher in C3 and C4 than in C5. Samples from C3 had the highest TDS (median 577 mg/L).

Table 3 Median values of physico-chemical parameters for the five clusters determined from HCA

Table 4 shows the distribution of each cluster among the three sample types. The surface-water samples were grouped into C1, whereas almost all groundwater samples were grouped into C2–C5. In all, 71% of the samples in C2 were phreatic groundwater, and more than 90% of the samples in both C3 and C4 were confined groundwater. In addition, 26% of the phreatic and 46% of the confined groundwater samples were classified as C5.

Table 4 Relationship between clusters and sample types

Factors affecting groundwater chemistry in the Jianghan Plain

Groundwater chemistry depends largely on the composition of the recharge water, water–rock interaction and the residence time of groundwater in the aquifer (Halim et al. 2010; Mukherjee et al. 2009; Verma et al. 2016). The three factors determined by FA captured the most important differences among clusters and are therefore useful for identifying the main processes controlling groundwater chemistry.

Factor 1: water–rock interaction

Factor 1 was associated with Ca, Mg, Na, HCO3 and Sr. The good correlation (R² = 0.80) between factor score 1 and TDS (Fig. 6a) suggests that factor 1 represents the processes controlling the major ion chemistry of surface water and groundwater. In general, three processes contribute solutes to groundwater: evaporite dissolution, carbonate dissolution and silicate weathering.
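The coefficient of determination reported for factor score 1 vs. TDS can be computed as in the sketch below, assuming an ordinary least-squares line with an intercept (for which R² equals the squared Pearson correlation coefficient). The function name and the data are hypothetical.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Least-squares line y = slope*x + intercept and its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2


rng = np.random.default_rng(7)
score1 = rng.normal(size=100)                                    # hypothetical factor score 1
tds = 480.0 + 85.0 * score1 + rng.normal(scale=40.0, size=100)   # hypothetical TDS (mg/L)
slope, intercept, r2 = linear_fit_r2(score1, tds)
print(round(slope, 1), round(intercept, 1), round(r2, 2))
```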

Fig. 6

Bivariate plots of a factor score 1 vs. TDS (GW: groundwater), and b HCO3 vs. Ca + Mg in the clusters. Darker symbols in a are the median values of each cluster, and lighter symbols of the same shape are the individual sample values. The solid line in a was fitted to all individual data points in the plot

The bivariate mixing diagrams of Na-normalized Ca vs. HCO3 and Na-normalized Ca vs. Mg (Fig. 7) indicate that both surface water and groundwater were mainly influenced by silicate weathering and carbonate dissolution (Gaillardet et al. 1999). More specifically, the plot of Ca + Mg vs. HCO3 (Fig. 6b) shows that most samples fall close to the y = x/2 line (i.e., a molar HCO3:(Ca + Mg) ratio of 2:1), which suggests that carbonate dissolution dominates. Furthermore, Sr is well known to be associated with carbonates, in which it readily substitutes for Ca in limestone and dolomite. The moderate positive correlation between Na-normalized Ca and Sr (R² = 0.70) suggests that Sr and Ca are contributed primarily by carbonate dissolution (Guler et al. 2012; Halim et al. 2010; Mukherjee and Fryar 2008). On the other hand, the incongruent dissolution of silicates such as albite was probably responsible for the excess of HCO3 relative to 2(Ca + Mg) (Fig. 6b) and for the relatively high loading of Na on factor 1 (Table 2; Wang et al. 2009).
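The molar ratios behind Figs. 6b and 7 can be computed from mg/L concentrations as sketched below. The stoichiometries in the docstring are standard carbonic-acid weathering reactions, included to show why a molar HCO3/(Ca + Mg) ratio near 2 points to carbonate dissolution and values above 2 to an additional HCO3 source such as silicate weathering. The function name and example sample are hypothetical.

```python
MOLAR_MASS = {"Ca": 40.078, "Mg": 24.305, "Na": 22.990, "HCO3": 61.017}  # g/mol


def molar_ratios(sample_mg_l):
    """Na-normalized molar ratios and the HCO3/(Ca + Mg) molar ratio.

    mg/L divided by g/mol gives mmol/L; the ratios are dimensionless.
    Reference stoichiometries (carbonic-acid weathering):
      calcite:  CaCO3 + CO2 + H2O -> Ca2+ + 2 HCO3-              (ratio 2)
      dolomite: CaMg(CO3)2 + 2 CO2 + 2 H2O -> Ca2+ + Mg2+ + 4 HCO3-  (ratio 2)
      albite:   2 NaAlSi3O8 + 2 CO2 + 11 H2O
                -> Al2Si2O5(OH)4 + 2 Na+ + 2 HCO3- + 4 H4SiO4
                (adds HCO3 without Ca or Mg, pushing the ratio above 2)
    """
    mmol = {ion: sample_mg_l[ion] / MOLAR_MASS[ion] for ion in MOLAR_MASS}
    return {
        "Ca/Na": mmol["Ca"] / mmol["Na"],
        "Mg/Na": mmol["Mg"] / mmol["Na"],
        "HCO3/Na": mmol["HCO3"] / mmol["Na"],
        "HCO3/(Ca+Mg)": mmol["HCO3"] / (mmol["Ca"] + mmol["Mg"]),
    }


print(molar_ratios({"Ca": 90.0, "Mg": 20.0, "Na": 15.0, "HCO3": 380.0}))
```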

Fig. 7

Molar ratio bivariate plots of a Na-normalized Ca vs. HCO3, and b Na-normalized Ca vs. Mg

In conclusion, carbonate dissolution (dominant) and silicate weathering control the major solutes in the surface water and groundwater of the Jianghan Plain. This inference is supported by the high contents of silicate minerals (55–77%) and carbonates (up to 20%) in sediments from the study area (Duan et al. 2017).

As shown in Figs. 4a and 6a, median factor score 1 differed among clusters in the order C3 > (C2, C4 and C5) ≫ C1. Because all samples lie close to the same trend line between factor score 1 and TDS, the differences in factor score 1 probably reflect differences in the intensity of water–rock interaction. The high factor score 1 values of C2 (median 0.43) and C3 (median 0.75) were therefore probably caused by more intensive weathering of sediments in the phreatic aquifer and by slow flow velocity (and hence longer water–rock contact) in the confined aquifer, respectively.

Factor 2: redox conditions

The trace elements As, Fe and Ba contributed most strongly to factor 2 (Fig. 3). Because these elements are generally mobile only under reducing conditions, factor 2 can also serve as an indicator of redox conditions. Redox conditions strongly control the behavior of Fe and As in groundwater (Schaefer et al. 2016), and reductive dissolution of As-bearing iron oxides has been suggested as the predominant mechanism leading to elevated As concentrations in the Jianghan Plain (Duan et al. 2015; Schaefer et al. 2017; Ying et al. 2017). In addition, because Ba concentrations are limited by barite (BaSO4) solubility, Ba becomes enriched in strongly reducing environments with low SO4 concentrations.
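The barite solubility argument can be made semi-quantitative with a rough sketch. It assumes log Ksp ≈ −9.97 for barite at 25 °C and uses concentrations instead of activities (no activity coefficients or ion pairs), so it gives only an order-of-magnitude upper bound on dissolved Ba; a proper speciation calculation would be needed for real samples. The constant and function names are hypothetical.

```python
LOG_KSP_BARITE = -9.97   # ASSUMED approximate log Ksp of barite at 25 °C
MOLAR_MASS_BA = 137.327  # g/mol
MOLAR_MASS_SO4 = 96.06   # g/mol


def max_dissolved_ba_mg_l(so4_mg_l):
    """Ba (mg/L) at barite saturation for a given SO4 concentration (mg/L).

    Simplification: concentrations stand in for activities and ion pairing
    is ignored, so the result is only an order-of-magnitude bound.
    """
    so4_mol_l = so4_mg_l / MOLAR_MASS_SO4 / 1000.0
    ba_mol_l = 10.0 ** LOG_KSP_BARITE / so4_mol_l
    return ba_mol_l * MOLAR_MASS_BA * 1000.0


# Lower SO4 (e.g., after sulfate reduction) allows much more dissolved Ba
for so4 in (50.0, 5.0, 0.5):
    print(so4, round(max_dissolved_ba_mg_l(so4), 3))
```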

In the study area, samples with high factor score 2 (i.e., high As and Fe, e.g., C3 and C4) were concentrated in the confined aquifer under strongly reducing conditions, whereas samples with low factor score 2 (e.g., C1 and C2) typically came from oxidizing environments (Fig. 4; Table 3). The redox conditions are also reflected in the Eh and NH4 levels (Table 3); however, reducing conditions do not guarantee high As and Fe concentrations (e.g., C5). The spatial heterogeneity of As and Fe in the Jianghan Plain is believed to be related to lithology, hydrological and geological features, redox conditions and anthropogenic influence (Duan et al. 2015; Schaefer et al. 2016, 2017; Ying et al. 2017).

Factor 3: anthropogenic activities

Factor 3 comprised classic hydrochemical indicators of anthropogenic activities (Cl and SO4, with negative loadings). The much lower Cl concentrations in confined groundwater (Table 1) and the absence of halite in the study area (Gan et al. 2014) suggest that the high Cl levels originate mainly from anthropogenic activities. Intensive agricultural and industrial activities, domestic wastewater and landfill leachate probably account for the elevated concentrations of Cl, SO4 and NO3 in the study area.

The depth distribution of factor score 3 is presented in Fig. 8. Low scores (i.e., < −1, representing high Cl and SO4) were generally observed in surface water (C1) and phreatic groundwater (C2) samples; therefore, factor 3 can be interpreted as representing the processes that affect the chemistry of surface water and phreatic groundwater. The low Si concentrations in C1 and C2 (Table 3) were probably related to weak silicate weathering.
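The "< −1" criterion used above amounts to a simple threshold filter on factor score 3. As a minimal sketch, assuming each sample carries a cluster label, a depth (0 for surface water, as in Fig. 8) and its factor score 3, the flagged samples can be tallied per cluster. The records and the function name `low_score3_by_cluster` are hypothetical.

```python
from collections import Counter

THRESHOLD = -1.0  # "low" factor score 3 as defined in the text (< -1)

# Hypothetical samples: (cluster, depth in m with surface water at 0, factor score 3).
samples = [
    ("C1", 0.0, -1.8), ("C1", 0.0, -0.4),
    ("C2", 8.0, -1.3), ("C2", 12.0, -2.1),
    ("C3", 45.0, 0.6), ("C4", 60.0, 0.2),
]


def low_score3_by_cluster(records, threshold=THRESHOLD):
    """Count samples whose factor score 3 falls below the threshold, per cluster."""
    return Counter(cluster for cluster, _depth, score in records if score < threshold)


if __name__ == "__main__":
    print(low_score3_by_cluster(samples))
```

If the flagged samples fall almost entirely in C1 and C2, that matches the interpretation of factor 3 as a near-surface, anthropogenic signal.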

Fig. 8 The depth distribution of factor score 3 for five clusters (SW: surface water; GW: groundwater). The depths of the surface-water samples were treated as zero. Two samples in C4 with well depth above 120 m were not shown

Because Cl behaves conservatively along flow paths, elevated Cl concentrations in groundwater could indicate a good hydraulic connection with surface water. Because SO4 and NO3 are sensitive to redox conditions, high levels of these ions could indicate relatively oxidizing conditions, as confirmed by the low factor 2 scores of samples from C1 and C2 (Fig. 4). Therefore, factor score 3 could also help assess redox conditions and the hydraulic connections between surface water and groundwater or between aquifers.
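Because Cl is treated as conservative, one common way to turn "elevated Cl indicates connection with surface water" into a number is a two-end-member mixing calculation: f = (Cl_sample − Cl_deep) / (Cl_surface − Cl_deep). This is a swapped-in illustration, not a calculation reported in this study. It assumes exactly two end-members (surface water and low-Cl confined groundwater), no other Cl sources, and hypothetical concentrations; the function name is also hypothetical.

```python
def surface_water_fraction(cl_sample, cl_surface, cl_deep):
    """Fraction of a sample attributable to surface water, two end-member Cl mixing.

    Assumes Cl is conservative and the sample is a mix of exactly two
    end-members: surface water (cl_surface) and confined groundwater (cl_deep).
    """
    if cl_surface == cl_deep:
        raise ValueError("end-members must differ in Cl")
    f = (cl_sample - cl_deep) / (cl_surface - cl_deep)
    # Values outside [0, 1] fall off the mixing line; clip them to the bounds.
    return min(1.0, max(0.0, f))


if __name__ == "__main__":
    # Hypothetical end-members: 40 mg/L Cl in surface water, 5 mg/L in confined water.
    for cl in (5.0, 22.5, 40.0):
        print(cl, surface_water_fraction(cl, cl_surface=40.0, cl_deep=5.0))
```

Clipped values (0 or 1) are a hint that a sample does not fit the two-end-member assumption, for example because of an additional anthropogenic Cl source.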

Indication of potential groundwater flow path

Groundwater chemistry can be useful for tracing groundwater flow paths because it changes gradually along them. The markedly different hydrochemistry and controlling factors of phreatic and confined groundwater suggest that the phreatic and confined aquifers in the study area probably belong to different groundwater flow systems; this study therefore analyzed the flow patterns of phreatic and confined groundwater separately.

Because the geological and hydrological settings vary across the study area, partitioning the flow systems by depth alone (at 20 m) could produce misleading results. To select representative samples for each aquifer, this study combined this “depth method” with the HCA clustering results. Samples from C2 and C5 with well depth < 20 m were selected as phreatic groundwater (n = 80), while samples from C3, C4 and C5 with well depth ≥ 20 m were selected as confined groundwater (n = 322). Kriging was used to estimate the spatial distribution of factor score 1, which represents the intensity of water–rock interaction.
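The sample-selection rule just described combines a cluster membership test with the 20 m depth cutoff. A minimal sketch of that rule follows; the function name `assign_aquifer` is hypothetical, and returning `None` for samples that satisfy neither condition (e.g., surface water in C1, or a C2 sample from a deep well) is an assumption about how excluded samples would be handled.

```python
PHREATIC_CLUSTERS = {"C2", "C5"}
CONFINED_CLUSTERS = {"C3", "C4", "C5"}
DEPTH_CUTOFF_M = 20.0  # depth boundary between phreatic and confined aquifers


def assign_aquifer(cluster, well_depth_m):
    """Classify a sample as representative of the phreatic or confined aquifer.

    Both the HCA cluster and the well depth must agree; otherwise the sample
    is treated as non-representative and excluded (returns None).
    """
    if cluster in PHREATIC_CLUSTERS and well_depth_m < DEPTH_CUTOFF_M:
        return "phreatic"
    if cluster in CONFINED_CLUSTERS and well_depth_m >= DEPTH_CUTOFF_M:
        return "confined"
    return None


if __name__ == "__main__":
    for cluster, depth in [("C2", 10.0), ("C2", 30.0), ("C5", 5.0), ("C5", 30.0), ("C3", 15.0)]:
        print(cluster, depth, assign_aquifer(cluster, depth))
```

Note that C5 is the only cluster allowed on both sides of the cutoff, so for C5 the depth alone decides the aquifer.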

In the confined aquifer, the spatial distribution of factor score 1 (Fig. 9) agreed well with the confined water level (measured in 2014–2015): factor score 1 generally increased along the groundwater flow paths. This pattern was especially clear in the south of the study area, between the Dongjing and Yangtze rivers. Because the confined water level was higher there, confined groundwater in this area flowed outward to the north, south and east, leaving an area with low factor 1 scores (C5). Furthermore, areas with relatively high factor score 1 usually showed high As and Fe concentrations in groundwater (C3 and C4).
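The visual agreement between factor score 1 and the confined water level could be quantified, for instance, with a Spearman rank correlation: if scores increase along flow paths, they should be negatively rank-correlated with hydraulic head. This is a hedged suggestion rather than a test performed in the study, and a single correlation ignores spatial structure. The sketch implements Spearman's rho as the Pearson correlation of average ranks; the head and score values are hypothetical.

```python
def ranks(values):
    """Return 1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    heads_m = [30.0, 28.5, 27.0, 26.2, 25.0]   # hypothetical confined water levels
    score1 = [-0.9, -0.3, 0.1, 0.5, 0.8]       # hypothetical factor score 1 at the same wells
    print(spearman(heads_m, score1))
```

A strongly negative rho would be consistent with factor score 1 rising as groundwater moves from high-head to low-head areas.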

Fig. 9 Map showing the spatial distribution of factor score 1 for representative confined groundwater samples (chosen by well depth and HCA results)

Because there were too few phreatic groundwater samples for Kriging, factor score 1 of the phreatic samples was compared with the Kriging results for the confined groundwater samples. Although the absolute values differed, the spatial distribution of factor score 1 in the limited phreatic samples generally corresponded to that of the confined groundwater (not shown). However, the pattern appears more complicated, probably because of the complex surface-water network, the micro-topography, or the insufficient number of samples.
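The comparison described above amounts to interpolating factor score 1 from the confined wells and evaluating the surface at the phreatic well locations. The sketch below uses ordinary kriging with a spherical semivariogram in NumPy. The Kriging variant and variogram model used in the study are not stated here, so both are assumptions, as are the variogram parameters (`nugget`, `sill`, `rng`), the projected coordinates and the function names. In practice the variogram would be fitted to the confined samples first.

```python
import numpy as np


def spherical_variogram(h, nugget, sill, rng):
    """Spherical semivariogram; `sill` is the total sill (nugget included)."""
    h = np.asarray(h, dtype=float)
    partial = sill - nugget
    inside = nugget + partial * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    gamma = np.where(h < rng, inside, sill)
    return np.where(h == 0.0, 0.0, gamma)  # gamma(0) = 0 by definition


def ordinary_kriging(xy, z, targets, nugget=0.0, sill=1.0, rng=10.0):
    """Predict z at target points by ordinary kriging (unknown constant mean)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = len(z)
    # Kriging matrix: semivariances among data points, bordered by ones
    # for the unbiasedness constraint (weights sum to 1).
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = spherical_variogram(d, nugget, sill, rng)
    a[n, n] = 0.0
    preds = []
    for t in targets:
        b = np.ones(n + 1)
        b[:n] = spherical_variogram(np.linalg.norm(xy - t, axis=1), nugget, sill, rng)
        w = np.linalg.solve(a, b)  # last entry is the Lagrange multiplier
        preds.append(float(w[:n] @ z))
    return np.array(preds)


if __name__ == "__main__":
    # Hypothetical confined wells (km coordinates) and their factor score 1.
    confined_xy = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
    confined_score1 = [-0.5, 0.2, 0.1, 0.9]
    phreatic_xy = [(2.5, 2.5), (4.0, 1.0)]  # hypothetical phreatic well locations
    print(ordinary_kriging(confined_xy, confined_score1, phreatic_xy))
```

The predicted values can then be set beside the observed phreatic scores; agreement in spatial pattern, rather than in absolute value, is what the comparison looks for.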

In conclusion, the concentrations of major solutes (indicated by factor score 1) generally increased along the groundwater flow paths due to increasing water–rock interaction. Thus, the multivariate statistical analysis of hydrochemical data could effectively indicate the groundwater flow paths.

Conclusions

Multivariate statistical methods, including principal component analysis (PCA), factor analysis (FA) and hierarchical cluster analysis (HCA), were applied to identify flow patterns and major processes controlling the hydrogeochemistry of groundwater in the Jianghan Plain.

Although HCO3-Ca-(Mg) type water predominated in the study area, HCA effectively classified the 457 samples (21 surface water and 436 groundwater) into five hydrochemically distinct clusters (C1–C5). Samples from C1 and C2 generally had elevated Cl and SO4 concentrations. Samples from clusters C3–C5 were characterized by very low Cl, SO4 and Eh, and by elevated As and Fe concentrations. Cluster C1 was dominated by surface water samples, C2 by phreatic groundwater samples, and C3 and C4 by confined groundwater samples.

FA results suggested that the following three factors were responsible for the main hydrochemical variability in the surface water and groundwater: (1) water–rock interaction; (2) redox conditions; (3) anthropogenic activities. Major components (e.g., Ca, Mg and HCO3) in surface water and groundwater generally originated from carbonate dissolution (dominant) and silicate weathering. Strongly reducing conditions favored geogenic As and Fe enrichment in confined groundwater. Anthropogenic activities primarily increased the Cl and SO4 concentrations in surface water and phreatic groundwater.

The distinct hydrochemistry and controlling factors of phreatic and confined groundwater suggest that the phreatic and confined aquifers in the study area probably belong to different groundwater flow systems. Factor score 1 of representative samples in the confined aquifer generally increased along the flow paths, consistent with the variation in groundwater level. This study suggests that a combination of multivariate statistical methods can effectively indicate or verify groundwater flow paths and contributes to a better understanding of hydrogeochemical evolution in complex groundwater flow systems.