Introduction

The arid and semi-arid regions of the world generally rely on groundwater resources as the main source of dependable water supply because of low rainfall and the limited occurrence of surface water bodies. As rainfall in many of these regions is likely to decline because of global climate change, and as populations are growing fast, groundwater dependency in these regions is likely to increase (Arnell 1999; Vörösmarty et al. 2000; Green et al. 2011). The availability of sufficient good-quality groundwater is important in meeting agricultural, domestic and industrial demand to ensure food security and improve livelihoods (Calow et al. 2010; Green et al. 2011; Taylor et al. 2013). Understanding groundwater chemistry is important in regions experiencing increasing groundwater use driven by rapid urbanization and industrialization, which threaten both water quality and quantity.

Across the world, land use practices have been found to impact groundwater resources. In China, Qin et al. (2013) and Huang et al. (2018) found that agricultural activities and urbanization were the cause of high nitrate concentrations in groundwater in the Liaohe River Plain and Pearl River Delta, respectively. Silicate weathering, simple dissolution and evaporation were identified as the predominant hydrochemical processes in Delhi, India (Kumar et al. 2006). Güler et al. (2012) reported the influence of both anthropogenic activities and natural processes on groundwater in the Tarsus Coastal Plain, Turkey.

Conventional graphical techniques and multivariate techniques, namely Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA), have been widely used in groundwater chemistry investigations, for instance, in Algeria (Belkhiri et al. 2010), China (Qian et al. 2016), Ghana (Yidana et al. 2018), South Africa (Adams et al. 2001), and Nigeria (Okiongbo and Douglas 2015). According to Dalton and Upchurch (1978), conventional techniques such as Piper (Piper 1944), Durov and Schoeller plots are limited to assessing the composition of major ions in groundwater and exclude minor constituents, for instance nitrate and fluoride, which at elevated levels could have adverse effects on human health (Güler et al. 2002; Ncube and Schutte 2005). In contrast, PCA and HCA are capable of handling and analysing all water parameters in an analytical suite through data reduction and classification (Dalton and Upchurch 1978; Suk and Lee 1999; Güler et al. 2002). HCA classifies large datasets based on the level of similarity of parameters, and the results are presented in the form of a dendrogram. Although the classes produced by this analysis are subjective, they can often be linked to the same geological source and/or land use practices. PCA, on the other hand, is a data reduction technique that is useful in revealing the principal components that explain the variation in the water chemistry data (Usunoff and Guzmán-Guzmán 1989; Suk and Lee 1999; Furi et al. 2012). Interpretations are based on the loadings of the extracted components, which show the relationships between the chemical parameters. The causes of these relationships are inferred in order to understand the hydrochemical processes and factors controlling groundwater chemistry.

The Vaal River Basin (VRB) is a densely urbanized and industrialized region (Fig. 1) situated in the interior of South Africa, with the highest density of mining activities in the country (Braune and Rogers 1987). It is regarded as the economic hub of South Africa (SA), contributing more than 25% of the GDP (DWA 2003a, b). Anthropogenic activities such as industrial effluent discharge and mining have, among other things, caused the deterioration of surface water quality, rendering it unfit for use (Braune and Rogers 1987). In addition, increasing water demands driven by a growing population and economic expansion, together with recurrent droughts, have put pressure on surface water supply in the region (Rouault and Richard 2005; Wepener et al. 2011). As a consequence of these factors, groundwater has received attention as an alternative to surface water supply, particularly during drought conditions.

Fig. 1 Map showing the Vaal River Basin with different land uses

Presently, the main user of groundwater in the region is the agricultural sector and to a lesser extent the urban centres, particularly in the southwestern part of the basin (DWA 2003b). Small towns situated in the drier northwestern and southwestern parts of the VRB also rely on groundwater for domestic water use. Braune and Rogers (1987) recognized the need for groundwater development in the region to reduce pressure on surface waters, and the need for groundwater protection from pollution.

Researchers have not treated groundwater in as much detail as surface water quality in the VRB. This is mainly because surface water is the main source of water supply. It is beyond the scope of this study to examine surface water quality, and interested readers are referred to Braune and Rogers (1987) and Du Plessis (2017) for more information about this issue. Vegter (1995) produced a national map showing trends and variation of groundwater quality using Piper plots and salinity ratios. The study was largely descriptive in nature. Hobbs et al. (2013) examined groundwater chemistry trends in the VRB using a Piper diagram, geological information and descriptive statistics to provide an overview of groundwater chemistry. Previously published studies are limited to local surveys and have mainly focused on the impacts of acid mine drainage (AMD) on water resources (Naicker et al. 2003; Nengovhela et al. 2006; Tutu et al. 2008; Durand 2012; Abiye 2014). Data from such studies suggest that acid mine drainage and tailing dumps are polluting water resources in the vicinity of mines. However, there have been few empirical and multi-approach investigations into groundwater chemistry in the basin. This work attempts to generate fresh insights into the hydrochemical characteristics of groundwater in the VRB.

The purpose of this paper is to assess the impacts of natural processes and anthropogenic activities on groundwater and to determine the hydrochemical processes that influence groundwater quality in the basin. A combination of graphical techniques and multivariate statistics (PCA and HCA) was used to examine the groundwater chemistry. These methods provided a means to interpret the results robustly and with a high degree of confidence rather than in a speculative manner. This work provided an important opportunity to advance the understanding of the regional groundwater chemistry in the highly urbanized and industrialised VRB.

Geographical conditions

The VRB is situated in the interior of South Africa, covering an area of approximately 198,000 km². The Vaal River originates from the Drakensberg escarpment in the northeastern part of the basin. It flows 900 km in a southwesterly direction to where it joins the Senqu River from Lesotho to form the Orange River, which discharges into the Atlantic Ocean. A water divide runs along the catchment boundary in a northeasterly to northwesterly direction and separates the Orange River Basin from the Limpopo Basin. The topography varies from nearly level in the northwestern and southwestern parts of the study area to very hilly in the Vredefort Dome and Johannesburg area.

Climatic factors like rainfall and temperature are highly variable in space and time in the VRB, as a result of a combination of factors such as elevation and moisture source (Middleton and Bailey 2005). The mean annual precipitation decreases from the east (800 mm) to the west (500 mm) (Middleton and Bailey 2005). Rainfall is strongly seasonal and often occurs as convective thunderstorms in the late evening and early morning in the summer season (October to March). Winter is relatively dry, with little or no rainfall except for light snow in mountainous parts of the region. Evaporation as measured by A-Pan exhibits the opposite trend to rainfall, ranging from 1900 mm/year in the east to 2400 mm/year in the western part of the region. High evaporation in the southwestern part of the basin is evidenced by the occurrence of salt pans. The average annual temperature also increases from 14 °C on the eastern side to 17.9 °C on the western side of the region (Middleton and Bailey 2005).

Land use activities

The VRB lies in the Witwatersrand Basin, famously known for its rich gold and uranium deposits, the mining of which has underpinned development in the basin (Robb and Meyer 1995). Major industries include, but are not limited to, power and fuel generation, manufacturing, steel production and agriculture (Braune and Rogers 1987; DWAF 2003a, b). Based on field surveys, coal mining activities are located mainly in the northeastern part of the basin. Gold and uranium mines are scattered across the northwestern to the southwestern parts of the study area. Agricultural activities are mainly in the eastern and the southwestern parts of the study site. A high number of salt pans were observed during the field survey between Spitskop Dam and Kalkfontein Dam in the southwestern part of the basin, some of which are being mined for salt. Other notable industries include the manufacturing of petro-chemical products from coal in the upper part of the basin.

Geological and hydrogeological setting

The VRB consists of a wide variety of rock types, including volcano-sedimentary, igneous and metamorphic rocks, that cover the entire geological record from the Early Archaean to the late Mesozoic (Fig. 2). More than two-thirds of the basin is covered by the Late Carboniferous to Early Jurassic Karoo Supergroup. It is composed of fine to coarse-grained sandstone, tillite and mud rocks that underlie the far eastern to the southwestern part of the basin (Johnson et al. 1996, 2006). In some localities, the sedimentary rocks are intruded by dolerite dykes and covered by basaltic lava.

Fig. 2 A simplified geological map of the VRB (from the 1:1,000,000 geological map of South Africa prepared by the Council for Geosciences). The predominant rocks associated with a particular stratigraphic unit are included in the legend. Note: Spg and Gp refer to Supergroup and Group, respectively

The granitoids and gneisses are exposed in a few localities in the northern part of the basin. These basement rocks are unconformably and conformably overlain by meta-sedimentary rocks such as quartzite, shale and conglomerate that belong to the Witwatersrand Supergroup (Robb and Meyer 1995; McCarthy et al. 2006). The quartzite makes up the prominent ridges in the northern part of the basin. The Late Archaean Ventersdorp Supergroup consists of volcano-sedimentary rocks that are mainly found in the northwestern part of the region (van der Westhuizen et al. 1991; van der Westhuizen and Bruiyn 2000). The volcano-sedimentary rocks include andesite, quartz porphyry, conglomerate and tuff. The Transvaal Supergroup (Neo-Archaean to Palaeo-Proterozoic age) overlies the Ventersdorp Supergroup (Eriksson et al. 1995; Eriksson and Altermann 1998). The Transvaal Supergroup is exposed from the upper northeastern boundary to the southwestern boundary of the basin. The predominant rock types are dolomite, conglomerate, quartzite, sandstone, shale, and andesite (Eriksson and Altermann 1998). The dolomite is intercalated with chert layers, and the majority of the boreholes have been sited along this unit, including the limestone on the southwestern edge of the basin.

From a hydrogeological point of view, the dolomite formations that belong to the Transvaal Supergroup are the most significant aquifers because of their high groundwater productivity. The dolomites are situated in the northeastern to southwestern parts of the VRB. In the explanatory notes to the 2526 Johannesburg map compiled by Barnard (2000), four hydrogeological units were defined, namely: (1) intergranular aquifers in the alluvium-covered area; (2) karst aquifers in the Transvaal Supergroup carbonates; (3) fractured-rock aquifers in the Witwatersrand Supergroup; and (4) intergranular and fractured-rock aquifers in the Dominion Group, Klipriviersberg Formation and Karoo Supergroup.

The Malmani Subgroup dolomite can be classified as an extensive fractured and karstic rock aquifer. It is an unconfined system and comprises an extensive soil cover and weathering products underlain by a karstic dolomite aquifer. Buttrick et al. (1993) and DWAF (2006) reported high storativity values (9.7%) and transmissivity values (30,000 m³/day) for the dolomitic aquifer. The younger Karoo Sequence (sandstone, shale, tillite and clay) fills the valley topography in the dolomitic terrain, and in some parts the dolomite is flanked by quartzite ridges (Kafri and Foster 1989). In general, the dissolution cavities, fissures and fracture network act as conduits for groundwater movement and as recharge zones (Holland and Witthüser 2009; Abiye et al. 2011).

There are WNW-ESE trending faults (e.g., Rietfontein Fault) and major fractures associated with post-Transvaal tensional tectonics that have penetrated the whole sequence of the Malmani Subgroup and the Ventersdorp lava into the underlying Witwatersrand Supergroup (Swart et al. 2003; Hobbs and Cobbing 2007). Abiye et al. (2011) showed the significance of fractures in regulating groundwater recharge in the crystalline rocks of Johannesburg. The fractured and granular aquifers have a low to moderate groundwater productivity, with a borehole yield of less than 2 L/s (Barnard 2000). Hobbs and Cobbing (2007) reported relatively low to modest transmissivity values (< 10 m²/d) and low storativity values (< 1%) for the fractured-rock aquifers. Heterogeneity prevails over homogeneity in the aquifer system, because not all fractures are permeable.

Methodology

Data collection

Water chemistry data for existing wells in the VRB were obtained from the Department of Water and Sanitation (DWS) of South Africa. The DWS has a database that contains both surface water and groundwater chemistry data spanning more than a decade. The water chemistry data included both chemical and physical parameters. Trace element concentration and groundwater temperature records were discarded because of inconsistency and paucity in the measurement records. To ensure quality control of the datasets, only physico-chemical measurements from monitoring stations with more than 5 years of chemistry data (1995 to 2016) were used in the study. The quality of the chemical analyses was verified by calculating the charge balance error (CBE) as part of quality assurance. A CBE within ±5% is regarded as the best analytical estimation, and ±10% is the threshold for acceptable error (Güler et al. 2002; Appelo and Postma 2004). Chemical analyses with CBE values beyond ±10% are associated with errors in the sampling and measurement procedures. In this study, 84% of the samples had a CBE within ±5%, and the remaining 16% fell between ±5% and ±10%, which means that the chemical analyses are reliable. Reported total dissolved solids (TDS) values were verified by summing the major constituents and comparing the sums with the reported values. These values were in agreement, which increases confidence in the use of the data. Table 1 shows mean values of physico-chemical parameters from the 70 groundwater monitoring stations (59 wells, 9 springs and 2 mine-water discharges) that were used in the study. Following the quality assurance of the chemistry data, the data were averaged over the entire period for each monitoring station to obtain representative values, which were used for graphical and statistical analysis and interpretation. Averaging the data removes seasonal effects on water chemistry, which allows for identification of the major hydrochemical processes and determination of the key factors that govern the evolution of groundwater quality.
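The charge balance check described above can be sketched as follows. This is a minimal illustration, not the procedure used to screen the DWS data: it assumes the common definition CBE = 100 × (Σcations − Σanions) / (Σcations + Σanions) with all terms in meq/L, a restricted set of major ions, and nitrate reported as NO3 rather than as N. The sample concentrations and the function names are hypothetical.

```python
# Equivalent weights in mg/meq (molar mass divided by the ionic charge).
EQUIVALENT_WEIGHT = {
    "Ca": 40.078 / 2, "Mg": 24.305 / 2, "Na": 22.990, "K": 39.098,
    "HCO3": 61.017, "SO4": 96.06 / 2, "Cl": 35.453, "NO3": 62.004,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "SO4", "Cl", "NO3")


def to_meq(sample_mg_l):
    """Convert a {ion: mg/L} mapping to {ion: meq/L}."""
    return {ion: conc / EQUIVALENT_WEIGHT[ion] for ion, conc in sample_mg_l.items()}


def charge_balance_error(sample_mg_l):
    """Return the charge balance error in percent (assumed definition, see text)."""
    meq = to_meq(sample_mg_l)
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def cbe_class(cbe):
    """Apply the thresholds quoted in the text: within 5% best, within 10% acceptable."""
    if abs(cbe) <= 5.0:
        return "best"
    if abs(cbe) <= 10.0:
        return "acceptable"
    return "reject"


# Hypothetical sample (mg/L): 2 meq/L Ca, 1.3 meq/L Na, 2 meq/L HCO3, 1 meq/L Cl.
sample = {"Ca": 40.078, "Na": 29.887, "HCO3": 122.034, "Cl": 35.453}
cbe = charge_balance_error(sample)
print(f"CBE = {cbe:.2f}% ({cbe_class(cbe)})")
```

For this hypothetical sample the cations sum to 3.3 meq/L and the anions to 3.0 meq/L, so the analysis falls in the "best" class.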

Table 1 Summary statistics of the physico-chemical parameters for the groups determined from HCA. Mean concentrations (mg/L), pH (standard units), electrical conductivity (µS/cm)

Graphical and multivariate statistical methods

Statistical analysis was performed using SPSS software (version 25). The first step before analysis of the datasets was to test the hydrochemical data for normality using the Shapiro–Wilk (SK) and Kolmogorov–Smirnov (KS) tests to assess how the datasets are distributed. Both tests showed that most of the variables were not normally distributed.

The non-normally distributed datasets were then grouped according to similarity/dissimilarity in physico-chemical parameters to reveal the relationships among the water samples. We assumed that water samples that fall within the same cluster share a common source and reflect similar hydrochemical processes. The squared Euclidean distance was used as the similarity/dissimilarity measure, with Ward's method as the linkage rule for joining clusters (Güler et al. 2002). The physico-chemical parameters were standardized using z-scores because they are not of the same scale and magnitude. The water samples within a group are more similar to each other than to water samples outside the group. Grouping the raw data using HCA proved to be useful because outliers were detected. The outliers were not discarded, because the water chemistry measurements are not just random numbers but provide important information about the groundwater quality in the study area. The water samples within the groups were presented using a Piper plot, a Schoeller diagram and bivariate graphs.
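The standardization and linkage steps described above can be illustrated with a short standard-library sketch. It is not the SPSS procedure used in the study: the five two-parameter samples (TDS in mg/L and pH) are hypothetical, the function names are illustrative, and the naive search over cluster pairs is only practical for small datasets. Ward's method is implemented here as merging, at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squared Euclidean distances.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (sample) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    centroid_a = [mean(col) for col in zip(*a)]
    centroid_b = [mean(col) for col in zip(*b)]
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, k):
    """Agglomerate until k clusters remain; returns lists of sample indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[m] for m in clusters[i]],
                                 [points[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical samples: [TDS (mg/L), pH]
samples = [[100, 7.5], [110, 7.6], [105, 7.4], [1500, 6.9], [1600, 7.0]]
standardized = zscore_columns(samples)
clusters = ward_clusters(standardized, k=2)
print(sorted(sorted(c) for c in clusters))
```

In this toy case the three dilute samples and the two mineralized samples separate into two clusters, which mirrors how distinct groups and outliers emerge from the dendrogram.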

A Piper diagram (Piper 1944) was used to classify the water types and evaluate possible mixing trends. Bivariate plots of major ions were drawn on xy coordinates to validate the key hydrochemical processes taking place. A Schoeller diagram was used to visualise the distinct groups and their associated water samples. Although the datasets were in mg/L, with the exception of EC and pH, the unit of measurement used for the Piper plot, Schoeller diagram and bivariate plots is meq/L. The axes were converted to a logarithmic scale to discern patterns in the hydrochemical datasets.

One of the assumptions of multivariate statistical methods such as PCA is that the data must be normally distributed (Jolliffe 2011), which is not the case in this study, as indicated by the KS and SK tests. For that reason, the datasets were log transformed to reduce the skewness in the distribution of the variables. Log transforming the data improves the normality of the datasets, which consequently improves the reliability of the analysis.
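The effect of the log transformation on skewness can be illustrated with a small sketch. The concentrations below are hypothetical, the base-10 logarithm is an assumption (the base is not stated above), and skewness is computed as the moment coefficient m3 / m2^1.5, which is only one of several common estimators.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: third central moment over variance^1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed concentrations (mg/L); values must be positive.
raw = [50, 80, 120, 200, 450, 1500, 5200]
logged = [math.log10(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

A few highly mineralized samples dominate the raw distribution; after the transformation they are pulled towards the bulk of the data and the skewness drops markedly.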

Furthermore, we tested the suitability of the datasets for PCA by determining the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and performing Bartlett's test of sphericity. The former gives an indication of how well conditioned the dataset is for PCA, whereas the latter tests the null hypothesis that the correlation matrix of the dataset is an identity matrix. In this study, the KMO value is 0.756, in the middling range (in the 0.70s) according to Kaiser (1959), which means that the dataset is moderately suitable for PCA. The significance value of Bartlett's test is less than 0.05 (p < 0.05), and therefore we reject the null hypothesis that the correlation matrix is an identity matrix. Together, the two tests indicate that PCA is applicable to the dataset and that it will be useful in the study. Factor analysis was performed on the normalized dataset and the components were extracted using PCA. Only components with eigenvalues greater than 1 were extracted, because each such component accounts for at least as much variance as a single standardized variable. The extraction communalities for the variables were greater than 0.6; the higher the communalities, the greater the level of extraction. The principal components were rotated using a Varimax rotation with Kaiser Normalization (1959) to maximize the loading of each variable on one of the components, whilst minimizing the loading on all other components (Güler et al. 2002, 2012).
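For readers unfamiliar with the two suitability tests, the following standard-library sketch shows how they can be computed from a correlation matrix. It uses the textbook formulas (Bartlett's statistic χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO as the ratio of squared correlations to squared correlations plus squared partial correlations); it is not the SPSS output reported above. The 3 × 3 correlation matrix and the function names are hypothetical, and the p-value lookup is omitted.

```python
import math


def invert_and_det(matrix):
    """Gauss-Jordan elimination with partial pivoting: returns (inverse, determinant)."""
    n = len(matrix)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a], det


def bartlett_sphericity(corr, n_samples):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    p = len(corr)
    _, det = invert_and_det(corr)
    chi2 = -(n_samples - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    p = len(corr)
    inv, _ = invert_and_det(corr)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                q2 += partial ** 2
    return r2 / (r2 + q2)


# Hypothetical correlation matrix for three variables, 70 samples.
corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
chi2, dof = bartlett_sphericity(corr, n_samples=70)
kmo_value = kmo(corr)
print(f"Bartlett chi2 = {chi2:.2f} (df = {dof}), KMO = {kmo_value:.3f}")
```

A large χ² relative to its degrees of freedom leads to rejection of the identity-matrix hypothesis, and a KMO closer to 1 indicates that partial correlations are small relative to the ordinary correlations.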

Results and discussion

An HCA assessment was performed on the raw chemical data. The HCA output is in the form of a dendrogram (Fig. 3); the statistical summary of the data is presented in Table 1 and plotted on a Schoeller diagram (Fig. 4). Seven clusters were recognized from the HCA (Fig. 3). Outliers were detected and were included because they represent a relevant group that could have been under-sampled in the VRB. Groundwater samples 12, 40 and 65 were dissimilar to the groups and to each other, and are referred to herein as GW12, GW40 and GW65, respectively.

Fig. 3 Dendrogram from the HCA for the 70 water samples. The red line of asterisks was chosen as the limit to select the number of groups

Fig. 4 A Schoeller diagram showing the relative proportion of major ions for groundwater samples from the VRB

It can be seen from Table 1 that the mean values of the physico-chemical parameters for the groups are highly variable, which could be a result of heterogeneity in the geology and the various land use activities in the basin. Lines connecting the points in the Schoeller diagram (Fig. 4) visually illustrate the compositional trends in different samples. The pH of the groundwater groups is generally alkaline, with the exception of Group 6, which is slightly acidic (6.99). The high pH could be a result of dissolution of the underlying carbonate aquifer, which is the major aquifer type in the VRB.

Group 6 has the lowest values for chemical parameters (Fig. 4), a characteristic of water in recharge areas. Groups 5 and 7 and samples GW12, GW40 and GW65 are highly mineralized waters (TDS > 1000 mg/L) with high sulphate concentrations (> 140 mg/L). The high TDS and SO4 are consistent with the characteristics of mine-polluted waters in the basin, which means that Groups 5 and 7 and samples GW40 and GW65 are mine-polluted waters; GW12 is the exception. GW12 has a seawater geochemical signature, with much higher Na (574.06 mg/L) and Cl (796.80 mg/L) concentrations than all other samples. The high degree of mineralization of GW12 may be explained by deep circulating water that has had a long residence time. It could also be a result of the higher evaporation in the southwestern part of the basin, which increases the soil salinity. The samples belonging to Groups 1, 2, 3 and 4 have relatively similar mean values for the physico-chemical variables; we can infer that this is water in circulation (Fig. 4). Group 2 has a notably high NO3 concentration (18.59 mg/L), which is evidence of the impacts of agricultural activities and urbanization on groundwater chemistry.

A comparison of the groundwater chemistry with the WHO (2004) drinking water guidelines reveals that the groundwater is within the guideline limits, with the exception of Groups 5 and 7 and the GW samples. This comparison does not take into account the bacteriological characteristics of groundwater, which are an important consideration in drinking water.

The major ions were plotted in a Piper plot (Piper 1944) (Fig. 5) to discern the plausible genesis of the sample groups and the hydrochemical facies to which each sample belongs. The Piper plot shows that the groundwater is dominated by alkaline earths (Ca and Mg) over alkalis (Na and K) and that weak acids (HCO3) exceed strong acids (SO4 and Cl). The water samples within the groups can be broadly classified into two water types, namely Ca–Mg–SO4–Cl and Ca–Mg–HCO3. The Ca–Mg–HCO3 composition is the predominant water type, corresponding to Groups 1, 2, 3, 4 and 6. These observations suggest that groundwater is draining diverse geological materials, in particular dolomite and volcanic rocks in the western part of the basin. These findings are consistent with Hobbs et al. (2013), who showed that Ca–Mg–HCO3 represents pristine water from the dolomitic aquifer. The highly mineralized waters, Groups 5 and 7 and samples GW40 and GW65, are clustered in the Ca–Mg–SO4–Cl field and are likely to be indicative of pollution from multiple sources, particularly mining and urbanization. This is supported by previous studies by McCarthy (2011), Durand (2012) and Abiye (2014), who demonstrated that mining pollution is the main cause of SO4 enrichment in groundwater in the region. The Ca–Mg–SO4–Cl water type probably evolved from the interaction of recharge water in circulation with acidic water from the mines and with water from the shallow aquifers that has been polluted by highly soluble chloride compounds from the highly urbanized area. In their study, Hobbs et al. (2013) reported possible mixing of groundwater from pristine dolomitic aquifers with mining water, resulting in a Ca–SO4 water type. The cation ternary plot shows that the bulk of the water samples have no dominant ions. This could be explained by mixing of water from different sources, cation exchange between Mg, Ca and Na, and the replacement of HCO3 by SO4. Groundwater sample GW12 has a Na–Cl water composition, which confirms that it is not mining related.

Fig. 5 Piper diagram showing the hydrochemical facies of groundwater. The ternary plots show the abundance of major cations and anions in decreasing order: Ca > Mg > Na and HCO3 > SO4 > Cl

The results of the correlation analysis are presented in Table 2. Pearson correlations reveal statistical relationships between pairs of variables, and no causal relationship is assumed between the variables. For most correlations among the variables the significance values were less than 0.05 (p < 0.05), which implies that there is a statistically significant relationship among the chemical variables. We also found that a large number of variable pairs have correlation coefficients (r) that exceed the critical value (R) of 0.24 at p < 0.05. The R value is obtained from the critical value table, at the intersection between the degrees of freedom (68) and the level of significance (0.05). R is low because of the high number of degrees of freedom (68). In this study, we assume that correlations among variables with r ≥ 0.5 are practical and meaningful. The strength of the correlation among the variables is described as follows: weak (r 0.00–0.49), moderate (r 0.50–0.69), strong (r 0.70–0.79) and very strong (r 0.80–1).
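The relationship between the critical value and the degrees of freedom, and the strength classes adopted above, can be sketched as follows. The critical r is derived from the Student t distribution as r = t / √(t² + df). The t value of about 1.9955 (two-tailed, α = 0.05, 68 degrees of freedom) is taken from standard t-tables and is an assumption here, as are the function names and the use of the absolute value of r for classification.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(t_crit, dof):
    """Smallest |r| that is significant, given the critical t value and d.o.f."""
    return t_crit / math.sqrt(t_crit ** 2 + dof)


def strength(r):
    """Strength classes used in the text, applied to |r| (assumption)."""
    a = abs(r)
    if a >= 0.80:
        return "very strong"
    if a >= 0.70:
        return "strong"
    if a >= 0.50:
        return "moderate"
    return "weak"


T_CRIT_68 = 1.9955  # assumed tabulated value: two-tailed, alpha = 0.05, 68 d.o.f.
r_crit = critical_r(T_CRIT_68, dof=68)
print(f"critical r = {r_crit:.3f}")
print(strength(0.90), "|", strength(0.72), "|", strength(0.59))
```

The computed critical value of roughly 0.235 is close to the tabulated 0.24 quoted above, and it shows why a statistically significant correlation in a 70-sample dataset can still be weak in practical terms.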

Table 2 Correlation matrix of the 15 physico-chemical variables

What stands out in the table is a very strong correlation between TDS and EC, which are both moderately to very strongly correlated with the major ions and F. These observations indicate that the major ions and F contribute significantly to groundwater salinization and mineralisation in the basin. It is not surprising that total alkalinity (TAL) is strongly correlated with Mg and Ca in the basin. This could be a result of chemical weathering of carbonate minerals, which releases CO3 and HCO3 into the groundwater system. Other possible sources of CO3 and HCO3 are rainfall, which is slightly acidic, and carbon dioxide from the soil. The moderate correlation between TAL and pH could be an indication that the groundwater has a high buffering capacity, which could also explain the neutral to basic pH values observed earlier in Table 1.

There is a very strong positive correlation between Ca and Mg (r = 0.90). We suggest that the dissolution of dolomite (CaMg(CO3)2), which underlies a large part of the northwestern and southwestern parts of the region, and to a lesser extent limestone (CaCO3), is the major source of Ca and Mg. The metasedimentary and volcanic rocks found in the central and northern parts of the basin contain ferromagnesian and other silicate minerals, including but not limited to hornblende, biotite, mica, talc, chloritoid and feldspars, that could be additional sources of Mg and Ca. We observed correlations between the cations Na–K (0.72), Na–Ca (0.59), and Mg–Na (0.50). Possible explanations are ion exchange processes and silicate weathering.

The most surprising aspect of the data is the moderate to strong correlation between sulphate and all other major ions: Na–SO4 (0.73), Ca–SO4 (0.67), Cl–SO4 (0.61), Mg–SO4 (0.60) and K–SO4 (0.53). The known source of SO4 is primarily the dissolution of sulphate precipitates such as gypsum (CaSO4·2H2O), anhydrite (CaSO4) and magnesium sulphate (MgSO4), found largely in mine tailing dumps; these are also sources of Ca and Mg. This is supported by Naicker et al. (2003) and Tutu et al. (2008), who reported elevated proportions of pyrite and other minor metallic sulfides in tailing dumps, which on dissolution pollute groundwater in the basin.

There is also a very strong correlation between Na and Cl (r = 0.85). This relationship may be partly explained by the dissolution of evaporite minerals such as halite (NaCl) and sylvite (KCl) in the salt pans during recharge, which are sources of Na, Cl and K. This may also explain the moderate correlation observed between Cl and K (0.62). Another likely explanation of the association between these ions is the deposition of marine-derived salts in rain. The moderate correlations between Cl–F (0.66), Ca–Cl (0.59) and Mg–Cl (0.54) could be related to industrial discharge in the basin. DWA (2003a) reported the discharge by industries and households of wastewater that contains highly soluble salts (ClF, MgCl2, CaCl2 and NaF), which infiltrate and join the groundwater system. The soluble salts are anthropogenic and are a source of F, Cl, Ca and Na. The granites, sandstones, quartzites and mudstones in the study area contain albite, anorthite and apatite, which on dissolution could contribute to Ca, Na and F enrichment in groundwater.

A strong correlation exists between Na and F (r = 0.75). The fluoride in groundwater is most likely derived from the weathering of basement rocks in the VRB, and the concentration of this ion in groundwater is likely to be limited by the solubility of fluorite and the common-ion effect caused by elevated calcium ion concentrations in some water samples. Ncube and Schutte (2004) reported high F concentrations in the North West Province, which suggests that there is a geogenic source of fluoride in this part of the basin.

The observed correlation between NH4 and PO4 in groundwater is likely to be a result of runoff from agricultural activities and sewage sludge that seeps to the subsurface. This finding is consistent with that of DWA (2003a, b), which showed the impact of urbanization and industrialization on water quality. Phosphate could also be sourced from the weathering of basement granitic rocks, which contain the mineral apatite [Ca5(PO4)3(F,Cl,OH)]. NO3 and Si are moderately correlated. We suggest that groundwater interaction with the meta-sediments and the dolomite intercalated with chert is the major source of Si. Agricultural return flows and domestic wastewater discharge to streams are possible sources of nitrate in the groundwater system. Nitrate could also be partially sourced from the oxidation of ammonium ions.

Ionic ratios

Geochemical variations in major ions were plotted on xy coordinates (Güler et al. 2002) to investigate possible geochemical processes that influence the chemistry of groundwater in the region. From the plot of Na + K vs Cl + SO4 (Fig. 6a), it is apparent that about 85% of the samples plot above the 1:1 line, indicating the dominance of Cl and SO4. The SO4 is most probably derived from the oxidation of sulphide ores, and the Cl and SO4 enrichment reflects human disturbances such as mining activities and urbanization in the basin. Group 5 samples plot above all the other samples, which is indicative of mining pollution.

The plot of Ca + Mg vs HCO3 + SO4 (Fig. 6b) is useful in revealing the likely chemical reactions in groundwater, particularly the extent to which ion exchange reactions influence groundwater quality. In the case of ion exchange, the samples will tend to shift to the right on this plot as a result of an excess of HCO3 + SO4. In the case of reverse ion exchange, on the other hand, samples will tend to shift to the left due to an excess of Ca + Mg over HCO3 + SO4 (Fisher and Mullican 1997). Samples that are clustered close to the 1:1 line indicate that the dissolution of calcite, dolomite and gypsum is the predominant hydrochemical process influencing groundwater quality. Nearly 98% of the samples are clustered around and above the 1:1 line, which shows that reverse ion exchange is significant in the basin. Furthermore, the plot of Ca + Mg vs Cl (Fig. 6d) shows that Ca and Mg do not increase with increasing salinity, which could be further evidence of carbonate weathering and an indication of reverse ion exchange.
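The interpretation of the Ca + Mg vs HCO3 + SO4 plot can be expressed as a simple rule on the ratio of the two sums in meq/L. The sketch below is illustrative only: the 10% tolerance band around the 1:1 line, the function name and the example concentrations are hypothetical choices, not values used in the study.

```python
def classify_ca_mg_balance(ca, mg, hco3, so4, tolerance=0.10):
    """Classify a sample (all inputs in meq/L) relative to the 1:1 line.

    tolerance is a hypothetical band around a ratio of 1 within which the
    sample is treated as lying on the line.
    """
    ratio = (ca + mg) / (hco3 + so4)
    if ratio > 1 + tolerance:
        return "reverse ion exchange"   # excess Ca + Mg
    if ratio < 1 - tolerance:
        return "ion exchange"           # excess HCO3 + SO4
    return "carbonate/gypsum dissolution"


# Hypothetical samples in meq/L: (Ca, Mg, HCO3, SO4)
examples = {
    "near 1:1": (3.0, 2.0, 4.0, 1.0),
    "excess Ca + Mg": (4.0, 3.0, 4.0, 1.0),
    "excess HCO3 + SO4": (1.0, 1.0, 3.0, 1.0),
}
results = {name: classify_ca_mg_balance(*vals) for name, vals in examples.items()}
for name, label in results.items():
    print(f"{name}: {label}")
```

In practice the classification should be read together with the other ionic ratio plots, since a single ratio cannot distinguish between processes that produce the same excess.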

Fig. 6 Graphs showing the plots of Na + K vs Cl + SO4 (a), Ca + Mg vs HCO3 + SO4 (b), Ca + Mg vs Na + K (c) and Ca + Mg vs Cl (d). The solid line represents the 1:1 line

The plot of Ca + Mg vs Na + K (Fig. 6c) is useful in identifying the relative importance of the chemical weathering of carbonates and silicates. The graph shows that the samples are clustered to the left of the 1:1 line, which means that carbonate dissolution is more important than silicate weathering. Furthermore, the plot supports the suggestion that GW12 is derived from the dissolution of evaporites or is a result of deep circulating groundwater with a long residence time, and that it is not polluted by mining activities.
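The 1:1 line comparisons used in these plots can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the study: it assumes concentrations reported in mg/L that are converted to meq/L with standard equivalent weights, the 5% tolerance for "on the line" is an arbitrary choice, and the sample values and the function names (`to_meq`, `excess`, `classify`) are hypothetical.

```python
# Equivalent weights in g/eq (molar mass divided by ionic charge),
# used to convert mg/L to meq/L so that ion sums are charge-comparable.
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
    "Cl": 35.45, "SO4": 48.03, "HCO3": 61.02,
}


def to_meq(sample_mg_l):
    """Convert a dict of ion concentrations from mg/L to meq/L."""
    return {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_l.items()}


def excess(first, second, tol=0.05):
    """Report which of two ion sums (meq/L) is in excess.

    Sums that agree within the relative tolerance `tol` (an assumed
    value) are treated as lying on the 1:1 line.
    """
    if first > second * (1 + tol):
        return "first"
    if second > first * (1 + tol):
        return "second"
    return "balanced"


def classify(sample_mg_l):
    """Compare the ion sums used in the bivariate plots for one sample."""
    m = to_meq(sample_mg_l)
    return {
        "NaK_vs_ClSO4": excess(m["Na"] + m["K"], m["Cl"] + m["SO4"]),
        "CaMg_vs_HCO3SO4": excess(m["Ca"] + m["Mg"], m["HCO3"] + m["SO4"]),
        "CaMg_vs_NaK": excess(m["Ca"] + m["Mg"], m["Na"] + m["K"]),
    }


# Hypothetical sample in mg/L (illustration only, not a measurement
# from the study).
sample = {"Ca": 80.0, "Mg": 40.0, "Na": 30.0, "K": 5.0,
          "Cl": 60.0, "SO4": 150.0, "HCO3": 200.0}
result = classify(sample)
print(result)
```

In this sketch an excess of Ca + Mg over HCO3 + SO4 corresponds to the reverse ion exchange side of the plot, and an excess of Ca + Mg over Na + K corresponds to the carbonate-dominated side.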

Table 3 indicates the loading of the variables on each principal component, and the percentage of variance in the regional groundwater chemistry data that each component accounts for. Four components that together accounted for 81.52% of the total variance were extracted using the PCA method. The first component accounts for the largest share of the variation in the data, and each subsequent component accounts for the largest share of the variation that remains. The seventeen variables were treated independently to reveal interrelationships. The principal components were used to explain and infer predominant hydrochemical processes related to man-made and natural factors in the VRB.
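How component variances and loadings of this kind arise can be illustrated with a minimal NumPy sketch. It is not the authors' workflow: it assumes PCA on the correlation matrix of standardized variables, omits the rotation step that produces a rotated component matrix, and runs on synthetic random data. The function name `pca_from_correlation` and all values are hypothetical.

```python
import numpy as np


def pca_from_correlation(data):
    """PCA on the correlation matrix (rows = samples, columns = variables).

    Returns the fraction of variance explained by each component and the
    unrotated loadings (eigenvectors scaled by sqrt of the eigenvalues).
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # sort components, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return explained, loadings


# Synthetic data (illustration only): 40 samples of 5 variables that are
# driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(40, 2))
mixing = rng.normal(size=(2, 5))
data = factors @ mixing + 0.3 * rng.normal(size=(40, 5))

explained, loadings = pca_from_correlation(data)
cumulative = np.cumsum(explained)
print(np.round(explained, 3))
print(np.round(cumulative, 3))
```

The cumulative sum plays the role of the total variance retained by the first few components, and each column of `loadings` is the unrotated counterpart of a column in a component matrix.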

Table 3 Rotated component matrix of hydrochemical data of all groundwater samples

Component 1

The first component accounts for 48.61% of the variance in the chemical analysis. The first component has a high positive loading of Na, Cl, K, F, EC, SO4 and TDS in decreasing order. The high positive loading of the ions with EC shows that they are major contributors to groundwater salinity. We suggest that the dissolution of feldspars, micas, salts and phosphate-bearing rocks in the study area is a possible source of Na, K, F and Ca enrichment in groundwater. Enrichment of Na, K and Cl could also be related to soil salinity due to direct evaporation of rainfall and preferential dissolution, particularly in the arid parts of the basin. This confirms that dissolution of soil and geological material and the dissociation of evaporites are active processes in the VRB. The occurrence of Na, Cl, K and Ca could also be linked to industrial effluent, domestic wastewater and fertilizers. The high loading of SO4 is indicative of anthropogenic disturbances in the environment, particularly gold, coal and uranium mining. The oxidation of sulphide ores such as pyrite (FeS2), arsenopyrite (FeAsS), cobaltite (CoAsS) and pyrrhotite (FeS) in mine tailing dumps and abandoned mines releases SO4 to the environment. The SO4 is later removed from the water by the addition of lime to neutralize the acidic water at the mine site and/or by the interaction of dolomitic water with the SO4-rich water. This results in the precipitation of sulphate salts, which, on dissolution by carbonic acid from rain or by the inflow of more acidic water, will generate SO4, K, Mg, Ca and Na. Both human activities, such as mining and industrial activities, and the dissolution of rocks have a considerable influence on the groundwater chemistry.
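For reference, the sulphide oxidation and lime neutralization steps described above are commonly written with the textbook stoichiometry below. These are generic overall reactions, not balances derived from the study's data; the second reaction assumes hydrated lime, Ca(OH)2, and gypsum as the precipitated sulphate salt.

```latex
\begin{align*}
\mathrm{FeS_2} + \tfrac{7}{2}\,\mathrm{O_2} + \mathrm{H_2O}
  &\rightarrow \mathrm{Fe^{2+}} + 2\,\mathrm{SO_4^{2-}} + 2\,\mathrm{H^+} \\
\mathrm{Ca(OH)_2} + 2\,\mathrm{H^+} + \mathrm{SO_4^{2-}}
  &\rightarrow \mathrm{CaSO_4 \cdot 2H_2O}
\end{align*}
```

The first reaction shows why pyrite oxidation raises both SO4 and acidity. The second shows how lime consumes that acidity while removing SO4 from solution as a sulphate salt.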

Component 2

This component has a high positive loading for TAL, pH, Mg, Ca and TDS, explaining 14.88% of the variance in the remaining water chemistry data. The second component shows that dissolution of the underlying dolomitic aquifer is a major hydrochemical process. The dissolution of dolomite, gypsum and anhydrite by circulating water recharging the groundwater system releases mainly Ca, Mg, CO3 and SO4, which affects the pH and contributes to TAL and TDS of the water. The plot of Ca + Mg vs HCO3 + SO4 (Fig. 6b) also illustrated the predominance of carbonate dissolution over weathering of ferromagnesian minerals which contain Mg and Ca. Therefore, the geology exerts some control on the regional groundwater chemistry and not all groundwater is impacted by human activities, particularly in areas underlain by dolomites outside of the mining footprint.
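The dissolution reactions implied here can be written in their usual textbook form. These are generic reactions given for illustration and are not fitted to the study's data; the dolomite reaction assumes dissolution by CO2-charged recharge water.

```latex
\begin{align*}
\mathrm{CaMg(CO_3)_2} + 2\,\mathrm{CO_2} + 2\,\mathrm{H_2O}
  &\rightarrow \mathrm{Ca^{2+}} + \mathrm{Mg^{2+}} + 4\,\mathrm{HCO_3^-} \\
\mathrm{CaSO_4 \cdot 2H_2O}
  &\rightarrow \mathrm{Ca^{2+}} + \mathrm{SO_4^{2-}} + 2\,\mathrm{H_2O}
\end{align*}
```

In both reactions the cations and anions are released in equal equivalents (4 meq of Ca + Mg per 4 meq of HCO3 for dolomite, 2 meq of Ca per 2 meq of SO4 for gypsum), which is why samples controlled by these processes fall along the 1:1 line of the Ca + Mg vs HCO3 + SO4 plot.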

Component 3

Phosphates and ammonia have the highest positive loading, explaining less than 10% of the variance in the chemical data. The presence of PO4 and NH4 in groundwater in the study area is mainly due to sewage sludge, industrial wastewater discharge, and agricultural return flows across the basin. Although the breakdown of organic matter in the soil and the chemical weathering of granitic rocks in the study area could partly contribute to NH4 and PO4, human activities play a much more significant role in influencing the chemistry of groundwater.

Component 4

NO3 and Si have a high positive loading, which accounts for the remainder of the variation (7%) in groundwater chemistry. Si is derived from the weathering of sandstone, granite, conglomerate, quartz porphyry, and quartzite, mainly in the northern, northwestern and central parts of the study area. Nitrate in the study area is largely a result of the disposal of sewage in the Vaal River and its tributaries, which ends up joining the shallow groundwater system. Another possible human-induced source of NO3 is agricultural return flows. A natural contribution to nitrate could be the oxidation of ammonium. These findings further support the idea that both geogenic processes and human-induced activities influence the hydrochemical composition of groundwater.

Conclusion

The purpose of the current study was to assess both man-made and geological factors that influence the chemical composition of groundwater and to identify the predominant hydrochemical processes. A combination of graphical and multivariate statistical methods was used in the study. One of the most significant findings to emerge from this study is that the groundwater chemistry is principally controlled by both geological and human-induced factors. It was found that the relative abundance of ions is highly variable among the different groups, which was indicative of recently recharged water, mine-polluted water and water in circulation. The main water types were Ca–Mg–HCO3 type and Ca–Mg–Cl–SO4 type. Human activities such as gold mining and industrialization are likely to be the major sources of higher sulphate and some other ions such as chlorides that pollute the dolomitic groundwater. A comparison of the water chemistry measurements with the WHO water guidelines indicates that the measured concentrations are below the WHO guideline limits on a regional scale. However, other constituents of public health and environmental concern were not tested. The dissolution of dolomite and gypsum is the main source of ion abundance in groundwater. To a lesser extent, the dissolution of sulphate salts and evaporites contributes to the TDS. The process of reverse ion exchange is more active than cation exchange. Anthropogenic activities such as agricultural return flows, wastewater treatment discharge and industrial wastewater discharge are possibly significant sources of ammonia, nitrates and phosphates in the groundwater system. Notwithstanding the relatively limited number of water samples, this work offers valuable insights into the hydrochemical processes and factors that govern the evolution of groundwater quality in a geologically complex region with various land uses. A natural progression of this work is to analyse isotopes to pinpoint the origin of the different ions in the basin and the processes that affect them.
Greater efforts are needed to safeguard the groundwater quality for sustainable use in the future.