Introduction

Groundwater is a major source of water supply for drinking and domestic purposes on Kapas Island. The frequency of pumping depends on the demand for freshwater. High population growth, especially in the island ecotourism sector, will increase groundwater pumping to meet this demand. However, over-abstraction of groundwater can threaten aquifers, particularly in coastal areas, which are vulnerable to seawater intrusion. Consequently, groundwater quality will decline further through up-coning of the transition zone or salinity ingress into the aquifer. In addition, the groundwater may be polluted by inundation episodes in which saline surface runoff infiltrates the aquifer, an effect that is exacerbated during the drought season.

The complex mechanism of a groundwater aquifer is explained in Fig. 1 as a prerequisite to understanding potential groundwater pollution. The conceptual model of small islands explains the hydrological cycle and the associated chemical mechanisms. The hydrological cycle begins when precipitation (rainfall) infiltrates into the groundwater system, a process also known as groundwater recharge. The groundwater may then seep into rivers and streams; however, this step is skipped on small islands, which have only ephemeral rivers. The groundwater is pumped out for daily consumption and flows back into the ocean. Evapotranspiration returns moisture to the air, which is lifted to form the clouds from which precipitation falls. The potential for the initiation of groundwater pollution depends on several factors: (1) the drawdown of the groundwater level caused by pumping activities, (2) the up-coning of the transition zone caused by over-abstraction, (3) leakage from sewage piping, (4) leachate from dumping sites, (5) herbicides (chalet maintenance) and (6) water–rock interaction.

Fig. 1

(modified from Isa et al. 2014a, b)

Overview of small tropical island’s conceptual model showing simplified hydrological cycle and hydrogeochemistry mechanisms. a Production well, b chalets and c dumping site—(i) vertical intrusion, (ii) lateral intrusion

Seasonal changes in the complex groundwater system alter the groundwater composition. During rainfall, the groundwater aquifer widens, which removes contaminants by dilution and freshens the groundwater. In contrast, in the dry season groundwater storage declines and ionic concentrations increase, which can degrade the groundwater quality. To sustain the availability of groundwater and to ensure the quality of the supply, the impact of climate variability needs to be understood.

This study was undertaken to monitor and understand the groundwater composition on spatial and temporal scales. The contributions of hydrochemistry in the groundwater system are explained using multivariate statistical tools, a well-known approach in scientific research that has been widely used in environmental studies (Aris et al. 2012; Belkhiri and Narany 2015). Various hydrochemical studies have used statistical methods to analyze hydrochemical data in several ways: to delineate zones of contamination, to classify samples, to group and discriminate variables, and to describe spatial and seasonal (temporal) variation (Table 1).

Table 1 Numerous statistical applications concerning the identification of hydrochemistry study

For example, Barragán-Alarcón (2012) compared three techniques, namely principal component factor analysis (PCFA), inverse hydrochemical modeling and forward hydrochemical modeling, to identify hydrogeochemical processes. PCFA differentiated portions of variability in a hydrochemical data matrix. Forward modeling was used to identify the saturation indexes of different minerals, and phase stability diagrams were used to estimate silicate deposition. In addition, inverse modeling was used to determine the net mass transfers between two aquifer points. Another multivariate approach came from Kumar et al. (2011), who combined factor analysis (FA) with ordinary kriging in a geostatistical model. FA was applied to determine the critical pollution indicators due to saltwater intrusion and arsenic pollution, while the geostatistical tool was used to predict nutrient pollution in the groundwater as well as to describe the spatial and temporal behavior of the hydrochemical parameters. Ratha and Venkataraman (1997) used linear discriminant analysis and partial correlation coefficient analysis to establish the seasonal variation in the chemical data of soil and groundwater. That study observed a large variance in chemical concentration between the soil and groundwater in the different monsoon seasons. In addition, the partial correlation coefficient analysis showed that the soil samples were dominated by chemical weathering and the precipitation of atmospheric fallout.

Although most previous hydrochemical studies relied only on classic methods such as Piper and Schoeller diagrams, multivariate statistical tools have enhanced the explanation by identifying the underlying hydrochemical processes, the significant factors that affect the water chemistry and the relations between the physicochemical variables in the groundwater. Combining graphical techniques and statistical methods in hydrochemical studies yields complementary results and a fuller explanation.

Three statistical methods, principal component analysis (PCA), discriminant analysis (DA) and hierarchical cluster analysis (HCA), are applied in this study. Together, these methods indicate the associations between samples or variables (Vega et al. 1998), which makes it possible to group the sampling stations, to estimate the contribution of natural or anthropogenic influences and to identify the critical variables responsible for the composition of the groundwater.

The PCA method reduces the dimensionality of large datasets by explaining the correlation among variables in terms of principal components without losing too much information (Lu et al. 2012; Khelfaoui et al. 2013) and allows variables to be associated on the basis of several influential factors. It is a powerful technique for pattern recognition that attempts to explain the variance of a large set of intercorrelated variables by transforming them into a smaller set of independent (uncorrelated, orthogonal) variables called principal components (PCs) (Krishna et al. 2009; Critto et al. 2003; Helena et al. 2000). HCA is a classification method that groups similar stations on the basis of homogeneity within a class and heterogeneity between classes, according to the hydrochemical status of the dataset (Lee and Song 2007), and plays an important role in interpreting the data and indicating patterns. Meanwhile, the main aim of DA is to provide a numerical method for classifying samples into two or more statistically distinct groups. The linear combinations of the independent variables found by this technique discriminate between the groups in such a way that the misclassification error rates are minimized (Lambrakis et al. 2004). DA differs from other classification methods in that it demands a priori knowledge about the relations between samples (Papatheodorou et al. 2006). A qualitative dependent variable and a set of independent variables are necessary for discriminant analysis.

The objectives of the present study are to explain the hydrochemical information concerning the groundwater quality by considering the seasonal variation (pre- and post-monsoon) and to identify the controlling factors of the groundwater hydrochemistry by applying three (3) multivariate statistical techniques (PCA, DA and HCA). The finding of this investigation is expected to promote awareness of the need for statistical methods to be used in conjunction with traditional graphical techniques and to reinforce the importance of critical analysis to be undertaken even when the data appear to exhibit minor variation.

Materials and methodology

Site description

Kapas Island is a small tropical island in Malaysia that has been developed for ecotourism. It is located 3 km offshore (5°13.140′ N, 103°15.894′ E), in the district of Marang, Terengganu (Fig. 2). It covers an area of around 2 km² and is classified as a small island based on its land mass (Abdullah 1981; Shuib 2003; White et al. 2007), which is partitioned mainly into hilly (90%) and lowland (10%) areas. Because of its accessibility, the population of local people and tourists is concentrated in the low-lying area. The annual precipitation, based on 13 years of data (2000–2012), is approximately 2800 mm. Kapas Island experiences heavy rainfall at the end of each year due to the change of monsoon season (southwest monsoon to northeast monsoon). The average daily temperature is between 28 and 31 °C, and the humidity is around 70–80% throughout the year. Based on previous hydrological records (Abdullah 1981), Kapas Island has a single aquifer layer, which is interlayered by various types of rock (Fig. 3). Based on geological studies, Kapas Island is covered by sedimentary rocks and the sequence of formation is conglomerate. The sedimentary rocks consist mostly of sandstone, mudstone, shale and silts (Shuib 2003). However, the majority of the island is covered with recent alluvium (Abdullah 1981).

Fig. 2

Satellite image of Kapas Island, Terengganu, and the location of monitoring boreholes (KW1, KW2, KW3, KW4, KW5 and KW6)

Fig. 3

Box and whisker plot a for pre-monsoon and b for post-monsoon showing the variation of the chemical constituents in the studied groundwater sample of Kapas Island, and the Schoeller diagram c showing the major ion concentrations for the two different monsoon seasons (n = 216)

Sampling and analytical procedures

A total of 216 groundwater samples were collected bimonthly from six monitoring boreholes (KW1, KW2, KW3, KW4, KW5 and KW6) during two monsoon seasons: pre-monsoon (Aug–Oct 2010) and post-monsoon (Feb–Apr 2011). The boreholes were constructed in the lowland area because it is densely populated and is therefore where groundwater pumping takes place. The information for each monitoring borehole is listed in Table 2.

Table 2 Coordinates, distances, depths and groundwater level of each monitoring borehole

Before sampling, the groundwater level was measured, and the borehole was then pumped for about 10–15 min to purge any stagnant or contaminated groundwater. Composite groundwater samples were then collected and divided into two portions, A and B. Portion A was used to measure the in situ parameters, namely temperature, pH, electrical conductivity (EC), total dissolved solids (TDS), salinity, redox potential (Eh) and dissolved oxygen (DO), while portion B was split into two sub-portions for anion and cation analysis. Anion analyses of bicarbonate (HCO3), chloride (Cl) and sulfate (SO4) were conducted on-site within 24 h of sampling (APHA 2005; Hidalgo and Cruz-Sanjulian 2001). HCO3 and Cl were measured by titration with HCl and AgNO3, respectively, while SO4 was determined using a HACH meter (HACH, Loveland, CO, USA). For cation analyses, groundwater samples were filtered through a membrane filter with a 0.45-µm pore size and acidified with HNO3 to pH < 2 directly after filtration. The acidification prevents the effects of oxidation and inhibits bacterial development in the samples (Appelo and Postma 2005). All the filtered groundwater samples were stored in polyethylene bottles and kept in a cooler box filled with ice to maintain the temperature at 4 °C before being transported to the laboratory for further analysis. The cations Ca, Mg, Na and K were determined using a flame atomic absorption spectrophotometer (FAAS, PerkinElmer, Massachusetts, USA).

The portable equipment used for in situ measurement was calibrated with commercial buffer solutions to ensure that it was functioning properly, while the glassware was pre-cleaned with 5% HNO3 to avoid contamination that could alter the data. Triplicate measurements were taken for each variable, and the accuracy of the FAAS was checked using a blank solution and a three-point calibration curve of standard solutions. The laboratory also participated in a regular national program on analytical quality control.

Data analyses

The hydrochemical facies of the groundwater were illustrated using a Schoeller diagram and box and whisker plots. Descriptive analysis, PCA, DA and HCA were performed using XLSTAT software. The average values with standard deviation (SD) are given for each parameter. Environmental data are generally characterized by exceptionally high values that deviate widely from the main body of data; hence, data transformation will not normalize the data but may instead lead to some un-conservative conclusions (Lim et al. 2012). In the present study, the data were not normally distributed even after transformation. Irrespective of whether or not the data were transformed, each multivariate method gave a similar output, with no significant difference (p > 0.05).

Principal component analysis (PCA)

The PCs were ordered by variance, with PC 1 having the greatest possible variance, PC 2 the second greatest, and so on. Because the eigenvalues and eigenvectors extracted by PCA depend on the scale (standard deviation) of the variables, the correlation matrix was used (Yongming et al. 2006); it is recommended when variables are measured on different scales. The main outputs of PCA are data matrices consisting of the principal component (PC) scores and loadings (Stetzenbach et al. 1999). The component scores usually account for approximately the same amount of information as a much larger set of original observations. Component loadings take values between −1 and 1 (Kumar et al. 2011); loadings >0.6 or <−0.6 were taken into consideration in the interpretation of this study. PCA attempts to extract a lower-dimensional linear structure, which can be “cleaned up” by rotating the axes defined by PCA (varimax rotation); this increases the participation of the variables with a higher contribution while simultaneously reducing that of the variables with a lower contribution (Helena et al. 2000). The basic aim of any rotational method is to achieve a simpler and more meaningful representation of the dataset (Farmaki et al. 2012). The Kaiser–Meyer–Olkin (KMO) test was used to evaluate the suitability of the groundwater quality dataset for PCA.
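As an illustration only, the PCA steps described above (correlation matrix, ordering of PCs by variance, varimax rotation and screening of loadings at ±0.6) can be sketched in Python with NumPy. The study itself used XLSTAT, so this is not the authors' implementation; the data are synthetic, the function names `pca_correlation` and `varimax` are hypothetical, and the retention rule (eigenvalue greater than 1) is assumed here for the sketch.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix: returns eigenvalues, loadings and scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    R = np.corrcoef(X, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # R is symmetric
    order = np.argsort(eigvals)[::-1]                  # PC 1 = greatest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))  # variable-PC correlations
    scores = Z @ eigvecs                               # sample coordinates on the PCs
    return eigvals, loadings, scores

def varimax(L, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        LR = L @ R
        grad = L.T @ (LR**3 - LR @ np.diag((LR**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                                     # nearest orthogonal rotation
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:         # criterion stopped improving
            break
    return L @ R

# Synthetic (hypothetical) data: 200 samples, 7 variables driven by two latent factors
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 200))
X = np.column_stack([f1 + 0.3 * rng.normal(size=200) for _ in range(4)] +
                    [f2 + 0.3 * rng.normal(size=200) for _ in range(3)])

eigvals, loadings, scores = pca_correlation(X)
k = int((eigvals > 1).sum())             # assumed rule: keep eigenvalues > 1
rotated = varimax(loadings[:, :k])
strong = np.abs(rotated) > 0.6           # loadings considered in interpretation
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the way the variance is shared among the retained components changes, which is what makes the rotated loadings easier to interpret.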

Hierarchical cluster analysis (HCA)

The HCA in this study employed the squared Euclidean distance as the similarity measure and Ward’s method for linkage, because Ward’s method possesses a small space-distorting effect (Lee and Song 2007) and uses more information on the cluster content than other methods (Reghunath et al. 2002). HCA provides a branching diagram with junction levels, which offers a visual summary of the clustering process and presents a picture of the groups and their proximity with a reduction in the dimensionality of the data (Sheikhy Narany et al. 2014). A hierarchical cluster diagram was prepared in which the sampling stations were linked into clusters on the x-axis and the linkage distances were plotted on the y-axis. The linkage distances between clusters illustrated relative similarities in the hydrochemical processes of the groundwater samples (Farnham et al. 2000).
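The clustering described above can be sketched in plain Python as follows. This is a minimal illustration of Ward's method applied to squared Euclidean distances through the Lance–Williams update, not the XLSTAT routine used in the study; the function name `ward_clusters` and the two-variable station values are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's method on squared Euclidean distances.

    Returns the clusters (lists of point indices) and the linkage distance
    of every merge that was carried out."""
    n = len(points)
    members = {i: [i] for i in range(n)}
    # squared Euclidean distance between every pair of single-point clusters
    dist = {(i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            for i in range(n) for j in range(i + 1, n)}
    heights, next_id = [], n
    while len(members) > n_clusters:
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        ni, nj = len(members[i]), len(members[j])
        for k in members:
            if k in (i, j):
                continue
            nk = len(members[k])
            dki = dist[(min(k, i), max(k, i))]
            dkj = dist[(min(k, j), max(k, j))]
            # Lance-Williams update for Ward's criterion
            dist[(k, next_id)] = ((ni + nk) * dki + (nj + nk) * dkj
                                  - nk * dij) / (ni + nj + nk)
        # drop distances that involve the two clusters just merged
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        members[next_id] = members.pop(i) + members.pop(j)
        heights.append(dij)
        next_id += 1
    return sorted(sorted(m) for m in members.values()), heights

# Hypothetical standardized (Ca, Na) values for six stations, KW1 to KW6
stations = ["KW1", "KW2", "KW3", "KW4", "KW5", "KW6"]
points = [(0.2, 1.1), (0.4, 1.3), (1.5, 0.1), (1.6, 0.2), (0.3, 0.9), (0.1, 1.4)]
clusters, heights = ward_clusters(points, n_clusters=2)
groups = [[stations[i] for i in c] for c in clusters]
```

With these invented values the two remaining clusters are KW1, KW2, KW5, KW6 and KW3, KW4. The recorded linkage distances are the quantities that would be plotted on the y-axis of a dendrogram.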

Discriminant analysis (DA)

DA offers several modes for constructing discriminant functions to evaluate the most important factors for groundwater quality: standard, forward stepwise and backward stepwise. In the forward stepwise mode, variables are included step by step, beginning with the most significant, until no significant changes are obtained, whereas in the backward stepwise mode, variables are removed step by step, beginning with the least significant, until no significant changes are obtained (Singh et al. 2005). DA consists of two stages. In the first stage, the parameters of the model are estimated using a so-called training sample, from which the discriminant functions are constructed. In the second stage, the discriminant functions are applied to validate the classification (deFigueredo et al. 2014). The application of DA is expected to determine which variables are the most significant to the classification in the PCA and to verify whether the groups are classified correctly by the HCA. All 14 variables were used in the PCA, HCA and DA.
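The two stages of DA (training, then validation) can be illustrated with a simple two-group Fisher linear discriminant in Python with NumPy. This sketch deliberately omits the stepwise variable selection described above and is not the XLSTAT procedure used in the study; the data are synthetic, and the names `fit_discriminant` and `classify` are hypothetical. The cut-off at the midpoint of the two projected group means assumes equal group sizes.

```python
import numpy as np

def fit_discriminant(X, y):
    """Stage 1: estimate a two-group Fisher linear discriminant from training data."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-group scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)        # discriminant weights
    cutoff = w @ (m0 + m1) / 2              # midpoint of the projected group means
    return w, cutoff

def classify(X, w, cutoff):
    """Stage 2: assign each sample to group 0 or 1 with the fitted function."""
    return (X @ w > cutoff).astype(int)

# Synthetic (hypothetical) data: two variables, two seasons (0 = pre, 1 = post)
rng = np.random.default_rng(1)
pre = rng.normal([3.0, 1.0], 0.5, size=(60, 2))
post = rng.normal([1.0, 3.0], 0.5, size=(60, 2))
X = np.vstack([pre, post])
y = np.repeat([0, 1], 60)

# Split into a training sample and a validation sample
idx = rng.permutation(len(y))
train, valid = idx[:80], idx[80:]
w, cutoff = fit_discriminant(X[train], y[train])
accuracy = (classify(X[valid], w, cutoff) == y[valid]).mean()
```

The proportion of validation samples assigned to the correct season (`accuracy`) plays the same role as a percentage of correct predictions in a DA classification table.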

Results and discussion

Descriptive analyses

The physicochemical variables determined in the groundwater samples and their correlation coefficients are summarized in Tables 3 and 4, respectively. The mean pH was 7.17, which indicated the neutral nature of the groundwater, and pH was negatively correlated (r = −0.965; p < 0.01) with Eh, for which the average value was 1.09 mV. Salinity, EC and TDS also showed strong positive correlations (EC and salinity: r = 0.998, p < 0.01; TDS and salinity: r = 0.999, p < 0.01), which could indicate that the dissolved ions in the groundwater were dominated by major elements. The average values of salinity, EC and TDS were 0.23 ppt, 0.23 µS/cm and 238 mg/L, respectively.

Table 3 Descriptive analyses for in situ and the major ions of the groundwater samples (n = 216)
Table 4 Correlation coefficient for groundwater samples of Kapas Island (n = 216)

The cations followed the order Ca > Na > Mg > K, with average concentrations of 64.05, 13.37, 5.72 and 0.77 mg/L, respectively. The anions followed the order HCO3 > Cl > SO4, with average concentrations of 326.87, 31.16 and 12.34 mg/L, respectively. The box and whisker plots (Fig. 3a, b) clearly indicated the differences in the concentrations of the physicochemical parameters between the pre- and post-monsoon. This finding was supported by the Schoeller diagram (Fig. 3c), in which the pre-monsoon groundwater showed higher concentrations of Na and Cl, which could represent seawater intrusion, while the post-monsoon groundwater showed higher concentrations of Ca, Mg and HCO3, which indicated the dissolution of carbonate rocks in the study area. The findings therefore revealed that the groundwater type is Ca–Na–HCO3 in the pre-monsoon and changes slightly to Ca–HCO3 in the post-monsoon. During the pre-monsoon season, salinization could affect the groundwater quality because of the limited rainfall. As discussed in the conceptual model, the salinization might result from over-abstraction of groundwater, which promotes lateral and vertical seawater intrusion, or from inundation events. The post-monsoon groundwater is considered renewed groundwater owing to the heavy rainfall.
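Water types are commonly assigned from concentrations expressed in milliequivalents per litre rather than mg/L. The short Python sketch below converts the mean concentrations quoted above to meq/L to check that the same ion order holds on an equivalent basis. The equivalent weights are standard values (molar mass divided by charge); the helper names `to_meq` and `ranking` are hypothetical, and the text does not state which basis the authors used.

```python
# Mean concentrations (mg/L) quoted in the text
cations_mg = {"Ca": 64.05, "Na": 13.37, "Mg": 5.72, "K": 0.77}
anions_mg = {"HCO3": 326.87, "Cl": 31.16, "SO4": 12.34}

# Equivalent weights in mg/meq: molar mass divided by ionic charge
eq_weight = {
    "Ca": 40.08 / 2, "Mg": 24.305 / 2, "Na": 22.99, "K": 39.10,
    "HCO3": 61.02, "Cl": 35.45, "SO4": 96.06 / 2,
}

def to_meq(conc_mg):
    """Convert a dict of mg/L concentrations to meq/L."""
    return {ion: mg / eq_weight[ion] for ion, mg in conc_mg.items()}

def ranking(conc):
    """Ions sorted from highest to lowest concentration."""
    return sorted(conc, key=conc.get, reverse=True)

cations_meq = to_meq(cations_mg)
anions_meq = to_meq(anions_mg)
cation_order = ranking(cations_meq)   # ['Ca', 'Na', 'Mg', 'K']
anion_order = ranking(anions_meq)     # ['HCO3', 'Cl', 'SO4']
```

On this basis the mean composition keeps the order Ca > Na > Mg > K and HCO3 > Cl > SO4, in line with the Ca and HCO3 dominance reported above.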

The groundwater during the pre-monsoon was originally dominated by Ca and HCO3 (Eq. 1), but its composition was altered when the groundwater interacted with impurities (seawater elements). The simple reaction in Eq. 2 describes the cation exchange process in which Ca is exchanged with Na: when Na (impurity) bound to the exchanger X (on the bedrock) exchanges with the free Ca2+ ion in the groundwater, Na+ is released into the groundwater. Meanwhile, the post-monsoon groundwater is of Ca–HCO3 type because the Na+ ion released earlier is exchanged back with Ca bound to the bedrock, as described by the reverse of Eq. 2.

$$\mathop {{\text{H}}_{2} {\text{O}}}\limits_{\text{Groundwater}} + \mathop {{\text{CaCO}}_{3} }\limits_{\text{Bedrock}} \to {\text{Ca}}^{2 + } + {\text{HCO}}_{3}^{ - } + {\text{OH}}^{ - }$$
(1)
$$\mathop {\frac{1}{2}{\text{Ca}}^{2 + } }\limits_{\text{Free ion}} + \mathop {{\text{Na}} - {\text{X}}}\limits_{\text{Binder}} \leftrightarrow {\text{Na}}^{ + } + \frac{1}{2}{\text{Ca}} - {\text{X}}_{2}$$
(2)

Multivariate analyses

Principal component analysis (PCA)

A KMO value close to 1 generally indicates that PCA is useful, as was the case in this study, with values of 0.77 and 0.80 for the pre- and post-monsoon seasons, respectively. PCA was performed to compare the compositional pattern between the groundwater samples and the seasonal variation and to identify the factors influencing each one. The strong factors, i.e., those with the most important loadings (>0.6 or <−0.6), were retained, and the less important ones were excluded. In this study, PCA revealed four components in the pre-monsoon and three components in the post-monsoon, which explained 81.6 and 78.9% of the total variance, respectively (Table 5).
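For readers unfamiliar with the KMO measure, the following Python sketch (NumPy, synthetic data) shows one common way to compute the overall statistic: the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations. The function name `kmo` is hypothetical, and the study obtained its values from XLSTAT, so this is an illustration rather than the authors' calculation.

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (between 0 and 1)."""
    R = np.corrcoef(X, rowvar=False)            # correlation matrix
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                      # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)       # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)

# Synthetic (hypothetical) data for comparison
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
shared = common + 0.5 * rng.normal(size=(200, 5))   # five variables sharing one factor
independent = rng.normal(size=(200, 5))             # five unrelated variables

kmo_shared = kmo(shared)
kmo_independent = kmo(independent)
```

Variables that share a common factor give a value close to 1, whereas unrelated variables give a much lower value, which is why a high KMO is read as support for applying PCA.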

Table 5 Principal component loading with varimax rotation

PC 1 in the pre-monsoon accounted for 45.4% of the total variance and had strong loadings for TDS, EC, salinity, Eh, pH, Cl and Na, which could be attributed to the natural hydrochemical evolution of groundwater by simple mixing of groundwater and seawater. PC 2 and PC 3 accounted for 12.5 and 12.2% of the total variance, respectively. They consisted of SO4, K, Mg and Ca, which indicated an evaporite or low mineralization process. PC 4 consisted of temperature, HCO3 and DO, with 11.6% of the total variance. In the post-monsoon, PC 1 extracted 43.4% of the total variance and was characterized by Mg, TDS, EC, salinity, Ca, Na, pH, Eh and HCO3. Based on these elements, the factor controlling the groundwater in PC 1 was considered to be the lithogenic component, since the variables seemed to be controlled by the parent rocks. PC 2 and PC 3 accounted for 21 and 14.6%, respectively, and were represented by K, temperature, Cl, SO4 and DO. The distinct groups of variables among the PCs are shown in Fig. 4a, b for the pre-monsoon and in Fig. 4c, d for the post-monsoon. Yellow marks the most significant variables from PC 1 and PC 2 in each season, while blue indicates the other PCs.

Fig. 4

Variables of PC 1 vs. PC 2 and PC 3 vs. PC 4 for pre-monsoon (a, b) and variables of PC 1 vs. PC 2 and PC 2 vs. PC 3 for post-monsoon (c, d) after varimax rotation

PC 1 in the pre-monsoon reflected the natural process of salinization through the controlling influence of the seawater elements (Na and Cl). The elements comprising PC 1 described the controlling factor of the groundwater composition during the pre-monsoon, with a high possibility of seawater disturbance either by up-coning of the transition zone caused by over-pumping or by the dilution of residual salt from inundation events. Although salinity can include other ions, most of the dissolved material consists of a relatively small number of major ions. Accordingly, salinity was strongly correlated (p < 0.01) with Na and Cl, with r values of 0.797 and 0.823, respectively. The correlation between TDS and EC (p < 0.01) indicated that the dissolved substances in the groundwater largely involved the Na and Cl ions, since these loaded on PC 1. Meanwhile, pH and Eh correlated well with each other (p < 0.01), which indicated an oxidation–reduction process in the groundwater; some of the monitoring boreholes had an unpleasant odor caused by sulfate reduction (Eq. 3).

$${\text{SO}}_{4}^{2 - } + {\text{organic matter}} \to {\text{H}}_{2} {\text{S}} + {\text{CO}}_{2}$$
(3)

PC 2 accounted for 12.5% of the total variance and included SO4 and K. These two variables can exist in the natural environment as nutrients or can arise from man-made sources. SO4 and K might have leached out of the rocks of the aquifer matrix as natural sources, while manufactured herbicides and fertilizers are the most common man-made sources of nutrients found in groundwater.

PC 3 showed a strong absolute loading of Ca and Mg (carbonate elements) and explained 12.2% of the total variance of the groundwater quality in the pre-monsoon season. As described in Eq. 1, tropical aquifers are formed from carbonate bedrock. The compaction and deposition of coral could explain the contribution of carbonate elements to the groundwater composition (details in XRD and SEM-EDX section). Based on the literature (Ali et al. 2001), Mg in the groundwater of the study area can originate from minerals containing a high concentration of Mg. The contribution of the Mg-rich mineral CaMg(CO3)2 accounts for the presence of Mg in the groundwater composition (Eq. 4).

$$2{\text{CO}}_{2} + 2{\text{H}}_{2} {\text{O}} + {\text{CaMg}}\left( {{\text{CO}}_{3} } \right)_{2} \to {\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } + 4{\text{HCO}}_{3}^{ - }$$
(4)

Lastly, PC 4, with 11.6% of the total variance, contained the remaining variables (Temp, HCO3, DO), which made a smaller contribution to the groundwater composition during the pre-monsoon. Shallow groundwater might be affected by the temperature related to warming and cooling at the surface (Nelson 2002). As the temperature changes seasonally, with the rise and fall in the water tables, or variation in recharge rate, the chemical state will change, and, as a result, so does the composition of the groundwater.

The PCA results for the post-monsoon season revealed three components with eigenvalues greater than 1, which together explained 78.9% of the total variance. PC 1 explained 43.4% of the total variance and had strong loadings on TDS, EC, salinity, pH, Eh, Mg, Ca, Na and HCO3. Besides the physical parameters that reflected the dissolved substances and redox activity, the other elements represented the contribution of carbonate minerals (Ca, Mg, HCO3) to the groundwater composition. Water–rock interaction might take place owing to the large amount of precipitation that infiltrates into the ground, as described by Eqs. 1 and 4. In addition, cation exchange acted as the primary chemical reaction that shifted the water type from Ca–Na–HCO3 (pre-monsoon) to Ca–HCO3 (post-monsoon) and vice versa, as can be seen in Eq. 2 and Fig. 5.

Fig. 5

Dendrogram showing the hierarchical clustering of the monitoring site according to Ward’s method (Ward 1963) with the Euclidean distance (pre-monsoon)

PC 2 explained approximately 20.9% of the total variance and had significant loadings on K, temperature and Cl. As mentioned before, K might be derived from natural or man-made sources. The findings also showed that temperature is a more important factor in the post-monsoon season than in the pre-monsoon. Moreover, the Cl concentrations decreased during the post-monsoon, as stated in the descriptive analysis section; hence, the significance of their contribution also decreased.

PC 3 extracted 14.6% of the total variance and comprised SO4 and DO. DO correlated with the Eh values (p < 0.01), which reflects the consumption of O2 by bacteria in the groundwater; this consumption would reduce the quality of most groundwater sources. Sulfate can originate from both natural and anthropogenic sources. Anthropogenic sources include coal mines (Miao et al. 2012), sulfate-bearing fertilizers (Han et al. 2016) and industrial sewage. Industrial waste water (Kumar 2014), household waste water, runoff from hazardous waste sites and naturally decaying plant and animal material can release sulfates into rivers, lakes and streams, and wastes that contain sulfates seep through the soil and contaminate groundwater. Several natural processes can also act as SO4 sources in groundwater. In the present study, the SO4 is from natural sources. Reduced forms of sulfur are oxidized to sulfate in the presence of oxygen (Worthington and Ford 1995). Other natural processes include hydrogen sulfide reduction (Isa et al. 2012), sulfate mineral dissolution (Singh et al. 2017), oxidation of sulfide minerals (Miao et al. 2014), dissolution of evaporites and dilution of sodium-rich marine clays (Salem et al. 2016; Jiang et al. 2013) and seawater intrusion.

Hierarchical cluster analysis (HCA)

The spatial distribution of groundwater quality was examined using HCA. The hydrochemical data were classified by HCA in a 14-dimensional space (temperature, pH, salinity, EC, TDS, Eh, DO, Ca, Mg, Na, K, HCO3, Cl and SO4), and the results are presented as a dendrogram (Fig. 5). Four preliminary groups (C 1 to C 4) were selected based on visual examination of the dendrogram, each representing a hydrochemical facies with similar characteristic features and a natural background affected by similar sources. The groundwater in the studied area could be classified into two sub-clusters: Ca-rich water (box I) and Na-rich water (box II). C 1 encompassed stations KW 1 and KW 5 and made up 32% of the groundwater samples. C 2 comprised station KW 2, while C 4 consisted of station KW 6, with 12 and 17% of the groundwater samples, respectively. These clusters (box II) were characterized by mixed Na–Ca–HCO3 water, caused by various sources, either man-made over-abstraction or the natural geological formation. C 3, which falls in box I and represents stations KW 3 and KW 4, had higher dissimilarity than the clusters in box II and was categorized as Ca–HCO3 freshwater, with 39% of the groundwater samples. This group was essentially carbonate dominated. The dendrogram (Fig. 5) also confirmed the elemental grouping made by the PCA in the previous subsection, which explained the seawater disturbance (box II) and the domination of the carbonate minerals (box I).

Because the groundwater body is lens shaped, the location of each station (Table 2) implies a significant change between sampling points. For clarity, Fig. 6 shows the locations of the stations in relation to the groundwater hydrochemistry. Stations KW 5 and KW 6 probably reflect interference from the approaching transition zone, which is polluted by brackish water. Meanwhile, stations KW 1 and KW 2 are deeper than the others (Table 2), which suggests anaerobic conditions that promote SO4 reduction. Stations KW 3 and KW 4 are located in the middle; they are protected and essentially away from the pollution sources (present study) and are classified as Ca-rich water (freshwater).

Fig. 6

Cross section of the sampling stations on Kapas Island. Groundwater extraction during the pre-monsoon narrowed the freshwater lens (1). In contrast, the aquifer storage widened during the post-monsoon (2)

In the post-monsoon, the clustering changed owing to the various hydrochemical processes. The dissimilarity values in Fig. 7 decreased, showing that the groundwater composition of each monitoring borehole was approaching a similar hydrochemical type. At monitoring borehole KW 1, 67% of the groundwater samples were isolated from the other monitoring boreholes, with a dissimilarity of ~80,000. Since the hydrochemical facies of the groundwater was Ca–HCO3 (during post-monsoon), the dissimilarity was believed to arise from the mineralogical structure. Even though the classification was different, there was no significant difference between the monitoring boreholes. The dissimilarity in the post-monsoon (Fig. 7) was very low compared with that in the pre-monsoon, which had dissimilarity values of ~113,000. The other monitoring boreholes (KW 2, KW 3, KW 4, KW 5 and KW 6), with 89% of the total groundwater samples, were grouped in a similar hydrochemical facies owing to the cation exchange processes discussed previously.

Fig. 7

Dendrogram showing the hierarchical clustering of the monitoring sites according to Ward’s method (Ward 1963) with the Euclidean distance (post-monsoon)
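
As an illustration of the clustering procedure behind such a dendrogram, the following minimal sketch applies Ward's method with the Euclidean distance to a small table of major-ion concentrations. The station labels S1 to S6 and all concentration values are hypothetical and are not the Kapas Island measurements. The use of SciPy and of unstandardized concentrations are also assumptions, since the software and preprocessing are not stated in this section.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical stations and made-up concentrations (mg/L), for illustration only.
# Columns: Ca, Mg, Na, K, HCO3, Cl, SO4
stations = ["S1", "S2", "S3", "S4", "S5", "S6"]
X = np.array([
    [40.0, 12.0,  95.0, 6.0, 310.0,  80.0,  4.0],
    [42.0, 11.0,  90.0, 5.5, 300.0,  75.0,  5.0],
    [85.0,  9.0,  20.0, 2.0, 280.0,  25.0, 12.0],
    [88.0,  8.5,  22.0, 2.2, 285.0,  27.0, 11.0],
    [45.0, 14.0, 110.0, 7.0, 320.0, 120.0, 15.0],
    [44.0, 15.0, 115.0, 7.5, 325.0, 125.0, 16.0],
])

# Agglomerative clustering: Ward linkage on Euclidean distances.
# Each row of Z records one merge: (cluster a, cluster b, dissimilarity, size).
Z = linkage(X, method="ward", metric="euclidean")

# Cut the tree into two groups (e.g. Na-rich versus Ca-rich water).
labels = fcluster(Z, t=2, criterion="maxclust")

for name, label in zip(stations, labels):
    print(name, "-> cluster", label)
```

In this hypothetical table the two Ca-rich stations (S3 and S4) form one group and the Na-rich stations form the other. The third column of `Z` holds the merge dissimilarities that a dendrogram plots on its vertical axis; because the inputs are not standardized here, ions with large concentrations dominate those values.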

In relation to the PCA outputs, PC 1 from the pre-monsoon was dominated by the seawater elements Na and Cl. This is supported by C 1, C 2 and C 4, which fall within the Na-rich box (Fig. 5). Meanwhile, the Ca-rich box (Fig. 5) corresponds to the freshwater type, as in PC 3, where the carbonate minerals were grouped.

On the other hand, PC 1 from the post-monsoon, which was dominated by carbonate minerals (Ca, Mg, Na, HCO3), clearly explained the freshwater type in Fig. 6. Low dissimilarities were expected from different mineralogical structures.
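
To make the link between principal components and ion groups concrete, the sketch below runs a PCA on synthetic data in which Na and Cl share one latent process (labeled salinization) and Ca, Mg and HCO3 share another (labeled mineralization). The data, the variable grouping and the choice of an unrotated PCA on the correlation matrix are all assumptions made for illustration; they do not reproduce the PCA of the present study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60  # hypothetical number of groundwater samples

# Two latent processes driving the synthetic ion concentrations.
salinization = rng.normal(size=n)
mineralization = rng.normal(size=n)

def with_noise(signal):
    """Return the latent signal plus small independent measurement noise."""
    return signal + 0.1 * rng.normal(size=n)

names = ["Na", "Cl", "Ca", "Mg", "HCO3"]
X = np.column_stack([
    with_noise(salinization),    # Na
    with_noise(salinization),    # Cl
    with_noise(mineralization),  # Ca
    with_noise(mineralization),  # Mg
    with_noise(mineralization),  # HCO3
])

# PCA on the correlation matrix (equivalent to standardizing each variable).
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)

# eigh returns ascending eigenvalues; reorder so PC 1 explains the most variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()      # fraction of variance per component
loadings = eigvecs * np.sqrt(eigvals)    # variable-component correlations

for k in range(2):
    pairs = ", ".join(f"{v}={loadings[i, k]:+.2f}" for i, v in enumerate(names))
    print(f"PC {k + 1} ({explained[k]:.0%} of variance): {pairs}")
```

Variables that load strongly on the same component vary together, which is the basis for reading one component as a seawater signal (Na, Cl) and another as a carbonate mineral signal (Ca, Mg, HCO3).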

Overall, the significant factors from the tiered multivariate analysis, as well as the ionic ratios, indicated that the groundwater aquifer was controlled by natural processes, either mineralization or salinization. Although the groundwater underwent slight salinization during the pre-monsoon, this was not considered serious because the significance of these elements indicated a low risk.

Discriminant analysis (DA)

DA was performed on 14 variables, comprising the major ions and in situ parameters, with the monsoon season (temporal distribution) as the grouping variable, to determine the most significant variables explaining the groundwater hydrochemistry. DA was used as a confirmatory analysis, since its output should be consistent with the PCA results. Whereas PCA orders the data from the most to the least influential factors, DA removes the variables that do not contribute to changes in the groundwater hydrochemistry. The standard, backward and forward models were applied to measure the correctness of the classification, which was then used to exclude further unrelated variables.
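
The idea of measuring the correctness of classification with and without a given variable can be sketched as follows. This is a minimal two-group Fisher linear discriminant written with NumPy on synthetic data; the variable names, the sample size and the equal-prior midpoint threshold are assumptions, and the stepwise (forward and backward) selection used in the present study is not reproduced.

```python
import numpy as np

def fisher_lda_accuracy(X, y):
    """Fit a two-group Fisher discriminant on (X, y) and return the
    fraction of samples it reclassifies correctly (resubstitution)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Discriminant direction that best separates the two group means.
    w = np.linalg.solve(Sw, m1 - m0)
    # Midpoint threshold, assuming equal prior probabilities.
    threshold = w @ (m0 + m1) / 2.0
    predicted = (X @ w > threshold).astype(int)
    return float((predicted == y).mean())

rng = np.random.default_rng(1)
n = 40                    # hypothetical samples per season
y = np.repeat([0, 1], n)  # 0 = pre-monsoon, 1 = post-monsoon

# Two synthetic variables whose means shift between seasons,
# and one synthetic variable that carries no seasonal signal.
na = np.concatenate([rng.normal(3.0, 1.0, n), rng.normal(0.0, 1.0, n)])
ca = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n)])
tds = rng.normal(0.0, 1.0, 2 * n)
X_all = np.column_stack([na, ca, tds])

acc_all = fisher_lda_accuracy(X_all, y)             # all three variables
acc_reduced = fisher_lda_accuracy(X_all[:, :2], y)  # uninformative one dropped
acc_noise = fisher_lda_accuracy(X_all[:, 2:], y)    # uninformative one alone

print(f"all variables:      {acc_all:.0%} correct")
print(f"Na and Ca only:     {acc_reduced:.0%} correct")
print(f"uninformative only: {acc_noise:.0%} correct")
```

A variable whose removal leaves the percentage of correct classification essentially unchanged contributes little to discriminating between seasons, which is the reasoning used to exclude variables in DA.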

The pre- and post-monsoon yielded over 70% and 80% correct predictions, respectively (Table 6). For the pre-monsoon, 11 variables (temperature, pH, salinity, Eh, Ca, Mg, Na, K, HCO3, Cl and SO4) were retained, while for the post-monsoon, 10 variables (pH, Eh, DO, Ca, Mg, Na, K, HCO3, Cl and SO4) were the most significant parameters for discriminating between seasons. The physical parameters (TDS, EC, salinity, DO and temperature) were the only variables excluded in one or both seasons, as they were shown to be insignificant. These parameters did not contribute significantly to the groundwater hydrochemistry, as certain models (with no values) returned significance values >0.1 (Farmaki et al. 2012). Based on the results, Ca, Mg, HCO3, Na and Cl were considered to be the controlling factors that determine the groundwater quality in both the pre- and post-monsoon seasons. The DA results also support the previous subsection (PCA), in which the seawater (Na and Cl) and mineral (Ca and Mg) elements contributed to identifying the groundwater composition.

Table 6 Classification function for the discriminant analysis (DA) of parameters

According to a previous study by Lu et al. (2012), the factors contributing to saline water include entrapped seawater and the higher concentration of As in the coastal area. In their factor analysis (FA), factor 1 was dominated by Mg, Cl, Na, K and SO4, which are components of seawater and were significantly correlated with each other. These findings generally support the present study, as PC 1 and PC 2 (pre-monsoon) had similar components (Cl, Na, SO4 and K) and were described as seawater disturbance.

The cluster analyses (CA) by Lu et al. (2012) and Papaioannou et al. (2010) grouped groundwater samples according to the similarity of their hydrochemical compositions in the study area. The study areas were then clustered into several groups, which could be explained by the factors determined by FA. Combining the results of FA and CA, salinization and dissolved As were the main distinct hydrochemical characteristics of the groundwater in that particular area. Comparably, in the present study, the main contributor to cluster identification in HCA was the groundwater hydrochemistry. In the pre-monsoon, the present study shows distinct groups of Na-rich and Ca-rich groundwater.

In a study by Khelfaoui et al. (2013), the discriminant analysis was discussed by group, with the explanation based on the significant parameters in each group. Group 1 (first classification), the natural group, has retained its good quality despite neighboring sources of pollution. Meanwhile, G2 and G3 represent the industrial area, characterized by high mineralization and poor water quality in terms of organic matter. In the present study, DA showed that most of the mineral and seawater components contribute significantly to the groundwater hydrochemistry, which supports the findings of PCA and HCA.

The present study has some limitations that constrain its findings. Comparisons between samples across the different analyses yield different results throughout the discussion. Therefore, the absence of certain parameters does not necessarily rule out their relevance to groundwater quality. For example, metal concentrations (Kargar et al. 2012) and nutrient parameters (Delpla et al. 2009; Bloomfield et al. 2006) were not measured, as the main objective was to identify the hydrochemical information of groundwater quality by considering the seasonal variation (pre- and post-monsoon). It is recommended that these parameters be included in future research. Furthermore, knowledge of the groundwater ecosystem, together with the human activities on the island (e.g., agriculture), would allow firmer conclusions about the groundwater quality.

Conclusion

The present study revealed the important role of integrating multivariate statistical analysis with classic hydrochemical methods for a better understanding of the groundwater hydrochemistry in Kapas Island, Terengganu, Malaysia. The application of statistical methods such as PCA, DA and HCA provides a promising approach to reducing data and identifying the factors controlling the groundwater hydrochemistry, especially under seasonal variations. Two different groundwater types, Na–HCO3 (pre-monsoon) and Ca–HCO3 (post-monsoon), reveal that salinization and mineralization are important factors controlling the groundwater quality in the study area. These findings were supported by PCA, which showed the salinization process as the first principal component, with Na and Cl, in the pre-monsoon and the mineralization process as the first principal component, with Ca, Mg and HCO3, during the post-monsoon. DA allowed the insignificant variables to be discriminated, with more than 90% correct classification. The cluster analysis (HCA) recognized two distinctive hydrochemical patterns, Ca-rich and Na-rich water. The classification is based on the spatial distribution, in which the monitoring boreholes were classified as either Ca-rich or Na-rich water. Based on the present study, it can be concluded that the natural processes of salinization and mineralization are the predominant processes influencing the groundwater hydrochemistry in Kapas Island.