1 Introduction

Groundwater is an important water resource, especially on small tropical islands. Such islands have limited sources of freshwater: they have no surface water and rely entirely on rainfall and groundwater recharge. The demand for freshwater on small islands has been increasing because of tourism development. However, uncontrolled exploitation and poor management of groundwater lead to environmental degradation. Overexploitation of freshwater by pumping wells distorts the natural recharge–discharge equilibrium and causes drawdown of the water table or upconing of the saltwater interface. Greater withdrawals may not only create localized upconing but also reduce the freshwater lens (Rejani et al. 2008; Bonell et al. 1993). Similarly, the groundwater resources of Manukan Island, a small tropical island, are limited and face seawater intrusion. Tourism has placed further pressure on the island's limited water resources (Wong 1993). The number of tourists visiting Manukan Island increased by about 46% from 1999 to 2008, resulting in overextraction of groundwater to meet demand (Aris et al. 2009).

Groundwater quality is controlled by factors such as precipitation, evapotranspiration, mineralogy, aquifer type, climate, topography, and pumping rates. Combinations of these factors create diverse water types whose composition changes spatially and temporally (Cheni and Khemiri 2009; Gholami et al. 2009). Salinity is a crucial component of groundwater quality. Salinization results from the concentration and precipitation of water-soluble salts such as chlorides in the groundwater (Metternicht 2001). According to Woszczyk et al. (2010) and Gholami et al. (2009), salinity varies more significantly in space than in time. On small islands, seawater intrusion is the main contributor to increasing groundwater salinity (Rejani et al. 2008). Seawater intrusion causes irreversible deterioration of groundwater as a source of drinking water. Drinking water safety is a primary consideration, especially on small islands that rely solely on groundwater. Moreover, high salinity can disturb an island's ecological balance, impair plant growth, and contribute to environmental degradation. Thus, identifying the factors related to groundwater salinity is important for maintaining a good supply of groundwater on small islands (Burns et al. 2005).

Different techniques have been used to evaluate groundwater quality, such as numerical modeling, hydrochemical analysis, and graphical methods (Praveena et al. 2010). Numerical modeling requires large data sets to simulate the study area, whereas hydrochemical analysis requires more variables to identify the geochemical composition. Graphical methods (Piper, Schoeller, and Collins diagrams) consider only the major ion constituents and are therefore unsuitable for many environmental studies. Chemometric analysis is another way to evaluate groundwater quality, including salinity, and is also very useful for investigating spatial and temporal variations (Singh et al. 2005). It analyzes groundwater quality both quantitatively and qualitatively using methods such as cluster analysis and multiple linear regression. Most importantly, these methods have been widely applied as unbiased techniques in groundwater quality analysis (Kumar and Riyazuddin 2008).

The objective of the present research was to identify the effective factors on groundwater salinity using chemometric methods, namely cluster analysis (CA) and multiple linear regression (MLR). These methods provide a means to integrate and better interpret the effective factors that influence groundwater salinity. Identifying these factors is crucial to understanding the groundwater chemistry processes taking place in the aquifer. This information is very useful, especially in threatened small tropical island environments.

2 Materials and methods

2.1 Study area

Manukan Island (5°57′–5°58′N and 115°59′–116°01′E) is crescent shaped and covers an area of 206,000 m² (Fig. 1). The study area is located in Tunku Abdul Rahman Park, offshore of Kota Kinabalu, Sabah, East Malaysia (on the island of Borneo), in the South China Sea. The island falls under the Parks Enactment 1978 and is managed by The Sabah Parks Trustees. Almost 80% of the area is covered by dense vegetation on high-relief terrain, while the remaining low-lying area has been developed for tourism. Manukan Island has limited water resource options: it has no surface water and relies entirely on rainfall and groundwater recharge. Tourism has placed further pressure on the limited water resources of small islands such as Manukan Island and has led to a large increase in groundwater extraction to meet water supply and domestic needs (Aris et al. 2009; Wong 1993). Moreover, all the wells for groundwater extraction are located in the low-lying area of the island (eight wells have been shut down and only one is currently operating). The small area and low elevation result in very limited water storage in the low-lying area of Manukan Island (Abdullah et al. 2002).

Fig. 1 Manukan Island with a crescent shape

2.2 Climate

Manukan Island has a tropical climate characterized by high temperatures and high annual rainfall. The mean annual temperature on Manukan Island is 26.5°C. The area receives between 2,000 and 2,500 mm of rainfall annually (Fig. 2), with humidity between 80 and 90%. Annual average evapotranspiration on Manukan Island is about 11 mm/year.

Fig. 2 Monthly average rainfall distribution from 1999 to 2009

2.3 Geology

The island is relatively hilly, with a maximum elevation of approximately 60 m in the west that decreases toward the eastern coast. The island is about one and a half kilometers long and three kilometers wide in the middle. Generally, the profiles in the low-lying area are flatter and thinner than those in the hilly area. The regional groundwater flow is mainly toward the South China Sea. The island consists of an unconfined sandy aquifer underlain by sedimentary rock (sandstone and shale) that is part of the Crocker Range rock formation of the western coast of Sabah, deposited during the Late Eocene to Middle Miocene (Basir et al. 1991; Abdullah et al. 2002; Tunku Abdual Rahman 2007). The aquifer medium consists of fine to coarse sand mixed with some fine gravel. The parent materials in the hilly area of the island are sandstone and mudstone, whereas the low-lying area is alluvium. The alluvium acts as the water store, and its capacity depends solely on its thickness. The sedimentary rock forms a slight symmetrical syncline in the low-lying area that can be observed at several locations around the island (Abdullah et al. 2002). The sedimentary rock of Manukan Island dips toward the low-lying area (east–northeast) at angles of 15–45°. In a study of the island's morphology, Abdullah et al. (2002) found that the thickness of the aquifer from the ground surface to bedrock is approximately 5.7 m in the northern part, 11 m in the southern part, and 12 m in the middle. The geological structure indicates that the freshwater of Manukan Island depends mainly on recharge from precipitation.

2.4 Research methodology

The low-lying area of Manukan Island was selected for this study (Fig. 3) because it has been developed for tourism and all the wells for groundwater extraction are located there (eight wells have been shut down and only one is currently operating). Local groundwater properties and environmental conditions were used to identify the effective factors of groundwater salinity. Local groundwater properties consist of the in situ parameters pH, temperature, oxidation–reduction potential (ORP), electrical conductivity (EC), salinity (Sa), and total dissolved solids (TDS), together with hydraulic head (HH) and chloride (Cl). Environmental conditions are represented by precipitation, evapotranspiration, and humidity data from a meteorological station established in the low-lying area of Manukan Island.

Fig. 3 Low-lying area of Manukan Island

A total of nine boreholes were constructed manually by hand auger to depths of between 1 and 3.5 m in the aquifer. The sampling network comprised these nine boreholes and eight observation wells, which reasonably represent the groundwater quality in the low-lying area of the study region. Sampling was carried out in the middle of each month from January to December 2009. A total of 204 groundwater samples (108 from boreholes and 96 from observation wells) were collected over the sampling period. Prior to water sample collection, the hydraulic head of each borehole and observation well was measured using a Solinst 101 water level meter. Water samples were collected from boreholes using a modified dry vacuum pump technique and from wells using a water sampler. First, a water sample from each borehole or well was collected in a measuring bottle for determination of the in situ parameters pH, temperature, ORP, EC, Sa, and TDS, in order to acquire values representative of the current aquifer condition. These parameters were measured using the devices shown in Table 1. Next, water samples were collected in precleaned polyethylene bottles. The bottles were cleaned by rinsing with tap water, soaking overnight in acid wash water (1:10 HNO3), and rinsing with distilled water to eliminate traces precipitated on the bottle surfaces. Three replicates were taken from each borehole during each sampling period to obtain an average value. The polyethylene bottles were transported to the laboratory in an icebox under low-temperature conditions and analyzed immediately for chloride. Chloride was determined using the argentometric method (APHA 1995).
All the devices were calibrated with buffer solutions before and after field sampling to ensure that they functioned properly and accurately. Moreover, the device probes were soaked in distilled water before and after field sampling and calibration to protect the electrodes. All readings were recorded in a personal field log book immediately after measurement to avoid any loss of data.

Table 1 List of parameters for in situ detection

Meteorological data (precipitation and humidity) were downloaded from the meteorological station to a Twinhead laptop each month during sampling in 2009. Evapotranspiration (ET) was calculated by the Penman method, a combination of energy balance and aerodynamic approaches (Stewart and Howell 2003). This formula has several advantages over other ET formulas such as the Thornthwaite, Blaney–Criddle, and van Bavel methods: it provides the most comprehensive approach to the estimation of ET, is more widely used than any of the alternatives, is acceptable for a wide range of climates, and has given satisfactory ET estimates elsewhere (Wallacae et al. 1990). According to Bello et al. (2010), each empirical method of ET estimation has its own limitations. The limitation of the Penman method is that it approximates radiation from sunshine hours, whereas the radiation received per sunshine hour varies with latitude and climate.

2.5 Data analysis

In chemometric methods, normally distributed data are preferred for optimal results and reliable interpretation. To analyze the effective factors on groundwater salinity, the groundwater sample data were analyzed with chemometric methods (CA and MLR) using SPSS Version 17.0 for Windows. Simenov et al. (2002) noted that CA classifies elements from different sources on the basis of similarities in their chemical properties. Cluster analysis sorts objects into groups based on the degree of association between them, with the aim of organizing observations into groups that share similar properties. Hierarchical cluster analysis (HCA) identifies relatively homogeneous groups of variables and combines clusters step by step until only one is left; the result is displayed as a dendrogram (Bierman et al. 2009). In a dendrogram, groups of high similarity are joined at small distances, whereas dissimilar groups are joined at large distances. The dendrogram is read from left to right, with vertical lines marking where clusters are joined. The position of each line on the scale shows the distance at which the clusters are joined. The observed distances are rescaled to the range 1–25. The first vertical line corresponds to the smallest rescaled distance, and subsequent vertical lines mark the distances of later merges. In the last two steps, fairly dissimilar clusters are combined. Moreover, according to Cheni and Khemiri (2009) and Helena et al. (2000), this approach has been used to extract related variables and infer the processes that control water chemistry, and it is effective for manipulating, interpreting, and representing data on groundwater pollutants.
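
As an illustration of the hierarchical clustering of variables described above, the following is a minimal sketch in Python using SciPy rather than SPSS. The data are synthetic stand-ins, not the Manukan Island measurements, and the dissimilarity measure (one minus the absolute Pearson correlation) is an assumption, since the measure used in the study is not stated. Only the average-linkage-between-groups step corresponds to the procedure named in this paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Synthetic stand-in data; these are NOT the Manukan Island measurements.
rng = np.random.default_rng(0)
n_samples = 204
seawater = rng.normal(size=n_samples)  # hypothetical shared driver of Sa, Cl, EC, TDS
weather = rng.normal(size=n_samples)   # hypothetical shared driver of Prec, Evapo

names = ["Sa", "Cl", "EC", "TDS", "Prec", "Evapo", "Hum"]
data = np.column_stack([
    seawater + 0.05 * rng.normal(size=n_samples),  # Sa
    seawater + 0.05 * rng.normal(size=n_samples),  # Cl
    seawater + 0.20 * rng.normal(size=n_samples),  # EC
    seawater + 0.20 * rng.normal(size=n_samples),  # TDS
    weather + 0.30 * rng.normal(size=n_samples),   # Prec
    weather + 0.30 * rng.normal(size=n_samples),   # Evapo
    rng.normal(size=n_samples),                    # Hum (unrelated to the rest)
])

# Assumed dissimilarity between variables: 1 - |Pearson correlation|.
corr = np.corrcoef(data, rowvar=False)
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)

# Average linkage between groups; each row of `merges` records one join
# (the two clusters joined, the joining distance, and the new cluster size).
merges = linkage(condensed, method="average")

# Cut the tree into three clusters (the number of clusters is a subjective choice).
labels = fcluster(merges, t=3, criterion="maxclust")
for name, label in zip(names, labels):
    print(f"{name}: cluster {label}")
```

The third column of `merges` holds the joining distances that a dendrogram places on its scale; `scipy.cluster.hierarchy.dendrogram(merges, labels=names)` would draw the tree if a plotting backend is available.
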
Multiple linear regression (MLR) is an extension of correlation that provides an equation linking the independent variables, Vi (pH, temperature, ORP, EC, TDS, Cl, HH, precipitation, and ET), to the dependent variable, Vd (salinity), in the form given in Eq. 1. These parameters were selected because local changes in salinity are more important than temporal changes in smaller-scale studies (Gholami et al. 2009). Thus, local changes in salinity can be represented by the selected parameters.

$$ V_{d} = \beta_{0} + \beta_{1} V_{1} + \cdots + \beta_{n} V_{n} $$
(1)

where the intercept β0 and the regression coefficients of the descriptors (βi) are determined by the least squares method (Green and Carroll 1996), Vi are the descriptors used to describe water quality, Vd is the dependent variable, and n is the number of descriptors. The number of descriptors (variables) was reduced to minimize the information overlap among variables. The best equation was selected based on the highest multiple correlation coefficient (r), the lowest standard deviation (SD), and the F-ratio value. Relationships between variables were established using the stepwise regression method (Broto 2006; Bernstein 1988). Stepwise selection is a combination of the forward and backward procedures that allows variables entered earlier to be removed later (Coakes et al. 2010). CA is a multivariate analysis technique that seeks information about variables so that clusters of similar variables can be formed. The MLR selection process yields a reduced set of variables from a larger set, eliminating unnecessary variables, simplifying the data, and enhancing predictive accuracy. Thus, these two approaches were used to search for the effective factors on groundwater salinity in the low-lying area of Manukan Island.
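
To make the least squares fit and the variable-reduction idea concrete, the sketch below fits a linear model with NumPy and then applies a simplified forward selection. It is not the SPSS stepwise procedure used in this study: the entry rule here is a gain in adjusted R² (a swapped-in, simpler criterion), there is no removal step, and the data, the variable names, and the `min_gain` threshold are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + X @ b; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n_obs, n_pred):
    """R^2 penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_pred - 1)

def forward_select(X, y, names, min_gain=1e-3):
    """Greedy forward selection: repeatedly add the descriptor that most
    improves adjusted R^2, stopping when the gain falls below min_gain."""
    chosen, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = []
        for j in remaining:
            cols = chosen + [j]
            _, r2 = fit_ols(X[:, cols], y)
            scores.append((adjusted_r2(r2, len(y), len(cols)), j))
        score, j = max(scores)
        if score - best < min_gain:
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return [names[j] for j in chosen]

# Synthetic example (NOT the study data): y depends on the first two columns only.
rng = np.random.default_rng(1)
n = 204
X = rng.normal(size=(n, 3))
names = ["Cl", "Prec", "Hum"]
y = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)

selected = forward_select(X, y, names)
coef, r2 = fit_ols(X[:, :2], y)
print(selected, np.round(coef, 3), round(r2, 3))
```

The first element of `coef` plays the role of β0 in Eq. 1 and the remaining elements the role of the βi. A full stepwise procedure would additionally test, after each entry, whether a previously entered variable should be removed.
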

3 Results and discussion

3.1 Local groundwater and environmental conditions properties

The data on local groundwater properties and environmental conditions appeared to follow a normal distribution. Figure 4 shows the frequency distributions of several environmental factors (pH, EC, and Cl). The local groundwater properties and environmental conditions are presented in Tables 2 and 3. The trends among local groundwater properties, hydraulic heads, and chloride show the influence of saline water on groundwater quality. Moreover, the wide distributions of the studied local groundwater properties indicate that the composition was affected by multiple processes, including seawater–freshwater mixing (Aris et al. 2007b). Next, CA was performed on the data set using average linkage between groups (rescaled distance cluster) to study the connections between local groundwater properties and environmental conditions and thereby identify the effective factors on groundwater salinity.

Fig. 4 Frequency distribution of several environmental factors (pH, EC, and Cl) (SD standard deviation)

Table 2 Mean values of local groundwater properties in the study area
Table 3 Mean values of environmental conditions in the study area

3.2 Cluster analysis output

Figure 5 shows the dendrogram of the studied local groundwater and environmental properties used to identify the effective factors on groundwater salinity. According to Banoeng-Yakubo et al. (2009) and Yidana et al. (2008), the decision on the number of clusters to use in an HCA is subjective. It is based on the knowledge of the researcher and on the subject matter being investigated. In this study, the differentiation was based on knowledge of the current groundwater data as well as on previous groundwater findings for the low-lying area of Manukan Island (Aris et al. 2009).

Fig. 5 Dendrogram showing hierarchical cluster analysis of local groundwater and environmental condition properties (Temp temperature, ORP oxidation–reduction potential, EC electrical conductivity, TDS total dissolved solids, Cl chloride, HH hydraulic head, Prec precipitation, Evapo evapotranspiration, Hum humidity)

In the dendrogram of the groundwater samples (Fig. 5), salinity and chloride exhibit the highest similarity and are subsequently joined in the same cluster with electrical conductivity and total dissolved solids. The association between salinity and chloride also affects electrical conductivity and total dissolved solids. This is supported by Aris et al. (2007a, b), who found positive correlations between salinity, chloride, electrical conductivity, and total dissolved solids. This cluster also indicates that groundwater salinity reflects the mixing of seawater with freshwater. Seawater is the only chloride source in the groundwater of Manukan Island. Thus, the chloride concentration in the groundwater samples is an effect of the upward movement of seawater caused by pumping. The environmental conditions (precipitation and ET) were clustered together, showing that they also influence the salinity of the groundwater. According to Habib et al. (2007), rainfall plays an important role in groundwater salinity. Precipitation dilutes the salt in groundwater, so high precipitation tends to decrease groundwater salinity (Kresic 2006). Strong evaporation in areas with shallow water tables may also lead to salinization (Vigna et al. 2009). Regarding hydraulic head, Kim et al. (2000) showed that hydraulic heads respond relatively strongly and promptly to rainfall. Hydraulic head also influences the hydrochemical characteristics of groundwater. The clusters showed that humidity plays a smaller role among the effective factors in groundwater salinity, because humidity is the only environmental factor that was joined at the maximum distance, compared with the cluster of the other environmental conditions (precipitation and ET).

3.3 Multiple linear regressions output

MLR analysis was conducted to investigate the relationships between salinity, local groundwater properties, and environmental properties, and to select the best model for identifying the effective factors on groundwater salinity. In selecting the best model, the multiple R coefficients and r² values were examined. The multiple R coefficients, which indicate the correlations between local groundwater properties, environmental conditions, and salinity, ranged from 0.951 to 0.995. Among all the possible equations, the one with the r² value closest to 1 was selected. The model was also checked for multicollinearity via the variance inflation factor (VIF). The VIF value was close to one (VIF = 0.989); thus, there is no multicollinearity effect (Alin 2010). Table 4 presents the MLR output predicting the effective factors on groundwater salinity. The output showed that local groundwater properties and environmental properties were significant (p < 0.00) as effective factors on groundwater salinity in Manukan Island. The descending order of the factors, from Cl to evapotranspiration, reflects their contributions to groundwater salinity in Manukan Island. The positive or negative signs of the beta coefficients (β) and t values in Table 4 indicate positive or negative relationships between salinity and the local groundwater and environmental properties. In terms of relative importance, Cl made the largest contribution to groundwater salinity, with the highest β value (Table 4). This is also supported by the CA output, in which salinity and chloride exhibit the highest similarity. The selected equation for identifying the effective factors on groundwater salinity in Manukan Island is given in Eq. 2.
The regression constant (0.035) can be interpreted as the predicted salinity when all the effective factors (chloride, total dissolved solids, electrical conductivity, precipitation, hydraulic head, and ET) are set to zero. Thus, the value 0.035 represents the concentration of salinity in seawater. The model obtained can also be applied to classify groundwater on other small islands with similar hydrogeological conditions and limited information.
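
The multicollinearity check mentioned above can be sketched as follows, assuming the usual definition VIF_j = 1 / (1 − R_j²), where R_j² is obtained by regressing predictor j on all the other predictors; its reciprocal is the tolerance. The function name and the synthetic predictors are hypothetical and are not taken from the SPSS output of this study.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X, where R_j^2
    comes from regressing column j on the remaining columns plus an intercept."""
    n_obs, n_pred = X.shape
    vifs = np.empty(n_pred)
    for j in range(n_pred):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n_obs), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic predictors (NOT the study data): c is nearly a copy of a.
rng = np.random.default_rng(2)
n = 204
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.1 * rng.normal(size=n)
X = np.column_stack([a, b, c])

vifs = variance_inflation_factors(X)
tolerances = 1.0 / vifs
print(np.round(vifs, 2), np.round(tolerances, 3))
```

In this synthetic case the independent predictor `b` has a VIF near one, while the nearly duplicated pair `a` and `c` have large VIFs, which is the pattern such a check is meant to flag.
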

Table 4 Summary for the linear regressions predicting groundwater salinity
$$ {\text{Sa}} = 0.035 + 0.543{\text{Cl}}({\text{mg}}/{\text{L}}) + 0.283{\text{TDS}}({\text{mg}}/{\text{L}}) + 0.281{\text{EC}}(\mu {\text{S}}/{\text{cm}}) - 0.247{\text{Prec}}({\text{mm}}) - 0.141{\text{HH}}({\text{m}}) + 0.143{\text{Evapo}}({\text{mm}}) + \varepsilon $$
(2)

where ε is the error of estimation in the regression model.
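
For readers who wish to apply Eq. 2, the arithmetic can be sketched as below. The coefficients are copied from Eq. 2 and the error term ε is omitted. The function name, the dictionary keys, and the input values are hypothetical, and the inputs are assumed to be expressed on the same scale as the variables used to fit the model, which should be checked against Table 4 before any practical use.

```python
# Coefficients copied from Eq. 2; the error term is omitted.
INTERCEPT = 0.035
COEFFICIENTS = {
    "Cl": 0.543,
    "TDS": 0.283,
    "EC": 0.281,
    "Prec": -0.247,
    "HH": -0.141,
    "Evapo": 0.143,
}

def predict_salinity(values):
    """Evaluate the right-hand side of Eq. 2 for one set of factor values.

    `values` maps each factor name in COEFFICIENTS to a number that is
    assumed to be on the scale used when the model was fitted.
    """
    missing = set(COEFFICIENTS) - set(values)
    if missing:
        raise KeyError(f"missing factors: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[k] * values[k] for k in COEFFICIENTS)

# Hypothetical inputs, chosen only to show how each term contributes.
zeros = {name: 0.0 for name in COEFFICIENTS}
print(predict_salinity(zeros))                 # intercept only
print(predict_salinity({**zeros, "Cl": 1.0}))  # intercept plus the Cl term
```

With all factors at zero the function returns the regression constant, and raising a single factor by one unit changes the prediction by that factor's coefficient.
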

4 Conclusion

The groundwater of Manukan Island was investigated to determine the effective factors on groundwater salinity using chemometric methods. CA was used to analyze the similarities among groups of variables with similar properties. The CA output showed that salinity and chloride exhibit the highest similarity, together with EC and TDS. Precipitation and ET, among the environmental conditions, were clustered together and also influence groundwater salinity. The relationships between salinity, local groundwater properties, and environmental properties were evaluated using MLR. The estimation of effective factors on groundwater salinity showed that Cl is the largest contributor. Seawater is the only source of Cl in the groundwater of Manukan Island, and pumping is believed to drive the mixing of seawater and freshwater in the groundwater. The integration of chemometric methods (cluster analysis and multiple linear regression) showed that groundwater salinity is related to local groundwater properties (Cl, TDS, and EC) and environmental conditions (Prec, HH, and evapotranspiration). It should be noted that ET estimation by the Penman method approximates radiation from sunshine hours. The availability of new instruments for measuring radiation will produce more reliable ET values for the study area, which will in turn influence the CA and MLR results in the estimation of effective factors on groundwater salinity. Despite this limitation, the proposed methodology allows local groundwater and environmental properties to be incorporated into a statistical model to determine the effective factors of groundwater salinity, which is an important part of understanding the groundwater chemistry.
The applied methodology and the resulting equation are useful for estimating groundwater salinity on other small tropical islands with similar hydrogeological conditions and limited available information.