Introduction

Water quality is deteriorating worldwide, primarily as a result of growing human populations and economic development, which elevate nutrient loads (leading to eutrophication) and heavy metal pollution in the aquatic environment (Krishna et al. 2009; Nriagu and Pacyna 1988; Peierls et al. 1998; Holloway et al. 1998; Li et al. 2008; Pekey et al. 2004). Natural sources of metals include volcanism, bedrock erosion, atmospheric transport and release from plants (Krishna et al. 2009; Pekey et al. 2004). Among anthropogenic activities, mining and mineral processing in particular have a dominant influence on the biogeochemical cycles of trace metals (Krishna et al. 2009; Nriagu and Pacyna 1988; Nriagu 1989, 1996). Heavy metal pollution poses serious human health hazards through the food chain, reduces biodiversity and degrades environmental quality. Recent research on trace elements and heavy metals has produced notable records (Zhang et al. 2009), and the spatial variability of these elements reflects the geographic heterogeneity of geological parent materials and anthropogenic sources (Imperato et al. 2003).

A number of significant studies on heavy metal pollution in water sources are available in the literature, including Brown-Adiuku and Ogezi (1991), Edet and Ntekim (1996), Xibao et al. (1996), Yang et al. (1996), Yiping and Min (1996) and Zhongyi (1996). All these studies concluded that water quality needs to be monitored and assessed on a regular basis, because increasing concentrations of heavy metals in potable water increase the threat to health and the environment. In addition, a few studies in the literature address the development and application of index methods for water quality assessment (Venkata Mohan et al. 1996; Prasad and Jaiprakas 1999; Edet and Offiong 2002; Bhuiyan et al. 2010).

Groundwater potential in Malaysia is limited to some pockets of the coastal region and is generally exploited by rural people to supplement their piped water supply. In Miri City, surface water is readily available throughout the year and is mainly used for irrigation and domestic purposes. Surface water represents 97 % of the total water use, while groundwater represents 3 %. The study area (Miri) is surrounded by the coastal region, industrial and agricultural areas, squatter colonies and commercial areas. As the human population and economic activities in this region have grown, the demand for large-scale supplies of freshwater from competing end users has increased. The declining quality and quantity of the water supply in the area can be attributed to water pollution and improper management of the existing resource.

A comparative assessment of toxic heavy metals is important for determining the degree of pollution in the environment. However, interpreting data sets that comprise analyses of numerous metals is complicated. One approach to simplifying multivariate data is to generate a single value, which may subsequently be used for comparative purposes (Miyai et al. 1985; Nimic and Moore 1991). Methods of integrating numerous water quality variables into a specific index are increasingly in demand nationally and internationally. Several researchers have therefore developed various indices, technically referred to as water quality indices (WQIs) (Lermontov et al. 2009). A water quality index (WQI) is usually a practical and comparatively simple means of evaluating the composite influence of overall pollution, but it provides little evidence of the pollution sources. Pollution indices are intended to provide a useful and comprehensible guiding tool for water quality executives, environmental managers, decision makers and potential users of a given water system. The WQI was initially developed in the USA by Horton (1965) and has been widely used in Africa and Asia (Shoji et al. 1966; Handa 1981; Erondu and Nduka 1993; Palupi et al. 1995; Li et al. 2009). Statistical approaches, particularly multivariate techniques, can compensate for this shortcoming of the WQI and are useful for environmental data reduction and the interpretation of multiple elements. Principal component analysis (PCA) and cluster analysis (CA) are considered more reliable approaches for mining data matrices from environmental quality assessment (Astel et al. 2007, 2008; Simeonova and Simeonov 2007) and are therefore widely used in water quality assessment.

Hence, the present study evaluates the heavy metal concentrations in the surface water of Curtin Lake, Miri, East Malaysia. Pollution indices and multivariate approaches (PCA and CA) are used to identify the pollution status and probable sources of pollutants in the lake. The study compares the heavy metal pollution index (HPI), heavy metal evaluation index (HEI) and degree of contamination (C d), which have been successfully used by many researchers (Mohan et al. 1996; Prasad and Jaiprakas 1999; Prasad and Bose 2001; Teng et al. 2004; Prasad and Mondal 2008; Offiong and Edet 1998a, b; Edet and Offiong 2002; Rapant et al. 1999).

Study area

The study area (Fig. 1), Curtin Lake, is located on the Sarawak Campus of Curtin University in Miri City, Sarawak State, East Malaysia, on the island of Borneo. Sarawak is generally mountainous, with the highest range forming the border with Indonesia. The Miri area is characterized by a plateau, where young alluvial sediments overlie the folded and monoclinally dipping Late Miocene to Pliocene Lambir and Tukau clastics. The rocks exposed around Miri City belong to the Middle Miocene Miri formation, a stack of deltaic cycles forming a layered clay-sand sequence (85 % sand and 15 % clay) with laterally discontinuous sand bodies. The Miri formation, divided into an Upper part (mostly sand) and a Lower part (well-defined beds of shale interbedded with sandstones), is predominantly arenaceous, with clay and shale restricted mainly to the lower part. The base of the formation is a gradual transition from the argillaceous Setap shale to the sandy Miri formation (Hutchison 2005). The climate is governed by the regime of the northeast and southwest monsoons. The northeast monsoon blows from October to March and is responsible for the heavy rains that hit the east coast of the peninsula and frequently cause widespread floods. It also brings the wettest season in Sarawak State. The southwest monsoon occurs between May and September and is a drier period. The period between these two monsoons is marked by heavy rainfall. The average temperature is very stable throughout the year (26 °C). In general, Sarawak State experiences more rainfall (3,000–4,000 mm) than the Peninsula. The humidity is high (80 %) due to the high evaporation rate. Out of an annual rainfall volume of 990 km³, 360 km³ is lost to evapotranspiration. The total surface runoff is 566 km³, and about 64 km³ (7 % of the total annual rainfall) contributes to groundwater recharge. However, about 80 % of the groundwater flow returns to the rivers and is therefore not considered an additional resource.

Fig. 1 Study area and sample location map

Methodology

Sample collection, and physicochemical and elemental analyses

A total of 25 surface water samples were collected in Curtin Lake (Fig. 1). The water samples were collected below the water surface using 200 ml polyethylene bottles. Prior to sampling, the bottles were rinsed with the water to be sampled, and the samples were preserved by acidifying to pH ~ 2 with HNO3 and kept at a temperature of 4 °C until analysis. pH and electrical conductivity were measured in situ with a portable meter. The collected water samples were filtered using a pre-conditioned plastic Millipore filter unit equipped with a 0.45-µm filter membrane for further elemental analysis. The elements (Ca, Mg, Na, K, Al, Ba, Cu, Fe, Ga, In, Li, Mn, Ni, Pb, Rb, Se, Sr, V, U and Zn) were analyzed using an inductively coupled plasma-optical emission spectrometer (ICP-OES), Optima 5000 DV Series (Perkin Elmer), operated with the WinLab32 software. Appropriate quality control/quality assurance samples were collected to provide confidence in the data regarding bias and variability. No replicates were analyzed for these samples. An equipment blank consisting of distilled water was used to test for bias from possible contamination and to verify that decontamination procedures and laboratory protocols were adequate (Koterba et al. 1995).

Pollution evaluation indices

Generally, pollution indices are estimated for a specific use of the water under consideration. The indices used in this study, namely heavy metal pollution index (HPI), heavy metal evaluation index (HEI) and degree of contamination (C d), are determined for the purpose of evaluating drinking and agriculture water quality. The HPI and HEI methods provide an overall quality of the water with regard to heavy metals. These methods are evaluated using the ratios of monitored values of the desired number of parameters and the maximum admissible concentrations of the respective parameters. In the C d method, the quality of water is evaluated by computation of the extent of contamination. The C d is calculated independently for every sample of water analyzed, and is computed as the sum of the contamination factors of each component exceeding the upper permissible limit. Therefore, the C d summarizes the combined effects of a number of quality parameters regarded as unsafe to household water.

Heavy metal pollution index

The HPI was developed by assigning a rating or weightage (W i ) to each chosen parameter. The rating is an arbitrary value between 0 and 1, and its selection reflects the relative importance of individual quality considerations. It can be defined as inversely proportional to the standard permissible value (S i ) for each parameter (Horton 1965; Mohan et al. 1996; Reddy 1995). In the present study, the concentration limits, i.e., the standard permissible value (S i ) and the highest desirable value (I i ) for each parameter, were taken from the WHO standard. The uppermost permissible value for drinking water (S i ) refers to the maximum allowable concentration in drinking water in the absence of any alternate water source. The desirable maximum value (I i ) indicates the standard limits for the same parameters in drinking water (Bhuiyan et al. 2010).

The HPI, assigning a rating or weightage (W i ) for each selected parameter, is determined using the expression below (Mohan et al. 1996):

$$ {\text{HPI}} = \frac{{\sum\nolimits_{i = 1}^{n} {\;W_{i} } \,Q_{i} }}{{\sum\nolimits_{i = 1}^{n} {\;W_{i} } }} $$

where Q i and W i are the sub-index and unit weight of the ith parameter, respectively, and n is the number of parameters considered. The sub-index (Q i ) is calculated by

$$ Q_{i} = \frac{{\left| {M_{i} - I_{i} } \right|}}{{S_{i} - I_{i} }} \times 100 $$

where M i , I i and S i are the monitored, ideal and standard values of the ith heavy metal parameter, respectively. The vertical bars indicate the numerical difference of the two values, ignoring the algebraic sign.
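
The two expressions above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it reads the sub-index as a per-parameter quantity built on the numerical (absolute) difference, and it assumes W i = 1/S i with a proportionality constant of 1 (any constant cancels in the weighted mean). The function names and the example concentrations are hypothetical.

```python
def sub_index(monitored, standard, ideal):
    """Sub-index Q_i = |M_i - I_i| / (S_i - I_i) * 100 for one metal."""
    return abs(monitored - ideal) / (standard - ideal) * 100.0


def heavy_metal_pollution_index(metals):
    """HPI = sum(W_i * Q_i) / sum(W_i), with W_i assumed to be 1 / S_i.

    `metals` maps a metal name to a (M_i, S_i, I_i) tuple, all in the
    same concentration unit.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for monitored, standard, ideal in metals.values():
        weight = 1.0 / standard  # assumption: proportionality constant of 1
        weighted_sum += weight * sub_index(monitored, standard, ideal)
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical values for illustration only (not measured data or WHO limits).
example = {
    "metal_a": (5.0, 10.0, 0.0),
    "metal_b": (0.5, 2.0, 0.0),
}
print(round(heavy_metal_pollution_index(example), 2))
```

Because the weights are inversely proportional to S i , metals with stricter limits dominate the index, which is the intended behaviour of the weighting.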

Heavy metal evaluation index

HEI gives an overall quality of the water with respect to heavy metals (Edet and Offiong 2002) and is expressed as:

$$ {\text{HEI}} = \sum\limits_{i = 1}^{n} {\frac{{H_{\text{c}} }}{{H_{\text{mac}} }}} $$

where H c and H mac are the monitored value and maximum admissible concentration (MAC) of the ith parameter, respectively.
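
As a minimal sketch of this sum, assuming the monitored values and MACs are supplied per metal in the same unit, the HEI could be computed as below. The function name and the second metal are hypothetical; only the Fe MAC of 200 µg/l is taken from this paper.

```python
def heavy_metal_evaluation_index(monitored, mac):
    """HEI = sum over metals of H_c / H_mac.

    `monitored` and `mac` map metal names to concentrations in the same unit.
    """
    return sum(monitored[metal] / mac[metal] for metal in monitored)


# Illustrative input: the Fe MAC (200 ug/l) is quoted in the text;
# "metal_b" and all monitored values are hypothetical placeholders.
mac_values = {"Fe": 200.0, "metal_b": 50.0}
sample = {"Fe": 400.0, "metal_b": 25.0}
print(heavy_metal_evaluation_index(sample, mac_values))
```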

Degree of contamination (C d)

The contamination index (C d) summarizes the combined effects of several quality parameters considered harmful to domestic water (Backman et al. 1997) and is calculated as follows:

$$ C_{\text{d}} = \sum\limits_{i = 1}^{n} {C_{{{\text{f}}_{i} }} } $$

where

$$ C_{{{\text{f}}_{i} }} \; = \;\frac{{C_{{{\text{A}}_{i} }} }}{{C_{{{\text{N}}_{i} }} }}\, - 1 $$

where \( C_{{{\text{f}}_{i} }} , \) \( C_{{{\text{A}}_{i} }} \) and \( C_{{{\text{N}}_{i} }} \) represent contamination factor, analytical value and upper permissible concentration of the ith component, respectively. N denotes the ‘normative value’ and \( C_{{{\text{N}}_{i} }} \) is taken as MAC.
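
The contamination factor and its sum can be sketched as follows. The equation as written sums over all components, whereas the description of the method earlier in this section mentions only components exceeding the upper permissible limit; the hypothetical `only_exceeding` flag exposes both readings, and which one was applied to the data is not stated here. Function names and input values are illustrative assumptions.

```python
def contamination_factor(analytical, normative):
    """C_f = C_A / C_N - 1 for one component."""
    return analytical / normative - 1.0


def degree_of_contamination(analytical, normative, only_exceeding=False):
    """C_d = sum of contamination factors.

    With only_exceeding=True, components at or below their normative
    value (C_f <= 0) are left out of the sum.
    """
    factors = [
        contamination_factor(analytical[name], normative[name])
        for name in analytical
    ]
    if only_exceeding:
        factors = [factor for factor in factors if factor > 0.0]
    return sum(factors)


# Hypothetical values for illustration only.
limits = {"metal_a": 200.0, "metal_b": 50.0}
sample = {"metal_a": 400.0, "metal_b": 25.0}
print(degree_of_contamination(sample, limits))
print(degree_of_contamination(sample, limits, only_exceeding=True))
```

When every component is included and the same MAC values are used, C d equals HEI minus the number of components, since each term differs from the HEI term by exactly 1.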

Statistical analysis

The analytical data were subjected to statistical analysis using SPSS software (version 9.0 for Windows). Principal component analysis was used to identify the possible sources of heavy metals. Factor analysis was performed by varimax rotation (Howitt and Cramer 2005), which minimized the number of variables with a high loading on each component, thus facilitating the interpretation of PCA results. Cluster analysis was applied to identify groups of samples with similar heavy metal contents (Panda et al. 2006). CA was formulated according to the Ward-algorithmic method, and the rescaled linkage distance was employed for measuring the distance between clusters of similar metal contents. R-mode CA was used to determine the association of different water quality parameters and pollutant sources. Pearson’s correlation matrix was also used to identify the elements’ relationship.

Results and discussion

Water quality and classification

The physicochemical parameters and total metal concentrations of the surface water are shown in Table 1. The electrical conductivity (EC) varied from 330 to 470 µS/cm with a mean of 378.80 µS/cm. The pH ranged from 6.46 to 8.27, with a mean of 6.74 and a standard deviation (SD) of 0.40. The mean metal levels in surface water followed the descending order Fe > Al > Ba > Sr > U > Rb > Mn > Se > Li > V > Zn > In > Pb > Cu > Ni > Ga. The descriptive statistics, including the maximum admissible concentration (MAC) and world standards, are given in Table 2. The concentrations of Cu, Mn, Ni and Zn are below the MAC values. The concentration of Fe (1607.8–1946.83 µg/l) is higher than the MAC of 200 µg/l in all the samples; 40 % of the samples show Pb in excess of 1.50 µg/l and 28 % show Se in excess of 10 µg/l. The method of Ficklin et al. (1992), as modified by Caboi et al. (1999), was employed for water classification. Figure 2 shows the relationship between total metal content (mg/l) and pH for the samples. The metal load was computed as Al + Ba + Cu + Fe + Ga + In + Li + Mn + Ni + Pb + Rb + Se + Sr + V + U + Zn (mg/l), and all the samples plot in the near-neutral, high-metal field.

Table 1 Chemical composition of surface water collected in the lake (all values in µg/l except EC in µS/cm and pH)
Table 2 Descriptive statistics for elements
Fig. 2 Classification of water samples based on the plot of metal load and pH

Shuhaimi-Othman et al. (2008) reported that the mean metal concentrations in the surface water of Chini Lake, Peninsular Malaysia, were low and within the range of natural background values except for Fe and Al. Aqeel Ashraf et al. (2010) also reported a high nutrient load and high concentrations of metals, especially mercury, in Varsity Lake, West Malaysia. Results from this study indicate that the mean metal concentrations of Curtin Lake are high compared to Chini Lake except for Pb and Zn (Table 3). The concentration of Se is also higher in Curtin Lake than in other lakes worldwide (e.g., Masresha Alemayehu et al. 2011; Markert et al. 1997; Singanan et al. 2008) except for Manchar Lake, Pakistan (Kazi et al. 2009).

Table 3 Comparison of physico-chemical parameters and heavy metals in lake water samples of the present study with other areas around the world and the Malaysian region

Pollution evaluation indices

The results of the pollution evaluation indices are presented in Table 4. The heavy metal pollution index (HPI) was calculated individually for each sampling point using the international standards (Edet and Offiong 2002). The HPI values ranged from 2.86 to 7.92, with a mean of 4.84. The HPI values of all the samples were below the critical limit of 100 proposed for drinking water by Prasad and Bose (2001). The HPI calculated from the mean concentrations of all metals across all sampling points is 4.84, which is also well below the critical limit of 100.

Table 4 Water pollution indices

The degree of contamination (C d) was used as a reference for estimating the extent of metal pollution (Al-Ami et al. 1987). The C d values ranged from 2.57 to 6.11, with a mean of 4.01. C d may be classified into three categories (Edet and Offiong 2002; Backman et al. 1997) as follows: low (C d < 1), medium (C d = 1–3) and high (C d > 3). All the samples exceed 3, suggesting that they are highly polluted. In contrast, the HPI values for all the locations are lower than 100, the critical value for drinking water (Prasad and Bose 2001).

The heavy metal evaluation index (Edet and Offiong 2002) was used for a better understanding of the pollution indices. The HEI values ranged from 8.57 to 12.11 with a mean value of 10.01. The mean deviations and percentage deviations for all the indices were computed for each sampling point (Table 4); 48 % of the samples (S6, S8–S15, S21, S24 and S25) fell below the mean value of HEI. Interestingly, the C d and HPI values of these same samples were also below the respective mean values of those indices. Values of C d and HPI that fall below their respective means, with correspondingly negative percent deviations, suggest relatively better quality, as observed by Prasad and Bose (2001) and Edet and Offiong (2002).

C d, HPI and HEI values show similar trends at various sampling points (Fig. 3) and also significant correlations are observed among the values (Table 5). However, there are some differences between the results of C d and HPI regarding the water quality of the analyzed samples. Therefore, HEI has been used to synchronize the criteria for various pollution indices. By following the approach of Edet and Offiong (2002), the calculated HEI values have been classified in terms of pollution levels as low, medium and high. Different HEI criteria values have been developed for the samples, guided by their respective mean values, and the different levels of contamination are demarcated by a multiple of the mean values. Therefore, the proposed HEI criteria for the samples are as follows: low (HEI < 10), medium (HEI = 10–20) and high (>20). The present level shows that 48 % of samples are within the low zone, while 52 % fall within the medium zone.
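
The threshold schemes described in this section can be sketched as a small lookup. This is only an illustration: the limits are those quoted in the text (C d: 1 and 3; proposed HEI criteria: 10 and 20), while the function name and the treatment of values lying exactly on a boundary (assigned to the medium class here) are assumptions.

```python
# Class limits quoted in the text: (upper bound of "low", upper bound of "medium").
CD_LIMITS = (1.0, 3.0)
HEI_LIMITS = (10.0, 20.0)


def classify(value, limits):
    """Return 'low', 'medium' or 'high' for an index value.

    Assumption: values exactly on a limit are placed in the medium class.
    """
    low_limit, high_limit = limits
    if value < low_limit:
        return "low"
    if value <= high_limit:
        return "medium"
    return "high"


# Reported extremes and means of the indices in this study.
print(classify(8.57, HEI_LIMITS), classify(12.11, HEI_LIMITS))
print(classify(4.01, CD_LIMITS))
```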

Fig. 3 Spatial distribution of pollution evaluation indices

Table 5 Correlation coefficients for metal concentrations and indices values

The existing water quality schemes for HPI and C d have also been modified following the mean approach of HEI, and the results are presented in Table 6. Thus for the C d, 48 and 52 % of the samples, respectively, are classed as low and medium zones. For the HPI, 56 and 44 % are classified as low and medium contamination.

Table 6 Classification of surface water quality of the lake on modified categories of pollution indices

To examine the contribution of the key metals to the computed indices, correlations were calculated between the indices (C d, HPI and HEI) and the heavy metal concentrations. Fe, Pb, Li and U show significant correlations with all the indices, suggesting that these metals are the major contributory parameters (Table 5), although Li and U are not included in the index calculations owing to the lack of standard values. Significant correlations are also observed among the values of HPI, HEI and C d.

The HEI and the reclassification schemes of HPI and C d show comparable results, and the HEI method may be used as a simple criterion for assessing the quality of water in the lake. Thus, samples S6, S8–S15, S21, S24 and S25 may be considered less contaminated, whereas S1–S5, S7, S16–S20, S22 and S23 are moderately contaminated by metal pollutants.

Pollution source identification

The principal component analysis was used to further explore the extent of metal pollution and for source identification (Dragovíc et al. 2008; Franco-Uría et al. 2009). Varimax rotation method (Gotelli and Ellison 2004) was used to maximize the sum of the variance of the factor coefficients, which better explained the possible groups/sources that influenced the water system.

Five factors with eigenvalues >1 were extracted for the surface water data set. The calculated factor loadings, together with the cumulative percentage and the percentage of variance explained by each factor, are listed in Table 7. These factors reduce the initial dimensionality of the data set and together explain about 76 % of the total variance. The calculated factor scores for all the samples are presented in Table 7. Positive scores in PCA indicate that water samples are affected by the presence of the parameters that are significantly loaded on a specific factor, whereas negative scores suggest that water quality is essentially unaffected by those parameters (Prasanna et al. 2010).

Table 7 Varimax rotated principal component analysis for the samples

PC1, PC2, PC3, PC4 and PC5 explain about 21, 18, 17, 10 and 8 % of the total variance, respectively. PC1 is highly loaded on EC, pH, Ba, Ga, In and Li, which are mostly distributed in S1, S2, S4, S8, S9 and S12. PC1 is interpreted as leaching of materials from the soil surface or from the lake sediments. PC2 is loaded on Al, Fe, Mn, Li and U and could represent a geogenic source component. Fe and Mn could be released by leaching of parent materials from the soil horizon to the water. The solubility of Fe and Mn minerals is strongly redox controlled, particularly at near-neutral pH (Lorite-Herrera et al. 2008). The dissolved Al concentration in surface water is controlled by the solubility of primary silicate weathering products such as kaolinite or illite (Prasanna et al. 2010). These parameters are important in S1, S5, S15, S16, S18, S20 and S21, since these samples have high scores for PC2 (Table 7). PC3 is mostly contributed by Cu, Ni, Se and Zn, representing atmospheric input to the lake water originating from the surrounding industrialized regions (Steinnes and Henriksen 1993; Hanssen et al. 1980); these metals are significantly distributed in S1, S3, S13 and S14. PC4 is loaded on Pb only, which is an important parameter in S2–S6, S16 and S23. Pb occurs naturally in the environment through geochemical alteration of sulfide minerals. However, Pb is also commonly introduced by anthropogenic activities such as automobile exhaust (Bhuiyan et al. 2010). PC5 is loaded on Sr and could be ascribed to geochemical alteration/weathering of sulfate minerals present in the sediment horizon. PC5 shows high scores for S4, S5, S7, S9, S20 and S21, which suggests that Sr is an important indicator in these samples.

R-mode cluster analysis was also performed to understand the physicochemical and elemental groupings in the data set and the results are presented in Fig. 4. Parameters belonging to the same cluster are likely to have originated from a common source. The R-mode CA performed on the samples produced four clusters. Cluster 1 includes pH, EC, Ba, Li, Ga and In; cluster 2 consists of Cu, Zn, Ni and Se; cluster 3 contains Fe, Mn and U; cluster 4 includes Pb, Sr, Al, V and Rb. It reflects the influence of natural hydrogeochemical processes (leaching of materials from the soils) and minor anthropogenic input. Even though there are some differences between the CA and PCA results, a good agreement between the two statistical techniques is evident in all the data sets analyzed.

Fig. 4 Dendrogram obtained by hierarchical clustering analysis for parameters

Correlation matrix (CM)

The Pearson’s correlation coefficient matrix for the analyzed parameters is presented in Table 8. The inter-parameter relationships support the results obtained from PCA, and the CM has also been useful in revealing some associations between metals that were not clearly expressed in the PCA. Several significant correlations are observed in the samples. pH correlates significantly with Ba (r = 0.94), Cu (r = 0.70), Ga (r = 0.63), Li (r = 0.81) and Ni (r = 0.60). EC also shows significant correlations with Ba (r = 0.52) and Li (r = 0.66). These results are similar to those for PC1 in the previous section. The metal pairs Ba–Li, Ba–Cu, Ba–Ga and Ba–Ni correlate significantly, with correlation coefficients (r) of 0.74, 0.60, 0.54 and 0.50, respectively, indicating a source similar to that of PC1. Fe correlates with Li (r = 0.62) and Mn (r = 0.69), consistent with the sources reported for PC2 in the PCA of the samples. Cu correlates significantly with Ni (r = 0.60), Zn (r = 0.76) and Se (r = 0.51), matching PC3 described in the previous section.
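
For readers who want to reproduce this kind of matrix, a minimal pure-Python sketch of Pearson's r is given below. It is not the SPSS procedure used in the study; the function names and the input series are hypothetical, and a constant column (zero variance) would need separate handling.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise r values for a dict of named columns."""
    names = list(columns)
    return {
        a: {b: pearson_r(columns[a], columns[b]) for b in names}
        for a in names
    }


# Hypothetical series for illustration only (not the measured data).
data = {
    "param_a": [1.0, 2.0, 3.0, 4.0],
    "param_b": [2.0, 4.0, 6.0, 8.0],
    "param_c": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(data)
print(matrix["param_a"]["param_b"], matrix["param_a"]["param_c"])
```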

Table 8 Correlation matrix between elements

Irrigation water quality

The suitability of water for irrigation depends on the effects of its mineral constituents on both the plant and the soil. An excess of dissolved ions in irrigation water affects plants and agricultural soil physically and chemically, thus reducing productivity (Bahar and Reza 2010). Electrical conductivity is a good measure of salinity hazard and reflects the TDS in water. According to the irrigation-based EC classification of Ragunath (1987), all the samples fall in the range of 250–750 µS/cm, which corresponds to the good category and indicates that the water can be used for irrigation. The US Salinity Laboratory (USSL) also proposed a plot for ranking irrigation water, in which the sodium adsorption ratio (SAR) is plotted against specific conductance. SAR was calculated from the relation \( {\text{SAR}} = {\text{Na}}^{ + } /\sqrt {({\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } )/2} \), with concentrations expressed in meq/l. Sodium and salinity are the two important parameters that indicate the suitability of water for irrigation purposes. In the USSL plot (Fig. 5), all the samples fall in the C2S1 zone, indicating medium salinity and low sodium hazard. Therefore, these lake waters can be used for irrigation on almost all soils, with little risk of developing harmful levels of exchangeable sodium (Hem 1985).
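
A short sketch of the SAR calculation and the EC-based salinity classes follows. It assumes the standard definition of SAR with ion concentrations in meq/l and the usual USSL salinity class limits of 250, 750 and 2250 µS/cm; function names, the handling of boundary values and the SAR input values are illustrative assumptions rather than details taken from this study.

```python
import math

# Equivalent weights in mg per meq (atomic mass divided by ionic charge).
EQUIVALENT_WEIGHT = {"Na": 22.99, "Ca": 40.08 / 2, "Mg": 24.305 / 2}


def mg_l_to_meq_l(concentration_mg_l, ion):
    """Convert a concentration from mg/l to meq/l."""
    return concentration_mg_l / EQUIVALENT_WEIGHT[ion]


def sodium_adsorption_ratio(na, ca, mg):
    """SAR = Na / sqrt((Ca + Mg) / 2), all inputs in meq/l."""
    return na / math.sqrt((ca + mg) / 2.0)


def salinity_class(ec_us_cm):
    """USSL salinity hazard class from EC in uS/cm (boundaries assumed inclusive above)."""
    if ec_us_cm < 250.0:
        return "C1"
    if ec_us_cm <= 750.0:
        return "C2"
    if ec_us_cm <= 2250.0:
        return "C3"
    return "C4"


# Hypothetical ion concentrations (meq/l); the EC value is the reported mean.
print(sodium_adsorption_ratio(na=2.0, ca=3.0, mg=1.0))
print(salinity_class(378.80))
```

The reported EC range of 330–470 µS/cm falls entirely within the C2 class under these limits, in line with the C2S1 classification stated above.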

Fig. 5 USSL diagram

Conclusion

Pollution evaluation indices, principal component analysis, cluster analysis and the correlation matrix (CM) have been used to assess the intensity and sources of pollution in Curtin Lake. The concentrations of Fe in all samples are higher than the MAC, while 40 and 28 % of the samples show Pb and Se concentrations above the MAC, respectively. The surface water of this lake is characterized as near-neutral high metal. C d suggests that all samples are highly polluted (C d > 3), whereas HPI indicates that all the samples are within the critical limit (HPI < 100). A better water quality classification of the samples is attained by using the heavy metal evaluation index. The HEI criteria assigned 48 % (HEI < 10), 52 % (HEI = 10–20) and 0 % (HEI > 20) of the samples to the categories of low, medium and high pollution, respectively. Comparable results to HEI are obtained when the existing HPI and C d schemes are reclassified using a multiple of the mean, as in the case of HEI.

Principal component analysis, supported by cluster analysis, identified natural/geogenic sources (weathering and leaching of parent materials) and anthropogenic impact (from non-point sources) as responsible for controlling the variability of physicochemical parameters and metal contents in the lake water. The CA and PCA results are in good agreement across the data sets analyzed.

Water quality analysis clearly shows that the elements (e.g., Fe, Pb and Se) released by natural hydrogeochemical processes, with a minor contribution from anthropogenic activities, have a high potential for contaminating the lake water. Based on the USSL plot and the EC classification, the lake water is suitable for irrigation purposes. The contamination of the lake water by some heavy metals poses a serious threat to the ecological habitat and needs attention. This work provides background information on toxic metals and their possible sources in the surface water of Curtin Lake. It also highlights the importance of an integrated approach combining pollution evaluation indices and multivariate statistical methods in pollution studies of surface water systems.