Keywords

Highlights

  • Multiple Linear Regression produces a good prediction of missing data.

  • HACA classifies regions based on homogeneity characteristics.

  • Discriminant Analysis produces significant results in discriminating river regions.

  • PCA was used to analyse the loading factors contributing to water pollution.

Introduction

The Kuantan area has been undergoing urbanization and rapid development, which adversely affect the water quality of the Kuantan river basin (Rizwan 2008). Unsustainable development disrupts the hydrological stability of the area. The pollution sources are related to human activities, which have caused substantial changes to the assemblages and biodiversity of the river fauna (Hellawell 1986; Metcalfe 1989; Wright et al. 1993; Pinel-Alloul et al. 1996; Nedeau et al. 2003). Applying chemometrics (also known as multivariate analysis) to water quality analysis is straightforward and can produce significant results (Mazlum et al. 1999). Chemometric techniques are capable of classifying the level of water quality by region, and are helpful in decision making and problem solving with regard to local environmental issues (Juahir et al. 2010).

Materials and Methods

The Kuantan river basin is located in the north-eastern part of Pahang, spreading across the capital city of Pahang (Ishak et al. 2008). The basin lies at coordinates N3°12'27.66" and E103°07'39.99" and covers the water supply for the town, which has a population of 607,778. The basin is about 86 km long and has a total area of 1638 sq. km. The secondary data used in this study were supplied by the Department of Environment, Malaysia (DOE). Hydrological data from 2003 to 2008 were obtained from the DOE, while land-use data were retrieved from the Kuantan Municipal Council and the Technical Report of Kuantan Town Planning from 2003 to 2015. Four chemometric techniques, namely multiple linear regression (MLR), hierarchical agglomerative cluster analysis (HACA), discriminant analysis (DA) and principal component analysis (PCA), were used in this study to identify the sources of pollution in the river and the water quality criteria.

Results and Discussion

MLR is an efficient technique for predicting missing data, so that the collected data remain usable for further analysis. The land-use parameters include the distribution of population, population alternatives stretch, industrial area, number of workers in the industry, the ratio between industrial area and the number of workers, and farming and livestock areas.
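The idea of filling a gap with an MLR prediction can be sketched as follows. This is only an illustrative sketch, not the procedure used in the study: the predictor and response values are made up, the function names (`fit_mlr`, `impute_missing`) are hypothetical, and the fit is an ordinary least-squares solution of the normal equations on the rows where the response is available.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Return [intercept, b1, ..., bk] from ordinary least squares."""
    design = [[1.0] + list(row) for row in X]
    k = len(design[0])
    # Normal equations: (D^T D) beta = D^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def impute_missing(X, y):
    """Fill None entries of y with MLR predictions fitted on complete rows."""
    complete = [(row, yi) for row, yi in zip(X, y) if yi is not None]
    beta = fit_mlr([r for r, _ in complete], [yi for _, yi in complete])
    filled = []
    for row, yi in zip(X, y):
        if yi is None:
            yi = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
        filled.append(yi)
    return filled


# Made-up illustrative values: two predictors and a response with one gap.
# The known responses follow y = 1 + 2*x1 + 3*x2 exactly.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [9.0, 8.0, 19.0, 18.0, None]
filled = impute_missing(X, y)
```

Observed values are left untouched; only the missing entry is replaced by the regression prediction.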

The HACA results show that seven stations (4KN06, 4KN08, 4KN09, 4KN10, 4KN11, 4KN12 and 4KN15) were classified as Low Polluted Stations (LPS). Six monitoring stations (4KN01, 4KN02, 4KN05, 4KN07, 4KN13 and 4KN14) were classified as Moderate Polluted Stations (MPS), and two stations (4KN03 and 4KN04) were classified as High Polluted Stations (HPS). Based on the Water Quality Index (which follows the Interim National Water Quality Standard), the score for areas labelled as LPS was 31–69, the score for MPS was 70 to 91, and the score for HPS was 92.
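The grouping step can be illustrated with a small agglomerative clustering sketch. Several things here are assumptions, not details from the study: the station names and values are made up, the distance is Euclidean, and the linkage rule is average linkage (the linkage actually used is not stated above). The sketch repeatedly merges the two closest clusters until three groups remain.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical station profiles: two made-up, already-scaled parameters each.
stations = {
    "S1": (0.10, 0.20), "S2": (0.15, 0.25), "S3": (0.12, 0.18),
    "S4": (0.50, 0.55), "S5": (0.55, 0.50),
    "S6": (0.90, 0.95), "S7": (0.95, 0.90),
}
names = list(stations)
points = [stations[n] for n in names]
groups = sorted(sorted(names[i] for i in c) for c in agglomerate(points, 3))
```

With real monitoring data the parameters would normally be standardised first, so that no single variable dominates the distance.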

DA was applied to the three main groups obtained by HACA. Standard, forward stepwise and backward stepwise modes were tested. The accuracy of spatial classification was 83.61 % for each of the standard (13 variables), forward stepwise (6 variables) and backward stepwise (6 variables) modes. In the forward and backward stepwise modes, six parameters were identified as the most significant variables in discriminating the river regions: DO, E. coli, pH, PO4, COD, and Cl.

PCA was applied to the datasets to identify the most important parameters influencing the identified regions of the study area. The Spearman Correlation Test was used to identify the type of development that most affects the water quality in the Kuantan river basin. The test showed that population distribution, population alternative slope (PAL), farming, livestock, industrial area, the number of workers in industry, and the ratio of industrial area to the number of workers (AI/AW) affect dissolved oxygen (DO) at the 90 % confidence level.
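A Spearman rank correlation between one land-use variable and DO can be computed as in the sketch below. The numbers are made up for illustration and are not the study's data; the function names are hypothetical. The coefficient is obtained as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustrative values for six hypothetical stations.
industrial_area = [1.0, 2.5, 3.0, 4.2, 5.1, 6.8]
dissolved_oxygen = [7.9, 7.1, 7.4, 6.0, 5.5, 4.9]
rho = spearman(industrial_area, dissolved_oxygen)
```

A strongly negative coefficient in such a comparison would indicate that DO tends to fall as the land-use variable rises; whether it is significant at a given confidence level still has to be tested separately.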

Conclusion

Chemometric techniques were adopted to investigate the spatial variability of water quality in the Kuantan river basin. HACA successfully classified the Kuantan River basin into three regions based on their homogeneous characteristics, known as LPS, MPS, and HPS. These groups were then used to identify the most significant parameters. The most significant parameters in discriminating the river regions based on DA methods were DO, E. coli, pH, PO4, COD, and Cl, with the accuracy of the spatial classification at 83.61 %. After the varimax rotation of the PCA method, ten parameters contributing to the variations in surface water quality along the river basin were identified. The correlation test at the 90 % confidence level showed that population distribution, alternative population slope, farming, livestock, industrial area, number of workers in the industry, and area of industry over number of workers in the industry were land use activities that affect water quality, based on t-test analysis. This study has illustrated the use of chemometric techniques for analysing and interpreting complex water quality data. The methods have been used successfully to determine the most polluted locations and to identify land use activities that contribute to the deteriorating water quality in the Kuantan river basin.