Introduction

The quantity and quality of water resources have been affected by accelerated land use change around the world. Rapid urbanisation and industrial and agricultural development have been associated with the deterioration of aquatic and terrestrial ecosystems in the watersheds of developing countries. Numerous studies have shown a significant correlation between land use planning activities and water quality in various water bodies (Li et al. 2008; Buck et al. 2004; Baker 2003; Tong and Chen 2002). In general, higher concentrations of water pollutants are linked to a higher percentage of land use change resulting from land development in watersheds, while good water quality is mainly found in natural forest and undeveloped areas. However, these associations are not constant across different zones because the anthropogenic and natural factors of watersheds, such as the physical landscape, economic development activities, sources of pollution, and regulation policies, often vary over space (Tu 2011a). As such, traditional statistical approaches such as Pearson correlation and ordinary least squares (OLS) regression, which assume that relationships are stationary over space, are not reliable for analysing spatial relationships among geographical data. Relationships may vary spatially because the characteristics of pollutants and their sources within a watershed may not be the same in different locations. Therefore, the relationships between one type of land use and different water pollutants are inconsistent, and different types of land use are not correlated with the same water pollution issues (Tu 2011b). To address these problems, a simple yet more powerful statistical method known as geographically weighted regression (GWR) has been developed in recent years to explore the continuously varying interactions between land use and water quality across space (Brunsdon et al. 2002) and time.

GWR provides a useful technique for exploring the spatially varying associations between land use and water pollutants across different regions. It has been applied to analyse variations in spatial relationships between various phenomena, including natural and anthropogenic factors (Mohammadi et al. 2019; Nkeki and Asikhia 2019; Wang et al. 2017; Huang et al. 2015; Yu et al. 2013; Tu and Xia 2008). When global models such as OLS are used for these analyses, some local relationships are likely to be masked. In a spatial analysis, each monitoring site is assumed to have a distinct relationship model between land use factors and water pollutants. By employing GWR to analyse the interactions between the land use and water quality variables, each land use-water quality pair is expected to exhibit a more or less distinct pattern of relationship across the study region. GWR can present the regression coefficient values and a spatially varying relationship strength for each variable pair on a map. GWR accounts for spatial differences by letting the parameters of the regression model vary in space. Local estimates of the model parameters are obtained by weighting all neighbouring observations using a distance decay function, on the assumption that nearby observations have a greater effect on the regression point than those far away. Like OLS, GWR constructs a model to analyse how points may differ (Tu 2011b). Local model parameters are commonly mapped using visualisation tools such as GIS, which allows local spatial variation in regression outputs to be studied (Tu 2011b). Subsequently, global Moran’s I is computed on the residuals of the OLS and GWR models to evaluate the performance of these models with respect to spatial autocorrelation. Moran’s I is generally used as an indicator of spatial autocorrelation and has been used in previous studies to test the significance of spatial autocorrelation in regression models, to ensure that the assumptions of randomness and independence of residuals are not violated (Buyong 2006; Tu and Xia 2008; Clark 2007; Harries 2006).

However, most previous studies examined the relationship between land use and water pollution in terms of total areas rather than densities (Huang et al. 2015; Chen et al. 2016; Erfanian and Alijanpour 2016). For example, they investigated the concentration of pollutants based on the percentage of a land use class in the sub-basin area, but it would also be relevant to evaluate pollutants in relation to the land use classes within the pollution potential zone (contributing area) across the study region. In addition, it is useful to see how these relationships vary over time, because the major concern could be peak pollution at particular times rather than regular pollution over time. Thus, frequently sampled data for three different periods were used in this study to evaluate the performance and predictive capability of the GWR model in examining how the relations between water quality and land use variables vary with time and space.

Material and methods

Study area

The study area of this research is the Selangor River watershed, located in the State of Selangor, Malaysia. The Selangor River is the main river in the State of Selangor, and its watershed has an area of about 2200 km², nearly a quarter of the total area of the State of Selangor (Chowdhury et al. 2018). The river flows in a southwest direction over a total distance of 110 km before discharging into the Strait of Malacca. The watershed is the largest water source for the States of Kuala Lumpur and Selangor, and about 60% of the water consumed in these States comes from the Selangor River (Sakai et al. 2017). The main tributaries include Sungai Rawang, Sungai Buloh, Sungai Batang Kali, Sungai Serendah, Sungai Kundang, Sungai Kerling, and Sungai Sembah. The watershed is approximately 70 km long by 30 km wide and covers almost 28% of the State of Selangor, where about 406,000 people lived in 2006 (Fulazzaky et al. 2010). Approximately 57% of the watershed is still covered by natural forests, while agricultural activities use 22%, development areas use 17%, and water occupies 4% (Kusin et al. 2016). The land use maps of the Selangor River watershed are shown in Fig. 1.

Fig. 1

Location of the study area along with sampling stations (a) and land use maps of the Selangor River Basin (b) from DOA (2019)

Land use and water quality data

To consistently and adequately analyse the interactions between land use and water quality pollutants over time and space, we used land use data for 2006, 2010, and 2015 and frequently sampled water quality data for the corresponding years, from two different sources. First, the land use maps were obtained from the Department of Agriculture (DOA), Malaysia. An interview with the technicians of this department indicated that the land use maps were derived from high-resolution satellite imagery (SPOT 2, 4, and 5). All pre-processing and processing of the satellite imagery, including radiometric and geometric corrections, were performed at DOA in Malaysia. The images were spatially registered, corrected, and classified into several land use classes using field ground control points and an appropriate classification algorithm. In this study, the land use attributes were aggregated based on level I of the Anderson (1976) classification scheme. Second, the water quality data used in this study were acquired from nine monitoring stations as part of the river water quality monitoring programme of the Department of Environment (DOE) of Malaysia in 2006, 2010, and 2015. Since 2000, the DOE has regularly (every 2 months) monitored the water quality at these stations within the Selangor River basin. This dataset includes a set of water quality indicators such as chemical oxygen demand (COD), dissolved oxygen (DO), biochemical oxygen demand (BOD), suspended solids (SS), ammonia nitrogen (NH3-N), temperature (TEMP), and potential of hydrogen (pH) for monitoring sites that cover the lower, middle, and upper reaches of the basin (Fig. 1).

In this study, the values at the 12 LUAS points are estimates of unknown values, obtained by kriging from the values measured at the nine 1SR sampled points. The GWR and OLS models were built using water quality pollutants as dependent variables and land use classes as explanatory variables. To avoid possible multicollinearity among land use indicators, each GWR and OLS model used only one land use class as the independent variable to analyse its association with each water quality pollutant (Fig. 2). There were four classes of land use and seven water quality pollutants for three different years. As a result, 84 OLS models and 84 GWR models were built in this study (4 × 7 × 3 = 84). Given that land units located far from the river cannot generate pollution potential for bodies of surface water (Sivertun and Prange 2003), pollutants generated at a distance greater than 1000 m cannot reach the river or influence its water quality (Sivertun and Prange 2003; Do et al. 2011; Alilou et al. 2018). Therefore, a 1000-m buffer radius around each monitoring station was used to estimate the land use area and its relationship to water quality at each monitoring site. Through this step, a linkage between each water quality variable and the land use indicators was established for all sampling points. Apart from the three main land use categories (forest, agricultural, and urban lands), the exploratory regression indicated that the other land use classes do not individually contribute much to the prediction of water quality in the study area. Therefore, these land use classes were grouped into one category (referred to as others) for further modelling.
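The model count stated above can be checked with a short enumeration. The following is a minimal sketch, not the authors' workflow: the class and variable labels are taken from the text, while the function name and dictionary layout are illustrative assumptions.

```python
from itertools import product

# Labels taken from the text; the data structure itself is hypothetical.
LAND_USE_CLASSES = ["forest", "agricultural", "urban", "others"]
WATER_QUALITY_VARIABLES = ["COD", "DO", "BOD", "SS", "NH3-N", "TEMP", "pH"]
YEARS = [2006, 2010, 2015]


def enumerate_models():
    """List every single-predictor model: one land use class against
    one water quality variable, for one year."""
    return [
        {"year": year, "land_use": land_use, "pollutant": pollutant}
        for year, land_use, pollutant in product(
            YEARS, LAND_USE_CLASSES, WATER_QUALITY_VARIABLES
        )
    ]


models = enumerate_models()
print(len(models))  # 84 models per method (OLS or GWR)
```

Using one explanatory variable per model keeps each regression free of multicollinearity among land use classes, at the cost of fitting many separate models.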

Fig. 2

Methodology flowchart

Statistical analysis and modelling methods

Prior to modelling the spatiotemporally varying interactions between land use and water quality pollutants, all land use classes and water quality variables were tested for normality using the Shapiro-Wilk test and quantile-quantile (QQ) plots. Most land use indicators and water quality pollutants were not normally distributed. Therefore, square root and natural log transformations were applied so that the variables met the normality requirements for further analyses. Table 1, Table 2, and Table 3 show the standardised water quality values used in the OLS and GWR analyses for 2015, 2010, and 2006, respectively. Prior to these regression analyses, Pearson correlation was applied to observe the overall association between land use indicators and water quality pollutants. The spatiotemporal relationships between land use and water quality were then examined by applying both the OLS and GWR modelling methods. However, OLS is a global model that assumes a consistent relationship between variables across a given spatial region, and it is therefore unable to reveal the heterogeneity and non-stationarity of spatial relationships between geographic data. Thus, we relied on GWR in this study to examine the spatiotemporally varying interactions between land use and water quality across the study area (Eqs. (1) and (2)).
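The transformation and correlation steps described above can be sketched in plain Python. This is an illustrative sketch only: the function names and the sample values are hypothetical, and the normality test itself (Shapiro-Wilk) is omitted because it is normally taken from a statistics package rather than written by hand.

```python
import math


def sqrt_transform(values):
    """Square root transform for non-negative, right-skewed values."""
    return [math.sqrt(v) for v in values]


def log_transform(values):
    """Natural log transform; values must be strictly positive."""
    return [math.log(v) for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only (not data from this study):
# percentage of urban land in each buffer and BOD concentration (mg/L).
urban_percent = [2.0, 5.0, 12.0, 30.0, 45.0]
bod = [1.1, 1.4, 2.0, 3.5, 4.8]
r = pearson_r(sqrt_transform(urban_percent), log_transform(bod))
```

Transforming both variables before computing the correlation mirrors the order of operations in the text, where normality is addressed first and the global association is examined afterwards.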

Table 1 Standardised water quality values used in OLS and GWR analyses for 2015
Table 2 Standardised water quality values used in OLS and GWR analyses for 2010
Table 3 Standardised water quality values used in OLS and GWR analyses for 2006

In order to understand the evolution of the DOE water quality index (DOE-WQI) in relation to land use, a graphical model was developed. The local WQI used in Malaysia is based on an opinion-poll formula in which a panel of experts was consulted on the choice of parameters and the weighting of each parameter (Gazzaz et al. 2012). The six parameters chosen for the WQI are DO, COD, BOD, SS, pH, and NH3-N. The calculations are done on the sub-indices rather than on the parameters themselves. From the computed WQI, a river can be assigned to one of a number of classes, each indicating the beneficial uses to which the river can be put. This classification is based on allowable limits of designated pollution parameters. For this reason, the DOE has defined the values of the water quality variables (WQVs) and WQI indicators that determine each water quality class (DOE 2007). Details on the WQI calculation procedures are provided in the supplementary materials of this study.
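The aggregation step, in which the index is computed from sub-indices and then mapped to a class, can be sketched as follows. The weights and class boundaries below are those commonly reported for the DOE-WQI and are an assumption here, since the text does not list them; they should be checked against DOE (2007) and the supplementary materials. The sub-index equations are not reproduced, so the sub-indices (each on a 0 to 100 scale) are taken as inputs, and the function names are hypothetical.

```python
# Assumed weights, as commonly reported for the DOE-WQI (not given in the text).
WQI_WEIGHTS = {
    "DO": 0.22,
    "BOD": 0.19,
    "COD": 0.16,
    "NH3-N": 0.15,
    "SS": 0.16,
    "pH": 0.12,
}


def compute_wqi(sub_indices):
    """Weighted sum of the six sub-indices (each expected in 0..100)."""
    return sum(WQI_WEIGHTS[name] * sub_indices[name] for name in WQI_WEIGHTS)


def classify_wqi(wqi):
    """Map a WQI value to a class, using assumed class boundaries."""
    if wqi > 92.7:
        return "I"
    if wqi >= 76.5:
        return "II"
    if wqi >= 51.9:
        return "III"
    if wqi >= 31.0:
        return "IV"
    return "V"


# Hypothetical sub-index values for one station (illustration only).
example = {"DO": 80.0, "BOD": 70.0, "COD": 65.0, "NH3-N": 60.0, "SS": 75.0, "pH": 95.0}
example_class = classify_wqi(compute_wqi(example))
```

Working on sub-indices rather than raw concentrations puts all six parameters on a common scale before weighting, which is why the text stresses that the calculation is done on the sub-indices.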

Rather than estimating a single set of global parameters, the GWR model allows local parameters to be estimated at each sample location. The model is expressed as follows:

$$ {y}_j={\beta}_o\left({u}_j,{v}_j\right)+{\sum}_{i=1}^p{\beta}_i\left({u}_j,{v}_j\right){x}_{ij}+{\varepsilon}_j $$
(1)

In which yj denotes the dependent variable at location j, uj and vj are the coordinates of location j, βo(uj, vj) is the intercept at location j, βi(uj, vj) is the local parameter estimate for the explanatory variable xi at location j, p is the number of explanatory variables, and εj is the error term at location j.

GWR is carried out by weighting all observations around a sampling point using a distance decay function, on the assumption that observations nearer to the sampling point have a greater effect on the local parameter estimates for that location (Tu and Xia 2008). The weighting function can be defined using the following exponential distance decay formula:

$$ {w}_{ij}=\exp \left(-{d}_{ij}^2/{b}^2\right) $$
(2)

In which wij is the weight of observation j for observation i, dij is the distance between observations i and j, and b is the kernel bandwidth.

When the distance is greater than the kernel bandwidth, the weight rapidly approaches zero. Either a fixed or an adaptive kernel bandwidth can be selected for GWR. The GWR model generally works better with a large number of sampling sites. However, with a small number of monitoring sites within a delimited space around the sampling points (pollution potential zone), choosing a fixed kernel, which has a constant bandwidth across space, also allows GWR to provide desirable results.
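To make the local estimation in Eqs. (1) and (2) concrete, the sketch below fits a weighted simple linear regression at each location, with one explanatory variable as in this study's single-predictor models. It assumes a fixed Gaussian-type kernel with weights exp(−d²/b²); the function names, coordinates, and bandwidth are hypothetical, and this is not the ArcGIS implementation used in the study.

```python
import math


def kernel_weight(distance, bandwidth):
    """Distance decay weight w = exp(-d^2 / b^2) with a fixed bandwidth b."""
    return math.exp(-(distance ** 2) / (bandwidth ** 2))


def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares fit of y = b0 + b1 * x at one target location.

    Returns the local intercept and slope, i.e. the local parameters of
    Eq. (1) for a single explanatory variable.
    """
    w = [kernel_weight(math.dist(target, c), bandwidth) for c in coords]
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def gwr(coords, x, y, bandwidth):
    """Fit one local model per observation location."""
    return [local_fit(c, coords, x, y, bandwidth) for c in coords]


# Hypothetical station coordinates (km) and values, for illustration only.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
land_use = [1.0, 2.0, 3.0, 4.0, 5.0]
pollutant = [3.1, 4.8, 7.2, 9.1, 10.7]
local_parameters = gwr(coords, land_use, pollutant, bandwidth=1.5)
```

Because every location gets its own intercept and slope, the output can be mapped station by station, which is what allows spatial non-stationarity to be seen. A smaller bandwidth makes the estimates more local but also less stable when there are few stations.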

In this study, the OLS and GWR models were compared on the basis of their adjusted R² and corrected Akaike information criterion (AICc) values. A higher R² indicates that the explanatory variable explains more of the variance in the dependent variable. A lower AICc value indicates that the model is closer to reality and therefore performs better. If the difference in AICc between two models is less than 3, they are considered equivalent in terms of explanatory power (Fotheringham and Chris Brunsdon 2003). When the difference between the AICc of the OLS model and the AICc of the GWR model is greater than 3, one model shows a statistically significant improvement over the other (Yu et al. 2013).
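The decision rule for the AICc comparison can be written as a small helper. This is a sketch of the rule as stated in the text; the function name and return labels are hypothetical, and a difference of exactly 3, which the text does not address, is treated here as equivalent.

```python
def compare_aicc(aicc_ols, aicc_gwr, threshold=3.0):
    """Compare two models by AICc (lower is better).

    A difference greater than `threshold` counts as a significant
    improvement of the model with the lower AICc; otherwise the two
    models are treated as equivalent.
    """
    difference = aicc_ols - aicc_gwr
    if difference > threshold:
        return "GWR significantly better"
    if difference < -threshold:
        return "OLS significantly better"
    return "equivalent"


# Hypothetical AICc values, for illustration only.
print(compare_aicc(45.2, 40.1))
print(compare_aicc(45.2, 44.0))
```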

To compare the ability of the OLS and GWR models to deal with spatial autocorrelation, the global Moran’s I statistic was computed for the residuals of each OLS and GWR model. Moran’s I, a regularly used indicator of spatial autocorrelation, varies from −1 to 1: a value of 1 indicates perfect positive spatial autocorrelation, a value of −1 indicates perfect negative spatial autocorrelation, and a value of 0 indicates perfect spatial randomness. If significant autocorrelation is found in the residuals, the results of the regression analysis are not reliable (Ishizawa and Stevens 2007).
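Global Moran’s I for a set of residuals can be computed directly from its definition. The sketch below is illustrative: the spatial weights matrix uses a simple neighbour chain chosen for the example, whereas the weighting scheme used in the study's software is not stated in the text, and the significance test (p value) is not included.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values  : list of n observations (for example, model residuals)
    weights : n x n spatial weights matrix as nested lists, with
              weights[i][j] > 0 when locations i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / weight_sum) * cross_products / squared_deviations


def chain_weights(n):
    """Hypothetical weights: each location neighbours the next one in a line."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


# Illustrative residuals only: a smooth trend gives positive autocorrelation,
# an alternating pattern gives negative autocorrelation.
trend = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights(4))
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain_weights(4))
```

Residuals with a value of I close to zero are what a well-specified model should produce, which is why the statistic is applied to the residuals rather than to the raw variables.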

Afterwards, the local R² values obtained from the GWR models were mapped to visualise the spatiotemporal variations in the relationships between land use and water quality at the nine 1SR regular monitoring stations, as well as the capability of the land use indicators to explain variations in water quality. All analyses and mapping were done using ArcGIS 10.2, SPSS, and MS Excel.

Results

Land use composition in the pollution zones

In this study, a 1-km buffer zone from each sampling station is used to estimate the land use area and its relation to water quality at each monitoring site in 2006, 2010, and 2015. Figure 3 shows the composition of land uses in the buffer zones which include forest, agricultural, urban, and other land groups. It is not surprising that agricultural and forest lands are the most dominant in the contributory buffer zones, as these lands occupied about 80% of the total area of the river basin. However, the urban area, which is significant in only a few contributing areas, is expected to increase more rapidly over the next decades, while other land use areas which include cleared land, ex-mining areas, marshes, and abandoned and eroded areas changed relatively little during these periods.

Fig. 3

Land use composition in the pollution zones

Pearson correlations between land use and water quality variables

Table 4 shows the results of the Pearson correlation analysis between land use and water quality variables. In 2015, forest land is negatively correlated with all the water quality variables except DO and pH, with which it is positively correlated. However, agricultural and urban lands are positively correlated with most variables, and agricultural area is significantly correlated with SS. With the exception of DO, all water quality variables in 2015 are positively correlated with other land groups, which include abandoned mining areas and miscellaneous land. Based on this result, in 2015, apart from forest area, all land use activities, including agricultural and urban areas, were important non-point sources of water pollution in the Selangor River. In addition, the results of the correlation analysis for 2010 indicate that forest land is positively correlated with three water quality variables (DO, BOD, and pH) and negatively correlated with the others. However, agricultural activities, urban area, and other land groups exhibit a positive relationship with most of the pollutants, with urban land showing a significantly positive correlation with BOD, and other land groups showing significantly positive correlations with COD and TEMP and a significantly negative correlation with DO. Moreover, in 2006, while forest, agricultural, and urban lands are negatively correlated with most of the water pollutants, other land groups, which include cleared land and ex-mining areas, are positively correlated with all the pollutants except DO and pH. These land groups could be considered the major non-point source of river pollution in that year.
In general, the level of concentration of water pollutants at a particular site is related to the change of forest or natural areas to urban or agricultural land, while good water quality is mainly found in natural forest and undeveloped areas. However, it should be noted that various qualitative and management criteria may also influence the concentration of pollution at sites where the river receives pollutant loads from poultry farms, municipal wastewater and industrial wastewater (point sources). This may alter the trends of some parameters in time and space, regardless of the land use type or percentage in that area. Therefore, efforts to control both pollutant sources and pollution processes are needed.

Table 4 The results of Pearson correlations between land use and water quality variables in 2015, 2010, and 2006

Evaluation of performances of OLS versus GWR models

The performance of the OLS and GWR models was compared primarily on the basis of the average adjusted R² values (ranging from 0 to 1) over all the land use indicators, and the results are given in Table 5. A considerable improvement in the GWR R² over the OLS R² is observed for each water quality variable during the investigation periods. The R² values from GWR range from 0.07 to 0.65, 0.04 to 0.50, and 0.03 to 0.25 for 2015, 2010, and 2006, respectively. The R² values from OLS range from 0.04 to 0.44, 0.01 to 0.25, and 0.01 to 0.10 for 2015, 2010, and 2006, respectively. Thus, the R² values from GWR are all considerably higher than the corresponding R² values from OLS. In addition, the results indicate that both models explain the variation in DO best in all three periods compared with the other pollutants, and explain the variation in NH3-N least.

Table 5 Comparison of average coefficient of determination (adjusted R²) between GWR and OLS

In addition to comparing the adjusted R² values of the OLS and GWR models, we used another goodness-of-fit measure, the AICc values presented in Table 6, to identify whether the improvement of one model over the other is statistically significant. Models with smaller AICc values are better than models with higher values. When the difference in AICc values between two models is greater than 3, one model shows a statistically significant improvement over the other model (Yu et al. 2013).

Table 6 Results of statistical test showing the improvement in the model fit of GWR over OLS

As shown in Table 6, significant improvements in the fit of the GWR over the OLS models are found for the forest models with DO, BOD, COD, and TEMP in 2015; DO, TEMP, and SS in 2010; and TEMP in 2006. For agriculture, the GWR models present significant improvements compared with their corresponding OLS models for DO and pH in 2015; SS and TEMP in 2010; and DO, SS, and pH in 2006. For urban, the improvement of GWR models is significant for DO, SS, and TEMP in 2015, and only for TEMP in 2006. The improvement of GWR models over OLS models for others is significant for DO, BOD, and pH in 2015; for DO, SS, and TEMP in 2010; and only for BOD and COD in 2006. Overall, 27 of the 84 GWR models show significant improvements over their corresponding OLS models. The remaining 57 GWR models do not improve significantly on the OLS models; in a few cases the OLS models fit slightly better than their corresponding GWR models, but in none of these cases is the AICc difference greater than 3.
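The overall count can be reproduced by tallying the combinations listed in the paragraph above. This is only a bookkeeping sketch: the dictionary restates the significant cases named in the text (no urban case is listed for 2010), and the variable names are hypothetical.

```python
# Significant GWR-over-OLS improvements, restated from the text above.
SIGNIFICANT = {
    "forest": {2015: ["DO", "BOD", "COD", "TEMP"], 2010: ["DO", "TEMP", "SS"], 2006: ["TEMP"]},
    "agricultural": {2015: ["DO", "pH"], 2010: ["SS", "TEMP"], 2006: ["DO", "SS", "pH"]},
    "urban": {2015: ["DO", "SS", "TEMP"], 2010: [], 2006: ["TEMP"]},
    "others": {2015: ["DO", "BOD", "pH"], 2010: ["DO", "SS", "TEMP"], 2006: ["BOD", "COD"]},
}

TOTAL_MODELS = 4 * 7 * 3  # land use classes x water quality variables x years

per_class = {
    land_use: sum(len(variables) for variables in by_year.values())
    for land_use, by_year in SIGNIFICANT.items()
}
significant_total = sum(per_class.values())
not_significant_total = TOTAL_MODELS - significant_total

print(per_class)
print(significant_total, not_significant_total)  # 27 57
```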

Comparison of spatial autocorrelations of OLS and GWR residuals

The results of the Moran’s I statistics based on the residuals of the GWR and OLS models for 2015, 2010, and 2006 are presented in Table 7. Both positive and negative spatial autocorrelations are identified across the OLS models. Significantly positive spatial autocorrelations are also identified in the models. In Table 7, bold numbers indicate significant autocorrelation at p ≤ 0.05; bold and italic numbers indicate significant autocorrelation at p ≤ 0.01. Based on these results, in 2015, significant autocorrelations are found in 11 of the 28 OLS models, while they are found in 9 models for 2010, and in only 5 models for 2006. However, most OLS models show no significant spatial autocorrelation, which indicates that most of these models are suitable for examining the relationships between land use and water quality variables. In addition, the results of the Moran’s I statistics derived from the residuals of the GWR models are also presented in Table 7. Both positive and negative spatial autocorrelations are found across the GWR models. As shown in Table 7, in 2015, significant autocorrelations are identified in 6 of the 28 GWR models, while they are found in only 4 models for 2010, and in 5 models for 2006. This result indicates that GWR models increase the reliability of the relationships by minimising the spatial autocorrelations in the model residuals.

Table 7 Comparison of Moran’s I of the residuals from the OLS and GWR models

Temporal variations in relationship between forest land and water quality pollutants

Figure 4 shows the temporal variations in local R² values for forest land in relation to each water quality variable. These values indicate the capacity of forest land to explain the variance of each water quality variable at each monitoring site in each year. In Fig. 4, forest land explains more of the variation in DO, BOD, SS, and TEMP in 2015 than in 2010 and 2006. However, for pH and NH3-N, the explained proportion is highest in 2010, while for COD, the highest proportion is recorded in 2006, followed by 2015, with no notable explanatory power in 2010.

Fig. 4

Local R² of GWR models for forest (a), agricultural (b), urban (c), and other lands (d)

Temporal variations in interactions between agricultural land and water quality pollutants

The temporal variations in local R² values for agricultural land in relation to each water quality pollutant are shown in Fig. 4. These values indicate the ability of agricultural land to explain the variance in each water quality variable at each monitoring station in each year. Figure 4 shows that agricultural land explains more of the variation in most water quality variables in 2015 than in 2010 and 2006. Only for BOD and COD is the explained proportion higher in 2010 than in the other years.

Temporal variations in relationship between urban land and water quality pollutants

The temporal variations in local R² values for urban land in relation to each water quality indicator are shown in Fig. 4. These values indicate the capacity of urban land to explain the variance of each water quality variable at each monitoring site in each year. Figure 4 indicates that urban land explains more of the variation in DO, BOD, COD, and TEMP in 2010 than in 2015 and 2006. However, the proportion explained by urban land is higher in 2006 for pH and NH3-N, while for SS, the proportion explained in 2015 is higher than in the other periods.

Temporal variations in relationship between other lands and water quality pollutants

Figure 4 shows the temporal variations in local R² values for other lands in relation to each water quality variable, from which the ability of other lands to explain the variance in each water quality variable at each monitoring site in each year can be observed. Figure 4 indicates that other lands explain a higher proportion of the variation in most water quality variables in 2010 than in the other periods. Only for BOD and NH3-N is the proportion explained by these lands higher in 2015 than in 2010 and 2006.

Spatial relationship between land use and water quality variables in 2015

Figure 5 shows the spatial variations in local R² values between land use and water quality variables at the monitoring sites in 2015. The association between land use and water quality variables varies spatially from one monitoring station to another. Other land groups, which include abandoned mining areas and miscellaneous land, explain most of the spatial variation in most of the water quality indicators, including DO, BOD, NH3-N, and TEMP. However, forest land explains the most variation in COD, while agricultural land explains the most variation in SS and pH in 2015. This result is consistent with the result obtained from the Pearson correlation analysis between land use and water quality in 2015 (Table 4).

Fig. 5

Local R² of GWR models for land use indicators in 2015 (a), 2010 (b), and 2006 (c)

Spatial relationship between land use and water quality variables in 2010

Figure 5 shows the spatial variations in local R² values between land use and water quality variables at the monitoring sites in 2010. The association between land use and water quality variables varies spatially from one monitoring station to another. Other land groups, including cleared land, abandoned mining areas, and eroded land, may account for most of the spatial variation in most of the water quality indicators in this year. Urban land appears to be the most important predictor for BOD and a good predictor for COD, whereas forest land shows the greatest association only with pH, and agricultural land has a considerable association with BOD and COD. This result also reflects the result of the correlation analysis between land use and water quality in 2010 (Table 4).

Spatial relationship between land use and water quality variables in 2006

Figure 5 shows the spatial variations in local R² values between land use and water quality variables at the monitoring sites in 2006. The association between land use and water quality variables is not the same at all the monitoring sites. Figure 5 indicates that agricultural land can account for most of the SS and pH variations in 2006, while urban and forest lands are the greatest predictors of BOD and NH3-N in this year. However, other lands predict more variation in DO, COD, and TEMP. This result is also in line with the result of the Pearson correlation analysis between land use and water quality in 2006 presented in Table 4. Unlike Pearson’s correlation and OLS regression, however, the local R² from the GWR models allows us to observe how the relationship between variables varies in space between sampling sites.

Temporal variations in relationship between land uses and local water quality index

This research employed the local WQI to evaluate the state of water quality in relation to land use indicators for the three periods. In this process, the water quality data were converted into usable information that reflects the level of water quality degradation in the river (Fig. 6). The water quality status expressed in terms of WQI indicates that the river water is generally of good quality and can therefore be used directly for recreational activities with body contact, although conventional treatment is required for other uses such as domestic supply. However, the river water is of average quality at SR01, 1SR09, and 1SR10 in all three periods, indicating a level of water quality degradation that requires extensive treatment. These monitoring stations are located downstream in the watershed, where anthropogenic activities are dominant, and they are classified as class III by the DOE water quality index (WQI). In addition, Fig. 6 shows that predominantly agricultural areas are associated with good water quality status, while sampling stations in areas of mixed agricultural, urban, and other land use activities generally have a moderate water quality status in the river basin.

Fig. 6

Temporal evolution of WQI in relation to land use changes over the monitoring stations.

Discussion

Relationship between land use and water quality indicators

All the land use indicators interact to some degree with all the water quality variables at certain monitoring sites or even throughout the study area. These interactions varied between different variables of water quality and land use. These variables also showed an important spatial non-stationarity. The interactions for the same pair of land use and water quality variables at different monitoring sites varied not only in response to different time periods but also according to different levels of land use indicators in the contributing areas. Large spatial differences in interactions were observed between a low and a high percentage of each land use indicator at different monitoring sites. While the Pearson’s correlation and OLS results primarily indicated the global associations between land use indicators and water quality pollutants, the GWR results made it possible to observe the spatial non-stationarity of these associations, as also demonstrated by previous studies (Mohammadi et al. 2019; Tu 2011; Tu and Xia 2008; Yu et al. 2013; Chen et al. 2016; Huang et al. 2015). However, unlike these previous studies, the present study also demonstrated considerable temporal differences in the interactions between the same pair of land use indicators and water quality pollutants at different monitoring sites over different periods. It was relevant to see how these interactions vary over time, as the main concern could be peak pollution at particular times rather than regular pollution over time. Thus, both the temporal irregularity and the spatial non-stationarity between land use and water quality indicators at different sites were covered in this study.

Temporally varying relationship between land use and water quality pollutants

While all of the land use indicators used in this study were relatively important predictors of water quality, it is important to note that the predictive ability of the same land use indicator varied from period to period, which could be because the percentage of each category of land use was not the same in different periods. The Selangor River watershed has undergone various land use changes in recent decades, with a considerable conversion of forest land to agricultural, urban, or other development areas (Fig. 1). Although the forest area is affected by various anthropogenic activities, about 57% of the river basin is still covered by natural forests (Kusin et al. 2016). This is an important area for maintaining water quality in good condition, as forest land showed a negative correlation with most water quality pollutants in all the investigation periods. However, forest was a more important predictor of most pollutants in 2015 than in 2010 and 2006, when the forest area was correlated with fewer pollutants. Likewise, agricultural land best predicted the change in most water quality variables in 2015, while urban areas and the other land group, including cleared land, abandoned mining areas, and eroded land, were the most important predictors of water quality in 2010. Thus, the variation in the predictive capability of an explanatory variable is consistently the result of its level of correlation with the dependent variable under observation. As such, only GWR models are able to examine how these interactions vary across the study area and over time. Traditional statistical methods such as OLS and Pearson correlation are global and do not reveal local relationships.
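The contrast between a global and a local model can be made concrete with a minimal sketch on synthetic data. This is not the software or the data used in the study. It assumes a fixed Gaussian distance-decay kernel, a single hypothetical land use predictor, and sites placed along a line where the true slope is negative in the western half and positive in the eastern half. All names (`ols_fit`, `gwr_fit`, `bandwidth`) are illustrative.

```python
import numpy as np


def ols_fit(x, y):
    """Global OLS: a single (intercept, slope) pair for the whole study area."""
    design = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(design, y, rcond=None)[0]


def gwr_fit(coords, x, y, bandwidth):
    """Local fits: one (intercept, slope) pair per site, Gaussian distance decay."""
    design = np.column_stack([np.ones(len(x)), x])
    betas = np.empty((len(y), design.shape[1]))
    for i, site in enumerate(coords):
        dist = np.linalg.norm(coords - site, axis=1)
        # Nearby sites receive weights close to 1, distant sites close to 0.
        root_w = np.sqrt(np.exp(-0.5 * (dist / bandwidth) ** 2))
        # Weighted least squares, implemented by scaling rows with sqrt(weight).
        betas[i] = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)[0]
    return betas


# Synthetic example (assumption: 40 sites on a west-east line, no noise).
rng = np.random.default_rng(0)
n = 40
coords = np.column_stack([np.linspace(0.0, 10.0, n), np.zeros(n)])
x = rng.uniform(0.0, 100.0, n)                      # hypothetical land use percentage
true_slope = np.where(coords[:, 0] < 5.0, -1.0, 1.0)
y = 5.0 + true_slope * x                            # hypothetical pollutant level

beta_ols = ols_fit(x, y)
betas = gwr_fit(coords, x, y, bandwidth=1.0)
print("global slope:", beta_ols[1])
print("local slope, westernmost site:", betas[0, 1])
print("local slope, easternmost site:", betas[-1, 1])
```

In this toy setting the single global slope averages over two opposite relationships, while the local estimates recover a slope close to -1 in the west and close to +1 in the east. With a very large bandwidth all weights approach 1 and the local fits collapse to the OLS solution.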

Spatially varying relationship between land use and water quality pollutants

Forest is commonly associated with good water quality in various studies on watersheds worldwide (Bahar et al. 2008; Huang et al. 2014; Camara et al. 2019; Nainar et al. 2017). Similarly, in this study, negative associations were found between forest land and most water quality pollutants. The GWR results showed that these associations were not constant among the different sampling sites. Although natural forest covers the largest share of the study area, local estimates for most contributing areas indicate that fragmented forest is not very useful in improving the quality of water in the study area, since the forest area had a higher explanatory power only for COD in 2015, pH in 2010, and BOD and NH3-N in 2006. Thus, the ability of forest land to explain the change in water quality not only varied with time and space but also differed among different water quality pollutants.

Agricultural land is generally considered to be an important source of non-point pollution for surface water. A significantly positive association is often found between agricultural land and increased water pollution due to various farming activities such as fertiliser application, crop production, and livestock rearing (Camara et al. 2019; Stutter et al. 2007; Ariffin et al. 2016). Although agricultural land represented the lowest percentage of land use in the contributing areas at 1SR sites, it could nonetheless make a great contribution to water pollution in the river watershed, as it showed a positive correlation with several water quality parameters and a significantly positive association with SS in 2015. It is not surprising that, despite having the lowest percentage, agricultural land had a high predictive power for SS and pH in 2015, BOD and COD in 2010, and SS and pH in 2006, because studies have indicated that agricultural land usually represents less than 10% of most watersheds but continues to contribute considerably to water pollution (Tu 2011a, b; Mehaffey et al. 2005). In this study, the ability of agricultural land to explain the variation in water quality not only varied over time and space but also differed among the different water quality pollutants (Fig. 5).

Urban land, including residential and recreational lands, facilities, and public service areas, is generally related to water quality deterioration resulting from various anthropogenic sources, such as food waste, construction residues, municipal sewage, and wastewater from treatment plants, which are correlated with higher concentrations of water quality pollutants in watersheds, particularly in developing countries (Azyana et al. 2012; Tong and Chen 2002; Nurhidayu et al. 2016; Camara et al. 2019). Urban areas also influence surface runoff and erosion patterns and alter hydrological processes (Camara et al. 2019). In the present study, urban land exhibited a positive relationship with most of the water quality pollutants, with a significantly positive correlation with BOD in 2010. Tong and Chen (2002) also found significantly positive relationships between urban land and many water quality parameters, including BOD, in watersheds in the State of Ohio, USA. In this study, the ability of urban land to explain water quality change was revealed by GWR (Fig. 5). The results also indicated that this explanatory ability was not the same over time and space, nor was it the same among the different water quality parameters. Thus, urban land appeared to be the most important predictor of BOD and a good predictor of COD in 2010, and a strong predictor of BOD and NH3-N in 2006.

Another important group of land use in this study included cleared land, abandoned mining area, eroded land, and miscellaneous area. This land use group showed positive relationships with most of the water quality variables throughout the investigation periods. The results of the Pearson correlation and OLS, as well as the GWR results, indicated that this group of lands was more positively associated with most of the water pollutants in the study area than other land use indicators such as forest, agricultural, and urban areas. Mining activities are a major source of pollution in many parts of the world, and several studies have addressed this issue (Kusin et al. 2018; Gomez-Gonzalez et al. 2016; Diami et al. 2016). Cleared and eroded lands could also make a significant contribution to water pollution through continuous accumulation of sediments in nearby river systems. The results of GWR in this study indicated that these lands could explain most of the spatial variation of most of the water quality parameters, including DO, BOD, NH3-N, and TEM, in 2015 and 2010. As with the other land use types, this explanatory power differed not only over time and space but also among the different water quality parameters.

Conclusion

This study analysed the influence of four predictors of water quality (forest, agricultural, urban, and other lands) on seven water quality parameters in the Selangor River watershed for three different periods. With the analysis performed within the potential pollution zones, the results of the Pearson correlation and OLS, as well as the GWR results, were consistent regarding the interactions between land use and water quality indicators.

The performance of the OLS and GWR models was compared using the average adjusted R2 values of all the land use indicators; the results showed considerable improvement in GWR R2 over the OLS R2 for each water quality variable in all three periods. The better performance of the GWR models relative to the OLS models was also demonstrated by measuring the model goodness-of-fit based on AICc values, which showed substantial improvements in the fit of the GWR models compared to the OLS models. In addition, the capacity of the two models to deal with spatial autocorrelation was compared; the results of Global Moran’s I indicated that the GWR models increase the reliability of the estimated relationships by minimising spatial autocorrelation in the model residuals.
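The residual check described above can be sketched as follows. This is an illustrative implementation of Global Moran's I under the assumption of inverse-distance spatial weights with a zero diagonal, which is one common choice and not necessarily the scheme used in this study. The coordinates and residual vectors are synthetic.

```python
import numpy as np


def morans_i(values, coords):
    """Global Moran's I with inverse-distance weights (an assumed weighting scheme)."""
    z = values - values.mean()
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    weights = np.zeros_like(dist)
    off_diag = dist > 0
    weights[off_diag] = 1.0 / dist[off_diag]   # a site is not its own neighbour
    return len(z) / weights.sum() * (z @ weights @ z) / (z @ z)


# Synthetic example: ten sites spaced one unit apart along a line.
line = np.column_stack([np.arange(10.0), np.zeros(10)])

# Residuals that change smoothly along the river: neighbours resemble each other.
i_clustered = morans_i(np.linspace(-1.0, 1.0, 10), line)

# Residuals that flip sign from site to site: neighbours are dissimilar.
i_alternating = morans_i(np.array([1.0, -1.0] * 5), line)

print("clustered residuals:", i_clustered)
print("alternating residuals:", i_alternating)
```

A clearly positive value indicates that similar residuals cluster in space, which suggests that the model has missed a spatial pattern. Values near the expected value under no autocorrelation, which is -1/(n-1), indicate that little spatial structure remains in the residuals.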

To examine the temporal variations in the relationship between each land use type and the water quality pollutants, the local R2 values from the GWR models were mapped at the 9 regular 1SR monitoring stations for clear visualisation of their varying temporal relationships across the river basin. The predictive ability of the same land use type varied not only with time and space but also among the water quality parameters. In addition, to compare spatial relationships between land use and water quality indicators for the different years, the local R2 values from the GWR models were also mapped. The results indicated that the ability of land use indicators to explain water quality change was not the same in time and space, nor was it the same among the different water quality parameters. GWR can be a useful tool for water resource managers and decision-makers to discover the causes of local pollution on a temporal and spatial scale, to improve understanding of the state of local pollution, and to adopt appropriate land use planning and management policies to ensure the sustainability of the local watershed. However, this study recommends that future research consider more land use and water quality variables, as well as more sampling sites. Future studies should also take into account topography, precipitation, surface flow, and the nature of pollutants when identifying areas of pollution or contributing areas, as these factors, in addition to distance, may alter the level of influence of pollution on rivers. Finally, future studies should identify the best measures to prevent serious threats to the availability of water resources in the basin. Such measures could promote the sustainable management of water resources within the framework of the State’s pursuit of sustainable development, which involves achieving environmental, economic, and social sustainability.