1 Introduction

Northeast China has been the cradle of China’s industrialization since the country’s reform and opening-up policy was initiated (Li 2015). Consequently, urbanization, industrialization and agriculture keep expanding rapidly in this region (Liu et al. 2014). This development frequently alters the existing land use/land cover (LULC) irreversibly (Kuai et al. 2015). For instance, cultivated land is converted into urban area, which involves the removal of forest and the conversion of permeable land into impermeable surfaces (Goonetilleke et al. 2005). Such areas usually encounter nonpoint source (NPS) pollution as a result of LULC changes (Eisakhani et al. 2009; Shukla et al. 2018). Unlike point source pollutants, which enter water bodies via pipes or channels, NPS pollutants come from diffuse sources in the environment and cannot be traced to a single source (Kuai et al. 2015). These pollutants, which contribute to the declining health of water resources, are transported to water bodies through the hydrology of the watersheds.

LULC changes can influence NPS pollution under rapid economic development in two ways. First, LULC conversions modify the hydrology of watersheds and thereby increase surface runoff volumes and peak flow (Kuai et al. 2015; Barbosa et al. 2012). Second, rapid economic development drives local population growth and intensifies socioeconomic activities, which in turn increase the generation of waste and pollutants in watersheds. These wastes and pollutants are then transported to water resources through the hydrology of watersheds. These processes are further influenced by LULC planning and water resources administration choices (Barbosa et al. 2012). It is therefore necessary to address water quality by examining how LULC change alters the hydrology of a watershed. Assessment of this nexus is valuable for watershed management.

Globally, researchers have long quantified the impacts of LULC alterations on hydrological components (Nie et al. 2011; Gashaw and Melesse 2012; Gwate et al. 2015; Welde and Gebremariam 2017; Abe et al. 2018; Choto and Fetene 2019; Pal and Talukdar 2018; Santos et al. 2019; Awotwi et al. 2019). For instance, in the upper San Pedro watershed, Mexico, an expansion of cultivated land and a reduction of forest cover increased surface runoff (Nie et al. 2011). A change in land cover from 1973 to 2012 in the Upper Crepori River Basin, south Brazilian Amazon, increased stream flow by 2.5% without noticeably changing the average annual water balance. Future conservation-policy and “Business as Usual” trend scenarios were also found to increase surface runoff by 238.87% and 300.90% and stream flow by 2.53% and 2.97%, respectively, and to reduce groundwater by 4.00% and 5.21% and evapotranspiration by 2.07% and 2.43%, respectively (Abe et al. 2018). In northeast Portugal, Santos et al. (2019) concluded that land use changes and an afforestation scenario decreased water yield, surface flow and groundwater flow and increased evapotranspiration and lateral flow. They further indicated that the land use and land cover changes in 2000 and 2006 produced average decreases in water yield of 91 and 52 mm/year, respectively. Other studies have also documented increased surface runoff due to LULC change, including Schilling et al. (2010) in the Upper Mississippi River Basin, USA, and Leach (2015) in the Turkey River watershed, Iowa. The expansion of farmland and the reduction of wooded land and grassland likewise increased stream flow in the Quaternary Basin, South Africa, between 2004 and 2013 (Gwate et al. 2015).
According to the study by Welde and Gebremariam (2017), an increase in bare land and cultivation land areas has caused an increase in annual and seasonal stream flow and sediments yield volumes in the Tekeze dam, Ethiopia. Alterations in agriculture, urban and forest lands in Upper Du basin, China, from 1978 to 2007 also influenced stream flow (Yan et al. 2013). Although investigations of LULC relations with hydrology have been explored under different environmental conditions globally, very few investigations have been carried out under varying conditions in the cold-temperate regions of China (Liu et al. 2018) per our investigation. Most studies conducted in these zones of China (Liu et al. 2011; Zhang et al. 2016; Yang et al. 2017; Shang et al. 2019) do not measure the influence of individual LULC types on hydrological components. The impact of LULC on hydrological components may be undervalued, overvalued or misjudged, where the contributions of individual LULC are not further analyzed. To utilize land resources and at the same time sustain it for positive hydrological processes, it is important to measure how each LULC influences hydrology.

The contribution of nonpoint source (NPS) pollution to deteriorating water quality is increasingly a global concern (Schaffner et al. 2011; Shen et al. 2012; Alvarez et al. 2016) and has made NPS pollution a vital subject for local and national policy makers. Research has reported that NPS pollution has become a significant factor in the decline of ecological water quality, since point source pollution has been brought under control in many nations (Li et al. 2017a, b). Among the sources of NPS pollutants are synthetic fertilizers, herbicides and insecticides from agricultural land and urban areas. China has become the largest user of these materials in the world (Sun et al. 2012). As a result, NPS pollution control has recently become an important topic in environmental protection in China (Shen et al. 2012). Heilongjiang Province is the largest commodity grain production area in China and faces significant NPS pollution in most of its water bodies. The Ashi River, located in the province, is the main river of the Ashi watershed and one of the important tributaries of the Songhua River in northeast China. The Songhua River Basin is a major national commodity grain base and supports the national food basket with 53% maize and 37% soybean production. It is also the source of drinking and irrigation water in northeastern China (Ma et al. 2013). Since China adopted the “open-up” policy and economic reform, the Ashi River watershed has experienced rapid urban sprawl and agricultural expansion, resulting in environmental degradation and worsening water quality in the Ashi River, which has been reported as one of the polluted tributaries of the Songhua River (Li et al. 2017a, b). In response, scientific and engineering approaches, as well as social programs, have been implemented to manage the deterioration of Ashi River water quality.
Nevertheless, the degenerated status of the water quality and the environment has not improved significantly (Jun et al. 2011; Ma et al. 2015a, b). Dependence on industrialized approaches could pose a challenge to fundamentally changing the status of the water quality and environmental deterioration. Different approaches are needed to address the root of this concern.

Many researchers have conducted studies in the Ashi River Basin and claimed that the LULC of the watershed is related to the water quality status of the river. For instance, Ma et al. (2015b) analyzed nitrogen (N) pollution characteristics based on water quality monitoring of the Ashi River and concluded that the water quality in the midstream and downstream areas of Ashi River were negatively affected by cropland and developed area including towns, villages and industries. Nitrogen pollution origins were also investigated in the Ashi River Basin by Yu et al. (2015) using water quality and soil monitoring techniques as well as δ15N stable isotope, and concluded that the water quality pattern is closely related to the LULC types and human activities of the watershed. Ma et al. (2015a, b) simulated the distribution of nonpoint source pollution in the Ashi River and observed that the distribution of NPS was mainly influenced by LULC. Ma et al. (2016) also investigated nonpoint source pollution control of the Ashi Basin based on a Soil and Water Assessment Tool (SWAT) model and indicated that returning farmland to forest mode, fertilizer reducing mode, filter strips mode and syntaxic mode could all reduce nonpoint source pollutants to some level. A study by Li et al. (2017a, b) in the Ashi River also observed that agricultural activities such as rice farming were contributing to the pollution of the river, because fertilizers and pesticides are heavily employed by the farmers.

However, the existing literature has not considered how the LULC change patterns of the watershed at different points in time affect the watershed hydrology and the river water quality. Moreover, there is no reported research on the relationship between LULC and watershed hydrology that uses SWAT and statistical models to estimate the influence of each LULC type in the Ashi watershed. To attribute water quality degradation to watershed LULC change, it is necessary to study the historical stream flow patterns in the same watershed over different time periods with reference to the dynamics of the watershed LULC. This is the knowledge gap that this research seeks to address. This study further attempts to confirm the findings of in situ investigations of the water quality of the Ashi River that linked LULC to the poor water quality of the river. The findings of this study are critical for the effective implementation of both current and future water pollution control programs for the preservation of the Ashi River and its ecosystem. The study aims to measure the influence of LULC change on the watershed hydrology over different time periods to inform sustainable management decisions. The specific objectives are to assess the impacts of LULC at different points in time on the hydrology, to measure the influence of individual LULC types on the hydrological processes and to predict future stream flow based on the future LULC of the Ashi watershed.

2 Materials and methods

2.1 Study area

The Ashi watershed covers a total area of 3545 km2 in the southwest of Heilongjiang Province, northeast China. It lies between latitudes 45° 05′ and 45° 49′ N and longitudes 126° 40′ and 127° 42′ E (Fig. 1). The Ashi River is the principal waterway of the watershed; it is about 213 km long and is a tributary of the Songhua River. Elevation ranges from 109 to 833 m above sea level, with slopes ranging from 0 to 67.3%. The northwest of the watershed is flat, while the southeast is hilly and sloping with vegetation cover. The watershed has cold winters, with a mean temperature of 3.4 °C and a minimum temperature of −40 °C. Winter lasts from November to mid-April, and precipitation is unevenly distributed, peaking in July and August. The multi-year average precipitation is 580–600 mm (Ma et al. 2013).

Fig. 1 Location of the study area

2.2 Ashi River watershed land use/land cover evaluation and prediction

The LULC of the watershed at different points in time was investigated with satellite images from four dates: Landsat-5 (1990), Landsat-7 ETM+ (2000), Landsat-7 ETM+ (2010) and Landsat-8 OLI_TIRS (2014). These images, with 30 m resolution and 0% cloud cover, were acquired from the U.S. Geological Survey (USGS) Earth Explorer site (http://glovis.usgs.gov) and used to assess temporal and spatial changes in the watershed. The watershed falls in one Landsat path (117) and two rows (28 and 29), so two scenes were acquired for each of the years 1990, 2000, 2010 and 2014. Because of the spectral variations exhibited by the features in the watershed, the images were classified with a combined method that involved both unsupervised and supervised classification. The Iterative Self-Organizing Data Analysis (ISODATA) clustering algorithm was used for the unsupervised step, while the supervised step used the maximum likelihood algorithm with 480 ground truth samples from six LULC types (80 GPS points per LULC type). The six LULC types were urban (URB), water (WAT), agriculture (AGR), closed canopy forest (CLF), open canopy forest (OPC) and other vegetation (OTV). Google Earth images of the corresponding time points were also used as reference data for classifying the images. Geo-visualization techniques and detailed focus group discussions with farmers and local environmental authorities were also undertaken. Accuracy was assessed by comparing reference data (i.e., ground truth data and Google Earth images) with the classified images. An error matrix is widely used to derive descriptive and inferential statistics on classification accuracy (Manandhar et al. 2009), and this method was applied in this study.
ArcGIS 10.4 and ERDAS 2015 were used for mapping and image classification, respectively.
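The error-matrix assessment described above can be illustrated with a minimal sketch. It assumes a square matrix whose rows are reference classes and whose columns are classified classes, and it reports overall accuracy together with Cohen's kappa. The kappa statistic and the sample counts are assumptions chosen for illustration; the text does not state which statistics were derived, and the numbers are not the study's results.

```python
def accuracy_from_error_matrix(matrix):
    """Return (overall_accuracy, kappa) for a square error matrix.

    matrix[i][j] is the number of reference samples of class i
    that the classifier assigned to class j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Correctly classified samples sit on the diagonal.
    diagonal = sum(matrix[i][i] for i in range(k))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    overall = diagonal / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - expected) / (1 - expected)
    return overall, kappa


# Hypothetical 3-class matrix with made-up counts.
example = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]
overall, kappa = accuracy_from_error_matrix(example)
print(f"overall accuracy = {overall:.3f}, kappa = {kappa:.3f}")
```

The same function applies unchanged to a six-class matrix such as one built from the 480 ground truth samples.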

The future LULC distribution of the watershed was predicted with the integrated CA-Markov model under two scenarios for the year 2030. This model combines the strengths of Markov chain analysis and Cellular Automata for spatiotemporal dynamic modeling and prediction of LULC change (Mishra and Rai 2016). The combination is widely used to simulate complex processes such as LULC change: it uses the transition probabilities between an initial and a final time point to define the direction of transitions among LULC categories. A Markov transition probability matrix of LULC change from 2000 to 2010 was developed as the foundation for future predictions. From the 2000 and 2010 images, the transition probabilities were used to generate a transition area file that gives the number of pixels expected to convert to other LULC types over time. Because the transition probabilities carry no information on the spatial distribution within each LULC class, Cellular Automata (CA) added a spatial component to the model. The transition area file and the 2010 classified image, as the base map, served as inputs to the Cellular Automata tool to model LULC for 2014. Following the simulation for 2014, the future LULC for the year 2030 was predicted under two scenarios with the 2014 LULC map as the base map. The first was a worst-case scenario, in which we assumed that the factors currently influencing LULC in the watershed keep pace with the LULC change observed from 1990 to 2014 and will not change significantly from 2014 to 2030. The second was a best management practice (rehabilitation) scenario. Under this scenario, LULC maps for 2010 and 2014 were the inputs for predicting 2030.
Prior to the prediction, the existing LULC map of 2014 was revised to reflect good environmental policies: rehabilitation through tree planting interventions, restoration of open canopy forest in the forest zone (areas with slope > 25°) to closed canopy forest, and tree planting along river networks, all of which help to lessen surface runoff. A 200 m buffer along river networks was restricted to prevent urbanization and agricultural expansion. The forest zone was constrained to prohibit anthropogenic disturbances.
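The Markov component of the procedure can be sketched as follows. This is only an illustration under stated assumptions: the two classified maps are flattened to lists of per-pixel class labels, one projection step equals the interval between the two input maps, and the pixel values are invented. The Cellular Automata step that allocates the projected change in space is not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def transition_probabilities(classes_t1, classes_t2, labels):
    """Row-normalised matrix P[a][b] = share of class a pixels at t1 that are class b at t2."""
    counts = Counter(zip(classes_t1, classes_t2))
    matrix = {}
    for a in labels:
        row_total = sum(counts[(a, b)] for b in labels)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0) for b in labels
        }
    return matrix


def project_areas(areas, matrix, labels, steps=1):
    """Apply the transition matrix `steps` times to the class areas (pixel counts)."""
    current = {a: areas.get(a, 0) for a in labels}
    for _ in range(steps):
        current = {
            b: sum(current[a] * matrix[a][b] for a in labels) for b in labels
        }
    return current


# Toy ten-pixel maps (invented values, not the Ashi watershed data).
labels = ["AGR", "CLF", "URB"]
map_2000 = ["CLF", "CLF", "CLF", "CLF", "AGR", "AGR", "AGR", "AGR", "URB", "URB"]
map_2010 = ["CLF", "CLF", "AGR", "AGR", "AGR", "AGR", "AGR", "URB", "URB", "URB"]

P = transition_probabilities(map_2000, map_2010, labels)
areas_2010 = Counter(map_2010)
areas_2020 = project_areas(areas_2010, P, labels, steps=1)
print(P["CLF"])
print(areas_2020)
```

Multiplying the projected class shares by the pixel area gives the expected areas per class, which is the information a transition area file carries.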

2.3 Land use/land cover change analysis

The overall LULC changes, as well as the gains and losses in each LULC type throughout the time period, were analyzed. This was carried out using the classified images (1990, 2000, 2010 and 2014) and the predicted LULC maps (worst case, 2030W, and best case, 2030B) to reveal the trend and status of LULC changes. To gain insight into the LULC changes between the time points (1990–2000, 2000–2010, 2010–2014, 2014–2030W, 2014–2030B), the study used the revised version of the single land use dynamic degree (Liping et al. 2018) and the spatial-based land use dynamic degree (Liu and He 2002) to analyze the rates of gain and loss as well as the total rates of change among the different LULC types in the watershed. Many authors compute the single land use dynamic degree from only the difference in area between the initial and final time points and thereby neglect transitions, whereas the revised version accounts for the area of each LULC type that is converted to other LULC types. Consequently, this study employed the revised version.

The single land use dynamic degree:

$$S_{t} = \frac{{{\text{LA}}_{{\left( {i,t1} \right)}} - {\text{ULA}}_{i} }}{{{\text{LA}}_{{\left( {i,t1} \right)}} }} \times \frac{1}{{t_{2} - t_{1} }} \times 100\%.$$
(1)

The spatial-based land use dynamic degree (total rate of change):

$${\text{CCL}}_{i} = {\text{TRL}}_{i} + {\text{IRL}}_{i}$$
(2)
$${\text{TRL}}_{i} = \frac{{{\text{LA}}_{{\left( {i,t1} \right)}} - {\text{ULA}}_{i} }}{{{\text{LA}}_{{\left( {i,t1} \right)}} }} \times \frac{1}{{t_{2} - t_{1} }} \times 100\%$$
(3)
$${\text{IRL}}_{i} = \frac{{{\text{LA}}_{{\left( {i,t2} \right)}} - {\text{ULA}}_{i} }}{{{\text{LA}}_{{\left( {i,t1} \right)}} }} \times \frac{1}{{t_{2} - t_{1} }} \times 100\%,$$
(4)

where \(S_{t}\) = dynamic degree of a single LULC type, \({\text{LA}}_{{\left( {i,t1} \right)}}\) = area of LULC type i at the initial time point, \({\text{LA}}_{{\left( {i,t2} \right)}}\) = area of LULC type i at the final time point, \({\text{ULA}}_{i}\) = area of LULC type i that remained unchanged, \(t_{1}\) = initial time point, \(t_{2}\) = final time point, \({\text{TRL}}_{i}\) = the transfer-out rate (loss), \({\text{IRL}}_{i}\) = the transfer-in rate (gain), and \({\text{CCL}}_{i}\) = the total rate of change, i.e., the sum of \({\text{TRL}}_{i}\) and \({\text{IRL}}_{i}\).
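Equations 2 to 4 can be evaluated directly from a transition area matrix, as in the following sketch. It assumes that entry [a][b] holds the area that was class a at the initial time point and class b at the final time point, so that row sums give the initial areas, column sums give the final areas and the diagonal gives the unchanged areas. The areas are invented for illustration and the function name is hypothetical.

```python
def dynamic_degrees(transition_area, labels, t1, t2):
    """Return TRL, IRL and CCL (% per year) for each LULC type."""
    years = t2 - t1
    result = {}
    for i in labels:
        la_t1 = sum(transition_area[i][j] for j in labels)  # area at t1 (row sum)
        la_t2 = sum(transition_area[j][i] for j in labels)  # area at t2 (column sum)
        ula = transition_area[i][i]                         # unchanged area
        trl = (la_t1 - ula) / la_t1 / years * 100.0         # Eq. 3, transfer-out
        irl = (la_t2 - ula) / la_t1 / years * 100.0         # Eq. 4, transfer-in
        result[i] = {"TRL": trl, "IRL": irl, "CCL": trl + irl}  # Eq. 2
    return result


# Hypothetical areas in km2 (not the study's values).
labels = ["AGR", "CLF", "URB"]
area_km2 = {
    "AGR": {"AGR": 900.0, "CLF": 20.0, "URB": 80.0},
    "CLF": {"AGR": 150.0, "CLF": 340.0, "URB": 10.0},
    "URB": {"AGR": 0.0, "CLF": 0.0, "URB": 50.0},
}
rates = dynamic_degrees(area_km2, labels, 2000, 2010)
for name in labels:
    print(name, rates[name])
```

In this toy case agriculture loses 1.0% and gains 1.5% of its initial area per year, which shows how a small net change can hide larger gross transitions.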

2.4 SWAT model inputs and analysis

2.4.1 SWAT model

Soil and Water Assessment Tool (SWAT) 2012 model (Neitsch et al. 2011) was used in the Ashi River watershed to evaluate the influences of LULC change on hydrological processes. The model has been applied to investigate hydrological processes in small and large watersheds in different parts of the globe. It is a deterministic, physical-based, semi-distributed and continuous daily time step model. It is designed to evaluate the effects of climate variability and land management on hydrology, sediment and nonpoint source pollution in river basins (Arnold et al. 1998). SWAT model splits the watershed into multiple sub-watersheds which are further divided into smaller areas with unique LULC, topography and soil combination termed as hydrological response units (HRUs). These HRUs are created to improve calculation accuracy for best physical account of the water balance (Kushwaha and Jain 2013). The surface runoff values calculated from individual HRUs are summed to get the entire runoff value for the basin. To set up and run the SWAT model, various information of the watershed under study is needed, which include weather, soil, land use and hydrology data. The land phase of the hydrological cycle is simulated by the model using a water balance equation (5).

$${\text{SW}}_{t} = {\text{SW}}_{\text{o}} + \sum\limits_{i = 1}^{t} {\left( {R_{\text{day}} - Q_{\text{surf}} - E_{\text{a}} - W_{\text{seep}} - Q_{\text{gw}} } \right)},$$
(5)

where \({\text{SW}}_{t}\), \({\text{SW}}_{\text{o}}\), \(R_{\text{day}}\), \(Q_{\text{surf}}\), \(E_{\text{a}}\), \(W_{\text{seep}}\) and \(Q_{\text{gw}}\) represent the final soil water content (mm H2O), the initial soil water content (mm H2O), the amount of precipitation on day i (mm H2O), the amount of surface runoff on day i (mm H2O), the amount of evapotranspiration on day i (mm H2O), the amount of water entering the vadose zone from the soil profile on day i (mm H2O) and the amount of return flow on day i (mm H2O), respectively, and t is the time (days).
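The bookkeeping in Eq. 5 can be illustrated with a short sketch. It only accumulates the daily terms; in SWAT each flux is itself computed from process equations, which are not reproduced here. The daily values and the function name are hypothetical.

```python
def soil_water_series(sw_initial, daily_terms):
    """Accumulate Eq. 5 day by day.

    daily_terms is a list of tuples (R_day, Q_surf, E_a, W_seep, Q_gw),
    all in mm H2O. Returns the soil water content after each day.
    """
    soil_water = sw_initial
    series = []
    for r_day, q_surf, e_a, w_seep, q_gw in daily_terms:
        soil_water += r_day - q_surf - e_a - w_seep - q_gw
        series.append(soil_water)
    return series


# Three invented days: a moderate rain, a dry day and a heavy rain.
days = [
    (10.0, 2.0, 3.0, 1.0, 0.5),
    (0.0, 0.0, 2.5, 0.5, 0.5),
    (25.0, 8.0, 3.0, 2.0, 1.0),
]
series = soil_water_series(120.0, days)
print(series)
```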

The model runs within ArcGIS through a graphical user interface, the ArcSWAT extension. This study employed ArcSWAT 2012. For a detailed description of the SWAT model, see the SWAT theoretical documentation and online resources at http://swat-model.tamu.edu/.

2.4.2 SWAT model data inputs

To simulate the hydrological components of a watershed with the SWAT model, the necessary data required as inputs are topographic data also called digital elevation model, climatic data, soil data, LULC data and hydrological data of the river whose watershed is under study. Descriptions and how these datasets were obtained for this study are as follows.

2.4.2.1 Digital elevation model (DEM)

The first input of SWAT model is DEM, and this was obtained from the International Scientific and Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Sciences with 90 m resolution (Fig. 2c). The SWAT model used the DEM to determine the flow direction and flow accumulation, stream network generation, watershed delineation, sub-basin and HRUs set-up. Considering the topographic parameters of Ashi River watershed, the SWAT model partitioned the watershed into 27 sub-basins (Fig. 2a).

Fig. 2 Watershed delineation (a), soil map (b) and weather/hydro-gauge station locations (c) in the Ashi River watershed

2.4.2.2 Land use/land cover

The LULC maps generated from the land use/land cover assessment at the various time points were used separately to reveal the influence of LULC types on the hydrology of the watershed. The LULC types were converted into four-letter SWAT codes. The codes assigned to the LULC types are URBN (urban), WATR (water), AGRL (agriculture), CLCF (closed canopy forest), OPCF (open canopy forest) and OTVE (other vegetation).

2.4.2.3 Soil

The soil data of Ashi River watershed were acquired from Cold and Arid Regions Sciences Data Center at Lanzhou, China, with resolution of 1000 m (Fig. 2b). A soil database comprising the physical and chemical properties of soils was prepared for each layer of soil and added to the SWAT soil database to enable the integration of our soil map with the SWAT model.

2.4.2.4 Weather

Climatic data such as rainfall, temperature, wind speed, solar radiation and relative humidity from stations in and around the watershed (Fig. 2c) for the periods 1990–2014 were obtained from the official website of global weather data for SWAT model (globalweather.tamu.edu/), where climate forecast system reanalysis (CFSR) data can be downloaded.

2.4.2.5 Stream flow

The Ashi River flow monthly data, from 1996 to 2014, obtained from Acheng City Hydrological Station of Water Conservancy Department (Fig. 2c) were used for the calibration and validation of the simulated Ashi River watershed by the SWAT model.

2.4.3 Sensitivity analysis

Sensitivity analysis is the process of determining how strongly model outputs change in response to variations in the model parameters (Arnold et al. 2012); it guides model calibration and validation. Based on published protocols, this study used 20 discharge parameters (Table 1) to identify the SWAT parameters that most strongly influence stream flow. Global sensitivity analysis, in which all parameters are varied simultaneously (Abbaspour 2013), was performed in SWAT-CUP version 5.2.1. The t-stat and p value statistics from SWAT-CUP provided the measure and the significance of sensitivity, respectively: a larger absolute t-stat indicates higher sensitivity, whereas a p value closer to zero indicates greater significance (Abbaspour 2013).

Table 1 Parameters used for calibration and validation of the SWAT model

2.4.4 Parameter adjustment for calibration, validation and uncertainty analysis

Calibration of a model is the process of adjusting model parameter inputs so that simulations match observations and prediction uncertainty is reduced (Arnold et al. 2012). Before calibration, the uncalibrated model was fine-tuned to minimize the difference between observed and simulated precipitation, snowmelt and water yield. This was achieved by adjusting the temperature and precipitation lapse rates (TLAPS and PLAPS, respectively) from their defaults to nearby derived values. The simulated precipitation, snowmelt and water yield were therefore closely matched with observed values. Validation is the confirmation of the calibrated parameters by testing them against an independent set of data without altering the model parameters (Arnold et al. 2012). Nineteen years (1996–2014) of monthly flow data were used for the calibration and validation of the model. The 1996–2005 data were used for calibration and the 2006–2013 data were used for validation. The Sequential Uncertainty Fitting (SUFI-2) algorithm embedded in the SWAT-CUP program (Abbaspour 2013) was used for the calibration and uncertainty analysis. SUFI-2 was chosen because it accounts for all sources of uncertainty in the model output. It was also adopted because the weather input data were obtained from the SWAT website. The uncertainty is expressed as the 95% prediction uncertainty band, computed at the 2.5% and 97.5% levels of the output variable (Abbaspour 2013).
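How a 95% prediction uncertainty band is obtained from an ensemble of simulations can be sketched as below. The sketch assumes the band is bounded by the 2.5th and 97.5th percentiles of the simulated flows at each time step, which is what a 95% band implies, and it also reports the fraction of observations falling inside the band (SWAT-CUP reports a comparable quantity as the P-factor). The ensemble values are invented, the percentile uses simple linear interpolation, and the function names are hypothetical; this is not SWAT-CUP's own code.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def ppu_band(simulations):
    """simulations[k][t] is the flow of run k at time step t.

    Returns one (lower, upper) pair per time step.
    """
    n_steps = len(simulations[0])
    band = []
    for t in range(n_steps):
        column = [run[t] for run in simulations]
        band.append((percentile(column, 2.5), percentile(column, 97.5)))
    return band


def coverage(observed, band):
    """Fraction of observations bracketed by the band."""
    inside = sum(1 for obs, (low, high) in zip(observed, band) if low <= obs <= high)
    return inside / len(observed)


# Five invented runs over two time steps.
simulations = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
observed = [3.0, 55.0]
band = ppu_band(simulations)
print(band, coverage(observed, band))
```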

2.4.5 Evaluation of model performance

The indices used to evaluate the SWAT model performance were the Nash–Sutcliffe efficiency (NSE), percent bias (PBIAS) and coefficient of determination (R2), as recommended by Moriasi et al. (2007). The NSE value describes the accuracy of the model, whereas the R2 value describes the collinearity between the simulated and observed values. With PBIAS defined as in Eq. 7 (observed minus simulated), PBIAS > 0 indicates that the simulated values are on average smaller than the observed values (underestimation), and PBIAS < 0 indicates that the simulated values are larger than the observed values (overestimation). PBIAS = 0 indicates an optimal simulation. For a SWAT model to be acceptable, the calibration and validation must both yield R2 and NSE values greater than 0.5.

NSE is defined as:

$${\text{NSE}} = 1 - \left[ {\frac{{\sum\nolimits_{i = 1}^{n} {(Y_{i}^{\text{obs}} - Y_{i}^{\text{sim}} )^{2} } }}{{\sum\nolimits_{i = 1}^{n} {(Y_{i}^{\text{obs}} - Y^{\text{mean}} )^{2} } }}} \right]$$
(6)

PBIAS is defined as:

$${\text{PBIAS}} = \left[ {\frac{{\sum\nolimits_{i = 1}^{n} {(Y_{i}^{\text{obs}} - Y_{i}^{\text{sim}} )} \times 100}}{{\sum\nolimits_{i = 1}^{n} {\left( {Y_{i}^{\text{obs}} } \right)} }}} \right],$$
(7)

where n equals the total number of observations, \(Y_{i}^{\text{obs}}\) and \(Y_{i}^{\text{sim}}\) are the measured and simulated values, respectively, and \(Y^{\text{mean}}\) is the mean of the measured values. R2 is defined as:

$$R^{2} = \left( {\frac{{\sum\nolimits_{i = 1}^{n} {(O_{i} - \bar{O})(S_{i} - \bar{S})} }}{{\sqrt {\sum\nolimits_{i = 1}^{n} {(O_{i} - \bar{O})^{2} } \sum\nolimits_{i = 1}^{n} {(S_{i} - \bar{S})^{2} } } }}} \right)^{2},$$
(8)

where \(S_{i}\) is the i-th simulated value and \(O_{i}\) is the i-th observed value; \(\overline{S}\) is the average of all simulated values; \(\overline{O}\) is the average of all observed values; and n is the number of values.
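The three indices can be computed as in the following sketch. PBIAS follows the sign convention of Eq. 7 (observed minus simulated), so a positive value means the simulation is lower than the observations on average, and R2 is computed as the squared Pearson correlation between observed and simulated values. The flow values are invented for illustration.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency (Eq. 6)."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - s) ** 2 for o, s in zip(obs, sim))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator


def pbias(obs, sim):
    """Percent bias (Eq. 7); positive means the simulation underestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    covariance = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return covariance ** 2 / (var_o * var_s)


# Invented monthly flows (m3/s).
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [1.5, 4.0, 5.5, 8.0]
print(nse(observed, simulated), pbias(observed, simulated), r_squared(observed, simulated))
```

For these values the NSE is 0.975 and the PBIAS is 5.0%, so the run would pass the 0.5 thresholds while slightly underestimating flow.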

2.4.6 Application of the calibrated model to explore the influence of LULC change on hydrological conditions

Evaluating the influence of LULC changes on hydrological processes of a watershed is significant for water resources management (Yan et al. 2013; Gyamfi et al. 2016). Therefore, the calibrated model with the LULC maps (1990, 2000, 2010 and 2014) and the predicted LULC maps (2030W and 2030B) were used to reveal the hydrological effects of LULC variations. The fixing–changing method was employed, where the calibrated model was run for the individual LULC maps while keeping other input data constant. This method has been applied by other studies in different parts of the world (Nie et al. 2011; Wagner et al. 2013; Yan et al. 2013; Gyamfi et al. 2016). To determine the influence of the changes that occurred in the watershed LULC on the watershed hydrology, the simulated results were analyzed. This was done by comparing the hydrological components obtained at the different time points on yearly and monthly average basis. To probe further, the link between hydrological components and variations in LULC types were evaluated using the pair-wise Pearson correlation method (Yan et al. 2013). Consequently, partial least square regression (PLSR) model was applied to measure the influence of each LULC variation on the hydrological processes of the watershed.

PLSR is a robust multivariate regression approach suited to data with multi-collinearity, that is, when explanatory variables (predictors) are highly correlated (Wold et al. 2001). The approach combines features of principal component analysis and multiple regression for its predictive ability. It does so through linear combinations of the explanatory variables, called factors, latent variables or components, that have the greatest predictive power (Cox and Gaudard 2013). PLSR performs better than the ordinary least squares (OLS) approach where there are more independent variables than observations, where the independent variables are highly correlated, or where there are many predictors and many response variables (SAS Institute Inc 2017). To check for these conditions in the study data, multi-collinearity was assessed using tolerance and the variance inflation factor (VIF). The predictors were collinear, with VIF values greater than 10 and tolerance values approaching 0, so the PLSR model is suitable for determining the impact of each LULC type (Godoy et al. 2014; Li 1999) in the Ashi watershed.
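The collinearity check can be illustrated with a minimal sketch. It relies on the standard result that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix, with tolerance being its reciprocal. The study ran this test in STATA; the pure-Python code, the function names and the predictor values below are illustrative assumptions only.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long, non-constant columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    devs = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(d * d for d in dev) ** 0.5 for dev in devs]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; fails if the matrix is singular."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor of each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Two invented predictors with a correlation of 0.8.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
factors = vif([x1, x2])
tolerances = [1.0 / f for f in factors]
print(factors, tolerances)
```

Here both VIF values are about 2.78, well below the threshold of 10; predictors that are nearly linear combinations of one another would push the values far higher.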

$$Y = k_{0} + k_{1} x_{1} + k_{2} x_{2} + \cdots + k_{i} x_{i},$$
(9)

where Y = the dependent (response) variable, \(k_{0}\) = intercept, x = the predictors (independent variables from 1 to i) and k = the regression coefficients of the x variables.

PLSR techniques additionally give weight to independent variables by developing components and regression coefficients of each predictor in the greatest explanatory factors. With this, the most inducing variables for a particular response can easily be understood (Abdi 2010). To keep the number of significant components, a criterion involving a cross-validation was assessed with two main indices; R2 (goodness of fit) and Q2 (goodness of prediction). These were used to achieve the suitable number of factors in individual PLSR models. According to the literature when R2 is > 0.5 and Q2 is > 0.0975, the PLSR model portrays significance and good predictions (Trap et al. 2013). Hence, to avoid the issue of overfitting, the suitable number of factors for the individual PLSR model was determined using the above indices through cross-validation. Also, to ascertain the number of components elucidating the model, the Root Mean PRESS (predicted residual sum of squares) was employed (SAS Institute Inc. 2017). A predictor’s variable importance for the projection (VIP) measures the influence on the factors that define the model (Cox and Gaudard 2013). Therefore, the VIPs and regression coefficients (RC) were employed to ascertain the relative impact of each independent variable on the dependent variable. Hence, it was possible to detect which LULC types powerfully connect with the hydrological processes in the watershed. Independent variable with large VIP value indicates how important it is in influencing the dependent variable. A minimum acceptable VIP is 0.8 (SAS Institute Inc 2017). The RCs of the model indicate the strength and direction of the influence of each predictor in the model. A predictor with a large RC and a large VIP implies that the predictor is relevant and contributes greatly to the forecast and thus has to be kept in the model, whereas a predictor is deleted from the model, if both RC and VIP have small values (SAS Institute Inc 2017).

In this investigation, the predictors are the LULC types, while the response variables are annual stream flow, surface runoff, water yield, lateral flow, groundwater flow and evapotranspiration. However, a multi-collinearity test revealed extreme collinearity among the predictors; open forest was therefore eliminated from the predictors to obtain satisfactory regression results. Based on the associations among the response variables, four PLSR models were established: PLSR 1 for annual stream flow, PLSR 2 for surface runoff and water yield, PLSR 3 for groundwater flow and lateral flow and PLSR 4 for evapotranspiration. PLSR was performed in JMP 14.3.0, whereas STATA 15.1 was used for other statistics and multi-collinearity tests.
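The text does not state which multi-collinearity diagnostic was run in STATA. A common choice is the variance inflation factor (VIF), which equals the corresponding diagonal element of the inverse of the predictor correlation matrix. The sketch below assumes that diagnostic; the function names and the sub-basin values are hypothetical and serve only to show how dropping one LULC class lowers the VIFs of the remaining ones.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        length = math.sqrt(sum((v - mean) ** 2 for v in col))
        scaled.append([(v - mean) / length for v in col])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda k: abs(aug[k][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for k in range(n):
            if k != c:
                factor = aug[k][c]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[c])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical changes in LULC share (percentage points) for eight sub-basins.
urban = [0.5, 1.2, 2.0, 0.8, 3.1, 1.5, 2.6, 0.3]
agriculture = [1.0, 2.5, 1.8, 3.0, 2.2, 0.9, 3.5, 1.1]
close_forest = [-2.0, -3.1, -4.5, -2.9, -6.0, -1.8, -5.2, -0.9]
open_forest = [0.6, -0.4, 0.9, -0.7, 0.5, -0.5, -0.8, -0.6]

full_vif = vif([urban, agriculture, close_forest, open_forest])
reduced_vif = vif([urban, agriculture, close_forest])   # open forest removed
print("with open forest:   ", [round(v, 1) for v in full_vif])
print("without open forest:", [round(v, 1) for v in reduced_vif])
```

Removing a predictor can only lower or leave unchanged the VIFs of those that remain, which is the rationale for eliminating one class when the LULC shares are strongly interdependent.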

3 Results and discussion

3.1 LULC changes assessment

The LULC status of the Ashi watershed from 1990 to 2030 is shown quantitatively and spatially in Figs. 3 and 4, respectively. Agriculture and urban areas increased in size throughout the period 1990–2014 and are expected to continue expanding from 2014 to 2030W under the worst-case scenario, but to a lesser degree from 2014 to 2030B under the best-case scenario. Urban and agriculture areas increased from 1.8% and 43.4% in 1990 to 2.9% and 43.7% in 2000, to 5.7% and 45.9% in 2010, and to 5.9% and 47.8% in 2014, respectively. These areas are projected to reach 10.4% and 51.2% in 2030W under the worst-case scenario, but only 9.1% and 49.7% in 2030B under the best-case scenario (Fig. 3). The observed increasing trend of urban and agriculture areas reflects the impact of resettlement programs, agricultural modernization and economic liberalization reforms of the Heilongjiang Reclamation Area (HRA) in northeast China, where the Ashi watershed is located (Liu et al. 2014). This trend demonstrates that the government’s “House-hold Responsibility System” policy initiated in 1978 and the market-oriented economic system introduced in 1992 (Godoy et al. 2014) had a strong influence on LULC. The policies permitted farmers to make decisions concerning land resources and, as a result, the clearing of forest cover, including along hillsides, for cultivation was common (Yan et al. 2013). Correspondingly, close forest area decreased from 33.3% in 1990 to 26.7% in 2000 and to 9.5% in 2010, and declined further to 6.9% in 2014.
However, close forest area is expected to increase from 6.9% in 2014 to 7.3% in 2030W under the worst-case scenario. This could be attributed to the fact that, in the 1990s, the ecological functions of forest and other natural land covers were recognized nationwide; green projects such as the grain to green policy for Heilongjiang Province were therefore adopted, which lessened the rate of natural cover destruction due to human activities (Wang et al. 2009). Moreover, with conservation policies, close forest is expected to expand further to about 32% in 2030B under the best-case scenario (Fig. 3). Open forest area showed an increasing trend from 1990 to 2010 but began to decline from 2010 to 2014, and is expected to decline further from 31.4% in 2014 to 27.9% and 5.4% in 2030W and 2030B, respectively (Fig. 3).

Fig. 3 Area extent of LULC types in the Ashi River watershed from 1990 to 2030 time points

Fig. 4 LULC status in the Ashi watershed from 1990 to 2030 time points

The total rate of LULC change is presented in Fig. 5, where the total annual rate of change (km2/year) of urban area is greater than that of agriculture. This indicates that the watershed, which used to be a major agricultural zone, is gradually becoming urbanized due to rapid urban sprawl. The highest total annual change of close forest occurred in the period 2010–2014 (Fig. 5), and close forest is expected to achieve its highest gain rate during 2014–2030B. These findings are consistent with those of other studies (Gebremicael et al. 2013; Shi et al. 2013; Ottinger et al. 2013; Puertas et al. 2014; Yeboah et al. 2017). The projected LULC changes under the worst-case and best-case scenarios are also in line with Han et al. (2015).

Fig. 5 Total annual rate of LULC change (km2/year) from 1990 to 2030 in the Ashi River watershed
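For orientation, a net annual rate of change between two mapped years can be obtained by dividing the difference in area by the number of intervening years. The sketch below applies this to the urban shares quoted in Sect. 3.1 and reports the result in percentage points per year. The function name `net_annual_rates` is hypothetical, and the "total" rate in Fig. 5 may have been derived differently (for example, from gross gains and losses), so this is an assumption rather than the procedure behind the figure. Multiplying by the watershed area divided by 100 would convert the result to km2/year.

```python
def net_annual_rates(shares_by_year):
    """Return (start, end, net change per year) for consecutive mapped years."""
    years = sorted(shares_by_year)
    return [
        (a, b, (shares_by_year[b] - shares_by_year[a]) / (b - a))
        for a, b in zip(years, years[1:])
    ]

# Urban share of the watershed (%) as reported in Sect. 3.1.
urban_share = {1990: 1.8, 2000: 2.9, 2010: 5.7, 2014: 5.9}

rates = net_annual_rates(urban_share)
for start, end, rate in rates:
    print(f"{start}-{end}: {rate:+.3f} percentage points per year")
```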

3.2 LULC accuracy assessment and CA-Markov validation

The overall accuracies of all four maps, examined with ground truth points, were greater than 89%, with kappa statistics of more than 84% (Table 2). This indicates good agreement between the classifications and the ground reference information (Monserud 1990). The minimum class accuracy of about 67%, recorded for Other vegetation, was accepted because of the similarity of this class to the open forest, close forest and agriculture classes. For the CA-Markov validation, a kappa index of 0.87 was achieved, which is above 0.75. This indicates that the 2014 LULC predicted by the CA-Markov model shows a high level of agreement and that the model can reliably forecast future LULC change in the watershed under different scenarios (Wang et al. 2011).

Table 2 Classification accuracy values for LULC maps (%)
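The overall accuracy and the kappa statistic are both derived from the confusion matrix of classified versus reference points. The sketch below shows the calculation for a hypothetical three-class matrix. The counts are invented for illustration and are not the data behind Table 2, and the function names are assumptions.

```python
def overall_accuracy_and_kappa(matrix):
    """matrix[i][j]: points classified as class i whose reference class is j."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

def producers_accuracy(matrix):
    """Share of reference points of each class that were classified correctly."""
    col_totals = [sum(col) for col in zip(*matrix)]
    return [matrix[j][j] / col_totals[j] for j in range(len(matrix))]

# Hypothetical confusion matrix (rows: classified, columns: reference).
confusion = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 2, 47],
]

overall, kappa = overall_accuracy_and_kappa(confusion)
print(f"overall accuracy = {overall:.2%}, kappa = {kappa:.2f}")
print("producer's accuracy:", [f"{a:.0%}" for a in producers_accuracy(confusion)])
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected by chance alone.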

3.3 SWAT sensitivity analysis

Out of the 20 flow parameters (Table 1) selected for model calibration, 17 were identified as sensitive with respect to the output variables (Table 3). Sanadyha et al. (2014) also report that the most sensitive parameters controlling stream flow in snow-dominated areas include those governing groundwater and snow processes. Parameters related to surface runoff, such as CN2, CH-K2 and CH-N2, were also identified as sensitive (Table 3).

Table 3 Sensitive flow parameters analysis results

3.4 Calibration, validation and uncertainty of SWAT model

The comparison between observed and simulated stream discharge for the calibration (1996–2005) and validation (2006–2013) periods is shown on a monthly basis in Figs. 6 and 7. The statistical performance indicators, which show a consistent match between simulated and measured stream flow, are presented in Table 4. The R2 values for calibration and validation are greater than 0.80, indicating a very good match between simulated and observed stream flow and little error variance between the two datasets (Moriasi et al. 2007). With NSE greater than 0.80 and PBIAS within ±10%, the SWAT model shows good performance in the Ashi watershed. Despite this good performance, the PBIAS values of −6.16% and −9.12% indicate an overestimation of stream flow during the calibration and validation periods, respectively. This could be due to uncertainties associated with input data quality, the use of the SCS curve number method in the SWAT simulation, or misinterpretation of the watershed processes (Abbaspour 2013).
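The three performance indicators can be computed directly from paired observed and simulated series. The sketch below uses the definitions given by Moriasi et al. (2007), under which a negative PBIAS indicates that the model overestimates on average. The function names and the short flow series are hypothetical and are not the Ashi discharge record.

```python
import math

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - error / spread

def pbias(observed, simulated):
    """Percent bias; negative values mean the simulation overestimates."""
    return 100 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)

def r_squared(observed, simulated):
    """Square of the Pearson correlation between the two series."""
    n = len(observed)
    mo, ms = sum(observed) / n, sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_s = sum((s - ms) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)

# Hypothetical monthly flows (m3/s); the simulation is 10% too high throughout.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [11.0, 22.0, 33.0, 44.0]

print(f"NSE = {nse(obs, sim):.2f}")
print(f"PBIAS = {pbias(obs, sim):.1f}%")
print(f"R2 = {r_squared(obs, sim):.2f}")
```

In this example R2 is 1 even though every simulated value is too high, which illustrates why R2 is reported together with NSE and PBIAS rather than on its own.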

Fig. 6 Monthly average stream flow: (a) calibration and (b) validation of the SWAT model

Fig. 7 Scatter plot of observed and simulated monthly mean flow (m3/s) in (a) calibration and (b) validation periods

Table 4 Calibration and validation model performance values

3.5 Influences of LULC changes on hydrological components at watershed scale

The calibrated SWAT model was applied to ascertain the hydrological responses under six different LULC distributions of the watershed at different time points. The average annual values and percentage changes of stream flow and five hydrological components (surface runoff, water yield, groundwater flow, lateral flow and ET) are shown in Table 5. In line with the upward trend of urban and agriculture areas and the reduction of close forest cover (Fig. 3), the annual stream flow of the watershed increased from 12.665 mm in 1990 to 13.873 mm in 2000, to 14.005 mm in 2010 and to 14.123 mm in 2014. The results also indicate that annual stream flow will continue to increase from 14.123 mm in 2014 to 14.204 mm in 2030W, which is associated with the expected increase in urban and agriculture areas and reduction in close forest cover in 2030W (Fig. 3). In contrast, with the smaller urban and agriculture areas (relative to 2030W) and the expansion in forest cover expected in 2030B, stream flow increases from 14.123 mm in 2014 to 14.167 mm in 2030B, which is less than the increment under 2030W.

Table 5 Annual average water balance components (1990–2014) at different time points in the Ashi watershed and percentage change between time points
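The percentage changes between time points follow from the annual values by simple arithmetic. As a worked check, the sketch below recomputes them for the stream flow values quoted in the text; both scenarios are compared against the 2014 baseline. The variable and function names are illustrative.

```python
# Annual stream flow (mm) at each time point, as quoted in Sect. 3.5.
stream_flow = {
    "1990": 12.665, "2000": 13.873, "2010": 14.005,
    "2014": 14.123, "2030W": 14.204, "2030B": 14.167,
}

# Historical steps are consecutive; both scenarios start from 2014.
comparisons = [
    ("1990", "2000"), ("2000", "2010"), ("2010", "2014"),
    ("2014", "2030W"), ("2014", "2030B"),
]

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old

changes = {
    f"{a}->{b}": percent_change(stream_flow[a], stream_flow[b])
    for a, b in comparisons
}
for label, value in changes.items():
    print(f"{label}: {value:+.2f}%")
```

The two scenario values round to 0.6% (2030W) and 0.3% (2030B), in agreement with the increments reported in the text.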

The increasing trend of stream flow is due to the growth of impervious surfaces in the watershed as a result of rapid urban sprawl and the expansion of agriculture areas. This is in agreement with other studies (Gebremicael et al. 2013; Gwate et al. 2015; Welde and Gebremariam 2017; Choto and Fetene 2019). For future stream flow, the results indicate an increase under both the worst-case and best-case scenarios, with a smaller increase under the best-case scenario. This is contrary to the findings of Shrestha et al. (2018), who found that the flow rate is expected to decrease under the combined impact of LULC and climate change in both economic and conservation scenarios. The difference could be attributed to the fact that they investigated the combined effect of LULC and climate change on future stream flow, while this study investigated only the impact of LULC. It could also be due to the different geographical locations of the study sites.

Furthermore, from 1990 to 2000, the increase in urban, agriculture, open forest and other vegetation areas and the decrease in close forest resulted in increments of more than 8% in annual stream flow, surface runoff and water yield, with surface runoff showing the highest increment of more than 80%, while lateral flow, groundwater and ET recorded reductions of 66%, 28% and 1%, respectively. Continued reduction of close forest cover in favor of agriculture and urban areas further increased annual stream flow, surface runoff, water yield and lateral flow by 2010, with lateral flow showing the highest increment of more than 5%, while groundwater and ET recorded further reductions of less than 1% over the same period. In 2014, the same trend of change in urban, agriculture and close forest resulted in increments of less than 1% in stream flow, surface runoff, water yield and groundwater, while lateral flow and ET decreased by 5.9% and 0.3%, respectively (Table 5). With the expected LULC status in 2030W, stream flow, surface runoff, water yield and lateral flow are expected to increase by 0.6%, 3.6%, 0.7% and 1.6%, respectively, while groundwater is expected to decrease by a further 3.8%. In 2030B, on the other hand, the increments in stream flow, surface runoff and water yield are expected to be smaller, at 0.3%, 2.6% and 0.3%, compared with 0.6%, 3.6% and 0.7% in 2030W, respectively. Lateral flow shows the opposite behavior, whereas groundwater and ET behave in the same way as stream flow, surface runoff and water yield.

The influence of LULC changes on stream flow, surface runoff, water yield, groundwater flow, lateral flow and ET was also evaluated based on monthly mean values (Fig. 8). Over the 1990 to 2030 period, surface runoff increased in June–August, water yield in June–September and ET in May–September, while all three decreased in all other months. In addition, surface runoff and water yield recorded high values in March throughout the period (Fig. 8). The reduction in these hydrological components in the months outside June to September is due to the severe and long winters in the study area. Although the monthly values of lateral flow are not high relative to the other components, its monthly dynamics follow the same trend, with the highest monthly values in 1990. The findings are similar to those of other studies (Gebremicael et al. 2013; Gyamfi et al. 2016), which concluded that the monthly values of these hydrological processes were related to changes in urban and agricultural areas. Karamage et al. (2017) attributed the rise in surface runoff to the expansion of agriculture areas and urban sprawl at the expense of forest cover. Nie et al. (2011) also concluded that the general increase in surface runoff was due to the increasing trend of urban and agriculture areas and the decreasing trend of grassland.

Fig. 8 Monthly average response of stream flow, surface runoff, water yield, lateral flow, groundwater flow and evapotranspiration to the different LULC conditions at various time points in the Ashi watershed

3.6 Changes in LULC and its simulated hydrological processes at sub-watershed scale

The spatial distribution of variations in the five LULC types and their corresponding hydrological processes at the sub-basin scale from 1990 to 2014 is portrayed in Figs. 9 and 10. Urban area is concentrated in the northwestern part, where Harbin City, the capital of Heilongjiang Province, is located, and extends to the central part of the watershed as well as toward the south. This is mainly due to urban sprawl and the settlement of farmers resulting from the government’s economic reform and opening-up policy. Built-up areas have thus taken over areas of close forest, open forest and other vegetation (Fig. 9), as attested by the negative correlation between urban and close forest, open forest and other vegetation (Table 6). The growth in agricultural activities in the watershed also led farmers to put up buildings, contributing to the emergence of urban areas. Urban and agriculture show similar spatial patterns (Fig. 9), leading to a high positive correlation between them (Table 6). These two LULC types are therefore expanding at the expense of close forest, open forest and other vegetation, as substantiated by their negative correlations with these three classes.

Fig. 9 Spatial distribution of changes in LULC scenarios at sub-basinal scale between 1990 and 2014 in the Ashi River Basin

Fig. 10 Spatial distribution of changes in hydrological processes at sub-basinal scale between 1990 and 2014 scenarios in the Ashi River Basin

Table 6 Pair-wise Pearson correlation for changes in five LULC types and water balance components between 1990 and 2030
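For reference, the pair-wise Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below applies it to the watershed-scale area shares quoted in Sect. 3.1 rather than to the sub-basin changes underlying Table 6, so only the signs, not the magnitudes, are comparable with the table. The function name `pearson` is illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Area shares (%) for 1990, 2000, 2010, 2014, 2030W and 2030B from Sect. 3.1
# (close forest in 2030B is given in the text as "about 32%").
urban = [1.8, 2.9, 5.7, 5.9, 10.4, 9.1]
agriculture = [43.4, 43.7, 45.9, 47.8, 51.2, 49.7]
close_forest = [33.3, 26.7, 9.5, 6.9, 7.3, 32.0]

r_urban_agriculture = pearson(urban, agriculture)
r_urban_forest = pearson(urban, close_forest)
print(f"urban vs agriculture:  r = {r_urban_agriculture:+.2f}")
print(f"urban vs close forest: r = {r_urban_forest:+.2f}")
```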

Close forest has decreased in almost all the sub-basins, but the most severe reduction was observed in the northwest and east sections of the basin, extending toward the central and southeastern parts. The reduction in close forest (Fig. 9) reveals that when human activities disturb close forest, it gives way to open forest and other vegetation. This is confirmed by the negative correlation of close forest with open forest and other vegetation, the correlation with open forest being particularly strong (Table 6).

The corresponding hydrological processes of these LULC types at the sub-basin level are shown in Fig. 10. Although surface runoff has been observed in the northwestern and central western parts of the watershed, the increase in both surface runoff and water yield was largely observed in the central, east, south and southeastern sections. This agrees with the spatial distribution of the expansion in urban and agriculture areas, which is supported by the positive correlation between surface runoff and water yield and these two LULC types. This connection could be ascribed to the impervious nature of these LULC types and the impermeability of their topsoil. However, the high surface runoff and water yield in some sections of the watershed could be due to the interaction of slope and close forest with these two hydrological components. The slopes in the east, southeast and southern parts of the watershed, where forest has been disturbed, are very steep. When long and steep slopes are coupled with a decrease in close forest, surface runoff increases and consequently increases water yield (Akbarimehr and Naghdi 2012). This is supported by the negative correlation between surface runoff, water yield and close forest (Table 6) and by the highland nature of the sub-basins where these two hydrological components are high, as shown by the watershed DEM (Fig. 2c).

The interaction between ET and the watershed LULC types is evident in Fig. 10. ET decreases in all the sub-basins except in the southern part of the watershed, where the Xiquanyan Reservoir is located (Fig. 10). This is supported by its negative association with urban, agriculture, open forest and other vegetation areas and its positive link with close forest areas. This is understandable because ET relies mostly on tree transpiration and water bodies as well as photosynthetic processes. The spatial trend of the substantial reduction in ET matches the growth in urban, agriculture, open forest and other vegetation areas (Fig. 9), which explains why ET is highly negatively correlated with urban and agriculture (Table 6). Lateral flow and baseflow exhibited almost similar spatial distributions, which resemble those of urban, agriculture, open forest and other vegetation, with which they are negatively correlated, while being positively correlated with close forest.

3.7 Influences of individual LULC changes on hydrological components

To assess the influence of each LULC type on hydrological processes in the watershed, the LULC types were regressed against the hydrological components. Four PLSR models were developed separately and the results are summarized in Table 7. The attained R2 and cumulative Q2 values are above 0.5 and 0.0975, respectively, in all four PLSR models (PLSR 1, PLSR 2, PLSR 3 and PLSR 4), indicating satisfactory predictive ability (Table 7). In each model, the minimum Root Mean PRESS was achieved with one factor explaining the variability in the response variables. Adding further factors neither significantly improved the predictive ability of the models nor increased the percent variation explained by the independent variables (Table 7). Instead, prediction errors increased when further factors were extracted, which suggests that these factors do not correlate strongly with the residuals of the predicted variables.

Table 7 PLSR analysis results of the hydrological components in the Ashi River watershed
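The idea of selecting the number of factors by cross-validated prediction error can be sketched with leave-one-out cross-validation of a single-response PLSR. The code below is an illustration, not the JMP procedure: the function names are hypothetical, predictors are only mean-centred, Q2 is taken as 1 − PRESS/SS of the centred response, and the root mean PRESS is the plain square root of PRESS/n, which may differ from the scaled statistic reported by JMP. The watershed-scale inputs are used only as sample data.

```python
import math

def fit_pls1(rows, y, n_components):
    """Fit PLS1 (NIPALS) on mean-centred data; returns what predict() needs."""
    n, p = len(rows), len(rows[0])
    x_mean = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    X = [[r[j] - x_mean[j] for j in range(p)] for r in rows]
    res = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        w = [sum(X[i][j] * res[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break                                   # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(res[i] * t[i] for i in range(n)) / tt
        X = [[X[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        res = [res[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def predict(model, row):
    """Predict one response by passing a new row through each component."""
    x_mean, y_mean, comps = model
    x = [v - m for v, m in zip(row, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(a * b for a, b in zip(x, w))
        y_hat += q * t
        x = [a - t * b for a, b in zip(x, load)]    # deflate the new row
    return y_hat

def loo_q2(rows, y, n_components):
    """Leave-one-out Q2 and root mean PRESS for a given number of components."""
    press = 0.0
    for i in range(len(rows)):
        model = fit_pls1(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - predict(model, rows[i])) ** 2
    y_mean = sum(y) / len(y)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1 - press / ss, math.sqrt(press / len(y))

# Sample data: [urban, agriculture, close forest] shares (%) and stream flow (mm)
# for 1990, 2000, 2010, 2014, 2030W and 2030B, as quoted in Sect. 3.
rows = [[1.8, 43.4, 33.3], [2.9, 43.7, 26.7], [5.7, 45.9, 9.5],
        [5.9, 47.8, 6.9], [10.4, 51.2, 7.3], [9.1, 49.7, 32.0]]
flow = [12.665, 13.873, 14.005, 14.123, 14.204, 14.167]

for k in (1, 2):
    q2, rm_press = loo_q2(rows, flow, k)
    print(f"{k} component(s): Q2 = {q2:.3f}, root mean PRESS = {rm_press:.3f}")
```

The number of components with the smallest cross-validated error would be retained; adding components that raise the PRESS signals overfitting.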

For the PLSR 1 model, one factor explained 65.4% of the variability in stream flow. Extracting more factors did not improve the prediction or the variation explained by the predictors (Table 7). PLSR 1 is dominated by urban and agriculture land in the positive direction and close forest in the negative direction. Other vegetation had a low weight and thus low importance in influencing the hydrological components. A similar result was reported in a study in China, where grassland had low importance in influencing stream flow (Shi et al. 2013). The RCs also indicated that urban and agriculture influence stream flow positively, whereas close forest and other vegetation have a negative influence (Table 8). The VIP values of urban, agriculture and close forest are 1.2, 1.2 and 0.9, respectively, indicating a highly significant contribution to the model (Table 9). For the PLSR 2 model, the single factor that fits the model explained 62.3% of the variation in surface runoff and water yield. Again, adding other factors did not enhance the predictive ability of PLSR 2 or the percent variation explained by the independent variables (Table 7). Urban and agriculture had a positive influence on surface runoff and water yield, while close forest had a negative effect, with relatively high significance (VIP > 0.8) in PLSR 2 (Table 9). The model RCs showed a similar direction of impact (Table 8). In the same vein, the single factor in PLSR 3 explained 49.9% of the variability in lateral flow and groundwater. Including other factors did not significantly improve the predictive capability or the explained variance (Table 7). Urban, agriculture and close forest are the most important predictors in the model, with VIP values of 1.27, 1.21 and 0.81, respectively (Table 9). The model revealed that urban and other vegetation had a negative influence on lateral and groundwater flow, while agriculture and close forest had a positive influence on these hydrological components (Table 8).
The model constructed for ET (PLSR 4) revealed that 81.5% of the variability in ET was explained by one factor. In this model, the VIPs for urban, agriculture and close forest were 1.24, 1.25 and 0.81, respectively (Table 9).

Table 8 Regression coefficients of PLSR models indicating the impact of each LULC type on hydrological components and the direction of influence (− and + signs)
Table 9 Variable importance of the projection values (VIP) and PLSR weights of hydrological components in the watershed

Considering the effect of each LULC type on the hydrological components, change in urban area affected stream flow, surface runoff and water yield positively but influenced lateral flow, groundwater flow and evapotranspiration negatively under the four constructed PLSR models. This implies that urban sprawl in the watershed has increased stream flow, surface runoff and water yield while decreasing lateral flow, groundwater flow and evapotranspiration. Similar findings were reported by Nie et al. (2011), Gyamfi et al. (2016), Woldesenbet et al. (2017), Gashaw et al. (2018) and Li et al. (2019). Also, according to the four models, agriculture impacted positively on stream flow, lateral flow, groundwater flow and evapotranspiration but influenced surface runoff and water yield negatively. This indicates that as agriculture area expands, it increases stream flow, lateral flow, groundwater flow and evapotranspiration in the watershed. The increase in lateral and groundwater flow reflects the irrigated agricultural activities in the watershed, because irrigated farming using surface water sources increases these components (Foster et al. 2018). The findings of this study therefore demonstrate that the increasing irrigated agriculture in the Ashi watershed has led to an increase in lateral and groundwater flow. Moreover, the increase in plant cover through agricultural activities has resulted in an increase in evapotranspiration, since ET depends on plants and trees for transpiration. Change in close forest affected stream flow, surface runoff and water yield negatively, but positively affected lateral flow, groundwater flow and evapotranspiration in all four models. This is in line with findings reported by Monserud (1990), Gebremicael et al. (2013) and Puertas et al. (2014).

The socioeconomic development along the midstream and downstream areas of the Ashi River has adversely affected the water quality of the river through domestic sewage and industrial wastewater as well as water released from farmlands (Ma et al. 2015b; Yu et al. 2015). The increasing trend of stream flow, surface runoff and water yield resulting from LULC changes, especially urban growth, as observed in this study, could cause major water quality degradation.

4 Conclusion

The application of a process-based hydrological model (SWAT) together with statistical models in this study has demonstrated the influence of LULC changes on the hydrological processes of the Ashi watershed at different points in time. The expansion of urban and agriculture areas coupled with the reduction in close forest from 1990 to 2014 has increased annual stream flow, surface runoff and water yield. In contrast, evapotranspiration has decreased, while the trends of lateral and groundwater flow have not been stable since 2000. The major LULC changes that influence the hydrological components in the watershed were those in urban, agriculture and close forest, though they contribute differently to the changes in these components. The four models yielded high VIP values for urban and agriculture (VIP > 1) and for close forest (VIP > 0.8). This establishes that urban, agriculture and close forest were the main determinants of the hydrological processes in the watershed and are therefore the chief environmental stressors in the Ashi watershed.

The LULC status under both the worst-case and best-case scenarios is expected to further increase stream flow, surface runoff, water yield, lateral flow and evapotranspiration but reduce groundwater flow. Nevertheless, the magnitude of the increment under the worst-case scenario is greater than under the best-case scenario. The expected increase in surface runoff may increase soil erosion and sedimentation, thereby aggravating the current water quality problem of the Ashi River. This should therefore prompt decision makers to put more emphasis on best management practices in the watershed to limit the expected increase in stream flow, surface runoff and water yield.

The research findings further suggest that the adoption and enforcement of sustainable land use management practices should be encouraged in China’s sustainable development policy, especially at the local level. Doing so could promote a balance between human land use and hydrological components and secure water availability for present and future generations in China.