Introduction

Earth’s climate has changed throughout history; however, the rate of change over the last few decades is alarming. The increase in anthropogenic greenhouse gas emissions is the primary cause of global warming (IPCC, 2014). Rising temperature increases evaporation and atmospheric moisture content, which changes precipitation patterns and intensity. Increased temperature and varying precipitation have a substantial effect on the hydrological cycle, which in turn affects water availability on local, regional, and global scales (Oo et al., 2019). The Hindu Kush Himalaya (HKH) region, also known as “the Third Pole,” contains the largest reserve of snow outside the polar regions and is currently threatened by climate change (Singh et al., 2011). The snow-fed rivers originating in the HKH, which supply food, water, and hydroelectricity to the billions of people living downstream, are also significantly affected. Increased runoff in these rivers due to high precipitation and rapid ice/snow depletion during the monsoon has triggered mountain hazards, such as floods and landslides, in recent years (Ahluwalia et al., 2016; Dimri et al., 2016). On the other hand, reduced runoff during dry seasons may cause water scarcity for household, industrial, hydroelectric, and agricultural irrigation purposes (Jain et al., 2010). Despite the region's significance, few studies have been conducted in the HKH compared with other regions. Therefore, hydrological modeling and climate change impact assessment are necessary for sustainable watershed management and for preparing adaptation policies to cope with future climate change impacts on water resources in the HKH region.

In the HKH region, the impact of climate change is prominent. Even if global warming is restricted to 1.5 °C by the end of this century, the average temperature in the HKH region is expected to be 0.3 °C higher (Wester et al., 2019). The rise in temperature may cause significant glacier retreat (Shea et al., 2015) and snow/ice depletion (Thapa et al., 2020a), affecting biodiversity and the overall cryospheric environment. Various researchers have attempted to assess climate change’s impact on hydrological regimes in the HKH region on the local and regional scale (Bajracharya et al., 2018; Immerzeel et al., 2012, 2013; Lutz et al., 2014; Shrestha et al., 2017). Most of these studies projected an increase in discharge in the future, while some studies (e.g., Bhatta et al., 2019) projected a decrease in the future discharge due to climate change. Though the effect of climate change is evident in the HKH region, the direction and magnitude of change are not uniform throughout the region (Pandey et al., 2020). Hence, for effective water resource management, it is pertinent to investigate climate change’s impact on future water availability at the basin or sub-basin level.

Most studies on hydrological modeling have employed either conceptual degree-day models or physical energy-balance models (Immerzeel et al., 2013; Shrestha et al., 2017; Singh & Saravanan, 2020). Both physically based and conceptual models need detailed knowledge of hydrological processes and catchment properties. Unlike these conventional models, data-driven (DD) models, such as machine learning (ML) models, can mimic complex nonlinear systems by learning the association between input and output even without understanding the physical processes involved (ASCE, 2000). In recent years, due to innovations in computer technology, particularly graphics processing units (GPUs), and the availability of remote sensing data, DD models have become a popular modeling tool worldwide (Ateeq-ur-Rauf et al., 2018; Fenu & Malloci, 2020; Thapa et al., 2020b; Uysal et al., 2016; Yang et al., 2020; Yazdani & Rassafi, 2019; Zeydalinejad et al., 2020). Many studies have reported the superiority of ML models over conventional hydrological models, including SRM (Uysal et al., 2016), SWAT (Pradhan et al., 2020), WEAP, and GR2M (Farfán et al., 2020). The results of these studies indicate that artificial neural network (ANN) models are suitable alternatives to conventional models for hydrological modeling. Although traditional ANNs, such as feedforward neural networks, were extensively used in the past (ASCE, 2000), they cannot retain time-dependent information. Recurrent neural networks (RNNs) were developed to overcome this shortcoming. Though RNNs are better than simple ANNs for time series problems, such as hydrological modeling, they are affected by vanishing gradient issues (Hochreiter & Schmidhuber, 1997). Deep learning (DL)-based networks can overcome the challenges encountered by simple RNNs and have demonstrated superior performance over traditional ML models in various research areas (Thapa et al., 2020b; Zhang et al., 2018).
However, to the best of the authors’ knowledge, no study has assessed the impact of climate change on streamflow in snow-dominated basins using the DL approach. In this study, two state-of-the-art DL models, namely, the gated recurrent unit (GRU) and long short-term memory (LSTM), are employed to assess the impact of climate change on river discharge in the Langtang basin, Nepal.

A previous study by Immerzeel et al. (2013) reported that streamflow in the Langtang basin will increase in the future and that no shift in the hydrograph will occur. In contrast, Pradhananga et al. (2014) stated that river discharge will not show any significant trend in the future; moreover, they predicted a shift in the hydrograph. This inconsistency might be due to different modeling approaches and/or data sources: Immerzeel et al. (2013) developed a cryospheric hydrological model, whereas Pradhananga et al. (2014) applied a positive degree-day model for snowmelt runoff modeling. Instead of conventional hydrological models, in this study, we propose DL-based models for snowmelt runoff modeling in the Langtang basin. The outcomes of hydrological modeling and climate change studies rely heavily on the accuracy of the climate data (Lutz et al., 2016; Thanh, 2019). Therefore, the selection of the data source, i.e., global and regional climate models (GCMs/RCMs), is crucial for climate change impact studies. However, previous studies in the basin used climate models without any prior assessment of their suitability for the region. Lutz et al. (2016) analyzed several climate models based on an envelope approach and past performance, and their results suggested that two climate models, namely, CMCC-CMS and inmcm4, were suitable for the HKH region under representative concentration pathways (RCPs) 4.5 and 8.5. Therefore, in this study, the future climate of the basin was projected based on outputs from the CMCC-CMS and inmcm4 climate models.

This research aims to (1) investigate the future climatic condition of Langtang basin, Nepal using outputs of climate models; (2) develop three ML models, including LSTM, GRU, and RNN, for snowmelt runoff prediction and compare their performance; and (3) quantify the climate change impact on snowmelt runoff by DL approach. Gamma test (GT) was applied to determine the suitable input for the models. This approach is potentially applicable to any mountainous basin with adequate historical streamflow data. This paper is organized into six sections. Section 1, the introduction, highlights the significance of the research. Section 2, study area, describes the study area. Section 3, materials and method, describes the data collection and research methodology. Section 4, results, presents the significant results of the study. Section 5, discussion, presents the interpretation of the results. The final section, the conclusion, summarizes the implications of the findings.

Study area

The study area, the Langtang basin (Fig. 1), is located in the Central Himalayas, around 60 km north of Kathmandu, Nepal. It is a typical snow-dominated Himalayan basin. Due to its ease of access compared to other Himalayan catchments and its data availability, this site is a suitable location for snow-related water resource and climate change impact studies. The total area of the basin is 354 km², including 110 km² of glacier area. The glacier area is obtained from the RGI-GLIMS version 6.0 dataset (RGI Consortium, 2017). The altitude of the catchment ranges from 3647 to 7213 m above sea level. In the Langtang catchment, the major contributor to the total runoff is snowmelt, followed by rainfall and ice melt (Ragettli et al., 2015).

Fig. 1 Location of the Langtang basin

Materials and method

The local hydrometeorological data for the reference period (2002–2010) is provided by the Department of Hydrology and Meteorology (DHM), Nepal. The future climate data from RCMs is obtained from the International Centre for Integrated Mountain Development (ICIMOD) regional database system. The digital elevation model (DEM) of spatial resolution 30 m × 30 m from ASTER is used to delineate the basin boundary. Moderate Resolution Imaging Spectroradiometer (MODIS) images were processed to extract the snow-covered area (SCA) of the basin. The methodology adopted in this research is shown in Fig. 2. A brief description of datasets and methods is presented below.

Fig. 2 Research methodology

Hydrometeorological data

The daily mean temperature, precipitation, and streamflow data were provided by the DHM, Nepal for the reference period (2002–2010). In this study, streamflow data from the Kyangjing hydrological station at latitude 28.22°, longitude 85.55°, and elevation 3800 m above sea level and meteorological data from the Kyangjing meteorological station at latitude 28.22°, longitude 85.61°, and elevation 3920 m above sea level were used.

Snow cover

We used the MOD10A2 8-day maximum snow extent product (Hall & Riggs, 2016) for snow cover mapping. Various scholars have validated the MODIS dataset against ground observations (Hall & Riggs, 2007; Stigter et al., 2017). In a recent study, Stigter et al. (2017) validated MODIS images against snow observations in Langtang with an accuracy of 83%. For this study, 406 MODIS images were downloaded and processed to obtain the SCA of the catchment. The MODIS images were projected to World Geodetic System 1984, Universal Transverse Mercator Zone 45. The snow cover for the delineated basin was calculated from the MOD10A2 datasets. Snow cover images exceeding 10% cloud cover were removed. The resulting 8-day snow-covered area was interpolated to a daily scale by the cubic spline method. MOD10A2 snow products can be downloaded from https://nsidc.org/data/mod10a2.
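The 8-day to daily interpolation step can be illustrated with a small, self-contained sketch. It assumes a natural cubic spline (zero second derivative at both ends); the source does not state which boundary condition was used, and the SCA values below are hypothetical.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[0..n]; m[0] = m[n] = 0 (assumption).
    a = [0.0] * (n + 1)  # sub-diagonal
    b = [1.0] * (n + 1)  # main diagonal
    c = [0.0] * (n + 1)  # super-diagonal
    d = [0.0] * (n + 1)  # right-hand side
    for i in range(1, n):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * (n + 1)
    m[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]

    def evaluate(x):
        # Locate the interval [xs[i], xs[i+1]] that contains x.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)

    return evaluate

# Hypothetical 8-day composites: day offsets and basin SCA in percent.
composite_day = [0, 8, 16, 24]
composite_sca = [40.0, 46.0, 43.0, 38.0]

spline = natural_cubic_spline(composite_day, composite_sca)
daily_sca = [spline(day) for day in range(0, 25)]
```

The spline passes exactly through each 8-day value. Because a cubic spline can overshoot between knots, clipping the daily series to the physically valid range may be worth considering in practice.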

Future climate data

The Intergovernmental Panel on Climate Change has described four representative concentration pathways (RCPs), namely, RCP 2.6, RCP 4.5, RCP 6.0, and RCP 8.5, as a basis for climate modeling experiments (van Vuuren et al., 2011). The bias-corrected future daily mean temperature and precipitation data at a spatial resolution of 10 km × 10 km from two RCMs, inmcm4_r1i1p1 and CMCC-CMS_r1i1p1, for RCP4.5 and RCP8.5, were obtained from the ICIMOD regional database system (ICIMOD, 2016). Out of several climate models, these models were selected based on past performance and an envelope-based approach as described in Lutz et al. (2016). In this study, future time periods are divided as follows: 2021–2030 (2020s), 2041–2050 (2040s), 2071–2080 (2070s), and 2091–2100 (2090s).

RNN

RNNs are an extension of traditional ANNs. Unlike simple ANNs, they can retain temporal information and are therefore suitable for sequential data such as hydrological time series. A simple unfolded recurrent neural network is shown in Fig. 3. Various studies have reported better performance of RNNs over ANN models (Nagesh Kumar et al., 2004). The architecture of the RNN cell is shown in Fig. 4a.

Fig. 3 An unfolded recurrent neural network

Fig. 4 The architecture of ML models (a) RNN cell, (b) GRU cell, and (c) LSTM cell, where X is the input vector, h is the hidden state, tanh denotes the hyperbolic tangent function, σ denotes the sigmoid function, r denotes the reset gate, z denotes the update gate, i denotes the input gate, f denotes the forget gate, o denotes the output gate, c denotes the cell state, and t stands for the time step

GRU

GRU is a special kind of RNN, which can retain temporal information as well as address exploding and vanishing gradient problems (Cho et al., 2014). GRU has two types of gates that control adding or removing the information in the cell. The update gate regulates the amount of information to pass to the future. The reset gate decides the amount of previous information to forget. The equations related to GRU are as follows:

$$\mathrm{Update\ gate}:{z}_{t}=\sigma \left({W}_{z}{x}_{t}+{U}_{z}{h}_{t-1}+{b}_{z}\right)$$
(1)
$$\mathrm{Reset\ gate}:{r}_{t}=\sigma \left({W}_{r}{x}_{t}+{U}_{r}{h}_{t-1}+{b}_{r}\right)$$
(2)
$${h}_{t}=\left(1-{z}_{t}\right)\odot {h}_{t-1}+{z}_{t}\odot {\tilde{h}}_{t}$$
(3)
$${\tilde{h}}_{t}=\tanh \left({W}_{h}{x}_{t}+{U}_{h}\left({r}_{t}\odot {h}_{t-1}\right)+{b}_{h}\right)$$
(4)

where \(z\) and \(r\) are the vectors representing the update gate and reset gate, respectively, and \({h}_{t}\) and \({\tilde{h}}_{t}\) are the vectors for the hidden state and candidate values, respectively. A typical GRU cell is shown in Fig. 4b.
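To make the gate arithmetic concrete, the following is a minimal pure-Python sketch of a single GRU time step that follows Eqs. 1 to 4 term by term. It is illustrative only and is not the implementation used in the study; the parameter dictionary layout and the all-zero weights are assumptions chosen so that the result is easy to check by hand.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def gru_step(x_t, h_prev, p):
    """One GRU time step. p maps 'Wz', 'Uz', 'bz', 'Wr', ... to weights and biases."""
    def layer(name, h_in, activation):
        pre = [wx + uh + b for wx, uh, b in
               zip(matvec(p["W" + name], x_t), matvec(p["U" + name], h_in), p["b" + name])]
        return [activation(v) for v in pre]

    z = layer("z", h_prev, sigmoid)                    # update gate, Eq. 1
    r = layer("r", h_prev, sigmoid)                    # reset gate, Eq. 2
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]   # r_t (elementwise) h_{t-1}
    h_cand = layer("h", reset_h, math.tanh)            # candidate state, Eq. 4
    # New hidden state, Eq. 3: blend of previous state and candidate.
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]

# Hypothetical sizes: 3 inputs (e.g. P, T, SCA) and 2 hidden units, all-zero parameters.
n_in, n_hid = 3, 2
params = {}
for gate in "zrh":
    params["W" + gate] = [[0.0] * n_in for _ in range(n_hid)]
    params["U" + gate] = [[0.0] * n_hid for _ in range(n_hid)]
    params["b" + gate] = [0.0] * n_hid

h_next = gru_step([0.5, 1.2, 0.3], [1.0, -2.0], params)
```

With all-zero parameters, both gates equal 0.5 and the candidate state is zero, so Eq. 3 simply halves the previous hidden state.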

LSTM

Though LSTM was first proposed in the 1990s (Hochreiter & Schmidhuber, 1997), its true potential has been recognized only recently. The LSTM is a specialized RNN capable of preserving long-term dependencies (Kratzert et al., 2018). The presence of a memory cell distinguishes LSTM from other RNNs. LSTM has three gates, i.e., the forget, input, and output gates, to control the information flow in the cell. The forget gate governs the information to be removed from the previous cell state. The input gate controls the information to be introduced into the cell. The output gate controls how much of the cell state is passed on to the hidden state. Using the notations from Thapa et al. (2020b), equations related to the LSTM are presented below.

$$\mathrm{Forget\ gate}:{f}_{t}=\sigma \left({W}_{f}{x}_{t}+{U}_{f}{h}_{t-1}+{b}_{f}\right)$$
(5)
$$\mathrm{Input\ gate}:{i}_{t}=\sigma \left({W}_{i}{x}_{t}+{U}_{i}{h}_{t-1}+{b}_{i}\right)$$
(6)
$$\mathrm{Potential\ update\ vector}:{\tilde{c}}_{t}=\tanh \left({W}_{\tilde{c}}{x}_{t}+{U}_{\tilde{c}}{h}_{t-1}+{b}_{\tilde{c}}\right)$$
(7)
$$\mathrm{Cell\ state}:{c}_{t}={f}_{t}\odot {c}_{t-1}+{i}_{t}\odot {\tilde{c}}_{t}$$
(8)
$$\mathrm{Output\ gate}:{o}_{t}=\sigma \left({W}_{o}{x}_{t}+{U}_{o}{h}_{t-1}+{b}_{o}\right)$$
(9)
$$\mathrm{Hidden\ state}:{h}_{t}=\tanh \left({c}_{t}\right)\odot {o}_{t}$$
(10)
$$\mathrm{Output\ layer}:y={W}_{d}{h}_{n}+{b}_{d}$$
(11)
$$\mathrm{Sigmoid\ function}:\sigma \left(x\right)=\frac{1}{1+{e}^{-x}}$$
(12)
$$\mathrm{Tanh\ function}:\tanh \left(x\right)=\frac{{e}^{x}-{e}^{-x}}{{e}^{x}+{e}^{-x}}$$
(13)
(13)

where i, f, and o are the vectors for the activation values of the input, forget, and output gates, and \({c}_{t}\) and \({\tilde{c}}_{t}\) are the vectors for the cell state and candidate values, respectively. Notations used in the equations are described in Table 1. A typical LSTM architecture is presented in Fig. 4c.
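As with the GRU, a single LSTM time step can be written out directly from Eqs. 5 to 10. The sketch below is a pure-Python illustration under assumed names (the parameter dictionary and its keys are hypothetical) and is not the code used in the study.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step. p maps 'Wf', 'Uf', 'bf', 'Wi', ... to weights and biases."""
    def layer(name, activation):
        pre = [wx + uh + b for wx, uh, b in
               zip(matvec(p["W" + name], x_t), matvec(p["U" + name], h_prev), p["b" + name])]
        return [activation(v) for v in pre]

    f = layer("f", sigmoid)        # forget gate, Eq. 5
    i = layer("i", sigmoid)        # input gate, Eq. 6
    c_cand = layer("c", math.tanh) # potential update vector, Eq. 7
    # Cell state, Eq. 8: keep part of the old state, add part of the candidate.
    c = [ft * cp + it * cc for ft, cp, it, cc in zip(f, c_prev, i, c_cand)]
    o = layer("o", sigmoid)        # output gate, Eq. 9
    h = [math.tanh(ct) * ot for ct, ot in zip(c, o)]  # hidden state, Eq. 10
    return h, c

# Hypothetical sizes: 3 inputs and 1 hidden unit, all-zero parameters for a hand check.
n_in, n_hid = 3, 1
params = {}
for gate in "fico":
    params["W" + gate] = [[0.0] * n_in for _ in range(n_hid)]
    params["U" + gate] = [[0.0] * n_hid for _ in range(n_hid)]
    params["b" + gate] = [0.0] * n_hid

h_next, c_next = lstm_step([0.5, 1.2, 0.3], [0.0], [2.0], params)
```

With zero parameters every gate equals 0.5 and the candidate is zero, so the cell state is halved (2.0 becomes 1.0) and the hidden state equals 0.5 · tanh(1.0).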

Table 1 Notations used for LSTM and GRU models

The learning ability of ML models during training is affected by large differences in the range of values in the dataset. Therefore, for effective learning, the dataset is normalized and transformed into a supervised learning format. Input data is then converted to a three-dimensional format. For the final discharge prediction, the output is transformed back to the original scale. The model performance is also affected by the choice of hyperparameters, such as the number of layers, hidden units, loss function, optimizer, batch size, learning rate, and dropout rate. The process of finding an appropriate set of hyperparameters is called hyperparameter optimization. In this study, hyperparameter optimization was done by the grid search method using the HParams dashboard in TensorBoard.
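The preprocessing chain described above (normalize, reframe as supervised learning, reshape to three dimensions, invert the scaling) can be sketched as follows. Several details are assumptions, since the text does not specify them: min-max scaling is used as the normalization, each sample is taken to be the last `window` days of inputs paired with the discharge on the final day of that window, and the data are synthetic.

```python
def fit_minmax(rows):
    """Column-wise minimum and maximum of a list of feature rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def scale(rows, lo, hi):
    """Min-max scale each column to [0, 1]; constant columns map to 0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]

def unscale(values, lo, hi):
    """Map normalized values of one variable back to its original scale."""
    return [v * (hi - lo) + lo for v in values]

def make_windows(features, target, window):
    """Build samples of shape (window, n_features) with one target per sample."""
    X, y = [], []
    for end in range(window, len(features) + 1):
        X.append(features[end - window:end])  # the last `window` days of inputs
        y.append(target[end - 1])             # discharge on the final day
    return X, y

# Synthetic 40-day record: [P (mm), T (deg C), SCA (%)] and discharge Q.
n_days = 40
features = [[float(d % 5), -5.0 + 0.3 * d, 60.0 - 0.5 * d] for d in range(n_days)]
discharge = [2.0 + 0.1 * d for d in range(n_days)]

f_lo, f_hi = fit_minmax(features)
q_lo, q_hi = min(discharge), max(discharge)
features_norm = scale(features, f_lo, f_hi)
q_norm = [row[0] for row in scale([[q] for q in discharge], [q_lo], [q_hi])]

# X has shape (samples, time steps, features) = (11, 30, 3) for this toy record.
X, y = make_windows(features_norm, q_norm, window=30)
q_back = unscale(y, q_lo, q_hi)  # back to the original discharge units
```

In a real workflow the scaling bounds would normally be fitted on the training period only and then reused for validation, testing, and future projections, so that no information leaks from later periods.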

Gamma test

The selection of input is an important step in model development. The winGamma™ application (Durrant, 2001) was employed for GT to find out the best input. GT is a nonparametric method to determine suitable input for a nonlinear model (Stefánsson et al., 1997). The variance of the noise related to the output is estimated by the Gamma test. Input combination with minimum Gamma and V-ratio values is considered as the best-input combination for the model.
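For readers unfamiliar with the Gamma test, the sketch below shows the statistic as it is commonly defined: for each of the first p nearest neighbours in input space, the mean squared input distance δ(k) and half the mean squared output difference γ(k) are computed, a straight line γ = Γ + Aδ is fitted, and the intercept Γ estimates the output noise variance, with V-ratio = Γ / Var(y). This is an illustrative brute-force implementation and does not reproduce winGamma; the choice p = 10 and the test data are assumptions.

```python
def gamma_test(X, y, p=10):
    """Return (Gamma, slope A, V-ratio) for inputs X (list of rows) and outputs y."""
    n = len(X)
    # For every point, squared distances to all other points, nearest first.
    neighbours = []
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        neighbours.append(dists[:p])
    deltas, gammas = [], []
    for k in range(p):
        deltas.append(sum(neighbours[i][k][0] for i in range(n)) / n)
        gammas.append(sum((y[neighbours[i][k][1]] - y[i]) ** 2 for i in range(n)) / (2.0 * n))
    # Least-squares line gamma = Gamma + A * delta; the intercept is the Gamma statistic.
    mean_d = sum(deltas) / p
    mean_g = sum(gammas) / p
    slope = (sum((d - mean_d) * (g - mean_g) for d, g in zip(deltas, gammas))
             / sum((d - mean_d) ** 2 for d in deltas))
    gamma = mean_g - slope * mean_d
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return gamma, slope, gamma / var_y

# Noise-free toy data: y = 2x on a regular grid, so Gamma should be close to zero.
X_demo = [[i / 100.0] for i in range(100)]
y_demo = [2.0 * row[0] for row in X_demo]
gamma_demo, slope_demo, v_ratio_demo = gamma_test(X_demo, y_demo)
```

Running the function once per candidate input combination and comparing the resulting Gamma and V-ratio values mirrors the selection rule stated above: the combination with the smallest values is preferred.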

Performance evaluation

The model performance is evaluated by several statistical measures: Nash–Sutcliffe efficiency (NSE), coefficient of determination (\({R}^{2}\)), mean square error (MSE), and mean absolute error (MAE). NSE is a widely used statistical metric for evaluating the goodness of fit of hydrological models. Its value ranges from − ∞ to 1, where 1 denotes a perfect model and negative values indicate that the mean of the observed data is a better predictor than the model (Nash & Sutcliffe, 1970). The \({R}^{2}\) measures the strength of the linear association between observed and simulated flows. Its value ranges from 0 to 1; values closer to 0 indicate a weaker correlation, and values closer to 1 indicate a stronger correlation. The MSE measures the average of the squared errors. A lower MSE value indicates a better fit. MAE is the average of the absolute differences between observed and simulated values. A lower MAE value denotes a lower error.

$$NSE=1-\frac{\sum_{t=1}^{n}{\left({Q}_{t}-{Q}_{t}^{\prime}\right)}^{2}}{\sum_{t=1}^{n}{\left({Q}_{t}-\overline{Q}\right)}^{2}}$$
(14)
$${R}^{2}={\left[\frac{\sum_{t=1}^{n}\left({Q}_{t}^{\prime}-\overline{{Q}^{\prime}}\right)\left({Q}_{t}-\overline{Q}\right)}{\sqrt{\sum_{t=1}^{n}{\left({Q}_{t}^{\prime}-\overline{{Q}^{\prime}}\right)}^{2}}\sqrt{\sum_{t=1}^{n}{\left({Q}_{t}-\overline{Q}\right)}^{2}}}\right]}^{2}$$
(15)
$$MSE=\frac{\sum_{t=1}^{n}{\left({Q}_{t}^{\prime}-{Q}_{t}\right)}^{2}}{n}$$
(16)
$$MAE=\frac{\sum_{t=1}^{n}\left|{Q}_{t}-{Q}_{t}^{\prime}\right|}{n}$$
(17)

where \({Q}_{t}^{\prime}\) and \({Q}_{t}\) are the simulated and observed discharge at time t, and \(\overline{{Q}^{\prime}}\) and \(\overline{Q}\) denote the average simulated and average observed discharge, respectively.
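The four metrics in Eqs. 14 to 17 translate directly into a few lines of code. The sketch below uses plain Python lists; the observed and simulated values are made-up numbers for illustration, not data from the study.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Eq. 14."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def r_squared(obs, sim):
    """Squared Pearson correlation between simulated and observed flow, Eq. 15."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((s - mean_sim) * (o - mean_obs) for o, s in zip(obs, sim))
    den = (math.sqrt(sum((s - mean_sim) ** 2 for s in sim))
           * math.sqrt(sum((o - mean_obs) ** 2 for o in obs)))
    return (cov / den) ** 2

def mse(obs, sim):
    """Mean square error, Eq. 16."""
    return sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs)

def mae(obs, sim):
    """Mean absolute error, Eq. 17."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

# Made-up discharge values for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]

scores = {
    "NSE": nse(observed, simulated),
    "R2": r_squared(observed, simulated),
    "MSE": mse(observed, simulated),
    "MAE": mae(observed, simulated),
}
```

For these numbers the squared errors sum to 0.10 against a total observed variance sum of 5.0, giving NSE = 0.98, MSE = 0.025, and MAE = 0.15. Note that a simulation equal to the observed mean everywhere gives NSE = 0, which is the reference point mentioned above.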

Results

The daily precipitation (P), mean air temperature (T), and SCA data are provided as input to the ML models for the river flow prediction. SCA was processed from MOD10A2 images for the research period (2002–2010). The monthly SCA in the Langtang catchment is shown in Fig. 5. The dataset is split into a training set (5 years), a validation set (2 years), and a testing set (2 years). Out of three ML models, the best model is employed to predict river runoff under future climate scenarios.
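A chronological 5/2/2-year split of the daily record can be sketched as below. The assignment of specific calendar years to each subset (2002–2006, 2007–2008, 2009–2010) is an assumption; the text gives only the subset lengths.

```python
from datetime import date, timedelta

def daily_dates(start, end):
    """All calendar days from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=k) for k in range(n_days)]

def split_by_year(dates, train_years, val_years, test_years):
    """Return index lists for each subset, keeping chronological order."""
    groups = {"train": [], "val": [], "test": []}
    for idx, day in enumerate(dates):
        if day.year in train_years:
            groups["train"].append(idx)
        elif day.year in val_years:
            groups["val"].append(idx)
        elif day.year in test_years:
            groups["test"].append(idx)
    return groups

dates = daily_dates(date(2002, 1, 1), date(2010, 12, 31))
parts = split_by_year(
    dates,
    train_years=range(2002, 2007),  # assumed: first 5 years
    val_years=range(2007, 2009),    # assumed: next 2 years
    test_years=range(2009, 2011),   # assumed: last 2 years
)
sizes = {name: len(indices) for name, indices in parts.items()}
```

Splitting by time rather than at random keeps the test period strictly after the training period, which matters for sequence models that see windows of consecutive days.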

Fig. 5
figure 5

Snow cover area in the Langtang basin

Climate change analysis

The bias-corrected future temperature and precipitation data from two RCMs (inmcm4_r1i1p1 and CMCC-CMS_r1i1p1) under RCP 4.5 and RCP 8.5 are analyzed. As shown in Fig. 6, both RCMs show an increase in temperature under both RCPs. Of the two climate models, CMCC-CMS shows an increase in average temperature of 1.86 °C in the 2070s and 1.68 °C by the 2090s under RCP 4.5, whereas, under RCP 8.5, the temperature continues to increase by up to 4.98 °C by the 2090s. On the seasonal scale, under the RCP 4.5 scenario, the winter temperature rises steadily by up to 2.6 °C by the 2090s, whereas, for other seasons, the temperature reaches its peak by the 2070s and then decreases slightly by the 2090s. Under the RCP 8.5 scenario, the rise in temperature is largest during pre-monsoon (+ 5.8 °C), followed by winter (+ 5.6 °C), post-monsoon (+ 4.6 °C), and monsoon (+ 4 °C).

Fig. 6 Change in average annual temperature for different future periods compared to reference period under (a) RCP4.5 and (b) RCP8.5 scenarios

Temperature from the inmcm4 model decreases slightly by 2020s under both RCPs but then increases until 2090s by 0.72 °C and 2.3 °C under RCP 4.5 and RCP8.5, respectively. On the seasonal scale, the temperature rise is maximum during monsoon (+ 0.94 °C) followed by pre-monsoon (+ 0.89 °C), winter (+ 0.43 °C), and post-monsoon (+ 0.38 °C), respectively, by 2090s under RCP4.5. Under RCP8.5, the temperature rise is maximum by the 2090s during pre-monsoon (+ 2.79 °C), followed by monsoon (+ 2.5 °C), winter (+ 2 °C), and post-monsoon (+ 1.65 °C), respectively.

The ensemble average temperature of the two climate models shows an increase of 1.2 °C and 3.65 °C by the 2090s for RCP4.5 and RCP8.5, respectively. On the seasonal scale, under the RCP4.5 scenario, the increase in temperature is largest during winter (+ 1.53 °C), followed by pre-monsoon (+ 1.51 °C), monsoon (+ 0.99 °C), and post-monsoon (+ 0.62 °C). Under the RCP8.5 scenario, the temperature rise is largest during pre-monsoon (+ 4.31 °C), followed by winter (+ 3.82 °C), monsoon (+ 3.26 °C), and post-monsoon (+ 3.13 °C).
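As a quick consistency check, the annual ensemble values can be reproduced from the per-model changes reported above, assuming the ensemble is an unweighted mean of the two models (the weighting is not stated explicitly in the text).

```python
# Projected change in annual mean temperature by the 2090s (deg C), as quoted in the text.
delta_t_2090s = {
    "RCP4.5": {"CMCC-CMS": 1.68, "inmcm4": 0.72},
    "RCP8.5": {"CMCC-CMS": 4.98, "inmcm4": 2.30},
}

# Unweighted two-model mean (assumption).
ensemble = {
    rcp: sum(by_model.values()) / len(by_model)
    for rcp, by_model in delta_t_2090s.items()
}

for rcp, value in ensemble.items():
    print(f"{rcp}: {value:+.2f} deg C")
```

The means come out at about 1.20 °C for RCP4.5 and 3.64 °C for RCP8.5; the small difference from the quoted 3.65 °C is consistent with the per-model values having been rounded before averaging here.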

The bias-corrected future precipitation data from the inmcm4 model show a steady increase in annual average precipitation until the 2090s, by up to 427 mm and 817 mm under RCP4.5 and RCP 8.5, respectively (Fig. 7). On the seasonal scale, by the 2090s under RCP4.5, precipitation will decrease in winter (− 8 mm) but increase significantly during monsoon (+ 284 mm) and post-monsoon (+ 115 mm), and slightly during pre-monsoon (+ 36 mm). Under RCP8.5, precipitation will decrease during winter (− 51 mm) and post-monsoon (− 13 mm) but increase significantly during monsoon (+ 841 mm) and slightly during pre-monsoon (+ 41 mm) by the 2090s.

Fig. 7 Changes in annual precipitation for different future periods compared to reference period under (a) RCP4.5 and (b) RCP 8.5 scenarios

The outputs from the CMCC-CMS model show a decrease in annual precipitation in the 2020s (− 119 mm) and 2070s (− 56 mm) and an increase in the 2040s (+ 158 mm) and 2090s (+ 242 mm) under RCP4.5 (Fig. 7). Under RCP8.5, precipitation is likely to increase in the 2020s (+ 108 mm), 2040s (+ 122 mm), and 2090s (+ 83 mm) but will decrease in the 2070s (− 115 mm). On the seasonal scale, precipitation will decrease during winter (− 33 mm) and post-monsoon (− 16 mm) but increase significantly during monsoon (+ 216 mm) and pre-monsoon (+ 74 mm) by the 2090s for the RCP4.5 scenario. For the RCP8.5 scenario, precipitation will increase significantly during monsoon (+ 223 mm) but decrease during winter (− 50 mm), pre-monsoon (− 78 mm), and post-monsoon (− 11 mm) by the 2090s.

Based on the two models, a large variation in precipitation is projected in the future. The ensemble annual precipitation will increase by 16.9% and 22.7% by the 2090s for RCP4.5 and RCP8.5, respectively. On a seasonal scale, precipitation may increase by 16% and 34% during monsoon but decrease by 24% and 60% during winter under RCP4.5 and RCP8.5, respectively. During pre-monsoon, precipitation is projected to increase by 20% under RCP4.5 but decrease by 7% under RCP8.5. Similarly, during post-monsoon, precipitation could increase by 62.8% under RCP4.5 but decrease by 15.3% under RCP8.5 by the 2090s.

Model development

Input selection

GT was applied to various combinations of SCA, T, and P. Out of seven combinations, M7 achieved the minimum Gamma and V-ratio values, as shown in Table 2. Therefore, model M7 was selected for further study. Table 2 also shows that the V-ratio and Gamma values of M1 are lower than those of M2, which implies that river discharge in a snow-dominated region, such as Langtang, is more influenced by temperature than by precipitation.

Table 2 Results of the Gamma test for different input combinations

Model optimization

Hyperparameter tuning is crucial for the best model performance. In this study, we applied a grid search using the HParams dashboard in TensorBoard to find suitable hyperparameters for the ML models. The HParams dashboard offers various tools to find the most promising set of hyperparameters. Previous studies have reported that models with one hidden layer are sufficient for hydrological modeling (Thapa et al., 2020b); therefore, we considered a single hidden layer followed by a dense layer. Different values for window size (1, 7, 15, 30, 60, 90) were tested, and a window size of 30 was selected for this study by trial and error. Several combinations of hyperparameters and the best result obtained from the grid search are presented in Table 3. The fine-tuned models are employed for hydrological modeling, and their performance is compared.
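The logic of a grid search is independent of the dashboard used to inspect it: every combination of candidate values is evaluated and the best-scoring one is kept. The sketch below shows only that logic. The window sizes are those listed above; the other candidate values and the scoring function are hypothetical placeholders, and the scoring function stands in for "train the model and return its validation score" rather than representing any result from the study.

```python
from itertools import product

def grid_search(grid, evaluate):
    """Evaluate every combination in `grid` and return the best (config, score)."""
    names = list(grid)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

grid = {
    "window": [1, 7, 15, 30, 60, 90],  # window sizes tested in this study
    "units": [16, 32, 64],             # hypothetical candidate values
    "dropout": [0.0, 0.2],             # hypothetical candidate values
}

def toy_score(cfg):
    """Placeholder objective; a real run would train a model and return validation NSE."""
    return -abs(cfg["window"] - 30) / 100 - abs(cfg["units"] - 32) / 1000 - cfg["dropout"]

best_cfg, best_score = grid_search(grid, toy_score)
n_combinations = len(list(product(*grid.values())))
```

The cost grows multiplicatively with each added hyperparameter (here 6 × 3 × 2 = 36 trainings), which is one reason to restrict the grid to a single hidden layer as described above.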

Table 3 Hyperparameter tuning of LSTM, GRU, and RNN models

Comparison of ML models

The quantitative assessment of the models is based on four metrics: NSE, \({R}^{2}\), MSE, and MAE. The LSTM model (NSE = 0.887, \({R}^{2}\) = 0.98, MAE = 0.28, MSE = 0.105) performed better than the GRU (NSE = 0.84, \({R}^{2}\) = 0.975, MAE = 0.31, MSE = 0.17) and RNN (NSE = 0.763, \({R}^{2}\) = 0.96, MAE = 0.6, MSE = 0.52) models. The performance of the ML models during training, validation, and testing is presented in Table 4.

Table 4 Evaluation of ML models

The box plot displaying the median, mean, and percentiles (25th and 75th) of residuals of model prediction is shown in Fig. 8. The mean and median of the residuals for a good model should be close to zero. The median value of the residuals for LSTM, GRU, and RNN models is 0.975, 1.004, and 1.189, respectively. Similarly, the mean value of residuals for LSTM, GRU, and RNN models is 1.497, 1.733, and 1.802, respectively. The mean and median of the residuals of ML models are greater than zero, which indicates that the ML models underestimated the river discharge. Based on the quantitative and qualitative assessment of the three ML models, it can be noted that the performance of the LSTM model is better than that of the GRU and RNN models.

Fig. 8 Boxplot of residuals of ML models

Climate change impact on streamflow

After evaluating the performance of the RNN, GRU, and LSTM models, the best model (LSTM) was employed for predicting future snowmelt runoff until 2100. A previous study by Thapa et al. (2020a) reported that SCA in the Langtang basin is depleting at the rate of 0.22% per year. Assuming this snow depletion rate of 0.22% per year, SCA was projected up to 2100. Future climate data (temperature, precipitation, and SCA) was provided as input to the trained LSTM model to predict future river discharge until 2100. Finally, the relative change in river discharge with respect to the baseline discharge is calculated as shown in Table 5. The results show that, for the RCP4.5 scenario, river discharge during Nov–Apr (except Dec) in the 2020s and during Mar–Apr in the 2040s is likely to decrease compared to the reference period. Similarly, under the RCP8.5 scenario, discharge during Feb, Mar, Apr, and Nov in the 2020s and during Mar in the 2040s is likely to decrease compared to the reference period. The decrease in river discharge might affect water availability for households, as well as for agricultural irrigation and hydroelectricity generation. In the 2070s and 2090s, river discharge would increase in all months for both scenarios. The magnitude of the increase in river discharge for RCP8.5 is higher than that for RCP4.5. The hydrographs (Fig. 9) show a significant rise in river discharge in the future; however, the shape of the hydrographs remains similar.
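The two calculations in this step, projecting SCA forward at a fixed depletion rate and expressing future discharge relative to the baseline, can be sketched as follows. How the 0.22% per year rate is applied is not spelled out in the text, so the sketch supports two readings (compounding on the previous year's value, or a linear decline relative to the base year); both are assumptions, and the numeric inputs are hypothetical.

```python
def project_sca(sca_base, base_year, target_year, rate_pct_per_year=0.22, compound=True):
    """Project snow-covered area forward from base_year to target_year.

    compound=True  : each year loses rate_pct_per_year percent of the previous year's SCA.
    compound=False : each year loses rate_pct_per_year percent of the base-year SCA.
    Both interpretations are assumptions; the source only quotes the rate.
    """
    years = target_year - base_year
    rate = rate_pct_per_year / 100.0
    if compound:
        return sca_base * (1.0 - rate) ** years
    return sca_base * (1.0 - rate * years)

def relative_change(future, baseline):
    """Relative change in percent with respect to the baseline value."""
    return 100.0 * (future - baseline) / baseline

# Hypothetical example: 50% basin SCA in 2010, projected to 2100 under both readings.
sca_2100_compound = project_sca(50.0, 2010, 2100, compound=True)
sca_2100_linear = project_sca(50.0, 2010, 2100, compound=False)

# Hypothetical monthly mean discharges (m3/s): baseline versus a future period.
change_pct = relative_change(future=11.0, baseline=10.0)
```

Over a 90-year horizon the two readings differ only modestly (a total decline of roughly 18% versus 19.8% of the base value), so the choice mainly matters if the rate is meant as an absolute percentage of basin area, which would be a third, different reading.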

Table 5 Relative change in monthly river discharge under RCP 4.5 and RCP 8.5
Fig. 9 Projected monthly streamflow under (a) RCP4.5 and (b) RCP8.5

The seasonal analysis reveals that, by the 2020s under RCP4.5, river discharge is likely to decrease during winter (− 3.3%) and pre-monsoon (− 4.0%) but increase during monsoon (+ 10.5%) and post-monsoon (+ 7.9%). River discharge increases in all seasons by the 2040s, 2070s, and 2090s for RCP4.5 (Fig. 10a). Under RCP8.5, river discharge decreases during pre-monsoon (− 3.1%) and increases during winter (+ 3.8%), monsoon (+ 11.8%), and post-monsoon (+ 4.7%) by the 2020s. River discharge will increase significantly in all seasons by the 2040s, 2070s, and 2090s under RCP8.5 (Fig. 10b). During the baseline period, river discharge is highest during monsoon (Jun–Sep), followed by post-monsoon (Oct–Nov), pre-monsoon (Mar–May), and winter (Dec–Feb). There is a large variation in future discharge on the seasonal scale, i.e., winter (− 3 to 111%), pre-monsoon (− 4 to 96%), monsoon (10 to 58%), and post-monsoon (4 to 84%), but no shift in the hydrograph is anticipated. The average annual discharge is likely to increase by 6.2%, 17.1%, 35.5%, and 46.5% by the 2020s, 2040s, 2070s, and 2090s, respectively, under RCP4.5. Similarly, the average annual discharge will increase by 6.9%, 22.2%, 48.6%, and 76.1% by the 2020s, 2040s, 2070s, and 2090s, respectively, for RCP8.5. These results show that the magnitude of change in river discharge depends on both the time period and the RCP scenario.

Fig. 10 Relative change in future streamflow for (a) RCP4.5 and (b) RCP8.5

Discussion

The climate in the Himalayas is changing rapidly and is expected to change significantly further in the future. The future climatic condition of the Langtang basin, Nepal was projected up to 2100 using outputs of RCMs. The study revealed that the average temperature in the basin could increase by 4.98 °C by the 2090s, which is approximately equivalent to 0.058 °C/year. The historical analysis of the temperature data in the Langtang basin during 1980–2015 showed a steady rise in temperature at the rate of 0.04 to 0.068 °C/year (Thapa et al., 2020a). Taken together, these studies indicate that the average temperature has shown a rising trend since the 1980s and that a similar trend will continue until 2100. Similar to the findings of the current study, previous studies in the Kaligandaki basin (Bajracharya et al., 2018) and the Tamor basin (Bhatta et al., 2019) also reported an increase in the average temperature of over 4 °C by 2090.

Precipitation shows a large variation in the future. The annual precipitation in the Langtang basin would increase by 16.9% to 22.7% by the 2090s for RCP4.5 and RCP8.5 scenarios, respectively. Immerzeel et al. (2013) also reported an increase in future precipitation in the Langtang basin, whereas Pradhananga et al. (2014) predicted a decline in future precipitation. In a previous study by Bajracharya et al. (2018), it was found that the annual precipitation in the Kaligandaki basin could increase by 26% by 2100, which is similar to the results of our study.

In this study, river discharge was found to increase by 17.1% and 46.5% by the 2040s and 2090s, respectively, under RCP4.5. Similarly, river discharge could increase by 22.2% and 76.1% by the 2040s and 2090s, respectively, for the RCP8.5 scenario. River discharge was found to be highest during monsoon in the baseline period and would increase by up to 58% in the future, mainly due to an increase in monsoon rainfall and snowmelt. During winter, although the total precipitation (snowfall and rainfall) is expected to decrease, discharge increases because more precipitation will fall as rain due to the temperature rise. The increase in river discharge along with high precipitation may trigger mountain hazards, such as floods and landslides, during the monsoon season. In contrast, the decrease in discharge during the pre-monsoon in the 2020s indicates a potential problem with water availability. These results are comparable to the findings of Immerzeel et al. (2013), who reported that river discharge will increase significantly due to the increase in precipitation and snowmelt in the Langtang basin, but contradict the results of Pradhananga et al. (2014), who stated that no significant increase in discharge would occur until 2050. The result of Pradhananga et al. (2014) is perhaps due to their use of future climate projections from the NORESM GCM, which projected a decrease in future precipitation. However, they stated that if temperature and precipitation were to increase by 2 °C and 20%, respectively, the discharge would increase by 43.9%, which is comparable to the result of our study. The result of our study is in line with studies in the Kaligandaki basin (Bajracharya et al., 2018) and the Indrawati river basin (Shrestha et al., 2017), which also reported an increase in discharge due to high precipitation and snowmelt.

Uncertainty in the future discharge projection may arise from the spread in future climate projections obtained from GCMs/RCMs. In this study, to minimize this uncertainty, high-resolution RCMs, namely CMCC-CMS and inmcm4, were selected carefully based on the envelope approach and past performance, as described by Lutz et al. (2016). Even so, the climate data projected by CMCC-CMS and inmcm4 differ considerably. The temperature projection of CMCC-CMS was higher than that of inmcm4, whereas the precipitation projection of inmcm4 was higher than that of CMCC-CMS, and the two exhibited different trends. The future climate projected in this study contradicts the findings of Pradhananga et al. (2014), who projected a decrease in future precipitation of 1.9 mm/year and an increase in temperature of 0.015 °C/year based on the NORESM GCM. Therefore, the choice of climate models should be taken into consideration when preparing mitigation and adaptation policies.

The selection of inputs is an important step in model development. In this study, the suitable input combination for the ML models was determined by GT. The GT results indicate that river discharge in the snow-dominated region is more sensitive to temperature than to precipitation. In cold regions, such as the Himalayas, temperature determines whether precipitation falls as snow or rain. Therefore, river discharge in the snow-dominated region, which depends on snow accumulation and ablation processes, is more strongly affected by temperature.

Although ML techniques, including DL, have recently gained wide popularity in various research areas, they are not yet widely used in hydrological modeling and climate change impact assessment. This study verifies the suitability of ML models for snowmelt runoff prediction (NSE > 76%). Unlike sophisticated conventional hydrological models, ML models do not require numerous parameters related to hydrological processes and topographical conditions. Therefore, ML models are appropriate for hydrological modeling in the data-scarce HKH region.
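For reference, the Nash-Sutcliffe efficiency (NSE) compares the squared error of the simulated discharge with the variance of the observed discharge: a value of 1 indicates a perfect fit, and a value of 0 indicates a model no better than the observed mean. The following is a minimal sketch of the standard definition. The function name and the sample series are hypothetical, and the code is not the evaluation script used in this study.

```python
from statistics import fmean


def nash_sutcliffe(observed, simulated):
    """Standard NSE: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = fmean(observed)
    # Squared error between observed and simulated discharge.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of observed discharge from its own mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / variance


# Hypothetical discharge series (illustrative only, not study data).
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [2.5, 3.5, 6.5, 7.5]

nse = nash_sutcliffe(observed, simulated)
print(f"NSE = {nse:.3f} ({nse * 100:.1f}%)")
```

Because the errors are squared, NSE is dominated by high-flow periods, so peak-flow errors weigh heavily on the score.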

The rise in temperature and varying precipitation will have a significant effect on the overall cryospheric environment in the Langtang basin; however, this study focuses only on the impact of climate change on river discharge. Land use and land cover (LULC) can have a substantial influence on the hydrology of watersheds (Tankpa et al., 2020). For an accurate assessment of future water availability in the region, the combined effect of climate change and LULC alteration should be assessed.

Conclusion

In this study, we analyzed future climate change in the Langtang basin, Central Himalayas, Nepal, based on the outputs from two RCMs, namely CMCC-CMS and inmcm4, under the RCP4.5 and RCP8.5 scenarios, and quantified its impact on river discharge using an advanced DL approach. We developed three ML-based models (RNN, GRU, and LSTM) for river discharge prediction and evaluated their performance based on statistical indicators. The input combination for the models was chosen based on GT. The hyperparameters of the models were optimized by the grid search method. Among the three ML models, the best-performing model was employed for future discharge prediction using climate forcing data up to 2100.
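Grid search, as used for hyperparameter optimization, evaluates every combination in a predefined search space and keeps the best-scoring one. The sketch below illustrates only that general procedure. The hyperparameter names, the candidate values, and the scoring function `toy_validation_score` are hypothetical stand-ins: in practice the score would come from training a model and computing a validation metric such as NSE, and the grid used in this study is not reproduced here.

```python
from itertools import product


def grid_search(grid, score_fn):
    """Evaluate every hyperparameter combination and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space (illustrative only).
grid = {
    "units": [16, 32, 64],
    "learning_rate": [0.01, 0.001],
    "lookback_days": [3, 7],
}


def toy_validation_score(params):
    """Stand-in for 'train the model and return its validation score'."""
    return (
        -abs(params["units"] - 32) / 100
        - abs(params["learning_rate"] - 0.001)
        - abs(params["lookback_days"] - 7) / 10
    )


best_params, best_score = grid_search(grid, toy_validation_score)
print(best_params)
```

The cost of an exhaustive search grows with the product of the candidate counts (here 3 × 2 × 2 = 12 evaluations), which is why the grid is usually kept small when each evaluation requires training a network.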

The average temperature in the Langtang basin could increase by 1.68 °C and 4.98 °C by 2100 under the RCP4.5 and RCP8.5 scenarios, respectively. The highest temperature rise by the end of the century is projected for the pre-monsoon (5.8 °C), followed by winter (5.6 °C), post-monsoon (4.6 °C), and monsoon (4 °C). Annual precipitation will increase by 16.9% and 22.7% by 2100 under RCP4.5 and RCP8.5, respectively. On the seasonal scale, precipitation increases during the monsoon by 16–34% and decreases during winter by 24–60%; for the other seasons, there is large variation and no definite trend is observed. Annual river discharge is projected to increase by 46% and 76% by 2100 under the RCP4.5 and RCP8.5 scenarios, respectively. The magnitude of change in river discharge depends on the time period and the RCP scenario. The seasonal analysis reveals that discharge will rise significantly, by up to 58%, during the monsoon season due to high rainfall and snowmelt, which may trigger mountain hazards, such as floods and landslides. However, the decrease in river discharge during the pre-monsoon season in the near future might affect water availability for households as well as for electricity generation and irrigation. All three ML models used in this study performed well (NSE > 76%), but they underestimated the high flows. The efficiency of the LSTM model (88.7%) was found to be greater than that of the GRU (84%) and RNN (76.3%) models.

The outcomes of this study will be useful for better understanding climate change and its impact on water resources in the basin. However, the results may not be representative of the entire HKH region; therefore, to obtain a clear picture of the impact of climate change on water resources in the HKH region, more studies should be carried out in various reference basins in the region. To further improve the study, the combined impact of climate change and LULC alteration on river discharge should be assessed. This study demonstrates the suitability of the LSTM model for streamflow prediction in the data-scarce HKH region. This approach can be replicated in other basins to determine water availability for irrigation, hydropower, and water supply projects. Nevertheless, the uncertainty in river discharge projection due to the choice of climate models should be taken into consideration while developing and implementing mitigation and adaptation strategies.