Introduction

Climate change and anthropogenic activities are major factors determining the state of hydrological processes, as observations from different parts of the world confirm (Huntington, 2006). Climate change is one of the major causes of declining water resource availability and of shifts in its worldwide geographical distribution (Kundzewicz et al., 2008; Nassery et al., 2021; Roy et al., 2020). Global warming drives climate change and increases the unpredictability of atmospheric variables (Arnell, 1999). Together with anthropogenic activities, including land use and land cover (LULC) changes, industrialization, and infrastructure development, it has altered hydrological processes and brought noteworthy changes to water resources globally (Qin et al., 2014). According to Walling and Fang (2003), about 22% of the world’s rivers have experienced a noticeable decline in annual streamflow in recent decades because of excessive water use, diversion, and reservoir construction, resulting in considerable environmental problems. Such anthropogenic alteration of streamflow brings hydro-ecological stress and complexity in LULC (Li et al., 2009; Nada et al., 2021).

River water-lifting for urban uses (Rose & Peters, 2001), green water harvesting (Pal, 2016a, b), installation of dams and barrages over rivers and diversion of water through canals (Pal & Saha, 2018; Pal, 2016a; Talukdar & Pal, 2017), and massive withdrawal of river water for irrigation (Talukdar & Pal, 2018) are some prominent examples of hydrological alteration. Therefore, scholars from different countries have explored historical time series of climatic and hydrologic parameters to identify change points and detect trends, and thereby to assess the effects of changing climate, urbanization, and the installation of engineering infrastructure over rivers (Guo et al., 2020; Talukdar & Pal, 2020; Tan & Gan, 2015). Detecting streamflow trends helps in understanding the reasons for past changes and brings fresh insights into water resource management and water environmental conservation (Rougé et al., 2013; Zhang et al., 2015).

In recent decades, scholars have studied trend detection in time series of climatic (rainfall, temperature, humidity) and hydrologic (streamflow) parameters using the linear regression model, the Mann–Kendall (MK) test, the modified MK test, and Spearman’s rho (Praveen et al., 2020; Ouatiki et al., 2019; Dinpashoh et al., 2019; Birara et al., 2018; Bisht et al., 2019). Topaloglu (2006) assessed the trends in annual maximum, minimum, and mean streamflow, as well as monthly mean streamflow, in 26 Turkish basins using the MK test and found a positive trend in basin numbers 14–16 and 22–25, while the rest of the basins showed a downward trend. Yenigün et al. (2008) applied the MK test and Spearman’s rho for trend detection in the streamflow of the Euphrates River basin. Cigizoglu et al. (2005) worked on Turkish rivers to detect low, mean, and high streamflow trends using the parametric t-test and the MK test. They reported a positive trend at a few stations and a negative trend at the rest. Fathian et al. (2016) used the MK test, Spearman’s rho, SMK test, and TS method to identify the trend and its magnitude in the Urmia Lake basin, Iran.

Sen (2012) first proposed an innovative trend analysis (ITA) and used it to detect trends in the annual streamflows and total precipitation of some stations in Turkey. Sen (2014) used the ITA method for a series of temperature data in the Marmara region of Turkey and suggested that ITA did not have any assumptions or restrictions. Therefore, it could be applied to any time series dataset, including datasets that are serially correlated, non-normally distributed, or short. ITA thus has a considerable advantage over the other methods. As a result, researchers have increasingly used ITA to identify trends in hydrological and climatological parameters (Mohorji et al., 2017; Serinaldi et al., 2020; Wu & Qian, 2017). Kisi and Ay (2014) applied the MK test and ITA to detect the trend in the water quality parameters of the Kizilirmak River, Turkey, and reported that the ITA method was applied successfully for trend detection. Kisi (2015) observed negative and positive trends in monthly pan evaporation data from six stations in Turkey using the ITA method. Caloiero et al. (2016) used the ITA method, MK test, linear regression method, and Sen’s slope estimator to detect trends in annual and seasonal rainfall and temperatures of the Yangtze River basin, China, and reported that those climatic variables had increased significantly. The scholars concluded that linear regression and the MK test often fail to detect trends effectively, whereas ITA can.

Various ML algorithms such as artificial neural network (ANN), hybrid wavelet ANN, random forest (RF), and other sophisticated time series and artificial intelligence-based techniques for explaining streamflow have piqued the interest of academics, particularly hydro-engineers (Mehdizadeh et al., 2019; Mohammadi et al., 2020). These approaches are based not on the net change of the time series but on its variability and trends, and as a result, they are more sensitive and accurate in predicting future trends (Cheng et al., 2020; Tyralis et al., 2021). Using an ANN and a support vector machine (SVM), Adnan et al. (2021) predicted the Indus River’s streamflow with higher accuracy. Peng et al. (2017) used ANN and wavelet transformation to estimate future streamflow for the Yangtze River. Several studies have used SVM (Adnan et al., 2017; Yaseen et al., 2015; Yaseen et al., 2016) to estimate streamflow with improved accuracy in various river basins. Forecasting with RF has been widely used in several river basins (Pham et al., 2021; Ghorbani et al., 2020; Saadi et al., 2019; Pham et al., 2019), and many researchers have deployed it successfully for flow forecasting (Ali et al., 2020; Ni et al., 2020; Pham et al., 2019). Previous research has quantified the damming effect on rivers using traditional techniques. However, advanced ensemble machine learning has not so far been applied for future forecasting together with quantification of damming. Hence, in the present study, we quantified the damming effect with advanced techniques such as innovative trend analysis and used a machine learning algorithm, random forest, for forecasting. Identifying the problem together with forecasting its future course can be a resource for river management.

The present study aimed to undertake a trend analysis and a change point analysis of river flow data. This investigation was performed to determine whether any flow modification produces a change in inflow. In addition, the present study attempted to forecast the streamflow using the random forest. The originality of the present study lies in applying both an advanced trend detection technique (ITA) and conventional trend detection techniques and comparing their performance to find the most suitable technique in terms of robust results. Most previous studies have applied trend detection techniques directly to the streamflow. In our case, we first applied the trend detection techniques to the annual and seasonal datasets, identified the trend in them, and then applied the same techniques to the pre- and post-change point seasonal and annual datasets. As a result, the changes in streamflow can be quantified precisely, which is very difficult for a dataset without change detection. The present study has also compared the analysis with and without change-point segmentation. Another novelty is the application of an advanced ensemble machine learning algorithm, random forest, for predicting the historical time series streamflow data and forecasting the future streamflow for annual and seasonal periods. The present study will promote an understanding of water resource management, bearing in mind the research gap and data-scarce situation in developing nations for streamflow forecasting.

Study area

The Punarbhaba River basin was taken as the study area. The Punarbhaba River is 160 km long and its basin covers about 5265.93 km², lying on the Barind Tract of India and Bangladesh. The elevation of the basin ranges from 89 m (in the source region) to 12 m (at the confluence) (Fig. 1). The river is a single-channel meandering river in its upper course, but it becomes a multi-channel river system within the valleys in the lower course. In its upper part, the Punarbhaba River has low-to-moderate sinuosity, which reflects the area’s sloping surface, but its sinuosity increases abruptly within the valley flat. Several abandoned channels, ox-bow lakes, channel scars, scrolls, and loops are present in these valleys (Rashid et al., 2013; Talukdar & Pal, 2018), and these might have been created by neo-tectonic uplift and consequent incision of channels. The annual average rainfall in this basin ranges from 258 to 509 mm, of which 14.46%, 70.16%, and 12.24% occur in the pre-monsoon, monsoon, and post-monsoon seasons, respectively. The rainfall trend shows no significant change in any season during 1978–2016, as indicated by a very low coefficient of determination (R² = 0–0.046) from least squares regression.

Fig. 1 Location of the Punarbhaba River basin with the location of dam and river gauge station

Materials and methods

Materials

The daily water flow data of the Punarbhaba River basin were gathered from the Haripur gauge station on the Punarbhaba River, Malda, for 1978 to 2015.

Change point detection

Abrupt changes, or change points, in climatic and hydrological time series can be investigated using a variety of techniques (Meysam et al., 2012). In this work, the Pettitt test (Pettitt, 1979) and the standard normal homogeneity test (SNHT) of Alexandersson and Moberg (1997) were used to detect sudden changes in the Punarbhaba River’s streamflow.

Pettitt test

The Pettitt test is a rank-based, distribution-free test for identifying a substantial change in the mean of a data series, and it is especially useful when no hypothesis about the location of the change point is available. It has been used widely to detect changes in observed meteorological and hydrological time series data (Gao et al., 2013). A detailed explanation with equations can be found in Talukdar and Pal (2016).
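Since the equations are only given by reference, the following Python sketch illustrates the commonly cited form of the Pettitt statistic, U_t = Σ_{i≤t} Σ_{j>t} sgn(x_i − x_j), with K = max|U_t| and the approximate p-value 2·exp(−6K²/(n³ + n²)). It is not the implementation used in this study; the function name `pettitt_test` and the sample values are hypothetical.

```python
import math


def pettitt_test(x):
    """Return (index, K, p) where index is the 0-based position of the
    last observation before the most probable change point."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")

    def sign(v):
        return (v > 0) - (v < 0)

    u = 0
    best_index, best_abs_u = 0, -1
    for t in range(n - 1):
        # Recursive update: U_t = U_{t-1} + sum_j sgn(x_t - x_j)
        u += sum(sign(x[t] - x[j]) for j in range(n))
        if abs(u) > best_abs_u:
            best_index, best_abs_u = t, abs(u)

    k = best_abs_u
    # Approximate two-sided significance probability
    p = min(1.0, 2.0 * math.exp(-6.0 * k * k / (n ** 3 + n ** 2)))
    return best_index, k, p


# Hypothetical water levels (m) with a drop after the fifth value
levels = [22.0, 21.5, 22.3, 21.8, 22.1, 18.2, 17.9, 18.4, 18.0, 17.7]
index, k_stat, p_value = pettitt_test(levels)
print(index, k_stat, round(p_value, 4))
```

With a series this short the p-value is only indicative; the longer annual record used in the study would give the test more power.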

Standard normal homogeneity test

The Alexandersson test, also known as the SNHT, is used to detect an abrupt shift or the existence of a change point in climatic and hydrologic time series datasets. The test statistic is given by Eq. 1:

$$T_{s} = \max_{1 \le m < n} T_{m}$$
(1)

where Ts is the maximum value of Tm, and the change point is located at the value of m at which this maximum is attained. Tm is derived using Eq. 2:

$$T_{m} = m\overline{z}_{1}^{2} + (n - m)\overline{z}_{2}^{2} ,\quad m = 1,2, \ldots ,n - 1$$
(2)

where

$$\overline{z}_{1} = \frac{1}{m}\sum\limits_{i = 1}^{m} \frac{M_{i} - \overline{M}}{s},\quad \overline{z}_{2} = \frac{1}{n - m}\sum\limits_{i = m + 1}^{n} \frac{M_{i} - \overline{M}}{s}$$
(3)

where \(\overline{M}\) and s refer to the mean and standard deviation of the sample data, respectively, Mi is the i-th observation, and n is the length of the series.
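As an illustration, the SNHT statistic can be computed with a few lines of Python. The sketch assumes the usual form T_m = m·z̄₁² + (n − m)·z̄₂², where z̄₁ and z̄₂ are the means of the standardized values before and after position m, and it assumes that s is the sample standard deviation. The function name `snht` and the sample values are hypothetical.

```python
import statistics


def snht(values):
    """Return (m, T) where T = max T_m and m is the number of
    observations before the most probable change point."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    z = [(v - mean) / sd for v in values]

    total = sum(z)
    running = 0.0
    best_m, best_t = None, float("-inf")
    for m in range(1, n):
        running += z[m - 1]
        z1 = running / m                    # mean of z before the break
        z2 = (total - running) / (n - m)    # mean of z after the break
        t_m = m * z1 ** 2 + (n - m) * z2 ** 2
        if t_m > best_t:
            best_m, best_t = m, t_m
    return best_m, best_t


# Hypothetical water levels (m) with a drop after the fifth value
levels = [22.0, 21.5, 22.3, 21.8, 22.1, 18.2, 17.9, 18.4, 18.0, 17.7]
break_after, t_stat = snht(levels)
print(break_after, round(t_stat, 3))
```

The returned T must still be compared with a critical value for the given n, which this sketch does not tabulate.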

Trend analysis

Mann–Kendall test

The MK test (Kendall, 1975; Mann, 1945) is a non-parametric rank-based test that is widely used to detect trends in time series hydro-climatic data (Yue & Wang, 2002; Yue et al., 2002) because of its robustness for non-normally distributed data and low sensitivity to sudden change. The test was performed in RStudio.

The trend test results can determine whether a time series of hydro-climatic variables exhibits a statistically significant trend or a trend that could occur by chance. To do so, however, one should first evaluate the serial correlation of the data series (Jenkins & Watts, 1968). Because a positive serial correlation can raise the number of false-positive results of the Mann–Kendall test, the presence of serial correlation can make trend detection more difficult (Von Storch & Navarra, 1999). Thus, serial correlation must be removed before implementing the Mann–Kendall trend test. Here, it was removed with the trend-free pre-whitening (TFPW) method of Yue et al. (2003).

The Theil-Sen estimator (Sen, 1968) was used for the analysis of magnitude of the trend (Tabari & Talaee, 2011; Tabari & Aghajanloo, 2013). Several previous studies have successfully used the Theil-Sen estimator for computing the magnitude of the trend of hydro-climatic variables (Schaefer & Domroes, 2009; Tangang et al., 2006).
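The study ran these tests in RStudio; purely as an illustration, the MK statistic S, its normal approximation Z (with tie correction and continuity correction), and the Theil-Sen slope can be sketched in Python as follows. The function names and sample data are hypothetical, and the sketch omits the TFPW step.

```python
import math
from statistics import median


def mann_kendall(x):
    """Return (S, Z, p) for the two-sided Mann-Kendall trend test."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            s += (x[j] > x[i]) - (x[j] < x[i])

    # Variance of S with correction for tied groups
    counts = {}
    for v in x:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return s, z, p


def sen_slope(x):
    """Median of all pairwise slopes (change per time step)."""
    n = len(x)
    return median((x[j] - x[i]) / (j - i)
                  for i in range(n - 1) for j in range(i + 1, n))


# Hypothetical steadily falling series
series = [5.0, 4.0, 3.0, 2.0, 1.0]
s_stat, z_stat, p_val = mann_kendall(series)
slope = sen_slope(series)
print(s_stat, round(z_stat, 3), round(p_val, 4), slope)
```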

Spearman’s rank correlation coefficient (rho)

We used Spearman’s rank correlation technique, a non-parametric method, for trend analysis of the hydro-climatic datasets. This technique is comparable with the Mann–Kendall test. The details of Spearman’s rank correlation technique can be found in Talukdar and Pal (2016).
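As a rough illustration of the idea, Spearman’s rho for trend detection can be computed as the Pearson correlation between the time order and the ranks of the observations. The sketch below uses average ranks for ties; the function names and sample data are hypothetical and do not reproduce the software used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x):
    """Correlation between time order (1..n) and ranks of x."""
    n = len(x)
    time_ranks = list(range(1, n + 1))
    value_ranks = average_ranks(x)
    mean_t = (n + 1) / 2
    mean_r = sum(value_ranks) / n
    cov = sum((t - mean_t) * (r - mean_r)
              for t, r in zip(time_ranks, value_ranks))
    var_t = sum((t - mean_t) ** 2 for t in time_ranks)
    var_r = sum((r - mean_r) ** 2 for r in value_ranks)
    if var_r == 0:
        raise ValueError("rho is undefined for a constant series")
    return cov / math.sqrt(var_t * var_r)


# Hypothetical series
rho_falling = spearman_rho([5.0, 4.0, 3.0, 2.0, 1.0])
rho_mixed = spearman_rho([3.0, 1.0, 2.0])
print(rho_falling, rho_mixed)
```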

Innovative trend analysis

The ITA splits a data series into two sub-series of equal size and arranges each in ascending order. In a two-dimensional Cartesian coordinate system, the first sub-series (xi) is placed on the horizontal axis (x-axis), while the second sub-series (yi) is placed on the vertical axis (y-axis) (Fig. 2). The points in the scatterplot gather on the 1:1 (45°) line if the sub-series are equal, indicating no trend. If the points lie above the 1:1 line, the time series is considered to be rising. If the points aggregate below the 1:1 line, the time series is considered to have a falling trend (Sen, 2012). The vertical or horizontal distance from the 1:1 line is the absolute value of the difference between a point’s y and x values, and this difference shows the magnitude of a rising or falling trend. The average of the differences therefore indicates the overall tendency of the time series (Shahfahad et al., 2022). The average differences of time series with different magnitudes must be normalized before they can be compared. Because the first sub-series is used as the reference for change, the trend indicator is calculated by dividing the average difference by the average of the first sub-series. The indicator is multiplied by 10 to obtain the same scale as the Mann–Kendall test and linear regression analysis, allowing for direct comparison. The ITA indicator is then written as:

$$D = \frac{1}{n}\sum\limits_{i = 1}^{n} \frac{10(y_{i} - x_{i})}{\overline{x}}$$
(4)

where:

D represents the trend indicator, in which a positive value indicates a rising trend while a negative value reflects a falling trend;

n represents the number of observations in each sub-series; and

\(\overline{x}\) represents the average of the first sub-series.

If the original data series contains an odd number of observations, the first observation is removed before splitting to ensure that the most recent data is fully used.

Fig. 2 Representation of annual water level, average, maximum and minimum monthly water level
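The ITA indicator described above can be sketched in Python as follows: split the series into two halves, sort each half, and average the scaled differences relative to the mean of the first half. The function name `ita_indicator` and the sample data are hypothetical.

```python
def ita_indicator(series):
    """Return the ITA trend indicator D for a time-ordered series."""
    data = list(series)
    if len(data) % 2 == 1:
        data = data[1:]  # drop the first observation when the count is odd
    if len(data) < 2:
        raise ValueError("need at least two observations")

    half = len(data) // 2
    x = sorted(data[:half])   # first sub-series, ascending
    y = sorted(data[half:])   # second sub-series, ascending
    x_mean = sum(x) / half

    return sum(10 * (yi - xi) / x_mean for xi, yi in zip(x, y)) / half


# Hypothetical falling series: every sorted pair differs by -3
d_even = ita_indicator([10, 12, 11, 9, 8, 7])
d_odd = ita_indicator([99, 10, 12, 11, 9, 8, 7])  # first value is dropped
print(round(d_even, 4), round(d_odd, 4))
```

A negative D, as in this example, corresponds to points lying below the 1:1 line.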

Method for streamflow forecasting

Scholars, notably hydro-engineers, are interested in ML algorithms such as artificial neural network (ANN), hybrid wavelet ANN, random forest (RF), and other time series and AI-based methodologies to predict and forecast streamflow (Mehdizadeh et al., 2019; Mohammadi et al., 2020). Researchers have successfully applied RF for streamflow forecasting in numerous river basins (Ali et al., 2020; Ni et al., 2020; Pham et al., 2019) and have found that the RF model outperformed other models. This is why the RF model was selected for forecasting streamflow in the present study. Random forest is a bagging-based supervised machine learning approach often employed for classification and prediction (Prasad et al., 2019), and it has lately been utilized for time series forecasting (Srivastava et al., 2019). A random forest is an ensemble of decision trees, with each tree built from a bootstrap sample of the training data (Breiman, 2001). Its hyper-parameters are nearly the same as those of a decision tree or a bagging classifier. For the forecasting model to operate well, it is critical to grow a sufficient number of trees to their maximum size (Breiman, 2001). An appropriate number of predictors must be selected at each node of the trees, and the number of observations at the terminal nodes should be kept as small as possible. The random forest ensemble method operates in this way. To estimate the discharge for the coming years, the complete time series records for the years 1978 to 2018 were employed in the current research. It would have been impossible to anticipate future circumstances if we had not utilized all of the pre- and post-dam years to forecast each year. A time series is a collection of successive observations made over time. To anticipate future observations, time series forecasting first fits models to past data. For instance, the current measurement may be used as an input to forecast the next minute, day, or month(s).
Lag periods, or lags, are steps by which the data are shifted backward in time (or along a sequence). The input is initially the time series of discharge data; as part of the modelling process, the data are automatically separated and converted to lagged form in accordance with the requirements of the programme. To anticipate future discharge, we employed 5 lags, representing 5 years of data, in the current research. We settled on 5 lags after also analysing 1–4 lags, since the outcomes for 5 lags were good. The calculation parameters have a considerable impact on the accuracy of forecasting models. We therefore used a trial-and-error technique to refine the input data and parameters and obtain the optimal model. The optimized parameters for RF are the following: lag: 5; seed: 5; number of iterations: 1000; learning algorithm: random tree; number of threads: 4; depth of tree: 100; bag size: 1003.
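To make the lag construction concrete, the sketch below shows one plausible way to turn a series into lagged inputs and to forecast several steps ahead by feeding each prediction back into the input window. It is an assumption about the general procedure, not the software used in the study: `make_lagged` and `recursive_forecast` are hypothetical names, and `predict` stands in for any fitted regressor (for example a random forest).

```python
def make_lagged(series, n_lags=5):
    """Build (X, y): each row of X holds n_lags past values, y the next one."""
    if n_lags < 1 or len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return features, targets


def recursive_forecast(history, predict, n_lags=5, steps=3):
    """Forecast `steps` values ahead, reusing each forecast as a new lag."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(steps):
        nxt = predict(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward one step
    return forecasts


# Hypothetical series and a toy stand-in for a trained model
flow = [1, 2, 3, 4, 5, 6, 7, 8]
X, y = make_lagged(flow, n_lags=5)
future = recursive_forecast(flow, predict=lambda w: w[-1] + 1, n_lags=5, steps=3)
print(X, y, future)
```

In practice `X` and `y` would be used to train the model, and the trained model’s prediction function would replace the toy `predict`.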

Validation of the model

Different error statistics, namely mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE), were computed to measure the accuracy of the model. These statistics help determine the parameters and structures during the calibration phase. Equations 5, 6, and 7 were used to determine the RMSE, MAE, and MAPE, respectively. Pearson’s correlation coefficient (Eq. 8) is calculated between the observed and predicted flow data from 1978 to 2017. If the association is statistically significant in this known period, for which ground data are available, the predicted outcome up to 2030 is expected to be acceptable as well.

The correlation coefficient approach was also used to assess the relationship between different models’ predicted data (2018–2030).

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum\limits_{i = 1}^{n} (P_{i} - Q_{i})^{2}}$$
(5)
$$\text{MAE} = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| P_{i} - Q_{i} \right|$$
(6)
$$\text{MAPE} = \left( \frac{1}{n}\sum\limits_{i = 1}^{n} \frac{\left| Q_{i} - P_{i} \right|}{\left| Q_{i} \right|} \right) \times 100$$
(7)

where Pi and Qi are the predicted and observed flow for observation i, and n is the number of observations. The lower the values of these measures, the better the model performs.

$$R = \frac{n\sum P_{i} Q_{i} - \left( \sum P_{i} \right)\left( \sum Q_{i} \right)}{\sqrt{\left[ n\sum P_{i}^{2} - \left( \sum P_{i} \right)^{2} \right]\left[ n\sum Q_{i}^{2} - \left( \sum Q_{i} \right)^{2} \right]}}$$
(8)
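For illustration, the four validation measures can be computed in plain Python as below, using their standard definitions (RMSE as the square root of the mean squared error, MAE as the mean absolute error, MAPE as the mean absolute error relative to the observed value in percent, and Pearson’s R). The function names and the three sample values are hypothetical.

```python
import math


def rmse(pred, obs):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    return sum(abs(p - q) for p, q in zip(pred, obs)) / len(obs)


def mape(pred, obs):
    # Undefined when an observed value is zero
    return 100.0 * sum(abs(q - p) / abs(q) for p, q in zip(pred, obs)) / len(obs)


def pearson_r(pred, obs):
    n = len(obs)
    sum_p, sum_q = sum(pred), sum(obs)
    sum_pq = sum(p * q for p, q in zip(pred, obs))
    sum_p2 = sum(p * p for p in pred)
    sum_q2 = sum(q * q for q in obs)
    numerator = n * sum_pq - sum_p * sum_q
    denominator = math.sqrt((n * sum_p2 - sum_p ** 2) * (n * sum_q2 - sum_q ** 2))
    return numerator / denominator


# Hypothetical predicted and observed flows
predicted = [2.0, 4.0, 6.0]
observed = [1.0, 5.0, 6.0]
print(rmse(predicted, observed), mae(predicted, observed),
      mape(predicted, observed), pearson_r(predicted, observed))
```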

Result

Description of streamflow

Box plots were used to assess the data distribution of yearly, average monthly, maximum monthly, and minimum monthly streamflow data from 1978 to 2017 (Fig. 2). The yearly water level has no outliers; however, the average monthly water level has a few outliers in all months except March to June, indicating that the streamflow has been shifting. Similarly, a few outliers were identified in the maximum and minimum streamflow throughout the summer, winter, and post-monsoon months. This condition shows that hydrological changes are taking place. All the time series show non-uniform or non-normal distributions. This anomaly is attributed to river damming, which has compromised the river’s periodicity.

Change detection analysis

According to the Pettitt test and the standard normal homogeneity test, the year 1992 is the change point in the data series (1978–2016). Figure 3 shows that the average annual streamflow levels before and after the change point are 22 and 18 m, respectively. The average water level has decreased by 3.12 m since the change point. Because a sudden change point was found in 1992 in the 39-year time series, the trend analysis was carried out separately for the pre- and post-change point periods.

Fig. 3 The identification of the change point for the average, maximum, and minimum streamflow time series datasets during 1978–2016

Analysis of trend detection

Annual streamflow trend analysis

The MK test and Spearman’s rho test results for the yearly flow data of the Punarbhaba River, segmented into two series at the change point (pre-change point: up to 1992; post-change point: 1993–2016), are shown in Table 1. Even at a 90% significance level, the MK test did not show any significant change in average, maximum, or minimum flow levels until 1992, although a negative trend is found in all cases. The outcome of Spearman’s rho test is nearly identical to that of the MK test. The slope of the trend and the percentage change are likewise minor in all cases, indicating a steady flow regime up to 1992.

Table 1 Computation of the MK and rho tests’ values for analysing the trend in the average, maximum, and minimum streamflow data

However, in the post-change point data series, the results of both tests are negative and significant. At the 95% significance level, the Z and rho values of the average, maximum, and minimum streamflow data are −0.5731 and −0.776, −0.6206 and −0.8182, and −0.4783 and −0.6759, respectively (Table 1). The percentage changes in average (−13.46), maximum (−14.38), and minimum (−9.91) streamflow also show that the streamflow has been falling since the change point.

Figure 4 shows the ITA for annual average (D: −0.645), maximum (D: −0.6), and minimum (D: −0.615) streamflow. All three series show a strongly negative trend.

Fig. 4 Innovative trend analysis for annual average, maximum, and minimum water level; first half of the series (1978–1997), second half of the series (1998–2016)

Monthly streamflow trend analysis

The results of the MK test and rho test for the two monthly average streamflow data series are shown in Table 2. For the pre-dam stage, a significant negative trend was observed for April and May, with Z and rho values of −0.69 and −0.86, and −0.64 and −0.82, respectively, at the 95% level of significance. At the same level, significant positive trends were seen in July and September, with Z values of 0.486 and 0.421, respectively. The trends in all the other months were not significant at p < 0.05. The slope and percentage change have notable magnitudes only in April and May. As a result, we may describe the flow regime before 1992 as natural.

Table 2 Computation of the value of MK and rho tests for monthly time series average streamflow datasets

Table 2 reveals that, for the post-change point data series (1993 to 2017), a significant trend is identified for all months except June at a p-value of 0.05. All months show a notable slope of flow change and relative change (−21.47 and −0.17). As a result, the streamflow has decreased since 1993.

To highlight the detected trend further, the average monthly data were regressed against time, and the regression results are shown in Fig. 5 with the streamflow time series, a fitted linear model (represented by the blue line), and the model equation. Only January had no significant trend, while all other months showed a significant negative trend in historical monthly streamflow (Fig. 5). The ITA for the average monthly water level is presented in Fig. 6. The D value of the ITA for all the months varies from −0.281 (January) to −0.94 (May), signifying a negative trend (Table 3).

Fig. 5 Trend of average flow level and fitted trend line for different months

Fig. 6 Innovative trend analysis for average monthly water level; first half of the series (1978–1997), second half of the series (1998–2016)

Table 3 Computation of trend values using innovative trend analysis for monthly time series average, maximum, and minimum streamflow datasets

Trend analysis for monthly maximum and minimum streamflow

The findings of the MK test and rho test for the two data series of monthly maximum and minimum streamflow are shown in Tables 4 and 5. At the pre-change point, all months except May showed an insignificant trend in the maximum flow level. May has a significant negative trend, with Z and rho values of −0.43 and −0.55, respectively. A significant negative trend is also detected in April when the trend is analysed using minimum water level data before 1992 (Table 5). The maximum water level change is negative and significant except for January, February, June, and November (Table 3). With minimum water levels, March, April, May, October, and December showed negative and significant trends (Table 5). Figures 7 and 8 show the maximum and minimum water level trends in other months.

Table 4 Computation of the trend values using MK and rho tests for monthly time series maximum streamflow datasets
Table 5 Computation of the trend values using MK and rho tests for monthly time series minimum streamflow datasets
Fig. 7 Trend of maximum flow level and fitted trend line for different months

Fig. 8 Trend of minimum flow level and fitted trend line for different months

The results showed that the historical monthly streamflow had a significant negative trend in all months except November, which does not show any significant trend (Table 3). Figure 9 shows the ITA for maximum monthly water level, with a D value ranging from −0.3 (November) to −0.792 (May).

Fig. 9 Innovative trend analysis for maximum monthly water level; first half of the series (1978–1997), second half of the series (1998–2016)

Results show that all months have a significant negative trend in historical monthly streamflow (Table 3). The ITA for minimum monthly water level is presented in Fig. 10, which shows a negative trend with a D value ranging from −0.341 (January) to −0.946 (May).

Fig. 10 Innovative trend analysis for minimum monthly water level; first half of the series (1978–1997), second half of the series (1998–2016)

Seasonal trend of average, maximum and minimum streamflow regime

Table 6 presents the findings of the MK test and Spearman’s rho test for the seasonal average, maximum, and minimum flow data of the Punarbhaba River, segmented into two series at the change point (pre-change point: up to 1992; post-change point: 1993–2016). Up to 1992, the MK test does not show any significant change in seasonal average, maximum, or minimum flow levels even at the 90% level of significance, except in the pre-monsoon season, although a negative trend appears in all cases. The finding of Spearman’s rho is nearly identical to that of the MK test. The trend slope and percentage change are also minor in all cases except the pre-monsoon season, indicating a steady flow regime up to 1992. However, the results are negative and significant for the post-change point time series data. The Z values and rho values of average and minimum pre- and post-monsoon streamflow data are −0.75 and −0.411, and −0.81 and −0.33, respectively, with a 95% significance level (Table 6). The streamflow data for the pre-monsoon, monsoon, and post-monsoon seasons all show statistically significant negative trends at the 95% significance level. The percentage change and Sen’s slope estimator point in the same direction, implying that the seasonal streamflow has decreased since the change point. Figure 11 displays seasonal average, maximum, and minimum water level trends.

Table 6 Computation of the trend values using the MK and rho tests for seasonal time series average, maximum, and minimum streamflow datasets
Fig. 11 Seasonal average, maximum, and minimum water level change

The results of ITA for seasonal average, minimum, and maximum streamflow are presented in Fig. 12. The seasonal average, maximum, and minimum streamflow show a significant negative trend in all seasons.

Fig. 12 Innovative trend analysis for seasonal average, maximum, and minimum water level; first half of the series (1978–1997), second half of the series (1998–2016)

Streamflow forecasting

The RF model was used to simulate and forecast streamflow, and all the seasonal models projected flow based on their intrinsic mathematical basis. Except for some days, the simulated trend in the post-hydrological period (PHA) and the forecast trend up to 2030 are negative in all four seasons. For the winter, summer, monsoon, and post-monsoon seasons in 2030, the mean estimated flow using random forest is 19.26, 16.86, 19.53, and 18.39 m, respectively (Fig. 13a–d). In various seasons, the flow volume is likely to be reduced by 0.67% to −5.23%. There is an extensive range of probable flow oscillations during winter with no discernible trend. All models suggest that the flow is severely decreased in all seasons in the post-dam period (up to 2017). The flow is expected to reduce further in the coming years.

Fig. 13 Streamflow forecasting up to 2030 using RF model for (a) winter, (b) summer, (c) monsoon, and (d) post-monsoon

Pearson’s correlation coefficient approach was used to determine the relationship between the models. The calculated relationships are positive to varying degrees, with a very high positive (r = 0.91) relationship discovered between the result of the RF model for summer, which is significant at the 99% confidence level (Fig. 14). The correlation coefficient between the observed and simulated flow data has been obtained for several models individually from 1978 to 2017 for the authentication of flow simulation and prediction models and computing RMSE, MAE, and MAPE (Table 7). In all seasons, RF can predict streamflow up to 2030.

Fig. 14 Streamflow prediction for 1978–2017 using random forest for different seasons: (a) winter, (b) summer, (c) monsoon, (d) post-monsoon

Table 7 Validation of RF model in terms of different error measures

Discussion

The foregoing analysis shows that 1992 emerged as the change year, and the entire time series was therefore divided into two parts at that year. The trend analysis indicates a significant negative trend in annual, monthly, and seasonal streamflow in the period after the change point (1992). The percentage change in flow supports the findings of the trend analysis. Similar findings were reported by Yue et al. (2003), Zheng et al. (2007), Xu et al. (2010), Rougé et al. (2013), Zou and Zhang (2012), Zhang et al. (2015), and Degefu and Bewket (2017) in their respective study areas. Even before 1992, some of the pre-monsoon months showed a negative trend, owing primarily to water harvesting via temporary damming and lifting of water from the river. Because rainfall feeds the river, we analysed rainfall data from the same period to determine the cause of the abrupt flow change. Gao et al. (2013) rightly stated that climate change or anthropogenic control might bring about such a change. The rainfall trends for the pre- and post-change periods do not show any significant change (pre-change point: y = −0.024x + 21.74, R2 = −0.056; and post-change point: y = 0.256x + 84.22, R2 = 0.053). So, rainfall is not the triggering factor for the abrupt changes in flow level. Talukdar and Pal (2017) reported that the construction of the Kamardanga dam in 1992 on the Dhepa River, a major tributary of the river Punarbhaba, and water diversion through the canal system are the major reasons behind the reduction of water level in the post-1992 period. Pal (2016a, b) studied the impact of the Massanjore dam on the Mayurakshi River and found that the dam was the main reason for the declining flow in the downstream segment of the river. Pal (2016a) and Pal and Saha (2018) identified the dam as the agent of flow attenuation in the Atreyee River.

According to Zhang et al. (2015), irrigation projects in the upstream part are frequently responsible for a decline in flow volume in the downstream part. Nine river lift irrigation projects in the Indian part also withdraw water (40–400 m3/h), and this withdrawal is also regarded as a reason behind the dwindling flow volume. Pal and Saha (2018) reported that water-lifting for irrigation from the Atreyee River reduces the mainstream flow. Agricultural intensification will increase over time, requiring more water. As a result, the volume of water is anticipated to decline further in the near future. Wada et al. (2010) stated that lowering groundwater levels can negatively affect natural streamflow. Rashid et al. (2013) reported that, because of the scarcity of surface water, farmers withdraw groundwater, and that the rate of withdrawal has increased markedly (250 times) in the last 30 years. The water table has progressively declined (average rate of 0.10 m/year). According to Rahman and Mahbub (2012), the expansion of irrigation in Bangladesh’s Barind Tract has depleted groundwater. Das and Pal (2017) also reported that the Barind Tract has experienced a declining trend in the groundwater table. Taken together, these studies suggest that declining groundwater levels may contribute to decreased streamflow. Groundwater lowering may convert the effluent river into an influent one, particularly in the pre-monsoon months, and may aggravate the decline in surface flow.
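The comparison of pre- and post-change trends discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the helper names (`split_at_change_year`, `mann_kendall`, `linear_trend`) are hypothetical, the change year is assigned to the post-change segment by assumption, and the Mann-Kendall statistic uses the standard normal approximation with a tie correction.

```python
import math
from collections import Counter


def split_at_change_year(years, values, change_year):
    """Split a series into pre- and post-change segments.

    Assumption: the change year itself belongs to the post-change segment.
    """
    pre = [v for y, v in zip(years, values) if y < change_year]
    post = [v for y, v in zip(years, values) if y >= change_year]
    return pre, post


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test."""
    n = len(values)
    # S sums the signs of all later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, reduced by the usual correction for tied values.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    for t in Counter(values).values():
        var_s -= t * (t - 1) * (2 * t + 5) / 18.0
    if s == 0 or var_s <= 0:
        return s, 0.0, 1.0
    # Continuity correction: shrink |S| by one before standardising.
    z = (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


def linear_trend(x, y):
    """Least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2
```

Applied to a rainfall or streamflow series, each segment returned by `split_at_change_year` would be passed to `mann_kendall` and `linear_trend` separately. A negative Z with a small p-value in the post-change segment only would be consistent with the pattern described above.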

Talukdar and Pal (2017) established that alteration of streamflow is a crucial factor for the ecosystems of the river, the riparian wetlands, and the floodplain. According to Talukdar and Pal (2018), flow reductions in the stream can create an eco-deficit situation, which is dangerous for the environmental state of the river and the riparian environment. Such flow attenuation may reduce flood frequency and the limit of lateral flood spilling, which may in turn cause gradual sterility of the soil and the aggravation of soil pollutants. Flow attenuation is therefore detrimental both economically and ecologically. It is thus critical to estimate the environmental flow and ensure that it is maintained in the river.

The methods used in this study, such as RF, are advanced and effective for exploring the future trend of flow change from historical time series data (1978–2017), in place of the commonly used regression analysis. Regression techniques show the net change but do not capture other properties of flow change (Adnan et al., 2021; Shafaei & Kisi, 2016), whereas wavelet transformation and RF models support both flow analysis and forecasting. As described in the “Result” section, the RF model appears to simulate and predict flow well in the current investigation.

Flow changes beyond ecological thresholds cause habitat and ecosystem vulnerabilities, including stress and hydrological poverty (Poff & Zimmerman, 2010; Saha & Pal, 2019; Pal & Talukdar, 2018). The eco-deficit in all months, as illustrated in Figs. 8, 9, 10, 11, and 12, shows the escalating hydro-ecological impoverishment. Reduced flow from dam diversion and other water extraction is a major source of the developing eco-deficit. Wang et al. (2017) observed a decline in Yangtze River flow, while Li et al. (2017) reported a decline in Mekong River flow. However, they did not forecast flow or compare it with biological flow needs. Sima et al. (2021) found that flow loss has also degraded the riparian floodplain wetland. Reducing wetland habitat, reducing wetland water depth, extending water availability, and increasing uncertainty in wetland hydrological dynamics are direct results of the river’s rising eco-deficit (Yang et al., 2017; Talukdar & Pal, 2018; Ziaul & Pal, 2017). This research does not evaluate the direct influence on ecological species; however, earlier work has addressed species simplification and extinction. According to Gain and Giupponi (2014), diversion of water by the Farakka Barrage (installed in 1975) on the river Ganga destroyed the breeding and hunting grounds of 109 fish, amphibians, and other aquatic species. Hossain and Haque (2005) found that 50 species became uncommon in Bangladesh post-Farakka.
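One common way to quantify an eco-deficit compares flow duration curves (FDCs) of a reference and an altered period. The sketch below assumes that formulation: the eco-deficit is the area where the altered FDC falls below the reference FDC, divided by the area under the reference FDC. The definition used for Figs. 8 to 12 may differ. The function names are hypothetical, and the sketch assumes two flow samples of equal length so that the plotting positions coincide.

```python
def flow_duration_curve(flows):
    """Return flows sorted from highest to lowest (the ordinates of an FDC)."""
    return sorted(flows, reverse=True)


def eco_deficit_and_surplus(reference_flows, altered_flows):
    """Approximate eco-deficit and eco-surplus as fractions of reference flow.

    Assumption: both samples have the same length, so the two FDCs share
    exceedance probabilities and the area ratio reduces to a ratio of sums.
    """
    if len(reference_flows) != len(altered_flows):
        raise ValueError("this sketch assumes equal-length flow samples")
    ref = flow_duration_curve(reference_flows)
    alt = flow_duration_curve(altered_flows)
    ref_area = sum(ref)
    if ref_area <= 0:
        raise ValueError("reference flows must have a positive total")
    # Deficit: portions of the altered curve lying below the reference curve.
    deficit = sum(max(r - a, 0.0) for r, a in zip(ref, alt)) / ref_area
    # Surplus: portions of the altered curve lying above the reference curve.
    surplus = sum(max(a - r, 0.0) for r, a in zip(ref, alt)) / ref_area
    return deficit, surplus
```

Under these assumptions, the pre-1992 flows for a given month would serve as the reference sample and the post-1992 flows as the altered sample. A deficit close to zero would indicate little ecological shortfall, and larger values a growing one.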

Conclusion

The present paper argues that an abrupt change point in the yearly streamflow of the Punarbhaba River occurred in 1992. Since then, the average, maximum, and minimum streamflow have decreased on annual, monthly, and seasonal scales. The MK test and Spearman’s rho test reveal a significant negative trend in this respect. We have identified the primary reasons for the flow change after 1992 as dam construction, water lifting through river lift irrigation projects, and declining groundwater support. Therefore, the flow before 1992 could be treated as a natural flow regime, and the post-1992 period could be treated as an artificially controlled flow regime. Rainfall does not play any significant role in the abrupt flow changes. These changes cause moderate to severe damage to the habitats of riverine species and disturb the river ecology. This type of flow alteration benefits a specific community living upstream of the dam and places burdens on the people living downstream. The release of environmental flows is critical for the survival of river and riparian ecosystems and the health of habitats. The river’s hydro-ecological condition is expected to deteriorate further in the forecast period, since the flow level is predicted to drop dramatically. This is a critical moment to consider and develop sensible measures for limiting such changes. Estimating ecological flow might aid policymakers in determining the actual quantity of water to be released from the dam to sustain the ecosystem. The predicted flow already indicates that environmental requirements would be violated if this trend continues in the near future. This study is important for reassessing current flow management strategies and determining the best option for maintaining flow. A significant decrease in flow may obstruct water supply to agricultural areas, forcing people to shift from river-based irrigation to more expensive groundwater-based irrigation. The current flow state is unsuitable for many species, which may move or perish. A river is more than just a reservoir for irrigation; it is an essential environmental component with several socioeconomic and hydro-ecological advantages. Given this, releasing an ecologically sustainable flow is critical for the river’s survival and the health and vitality of riparian habitats and ecosystems.

Despite the several applications of the present research, some limitations remain. The study lacks daily streamflow data, which alone can reveal the actual condition of the river. In addition, we used a trial-and-error process to optimize the RF model. Moreover, advanced deep learning techniques such as long short-term memory (LSTM) and recurrent neural networks (RNN) have recently emerged and may provide more accurate predictions. To overcome these limitations, daily water-level data need to be simulated using advanced techniques, grid search optimization can be applied to tune the RF algorithm, and deep learning should be explored in future research.
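The grid search idea mentioned above can be sketched in a few lines. This is an illustration of the search procedure only, not an RF implementation: `grid_search` and `toy_validation_error` are hypothetical names, the hyperparameters `n_trees` and `max_depth` are assumed for illustration, and the toy scoring function stands in for training an RF model and measuring its validation error.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one.

    score_fn takes a dict of parameters and returns a validation error,
    where lower is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(params):
    # Stand-in (assumption) for "train an RF with these settings and
    # return its error on held-out years"; minimal at 100 trees, depth 4.
    return (params["n_trees"] - 100) ** 2 + (params["max_depth"] - 4) ** 2


if __name__ == "__main__":
    grid = {"n_trees": [50, 100, 200], "max_depth": [2, 4, 8]}
    print(grid_search(grid, toy_validation_error))
```

Unlike trial and error, the search visits every combination in the grid exactly once, so the selected setting is reproducible for a given grid and scoring function.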