Drought index time series forecasting via three-in-one machine learning concept for the Euphrates basin

Latifoğlu, Levent; Bayram, Savaş; Aktürk, Gaye; Citakoglu, Hatice

doi:10.1007/s12145-024-01471-8

RESEARCH
Published: 16 September 2024

Earth Science Informatics

Levent Latifoğlu¹,
Savaş Bayram¹,
Gaye Aktürk² &
…
Hatice Citakoglu¹

Abstract

Droughts are among the most hazardous and costly natural disasters and are hard to quantify and characterize. Accurate drought forecasting reduces the devastating economic effects of droughts on ecosystems and people. Eastern Anatolia is the largest and coldest geographical region of Türkiye. Previous studies have not addressed drought forecasting in the Eastern Anatolia (Upper Mesopotamia) Region, where agriculture is limited because the land is under snow for most of the year. This study focuses on the Euphrates basin, specifically the Tercan and Tunceli meteorological stations of the Karasu River sub-basin, a vital water resource of the Eastern Anatolia Region. In this context, time series of 1-, 3-, 6-, 9-, and 12-month Standardized Precipitation Index (SPI) and Standardized Precipitation Evapotranspiration Index (SPEI) values were created. The Tuned Q-factor Wavelet Transform (TQWT) method and Univariate Feature Ranking Using F-Tests (FSRFtest) were used for pre-processing and feature selection. Three kinds of models were created: stand-alone, hybrid, and tribrid. Machine Learning (ML) methods, namely Artificial Neural Networks (ANN), Gaussian Process Regression (GPR), and Support Vector Machine (SVM), were applied to the time series. At the Tercan station, GPR performed better than the ANN and SVM models in 80% of cases. At the Tunceli station for the SPI output, SVM, which had a superior performance in 60% of the cases, demonstrated a performance comparable to GPR, while ANN once again exhibited an inferior performance. Similarly, for the SPEI output at the Tunceli station, no clear superiority was observed between the GPR and ANN methods, because both methods were successful in 40% of cases. This study contributes by adding tribrid models as a third concept to the stand-alone and hybrid model comparison in drought forecasting.
The hybrid and tribrid ML methods were found to lead to 91% and 64% decreases in relative root mean square error percentage compared with stand-alone ML methods for SPEI and SPI at the two stations. While the hybrid model at the Tercan station was more successful in 80% of the cases, the hybrid model at the Tunceli station was more successful in 90% of the cases. Although hybrid models were observed to be superior, tribrid models not only demonstrated performance close to that of the hybrid models but also provided advantages such as reducing computational load and shortening calculation time.

Introduction

The environment is constantly affected by rapid population growth and technological progress (Kilinc and Yurtsever 2022). Climate change, which directly affects all living things in both the short and long term, has emerged as one of the foremost environmental challenges of our age (Katip 2018) and has gained importance in recent years with the increasing awareness of environmental issues. Climate change is a statistically significant change in the average state or variability of the climate over an extended period, which can occur due to both changes in natural climate dynamics and external factors resulting from human actions. Various potential future scenarios, such as changing rainfall patterns, higher temperatures, and rising sea levels, could result from increasing climate change influences, and their consequences could significantly affect ecosystems, societies, and economies (IPCC 2014). Increasing temperatures, irregularities in precipitation, and changes in the frequency of extreme events alter the total and seasonal water supply, and when these changes are combined with land use, they significantly affect hydrological processes at the basin level (Wang et al. 2019; Zhang et al. 2019; Alivi et al. 2021). These changes provide considerable evidence of the effects of climate change. Therefore, it is vital to determine and implement management strategies compatible with climate change for the sustainable use of water resources today.

The Intergovernmental Panel on Climate Change (IPCC) Report on extreme events, which supports the targets of the United Nations Framework Convention on Climate Change, has acknowledged drought as a significant extreme climatic event requiring mitigation to minimize its adverse impacts (Field 2012; Deo et al. 2017b). Drought is a persistent and recurrent natural disaster on a global scale, often linked to climate change (Mohammed et al. 2018; Naumann et al. 2018). Drought forecasting is critical to combating drought and is vital for risk management and for the drought mitigation and management process (Mishra and Singh 2010, 2011; Belayneh et al. 2016; Deo et al. 2017b). Additionally, accurate drought forecasting helps to lessen the catastrophic economic effects of droughts on ecosystems and people (Mo et al. 2009). It is challenging to forecast when a drought will start because it might appear out of nowhere, move swiftly, and have several outcomes (Wilhite 2000). Droughts are classified as agricultural, hydrological, meteorological, and socioeconomic (Katip 2018; Evkaya and Kurnaz 2021). Droughts differ from most other natural disasters in many aspects, especially in the degree of difficulty in forecasting their start, end, and severity (McKee et al. 1993).

Türkiye, situated in a semi-arid environment, is vulnerable to the disastrous effects of droughts (Aibaidula et al. 2022). Therefore, forecasting the possible impacts of future droughts is vital. Traditional stochastic models, however, have limitations in forecasting nonlinear data. This study, specifically focusing on the Euphrates basin, therefore aims to develop a more accurate and efficient method for drought forecasting.

Drought indices are crucial for continuously monitoring drought events in terms of their temporal and spatial extent, for quantitatively assessing their severity and spatial dimension, and for early identification, i.e., predicting drought and enabling the development of management strategies under current climate conditions. Drought assessment plays a significant role in the planning and management of water resources. Among these drought indices, the Standardized Precipitation Index (SPI) and the Standardized Precipitation Evapotranspiration Index (SPEI) are widely used in the literature due to their ability to be utilized at multiple time scales, represent various types of droughts, and better reflect changes in drought characteristics (severity, duration, frequency, and spatial extent) (Mishra and Singh 2010). SPI and SPEI are commonly used in global research and calculated at various time scales, such as 1, 3, 6, 12, 24, and 48 months, to represent droughts. The frequency of dry periods is exceptionally high at short time scales, but as the time scale increases, the frequency of dry periods decreases. Therefore, while drought frequency decreases at longer time scales, increases in its severity and duration are observed. Depending on the type of water source available, response times to drought conditions vary significantly. In order to define the system response period, McKee et al. (1993) proposed the notion of the drought time scale. For example, short time scales (1 to 3 months) are mainly related to soil water content and river discharge in headwater areas, medium time scales (3 to 12 months) are related to reservoir storage and discharge in the medium course of the rivers, and long time scales (12 to 24 months) are related to variations in groundwater storage. As a result, different time scales can be used to observe drought conditions in different hydrological subsystems (Vicente-Serrano et al. 2010).
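The role of the time scale can be illustrated with a minimal sketch. It accumulates monthly precipitation over a k-month window and then standardizes the result. Note the assumptions: the precipitation values are hypothetical, and the plain z-score used here is a simplification, since the SPI proper fits a probability distribution to the accumulated series before transforming it to a standard normal variable.

```python
from statistics import mean, stdev

def rolling_sum(values, k):
    """Return k-month running totals; the first k-1 months have no full window."""
    return [sum(values[i - k + 1:i + 1]) for i in range(k - 1, len(values))]

def standardize(values):
    """Simplified z-score (assumption); not the distribution-based SPI transform."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical monthly precipitation totals (mm), for illustration only.
precip = [30.0, 45.0, 12.0, 0.0, 5.0, 60.0, 80.0, 20.0, 15.0, 40.0, 55.0, 10.0]

acc1 = rolling_sum(precip, 1)   # 1-month scale: the raw monthly series
acc3 = rolling_sum(precip, 3)   # 3-month scale: smoother, fewer usable months
index1 = standardize(acc1)
index3 = standardize(acc3)
```

Longer windows smooth out single dry months, which is consistent with the lower frequency of dry periods at longer time scales described above.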

In recent years, ML algorithms have spurred significant advancements in drought prediction (Felsche and Ludwig 2021; Piri et al. 2023; Danandeh Mehr et al. 2023). A variety of ML algorithms have been developed and employed for this purpose, encompassing random forest, k-nearest neighbors, artificial neural networks, support vector machines, Gaussian Process Regression, adaptive neuro-fuzzy inference systems, decision trees, multivariate adaptive regression splines, M5 Trees, long short-term memory, and extreme learning machines (Deo et al. 2017a; Kisi et al. 2019; Yaseen et al. 2021; Kikon and Deka 2022; Docheshmeh et al. 2022; Lotfirad et al. 2022; Moghaddasi et al. 2024; Lalika et al. 2024). Katip (2018) applied the standardized precipitation index (SPI) to the Marmara region using Neural Network (NN) models; five of the six models were successful (Katip 2018). Özger et al. (2020) applied the self-calibrated Palmer Drought Severity Index (sc-PDSI) to the Mediterranean region. Models were created via Empirical mode decomposition (EMD) and Wavelet decomposition (WD), and WD was more successful (Özger et al. 2020). Mehr et al. (2020) applied the SPI to Ankara Province, Türkiye, and presented a new hybrid model called ENN-SA for spatiotemporal drought estimation. In ENN-SA, an Elman neural network (ENN) is combined with simulated annealing (SA) optimization and support vector machine (SVM) classification algorithms for SPI modeling across multiple stations. The researchers showed that ENN-SA is promising and effective for multi-station SPI estimation. Altunkaynak and Jalilzadnezamabad (2021) applied the Palmer drought severity index (PDSI) to the Marmara region. Models were created via Discrete Wavelet Transform (DWT), fuzzy, k-Nearest Neighbor (kNN), and Support Vector Machine (SVM). 
The hybrid models outperformed stand-alone ones (Altunkaynak and Jalilzadnezamabad 2021). Evkaya and Kurnaz (2021) applied the univariate drought index (UDI) and SPI to the Marmara region. Nonlinear autoregressive with external input (NARX) type NN models were created. It was found that the drought index forecasting capacity could be increased using NARX-NN (Evkaya and Kurnaz 2021). Başakın et al. (2021a, b) applied the self-calibrated Palmer Drought Severity Index (sc-PDSI) to the Mediterranean region. They compared a new hybrid model combining an adaptive neuro-fuzzy inference system (ANFIS) and EMD, namely EMD-ANFIS, with stand-alone ANFIS. The hybrid model outperformed the stand-alone one (Başakın et al. 2021a). Citakoglu and Coşkun (2022) applied SPI to the Marmara region. Both stand-alone and hybrid models were created via ANFIS, Gaussian process regression (GPR), k-nearest neighbors (KNN), NN, and support vector machine regression (SVM). Hybrid GPR and NN models performed best (Citakoglu and Coşkun 2022). Gholizadeh et al. (2022) applied SPEI to the Central Anatolia region. They proposed a new hybrid model via the Bat optimization algorithm and extreme learning machine (ELM), namely Bat-ELM. The proposed model improved the forecasting accuracy (Gholizadeh et al. 2022). Kilinc and Yurtsever (2022) proposed a grey wolf algorithm (GWO) based gated recurrent unit (GRU) hybrid approach for the Mediterranean region. The GWO-GRU model was successful (Kilinc and Yurtsever 2022). Gul et al. (2023) applied SPI to the Aegean region. They used Extreme Gradient Boosting (XgBoost), Adaptive Boosting, and Gradient Boosting, and XgBoost performed best (Gul et al. 2023). Reihanifar et al. (2023) presented a new model named multi-objective multi-gene genetic programming (MOMGGP) and compared it with genetic programming and multi-gene genetic programming. The same forecasting accuracy was obtained from MOMGGP (Reihanifar et al. 2023). Soylu Pekpostalci et al. (2023) evaluated 71 drought monitoring and forecasting studies from 2010 to 2022 in Türkiye. 
The application of ML for short-term hydrological and meteorological drought forecasting was trending upward (Soylu Pekpostalci et al. 2023). Danandeh Mehr et al. (2023) applied SPEI to Central Anatolia. They proposed a new hybrid model via convolutional neural network (CNN) and long short-term memory (LSTM), namely convolutional long short-term memory (CNN-LSTM). The proposed model improved the forecasting accuracy (Danandeh Mehr et al. 2023). In addition, researchers have recently suggested various hybrid models that can be used in hydrological forecasting (Yuan et al. 2018; Adnan et al. 2021, 2022, 2023; Ikram et al. 2023; Mostafa et al. 2023; Mohammadi 2023).

There are various drought studies regarding the Euphrates basin in the literature. For example, Katipoglu et al. (2020) compared SPI, SPEI, Statistical Z-Score Index (ZSI), Rainfall Anomaly Index (RAI), and Reconnaissance Drought Index (RDI) at 3-month and 12-month time scales. Katipoglu et al. (2021) mapped the SPI, ZSI, RAI, SPEI, and RDI using Kriging, Radial Basis Function (RBF), and Inverse Distance Weighting (IDW) methods at 3- and 12-month time scales. Katipoğlu and Acar (2022) calculated trends of the Standardized flow index (SRI) using Mann Kendall (MK) and Modified Mann Kendall (MMK) tests at 3- and 12-month time scales. Their study also mapped drought trends using Kriging, RBF, IDW, Local Polynomial Interpolation (LPI), and Global Polynomial Interpolation (GPI) methods. Katipoğlu et al. (2022) calculated and compared the trends of SPI, SPEI, ZSI, RAI, and RDI with MK and MMK tests at 3- and 12-month time scales. Esit et al. (2023) calculated and compared the trends of SPI, SPEI, and the standardized streamflow index (SDI) with MK, Spearman Rho, and innovative trend analysis tests at a 12-month time scale. As seen in these studies from Türkiye, no existing studies have predicted the drought of the Euphrates basin using ML methods. In addition, although drought prediction studies have been carried out in many geographical regions of Türkiye, no current study has been found for the Euphrates basin, which lies in Eastern Anatolia, the coldest and largest geographical region. Therefore, this study aims to close this gap by estimating SPI and SPEI values obtained from existing stations in the Euphrates basin in Türkiye using ML methods.

Two meteorological measurement stations in the Karasu sub-basin of the Euphrates Basin, in the Eastern Anatolia region, were selected as the application units of this study. Within the scope of the study, 1965–2022 monthly precipitation and temperature data of the existing stations were used. Monthly SPI values (SPI-1, SPI-3, SPI-6, SPI-9, and SPI-12) and SPEI values (SPEI-1, SPEI-3, SPEI-6, SPEI-9, and SPEI-12) were generated for the 1-, 3-, 6-, 9-, and 12-month time scales. SPI and SPEI outputs were utilized separately to monitor and describe the temporal variations of meteorological drought events over various time scales. A forecasting process was conducted using three different approaches: (i) simple ML methods (ANN, GPR, SVM), (ii) hybrid ML methods (TQWT-ANN, TQWT-GPR, TQWT-SVM), and (iii) tribrid ML methods (TQWT-FSRFtest-ANN, TQWT-FSRFtest-GPR, TQWT-FSRFtest-SVM). Another critical purpose of this study was to develop high-performance drought prediction models and to reveal the advantages and disadvantages of the models by comparing the performances of the simple, hybrid, and tribrid approaches. This study will contribute to the literature by offering a third alternative to the stand-alone versus hybrid comparison.
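The feature-selection step that distinguishes the tribrid approach from the hybrid one can be illustrated with a minimal sketch. It assumes that each candidate input (for example, a sub-band produced by a decomposition step) is scored by the F-statistic of a univariate linear regression against the target index and that inputs are then ranked by that score. The exact FSRFtest procedure used in the study may differ, and the sub-band names and values below are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def f_statistic(x, y):
    """F-statistic of a univariate linear regression of y on x (1 and n-2 dof)."""
    r2 = pearson_r(x, y) ** 2
    return r2 / (1.0 - r2) * (len(x) - 2)

def rank_features(features, target):
    """Return feature names ordered from highest to lowest F-statistic."""
    scores = {name: f_statistic(col, target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical target index values and candidate sub-band features.
target = [0.1, 0.5, -0.3, 0.8, -0.6, 0.2, 0.9, -0.4]
features = {
    "sub1": [0.2, 0.4, -0.2, 0.9, -0.5, 0.1, 1.0, -0.5],  # closely tracks target
    "sub2": [0.3, -0.1, 0.2, 0.0, 0.1, -0.2, 0.1, 0.0],   # weakly related
    "sub3": [0.0, 0.3, 0.1, 0.4, -0.2, 0.3, 0.5, 0.0],    # moderately related
}

ranking = rank_features(features, target)
```

Keeping only the top-ranked inputs before training is what would reduce the number of model inputs, and with it the computational load, relative to feeding every sub-band to the learner.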

Material & method

Study area and data

This study was conducted in the Karasu River Basin, a sub-basin of the Euphrates Basin, which is situated between latitudes 40°20'–38°23' N and longitudes 37°17'–41°34' E. The Euphrates River, one of the most significant rivers in the Middle East, is in the most mountainous part of eastern Türkiye. The Karasu, in turn, is one of the primary tributaries contributing to the formation of the Euphrates River. The Karasu River Basin has an approximate surface area of 37,339 km², representing about 4.77% of Türkiye's total surface area and 30.70% of the Euphrates Basin. The Karasu River Basin is surrounded by mountainous areas consisting of volcanic masses. It is the most mountainous part of the Euphrates Basin, with a maximum altitude of 3,537 m in the eastern and interior areas and a minimum altitude of 800 m in the south. While 57.4% of the Basin has steep and very steep topography, 10.9% has relatively flat terrain. The basin's land structure consists of 0.7% urban areas, 14.2% agricultural land, 42.6% pastureland, and 2.2% water masses, and the remaining portion is bare land (SYGM 2021). The most important agricultural lands of the basin are the Tercan, Erzurum, and Sarısu plains. The prevailing climate in the Karasu sub-basin is continental and stands out for its harshness relative to the main basin to which the area belongs, the Euphrates Basin. The basin experiences the highest rainfall during winter, while the least rainfall occurs in summer. Because of climate change, a rise is anticipated across the country in the frequency, severity, and spatial distribution of natural disasters, particularly those sensitive to changes in the water cycle, such as drought. In specific river basins, a decrease in rainfall and a significant increase in temperature are observed, with a consequent tendency for reduced water flow. Research indicates that the Euphrates Basin is one of the basins witnessing a notable water deficit (Alivi et al. 2021). 
In this study, the Tercan and Tunceli meteorological stations located in the Karasu sub-basin were selected. Figure 1 presents an overview of the basin and the locations of the stations.

Monthly mean temperature and monthly total precipitation data measured at the Tercan and Tunceli meteorological stations for 58 years from 1965 to 2022 were acquired from the General Directorate of Meteorology of Türkiye. Missing data were estimated with the Inverse Distance Method (IDM). This method assigns weights to neighboring stations inversely proportional to their distance from the target station, so closer stations are given more weight and distant stations less. The basic principle of this approach is the assumption that closer stations correlate better with the target station than distant stations (Mohamed Salleh et al. 2021). This study used the IDM to complete the stations' missing monthly precipitation and temperature data. In applying the method, the four closest highly correlated stations in the vicinity were selected, and the missing data were completed using these stations' data. Figure 2 illustrates temporal changes in the annual mean temperature and precipitation for the two selected meteorological stations in the basin.
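The gap-filling idea described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the distances and precipitation values are hypothetical, and the `power` exponent is an assumed parameter (1.0 gives weights inversely proportional to distance, as described in the text; other exponents are common in practice).

```python
def inverse_distance_estimate(neighbors, power=1.0):
    """Estimate a missing value from (distance, value) pairs of neighboring stations.

    Each neighbor is weighted by 1 / distance**power, so closer stations
    contribute more to the estimate than distant ones.
    """
    weights = [1.0 / (distance ** power) for distance, _ in neighbors]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, neighbors)) / total

# Hypothetical neighbors: (distance to target station in km, monthly precipitation in mm).
neighbors = [(10.0, 40.0), (20.0, 50.0), (40.0, 30.0), (50.0, 60.0)]

estimate = inverse_distance_estimate(neighbors)
```

Because the result is a weighted average with positive weights, the estimate always lies between the smallest and largest neighboring values.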

Precipitation data analysis reveals that the long-term annual mean precipitation is approximately 430.7 mm at the Tercan station, below the country average, and 845.6 mm at the Tunceli station, above the country average. The recorded maximum and minimum annual precipitation values were 684.2 mm in 1979 and 185.3 mm in 2022 in Tercan, and 1992.8 mm in 1967 and 521.2 mm in 1989 in Tunceli. At the Tercan station, 65% of the annual precipitation occurred during autumn and spring, while approximately 22% and 13% occurred during winter and summer, respectively. At the Tunceli station, 77% of the annual precipitation occurred during winter and spring, while approximately 20% and 3% occurred during autumn and summer, respectively. Temperature data analysis indicated that the annual mean temperature is about 8.5 °C at the Tercan station and 12.9 °C at the Tunceli station. The highest and lowest recorded annual mean temperatures were 11.4 °C in 2010 and 5.4 °C in 1992 in Tercan, and 15.0 °C in 2018 and 10.1 °C in 1992 in Tunceli. Furthermore, the monthly mean temperature varies from 21.4 °C to 27.2 °C in August and from -1.7 °C to -10.6 °C in January at Tercan and Tunceli stations, respectively. In conclusion, the different geographical locations and physical conditions of the study areas, such as altitude and topography, significantly influence the amount of precipitation and the temperature in the region. Table 1 presents geographical characteristics and descriptive statistics of the monthly temperature and precipitation data between 1965 and 2022.

Table 1 Descriptive statistics of the selected stations
