1 Introduction

Hydrometeorological data are collected from a wide range of sources, and their volume is growing rapidly as new technologies for measuring environmental variables emerge. The basic environmental data are collected by ground monitoring systems, which consist of a wide range of sensor technologies measuring various physical parameters. Weather stations and monitoring systems measure almost all hydrometeorological parameters used in weather prediction and climate change assessment, including temperature, precipitation, wind velocity and direction, solar radiation, and relative humidity.

Georgia hosts most of Earth's climatic types, from the humid marine subtropical climate of the west and the continental steppe climate of the east to the eternal snow and glaciers of the high-mountain zone of the Greater Caucasus, as well as approximately 40% of observed landscapes.

The complexity of the orographic structure of Georgian territory, along with other physical-geographical factors, is the cause of the wide variety of climates and landscapes. Almost all climate types observed on the globe are present, from the climate of eternal snow and glaciers of the high mountains to the steppe continental climate of eastern Georgia and the humid climate of the subtropical Black Sea coast (Tatishvili et al., 2013).

Such complicated relief has a definite influence on the motion of air masses in the lower layers of the atmosphere. Western and eastern atmospheric processes prevail over Georgian territory. These climatic zones therefore favor the formation of various dangerous hydrometeorological phenomena, namely hail, heavy showers, flooding, thunderstorms, droughts, and sea storms. The geodynamical and orographic properties of Georgia play a major role in the formation of various weather patterns. The complex relief is one of the main reasons for the formation and evolution of circulation systems of various scales and for the heterogeneous spatial distribution of meteorological elements. This is confirmed by the fact that the annual distribution of precipitation is of diverse types, with sharply expressed spatial inhomogeneities (Tatishvili, 2017).

Hydrometeorological ground station observations in Georgia began at the end of the 19th century, and the network gradually grew to 200 stations by the end of the 20th century. At present there are about 80 ground-based automatic weather stations that transmit weather information every 10 min (Fig. 1).

Fig. 1 Drought indices for Dedoplistskaro (a) SPI12, Gori (b) SPI12, Telavi (c) SPI12, Bolnisi (d) SPI6. Each panel reports the linear trend slope, slope error, and p-value

Fig. 2 The trained model 10 for the Kakheti region (true versus predicted response for 19 stations)

Fig. 3 Advanced SVM prediction model for the Kakheti region including seventeen additional stations of the Kartli region (true versus predicted response)

Drought is a frequent dangerous phenomenon in eastern Georgia. According to reliable estimates, its frequency in some areas exceeded 40% in the 1980s. A significant transformation of many types of natural landscapes has been observed as a result of the frequent droughts accompanying global warming in past decades. The desertification probability of the steppe and semi-desert landscapes of eastern Georgia reached 25–30% by the end of the twentieth century (Tatishvili et al., 2021). According to official information, an area of more than 200 000 ha is currently strongly affected by intense droughts. Property damage caused by drought is also very significant.

It is well established that the main meteorological factors for drought formation are dry weather, high temperature and a lack of productive soil moisture. The average duration of the rainless period (precipitation less than 5 mm) that is most important for agriculture is not more than 10–15 days. Besides, the mean rainfall on the lowlands is not more than 200–300 mm during the vegetation period. Nevertheless, the productive moisture supply is 50–200 mm per meter of soil, which corresponds to the zone of capillary agro-hydrological humidification and full penetration of spring rainfall. At the same time, the sum of active air temperatures exceeds 4000 °C by 10 times and the mean duration of continuous high temperatures above 30 °C is longer by 4 h.

2 Data and Method

Station data were retrieved from the CLIDATA database of the National Environmental Agency (NEA), which has been operating since 2014. Stations were selected on the basis of data continuity and accuracy. During station data validation, records in which a data interruption was detected or a malfunctioning sensor transmitted incorrect information were removed and not analyzed. At 21 stations observers monitor the data, so apart from the human factor the unreliability of the data is minimal; the remaining stations are equipped with rain gauges produced by VAISALA (Publish House of Iv. Javakhishvili Tbilisi State University, 2022), which by design do not measure residual precipitation. The VAISALA gauges are a new generation of weighing precipitation gauges. They combine the latest high-accuracy load cell technology with advanced measurement control algorithms to ensure high performance for both liquid and solid precipitation in almost all weather conditions (Publish House of Iv. Javakhishvili Tbilisi State University, 2022).

Drought is one of the most dangerous and widespread natural disasters across the globe. In the historical past, drought triggered the disappearance of many civilizations (Evans et al., 2018; Kaniewski et al., 2015) and large migration processes (DeMenocal, 2001). Droughts are characterized by decreased natural water availability in the form of precipitation, river runoff, or groundwater (Babre et al., 2022; Tatishvili et al., 2022a). The difficulties in studying droughts stem from their diverse nature (meteorological, hydrological, ecological, and economic). Accordingly, there are different types of indices, whose complexity depends on the availability of the necessary data. The most widely used indices in modern studies are the SPI (Standardized Precipitation Index) and the SPEI (Standardized Precipitation Evapotranspiration Index). The SPI calculation uses the precipitation data series (McKee et al., 1993), whereas the SPEI is based on the cumulative water balance instead of precipitation sums (Vicente-Serrano et al., 2010). The SPEI hence represents the standard-normal distributed water balance (Vicente-Serrano et al., 2010), in which temperature is considered along with precipitation (Tatishvili et al., 2022a). To calculate these indices, ground meteorological station data of the Georgian National Environmental Agency for the 1991–2020 period were used. Ten stations with data valid for index calculation were chosen from the data archive.
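The standardization idea behind the SPI can be sketched in a few lines of Python. This is an illustrative simplification, not the procedure used in the study: the classical SPI fits a gamma distribution to the accumulated precipitation, whereas the sketch below replaces that fit with a rank-based empirical distribution and pools all calendar months. The function names `rolling_sums` and `empirical_spi` are hypothetical.

```python
from statistics import NormalDist


def rolling_sums(monthly_precip, scale):
    """Accumulated precipitation over `scale` months, for every full window."""
    return [
        sum(monthly_precip[i - scale + 1 : i + 1])
        for i in range(scale - 1, len(monthly_precip))
    ]


def empirical_spi(monthly_precip, scale=3):
    """Standardize accumulated precipitation to a standard normal variable.

    ASSUMPTION: a rank-based empirical CDF (plotting position rank / (n + 1))
    stands in for the fitted gamma distribution, and all calendar months are
    pooled into one sample.
    """
    totals = rolling_sums(monthly_precip, scale)
    n = len(totals)
    std_normal = NormalDist()
    spi = []
    for value in totals:
        # Mean rank (1-based) so that tied totals receive the same index value.
        lower = sum(1 for v in totals if v < value)
        equal = sum(1 for v in totals if v == value)
        rank = lower + (equal + 1) / 2
        probability = rank / (n + 1)  # always strictly between 0 and 1
        spi.append(std_normal.inv_cdf(probability))
    return spi
```

Under the same assumptions, passing a monthly water balance series (precipitation minus potential evapotranspiration) instead of precipitation would give an SPEI-like index, since the SPEI standardizes the water balance rather than the precipitation sum.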

The drought indices are classified by accumulation period: 3, 6 and 12 months. The R statistical software was used for plots. Several SPEI and SPI drought index plots from selected stations are presented. They represent the Kakheti and Kartli regions, which are notable for agriculture: Dedoplistskaro, Gori, Telavi and Bolnisi.

3 Discussion

The correlation between data sets is a measure of how well they are related. In the present study, the following statistical parameters were used as criteria: the Pearson correlation coefficient (PCC), the most common measure of correlation; the determination coefficient (R2); and the Root Mean Square Error (RMSE). These are among the strongest statistical measures. Generally, R2 ranges from 0 to 1, with higher values indicating less error variance. The RMSE is the square root of the mean of the squared residuals. It indicates the absolute fit between two data sets, and the lower the RMSE, the better the performance (Observatory and EDO, 2021).
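The three criteria can be written out as a short Python sketch. The text does not state which definition of R2 was applied, so the sketch assumes the common form 1 - SS_res / SS_tot, with one series treated as the observation and the other as the prediction; under this definition R2 can fall below the squared PCC when the two series are offset from each other. The function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def rmse(observed, predicted):
    """Square root of the mean squared difference between the two series."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Determination coefficient, ASSUMED here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

For two index series that move together but differ by a constant offset, `pearson` returns 1 while `rmse` is nonzero and `r_squared` is below 1, which shows why the three measures are reported side by side.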

The 3-month PCC, which shows the linear relationship between SPI and SPEI, is quite high, and the RMSE (SPI-SPEI) is low for all stations, especially for Khashuri and Telavi (Table 1).

Table 1 Statistical parameters PCC, R2, RMSE for the most vulnerable regions according to SPI3 and SPEI3

The PCC for the 12-month SPI-SPEI pair is high. The R2 is low for all stations. The RMSE (SPI (12)-SPEI (12)) is low, which indicates a close fit (Table 2). The strongest relationship was observed between indices with equal time periods. As the time lag increases, the relationship between the variables weakens.

Table 2 Statistical parameters PCC, R2, RMSE for the most vulnerable regions according to SPI (12) and SPEI (12)

The PCC for SPI-SPEI relations within stations is very high, while between stations it is relatively low. This can be explained by the fact that the spatial–temporal distribution of meteorological parameters, temperature and especially precipitation, differs between locations at different elevations, and the stations are located in various climatic zones. The distance between stations is also an important factor.

The values of PCC are shown in Table 3.

Table 3 Pearson correlation of drought indices within and between stations

The calculated index values make it possible to separate negative and positive values, i.e. to count dry and wet days. It is interesting to compare the numbers of drought and wet days at each station. At Akhaltsikhe the number of wet days exceeds the number of drought days, with approximately 3 severe drought days and 50 moderate ones. At Gori the number of wet days slightly exceeds the number of drought days, with 1 severe and 60 moderate drought days. At Telavi the number of drought days greatly exceeds the number of wet days, with 5 severe and 58 moderate drought days. At Tbilisi the number of drought days exceeds the number of wet days, with 4 severe and 62 moderate drought days. At Kutaisi the number of wet days exceeds the number of drought days, with 5 severe and 55 moderate drought days. At Mta-Sabueti both day types are approximately equal in number, with 5 severe and 54 moderate drought days (Tatishvili et al., 2022b). Table 4 presents the numbers of dry and wet days for three locations.

Table 4 Number of negative and positive days in Telavi, Tbilisi and Gori for the 1991–2020 period according to SPEI3

The index values categorize drought intensity in a given period. Tables 5, 6 and 7 present the SPEI and SPI categories for the Telavi, Tbilisi and Gori stations, and Table 8 gives the statistical parameters of the stations and models. The SPI and SPEI values differ from each other, and this difference can be explained.
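The counting of dry and wet values and their grouping into intensity categories can be sketched as follows. The class boundaries are an assumption: they follow the drought classes commonly attributed to McKee et al. (1993) (mild, moderate, severe, extreme) and may differ from the bounds used in the tables of this study. The function names are hypothetical.

```python
from collections import Counter


def drought_category(index_value):
    """Map one SPI or SPEI value to an intensity class.

    ASSUMPTION: thresholds of -1.0, -1.5 and -2.0 as in the commonly used
    SPI drought classes; non-negative values are not treated as drought.
    """
    if index_value <= -2.0:
        return "extreme drought"
    if index_value <= -1.5:
        return "severe drought"
    if index_value <= -1.0:
        return "moderate drought"
    if index_value < 0.0:
        return "mild drought"
    return "wet or normal"


def summarize(index_values):
    """Count negative (dry) and non-negative (wet) values and each category."""
    dry = sum(1 for v in index_values if v < 0)
    wet = len(index_values) - dry
    categories = Counter(drought_category(v) for v in index_values)
    return dry, wet, categories
```

Applied to one station's index series, `summarize` returns the dry and wet counts of the kind reported in Table 4 together with the per-category counts of the kind reported in Tables 5 to 7.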

Table 5 SPI (3,6,12) and SPEI (3,6,12) categories for Telavi station
Table 6 SPI (3,6,12) and SPEI (3,6,12) categories for Tbilisi station
Table 7 SPI (3,6,12) and SPEI (3,6,12) categories for Gori station
Table 8 Statistical parameters of stations/models located on the Kakheti and Kartli regions

According to the definition of SPEI, a surface water deficit may be interpreted as dryness, which is an indicator of drought conditions. For Telavi, which represents a dry region, irrigation is a major concern. The region is a significant agricultural producer and intensively uses Alazani river water for irrigation. According to the SPEI, the region is more subject to moderate dryness than according to the SPI.

Tbilisi, the capital of Georgia, has its own climatic regime known as the Tbilisi Cavern (Tsitsagi et al., 2022). It is more subject to moderate drought conditions.

Gori represents the Shida Kartli agricultural region, which is known for its fruit production. It is undoubtedly under moderate drought conditions.

The obtained results indicate that there is a difference between the droughts depicted by the precipitation-based SPI and the evaporation-influenced SPEI, caused by the interannual and seasonal variability of temperature and precipitation trends, land cover, vegetation, irrigation, etc.

4 Conclusion

Machine learning is an important component of the growing field of big data science. Through the use of statistical methods, algorithms are trained to make classifications or predictions and to uncover key insights in data mining projects. These insights subsequently drive decision making within applications and businesses, ideally affecting key growth metrics. As big data continues to expand, the market demand for data scientists will increase. They will be required to help identify the most relevant business questions and the data needed to answer them (Publish House, 2022).

Overfitting is a concept in data science that describes a statistical model fitting its training data too exactly (Publish House, 2022). When this happens, the algorithm cannot perform accurately on unseen data, which defeats its purpose. Generalization of a model to new data is ultimately what allows machine learning algorithms to be used every day to make predictions and classify data.
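A toy Python sketch can make the symptom of overfitting concrete: a model that memorizes its training data reaches zero training error yet predicts unseen points worse than a simpler model. The two models below (a 1-nearest-neighbour memorizer and a least-squares line) and the data are hypothetical illustrations, not the tree or SVM models of this study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def fit_line(x, y):
    """Ordinary least-squares line; returns a prediction function."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def fit_memorizer(x, y):
    """1-nearest-neighbour model: reproduces every training target exactly."""
    pairs = list(zip(x, y))
    return lambda value: min(pairs, key=lambda pair: abs(pair[0] - value))[1]


def train_test_rmse(model, train, test):
    """Return (training RMSE, test RMSE) for a fitted prediction function."""
    (train_x, train_y), (test_x, test_y) = train, test
    return (
        rmse(train_y, [model(v) for v in train_x]),
        rmse(test_y, [model(v) for v in test_x]),
    )


# Hypothetical data: the signal is y = x; the training targets carry +/-1 noise.
train = ([0, 1, 2, 3, 4, 5], [1, 0, 3, 2, 5, 4])
test = ([0.4, 1.4, 2.4, 3.4, 4.4], [0.4, 1.4, 2.4, 3.4, 4.4])

memorizer_scores = train_test_rmse(fit_memorizer(*train), train, test)
line_scores = train_test_rmse(fit_line(*train), train, test)
```

Here the memorizer has a training RMSE of zero but a larger test RMSE than the fitted line, so a gap between training and held-out error, rather than the training error alone, is the quantity to monitor.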

The first region subjected to ML analysis was Kakheti, which was identified as the most vulnerable to drought risk. The existing station data for the 1961–1990 period were subjected to quality control (QC) with homogeneity and continuity tests, and only 19 stations passed. Based on these data, the Standardized Precipitation Index (SPI) for 3 months was calculated and then subjected to machine learning. In this case, the best result was shown by the “optimized tree”, where the minimum number of leaves is equal to 45, the training time is 30,296 s, and the prediction speed is ~43,000 obs/sec (Publish House, 2022).

The Support Vector Machine (SVM), an algorithm for supervised machine learning, was selected in the Matlab environment. It can be optimized; for example, the number of divisions in the “machine tree” can be controlled, which helps to achieve higher model accuracy. The tenth model showed the best result; using this model, it became possible to determine the drought probability by month at each point. The model was trained on historical records, and the new 1990–2020 period was added for forecasting by month.

Despite the good parameters obtained, it became necessary to add stations from adjacent territories, because there was not enough information in the Kakheti region to conduct a correct machine learning analysis. When the model and station data were compared, overfitting was detected, which makes the model invalid for the analysis of new data. It therefore became necessary to add the neighboring Kartli region. It is notable that stations located close to each other give better model results for these two regions, while stations that are far apart increase the probability of overfitting. Therefore, machine learning requires a denser observation network (more stations) and a longer observation period, and the use of satellite data is the better choice for regions without enough observations. This applies especially to a mountainous country, where some places are not suitable for ground observations.

The additional region contains 17 meteorological stations. The SPI (3) drought index uses the precipitation data of those stations. The new prediction model has improved parameters: RMSE = 0.77, R2 = 0.39, MSE = 0.58, and MAE = 0.60.

The calculated statistical parameters of the station data and the prediction model for these two regions fit sufficiently well. The following is therefore necessary to avoid overfitting: a larger dataset, a longer study period, and more nearby supporting observation points.

Consequently, to apply the machine learning technique it is essential to expand the observation network (the number of stations) and to extend the observation period. It is also essential to use satellite data for regions where there are not enough observation data and station set-up is impossible. Such research has been conducted for Georgian territory for the first time, and it will be continued using more station data over a longer period to cover the whole territory. The research results are important for early warning systems.