1 Introduction

Drought, global warming, and the current phase of climate change remain points of contention. According to a study published in 2013, warmer global temperatures would result in more rainfall in some parts of the world and less in others, leading to more flooding and droughts worldwide (Mahlstein et al. 2013). Other experts disagree with the prediction of more droughts, arguing that global warming will result in a wetter world (Krange et al. 2019; Sprain 2017; Osaka et al. 2020). Droughts are classified as meteorological when assessed using precipitation and potential evapotranspiration, hydrological when using streamflow or groundwater recharge, and agricultural when using vegetation indices and moisture content (Yu et al. 2018; Sun et al. 2019). Drought monitoring is used to determine drought conditions and to prevent critical losses and severe impacts from the hazard. Satellite or space-based observations have been used to evaluate the impacts of drought on the environment and ecosystem, including vegetation growth, land cover, soil moisture, and land surface features, among others (Brown et al. 2011; Akinyemi 2021).

Droughts are particularly disastrous in South Africa and other arid regions, which are naturally water-scarce due to low rainfall and limited water sources (van Wilgen et al. 2016; Conradie et al. 2019). According to some studies, droughts are not uncommon in Africa's drylands and have been increasing there (Engelbrecht et al. 2015; Conradie et al. 2019), whereas other studies predict future increases in droughts and other high-temperature events (Engelbrecht et al. 2015; Akinyemi 2017). Extremely high temperatures in parks are becoming more common in South Africa's arid regions (van Wilgen et al. 2016). Drought has drawn public attention across Southern Africa, including South Africa, due to its detrimental effects, particularly on agriculture, which is predominantly rainfed (Akinyemi 2021). Over Botswana, studies have found visible variations of varying magnitudes in rainfall, temperature trends, and drought (Byakatonda et al. 2019; Akinyemi 2021). According to regional climate model projections, droughts are expected to become more frequent and severe in this region as a result of future global warming. Because of the need to adapt and to minimise consequences for social-ecological systems in African drylands, it is becoming increasingly vital to analyse and monitor droughts in the face of climate unpredictability and anthropogenically induced land-use change (Pande et al. 2018, 2021).

Few studies of LULC changes have been done in the study area (Madasa et al. 2021). That study reported a steady rate of conversion from rangelands and native woodlands to cultivated arable lands during the last four decades. It used remote sensing information to quantify land-use/cover changes in the Welkom–Virginia Goldfields (a small part of the Province). The findings revealed different land-use changes, with fluctuations in values for each index, and the overall accuracy of the classified images ranged from 88 to 96%. Beyond quantifying the LULC changes using GIS/remote sensing techniques, a thorough understanding of the interaction of the changes with the main environmental drivers, such as drought, is required for the development of informed and appropriate land-use policies in the Free State Province.

Drought has an impact on both the environment and humans as a result of the interaction between natural events and the demand for water. Droughts have three basic types of impact: economic, social, and environmental. Drought's economic consequences include food insecurity, hunger, rising food prices, and a decrease in household income. During a prolonged drought in South Africa, for example, many farming households endured ongoing food insecurity, malnutrition, and income reductions. As a result, drought is a significant contributor to the rising food crisis, environmental degradation, and land cover dynamics, as well as being closely linked to periods of vulnerability caused by climate stress (Mera 2018; Lunyolo et al. 2021). During droughts, food prices, particularly for staples, are typically high, resulting in hunger and malnutrition because low-income people cannot afford them. Drought's social impacts include threats to public safety, water-user conflicts, and lower quality of life. Population migration is one of the major societal consequences. Those who leave usually go to cities or to areas outside the drought-stricken region. Even when the drought becomes less severe, the migrants rarely return home, resulting in the loss of key human resources in rural areas (Sousa et al. 2018; Hermans and McLeman 2021). Drought harms natural land, ecosystems, water quality, plant and animal species, and forest and wildlife habitats, and it leads to degradation of landscape quality and biodiversity loss (Raev and Rosnev 2018; Wang et al. 2018a, b). Some of the consequences are only temporary, and conditions return to normal after the drought is over. Other environmental effects may continue for a long time and, in some cases, can even become permanent. Degradation of landscape quality could result in a longer-term loss of biological productivity (Musolino et al. 2018; Beg and Al-Sulttani 2020).

While the impacts of droughts are well established, a globally accepted definition of drought is difficult to define (Slette et al. 2019). Agricultural drought has been most generally characterised, following the subject of this study, as a period during which the moisture supply provided by a region's environment continually falls below the appropriate level for crop or range production, negatively affecting yield (Feng et al. 2019; Dai et al. 2020). On the other hand, drought vulnerability defines drought risk rather than only the frequency and intensity of meteorological anomalies (Downing and Bakker 2000). It shows the degree of susceptibility of society to a drought hazard, which could vary either as a result of variable exposure to the hazard or because of coping abilities (Boguszewska‐Mańkowska et al. 2018; Rustad et al. 2020). The more diverse and robust a household's asset base is, the more drought resilient it will be and more alternatives it will have for switching between different livelihood strategies in response to drought. According to research on the elements determining drought sensitivity in South Africa, non-irrigated farmland and rangeland on sandy soils, located in areas with a high chance of seasonal moisture deficiency, were the most vulnerable to agricultural drought (Baudoin et al. 2017). Less is known about the interplay between drought vulnerability and land-use/land cover changes in drought-prone arid land areas such as South Africa. Due to the vulnerability of the pastoral people to drought risks (Carter et al. 2006), land-use conversion to farmlands as a better coping strategy may have been prompted by the loss of natural land cover or a fall in production as a result of drought.

As a result, we hypothesised in this study that drought and other human activities induce land-use changes in South Africa's Free State Province. This study aims to use space-based information to assess LULC changes and establish the role of drought vulnerability as a driver in the Free State Province of South Africa between 2000 and 2020. Also, in achieving the aim of this study, we intend to identify the predictors of drought within the study area. These predictors will help in future monitoring and assessment of drought events through modelling of drought susceptible areas. The findings of this study could be useful information for activities aimed at ensuring the long-term sustainability of the Free State Province, which is prone to drought.

Several studies have been carried out on drought prediction and mapping in arid and semi-arid areas using various models, tools and techniques. Some of these studies have employed in-situ data and non-intrusive methods such as remote sensing techniques (Vicente-Serrano 2007; Abbas et al. 2014; Rhee et al. 2017; Prasetyo et al. 2019; Adedeji et al. 2020; Liu et al. 2020; Zhu et al. 2021; Orimoloye et al. 2021a, b), and in recent times the use of machine learning algorithms has been emerging, especially in Sub-Saharan Africa (SSA). Although machine learning has been applied to extreme events mainly for classification, regression-based analysis, such as but not limited to support vector regression, Gaussian process regression, and general regression neural networks, is also well documented in drought studies (Panu et al. 2002; Shiau et al. 2016; Ghasemi et al. 2021). These and other studies have shown that machine learning and artificial intelligence methods have a high potential for the assessment, prediction and identification of extreme events such as drought, owing to their ability to learn from past data, to handle large numbers of input variables, to integrate physical understanding into the models and to discover additional knowledge from the available data (McGovern et al. 2017; Felsche and Ludwig 2021; Li et al. 2021). These machine learning or data-driven models have become increasingly popular in climate scenario forecasting because they are effective in dealing with the non-linear characteristics of climatic data (Belayneh and Adamowski 2013). Therefore, this study employs machine learning algorithms to identify drought predictors. Furthermore, the study highlights the best-performing algorithm for drought prediction within the study area by comparing regression-based learners, namely multiple regression, principal component regression, partial least squares regression, random forest regression, support vector regression, and multivariate adaptive regression splines.

2 Materials and methods

This study was carried out in the Free State Province, South Africa, as shown in Fig. 1. The Free State is one of the nine provinces of the Republic of South Africa. It has prominent towns and cities, including Bloemfontein, and many of the towns are dominated by mining and agriculture. The province is situated between latitudes 26.6° S and 30.7° S and longitudes 24.3° E and 29.8° E. According to the Köppen climate classification, the province's climate is mainly semi-arid. The province's topography is complex, with all surfaces above 1000 m and rising to 1800 m in the north-eastern and eastern Free State. Administratively, the province is divided into five municipal districts (Fezile Dabi, Lejweleputswa, Motheo, Thabo Mofutsanyane, and Xhariep). The region has monthly mean sunshine hours of approximately 319.5, 296.5 and 296.3 in November, December and January, respectively, with annual sunshine hours of approximately 3312.3 and total annual precipitation of approximately 559 mm. The region experiences its lowest rainfall (0 mm) in July and its highest rainfall (70 mm) in December/January, which correspond to the winter and summer seasons, respectively. The coldest temperatures occur at night during June and July. A better understanding of the spatiotemporal evolution of vegetation-related stress and drought events will help identify drought-affected areas over the Free State Province, South Africa.

Fig. 1 Study area map

3 Methods

3.1 MODIS data

A Moderate Resolution Imaging Spectroradiometer (MODIS) Terra product was used to assess drought occurrence in the study area. The MODIS data were downloaded from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) (AppEEARS Team 2020) and analysed using R. The MODIS instrument operates on both the Terra and Aqua spacecraft. It has a viewing swath width of 2330 km and views the Earth's entire surface every one to two days. The VIs were generated at 16-day intervals using a MODIS-specific compositing method based on product QA to remove low-quality observations. The Pixel Reliability Quality Assurance (QA) layer of MOD13Q1 was used to mask or correct pixels affected by atmospheric disturbances such as clouds. The layer ranks the reliability of the vegetation index from -1 to 5; for this analysis, good and poor values correspond to ranks 0 and 1, respectively. Poor and marginal data in the pixel reliability band were regarded as having acceptable accuracy and were retained for the study.
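The reliability screening described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function name `mask_by_reliability`, the accepted code set and the sample values are assumptions made for the example, not part of the original analysis.

```python
# Hypothetical helper: keep a vegetation-index value only when its
# pixel-reliability code is in the accepted set.
ACCEPTED_CODES = {0, 1}  # assumption: ranks 0 and 1 are retained, as in the text


def mask_by_reliability(vi_values, reliability, accepted=ACCEPTED_CODES):
    """Return the VI values with rejected pixels replaced by None."""
    if len(vi_values) != len(reliability):
        raise ValueError("VI and reliability layers must have the same length")
    return [v if r in accepted else None
            for v, r in zip(vi_values, reliability)]


# Illustrative values only (not data from the study).
evi = [0.42, 0.37, 0.10, 0.55]
codes = [0, 1, 3, -1]
masked = mask_by_reliability(evi, codes)
```

Pixels whose code falls outside the accepted set are set to `None`, so that later steps can skip them instead of treating them as valid observations.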

In this study, the vegetation condition index (VCI), which is based on the adjustment of the normalized difference vegetation index (NDVI) relative to its minimum historical value as suggested by Kogan (1995), was used to determine the drought conditions in the region. The VCI compares the present vegetation index (VI), such as the NDVI or the enhanced vegetation index (EVI), to the values found for a particular pixel in the same period in previous years. The VCI was determined using Eq. 1:

$$VCI_{ijk} = \frac{VI_{ijk} - VI_{i,\min}}{VI_{i,\max} - VI_{i,\min}} \times 100$$
(1)

where VCIijk is the VCI value for pixel i during week/month/day of year (DOY) j in year k; VIijk is the weekly/monthly/DOY VI value for pixel i in week/month/DOY j of year k, for which either the NDVI or the EVI can be used as the VI; and VIi,min and VIi,max are the multiyear minimum and maximum VI, respectively, for pixel i.
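As a worked illustration of Eq. 1, the following Python sketch computes the VCI for a single pixel and a single composite period. The function name `vci` and the sample VI values are hypothetical, and the study itself carried out this step in R.

```python
def vci(vi_current, vi_history):
    """VCI (%) for one pixel and one composite period (Eq. 1).

    vi_history holds the VI values of the same pixel and period across
    all years of the record, including the current year.
    """
    vi_min, vi_max = min(vi_history), max(vi_history)
    if vi_max == vi_min:
        return None  # VCI is undefined when there is no multiyear range
    return (vi_current - vi_min) / (vi_max - vi_min) * 100.0


# Illustrative EVI values for one pixel over four years (not study data).
history = [0.20, 0.35, 0.50, 0.30]
vci_values = [vci(v, history) for v in history]
```

The year with the lowest VI receives a VCI of 0 and the year with the highest VI a VCI of 100, so low values indicate vegetation conditions close to the worst observed for that pixel and period.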

The method used in this analysis, i.e. the estimation of drought events with the VCI in R, is based on the EVI, which has certain advantages over other vegetation indices, including the NDVI. First, it corrects for distortions of the reflected light caused by particles in the air; second, it corrects for distortions of the reflected light caused by the ground cover beneath the vegetation. Figure 2 presents the planning, pre-processing and data processing procedures, adapted from the UN-SPIDER recommended practices (http://www.unspider.org/advisory-support/recommended-practices/recommended-practice-drought-monitoring).

Fig. 2 Flow chart showing the analysis procedures

3.2 Land use classification

The land-use classification was carried out in Google Earth Engine (GEE) for the study area (Free State Province) for the years 2001, 2015 and 2020. Landsat 7 Tier 1 Surface Reflectance (LANDSAT/LE07/C01/T1_SR) was used for 2001 and 2015, while Landsat 8 Tier 1 Surface Reflectance (LANDSAT/LC08/C01/T1_SR) was used for 2020. Tier 1 was chosen because these datasets meet geometric and radiometric quality requirements, while GEE provides a variety of Landsat-specific processing methods that enable cloud scoring and cloud-free composites. For each Landsat image acquired, cloud disturbance was reduced to a minimum, and random forest was used as the classifier (ee.Classifier.smileRandomForest). Overall accuracy and kappa were used as validation indicators together with the confusion matrix. It is worth mentioning that noise in the form of clouds remains in some of the classifications; however, to a large extent this does not alter the land-use classes. The Python script for the GEE code is available at https://code.earthengine.google.com/d12c4561d236f1a1d2b54b7b12113964.
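The two validation indicators named above can both be derived from the confusion matrix. The sketch below shows the standard calculation in plain Python; the function name `accuracy_and_kappa` and the two-class matrix are illustrative assumptions and do not reproduce the study's GEE output.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] counts the samples of reference class i that were
    assigned to class j by the classifier.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if expected == 1.0:
        return observed, None  # kappa is undefined in this degenerate case
    return observed, (observed - expected) / (1.0 - expected)


# Illustrative two-class matrix (not results from the study).
cm = [[45, 5],
      [10, 40]]
overall_accuracy, kappa = accuracy_and_kappa(cm)
```

Kappa discounts the agreement that would be expected by chance alone, which is why it is reported alongside, and is usually lower than, the overall accuracy.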

3.3 Regression-based algorithms

Regression is a basic concept in machine learning. Regression analysis is a form of supervised learning that establishes a relationship between variables by estimating cause and effect. In broad terms, a regression model estimates a function ƒβ(·), parameterised by β, from the data points (xi, yi) by minimising a loss function ∑i ℓ(ƒβ(xi), yi). In this study, the drought distribution (VCI) was derived from MOD13Q1. A total of 15 drought factors were identified as drought predictors. These factors are grouped broadly as topographic, climatic, and anthropogenic proxies (Li et al. 2020; Cartwright et al. 2020). Disaggregated, the factors are Latitude, Longitude, Elevation, Total Precipitable Water, Fractional Vegetation Cover, Emissivity, Normalized Difference Vegetation Index, Land Surface Temperature, Leaf Area Index, Precipitation, Temperature, Potential Evapotranspiration, Solar Radiation, and Soil Moisture at 0–7 cm and 7–14 cm. These 15 factors were extracted at 236 drought locations within the study area. To ascertain which of the 15 factors are important in drought prediction, variable importance was computed for each of the regression-based algorithms. The algorithms were built on partitioned data (training and testing), and the best-performing model was selected using the root mean square error (RMSE). A comparison of RMSE with other metrics such as MAE and MAPE is beyond the scope of this study; however, studies have shown that RMSE is sufficient to represent model performance (Chai and Draxler 2014; Quan et al. 2020).
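The model-selection criterion described above can be illustrated with a short Python sketch, assuming that predictions on the held-out test set are already available for each learner. The function names, model labels and numbers are hypothetical and are not results from this study, which performed the comparison in R.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_best_model(observed, predictions_by_model):
    """Return the name of the model with the lowest test RMSE, and all scores."""
    scores = {name: rmse(observed, pred)
              for name, pred in predictions_by_model.items()}
    return min(scores, key=scores.get), scores


# Illustrative test-set VCI values and predictions (not study data).
observed_vci = [40.0, 55.0, 70.0]
predictions = {
    "model_a": [42.0, 53.0, 71.0],
    "model_b": [50.0, 45.0, 60.0],
}
best_name, rmse_scores = select_best_model(observed_vci, predictions)
```

Because RMSE squares the residuals before averaging, a learner with a few large errors is penalised more heavily than one with many small errors.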

Feature selection (variable importance) is an important step in classification and model building. Identifying features or factors, as presented in this study, helps in understanding the drought dynamics within the study area. The feature selection and model identification were performed within the R ecosystem (R Core Team 2019) using the caret, earth, vip, and pdp packages (Kuhn 2008; Friedman 1991; Greenwell et al. 2020; Greenwell 2017, respectively). The regression-based algorithms used in this study are:

  a. Multiple regression (MR)

    Multiple regression is a machine-learning algorithm that predicts a dependent variable from two or more predictors. It has numerous real-world applications in three problem domains: examining relationships between variables, making numerical predictions, and time series forecasting.

  b. Principal component regression (PCR)

    Principal component regression (PCR) is an algorithm for reducing the multicollinearity of a dataset (Jolliffe 1982). PCR first applies principal component analysis to the predictors, selects the number of principal components (d) using cross-validation or test-set error, and then performs linear regression on the first d principal components. Here, principal components regression forms the derived input columns zd = Xvd and then regresses y on z1, z2, z3, …, zd for d ≤ p. Principal components regression discards the p − d components with the smallest eigenvalues.

  c. Partial least squares regression (PLSR)

    Partial least squares regression (PLSR) is a technique that reduces the predictors to a smaller set of uncorrelated components and performs least squares regression on these components instead of on the original data (Geladi and Kowalski 1986; Tobias 1995). PLSR is a solution for multicollinearity, as it reduces the dimensionality of correlated variables and then models the underlying information shared by those variables (in both the dependent and independent variables) (Geladi and Kowalski 1986; Tobias 1995). Another advantage of PLSR is that, unlike many other machine learning algorithms, it can model multiple outcome variables.

  d. Random forest regression (RFR)

    Random forest regression (RFR) is a supervised learning algorithm that uses the ensemble learning method for regression (Breiman 2001; Grömping 2009). A random forest operates by constructing several decision trees during training; its prediction is the average of the predictions produced by the individual trees in the forest (Breiman 2001; Grömping 2009). The random forest regressor is unable to discover trends that would enable it to extrapolate to values that fall outside the range of the training set. One alternative is support vector regression.

  e. Support vector regression (SVR)

    SVR gives the flexibility to define how much error is acceptable in the model and finds an appropriate line (or hyperplane in higher dimensions) to fit the data (Smola et al. 2004; Awad et al. 2015). In contrast to OLS, which minimises the squared error, the objective function of SVR minimises the norm of the coefficient vector, with the acceptable error handled through constraints. This method works on the principle of the support vector machine (SVM). SVR differs from SVM in that SVM is a classifier used for predicting discrete categorical labels, while SVR is a regressor used for predicting continuous ordered variables (Smola et al. 2004; Awad et al. 2015). SVR is robust, easy to use and requires less computation than some other regression techniques.

  f. Multivariate adaptive regression splines (MARS)

    MARS, a regression analysis model (Friedman 1991), is essentially an extension of linear models through a non-parametric regression technique. The model is built using forward and backward stepwise procedures (Kisi et al. 2016). MARS is flexible, easy and simple to use, handles both continuous and categorical data, performs automatic variable selection, has a good bias-variance trade-off and makes predictions quickly compared to other learners, especially SVR (Kisi et al. 2016).
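To make the last item in the list more concrete, the sketch below evaluates a MARS-style model for a single predictor. MARS builds its fit from hinge functions of the form max(0, x − c) and max(0, c − x), where c is a knot. The function names, knot and coefficients here are hypothetical values chosen for illustration; in practice they are estimated by the forward and backward stepwise procedures (for example, by the earth package in R).

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) for direction=1,
    max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a single-predictor MARS-style model.

    terms is a list of (coefficient, knot, direction) tuples.
    """
    return intercept + sum(coef * hinge(x, knot, direction)
                           for coef, knot, direction in terms)


# Hypothetical fitted model with one knot at x = 10.
terms = [(2.0, 10.0, 1), (-1.0, 10.0, -1)]
above_knot = mars_predict(12.0, 5.0, terms)  # 5 + 2 * (12 - 10)
below_knot = mars_predict(7.0, 5.0, terms)   # 5 - 1 * (10 - 7)
```

Each hinge term is zero on one side of its knot, so the fitted response is piecewise linear with a change of slope at the knot, which is how MARS captures non-linear relationships while remaining close to a linear model.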

4 Result and discussion

The spatial and temporal variations in vegetation productivity are presented based on analyses of the summer seasons between 2000 and 2020, a span of 20 years. The VCI drought index for each year (Figs. 3, 4, 5, 6, 7) reveals drought situations ranging from light to extreme over the study area. The Free State Province experienced moderate drought conditions in 2001, as presented in Fig. 3. The province was moderately affected by drought from its western to its eastern parts, which indicates that drought was spatially widespread during this period. Another drought scenario was observed in 2005, when most of the study area was affected by drought events. The area was severely affected from its central to its southern parts, with diverse impacts on environmental components such as vegetation productivity and the living and non-living things that depend on natural resources (Swemmer et al. 2018; Riebsame et al. 2019). The earth's natural environment consists of living and non-living components that have reached their current form through a balanced system. Whenever some of the factors forming the environment are affected, the system, its balance, and the environmental conditions inevitably undergo changes as a result of these disturbances and imbalances. The drought observed in the study area is therefore considered a severe threat to humans and a source of environmental modification (Williams and de Vries 2020; Haile et al. 2019). Without knowledge of drought and its immediate impacts on the environment, land cover and humans, management efforts would not achieve the desired results (Arastoo 2013). Due to their critical function in ecosphere-biosphere–atmosphere interactions, land cover and vegetation are essential criteria for human-natural-environment evaluation and drought monitoring and play a vital role in global climate change (Arastoo 2013; Jamali 2019).

Fig. 3 Drought event in the year 2001

Fig. 4 Drought event in the year 2005

Fig. 5 Drought event in the year 2010

Fig. 6 Drought event in the year 2015

Fig. 7 Drought event in the year 2020

Vegetation productivity, which reflects the spatial distribution and change of the vegetation cover, was low throughout most of the study area in 2005, as depicted in Fig. 4. In 2010, however, vegetation improved over the study area (Fig. 5), and most of the study area was only moderately affected by drought events. Conditions in 2010 ranged from no drought to moderate drought over large parts of the study area, which may have resulted in lower drought impacts on land cover during this period. The temporal evolution and spatial distribution of drought during 2010 suggest that drought conditions had a moderate influence on different environmental components, especially natural vegetation and agricultural lands, among others (Albano et al. 2020; Marengo et al. 2020). Since summer is the planting period and covers the initial stages of crop growth, this may have moderately affected the agricultural sector in the study area (Ribeiro et al. 2019; Muthelo et al. 2019).

In 2015, the Free State Province was severely affected by drought episodes, with extreme drought observed across the study area. This result corroborates other studies that noted the same drought patterns in the study area (Botai et al. 2016; Abubakar et al. 2020). The observed trends in drought over the study area demonstrate that the impact of droughts can differ significantly across the province and between years. In particular, almost the whole province was affected in 2015, compared to the other years considered in the study, which exhibit decreasing trends in drought situations in the Free State Province. This suggests that the Free State Province experienced fluctuating drought conditions during the analysed period. As a result, key sectors such as agriculture and water resources are likely to have been negatively affected in 2015 (Moeletsi and Walker 2012; Sweet et al. 2017). These conditions are a threat to food security given that the province is considered the food basket of South Africa, and fertile lands might have been degraded as a result of the drought disasters in the affected years (Mare et al. 2018; He et al. 2019). Increasing droughts will decrease the volumes of water in reservoirs and streams, which are largely located in (and also transect) the province under study.

The drought event in the year 2020 is presented in Fig. 7, where only the western part of the province was moderately affected by drought. The affected district municipalities are Lejweleputswa and Xhariep (northern part), while other municipalities, including Mangaung Municipality, were moderately affected by drought.

4.1 Land use and land cover dynamics

The land use/cover maps for 2001, 2015 and 2020, shown in Figs. 8, 9 and 10, had overall accuracies of about 0.86, 0.95 and 0.93, respectively, based on the error/confusion matrix. The error matrix is the most commonly employed approach for evaluating per-pixel classification (Bechtel et al. 2020; Nabil et al. 2020). The Kappa statistic was computed for each classified map to measure the accuracy of the results. The resulting land use/cover maps had Kappa statistics of about 0.79, 0.95, and 0.90 for 2001, 2015 and 2020, respectively. This reasonably good overall accuracy was accepted for the subsequent analysis and change detection (Wang et al. 2018a, b). These three years were chosen because of the availability of Landsat data. Other years that were intended for inclusion were not cloud-free and were therefore unsuitable for this analysis. However, the images for the selected years still contain some cloud cover, which causes the mapped extent of land features to vary from year to year. Different land surface features were evaluated to identify the potential impacts of environmental stressors such as drought disasters and other activities. These features are built-up areas, water bodies, grassland and barren surfaces, and cultivated lands.

Fig. 8 Land use dynamics for the year 2001

Fig. 9 Land use dynamics for the year 2015

Fig. 10 Land use dynamics for the year 2020

Findings from the study reveal that grasslands and barren surfaces had the largest land cover, with about 114,512.53, 125,151.78 and 101,580.02 sq km in 2001, 2015 and 2020, respectively, while built-up areas covered about 5,330, 3,109 and 1,567 for the three years in that order. In 2001, 2015 and 2020, the area covered by water bodies was 782.24, 633.37 and 861.43, respectively (Table 1), and cultivated lands covered 9,361, 930, and 25.545 for 2001, 2015 and 2020, respectively. Cultivated lands declined drastically as a result of drought disasters in the Free State Province. This signifies a dramatic land cover change in the cultivated land category, exerting considerable pressure on natural resources, including vegetation and agricultural lands (Hassan et al. 2016). Conversion of natural lands and forested areas to other land features was also significant in the three years investigated in this study. According to the findings, agricultural lands, cultivated grasslands, and barren surfaces were impacted by drought, especially in 2015, a drought year in the Free State Province (Botai et al. 2016). If the environmental degradation or conversion of natural land cover continues, the area is bound to face the negative impacts of soil erosion and high surface temperature (Trnka et al. 2016; Azadi et al. 2018). These negative impacts may further lead to food and water insecurity, and the ripple effect could contribute to global warming in the future (Workman and Ureksoy 2017; Kogo et al. 2021).

Table 1 Land use dynamics for 2001, 2015 and 2020

4.2 Relationship between drought episodes and environmental covariates between 2001 and 2020

Many of the historical challenges that South Africa, including the Free State, has faced, such as but not limited to poor financial management, lack of resources, poor and exclusionary city planning, and unequal infrastructure development, have been linked to drought vulnerability in South Africa (Cooperative Governance and Traditional Affairs 2016; Orimoloye et al. 2021c; Steyn et al. 2019). It was agreed that a more in-depth look at the associated issues, and at how they can be addressed by the regional and national governments, is needed (Cooperative Governance and Traditional Affairs 2016).

Arid and semi-arid lands are a vital part of the earth system in South Africa. These areas provide essential ecosystem services that ensure a sustainable environment and human livelihoods. These regions consist of grasslands, cultivated lands, urban areas and water bodies (Figs. 8, 9, 10). However, the conversion of these land-use types as a result of various anthropogenic activities has been linked to persisting drought events. Studies have shown that drought events are largely a result of climatic oscillations and dynamics, but the increasing magnitude and frequency of these events have been linked to unsustainable and unchecked anthropogenic activities. The Free State, a province of farmlands, mountains, goldfields, and widely dispersed towns, is the third largest province in South Africa but has one of the lowest populations and population densities in the country (https://municipalities.co.za/provinces/view/2/free-state). The economy of the province is based on agriculture, mining and manufacturing. Major agricultural products of the province include maize, wheat, potatoes, groundnuts, and wool, to mention a few. Therefore, the persistence of drought within the province is expected to affect human livelihoods and disrupt the socio-economic balance. The observed persistence of drought, especially since 2015 (Orimoloye et al. 2019, 2021a), is largely driven by climatic and anthropogenic factors (Fig. 11). A closer look at the feature selection graph (Fig. 11) shows that surface reflectance, land use (water bodies), soil moisture, temperature, precipitation, and emissivity are common across the regression-based learners. Other studies have also identified precipitation, energy availability, and elevation (Burke 2011; Otto et al. 2018; Cartwright et al. 2020) as drivers of drought across different regions.

Fig. 11 Variable importance selection across the regression-based learners

As presented in this study, land use (water bodies) and solar radiation are the leading factors driving drought within the Free State province. These are classified broadly as anthropogenic and climatic drivers. Understanding these drivers is essential for predicting and understanding the trajectory of drought events across the province. With increasing climate change and unhealthy anthropogenic activities, drought events are expected to persist in intensity and magnitude (Orimoloye et al. 2021b). Even though the Free State province is one of the provinces declared drought-prone, these interconnections, as revealed in this study, suggest the need to raise these alert levels across the province and provide a monitoring and early warning system due to its position in terms of agriculture and economy.

The identification of climatic and anthropogenic factors as drought drivers in this study is consistent with other studies (Li et al. 2020; Li et al. 2021). From the feature selection results, the influence of climate proxies and anthropogenic factors on VCI reflects the ecological situation within the Free State province, in line with studies showing the varying influence of human activities and climatic variation on vegetation greenness (Jin et al. 2016; Orimoloye et al. 2021b). Liu et al. (2020), in their study on the quantitative assessment and driving force analysis of vegetation drought risk to climate change in Northeast China, posited that radiation flux and the evapotranspiration process are important to vegetation dynamics, with an influence comparable to that of temperature and precipitation. As shown in Fig. 11, some of these parameters, especially solar radiation and temperature, are proxies for climate warming. Solar radiation is one of the vital components needed for human survival on the surface of the earth. It has been posited that the amount of solar radiation incident at the Earth’s surface is not stable over the years but undergoes significant decadal variations (Orimoloye et al. 2019; Wild 2012). As observed in a study across the Congo basin, increasing solar radiation is expected to put more stress on plants (reducing greenness) through extensive depletion of soil moisture (https://yali.state.gov/satellites-show-drought-may-put-stress-on-congo-rain-forest/). Interest in solar radiation management has grown recently as a means of reducing the amount of energy returned to the atmosphere. The idea is to limit global warming in order to help protect the ozone layer and the lower atmosphere. A continued increase in solar radiation is expected to enhance environmental change, which directly or indirectly perpetuates extreme events such as droughts.

In terms of predicting and understanding the trajectory of drought events in the Free State province, the regression-based learners with the lowest RMSE (Table 2) provide an opportunity for predicting the vegetation condition index (VCI) across the province. Partial least squares regression (PLSR) is the most suitable model for VCI prediction across the province, using solar radiation, land use (water bodies), elevation, soil moisture, precipitation, temperature and leaf area index as predictors. PLSR combines features of principal component analysis and multiple regression: it extracts latent components that maximise the covariance between the independent and dependent variables, which improves the accuracy and explanatory power of the learner (Li et al. 2020; Luedeling et al. 2013). As presented in the method section, this study employed various regression-based learners. Among them, PLSR presents better accuracy in identifying the drivers of drought across the Free State Province (Table 2).
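The RMSE-based comparison of learners described above can be sketched as follows. This is a minimal illustration under stated assumptions: the observed VCI values, the learner names and their predictions are hypothetical and are not the values reported in Table 2, and `rank_learners` is an illustrative helper rather than the study's implementation.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rank_learners(observed, predictions):
    """Return (learner, rmse) pairs sorted from lowest to highest RMSE."""
    scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical observed VCI values and predictions from two unnamed learners.
observed_vci = [40.0, 55.0, 30.0, 70.0]
predictions = {
    "learner_a": [42.0, 53.0, 33.0, 68.0],
    "learner_b": [50.0, 45.0, 40.0, 60.0],
}

ranking = rank_learners(observed_vci, predictions)
best_name, best_rmse = ranking[0]
print(best_name, round(best_rmse, 3))
```

The learner at the head of the ranking is the one with the smallest average prediction error, which is the criterion the text uses to prefer one regression-based learner over the others.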

Table 2 RMSE across the six regression-based learners

5 Conclusion

Drought disaster and land-use dynamics over the Free State were assessed in this study using space-based information in order to identify drought drivers with regression-based algorithms. Drought has a wide range of impacts that depend on the socio-economic environment of the affected community. Findings from the study revealed that cultivated lands declined dramatically during the study period as a result of drought disasters, and that this decline was most evident in 2015. This signified a dramatic land cover change in the cultivated lands category, exerting considerable pressure on natural resources including vegetation and, in particular, agricultural lands. Partial least squares regression (PLSR) showed better accuracy for identifying the drivers of drought across the Free State Province.

The selected features include solar radiation, land use (water bodies), elevation, soil moisture, precipitation, temperature and leaf area index. These features are paramount in drought prediction within the province and could be essential in the design of drought early warning systems across arid and semi-arid regions. PLSR extracts latent components that maximise the covariance between the independent and dependent variables, which improves the accuracy and explanatory power of the learner. Hence, as the best learner in this study, it presents better accuracy in identifying the drivers of drought across the Free State Province. It is important to develop an innovative drought monitoring system for the regular collection and timely dissemination of relevant data in a usable manner, and this should be considered in future research. Area-specific planning of mitigation measures in the region is also necessary for effective drought risk reduction and management. In essence, stakeholders should come together to design drought early warning systems using the identified features.