1 Introduction

Drought is a complex and devastating natural disaster that has profound negative impacts on agriculture, the ecological environment and the economy (Rong et al. 2019). Extreme drought events caused by global warming have become more frequent and more severe over the past two decades (Rong et al. 2019). Many countries have suffered different types of drought, and these severe droughts might be further aggravated around the world (Park et al. 2016). Effective and accurate prediction and early warning of drought are therefore particularly important for preventing drought damage and economic loss, and a simple and effective way to predict and monitor drought is urgently needed.

Soil moisture plays an important role in drought monitoring and the estimation of drought indices (Zhan et al. 2016; Park et al. 2017). Soil moisture data are usually derived from in situ observations at different depths and with various network densities (Mishra et al. 2017). However, in situ networks are unevenly distributed, and in some regions absent, which limits the availability of soil moisture data (Liu et al. 2016). The surface soil moisture obtained from remote sensing techniques enriches the soil moisture datasets at different temporal and spatial resolutions (Eni 1996; Xi et al. 2014; Yuan et al. 2015; Nguyen et al. 2017). However, the accuracy of these data is limited by the uncertainties of remote sensing techniques. Considering these limitations, it is important to find an effective way to predict soil moisture and its drought indices for drought prediction and monitoring.

To better monitor and predict drought, researchers have attempted to fuse different datasets of variables to reproduce drought-related variables (e.g., soil moisture) and drought indices (e.g., soil water deficit index, SWDI) using data-driven models and machine learning (ML) methods (Morid et al. 2007; Belayneh et al. 2014; Guzmán et al. 2017). However, conventional data-driven models are usually limited when dealing with nonlinear drought prediction, whereas the more advanced and adaptive ML methods are gaining popularity in drought prediction (Feng et al. 2019; Pasolli et al. 2011). ML methods have been shown to be superior when handling the full set of available information (Appelhans et al. 2016). Over the last decades, various data-driven and ML methods have been investigated, including the autoregressive integrated moving average (ARIMA) (Ping et al. 2010; Tian et al. 2016), artificial neural network (ANN) (Morid et al. 2007; Belayneh et al. 2014), support vector machine (SVM) (Tian et al. 2018), adaptive neuro-fuzzy inference system (ANFIS) and hybrid models (Rong et al. 2019; Park et al. 2016; Tiwari and Chatterjee 2010; Alizadeh et al. 2018). For example, Park et al. (2016) and Alizadeh et al. (2018) applied ML methods to predict a meteorological drought index. Their results showed that the drought predictions obtained from ML approaches were able to capture the drought variation. Among these ML methods, the SVM model has proved to be a promising technique for drought estimation (Pasolli et al. 2011). It is effective in solving high-dimensional problems (Tian et al. 2018). In addition, previous studies indicate that the SVM method performs competitively with other state-of-the-art methods in estimating biophysical parameters (Pasolli et al. 2011).

SVM has been applied to classification and regression in the hydrology field (Tiwari and Chatterjee 2010; Alizadeh et al. 2018; Bhagwat et al. 2012). In previous studies, some researchers applied SVM to fuse various variables to obtain drought indices for drought monitoring and prediction (Feng et al. 2019; Bhagwat et al. 2012; Tabari et al. 2012; Poulomi et al. 2014; Liu et al. 2017). For example, Liu et al. (2017) investigated the performance of in situ and remote sensing products for agricultural drought forecasting through SVM and data assimilation methods over the CONUS. Their results showed that the meteorological variables and additional remotely sensed products are able to predict the SWDI with SVM. Feng et al. (2019) fused 30 remote sensing drought factors to predict a meteorological drought index (standardized precipitation evapotranspiration index, SPEI) by using SVM. The results showed that SVM performs relatively well in estimating SPEI, with a low root mean square error and a high R value. Deo et al. (2018) also used SVM to predict SPEI using 12 variables. They concluded that the SVM model is highly efficient in drought prediction. Moreover, the SVM method can also be used in soil moisture estimation (Liu et al. 2016, 2017; Pasolli et al. 2011; Deo et al. 2018; Arrigo and Smerdon 2008; Maroufpoor et al. 2019; Moosavi et al. 2016). Liu et al. (2016) estimated soil moisture at different soil layers by combining SVM and the dual ensemble Kalman filter (EnKF) technique, and found that the combination of SVM and dual EnKF is suitable for multilayer soil moisture estimation. Moosavi et al. (2016) combined remote sensing variables and SVM to estimate surface soil moisture at 100 m spatial and daily temporal resolutions. They concluded that SVM performs well in soil moisture estimation, with a high correlation coefficient and a low root mean square error.

For the SVM method, different input variables can lead to significantly different performance in drought prediction (Pasolli et al. 2011; Maroufpoor et al. 2019; Moosavi et al. 2016; Katagis et al. 2013). For example, Katagis et al. (2013) and Moosavi et al. (2016) both estimated soil moisture by using remote sensing data with different variables as inputs in SVM, and their results showed different performance in soil moisture prediction. Therefore, the selection of input variables is especially important when using SVM for drought prediction. Based on the studies mentioned above, research on drought prediction with SVM can usually be divided into three categories: using in situ meteorological variables (e.g., in situ precipitation, temperature, relative humidity and solar radiation) as inputs, using remote sensing variables (e.g., leaf area index, land surface temperature and remote sensing soil moisture) as inputs, and using a combination of in situ variables and remote sensing variables as inputs. For in situ meteorological variables, most studies focus on the commonly used variables, such as precipitation, temperature, relative humidity and solar radiation. Few studies have investigated the role of evapotranspiration, wind speed and other meteorological variables in drought prediction, so the effect of adding such variables as inputs in SVM for drought prediction needs to be assessed. Besides, to our knowledge, whether the combination of remote sensing soil moisture and precipitation products can improve the accuracy of drought prediction has never been investigated.

The Xiang River is the fifth largest tributary of the Yangtze River, located in Hunan Province, China. The Xiang River Basin is vulnerable to continuous and severe drought disasters (Zhang et al. 2009). Therefore, it is important to identify the characteristics of drought conditions in the Xiang River Basin for both present and future periods. Moreover, the Xiang River is located in southwest China and has a monsoon climate, so drought monitoring in this basin has reference value for drought research in regions with similar latitudes and climates. In recent years, a variety of studies have been conducted on drought monitoring in the Xiang River Basin (Tian et al. 2018; Lu et al. 2018; Du et al. 2013; Ma et al. 2016; Zhao et al. 2016; Zhu et al. 2019). However, most of them focus on drought monitoring rather than prediction, and most of them explored the drought conditions with in situ observations (Tian et al. 2018; Du et al. 2013; Ma et al. 2016; Zhao et al. 2016), while just a few utilized satellite remote sensing products (Zhang et al. 2009; Zhu et al. 2019). As far as we know, few works have used the SVM method to predict soil moisture and its corresponding drought index (e.g., SWDI) with in situ meteorological variables and remote sensing products in the Xiang River Basin. Although several studies have applied remote sensing products in drought prediction, the application of SMAP in drought forecasting with SVM is quite limited. In addition, a clearly defined lead time is critical for agricultural drought prediction, but it has not been investigated in the Xiang River Basin.

The specific objectives of this study are as follows: (1) to select appropriate input variables in SVM for drought prediction; (2) to investigate the efficiency of in situ observations and remote sensing precipitation products as input of SVM for the soil moisture and SWDI forecast; (3) to assess whether the addition of SMAP soil moisture can improve the performance of SWDI prediction. The remaining structure of the paper is organized as follows: the descriptions of the study area and data are presented in Sect. 2; the methodology is introduced in Sect. 3; Sect. 4 provides the results and discussion; and conclusions are drawn in Sect. 5.

2 Study area and data

2.1 Study area

The Xiang River, the largest river in Hunan Province, originates in the southwest of the Xiang River Basin and flows toward the northeast (Fig. 1) (Du et al. 2013; Ma et al. 2016; Zhao et al. 2016; Zhu et al. 2019). The annual rainfall in the Xiang River Basin ranges from 1500 to 1800 mm. The average annual temperature and evaporation are 18.4 °C and 932 mm, respectively (Zhu et al. 2019). The unevenly distributed rainfall has led to frequent and severe droughts in the Xiang River Basin (Tian et al. 2018).

Fig. 1 Xiang River Basin and distribution of the meteorological stations

2.2 Data sources

2.2.1 In situ observations

The in situ observations used in this study are obtained from the China National Meteorological Information Center (https://data.cma.cn) from April 1 to November 30, 2017, including daily precipitation (in situ P), average daily temperature (T), potential evapotranspiration (PET), average relative humidity (Rh), wind speed (Ws) and solar radiation (Rn). The in situ observations from fourteen meteorological stations are calculated at catchment scale and used as the input variables for SVM method.

2.2.2 Reference soil moisture from China land soil moisture data assimilation system

The reference dataset for SVM prediction is collected from the China land soil moisture data assimilation system (CLSMDAS). Detailed information on the CLSMDAS soil moisture dataset can be found in previous studies (Shi et al. 2011; Zhu 2013; Shi and Xie 2008). The CLSMDAS soil moisture data are obtained from in situ soil moisture observations and remote sensing products across China by using a data assimilation method and a land surface model. The CLSMDAS soil moisture data are available only since January 19, 2017, and the relatively short time series might limit their application in hydrology. However, some studies have shown that the CLSMDAS system presents reasonable spatiotemporal distributions of soil moisture and matches well with severe drought events in the southwest part of China (Shi et al. 2011; Zhu 2013; Shi and Xie 2008). The CLSMDAS system provides soil moisture at different depths (0–5 cm, 0–10 cm, 10–40 cm, 40–100 cm, 100–200 cm). However, remote sensing soil moisture datasets can only provide soil moisture for the top 5 cm, and no in situ soil moisture is available in the Xiang River Basin. Therefore, in this study, the 0–5 cm CLSMDAS soil moisture in the Xiang River Basin from April 1 to November 30, 2017 is used as the reference for SVM in soil moisture prediction. Moreover, the SWDI derived from CLSMDAS soil moisture (CLSMDAS_SWDI) is also used as the reference drought index in this study. The 0–5 cm soil moisture dataset is obtained from the China Meteorological Data Sharing Service System (https://data.cma.cn/).

2.2.3 Remote sensing products

The remote sensing products used in this study are SMAP L2 soil moisture, CMORPH-CRT precipitation, TRMM 3B42V7 precipitation and IMERG V05 precipitation. Their summary information is shown in Table 1. Detailed information can be found in related studies (Fan et al. 2017; Colliander et al. 2017; Huffman et al. 2010a, 2010b; Wei et al. 2018; Joyce et al. 2004; Huffman and Bolvin 2010; Jiang et al. 2016; Li et al. 2017; Wang et al. 2017; Sharifi et al. 2016; Chen and Li 2016; Tan et al. 2017).

Table 1 Summary information of remote sensing products

Because the reference CLSMDAS soil moisture is available only from January 19, 2017, and the accessible in situ meteorological observations end in 2017, the study period is from April 1 to November 30, 2017. The catchment scale is adopted in this study, and all gridded remote sensing data are averaged to the catchment scale.
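The catchment-scale averaging described above can be sketched as follows. This is a minimal illustration that assumes an unweighted arithmetic mean over the grid cells (or stations) inside the basin; the text does not state whether area weights were applied, and the function name `catchment_mean` and the sample values are hypothetical.

```python
from statistics import fmean
from typing import Optional, Sequence


def catchment_mean(cell_values: Sequence[Optional[float]]) -> Optional[float]:
    """Unweighted mean of the grid cells (or stations) inside the catchment.

    Missing cells are passed as None and skipped; None is returned when
    no valid cell is left. Equal weighting is an assumption of this sketch.
    """
    valid = [v for v in cell_values if v is not None]
    return fmean(valid) if valid else None


# Hypothetical daily precipitation (mm) for four grid cells, one of them missing.
daily_p = catchment_mean([12.0, 8.0, None, 10.0])
```

Applying the function once per day and per product yields one catchment-scale time series for each input variable.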

3 Methodology

As mentioned above, our target is to investigate the efficiency of using in situ and remote sensing precipitation combined with other meteorological variables for drought prediction. Firstly, different variables are selected to investigate their suitability for soil moisture and SWDI prediction. Secondly, two strategies are designed for SWDI prediction. The first strategy is to predict SWDI indirectly: soil moisture is predicted with SVM, and SWDI is then calculated from the predicted soil moisture (Strategy I). The second strategy is to predict SWDI directly with SVM (Strategy II). Soil moisture controls and affects water resource management and drought forecasting. However, it is challenging to obtain accurate soil moisture given the limitations of in situ networks, especially in remote areas. Predicting soil moisture from available meteorological variables in Strategy I therefore provides a practical way to obtain soil moisture. In addition, soil moisture is an important variable for hydrological modeling as well as drought monitoring, and the predicted surface soil moisture can be used for soil moisture assimilation in deep layers, so soil moisture prediction through SVM is valuable in its own right. Furthermore, Strategy I and Strategy II together show the difference between indirect and direct prediction of SWDI and help select the more appropriate method for drought prediction with SWDI. The daily CLSMDAS soil moisture dataset is used as the reference to evaluate the accuracy of SVM in soil moisture prediction. The weekly SWDI derived from daily CLSMDAS soil moisture is used as the reference to evaluate the accuracy of SVM in drought prediction. The specific experimental design for soil moisture and SWDI prediction is presented as follows:

3.1 Experimental design

The experiments conducted for the soil moisture predictions (Strategy I) are given in Table 2 and described as follows:

Table 2 Experimental design for soil moisture prediction

Case 1 is conducted with in situ meteorological data (in situ P, T, PET, Rn, Rh, Ws) as inputs for SVM to predict daily soil moisture. This case aims to investigate whether in situ meteorological data covering a short time period can be used for soil moisture prediction and to provide a benchmark for comparison with Case 2 to Case 4. Case 2 to Case 4 replace the in situ P with remote sensing precipitation data from CMORPH-CRT, IMERG V05 and TRMM 3B42V7, respectively. These cases are used to investigate whether remote sensing precipitation data can be an alternative source for soil moisture prediction, or even improve its performance, when no in situ precipitation is available. The SWDI is then calculated from the predicted soil moisture obtained from Case 1 to Case 4. This indirect predicted SWDI is used to assess the efficiency of indirect drought prediction with in situ and remote sensing precipitation products.

The experiments conducted for the direct SWDI predictions (Strategy II) are given in Table 3 and described as follows:

Table 3 Experimental design for SWDI prediction

Similar to Case 1, Case 5 is conducted with in situ meteorological data (in situ P, T, PET, Rn, Rh, Ws) as inputs for SVM to predict weekly SWDI. This case aims to investigate whether in situ meteorological data covering a short time period can be used for SWDI prediction (direct predicted SWDI) and to provide a benchmark for comparison with Case 6 to Case 12. Case 6 to Case 8 are used to assess whether the remote sensing precipitation products can be an alternative source for SWDI prediction, or even improve the performance. Moreover, the direct predicted SWDI obtained from Case 5 to Case 8 can be compared with the indirect predicted SWDI obtained from Case 1 to Case 4. Case 5 to Case 8 also provide a new way to predict drought when it is difficult to obtain reference soil moisture datasets. Case 9 to Case 12 are used to analyze whether adding remote sensing soil moisture can improve the performance of SWDI prediction compared with Case 5 to Case 8.
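The twelve cases described above can be written down as a small configuration table. The sketch below is illustrative only: the variable labels and the `build_cases` helper are hypothetical, and the order of the precipitation products in Cases 6 to 8 and Cases 10 to 12 is assumed to follow that of Cases 2 to 4 (CMORPH-CRT, IMERG V05, TRMM 3B42V7), as suggested by the discussion in Sect. 4.3.

```python
# Meteorological inputs shared by every case (labels follow Sect. 2.2.1).
MET = ["T", "PET", "Rn", "Rh", "Ws"]
# Precipitation sources, in the assumed case order.
PRECIP = ["in situ P", "CMORPH-CRT", "IMERG V05", "TRMM 3B42V7"]


def build_cases():
    """Return {case number: {"target": ..., "inputs": [...]}} for Cases 1-12."""
    cases = {}
    for i, precip in enumerate(PRECIP):
        # Strategy I: daily soil moisture is the target (Cases 1-4).
        cases[1 + i] = {"target": "soil moisture", "inputs": [precip] + MET}
        # Strategy II: weekly SWDI is the target (Cases 5-8).
        cases[5 + i] = {"target": "SWDI", "inputs": [precip] + MET}
        # Strategy II with SMAP soil moisture added (Cases 9-12).
        cases[9 + i] = {"target": "SWDI", "inputs": [precip, "SMAP SM"] + MET}
    return cases


CASES = build_cases()
```

Keeping the design in one structure makes it easy to loop over the cases with the same training and evaluation code, so that differences in skill can be attributed to the inputs alone.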

3.2 Soil water deficit index (SWDI)

The drought condition is quantified by SWDI using the top 0–5 cm surface CLSMDAS soil moisture. The calculated CLSMDAS_SWDI datasets are used as reference to evaluate the performance of SVM for SWDI prediction. The SWDI is calculated as follows:

$$ {\text{SWDI}} = \frac{{\theta - \theta_{{{\text{FC}}}} }}{{\theta_{{{\text{AWC}}}} }} \times 10 $$
(1)
$$ \theta_{{{\text{AWC}}}} = \theta_{{{\text{FC}}}} - \theta_{{{\text{WP}}}} $$
(2)

where θ is the time series of 0–5 cm CLSMDAS soil moisture (m³/m³), and \(\theta_{\text{FC}}\), \(\theta_{\text{WP}}\) and \(\theta_{\text{AWC}}\) represent the field capacity, wilting point and available water capacity, respectively. There are several ways to define \(\theta_{\text{FC}}\) and \(\theta_{\text{WP}}\): (1) the 5th and 95th percentiles of the soil moisture time series denote \(\theta_{\text{WP}}\) and \(\theta_{\text{FC}}\), respectively; (2) the soil moisture at soil water potentials of −33 kPa and −1500 kPa is taken as \(\theta_{\text{FC}}\) and \(\theta_{\text{WP}}\), respectively; (3) \(\theta_{\text{WP}}\) and \(\theta_{\text{FC}}\) are calculated from basic soil physical characteristics, such as the proportions of clay and sand, via pedo-transfer functions. All three methods perform well in defining \(\theta_{\text{WP}}\) and \(\theta_{\text{FC}}\), and the first is the simplest to calculate (Martinez-Fernandez et al. 2016). In this study, \(\theta_{\text{WP}}\) and \(\theta_{\text{FC}}\) are therefore calculated with the percentile method, which takes the 5th percentile of the observed soil moisture time series during the growing season as \(\theta_{\text{WP}}\) and the 95th percentile as \(\theta_{\text{FC}}\). Negative SWDI values indicate the occurrence of drought in the Xiang River Basin. The SWDI values and the corresponding drought categories, following Martinez-Fernandez et al. (2016), are shown in Table 4.
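Equations (1) and (2) with the percentile method can be sketched in a few lines. This is a minimal illustration, not the processing chain used in the study: the function name `swdi`, the optional reference series `theta_ref` and the linear-interpolation percentile rule (`method="inclusive"`) are assumptions, and the sample series is synthetic.

```python
from statistics import quantiles


def swdi(theta, theta_ref=None):
    """Soil water deficit index, Eqs. (1)-(2), with percentile-based FC and WP.

    theta     : soil moisture values (m3/m3) to convert to SWDI
    theta_ref : series used to estimate the percentiles (defaults to theta)
    """
    ref = theta if theta_ref is None else theta_ref
    cuts = quantiles(ref, n=100, method="inclusive")  # 99 percentile cut points
    theta_wp, theta_fc = cuts[4], cuts[94]            # 5th and 95th percentiles
    theta_awc = theta_fc - theta_wp                   # Eq. (2)
    return [(t - theta_fc) / theta_awc * 10 for t in theta]  # Eq. (1)


# Synthetic series: 101 evenly spaced values from 0.20 to 0.40 m3/m3.
series = [0.20 + 0.002 * i for i in range(101)]
values = swdi(series)
```

By construction, soil moisture equal to \(\theta_{\text{FC}}\) gives SWDI = 0 and soil moisture equal to \(\theta_{\text{WP}}\) gives SWDI = −10, so with the percentile method about 95% of the values in the reference period are negative.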

Table 4 Classification of SWDI for different drought categories

3.3 Support vector machine (SVM)

SVM is a data-driven model used for classification and regression (Cortes and Vapnik 1995). Dependent variables can be estimated using SVM by combining several independent variables through a kernel function (Fig. 2). The algebraic function can be defined as follows:

$$ y = f\left( x \right) + \varepsilon = w \cdot \phi \left( x \right) + b $$
(3)

where y is the dependent variable, x is the independent variable, w is the weight vector, b is the characteristic constant of the regression function, and \(\phi \left( x \right)\) is the nonlinear mapping defined by the kernel function. Equation 3 is used to find the relationship between inputs and outputs. In this study, the radial basis function (RBF) is selected as the kernel function for its good performance in the prediction of soil moisture and drought indices (Tabari et al. 2012; Maroufpoor et al. 2019).
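To make the role of the RBF kernel concrete, the sketch below evaluates the standard kernel form of a support vector regression function, in which the weight vector is expressed through the support vectors so that the prediction becomes a weighted sum of kernel values plus the constant b. The function names, the value of `gamma`, the support vectors and the coefficients are all hypothetical; the hyperparameters used in the study are not reported in this section.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Kernel form of Eq. (3): f(x) = sum_i coef_i * K(sv_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical fitted model with two support vectors (illustrative values only).
svs = [[0.0, 0.0], [1.0, 1.0]]
coefs = [0.5, -0.5]
y_hat = svr_predict([0.0, 0.0], svs, coefs, b=0.1, gamma=1.0)
```

Because the kernel value decays with the squared distance between input vectors, inputs on very different scales (e.g., precipitation in mm and relative humidity in %) are usually standardized before training; whether this was done in the study is not stated here.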

Fig. 2 SVM model

3.4 Evaluation indices

The correlation coefficient (R value) and the mean square error (MSE) between predicted values and reference values are used to evaluate the performance of the SVM. They are calculated as follows:

$$ R = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {O_{i} - \overline{O}} \right)\left( {P_{i} - \overline{P}} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {O_{i} - \overline{O}} \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {P_{i} - \overline{P}} \right)^{2} } }} $$
(4)
$$ {\text{MSE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {P_{i} - O_{i} } \right)^{2} $$
(5)

where \(O_{i}\) and \(P_{i}\) represent the reference dataset (Original) and predicted dataset (Predicted), respectively, and \(\overline{O} \) and \(\overline{P} \) represent the mean value of these two datasets, respectively.
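Equations (4) and (5) translate directly into code. The sketch below is a plain-Python illustration; the function names and the sample values are hypothetical.

```python
import math


def pearson_r(obs, pred):
    """Correlation coefficient R between reference and predicted values, Eq. (4)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / (math.sqrt(ss_o) * math.sqrt(ss_p))


def mse(obs, pred):
    """Mean square error between predicted and reference values, Eq. (5)."""
    return sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs)


# Hypothetical soil moisture values: the prediction carries a constant +0.01 bias.
obs = [0.30, 0.28, 0.25, 0.27]
pred = [0.31, 0.29, 0.26, 0.28]
r_value, mse_value = pearson_r(obs, pred), mse(obs, pred)
```

In this example R is 1 although every prediction is too wet, while the MSE is nonzero. R measures only how well the dynamics are reproduced, so a systematic over- or underestimation is visible only in the MSE, which is why the two indices are reported together.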

4 Results and discussion

4.1 Meteorological variables selection for drought prediction

To select variables for soil moisture and SWDI prediction, commonly used meteorological variables (e.g., P, T, Rh and Rn) and infrequently used meteorological variables (e.g., PET and Ws) are examined for their efficiency and suitability for drought prediction. The correlation coefficients between the selected meteorological variables and soil moisture or SWDI are presented in Table 5. Rn is negatively correlated with soil moisture, similar to T, because solar radiation also strengthens evapotranspiration. In contrast to T and Rn, in situ P and Rh are positively correlated with soil moisture over the Xiang River Basin. Figure 3 shows the variation of precipitation over the Xiang River Basin. The highest precipitation usually occurs in June and July, after which precipitation decreases rapidly through the rest of the summer. The lowest precipitation is observed between September and October. Precipitation is positively correlated with soil moisture because it is a source of soil water. Humidity can also maintain or compensate the soil moisture.

Table 5 The correlation coefficient between meteorological variables and soil moisture or SWDI
Fig. 3 The variation of weekly precipitation from April to November 2017 over the Xiang River Basin

From Table 5, except for T, the commonly used meteorological variables (P, Rh, Rn) all show relatively high absolute R values. The absolute R values between soil moisture and PET and Ws are higher than that between soil moisture and Rn (absolute R value = 0.235). Therefore, PET and Ws were selected and added as input variables in SVM for drought prediction. The R value between T and soil moisture is slightly negative at the catchment scale over the Xiang River Basin. The negative correlation between T and soil moisture is well known, because high temperature accelerates evapotranspiration and thus reduces the water content of soil. However, the R value between T and soil moisture is small (R value = −0.09) in this study. The reason is that the soil moisture dataset used here is 0–5 cm soil moisture, and the upper layer soil moisture is more sensitive to precipitation than to temperature. Temperature tends to have significant effects on lower layer soil moisture (Cai et al. 2009). The relatively high R value between in situ P and soil moisture (R value = 0.5437) in this study supports this point. To investigate the suitability of T as an input variable for drought prediction in SVM, two experiments were conducted, and the results are given in Table 6 (1) and (2). Even though the R value between T and soil moisture is small, adding T to the inputs of SVM still improves drought prediction, and the performance of soil moisture prediction is not good enough without T as an input variable. Therefore, T was used together with the other selected meteorological variables as inputs in SVM for drought prediction.

Table 6 The comparison of different inputs in SVM for soil moisture [(1) and (2)] and SWDI [(3) and (4)] prediction

As for SWDI prediction, the correlation coefficients between the meteorological variables and SWDI are presented in Table 5. The R values between SWDI and the meteorological variables are similar to those for soil moisture, because the SWDI is calculated from soil moisture. According to Table 4, the greater the SWDI, the higher the soil moisture. Therefore, SWDI is positively correlated with precipitation, soil moisture and Rh, and negatively correlated with temperature, Rn, Ws and PET. Moreover, the experiments investigating the suitability of T as an input variable for SWDI prediction in SVM were also conducted, and the results are presented in Table 6 (3) and (4). The conclusion is similar to that of the soil moisture experiments, so T was also used as an input variable for SWDI prediction. Therefore, from Case 5 to Case 8, the input variables for SWDI prediction include P, T, PET, Rh, Rn and Ws. From Case 9 to Case 12, the input variables for SWDI prediction include P, SM, T, PET, Rh, Rn and Ws.

To verify whether adding Ws and PET improves the prediction results, the following experiment was conducted: soil moisture and SWDI were predicted using a group of input variables without Ws and PET, and these prediction results were used as a baseline. The comparison group used the same input variables with Ws and PET added. The results are shown in Table 7. From Table 7 (1) and (2), when Ws and PET were added as input variables, the soil moisture prediction was greatly improved in the testing stage, indicating that Ws and PET play a significant role in soil moisture prediction. Table 7 (3) and (4) also show a large improvement in the testing stage for SWDI prediction. Both comparisons indicate that the new set of input variables improves the performance of drought prediction compared with the baseline inputs.

Table 7 The comparison of experiments with and without Ws and PET in SVM for soil moisture [(1) and (2)] and SWDI [(3) and (4)] prediction

4.2 Indirect SWDI prediction based on strategy I

4.2.1 Soil moisture prediction through SVM

The R value and MSE between the original soil moisture (original) and the predicted soil moisture (predicted) in the training and testing stages for Case 1 to Case 4 are provided in Table 8. In the training stage, the remote sensing precipitation products slightly improve the performance of soil moisture prediction. However, in the testing stage, none of the three remote sensing precipitation products improves soil moisture prediction compared with the in situ precipitation. In situ precipitation performs the best, and CMORPH-CRT performs worse than IMERG V05 and TRMM 3B42V7. TRMM 3B42V7 performs the best among the remote sensing precipitation products. These results illustrate that in situ precipitation combined with other meteorological variables can effectively predict soil moisture at the catchment scale over the Xiang River Basin. Moreover, the TRMM 3B42V7 precipitation product can serve as an alternative precipitation dataset for soil moisture prediction through SVM if no in situ precipitation is available.

Table 8 The R value and MSE for case 1 to case 4 in Strategy I

Figure 4 shows the time series of original and predicted soil moisture in the training and testing stages. The predicted soil moisture captures the dynamics of the original soil moisture in both stages. However, it does not capture the peaks and troughs of the original soil moisture well in the training stage. In all four cases, the predicted soil moisture overestimates the original soil moisture in the testing stage. The best results were observed between days 196 and 211. In the other periods, except for days 212 to 225, the predicted soil moisture is almost identical to the original soil moisture. These results indicate that in situ precipitation and remote sensing precipitation combined with other meteorological variables perform satisfactorily in soil moisture prediction with SVM. The lead time for soil moisture prediction with SVM is around 15 days.

Fig. 4 Time series of predicted and original soil moisture in training and testing stage. a Case 1; b Case 2; c Case 3; d Case 4

4.2.2 Indirect SWDI based on the predicted soil moisture

The indirect predicted SWDI values derived from the predicted soil moisture in the training and testing stages for Case 1 to Case 4 are presented in Fig. 5. The R values and MSE between the indirect predicted SWDI and the original SWDI in the training and testing stages are listed in Table 9. From Fig. 5, the indirect predicted SWDI values obtained from Case 4 in the training stage best capture the dynamics of the original SWDI obtained from the reference dataset. The R value of Case 4 in the training stage in Table 9 is also the highest among the four cases. The results for the indirect predicted SWDI in Fig. 5 and Table 9 resemble those for soil moisture prediction in the training stage: the remote sensing precipitation products slightly improve the performance of SWDI prediction compared with in situ precipitation. However, the indirect predicted SWDI values in Case 1 to Case 4 all underestimate the severity of drought in the testing stage. Case 1 shows the best performance in the testing stage, followed by Case 3. This result differs from the soil moisture prediction in the testing stage, where Case 4 shows the best performance among the remote sensing precipitation products. It can be concluded that IMERG V05 as input in SVM is more suitable for indirect SWDI prediction, while TRMM 3B42V7 as input in SVM is more appropriate for soil moisture prediction.

Fig. 5 Time series of predicted and original SWDI in training and testing stage

Table 9 The R value and MSE of indirect predicted SWDI for case 1 to case 4 in Strategy I

4.3 SWDI prediction based on strategy II

The R value and MSE between the original SWDI and the predicted SWDI in the training and testing stages for Case 5 to Case 12 are provided in Table 10. From Case 5 to Case 8, all three remote sensing precipitation products improve the performance of SWDI prediction in the training stage. However, different from the results of soil moisture prediction, TRMM 3B42V7 improved the performance in the testing stage compared with CMORPH-CRT and IMERG V05, and IMERG V05 performs better than CMORPH-CRT. These results indicate that in situ meteorological variables and remote sensing precipitation combined with SVM can be used for SWDI prediction, and that TRMM 3B42V7 can be used as an alternative to in situ precipitation for SWDI prediction when no soil moisture is available. From Case 9 to Case 12, adding remote sensing soil moisture improves the performance of SWDI prediction in the training stage compared with Case 5 to Case 8. However, only Cases 9 and 10 perform better in the testing stage than Cases 5 and 6. Therefore, if in situ precipitation or CMORPH-CRT is available, remote sensing soil moisture can be added to improve the performance of SWDI prediction. When IMERG V05 or TRMM 3B42V7 is used as input, adding remote sensing soil moisture as input in SVM makes no difference for SWDI prediction.

Table 10 The R value and MSE for case 5 to case 12 in training and testing stage

Figure 6 shows the time series of predicted and original SWDI in the training and testing stages. The predicted SWDI captures the dynamics of the original SWDI in the training stage in all eight cases. Only 2 or 3 weeks show worse performance in the testing stage. For Case 5 to Case 8, the predicted SWDI values in the testing stage are all higher than the original SWDI values, which indicates that the SVM underestimates drought severity when using remote sensing precipitation as input. However, for Case 9 to Case 12, even though 1 week (week 32) presents poor performance in SWDI prediction, the predicted SWDI values are closer to the original SWDI values than those of Case 5 to Case 8, especially from week 33 to week 35. These results indicate that the drought prediction performance of SVM can be improved by adding remote sensing soil moisture as input. The predicted and original SWDI values between weeks 29 and 30 in Fig. 6 indicate that the lead time for SWDI prediction with SVM is 2 weeks. This lead time coincides with the lead time for soil moisture prediction in Strategy I, which demonstrates that the appropriate lead time for drought prediction in the Xiang River Basin is around 14 days.

Fig. 6 Time series of predicted and original SWDI in training and testing stage. a Case 5; b Case 6; c Case 7; d Case 8; e Case 9; f Case 10; g Case 11; h Case 12
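
The reading that predicted SWDI values above the original ones correspond to an underestimated drought severity can be made explicit with a simple bias calculation. This is a minimal sketch that assumes the usual convention that a more negative SWDI means drier conditions; the function names and the sample values are hypothetical and do not come from the study.

```python
def mean_bias(observed, predicted):
    """Mean of (predicted - observed); positive means predictions are higher."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("series must be non-empty and of the same length")
    return sum(p - o for o, p in zip(observed, predicted)) / n


def overpredicted_indices(observed, predicted):
    """Positions in the series where the predicted value exceeds the observed one."""
    return [i for i, (o, p) in enumerate(zip(observed, predicted)) if p > o]


# Made-up testing-stage SWDI values, for illustration only.
original_swdi = [-3.0, -4.0, -5.0]
predicted_swdi = [-2.0, -3.5, -4.0]

bias = mean_bias(original_swdi, predicted_swdi)
wetter_weeks = overpredicted_indices(original_swdi, predicted_swdi)

# Under the stated convention, a positive bias means the model predicts
# wetter conditions than observed, i.e. a less severe drought.
underestimates_drought = bias > 0
```

With these sample values every week is overpredicted and the mean bias is positive, which is the pattern described for Case 5 to Case 8 in the testing stage.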

Compared with the other cases, Case 9 shows that adding remote sensing soil moisture to in situ precipitation as SVM inputs gives the best SWDI prediction, followed by Case 10 and Case 11. Cases 9, 10 and 11 also illustrate that the combination of remote sensing precipitation and remote sensing soil moisture yields large improvements in SWDI prediction in the testing stage. Case 12 shows that TRMM 3B42V7 gives the best results in the training stage but the worst performance in the testing stage, which indicates that the SVM overfits in the training stage for Case 12. Therefore, to check model reliability, an additional experiment was carried out in which the length of the training set in Case 12 was changed to 90% of the time series; the resulting R values in the training and testing stages are 0.78 and 0.99, respectively. The new time series of original and predicted SWDI for Case 12 is presented in Fig. 7. The predicted SWDI is clearly improved, and the result at week 32 is also better than in the previous experiment. However, a suitable lead time in the testing stage cannot be determined from Fig. 7 because the result at week 32 remains unsatisfactory. These results also indicate that training on 80% of the time series of TRMM 3B42V7 with SMAP soil moisture as SVM inputs is unsuitable for SWDI prediction.

Fig. 7 Time series of predicted and original SWDI in training and testing stage for new set of Case 12
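
The 80% and 90% training lengths compared for Case 12 can be illustrated with a chronological split. The sketch assumes that the first part of the weekly series is used for training and the remainder for testing; the text does not state how the split was implemented. The function name and the 40-week series are hypothetical.

```python
def chronological_split(series, train_fraction):
    """Split a time-ordered series into leading training and trailing testing parts."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(series) * train_fraction)  # number of training samples
    if cut == 0 or cut == len(series):
        raise ValueError("series is too short for this training fraction")
    # No shuffling: the testing part always follows the training part in time.
    return series[:cut], series[cut:]


# Hypothetical series of 40 weekly samples, labelled by week number.
weeks = list(range(1, 41))

train_80, test_80 = chronological_split(weeks, 0.8)
train_90, test_90 = chronological_split(weeks, 0.9)
```

For this hypothetical series, the 80% split leaves 8 weeks for testing and the 90% split leaves only 4. A longer training set therefore comes at the cost of a shorter testing period, which limits what can be concluded about lead time.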

Commonly, the training set comprises 2/3 of the dataset; however, soil moisture and SWDI prediction with a 2/3 training set perform worse than with an 80% training set (Table 11). Figs. 8 and 9 show that the predicted values fit the reference data well in the training stage, whereas the prediction stage shows unstable performance. In addition, the lead time for the 2/3 training set is shorter than that for the 80% training set. Several studies have reported that a longer training period (e.g., 90%) can help improve prediction performance (Liu et al. 2016; Pasolli et al. 2011; Tabari et al. 2012). However, in this study, a longer training period did not significantly improve the results of the second week in the testing stage (see Figs. 6h and 7). Therefore, this study used 80% of the dataset for training. Another training percentage may nevertheless be more appropriate, and exploring the optimal length of the training set may further improve prediction performance.

Table 11 The R value and MSE for Case 1 to Case 12 with different training percentage

Fig. 8 Time series of predicted and original soil moisture in training and testing stage for Case 1 to Case 4. a Case 1; b Case 2; c Case 3; d Case 4

Fig. 9 Time series of predicted and original SWDI in training and testing stage for Case 5 to Case 12. a Case 5; b Case 6; c Case 7; d Case 8; e Case 9; f Case 10; g Case 11; h Case 12

5 Conclusion

This study investigated the performance of selected in situ meteorological variables (PET, T, P, Rn, Rh and Ws) and remote sensing products (SMAP soil moisture and the IMERG V05, CMORPH-CRT and TRMM 3B42V7 precipitation products) for predicting soil moisture and a drought index (e.g., SWDI) using SVM at catchment scale over the Xiang River Basin. Two strategies were proposed to predict drought: predicting SWDI indirectly and predicting it directly. Additionally, the effect of adding SMAP soil moisture as an SVM input was investigated. The main conclusions are as follows:

(1) The new set of input variables for the SVM method, comprising P, PET, T, Rh, Rn and Ws, is suitable for drought prediction over the Xiang River Basin.

(2) A limited set of meteorological variables is able to predict soil moisture over the Xiang River Basin with SVM. The best soil moisture prediction was obtained in Case 1, in which in situ P and the other meteorological variables were used as SVM inputs. The IMERG V05 precipitation product can serve as an alternative precipitation dataset for indirect SWDI prediction with SVM if no in situ precipitation is available.

(3) In situ precipitation and remote sensing precipitation products, used with the other meteorological variables as SVM inputs, are able to predict SWDI effectively. In situ precipitation and IMERG V05 precipitation are more suitable for indirect SWDI prediction, while CMORPH-CRT and TRMM 3B42V7 are more suitable for direct SWDI prediction. TRMM 3B42V7 is the best precipitation product for direct SWDI prediction when neither in situ soil moisture nor in situ precipitation is available.

(4) The addition of remote sensing soil moisture improves the performance of SWDI prediction when the input precipitation source is in situ precipitation or CMORPH-CRT. In particular, adding remote sensing soil moisture to in situ precipitation performs better than adding it to any of the three remote sensing precipitation products. Adding remote sensing soil moisture to TRMM 3B42V7 or IMERG V05 brings no improvement in SWDI prediction. This indicates that TRMM 3B42V7 and IMERG V05 can be used as SVM inputs for SWDI prediction without remote sensing soil moisture.

(5) The appropriate lead time for drought prediction with SVM over the Xiang River Basin is around two weeks.
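
The indirect strategy referred to in conclusions (2) and (3) requires a conversion from soil moisture to SWDI. The sketch below assumes the commonly used definition of the index, SWDI = (θ − θ_FC) / (θ_FC − θ_WP) × 10, where θ is the soil moisture content, θ_FC the field capacity and θ_WP the wilting point. Whether the study used exactly this form, and the parameter values shown, are assumptions made for illustration only.

```python
def swdi(theta, theta_fc, theta_wp):
    """Soil water deficit index from volumetric soil moisture (assumed definition)."""
    available_water = theta_fc - theta_wp
    if available_water <= 0:
        raise ValueError("field capacity must exceed wilting point")
    return (theta - theta_fc) / available_water * 10.0


# Hypothetical soil parameters (m3/m3); not values from the study.
FIELD_CAPACITY = 0.30
WILTING_POINT = 0.10

# Indirect strategy: soil moisture is predicted first, then converted to SWDI.
predicted_soil_moisture = [0.30, 0.20, 0.10]
predicted_swdi = [
    swdi(theta, FIELD_CAPACITY, WILTING_POINT)
    for theta in predicted_soil_moisture
]
```

Under this definition, SWDI is zero at field capacity and about −10 at the wilting point, so soil moisture halfway between the two gives a value near −5. In the direct strategy, by contrast, the SWDI series itself is the SVM target and no such conversion step is needed.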