1 Introduction

Surface soil moisture is a key dynamic hydrological state variable [28] that affects various hydrological processes. Spatiotemporal variation of surface soil moisture significantly affects watershed-scale soil moisture dynamics, which may result in variations in surface-atmosphere feedback, runoff dynamics, groundwater recharge, and crop yield. Understanding surface soil moisture dynamics at the watershed scale with reasonable temporal and spatial resolution is required for hydrological modeling [7, 19, 30], climate modeling and weather prediction [1, 12, 20], and agricultural modeling [2, 23, 31]. Thus, precise measurements of surface soil moisture at the watershed scale are necessary to meet the requirements of these applications.

The spatiotemporal variation in geophysical characteristics such as rainfall, vegetation cover, soil properties, and topography at the watershed scale makes the soil moisture distribution highly nonlinear across time and space. Therefore, measuring soil moisture in a large watershed (>100 km2) so that it accurately represents the average soil moisture dynamics is challenging. Conventional in-situ point-scale measurement (i.e., gravimetric sampling) of soil moisture is precise. Still, gravimetric sampling is not practicable for watershed-scale soil moisture measurement, where many in-situ observations are needed at frequent intervals. Electronic sensor-based geophysical techniques with fixed automatic data-logging devices for continuous in-situ soil moisture measurement eliminate the need for time-intensive gravimetric sampling [6]. However, it is expensive to maintain a sufficiently dense sensor network for watershed-scale soil moisture assessment. Soil moisture scaling theory and soil moisture spatiotemporal variability analysis reveal that a reliable estimate of large-scale soil moisture can be obtained from observations at a few points [3, 10, 13, 22]. In this context, past studies show the potential of soil moisture spatiotemporal variability analysis, combined with a few statistical analyses, to determine the number of required samples (NRS) for estimating the mean soil moisture of a large area [3, 4, 5, 14, 29]. In addition, a hypothesis on the temporal stability of the soil moisture spatial pattern [27] provided an opportunity to minimize the NRS needed to capture the temporal pattern of soil moisture over a large region [4, 5, 10, 11].

Notably, these studies also suggested that areal mean soil moisture assessment using a few point observations, and the scale of temporal stability, must be established using dense soil moisture measurements over a large region. However, because of the expense and limiting conditions of soil moisture field campaigns, monitoring is difficult, and soil moisture spatiotemporal variability in areas greater than 100 km2 [5] is poorly understood. Past studies on measuring soil moisture and analyzing its variability have favored either a large spatial scale [8, 14, 17, 18] or long time periods [4, 15, 16, 21], but very few consider both [5, 9]. Besides, tropical regions are rarely studied for soil moisture variability analysis, and they may differ in key respects from previously studied regions because soil moisture there varies widely, from very dry to very wet conditions.

In view of the facts mentioned above, the present study aimed to investigate the long-term soil moisture spatiotemporal variability over a large tropical agricultural watershed (>100 km2) for the optimal sampling design of soil moisture measurements. For this objective, frequent in-situ measurements at various point locations were carried out in an eastern Indian watershed of 500 km2 to capture different soil wetness conditions for a year.

2 Study Area and Measurements

2.1 Study Area

An agricultural watershed, the Rana watershed, of approximately 500 km2, located in the middle region of the Mahanadi River basin in eastern India, was selected for this study (Fig. 1). The climate of the watershed is tropical, with marked seasonality in rainfall and a long dry season. The average annual rainfall of the study area is 1458 mm, of which about 70% occurs in the southwest monsoon season from June to September [24]. Due to the tropical climate, the study area experiences very high temperatures during April and May. The mean annual temperature of this region is 27.4 °C, with maximum and minimum temperatures of 42.2 °C and 11.3 °C, respectively. Due to the high climate variability, the study watershed experiences varying soil wetness conditions, i.e., very dry to very wet and back to very dry conditions. Paddy is the dominant crop in the study region and is usually cultivated during the southwest monsoon season. Elevation in the watershed ranges between 22 and 299 m (see Fig. for the topography view). Soil textural analysis at various locations in the watershed showed that the soil texture class varies considerably, from sandy loam to clay; a major portion of the watershed has sandy loam, followed by sandy clay loam and clay loam.

Fig. 1
(a) Location of the Mahanadi River basin in India. (b) Location of the study area, the Rana watershed, in the Mahanadi River basin. (c) Topographic overhead view of the Rana watershed with sampling locations

2.2 Soil Moisture Measurements

With the aim of achieving dense sampling at a large spatial scale over a long period, 83 locations (79 agricultural fields and 4 grasslands) were selected within the 500 km2 watershed for measuring soil moisture (see Fig. 1). In addition to vegetation, the criteria for choosing the sampling locations were geophysical characteristics that affect the spatiotemporal variation of soil moisture, such as soil texture and elevation. The sampling periods were chosen to minimize interference from human activities such as tillage. Therefore, soil moisture measurements were initiated in June, when the paddy crop was planted in most fields. Sampling was not carried out for one month (10 December 2016 to 10 January 2017) because of tillage after the paddy crop in a few areas of the watershed for the cultivation of pulse crops (i.e., moong). Overall, soil moisture measurements were conducted for nearly one year, from 20 June 2016 to 12 July 2017, to capture the entire range of soil moisture variability from dry to wet and wet to dry conditions. Sampling was less frequent from August to October because of heavy rainfall and standing water in most paddy fields. Similarly, measurements were less frequent in April and May because of the soil's hardness (i.e., very dry conditions). In total, soil moisture was sampled on 56 days during the year. A view of soil moisture measurements during field campaigns is presented in Fig. 2a. The temporal pattern of the watershed-mean soil moisture, based on sampling at the 83 locations, is shown in Fig. 2b.

Fig. 2
(a) A view of soil moisture measurements during different stages of the paddy crop and field conditions throughout the year. (b) Temporal pattern of the watershed-mean volumetric soil moisture (VSM), with error bars of ±1 standard deviation in space. The blue bars represent daily rainfall measured by a weather station in the watershed

Soil moisture was sampled using an impedance probe (ThetaProbe, type ML3 and HH2 recording device, Delta-T Devices, Cambridge, England), which consists of four sharpened, 6 cm long stainless-steel rods. At each sampling location, ThetaProbe measurements were taken at three points separated by 10–15 m, and each point measurement consisted of three ThetaProbe samples. A total of nine ThetaProbe samples were thus taken at each sampling location to reduce the uncertainty in the estimated mean soil moisture of that location. Figure 3 shows a view of the sampling design and soil moisture measurements. On each sampling day, 747 soil moisture samples were taken (83 locations × 9 samples), giving a total of 41,832 samples over the 56 sampling days. In addition, on each sampling day, 10% of the locations were also sampled gravimetrically. Gravimetric samples were taken adjacent to one of the ThetaProbe samples with a soil core sampler having a fixed volume of 137 cm3 and a depth of 6 cm. The impedance measured by the ThetaProbe was calibrated against gravimetric-based volumetric soil moisture content. A single generalized ThetaProbe calibration was developed [25] for the watershed and used for precise soil moisture measurements at each sampling location.

Fig. 3
Sampling design at each sampling location. The large blue circle represents the sampling location, and the gray circles show the three point measurements, which form a triangle. Small black circles show the ThetaProbe samples within each point measurement, and the red circle marks the position of the gravimetric sample. The photographs show gravimetric sample collection in the rice field

3 Methodology

The statistical properties of each sampling day and whole field campaign are analyzed in terms of their variability in space and time for the soil moisture spatiotemporal variability analysis as given below:

Let \(\theta_{ijk}\) be the soil moisture measured at point i, sampling location j, and sampling day k. The spatial mean for sampling location j on sampling day k, \(\overline{{\theta_{jk} }}\), is then given by:

$$\overline{{\theta_{jk} }} = \frac{1}{{N_{p} }}\sum\limits_{i = 1}^{{N_{p} }} {\theta_{ijk} }$$
(1)

where Np is the number of measurement points at sampling location j. Similarly, the spatial mean of each sampling day, \(\overline{{\theta_{k} }}\), can be defined as:

$$\overline{{\theta_{k} }} = \frac{1}{N}\sum\limits_{j = 1}^{N} {\overline{{\theta_{jk} }} }$$
(2)

where N is the number of sampling locations. Also, the temporal mean for each sampling location, \(\overline{{\theta_{j} }}\), can be defined as:

$$\overline{{\theta_{j} }} = \frac{1}{M}\sum\limits_{k = 1}^{M} {\overline{{\theta_{jk} }} }$$
(3)

where M is the number of sampling days.

The coefficient of variation of each sampling day in space, CVk, is calculated as follows:

$$CV_{k} = \frac{{\sigma_{k} }}{{\overline{\theta }_{k} }} = \frac{{\sqrt {\frac{1}{N - 1}\sum\limits_{j = 1}^{N} {\left( {\overline{{\theta_{jk} }} - \overline{{\theta_{k} }} } \right)^{2} } } }}{{\overline{{\theta_{k} }} }}$$
(4)

where σk is the standard deviation in space for a sampling day.
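The following is a minimal sketch of Eqs. 1 to 4 in plain Python, included only to make the indexing concrete. The nested-list layout `theta[k][j]` (day, then location, then point readings), the function names, and the sample readings are illustrative assumptions, not the data handling used in the study.

```python
import math

def location_day_means(theta):
    """Eq. 1: average the point readings at each location on each day.

    theta[k][j] is the list of point readings (% v/v) for day k, location j.
    """
    return [[sum(points) / len(points) for points in day] for day in theta]

def daily_spatial_stats(theta_jk):
    """Eqs. 2 and 4: spatial mean, standard deviation, and CV for each day."""
    stats = []
    for day in theta_jk:
        n = len(day)
        mean_k = sum(day) / n
        # Sample standard deviation in space (N - 1 in the denominator)
        sigma_k = math.sqrt(sum((v - mean_k) ** 2 for v in day) / (n - 1))
        stats.append((mean_k, sigma_k, sigma_k / mean_k))
    return stats

def temporal_means(theta_jk):
    """Eq. 3: temporal mean for each sampling location."""
    m = len(theta_jk)
    n = len(theta_jk[0])
    return [sum(theta_jk[k][j] for k in range(m)) / m for j in range(n)]

# Hypothetical readings: 2 days x 3 locations x 3 points (% v/v)
theta = [
    [[10.0, 12.0, 14.0], [20.0, 20.0, 20.0], [27.0, 28.0, 29.0]],
    [[30.0, 32.0, 34.0], [40.0, 40.0, 40.0], [47.0, 48.0, 49.0]],
]
theta_jk = location_day_means(theta)
day_stats = daily_spatial_stats(theta_jk)
loc_means = temporal_means(theta_jk)
```

In this toy case both days have the same spatial standard deviation, so the wetter day has the lower CVk, which is the qualitative behavior the CVk analysis relies on.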

Determination of standard deviation helps in the assessment of an optimal number of sampling locations (ONL) for estimating the mean soil moisture within a specified range of absolute error. The robust monitoring strategy has been optimized for the watershed-mean soil moisture assessment through soil moisture spatiotemporal variability analysis using a statistical approach and time stability method.

3.1 ONL Analysis Using a Statistical Approach

ONL, for watershed-mean soil moisture (\(\overline{{\theta_{k} }}\)) assessment within a specific value of absolute error, has been determined using Eq. 5 given by Gilbert (1987) and is expressed as:

$$ONL = \left( {t_{1 - \alpha /2,ONL - 1} \frac{{\sigma_{E} }}{AE}} \right)^{2}$$
(5)

where σE is the standard error, t1-α/2,ONL-1 is the value of Student's t-distribution at significance level α with ONL - 1 degrees of freedom, and AE is the absolute error (% v/v). If a reliable value of σE is not available, CV can be used to estimate ONL (Gilbert, 1987). For this purpose, the functional relationships of CVk and σk with \(\overline{{\theta_{k} }}\) are investigated. An exponential function, \(CV_{k} = k_{1} \cdot e^{{ - k_{2} \overline{{\theta_{k} }} }}\), was fitted to the relationship between CVk and \(\overline{{\theta_{k} }}\) to characterize the variation of soil moisture, as is usual for soil moisture campaigns [3, 14], where k1 and k2 are the fitting parameters. Because σk is the product of CVk and \(\overline{{\theta_{k} }}\) (Eq. 4), the fit gives the relationship between σk and \(\overline{{\theta_{k} }}\) as \(\sigma_{k} = k_{1} \cdot \overline{{\theta_{k} }} \cdot e^{{ - k_{2} \overline{{\theta_{k} }} }}\). This relationship is then used to calculate the uncertainty in watershed-mean soil moisture estimated from a given number of soil moisture samples, including its evolution with drying or wetting.
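As an illustration of how k1 and k2 could be obtained, the sketch below fits the exponential law by ordinary least squares on ln(CVk), which turns the model into a straight line. The text does not state which fitting procedure was used, so the log-linear approach is an assumption, and the function names and synthetic data are hypothetical.

```python
import math

def fit_exponential_cv(theta_mean, cv):
    """Fit CV = k1 * exp(-k2 * theta) by least squares on ln(CV).

    ln(CV) = ln(k1) - k2 * theta is linear in theta, so a straight-line
    fit gives the slope (-k2) and the intercept (ln k1).
    """
    n = len(theta_mean)
    y = [math.log(c) for c in cv]
    x_bar = sum(theta_mean) / n
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in theta_mean)
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(theta_mean, y))
    slope = sxy / sxx
    k1 = math.exp(y_bar - slope * x_bar)
    k2 = -slope
    return k1, k2

def sigma_from_fit(theta_mean, k1, k2):
    """Derived relationship: sigma = k1 * theta * exp(-k2 * theta)."""
    return k1 * theta_mean * math.exp(-k2 * theta_mean)

# Synthetic daily means (% v/v) and CVs generated from k1 = 0.9, k2 = 0.04;
# these parameter values are illustrative only.
theta_days = [5.0, 10.0, 20.0, 30.0, 40.0]
cv_days = [0.9 * math.exp(-0.04 * t) for t in theta_days]
k1, k2 = fit_exponential_cv(theta_days, cv_days)
```

A log-linear fit weights relative rather than absolute residuals, so its parameters can differ slightly from those of a direct nonlinear fit on noisy field data.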

Moreover, the exponential law fitted to CVk versus \(\overline{{\theta_{k} }}\) is employed, through the derived relationship between σk and \(\overline{{\theta_{k} }}\), to determine the ONL needed to achieve a specified uncertainty. Equation 6 is solved by an iterative procedure to estimate the ONL at a 5% significance level for different absolute errors (AE). The iteration is repeated until \(\left| {ONL_{l} - ONL_{l - 1} } \right| \le \varepsilon\), where ε is a control value, set to 0.5 in this study.

$$ONL_{l} = \left( {t_{{0.975,\,ONL_{l - 1} - 1}} \frac{{k_{1} \cdot \overline{{\theta_{k} }} \cdot e^{{ - k_{2} \overline{{\theta_{k} }} }} }}{AE}} \right)^{2} ,\quad l = 1,\,2,\,3,\, \ldots$$
(6)
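The iteration in Eq. 6 can be sketched as follows. To keep the example free of external libraries, the Student's t quantile is computed numerically (Simpson integration of the density plus bisection); this helper, the starting value of 30, the lower bound of one degree of freedom, and the parameter values k1 = 0.9 and k2 = 0.04 are all assumptions made for illustration, not values or methods reported in the study.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=400):
    """CDF for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps is even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_quantile(p, nu):
    """Quantile for 0.5 < p < 1 by bisection on the CDF over [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def optimal_number_of_locations(theta_mean, k1, k2, abs_error,
                                alpha=0.05, eps=0.5, start=30.0, max_iter=100):
    """Iterate Eq. 6 until successive ONL values differ by at most eps."""
    sigma = k1 * theta_mean * math.exp(-k2 * theta_mean)
    onl_prev = start
    for _ in range(max_iter):
        # Assumption: keep at least one degree of freedom so t is defined
        dof = max(onl_prev - 1.0, 1.0)
        onl = (t_quantile(1 - alpha / 2, dof) * sigma / abs_error) ** 2
        if abs(onl - onl_prev) <= eps:
            return math.ceil(onl)  # round up to a whole number of locations
        onl_prev = onl
    raise RuntimeError("ONL iteration did not converge")

# Hypothetical fit parameters and a watershed-mean soil moisture of 25% v/v
onl_ae2 = optimal_number_of_locations(25.0, 0.9, 0.04, abs_error=2.0)
onl_ae4 = optimal_number_of_locations(25.0, 0.9, 0.04, abs_error=4.0)
onl_ae6 = optimal_number_of_locations(25.0, 0.9, 0.04, abs_error=6.0)
```

Because the t quantile changes slowly with the degrees of freedom, the iteration typically settles within a few steps; where SciPy is available, `scipy.stats.t.ppf` could replace the numerical quantile helper.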

The statistical approach quantifies the ONL, that is, the sample size needed to capture the spatial variability of soil moisture at the watershed scale. However, most hydrological applications require the temporal pattern of watershed-mean soil moisture. Besides, the exact positions of these locations in the study domain are essential for setting up a soil moisture sensor network. Since the statistical approach does not characterize the temporal pattern of the watershed-mean soil moisture, a time stability method is used to identify locations where the soil moisture can be considered “representative” of the entire study area over time.

3.2 ONL Analysis Using Temporal Stability Analysis

The temporal stability analysis [27] identifies the sampling locations that maintain a consistent temporal relationship with the areal mean soil moisture with little variability. This study conducts temporal stability analysis based on the parametric test of the relative differences in soil moisture. For each sampling location j and M sampling days, the mean relative difference of soil moisture, \(\overline{{\delta_{j} }}\) (% v/v), and the variance of the relative difference, σ(δj)2, are estimated as:

$$\overline{{\delta_{j} }} = \frac{1}{M}\,\,\sum\limits_{k = 1}^{M} {\,\frac{{\overline{{\theta_{jk} }} - \overline{{\theta_{k} }} }}{{\overline{{\theta_{k} }} }}}$$
(7.1)
$$\sigma (\delta_{j} )^{2} = \frac{1}{M - 1}\,\,\sum\limits_{k = 1}^{M} {\,\,\left( {\frac{{\overline{{\theta_{jk} }} - \overline{{\theta_{k} }} }}{{\overline{{\theta_{k} }} }} - \overline{{\delta_{j} }} } \right)^{2} }$$
(7.2)

The mean relative difference \(\overline{{\delta_{j} }}\) at a sampling location quantifies the location's bias and identifies whether the location is wetter or drier than the areal mean. Generally, a “representative” location for capturing the temporal pattern of the areal mean soil moisture is identified by a low value of \(\left| {\overline{{\delta_{j} }} } \right|\) and/or of the standard deviation of the relative difference, σ(δj). The study in [17] combined these two relative-difference metrics, \(\overline{{\delta_{j} }}\) and σ(δj), into a comprehensive evaluation criterion (CEC) that accounts for both bias and accuracy when locating the best time-stable locations:

$$CEC_{j} = \sqrt {\left( {\overline{{\delta_{j} }} } \right)^{2} + \,\sigma \left( {\delta_{j} } \right)^{2} }$$
(8)

Based on the rank-ordered CECj, the sampling location with the highest time stability is identified as the one with the lowest CECj value.
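The ranking procedure in Eqs. 7.1, 7.2 and 8 can be sketched as below. The layout `theta_jk[k][j]` (location mean on day k at location j), the function name, and the sample values are hypothetical and serve only to illustrate the calculation.

```python
import math

def temporal_stability(theta_jk):
    """Rank locations by CEC (Eq. 8), most time-stable first.

    theta_jk[k][j] is the mean soil moisture on day k at location j.
    """
    m = len(theta_jk)
    n = len(theta_jk[0])
    day_means = [sum(day) / n for day in theta_jk]
    results = []
    for j in range(n):
        # Relative difference from the areal mean on each sampling day
        rel = [(theta_jk[k][j] - day_means[k]) / day_means[k] for k in range(m)]
        mrd = sum(rel) / m                                 # Eq. 7.1
        var = sum((r - mrd) ** 2 for r in rel) / (m - 1)   # Eq. 7.2
        cec = math.sqrt(mrd ** 2 + var)                    # Eq. 8
        results.append({"location": j, "mrd": mrd,
                        "sd": math.sqrt(var), "cec": cec})
    return sorted(results, key=lambda r: r["cec"])

# Hypothetical location means (% v/v): 3 sampling days x 3 locations
theta_jk = [
    [10.0, 21.0, 29.0],
    [20.0, 40.0, 60.0],
    [15.0, 29.0, 46.0],
]
ranked = temporal_stability(theta_jk)
best_location = ranked[0]["location"]
```

In this toy data set, location 0 is consistently drier than the areal mean and location 2 consistently wetter, so the middle location has the lowest CEC and ranks first.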

4 Results and Discussion

The temporal pattern of measured soil moisture was found to be closely linked to rainfall (see Fig. 2b). Statistical analysis of the measured soil moisture shows that the spatial CVk ranges between 0.151 and 0.901, with an average of 0.404; CVk was very high during dry periods and low during wet periods. The temporal coefficient of variation at each sampling location, CVj, ranges between 0.578 and 0.887 with an average of 0.723, considerably higher than CVk. This confirms that the temporal variability of soil moisture is greater than its spatial variability and indicates that an ONL for capturing the temporal pattern of the watershed-mean soil moisture can be derived in this study region. The high CVk values are consistent with past findings on how the spatial variability of soil moisture relates to the size of the investigated area [3, 4, 5, 14]. Specifically, the spatial CVk increases with area: in central Italy, the average CVk at 0–15 cm depth ranges from 0.06 for an area of 1 m2 to 0.20 for 250 km2 [5]. The σk and CVk values found for the eastern Indian watershed are very similar to those presented in [14].

4.1 ONL to Capture Watershed-Mean Soil Moisture

Based on the in-situ measurements, an analytical relationship was fitted between \(\overline{{\theta_{k} }}\) and CVk for the growing season, the non-growing season, and the whole year, as presented in Table 1 and Fig. 4a. The fitted parameters of the CVk versus \(\overline{{\theta_{k} }}\) relationship were then used to derive the relationship of σk versus \(\overline{{\theta_{k} }}\), shown in Fig. 4b, to capture the spatial variability of soil moisture. The distribution of σk versus \(\overline{{\theta_{k} }}\) shows that σk increases until \(\overline{{\theta_{k} }}\) reaches around 30% and decreases beyond that, whereas CVk decreases rapidly with increasing \(\overline{{\theta_{k} }}\). CVk also shows little scatter as \(\overline{{\theta_{k} }}\) decreases during the growing season but is widely scattered during the non-growing season. Overall, the decreasing exponential pattern of CVk versus \(\overline{{\theta_{k} }}\) and the convex-upward trend of σk versus \(\overline{{\theta_{k} }}\) found in this study are similar to those reported in previous studies across the world [3, 4, 5, 13, 14, 17] (Fig. 4).
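As a brief check on this shape, suppose the fitted form \(\sigma_{k} = k_{1} \cdot \overline{{\theta_{k} }} \cdot e^{{ - k_{2} \overline{{\theta_{k} }} }}\) from Sect. 3.1 holds exactly. Differentiating with respect to \(\overline{{\theta_{k} }}\) gives

$$\frac{{d\sigma_{k} }}{{d\overline{{\theta_{k} }} }} = k_{1} \, e^{{ - k_{2} \overline{{\theta_{k} }} }} \left( {1 - k_{2} \overline{{\theta_{k} }} } \right)$$

which is positive for \(\overline{{\theta_{k} }} < 1/k_{2}\) and negative beyond it, so the fitted curve peaks at \(\overline{{\theta_{k} }} = 1/k_{2}\) with a maximum of \(k_{1} /(k_{2} e)\). Under this idealization, a turning point near 30% would correspond to k2 of roughly 0.03 per % v/v; this is an illustrative back-calculation, not a value taken from Table 1.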

Table 1 The exponential fitting parameters of the CVk versus \(\overline{{\theta_{k} }}\) relationship and the corresponding coefficient of determination (R2)

Fig. 4
Analytical relationship between watershed-mean soil moisture (\(\overline{{\theta_{k} }}\)) and statistical characteristics, (a) coefficient of variation (CVk) and (b) standard deviation (σk), of soil moisture measurements at the watershed scale on each sampling day for the growing season, non-growing season, and the whole year

Further, the fitted decreasing exponential relationship between CVk and \(\overline{{\theta_{k} }}\) was used to quantify the ONL as a function of the average wetness condition, for a fixed significance level and varying absolute errors. Figure 5a shows the ONL (from Eq. 6) needed to capture the watershed-mean soil moisture at a 5% significance level and within an absolute error of ±2% during different seasons. The ONL was found to be 30 for the growing season and lower, 20, for the non-growing season, whereas a maximum ONL of 25 is needed for the whole year to assess watershed-mean soil moisture. The maximum ONL of 30, found for the growing season, confirms the adequacy of the sampling design adopted for this study, in which 83 locations were monitored. Analysis of different absolute errors shows that fewer sampling locations are needed when a larger absolute error (> ±2%) is acceptable for watershed-mean soil moisture estimation (Fig. 5b). The analysis reveals that about 10 to 15 sampling locations are sufficient to capture the watershed-mean soil moisture of the 500 km2 watershed within a more practical absolute error range of ±4% to ±6%. The ONL obtained in this study agrees with the ONL found for different hydroclimatic regions and geomorphological conditions at varying spatial scales, which ranges between 15 and 40 sampling locations [4, 5, 14]. Since computing the “average” error on the soil moisture temporal pattern is necessary [4], the temporal pattern of spatial mean soil moisture is analyzed to assess the ONL for a reliable estimate of watershed-mean soil moisture.

Fig. 5
The optimal number of locations to capture the watershed-mean soil moisture at a 5% significance level during different seasons (a) with ±2% absolute error and (b) at different absolute errors

4.2 ONL to Capture the Temporal Pattern of Watershed-Mean Soil Moisture

The most temporally stable locations, which can serve as representative locations for capturing the watershed-mean soil moisture, were identified using time stability analysis based on the parametric test of relative differences (Eq. 7). Figure 6a shows the time stability characteristic, the mean relative difference (\(\overline{{\delta_{j} }}\)), ranked from smallest to largest along with ±1 standard deviation. Sampling locations with positive \(\overline{{\delta_{j} }}\) values consistently overestimate the watershed-mean soil moisture, whereas those with negative \(\overline{{\delta_{j} }}\) values consistently underestimate it. Generally, the range of variation in \(\overline{{\delta_{j} }}\) increases with the size of the investigated area [5]. The CEC (Eq. 8) can effectively screen out systematic bias while accurately capturing the watershed-mean soil moisture at each sampling time. The most time-stable location, selected on the basis of the lowest CEC value, was found to capture the watershed-mean soil moisture with a high correlation (R2 = 0.852) but with a high root-mean-square error (RMSE) of 5.573%. Thus, although a single representative location can capture the temporal variation pattern of the watershed-mean soil moisture with high correlation, it fails to provide reasonable accuracy (RMSE of ±4% or better). Notably, this contradicts the finding of a previous study [5], in which one representative site captured the temporal pattern of areal mean soil moisture with reasonable accuracy at a regional scale of 200 km2. Further analysis showed that the five most time-stable locations of the study watershed can capture the watershed-mean soil moisture with an excellent correlation (R2 = 0.981) and an RMSE of ±2.17%. The scatter plot in Fig. 6b shows that the ensemble soil moisture of the five best time-stable locations can represent the watershed-mean soil moisture within an error range of ±4%.

These analyses imply that the five time-stable sampling locations could be used to capture the temporal pattern of watershed-mean soil moisture with reasonable accuracy and confirm the robust optimal sampling design for the 500 km2 watershed. The spatial distribution of the five most time-stable locations, along with the other sampling locations (Fig. 7), shows that the identified locations are well distributed across the watershed. Remarkably, the identified time-stable locations also have different elevations and soil texture conditions.
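A comparison of this kind can be sketched as follows, assuming the ensemble is the plain average of the chosen locations and R2 is the squared Pearson correlation between the ensemble and the watershed mean; the text does not define either explicitly, so both are assumptions. The function name and sample values are hypothetical.

```python
import math

def ensemble_skill(theta_jk, chosen):
    """RMSE and squared correlation of an ensemble against the watershed mean.

    theta_jk[k][j] is the mean soil moisture on day k at location j, and
    chosen is the list of location indices forming the ensemble.
    """
    m = len(theta_jk)
    n = len(theta_jk[0])
    watershed = [sum(day) / n for day in theta_jk]
    ensemble = [sum(day[j] for j in chosen) / len(chosen) for day in theta_jk]
    rmse = math.sqrt(sum((e - w) ** 2
                         for e, w in zip(ensemble, watershed)) / m)
    e_bar = sum(ensemble) / m
    w_bar = sum(watershed) / m
    cov = sum((e - e_bar) * (w - w_bar) for e, w in zip(ensemble, watershed))
    see = sum((e - e_bar) ** 2 for e in ensemble)
    sww = sum((w - w_bar) ** 2 for w in watershed)
    r2 = cov ** 2 / (see * sww)  # squared Pearson correlation
    return rmse, r2

# Hypothetical location means (% v/v): 3 sampling days x 3 locations
theta_jk = [
    [10.0, 21.0, 29.0],
    [20.0, 40.0, 60.0],
    [15.0, 29.0, 46.0],
]
rmse_one, r2_one = ensemble_skill(theta_jk, chosen=[1])
rmse_all, r2_all = ensemble_skill(theta_jk, chosen=[0, 1, 2])
```

Reporting both metrics matters because a location can track the timing of wetting and drying closely (high R2) while still carrying a bias that inflates the RMSE, which is the situation described for the single best location.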

Fig. 6
(a) Rank-ordered mean relative difference of soil moisture at each sampling location with ±1 standard deviation error bars. The solid line is the comprehensive evaluation criterion (CEC) used to identify the best time-stable locations. The thick bars are the five best time-stable locations in the watershed based on the CEC. (b) Comparison of watershed-mean VSM with the ensemble VSM of the five best time-stable locations to achieve a reasonable accuracy (RMSE of ±4% or better)

Fig. 7
Elevation map of the watershed showing the 83 soil moisture sampling locations. Locations marked with a circle are the five most representative (time-stable) locations of the watershed

5 Conclusion

Based on the analysis of 41,832 in-situ soil moisture samples collected over one year at 83 sampling locations in a watershed of 500 km2, it can be concluded that dense in-situ measurements help to characterize soil moisture spatiotemporal variability at the watershed scale. This characterization reveals that sampling at a limited number of locations (about 30) is sufficient for capturing the spatial pattern of soil moisture in a watershed of 500 km2. In comparison, far fewer resources are needed to capture the temporal variation pattern of the watershed-mean soil moisture: about five representative (time-stable) locations capture it within an RMSE of about ±2%. However, the optimal sampling design strategy must be transferable to ungauged regions to avoid the monitoring cost of dense measurements. Selecting representative locations a priori in ungauged areas to capture the temporal variation pattern of the areal mean soil moisture is possible with information on various geophysical characteristics. A few past studies [17, 26] reveal that soil properties and topography are significant geophysical parameters that jointly control spatiotemporal persistence. Nevertheless, more detailed investigations are needed before the optimal sampling design strategy can be transferred to ungauged areas, by assessing the effects of heterogeneities of similar or different geophysical properties in other regions and at different space and time scales. In this study, a large watershed-scale soil moisture dataset has been generated through several intensive field campaigns in a tropical region, where soil moisture information is sparse. The optimal sampling design based on long-term intensive sampling can serve as a guideline for designing a robust sensor-based network at the watershed scale and could be helpful in planning sampling for satellite soil moisture product validation.

The data used for the optimal sampling design cover only one watershed in the tropical region of India. Therefore, further analysis is needed using in-situ soil moisture measurements from other watersheds in tropical climates, and possibly other climate regions, to assess the applicability of this approach.