1 Introduction

Soil moisture plays a vital role in the exchange of moisture and energy fluxes at the land-atmosphere boundary. Hence, the accurate estimation of surface soil moisture is of utmost importance for various application studies such as the weather forecast, flood or drought prediction, soil erosion, and climate change (Walker and Houser 2001).

Ground-based soil moisture observations are point observations and are limited in spatial and temporal extent and are expensive to maintain (Robinson et al. 2008; Dorigo et al. 2011). Land surface models (LSM) can provide for continuous and spatially distributed soil moisture estimates over a time period by integrating the LSM with appropriate atmospheric forcings. Regional and continental soil moisture estimates are entirely based on the output from LSM (Srinivasan et al. 2000). LSMs are usually forced with observed precipitation and surface meteorology and hence the soil moisture estimates obtained from LSM do not reflect the contribution of irrigation to the soil moisture estimates. However, the satellite retrievals of soil moisture estimates are effective in capturing the irrigation effects (Kumar et al. 2015b; Nair and Indu 2019). It is suggested that the soil moisture estimates obtained from LSM may reflect the role of irrigation if they are assimilated with soil moisture estimated from satellites (Kumar et al. 2015b). Furthermore, the above assimilation would contribute to reduced uncertainties in the LSM soil moisture estimates to ultimately yield a much improved soil moisture estimate. Although such studies that ingest soil moisture obtained from LSM with satellite retrievals exist in the literature (Nair and Indu 2019; Kumar et al. 2015a), there are very few instances where such studies have been carried out over India.

The ensemble Kalman filter (EnKF) technique is widely employed in data assimilation for the following reasons: (i) the suitability of its sequential structure for processing satellite retrievals in real time, (ii) its ease of implementation even with nonlinear model equations, and (iii) its ability to account for a number of model errors (Reichle et al. 2002). Blankenship et al. (2016) assimilated Soil Moisture and Ocean Salinity (SMOS) satellite retrievals into the Noah land surface states via EnKF and showed that the anomaly correlation of soil moisture at 10-cm depth increased from 0.45 to 0.57 with respect to in situ measurements over the central and southeastern USA. Nair and Indu (2016) studied the improvement of Noah LSM soil moisture by assimilating Soil Moisture Operational Products System (SMOPS) satellite soil moisture over the Indian domain and showed an improvement in the results, with an average correlation of 0.96 and an average root mean square difference of 0.03 m³ m⁻³. Drusch (2007) studied the impact of data assimilation on the European Centre for Medium-Range Weather Forecasts (ECMWF) integrated forecast system using the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) soil moisture data set.

The main objective of this study is to examine whether the LSM soil moisture estimates, after assimilation, reflect the contribution of irrigation, considering that India is a reasonably well-irrigated country. To this end, an attempt is made to improve the LSM soil moisture estimates by assimilating Advanced Scatterometer (ASCAT) satellite soil moisture retrievals into the Noah LSM using an EnKF technique. Furthermore, a detailed validation of the improved near-surface soil moisture estimate is performed by comparing the assimilated land surface state with in situ ground-based weekly India Meteorological Department (IMD) observations as well as with the high-resolution Indian Monsoon Data Assimilation and Analysis (IMDAA) regional reanalysis data sets. For IMDAA, the soil moisture analysis is produced by an extended Kalman filter (EKF) based land data assimilation system that ingests soil moisture observations (ASCAT soil wetness from the MetOp satellite). Since the IMDAA soil moisture analysis is obtained by assimilating satellite soil moisture estimates, it will carry the signature of irrigation.

2 Model, simulation, and validation

2.1 Land information system (LIS)

The present study employs version 3.6 of the Noah LSM (Mitchell et al. 2004) available within the National Aeronautics and Space Administration (NASA) LIS (Kumar et al. 2006), which provides the EnKF data assimilation technique. The Noah LSM is based on the coupling of the diurnally dependent Penman potential evaporation approach of Mahrt and Ek (1984), the multilayer soil model of Mahrt and Pan (1984), and the primitive canopy model of Pan and Mahrt (1987). The above LSM has been extended by Chen et al. (1996) to include the effects of canopy resistance using the approach of Noilhan and Planton (1989) and Jacquemin and Noilhan (1990). The Noah LSM has one canopy layer and four soil layers, with layer thicknesses from the ground surface downward of 0.1, 0.3, 0.6, and 1.0 m, respectively. Its prognostic variables are soil moisture and temperature in the soil layers, water stored on the canopy, and snow stored on the ground. While the root zone is in the upper 1 m of soil, the lowest 1-m soil layer acts like a reservoir with gravity drainage at the bottom. The surface skin temperature is determined following Mahrt and Ek (1984) by applying a single linearized surface energy balance equation representing the combined ground-vegetation surface, with the ground heat flux determined by the diffusion equation for soil temperature. The prognostic equation for the volumetric soil moisture content is the diffusive form of the Richards equation, which is derived from Darcy’s law under the assumption of a rigid, isotropic, homogeneous, and one-dimensional vertical flow domain. The Noah LSM has a simple snow and sea-ice model; the snow model has a single layer of snow cover and simulates snow accumulation, sublimation, melting, and heat exchange at the snow-atmosphere and snow-soil interfaces. Precipitation is categorized as snow when the temperature in the lowest atmospheric layer is below 0 °C.
The Noah LSM employs the vegetation type and soil texture as the two primary variables upon which other secondary parameters such as minimal canopy resistance and other soil hydraulic properties are determined. More details are available from Chen and Dudhia (2001).

The Noah LSM is forced with meteorological forcings and land surface parameters. For the present study, the land cover data is obtained from the Moderate Resolution Imaging Spectroradiometer-International Geosphere-Biosphere Programme (MODIS-IGBP), with a horizontal resolution of 1 km. State Soil Geographic-Food and Agriculture Organization (STATSGO-FAO) blended soil texture map data provides the soil texture data set for this study. Shuttle Radar Topography Mission (SRTM) data is used for elevation. The monthly albedo, maximum snow albedo, Greenness fraction and Slope type data sets are obtained from corresponding NCEP data sets with a spatial resolution of 0.01° × 0.01°. Bottom temperature information is taken from the International Satellite Land Surface Climatology Project 1 (ISLSCP1) bottom temperature data sets. The meteorological forcing data is taken from Global Data Assimilation System (GDAS) except for the rainfall rate which is from IMD gridded rainfall data with a spatial resolution of 0.25° × 0.25°.

Soil moisture is an important variable that needs to be initialized accurately. For the present study, the Noah LSM was spun up by cycling five times (five loops) through the period from 01 January 2011 00 UTC to 01 January 2012 00 UTC using all the meteorological forcing data from GDAS and rainfall data from IMD. The soil moisture content of the deepest soil layer (0.60 to 1 m) in the Noah LSM is used to assess whether equilibrium has been established, by checking that the difference in soil moisture content at the deepest layer between the present and the previous loop is less than 5% (Rodell et al. 2005; Case et al. 2007).
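The equilibrium check described above can be sketched as follows. This is only an illustration: the function name, the sample values, and the choice of a per-grid-cell relative difference are assumptions, since the text does not state exactly how the 5% difference is computed.

```python
def spinup_converged(prev_loop, curr_loop, tol=0.05):
    """Return True when the deepest-layer soil moisture of every grid cell
    changes by less than `tol` (relative) between two spin-up loops."""
    for prev, curr in zip(prev_loop, curr_loop):
        if prev == 0.0:
            # Avoid division by zero: any change away from zero counts as unconverged.
            if curr != 0.0:
                return False
            continue
        if abs(curr - prev) / abs(prev) >= tol:
            return False
    return True


# Hypothetical deepest-layer soil moisture (m3 m-3) at three grid cells
loop4 = [0.250, 0.310, 0.180]
loop5 = [0.255, 0.305, 0.182]
print(spinup_converged(loop4, loop5))
```

In a spin-up driver, such a check would be evaluated at the end of each loop, and cycling would stop once it returns True.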

The study area is the Indian land domain, spanning latitudes from 6.375° N to 38.375° N and longitudes from 66.375° E to 99.875° E, with a horizontal resolution of 0.125° × 0.125°. In this study, the Indian landmass domain is divided into four homogeneous regions, namely, Northwest India, Northeast India, Central India, and South Peninsular India, according to the distribution of monsoon rainfall over the Indian domain. The four homogeneous regions and their meteorological subdivisions are shown in Fig. 1.

Fig. 1

The four homogeneous regions defined by the India Meteorological Department (IMD), based on the amount and the seasonal variation of precipitation, together with the meteorological subdivisions within each region, which are also defined by IMD

2.2 ASCAT and LIS simulation

ASCAT is a real aperture radar system carried on board the Meteorological Operational (Metop) polar satellites launched by the European Space Agency (ESA), and it provides day-night measurements unaffected by cloud cover. The surface soil moisture estimated from ASCAT for the topmost soil layer (< 5 cm) is given as degree of saturation, ranging from 0% (dry) to 100% (wet), and is available at a resolution of 0.25° × 0.25° at daily intervals. In this study, ASCAT data are obtained from SMOPS (Liu et al. 2012).

Two simulations are performed to evaluate the impact of assimilating daily ASCAT soil moisture retrievals into the Noah LSM land surface states: (i) a control run (CNTRL run) with no assimilation and (ii) an assimilation run using EnKF (EXP run). The Noah LSM is integrated from 01 January 2012 00 UTC to 31 December 2012 00 UTC. The EnKF data assimilation algorithm is sequential and has two steps: (i) a forecast step and (ii) an update step. The EnKF used 30 ensemble members, obtained by perturbing the meteorological forcing, the model-estimated states, and the observations. The details of the perturbations that represent the uncertainty in the land surface conditions are given in Table 1 and are based on the study by Yin et al. (2015). The 30 ensemble members are generated by applying random Gaussian error with zero mean. Values of cross correlation in the perturbations of near soil temperature (NST), precipitation, and radiation fields (short wave (SW) and long wave (LW)) are shown in Table 1. The ASCAT soil moisture observations are perturbed with random Gaussian noise with a standard deviation of 0.04 m³ m⁻³ (Nair and Indu 2019).
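To illustrate the update step, the sketch below applies a perturbed-observation EnKF update to a single scalar state. It is a simplification for illustration only: the function name, the forecast values, and the assumption that the state is observed directly are not from the study, and the LIS implementation works on the full model state rather than on one scalar. Only the ensemble size (30) and the observation error standard deviation (0.04 m³ m⁻³) are taken from the text.

```python
import random
import statistics


def enkf_update(forecast, obs, obs_std, rng):
    """Perturbed-observation EnKF update for a scalar state observed directly."""
    p_f = statistics.variance(forecast)  # sample variance of the forecast ensemble
    r = obs_std ** 2                     # observation error variance
    gain = p_f / (p_f + r)               # scalar Kalman gain, between 0 and 1
    analysis = []
    for x in forecast:
        y_pert = obs + rng.gauss(0.0, obs_std)  # observation perturbed per member
        analysis.append(x + gain * (y_pert - x))
    return analysis, gain


rng = random.Random(0)
# 30 hypothetical forecast members of surface soil moisture (m3 m-3)
forecast = [0.30 + rng.gauss(0.0, 0.03) for _ in range(30)]
analysis, gain = enkf_update(forecast, obs=0.22, obs_std=0.04, rng=rng)
print(round(gain, 3), round(statistics.mean(analysis), 3))
```

The gain weights the observation more heavily when the forecast ensemble spread is large relative to the observation error, which is why the perturbation settings matter for the assimilation result.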

Table 1 Summary of perturbations

Data assimilation theory requires both unbiased observation and unbiased model states. However, there are large differences between the temporal moments of the model and the satellite retrievals. Hence, the present study accounted for the bias correction using the cumulative distribution function (CDF) matching technique (Reichle and Koster 2004).
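The idea behind CDF matching can be sketched as follows: each satellite value is replaced by the model value that has the same cumulative probability. The nearest-rank empirical version below, its function name, and the sample climatologies are assumptions made for illustration; they are not the specific procedure of Reichle and Koster (2004) or of LIS.

```python
from bisect import bisect_right


def cdf_match(value, obs_climatology, model_climatology):
    """Map `value` from the observation climatology onto the model climatology
    by matching empirical cumulative probabilities (nearest-rank quantile)."""
    obs_sorted = sorted(obs_climatology)
    model_sorted = sorted(model_climatology)
    n_model = len(model_sorted)
    # Empirical non-exceedance probability of `value` among the observations
    prob = bisect_right(obs_sorted, value) / len(obs_sorted)
    # Index of the model value at the same probability, clipped to a valid range
    idx = min(max(round(prob * n_model) - 1, 0), n_model - 1)
    return model_sorted[idx]


# Hypothetical climatologies: ASCAT degree of saturation (%) and model soil moisture (m3 m-3)
ascat = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
model = [0.10, 0.13, 0.16, 0.19, 0.22, 0.25, 0.28, 0.31, 0.34, 0.37]
print(cdf_match(50, ascat, model))  # median ASCAT value maps to 0.22
```

Because the mapping is monotonic, the temporal ordering of wet and dry events in the satellite record is preserved while its mean and variance are brought in line with the model.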

2.3 IMDAA regional reanalysis data and IMD in situ data

High-resolution soil moisture data at four depths (0–0.1 m, 0.1–0.35 m, 0.35–1 m, and 1–3 m), including the near-surface layer, are available from the Indian Monsoon Data Assimilation and Analysis (IMDAA) reanalysis (Ashrit et al. 2020). The IMDAA soil moisture reanalysis data are available every hour at a horizontal resolution of 12 km over the domain spanning latitudes from 15° S to 45° N and longitudes from 30° E to 120° E. The IMDAA system, with 63 vertical levels, is based on the Met Office four-dimensional variational data assimilation (4DVAR) and its Unified Model, and uses a 6-hour intermittent data assimilation cycle. Lateral boundary conditions for the reanalysis run are taken from the global reanalysis ERA-Interim (ECMWF Re-Analysis). The following observations are assimilated in the 4DVAR system: (i) surface observations, (ii) upper air, (iii) aircraft, (iv) atmospheric motion vectors from Geostationary Meteorological Satellite-4, and (v) TOVS (Microwave Sounding Unit (MSU) and High-resolution Infrared Sounder (HIRS)) satellite radiances.

IMD in situ soil moisture data from 22 stations are utilized in this study to validate the improved soil moisture estimates obtained from the EnKF data assimilation for the year 2012. The in situ data are available every week at different depths (0 m, 0.075 m, 0.15 m, 0.30 m, 0.45 m, and 0.60 m). The IMD in situ station locations are shown in Fig. 2.

Fig. 2

IMD in situ station locations as numbered in Table 2

2.4 Validation

The quantitative evaluation of the assimilated soil moisture with respect to IMDAA data is calculated using an improvement parameter and a forecast impact parameter. Improvement parameter is defined as

$$ \eta = \left| \mathrm{SM}_{obs} - \mathrm{SM}_{CNTRL} \right| - \left| \mathrm{SM}_{obs} - \mathrm{SM}_{EXP} \right| $$
(1)

where $\mathrm{SM}_{obs}$, $\mathrm{SM}_{CNTRL}$, and $\mathrm{SM}_{EXP}$ refer to the surface soil moisture obtained from IMDAA, the CNTRL run, and the EXP run, respectively. A positive value of the improvement parameter ‘η’ indicates that the soil moisture estimate has improved due to EnKF data assimilation.

The forecast impact (FI) parameter is defined as

$$ FI = \left( 1 - \frac{RMSE(E)}{RMSE(C)} \right) \times 100 $$
(2)

where RMSE(E) and RMSE(C) are the root mean square errors (RMSE) of the EXP and CNTRL run soil moisture (at 5-cm depth) with respect to the IMDAA data. A positive value of the FI parameter indicates a positive impact of soil moisture data assimilation.
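The two metrics in Eqs. (1) and (2) can be sketched for a single grid point as follows. The function names and the sample time series are hypothetical and serve only to show how the signs of η and FI should be read.

```python
import math


def improvement(sm_obs, sm_cntrl, sm_exp):
    """Eq. (1): positive when EXP is closer to the reference than CNTRL."""
    return abs(sm_obs - sm_cntrl) - abs(sm_obs - sm_exp)


def rmse(estimates, reference):
    """Root mean square error of a time series against a reference series."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(squared) / len(squared))


def forecast_impact(exp, cntrl, reference):
    """Eq. (2): percentage reduction of RMSE in EXP relative to CNTRL."""
    return (1.0 - rmse(exp, reference) / rmse(cntrl, reference)) * 100.0


# Hypothetical soil moisture series (m3 m-3) at one grid point
reference = [0.20, 0.22, 0.25]   # stands in for IMDAA
cntrl = [0.30, 0.32, 0.35]       # each value 0.10 too wet
exp = [0.25, 0.27, 0.30]         # each value 0.05 too wet

eta = improvement(reference[0], cntrl[0], exp[0])
fi = forecast_impact(exp, cntrl, reference)
print(round(eta, 3), round(fi, 1))
```

Here the EXP errors are half the CNTRL errors, so η is positive and FI is about 50%.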

A two-sample Kolmogorov-Smirnov distance (KS-D) (Chakravarty et al. 1967) statistical test is used in this study to quantitatively compare the probability distribution between the CNTRL and EXP run. It is based on a null hypothesis that the two sample distributions (CNTRL and EXP) are taken from the same source distribution. The KS-D value gives the empirical difference between the two sample distributions.
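The distance itself is the largest vertical gap between the two empirical cumulative distribution functions. A minimal sketch, with a hypothetical function name and made-up samples, is given below.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: the maximum absolute difference
    between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    # The supremum of the gap between two step functions occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Hypothetical seasonal soil moisture samples (m3 m-3) at one grid point
cntrl = [0.10, 0.20, 0.30, 0.40]
exp = [0.20, 0.30, 0.40, 0.50]
print(ks_distance(cntrl, exp))  # 0.25
```

In practice a library routine such as `scipy.stats.ks_2samp` returns the same statistic together with a p-value for the null hypothesis.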

2.5 Evaluation using triple collocation (TC) method

The present study employed the TC method to evaluate the irrigation impact of ASCAT in the assimilated soil moisture. The basic idea of this approach is to obtain the unknown error standard deviations of three independent measurements (that is, measurements whose errors are assumed to be uncorrelated) without knowledge of the truth (Stoffelen 1998). Initially, the TC method was widely employed in oceanographic studies to evaluate the errors in sea surface temperature measurements (Gentemann 2014; O’Carroll et al. 2007). Subsequently, it has been applied in soil moisture studies (Nair and Indu 2019). In order to ensure that the errors of the three measurements remain uncorrelated, the present study has utilized soil moisture obtained from the Global Land Data Assimilation System (GLDAS) Catchment Land Surface Model (CLSM) and the MERRA (Modern-Era Retrospective analysis for Research and Applications) Land data set, along with the CNTRL run, for the TC analysis. Furthermore, TC analysis is also performed on the soil moisture obtained from GLDAS CLSM and MERRA data along with the EXP run. This approach makes it possible to evaluate the irrigation impact of ASCAT in the assimilated soil moisture against the same reference, since the first two data sets (GLDAS CLSM and MERRA) are the same in both triplets.
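A minimal sketch of how TC recovers error standard deviations from a triplet is given below, using the commonly used covariance formulation. The function names and the synthetic data (a shared "truth" plus independent Gaussian errors) are assumptions for illustration; they are not the data sets or the code used in the study.

```python
import math
import random


def cov(u, v):
    """Sample covariance of two equally long series."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def tc_error_std(x, y, z):
    """Error standard deviations of x, y, z from the covariance form of TC,
    assuming mutually uncorrelated errors that are also uncorrelated with the truth."""
    q_xy, q_xz, q_yz = cov(x, y), cov(x, z), cov(y, z)
    var_x = cov(x, x) - q_xy * q_xz / q_yz
    var_y = cov(y, y) - q_xy * q_yz / q_xz
    var_z = cov(z, z) - q_xz * q_yz / q_xy
    # Sampling noise can give slightly negative variances; clip them at zero.
    return tuple(math.sqrt(max(v, 0.0)) for v in (var_x, var_y, var_z))


rng = random.Random(42)
truth = [rng.gauss(0.25, 0.05) for _ in range(5000)]
x = [t + rng.gauss(0.0, 0.01) for t in truth]  # synthetic product with small error
y = [t + rng.gauss(0.0, 0.02) for t in truth]
z = [t + rng.gauss(0.0, 0.04) for t in truth]
err_x, err_y, err_z = tc_error_std(x, y, z)
print(round(err_x, 3), round(err_y, 3), round(err_z, 3))
```

With enough samples the estimates approach the error levels used to generate the three series (0.01, 0.02, and 0.04), even though the truth is never passed to the estimator.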

3 Results and discussion

Figure 3 shows the spatial distribution of the improvement parameter for different seasons during the year 2012 at 5-cm depth. The percentage of grid points over land where the improvement parameter is positive, with respect to the total number of grid points, is named ‘α’, and the values of ‘α’ for the winter (January–February), pre-monsoon (March–May), southwest monsoon (June–September), and post-monsoon (October–November) seasons are 59.14%, 69.17%, 43.59%, and 77.53%, respectively. For the southwest monsoon season, ‘α’ is slightly less than 50%. It is well known that the Noah LSM soil moisture estimate is completely devoid of any effects of irrigation, while the satellite-derived soil moisture does register the signature of irrigation. The IMDAA soil moisture also carries the signature of irrigation. India as a whole receives as much as 80% of its annual rainfall during the four-month southwest monsoon season. Since the monsoon is the rainy season, less irrigation is required over India, both in magnitude and in extent, than in the other three seasons. Keeping the above in mind, it is not surprising that Fig. 3c lacks extensive regions of improvement in the soil moisture estimates due to ingestion of satellite-derived soil moisture (which incorporates the irrigation effects) during the monsoon season.

Fig. 3

The spatial distribution of improvement parameter for the year 2012, where a, b, c, and d represent winter, pre-monsoon, monsoon, and post-monsoon respectively

Figure 4 shows the spatial distribution of the forecast impact parameter for the year 2012 at 5-cm depth. The percentage of grid points over land where the forecast impact parameter is positive, with respect to the total number of grid points, is named ‘β’, and the values of ‘β’ for the winter, pre-monsoon, southwest monsoon, and post-monsoon seasons are 58.43%, 69.26%, 45.85%, and 75.66%, respectively. The southwest monsoon season has the lowest percentage of grid points with positive forecast impact, consistent with the reduced requirement for irrigation during that season.

Fig. 4

The spatial distribution of forecast impact parameter for the year 2012, where a, b, c, and d represent winter, pre-monsoon, monsoon, and post-monsoon respectively

The resulting values of the K-S distance (D) for different seasons are shown in Fig. 5. A K-S-D value close to zero indicates that the CNTRL and EXP run soil moisture distributions are similar. Conversely, larger values of K-S-D suggest that the differences between the probability distributions of the CNTRL and EXP integrations are marked. The presence of regions having large values of K-S-D may be attributed to the impact of irrigation over those regions.

Fig. 5

Kolmogorov–Smirnov distance (D) from comparison of soil moisture distributions from CNTRL and EXP integrations for the year 2012 where a, b, c, and d represent winter, pre-monsoon, monsoon, and post-monsoon respectively

Figure 6a–d depicts the difference in near-surface soil moisture between ASCAT and the CNTRL run for the different seasons (winter, pre-monsoon, monsoon, and post-monsoon) of 2012. India as a whole had near-normal annual rainfall (−11% departure with respect to normal), with the monsoon season receiving 78% of the annual rainfall at a −7% departure from its normal monsoon rainfall. Figure 6a shows that the differences in soil moisture are quite small during the winter season. The winter season has the least amount of rainfall (3.7% of annual rainfall) and hence the soil moisture values from ASCAT are not high. With the CNTRL run overestimating soil moisture values as compared to IMDAA, it is not surprising that CNTRL soil moisture values are higher than those of ASCAT. The lack of marked differences between ASCAT and CNTRL soil moisture values would result in a lack of pronounced positive impact due to EnKF data assimilation, as reflected in Figs. 3a and 4a.

Fig. 6

The spatial distribution of bias between the ASCAT soil moisture and CNTRL run soil moisture for different seasons for the year 2012, where a, b, c, and d represent winter, pre-monsoon, monsoon, and post-monsoon respectively

Figure 6b shows that the differences in soil moisture in the pre-monsoon season are marked and negative over Madhya Maharashtra, Marathwada, Vidarbha, Saurashtra and Kutch (all part of Central India) and also over Telangana and North Interior Karnataka (part of South Peninsular India). Most of these regions experienced deficit rainfall during the pre-monsoon season, with percentage departures ranging from −40% (Madhya Maharashtra) to −94% (Saurashtra and Kutch) (Kaur and Purohit 2013), and they contain mostly non-irrigated croplands as compared to other homogeneous regions (Fig. 3 of Krishnankutty Ambika et al. 2016), leading to lower soil moisture as estimated by ASCAT. With the overestimation of CNTRL soil moisture, the difference between ASCAT and the CNTRL run is markedly negative in these regions. The same regions also have the largest positive FI values during the pre-monsoon season (refer Fig. 4b). This result reveals that the overestimation of CNTRL soil moisture is reduced in the EXP run, resulting in a positive impact, when ASCAT soil moisture is assimilated into the LSM.

Figure 6c shows that the differences in soil moisture in the monsoon season are quite small and mostly positive, except over Tamil Nadu, South Interior Karnataka, North Interior Karnataka (all part of South Peninsular India), Saurashtra and Kutch, and Gujarat (part of Central India). Most parts of India received normal rainfall during the monsoon season (78% of annual rainfall) in 2012, increasing the soil moisture estimates from ASCAT and contributing to positive differences in soil moisture between ASCAT and the CNTRL run. A few regions, however, received lower rainfall, with the percentage departure during the monsoon season being −23% for Tamil Nadu and South Interior Karnataka, −36% for North Interior Karnataka, −34% for Saurashtra and Kutch, and −28% for the Gujarat region (Kaur and Purohit 2013). Lower rainfall over these regions contributed to lower soil moisture and hence the differences between ASCAT and CNTRL soil moisture became negative over these regions. However, the magnitude of these negative differences is considerably smaller during the monsoon season than the large negative differences seen during the pre-monsoon season. Figure 4c shows that these regions have a marginal impact of EnKF data assimilation, while the K-S-D values are also close to zero over these regions during the southwest monsoon season (Fig. 5c).

Figure 6d shows that the differences in soil moisture are quite small during the post-monsoon season. Figure 6d is similar to Fig. 6a, except that in the post-monsoon case ASCAT has higher soil moisture values than the CNTRL run over more regions than in the winter season. This can be explained by considering that the post-monsoon season follows the monsoon season, which contributes 78% of the annual rainfall, and that the year 2012 had only a −7% departure of rainfall during the monsoon season. With such good rain over most parts of India, one would expect the ASCAT soil moisture values to be higher than the CNTRL run values. Figure 4d shows that the maximum positive FI values during the post-monsoon season are seen over Bihar, Gangetic West Bengal, and Jharkhand (all part of Northeast India), as well as Chhattisgarh and Orissa (part of Central India) and coastal Andhra Pradesh (part of South Peninsular India). It is pertinent to note from Fig. 6d that these regions had positive soil moisture differences. Although the highest positive FI values are observed in the post-monsoon season (refer Fig. 4d), the maximum K-S-D values are seen in the pre-monsoon season (Fig. 5b), which is consistent with the maximum differences between ASCAT and CNTRL soil moisture values occurring during the pre-monsoon season (refer Fig. 6b).

Tables 2, 3, 4, 5, and 6 show the soil moisture RMSE and correlation coefficient of the CNTRL run, the EXP run, and IMDAA with respect to IMD in situ soil moisture data for 22 stations at 5-cm depth for the different seasons (Tables 3 to 6) and for the annual average of the year 2012 (Table 2). It is clear from Table 2 that only 9 of the 22 stations show a lack of improvement due to EnKF data assimilation (RMSE(CNTRL) is lower than RMSE(EXP)). This indicates that the benefits of EnKF data assimilation are observed over a majority of the IMD soil moisture stations (12 of the 22 stations show a lower RMSE of soil moisture after assimilation). Out of 22 stations, 11 stations show higher correlation coefficient values due to EnKF data assimilation, while 5 stations show no change in the correlation coefficient due to assimilation. The remaining 6 stations show lower correlation coefficient values after data assimilation.

Table 2 Comparison of soil moisture RMSE and correlation coefficient (R) of CNTRL run, EXP run, and IMDAA at 5-cm depth at 22 stations for the year 2012
Table 3 Same as Table 2 but for winter season
Table 4 Same as Table 2 but for Pre-monsoon season
Table 5 Same as Table 2 but for Monsoon season
Table 6 Same as Table 2 but for post-monsoon season

The irrigation map of India for the year 2012 is shown in Fig. 7d (Devanand et al. 2019). The figure shows that the most irrigated areas are over the Indo-Gangetic plain, while the least irrigated areas are over the south-western region of India. Table 7 groups the 22 IMD in situ stations into three classes, namely low irrigated (< 6 mm), moderately irrigated (6 to 14 mm), and highly irrigated (above 14 mm), based on Fig. 7d. Three of the highly irrigated stations (Basti, Ranchi, and Solapur) show a positive impact due to the assimilation of ASCAT soil moisture, in terms of lower RMSE of soil moisture for the EXP run as compared to the CNTRL run (refer Tables 2 to 6). Since the ASCAT soil moisture values have a signature of irrigation effects (Nair and Indu 2019; Zhang et al. 2018), one would expect that over highly irrigated regions, assimilation of ASCAT soil moisture would result in much-improved estimates closer to the ground truth. This hypothesis is supported by the lower RMSE values of the EXP run as compared with the CNTRL run for each of the four seasons as well as for the annual average. Another highly irrigated station, Karnal, shows the same RMSE values of soil moisture before and after assimilation (refer Table 2), indicating that there is no degradation of the soil moisture estimates after data assimilation.

Fig. 7

Squared correlation coefficient (R2) from the triple collocation method for (a) the CNTRL run and (b) the EXP run; (c) the change (CNTRL-EXP) in squared correlation coefficient between the CNTRL and EXP runs; (d) the irrigation map of India in mm for the year 2012; and the fractional RMSE of (e) the CNTRL run and (f) the EXP run from the triple collocation method

Table 7 Classification of IMD in situ stations based on irrigation

Among the 11 moderately irrigated stations, seven (Bhubaneswar, Ludhiana, Durgapura, Sagar, Bellari, Chatha, and Vedasundar) show a positive impact due to the assimilation of ASCAT soil moisture, in terms of lower RMSE of soil moisture for the EXP run as compared to the CNTRL run. These moderately irrigated stations thus show improved soil moisture estimates after assimilation of ASCAT soil moisture, which has the signature of irrigation. It is pertinent to note that all seven stations show a positive impact irrespective of the rainfall received: some received below-normal rainfall (the departure of annual rainfall from normal is −13% for Bellari, −17% for Bhubaneswar, and −23% for Sagar), some received normal rainfall (Durgapura, Jaipur district, has a 10% departure of annual rainfall from normal and Chatha, Jammu district, has a 5% departure), and some received deficit rainfall (Vedasundar, Dindigul district, has a −32% departure of annual rainfall from normal and Ludhiana has a −55% departure). Two other moderately irrigated stations (Anakapalle and New Delhi) have a very small difference in the RMSE of soil moisture before and after ASCAT soil moisture assimilation (0.007 and 0.001).

One would expect a negative impact due to the assimilation of ASCAT soil moisture over low irrigated stations, considering that the ASCAT soil moisture has the signature of irrigation effects. This expectation turns out to be true for four (Rahuri, Pune, Niphad, and Bhopal) of the six stations that have low irrigation values. All four stations have a higher RMSE of soil moisture after data assimilation as compared to the CNTRL run for the annual average (refer Table 2).

A small number of stations deviate from the above hypothesis. Sabour, a station in Bhagalpur district in the state of Bihar, shows a negative impact after data assimilation despite being a highly irrigated station, with a higher RMSE value of soil moisture for the EXP run as compared to the CNTRL run (refer Tables 2 to 6). Similarly, two of the moderately irrigated stations (Udaipur and Nagpur) show a negative impact after data assimilation. Furthermore, two of the low irrigated stations (Vellanikara and Vittal) show a positive impact after data assimilation.

It is clear that a substantial majority of the stations (17 out of 22), including those with no or very small impact, behave as expected: assimilation of ASCAT soil moisture has a positive impact at most of the highly or moderately irrigated stations, while it has a negative impact at most of the stations with very low irrigation levels. Overall, the above results confirm the notion that the ASCAT soil moisture has the signature of irrigation effects and that assimilating the ASCAT soil moisture would invariably lead to an improved soil moisture estimate closer to the ground truth.

Thirteen of the 22 stations show larger values of RMSE of IMDAA soil moisture as compared to the RMSE of both model runs (CNTRL and EXP runs), while for 6 stations the RMSE of IMDAA soil moisture is less than the RMSE of both model runs. For 3 stations, the RMSE of IMDAA soil moisture is intermediate between the RMSE of the model runs (higher than CNTRL and lower than EXP). The soil moisture correlation of IMDAA is lower for 11 stations (out of 22) and higher for 7 stations as compared to the CNTRL and EXP runs. For 4 stations, IMDAA correlation values are either intermediate between or equal to those of the model runs (CNTRL and EXP). The above results may appear surprising considering that both IMDAA and the EXP run have assimilated ASCAT soil moisture observations. However, it is to be noted that the Richards equation for soil water used in the Noah LSM is not a linear partial differential equation, since the diffusion term is not linear. It is well known that EnKF data assimilation provides better results than EKF for nonlinear equations. The reason why a majority (13 out of 22) of stations report lower CNTRL RMSE soil moisture values as compared to IMDAA RMSE values is somewhat harder to explain. The soil moisture from the CNTRL run is obtained by integrating a relatively simple model (Noah LSM) with observed IMD rainfall forcings and realistic surface meteorology from GDAS. In contrast, the IMDAA regional reanalysis utilized the Unified Model, employed 4DVAR data assimilation for the atmospheric variables, and utilized EKF data assimilation for soil moisture fields. The LSM called Joint UK Land Environment System (JULES) is coupled to the Unified Model, and the soil moisture analysis for IMDAA is performed by the EKF method.
While it is indeed surprising that the CNTRL run (without any data assimilation) has lower soil moisture RMSE values than the IMDAA analysis, it has to be noted that rainfall is a very important forcing in the LSM; in the CNTRL run, the Noah LSM is forced with observed IMD rainfall, whereas the JULES LSM in the IMDAA analysis is forced by precipitation simulated by the Unified Model. Thus, despite the EKF assimilation of ASCAT soil moisture estimates in IMDAA, errors and uncertainties in the model-generated rainfall forcings may cause the IMDAA soil moisture analysis to show larger RMSE values than the CNTRL run. In general, most data assimilation methods such as EnKF and EKF assume that the random errors of the model and observations are Gaussian. Also, the land surface exhibits characteristics such as (i) heterogeneity (smaller spatial scales as compared to the atmosphere and ocean), (ii) non-linearity, and (iii) non-Gaussianity (e.g., the hydrological cycle) that pose serious challenges when applying data assimilation methods. These features of the land surface, especially non-linearity and non-Gaussianity, have important implications when applying Kalman filter methods (Lahoz and Schneider 2014).

The assimilated soil moisture estimate is further evaluated using the triple collocation (TC) method. For the TC analysis, two triplets of soil moisture data sets have been selected. The GLDAS and MERRA model soil moisture estimates are common to both triplets, while the third data set is the soil moisture from the CNTRL run in the first triplet and from the EXP run in the second. All members of the first triplet (GLDAS CLSM, MERRA, and CNTRL run) are devoid of any signature of irrigation effects (Nair and Indu 2019). In the second triplet (GLDAS CLSM, MERRA, and EXP run), the first two data sets likewise lack any signature of irrigation effects, whereas the soil moisture obtained from the EXP run carries the signature of irrigation because ASCAT soil moisture has been assimilated in that run. One would therefore expect the TC analysis with soil moisture from the EXP run to exhibit high errors and low correlations over the irrigated regions (Nair and Indu 2019; Kolassa et al. 2017). In contrast, the TC analysis with soil moisture from the CNTRL run is expected to exhibit low errors and high correlations over the irrigated regions.

The TC analysis is performed for the winter and pre-monsoon seasons, the months that precede the chief rainfall season over India. The squared correlation coefficient (R²) is calculated following McColl et al. (2014). Figures 7a and b depict the R² values from the TC analysis of the two triplets that use soil moisture estimates from the CNTRL run and the EXP run, respectively, as the third data set. Figure 7c depicts the difference in R² between the two triplets. From Fig. 7c, it is clear that the assimilated soil moisture simulation shows lower R² values over highly irrigated areas (refer to Fig. 7d) than the triplet that uses the soil moisture from the CNTRL run. The lower R² values of the triplet with the EXP run are most pronounced and spatially widespread over the Northwest India region, where the irrigated areas are also spatially widespread (refer to Fig. 7d). Over Northeast India, except for the states of Bihar and Jharkhand, the two triplets do not differ significantly. This is attributed to the fact that most of the regions in Northeast India have low irrigation levels. However, over the states of Bihar and Jharkhand, Fig. 7a and b show higher R² values for the CNTRL run than for the EXP run, since these states are highly irrigated. Over south Peninsular India, regions over the Western Ghats show no change in R² after the assimilation of ASCAT soil moisture. Over Central India, meteorological subdivisions such as Saurashtra and Kutch, Gujarat, West Madhya Pradesh, and Chhattisgarh are highly irrigated, and these regions show low R² values after data assimilation.
Furthermore, the error values in the TC analysis are calculated as the fractional root mean square error (fRMSE) proposed by Draper et al. (2012). Figures 7e and f show the fRMSE values of the TC analysis for the two triplets that use soil moisture from the CNTRL run and the EXP run, respectively. From Fig. 7e and f, it is clear that the highly irrigated areas (refer to Fig. 7d) show high fRMSE values after data assimilation. High fRMSE values are not prominent over the regions where the irrigation effects are lower. The above discussion demonstrates that the TC analysis employed in this study effectively captures the signature and importance of irrigation in the soil moisture data set of the EXP run, which had assimilated ASCAT soil moisture.
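
The TC quantities discussed above can be illustrated with a minimal sketch. It assumes the covariance-based formulation of extended triple collocation associated with McColl et al. (2014) and takes fRMSE to be the error standard deviation divided by the standard deviation of the data set itself; whether this matches the exact implementation used in this study is an assumption. The function names (`covariance`, `triple_collocation`, `synthetic_triplet`) and the synthetic series are hypothetical and purely illustrative; they are not the data of this study.

```python
import math
import random


def covariance(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def triple_collocation(x, y, z):
    """Return [(rho2, frmse), ...] for x, y, z, in that order.

    rho2  : squared correlation of each data set with the unknown truth.
    frmse : error standard deviation divided by the data set's own
            standard deviation (assumed definition of fRMSE).
    Assumes mutually uncorrelated errors and a common underlying signal.
    """
    series = (x, y, z)
    q = [[covariance(a, b) for b in series] for a in series]
    results = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        # Variance of data set i that is explained by the common signal.
        signal_var = q[i][j] * q[i][k] / q[j][k]
        rho2 = signal_var / q[i][i]
        # Sampling noise can make the error variance slightly negative.
        err_var = max(q[i][i] - signal_var, 0.0)
        results.append((rho2, math.sqrt(err_var / q[i][i])))
    return results


def synthetic_triplet(n=20000, seed=0):
    """Hypothetical triplet: one shared signal, independent Gaussian errors."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [t + rng.gauss(0.0, 0.2) for t in truth]
    y = [0.8 * t + rng.gauss(0.0, 0.3) for t in truth]
    z = [1.2 * t + 0.1 + rng.gauss(0.0, 0.6) for t in truth]
    return x, y, z


if __name__ == "__main__":
    labels = ("x", "y", "z")
    for label, (rho2, frmse) in zip(labels, triple_collocation(*synthetic_triplet())):
        print(f"{label}: rho2={rho2:.3f}, fRMSE={frmse:.3f}")
```

Under these definitions the two metrics are linked by fRMSE² = 1 − R², so a data set that departs from the signal shared by the other two members of the triplet receives both a lower squared correlation and a higher fRMSE. In the synthetic example, the third series has the largest error relative to its signal and therefore the lowest squared correlation of the three.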

4 Conclusion

This study has assimilated the ASCAT near-surface soil moisture into the Noah LSM using the EnKF data assimilation technique and assessed the impact of assimilation over the Indian domain for the year 2012, using the forecast impact parameter and the improvement parameter with respect to IMDAA data. Furthermore, the assimilated soil moisture (EXP run) is validated against IMD in situ soil moisture stations. The results indicate that 12 of the 22 stations show reduced soil moisture RMSE after data assimilation. Also, 11 of the 22 stations report higher soil moisture correlation coefficients after data assimilation.

Since the model output is subject to errors from atmospheric forcings, initial conditions, model discrepancies, and model deficiencies (such as the omission of some important effects), the CNTRL run soil moisture overestimates the IMDAA soil moisture. The most significant improvements due to assimilation are found over the western parts of the Central Indian region during the pre-monsoon season and are associated with large negative differences between ASCAT and CNTRL soil moisture values. These regions experienced lower rainfall rates and have non-irrigated croplands, which contributes to the reduced overestimation of the CNTRL soil moisture after assimilation. The smallest improvements in soil moisture due to data assimilation are seen during the monsoon season, possibly because irrigation requirements are minimal during the rainy season. Most of the stations that are highly or moderately irrigated show reduced soil moisture RMSE after assimilation with respect to IMD in situ data, while the majority of the stations with low irrigation levels show a negative impact of data assimilation. This indicates that the effects of irrigation reflected in the ASCAT soil moisture data contribute to improved soil moisture states; hence, incorporating the effect of irrigation in LSMs may improve the model soil moisture estimates.

The impact of irrigation on the assimilated soil moisture is further evaluated using the triple collocation (TC) method. The TC analysis is performed using GLDAS CLSM and MERRA Land data, with the EXP and CNTRL run soil moisture estimates as the third data set of the two respective triplets. As the GLDAS CLSM and MERRA data do not carry the signature of irrigation, the assimilated soil moisture shows low correlation coefficient values and high fRMSE values over highly irrigated regions. These results indicate that the TC analysis has effectively captured the impact of irrigation on data assimilation.

Land surface characteristics such as land cover and soil texture are predefined in the Noah LSM, and each grid point is assigned a single land cover class and a single soil texture class. In reality, however, these parameters are spatially heterogeneous even within a grid cell. In addition, the variation of soil texture with soil depth is not modelled in the Noah LSM. Furthermore, the one-dimensional Noah LSM used in the present study is not adequate to describe land surface interactions in the horizontal; it can only provide the vertical variations of soil moisture and soil temperature over the various soil depths. Water that drains from the bottom layer is removed immediately in the Noah LSM, which reduces the model's memory of preceding weather and climate changes. Since the Noah model has a shallow soil column, it is unable to capture the soil critical zone (up to 5-m depth). The Noah LSM also has a combined soil and vegetation surface, which makes it difficult to implement a dynamic leaf model in the Noah LSM.