1 Introduction

Surface characteristics are largely dependent on soil type, vegetation cover, leaf area index and vegetation type. Land-use patterns considerably affect surface temperature and, consequently, the atmospheric temperature profile. Impact studies with regional climate models using changes in land-use patterns are reported in Ray et al. (2010). Several studies (Pan 1990; Chen et al. 1997; Arora 2002; Gerten et al. 2004) have shown that the leaf area index and total green fraction are important factors in changes to local weather conditions and eventually to regional weather and climate. The green fraction contributes to the evapo-transpiration mechanism; vegetation controls the soil moisture, which has a significant direct impact on the relative humidity profile of a region (Kar and Ramanthan 1990; Dudhia 1996). As different vegetation cover leads to different roughness length, air flow above the land surface is affected, which is directly linked to small eddies and the vertical turbulent flux (Pan and Wu 1995) over a given area. Irrigated landscapes can alter the regional surface energy balance and its associated temperature, humidity, and climate (Roy et al. 2007), which influence parameters such as soil moisture, rainfall and drag coefficients. Therefore, vegetation cover is one of the most fundamental components in a model (Betts et al. 1996).

The planetary boundary layer (PBL) also varies as per the vegetation and soil surface characteristics (Pan and Mahrt 1987; Kar and Ramanthan 1989). The performance of a numerical weather prediction (NWP) model is dependent on the input data of both the surface and the upper air. The surface data is heavily dependent on static as well as dynamic fields on and near the surface. Vegetation on the land surface directly influences most of the parameters, including the temperature and wind flow over it. As a result, much of the local weather change has a direct relationship with vegetation and the bio-geosphere as a whole. Therefore, vegetation cover should be featured appropriately in any NWP model so that the model captures realistic surface parameters. As the vegetation fraction has a direct influence on the variability of atmospheric conditions, proper use of vegetation data for model simulation needs to be ensured for achieving a desirable model setup and improved interactive processes (Ek et al. 2003).

As a mesoscale model is very sensitive to changes in land-use patterns (Niyogi et al. 1999), compilation of proper surface data is of utmost importance. Several previous studies investigated the role of land surface processes and the mechanisms that govern land–atmosphere interactions in the monsoon systems (e.g., Meehl 1994; Liu and Wu 1997; Lau and Bua 1998). Yang and Lau (1998) found that land surface has substantial but limited effects at local scales. Over the Indian region, vegetation type and soil moisture undergo rapid and significant variations, especially during the southwest monsoon period. There is also considerable interannual variability in the vegetation cover, which is crucially dependent on the amount of rainfall received during the southwest monsoon. In view of the importance of land surface forcing, accuracy of land-use information is critical in obtaining improved simulations over this monsoon domain. Niyogi et al. (2010) have examined observational data to relate the Indian monsoon rainfall activity to land-use change.

The South Asian Regional Reanalysis (SARR) project is being undertaken to prepare an atmospheric reanalysis data set over South Asia in which consistency between hydroclimate and atmospheric parameters is ensured. All possible and available data are being used in the preparation of the reanalysis dataset. The WRF model and its assimilation system WRF-VAR are being used for the reanalysis. Several sensitivity experiments have been carried out to choose the physical parameterization schemes before the final reanalysis. There has been no study on the sensitivity of the WRF model to vegetation green fraction for the Indian monsoon simulations. Moreover, there have been no studies describing the relative role of observed data and their assimilation compared to the role of land surface parameterization schemes.

The objective of this study is to document the results of a series of isolated experiments using land surface models (LSMs), PBL schemes, and land cover schemes, together with a control run. Experiments have also been carried out using the ISRO vegetation cover data (Shefali et al. 2003; Oza et al. 2006) in place of the default USGS climatological vegetation cover data used by the WRF model. With such isolated comparisons, the capabilities of the WRF model in simulating the regional hydroclimate over the South Asian region during the Indian monsoon season can be documented. A set of data assimilation experiments has been carried out for 1 year (one annual cycle) using two different land surface parameterization schemes in order to highlight the relative importance of land surface schemes in describing the regional hydroclimate. In sensitivity studies using numerical models, experimental design plays an important role in isolating the various factors for which experiments are conducted. If multiple parameters are changed in some of the experiment runs, a factor separation approach (Stein and Alpert 1993; Dearden 2009) is needed in order to explore the interaction between variations in different factors. In this study, single parameters (from among the physical parameterization schemes) are changed sequentially to evaluate the results. Brief descriptions of the model, the experiments carried out and the data used in the study are provided in Sect. 2. Results of the sensitivity experiments are presented in Sect. 3, while data assimilation experiments and their results are presented in Sect. 3.6. The study is concluded in Sect. 4.

2 Data Assimilation, Model, Numerical Experiments and Data

2.1 The Data Assimilation Scheme

The three-dimensional variational (3DVAR) data assimilation system within the WRF modelling framework is used in this study. The basic goal of 3DVAR data assimilation is to produce an optimal estimate of the true atmospheric state at analysis time via iterative solutions of a prescribed cost function (Parrish and Derber 1992). A detailed description of the WRF-3DVAR data assimilation system used in this study can be found in Barker et al. (2004). Background error statistics have been computed for different seasons over the chosen domain following Routray et al. (2013) and Sowjanya et al. (2013).
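For reference, the cost function minimised in 3DVAR is commonly written in the standard form below. The notation is the conventional one and is assumed here rather than taken from this study; Barker et al. (2004) should be consulted for the formulation actually implemented in WRF-3DVAR.

```latex
J(\mathbf{x}) = \frac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}-\mathbf{x}_b)
              + \frac{1}{2}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)^{\mathrm{T}}\,\mathbf{R}^{-1}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)
```

Here x is the analysis state, x_b is the background (first guess), y is the vector of observations, H is the observation operator that maps the model state to observation space, and B and R are the background and observation error covariance matrices. The background error statistics mentioned above correspond to B in this notation.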

2.2 Model and Numerical Experiments

For this study, the WRF Version 3.1.1 model (Skamarock et al. 2005) has been used with a horizontal resolution of 25 km and 38 sigma levels in the vertical. A series of numerical experiments has been carried out in two stages by considering parameterization schemes among the multitude of physics options available in the model. It may be noted here that newer versions of the WRF model are now available. However, for the purposes of conducting the SARR reanalysis, model Version 3.1.1 was frozen and all the experiments reported in this study are carried out using this version.

Before these experiments were carried out, another set of sensitivity experiments was performed to finalize the model domain. After careful examination, the computational domain was finalized such that the effect of boundaries does not contaminate the mesoscale perturbations that develop over the South Asian domain of interest. As a result, the lateral boundaries have been kept far away from the domain of interest. Additional runs were carried out using a nested domain. It was clearly seen that even if the coarser resolution NCEP reanalysis data were used as boundary conditions, the model can generate locally-driven circulation within 24 h of model integration, which is not affected by the lateral boundaries kept far away from the domain of interest. Results of these experiments are not described further, as the main objectives of this study are to document the impact of land surface processes and data assimilation experiments.

2.2.1 Modeling Experiments

For this study, four numerical experiments have been carried out with the four possible combinations of two PBL and two land-surface parameterization schemes. The two PBL schemes are the Yonsei University scheme (YSU; Hong et al. 2006) and the Mellor–Yamada–Janjic TKE scheme (MYJ; Janjic 2002). The two land-surface schemes are (1) the Unified NOAH land-surface model (Chen and Dudhia 2001) with soil temperature and moisture in four layers, fractional snow cover and frozen soil physics, and (2) the slab model with simple soil thermal diffusion (i.e., thermal diffusion scheme; TD) (Chen and Dudhia 2001). All these runs use the Kain–Fritsch (KF) scheme (Kain and Fritsch 1993; Kain 2004) for cumulus convection. The Dudhia (1989) short-wave radiation scheme and the Rapid Radiative Transfer Model (RRTM; Mlawer et al. 1997) long-wave radiation scheme are used for these experimental runs. In addition, these experiments use the default USGS data provided with the model for surface boundary conditions. Out of these four experiments, the run with the NOAH land-surface model and the MYJ PBL scheme is treated as the control run. Two sets of control simulations were carried out to examine how dynamic downscaling using the WRF model brings out the interannual variability of the Indian summer monsoon. These runs were carried out for each day of July 1994 and July 1999 using National Centers for Environmental Prediction (NCEP) reanalysis data (2.5° × 2.5°) as initial and boundary conditions (Kalnay et al. 1996). The control simulations for 1994 and 1999 are referred to as CNTL94 and CNTL99, respectively, in the following text. The details of the model configuration for the control runs are given in Table 1.

Table 1 Model configuration used
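The 2 × 2 design of PBL and land-surface schemes described above can be enumerated with a short script. This is only an illustrative sketch: the experiment labels are hypothetical, and the namelist variable names and integer codes are assumed from the standard WRF physics options rather than taken from this study, so they should be checked against Table 1 and the WRF documentation for Version 3.1.1.

```python
from itertools import product

# Integer codes assumed from the standard WRF namelist physics options;
# they are not stated in the paper and should be verified before use.
PBL_SCHEMES = {"YSU": 1, "MYJ": 2}    # bl_pbl_physics
LSM_SCHEMES = {"TD": 1, "NOAH": 2}    # sf_surface_physics

# Options shared by all four runs (KF cumulus, Dudhia short-wave, RRTM long-wave).
COMMON = {"cu_physics": 1, "ra_sw_physics": 1, "ra_lw_physics": 1}


def build_experiments():
    """Return one settings dictionary per LSM/PBL combination."""
    experiments = []
    for (lsm, lsm_id), (pbl, pbl_id) in product(LSM_SCHEMES.items(),
                                                PBL_SCHEMES.items()):
        namelist = dict(COMMON, sf_surface_physics=lsm_id, bl_pbl_physics=pbl_id)
        experiments.append({
            "name": f"{lsm}_{pbl}",  # hypothetical label, not the paper's naming
            "is_control": lsm == "NOAH" and pbl == "MYJ",
            "namelist": namelist,
        })
    return experiments


EXPERIMENTS = build_experiments()

if __name__ == "__main__":
    for exp in EXPERIMENTS:
        print(exp["name"], "(control)" if exp["is_control"] else "", exp["namelist"])
```

Keeping the shared options in one dictionary makes it explicit that only the PBL and land-surface choices differ between runs, which is the premise of the single-parameter comparisons in this study.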

The WRF model uses USGS vegetation fraction data. The USGS data are climatological and are not updated in real time. ISRO estimated the remotely-sensed vegetation cover from multi-date SPOT (Satellites Pour l’Observation de la Terre or Earth-observing Satellites) vegetation data. These data, at a spatial resolution of about 1 km, were re-aggregated/regrouped to the 25 USGS classes and spatially aggregated to a 10′ grid size. This monthly vegetation fraction is generated from a temporally filtered 10-day NDVI composite of SPOT vegetation data over the Indian region (Shefali et al. 2003; Oza et al. 2006). These vegetation data are referred to as ISRO vegetation in the following text. Dutta et al. (2009) carried out a study with ISRO vegetation fraction data in simulations of the MM5 model and showed some improvements in terms of the rainfall estimates and regional coverage. The impact of vegetation green fraction on the model simulations has been examined by replacing the USGS vegetation cover data with that of ISRO.

2.2.2 Assimilation Experiments

Two sets of data assimilation experiments were carried out using the conventional observed data in the assimilation cycle. These experiments used the WRF 3DVAR assimilation package. The same version of the WRF model was used in the assimilation experiments with the horizontal resolution of 25 km. The first experiment (ASSIM) used the NOAH land surface scheme. The second experiment (AS_TD) used the TD land surface scheme of the model. The NCEP reanalyses are used as lateral boundary conditions and the model 6 h forecast is considered to be the first guess in the assimilation experiments. In the present study, the data assimilation has been carried out in cyclic mode with four cycles a day where new data has been ingested every 6 h (Sowjanya et al. 2013; Routray et al. 2010a, b).
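The cyclic mode described above, with four analyses a day at 6 h intervals and a 6 h forecast as first guess, can be sketched as a simple list of analysis times. The function names and the assumption that the first cycle is at 00 UTC on 1 January 1994 and the last at 18 UTC on 31 December 1994 are illustrative and not stated in the text.

```python
from datetime import datetime, timedelta


def assimilation_cycles(start, end, step_hours=6):
    """Return analysis times from start to end (inclusive), step_hours apart."""
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += timedelta(hours=step_hours)
    return times


def first_guess_start(analysis_time, forecast_hours=6):
    """Initial time of the forecast that serves as first guess for this analysis."""
    return analysis_time - timedelta(hours=forecast_hours)


# Assumed start and end of the one-year experiment (hours are an assumption).
CYCLES_1994 = assimilation_cycles(datetime(1994, 1, 1, 0), datetime(1994, 12, 31, 18))

if __name__ == "__main__":
    print(len(CYCLES_1994), "analysis cycles")
    print("first guess for", CYCLES_1994[1], "starts at", first_guess_start(CYCLES_1994[1]))
```

Under these assumptions the annual cycle comprises four analyses for each of the 365 days of 1994.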

Conventional observations have been assimilated into the WRF model using the 3DVAR data assimilation method over the Indian region for 1994, and the annual cycle of hydroclimate evolution has been studied. Observations used in this study are taken from the NCAR data support section (http://dss.ucar.edu) as well as from the Global Telecommunication System (GTS) archived data at National Centre for Medium Range Weather Forecasting (NCMRWF). The average number of data ingested and types of data in the assimilation system at 00 UTC and 12 UTC in July 1994 are shown in Fig. 1a and b, respectively. On average, sonde observations at various pressure levels are received from more than 100 stations in the assimilation domain. These are mostly at 00 and 12 UTC. The observational dataset has a fair amount of atmospheric motion vectors (AMVs) over this domain. Very few observations are received from ships and buoys, etc. in the oceanic region. Details of modelling and assimilation experiments and their intended goals are provided in Table 2.

Fig. 1

Average number of observations assimilated during July 1994. a at 00 UTC; b at 12 UTC

Table 2 Description of numerical experiments

3 Results and Discussion

3.1 Control Runs

One of the key objectives of the SARR project is to have a high-resolution analysis of atmospheric data, which is consistent with the hydroclimate. Precipitation drives surface temperature, surface evaporation and runoff, etc. Along with the land surface scheme, parameterization of the PBL would also play a significant role as the PBL scheme transfers the heat, momentum and moisture flux from the surface to the atmosphere. Two sets of control experiments have been conducted for July 1994 and July 1999. These experiments are essentially dynamic downscaling runs in which coarse-resolution global reanalysis data have been downscaled using the WRF model. For these two July months salient features of atmosphere and hydroclimate were studied from the observed data and the control model simulations. Figure 2a and c show the monthly average 24-h precipitation forecast (mm/day) and winds at 850 hPa from the WRF model for July 1994, respectively. Similar plots for July 1999 are shown in Fig. 2b and d, respectively. The observed rainfall from the India Meteorological Department (IMD; Rajeevan et al. 2006) and coarse resolution (2.5° × 2.5°) GPCP precipitation (Huffman et al. 2007) for July 1994 and 1999 are depicted in Fig. 3a–d, respectively.

Fig. 2

a Monthly average 24-h forecast precipitation (mm/day) from the WRF model for July 1994 (CNTL94); b same as (a) but for July 1999 (CNTL99); c monthly mean winds at 850 hPa for July from CNTL94; d same as (c) but for CNTL99

Fig. 3

a Observed monthly mean precipitation for July 1994; b same as (a) but for July 1999; c and d are observed precipitation from GPCP for July 1994 and 1999, respectively; e observed and CNTL94 precipitation difference and f observed and CNTL99 precipitation difference

Limitations of dynamic downscaling using global reanalysis products and the WRF model are evident from a comparison of Figs. 2 and 3. As is well known, the Indian monsoon in July is characterized by zones of maximum rainfall, viz. the west coast, eastern India, central India, etc. Over the west coast, there is a maximum in observed rainfall with a monthly mean of more than 20 mm/day. Day-to-day variations of rainfall in this region are quite large, and occasionally rainfall of about 200–300 mm/day is also received. The eastern and east-central parts of India are the other regions in which a rainfall maximum is seen during July 1994 in the observed data. During this month in both 1994 and 1999, this region received about 8–20 mm/day of rainfall, as seen in the figures. In the observed rainfall plots, rainfall maxima are seen in the north Bay of Bengal and along the Arakan coast. As the GPCP precipitation has a lower resolution than the IMD precipitation data, most of the regional details are missing in the GPCP data for both 1994 and 1999. However, interannual variability is clearly evident over the Indian land and oceanic regions for 1994 and 1999 in both the IMD and GPCP precipitation plots. It appears that the monsoon is rather weak over most parts of India, especially over the Western Ghats, in the CNTL94 run. In July 1999, the observed and model rainfall amounts over the central and eastern parts are less than those in 1994. The model precipitation shows mixed capabilities as far as the 24-h forecast is concerned. While most of the major rainfall zones are simulated well by the model, there are deficiencies. Over the Gangetic plain, the model simulates less rainfall than observed. For the July 1999 run, the rainfall zone appears to have moved northward, with a precipitation maximum along the foothills of the Himalayas and Nepal that is not seen in the observed data. The model also overestimates precipitation over the oceanic regions in both CNTL94 and CNTL99, which is likewise not seen in the observations.

The monthly mean 24-h forecast of winds at 850 hPa for July 1994 shows that the speed of the south-westerly Somali jet is grossly underestimated in the downscaled run. For July 1999, the monsoon winds are reasonably simulated in the 24-h runs. The south-westerly monsoon flow over the Arabian Sea is well simulated by the model (observed or analyzed winds are not shown in the figure). The core of the low-level jet off the Somalia coast attains a peak speed of more than 20 m/s. A weak trough over the Bay of Bengal in July 1999 is also seen and agrees reasonably well with the NCEP reanalysis data (figure not shown). Differences of rainfall simulations from IMD observed data for July 1994 and 1999 are shown in Fig. 3e and f, respectively. The figures show that over most parts of India the model underestimates rainfall in 1994, except over north-east India and the adjoining northern parts of West Bengal. Over the west coast, the model simulates about 6–8 mm/day less rain than observed. Although the simulated rainfall values in the CNTL99 run are less than those observed over most parts of India, the difference is not as large as that seen in the CNTL94 run. The model overestimates precipitation over the Himalayan foothills and the eastern coast of India. The RMSE values from all the experiments are shown in Table 3 for the region within latitudes 18–25N and longitudes 73–81E. For the CNTL99 run, the RMSE for precipitation is 5.1 mm/day and for temperature at 2 m it is 1.61 °C. These RMSE values form the benchmark for all subsequent experiments in this study.

Table 3 Area averaged RMSE of precipitation (mm) and temperature (°C) over the box Lat: 18–25N and Lon: 73–81E from July 1999
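An area-averaged RMSE over the verification box (18–25N, 73–81E) could be computed along the following lines. This is a minimal sketch that assumes the model and observed fields are already on the same regular latitude/longitude grid and that all grid points are weighted equally; the text does not state how the fields were regridded or whether any area weighting was applied. The function name and the synthetic test fields are hypothetical.

```python
import numpy as np


def box_rmse(model, obs, lats, lons,
             lat_bounds=(18.0, 25.0), lon_bounds=(73.0, 81.0)):
    """Unweighted RMSE between two (lat, lon) fields inside a lat/lon box.

    Grid points where either field is NaN (e.g. no observation) are ignored.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_mask = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_mask = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    box = np.ix_(lat_mask, lon_mask)          # select rows and columns in the box
    diff = model[box] - obs[box]
    return float(np.sqrt(np.nanmean(diff ** 2)))


# Synthetic illustration only (not data from the study): a uniform 2 mm/day
# model bias inside the box and one large error far outside it.
lats = np.arange(10.0, 31.0, 1.0)
lons = np.arange(65.0, 91.0, 1.0)
obs = np.zeros((lats.size, lons.size))
model = obs + 2.0
model[0, 0] = 1000.0                          # at 10N, 65E, outside the box
example_rmse = box_rmse(model, obs, lats, lons)

if __name__ == "__main__":
    print(f"RMSE over the box: {example_rmse:.2f}")
```

Restricting the calculation to the box means that errors elsewhere in the domain, such as those over the Himalayan foothills, do not influence the tabulated values.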

3.2 Impact of Data Assimilation

The main objective of this section is to document the impact of data assimilation on the monsoon hydroclimate. Observed data used in this study are described in Sect. 2.2. Figure 4 shows the average root mean square error (RMSE) between observation and analysis (O–A) as well as between observation and background (O–B) for each type of observation at 00 and 12 UTC. It is seen from the figure that the O–A RMSE is always smaller than the O–B RMSE for each cycle of assimilation and each type of observation. This means the final analysis is closer to the observations than the background field (first guess from the model). Due to assimilation of sonde data at 00 UTC, the temperature RMSE between observation and final analysis is reduced to about 1° over the region, compared to about 1.7° for the observation and background RMSE. Wind errors are also reduced from about 4 to about 3 m/s due to assimilation of radiosonde winds. Similar improvements are seen when pilot observations, AMVs from geostationary satellites (Geoamv), or aircraft reports (AIREP) are assimilated. Improvements are consistent at both 00 UTC and 12 UTC. Figure 4 also shows the average of all O–B and O–A, including the impact of all observations in the domain of interest. In this case, too, O–A is smaller than O–B, indicating that all the available observations are ingested properly into the 3DVAR-produced reanalysis data.

Fig. 4

Mean RMSE of observation and background (O–B) and observation and analysis (O–A) for a temperature (T) at 00 UTC; b T at 12 UTC; c zonal wind (u) at 00 UTC; d u at 12 UTC; e meridional wind (v) at 00 UTC; f v at 12 UTC
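The O–B and O–A statistics of the kind shown in Fig. 4 can be illustrated with a few lines of code. The sketch below assumes that the background and the analysis have already been interpolated to the observation locations; the function names and the numbers are hypothetical and are not taken from the assimilation system or the data used in this study.

```python
import numpy as np


def rmse(departures):
    """Root mean square of a set of departures."""
    d = np.asarray(departures, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


def departure_stats(obs, background_at_obs, analysis_at_obs):
    """RMSE of observation-minus-background and observation-minus-analysis."""
    obs = np.asarray(obs, dtype=float)
    omb = obs - np.asarray(background_at_obs, dtype=float)
    oma = obs - np.asarray(analysis_at_obs, dtype=float)
    return {"omb_rmse": rmse(omb), "oma_rmse": rmse(oma)}


# Synthetic temperatures in kelvin, for illustration only.
obs = np.array([300.0, 295.0, 290.0])
background = obs + np.array([2.0, -2.0, 2.0])   # first guess departs by 2 K
analysis = obs + np.array([1.0, -1.0, 1.0])     # analysis departs by 1 K
stats = departure_stats(obs, background, analysis)

if __name__ == "__main__":
    print(stats)
```

An O–A RMSE smaller than the O–B RMSE, as in this synthetic case, is the signature that the analysis has been drawn towards the observations.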

As mentioned earlier, two sets of data assimilation experiments were carried out using the same set of observed data. The assimilation experiments began on January 1, 1994 and continued until December 31, 1994. The first experiment (ASSIM) used the NOAH land surface scheme. Monthly mean precipitation for July 1994 from ASSIM has been compared with observed data and with that from CNTL94 (Fig. 2) in order to examine the relative role of data assimilation. In addition, monthly mean temperature at 2 m (T2m) from the assimilation experiment has been compared with observations. These results are shown in Fig. 5. It may be noted that the CNTL94 run does not assimilate any observed data (shown in Fig. 1) and only downscales the global reanalysis data using the WRF model. Additionally, the ASSIM precipitation is the 6-hourly model precipitation accumulated on a daily basis. The precipitation pattern is clearly much improved in the ASSIM run compared to the CNTL94 run. Rainfall over most parts of India, especially over the Western Ghats, has increased in the ASSIM run. Differences between the observed and ASSIM precipitation (shown in Fig. 5b) indicate that the model still overestimates precipitation over the north eastern part of India. Rainfall over central India is underestimated. Monthly mean temperature at 2 m (Fig. 5c) indicates the air temperature is about 32°–35° or more over most parts of India. Over the western coast it is about 20°–23°. Differences between observations and the ASSIM run (Fig. 5d) show the model has a large bias over the foothills of the Himalayas and northern India (especially over the Jammu and Kashmir, Himachal and Uttarakhand regions). The model bias is more than 10° over this region, indicating the need for an orography-related correction of the temperature field in the model. Over peninsular India, the model has a warm bias ranging from 2° to 4°. As shown in Table 4, the area-averaged RMSEs of rainfall and temperature at 2 m over the box bounded by latitudes 18N–25N and longitudes 73E–81E for the assimilation run are 6.06 mm and 1.54 °C, respectively.

Fig. 5

Monthly mean precipitation (mm/day) for July 1994 from a ASSIM; b difference between observed and ASSIM precipitation; c temperature at 2 m; d difference between observed and ASSIM temperature

Table 4 Area averaged RMSE of rainfall (mm) and temperature at 2 m (°C) over the box Lat: 18–25N and Lon: 73–81E in assimilation experiments for July 1994

3.3 Impact of Vegetation Green Fraction

Figures 6a and b show the vegetation green fraction data (%) from the USGS (used in the CNTL99 run) and ISRO for July 1999. Differences between these two datasets are shown in Fig. 6c. The plot of vegetation difference shows enhanced vegetation in ISRO data in all of the Indian subcontinent, except for the north-eastern and Himalayan regions. It also shows less vegetation over northern Myanmar, Vietnam and Thailand during July 1999. As seen from the plots, the USGS vegetation climatology underestimates the vegetation green fraction data over a major part of this domain; the USGS data being climatological, it does not consider interannual variability in the vegetation cover.

Fig. 6

Vegetation green fraction data (%) for July 1999 a from the USGS used in Control runs; b from the ISRO; c difference between these two datasets

Several simulated hydroclimate parameters such as temperature, latent heat flux, precipitation, runoff, and potential evapo-transpiration were compared between the run using ISRO vegetation fraction data and the control run using USGS data. Figure 7 shows the diurnal variation (monthly average) of temperature at 2 m (T2m, °C) and air temperature (T c, °C) at 950 hPa obtained from the model runs with these two vegetation datasets over the northwest (70E–80E and 25N–35N) and northeast regions of India and the adjoining Thailand region (90E–105E and 20N–30N). Over the northwest Indian region, the 2 m temperature is higher in the afternoon and night hours in the CNTL experiment (with USGS data) than that obtained with ISRO vegetation cover data. This demonstrates that enhanced vegetation cover tends to cool the surface. As shown in Fig. 7c, the impact is also seen at 950 hPa over the same region. Similarly, reduction of vegetation cover has increased the surface as well as air temperatures over northeast India and the adjoining Thailand region. This cooling/warming due to enhanced/reduced vegetation green fraction is seen during the entire diurnal cycle.

Fig. 7

Diurnal variation (monthly average) of temperature at 2 m (T2m) obtained from the model runs with USGS and ISRO vegetation datasets over a northwest region of India (70E–80E and 25N–35N); b northeast India and adjoining Thailand region (90E–105E and 20N–30N); c same as a but for air temperature (T c) at 950 hPa; d same as (b) but for air temperature (T c) at 950 hPa
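A monthly-average diurnal cycle such as that plotted in Fig. 7 can be obtained by grouping a box-averaged time series by hour of day. The sketch below is illustrative only: it assumes the temperature has already been averaged over the region of interest and that model output is available every 3 h, neither of which is specified in the text, and it uses synthetic values.

```python
import numpy as np


def diurnal_cycle(values, hours):
    """Average a time series by hour of day.

    Returns the sorted unique hours and the mean value at each of them.
    """
    values = np.asarray(values, dtype=float)
    hours = np.asarray(hours)
    unique_hours = np.unique(hours)
    means = np.array([values[hours == h].mean() for h in unique_hours])
    return unique_hours, means


# Synthetic series for a 31-day month with 3-hourly output (an assumption):
# a fixed dependence on the hour plus day-to-day anomalies that average to zero.
hours = np.tile(np.arange(0, 24, 3), 31)
day_anomaly = np.repeat(np.arange(31) - 15, 8)
t2m = 25.0 + hours / 10.0 + day_anomaly
cycle_hours, cycle_means = diurnal_cycle(t2m, hours)

if __name__ == "__main__":
    for h, m in zip(cycle_hours, cycle_means):
        print(f"{int(h):02d} UTC: {m:.2f}")
```

Applying the same function to the series from the USGS and ISRO runs and differencing the results would give the kind of comparison discussed above.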

PBL height simulated with USGS and ISRO vegetation data was examined (CNTL99 and ISRO experiments). PBL height is at its minimum in the early morning, and during the day it grows according to fluxes from the surface. Douglas et al. (2009) have shown that an increase in vegetation in the semiarid regions of India increased the net radiation. As the surface characteristic is also a factor determining the surface fluxes, it is expected that with the use of two different vegetation characteristics, the PBL height would also vary accordingly. Figure 8a shows the PBL height difference between these two experiments at 06 UTC. The PBL height differences are largest in the Thailand region, where the ISRO PBL height is more than the USGS PBL height. Over the north western parts of India, the USGS PBL height is greater than the ISRO PBL height. A comparison with actual vegetation cover in both of these experiments indicates that PBL height has increased where the vegetation green fraction is less and vice versa. Therefore, the role of vegetation green fraction in these experiments has been to increase the latent heat flux and reduce the sensible heat flux to the atmosphere. Latent heat flux differences between these two experiments are shown in Fig. 8b. As discussed earlier, over northwest India the latent heat flux is greater in the ISRO experiment than in the CNTL experiment. This indicates that there are increased evaporative processes associated with more vegetation in the model runs. The difference is large and is sufficient to change the PBL height by about 150–200 m and also change the atmospheric thermodynamic characteristics. Findell and Eltahir (2003) developed the convective triggering potential (CTP) as a measure of the early-morning near-surface atmospheric thermodynamic structure and demonstrated that this structure must be considered in order to determine how the growing boundary layer will respond to fluxes from the land surface. They showed, within their 1D model, that land surface moisture or vegetative condition can influence the potential for rainfall only in a limited range of early-morning atmospheric conditions. Similar studies have not been carried out for the Indian region. The work of Findell and Eltahir (2003) and this study encourage a detailed study of the growth of the PBL and CTP over the Indian region.

Fig. 8

a PBL height difference (m) at 06 UTC (monthly average) between experiments with ISRO and USGS vegetation data; b same as (a) but for latent heat flux (W m−2)

Figure 9a shows the monthly average simulated soil moisture (m3/m3) over the computational domain obtained from the CNTL run. The soil moisture value ranges from 0.2 to 0.4 m3/m3 over most parts of India. A well defined gradient of soil moisture from the zone that receives more precipitation to the regions with less precipitation is seen. However, geographical patterns of soil moisture content do not match those of precipitation. In order to examine whether the available soil moisture at the surface is affected by vegetation cover, differences in the monthly mean of 24-h forecast soil moisture (m3/m3 multiplied by 100) between the ISRO and USGS experiments are plotted in Fig. 9b. It is seen that where vegetation cover is reduced, especially over the Thailand region, the monthly mean soil moisture is also reduced as more of the surface area is exposed to solar radiation. This leads to reduced evaporation from the surface, resulting in a reduction in latent heat fluxes. Potential evapo-transpiration differences (W m−2) between these two experiments are shown in Fig. 9c. Potential evapo-transpiration is the amount of evaporation that would occur if a sufficient water source were available. Surface and air temperatures, insolation and wind all affect this parameter. Figure 9c clearly shows the effect of vegetation green fraction on the surface processes. With an increase in green fraction over northwest India in the ISRO experiment, the potential evapo-transpiration increases over this region; it is less over the Thailand region in the ISRO experiment. Over the regions where vegetation cover differences between ISRO and USGS are small, differences in surface temperature, PBL height, latent heat flux, soil moisture and potential evapo-transpiration are also small. Therefore, this study highlights the importance of vegetation green fraction in the context of regional hydroclimate. The importance of vegetation green fraction was also concluded by Pielke (2001).

Fig. 9

a Monthly average simulated soil moisture (m3/m3) from CNTL run; b monthly mean difference of 24-h forecast soil moisture (m3/m3 multiplied by 100) between ISRO and USGS experiments; c same as (b) but for potential evapo-transpiration (W m−2)

It is noticed from the initial NOAH scheme experiments (CNTL99 run) that this scheme improves the modeled rainfall over the Himalayan part of north-eastern India and eastern India. Figure 10a shows the difference in 24-h accumulated rainfall (mm/day) using ISRO and USGS vegetation fraction data. The ISRO vegetation data provides more rain over most parts of India excluding the north eastern parts of India. Over the Thailand region where ISRO vegetation is less the rainfall amount is also less. Reduction in rainfall in the regions of north eastern India and adjoining regions of northern Myanmar, Thailand and Vietnam indicates this is attributable to a decrease in green fraction. Rainfall difference is just not confined to land region only where there is a change in vegetation green fraction. Differences in rainfall between these two experiments are also seen over the surrounding oceanic regions indicating the vegetation green fraction has large-scale effects. Rainfall over a region depends on large-scale circulation. Local conditions such as topography enhance or reduce rainfall activity and amount. As the region in our study (i.e. South Asia) is surrounded by vast oceans on three sides, moisture transport from the ocean is primarily responsible for the rainfall, and recycling of moisture from the land areas, possibly causes local rainfall differences to be a less contributing factor. Misra et al. (2012) studied the sources of evaporation during the Indian summer monsoon seasons using data from three global reanalyses and found that evaporative sources are similar between the reanalyses, with a significant fraction of contribution from local continental origin. This study shows that the surface conditions are themselves sufficient to make large-scale changes in divergence and convergence locations leading to differences in the rainfall amount, as seen Fig. 10a. Figure 10b shows the differences in accumulated surface runoff values (mm/day) averaged for July. 
Large positive runoff differences are noticed over the region of reduced vegetation cover (Thailand). Over the same region, the soil moisture amount is less, as was noted earlier. Large differences are also seen in the Himalayan region and northern parts of India, which indicates that the vegetation green fraction is able to modify the surface water budget over this region. Increased runoff from the region of less vegetation cover indicates that most of the water retention capacity of the soil surface is also being reduced. Not much difference in runoff is seen over most parts of central and peninsular India. Results of these two experiments indicate large differences in precipitation over Thailand and northern parts of India. It may be noted that these are also regions where the rainfall is quite large (e.g., over north-eastern parts of India, Bangladesh and Thailand). The observed rainfall amounts can reach up to 20 mm/day. The difference of about 1 mm/day between these two experiments is only a small fraction of the total amount of precipitation received.

Fig. 10

a Difference in 24-h accumulated rainfall (mm/day) using ISRO and USGS vegetation fraction data; b same as (a) but for surface runoff

Land use properties including vegetation and soil types do not directly affect precipitation over the monsoon region. Instead, they change the partitioning of the latent and sensible heat fluxes, resulting in changes in not only local heat balance but also monsoonal large-scale circulation due to the modified land–sea contrast (figure not shown). Although the precipitation differences between ISRO and CNTL in Fig. 10a are explained by the local changes in heat fluxes and PBL height, the differences are small. It could be due to treatment of land surface processes and PBL schemes in the model. In that case, specification of land surface properties such as land use patterns could play a smaller role as compared to the way the surface characteristics are treated in the model. One possible reason for the small differences in Fig. 10a is that the local (or regional scale) feedback of vegetation is alleviated by the large-scale circulation feedback (figure not shown). It may be noted here that the vegetative green fraction is not the only quantity modified when changing land cover schemes. Other affected quantities include roughness length, albedo, water retention capacity, rooting depth, and treatment of subcellular processes. These parameters get changed within the WRF model according to the vegetation green fraction over a grid point. Area-averaged RMSEs of rainfall and temperature at 2 m over the box bounded by latitudes 18N–25N and longitudes 73E–81E for CNTL99 (use of USGS data) and ISRO experiments shown in Table 3 indicate that overall the use of ISRO data marginally improves the capabilities of the model.
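
As an illustration only, an RMSE over the box bounded by 18N–25N and 73E–81E could be computed along the following lines. This is a minimal sketch and not the procedure used to produce Table 3: it assumes that the model and observed fields are already on the same regular latitude–longitude grid, that every grid point in the box carries equal weight (no area weighting), and that missing observations are marked with `None`. The function name `box_rmse` and its arguments are hypothetical.

```python
import math


def box_rmse(model, obs, lats, lons,
             lat_range=(18.0, 25.0), lon_range=(73.0, 81.0)):
    """Root-mean-square error over the grid points inside a lat/lon box.

    model, obs : 2-D lists indexed [lat][lon] on the same grid
    lats, lons : grid coordinates in degrees north / degrees east
    Grid points whose observation is None are skipped.
    Assumption: equal weight for every grid point in the box.
    """
    squared_sum = 0.0
    count = 0
    for i, lat in enumerate(lats):
        if not (lat_range[0] <= lat <= lat_range[1]):
            continue
        for j, lon in enumerate(lons):
            if not (lon_range[0] <= lon <= lon_range[1]):
                continue
            if obs[i][j] is None:
                continue
            diff = model[i][j] - obs[i][j]
            squared_sum += diff * diff
            count += 1
    if count == 0:
        raise ValueError("no valid grid points inside the box")
    return math.sqrt(squared_sum / count)
```

The same function can be applied to the rainfall field and to the 2 m temperature field of each experiment, so that the runs are compared over an identical set of grid points.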

3.4 PBL and Land Surface Parameterization Experiments

3.4.1 PBL Parameterization Experiments

In order to examine how the PBL parameterization scheme used in the WRF model simulations affects the precipitation distribution given the same scheme for the land surface process, one more experimental run was conducted for July 1999. This run uses the YSU PBL scheme (hereafter YSU) instead of the MYJ PBL scheme used in CNTL99, keeping all other model parameters the same. Figure 11a shows the differences of PBL height between the CNTL99 and YSU runs. It is seen that in YSU, the PBL height is less than in the MYJ scheme over most parts of India and surrounding regions. The difference is as large as 900 m or more over the Arabian Sea where the low-level Somali jet is seen. The YSU scheme simulates higher PBL height (about 300 m more) than the MYJ scheme over north-eastern parts of India. This indicates that in the YSU scheme, turbulent processes in the atmosphere are confined to a lower height than in the MYJ scheme over India and most of the oceanic region. Hu et al. (2010) carried out sensitivity experiments using the WRF model and the PBL schemes over the south-central United States. They found that while the WRF simulations underpredict temperature and overpredict moisture near the surface, the bias is larger in the simulations with the MYJ scheme than with the YSU scheme. They concluded that stronger vertical mixing in the YSU scheme causes stronger entrainment at the top of the PBL, leading to a warmer and drier PBL. In the MYJ scheme, only local mixing accounts for the entrainment, and the scheme does not account for the entrainment from penetrating plumes or large eddies. In this study, detailed boundary layer characteristics have not been studied due to the absence of data in the PBL. However, it is seen that over the Indian region, the boundary layer is confined to a lower height in the YSU scheme than in the MYJ scheme. Figure 11b shows the differences of July mean (daily accumulated) latent heat flux between the YSU and CNTL99 runs.
It is seen that in the YSU scheme, the simulated latent heat flux is about 200–400 W m−2 more than when the MYJ scheme is used in the CNTL99 run. Considering that the evaporation flux from the ocean is brought forward to the Indian landmass causing monsoon precipitation in July, this difference is quite significant. Figure 11c shows precipitation differences (all 24-hourly precipitation for July 1999) between the YSU and MYJ PBL schemes (YSU-CNTL99). Over the oceanic region, the YSU scheme shows increased precipitation compared to the MYJ scheme. Along the foothills of the Himalayas, the YSU run also shows more rain than the CNTL99 run. Over the Myanmar, Thailand and Indo-China regions, the MYJ scheme produces more precipitation. However, with the NOAH scheme, the YSU scheme provides less precipitation over the eastern parts of India only. There is no large difference between the precipitation obtained from these two PBL schemes over the rest of the region. Therefore, it is seen that although the large-scale pattern of rainfall does not change much even if the PBL scheme is changed, the differences in precipitation over the Western Ghats are quite significant. A comparison of PBL height, boundary layer temperatures, water vapor mixing ratios, and low-level wind speeds against observed soundings is needed to properly identify which PBL scheme best matches the observations. This would depend heavily on which land surface and land cover classification schemes were used with the PBL schemes. As mentioned earlier, the main objective of this study is to document large-scale impacts of the physical parameterization schemes on the monsoon hydroclimate and not to make a comparison of boundary layer characteristics on a daily basis with observed soundings.
Area-averaged RMSEs of rainfall and temperature at 2 m over the box bounded by latitudes 18N–25N and longitudes 73E–81E for the CNTL99 run (MYJ scheme) and the YSU experiment shown in Table 3 indicate that overall the use of the MYJ scheme improves the capabilities of the model in terms of error reduction. Rainfall error was reduced to 5.1 from 6.68 mm/day and temperature error was reduced to 1.61 from 1.87 °C. Therefore, the MYJ scheme provides improved distribution of precipitation and temperature as compared to the YSU scheme.

Fig. 11

Difference of monthly mean between CNTL99 and YSU runs a PBL height; b latent heat flux; c precipitation

3.4.2 Land Surface Parameterization Experiments

The NOAH scheme used in the CNTL99 run is an advanced scheme with several levels of complexity in it. For proper representation of hydroclimate over the Indian region, all surface characteristics have to be prescribed correctly or the simulated hydroclimate using this sophisticated scheme may provide undesirable results. Another run for July 1999 was conducted using the slab model (TD), keeping all other model configurations the same. The soil layers in the TD scheme are 1, 2, 4, 8, and 16 cm thick. It may be noted here that the TD scheme is rather simple. This scheme does not account for surface wetness and this is a major weakness of the scheme (Chen and Dudhia, 2001). This experiment (TD) and its comparison with the CNTL99 run (with the NOAH scheme) show how the treatment of land surface processes in the WRF model simulations affects monsoon hydroclimate. Figure 12a shows the geographical plot of 2 m temperature monthly mean 24-h forecast differences (for 00 UTC) between these two schemes. Large differences of 3°–4° (the NOAH scheme produces warmer surface temperatures) are seen over the Arabian region, and differences of 2°–3° are seen over large parts of the Himalayas and northern China. Over the northwest parts of India and over the northern parts of Jammu and Kashmir, the NOAH scheme simulates colder surface temperatures as compared to the TD scheme. Over most parts of India, the temperature difference is small, although the NOAH scheme generally provides warmer surface temperatures than the TD scheme. Diurnal variation of model-simulated 2 m temperature for the northwest region of India (70E–80E and 25N–35N) for the entire month has been compared against available maximum and minimum temperatures (figure not shown). Both schemes generally captured the monthly variations of temperature for this region well. Except for the early part of the month, the CNTL99 run with the NOAH scheme provides higher temperatures at 2 m as compared to the TD scheme.
Rise and fall of observed temperatures during the month are also well simulated by both the schemes.

Fig. 12

a Monthly mean 24-h forecast difference of 2 m temperature between the CNTL99 and TD runs; b monthly mean 24-h forecast difference of precipitation between the CNTL99 and TD runs; c same as b but for difference between the YTD and TD schemes

As shown in Fig. 12b, differences in the treatment of surface processes in the model lead to large differences in precipitation simulation over the Indian domain. The TD scheme simulates more rainfall than the NOAH scheme over most parts of India and neighbouring seas. The NOAH scheme simulated higher rainfall only over the north Bay of Bengal, and the eastern coast and foothills of the Himalayas. Differences in precipitation simulation by these two schemes are as large as 5–10 mm/day. Large differences in rainfall in the oceanic regions suggest that, even within 24 h of simulation, the entire monsoon system is modified due to differences in the land surface process parameterization scheme. This figure suggests that the monsoon precipitation process is very sensitive to land surface processes in the WRF model.

In order to examine how the TD land surface scheme works with the YSU PBL scheme in the WRF model for the Indian monsoon precipitation simulations, another experimental run was conducted for July 1999. This run is referred to as YTD. Figure 12c shows precipitation differences (all 24-hourly precipitation for July 1999) between the YTD and TD simulations. Over most parts of the Indian land region, the YTD run provides less precipitation than the CNTL99 run. However, with the NOAH scheme, the YSU scheme provides less precipitation only over the eastern parts of India. A comparison of precipitation difference between the YSU scheme and the MYJ scheme with the NOAH land surface process (shown in Fig. 11; YSU-CNTL99) indicates that there is no large difference between the precipitation obtained from these two PBL schemes over the rest of the region. Therefore, it is seen that large-scale patterns of rainfall do not change much even if the PBL scheme is changed. It is seen that, irrespective of the chosen land surface process scheme, precipitation differences due to differences in the PBL scheme are similar. The differences in precipitation due to a change in land surface process (CNTL99-TD) shown in Fig. 12b are much larger than the differences seen due to different PBL schemes. Therefore, the impact of land surface parameterization is greater than that of the PBL scheme chosen.

Model precipitation from each experiment has been compared with the observed precipitation for the month. This has been done using bilinear interpolation of the model precipitation to the IMD observation grid. Differences of precipitation from the model runs for YTD, YSU and TD from the observed data are shown in Fig. 13a–c, respectively. The model has a tendency to simulate more precipitation over the southern Indian peninsula and the foothills of the Himalayas when the TD scheme is used. Over central parts of India and the Gangetic plains, the observed values are always more than the model simulations. It is seen from the figure that maximum differences are observed in the experiments using the TD scheme. The differences are larger than those obtained from the CNTL run which used the NOAH scheme. Differences between observations and the YSU run are similar to the ones obtained in the CNTL99 case with the MYJ scheme. Therefore, the model’s characteristics do not change much when a different PBL scheme is employed in the model runs.
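
The bilinear interpolation step mentioned above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in this study: the model field is assumed to lie on a regular latitude–longitude grid with ascending coordinates, and every observation grid point is assumed to fall inside the model grid. The names `bilinear` and `regrid` are hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (i, t): index of the lower neighbour of x on an ascending
    axis and the fractional distance of x towards axis[i + 1]."""
    if not (axis[0] <= x <= axis[-1]):
        raise ValueError("target point lies outside the source grid")
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def bilinear(field, lats, lons, lat, lon):
    """Bilinearly interpolate field[lat][lon] to a single point."""
    i, u = _bracket(lats, lat)
    j, v = _bracket(lons, lon)
    return ((1 - u) * (1 - v) * field[i][j]
            + (1 - u) * v * field[i][j + 1]
            + u * (1 - v) * field[i + 1][j]
            + u * v * field[i + 1][j + 1])


def regrid(field, lats, lons, obs_lats, obs_lons):
    """Interpolate a model field to every point of an observation grid."""
    return [[bilinear(field, lats, lons, la, lo) for lo in obs_lons]
            for la in obs_lats]
```

Once the model precipitation is on the observation grid, the difference fields of the kind shown in Fig. 13 follow from a point-by-point subtraction.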

Fig. 13

Monthly mean precipitation difference (average of all 24hrly precipitation for July 1999) between a observation and YTD; b observation and YSU; c observation and TD experiments

The area-averaged RMSEs for the domain bounded by 18N–25N and 73E–81E shown in Table 3 indicate that for both temperature and precipitation, the CNTL99 run (use of the NOAH scheme) performs better than the run using the TD scheme. The use of the NOAH scheme improves the accuracy of the model in terms of rainfall error (reduced to 5.1 from 6.53 mm/day) and temperature error (reduced to 1.61 from 1.79 °C). Therefore, this study recommends the use of the NOAH scheme for further studies of hydroclimate over the region. The results from the YTD run have higher RMSEs for both temperature and precipitation than when using the NOAH and MYJ schemes (CNTL99). The RMSEs from the YTD run are 2.0 °C and 6.82 mm/day for temperature and precipitation, respectively. Therefore, it is also seen that the combined effect of the NOAH scheme along with the MYJ scheme provides better accuracy than the YTD run (using the YSU and TD schemes).
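
To put the rainfall RMSEs quoted in the text on a common footing, the relative reduction achieved by the CNTL99 configuration can be worked out as below. The RMSE values are those quoted in the text; the helper `percent_reduction` and the dictionary layout are hypothetical and serve only to make the arithmetic explicit.

```python
# Rainfall RMSEs (mm/day) as quoted in the text for each experiment.
rain_rmse = {"CNTL99": 5.1, "YSU": 6.68, "TD": 6.53, "YTD": 6.82}


def percent_reduction(reference, other):
    """Percentage by which `reference` is lower than `other`."""
    return 100.0 * (other - reference) / other


for name in ("YSU", "TD", "YTD"):
    reduction = percent_reduction(rain_rmse["CNTL99"], rain_rmse[name])
    print(f"CNTL99 vs {name}: {reduction:.1f} % lower rainfall RMSE")
```

With these values, the rainfall RMSE of the CNTL99 run is roughly 22 to 25 % lower than that of the other three experiments.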

3.5 Impact of Land Surface Schemes in Data Assimilation Experiments

The main objective of this section is to document the impact of data assimilation in the presence of two different land surface parameterization schemes, namely TD and NOAH. As mentioned earlier, two sets of data assimilation experiments were carried out using the same set of observed data. The assimilation experiments began on January 1, 1994 and continued until December 31, 1994. The first experiment (ASSIM) used the NOAH land surface scheme while the second experiment (AS_TD) used the TD land surface scheme, keeping all other model parameters the same. In order to examine the relative role of data assimilation and the physical parameterization scheme for land surface processes, monthly mean precipitation and latent heat fluxes for July 1994 from these two experiments have been examined. July mean precipitation from the AS_TD experiment and its difference from the ASSIM experiment are shown in Fig. 14a and b, respectively. It may be noted that these are 6-hourly precipitation amounts from the model accumulated on a per-day basis. Large-scale patterns of precipitation from both experiments are similar. Maxima in rainfall over the Western Ghats, the north Bay of Bengal and adjoining north-east India are evident. Rainfall over the Gangetic Plains in both experiments is quite large for this year. Differences between the precipitation from these two assimilation experiments show that the ASSIM experiment (NOAH scheme) produces more rainfall over northern India and the central plains as compared to AS_TD. Differences over Bihar, Jharkhand, Chhattisgarh and east Uttar Pradesh are quite large considering that both are assimilation experiments and observed data is ingested every 6 hours into the models. Over the head Bay of Bengal, AS_TD produces more rainfall than the ASSIM experiment.

Fig. 14

a Monthly mean precipitation from AS_TD experiment; b difference of precipitation between ASSIM and AS_TD experiments

In order to examine the evolution of the annual cycle in these two experiments, precipitation (averaged over the longitudinal belt 70E–90E) has been plotted as a function of latitude and time in Fig. 15. It can be seen that when using observed data the model is able to capture the seasonal evolution of precipitation rather well in both experiments. A northward progression of the rainfall belt is noticed at the onset of the monsoon over India. During the monsoon season, the rainfall amount is large over the Indian latitudes. There are instances of active and weak spells within the monsoon season. As the monsoon season ends, rainfall activity reduces over the landmass by October. Over the oceanic region, rainfall is generally higher than over the land in both experiments. The difference in precipitation between these two experiments is shown in Fig. 15c. It is seen that there are large differences, and the difference sometimes reaches 8–12 mm/day over the oceanic region. Over the landmass, the ASSIM run simulates more rain than the AS_TD run during the monsoon and post-monsoon seasons. This study indicates that even if the same datasets are used in assimilation, using different land surface schemes may provide a different hydroclimate over the domain. The differences over the oceanic region are due to modification of the atmospheric characteristics through land surface treatment in the assimilation cycle. Differences in land surface properties impact the atmosphere above, and in a 6-hourly assimilation cycle the impact over the continents propagates to the adjoining oceans. Kar et al. (1996) examined relative roles of air–sea interaction and land surface processes on the development of intraseasonal oscillations during the Indian summer monsoon seasons.

Fig. 15

Annual cycle of precipitation (mm/day) for 1994 from a AS_TD; b ASSIM; c difference between the two experiments. Precipitation has been averaged over the Indian longitudes (70E–90E)
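
The longitudinal averaging behind a latitude–time section of this kind can be sketched as follows. This is a minimal illustration, not the plotting code used for Fig. 15: it assumes daily precipitation stored as nested lists indexed `[time][lat][lon]` on a regular latitude–longitude grid, and the names `belt_mean` and `section_difference` are hypothetical.

```python
def belt_mean(precip, lons, lon_min=70.0, lon_max=90.0):
    """Average precip[time][lat][lon] over a longitude belt.

    Returns a latitude-time section indexed [time][lat]. All points in a
    row share one latitude, so a plain mean along longitude is used.
    """
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    if not cols:
        raise ValueError("no grid columns inside the longitude belt")
    return [[sum(row[j] for j in cols) / len(cols) for row in day]
            for day in precip]


def section_difference(section_a, section_b):
    """Point-by-point difference of two latitude-time sections."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(section_a, section_b)]
```

Applying `belt_mean` to the output of each assimilation experiment and passing the two results to `section_difference` gives a difference section analogous to panel c.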

This study further emphasizes the fact that even if the same set of observed data are used in both experiments the differences in land surface schemes are able to reduce the impact and contribution of observed data being assimilated. The hydroclimate over the region becomes a function of the land surface scheme and the bias of the model. However, it is important to examine which land surface scheme yields better results. The RMSE has been computed for temperature at 2 m and precipitation from the model results compared against the observed IMD data. The RMSE values from these two assimilation experiments are shown in Table 4 for the region within latitudes 18N–25N and longitudes 73E–81E. The ASSIM experiment provides better simulations for both variables. Precipitation error is reduced by about 1 mm/day in the ASSIM experiment, while temperature error is reduced by about 0.2 °C as compared to the AS_TD experiment. Therefore, even with the assimilation of observed data, the land surface process scheme chosen for the model plays an important role in the simulation of hydroclimate variables.

4 Conclusion

Land surface processes play a key role in partitioning the energy transfer between the surface and atmosphere. These processes are also very important for surface hydrology. There has been no systematic study to document the performance of land surface parameterization schemes in the WRF model as far as the Indian monsoon is concerned. Therefore, land surface sensitivity experiments with the WRF (V3.1.1) model were carried out for simulations of the Indian summer monsoon. To examine the impact of land surface schemes on the model, runs were performed using the TD and NOAH schemes of the model. The role of the PBL parameterization schemes along with different land surface schemes has also been evaluated by carrying out additional sensitivity experiments. The effect of these schemes on the precipitation distribution in the WRF model simulations has been noted from experimental runs with the YSU and MYJ PBL schemes. Over the South Asian region, there is considerable interannual variability in the vegetation green fraction. The impact of the vegetation green fraction on the model simulations has been examined by replacing the default climatological USGS vegetation cover data with the ISRO data valid for the month of simulation.

Results indicate that differences in the treatment of surface processes in the model lead to large differences in precipitation simulation over the Indian domain. The TD scheme simulates more rainfall than the NOAH scheme over most parts of India and the neighbouring seas. Over the oceanic region, the YSU scheme shows increased precipitation compared to the MYJ scheme. Along the foothills of the Himalayas, the YSU scheme also shows more rain than the MYJ scheme. It is seen that irrespective of which land surface process scheme is chosen, precipitation differences due to differences in the PBL scheme are similar. Several hydroclimate parameters were examined using the sensitivity experiments with the ISRO and USGS vegetation green fractions. A comparison of these runs indicates that PBL height grows more where vegetation green fraction is less and vice versa. Therefore, the role of vegetation green fraction in these experiments has been to increase the latent heat flux and reduce the sensible heat flux to the atmosphere. The impact of vegetation green fraction is seen not only over land regions but also over the neighbouring oceanic regions, indicating the large-scale influence of this parameter.

Two sets of data assimilation experiments were carried out for 1994 using the same set of observed data but with different land surface parameterization schemes. It is found that even if the same set of observed data are used in both experiments, the differences in land surface schemes are able to reduce the impact and contribution of observed data being assimilated. The hydroclimate over the region becomes a function of the land surface scheme. It may be noted here that newer versions of the WRF model have more land surface scheme options such as the rapid update cycle (RUC) LSM scheme. Jin et al. (2010) have shown that the RUC LSM scheme slightly outperforms the NOAH scheme and significantly outperforms the TD scheme. It is envisaged that further work on the detailed land surface and boundary layer characteristics shall be carried out using this version of the WRF model.

The main objective of this study has been to identify suitable land surface and PBL schemes for the reanalysis of the Indian monsoon. This study indicates that even if observed data are used in the assimilation, chosen physical parameterization schemes (especially the land surface scheme) are dominant factors and the hydroclimate of the South Asian monsoon is subsequently defined according to the scheme chosen. In this study, it is found that the MYJ PBL scheme, the NOAH land surface scheme and the ISRO vegetation green fraction cover provide better analysis of the hydroclimate over the domain of interest.