1 Introduction

Reliable prediction of Indian summer monsoon rainfall (ISMR) has long been sought because of its importance for the economy and the livelihood of the people of this region. The effort has a long legacy starting from Blanford (1884), who associated ISMR with snowfall over the Himalayas. During the era of Walker (1924), the role of Pacific Sea Surface Temperature (SST) came into the picture. The empirical models developed by the India Meteorological Department (IMD; e.g. Rajeevan et al. 2006a), based on several parameters (viz. pressure, temperature, wind, SST, etc.), were also unable to produce seasonal monsoon rainfall predictions of useful reliability (e.g. Gadgil et al. 2005). After the advent of dynamical models, several Atmospheric General Circulation Models (AGCMs) were used to assess ISMR prediction skill, but their performance was not satisfactory (Sperber and Palmer 1996; Goswami 1998, etc.). Coupled General Circulation Models (CGCMs) were subsequently used for ISMR forecasts (e.g. Preethi et al. 2010; Krishnamurthy and Shukla 2011, 2012; Rajeevan and Nanjundiah 2009; Rajeevan et al. 2012; Pokhrel et al. 2013 and many more). Although CGCMs have a slight edge over AGCMs (e.g. Chaudhari et al. 2013a), ISMR prediction skill remains very low despite all these efforts.

The basic premise of seasonal prediction is the existence of slowly varying boundary conditions (Charney and Shukla 1981). Thus, anomalous SST associated with basin-wide events, viz. the El Niño–Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD), is found to be the primary source of predictability (Shukla and Wallace 1983; Brankovic et al. 1994; Kumar and Hoerling 1998). The predictable signal due to slowly varying boundary conditions is, however, constrained by internal atmospheric dynamics (intraseasonal oscillations; Goswami 1998; Ajaya Mohan and Goswami 2003; Goswami and Xavier 2003). Potential seasonal predictability is known to be higher over the tropics than over the extra-tropics (Shukla 1998; Phelps et al. 2004; Chen et al. 2010). This is due to the large signal-to-noise ratio (SNR) in the tropics, which deteriorates in the extra-tropics because of midlatitude synoptic-scale systems. In spite of the large potential predictability in the tropics, predictability over the Indian domain is very low (e.g. Kang et al. 2004; Kang and Shukla 2005; Wang et al. 2005). For an AGCM, seasonal prediction is a boundary value problem, whereas for a CGCM it is an initial value problem. A thorough understanding of the impact of different initial conditions on the simulation of the Indian monsoon by a CGCM is therefore required.
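The SNR argument can be made concrete with an ensemble of seasonal means. The sketch below is a minimal illustration, assuming an array of JJAS-mean values with one row per year and one column per ensemble member: the "signal" is taken as the interannual variance of the ensemble mean and the "noise" as the across-member spread averaged over years. This is one common estimator of potential predictability, not necessarily the one used in the studies cited above, and the function name `signal_to_noise` is hypothetical.

```python
import numpy as np


def signal_to_noise(ens):
    """Signal-to-noise ratio of a seasonal-mean quantity (illustrative sketch).

    ens : array-like, shape (n_years, n_members), e.g. the JJAS-mean value
          of one variable from every ensemble member in every year.
    """
    ens = np.asarray(ens, dtype=float)
    # Signal: year-to-year variance of the ensemble mean (boundary-forced part).
    signal_var = ens.mean(axis=1).var(ddof=1)
    # Noise: spread among members within a year, averaged over all years
    # (internal atmospheric variability, e.g. intraseasonal oscillations).
    noise_var = ens.var(axis=1, ddof=1).mean()
    return signal_var / noise_var


# Synthetic example: 28 years x 24 members, same forced signal, two noise levels.
rng = np.random.default_rng(42)
forced = rng.normal(0.0, 1.0, size=(28, 1))
tropics_like = forced + rng.normal(0.0, 0.5, size=(28, 24))
extratropics_like = forced + rng.normal(0.0, 2.0, size=(28, 24))
print(signal_to_noise(tropics_like) > signal_to_noise(extratropics_like))
```

Because the ensemble mean still carries roughly `noise_var / n_members` of noise variance, this ratio is slightly inflated for small ensembles; with 24 members the effect is small.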

The AGCM study of Phelps et al. (2004) demonstrated that atmospheric initial conditions have little influence on seasonal mean predictability and that model forecast skill is mainly determined by interannual variations in SST anomalies. According to Reichler and Roads (2005), the influence of initial conditions on atmospheric variables lasts only for the first 3 weeks, and for some variables up to 8 weeks. Chen et al. (2010) showed (using model monthly mean temperature) that the zeroth-month lead time (L0) has the highest forecast skill, which decreases rapidly (slowly) with increasing lead over the extra-tropics (tropics), with a few exceptions. Over the tropical regions their model shows the highest skill at L0 during the summer monsoon season (JJA; June–August). They attributed the predictability over the tropics mainly to anomalous SST, with a minimal role for atmospheric and land initial conditions. Slingo and Palmer (2011) argued that the decrease (increase) of model bias (skill) with decreasing lead time is also indicative of dynamical model drift. In the context of ISMR, Singh et al. (2012) studied the lead-time dependence of predictability using two AGCMs and four CGCMs. They found the highest predictability at the first-month lead time (L1) for CGCMs and at L0 for AGCMs, and speculated that this difference arises from the longer spin-up time of CGCMs compared with AGCMs. At times, however, estimates of predictability from Atmospheric Model Intercomparison Project (AMIP) simulations have been questioned because of the absence of coupled air–sea interactions (van den Dool et al. 2006). Despite representing air–sea interactions properly, CGCMs suffer from systematic biases that are the major hindrance to reliable seasonal prediction of ISMR (Turner et al. 2005; Chaudhari et al. 2013a).
The systematic bias in CGCMs prevents their skill from surpassing that of AGCMs by a sound margin. The main causes of this systematic bias are the improper representation of many basic physical processes (i.e. errors in model physics) and errors in initial conditions (i.e. inadequacies in the assimilation system).

The Climate Forecast System version 2.0 (CFSv2; Saha et al. 2014a) of the National Centers for Environmental Prediction (NCEP) is a coupled atmosphere–ocean–land model used for global forecasts at lead times from a week to several months. CFSv2 shows considerable improvement in the simulation of various aspects of the Indian summer monsoon (Saha et al. 2014b; Sahai et al. 2013) compared with its previous version (CFSv1; Pokhrel et al. 2012a; Chaudhari et al. 2013b); it is therefore pertinent to check the sensitivity of prediction skill to different initial conditions in the CFSv2 retrospective forecasts. Drbohlav and Krishnamurthy (2010; hereafter DK10) carried out an extensive study of the forecast and predictability errors in CFSv1, whose skill was constrained by various biases across the Indian Ocean basin (Seo et al. 2007; Chaudhari et al. 2013b; Pokhrel et al. 2012a). Given the many improvements in the simulation of the Indian summer monsoon by CFSv2 (Saha et al. 2014b), forecast and predictability errors may have different characteristics for different initial conditions, which may eventually affect the ISMR prediction skill.

This study focuses on the forecast and predictability errors at different lead times of CFSv2 hindcast simulations and elucidates the importance of initial conditions and model physics for ISMR prediction skill. The initial conditions show a significant spread in atmospheric and oceanic state variables as they pass from the peak ENSO phase in the Pacific (December, January and February), through the spring predictability barrier (March and April), and finally to normal ENSO conditions (May). The skill of CFSv2 in simulating ISMR when initialized under these varied atmospheric and oceanic conditions is therefore expected to show a spread. This study attempts to bring out those differences in terms of model and initial condition errors. Section 2 describes the model and hindcast runs, Sect. 3 describes the biases of various model parameters for different initial conditions, Sect. 4 discusses different monsoon metrics at different leads, Sect. 5 elaborates on the spring predictability barrier aspect of the forecast, Sect. 6 explains the forecast and predictability errors, and Sect. 7 summarizes and concludes the results.

2 Data and model

The CFSv2 retrospective forecasts are a set of 9-month hindcasts initiated every 5th day starting from 1 January, with four ensemble members per start day, for the period 1982–2009. This 28-year, 24-member ensemble retrospective forecast dataset is provided by NCEP (http://cfs.ncep.noaa.gov). Initial conditions for the atmosphere and ocean come from the NCEP Climate Forecast System Reanalysis (CFSR; Saha et al. 2010). The four members on each start day correspond to the four cycles (00, 06, 12 and 18 GMT). NCEP compiled the monthly ensembles as follows: for each calendar month, the hindcasts with initial dates after the 7th of that month were used as the ensemble members of the next month. For example, the starting dates for the January ensemble members (called the February release by NCEP) are 11, 16, 21, 26 and 31 January and 5 February; six start dates with four cycles each give 24 members. For the analysis, we use ensemble mean forecasts obtained by averaging these 24 ensemble members.
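The assembly rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not NCEP's processing code: start dates are taken as 73 dates spaced 5 days apart from 1 January of each year, a release for month R collects starts after the 7th of month R−1 up to and including the 7th of month R, and the names `release_members` and `LEAD_OF_RELEASE` are hypothetical. Leap-year handling in the actual archive may differ.

```python
from datetime import date, timedelta

CYCLES = ("00", "06", "12", "18")  # four daily start cycles (GMT)

# Mapping used in this study: January release -> L5, ..., June release -> L0.
LEAD_OF_RELEASE = {r: f"L{6 - r}" for r in range(1, 7)}


def start_dates(year):
    """Assumed hindcast start dates: every 5 days from 1 January (73 per year)."""
    first = date(year, 1, 1)
    return [first + timedelta(days=5 * k) for k in range(73)]


def release_members(release_month, year):
    """(start_date, cycle) pairs forming the NCEP 'release' of `release_month`.

    Members start after the 7th of the preceding month and on or before the
    7th of the release month (so the February release spans 11 Jan to 5 Feb).
    """
    if release_month == 1:
        after = date(year - 1, 12, 7)
    else:
        after = date(year, release_month - 1, 7)
    until = date(year, release_month, 7)
    candidates = start_dates(year - 1) + start_dates(year)
    days = [d for d in candidates if after < d <= until]
    return [(d, cycle) for d in days for cycle in CYCLES]


members = release_members(6, 1990)  # June release, i.e. L0 (May starts)
print(LEAD_OF_RELEASE[6], len(members), sorted({d for d, _ in members}))
```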

The CFSv2 (Saha et al. 2014a) used in the retrospective forecasts consists of a spectral atmospheric model (GFS) at a high resolution of T126 (~0.937°) with 64 hybrid vertical levels, and an advanced version of the GFDL Modular Ocean Model, version 4p0d (Griffies et al. 2004), a finite-difference model with 0.25–0.5° grid spacing and 40 vertical layers. The atmosphere and ocean models are coupled with no flux adjustment. The atmospheric component employs the simplified Arakawa–Schubert convection scheme (Hong and Pan 1998) with cumulus momentum mixing, together with orographic gravity wave drag (Saha et al. 2010). It uses the rapid radiative transfer model (RRTM) for shortwave radiation, with an advanced cloud–radiation integration scheme (Iacono et al. 2000; Clough et al. 2005; Saha et al. 2014a). It is also coupled to a four-layer Noah land surface model (Ek et al. 2003) and a two-layer sea ice model (Wu et al. 1997; Winton 2000).

For validation of the model simulations, we use the NCEP reanalysis version 2 (NCEP-R2; Kanamitsu et al. 2002) for winds at 850 and 200 hPa, SSTs from Reynolds et al. (2002), and rainfall data (1979–2009) from the Global Precipitation Climatology Project (GPCP; Adler et al. 2003).

3 Simulation of mean and forecast error

This study uses Indian summer monsoon season (JJAS; June–September) data from 1982 to 2009 (28 years). The lead forecasts of the January, February, March, April, May and June releases (as termed by NCEP) are hereafter referred to as L5, L4, L3, L2, L1 and L0, respectively; these correspond to December, January, February, March, April and May initial conditions. Accuracy in the simulation of the mean climate is also linked with the forecast skill of a model (Delsole and Shukla 2010). It is therefore important to know how well CFSv2 reproduces the mean state of the climate at different lead times.

3.1 Rainfall

The observed seasonal GPCP rainfall climatology is marked by the well-known inter-tropical convergence zone (ITCZ), coinciding with the meteorological equator, and its extension as the South Pacific convergence zone (SPCZ) starting from the maritime continent (Fig. 1a). The Indian region is typically marked by a continental and an oceanic tropical convergence zone, the first situated over central India and the foothills of the Himalaya and the second over the equatorial Indian Ocean (Fig. 1a). The climatology also shows the orographic high-rainfall zones over the Western Ghats and the Myanmar coast, along with the ocean basins of the Bay of Bengal and the South China Sea, which are the breeding ground for a large number of synoptic-scale systems (Goswami et al. 2003). The model-simulated seasonal mean is shown for L2 (i.e. March initial conditions; Fig. 1h) for reference. The model simulations are realistic enough to capture all the major rain bands. However, the model overestimates (underestimates) rainfall over oceanic (land) regions, and this error is systematic (Fig. 1i–n). The dry bias over land, particularly over the Indian region, is a major concern, and it is enhanced in CFSv2 compared with its previous version (CFSv1; Saha et al. 2014b). In spite of these widespread biases, the grid-point correlation with observations becomes stronger with decreasing forecast lead time (Fig. 1b–g). This indicates that rainfall tends to be most skillful at the shortest lead (L0) over the Pacific and Indian Ocean basins, as the quantitative values for selected regions of interest show more clearly. The rainfall biases are pervasive and their pattern is almost constant across all leads, which further confirms that they are systematic. Unlike the correlation, the amplitude of the bias decreases only marginally with decreasing forecast lead time.
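The two kinds of maps compared above, the grid-point correlation (Fig. 1b–g) and the mean bias (Fig. 1i–n), can be computed from stacks of JJAS-mean fields. The sketch below assumes that the correlation maps are interannual (across-year) correlations at each grid point, that model and observations have already been regridded to a common grid arranged as (year, lat, lon) arrays, and that there are no missing values; the function name is hypothetical.

```python
import numpy as np


def gridpoint_corr_and_bias(model, obs):
    """Interannual correlation and mean bias at every grid point.

    model, obs : arrays of shape (n_years, n_lat, n_lon) with JJAS-mean
                 fields (e.g. rainfall in mm/day) on the same grid.
    Returns (corr, bias), each of shape (n_lat, n_lon).
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Anomalies about each dataset's own climatology (removes the mean bias).
    ma = model - model.mean(axis=0)
    oa = obs - obs.mean(axis=0)
    cov = (ma * oa).sum(axis=0)
    norm = np.sqrt((ma ** 2).sum(axis=0) * (oa ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = cov / norm  # NaN where either series has zero variance
    # Bias: model climatology minus observed climatology (negative = dry).
    bias = model.mean(axis=0) - obs.mean(axis=0)
    return corr, bias
```

Because correlation is computed from anomalies while the bias uses the raw climatologies, a model can show a rising correlation and a nearly unchanged bias across leads, which is the behaviour described above.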

Fig. 1 JJAS mean rainfall (mm/day; shaded) and interannual standard deviation (contours) for a GPCP and h CFSv2 (L2 forecast). Spatial JJAS seasonal rainfall correlation between GPCP and CFSv2 for b L5, c L4, d L3, e L2, f L1 and g L0, and rainfall bias for i L5, j L4, k L3, l L2, m L1 and n L0. The boxes in i mark the study regions used to estimate the quantitative model errors

Table 1 gives the biases in the mean and the correlation coefficients averaged over four regions closely linked with the Indian summer monsoon: the East Equatorial Indian Ocean (EEIO; 50°E–100°E, 10°S–10°N), the Western Equatorial Pacific Ocean (WEPO; 110°E–160°W, 15°S–15°N), the Eastern Equatorial Pacific Ocean (EEPO; 130°W–80°W, 15°S–15°N) and the Indian land region (ILR). The oceanic regions were chosen by slightly enlarging the Niño (El Niño related) and IOD boxes over the Pacific and Indian Oceans, respectively, as these have a major influence on ISMR prediction. The regions are marked as boxes in Fig. 1i along with the L5 rainfall bias. The dry bias over ILR decreases with decreasing lead and reaches its smallest magnitude, 1.2 mm/day, at L0. The maximum correlation over ILR, at L3 (0.21), coincides with the largest bias (−2.35 mm/day), and the same correlation value is also found at L0. Over the oceanic regions, the correlation over EEIO (WEPO and EEPO) first decreases from L5 to L4 (L3) and then increases to its maximum at L0. The bias over both Pacific regions (WEPO and EEPO) first increases until L3/L2 and then decreases to its lowest value at L0, whereas over EEIO the rainfall bias first decreases until L2 and then increases to its maximum magnitude at L0 (Table 1). Thus, except over ILR, the correlation (bias) generally attains its smallest (largest) value at L3/L2, which reflects the well-known spring predictability barrier (SPB) discussed in Sect. 5.

Table 1 Seasonal (JJAS) correlation and bias of CFSv2 rainfall with respect to GPCP over four different regions; (a) Indian land region (ILR), (b) East Equatorial Indian Ocean (EEIO; 50°E–100°E, 10°S–10°N), (c) Western Equatorial Pacific Ocean (WEPO; 110°E–160°W, 15°S–15°N) and (d) Eastern Equatorial Pacific Ocean (EEPO; 130°W–80°W, 15°S–15°N) at different leads
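The regional values in Table 1 (and in Tables 2–4) are averages over the boxes listed above. A minimal sketch of such a box average follows, assuming a regular latitude–longitude grid with longitudes in degrees east and cosine-of-latitude area weights; whether the original analysis used area weighting is not stated, so that choice is an assumption. ILR is left out because its boundaries are not given as a box in the text, and the names `REGIONS`, `box_mean` and `region_scores` are hypothetical.

```python
import numpy as np

# Study boxes from the text; longitudes in degrees east on a 0-360 grid.
REGIONS = {
    "EEIO": dict(lon=(50.0, 100.0), lat=(-10.0, 10.0)),
    "WEPO": dict(lon=(110.0, 200.0), lat=(-15.0, 15.0)),  # 110E-160W
    "EEPO": dict(lon=(230.0, 280.0), lat=(-15.0, 15.0)),  # 130W-80W
}


def box_mean(field, lat, lon, region):
    """Cosine-latitude weighted mean of `field` (..., n_lat, n_lon) over a box."""
    box = REGIONS[region]
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float) % 360.0  # accept -180..180 input too
    jj = (lat >= box["lat"][0]) & (lat <= box["lat"][1])
    ii = (lon >= box["lon"][0]) & (lon <= box["lon"][1])
    sub = np.asarray(field, dtype=float)[..., jj, :][..., ii]
    w = np.cos(np.deg2rad(lat[jj]))[:, None] * np.ones((1, ii.sum()))
    return (sub * w).sum(axis=(-2, -1)) / w.sum()


def region_scores(model, obs, lat, lon, region):
    """Correlation and bias of box-mean series; inputs are (n_years, n_lat, n_lon)."""
    m = box_mean(model, lat, lon, region)
    o = box_mean(obs, lat, lon, region)
    return np.corrcoef(m, o)[0, 1], m.mean() - o.mean()
```

Averaging first and correlating second gives a regional-series correlation, which generally differs from the average of the grid-point correlations mapped in Fig. 1.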

A coupled model that produces a reasonable facsimile of the observed interannual variability (IAV) has a greater likelihood of skillful prediction (Sun and Wang 2013). Furthermore, a realistic simulation of IAV also depends on the model's ability to simulate the mean accurately (Delsole and Shukla 2010). It is thus important to examine the interannual variability of summer rainfall in CFSv2 with respect to observations. Since CFSv2 shows large systematic errors in mean rainfall, the IAV of rainfall is also expected to be strongly affected by these biases. Here IAV is represented by the interannual standard deviation (SD) of JJAS mean rainfall at different leads. Maxima of the interannual SD of observed (GPCP) rainfall generally coincide with regions of high rainfall, viz. the ITCZ, the SPCZ, the eastern Indian Ocean, the coastal Western Ghats, the head Bay of Bengal and northeast India (contours in Fig. 1a). Most of the oceanic high-variability zones of rainfall also coincide with the high-SST warm pool region (figure not shown). This supports the view that, in the tropics, SST strongly controls the variability and hence the predictability of rainfall.

Although CFSv2 simulates high-SD zones over all the major rain bands, they do not coincide with the maxima of the mean; rather, there is a southward shift (particularly over the ITCZ). In addition, CFSv2 overestimates (underestimates) the SD over the Pacific basin (the eastern Indian Ocean and the Indian subcontinent; Fig. 1h). The pattern of this SD bias is maintained at all forecast lead times, although its amplitude decreases slightly (figure not shown). The underestimation of rainfall IAV over land is not unique to CFSv2: almost all the models of the DEMETER project (Preethi et al. 2010) and of the ENSEMBLES project (Rajeevan et al. 2012) suffer from a similar problem. Rainfall over land is also governed by the strong coupling between the slowly varying land surface state and atmospheric processes (Dirmeyer et al. 2006; Saha et al. 2011, 2012, 2013), which may be one of the contributors to these widespread biases over land.
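The over- and underestimation of IAV described above can be expressed as a per-grid-point ratio of model to observed interannual SD. A minimal sketch, assuming (year, lat, lon) arrays of JJAS means on a common grid and using sample SDs (ddof=1); the function name is hypothetical.

```python
import numpy as np


def interannual_sd_ratio(model, obs):
    """Model-to-observed ratio of the interannual SD of JJAS means.

    model, obs : arrays of shape (n_years, ...) on a common grid.
    Ratio > 1: the model overestimates interannual variability (e.g. Pacific);
    ratio < 1: it underestimates it (e.g. Indian subcontinent).
    """
    sd_model = np.asarray(model, dtype=float).std(axis=0, ddof=1)
    sd_obs = np.asarray(obs, dtype=float).std(axis=0, ddof=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sd_model / sd_obs
```

A ratio is insensitive to a constant offset in the mean, so it isolates the variability error from the mean bias discussed earlier.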

3.2 Winds at 850 hPa

The mean seasonal (JJAS) wind at 850 hPa is marked by intense south-westerlies over the north Indian Ocean, strong easterlies over the tropical zone (20°S–20°N) of the Pacific basin, and easterlies over the Indian Ocean from the equator to 20°S (Fig. 2a). CFSv2 simulates these mean features well, with differences mainly in the amplitude of the wind field (Fig. 2b). Notably, the equatorial easterlies are much more intense in CFSv2, and their maximum is shifted westward relative to the analysis over the Pacific basin. As these easterlies over the equatorial Pacific form the lower branch of the planetary-scale Walker circulation (Krishnamurti et al. 1973), this may affect ISMR predictability (e.g. Goswami et al. 1999; Soman and Slingo 1997; Pokhrel et al. 2012a). The south-westerlies (westerlies) over the north (equatorial) Indian Ocean are predominantly underestimated (overestimated). The Southern Ocean westerlies (at 30°S) are also overestimated in CFSv2.

Fig. 2 JJAS mean 850-hPa wind speed (m/s; shaded) and direction (vectors) for a NCEP Reanalysis-2 and b CFSv2 (L2 forecast). JJAS seasonal 850-hPa wind speed bias (shaded) and wind direction bias (vectors) for c L5, d L4, e L3, f L2, g L1 and h L0

The lead-wise bias clearly shows the error growth in the CFSv2 simulations (Fig. 2c–h). The south-westerlies over the north Indian Ocean, one of the most important aspects of ISMR prediction, are strongly underestimated at L5 (Fig. 2c), and this underestimation decreases as the lead decreases (Fig. 2h). This is expected, as the target season is closer to the initialization in the L0 forecast than in the L5 forecast. The same holds for the westerly jet over the Southern Ocean (at 30°S), whose overestimation reduces as the lead decreases. The most intriguing aspect is the equatorial wind bias over both the Indian and Pacific Ocean basins, where the bias increases as the lead decreases. This zone lies over 2°S–8°N throughout the Indian Ocean basin, including the maritime continent, and over the equatorial Pacific basin (10°S–10°N; 160°E–140°W). The quantitative values over the study regions (Table 2) make this clearer. The positive wind bias over EEIO (WEPO) increases as the lead decreases and attains its largest value of 0.88 (0.68) m/s at L0. Over EEPO, however, it decreases as the lead decreases and has its smallest value of 0.35 m/s at L0. Thus, EEIO and WEPO behave opposite to EEPO. These equatorial regions play a very important role in equatorial wave dynamics through wind forcing, which excites Rossby and Kelvin waves and aids ocean heat adjustment through thermocline and mixed layer depth changes (Schott et al. 2009). These biases therefore play a significant role in defining SST variability and the subsequent atmospheric response to the SST, and they may hold the key to ISMR predictability. A new aspect thus emerges: not all biases decrease as the lead decreases; some increase.
It is therefore important to know whether the increase of the equatorial wind bias with decreasing lead arises from initialization error or from model error. These two sources of error are discussed in Sect. 6.

Table 2 Seasonal (JJAS) bias of CFSv2 winds at 850 hPa with respect to NCEP-R2 over the three oceanic study regions (EEIO, WEPO and EEPO) at different leads
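The wind figures show the bias both as a speed difference (shaded) and as a vector difference. The two are not the same quantity, as the short sketch below illustrates. It assumes the shaded field is the speed of the seasonal-mean model wind minus that of the reanalysis, which the text does not state explicitly, and the function name is hypothetical.

```python
import numpy as np


def wind_biases(u_mod, v_mod, u_ref, v_ref):
    """Speed bias and vector bias of seasonal-mean winds (m/s).

    Returns (speed_bias, du, dv) where speed_bias = |V_mod| - |V_ref| and
    (du, dv) = V_mod - V_ref is the model-minus-reference wind vector.
    """
    speed_bias = np.hypot(u_mod, v_mod) - np.hypot(u_ref, v_ref)
    du = np.subtract(u_mod, u_ref)
    dv = np.subtract(v_mod, v_ref)
    return speed_bias, du, dv


# A model wind of the right speed but rotated by 90 degrees has no speed bias,
# yet a large vector bias: direction errors appear only in (du, dv).
sb, du, dv = wind_biases(0.0, 10.0, 10.0, 0.0)
print(sb, np.hypot(du, dv))
```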

3.3 Wind at 200 hPa

The upper-level winds at 200 hPa are characterized by the tropical easterly jet (TEJ) over the Indian subcontinent (centered around 10°N) and a very intense upper-level Southern Hemisphere subtropical westerly jet (SSWJ) centered around 30°S (Fig. 3a). The strength of the TEJ is related to monsoon activity over the Indian subcontinent (e.g. Naidu et al. 2011): a strong (weak) TEJ is generally associated with a strong (relatively weak) monsoon over the Indian region (e.g. Chaudhari et al. 2013b). The SSWJ is an integral part of the tropical Hadley circulation, which is mainly thermally driven. An intense SSWJ in the model may lead to weak ISMR through a relatively weak cross-equatorial flow, as also seen in the previous version of the model (CFSv1; Chaudhari et al. 2013b). The Tibetan high, indicated by the upper-level anticyclonic winds over the Himalayan region, is also evident; it is maintained mainly by sensible heating over the elevated terrain (Flohn 1968) and may also reflect convective heating in the atmosphere forced by the uplift of air by the Himalayas (e.g. Yanai and Wu 2005). The location and intensity of the Tibetan anticyclone are important components of the monsoon circulation over the Indian subcontinent. Since the TEJ is located on the southern flank of the Tibetan anticyclone, its strength is directly related to that of the anticyclone (e.g. Raghavan 1973; Krishnamurti and Bhalme 1976, etc.).

Fig. 3 Same as Fig. 2 but for wind at 200 hPa

CFSv2 realistically simulates all the mean features (Fig. 3b); however, the magnitude of the TEJ is underestimated compared with the NCEP reanalysis (Fig. 3a, b). The error in the TEJ decreases with decreasing lead (from L5 to L0; Fig. 3c–h). CFSv2 also replicates the position and strength of the Tibetan high (Fig. 3a, b). The intensity of the SSWJ is overestimated in CFSv2, with only a very marginal decrease in the SSWJ bias from lead to lead (Fig. 3c–h), suggesting that the forecast lead has only a marginal impact on the SSWJ. The quantitative values (Table 3) support this. The bias over all the study regions is smallest at L0; over EEIO and EEPO it first increases until L3 and then decreases to its smallest value at L0. The negative wind bias at L0 is 1.19, 0.83 and 0.61 m/s for the EEIO, WEPO and EEPO regions, respectively. The wind biases at 850 and 200 hPa thus behave differently over EEIO and WEPO: at the lower level (850 hPa) the bias increases with decreasing lead, whereas at the upper level (200 hPa) it decreases. Over EEPO, by contrast, the wind bias at both levels decreases with decreasing lead.

Table 3 Same as Table 2, but for CFSv2 winds at 200 hPa

3.4 SST

SST is one of the most important parameters for the prediction of ISMR (Saha 1970; Shukla 1975), as it controls the ocean's interaction with the atmosphere through various feedbacks. Warmer (colder) SST over the central and eastern equatorial Pacific Ocean (i.e. ENSO events) tends to substantially reduce (enhance) rainfall over the Indian subcontinent during the monsoon season (Sikka 1980; Rasmusson and Carpenter 1983; Shukla 1987; Pokhrel et al. 2012a) through changes in the planetary-scale Walker circulation. Similarly, Indian Ocean SST influences ISMR through the modulation of the regional Hadley circulation during IOD events (Ashok et al. 2001). These SST variations are generally governed by subsurface ocean dynamics and surface heat flux forcing (Shinoda et al. 2004), the latter being mainly controlled by cloud cover.

The seasonal mean SST shows warm tropical waters owing to strong insolation throughout the year, with a warm pool region (SST > 28 °C) over the western Pacific Ocean and the central and eastern equatorial Indian Ocean basins (Fig. 4a). The prevailing tropical easterlies also pile up warm equatorial water over the western Pacific. The interannual SD (overlaid contours) is maximum over the tropical central and eastern Pacific basins. A region of higher SD is also present over the southern Bay of Bengal. As discussed above, the high variability of SST over the eastern Pacific has a great impact on ISMR. Thus, any changes in the mean or SD over these regions at different leads may influence ISMR prediction.

Fig. 4 Same as Fig. 1 but for SST. Here the observational reference is Reynolds SST

CFSv2 realistically simulates the general characteristic features of the observed mean SST (Fig. 4h). However, the simulated SSTs are considerably lower, as evident from the pervasive negative bias over major parts of the Pacific and Indian Ocean basins at all leads (Fig. 4i–n). In spite of the lower mean, the SDs are quite high over the central and eastern equatorial Pacific and over parts of the central Indian Ocean (contours in Fig. 4h). The spatial SST correlation between observations and the model (shaded) increases as the lead decreases (Fig. 4b–g), showing that SST simulations become more realistic throughout the Indian and Pacific basins as the lead decreases. This is also clear from the largest correlations over the study regions (Table 4) at L0 compared with other leads (0.69, 0.71 and 0.75 over EEIO, WEPO and EEPO, respectively). Even the bias in the SD of SST over the central and eastern equatorial Pacific and the equatorial central Indian Ocean keeps decreasing as the lead decreases (figure not shown). The spatial pattern of the SST bias is nearly invariant across leads, with a pervasive cold bias throughout the study region except along the west coasts of North and South America (Fig. 4i–n). Over EEIO, the SST bias increases as the lead decreases until L1 and then decreases to 0.76 °C at L0. Over WEPO, it decreases steadily as the lead decreases and attains its smallest value of 0.27 °C at L0. Over EEPO, it shows no clear lead dependence.

Table 4 Same as Table 1, but for CFSv2 SST’s

The cold SST bias over the north Indian Ocean is mainly due to the underestimation of near-surface specific humidity, which leads to higher evaporation and colder SSTs, as also found in the free run of CFSv2 (Pokhrel et al. 2012b). The warm (cold) bias over the eastern Pacific basin (rest of the region) is due to the underestimation (overestimation) of low-level marine stratocumulus clouds (low- and middle-level stratus clouds; Yoo et al. 2013; figure not shown). Although part of the SST bias may be controlled by ocean dynamics, these systematic cloud biases over these regions are dynamically consistent with the systematic SST biases. The zonal surface wind in the model is stronger than observed beyond L2 (figure not shown). Hence, apart from the radiative component (due to the cloud cover bias), the turbulent heat flux in terms of latent heat may also be a major contributor to this large negative SST bias.
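The latent-heat argument follows from the standard bulk aerodynamic formula, LH = ρ_a L_v C_E U (q_s − q_a), in which ocean heat loss grows with the surface wind speed U and with the humidity deficit between the saturation humidity at the sea surface q_s and the near-surface air humidity q_a. The sketch below evaluates it with typical textbook constants chosen for illustration (not CFSv2's own flux parameterization) to show why both a too-dry boundary layer and too-strong surface winds cool the SST; the function name is hypothetical.

```python
def latent_heat_flux(wind_speed, q_sea, q_air, rho_air=1.2, lv=2.5e6, c_e=1.2e-3):
    """Bulk-formula latent heat flux in W/m^2 (positive = heat lost by the ocean).

    wind_speed : near-surface wind speed U (m/s)
    q_sea      : saturation specific humidity at the SST (kg/kg)
    q_air      : near-surface air specific humidity (kg/kg)
    rho_air, lv, c_e : air density (kg/m^3), latent heat of vaporization (J/kg)
                       and transfer coefficient; typical values assumed here.
    """
    return rho_air * lv * c_e * wind_speed * (q_sea - q_air)


base = latent_heat_flux(6.0, 0.020, 0.016)
drier_air = latent_heat_flux(6.0, 0.020, 0.014)      # underestimated q_air
stronger_wind = latent_heat_flux(7.0, 0.020, 0.016)  # overestimated U
print(round(base, 1), round(drier_air, 1), round(stronger_wind, 1))
```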

4 Indian summer monsoon indices

To assess the performance of the model quantitatively in simulating ISMR and its characterizing metrics at different leads, three well-known indices are used, each representing a different aspect of ISMR. The first is the ISMR index (Parthasarathy et al. 1995), the rainfall averaged over the Indian landmass; the second is the Webster–Yang index (WY; Webster and Yang 1992), the vertical shear of zonal wind between 850 and 200 hPa averaged over 0°–20°N, 40°E–110°E; and the third is the Niño3.4 index, the SST averaged over the Niño3.4 region (170°W–120°W, 5°S–5°N). The ISMR index is one of the most basic and widely used indices of overall monsoon performance. The WY index is a circulation index describing broad-scale South Asian monsoon variability, which is primarily driven by convective heat sources. Likewise, the Niño3.4 index represents SST variability over the central equatorial Pacific, and its proper simulation implies the possibility of better teleconnections through the atmospheric response. Better representation of both the WY and Niño3.4 indices may lead to better prediction of ISMR.
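The two field-based indices can be sketched as box averages of gridded fields. This is a minimal illustration, assuming regular grids with longitudes in degrees east, cosine-latitude weighting, and zonal winds at 850 and 200 hPa for the WY shear; the function names are hypothetical, and the ISMR index is omitted because it requires an Indian land mask that is not specified here. For anomaly indices, subtract each index's climatology over the hindcast period.

```python
import numpy as np


def box_average(field, lat, lon, lat_range, lon_range):
    """Cosine-latitude weighted mean over a box; lon given in degrees east."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float) % 360.0
    jj = (lat >= lat_range[0]) & (lat <= lat_range[1])
    ii = (lon >= lon_range[0]) & (lon <= lon_range[1])
    sub = np.asarray(field, dtype=float)[..., jj, :][..., ii]
    w = np.cos(np.deg2rad(lat[jj]))[:, None] * np.ones((1, ii.sum()))
    return (sub * w).sum(axis=(-2, -1)) / w.sum()


def webster_yang(u850, u200, lat, lon):
    """WY index: U850 - U200 averaged over 0-20N, 40-110E (m/s)."""
    return box_average(np.subtract(u850, u200), lat, lon, (0.0, 20.0), (40.0, 110.0))


def nino34(sst, lat, lon):
    """Nino3.4 index: SST averaged over 5S-5N, 170W-120W (190-240E)."""
    return box_average(sst, lat, lon, (-5.0, 5.0), (190.0, 240.0))
```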

Taylor diagrams are used to compare the relative performance of these indices among different leads (Fig. 5a–c). A Taylor diagram summarizes correlations, centered root-mean-square differences and the ratio of variances (Taylor 2001). The radial distance from the origin is the SD of the model field normalized by the SD of the observations. The reference point represents the truth, i.e. the position of the observations relative to the data being compared, and the distance from the reference point to a plotted point gives the centered root-mean-square difference (RMSE). The correlation between the model and the observations is the cosine of the polar angle. Thus, the lead with the largest correlation, the smallest RMSE and a comparable variance lies closest to the observations. Several studies have used Taylor diagrams to compare model statistics (e.g. Pokhrel et al. 2013).
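The geometry of the diagram rests on the law of cosines, E'² = σ_m² + σ_o² − 2σ_mσ_oR (Taylor 2001), linking the centered RMS difference E', the two SDs and the correlation R. A minimal sketch of the three plotted statistics for one index follows; it assumes population (ddof=0) moments and normalization by the observed SD, and the function name is hypothetical.

```python
import numpy as np


def taylor_stats(model, obs):
    """Normalized SD, correlation and normalized centred RMS difference.

    model, obs : 1-D series of equal length (e.g. 28 JJAS values of an index).
    SD and RMS difference are divided by the observed SD, as on a
    normalized Taylor diagram.
    """
    m = np.asarray(model, dtype=float)
    o = np.asarray(obs, dtype=float)
    ma, oa = m - m.mean(), o - o.mean()        # remove each series' mean
    sd_m, sd_o = ma.std(), oa.std()            # population SDs (ddof=0)
    corr = (ma * oa).mean() / (sd_m * sd_o)
    crmsd = np.sqrt(((ma - oa) ** 2).mean())   # centred RMS difference
    return sd_m / sd_o, corr, crmsd / sd_o


rng = np.random.default_rng(0)
obs = rng.normal(size=28)
model = 0.8 * obs + rng.normal(scale=0.5, size=28)
s, r, e = taylor_stats(model, obs)
# The plotted point sits at radius s and polar angle arccos(r); its distance
# from the reference point (1, 0) equals e by the law of cosines.
print(np.isclose(e ** 2, s ** 2 + 1.0 - 2.0 * s * r), np.degrees(np.arccos(r)))
```

Because the RMS difference is centered, a constant offset between model and observations does not move a point on the diagram; mean biases have to be reported separately, as in Tables 1–4.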

Fig. 5 Taylor diagrams for all leads of CFSv2 for a ISMR, b WY index and c Niño3.4 index

The ISMR skill at all leads, in terms of correlation, ranges from 0.43 to 0.64 (also shown in Table 5), and the variance explained ranges from 55 to 75 % (Fig. 5a). L3 has the maximum correlation of 0.64 and explains 65 % of the observed variance. L1 explains the same fraction of variance as L3, with a marginally lower correlation of 0.60. In terms of representing variance, L2 performs best, explaining almost 75 % of the observed variance, but its correlation is only 0.49. Thus, considering correlation and variance together, L1 and L3 give the best ISMR skill scores, and of the two, L3 is more robust. To check the consistency of the maximum correlation at L3, we also used the IMD one-degree gridded rainfall data (Rajeevan et al. 2006b) and the CPC Merged Analysis of Precipitation (CMAP) data (Xie and Arkin 1997). Both datasets give the same result (Table 5): the largest correlation is at L3 (0.57 and 0.42), followed by L1 (0.51 and 0.34), for IMD and CMAP rainfall, respectively.

Table 5 Skill of ISMR with respect to three independent data, viz. GPCP, CMAP and IMD

In terms of the circulation index (WY), L1 performs best in both correlation and variance: it explains exactly the same variance as the reanalysis field and has the largest correlation, 0.70 (Fig. 5b). L0 also performs well, with a correlation of 0.67 and 95 % of the reanalysis variance. L4 performs worst, with a correlation of 0.28 and 78 % of the reanalysis variance. L5 underestimates the variance, whereas L2 and L3 overestimate it. The SST-based index (Niño3.4) is realistically captured by L0 and L1, with correlations of 0.80 and 0.77, although the variance is overestimated by 25 and 20 %, respectively (Fig. 5c). Thus, considering all aspects of monsoon performance in terms of precipitation, circulation and SST skill scores, L1 performs best overall, whereas in terms of ISMR prediction skill, L3 performs best.

5 Spring predictability barrier (SPB)

The spring predictability barrier is known to be one of the factors responsible for the decrease in ENSO forecast skill across the spring season (Webster and Yang 1992; Duan and Wei 2013), and it thereby affects ISMR skill. The rapid seasonal transition of the monsoon circulation in spring, together with a weak mean east–west SST gradient and very weak ocean–atmosphere coupling, leads to the spring predictability barrier (Webster and Yang 1992; Webster 1995). During spring, the SST anomalies that constitute the predictable signal are relatively small and are swamped by larger atmospheric and oceanic noise (Xue et al. 1994; Chen et al. 1995). The SPB is thus an intrinsic physical property of ENSO forecasting (Samelson and Tziperman 2001), and it consequently carries over into ISMR forecasting. CFSv2 realistically simulates the SPB in both the Niño3.4 and IOD indices (particularly at L3), indicating its robustness in ENSO forecasting (figure not shown). Despite the presence of the SPB in the ENSO and IOD signals, ISMR prediction skill is maximum at L3; thus, phenomena other than ENSO and IOD may control ISMR prediction in CFSv2.

5.1 Cross correlation

The cross correlation of both the ENSO and IOD indices with ISMR is shown in Fig. 6a–c. The first two panels (Fig. 6a, b) show the cross correlation for the CFSv2 simulations for all start times and all lead months. For example, the first column of Fig. 6a shows, from bottom to top, the correlation of the 1st, 2nd, 3rd, …, 10th month's Niño 3.4 index with ISMR (JJAS rainfall) in the L5 simulations; the second column shows the same for L4, and so on up to L0. The third panel (Fig. 6c) shows the observed correlations. For the observations there is a single correlation with ISMR for each month's Niño 3.4 index, so it is shown as a line plot, with the abscissa representing the months of the year and the ordinate the correlation.
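The layout of Fig. 6a, b can be sketched as a matrix of correlations with one column per start time and one row per forecast month. This is a minimal sketch, assuming the hindcast index is stored as an array of shape (start times, forecast months, years) and ISMR as (start times, years); `cross_correlation` and `LEADS` are hypothetical names.

```python
import numpy as np

# Hypothetical column order, matching the left-to-right order described for Fig. 6a, b
LEADS = ["L5", "L4", "L3", "L2", "L1", "L0"]


def cross_correlation(index, ismr):
    """Correlate each forecast month's index with the same run's ISMR.

    index : (n_leads, n_months, n_years) monthly index (e.g. Nino3.4) per start time
    ismr  : (n_leads, n_years) JJAS rainfall from the same hindcasts
    Returns (n_months, n_leads): row m = forecast month m + 1, column = start time.
    """
    index = np.asarray(index, dtype=float)
    ismr = np.asarray(ismr, dtype=float)
    n_leads, n_months, _ = index.shape
    out = np.empty((n_months, n_leads))
    for k in range(n_leads):
        for m in range(n_months):
            out[m, k] = np.corrcoef(index[k, m], ismr[k])[0, 1]
    return out
```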

Fig. 6

Cross correlation as a function of start time and lead month between CFSv2-simulated ISMR and a CFSv2-simulated JJAS-averaged Niño3.4 SST and b CFSv2-simulated IOD index. The area between the two slanting lines represents the JJAS season. c The same cross correlation between GPCP ISMR and the Reynolds Niño3.4 and IOD indices. Correlations significant at the 99 % confidence level are marked by asterisks

The CFSv2-simulated Niño 3.4 index shows statistically significant negative correlation with ISMR in the JJAS season for all initial conditions, similar to the observations. For the IOD index, although the observed correlations are positive and significant in July (Fig. 6c), the CFSv2 correlations have the opposite sign (negative instead of positive; Fig. 6b). This indicates that CFSv2 is unable to reproduce the observed relationship between the IOD index and ISMR. This may be due to a very strong ENSO signal in the model (as indicated by the significant negative cross correlation) that dominates the weak model-simulated IOD signal. It is therefore not meaningful to study the skill of the two modes together; rather, the pure ENSO signal (i.e. ENSO without co-occurring IOD) and the pure IOD signal (i.e. IOD without co-occurring ENSO) must be examined separately.
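Separating pure ENSO from pure IOD years could be done along the following lines. The one-standard-deviation threshold and the function name `pure_event_years` are hypothetical choices for illustration; this section does not specify how events are defined.

```python
import numpy as np


def pure_event_years(nino34, iod, threshold=1.0):
    """Flag years with ENSO but no IOD event, and IOD but no ENSO event.

    nino34, iod : 1-D seasonal index values, one per year.
    threshold   : hypothetical event threshold in standard deviations.
    Returns two boolean arrays (pure_enso, pure_iod).
    """
    nino34 = np.asarray(nino34, dtype=float)
    iod = np.asarray(iod, dtype=float)
    z_enso = (nino34 - nino34.mean()) / nino34.std(ddof=1)
    z_iod = (iod - iod.mean()) / iod.std(ddof=1)
    enso_event = np.abs(z_enso) >= threshold   # either ENSO phase
    iod_event = np.abs(z_iod) >= threshold     # either IOD phase
    return enso_event & ~iod_event, iod_event & ~enso_event
```

Years in which both indices exceed the threshold (co-occurring events) fall into neither group.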

5.2 Skill

The simultaneous correlations between the model-simulated and observed Niño 3.4 and IOD indices show that ENSO prediction in CFSv2 is skillful (Fig. 7a, b). All lead months of all start times are significantly correlated with the observed Niño 3.4 index, indicating that the start time is immaterial for ENSO prediction. This is not true for the IOD index: its hindcast skill is significant only up to a 1-month lead for December to April initializations, whereas for May initial conditions it remains significant up to a 6-month lead. Although both ENSO and IOD have their best skill for the May start time (i.e. L0), ISMR skill is highest at L3. This suggests that processes other than ENSO and IOD are involved in dictating ISMR skill at L3 in the CFSv2 simulations.
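Fig. 7 marks correlations significant at the 99 % level. The paper does not state which test was used; the sketch below assumes a two-sided test based on the Fisher z transform, and the names `critical_r` and `skill_matrix` are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def critical_r(n_years, confidence=0.99):
    """Two-sided critical |r| from the Fisher z approximation (assumed test)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.tanh(z / math.sqrt(n_years - 3))


def skill_matrix(model, obs, confidence=0.99):
    """Correlation skill per start time and lead month, plus a significance mask.

    model, obs : (n_leads, n_months, n_years) hindcast and observed index values
                 for the same target months.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_leads, n_months, n_years = model.shape
    r = np.empty((n_leads, n_months))
    for k in range(n_leads):
        for m in range(n_months):
            r[k, m] = np.corrcoef(model[k, m], obs[k, m])[0, 1]
    return r, np.abs(r) >= critical_r(n_years, confidence)
```

For the 28 hindcast years (1982–2009), `critical_r(28)` is roughly 0.47 under this approximation.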

Fig. 7

Skill as a function of start time and lead month: correlation between observed values and CFSv2 simulations for a the Niño3.4 index and b the IOD index. The area between the two slanting lines represents the JJAS season. Correlations significant at the 99 % confidence level are marked by asterisks

6 Forecast and predictability errors

The predictability of a model is limited by two types of error. The first is model error, i.e. error arising from imperfections in the model such as its parameterization schemes, resolution, truncation and poor representation of physical processes; the second is error arising from the initialization (DK10). Forecast error is the deviation of the model results from the observations (as discussed so far) and therefore represents error from all possible sources (model physics, dynamics, initial conditions, etc.). Predictability error, in contrast, arises from errors in the initial conditions alone, assuming the model to be perfect (DK10). The sensitivity of predictions to initialization may therefore be better represented by the predictability error than by the forecast error. Here, the predictability error is represented by the deviation between two of the model's forecasts initialized 1 month apart (i.e. L5 predictability error = L5 − L4, etc.). To express these deviations in absolute terms, we compute their root-mean-square error (RMSE). Suppose $X_{i,j}$ is the prediction of a variable for the ith season (i = 1 to N, covering the N = 28 seasons 1982–2009) at the jth lead time in months (j = 5, 4, 3, 2, 1), and $O_i$ is the corresponding observed value; then the forecast error (FE) and predictability error (PE) at the jth lead time are

$$FE_{j} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(X_{i,j} - O_{i}\right)^{2}}$$
(1)
$$PE_{j} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(X_{i,j} - X_{i,j-1}\right)^{2}}$$
(2)
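Equations (1) and (2) amount to root-mean-square differences averaged over all N seasons. A minimal sketch, assuming the hindcasts for L0–L5 are stacked along the first axis (row j = lead Lj) with years on the last axis, so the same code works for a single index or a gridded map; `forecast_and_predictability_error` is a hypothetical name.

```python
import numpy as np


def forecast_and_predictability_error(X, O):
    """FE_j and PE_j for j = 1..5 (L1..L5), following Eqs. (1) and (2).

    X : (6, ..., N) hindcasts, X[j] = lead Lj, years on the last axis.
    O : (..., N) observed values for the same seasons.
    Returns two dicts keyed by lead j; values are scalars or maps.
    """
    X = np.asarray(X, dtype=float)
    O = np.asarray(O, dtype=float)
    fe, pe = {}, {}
    for j in range(1, X.shape[0]):
        # Eq. (1): deviation of lead j from the observations
        fe[j] = np.sqrt(np.mean((X[j] - O) ** 2, axis=-1))
        # Eq. (2): deviation between forecasts initialized 1 month apart
        pe[j] = np.sqrt(np.mean((X[j] - X[j - 1]) ** 2, axis=-1))
    return fe, pe
```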

A similar approach has been used in previous studies (e.g. DK10). Here we identify the regions where initialization may lead to significant error in the monsoon seasonal forecast. Even with a perfect model, a small (and unavoidable) error in the initial conditions can spoil the predictability. Moreover, a prediction from a dynamical system may be more accurate for some sets of initial conditions than for others (e.g. Kleeman 2002).

6.1 Rainfall

In any non-linear system, initial errors grow with time and may eventually make the system unpredictable (Lorenz 1963). In CFSv2, however, the forecast error is much larger than the predictability error at the same lead, implying that model errors dominate and initial condition errors are of secondary importance (Fig. 8a–j). All the major rain bands, including the ITCZ, the SPCZ, both convergence zones associated with the Indian monsoon (viz. the continental and oceanic tropical convergence zones) and the Western Ghats of India, have significant forecast error at all leads (Fig. 8a–e). This indicates that the cumulus parameterization and grid-scale cloud microphysics schemes used in the model may introduce errors into the rainfall simulation. These errors may be further augmented by errors in the predicted large-scale circulation and in the forecast fields that influence rainfall (viz. SST), which supply erroneous inputs to these parameterization schemes.
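The growth of a small initial error described by Lorenz (1963) can be illustrated with his three-variable system. This is purely illustrative and unrelated to CFSv2 itself: it integrates the Lorenz-63 equations with the standard parameters (σ = 10, ρ = 28, β = 8/3) using a fourth-order Runge–Kutta step, starting from two states that differ by 1e-8, and records their separation. The function names are hypothetical.

```python
import numpy as np


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Tendencies of the Lorenz (1963) system."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = lorenz63(state)
    k2 = lorenz63(state + 0.5 * dt * k1)
    k3 = lorenz63(state + 0.5 * dt * k2)
    k4 = lorenz63(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def error_growth(eps=1e-8, dt=0.01, n_steps=3000):
    """Separation between a control run and a run perturbed by eps in x."""
    a = np.array([1.0, 1.0, 1.0])
    b = a + np.array([eps, 0.0, 0.0])
    sep = [eps]
    for _ in range(n_steps):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        sep.append(float(np.linalg.norm(a - b)))
    return np.array(sep)
```

The separation grows by many orders of magnitude before saturating at the size of the attractor, which is the sense in which even a perfect model loses predictability.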

Fig. 8

Forecast error of rainfall at a L5, b L4, c L3, d L2, e L1 and Predictability error of rainfall at f L5, g L4, h L3, i L2 and j L1

Predictability error is concentrated mostly over the equatorial western Pacific and the eastern Indian Ocean (Fig. 8f–j). Overall, the predictability error decreases as the lead increases. This indicates that when the model is initialized at sufficiently long leads, forecasts started 1 month apart evolve to states that are very close to each other. This may be due to the long-term inertia of the slowly varying boundary conditions, which is the basic premise of seasonal prediction (Charney and Shukla 1981). Moore and Kleeman (1997) argued that variance growth in an ensemble prediction system for ENSO depends on the ENSO phase from which the initial condition comes: for initial conditions taken from a high-amplitude ENSO phase, instability is reduced and predictability is high. Since ENSO amplitude typically peaks in boreal winter, this may explain why the predictability error is smallest at L5 (December initial conditions). A similar feature was also found by DK10 in the previous version of the same model. Interestingly, L2 has the largest predictability error, concentrated over the equatorial western Pacific (including the Maritime Continent), the eastern equatorial Indian Ocean, the Indian land mass (particularly central India and the Himalayan foothills) and the Western Ghats of India. L2 (L1) consists mostly of March (April) initial conditions, with some initial states from early April (May). The April–May period is marked by changing regimes of the mean winds (e.g. westerly wind bursts within the prevailing easterlies over the equatorial Pacific Ocean), which may be among the factors responsible for the predictability error when the model is initialized at this time. This error may be associated with the SPB effect on ENSO discussed in Sect. 5.

Quantitative estimates of both forecast and predictability errors over the study regions are given in Table 6. Within each region, the forecast error is nearly constant across leads, as its range is small compared with its mean. The forecast error is highest over ILR (maximum 3.90 mm/day) and lowest over EEPO (maximum 1.84 mm/day). Thus, the model simulates rainfall much better over the eastern Pacific, the main domain of ENSO dynamics, than over the Indian land region, which poses a great challenge for ISMR prediction. The predictability error generally contributes most at L1 and L2 and least at L5 over all study regions. In terms of its percentage contribution to the forecast error, the predictability error is largest over WEPO (26–37 %) and smallest over ILR (13–22 %). Thus, up to about one-third of the total rainfall error may arise from initial condition error, which is a significant portion of the total error.
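Regional values such as those in Table 6 could be obtained by area-averaging the error maps over each region and then expressing PE as a percentage of FE. The sketch below assumes cosine-latitude weighting and inclusive latitude-longitude boxes; neither the weighting nor the region bounds are given in this section, so `regional_mean`, `percent_contribution` and any box passed in are hypothetical.

```python
import numpy as np


def regional_mean(field, lat, lon, box):
    """Cosine-latitude weighted mean of a (nlat, nlon) map over a box.

    box = (lat_min, lat_max, lon_min, lon_max) in degrees (hypothetical bounds).
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    in_lat = (lat >= box[0]) & (lat <= box[1])
    in_lon = (lon >= box[2]) & (lon <= box[3])
    sub = np.asarray(field, dtype=float)[np.ix_(in_lat, in_lon)]
    # Grid-cell area shrinks with cos(latitude) on a regular lat-lon grid
    w = np.cos(np.deg2rad(lat[in_lat]))[:, None] * np.ones(in_lon.sum())
    return float(np.sum(sub * w) / np.sum(w))


def percent_contribution(pe, fe):
    """Predictability error as a percentage of forecast error."""
    return 100.0 * pe / fe
```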

Table 6 Seasonal (JJAS) forecast and predictability errors over the four study regions at different leads

6.2 Zonal wind 850 hPa

Most of the forecast error is concentrated over the equatorial zone, most prominently over the Indian Ocean and western Pacific basins (Fig. 9a–e). There is also significant model forecast error over the southern Pacific Ocean and along the Western Ghats of India. As the lead decreases, the forecast error decreases over the Western Ghats and increases over the equatorial western Pacific, whereas over the southern Pacific Ocean it first increases, peaks at L2 and then decreases. This points to a large model deficiency in simulating the zonal wind over the equatorial Indian Ocean, where the error appears insensitive to initialization. Predictability error is present mostly over the equatorial western Pacific; it increases as the lead decreases, peaks at L2 and then decreases (Fig. 9f–j). The same holds for the equatorial eastern Indian Ocean. This implies that at L2 a larger share of the total error over the equatorial western Pacific depends on initialization error (Fig. 9i). This likely reflects the model's inability to capture the sudden westerly bursts within the dominant easterlies, which may be related to the spring predictability barrier (Webster and Yang 1992), in particular for the March (L3) and April (L2) leads. The same feature appears in the predictability error of rainfall, cloud (figure not shown) and SST, indicating dynamical consistency within the model. Table 7 gives these errors for the 850 hPa zonal wind over the three oceanic regions. Predictability error contributes most over WEPO compared with EEIO and EEPO, reaching up to 45 % at L2. This implies that initialization, too, is a major source of error over these regions. Furthermore, EEPO has the smallest forecast error (1.63–1.91 m/s) compared with both EEIO (2.66–3.24 m/s) and WEPO (2.06–2.54 m/s).

Fig. 9

Same as Fig. 8, but for 850 hPa zonal wind

Table 7 Same as Table 6, but for zonal wind at 850 hPa for three oceanic regions

6.3 SST

As for precipitation and 850 hPa zonal wind, the SST forecast error is large compared with the predictability error (Fig. 10a–j). Most of the error is concentrated over the subtropics rather than the tropics, except for a very narrow strip over the central and eastern equatorial Pacific and the western Arabian Sea (off the Somali coast). Surprisingly, the tropical (subtropical) forecast error increases (decreases) as the lead decreases. This indicates that the subtropics may be more important at longer leads and the tropics at shorter leads. A significant part of the SST forecast error results from an improper surface energy budget due to poor representation of clouds (Zheng et al. 2011). The forecast error over the eastern Pacific near the Peruvian coast is largely attributable to the under-representation of marine stratocumulus clouds, which reflect most of the incoming shortwave radiation (e.g. Yoo et al. 2013). This deficiency is common to most current coupled models (Zheng et al. 2011). Ocean dynamics, through too little upwelling of cold subsurface water, may also play a role. Elsewhere, overestimation of model-simulated low- and middle-level stratus/stratocumulus clouds may give rise to a cold SST bias. The predictability errors are confined mainly to the eastern and central equatorial Pacific, the north and northwest Pacific, and the Arabian Sea (Fig. 10f–j). For SST (Table 8), the forecast error is smallest over WEPO (0.51–0.61 °C) compared with EEIO (0.72–0.90 °C) and EEPO (0.85–0.95 °C). However, the percentage contribution of the predictability error is largest over WEPO (26–55 %); at L2, the initialization error there even exceeds the model error. Accurate initial conditions can thus matter more than model error at particular locations and leads.

Fig. 10

Same as Fig. 8, but for SST

Table 8 Same as Table 7, but for SST

7 Summary and conclusions

This study explores the sensitivity of seasonal retrospective forecasts of summer (JJAS) rainfall and SST in the NCEP CFSv2 to initial conditions, in terms of lead time. Its main focus is to identify in detail the regions and factors responsible for ISMR predictability in terms of model errors and initial condition errors. In general, ISMR predictability is controlled by SST variability generated in the Indian and Pacific Ocean basins, and our study region covers both basins.

The model climatology, in terms of the seasonal mean and interannual standard deviation, is realistic and very close to the observed for rainfall, upper- and lower-level circulation and SST at all leads, implying that CFSv2 predictions are consistent across leads. Seasonal biases in precipitation, large-scale circulation at lower and upper levels, and SST tend to decrease as the forecast lead time decreases, clearly exhibiting model drift. The dynamical consistency of the biases among all variables augurs well for their rectification, as these biases may be traced back to particular parameter(s) contributing a major part of the model error (Slingo and Palmer 2011). Considering each parameter separately, model-simulated rainfall tends to be overestimated (underestimated) over ocean (land). Overall, rainfall is most skillful at the shortest lead (L0), as indicated by the large grid-wise average correlation at L0. Bias, however, does not in general follow the same trend. Over the Indian land region, the correlation is highest (0.21) at both L3 and L0, but L3 has the largest bias (−2.35 mm/day) and L0 the smallest (−1.2 mm/day) of all leads. Similarly, the bias of the 850 hPa wind increases (decreases) as the lead decreases over the EEIO and WEPO (EEPO) regions, whereas the bias of the 200 hPa wind always decreases as the lead decreases over all study regions. SST correlations increase steadily with decreasing lead, but the model's cold SST bias seems pervasive and similar at all leads, with some regional exceptions. Regions of positive (negative) SST bias correspond to negative (positive) total cloud cover bias (figure not shown), which clearly hints at the role of total cloud cover in modulating SST through the surface energy budget. Thus, the general improvement in skill with decreasing lead across most parameters indicates that CFSv2 has its best forecast skill at L0; the Indian subcontinent, however, stands out as an exception.
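The two diagnostics summarized above, the climatological bias and the grid-wise average correlation, can be sketched for a single lead as follows. This assumes seasonal-mean fields stacked as (years, lat, lon) and a simple unweighted average of grid-point correlations; the text does not say whether area weighting was used, and `bias_and_mean_gridpoint_corr` is a hypothetical name.

```python
import numpy as np


def bias_and_mean_gridpoint_corr(model, obs):
    """Bias map and mean grid-point correlation for one lead.

    model, obs : (n_years, nlat, nlon) seasonal (e.g. JJAS) mean fields.
    Returns (bias map = model climatology minus observed climatology,
             unweighted mean over grid points of the interannual correlation).
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    bias = model.mean(axis=0) - obs.mean(axis=0)
    ma = model - model.mean(axis=0)      # model anomalies at each grid point
    oa = obs - obs.mean(axis=0)          # observed anomalies at each grid point
    denom = np.sqrt((ma ** 2).sum(axis=0) * (oa ** 2).sum(axis=0))
    corr = (ma * oa).sum(axis=0) / denom
    return bias, float(np.nanmean(corr))  # NaN where a series is constant
```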

ISMR skill is highest at L3, followed by L1, against all three independent observational datasets; the L3 skill scores are 0.64, 0.42 and 0.57 with respect to GPCP, CMAP and IMD rainfall, respectively. However, the two most important monsoon indices, the large-scale zonal wind shear (WY index) and the Niño3.4 SST index, show maximum skill at L1 and L0, respectively. Further, there is a strong spring predictability barrier in the Niño3.4 and IOD indices at L3. That ISMR skill peaks at L3, despite the significant SPB at L3 and the better WY and Niño3.4 indices at L1 and L0, adds another dimension to the problem. The predictability of ISMR may thus be controlled by factors other than ENSO and IOD, which requires further detailed study. This is addressed in another manuscript in terms of both diagnostic and prognostic potential predictability using initial SST, snow and soil moisture conditions (Saha et al. 2015).

Furthermore, in CFSv2 most of the error arises from model imperfection, whereas the initial conditions (as measured by the predictability error) contribute substantially less. This was also the case in the previous version of the model (DK10). Regions of high forecast error coincide with regions of large negative or positive model bias. In general, the forecast error decreases as the lead decreases, except over a few regions. Simulations at longer leads tend to have the smallest predictability error. The percentage contribution of initial condition error to the total model error increases as the lead decreases and generally peaks at L2. The SST predictability error is large over the eastern equatorial Pacific and the central North Pacific. Almost all parameters show the maximum predictability error at L2 over the equatorial western Pacific, which may be very important for ISMR prediction because of this region's strong teleconnections with the Indian monsoon. As with the bias, the predictability errors of rainfall, 850 hPa zonal wind and SST are dynamically consistent.

This study may help identify the regions where model error and initial condition error significantly affect ISMR prediction. It also raises further questions and paves the way for a detailed study of why ISMR prediction skill is highest at L3.