1 Introduction

The spatio-temporal variability of Indian monsoon rainfall has received enormous attention since the early studies of Blanford (1884) and Walker (1910) more than a century ago. Variability and predictability are often estimated using the India-averaged seasonal rainfall total over the four wettest months, from June to September. This index is usually referred to as “All India Rainfall” (AIR hereafter), with drought and flood years defined as departures of ±10% from the long-term mean (Mooley and Parthasarathy 1984; Parthasarathy et al. 1993, 1995; Gadgil 2003). Spatio-temporal averaging reduces local-scale variations while integrating any common or recurrent variations among stations or grid-points across the season. The integration in space and time reduces the unpredictable weather noise in the context of boundary-forced seasonal climate predictability, helping to isolate large-scale climate signals (Charney and Shukla 1981; Meehl 1987, 1997; Shukla and Mooley 1987; Palmer et al. 1992; Shukla 1998).

The seasonal predictability of AIR when estimated from antecedent sea surface temperature (SST) anomalies is small to moderate (Hastenrath 1988; Gowariker et al. 1991; Palmer et al. 1992; Webster et al. 1998; Sperber et al. 2000; Oelfke Clark et al. 2000; Wang et al. 2015; Li et al. 2016) and may be epoch dependent (Krishna Kumar et al. 1999). This is usually attributed to the weak and variable nature of El Niño Southern Oscillation (ENSO) teleconnections (Krishna Kumar et al. 1999), as well as to the spring predictability barrier of ENSO (Balmaseda et al. 1995). It may also be partly because the seasonal rainfall sum at a location is not necessarily the most predictable statistic of daily rainfall. If, for example, the wettest days of the season are dominated by local noise, those large random amounts can diminish the predictability of the seasonal rainfall total. Large daily amounts falling near the peak of the monsoon may hide spatially coherent, predictable interannual signals occurring near the start or end of the season.

Spatial coherence gives an empirical upper bound of potential predictability, if we assume that any large-scale forcing should imprint a quasi-homogeneous anomaly at regional scale. Previous studies (Webster and Yang 1992; Rupa Kumar et al. 1992; Goswami and Xavier 2005; Xavier et al. 2007; Moron et al. 2012; Rajagopalan and Molnar 2012, 2014) have shown that potential predictability may be larger at the beginning or end of the monsoon season. For example, Rajagopalan and Molnar (2012) demonstrated that for central India and its west coast, early (May 15–June 20) and late (September 15–October 20) rainfall anomalies are significantly correlated with Pacific Ocean SST anomalies. Moron et al. (2012) found that the strength of the annual cycle, defined from the leading data-adaptive filtered mode of outgoing longwave radiation and low tropospheric winds, is significantly different between anomalously dry and wet monsoon seasons, particularly from mid-June to late July and again during September. Moron and Robertson (2014) have also shown that the leading spatiotemporal mode of local-scale onset date variability across India is related to ENSO, with warm (respectively cold) ENSO events in May delaying (respectively hastening) the local-scale onset, especially along the Western Ghats and over eastern monsoonal India. These previous studies (Moron et al. 2012; Moron and Robertson 2014; Rajagopalan and Molnar 2012, 2014) suggest that spatial coherence and potential predictability peak at the beginning and end of the monsoon season, while decreasing during the core of the season. However, to the best of our knowledge, this has never been shown explicitly.

The goal of this paper is to investigate the potential predictability of Indian monsoon rainfall at fine spatial scales through the analysis of the spatial coherence of interannual rainfall anomalies in different parts of the monsoon season, using a 0.25° daily rainfall dataset covering 1901–2014 (Sect. 2). In Sect. 3 we contrast the early/late parts of the season with the monsoon’s core phase, finding that spatial coherence of interannual anomalies drops during the core phase. We then break down 31-day running averages of rainfall amount into running averages of daily rainfall frequency and mean daily rainfall intensity. It is shown that the spatial coherence of interannual anomalies of mean intensity, as well as its relationship with the leading mode of monsoon circulation, is much smaller than for rainfall frequency. Further, we find that interannual anomalies in rainfall amounts are more strongly controlled by intensity variations during the core phase, explaining the drop in potential predictability then. It is concluded that seasonal (and sub-seasonal) forecasts have more potential to be skillful at local and regional scales during the early and late phases of the monsoon season. A discussion and conclusion close the paper (Sect. 4).

2 Data

2.1 Daily rainfall data

The study uses Indian Meteorological Department (IMD) high-resolution (0.25° × 0.25°) gridded daily rainfall data for April 1 to November 30 of each year from 1901 to 2014 (Pai et al. 2014). The gridded data were prepared by interpolating daily rainfall data from 6955 stations across India (with varying data availability periods) using distance-weighted interpolation (Shepard 1968). The interpolated values were computed as the weighted sum of the station data within a search radius of 1.5°. The scheme was locally modified by including directional effects and barriers (Shepard 1968). Seasonal total rainfall amounts are decomposed into the frequency of wet days (≥1 mm) and the mean intensity of rain (defined as the total amount divided by the number of wet days). The local-scale monsoon onset date is defined following the “agronomical” guidelines (Sivakumar 1988) as the first wet day of the first significant wet spell that is not followed by a dry spell (Boyard-Micheau et al. 2013; Moron and Robertson 2014). This general definition fits the farmers’ practice of waiting for the first wetting rains to allow sowing (Sivakumar 1988). A subsequent dry spell is detrimental to seedlings and signifies a “false” onset (Flatau et al. 2001) associated with isolated rainfall before the start of the monsoon, which is eliminated using the second criterion. In this context, Moron and Robertson (2014) defined the local-scale onset as the first wet day of the first 5-day wet sequence after April 1 that receives at least the climatological mean 5-day wet spell amount over the April 1–November 30 season and is not followed, within the next 30 days, by a 10-day dry sequence receiving less than 5 mm. End dates of the season are defined symmetrically as the last wet day of the last 5-day wet sequence before November 30 without a 10-day dry sequence in the previous 30 days. Both definitions thus filter out isolated wet sequences before and after the main rainy season.
The main findings presented below were not found to be sensitive to omitting the false onset/end criterion (not shown).
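The onset rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the function names (`onset_index`, `_has_dry_spell`) and arguments are hypothetical, the daily series is assumed to start on April 1 (index 0), the climatological mean 5-day wet spell amount is passed in as `spell_amount`, and dry sequences are searched only in windows lying entirely within the 30 days that follow the candidate onset.

```python
def _has_dry_spell(rain, start, dry_len, dry_amount, lookahead):
    """True if any dry_len-day window within the `lookahead` days after
    `start` totals less than dry_amount (mm)."""
    last_day = min(start + lookahead, len(rain) - 1)
    for first in range(start + 1, last_day - dry_len + 2):
        if sum(rain[first:first + dry_len]) < dry_amount:
            return True
    return False


def onset_index(rain, spell_amount, wet_day=1.0, spell_len=5,
                dry_len=10, dry_amount=5.0, lookahead=30,
                check_false_onset=True):
    """Return the 0-based index of the local-scale onset, or None.

    rain         : daily rainfall (mm), index 0 assumed to be April 1
    spell_amount : minimum total (mm) of the spell_len-day wet sequence
                   (assumed here: climatological mean 5-day wet spell amount)
    """
    for start in range(len(rain) - spell_len + 1):
        if rain[start] < wet_day:
            continue  # the onset must fall on a wet day
        if sum(rain[start:start + spell_len]) < spell_amount:
            continue  # the wet sequence is not significant enough
        if check_false_onset and _has_dry_spell(
                rain, start, dry_len, dry_amount, lookahead):
            continue  # followed by a dry sequence: treated as a false onset
        return start
    return None


# Synthetic example: an isolated wet spell, a 12-day dry break, then the monsoon.
example = ([0.0] * 5 + [10.0, 5.0, 0.0, 0.0, 5.0] + [0.0] * 12
           + [8.0, 6.0, 4.0, 3.0, 2.0] + [3.0] * 30)
print(onset_index(example, spell_amount=15.0))
print(onset_index(example, spell_amount=15.0, check_false_onset=False))
```

In the synthetic series the first wet spell is rejected because of the dry break that follows it; switching off `check_false_onset` returns that earlier spell instead, which mimics the sensitivity test without the false onset criterion. The end date could be obtained in the same way by applying the function to the reversed series.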

2.2 Atmospheric data

Atmospheric data were obtained from the NCEP/NCAR reanalysis (Kalnay et al. 1996). Daily winds and temperature at 850 hPa, vertical velocity at 850 and 500 hPa and precipitable water were extracted between 1948 and 2014.

3 Results

3.1 Mean monsoonal evolution

Figure 1 shows the 5-cluster solution of a k-means analysis of the climatological mean seasonal cycle at each grid point, computed as the daily mean over the April–November 1901–2014 period and low-pass filtered using a 30-day recursive filter. The grid-points belonging to a given cluster are almost always contiguous. Most of India is associated with a classical monsoonal regime (black dots in Fig. 1) with an onset around late May to early June (Fasullo and Webster 2003; Gadgil 2003; Moron and Robertson 2014), followed by a steep increase to the annual maximum in late July and then a somewhat slower decrease, with ≤2 mm/day from mid-October. This region is often referred to as “Monsoonal India” (MI hereafter) (Gadgil 2003). Northeastern India, as well as a narrow coastal strip along the Arabian Sea south of 17°N, is associated with a similar regime, but with a longer and more intense rainy season (orange dots in Fig. 1). Rainfall here is notably larger than in the monsoonal cluster at the start of the rainy season in April–May. Northeastern India may be part of the Southeast Asian monsoon, which starts earlier than the Indian monsoon itself (Wang and LinHo 2002; Wang et al. 2004; Yoo et al. 2010; Krishnamurthy and Shukla 2008). The middle interior of peninsular India (red dots in Fig. 1) experiences a much drier monsoonal regime; the onset is gradual here and rainfall remains moderate during summer, with a small peak in late September during the southward retreat of the summer monsoon. This regime is also seen for 5 grid-points in the south of Assam state, near the border with Burma and Bangladesh. The extreme southeast of peninsular India (blue dots in Fig. 1) experiences the northeast monsoon with a relative maximum in late October and early November, while rainfall is weak in summer. The maximum is related to NE winds traversing the warm Bay of Bengal when the ITCZ retreats southward, but also to tropical depressions and cyclones forming there in boreal fall.
Lastly, the far northwest of India and 3 grid-points near Burma are associated with dry conditions (green dots in Fig. 1). The summer monsoon is negligible here and the highest rains fall in boreal spring, and probably in winter (Yadav 2006).

Fig. 1

5-cluster solution of a k-means clustering of the daily mean rainfall, low-pass filtered (cut-off = 1/30 cycle-per-day) by a recursive filter. The low-pass filtered daily means are standardized to zero mean and unit variance and the leading EOFs accounting for 75% of the total variance are subjected to k-means clustering. The classifiability index shows a clear maximum for 5 clusters (CI ≥0.99). The panels show the low-pass filtered mean rainfall (in mm/day) for each cluster (full line in the same color as the grid-points belonging to the cluster), with dashed lines indicating ± one standard deviation amongst the grid-points

The computed mean onset and end dates are displayed in Fig. 2. The mean onset date (Fig. 2a) shows a well-known pattern (Wang and LinHo 2002; Gadgil 2003; Moron and Robertson 2014) across MI (indicated by a black line in Fig. 2), with early onsets across the Western Ghats and over the northeastern part of MI, progressing northwestward toward the Thar Desert. The mean onset date across MI occurs on June 27 (June 15 without false start criteria, not shown). The end of the monsoon (Fig. 2b) is broadly symmetric, starting over the Thar Desert, then contracting to the southeast. The mean end date across MI occurs on September 8 (September 23 without the false end criteria, not shown). Our mean onset and withdrawal dates are thus rather close to the Indian Meteorological Department (IMD) estimates (see criteria and normal dates for onset and withdrawal at http://www.imd.gov.in/pages/monsoon_main.php). The mean onset (resp. withdrawal) dates do disagree over Kerala (resp. NW India close to the Thar Desert), possibly because IMD uses other information besides local-scale rainfall and starts its computation from May 10 for onset and from September 1 for withdrawal; ours are both computed from April 1, with the latest possible withdrawal being set to November 30.

Fig. 2

Climatological average of the summer monsoon onset and end calendar dates (month/day) (see Sect. 2.1). The black contour denotes the boundaries of Monsoonal India (black dots in Fig. 1)

Indian monsoon onset and withdrawal dates have also been estimated using regional-scale atmospheric indices, including low-level winds (Syroka and Toumi 2004), vertically-integrated moisture flux (Fasullo and Webster 2003) and tropospheric thermal gradient (Goswami and Xavier 2005; Xavier et al. 2007). These atmospheric indices refer primarily to the meridional shift of the regional-scale monsoon circulation and the associated rainbelt. The mean onset date is rather similar among these definitions, between May 27 (Syroka and Toumi 2004) and June 4 (Fasullo and Webster 2003), reflecting the abruptness of the start of the monsoon. Late May–early June (light green in Fig. 2a) corresponds to the timing of onset along the Western Ghats between 12°N and 15°N and over western Bengal and coastal Orissa. The mean withdrawal date reported in these studies is more variable, between September 7 (Fasullo and Webster 2003) and October 19 (Syroka and Toumi 2004), consistent with the gradual southward shift of the rainbelt (Sperber et al. 2000; Gadgil 2003). Early September withdrawals (dark green in Fig. 2b) correspond to a large sector in northwest and north-central India, while mid-October withdrawals (yellow and orange in Fig. 2b) are observed over coastal Orissa and at the SE tip of India.

The mean seasonal cycle of rainfall over MI is asymmetric (Sperber et al. 2000; Gadgil 2003; Goswami and Xavier 2005; Xavier et al. 2007) with a steep ramp-up around mid-May and a more gradual decrease from the annual peak around late July and early August (Fig. 3a). Rainfall is close to zero from late October onwards. Five stages can be defined from this mean evolution: pre-monsoon (April 1–May 15), early (May 16–June 30), core (July 1–August 31) and late (September 1–October 31) phases of the summer monsoon, and post-monsoon (November 1–November 30). Figure 3b–e shows the climatological mean of atmospheric variables spatially averaged over MI (after having interpolated the NCEP grid onto a 0.25° grid). The precipitable water (Fig. 3b) as well as the 500 hPa omega vertical velocity (Fig. 3d) exhibit a similar behavior to the rainfall time series, including the asymmetry between the early and late phases. The vertical velocity (Fig. 3c) and temperature (Fig. 3e) at 850 hPa reveal a different behavior, since the pre-monsoon and early monsoon are associated with high temperatures and ascent confined to low levels. The 850-hPa temperature decreases from late May when rain starts (Fig. 3e), while the low-level ascent remains quite similar through the early and core phases. The late phase is associated with a gradual cooling and decreasing ascent (and increasing subsidence from mid-October).

Fig. 3

Spatial average of daily a rainfall (in mm), b precipitable water (in kg/m²), c omega vertical velocity at 850 hPa (in Pa/s) and d at 500 hPa (in Pa/s), and e temperature at 850 hPa (in K) across Monsoonal India (MI, defined from Fig. 1). The black line is the unfiltered daily average while the blue lines are the interannual standard deviations of the MI spatial average of each variable. Atmospheric data from NCEP reanalyses are linearly interpolated onto the 0.25° grid of the rainfall dataset and grid-points above 3000 m a.s.l. are excluded from the spatial average. The vertical red lines show the temporal phases of the summer monsoon (see text). The values shown in each panel are the interannual standard deviations of the time series averaged over each of the 3 sub-seasonal phases

The atmospheric variables shown in Fig. 3 can be physically related to the occurrence and mean intensity of daily rainfall. As elsewhere in the tropics, rainfall across India is mostly convective and is associated with various atmospheric phenomena, from individual thunderstorms, referred to as mesoscale γ by Orlanski (1975) (i.e. horizontal scale between 2 and 20 km), to individual cloud clusters (mesoscale β according to Orlanski (1975), with a horizontal scale between 20 and 200 km), organized mesoscale convective clusters or complexes (MCCs) (Laing and Fritsch 1993), and synoptic systems such as tropical depressions and cyclones (Mooley 1973; Mitra et al. 1997; Goswami et al. 2003; Mohapatra et al. 2003), both belonging to mesoscale α according to Orlanski (1975) (i.e. horizontal scale between 200 and 2000 km), and the tropical convergence zone at macro-scale >2000 km (Godbole 1977; Gadgil 2003; Francis and Gadgil 2006). The occurrence and intensity of these phenomena are modulated by the seasonal cycle as well as intraseasonal oscillations (Webster and Yang 1992; Webster et al. 1998; Goswami et al. 1999; Krishnamurthy and Shukla 2000, 2007, 2008; Goswami and AjayaMohan 2001; AjayaMohan and Goswami 2007; Yoo et al. 2010; Moron et al. 2012).

Complex atmospheric processes are involved in triggering and then sustaining deep convection (Sherwood 1999). Deep conditional instability is required to initiate cumulus development and depends primarily on the lapse rate and humidity of rising air parcels. Numerous studies, mostly over tropical oceans (Bretherton et al. 2004; Peters and Neelin 2006; Neelin et al. 2009; Muller et al. 2009), have demonstrated a non-linear relationship between precipitable water and rainfall at daily time scale. In particular, a moist low- to mid-troposphere helps to trigger deep convection (Sherwood 1999; Sobel et al. 2004), while the upper troposphere tends to moisten after rainfall events. Figure 4a compares the daily values of bin-averaged (bins of 1 mm) precipitable water (PW hereafter) spatially averaged over MI with the distribution of local-scale rainfall across MI. Rainfall is almost absent for MI-averaged PW ≤25–30 kg/m². The mean local-scale rainfall increases slowly for PW less than 34–35 kg/m², then increases faster. This mean increase is also associated with an increasing spread of local-scale daily rainfall, i.e. the same value of spatially-averaged PW is associated with a larger range of local-scale daily rainfall values across the MI domain. For example, with a value of 34–35 kg/m² (i.e. the climatological average for both early and late phases of the monsoon, Fig. 4b, c), the 90th percentile of local-scale daily rainfall is less than 8 mm; with a value of 49 kg/m² (i.e. the climatological average for the core phase of the monsoon, Fig. 4b, c), the 75th and 90th percentiles are around 12 and 28 mm respectively (Fig. 4a). The core of the monsoon season, with its abundant precipitable water, is associated with both larger daily mean rainfall intensity and more variable daily amounts, compared to the early and late stages of the monsoon.
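The binning behind this kind of diagnostic can be illustrated with a short Python sketch. It is a minimal example under assumptions rather than the procedure used for Fig. 4a: the function names are hypothetical, percentiles use the simple nearest-rank rule, and the input is assumed to be one area-averaged PW value per day together with the list of local-scale rainfall values observed on that day.

```python
import math
from collections import defaultdict


def percentile(sorted_values, q):
    """Nearest-rank percentile (q between 0 and 100) of a sorted list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def rainfall_by_pw_bin(pw_daily, rain_daily, bin_width=1.0):
    """Pool local rainfall by bins of area-averaged precipitable water.

    pw_daily[t]   : area-averaged PW on day t (kg/m², equivalent to mm)
    rain_daily[t] : list of local-scale rainfall values (mm) on day t
    Returns {left edge of PW bin: {"mean", "p75", "p90"}}.
    """
    pooled = defaultdict(list)
    for pw, local_rain in zip(pw_daily, rain_daily):
        left_edge = math.floor(pw / bin_width) * bin_width
        pooled[left_edge].extend(local_rain)

    summary = {}
    for left_edge in sorted(pooled):
        values = sorted(pooled[left_edge])
        summary[left_edge] = {
            "mean": sum(values) / len(values),
            "p75": percentile(values, 75),
            "p90": percentile(values, 90),
        }
    return summary


# Tiny synthetic example: two dry-ish days and one moist day.
stats = rainfall_by_pw_bin(
    [30.2, 30.8, 45.5],
    [[0, 0, 1, 2], [0, 0, 0, 3], [0, 5, 10, 40]],
)
for left_edge, s in stats.items():
    print(left_edge, s)
```

Plotting the mean and upper percentiles against the bin edges would give a picture of how both the average and the spread of local rainfall grow with area-averaged PW.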

Fig. 4

a The distribution of local-scale rainfall (ordinate in mm/day) vs spatially-averaged precipitable water (abscissa in kg/m²) over Monsoonal India (MI, defined from Fig. 1) computed on a daily basis. The entire period from April 1 to November 30 over 1948–2014 is taken into account. b Distribution of daily precipitable water spatially-averaged over MI for the early (May 16–June 30), core (July 1–August 31) and late (September 1–October 31) stages of the monsoon. c Distribution of interannual time-averaged precipitable water spatially-averaged over MI for the early, core and late stages of the monsoon

3.2 Interannual variability of daily rainfall characteristics

Figure 5 shows the climatological means of seasonal rainfall amount, occurrence frequency and mean intensity along with the leading empirical orthogonal function (EOF) of their interannual variations. The climatological averages of amount (Fig. 5a) and frequency of wet days (Fig. 5b) have very similar spatial structures (pattern correlation = 0.87 for all India and 0.84 for MI), with the wettest conditions along the Western Ghats and over northeastern India, including the eastern part of MI, as well as along the Himalayan foothills, and the driest ones over the Thar Desert in northwestern MI. Spatial variations are large across MI for both the amount (mean = 875 mm, sd = 499 mm) and rainfall frequency (mean = 57 days, sd = 21 days). This contrasts with the mean daily intensity (Fig. 5c), which is more uniform across MI (mean = 14.8 mm/day, sd = 3.5 mm/day), except on the windward slopes of the Western Ghats (and the Himalayan foothills) that bisect the westerly (southerly-southeasterly) monsoon flow (Francis and Gadgil 2006). The mean daily intensity also tends to be relatively higher over mountains and high plateaus, even the subtle ones south of the Gangetic plain (Fig. 5c). Since the probability distribution of rainfall on wet days broadly follows an exponential distribution, the mean daily intensity gives an indication of the overall distribution of daily rainfall. The inverse of the mean corresponds to the rate parameter of the exponential distribution, while its variance is the square of the mean. Thus, a larger mean intensity directly implies more variance and higher rainfall amounts on extremely wet days, or more frequent extremely wet days above a threshold.
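The link between mean intensity and heavy-rain days under the exponential approximation can be made concrete with a small Python sketch. It is only illustrative: the function name is hypothetical, and a pure exponential distribution is assumed, ignoring the 1 mm lower bound used to define wet days. For an exponential distribution with mean μ, the rate is 1/μ, the variance is μ², and the probability of exceeding a threshold x is exp(−x/μ).

```python
import math


def exponential_summary(mean_intensity, threshold):
    """Rate, variance and exceedance probability for exponentially
    distributed wet-day amounts with the given mean (mm/day)."""
    rate = 1.0 / mean_intensity                 # rate parameter = 1 / mean
    variance = mean_intensity ** 2              # variance = mean squared
    exceedance = math.exp(-rate * threshold)    # P(amount > threshold)
    return rate, variance, exceedance


# Compare two mean intensities for a 50 mm/day threshold:
# 14.8 mm/day is the MI-mean intensity quoted in the text, 10 mm/day is arbitrary.
for mean in (10.0, 14.8):
    rate, variance, p50 = exponential_summary(mean, threshold=50.0)
    print(f"mean={mean:5.1f}  variance={variance:7.2f}  P(>50 mm)={p50:.4f}")
```

Under these assumptions, raising the mean intensity from 10 to 14.8 mm/day roughly doubles the variance and multiplies the probability of a wet day exceeding 50 mm by about five.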

Fig. 5

Upper row: mean climatology of June–September rainfall a amount (mm), b frequency of wet days (≥1 mm; days per 122-day season) and c mean daily intensity (mm/day). The black contour outlines Monsoonal India (MI, defined from Fig. 1) as in Fig. 2. Lower row: first EOF of July–September d amount, e frequency of wet days and f mean daily intensity, expressed as correlations between the first PC and the original variables between 1901 and 2014. The explained variance is indicated in the title of the subpanels (d–f), as well as the percentage of MI area having significant correlations at p = 0.01 (S < 0.01) according to a random-phase test (Janicot et al. 1996)

The leading interannual EOFs of rainfall are similar between the amount (Fig. 5d) and frequency (Fig. 5e) (pattern correlation = 0.76 for all India and 0.58 for MI only) and the highest positive loadings coincide with the geographical extent of the monsoonal regime of Fig. 1, especially for the frequency of wet days (Fig. 5e). The leading EOF explains more variance in frequency than in amount and loadings are higher and more uniform for frequency while amount loadings decrease over northeastern MI (Bedi and Bindra 1980; Singh 1999; Guhathakurta and Rajeevan 2008) (Fig. 5d). In contrast, the leading EOF of mean daily intensity (Fig. 5f) explains only a small amount of variance, with highest loadings located outside MI over Kashmir.

At local scale, year-to-year changes in mean daily rainfall intensity are the largest contributor to the interannual variability of rainfall amounts (Table 1). This can be seen by calculating the spatial average of local correlations between daily mean intensity and amount, which is larger (by about 19%) than that between rainfall frequency and amount. This is especially pronounced (31%) during the core phase. By contrast, the highest correlations between the respective spatial averages, calculated as a standardized anomaly index (SAI; Katz and Glantz 1986), are found between amount and frequency (Row 3 vs 6 of Table 1). This is consistent with the corresponding leading EOFs (Fig. 5d, e). As argued previously (Moron et al. 2007), the regional-scale seasonal (or sub-seasonal) anomalies of amount reflect a greater spatial “synchronization” of anomalies in the frequency of wet days than in mean daily intensity. In other words, an anomalously dry (wet) season across MI as a whole is primarily due to fewer (more) wet days than usual, pooled over locations, while an anomalously dry (wet) season at local scale is primarily associated with less (more) rain on wet days. Spatial averaging and the leading EOF emphasize the common signal among the grid-points, which is the interannual variation of the frequency of wet days. This also explains the small variance explained by the leading EOF of intensity across MI (Fig. 5f), because it reflects the large amount of spatial noise in rainfall intensity.

Table 1 Interannual correlations between rainfall statistics across Monsoonal India (MI, defined from Fig. 1), by phase of the monsoon. All computations are done over the 1901–2014 period

In summary, this section demonstrates that seasonal frequency of wet days conveys the spatially-coherent component of interannual variations of amounts, while the mean daily intensity is dominated by spatial noise which decreases the signal-to-noise ratio of interannual variations in amount. Further, the correlations in Table 1 imply that the seasonal rainfall total at local scale is likely to be less predictable than the seasonal frequency of wet days, and that some spatial aggregation is essential to isolate the potentially predictable signal.
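The two kinds of correlation contrasted in this section (the spatial average of local correlations versus the correlation between spatially averaged indices) can be sketched in Python as follows. This is a minimal illustration, not the code behind Table 1: all function names are hypothetical, the amount is assumed to be summed over wet days only, and the SAI is taken as the spatial mean of local standardized anomalies.

```python
import math


def decompose(daily_rain, wet_threshold=1.0):
    """Split a daily series into (amount, wet-day frequency, mean intensity).
    Assumption: the amount is summed over wet days only."""
    wet = [r for r in daily_rain if r >= wet_threshold]
    amount = sum(wet)
    frequency = len(wet)
    intensity = amount / frequency if frequency else 0.0
    return amount, frequency, intensity


def standardize(series):
    """Standardized anomalies (zero mean, unit variance) of a yearly series."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / sd for x in series]


def pearson(x, y):
    """Pearson correlation between two yearly series."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def sai(field):
    """Standardized anomaly index: spatial mean of local standardized
    anomalies. field[g][y] is the value at grid-point g in year y."""
    z = [standardize(series) for series in field]
    n_years = len(field[0])
    return [sum(zg[y] for zg in z) / len(z) for y in range(n_years)]


def mean_local_correlation(field_a, field_b):
    """Spatial average of grid-point-by-grid-point correlations."""
    pairs = list(zip(field_a, field_b))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def regional_correlation(field_a, field_b):
    """Correlation between the SAIs of two fields."""
    return pearson(sai(field_a), sai(field_b))


print(decompose([0.0, 0.5, 2.0, 10.0, 0.0, 6.0]))
```

Applying `mean_local_correlation` and `regional_correlation` to fields of amount, frequency and intensity built with `decompose` would give the local-scale and SAI correlations, respectively; a constant series at a grid-point would need special handling because its standard deviation is zero.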

3.3 Sub-seasonal modulation of spatial coherence

The sub-seasonal modulation of spatial coherence of rainfall is first examined using correlograms. Figure 6 displays spatial autocorrelation functions for rainfall frequency, mean intensity and amount, during the three sub-seasonal monsoon stages, and for averaging periods of 5, 11, 21 and 31 days. As found in previous studies (Moron et al. 2006, 2007), rainfall frequency spatial autocorrelation values (Fig. 6a, d, g) generally increase with the temporal averaging period, especially for the early and late stages of the monsoon, suggesting a common large-scale forcing acting throughout the monsoon season. The increase of spatial autocorrelation of frequency almost saturates above a duration of 11 days during the core stage, with a decorrelation distance of about 500 km (Fig. 6d). In contrast, there is almost no impact of temporal averaging on mean intensity (Fig. 6b, e, h): the decorrelation distance is always close to 80–100 km throughout the three sub-seasonal stages. This horizontal scale corresponds to cloud clusters, according to Orlanski (1975). The picture for rainfall amount (Fig. 6c, f, i) is intermediate between frequency and intensity. The temporal integration is less efficient during the core (Fig. 6f) than during the early (Fig. 6c) and late (Fig. 6i) stages. These results demonstrate that progressive integration in time, as the season goes by, strengthens the interannual anomalies in rainfall frequency, while anomalies in the daily mean intensity, which are strongly impacted by the wettest days, are largely random and tend to cancel out in time and space. The larger spatial scale for rainfall frequency during the early and late stages may be partly attributable to the onset and end of the monsoon. A delayed onset, for example, will cause fewer wet days than usual during the early stage of the monsoon.

Fig. 6

Spatial autocorrelation (ordinate) vs horizontal distance (abscissa, in km) for rainfall frequency (left), mean intensity (middle), and amount (right). Values are averaged for successive 30-km bins (1–30 km, 31–60 km, etc.) across Monsoonal India (defined from Fig. 1), for early (May 16–June 30), core (July 1–August 31) and late (September 1–October 31) monsoon phases. The different curves denote increasing temporal averaging periods: running 5-day, 11-day, 21-day, 31-day and all season (colors defined in a). The correlations are computed between anomalies (in mm/day) with respect to the climatological mean of each monsoon phase, over the 1901–2014 period. The vertical colored lines give empirical estimates of the decorrelation distance (corresponding to the last correlation ≥1/e). When many decorrelation distances overlap, only the longest time scale is displayed
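A correlogram of this kind, and the associated decorrelation distance, can be sketched in Python as below. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, distances are great-circle distances on a spherical Earth, bins are half-open intervals of 30 km labelled by their upper edge, and the decorrelation distance is taken as the last bin reached before the binned correlation first drops below 1/e.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation between two series of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlogram(coords, series, bin_km=30.0):
    """Mean pairwise correlation per distance bin.

    coords[g] : (lat, lon) of grid-point g, in degrees
    series[g] : interannual anomalies at grid-point g
    Returns {upper edge of bin (km): mean correlation}, sorted by distance.
    """
    sums, counts = {}, {}
    for i, j in combinations(range(len(coords)), 2):
        k = int(haversine_km(*coords[i], *coords[j]) // bin_km)
        sums[k] = sums.get(k, 0.0) + pearson(series[i], series[j])
        counts[k] = counts.get(k, 0) + 1
    return {(k + 1) * bin_km: sums[k] / counts[k] for k in sorted(sums)}


def decorrelation_distance(binned, threshold=1 / math.e):
    """Upper edge of the last bin before the correlation drops below 1/e."""
    last = None
    for upper_edge in sorted(binned):
        if binned[upper_edge] < threshold:
            break
        last = upper_edge
    return last


print(decorrelation_distance(
    {30.0: 0.9, 60.0: 0.6, 90.0: 0.4, 120.0: 0.3, 150.0: 0.5}))
```

Stopping at the first drop below 1/e makes the estimate insensitive to noisy correlations at long distances; for a monotonically decaying correlogram it coincides with the last bin at or above 1/e.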

Figure 7 shows the correlations between local-scale onset, end dates, length of the season (defined as the difference between end and onset dates plus one) and June–September rainfall amounts, with the correlations between the MI spatial averages given in the panel insets. MI-average amounts during the early and late stages are strongly related to the onset and end dates in the expected sense. At local scale, the relationship is generally much weaker, especially for onset date, though still generally statistically significant at p = 0.01 (Fig. 7). This noisy pattern may be a by-product of defining onset and end dates in terms of a single 5-day wet sequence not followed by a 10-day dry spell. The pattern of local-scale correlations and correlations with SAI are similar when onset/end are computed without the false onset/end criteria (not shown). The local-scale correlations between seasonal (May 16–October 31) amount and length of the rainy season (Fig. 7c) reveal significant positive values across the whole country, especially over MI. The lower panels of Fig. 7 suggest a stronger link between rainy-season length and withdrawal dates than with onset dates, while onset and withdrawal dates are essentially independent of each other (Fig. 7f).

Fig. 7

Local-scale correlations between onset, end dates, length of the rainy season (defined in Sect. 2.1), amount of rainfall during the early (May 16–June 30), late (September 1–October 31) and full (May 16–October 31) monsoon stages. Insignificant correlations at p = 0.01 are masked out as light gray. The black contour again delineates Monsoonal India (MI, defined from Fig. 1). The R(SAI) values in each panel are the correlation between the MI spatial averages of the local-scale standardized anomalies. S (<0.01) gives the percentage area of MI having significant correlations at p = 0.01 according to a random-phase test (Janicot et al. 1996)

We use an empirical measure of covariance among a set of grid-points, namely the number of degrees of freedom (DoF) (Fraedrich et al. 1995; Moron et al. 2007). The DoF is a “global” metric of covariance in the sense that it is computed from the whole eigenvalue spectrum of the correlation matrix between grid-points. The DoF varies between 1, for perfectly in- or out-of-phase co-variations, and the rank of the matrix, for completely independent co-variations among the locations. Due to the finite size of the matrix (i.e. 114 rows × 3202 columns for interannual variations of standardized anomalies of any characteristic of the rainy season across Monsoonal India), the maximum possible DoF equals only 109. Figure 8a shows the seasonal modulation of DoF computed from running 31-day windows for MI rainfall amount, frequency and mean daily intensity. Figure 8b shows the climatological mean of MI rainfall frequency and intensity for each calendar day while Fig. 8c shows the distribution of onset and end dates. The climatological sub-seasonal DoF variations are similar between amount and frequency, with the curve for frequency being smoother and flatter than that for amount (Fig. 8a). The spatial coherence of both amount and frequency peaks (i.e. low DoF) around late May, just before the local-scale onsets (Fig. 8c), and then again just after the end of the monsoon (Fig. 8c) from late September to mid-October, while the spatial noise in amount is larger (i.e. high DoF) close to the climatological rainfall peak in July–August. The behavior is strikingly different for the mean daily intensity, where the number of DoF is much larger and the noise peaks around late June and early to mid-July, just before the climatological peak of the monsoon, while the minima around late May and mid-October are not clearly seen (Fig. 8a). The seasonality of DoF of mean intensity tends to match the seasonality of mean daily intensity itself (Fig. 8b).
Thus the heaviest rainfall, just before and near the climatological peak (Fig. 3a) is also the spatially noisiest, and this noise appears to contaminate the seasonal amounts themselves.
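An eigenvalue-based DoF estimate of this kind can be sketched in a few lines of Python. The exact estimator is not spelled out in the text, so the formula used here is an assumption: DoF = N² / Σλᵢ², where N is the number of grid-points and λᵢ are the eigenvalues of the N × N correlation matrix. Because the sum of squared eigenvalues of a symmetric matrix equals the sum of its squared entries, no explicit eigen-decomposition is needed. The function names are hypothetical.

```python
import math


def standardize(series):
    """Standardized anomalies (zero mean, unit variance) of a yearly series."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / sd for x in series]


def degrees_of_freedom(field):
    """Assumed estimator: DoF = N^2 / sum of squared eigenvalues of the
    correlation matrix between the N grid-points.

    field[g][y] is the value at grid-point g in year y. The sum of squared
    eigenvalues is computed as the sum of squared correlations (squared
    Frobenius norm of the correlation matrix).
    """
    z = [standardize(series) for series in field]
    n_points, n_years = len(z), len(z[0])
    sum_sq = 0.0
    for i in range(n_points):
        for j in range(n_points):
            r = sum(a * b for a, b in zip(z[i], z[j])) / n_years
            sum_sq += r * r
    return n_points ** 2 / sum_sq


# Perfectly coherent field: every grid-point shares the same anomalies.
coherent = [[1, 2, 3, 4], [2, 4, 6, 8], [10, 20, 30, 40]]
# Fully independent field: mutually uncorrelated anomalies.
independent = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
print(degrees_of_freedom(coherent), degrees_of_freedom(independent))
```

With only 114 years and several thousand grid-points, it would be cheaper in practice to compute the same sum from the much smaller year-by-year matrix, which has the same non-zero eigenvalues; the double loop above is kept for readability.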

Fig. 8

Average seasonal evolution of: a The number of degrees of freedom (DoF; see text) for running 31-day rainfall amount (black), frequency (red), and mean daily intensity (blue) across Monsoonal India (MI, defined from Fig. 1). b Mean daily intensity (blue, left ordinate) and relative frequency of wet days (red, right ordinate) across MI. The climatological means are computed from the whole sample of 114 years and 3202 grid-points included in MI. c Daily PDF of local-scale onsets (orange curves) and ends (green curves) of the rainy season across MI. The onset (respectively end) is defined without (full curves) and with (dashed curves) the false start/end criteria (see Sect. 2.1)

Another view of the interplay between rainfall frequency and mean intensity is provided by Fig. 9. The DoF of wet-day frequency increases almost linearly with the amount used to define the wet days (Fig. 9a). Figure 9a also shows that the noise, for the same threshold, is larger during the core than for the early and late phases of the monsoon. This may be because the whole PDF of wet days is shifted to the right during the core of the monsoon (Fig. 9b). The proportion of wet days receiving a large amount of rainfall and their contribution to the total amount of rainfall are naturally higher for the core than during the early and late phases of the monsoon (Fig. 9b, c).

Fig. 9

a Number of interannual degrees of freedom (DoF) of wet-day frequency above a threshold rainfall amount (in mm on abscissa; the first value plotted is for 1 mm) for the whole season (black) and the three sub-seasonal stages (colors) across Monsoonal India (MI, defined from Fig. 1). b Proportion of total frequency of wet days wetter than the daily rainfall threshold for the same ranges, averaged over MI. c As b but for the proportion of total rainfall amount falling on wet days above the threshold

Figure 10 shows a final statistical property of daily rainfall, depicting the mean spatial extent of contiguous wet grid cells (referred to as “wet patches” hereafter) as a function of the maximum rainfall received at the center. We select each grid-point receiving at least 10, 20, 50 and 100 mm of daily rainfall and compute the mean amount of rainfall at each 0.25° grid-point within a 20° square centered on this grid-point. Note that the meso-scale γ, 2–20 km (i.e. an individual thunderstorm, Orlanski 1975), is basically sampled here as a single grid cell. If we consider the 20 mm isohyet, the average size of a wet patch receiving ≥20 mm increases by about a factor of 3 with the central maximum rainfall (Fig. 10), from roughly 10⁵ sq. km to roughly 3 × 10⁵ sq. km (daily maximum rainfall ≥100 mm), corresponding to approximate length scales of 300–1000 km. The mean extent of MCCs identified by Laing and Fritsch (1993) from satellite data over the Indian subcontinent during the period April–December 1988 was also 3 × 10⁵ sq. km, for a mean lifetime of 9 h. We thus conclude that our detection method, at least with a rainfall threshold of 20 mm, captures the spatial fingerprint of MCCs and synoptic systems (Orlanski 1975; Laing and Fritsch 1993; Goswami et al. 2003). A threshold of 10 mm leads to a far larger extent and a fuzzier pattern. A weak elongated extension on the southern side of wet patches for amounts larger than 100 mm (Fig. 10) is due to the specific case of the Western Ghats: many wet patches there are elongated along the N-S mountain chain, especially for wet patches ≥50 or 100 mm. We cannot determine whether this is an artefact of analyzing land data only or a physical effect. In either case, it slightly distorts the quasi-circular shape observed elsewhere. The area covered simultaneously (at daily time scale) by rainfall larger than 50 or 100 mm is always small (≤2° in radius), even when the central maximum is ≥100 mm, i.e. of the order of the meso-scale β of cloud clusters according to Orlanski (1975) on average (Fig. 10), and close to the decorrelation distance of mean intensity (Fig. 6b, e, h). The sub-seasonal variations show similar sizes for a given threshold across the monsoon season, at least for the early and core phases (Fig. 10), when a 20 mm isohyet is considered. The mean extent weakly peaks during the core of the season and slightly decreases at the end of the monsoon (Fig. 10).
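The compositing procedure described above can be sketched roughly as below. This is not the authors' code: the array layout (day, latitude, longitude), the use of NaN for sea or missing cells, and the function name are assumptions made for illustration. A 20° square on a 0.25° grid would correspond to `half_width=40`.

```python
import numpy as np

def composite_wet_patch(rain, threshold, half_width):
    """Mean rainfall in a square window centered on every grid-point-day
    receiving at least `threshold` mm.

    rain: array (n_days, n_lat, n_lon), NaN where no data (assumed layout).
    Returns an array of shape (2*half_width + 1, 2*half_width + 1).
    """
    rain = np.asarray(rain, dtype=float)
    h = half_width
    # Pad with NaN so windows near the domain edge keep the same shape.
    padded = np.pad(rain, ((0, 0), (h, h), (h, h)), constant_values=np.nan)
    total = np.zeros((2 * h + 1, 2 * h + 1))
    count = np.zeros((2 * h + 1, 2 * h + 1))
    # Centers: every cell and day at or above the threshold (NaN never qualifies).
    days, rows, cols = np.nonzero(np.nan_to_num(rain, nan=-1.0) >= threshold)
    for d, i, j in zip(days, rows, cols):
        window = padded[d, i:i + 2 * h + 1, j:j + 2 * h + 1]
        valid = ~np.isnan(window)
        total[valid] += window[valid]
        count[valid] += 1
    # Average only where at least one valid value was accumulated.
    return np.divide(total, count, out=np.full_like(total, np.nan), where=count > 0)

# Tiny synthetic example: one day with a single Gaussian rain blob peaking at 100 mm.
yy, xx = np.mgrid[0:21, 0:21]
blob = 100.0 * np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / (2 * 3.0 ** 2))
composite = composite_wet_patch(blob[np.newaxis], threshold=100.0, half_width=5)
cells_ge_20 = int(np.sum(composite >= 20.0))   # patch size in grid cells
```

Multiplying the number of composite cells above a given isohyet by the area of one grid cell gives a patch extent comparable in kind to the values quoted in the text.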

Fig. 10

Average rainfall amount (shading, mm) associated with wet patches over Monsoonal India (defined from Fig. 1) receiving at least 10, 20, 50 and 100 mm (columns) at their center, according to time range (rows; whole monsoon season at top). Each rectangular plot represents a 20° latitude-longitude square about the central point. The black contours in the lower panels show the 10, 20, and 50 mm isohyets for the whole season, for reference. The number of 0.25° grid cells receiving at least 10 and 20 mm is indicated on each panel. Note that a 0.25° grid cell at 24°N (the mean latitude of MI) covers 706 sq. km
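The grid-cell area quoted in the caption can be checked with a short calculation. The sketch assumes a spherical Earth of radius 6371 km; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def grid_cell_area_km2(lat_deg, resolution_deg=0.25):
    """Area of a latitude-longitude cell centered at lat_deg on a sphere."""
    lat_south = math.radians(lat_deg - resolution_deg / 2)
    lat_north = math.radians(lat_deg + resolution_deg / 2)
    dlon = math.radians(resolution_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat_north) - math.sin(lat_south))

cell_area = grid_cell_area_km2(24.0)   # close to the 706 sq. km quoted in the caption
cells_small = 1e5 / cell_area          # cells in a patch of 10^5 sq. km
cells_large = 3e5 / cell_area          # cells in a patch of 3 x 10^5 sq. km
```

Under this assumption, a patch of 10⁵ sq. km corresponds to roughly 140 grid cells, which helps when converting the cell counts printed on each panel into areas.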

In summary, the drop in regional-scale coherence of interannual rainfall anomalies during the core monsoon phase is primarily related to the seasonal increase in the number and intensity of wet days. The wettest days contribute disproportionately to interannual variations of amounts, but they occur largely at random in space and time: they are not “synchronized” by climate anomalies such as ENSO. The average size of wet patches is controlled by meteorological phenomena (MCCs and synoptic systems) and no systematic reduction in the size of wet patches is seen during the core phase.

3.4 Links between rainfall and atmospheric circulation

The previous sections demonstrate that sub-seasonal rainfall anomalies during the core of the monsoon are less spatially coherent than those during the early and late phases. The leading combined EOF of unfiltered daily winds in the lower troposphere (850 hPa), precipitable water and vertical velocity at 500 hPa is shown in Fig. 11. This EOF includes the seasonal cycle (May 16–October 31) and accounts for 21.9% of the total combined variance (the second and third modes, not shown, explain respectively 6.5 and 5.5% of the total combined variance). Most of its temporal variance is related to the interannually modulated seasonal cycle (Moron et al. 2012), with a small additional contribution from intraseasonal and faster variations (not shown). Positive values of this mode are associated with monsoonal flow, with westerly winds south of 20°N veering around the monsoon trough that extends from the Bay of Bengal to NW India (Godbole 1977), and with regional-scale ascent peaking across MI. The implied advection of moisture toward most of MI feeds the deep convection associated with the monsoon (Sikka 1980; Sperber et al. 2000; Gadgil 2003; Fasullo and Webster 2003). The corresponding principal component time series is hereafter referred to as the Monsoon Circulation Index (MCI).
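A combined EOF of several fields, as used above to define the MCI, can be sketched as follows. The details are assumptions: each variable is standardized at every grid-point and the fields are simply concatenated before a singular value decomposition. The actual weighting and preprocessing used for Fig. 11 are not specified here, and the function name and synthetic fields are hypothetical.

```python
import numpy as np

def combined_eof(fields):
    """Leading combined EOF of several (n_days, n_points_k) fields.

    Returns the standardized leading principal component, its loadings
    (correlations with every standardized input column) and the fraction
    of combined variance explained by each mode.
    """
    standardized = []
    for field in fields:
        f = np.asarray(field, dtype=float)
        standardized.append((f - f.mean(axis=0)) / f.std(axis=0))
    z = np.hstack(standardized)                  # (n_days, total points)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    pc1 = u[:, 0] * s[0]
    pc1 = (pc1 - pc1.mean()) / pc1.std()         # unit-variance index
    loadings = z.T @ pc1 / z.shape[0]            # correlations with PC1
    return pc1, loadings, explained

# Synthetic fields sharing one seasonal signal (illustrative only).
rng = np.random.default_rng(2)
n_days = 500
signal = np.sin(np.linspace(0.0, np.pi, n_days))
winds = np.outer(signal, rng.uniform(0.5, 1.0, 30)) + 0.2 * rng.standard_normal((n_days, 30))
water = np.outer(signal, rng.uniform(0.5, 1.0, 30)) + 0.2 * rng.standard_normal((n_days, 30))
omega = np.outer(signal, -rng.uniform(0.5, 1.0, 30)) + 0.2 * rng.standard_normal((n_days, 30))

index, loadings, explained = combined_eof([winds, water, omega])
```

The sign of an EOF is arbitrary, so in practice the index would be oriented so that positive values correspond to the monsoonal flow described above.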

Fig. 11

Leading EOF of a unfiltered daily 850-hPa winds and precipitable water, and b 500-hPa vertical velocity (omega), computed from NCEP Reanalyses (from May 16, 1948 to October 31, 2014). Grid-points left blank (≥3000 m a.s.l.) are not used to define the leading EOF. Loadings are expressed as correlations between the corresponding principal component and the standardized anomalies of the original variables

Figure 12 shows the correlations between the MCI and gridded rainfall amount, frequency and mean intensity for the whole monsoon season and its three phases. The spatial patterns of correlation for amount (Fig. 12a–d) and frequency (Fig. 12e–h) broadly match the leading EOF of rainfall amount and frequency (Fig. 5d, e), demonstrating the physical linkage between the leading mode of interannual variability of amount and frequency and the regional-scale monsoon circulation. The correlations between the MCI and the SAI of MI-averaged rainfall amount are high and quite stable across the season, ranging from 0.66 (early monsoon) to 0.78 (late monsoon). However, the area with significant local-scale correlations drops substantially during the core phase over MI (Fig. 12c). The correlations are also close to zero over the northwestern part of MI during the early phase, but this corresponds approximately to the area where the mean monsoon onset occurs after late June (Moron and Robertson 2014) (Fig. 2a). During the core (Fig. 12c) and late (Fig. 12d) phases, the weakest correlations between MCI and local-scale amount are located over the northeastern Gangetic plain, the area that tends to be out-of-phase with central Peninsular India during 20–60 day oscillations (Krishnamurthy and Shukla 2000, 2007).

Fig. 12

Correlations between time-averaged Monsoon Circulation Index (MCI) given by the first principal component of an EOF of unfiltered daily winds at 850 hPa, precipitable water, and vertical velocity at 500 hPa from NCEP Reanalyses (from May 16, 1948 to October 31, 2014) (see text and Fig. 11) and gridded rainfall amount, frequency, and mean intensity (rows). The correlations are calculated separately for the whole May 16–October 31 season and its 3 sub-seasonal stages (columns). The values in the upper right corner of each panel show the correlation value between MCI and the SAI of the Monsoonal India (MI, defined from Fig. 1)-averaged rainfall characteristic (corr. SAI), and the fraction of MI area with significant correlations at p = 0.01 according to a random-phase test (Janicot et al. 1996) (S < 0.01). Insignificant correlations at p = 0.01 are masked out as light gray
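A random-phase significance test of the kind cited in the caption can be sketched as below. This is a generic version of the idea (surrogate series that keep the amplitude spectrum of one series but have random Fourier phases), not necessarily the exact procedure of Janicot et al. (1996). The function names, the number of surrogates and the two-sided p-value are assumptions.

```python
import numpy as np

def random_phase_surrogates(x, n_surrogates, rng):
    """Surrogates with the same amplitude spectrum as x and random phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    amplitude = np.abs(np.fft.rfft(x - x.mean()))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_surrogates, amplitude.size))
    phases[:, 0] = 0.0                 # keep the (zero) mean component real
    if n % 2 == 0:
        phases[:, -1] = 0.0            # the Nyquist component must be real
    spectrum = amplitude * np.exp(1j * phases)
    return np.fft.irfft(spectrum, n=n, axis=1)

def random_phase_pvalue(x, y, n_surrogates=1000, seed=0):
    """Observed correlation between x and y and its two-sided p-value,
    estimated from correlations between surrogates of x and the real y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(x, y)[0, 1]
    surrogates = random_phase_surrogates(x, n_surrogates, rng)
    y_std = (y - y.mean()) / y.std()
    s_std = (surrogates - surrogates.mean(axis=1, keepdims=True)) / surrogates.std(axis=1, keepdims=True)
    r_null = s_std @ y_std / y.size    # Pearson correlation for each surrogate
    p_value = np.mean(np.abs(r_null) >= abs(r_obs))
    return r_obs, p_value

# Synthetic example with 114 "years" (illustrative only).
rng = np.random.default_rng(3)
circulation = rng.standard_normal(114)
rain_linked = 0.8 * circulation + 0.6 * rng.standard_normal(114)
rain_unrelated = rng.standard_normal(114)

r_linked, p_linked = random_phase_pvalue(circulation, rain_linked)
r_unrelated, p_unrelated = random_phase_pvalue(circulation, rain_unrelated)
```

Because the surrogates retain the autocorrelation of the original series, the null distribution accounts for the reduced effective sample size of persistent time series, which a standard t-test would ignore.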

The correlations with rainfall frequency (Fig. 12e–h) reveal similar patterns, but with larger local-scale correlations. The area with significant correlations is larger and the drop during the core phase is also reduced. Perhaps surprisingly, the correlation between time-averaged MCI and the SAI of rainfall frequency drops during the core phase of the monsoon. A very different picture is seen for mean daily intensity, which exhibits very weak correlations with MCI at the local scale, with less than 10% of MI having significant values at p = 0.01 (Fig. 12i–l). The correlation between MCI and the SAI of intensity now peaks during the core phase of the monsoon (although it is still relatively low). Thus, a more intense monsoon circulation during the core phase accompanies a regional-scale increase of mean daily intensity, although this increase is not spatially coherent, except windward of the Western Ghats and over southeastern MI (Fig. 12k) under the strongest westerly flow at 850 hPa.

4 Discussion and conclusion

We have examined the spatial coherence of interannual rainfall anomalies and its sub-seasonal modulation over monsoonal India (defined in Fig. 1) (Gadgil 2003), using a 0.25° gridded dataset of daily rainfall available from April–November, 1901–2014.

Seasonal rainfall amount was first factorized into the frequency of wet days (≥1 mm) and the mean daily intensity of rainfall on wet days. Beyond its analytical properties (Moron et al. 2006, 2007, 2015), this decomposition emphasizes different physical processes, since frequency reflects the triggering of convection while mean daily intensity reflects its strength, acknowledging that local rainfall events (Smith et al. 2005) and their associated convective systems (Ricciardulli and Sardeshmukh 2002; Laing and Fritsch 1993) occur on sub-daily time scales. In addition, mean daily intensity equals the scale parameter of an exponential distribution fitted to the distribution of rainfall amounts on wet days. This in turn implies that the variance of daily rainfall is approximately given by the square of the mean daily intensity.
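The factorization and the exponential property mentioned above can be illustrated with a short sketch. The synthetic season (123 days, a 50% chance of rain, a 12 mm scale) and the function name are hypothetical, and the amount is counted over wet days only so that it equals frequency times intensity exactly.

```python
import numpy as np

def decompose_season(daily_rain, wet_day_min=1.0):
    """Split a season of daily rainfall into wet-day amount, wet-day
    frequency and mean daily intensity (amount = frequency * intensity)."""
    rain = np.asarray(daily_rain, dtype=float)
    wet = rain[rain >= wet_day_min]            # wet days defined as >= 1 mm
    frequency = wet.size
    intensity = wet.mean() if frequency else 0.0
    amount = wet.sum()
    return amount, frequency, intensity

rng = np.random.default_rng(4)
# Illustrative season: rain occurs on about half of the days.
season = rng.exponential(scale=12.0, size=123) * (rng.random(123) < 0.5)
amount, frequency, intensity = decompose_season(season)

# For an exponential distribution the variance equals the squared mean,
# so this ratio should be close to 1 for a large sample.
sample = rng.exponential(scale=12.0, size=200_000)
variance_to_mean_sq = sample.var() / sample.mean() ** 2
```

The second part only checks the textbook property of the exponential distribution; observed wet-day amounts follow it approximately at best.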

The climatological mean field of mean daily intensity is much more geographically uniform across MI than are those of rainfall frequency and amount (Fig. 5a–c). The largest climatological values of mean daily intensity are reached along the Western Ghats (Tawde and Singh 2015) and the southern foothills of the Himalayas, emphasizing the role of orographic forcing, especially where slopes bisect the moisture flow.

The climatological seasonal cycle of MI-averaged rainfall and atmospheric variables (Figs. 3, 8b) identifies three distinct stages of the summer monsoon. The early phase (ca. May 16–June 30) is associated with local- and regional-scale onset (Fig. 2a) with rapid increases in both rainfall frequency and intensity at regional scale (Fig. 8b). These are associated with increases in precipitable water that accompany the establishment of the W-SW flow from the Arabian Sea across peninsular India (Fasullo and Webster 2003; Gadgil 2003; Webster et al. 1998; Goswami 2006). The core phase (ca. July 1–August 31) is associated with the largest rainfall frequency and mean intensity (Fig. 8b), precipitable water and mid-tropospheric ascent (Fig. 3b, d). The withdrawal phase (ca. September 1–October 31) is slower than the onset phase and the mean daily intensity remains at moderate values (Fig. 8b) well after the rainfall frequency decreases.

Next, we analyzed the spatial coherence of interannual variations of seasonal and sub-seasonal averages of rainfall amount, frequency and mean daily intensity using various metrics (i.e. EOF, DoF, correlogram) and their relationships with the leading mode of regional-scale atmospheric variations. Interannual variability of seasonal amount and frequency of wet days is spatially coherent across monsoonal India, especially for frequency (Fig. 5e), while mean daily intensity, albeit contributing a larger fraction to local-scale interannual amount variations (Table 1), is not (Fig. 5f). The spatial coherence of sub-seasonal averages peaks in mid-May to mid-June, around monsoon onset, and again from late August to late October, around its (slower) withdrawal, while it decreases around the climatological peak of the rainfall from early July to mid-August (Fig. 8a, c).

The decrease of spatial coherence during the core of the monsoon is argued to stem partly from the near-exponential distribution of daily rainfall coupled with the larger mean intensity at that time, which increases the variance (proportional to the square of the mean, for an exponential distribution). Strong low-level moisture advection increases the mean daily intensity, thus making possible more intense local-scale daily rainfall events (Fig. 4a). The interannual variability in the frequency of wet days at local scale becomes progressively less spatially coherent as the threshold used to define it increases (Fig. 9a).
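The variance argument above can be made slightly more explicit with a simple idealization, which is an assumption for illustration and not a result from the analysis: let the local seasonal amount S be the sum of N wet-day amounts X_i, taken as independent, exponentially distributed with mean μ, and independent of N. The law of total variance then gives:

```latex
\begin{align}
S &= \sum_{i=1}^{N} X_i, \qquad \mathbb{E}[X_i] = \mu, \qquad \mathrm{Var}(X_i) = \mu^{2}, \\
\mathrm{Var}(S) &= \mathbb{E}[N]\,\mathrm{Var}(X_i) + \mathrm{Var}(N)\,\mathbb{E}[X_i]^{2}
                 = \mu^{2}\left(\mathbb{E}[N] + \mathrm{Var}(N)\right).
\end{align}
```

Under these assumptions, the term μ² E[N] is purely local sampling noise from the daily amounts. It grows with the square of the mean intensity and with the number of wet days, both of which are largest during the core phase.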

Although heavy rainfall grid-points and days are more numerous during the core phase (Fig. 9b), the mean spatial extent of contiguous wet grid-points at daily scale (the size of wet “patches”) remains fairly constant across the monsoon season and is almost independent of the central maximum (Fig. 10). The length scale L ~ O(300–1000 km) corresponds to the α mesoscale of convective clusters and synoptic systems. There is also a weakening of the coupling between the leading mode of variation of the regional-scale monsoon circulation and local-scale rainfall (Fig. 12c).

The larger spatial coherence of interannual variations near monsoon onset and withdrawal (Fig. 8a) is consistent with the smaller mean rainfall intensities (and thus daily variance) at those times. At the end of the dry season, the combination of dry soils and advection of dry air by northwesterly winds from the Pakistan-Iran region (Douglas et al. 2016) inhibits moist convection across most of MI, despite very high surface temperatures which decrease the static stability near the surface. The intense heating of the low and mid-troposphere may also be reinforced by the heavy aerosol loading in May (Bollasina et al. 2008; Gautam et al. 2009). The development of the W-SW monsoon flow then brings moisture across peninsular India from mid-May (Fasullo and Webster 2003; Xavier et al. 2007). The timing and strength of the developing monsoon circulation may be affected by various external factors, including anomalous SSTs over the Pacific and Indian Oceans (Joseph et al. 1994; Fasullo and Webster 2003; Xavier et al. 2007; Rajagopalan and Molnar 2012, 2014; Gill et al. 2015) and the intraseasonal oscillation (AjayaMohan and Goswami 2007; Moron et al. 2012), while the local land surface remains mostly passive due to its nearly homogeneous dryness. This is also illustrated by the strong relationship between local-scale rainfall frequency across MI and the leading regional-scale atmospheric mode of variation during the early stage (Fig. 12f).

Once the precipitable water exceeds a certain threshold (≈25–30 kg/m², Fig. 4a), moist convection is triggered, leading to very intense rainfall (Fig. 4a) across MI, with the most intense events occurring at the beginning of the core season (Fig. 8a). At this stage, from late June and early July, the interaction with the land surface intensifies due to successive local-scale wet events increasing the soil moisture and its spatial gradients. Idealized numerical simulations (Clark and Arritt 1995) have shown that, in the absence of dynamical forcing, soil moisture and vegetation cover cause the deep convection to occur earlier and the precipitation amounts to be greater. Such positive feedback between soil moisture and precipitation has also been noted at regional scale (Meehl 1994) and is considered especially strong over Monsoonal India in summer according to Koster et al. (2004). Sufficient surface moisture may be able to sustain rainfall through local-scale “recycling” of moisture even as the surface land-sea temperature contrast is reduced owing to progressively cooler land temperatures (Meehl 1994). Interannual variability of spatially-averaged precipitable water over MI weakens during the core phase relative to the early and late stages of the monsoon (Figs. 3b, 4c). The coefficient of variation of MI-average rainfall is also lower during the core phase, consistent with an increase in the spatial noise at grid-point scale. The increasing noise in space is partly canceled out by the time averaging.

During the late phase of the monsoon, the progressive homogenization of soil moisture may also tend to increase the spatial coherence of interannual rainfall amounts by imposing a uniform lower boundary condition. Large-scale forcings such as intensifying ENSO events (Gill et al. 2015) or early Indian Ocean Dipole events (Saji et al. 1999) in boreal autumn may also increase the spatial coherence during the late phase of the monsoon. The shift of the main heat source toward the equator and over the Bay of Bengal could also play a role in increasing the spatial coherence of interannual rainfall amounts around the monsoon withdrawal and the sensitivity to external boundary forcing (Slingo and Annamalai 2000; Syroka and Toumi 2004), through the Gill (1980) mechanism.

The implications for seasonal and sub-seasonal prediction are twofold: (1) potential predictability at local (i.e. grid-point) or sub-regional (i.e. state level) scales is highest for the frequency of wet days, and (2) it peaks just around the onset and during a more extended period around the withdrawal of the Indian summer monsoon (Webster and Yang 1992; Slingo and Annamalai 2000; Syroka and Toumi 2004; Goswami and Xavier 2005; Xavier et al. 2007; Sabeerali et al. 2012). The highest potential predictability and spatial coherence reached near the monsoon withdrawal fits with the conclusion of Rupa Kumar et al. (1992) that the excess or deficiency of the monsoon rainfall is more frequently realized in the latter half of the monsoon. Around the onset and withdrawal stages, the interannual variability of rainfall is tightly linked with the timing and intensity of the low-level atmospheric flow conveying moisture toward continental India (Fasullo and Webster 2003). During the core phase, the monsoon is very much a self-sustaining system, primarily driven by the local distribution of latent heat release within the monsoon itself (Slingo and Annamalai 2000; Sperber et al. 2000; Syroka and Toumi 2004). We can argue that the spatial pattern of latent heat release is then noisier than during the onset and withdrawal stages due to very intense wet events. The monsoon is then less sensitive to (and thus less predictable from) the boundary forcings associated with regional- to global-scale SST anomalies. Slingo and Annamalai (2000) also noted that the northernmost location of the latent heat sources over the Western Pacific and Asian summer monsoon domains may disconnect ENSO and the Indian monsoon through an anomalous local Hadley circulation during the core phase, as in 1997. From a longer-term perspective, we may also hypothesize that heavy precipitation intensification, now emerging in the observed record across many regions of the world (Fischer and Knutti 2016), would be associated with decreasing predictability of the seasonal and sub-seasonal amounts of rainfall, especially during the core phase of the monsoon.

Extrapolating beyond India, we might expect other core monsoon seasons over land having a large mean daily intensity to be intrinsically less spatially coherent and less potentially predictable at sub-seasonal and seasonal time scales. Previous results (Moron et al. 2007, 2015) have indeed shown that the rare cases in which the core monsoon season exhibits spatially coherent interannual variability are relatively dry (such as the Brazilian Nordeste in February–April or Kenya and northern Tanzania in October–December). In contrast, wetter tropical regions are expected to experience a sub-seasonal modulation in which the spatial coherence and potential predictability peak either near the start or the end of the season, rather than during the core. Such behavior has already been emphasized in Indonesia (Moron et al. 2010), the Philippines (Moron et al. 2009) as well as during the February–May rainy season in equatorial East Africa (Moron et al. 2013).