1 Introduction

Changes in the occurrence and magnitude of climatic parameter because of the global warming are very obvious and are being observed (IPCC 2007), but extreme events like heavy rainfall days, dry spells and wet spells were not considered for the climate change studies and in the recent years, some extreme events in different parts of the world forced the climate change researchers to study about these extreme events. Expert team on climate change detection and monitoring (ETCCDI) has worked on extreme precipitation events and suggested some rainfall and temperature indices to monitor these extreme events (Zhang and Yang 2004). Manton et al. (2001) studied the trend in extreme daily rainfall and temperature in South East Asia for the period of 1961–1998 for 13 stations and found significant increasing trend at some stations for very wet days while decreasing trend observed in rainy days at some stations. Significant increase in very hot days and decrease in cold days were also observed for the different parts of the South East Asia. De Lima et al. (2014) selected 57 stations of mainland Portugal (Southwestern Europe) and precipitation data of 67 years were used to investigate the intensity and frequency of extreme precipitation events, study revealed that the consecutive dry day spells are increasing significantly at 17% of the total stations while decreasing trend was observed for 1-day and 5-day maximum rainfall. Heavy precipitation days and very heavy precipitation day indices were also representing a significant decreasing trend at several stations. Abbas et al. (2014) carried out a study in the semiarid and sub-humid region of Pakistan for identifying the pattern of extreme precipitation indices at five stations of Punjab, and it was observed that no significant temporal trend is found for annual precipitation but the spatial trend was observed according to variation in elevation. Roy and Balling (2004) conducted study on 129 stations across India to analyze the precipitation extreme events trend and concluded that Himalayan region and Deccan Plateau show increasing trends in precipitation extreme events, whereas parts of Gangetic Plain show decreasing trends. Goswami et al. (2006) found significant increase in heavy precipitation events over central India in last century.

Several studies related to rainfall, temperature, evapotranspiration and its temporal and spatial variation in northeast India have been carried out by several researchers, but main focus was on the recognition of trend and its variability for these climatic parameters and to the best of authors knowledge, no study of regionalization on the basis of precipitation indices has been carried out for this region. Goyal (2014) reported based on the study carried out for 68 stations of Assam that there is no trend in annual rainfall of this region during last century, but the seasonal variation in rainfall is significant for agriculture and other sectors of the state. Deka et al. (2013) in their study analyzed rainfall data for more than a century in Brahmaputra and Barak River basin at monthly, seasonal and annual basis and found significant change in annual and monsoon rainfall. Many other researchers studied the climatic variation within the northeast India (Jain et al. 2013; Jhajharia et al. 2009, 2012) and found significant change in seasonal rainfall at some stations. Shivam et al. (2017) conducted a study on a river basin of northeast India to analyze the changes in extreme rainfall events and found that the daily rainfall intensity is likely to increase for climate change scenarios, whereas the number of rainy days tends to decrease.

Apart from the trend analysis methods, another technique which is very helpful in identifying the trend and behavior of hydro-meteorological data is regionalization of the area on the basis of statistical properties of meteorological data. Clustering approach is used for regionalization, and in recent time, fuzzy clustering approach has gained much popularity because of its flexibility to attach a particular dataset with several clusters. In hydrological studies, unlike hard clustering, fuzzy clustering provides an idea of a particular station being associated with different clusters. Goyal and Sharma (2016) studied the meteorological drought pattern in three states of western India and found that the region reflects different numbers of clusters for different monthly timescales of drought analysis. Goyal and Gupta (2014) used cluster approach to identify the homogeneous region of precipitation events in northeast India using five different validity indices and found that the method provides sufficient homogenized cluster for regional frequency analysis. Rao and Srinivas (2006a, b) used fuzzy cluster approach in regionalization of the watersheds to analyze the flood frequency. Rajsekhar et al. (2012) identified eight homogeneous regions for hydrological drought situation based on severity and duration by using hydrological simulation and fuzzy clustering approach.

Northeast India is one of the most vulnerable regions toward climate change, and the largest threat to its biodiversity and hydrological system is from changing climate (Ravindranath et al. 2011). Recent extreme climatic phenomena in this region have caused loss of life, agriculture and socioeconomic losses. Increasing temperature and changing pattern of rainfall in the region as observed by several researchers have compelled planners and policy makers to think and plan according to these results which would reduce the further losses. Taking into consideration all these facts, this study was carried out for following objectives (a) extreme rainfall events occurred in the previous decades in northeastern region of India, (b) analysis of the rainfall indices for the region, (c) identifying the existing trend of different precipitation indices and (d) clustering of the northeast India region on the basis of precipitation indices.

2 Study area and data collection

Northeast region of India which consists of the eight states Assam, Meghalaya, Arunachal Pradesh, Tripura, Nagaland, Mizoram, Manipur and Sikkim is selected for the study. This region extends from latitude 22.4°N to 28.7°N and longitude 88.2°E to 96.5°E. Elevation varies from 28 m above mean sea level to 7000 m above mean sea level. Region is very diverse in nature, and plains mainly comprise Brahmaputra and Barak valleys. Onset of monsoon occurs from middle of May and continues till October. On an average, the NE region receives about 2450 mm of rainfall. The Cherrapunjee–Mawsynram range receives rainfall as high as 11,500 mm, about 60% area under forest with Arunachal Pradesh having about 80% of its area under different kinds of forest, while Assam has the minimum percentage of forest area (30%). Large altitude difference and varying physical features are the main reason of diverse climate from near tropical to temperate. The annual rainfall in the region is received mainly from southwest monsoon, annually (Das et al. 2009; Dash et al. 2012; Goyal 2014). The temperature varies from 15 to 32 °C in summer and 0–26 °C in winter (Das et al. 2009).For studying the extreme precipitation indices and its trend, 27 stations from 0.5° × 0.5° gridded precipitation data from India Meteorological Department (IMD) were selected for the period of 1971–2005. These stations are shown in Fig. 1, and their details are given in Table 1.

Fig. 1
figure 1

Study area and location of selected stations

Table 1 Details of the selected stations

3 Methodology

3.1 Precipitation indices and trend analysis

Precipitation indices are calculated according to the methodology suggested by the experts team on climate change detection, monitoring and indices (ETCCDMI). Details of these rainfall indices are given in Table 2. Mann–Kendall nonparametric test (Mann 1945; Kendall 1948) was used for calculating the trends of precipitation indices. As Mann–Kendall test is applicable on the data which is not auto-correlated so the auto correlation of the precipitation time series was checked at 1-lag, 2-lag, 3-lag and 4-lag, it was found that precipitation at any station is not auto-correlated significantly at 95%. The magnitude of trends in time series was determined using Sen’s estimator (Sen 1968).

Table 2 Details of rainfall indices

3.2 Fuzzy c-means clustering

Fuzzy c-means is a soft clustering approach where each vector belongs to every cluster with certain degree of membership (Dunn 1973; Bezdek et al. 1984). If a dataset of M objects which can be classified into c classes and \(Y_{k}\) is the data vector belongs to \(k_{th}\) object where k = 1, 2, 3…M, then fuzzy c-means clustering algorithm is used to minimize the objective function represented as:

$$J(U,C) = \sum\limits_{j = 1}^{M} {\sum\limits_{i = 1}^{c} {u_{ik}^{\phi } } \left\| {Y_{k} - C_{i} } \right\|^{2} }$$
(1)

where \(u_{ik}\) is the membership value of \(k_{th}\) data point in \(i_{th}\) cluster, ‖Yk − Ci2 is squared Euclidean distance between data vector k and center of \(i_{th}\) cluster center,\(C_{i}\) is the center of \(i_{th}\) cluster and \(\phi\) is generally called as fuzzifier or fuzziness index, which can have any value greater than 1. Fuzzifier value near to 1 denotes hard cluster, and in general, its value is set between 1 and 2.5 (Bezdek and Pal 1998). In fuzzy c-means clustering algorithm, initially the number of cluster and data vector of cluster center is selected randomly and then membership matrix is calculated using formula

$$u_{i < k}^{t + 1} = \left[ {\sum\limits_{j = 1}^{c} {\left( {\frac{{\left\| {y_{k} - c_{i} } \right\|}}{{\left\| {y_{k} - c_{j} } \right\|}}} \right)^{{\frac{2}{\phi - 1}}} } } \right]^{ - 1}$$
(2)

where i = 1, 2, 3…c and k = 1, 2, 3M. By using updated membership function and Eq. (6), new cluster center can be calculated as follows:

$$c_{i} = \frac{{\sum\nolimits_{k = 1}^{M} {u_{ik}^{\phi } y_{k} } }}{{\sum\nolimits_{k = 1}^{M} {u_{ik}^{\phi } } }}$$
(3)

In clustering analysis, the number of clusters is very important as it represents the compactness of clustering. Bezdek (1981) and Bensaid et al. (1996) presented some validity checks for optimality of number of clusters. Optimal number of clusters refers to the compactness of the data points in a cluster and its separation from other clusters, and optimal value of number of clusters also decreases with increase in number of clusters (Pal and Bezdek 1995; Rao and Srinivas 2006b). In present study, three validity indices have been used to check the performance of clustering which are as follows:

  1. (a)

    Xie and Beni (1991) suggested compactness and separation validity function (S) which is measure of the ratio of variance within the clusters to the variance between the clusters, and smaller value of S indicates the better separation of clusters.

    Separation index

    $$S\left( {U,V{:}X} \right) = \frac{{\varSigma_{i = 1}^{\text{c}} \varSigma_{k}^{\text{M}} \left( {u_{ik}^{a} } \right)\left| {\left| {c_{i} - y_{k} } \right|} \right|^{2} }}{{M\mathop {\text{min} }\limits_{i \ne k} \left| {\left| {v_{i} - y_{k} } \right|} \right|^{2} }}$$
    (4)
  2. (b)

    The fuzziness performance index (FPI) is a measure of the degree to which different classes share membership (fuzziness) (Bezdek 1974a).

    Fuzziness partition index

    $${\text{FPI}} \left( U \right) = \frac{1}{m}\varSigma_{i = 1}^{c} \varSigma_{k = 1}^{M} u_{ik}^{2}$$
    (5)
  3. (c)

    The modified partition entropy (MPE) estimates the degree of disorganization created by a specified number of classes (Bezdek 1974b).

    Modified partition entropy

    $${\text{MPE}} \left( U \right) = \frac{1}{m}\left[ {\varSigma_{i = 1}^{c} \varSigma_{k = 1}^{M} u_{ik} \log_{a} \left( {u_{ik} } \right)} \right]$$
    (6)

Minimizing the value for MPE, FPI and S gives the optimum number of clusters. These indices have been used in several studies (Minasny and McBratney 2002; Odeh et al. 1992; Halkidi et al. 2001; McBratney and Moore 1985).

Variables with different units affect the clustering results, so it is suggested in several studies (Lim and Voeller 2009; Dikbas et al. 2012) to normalize the data. In this study, six attributes of each station were used for identifying the clusters, i.e., latitude, longitude, average value of the precipitation index, standard deviation, maximum value and minimum value. These data were normalized using the following transfer function:

$$x_{n} = \frac{{x - x_{\rm min} }}{{x_{\rm max} - x_{\rm min}}}$$
(7)

where xn is the normalized data attribute, x is the original data attribute and \(x_{ \rm min}\) and \(x_{ \rm max}\) are the minimum and maximum data attributes of all stations.

4 Results and discussion

4.1 Analysis of precipitation indices

Consecutive dry days (CDD) are the number of consecutive days when rainfall is less than 1 mm, and this index is helpful in identifying the drought spell, scarcity of the water resources for the particular spell for agricultural requirement. Northeast India is humid region where major rainfall occurs in the monsoon season of June to October and on an average area receives high rainfall of 2450 mm annual, but major portion of rainfall occurs in the monsoon season only. According to the analysis of 35 years of rainfall data, it was found that average dry spell is approximately 126 days and the longest dry spell occurred at West Garo Hills for 150 days in 1999 while the shortest dry spell was at upper Subansiri for 48 days in 1979. As the elevation of the station from mean sea level varies from minimum 58 m (Bongaigaon) to maximum 3719 m (Dibang Valley), variation in dry spells also varies according to the elevation. Length of dry days spells is lesser at higher elevation as compared to lower elevation stations, and average dry spell length at the stations having elevation less than 500 m was 51 days while stations above 1000 m have the average dry spell of 41 days. Reason behind this would be the orographic precipitation at the higher elevation. Figure 2 shows the spatial variation in dry days spells for the region. A significant negative trend was observed at Lunglei and Aizawl stations while at Tirap significant positive trend was found, and the rest of all the stations show non-significant increasing or decreasing trend at 95% confidence interval. Dry days spell is an indicator of drought situation in region, and its analysis is also necessary for the area like northeast India where major portion of the land is covered with forest; many studies have correlated the dry days spells to forest fire incidents (Chen et al. 2014) as during the dry spells dead biomass can get fired easily. There have been so many incidents of forest fire in northeast India and it is very vulnerable to it (Puri et al. 2011).

Fig. 2
figure 2

Spatial variation in the precipitation indices

Similarly, the consecutive wet days spells are also important for detecting the climate change footsteps; as the threshold value for a rainfall to consider as significant was decided as 1 mm, it was assumed that the rainfall below this amount does not contribute in any hydrological process; many studies have suggested different methods to select the threshold value for the particular region (Manton et al. 2001; Groisman et al. 1999). The longest wet spell was observed at Mamit for 118 days, whereas minimum 29 days at Karbi Anglong. Karbi Anglong is the only station which shows a significant positive change, and remaining stations reflect insignificant positive or negative trend. Interpolation map of maximum wet days spells within the duration of 1971–2005 is shown in Fig. 2.

4.2 Changes in annual precipitation

Highest annual rainfall received at Tamenglong in 1971 which was 12,195.9 mm, whereas minimum rainfall was recorded in West Kameng-2 in 1972 that was 741.9 mm. As far as the average annual rainfall is concerned, it was observed at Dibang Valley which has the highest elevation among selected stations, and average annual rainfall was recorded to be 3358.3 mm. Trend analysis of annual precipitation shows that no significant trend was observed at any of the stations, whereas non-significant increasing or decreasing trend is existing at several stations. Only six stations (Mamit, Tamenglong, Sibsagar, Tinsukia, Upper Subansiri and Dibang Valley) are there with decreasing (statistically insignificant) annual precipitation trends, and remaining stations are showing insignificant increasing trend.

4.3 Analysis of rainy days (R1 mm), heavy precipitation days (R10 mm) and very heavy precipitation days (R20 mm)

As per the ETCCDI, it was considered that days receiving precipitation of 1 mm are rainy day; however, this threshold could be different for different areas. Similarly, days with precipitation more than or equal to 10 mm are termed as heavy precipitation days and days with 20 mm or more precipitation are termed as very heavy precipitation days. Analysis of these indices is important for hydrological studies as well as climate change studies (Aguilar et al. 2005). As far as the rainy days are concerned, maximum number of rainy days was observed at Sibsagar in 1981 of 283 days and minimum 89 days at Mokokchung in 1971. Average annual precipitation days calculated for the region were 170 days, and spatial distribution of R1 mm, R10 mm and R20 mm is shown in Fig. 2. Trend of rainy days from 1971 to 2005 has not changed at any station significantly, although it is positive at some stations (Nalbari, East Khasi Hills, West Kameng-1 and 2, Aizawl, Tirap and Tinsukia) and decreasing at most of the stations, and the trend is non-significant at 95% confidence interval.

Maximum heavy precipitation days observed were 131 days at Jaintiya Hills in 1986, whereas minimum 19 days at Tuensang. This region has average 72 days of heavy precipitation days annually, and trend at these stations is not significant. Similarly, the very heavy precipitation (precipitation > 20 mm) days are also very often in northeast India. Maximum 71 days of very heavy precipitation observed at Tamenglong in 1971 and minimum 2 days at Tuensang in 1986. On average, 36 days in a year receives the very heavy rainfall, and this very heavy rainfall events cause the flash flood, landslide and other hazardous phenomena which is quite often in hilly terrain of northeast India. East Kameng and Churachandpur show the significant increasing trend, whereas in Dibang Valley it is significantly decreasing. Similarly, very wet days (R95p) and extremely wet days (R99p) indices suggest significant increasing trend at West Kameng-1 and West Kameng-2 with a magnitude of 28.71 mm and 22.36 mm, respectively, and extremely wet days index represents significant positive change at East Kameng and Churachandpur while at Karbi Anglong and Kohima this index is decreasing significantly.

4.4 Changes in R1 day, R5 day and SDII index

Maximum 1-day precipitation and maximum 5-day precipitation are the indication of temporal distribution of precipitation. For the places like northeast India, this becomes more important to analyze these indices as the region receives rainfall more than 2400 mm annually, and the maximum amount of rainfall occurs in monsoon season, that too with a certain period of very high intensity rainfall. Urban hydrology has the certain significance of these indices; recently occurred 1-day heavy rainfall at Mumbai in 2005 was recorded 94.4 cm, and heavy rain in Kedarnath in Uttarakhand state of India caused severe destructions. Such incidents encourage to study and monitor these extreme precipitation indices. In the study, it was observed that stations in northeast India have received very heavy precipitation in one day during the 1971–2005 periods. It has been observed that a very much variation in 1-day maximum precipitation exists in this region depending on the spatial variation. Tamenglong (Manipur) has received maximum 1-day rainfall among all the selected stations which was 597.4 mm on June 1976, whereas Tuensang (Nagaland) received maximum 110 mm rainfall in August 1976. Trend analysis of the index R1 day represents that out of 27 stations 15 stations, namely Bongaigaon, Nalbari, West Kameng-1, West Kameng-2, Aizawl, East Kameng, Churachandpur, Tamenglong, Kohima, Golaghat, Upper Subansiri, Mokokchung, Sibsagar, Tirap and Dibang Valley depict significant positive or negative trend for maximum 1-day precipitation (Table 3). Five-day maximum precipitation shows significant increasing trend in Bongaigaon (June month with magnitude 1.69 mm), West Kameng (May month with magnitude 1.05 mm), East Kameng (0.71 mm in May month) and Churachandpur (August month with 1.16 mm), whereas significant negative trend in Nalbari, Tamenglong, Kohima, Siang and Tinsukia for the month of July, June, May, May and August, respectively.

Table 3 Magnitude of trend (Sen’s slope) for maximum 1-day (R1 day) and maximum 5-day (R5 day) rainfall

Maximum 1-day precipitation has significant trend on monthly basis, but no significant trend was observed at annual basis. Daily intensity index (SDII) is defined as annual precipitation divided by the number of wet days (rainfall > 1 mm). Daily intensity index is important for understanding the distribution of rainfall, and northeastern region is a high rainfall receiving area but most of the rainfall occurs in short duration of monsoon period, so it is important to analyze intensity of precipitation throughout the year. Maximum intensity within the study period was observed 80.8 mm/day at Tamenglong in 1995, and minimum 5.1 mm/day was observed at Upper Subansiri in 1982.

4.5 Clustering of rainfall indices

Fuzzy c-means clustering algorithm was applied on the matrix containing normalized value of the attributes of the stations, i.e., latitude, longitude, average, standard deviation, minimum value and maximum value of each index. As discussed in the methodology section, fuzziness index (\(\phi\)) is taken as 2. Urcid and Ritter (2012) suggested the upper bound of the number of cluster as \(\sqrt n\)(where n is the number of stations), and minimum number of cluster can be 2, so the number of cluster classes for the analysis was taken from 2 to 5. Further, optimum number of clusters was selected with help of certain validity indices discussed in methodology section. Objective function value (OFV) and validity indices values for indices CDD, CWD and PRCPTOT are given in Table 4. As in the table, FPI, MPE and S values are minimum at five numbers of clusters for CDD, whereas FPI and S values are minimum at four numbers of clusters and MPE value is minimum value at five number of clusters in case of CWD index. Therefore, optimum cluster for CWD is assigned as 4. FCM results for clustering of annual precipitation index (PRCPTOT) checked with validity indices, FPI and MPE value minimize at five numbers of clusters, whereas separation index (S) minimizes at four numbers of clusters (Table 4); hence for PRCPTOT, five groups were finalized and distribution of the stations can be seen in Fig. 3. Precipitation days (R1 mm), heavy precipitation days (R10 mm) and very heavy precipitation days (R20 mm) clustering results and validity indices values can be seen in Table 4. As FPI and MPE and S values minimize at five numbers of clusters and therefore for R1 mm and R10 mm cluster numbers are 5, whereas for R20 mm, FPI and MPE values minimize at three numbers of clusters, this is selected as maximum number of clusters for R20 mm. Stations falling under each cluster can be seen in Fig. 3.

Table 4 Values of validity indices for different precipitation indices
Fig. 3
figure 3

Clusters identified by fuzzy c-means for indices

Similarly, cluster analysis for the indices R95p, R99p and SDII also shows five clusters which can be seen in Fig. 3. Further, FCM was applied on 1-day maximum rainfall index (R1 day) and 5-day maximum rainfall (R5 day), as these two rainfall indices are calculated for each month, so for the analysis only May, June, July, August and September months were selected as these are the highly rainfall-receiving periods. Analysis reveals that for June and August months, there is not very much variation among the stations and thus the number of clusters is only 2 and 3, respectively, whereas in May, July and September months five clusters were found based on the values of validity indices. Reason behind this monthly variation could be the northeast monsoon winds of Bay of Bengal which dominates the region in summer months and southwest monsoon winds which reach to this region in monsoon months. Distribution of the stations in each cluster for R1 day and R5 day can be seen in Fig. 4.

Fig. 4
figure 4

Clusters identified by fuzzy c-means for index R1 day and R5 day

5 Discussion and conclusion

Northeast India has a very irregular terrain, and elevation ranges from 55 m (Bongaigaon) to 3719 m at Dibang Valley and this causes the high variation in climatic pattern of the region. Results of extreme precipitation indices analysis of the region show high variation in dry days spells (CDD) and wet spells (CWD) in stations. Dry day spells are quite often at the stations situated at lower elevation, whereas prolonged wet days spells observed at high elevation zone. Maximum 1-day rainfall is good indicator for the flood analysis in the hilly region, and the study shows increasing trends in maximum 1-day rainfall at the stations of Arunachal Pradesh state (East and West Kameng) which has the several implications for the region as catchment of the Brahmaputra river drains run off entirely from states of Arunachal Pradesh and Assam. Extreme rainfall events in the upper catchment of Brahmaputra River cause flash flood in the downstream catchment.

Although the trend analysis of annual precipitation does not suggest statically significant trend, the decrease in rainy days and subsequent increase in intensity are the prime concern for the region. Changing magnitude of extreme rainfall events of the northeast region is matter of concern as the area is very much vulnerable to extreme events; because of its hilly terrain, even a small-duration heavy rainfall at upper part causes flood in plain areas in downstream of Brahmaputra River. Contrary to this, despite receiving high annual rainfall, some parts face drinking water and irrigation water scarcity in the non-monsoon period because of the draining out of large portion of rainfall water.

This study classifies the northeast India region in several homogeneous clusters according to their statistical properties for different precipitation indices using fuzzy clustering approach, which provides an insight into the rainfall distribution in the region. Optimum number of clusters from fuzzy clustering approach was validated using five validity indices, which assures the compactness of the clusters. Findings of the study could be helpful for frequency analysis and watershed management planning of the region. Homogeneous regions identification based on the different extreme events indices would be helpful for the preparedness and effective mitigation planning for natural hazard situation.