1 Introduction

Tropical cyclones (TCs) are among the most serious natural hazards in Taiwan; they bring heavy rainfall and strong winds that cause casualties and severe economic damage. On average, three to four typhoons invade the vicinity of Taiwan annually (Wu and Kuo 1999). An increase in the frequency and intensity of extreme typhoon rainfall events has been observed in Taiwan; Chu et al. (2014) pointed out an upward trend in typhoon rainfall intensity over Taiwan since 1950. For example, Typhoons Morakot, Fanapi, and Megi in 2009 and 2010 brought record-breaking rainfall intensities and amounts to different locations in Taiwan. In particular, Typhoon Morakot brought record-breaking rainfall to southern Taiwan during 7–9 August 2009, leading to about 700 deaths from mudslides. Because of topographical effects, different typhoon trajectories can induce different spatio-temporal patterns of rainfall and wind speed in a watershed and thus cause different disasters and runoff patterns (Yeh and Elsberry 1993a, b; Lin et al. 2002; Wu et al. 2002; Huang et al. 2012; Yu 2014). Therefore, understanding typhoon movements is important for estimating potential typhoon impacts and mitigating typhoon disasters.

To characterize typhoon trajectories, various classification approaches have been proposed. For example, Hodanish and Gray (1993) classified typhoon tracks in the western North Pacific into four major patterns according to their recurving processes: recurving, gradually recurving, left-turning, and non-recurving. The Taiwan Central Weather Bureau stratified typhoon trajectories into nine groups with respect to their movements around the island (Central Weather Bureau 2014). Classification techniques have been used to determine typhoon track classes objectively by accounting for various characteristics of the trajectories, such as the position of maximum intensity, the final position, and the statistical moments of the track positions (Elsner and Liu 2003; Camargo et al. 2007a, b; Nakamura et al. 2009). Based on these quantitative measures, the major patterns of typhoon movement can then be identified by various clustering methods, such as k-means clustering, fuzzy c-means clustering, and the regression mixture method (Elsner and Liu 2003; Camargo et al. 2007a, b; Nakamura et al. 2009; Chu et al. 2010a, b; Kim et al. 2011; Paliwal and Patwardhan 2013). Among them, the regression mixture method proposed by Camargo et al. (2007a, b) and Chu and Zhao (2011) has the advantage of not requiring track records of equal length.

Large-scale atmospheric circulation is one of the major factors affecting typhoon trajectories as well as the genesis and development of tropical cyclones (Camargo et al. 2007a, b; Chu and Zhao 2007; Hsu et al. 2014). Based on various atmospheric circulation observations, a statistical regression approach was applied to the seasonal forecast of hurricane activity and explained over 60 % of hurricane activity in the Atlantic (Gray et al. 1993, 1994). Bayesian regression methods were also developed for modeling seasonal TC activity to account for the forecasting uncertainties of TC counts (Elsner and Jagger 2004; Chu and Zhao 2007; Lee et al. 2012). The large-scale atmospheric variables relevant to TC activity and trajectories in the western North Pacific commonly include the space–time distributions of sea level pressure (SLP), sea surface temperature (SST), and the wind field at various vertical levels, as well as the El Niño Southern Oscillation (ENSO) index (Chan 1985a, b; Holland 1995; Simpson et al. 1997; Camargo et al. 2007a, b). Among them, SST is the major energy source that fuels tropical cyclones, particularly at temperatures higher than 26 °C (Emanuel 2005; Trenberth 2005; Michaels et al. 2006; Ali et al. 2013). ENSO variation can change the spatial distribution of SST in the Pacific, i.e., the location of the high-SST hot spot. The wind field can affect the stability of the atmosphere and further induce or suppress the development of tropical cyclones (Rasmusson and Carpenter 1982; Vickery et al. 2000).

This study proposes a seasonal typhoon trajectory modeling approach to characterize and forecast the trajectory type based on the large-scale atmospheric circulation patterns over East Asia and the western North Pacific. A regression mixture method was used to classify typhoon trajectories in different seasons. A generalized linear model (GLM) was applied to identify the associations between the activity of each trajectory pattern and climatic features. For the purposes of disaster management in Taiwan, the proposed approach was applied to the best tracks of typhoons that crossed the vicinity of the island during 1951–2009.

2 Data

The Japan Meteorological Agency (JMA) has compiled the trajectory data of all TCs in the western North Pacific since 1951, i.e., within the spatial range of 100°E–180°E and 0°N–60°N shown in Fig. 1 (Japan Meteorological Agency 2014). The best tracks in this dataset are recorded at a temporal resolution of 6 h. This study collected the trajectory data only for the TCs that crossed the vicinity of Taiwan, i.e., within 6° in latitude and longitude, as shown in Fig. 1. Tropical depression (TD) data were also considered in this analysis. The large-scale monthly atmospheric observations were obtained from the NCEP/NCAR Reanalysis I dataset (Kalnay et al. 1996), including: (1) sea level pressure (SLP), (2) precipitable water (PWAT), (3) 850-hPa relative vorticity (Vor850), (4) vertical wind shear (VWS), and (5) the Southern Oscillation Index (SOI). In addition, the SST dataset in our analysis was obtained from NOAA, which used an improved statistical tuning method to reduce the data uncertainties and compiled the latest version of the historical SST estimates (Smith et al. 2008), i.e., the extended reconstructed SST (ERSST). The large-scale atmospheric data cover the period 1951–2009 and the spatial range of 60°E–120°W in longitude and 0°N–70°N in latitude, with a spatial resolution of 2.5° × 2.5° (2° × 2° for SST). Both the best track and the atmospheric data were classified into spring (April–June), summer (July–August), and autumn (September–November).
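The track selection and seasonal grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: the reference point for Taiwan (23.7°N, 121.0°E), the function names, and the rule that assigns a track to the season of its first fix inside the box are all assumptions introduced here.

```python
from datetime import datetime

# Hypothetical reference point near the centre of Taiwan (an assumption;
# the text only states "within 6 degrees in latitude and longitude").
TAIWAN_LAT, TAIWAN_LON = 23.7, 121.0
HALF_WIDTH_DEG = 6.0

# Season definitions quoted from the text.
SEASONS = {"spring": (4, 5, 6), "summer": (7, 8), "autumn": (9, 10, 11)}


def near_taiwan(lat, lon):
    """True if a fix lies inside the +/-6 degree box around the reference point."""
    return (abs(lat - TAIWAN_LAT) <= HALF_WIDTH_DEG
            and abs(lon - TAIWAN_LON) <= HALF_WIDTH_DEG)


def season_of(when):
    """Return the season name for a datetime, or None outside the three seasons."""
    for name, months in SEASONS.items():
        if when.month in months:
            return name
    return None


def select_track(fixes):
    """Keep a track, given as a list of (datetime, lat, lon), if any fix enters the box.

    Returns (season, fixes) or None. The season is taken from the first fix
    inside the box, which is a simplifying assumption of this sketch.
    """
    inside = [f for f in fixes if near_taiwan(f[1], f[2])]
    if not inside:
        return None
    return season_of(inside[0][0]), fixes


# Made-up 6-hourly fixes for illustration only (not taken from the JMA data).
example_track = [
    (datetime(2009, 8, 6, 0), 22.0, 128.5),
    (datetime(2009, 8, 6, 6), 22.5, 126.0),
    (datetime(2009, 8, 6, 12), 23.0, 124.0),
]
selected = select_track(example_track)
```

Keeping the whole track once any fix enters the box preserves the approach and departure segments, which matter for the later curve clustering.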

Fig. 1

Study area: the western North Pacific domain, i.e., 100°E–180°E and 0°N–60°N, and the 6° boundaries in the vicinity of Taiwan

3 Methods

For the purpose of long-term forecasting of typhoon movement patterns, this study investigates the frequency of different typhoon track patterns with respect to the space–time patterns of atmospheric anomalies. Our analysis was performed for different seasons, i.e., summer and autumn, during which the southwest and northeast monsoons, respectively, are dominant in the western Pacific. The space–time patterns of atmospheric anomalies were revealed by the empirical orthogonal function (EOF) method. The atmospheric anomalies were calculated with respect to the baseline records during 1970–2000. The EOF analysis decomposes a continuous space–time random field \( X(s,t) \), i.e., the atmospheric anomalies in this study, into an additive sum of space–time products as follows (Pearson 1901; Hotelling 1933; Hannachi et al. 2007)

$$ X(s,t) = \sum\limits_{k = 1}^{M} {c_{k} } (t)u_{k} (s) $$
(1)

where \( (s,t) \) denotes the space–time location at spatial position s and time t, and M is the number of orthogonal space–time modes \( c_{k}(t)u_{k}(s) \). The modes are formulated as an optimal set of orthogonal spatial functions \( u_{k}(s) \), i.e., the EOFs, which represent the k-th spatial features. Their associated expansion functions of time \( c_{k}(t) \), i.e., the projections of \( X(s,t) \) on \( u_{k}(s) \), are also called EOF expansion coefficients (ECs) and represent the k-th temporal features. As in common principal component analysis, the leading EOFs can usually explain a fair amount of the observed variance of the original space–time dataset (Hannachi et al. 2007; Yu and Chu 2010, 2012).
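The decomposition in Eq. (1) can be illustrated with a singular value decomposition of a time-by-space anomaly matrix. The sketch below is only an illustration on synthetic data: the function name `eof_decompose` is hypothetical, anomalies are taken relative to the full-record mean rather than the 1970–2000 baseline used in this study, and no latitude weighting is applied.

```python
import numpy as np


def eof_decompose(field, n_modes):
    """Decompose a (time, space) matrix into ECs and EOFs via SVD.

    Returns (ecs, eofs, explained): ecs has shape (time, n_modes), eofs has
    shape (n_modes, space), and explained holds the variance fractions.
    """
    anomalies = field - field.mean(axis=0)          # remove the time mean per grid point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]                             # orthonormal spatial patterns u_k(s)
    ecs = u[:, :n_modes] * s[:n_modes]              # expansion coefficients c_k(t)
    explained = s[:n_modes] ** 2 / np.sum(s ** 2)   # fraction of variance per mode
    return ecs, eofs, explained


# Synthetic example: 60 time steps on 40 grid points (not real reanalysis data).
rng = np.random.default_rng(0)
t = np.arange(60)
pattern = np.sin(np.linspace(0.0, np.pi, 40))
field = (np.outer(np.cos(2 * np.pi * t / 12), pattern)
         + 0.1 * rng.standard_normal((60, 40)))

ecs, eofs, explained = eof_decompose(field, n_modes=3)
reconstruction = ecs @ eofs                         # truncated sum over k in Eq. (1)
```

Each row of `eofs` plays the role of \( u_{k}(s) \) and each column of `ecs` the role of \( c_{k}(t) \); the ECs are the projections of the anomalies onto the EOFs, as stated in the text.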

Typhoon track clustering was performed by using a curve clustering analysis (Gaffney 2004; Camargo et al. 2007a, b), in which a regression mixture model based on a finite mixture model was developed to classify the TC trajectories into a user-specified number of track clusters. A finite mixture model is a convex combination of two or more probability density functions and is capable of approximating an arbitrary distribution (Filho 2008). The regression mixture model considers the probability distribution of the entire set of TC trajectories, which is characterized by a combination of the marginal distributions of the identified subsets of TC tracks.

The method formulates the marginal distribution of the typhoon positions to be Gaussian-distributed, in which the mean positions are time-dependent. The time-dependent position function can be derived by using spline or polynomial regressions to identify the relationships between time and typhoon positions, i.e., longitude and latitude. It was shown that the low-order polynomial regression, i.e., quadratic, appears to be the optimal choice for TC trajectory modeling in terms of interpretation and goodness-of-fit (Camargo et al. 2007a, b).

The parameters of the marginal distributions, i.e., the estimated positions of the TC tracks and the associated covariances, are estimated by the expectation–maximization (EM) algorithm. The EM algorithm is an iterative method for finding maximum likelihood estimates when there are missing values or latent variables. It alternates between estimating a probability distribution over the completions of the missing data given the current model (the E-step) and re-estimating the model parameters using these completions (the M-step) (Bilmes 1998; Do and Batzoglou 2008). For more details of regression mixture modeling for TC trajectory clustering, readers are referred to Gaffney (2004), Camargo et al. (2007a, b), and Chu and Zhao (2011). In this study, the trajectory clustering was performed by using the Curve Clustering Toolbox (CCToolbox), which is available online (Gaffney 2004; Gaffney et al. 2007).
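To make the E-step and M-step concrete, the following is a simplified sketch of EM for a mixture of quadratic regressions on synthetic tracks of unequal length. It is not the CCToolbox implementation: the function names, the diagonal noise covariance, the random soft initialization, the fixed iteration count, and the small numerical safeguards are all assumptions of this illustration.

```python
import numpy as np

ORDER = 2  # quadratic regression of position on time


def design(n_points):
    """Polynomial design matrix [1, t, t^2] for time steps 0..n_points-1."""
    t = np.arange(n_points, dtype=float)
    return np.vander(t, ORDER + 1, increasing=True)


def log_density(track, beta, var):
    """Log-likelihood of one (n, 2) track under one cluster's regression curve."""
    resid = track - design(len(track)) @ beta
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + resid ** 2 / var)))


def fit_regression_mixture(tracks, n_clusters, n_iter=50, seed=0):
    """EM for a mixture of polynomial regressions with diagonal Gaussian noise."""
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(n_clusters), size=len(tracks))  # random soft start
    for _ in range(n_iter):
        # M-step: mixing weights and weighted least squares for each cluster.
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        betas, variances = [], []
        for k in range(n_clusters):
            pairs = list(zip(resp[:, k], tracks))
            xtx = sum(r * design(len(y)).T @ design(len(y)) for r, y in pairs)
            xty = sum(r * design(len(y)).T @ y for r, y in pairs)
            beta = np.linalg.solve(xtx + 1e-9 * np.eye(ORDER + 1), xty)
            sq = sum(r * np.sum((y - design(len(y)) @ beta) ** 2, axis=0)
                     for r, y in pairs)
            n_eff = sum(r * len(y) for r, y in pairs) + 1e-12
            betas.append(beta)
            variances.append(sq / n_eff + 1e-6)   # small floor avoids zero variance
        # E-step: posterior cluster membership of each whole track.
        log_p = np.array([[np.log(weights[k]) + log_density(y, betas[k], variances[k])
                           for k in range(n_clusters)] for y in tracks])
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
    return resp.argmax(axis=1), betas, float(log_norm.sum())


def make_track(rng, slope):
    """Synthetic (lon, lat) track with a random length; for illustration only."""
    t = np.arange(rng.integers(8, 15))
    curve = np.column_stack([130.0 + slope * t, 15.0 + 0.8 * t])
    return curve + 0.2 * rng.standard_normal(curve.shape)


data_rng = np.random.default_rng(1)
tracks = ([make_track(data_rng, -1.0) for _ in range(10)]
          + [make_track(data_rng, 1.0) for _ in range(10)])
labels, betas, log_lik = fit_regression_mixture(tracks, n_clusters=2)
```

Because the likelihood of a track is the sum over its own fixes, tracks of different lengths enter the same model without resampling, which is the property highlighted in the Introduction. The returned log-likelihood is the quantity one would compare across candidate cluster numbers.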

This study investigates the relationships between large-scale atmospheric variables and typhoon trajectories by using a GLM. The GLM is an extension of ordinary linear regression that allows the response variable to follow error distributions other than the normal distribution, such as the binomial, Poisson, and gamma distributions (McCullagh and Nelder 1989; Fox 2008). We related the number of occurrences of each typhoon track cluster to the temporal variations of the space–time atmospheric EOF patterns by using a GLM with the following relationship

$$ \lambda_{i} (t) = \sum\limits_{j = 1}^{m} {\sum\limits_{k = 1}^{p} {\beta_{j,k} EC_{j,k} } } (t) $$
(2)

where \( \lambda_{i} (t) \) is the occurrence rate of typhoons in track cluster i at time t, and \( EC_{j,k} (t) \) is the expansion coefficient of the k-th EOF pattern of the j-th atmospheric variable, e.g., SST and SLP, at time t. The number of typhoons appearing in track category i is considered to be Poisson-distributed, expressed as \( N\sim Poisson\left( {T\lambda_{i} (t)} \right) \), where T = 1 month in this study. We estimated the parameters of Eq. (2) by using the glmfit routine of the Statistics Toolbox in Matlab, which is based on the maximum likelihood method (Dobson 1999). Our analysis integrates the EOF and GLM methods, as summarized in Fig. 2. The optimal model is selected on the basis of the Akaike information criterion (AIC).
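As an illustration of the Poisson regression and AIC comparison, the sketch below fits a Poisson GLM by iteratively reweighted least squares on synthetic predictors. It does not reproduce the Matlab analysis: the function names are hypothetical, an intercept is added, and the log link is used as an assumption, whereas Eq. (2) as written relates the rate linearly to the ECs.

```python
import math

import numpy as np


def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(lambda) = X beta by iteratively reweighted least squares."""
    design = np.column_stack([np.ones(len(y)), x])   # intercept added (an assumption)
    beta = np.zeros(design.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)                # start from the mean rate
    for _ in range(n_iter):
        eta = design @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                      # working response
        w = mu                                       # IRLS weights for the log link
        beta = np.linalg.solve(design.T @ (w[:, None] * design), design.T @ (w * z))
    return beta


def poisson_aic(x, y, beta):
    """AIC = 2k - 2 log L for a fitted Poisson model with k coefficients."""
    mu = np.exp(np.column_stack([np.ones(len(y)), x]) @ beta)
    log_lik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
                  for yi, mi in zip(y, mu))
    return 2 * len(beta) - 2 * log_lik


# Synthetic monthly counts driven by two made-up expansion coefficients.
rng = np.random.default_rng(0)
ecs = rng.standard_normal((200, 2))
counts = rng.poisson(np.exp(-0.5 + 0.6 * ecs[:, 0] - 0.3 * ecs[:, 1]))

no_predictors = np.empty((200, 0))
beta_full = fit_poisson_glm(ecs, counts)
beta_null = fit_poisson_glm(no_predictors, counts)
aic_full = poisson_aic(ecs, counts, beta_full)
aic_null = poisson_aic(no_predictors, counts, beta_null)
```

A candidate set of ECs would be preferred over the intercept-only model when its AIC is lower, which mirrors the selection rule stated in the text.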

Fig. 2

The modeling flowchart of this study

4 Results

This study performed the EOF analysis to identify the leading EOF patterns and their associated temporal variations, i.e., the EC values. The EC results used for modeling the seasonal typhoon tracks are listed in Table 1. Since the trajectory clustering analysis requires a pre-determined number of clusters, the log-likelihoods of the regression mixture models were estimated to determine the optimal number of trajectory clusters. Figure 3 shows the change in log-likelihood with respect to the number of clusters, where the log-likelihood is a goodness-of-fit indicator representing the log-probability of the observed data under the regression mixture model. As shown in Fig. 3, although the log-likelihood increases with the number of clusters, the gain in log-likelihood diminishes markedly once the number of clusters exceeds six. As a result, we chose six track classes to represent the typhoon trajectories around Taiwan in summer.

Table 1 The selected atmospheric variables in summer and the explained variances of their first three EOFs
Fig. 3

The changes in log-likelihood versus clustering numbers in a spring, b summer, and c autumn

Figure 4 shows the results of the typhoon track clustering analysis in the vicinity of Taiwan in different seasons, i.e., spring, summer, and autumn. Table 2 lists the number of typhoon tracks in the three seasons. Summer has been the most prevalent season for typhoon visits to Taiwan during the past decades; therefore, we focused our seasonal trajectory analysis on summer. In particular, summer track 1 is one of the most prevalent tracks and can readily be associated with the prevailing southeasterly steering flows; typhoons on this track bring substantial rainfall and landslide damage to the island. Although the numbers of summer tracks 3 and 4 are also high, these typhoons are generally less devastating to the island. A GLM was used to characterize the relationships between the atmospheric variables and the number of track 1 typhoons in summer. The significant patterns of atmospheric variables in this GLM are listed in Table 3, which shows the fitted regression coefficients (i.e., the β in Eq. (2)) of the model selected by AIC and their associated p-values; TRUE and FALSE indicate whether or not a variable is selected in the model. Figure 5 shows the leave-one-out cross-validation results of the GLM for the monthly number of track 1 typhoons during 2002–2009. The results show that the proposed model can provide reasonable monthly forecasts of the prevailing typhoon track pattern under different atmospheric conditions. The spatial distributions of the identified EOF patterns whose significance level reaches p ≤ 0.01, together with their corresponding temporal variations, are shown in Fig. 6; they include two EOF patterns for SLP and one for SST, i.e., slpEC2, slpEC3, and sstEC3. Among them, the second EOF of SLP (Fig. 6a) and its corresponding EC (Fig. 6b) show a strong pattern of opposing pressure systems between the Pacific Ocean and the Asian continent, which is a common summer pattern dominated by the subtropical ridge over the Pacific and the thermal low over Asia. The third EOF of SLP (Fig. 6c) and its corresponding EC (Fig. 6d) show that the Pacific subtropical ridge is located at higher latitudes, while the lower-latitude Pacific Ocean is dominated by low-pressure systems or tropical cyclones. The third EOF of SST (Fig. 6e) and its corresponding EC (Fig. 6f) show a region of significant weights near Guam, where most tropical cyclones form in the western North Pacific.
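The leave-one-out cross-validation mentioned above can be sketched as follows: each observation is withheld in turn, the model is refitted to the remaining observations, and the withheld count is predicted. This is a generic illustration on synthetic data; the function names, the log-link Poisson fit, and the single made-up predictor are assumptions and do not reproduce the model in Table 3.

```python
import numpy as np


def fit_poisson(design, y, n_iter=25):
    """Poisson regression with a log link, fitted by IRLS (illustrative only).

    The first column of `design` is assumed to be the intercept.
    """
    beta = np.zeros(design.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)            # start from the mean rate
    for _ in range(n_iter):
        eta = design @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                  # working response
        beta = np.linalg.solve(design.T @ (mu[:, None] * design),
                               design.T @ (mu * z))
    return beta


def leave_one_out(design, y):
    """Predict each observation from a model fitted to all the others."""
    n = len(y)
    predictions = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # mask that drops observation i
        beta = fit_poisson(design[keep], y[keep])
        predictions[i] = np.exp(design[i] @ beta)
    return predictions


# Synthetic monthly counts with one made-up expansion coefficient.
rng = np.random.default_rng(0)
n_months = 40
ec = rng.standard_normal(n_months)
design = np.column_stack([np.ones(n_months), ec])
counts = rng.poisson(np.exp(0.2 + 0.5 * ec))

predictions = leave_one_out(design, counts)
```

Comparing `predictions` with `counts` month by month gives an out-of-sample check of the kind shown in Fig. 5, since no prediction uses the observation it is compared against.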

Fig. 4

Track clustering analysis in the vicinity of Taiwan in different seasons, a spring, b summer, and c autumn

Table 2 The number of typhoon tracks in the three seasons from 1951 to 2009
Table 3 The variable selection and parameter estimation in GLM for track 1 in summer
Fig. 5

Leave-one-out cross-validation results of the GLM in forecasting the number of summer track 1 typhoons during 2002–2009. Two forecasts (July and August) are made for each summer

Fig. 6

The EOF results of the selected variables (slpEC2, slpEC3, and sstEC3) in the GLM for summer track 1 (left: spatial weights of each EOF; right: the corresponding temporal variation, i.e., the EC, of each EOF)

5 Discussion

Previous studies have investigated the major clusters of typhoon trajectories in the western North Pacific (Camargo et al. 2007a, b; Chu et al. 2010a, b); however, for the disaster prevention purposes of local governments, the spatial domains of these analyses were too large to provide the information needed to estimate the anticipated impacts of typhoons, such as heavy rainfall or landslides, on a moderately small island such as Taiwan. For Taiwan, changes in typhoon tracks can result in distinctly different impacts on the island. As a result, typhoon track patterns have been considered primary predictors in studies of rainfall and landslide modeling (Lee et al. 2006; Chiang and Chang 2011). Furthermore, the Taiwan Central Weather Bureau (CWB) announced an official classification of typhoon tracks to fulfill the requirements of local agencies (Fig. 7). The current typhoon track classification was determined subjectively on the basis of the experience of experts and officers (Central Weather Bureau 2014). This study applied a regression mixture model to provide a more objective approach to differentiating the typhoon tracks within the study area. As shown in Fig. 4, the prevalent typhoon tracks change across the seasons owing to the influence of the monsoons. In particular, typhoons are more likely in summer than in other seasons to take paths through the middle of Taiwan from east to west, i.e., tracks 1 and 2 in Fig. 4b. These two summer tracks commonly bring heavy rainfall and strong winds to Taiwan, as in the cases of Typhoons Herb (1996), Aere (2004), and Saola (2012). Many typhoons taking these tracks in July and August caused severe damage. Moreover, typhoons on these tracks can sometimes have accompanying effects: they introduce southwesterly flow carrying a large amount of precipitable water from the warm oceans and substantially increase the rainfall intensity and duration (Pan et al. 2011). For example, Typhoon Morakot (2009) followed track 2 and brought rainfall of record-breaking intensity and magnitude, i.e., over 3,000 mm in 3 days, to southern Taiwan. These accompanying effects can also occur in autumn when typhoons take tracks 1, 2, 4, and 5 in Fig. 4c. A comparison with the CWB classification shows that our analysis is compatible with the previous experience of the local agency. In addition, we further identified the major classes of typhoon tracks in different seasons. In terms of the CWB classification, our results show that CWB tracks 2–5 are more prevalent in summer, whereas CWB tracks 1 and 6 occur more frequently in autumn.

Fig. 7

Classes and percentages of typhoon tracks from Taiwan Central Weather Bureau (1897–2007)

Numerous studies have developed seasonal forecast models for typhoon counts (Elsner and Jagger 2006; Chu et al. 2010a, b; Kim et al. 2012) and revealed the atmospheric patterns that are significant for TC activity. Mestre and Hallegatte (2009) identified SST, the North Atlantic Oscillation, and the Southern Oscillation Index as important predictors of TC occurrence frequency in the North Atlantic. Camargo et al. (2007a, b) showed strong associations of the 500-hPa wind field, vertical wind, and ENSO with the typhoon track distribution in the western North Pacific (NWP). Chu et al. (2010a, b) investigated the typhoon tracks crossing the Taiwan area from June to October and revealed their strong associations with antecedent atmospheric variables, including a positive correlation with SST around Taiwan and the equatorial western Pacific and a negative correlation with SLP over at least half of the NWP. Based on the EOF analysis of atmospheric variables, this study found that the EOF patterns of SLP in the vicinity of Hawaii and Japan, shown in Fig. 6a and c, respectively, are positively associated with the typhoon counts of summer track 1, implying that the two SLP patterns can strengthen the large-scale low-level summer circulation over the NWP in terms of the formation of monsoon troughs and ridges (Lander 1996). This study also showed that the EOF pattern of SST in the vicinity of Guam, shown in Fig. 6e, is positively correlated with the typhoon counts of summer track 1, suggesting that high SST in this area can contribute to typhoon formation. The movement of tropical cyclones is well known to be steered by the mid-level flow; therefore, wind field data have been widely used for estimating TC tracks (Chan and Gray 1982; Chan 1985a, b; Velden and Leslie 1991). Despite the importance of the steering flow to TC tracks, for the purposes of seasonal TC forecasting this study focused on modeling seasonal TC track patterns with several large-scale atmospheric observations, including SLP and SST, which are used to represent the general characteristics of the atmospheric conditions. Previous studies have also used SLP to construct statistical TC track models (Rogers 1997; Fogarty et al. 2006; Lu et al. 2010, 2013).

Our approach to the seasonal forecasting of typhoon tracks specific to the Taiwan region can be useful to local agencies in preparing for disaster mitigation. However, the uncertainties of this study should also be recognized. They result from the limited size of the typhoon dataset, because our analysis focused on the neighborhood of Taiwan and further divided the tracks by season. For forecasting purposes, it would also be worthwhile to investigate the low-level circulation conditions associated with each of the identified typhoon tracks and to develop more physically based explanations of the prevailing tracks.