1 Introduction

Understanding the risk factors of tropical cyclones (TCs) is essential for effectively mitigating their potential risks. Many efforts have been made to investigate TC risk factors, such as TC intensity, size, and track. TC intensity normally refers to wind-related parameters, such as maximum wind speed (Vmax) (Nordhaus 2010). Klotzbach et al. (2020) showed that another intensity parameter, minimum sea level pressure, correlates better with damage and fatalities than Vmax does. In addition to TC intensity, TC size is known to be an important factor for predicting the amount of damage over the United States (US) (Czajkowski and Done 2014; Hsiang and Narita 2012). Track is also an essential factor in determining whether a TC causes damage. According to Nam et al. (2018), even a small difference of less than 250 km in the track can serve as a key determinant of damages from TCs.

Many previous TC risk studies have focused on strong TCs rather than weak and decaying TCs, since the former are more destructive than the latter, particularly because of their strong wind gusts. Meanwhile, a recent study by Park et al. (2016) showed that weak TCs (WTCs; Vmax < 17 m s−1) can, in some regions, be as destructive as strong TCs (STCs; Vmax ≥ 17 m s−1); the capital area of the Republic of Korea (hereafter Korea) experienced as much damage from WTCs as from STCs. However, no detailed analysis of the WTC characteristics responsible for the substantial damage was reported. Extratropical transition (ET), the process by which a TC transforms into an extratropical cyclone as it moves into the mid-latitudes and encounters baroclinic environmental zones (Klein et al. 2000; Hart et al. 2006), could explain the unexpected damage from WTCs. Note that TCs moving poleward follow one of two pathways: 1) they decay as a tropical depression (TD) without undergoing ET, or 2) they decay while undergoing ET (Jones 2003). Over the western North Pacific (WNP), about 50% of TCs (~13 TCs) undergo ET as they move poleward and interact with mid-latitude environments, and 7–10 TCs make landfall during ET every year (Bieli et al. 2019).

It is interesting to note that ET storms can cause more extreme weather hazards associated with heavy precipitation than TDs (Choi 2021). TCs draw energy from latent heat released from warm ocean surfaces, whereas extratropical cyclones derive their energy from the release of potential energy when cold and warm air masses interact. During ET, the deep warm core of a TC becomes shallow and is often replaced by a cold-core, asymmetric structure, later leading to the development of surface fronts (Klein et al. 2000; Hart et al. 2006). The ET process involves baroclinic frontal lifting, combined with the remaining moisture and secondary circulation in the eyewall, which creates favorable conditions for heavy rainfall (Keller et al. 2019; Ritchie and Elsberry 2003). For example, North Atlantic hurricane Sandy (2012) was undergoing ET when it made landfall in New Jersey and produced catastrophic flooding that caused economic losses amounting to 50 billion US dollars (Galarneau et al. 2013). In Japan, the extreme precipitation associated with typhoon Etau (2015) caused disastrous floods, resulting in casualties in the northern and eastern portions of Tokyo. This typhoon was also undergoing ET when it passed the Japanese islands (Keller et al. 2019).

Hence, a comprehensive understanding of TC-induced damages requires consideration of all intensity categories of STC and WTC, including ET, TD, tropical storm (TS), severe tropical storm (STS), and typhoon (TY), accompanied by their tracks. In this study, the contribution of each intensity category and track to the total TC damage occurrence over Korea was examined using the updated version of the TC damage dataset, based on Park et al. (2015, 2016). In the following section, the meteorological and damage data, and decision tree methods are explained. The third section contains the results of the risk analysis, and finally, the fourth section highlights the major findings of this study.

2 Data and Methods

2.1 Tropical Cyclone and Damage Data

The locations of the TC center and intensity were obtained from the International Best Track Archive for Climate Stewardship (IBTrACS) version 4 (Knapp et al. 2010, 2018). Hourly TC data, interpolated linearly from three-hourly IBTrACS records, were analyzed for the years 1979–2015. TCs that approached within 3° of latitude and longitude from the coastline and border of Korea were classified as Korean landfall TCs (Fig. 1). This 3° definition has been used in previous research on Korea-landfalling TCs (Park et al. 2016; Nam et al. 2018; Park et al. 2021). Kim et al. (2023) recently showed that the 3° definition corresponds best with the operational definition of Korea-affecting TCs, which includes subjective judgments of the forecasters. A total of 123 TCs, equivalent to 3.32 TCs per year, were recorded for the 37-year analysis period. Further details on the calculation of each variable are discussed in Section 2.2.
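The linear interpolation from three-hourly to hourly positions can be illustrated with a minimal sketch. The function name, the tuple layout, and the sample positions below are hypothetical and are not taken from IBTrACS; the sketch only shows the arithmetic of the interpolation step.

```python
def interpolate_hourly(fixes):
    """Linearly interpolate (hour, lat, lon) fixes to a one-hour spacing.

    `fixes` is a list of tuples sorted by integer hour, e.g. three hours apart.
    """
    hourly = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        span = t1 - t0
        for step in range(span):
            w = step / span  # fractional position between the two fixes
            hourly.append((t0 + step,
                           lat0 + w * (lat1 - lat0),
                           lon0 + w * (lon1 - lon0)))
    hourly.append(fixes[-1])  # keep the final fix itself
    return hourly


# Hypothetical three-hourly track segment (hour, latitude, longitude)
track = [(0, 30.0, 126.0), (3, 30.6, 126.3), (6, 31.5, 126.9)]
hourly_track = interpolate_hourly(track)
```

The sketch interpolates latitude and longitude independently, which is adequate for a short segment near Korea but would need extra care for tracks crossing the date line.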

Fig. 1

TC parameters (impact angle, impact category, and minimum distance) of typhoon Usagi (2007) used in the decision tree analysis. Gray lines denote the 3° distance from the Korean coastline and border. The circled “x” marks the location where Usagi (2007) first crosses the 3° line. The star indicates the location of Daejeon city, which is the reference point for calculating the impact angle. The impact angle is measured counter-clockwise from north, as indicated by the blue curved arrow. “+” marks the location where Usagi (2007) was at its minimum distance from the coastline of Korea

Verified data on socioeconomic losses were taken from the open-access website of the National Disaster Information Center (NDIC) of the Korean government (http://safekorea.go.kr). The socioeconomic losses in the NDIC data comprise the total direct economic damage to industrial, public, and private facilities, and the losses are adjusted to 2005 monetary values. The damages were then matched to the period during which a TC center was located within 5° of the Korean coastline. Cases with a damaging period longer than 10 days (about 11% of the TCs) were excluded from the analysis because these data points may not represent TC damage exclusively. The NDIC aggregates damage records when multiple natural disasters happen simultaneously or consecutively; thus, a damaging period longer than 10 days in the NDIC record would represent aggregated damage from multiple TCs arriving in succession or from a TC arriving during the heavy-rainfall monsoon period (i.e., the Changma season). As the NDIC has significantly changed its data format since 2016, the data used in the present study were limited to the period 1979–2015. More details about the socioeconomic losses dataset can be found in Park et al. (2015).

2.2 Decision Tree Method

The decision tree method was used to analyze the relationship between the risk factors and TC damage. In the decision tree model, the explanatory variables are the following TC parameters: impact category, impact angle, and minimum distance (Fig. 1), while the response variable is the economic damage occurrence recorded for TC landfalling episodes.

For the response variable, we used the categories ‘yes’ and ‘no’ with respect to damage occurrence because the decision tree method is more effective in predicting the outcome of a categorical variable than that of a continuous variable. The ‘undamaged’ TC cases are those whose impact was so negligible that no damage record was reported to the National Emergency Management Agency.

Regarding the explanatory variables, Table 1 shows the six impact categories (TY, STS, TS, ET, TD, and unknown), which are adopted from the TC grades of the Regional Specialized Meteorological Center (RSMC) Tokyo. The impact category is the RSMC Tokyo TC grade at the time of impact, i.e., when the TC first enters the 3° distance. Meteorological agencies have some discrepancies with respect to TC grades, especially in differentiating extratropical versus tropical storms (Schreck et al. 2014). Therefore, we decided to use the RSMC Tokyo record, as RSMC Tokyo is in charge of TC analysis and forecasting for the WNP. The 123 TCs break down into 41 TYs, 19 STSs, 24 TSs, 16 TDs, 13 ETs, and 10 Unknowns. TCs that enter the impact location in the TY, STS, and TS categories were grouped as STC, while those in the ET, TD, and unknown categories were grouped as WTC, similar to Park et al. (2016). The Unknown category belongs to WTC because the named TCs (i.e., STCs) are tracked rigorously by operational agencies, and their intensity estimates are more robust and accurate (Velden et al. 1998). Thus, STCs would not be labeled as Unknown by the RSMC Tokyo. The impact angle, representing the entry location, indicates the direction from which the TC crosses the 3° distance line. The angle is measured about Daejeon city (Fig. 1), with due north as 0°, and spans the full 360°. The minimum distance was defined as the shortest distance between the coastline and the center of the TC as it approaches the Korean coastline. For example, the impact category of TC Usagi (2007) was STS, its impact angle was 236°, and its minimum distance was 1.44° (Fig. 1).
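The two derived quantities described above, the STC/WTC grouping and the impact angle, can be sketched as follows. This is only an illustration under stated assumptions: the Daejeon coordinates are approximate, the angle uses a simple planar approximation with a cosine-of-latitude scaling of longitude (the exact geometry used in the study is not specified here), and all function and constant names are hypothetical.

```python
import math

# Approximate coordinates of Daejeon city (assumed values for illustration)
DAEJEON_LAT, DAEJEON_LON = 36.35, 127.38

STC_GRADES = {"TY", "STS", "TS"}       # grouped as strong TCs
WTC_GRADES = {"ET", "TD", "Unknown"}   # grouped as weak TCs


def intensity_group(grade):
    """Map an RSMC Tokyo grade at impact time to the STC or WTC group."""
    if grade in STC_GRADES:
        return "STC"
    if grade in WTC_GRADES:
        return "WTC"
    raise ValueError(f"unrecognized grade: {grade}")


def impact_angle(lat, lon, ref_lat=DAEJEON_LAT, ref_lon=DAEJEON_LON):
    """Direction of (lat, lon) seen from the reference point, in degrees,
    measured counter-clockwise from north (north=0, west=90, south=180, east=270)."""
    north = lat - ref_lat
    # Planar approximation: shrink longitude differences by cos(latitude)
    east = (lon - ref_lon) * math.cos(math.radians(ref_lat))
    return math.degrees(math.atan2(-east, north)) % 360.0
```

With this convention, an entry point southwest of Daejeon yields an angle between 90° and 180°, and one to the southeast yields an angle between 180° and 270°.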

Table 1 The six impact categories adopted from the tropical cyclone (TC) grade of the Regional Specialized Meteorological Center (RSMC) Tokyo at the time of the TC entering a 3° distance line from the Korean coastline. Vmax indicates the maximum value of the 10-min average of winds at a height of 10 m above the sea surface inside the TC

The decision tree method aids in structuring the layers of the decision process using multiple variables following a graphical sequence. It is one of the simplest machine learning techniques, as it is intuitively interpretable and robust to a variety of data. Here, we used the See5/C5.0 algorithm (Quinlan 1993), following Nam et al. (2018), which calculates the information gained at each node to select the most efficient attribute for splitting the training samples into branches.
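The information-based splitting criterion can be illustrated with a short entropy calculation. The sketch below computes plain information gain; C4.5-family algorithms typically also normalize this quantity (gain ratio), which is omitted here. The counts are the WTC damage counts reported in this paper (8 of 13 ET cases and 10 of 26 TD/unknown cases recorded losses), but the ET versus TD/unknown split is used purely as an illustration and is not claimed to be a split performed by the fitted tree.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy reduction obtained by splitting `parent` into `children`.

    `parent` is a list of class counts; `children` is a list of such lists.
    """
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Class counts are [damaged, undamaged]
wtc_all = [18, 21]        # all 39 WTC cases
et_cases = [8, 5]         # 13 ET cases
td_unknown = [10, 16]     # 26 TD and unknown cases
gain = information_gain(wtc_all, [et_cases, td_unknown])
```

A larger gain means the candidate attribute separates damaged from undamaged cases more cleanly; the algorithm compares such values across candidate attributes at each node.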

To prevent over-fitting, we introduced pruning and cross-validation. First, we required that branches have a sample size of at least five. The number five was determined through a retrospective pruning process. Second, a ten-fold cross-validation was conducted, and we checked that the decision tree results (e.g., model accuracy, tree size, or attribute usage) are stable and consistent and that the relative importance among the TC risk factors was robust.
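The two safeguards can be sketched in a few lines. This is a generic illustration, not the See5/C5.0 implementation: the function names, the random seed, and the way folds are assigned are assumptions made for the example.

```python
import random


def split_allowed(branch_sizes, min_cases=5):
    """Accept a candidate split only if every branch keeps at least `min_cases` samples."""
    return all(n >= min_cases for n in branch_sizes)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Ten folds over the 123 TC cases: each case is held out exactly once
splits = list(k_fold_indices(123, k=10))
```

In each round the tree would be fitted on the training indices and evaluated on the held-out fold, and the accuracy, tree size, and attribute usage would then be compared across the ten rounds.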

3 Results

3.1 Comparison of Damages Resulting from STCs and WTCs

WTCs are commonly considered to pose a lower risk of natural disasters than STCs. Figure 2 displays the median and average economic losses directly caused by TC landfalls in five provinces in Korea, in which the average economic loss was generally higher than the median for both STCs and WTCs (Fig. 2a and b). This can be attributed to the skewed distribution of damage data, with extremely high economic losses from certain TCs recorded in some provinces. STCs caused significantly higher damages than WTCs, particularly in the southern and eastern provinces, whereas the opposite was true in the northwestern province, which includes the Seoul metropolitan area where half of the Korean population is concentrated.

Fig. 2

Economic losses (in billion KRW) caused by STCs and WTCs that made landfall in five provinces in Korea. Bold text indicates the median; values inside parentheses indicate the average of all the economic losses records for each province; italicized text refers to the provinces with statistically different economic losses from STCs and WTCs, at the 95% confidence level (Wilcoxon-Mann-Whitney rank sum test)

The impacts of WTCs in the Seoul metropolitan area were documented by Park et al. (2016), based on the track differences between the two TC categories. WTCs generally recurve after moving west for a longer time as they approach the Seoul area. TCs passing through the Yellow Sea may be weaker than TCs that recurve earlier and pass the southeastern part of the Korean Peninsula for two reasons: 1) some of them make landfall in China first and weaken (Fig. 3c), and 2) the Yellow Sea has a shallower subsurface warm layer and thus a smaller ocean heat content than the eastern side of Jeju Island (Moon and Kwon 2012). As a result of this track difference between WTCs and STCs, the northwestern provinces are more directly exposed to WTCs than to STCs, in contrast to the southern and eastern provinces, where STCs were recorded to have considerably stronger local winds and higher amounts of rainfall.

Fig. 3

Tracks and landfall intensities of TCs that made landfall in Korea for the years 1979–2015. a) Economic losses caused by STCs (damaged STCs); b) no economic loss from STCs (undamaged STCs); c) economic losses caused by WTCs (damaged WTCs); d) no economic loss from WTCs (undamaged WTCs)

Figure 3 presents the tracks of STCs and WTCs for both damaged and undamaged cases. TCs making landfall in the TY, STS, and TS categories had tracks that were shifted farther eastward than those making landfall in the ET, TD, and unknown categories, which is consistent with the findings of Park et al. (2016). Comparing the STCs that caused economic losses with those that did not, the track appears to be the most evident difference (Fig. 3a and b). Nam et al. (2018) showed that TCs approaching the west coast of the Korean peninsula generally cause more damage, owing to the longer duration and topographic influence of rainfall. The difference in track patterns of WTCs was minimal, and the only clear difference between damaged and undamaged WTCs was whether the WTC was approaching as ET or as TD/unknown (Fig. 3c and d). According to Choi (2021), ET storms bring more rainfall over the Seoul area on average than TDs do. In addition, rainfall is a more dominant hazard source than wind gusts or storm surges in WTCs (Park et al. 2016; Bakkensen et al. 2018).

3.2 Decision Tree Analysis

The decision tree analysis extracts further information from the findings shown in Fig. 3. Using the input data of the explanatory variables, the probability of occurrence of national economic losses caused by TC landfalls was estimated (Fig. 4). Using the decision tree flow chart, the parameters that lead to TC damage occurrence can be easily determined, with the root node as the starting point. The results reveal that impact category was the parameter of highest relevance because it was used at the root node to separate damaged from undamaged TCs. The results suggest that no economic loss will occur if the TC makes landfall in the TD or unknown category. In contrast, economic losses will occur if the landfalling TC is in the STS or TY category.

Fig. 4

Decision tree flowchart analysis of TCs that made landfall in Korea for the years 1979–2015. Parentheses inside end nodes refer to the number of correctly predicted cases/the number of cases that satisfy the conditions

Under the ET and TS categories, however, the impact category alone was not a sufficient predictor, and hence, the decision tree algorithm searched for secondary parameters that could be used as additional information. It was determined that minimum distance and impact angle were the secondary parameters for the ET and TS categories, respectively. If a TS approached the west or south coast of Korea (angle < 225.33°), economic losses were predicted to occur. In contrast, negligible damages would occur if a TS approached the east coast. If an ET storm passed within 1.22° (~130 km) of the Korean coastline, it was predicted to cause damage, whereas if the minimum distance was larger than 1.22°, damage would be negligible.
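The rules described above can be written out as a small function. This is a transcription of the thresholds stated in the text, not code produced by See5/C5.0; the function and argument names are hypothetical.

```python
def predict_damage(category, impact_angle=None, min_distance=None):
    """Return True if the decision rules in the text predict economic loss.

    category     : RSMC Tokyo grade at impact ("TY", "STS", "TS", "ET", "TD", "Unknown")
    impact_angle : degrees counter-clockwise from north (needed for "TS")
    min_distance : minimum distance to the coastline in degrees (needed for "ET")
    """
    if category in ("TY", "STS"):
        return True
    if category in ("TD", "Unknown"):
        return False
    if category == "TS":
        # West or south coast approach (angle < 225.33 degrees) predicts damage
        return impact_angle < 225.33
    if category == "ET":
        # Passing within 1.22 degrees of the coastline predicts damage
        return min_distance <= 1.22
    raise ValueError(f"unrecognized category: {category}")


# Usagi (2007): STS at impact, angle 236 degrees, minimum distance 1.44 degrees
usagi_predicted = predict_damage("STS", impact_angle=236.0, min_distance=1.44)
```

For Usagi (2007), only the impact category matters, because the STS branch does not consult the angle or the distance.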

The accuracy of the decision tree shown in Fig. 4 is 78.9%, and the accuracy of each end node is displayed in parentheses (Fig. 4). According to this model, the STS and TY categories inevitably incur damages while the TD and unknown categories do not. Since 50 of the 60 STS and TY impact category cases actually recorded economic losses in Korea, the accuracy for this node is ~83%. Most of the incorrectly predicted STSs and TYs were those that were predicted to cause damage but did not. Their tracks were generally located east of Korea and relatively far from the coastline (black dashed lines in Fig. 5e and f), and their likelihood of damaging Korea was lower than that of the others. In contrast, for the TD and unknown impact category cases, most of the incorrectly predicted cases were TCs that caused damage but were predicted not to, and their common features are not clearly distinguished in the tracks (red dashed lines in Fig. 5a and b). The incorrect predictions could come from cases that were undergoing ET but were mistakenly categorized as TD, as it is challenging to correctly identify ETs (Schreck et al. 2014; Studholme et al. 2015). Among the 26 TD and unknown cases, 16 TCs did not cause damage, and thus, the accuracy for this node is only 62%, which is the lowest of all end nodes. This inaccuracy could be improved by further bifurcation; however, the decision tree model did not branch further, due to the lack of other common characteristics among the damaged cases.

Fig. 5

Economic losses from the TCs that made landfall in Korea for the years 1979–2015, represented by the tracks and impact categories. Red indicates economic loss, black indicates no economic loss, solid lines are the cases that are accurately predicted by the decision tree model, and dashed lines are incorrectly predicted cases

For the ET and TS nodes, the decision tree made another branch layer. For the TS node, which has an accuracy of ~79%, most of the incorrectly predicted cases were TCs that were predicted to cause damage but did not. Looking at all tracks from genesis, most of the damaging TS cases passed the South Sea of Korea (Fig. 5d). Half of the incorrectly predicted TS cases passed the Japanese islands in a way that did not necessarily cause damage, although the impact angle satisfied the criterion for damage. For the other half of the incorrectly predicted TS cases, it was difficult to determine the reason for the incorrect prediction from the given information. The ET category had only one incorrect prediction, making it the most accurate (~92%). The one incorrect ET case had a minimum distance of about 1.22°, marginally above the decision threshold, where incorrect predictions are more likely (Fig. 5c). This high sensitivity is a representative limitation of single decision trees, and hence, ensemble methods, such as bagging or random forest, are generally applied to lessen the sensitivity. However, because our main goal was not to develop a forecast model but to diagnose the decisive factors for damage, and because we had a limited number of cases, a single decision tree was determined to be more appropriate than the ensemble methods, since a single decision tree is more interpretable. For details on the measures that we employed to mitigate the limitations of the decision tree method, see Section 2.2.
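The node accuracies quoted in this section can be cross-checked against the overall accuracy with simple arithmetic. In the sketch below, the STS/TY, TD/unknown, and ET counts are taken directly from the text; the count of 19 correct TS cases is an assumption inferred from the stated ~79% accuracy over 24 TS cases, and the variable names are hypothetical.

```python
# (correctly predicted cases, total cases) for each end-node group
node_counts = {
    "STS/TY": (50, 60),
    "TD/Unknown": (16, 26),
    "ET": (12, 13),   # one incorrect prediction among 13 ET cases
    "TS": (19, 24),   # assumed: inferred from ~79% of 24 cases
}

# Accuracy of each group in percent, rounded to one decimal place
node_accuracy = {name: round(100 * ok / n, 1) for name, (ok, n) in node_counts.items()}

total_correct = sum(ok for ok, _ in node_counts.values())
total_cases = sum(n for _, n in node_counts.values())
overall_accuracy = round(100 * total_correct / total_cases, 1)
```

Under this assumption, 97 of the 123 cases are correctly classified, which reproduces the overall accuracy of 78.9% reported for Fig. 4.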

4 Concluding Remarks

In the present study, the relative importance of the risk factors of TC damages in Korea was examined. The impact category was revealed to have the highest importance in determining the probability of direct economic loss caused by an approaching TC. According to the decision tree model, the TY and STS categories would inevitably cause damage while TD and unknown categories would not. For the TS and ET categories, the second branches were necessary to determine damage occurrence. For TS, the impact angle was the second most important factor determining whether damage was caused; TSs approaching from the west coast of the Korean peninsula incurred damage, whereas those from the east coast did not. For the ET category, the distance from the coastline was the second determinant; damage would be caused if ET storms passed closer than ~130 km from the coastline.

The findings from this study highlight the direct impacts of ET cases in Korea. Of the 39 WTC cases, 18 (~46%) caused economic losses, whereas, for ET cases, eight of 13 (~62%) recorded economic losses. The ET cases that did not have damage reported were far from the border of Korea (> 1.22°). By comparing the large-scale environment for ET and TD cases, Choi (2021) showed that the atmospheric background for ET cases has a stronger and larger signal of positive divergence and potential vorticity in the upper troposphere, which supports heavier rainfall over a wider area. Additionally, ET cases can re-intensify when the environment is favorable (e.g., phasing with upper-level troughs and warm oceanic environments), and re-intensified ET cases would then have an increased probability of causing economic loss as they make landfall (Evans et al. 2017; Ritchie and Elsberry 2003). Among the TCs that went through ET in the WNP for the 1979–2009 period, 36.5% recurved, and of the recurving ET storms, 42.5% re-intensified (Archambault et al. 2013). Future work could examine the re-intensification during ET around Korea as an important hazard factor for the TC risk process.

This study underlines the importance of accurately forecasting ET cases. In the present decision tree model, the lowest accuracy, found for the TD and unknown categories, could be a result of misidentification, i.e., TCs that experienced ET but were not classified into the ET category. ET forecasting can be particularly challenging compared to other specializations because the structure of a transitioning storm often deviates from that of a pure TC, and observation data for ET storms are lacking. During ET, the outer-core wind field of a TC expands, which affects the distribution of large waves. Precipitation becomes largely asymmetric, and new convection can develop 500–2000 km away from the transitioning TC, as a result of quasi-stationary convection from enhanced moisture transport and mid-latitude baroclinic forcing (Grams and Archambault 2016). Although some researchers have examined the final transformation stage of ET (Evans et al. 2017), ET cases that have been documented over entire ET cycles are limited; hence, more observation data are needed to reduce the uncertainty around ET forecasts. Therefore, further research on the direct impacts of ET cases and the physical processes associated with precipitation and re-intensification is expected to improve extreme weather forecasting and disaster preparedness, particularly across the Seoul metropolitan area of Korea. We also note that other hazard factors could be included in the risk analysis, such as TC size, TC translation speed, and the land area within gust radii, all of which could affect the spatial and temporal extent of TC impact. Detailed risk analysis utilizing more hazard factors will further advance our understanding of the TC risk process over this region.