1 Introduction

Hurricanes pose a significant threat to life and property (e.g., Pielke et al. 2008). Some argue that ocean warming should increase hurricane frequency and/or intensity (Emanuel 1987, 2005; Klotzbach and Gray 2008; Knutson and Tuleya 2004), though others disagree. For example, using a type of high-resolution model capable of replicating hurricane histories when nudged with NCEP-1 reanalysis data (Knutson et al. 2007), studies forecast a modest decrease in hurricanes and tropical storms by the end of the twenty-first century under a warming regime (Knutson et al. 2008; Zhao and Held 2012). For 2 °C of future warming, Knutson et al. (2019b) inferred a roughly 5% increase in hurricane strength, an increase in storm rainfall intensity, and a roughly 13% increase in the percent of Category 4–5 storms, but a decrease in the number of storms. Bender et al. (2010) also forecast a decrease in total storm counts and an increase in the percent of Category 4–5 storms, especially in the western Atlantic north of 20° N. Various studies have found upward (Elsner et al. 2000; Goldenberg et al. 2001; Kossin et al. 2013; Webster et al. 2005; Nyberg et al. 2007; Curry 2008; Saunders and Lea 2008), downward (Kossin et al. 2013 for E. Pacific; Landsea et al. 1996; Webster et al. 2005), or flat (Kossin et al. 2013 for W. Pacific; Solow and Moore 2002) trends in hurricane numbers over various time periods and for different basins. Knutson et al. (2019a) concluded that storm track changes may be detectable in historic data, but that agreement on other changes was weak.

This lack of consensus is complicated further by the common assumption that overall tropical cyclone activity is directly related to threat to life and property, which is flawed (Staehling and Truchelut 2016). In fact, conditions associated with increased activity may lead to decreased coastal impacts (Kossin 2017). While understanding basin-wide hurricane activity is worthwhile in and of itself, as changes in overall count and intensity have energetic and moisture implications for the atmospheric system at large, whenever possible, skipping the intermediary of overall activity and assessing landfall risk directly is desirable when addressing human impacts (Staehling and Truchelut 2016). However, relatively few studies take such an approach when considering hurricane trends (e.g., Elsner and Bossak 2001; Klotzbach et al. 2018). In addition, studies aimed at evaluating trends generally use parametric or nonparametric regression methods, despite the fact that annual hurricane counts follow a Poisson distribution (e.g., Elsner and Schmertmann 1993; Elsner et al. 2001; Hall and Hereid 2015).

To avoid the twin pitfalls of considering overall hurricane activity as a proxy for direct impacts to human populations and discounting the statistical nature of count data, we use more rigorous statistical tests to examine potential trends in hurricane climatology in multiple datasets of hurricane activity.

2 Methods

Hurricane data were obtained from the National Hurricane Center (https://www.aoml.noaa.gov/hrd/tcfaq/E23.html). These data record US continental hurricane landfalls from 1851 through 2017. Database development is described in Landsea and Franklin (2013). They documented storm intensity at the point of landfall rather than maximum storm intensity. This metric is most relevant to US damage risk. We counted number of storms rather than number of locations. There may be some underreporting of some stronger hurricanes (i.e., they may be classified as a lower category) in the early part of the record due to low populations in some coastal regions at that time. For each year, total number of hurricanes and of strong (Category 3–5) hurricanes were tallied from their table. These data differ somewhat from those used in Klotzbach et al. (2018) who used a different base set of data and counted the same storm more than once if it made landfall in more than one place more than 100 miles apart.

Accumulated cyclone energy (ACE) data were summed by Truchelut and Staehling (2017) over the continental US landbase to give the Integrated Storm Activity Annually over the Continental US (ISAAC) index. These data are used here with an update for 2018.

Atlantic basin hurricanes for 1950–2018 were obtained from the IBTrACS database (Knapp et al. 2010). Storms are classified into all and Cat 3–5 groupings. There is the possibility that pre-satellite era Atlantic hurricanes were under-counted (Moon et al. 2019), although Knapp et al. (2010) attempted to correct for this bias.

The Poisson distribution was used to conduct tests of trends. First, the simple linear trends for total and strong storm data were evaluated for different periods. For count data, trends were also evaluated using Poisson regression. Then, for both the data and comparable Poisson synthetic datasets, the magnitudes of trends were found that would be detectable over the 167 years for the landfalling events. Data and synthetic data were also evaluated to determine the effect of series length on trends. This latter test indicates the likelihood of short series showing a spurious trend. Mean number of landfalling hurricanes per year was evaluated for positive and negative phases of the Atlantic multidecadal oscillation (AMO) as defined by Klotzbach and Gray (2008). Klotzbach et al. (2018) argued that the most recent positive AMO period extended reliably to 2012, so this date was used. Similar analyses were performed with the ISAAC and Atlantic basin data.

3 Results

Counts of hurricanes making landfall in the USA (Fig. 1a) had a mean of 1.75/year for all storms and 0.54/year for major storms (Category 3 or greater; Fig. 1b). For all storms, the mean (1.7485) and variance (2.0) were close to equal, allowing tests using Poisson distributions. For all storms, the best-fit linear regression (Fig. 1a) had a slope of −0.29/century, but P = 0.2 and the 95% confidence interval overlapped zero, so this result is not significant. For major storms, the slope was +0.03/century, but P = 0.82 and the adjusted R² was −0.0057, indicating that a two-parameter linear model explains less than a simple mean. In addition, the 95% confidence interval overlapped zero. Using only the 1900–2017 data, results for trend were visually and statistically indistinguishable from the 1851 to 2017 data. Thus, no trend was detectable for either all storms or strong storms.

Fig. 1 Hurricane counts and best-fit Poisson regression trends from 1851 to 2017. a All US hurricanes. b Category 3+ hurricanes. Black dashed line is the full series trend, red dashed the trend since 1900, and green dashed the trend since 1980. All regressions P > 0.05 and slope confidence intervals include zero

Given that count data tend to be positively skewed and kurtotic, especially below a mean value of about 10 (Coxe et al. 2009), Poisson regression suggests itself as a method. Poisson regression is a type of generalized linear model, where the link function is the natural logarithm. For a linear model in time only, the predicted value C at t is obtained by exponentiation:

$$C_{t} = e^{{b_{0} + b_{1} t}}$$

or

$$C_{t} = b_{2} e^{{b_{1} t}}$$

Here b2 = exp(b0), so the two forms are equivalent. Because the model is linear in the logarithm of the Poisson mean, a nonzero slope gives a curved trend in the counts. However, when applied to data with a near-zero trend as in Fig. 1, the very small estimated slope value gives a curve that is nearly linear. For all three time periods (starting in 1851, 1900, and 1980) and for both all storms and strong storms, the Poisson regression slopes are not different from zero, the parameters are nonsignificant, and the fits are visually indistinguishable from the linear regression trends.
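As a minimal sketch of the log-linear fit described above, the following Python fits C_t = exp(b0 + b1 t) by Newton-Raphson on the Poisson log-likelihood. The function name `fit_poisson_trend`, the Wald standard error, and the example counts are illustrative assumptions; this is not the software or the data used in the analysis.

```python
import math


def fit_poisson_trend(t, y, iters=50, tol=1e-10):
    """Fit C_t = exp(b0 + b1*t) by Newton-Raphson (hypothetical helper).

    Returns (b0, b1, se_b1), where se_b1 is the Wald standard error of b1.
    Requires sum(y) > 0 and at least two distinct time values.
    """
    n = len(y)
    tbar = sum(t) / n
    tc = [ti - tbar for ti in t]            # centre time for numerical stability
    b0, b1 = math.log(sum(y) / n), 0.0      # start from the constant-mean model
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * ti) for ti in tc]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(ti * (yi - mi) for ti, yi, mi in zip(tc, y, mu))
        # Fisher information matrix (2 x 2, symmetric)
        h00 = sum(mu)
        h01 = sum(ti * mi for ti, mi in zip(tc, mu))
        h11 = sum(ti * ti * mi for ti, mi in zip(tc, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    se_b1 = math.sqrt(h00 / det)            # sqrt of the (b1, b1) entry of the inverse
    return b0 - b1 * tbar, b1, se_b1        # intercept converted back to uncentred time


# Hypothetical annual counts, for illustration only
years = list(range(8))
counts = [1, 1, 2, 2, 3, 4, 5, 7]
b0, b1, se = fit_poisson_trend(years, counts)
print(f"b1 = {b1:.3f}, Wald z = {b1 / se:.2f}")
```

A slope would be judged different from zero here when the Wald ratio b1/se is large in absolute value (about 1.96 for a two-sided 5% test under a normal approximation).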

Synthetic data can be used to evaluate effects of short time series on trend detection. For the actual data for all storms, all series subsets of given lengths were evaluated for trend (ignoring statistical significance) and the maximum positive and negative slopes at a given segment length were plotted versus length. The result (Fig. 2, black line) shows that below 30 years of data, but especially below 20 years, short segments are highly likely to give strong positive or negative trends even when the entire data series has no significant trend. For synthetic Poisson series generated with the observed mean of 1.75/year, the mean maximum or minimum slope at each series length over 400 trials (Fig. 2, red line) diverges less than the actual data. This is an indicator of system memory (long-term persistence) at times below 10 years, which could be due to cycles of ocean circulation patterns. There is in fact evidence for decadal-scale fluctuations in Atlantic hurricane activity related to ocean conditions such as the AMO (Klotzbach and Gray 2008) or Atlantic multidecadal mode (Goldenberg et al. 2001; Landsea et al. 1999). If predictable, this could offer hope for enhancing short-term hurricane forecasts. Notably, short-term excursions from a Poisson sample with a mean of 6 (Fig. 2, green line) diverged even more from the zero line. This suggests that short-term trends from smaller versus larger datasets (e.g., landfalling vs. all basin) are inherently likely to diverge. This means that full basin trends could be larger than US land-falling trends on short timescales.
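The short-segment test described above can be sketched as follows, using only the Python standard library. The helper names, the Knuth-style Poisson sampler, and the reduced trial count in the demonstration are assumptions made for illustration and are not taken from the original analysis.

```python
import math
import random


def ols_slope(y):
    """Least-squares slope of y against 0, 1, ..., len(y) - 1."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    sxy = sum((i - xbar) * (v - ybar) for i, v in enumerate(y))
    return sxy / sxx


def extreme_window_slopes(series, length):
    """Largest and smallest slope over all consecutive windows of a given length."""
    slopes = [ols_slope(series[i:i + length])
              for i in range(len(series) - length + 1)]
    return max(slopes), min(slopes)


def poisson_draw(rng, mean):
    """One Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mean_extremes(mean, n_years, length, trials, seed=0):
    """Average max and min window slope over many synthetic Poisson series."""
    rng = random.Random(seed)
    hi = lo = 0.0
    for _ in range(trials):
        series = [poisson_draw(rng, mean) for _ in range(n_years)]
        mx, mn = extreme_window_slopes(series, length)
        hi += mx
        lo += mn
    return hi / trials, lo / trials


# Demonstration with fewer trials than the 400 quoted in the text, for speed
for mu in (1.75, 6.0):
    for length in (10, 30, 60):
        hi, lo = mean_extremes(mu, n_years=167, length=length, trials=50)
        print(f"mean={mu}, length={length}: max={hi:.3f}, min={lo:.3f}")
```

Applying `extreme_window_slopes` to an observed series at each length gives the analogue of the black line in Fig. 2, while `mean_extremes` gives the synthetic reference curves.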

Fig. 2 Effect of series length on trend estimates evaluated at 5-year increments for all US hurricanes. Black line is all data. Red line is mean maximum or minimum of Monte Carlo samples for synthetic Poisson series with the same mean. Green line is synthetic series with mean of 6

Klotzbach and Gray (2008) and Klotzbach et al. (2018) found that Atlantic hurricanes and Atlantic coast landfalling storms, respectively, were more frequent during positive phases of the AMO. Here we repeat their analysis using slightly different data and different statistics. The ratios of positive/negative periods (Table 1) are very similar to their result. Although these ratios seem large, it is useful to do a test of means for the positive versus negative AMO periods. Due to nonnormality, a Kruskal–Wallis test was used. For the all-hurricane data, the periods 1926–1969 versus 1970–1994 differed at P < 0.05, as did the all positive years versus all negative years, but all other comparisons across all and major storms in Table 1 were not significant.
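For readers unfamiliar with the rank-based comparison, a minimal two-group Kruskal–Wallis sketch is given below. It uses midranks with the standard tie correction and the chi-square approximation with one degree of freedom; the function names and the example counts are hypothetical and are not the AMO-phase data of Table 1.

```python
import math


def midranks(values):
    """Average ranks (1-based) with ties sharing a rank; also returns sum(t^3 - t)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1               # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H and approximate P for two groups (chi-square, 1 df).

    Assumes the pooled values are not all identical.
    """
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks, tie_term = midranks(pooled)
    ra = sum(ranks[:len(a)])
    rb = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (ra ** 2 / len(a) + rb ** 2 / len(b)) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)        # tie correction
    h = max(h, 0.0)                         # guard against tiny negative rounding
    p = math.erfc(math.sqrt(h / 2))         # chi-square survival function, 1 df
    return h, p


# Hypothetical annual counts for two phases, for illustration only
phase_pos = [2, 3, 1, 4, 2, 3, 2, 5, 1, 3]
phase_neg = [1, 0, 2, 1, 1, 0, 2, 1, 0, 1]
h, p = kruskal_two_groups(phase_pos, phase_neg)
print(f"H = {h:.2f}, P = {p:.4f}")
```

Count data contain many ties, so the tie correction matters more here than it would for continuous measurements.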

Table 1 Hurricane counts and ratios for US landfalling storms versus AMO phase (after Klotzbach and Gray 2008 and Klotzbach et al. 2018). Columns 3 and 4 are mean annual counts (first five rows) or ratios (last five rows)

An interesting consequence of this relationship is that transitions between AMO phases might generate positive or negative trends that will not persist. We tested for this effect with the total hurricane count data. The expected AMO-induced trends would be positive for neg–pos transitions and negative for pos–neg transitions. For each transition in the data, the length of the test for a trend was based on the shorter AMO phase length. For example, the first AMO phase (Table 1) is 1878–1899, which is 22 years, so the first 22 years of the next phase were joined to make an 1878–1921 dataset for evaluating a linear trend. For all eight transitions, the signs of the slopes were as predicted, but none of the slope coefficients were significant, probably due to the high variance in the data over these short intervals. Using Poisson regression, the lack of significance of all transition slopes still holds.

An additional question that can be asked is how large a trend would need to be for reliable detection over the 167 years. This requires a type of power analysis. Starting with the given mean of 1.75/year for all storms, incrementally rising slopes were tested using synthetic data (Fig. 3). If we wish to consider an 80% likelihood of detecting a trend, the minimum slope is 0.00704/year (0.704/century) based on the graph. Repeating this analysis for Poisson means from 1 to 21 using 1000 Monte Carlo samples shows that the 80% detectability cutoff for linear trends is not dependent on the mean but appears to fluctuate randomly between 0.0068 and 0.00705/year (not shown). Larger Monte Carlo samples would likely narrow this range.
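A Monte Carlo power analysis of this kind can be sketched as below. Several choices are assumptions made for illustration rather than details from the text: the Poisson mean is taken to start at the base value in year 0 and rise linearly, significance is judged with a two-sided normal approximation to the t test on the linear regression slope, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_draw(rng, mean):
    """One Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def slope_and_t(y):
    """OLS slope of y against 0..n-1 and its t statistic."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    sxy = sum((i - xbar) * (v - ybar) for i, v in enumerate(y))
    syy = sum((v - ybar) ** 2 for v in y)
    slope = sxy / sxx
    rss = max(syy - slope * sxy, 0.0)       # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        return slope, (0.0 if slope == 0.0 else math.inf)
    return slope, slope / se


def detection_rate(start_mean, slope, n_years=167, trials=1000,
                   alpha=0.05, seed=0):
    """Fraction of synthetic series whose fitted slope is significant."""
    rng = random.Random(seed)
    # Normal approximation to the t critical value (assumption; slightly
    # liberal, but close for a series of 167 years)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        y = [poisson_draw(rng, start_mean + slope * i) for i in range(n_years)]
        _, t = slope_and_t(y)
        if abs(t) > z_crit:
            hits += 1
    return hits / trials


for s in (0.0, 0.004, 0.007, 0.010):
    print(f"slope={s:.3f}/year: detected in "
          f"{100 * detection_rate(1.75, s, trials=200):.0f}% of trials")
```

Reading off the smallest slope at which the detection rate reaches 80% gives a detectability threshold in the sense of Fig. 3, although the exact value depends on the assumptions listed above.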

Fig. 3 Monte Carlo estimation of trend detectability for a 167-year series. For 1000 trials with Poisson mean from all US hurricane data, for positive slopes (counts/year), the plot shows the percent of simulations where the slope was statistically significant

Results for the ISAAC data differ in some ways. With a mean of 4.89 cumulative ACE units and a variance of 15.53, the ISAAC data are not Poisson even though they give this visual impression (Fig. 4). We can think of the ISAAC data as a Poisson process (generating a storm) times a magnitude, with a random path factor that governs how much of the storm intersects the US mainland and for how long. The ISAAC data also include storms whose eye did not hit land, which is why there are no zero values in the time series. As with the count data, the 119-year data have essentially a zero trend (slope of −0.0092 is n.s.). Poisson regression is not appropriate because ISAAC is not count data.
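The contrast between the two datasets can be made explicit with the variance-to-mean ratio (index of dispersion), which equals 1 for a Poisson variable. Using the summary values quoted in the text (the symbol D is introduced here only for illustration):

$$D_{\text{counts}} = \frac{2.0}{1.7485} \approx 1.14, \qquad D_{\text{ISAAC}} = \frac{15.53}{4.89} \approx 3.18$$

A ratio near 3 indicates substantial overdispersion relative to a Poisson variable, whereas the landfall counts are close to the Poisson expectation.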

Fig. 4 ISAAC annual cumulative storm intensity (summed ACE over land) 1900–2018. Black dashed line is the full series linear trend and red dashed the trend since 1900. All regressions P > 0.05 and slope confidence intervals include zero

The result for short segment trends for ISAAC (Fig. 5) is the same basic shape as for the count data (Fig. 2), though on a different scale. This means that the same cautions about spurious short-term trends apply to ISAAC.

Fig. 5 Effect of sequential series (subset) length for ISAAC data trends 1900–2018

The relationship of ISAAC to the AMO (Table 2) is similar to but slightly stronger than for the count data (Table 1). All three linear trend transitions between AMO phases have the expected sign, with the trend across the 1945–1969 and 1970–1994 phases being significant (slope = −0.0991, P = 0.009). Using a test of means across AMO periods, one of three comparisons was significantly different, as was the all positive versus all negative test. Over the two datasets (total counts and ISAAC), eleven transition linear trends could be tested and all gave the expected trend sign, with a binomial probability of 0.0005.
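As a worked check of the quoted probability, and assuming that under the null hypothesis each transition slope is independently and equally likely to be positive or negative, the chance that all eleven signs match the prediction is

$$P = \left(\tfrac{1}{2}\right)^{11} = \frac{1}{2048} \approx 0.00049$$

which rounds to the 0.0005 given above.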

Table 2 ISAAC for US landfalling storms versus AMO phase (after Klotzbach and Gray 2008). Columns 3 and 4 are mean annual intensities (first four rows) or ratios (last four rows)

The all Atlantic basin storm plot 1950–2018 (69 years) shows a slight upward trend (Fig. 6a) that is not significant by either linear or Poisson regression. The post-1990 trend is visually elevated, but the slope again does not differ from zero by either test. The major storm history (Fig. 6b) has a nonsignificant trend for 1950–2018 and 1990–2018 by both tests. Kossin et al. (2013) found a strong upward trend over the period 1982–2009 using quantile regression for Category 4–5 storms. We matched this period for Category 3–5 storms using Poisson regression (Fig. 6b, green line) and also obtained a significant upward slope. However, the slope over the prior period of the same length (1954–1981) is downward (Fig. 6b, red line) and the slope for 1990–2018 is much flatter and nonsignificant. Thus, inferences about trends are highly dependent on the time period chosen.

Fig. 6 Atlantic basin hurricanes 1950–2018 with Poisson regression trend lines. a All storms. Slopes of the dashed regression lines are not different from zero. b Major storms (Cat 3+). Black dashed is entire series trend (n.s.). Red dashed is 1954–1981 (n.s.). Green dashed is 1982–2009 (p < 0.01). Blue dashed is 1990–2017 (n.s.)

In contrast to the landfalling data, the all Atlantic basin storms match expectations from a simulated Poisson (Fig. 7) for purely random short-term trends. There is no indication of persistence that would need to be explained by large-scale ocean activity like the AMO.

Fig. 7 All Atlantic basin hurricane max (min) trends versus series length. Red line for Poisson simulation

Given the inherent noise in a Poisson process, linear regression may be flawed, but the curvilinear Poisson model may not be ideal either. Another approach suggests itself that is based on the null expectation of a Poisson series. Consider a time-series 400 years long with the hurricane count each year being drawn from a Poisson distribution with mean 1.75/year (per Fig. 1a). This time series will have no trend if long enough. For each sequential series 10 years long (starting in year 1, year 2, etc.), the linear regression slope is calculated. We then find the value that 95% of series slopes fall below (for a positive trend) or above (for a negative trend). We repeat this for longer series and plot the result (Fig. 8). This boundary plot is monotonically decreasing to zero with series length. If an observed real data series trend (slope) falls above the line, we can say it is unlikely (though not impossible) that it is due to chance. Note that curve shape will differ with the Poisson mean (µ). An excellent fit to this curve can be obtained with a two-part exponential decay function. Alternatively, numerical interpolation can be used to smooth the curve before testing. Comparing the slopes for the earlier test of AMO transition periods, none of them are significant by this test.
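The null-envelope construction described above can be sketched as follows. The function names, the simple order-statistic quantile, the random seed, and the "observed" slope in the usage lines are assumptions for illustration; the two-part exponential smoothing mentioned in the text is not implemented.

```python
import math
import random


def poisson_draw(rng, mean):
    """One Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def ols_slope(y):
    """Least-squares slope of y against 0, 1, ..., len(y) - 1."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    sxy = sum((i - xbar) * (v - ybar) for i, v in enumerate(y))
    return sxy / sxx


def slope_envelope(mean, total_years=400, lengths=(10, 20, 30, 60),
                   q=0.95, seed=0):
    """For each window length, the slope that a fraction q of windows fall below.

    One trend-free Poisson series is simulated and every consecutive window
    of each length is fitted with a straight line.
    """
    rng = random.Random(seed)
    series = [poisson_draw(rng, mean) for _ in range(total_years)]
    envelope = {}
    for length in lengths:
        slopes = sorted(ols_slope(series[i:i + length])
                        for i in range(total_years - length + 1))
        envelope[length] = slopes[math.ceil(q * len(slopes)) - 1]
    return envelope


env = slope_envelope(1.75)
for length, bound in env.items():
    print(f"length={length}: 95% of null slopes fall below {bound:.4f}")

# Hypothetical observed slope over a 30-year segment, for illustration only
observed = 0.05
print("exceeds null envelope:", observed > env[30])
```

Because a single 400-year realization is used, the boundary is itself noisy; averaging the envelope over several simulated series would be one way to smooth it before testing.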

Fig. 8 For random Poisson series of various lengths, 95% of slopes fall below the depicted line. Only measured slopes above the line are likely to be nonrandom

4 Discussion

The known gaps in historical data in the earlier part of the landfalling record (Landsea and Franklin 2013) affect mainly maximum windspeeds. This means that some hurricanes might have been more intense at landfall than recorded (Klotzbach 2006; Landsea et al. 2006). Evidence for this is that a reanalysis of hurricane intensity data (Kossin et al. 2007) reduced the trend in all ocean basins except the North Atlantic. Since this biases the strong storm record down in the early period, it adds a bias toward an upward trend over the record. The lack of trend observed here is in spite of this potential bias. The fact that the 1851–2017 and 1900–2017 trend lines are nearly identical (Fig. 1) also suggests that early-period bias is small. According to Klotzbach and Landsea (2015), detection improvements over the period 1970–2004 may be responsible for part of the upward trend detected by Webster et al. (2005). In addition, the reversal of many of the Webster trends when 10 more years of data were added is what we would predict from our results on short-term spurious trends.

The effects of noise on interpreting trends affect attempts to relate hurricane activity to the AMO (per Klotzbach and Gray 2008). While we found that the sign of shifts in hurricane counts and ISAAC between AMO phases was highly significant across 11 tests, individual tests of either means or linear trends between AMO phases were mostly not significant except for four of nine ratio tests. Thus, while the direction of shifts supports the AMO influence theory, individual AMO shift influences on number of hurricanes or energy will not be reliably detectable statistically, probably due to the high variance. Our purpose in conducting this AMO test was largely to illustrate how such tests need to be performed. We recognize that other ocean cycles (e.g., Klotzbach et al. 2018; Zhang et al. 2017) may also be influential or even stronger than the AMO.

The results of this analysis suggest that tests for trend changes based on short series (e.g., Elsner et al. 2000; Goldenberg et al. 2001; Emanuel 2005; Curry 2008; Saunders and Lea 2008) may suffer from spurious trends, a general problem pointed out by Koutsoyiannis (2013) for this type of data. For example, Landsea (2015) pointed out that the Atlantic basin-wide increase in hurricane activity since 1970 (Kunkel et al. 2013) may not be unusual in the context of a 100+ year dataset, as we show in Fig. 6b. Attempts to smooth data with five-year block means (Elsner et al. 2000), five-year running means (Webster et al. 2005, 2006), or eleven-year running means (Curry 2008), while a good idea in principle, may be using too short a smoothing period based on results here. Attempts to unravel causation using annual hurricane counts and climatic variables (e.g., SST, ENSO relationships) (e.g., Elsner et al. 2001; Hoyos et al. 2006; Curry 2008; Saunders and Lea 2008; Klotzbach 2010; Zhang et al. 2017) may likewise risk spurious correlations unless datasets are long. Attempts to explain a particular active year (e.g., Klotzbach and Gray 2006) should incorporate the expectations from a stochastic (i.e., Poisson) process into the evaluation of whether a particular year was truly unusual. This effect may explain some of the variability in trends detected in previous studies.

Another curious relationship is that between all-basin strong storms (Fig. 6b) and landfalling strong storms (Fig. 1b). The all-basin storms are much more variable on decadal scales. Since landfalling storms are a subset of all-basin storms, it would seem that more variable landfalling histories would be more likely if steering effects differed on decadal scales. This is the opposite of what is observed. A mechanism that would smooth out all-basin storm fluctuation for the landfalling subset has never been proposed and seems difficult to conceptualize. It could be that classification of storm strength in the open ocean storm population suffers from temporal biases. This deserves further study.

An important question is whether climate change in general or factors like the AMO on decadal scales are altering hurricane frequency and/or intensity. Such changes will affect hurricane damage (e.g., Pielke et al. 2008). Results here suggest that detection of changes will be difficult on short timescales due to high variability. Trenary et al. (2019) did in fact find a lack of explanatory power for models over the period 1958–2005.