6.1 Taxonomy of Methods

Analysis of epidemic time series is a large endeavor because of the richness of dynamical patterns and the plenitude of historical data (Rohani and King 2010). A wide range of tools is used, some borrowed from mainstream statistics and others “custom made.” The classic “mainstream” methods belong to two categories: the so-called time-domain and frequency-domain methods. The autocorrelation function and ARIMA models belong to the former class, and spectral analysis and the periodogram belong to the latter. Hybrid time/frequency methods have become increasingly prominent in the form of wavelet analysis because it allows the study of changes in disease dynamics through time (Grenfell et al. 2001). This chapter discusses a variety of “mainstream” methods using a variety of time-series data. Examples of “custom made” methods are mechanistic models such as the time-series SIR (TSIR), which is the focus of Chap. 7, semi-parametric models (Ellner et al. 1998), and nonparametric (“empirical dynamic”) models. An example of the latter is discussed in Sect. 10.8.

6.2 Time Domain: ACF and ARMA

The autocorrelation function (ACF) and the autoregressive-moving-average (ARMA) model are classic tools for describing serial dependence in time series in the time domain. We first apply the ACF to the (weekly) time series of prevalence from the seasonally forced SEIR model. The ACF quantifies serial correlations at different time lags. Figure 6.1 shows the ACF for lags up to 3 years (=156 weeks):

Fig. 6.1 The ACF of prevalence from the seasonally forced SEIR model. (a) time series, (b) ACF

The major peak in autocorrelation at 104 weeks reflects the dominant 2-year periodicity; the minor peak at 52 weeks reflects the subdominant annual periodicity.
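The SEIR simulation is not reproduced here, so the following minimal sketch uses a synthetic weekly series as a stand-in (an assumption made purely for illustration): a biennial cycle plus a somewhat weaker annual cycle and a little noise. It uses base R's acf-function to show how the two periodicities appear as local peaks in the autocorrelation near lags 104 and 52.

```r
# Synthetic stand-in for weekly prevalence (hypothetical, not the SEIR output):
# a 2-year cycle plus a weaker annual cycle and a little noise.
set.seed(1)
weeks <- 1:(52 * 30)                       # 30 years of weekly data
x <- cos(2 * pi * weeks / 104) + 0.8 * cos(2 * pi * weeks / 52) +
  rnorm(length(weeks), sd = 0.05)

# Autocorrelation for lags 0..156 weeks (3 years); plot = FALSE returns values only.
fit <- acf(x, lag.max = 156, plot = FALSE)
rho <- drop(fit$acf)                       # rho[h + 1] holds the ACF at lag h

# Locate the largest peak beyond the short lags, and the local peak near 1 year.
major <- (20:156)[which.max(rho[21:157])]
minor <- (40:64)[which.max(rho[41:65])]
c(major = major, minor = minor)
```

Short lags are excluded from the search because the autocorrelation is trivially close to one there. In this synthetic series the annual peak is a local maximum even though its autocorrelation is negative, since the biennial component is in anti-phase at a 1-year lag.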

6.2.1 ARMA

Autoregressive moving-average models have been used to forecast disease dynamics (e.g., influenza-like illness; Choi and Thacker 1981). The ARMA(p,q)-model assumes that the future incidence (\(Y_t\)) can be predicted according to \(Y_{t} = a_{1}Y_{t-1} + \ldots + a_{p}Y_{t-p} + \varepsilon_{t} + b_{1}\varepsilon_{t-1} + \ldots + b_{q}\varepsilon_{t-q}\), where the ε’s represent stochasticity and the echo of past stochasticity. We apply the ARMA model to monthly ILI incidence in Iceland using the forecast-package:

We convert the data frame to a time-series ts-object and do a seasonal decomposition (Fig. 6.2). There is a slight trend through the data, but as expected the winter seasonality is the dominant feature of the time series. Because the epidemics are very peaky we consider square-root transformed numbers:
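As a minimal sketch of these steps, the code below builds a monthly ts-object from hypothetical counts (a synthetic stand-in for the Iceland ILI data, which are not reproduced here), square-root transforms it, and decomposes it. The start date and the use of stl are assumptions for illustration; other decomposition functions would serve equally well.

```r
# Hypothetical monthly counts with a winter peak and a slight upward trend.
set.seed(2)
m <- 1:(12 * 12)                                   # 12 years of months
counts <- rpois(length(m),
                lambda = (3 + 0.005 * m + 2.5 * cos(2 * pi * m / 12))^2)

# Monthly ts-object (assumed to start January 1990), square-root transformed
# to tame the very peaky epidemics.
ili <- ts(sqrt(counts), start = c(1990, 1), frequency = 12)

# Seasonal-trend decomposition by loess; "periodic" assumes a fixed seasonal shape.
dec <- stl(ili, s.window = "periodic")
comp <- dec$time.series                            # seasonal, trend, remainder
# plot(dec) would draw the data and the three components as stacked panels.
```

The three components add up to the transformed series, so the remainder shows what is left after seasonality and trend are accounted for.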

Fig. 6.2 A decomposition of the Iceland ILI time series

We train a seasonal ARMA(2,1) model on, for example, the 1990 through 2000 epidemics and do a 24-month forecast (Fig. 6.3):
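A hedged sketch of such a fit follows. It uses base R's arima and predict in place of the forecast-package so that it has no dependencies, and it is trained on a synthetic monthly series (an assumption; the Iceland data are not reproduced here). The seasonal AR(1) term and the 1.96 multiplier for approximate 95% bands are illustrative choices rather than the settings behind Fig. 6.3.

```r
# Hypothetical training series for Jan 1990 .. Dec 2000: an annual cycle
# plus ARMA(2,1) noise (all parameter values are made up for illustration).
set.seed(3)
m <- 1:(12 * 11)
noise <- as.numeric(arima.sim(list(ar = c(0.5, -0.2), ma = 0.3),
                              n = length(m), sd = 0.5))
train <- ts(3 + 2 * cos(2 * pi * m / 12) + noise,
            start = c(1990, 1), frequency = 12)

# Seasonal ARMA(2,1): p = 2, d = 0, q = 1 with a seasonal AR(1) at lag 12.
fit <- arima(train, order = c(2, 0, 1),
             seasonal = list(order = c(1, 0, 0), period = 12),
             method = "ML")

# 24-month forecast with approximate 95% prediction bands.
fc <- predict(fit, n.ahead = 24)
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se
```

The forecast is returned as a ts-object that starts in the month after the training window ends, so it can be overlaid directly on a plot of the training data.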

Fig. 6.3 Forecast of square-root transformed ILI incidence in Iceland for the 2001 and 2002 seasons using a seasonal ARMA(2,1) model

While ARMA forecasting is useful in many disciplines and is an important part of the broad statistical toolbox, it lacks mechanism and therefore cannot answer questions like “how are dynamics likely to change if we vaccinate 50% of susceptible children?” It furthermore assumes that time series are stationary, essentially meaning that dynamical patterns do not change radically over time. As we frequently see in infectious diseases, this is not a good assumption. In Chap. 7 we discuss how time-series methods that incorporate more biological mechanisms (like the “time-series SIR” model) are better able to capture and predict dynamic transitions.

6.3 Frequency Domain

The Schuster periodogram is a direct way of estimating and testing for significant periodicity. The periodogram decomposes a time series into cycles of different frequencies (frequency = 1/period). The importance of each frequency is measured by the spectral amplitude. We use the spectrum-function to calculate the periodogram for the time series from the seasonally forced SEIR model. The analysis clearly identifies the two superimposed periods (Fig. 6.4).
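The following sketch illustrates the idea on a synthetic weekly series with superimposed biennial and annual cycles (a hypothetical stand-in for the SEIR output). Setting fast = FALSE is an assumption made so that no zero-padding occurs and the frequencies are exactly k/1040 cycles per week.

```r
# Synthetic weekly series: dominant 2-year cycle plus a weaker annual cycle.
set.seed(4)
weeks <- 1:(52 * 20)                        # 1040 weekly observations
x <- cos(2 * pi * weeks / 104) + 0.6 * cos(2 * pi * weeks / 52) +
  rnorm(length(weeks), sd = 0.1)

# Periodogram without plotting; freq is in cycles per week.
sp <- spectrum(x, fast = FALSE, plot = FALSE)

# Convert frequency to period in years and rank by spectral amplitude.
period_years <- 1 / sp$freq / 52
ord <- order(sp$spec, decreasing = TRUE)
period_years[ord[1:2]]                      # the two dominant periods
# plot(period_years, sp$spec, type = "l", xlim = c(0, 5)) gives an
# amplitude-against-period view of the same spectrum.
```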

Fig. 6.4 The power spectrum of prevalence for the seasonally forced SEIR model. (a) Default plot of log-amplitude against frequency and (b) amplitude against period (in years)

Using the fast Fourier transform (FFT), the Schuster periodogram will automatically estimate the spectrum of a time series (of length T) at the following T∕2 frequencies: \(f =\{ \frac{1} {T}, \frac{2} {T},\ldots, \frac{T/2} {T} \}\) (or equivalent periods: \(\{T, \frac{T} {2},\ldots,2\})\). An upside of using FFT is that it is fast. A downside is that the Schuster periodogram is not a consistent method, meaning that the estimated periodogram does not converge on the true power spectrum as the time series gets longer, because the number of frequencies considered (and thus the number of parameters) increases linearly with time-series length. Numerous fixes for this have been developed. The most common is to smooth the periodogram (Priestley 1981), but nonparametric density estimation has also been proposed. We use Kooperberg et al.’s (1995) log-spline method in Sect. 9.7.
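To make the frequency grid and the smoothing fix concrete, the sketch below computes the raw periodogram directly from fft and compares it with spec.pgram, once without and once with a modified Daniell smoother. The AR(1) test series and the spans of c(5, 5) are arbitrary assumptions chosen only for illustration.

```r
# Hypothetical series of length n (n plays the role of T in the text).
set.seed(5)
n <- 256
x <- as.numeric(arima.sim(list(ar = 0.7), n = n))
x <- x - mean(x)

# The n/2 Fourier frequencies 1/n, 2/n, ..., (n/2)/n and the raw periodogram.
freqs <- (1:(n / 2)) / n
raw <- (Mod(fft(x))^2 / n)[2:(n / 2 + 1)]   # drop the zero frequency

# The same estimate from spec.pgram with tapering and detrending switched off.
sp_raw <- spec.pgram(x, taper = 0, detrend = FALSE, fast = FALSE, plot = FALSE)

# Smoothed periodogram: averaging neighbouring frequencies trades
# resolution for lower variance (more degrees of freedom per estimate).
sp_smooth <- spec.pgram(x, spans = c(5, 5), taper = 0, detrend = FALSE,
                        fast = FALSE, plot = FALSE)
c(raw_df = sp_raw$df, smooth_df = sp_smooth$df)
```

Each raw estimate carries only two degrees of freedom regardless of series length, which is the inconsistency described above; smoothing raises that number at the cost of blurring nearby frequencies.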

6.4 Wavelets

The wavelet spectrum is an extension of spectral analysis that adds a time axis and therefore allows the study of changes in dynamics over time (Torrence and Compo 1998). Unlike the periodogram, wavelets do not have “canonical” periods for decomposition. If we use the Morlet wavelet (which is provided by the cwt-function in the Rwave-package), we need to specify the periods we wish to consider through the number of octaves, no, and voices, nv. With 8 octaves the main periods will be \(\{2^{1}, 2^{2},\ldots, 2^{8}\} =\{ 2, 4,\ldots, 256\}\). The number of voices specifies how many subdivisions to estimate within each octave. With four voices the resultant periods will be \(\{2^{1}, 2^{1.25}, 2^{1.5}, 2^{1.75}, 2^{2}, 2^{2.25},\ldots \}\). We first consider the simulated time series of prevalence for the unforced SEIR model (Fig. 6.5).
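The period grid implied by a choice of octaves and voices can be sketched in base R without calling the Rwave-package. The sketch assumes that the voices of the final octave extend above \(2^{8}\), so that there are no × nv periods in total; the object names periods and main_periods are ours.

```r
no <- 8   # number of octaves
nv <- 4   # number of voices (subdivisions) per octave

# Exponents run 1, 1 + 1/nv, ..., no + 1 - 1/nv, giving no * nv periods.
periods <- 2^seq(1, no + 1 - 1 / nv, by = 1 / nv)

# The main periods are the first voice of each octave: 2, 4, ..., 256.
main_periods <- periods[seq(1, no * nv, by = nv)]

round(periods[1:6], 3)   # 2^1, 2^1.25, 2^1.5, 2^1.75, 2^2, 2^2.25
```

A vector like this is convenient for labelling the period axis when plotting the modulus of the transform.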

Fig. 6.5 Prevalence against time for the unforced SEIR model (μ = 1∕50, N = 1, β = 1000, σ = 365∕8, γ = 365∕5) with associated wavelet spectrum

The initial inter-epidemic period at around 2.5 years is strong (recall that the dampening period of the SEIR with these parameters is 2.3 years; Sect. 5.3), but then wanes as the system converges towards the stable endemic equilibrium. We see this clearly illustrated if we compare the wavelet spectrum at, for example, the beginning of year 2 and the beginning of year 10 (Fig. 6.6).
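The comparison of spectra at two time points can be sketched with a hand-rolled, direct-sum Morlet transform. This is not the Rwave implementation: the function morlet_at, its normalization, and the damped sinusoid used as a stand-in for SEIR prevalence (period 2.3 years, an assumed decay time of 3 years) are all hypothetical choices for illustration.

```r
# Morlet transform of x evaluated at one time index b and one scale a.
# With w0 = 2 * pi the scale is approximately equal to the Fourier period.
morlet_at <- function(x, b, a, w0 = 2 * pi) {
  u <- (seq_along(x) - b) / a
  psi <- pi^(-1 / 4) * exp(1i * w0 * u) * exp(-u^2 / 2)
  sum(x * Conj(psi)) / sqrt(a)
}

# Hypothetical damped oscillation sampled weekly for 20 years.
weeks <- 1:1040
x <- exp(-weeks / (3 * 52)) * sin(2 * pi * weeks / (2.3 * 52))

# Period grid in weeks: 8 octaves with 4 voices each.
periods <- 2^seq(1, 8 + 1 - 1 / 4, by = 1 / 4)
spec_at <- function(b) sapply(periods, function(a) Mod(morlet_at(x, b, a)))

early <- spec_at(53)             # first week of year 2
late  <- spec_at(9 * 52 + 1)     # first week of year 10
p_early <- periods[which.max(early)]
c(peak_period_weeks = p_early, early_max = max(early), late_max = max(late))
```

The early spectrum has a pronounced peak at multi-annual periods, whereas the late spectrum is much weaker across all periods because the oscillation has damped out.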

Fig. 6.6 The estimated wavelet spectrum at the first week of year 2 and year 10 for the unforced SEIR model

6.5 Measles in London

The pre-vaccination incidence of measles shows interesting non-stationarities that have been traced back to changing susceptible recruitment due to the post-World War II baby boom (Fig. 6.7). The meas data set contains the biweekly incidence and births from 1944 to 1965.

Fig. 6.7 Biweekly incidence of measles in London between 1944 and 1965 with susceptible recruitment (births) superimposed

We apply the wavelet analysis to the historical measles dynamics from London (Grenfell et al. 2001). In addition to providing a continuous wavelet transform, the Rwave-package has a “crazy climber” algorithm to highlight ridges in the wavelet spectrum (implemented with the crc and cfamily-functions). When applied to the London measles data, the crazy climber reveals the background annual rhythm and the punctuated appearance of the biennial cycle in the early 1950s (Fig. 6.8).

Fig. 6.8 The wavelet spectrum of the London measles incidence with “crazy-climber” ridges. The appearance of a significant biennial rhythm in the 1950s is conspicuous

We can contrast the spectrum of the first biweek of January 1945 and the first biweek of January 1954 (Fig. 6.9). The transition from a dominance of annual to biennial epidemics is conspicuous. Two-year cycles are pronounced when birth rates are around 20 per thousand per year; annual epidemics are associated with higher birth rates. This transition, due to the post-World War II baby boom, is as predicted by the seasonally forced SEIR model with dropping birth rates (Earn et al. 2000b, Fig. 5.6).

Fig. 6.9 The wavelet spectrum of the London measles in Jan 1945 vs Jan 1954

The above methods of time-series analysis require regularly spaced time series without any missing values. Lomb (1976) developed the Lomb periodogram for unequally spaced data. Furthermore, the classic spectral methods cannot quantify rhythms in cruder “nonmetric” data such as presence/absence of infection. Legendre et al. (1981) developed the “contingency periodogram” for such situations. The nlts-package has the functions spec.lomb and contingency.periodogram to carry out such analyses. The mvcwt-package can do wavelet analyses of time series with missing data.

6.6 Project Tycho

Project Tycho (http://www.tycho.pitt.edu) is a great resource for time series on historical disease incidence. The data used in Sect. 5.1 were downloaded from this database. Weekly data of whooping cough (1925–1947), diphtheria (1914–1947), and measles (1914–1947) in the city of Philadelphia are from Project Tycho and are saved in the tywhooping, tydiphtheria, and tymeasles data sets. These were all important causes of childhood mortality in the early twentieth century and were therefore “reportable infections” in the USA. Whooping cough is caused by bacterial colonization of the lower respiratory tract by congeneric species in the genus Bordetella, most notably B. pertussis, and causes violent coughing, vomiting, and pneumonia. Diphtheria is caused by infection with Corynebacterium diphtheriae, whose toxins cause a range of health complications. Measles is a severely immuno-compromising paramyxovirus. We will use these time series to illustrate some additional aspects of disease dynamics/time-series analysis.

These time series have occasional weeks of missing data which we interpolate. We will use the imputeTS-package. But first we use the whooping cough data to illustrate the use of the Lomb periodogram for spectral analysis of unevenly spaced data.
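As a dependency-free sketch of what interpolating missing weeks involves, the code below fills gaps by linear interpolation using base R's approx. The imputeTS-package is not loaded here, and the weekly counts are hypothetical.

```r
# Hypothetical weekly counts with 14 interior weeks set to missing.
set.seed(6)
weeks <- 1:520
cases <- round(50 + 30 * cos(2 * pi * weeks / 52) + rnorm(520, sd = 5))
cases[sample(2:519, 14)] <- NA

# Linear interpolation between the nearest observed neighbours.
ok <- !is.na(cases)
filled <- cases
filled[!ok] <- approx(weeks[ok], cases[ok], xout = weeks[!ok])$y

sum(is.na(cases))    # 14 gaps before interpolation
anyNA(filled)        # none after
```

The first and last weeks are kept observed in this sketch because approx does not extrapolate beyond the observed range by default.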

6.7 Lomb Periodogram: Whooping Cough

There are 14 missing weeks in the tywhooping data set. For frequency-domain analyses of these data we either have to interpolate the missing weeks or use the Lomb periodogram. We compare the two approaches:

With only 14 missing values in a 1000+ week long time series the shape of the Schuster periodogram (on interpolated data) and the Lomb periodogram are almost identical (Fig. 6.10).
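The comparison can be sketched without the nlts-package by writing out the normalized Lomb periodogram directly. The lomb function below is our own illustrative implementation of the standard formula and may differ in normalization from spec.lomb; the weekly series with 14 deleted weeks is synthetic.

```r
# Normalized Lomb periodogram at the given frequencies for observations y
# taken at (possibly unevenly spaced) times t.
lomb <- function(t, y, freqs) {
  y <- y - mean(y)
  s2 <- var(y)
  sapply(freqs, function(f) {
    w <- 2 * pi * f
    # Time offset that makes the sine and cosine terms orthogonal.
    tau <- atan2(sum(sin(2 * w * t)), sum(cos(2 * w * t))) / (2 * w)
    cs <- cos(w * (t - tau))
    sn <- sin(w * (t - tau))
    (sum(y * cs)^2 / sum(cs^2) + sum(y * sn)^2 / sum(sn^2)) / (2 * s2)
  })
}

# Hypothetical weekly series with an annual cycle and 14 missing weeks.
set.seed(7)
weeks <- 1:1040
y <- 10 + 5 * cos(2 * pi * weeks / 52) + rnorm(1040)
miss <- sample(2:1039, 14)
t_obs <- weeks[-miss]
y_obs <- y[-miss]

# Approach 1: Lomb periodogram on the unevenly spaced observations.
freqs <- (1:519) / 1040
pl <- lomb(t_obs, y_obs, freqs)

# Approach 2: interpolate the gaps, then use the classic periodogram.
y_int <- approx(t_obs, y_obs, xout = weeks)$y
ps <- spectrum(y_int, fast = FALSE, plot = FALSE)

c(lomb = 1 / freqs[which.max(pl)], schuster = 1 / ps$freq[which.max(ps$spec)])
```

With so few gaps, both approaches place the dominant period at the same value in this synthetic example.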

Fig. 6.10 The Lomb periodogram and the classic periodogram (on interpolated data) of the Philadelphia whooping cough time series

6.8 Triennial Cycles: Philadelphia Measles

As in London, pre-vaccination measles dynamics in Philadelphia exhibit interesting nonstationarities that we can highlight with wavelet analysis. There are 24 missing weeks that we interpolate:

We twiddle with the graphics margins and layout using the par and layout functions to make a prettier compound graphic (Fig. 6.11).

Fig. 6.11 Wavelet spectrum of measles in Philadelphia

The early annual epidemics give way to triennial epidemic cycles from 1920 onwards (Fig. 6.12). The triennial cycles are the hallmarks of chaotic epidemics (Dalziel et al. 2016), which we discuss further in Sect. 10.2.

Fig. 6.12 The Jan 1915 versus Jan 1940 measles wavelet spectrum; annual epidemics give way to triennial cycles

6.9 Wavelet Reconstruction and Wavelet Filter: Diphtheria

Diphtheria exhibited conspicuous annual cycles during the beginning of the twentieth century until the addition of an adjuvant to the toxoid vaccine in 1926 led to a strong secular downward trend and effectively the elimination of the disease (Fig. 6.13). The wavelet lets us study how adjuvant-induced reduction in incidence is associated with a loss of periodicity and increase in high-frequency variability (“noise”) (Fig. 6.13). There are 18 missing values we interpolate prior to the analysis.

Fig. 6.13 Wavelet spectrum of diphtheria in Philadelphia

We are sometimes interested in using the wavelet as a “filter.” We may for example want to quantify how the strength of the annual cycle of diphtheria (in the 45–60 week range, say) changes over time. To do this we use wavelet reconstruction around the relevant time scales (Fig. 6.14). For the Morlet wavelet the formula for reconstruction using the j’th through (j + s)’th scales is provided by Torrence and Compo (1998). The mid-pass filter clearly illustrates the loss of annual signal over time (Fig. 6.14).
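For reference, the band-limited reconstruction of Torrence and Compo (1998) can be written as follows. The symbols are assumptions layered on the notation above: \(W_{n}(s_{k})\) is the wavelet transform at time index n and scale \(s_{k}\), \(\delta j\) is the spacing between successive scales in octaves (1∕nv with nv voices), \(\delta t\) is the sampling interval, and \(C_{\delta }\) and \(\psi _{0}(0)\) are constants specific to the wavelet.

\[x'_{n} = \frac{\delta j\,\delta t^{1/2}} {C_{\delta }\,\psi _{0}(0)}\sum _{k=j}^{j+s}\frac{\mathrm{Re}\{W_{n}(s_{k})\}} {s_{k}^{1/2}} \]

Torrence and Compo report \(C_{\delta } = 0.776\) and \(\psi _{0}(0) =\pi ^{-1/4}\) for the Morlet wavelet with \(\omega _{0} = 6\). These constants depend on the normalization of the wavelet and need not carry over unchanged to the transform returned by the cwt-function, so the reconstructed series is best interpreted on a relative scale unless the normalization has been checked.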

Fig. 6.14 Wavelet reconstructed variability in the 45–60 week range of diphtheria in Philadelphia

6.10 Advanced: FFT and Reconstruction

One-hundred-and-twenty years ago, Arthur Schuster proposed the bold idea that any discrete time series can be decomposed and exactly reconstructed from a sum of trigonometric functions. Given its nonstationary transition from annual to biennial epidemics, the pre-vaccination 1944–1964 London measles time series (in the meas data set) offers a nice test-bed for this assertion.

The code below generates an animated visualization of the reconstruction. Section 11.6 discusses making in-line and permanent animations in more detail. A web-optimized animated gif can be found in https://github.com/objornstad/epimdr/blob/master/mov/fftrecon.gif.

If z is the fast Fourier transform of the time series, then the trigonometric “signal” of the k’th observation is \(\frac{1} {T}\sum _{f}\left [Re(z)\cos (2\pi (k - 1)f) - Im(z)\sin (2\pi (k - 1)f)\right ]\), where Re() and Im() represent real and imaginary parts and the sum runs over all frequencies f. We first piece together relevant bits for the formula; we then do the reconstruction in the rec2-object where the contribution of each frequency is weighted:
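A minimal sketch of the exact reconstruction follows, using a synthetic biweekly series in place of the meas data (an assumption for illustration). The object names rec and n are ours; n plays the role of T in the formula.

```r
# Hypothetical biweekly series: annual (26 biweeks) and biennial (52 biweeks)
# cycles around a mean of 100, plus noise.
set.seed(8)
n <- 104
k <- 1:n
x <- 100 + 40 * cos(2 * pi * k / 26) + 20 * cos(2 * pi * k / 52) +
  rnorm(n, sd = 5)

z <- fft(x)
f <- (0:(n - 1)) / n      # frequency attached to each element of z

# Signal at observation kk:
# (1/n) * sum over f of [Re(z) cos(2 pi (kk-1) f) - Im(z) sin(2 pi (kk-1) f)]
rec <- sapply(k, function(kk) {
  sum(Re(z) * cos(2 * pi * (kk - 1) * f) -
      Im(z) * sin(2 * pi * (kk - 1) * f)) / n
})

max(abs(rec - x))         # numerically zero: the reconstruction is exact
```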

Finally we can visualize the convergence on the original signal using the sequence of frequencies ordered by amplitude (highest to lowest importance):
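The convergence can be sketched by storing the contribution of each frequency in a matrix and accumulating the columns in order of decreasing amplitude. The synthetic biweekly series and the names contrib, partial, and rmse are assumptions for illustration; the commented loop indicates one way the frames of an animation could be drawn.

```r
# Hypothetical biweekly series (stand-in for the London measles data).
set.seed(9)
n <- 104
k <- 1:n
x <- 100 + 40 * cos(2 * pi * k / 26) + 20 * cos(2 * pi * k / 52) +
  rnorm(n, sd = 5)

z <- fft(x)
f <- (0:(n - 1)) / n

# contrib[k, j]: contribution of the j'th frequency to the k'th observation.
theta <- 2 * pi * outer(k - 1, f)
contrib <- (sweep(cos(theta), 2, Re(z), "*") -
            sweep(sin(theta), 2, Im(z), "*")) / n

# Add frequencies one at a time, from largest to smallest amplitude.
ord <- order(Mod(z), decreasing = TRUE)
partial <- t(apply(contrib[, ord], 1, cumsum))   # column m uses m frequencies
rmse <- sqrt(colMeans((partial - x)^2))

# for (m in 1:n) { plot(x, type = "b"); lines(partial[, m], col = 2) }
range(rmse[c(1, n)])
```

In this example the zero frequency (the mean) enters first because it has the largest amplitude, and the error is numerically zero once all frequencies are included.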