Time-Series Analysis

Bjørnstad, Ottar N.

doi:10.1007/978-3-319-97487-3_6

Part of the book series: Use R! ((USE R))

Abstract

Analysis of epidemic time series is a large endeavor because of the richness of dynamical patterns and plentitude of historical data (Rohani and King 2010). A wide range of tools is used, some borrowed from mainstream statistics and others “custom made.” The classic “mainstream” methods belong to two categories: the so-called time-domain and frequency-domain methods. The autocorrelation function and ARIMA models belong to the former class, and spectral analysis and the periodogram belong to the latter. Hybrid time/frequency methods have become increasingly prominent in the form of wavelet analysis because it allows the study of changes in disease dynamics through time (Grenfell et al. 2001). This chapter discusses a variety of “mainstream” methods using a variety of time-series data.

This chapter uses the following R-packages: forecast, Rwave, imputeTS, nlts, and plotrix.

6.1 Taxonomy of Methods

Analysis of epidemic time series is a large endeavor because of the richness of dynamical patterns and plentitude of historical data (Rohani and King 2010). A wide range of tools is used, some borrowed from mainstream statistics and others “custom made.” The classic “mainstream” methods belong to two categories: the so-called time-domain and frequency-domain methods. The autocorrelation function and ARIMA models belong to the former class, and spectral analysis and the periodogram belong to the latter. Hybrid time/frequency methods have become increasingly prominent in the form of wavelet analysis because it allows the study of changes in disease dynamics through time (Grenfell et al. 2001). This chapter discusses a variety of “mainstream” methods using a variety of time-series data. Examples of “custom made” methods are mechanistic models such as the time-series SIR (TSIR), which is the focus of Chap. 7, semi-parametric models (Ellner et al. 1998), and nonparametric (“empirical dynamic”) models. An example of the latter is discussed in Sect. 10.8.

6.2 Time Domain: ACF and ARMA

The autocorrelation function (ACF) and the autoregressive-moving-average (ARMA) model are classic tools for describing serial dependence in time series in the time-domain. We first apply the ACF to the (weekly) time series of prevalence from the seasonally forced SEIR model. The ACF quantifies serial correlations at different time lags. Figure 6.1 shows the ACF for lags up to 3 years (=156 weeks):

The major peak in autocorrelation at 104 weeks reflects the dominant 2-year periodicity; the minor peak at 52 weeks reflects the subdominant annual periodicity.
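As a minimal sketch of how such an ACF can be computed, the following applies base R's acf-function to a simulated stand-in series. The series (a biennial plus an annual sine wave with noise) and all object names are assumptions made for illustration; it is not the chapter's SEIR simulation.

```r
# Simulated weekly stand-in for SEIR prevalence (not the chapter's simulation):
# a 2-year (104-week) cycle plus a 1-year (52-week) cycle and noise
set.seed(1)
week <- 1:(52 * 20)
prev <- sin(2 * pi * week / 104) + sin(2 * pi * week / 52) +
  rnorm(length(week), sd = 0.3)

# Sample autocorrelation for lags 0 to 156 weeks (3 years)
a <- acf(prev, lag.max = 156, plot = FALSE)
r <- as.numeric(a$acf)   # r[1] is lag 0, r[105] is lag 104
lags <- 0:156

# Lag with the largest autocorrelation beyond the first year
late <- lags >= 60
peak_lag <- lags[late][which.max(r[late])]
peak_lag
```

Calling acf with plot = TRUE (the default) draws the correlogram directly.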

6.2.1 ARMA

Autoregressive moving-average models have been used to forecast disease dynamics (e.g., influenza-like illness; Choi and Thacker 1981). The ARMA(p,q)-model assumes that the future incidence (\(Y_t\)) can be predicted according to \(Y_t = a_1 Y_{t-1} + \ldots + a_p Y_{t-p} + \epsilon_t - b_1 \epsilon_{t-1} - \ldots - b_q \epsilon_{t-q}\), where the ε’s represent stochasticity and the echo of past stochasticity (see Note 1). We apply the ARMA model to monthly ILI incidence in Iceland using the forecast-package:
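To make the ARMA(p,q) equation concrete before turning to data, here is a small sketch that simulates an ARMA(2,1) series with known coefficients and recovers them with base R's arima-function (rather than the forecast-package). The coefficient values and object names are arbitrary choices for illustration.

```r
# Simulate an ARMA(2,1) series with known coefficients
set.seed(2)
y <- arima.sim(model = list(ar = c(1.2, -0.5), ma = 0.4), n = 2000)

# Fit an ARMA(2,1) model: order = c(p, d, q) with d = 0 (no differencing)
fit <- arima(y, order = c(2, 0, 1))
est <- coef(fit)
est

# R writes the moving-average part with plus signs, so the b's in the
# equation above correspond to minus the "ma" coefficients
b1 <- -unname(est["ma1"])
b1
```

The sign flip matters only when comparing fitted coefficients with the equation as written; forecasts are unaffected.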

We convert the data frame to a time-series ts-object and do a seasonal decomposition (Fig. 6.2). There is a slight trend through the data, but as expected the winter seasonality is the dominant feature of the time series. Because the epidemics are very peaky we consider square-root transformed numbers:
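The conversion and decomposition steps might look roughly as follows. This sketch uses simulated monthly counts with a sharp winter peak in place of the Icelandic data, and base R's decompose-function as one of several possible decomposition methods; both are assumptions, as are the object names.

```r
# Simulated monthly counts with a sharp winter peak (stand-in data,
# not the Icelandic ILI series)
set.seed(3)
month <- rep(1:12, times = 11)                       # 11 years, January = 1
lambda <- 5 + 150 * exp(-pmin(month - 1, 13 - month)^2 / 2)
ili <- ts(rpois(length(month), lambda), start = c(1990, 1), frequency = 12)

# Classical decomposition of the square-root transformed series
# into trend, seasonal, and remainder components
dec <- decompose(sqrt(ili))
round(dec$figure, 2)            # the 12 estimated monthly seasonal effects
peak_month <- which.max(dec$figure)
peak_month
```

Calling plot(dec) shows the observed series together with its trend, seasonal, and random components.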

We train a seasonal ARMA(2,1) model on, for example, the 1990 through 2000 epidemics and do a 24-month forecast (Fig. 6.3):
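A hedged sketch of this train-and-forecast workflow, using base R's arima and predict instead of the forecast-package, is given below. The data are simulated, and the seasonal part of the model (a single seasonal AR term at lag 12) is an assumption, since the exact seasonal order is not stated here.

```r
# Simulated stand-in for 1990 through 2000 monthly counts: a winter-peaked
# seasonal mean plus ARMA(2,1) noise on the square-root scale
set.seed(4)
month <- rep(1:12, times = 11)
seas <- 2 + 10 * exp(-pmin(month - 1, 13 - month)^2 / 2)
noise <- as.numeric(arima.sim(model = list(ar = c(0.9, -0.3), ma = 0.4),
                              n = length(month), sd = 0.5))
ili <- ts(round(pmax(seas + noise, 0)^2), start = c(1990, 1), frequency = 12)

# Seasonal ARMA(2,1) on the square-root scale: non-seasonal order (2,0,1)
# plus a seasonal AR(1) term at lag 12 (the seasonal order is an assumption)
fit <- arima(sqrt(ili), order = c(2, 0, 1),
             seasonal = list(order = c(1, 0, 0), period = 12),
             method = "ML")

# 24-month forecast, back-transformed to the count scale by squaring
fc <- predict(fit, n.ahead = 24)
fc_counts <- as.numeric(fc$pred)^2
round(fc_counts[1:12])
```

Squaring the forecast is a simple back-transformation that ignores transformation bias; fc$se holds the standard errors on the square-root scale if prediction intervals are wanted.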

While ARMA forecasting is useful in many disciplines and is an important part of the broad statistical toolbox, it lacks mechanism and therefore cannot answer questions like “how are dynamics likely to change if we vaccinate 50% of susceptible children?” It furthermore assumes that time series are stationary, essentially meaning that dynamical patterns do not change radically over time. As we frequently see in infectious diseases, this is not a good assumption. In Chap. 7 we discuss how time-series methods that incorporate more biological mechanisms (like the “time-series SIR” model) are better able to capture/predict dynamic transitions.

6.3 Frequency Domain

The Schuster periodogram is a direct way of estimating and testing for significant periodicity. The periodogram decomposes a time series into cycles of different frequencies (frequency = 1/period). The importance of each frequency is measured by the spectral amplitude. We use the spectrum-function to calculate the periodogram for the time series from the seasonally forced SEIR model. The analysis clearly identifies the two superimposed periods (Fig. 6.4).
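A minimal illustration with the spectrum-function, again on a simulated stand-in series with superimposed 2-year and 1-year cycles rather than the SEIR output (an assumption, as are the object names):

```r
set.seed(5)
week <- 1:(52 * 20)                      # 1040 weekly observations
prev <- 2 * sin(2 * pi * week / 104) + sin(2 * pi * week / 52) +
  rnorm(length(week), sd = 0.5)

# Raw periodogram; fast = FALSE avoids zero-padding so that the
# frequencies are exactly 1/1040, 2/1040, ..., 1/2 cycles per week
sp <- spectrum(prev, taper = 0, detrend = FALSE, demean = TRUE,
               fast = FALSE, plot = FALSE)

# Periods (in weeks) of the two largest spectral peaks
top2 <- order(sp$spec, decreasing = TRUE)[1:2]
top_periods <- sort(1 / sp$freq[top2])
top_periods
```

Without fast = FALSE the series is padded to a convenient length for the FFT, which shifts the frequency grid slightly away from the exact 52- and 104-week periods.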

Using the fast Fourier transform (FFT), the Schuster periodogram will automatically estimate the spectrum of a time series (of length T) at the following T∕2 frequencies: \(f =\{ \frac{1} {T}, \frac{2} {T},\ldots, \frac{T/2} {T} \}\) (or equivalent periods: \(\{T, \frac{T} {2},\ldots,2\})\). An upside of using FFT is that it is fast. A downside is that the Schuster periodogram is not a consistent method, meaning that the estimated periodogram does not converge on the true power spectrum as the time series gets longer because the number of frequencies considered (and thus the number of parameters) increases linearly with time-series length. Numerous fixes for this have been developed; the most common is to smooth the periodogram (Priestley 1981), but nonparametric density estimation has also been proposed. We use Kooperberg et al.’s (1995) log-spline method in Sect. 9.7.
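The lack of consistency, and the effect of smoothing, can be seen in a small experiment on white noise, whose true spectrum is flat. This is a sketch using spec.pgram with modified Daniell smoothers (the spans argument); the series lengths and span widths are arbitrary choices.

```r
set.seed(6)

# Relative scatter of the raw periodogram ordinates for a short and a long series
cv_raw <- sapply(c(256, 4096), function(len) {
  x <- rnorm(len)                              # white noise: flat true spectrum
  sp <- spec.pgram(x, taper = 0, fast = FALSE, plot = FALSE)
  sd(sp$spec) / mean(sp$spec)
})
cv_raw   # does not shrink as the series gets longer

# Smoothing across neighbouring frequencies with modified Daniell kernels
x <- rnorm(4096)
raw <- spec.pgram(x, taper = 0, fast = FALSE, plot = FALSE)
smo <- spec.pgram(x, spans = c(15, 15), taper = 0, fast = FALSE, plot = FALSE)
c(raw = sd(raw$spec), smoothed = sd(smo$spec))

# The estimated frequencies are k/T for k = 1, ..., T/2
freq_ok <- isTRUE(all.equal(raw$freq, (1:2048) / 4096))
freq_ok
```

Wider spans reduce the scatter further at the cost of blurring narrow spectral peaks.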

6.4 Wavelets

The wavelet spectrum is an extension of spectral analysis that adds a time axis and therefore allows the study of changes in dynamics over time (Torrence and Compo 1998). Unlike the periodogram, wavelets do not have “canonical” periods for decomposition. If we use the Morlet wavelet (which is provided by the cwt-function in the Rwave-package), we need to specify the periods we wish to consider through the number of octaves, no, and voices, nv. With 8 octaves the main periods will be {2¹, 2², …, 2⁸} = {2, 4, …, 256}. The number of voices specifies how many subdivisions to estimate within each octave. With four voices the resultant periods will be {2^1, 2^1.25, 2^1.5, 2^1.75, 2^2, 2^2.25, …}. We first consider the simulated time series of prevalence for the unforced SEIR model (Fig. 6.5).
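The period grid implied by a choice of octaves and voices is simple arithmetic, sketched here in base R. The assumption that the voices continue through the last octave (giving no × nv periods in total) should be checked against the cwt documentation; the object names are illustrative.

```r
no <- 8   # number of octaves
nv <- 4   # number of voices per octave

# One period per voice: exponents 1, 1 + 1/nv, 1 + 2/nv, ...
expo <- 1 + (0:(no * nv - 1)) / nv
periods <- 2^expo

periods[1:6]                        # 2^1, 2^1.25, 2^1.5, 2^1.75, 2^2, 2^2.25
periods[seq(1, no * nv, by = nv)]   # the main periods 2, 4, ..., 256
```

For weekly data these periods are in weeks, so eight octaves reach up to cycles of roughly five years.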

The initial inter-epidemic period at around 2.5 years is strong (recall that the dampening period of the SEIR model with these parameters is 2.3 years; Sect. 5.3), but then wanes as the system converges towards the stable endemic equilibrium. We see this clearly illustrated if we compare the wavelet spectrum at, for example, the beginning of year 2 and the beginning of year 10 (Fig. 6.6).

6.5 Measles in London

The pre-vaccination incidence of measles shows interesting non-stationarities that have been traced back to changing susceptible recruitment due to the post-World War II baby boom (Fig. 6.7). The meas data set contains the biweekly incidence and births from 1944 to 1965.

We apply the wavelet analysis to the historical measles dynamics from London (Grenfell et al. 2001). In addition to providing a continuous wavelet transform, the Rwave-package has a “crazy climber” algorithm to highlight ridges in the wavelet spectrum (implemented with the crc and cfamily-functions). When applied to the London measles data, the crazy climber reveals the background annual rhythm and the punctuated appearance of the biennial cycle in the early 1950s (Fig. 6.8).

We can contrast the spectrum of the first biweek of January 1945 and the first biweek of January 1954 (Fig. 6.9). The transition from a dominance of annual to biennial epidemics is conspicuous. Two-year cycles are pronounced when birth rates are around 20 per thousand per year; annual epidemics are associated with higher birth rates. This transition, due to the post-World War II baby boom, is as predicted by the seasonally forced SEIR model with dropping birth rates (Earn et al. 2000b, Fig. 5.6).

The above methods of time-series analysis require regularly spaced time series without any missing values. Lomb (1976) developed the Lomb periodogram for unequally spaced data. Furthermore, the classic spectral methods cannot quantify rhythms in cruder “nonmetric” data such as presence/absence of infection. Legendre et al. (1981) developed the “contingency periodogram” for such situations. The nlts-package has the functions spec.lomb and contingency.periodogram to carry out such analyses. The mvcwt-package can do wavelet analyses of time series with missing data.

6.6 Project Tycho

Project Tycho (http://www.tycho.pitt.edu) is a great resource for time series on historical disease incidence. The data used in Sect. 5.1 were downloaded from this database. Weekly data of whooping cough (1925–1947), diphtheria (1914–1947), and measles (1914–1947) in the city of Philadelphia are from Project Tycho and are saved in the tywhooping, tydiphtheria, and tymeasles data sets. These were all important causes of childhood mortality in the early twentieth century and were therefore “reportable infections” in the USA. Whooping cough is caused by bacterial colonization of the lower respiratory tract by congeneric species in the genus Bordetella, most notably B. pertussis, and causes violent coughing, vomiting, and pneumonia. Diphtheria is caused by infection by Corynebacterium diphtheriae, whose toxins cause a range of health complications. Measles is a severely immuno-compromising paramyxovirus. We will use these time series to illustrate some additional aspects of disease dynamics/time-series analysis.

These time series have occasional weeks of missing data which we interpolate. We will use the imputeTS-package. But first we use the whooping cough data to illustrate the use of the Lomb periodogram for spectral analysis of unevenly spaced data.

6.7 Lomb Periodogram: Whooping Cough

There are 14 missing weeks in the tywhooping data set. For frequency-domain analyses of these data we either have to interpolate the missing weeks or use the Lomb periodogram. We compare the two approaches:

With only 14 missing values in a 1000+ week long time series the shape of the Schuster periodogram (on interpolated data) and the Lomb periodogram are almost identical (Fig. 6.10).
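The comparison can be mimicked on simulated data in base R. In this sketch the interpolation uses approx rather than the imputeTS-package, and the second approach is a plain least-squares fit of a sine and cosine at each frequency, which conveys the idea behind the Lomb periodogram but is not the spec.lomb implementation. The series and object names are assumptions.

```r
set.seed(7)
week <- 1:1040
cases <- 10 + 3 * sin(2 * pi * week / 52) + rnorm(1040)
cases[sample(1040, 14)] <- NA            # 14 missing weeks
ok <- !is.na(cases)

# Approach 1: linear interpolation, then the ordinary periodogram
filled <- approx(week[ok], cases[ok], xout = week, rule = 2)$y
sp <- spec.pgram(filled, taper = 0, fast = FALSE, plot = FALSE)
period_interp <- 1 / sp$freq[which.max(sp$spec)]

# Approach 2: least-squares fit of a sine and cosine at each frequency,
# using only the observed weeks; R-squared plays the role of spectral power
freqs <- (1:519) / 1040
r2 <- sapply(freqs, function(f) {
  s <- sin(2 * pi * f * week[ok])
  co <- cos(2 * pi * f * week[ok])
  summary(lm(cases[ok] ~ s + co))$r.squared
})
period_ls <- 1 / freqs[which.max(r2)]

c(interpolated = period_interp, least_squares = period_ls)
```

Because the least-squares version never needs values at the missing weeks, it also works when the gaps are long or the sampling is irregular, which is where interpolation becomes questionable.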

6.8 Triennial Cycles: Philadelphia Measles

As in London, pre-vaccination measles dynamics in Philadelphia exhibit interesting nonstationarities that we can highlight with the wavelet analysis. There are 24 missing weeks, which we interpolate:

We twiddle with the graphics margins and layout using the par and layout functions to make a prettier compound graphic (Fig. 6.11).

The early annual epidemics give way to triennial epidemic cycles from 1920 onwards (Fig. 6.12). The triennial cycles are hallmarks of chaotic epidemics (Dalziel et al. 2016), which we discuss further in Sect. 10.2.

6.9 Wavelet Reconstruction and Wavelet Filter: Diphtheria

Diphtheria exhibited conspicuous annual cycles during the beginning of the twentieth century until the addition of an adjuvant to the toxoid vaccine in 1926 led to a strong secular downward trend and effectively the elimination of the disease (Fig. 6.13). The wavelet lets us study how adjuvant-induced reduction in incidence is associated with a loss of periodicity and increase in high-frequency variability (“noise”) (Fig. 6.13). There are 18 missing values we interpolate prior to the analysis.

We are sometimes interested in using the wavelet as a “filter.” We may for example want to quantify how the strength of the annual cycle of diphtheria (in the 45–60 week range, say) changes over time. To do this we use wavelet reconstruction around the relevant time scales (Fig. 6.14). For the Morlet wavelet the formula for reconstruction using the j’th through (j + s)’th scales is provided by Torrence and Compo (1998). The mid-pass filter clearly illustrates the loss of annual signal over time (Fig. 6.14).

6.10 Advanced: FFT and Reconstruction

One-hundred-and-twenty years ago, Arthur Schuster proposed the bold idea that any discrete time series can be decomposed and exactly reconstructed from a sum of trigonometric functions. Given its nonstationary transition from annual to biennial epidemics, the pre-vaccination 1944–1964 London measles time series (in the meas data set) offers a nice test-bed for this assertion.

The below code generates an animated visualization of the reconstruction. Section 11.6 discusses making in-line and permanent animations in more detail. A web-optimized animated gif can be found in https://github.com/objornstad/epimdr/blob/master/mov/fftrecon.gif.

If z is the fast Fourier transform of the time series, then the trigonometric “signal” of the k’th observation is \(\frac{1}{T}\sum_{f}\left(Re(z_f)\cos(2\pi(k-1)f) - Im(z_f)\sin(2\pi(k-1)f)\right)\), where \(z_f\) is the transform at frequency f and Re() and Im() represent real and imaginary parts. We first piece together relevant bits for the formula; we then do the reconstruction in the rec2-object where the contribution of each frequency is weighted:
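As a check on the formula, the following sketch applies it term by term to a simulated biweekly series and confirms that the original values are recovered. The series is a stand-in for the London measles data, and the object names (including rec) are illustrative rather than those of the chapter's code.

```r
set.seed(8)
n <- 520                                   # e.g. 20 years of biweekly data
k <- 1:n
x <- 2 * sin(2 * pi * k / 52) + sin(2 * pi * k / 26) + rnorm(n, sd = 0.5)

z <- fft(x)                                # complex Fourier coefficients
f <- (0:(n - 1)) / n                       # the n Fourier frequencies

# Apply the formula observation by observation
rec <- sapply(k, function(kk) {
  sum(Re(z) * cos(2 * pi * (kk - 1) * f) -
      Im(z) * sin(2 * pi * (kk - 1) * f)) / n
})

max_err <- max(abs(rec - x))
max_err                                    # zero up to rounding error
```

Here the sum runs over all n Fourier frequencies 0, 1/n, …, (n − 1)/n, which is what makes the reconstruction exact.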

Finally we can visualize the convergence on the original signal using the sequence of frequencies ordered by amplitude (highest to lowest importance):
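A non-animated sketch of the same idea: add the frequency contributions one at a time, from largest to smallest amplitude, and track how quickly the partial sums approach the series. As before, the data are simulated and the object names are assumptions.

```r
set.seed(9)
n <- 520
k <- 1:n
x <- 2 * sin(2 * pi * k / 52) + sin(2 * pi * k / 26) + rnorm(n, sd = 0.5)
x <- x - mean(x)                           # drop the zero-frequency term

z <- fft(x)
f <- (0:(n - 1)) / n

# contrib[k, j]: contribution of frequency j to observation k
phase <- 2 * pi * outer(k - 1, f)
contrib <- (cos(phase) * rep(Re(z), each = n) -
            sin(phase) * rep(Im(z), each = n)) / n

# Add frequencies from largest to smallest amplitude and track the error
ord <- order(Mod(z), decreasing = TRUE)
partial <- t(apply(contrib[, ord], 1, cumsum))   # row k holds running sums
rmse <- sqrt(colMeans((partial - x)^2))

round(rmse[c(2, 4, 20, n)], 3)
```

Column m of partial is the reconstruction from the m most important frequencies, so plotting successive columns against x reproduces the frames of such an animation.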

Notes

1. The ARMA model is usually considered a purely statistical model (i.e., not containing biological mechanism), though it can be shown that the linearized discrete-time SIR model with stochastic transmission can be rewritten as an ARMA(2,1) model (see Sect. 9.8).

References

Choi, K., & Thacker, S. B. (1981). An evaluation of influenza mortality surveillance, 1962–1979: I. time series forecasts of expected pneumonia and influenza deaths. American Journal of Epidemiology, 113(3), 215–226.
Dalziel, B. D., Bjørnstad, O. N., van Panhuis, W. G., Burke, D. S., Metcalf, C. J. E., & Grenfell, B. T. (2016). Persistent chaos of measles epidemics in the prevaccination United States caused by a small change in seasonal transmission patterns. PLoS Computational Biology, 12(2), e1004655.
Earn, D. J., Rohani, P., Bolker, B. M., & Grenfell, B. T. (2000b). A simple model for complex dynamical transitions in epidemics. Science, 287(5453), 667–670.
Ellner, S., Bailey, B., Bobashev, G., Gallant, A., Grenfell, B., & Nychka, D. (1998). Noise and nonlinearity in measles epidemics: combining mechanistic and statistical approaches to population modeling. The American Naturalist, 151(5), 425–440.
Grenfell, B., Bjørnstad, O., & Kappey, J. (2001). Travelling waves and spatial hierarchies in measles epidemics. Nature, 414(6865), 716–723.
Kooperberg, C., Stone, C. J., & Truong, Y. K. (1995). Logspline estimation of a possibly mixed spectral distribution. Journal of Time Series Analysis, 16(4), 359–388.
Legendre, L., Frechette, M., & Legendre, P. (1981). The contingency periodogram: A method of identifying rhythms in series of nonmetric ecological data. The Journal of Ecology, 69, 965–979.
Lomb, N. R. (1976). Least-squares frequency analysis of unequally spaced data. Astrophysics and Space Science, 39(2), 447–462.
Priestley, M. B. (1981). Spectral analysis and time series. Cambridge, MA: Academic.
Rohani, P., & King, A. A. (2010). Never mind the length, feel the quality: The impact of long-term epidemiological data sets on theory, application and policy. Trends in Ecology & Evolution, 25(10), 611–618.
Torrence, C., & Compo, G. P. (1998). A practical guide to wavelet analysis. Bulletin of the American Meteorological Society, 79(1), 61–78.

Author information

Authors and Affiliations

Center for Infectious Disease Dynamics, Pennsylvania State University, University Park, PA, USA
Ottar N. Bjørnstad

About this chapter

Cite this chapter

Bjørnstad, O.N. (2018). Time-Series Analysis. In: Epidemics. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-97487-3_6

DOI: https://doi.org/10.1007/978-3-319-97487-3_6
Published: 31 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97486-6
Online ISBN: 978-3-319-97487-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
