Trajectory Matching

Bjørnstad, Ottar N.

doi:10.1007/978-3-319-97487-3_8


Part of the book series: Use R!



This chapter uses the following R-package: deSolve.


8.1 Preamble: Prevalence Versus Incidence

When we fit mechanistic models to data, we have to consider carefully how the nature of the data relates to the nature of the model's state variables. For example, when we work with continuous-time S(E)IR models it is important to keep in mind that incidence is not prevalence: if our data are incidence, we need to do more than try to match simulated prevalence to observed incidence. We therefore start with a toy example using simulated data.

When we can assume that the dynamics are unaffected by process noise (demographic and environmental stochasticity), we can fit models to data using trajectory matching. The assumption is that discrepancies between the observations and the predictions of the dynamic model are due solely to observation error. The upside of trajectory matching is that we can easily fit continuous-time models to irregularly spaced observations on any or all state variables; the downside is that this assumption is usually restrictive.

8.2 Event-Based Stochastic Simulation

To begin, we consider how to simulate the continuous-time SIR model (Eqs. (2.1)–(2.3)) stochastically. Previously we considered stochastic simulation using discrete-time models. An alternative is continuous-time stochastic simulation using an event-based approach: the Gillespie exact algorithm (Gillespie 1977) and the τ-leap approximation (Gillespie 2001). As discussed in Sect. 2.7, the S(E)IR model (and all simple ODEs) implies exponentially distributed waiting times between events, and the Gillespie algorithms take advantage of this. If, for example, we consider how the states of the SIR flows (Eqs. (2.1)–(2.3)) change over time, we expect the following six possible changes:

S → S + 1 at rate μN from births
S → S − 1 at rate μS from deaths
S → S − 1 and I → I + 1 at rate βSI∕N from infection
I → I − 1 at rate μI from deaths
I → I − 1 and R → R + 1 at rate γI from recovery
R → R − 1 at rate μR from deaths

Thus, the system is expected to change at an overall summed rate of r = μN + μS + βSI∕N + μI + γI + μR. We can therefore draw a random exponential waiting time with rate r (i.e., with mean 1∕r) to update a continuous-time clock. We then draw a random event from a multinomial distribution with probabilities given by the relative rates, update the state variables accordingly, and repeat.

Because of the many versions of compartmental models used in studying disease dynamics, it is useful to write a general-purpose stochastic simulator that can be applied to any set of rate equations. To this end we first define a list, rlist, of equations corresponding to the rates of the six transitions of the SIR flows. The quote-formalism allows us to set up the list such that all equations can be evaluated in a single sapply-call as the simulation progresses.

We next define a transition matrix associated with the SIR events. The three columns correspond to changes in S, I, and R, respectively; the rows correspond to the six possible events.
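The sketch below shows the rate list, the event matrix, and one step of the algorithm described above. It is a Python stand-in for the chapter's R code, not the book's implementation. Plain functions play the role of the quoted rate expressions. The names `RATES`, `EVENTS`, and `gillespie_step` and the parameter values are hypothetical. `random.expovariate` takes the rate r, so the waiting time has mean 1∕r, and `random.choices` picks each event with probability proportional to its rate.

```python
import random

# Hypothetical analogue of rlist: one rate function per event, in the order of
# the six SIR events listed above. Each takes (S, I, R, params).
RATES = [
    lambda S, I, R, p: p["mu"] * p["N"],            # birth: S -> S + 1
    lambda S, I, R, p: p["mu"] * S,                 # death of a susceptible
    lambda S, I, R, p: p["beta"] * S * I / p["N"],  # infection: S -> I
    lambda S, I, R, p: p["mu"] * I,                 # death of an infected
    lambda S, I, R, p: p["gamma"] * I,              # recovery: I -> R
    lambda S, I, R, p: p["mu"] * R,                 # death of a recovered
]

# Event matrix: one row per event, columns are the changes in (S, I, R).
EVENTS = [
    (1, 0, 0),
    (-1, 0, 0),
    (-1, 1, 0),
    (0, -1, 0),
    (0, -1, 1),
    (0, 0, -1),
]

def gillespie_step(state, t, params, rng):
    """One event of the direct method: exponential waiting time, then a random event."""
    rates = [f(*state, params) for f in RATES]
    total = sum(rates)                                     # overall rate r
    t_next = t + rng.expovariate(total)                    # Exponential(rate r), mean 1/r
    k = rng.choices(range(len(rates)), weights=rates)[0]   # event chosen by relative rate
    new_state = tuple(x + d for x, d in zip(state, EVENTS[k]))
    return new_state, t_next, k

# Illustrative (hypothetical) per-day parameters.
params = {"mu": 1 / (50 * 365), "N": 1000, "beta": 0.5, "gamma": 1 / 20}
rng = random.Random(1)
state, t = (999, 1, 0), 0.0
for _ in range(100):
    state, t, _ = gillespie_step(state, t, params, rng)
```

Births enter at the constant rate μN, so the overall rate never drops to zero in this sketch and a waiting time can always be drawn.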

Finally, we write a general-purpose function to simulate a dynamical system using the Gillespie algorithm. The idea is to write a function that is sufficiently robust and general that it can be applied to event-based stochastic simulation of any model that fits within a compartmental framework. The function takes five arguments to accomplish this:

rateqs: a list of E rate equations corresponding to each of the E possible events, using the quote-formalism
eventmatrix: an E-by-S matrix of changes to each of the S state variables associated with each event
parameters: a vector of parameter values
initialvals: a vector of initial values for the S states
numevents: the number of events to be simulated
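A general-purpose simulator with the five arguments listed above might look like the Python sketch below. It is a hypothetical analogue of the chapter's R function, with rate functions in place of quoted expressions. To show that nothing in it is specific to the SIR model, the usage lines apply it to a one-state immigration-death process. Stopping when the overall rate reaches zero is an addition of this sketch.

```python
import random

def gillespie(rateqs, eventmatrix, parameters, initialvals, numevents, seed=None):
    """Sketch of a general-purpose Gillespie (direct method) simulator.

    rateqs      -- list of E functions f(state, parameters) -> rate
    eventmatrix -- E-by-S nested list: change in each state for each event
    parameters  -- dict of parameter values
    initialvals -- list of S initial state values
    numevents   -- number of events to simulate
    Returns a list of (time, state) pairs, starting at time 0.
    """
    rng = random.Random(seed)
    t, state = 0.0, list(initialvals)
    path = [(t, tuple(state))]
    for _ in range(numevents):
        rates = [f(state, parameters) for f in rateqs]
        total = sum(rates)
        if total <= 0:                  # absorbing state: no event can occur
            break
        t += rng.expovariate(total)     # exponential waiting time with rate `total`
        k = rng.choices(range(len(rates)), weights=rates)[0]
        state = [x + d for x, d in zip(state, eventmatrix[k])]
        path.append((t, tuple(state)))
    return path

# Usage on a non-SIR model: immigration at rate b and per-capita death at rate d.
imm_death = [lambda x, p: p["b"], lambda x, p: p["d"] * x[0]]
sim = gillespie(imm_death, [[1], [-1]], {"b": 10.0, "d": 0.1}, [0], 2000, seed=42)
```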

We provide parameters and initial conditions for a stochastic simulation assuming an infectious period of 20 days (Fig. 8.1):

The Gillespie algorithm provides an “exact” stochastic simulation in the sense that the system evolves exactly according to the exponential waiting-time distributions of the underlying continuous-time stochastic process. It is, however, computationally expensive because every event is simulated separately. Gillespie’s τ-leap method instead uses the Poisson approximation discussed in Sect. 7.1. If we assume that the interval, Δt, is short enough that any change in the rates is negligible, the number of events in the interval should be Poisson-distributed with mean r Δt (the overall rate times Δt). These events are then multinomially divided among the event types according to their relative rates.

We write a general τ-leap simulator and then apply it to the SEIR model. The SEIR model has eight possible events:

S → S + 1 at rate μN from births
S → S − 1 at rate μS from deaths
S → S − 1 and E → E + 1 at rate βSI∕N from infection
E → E − 1 at rate μE from deaths
E → E − 1 and I → I + 1 at rate σE from becoming infectious
I → I − 1 at rate μI from deaths
I → I − 1 and R → R + 1 at rate γI from recovery
R → R − 1 at rate μR from deaths

We thus have the following event matrix:

The rate equations associated with each event are:

A general-purpose τ-leap simulator is:

We assume an initial population of 1000 individuals with 1 initial infected, assume measles-like parameters, and simulate daily incidence for 2 years:
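The τ-leap scheme for the eight SEIR events can be sketched in Python as follows. The parameter values are those the chapter reports (Sect. 8.4) as the true values of its simulation, with time in years and a daily step Δt = 1∕365. Everything else is an assumption of this sketch:

- the function names;
- treating new infections (S → E) as "incidence";
- clamping any state that a leap pushes below zero.

The standard library has no Poisson sampler, so `rpois` implements Knuth's multiplication method.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method. Large means are split into
    chunks of at most 500 (a sum of independent Poissons is Poisson) so that
    exp(-chunk) does not underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        lam -= chunk
        limit, k, prod = math.exp(-chunk), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
    return total

# Event matrix for the eight SEIR events listed above; columns are (S, E, I, R).
SEIR_EVENTS = [
    (1, 0, 0, 0),   # birth
    (-1, 0, 0, 0),  # death of S
    (-1, 1, 0, 0),  # infection: S -> E
    (0, -1, 0, 0),  # death of E
    (0, -1, 1, 0),  # becoming infectious: E -> I
    (0, 0, -1, 0),  # death of I
    (0, 0, -1, 1),  # recovery: I -> R
    (0, 0, 0, -1),  # death of R
]

def seir_rates(S, E, I, R, p):
    """Rates of the eight events, in the same order as SEIR_EVENTS."""
    return [p["mu"] * p["N"], p["mu"] * S, p["beta"] * S * I / p["N"], p["mu"] * E,
            p["sigma"] * E, p["mu"] * I, p["gamma"] * I, p["mu"] * R]

def tau_leap_seir(init, params, dt, nsteps, seed=None):
    """Tau-leap sketch: Poisson(total rate * dt) events per step, split among
    event types by their relative rates (rates held fixed within the step)."""
    rng = random.Random(seed)
    state = list(init)
    states, incidence = [tuple(state)], []
    for _ in range(nsteps):
        rates = seir_rates(*state, params)
        n_events = rpois(sum(rates) * dt, rng)
        counts = [0] * len(rates)
        for k in rng.choices(range(len(rates)), weights=rates, k=n_events):
            counts[k] += 1                      # multinomial split, one event at a time
        for k, c in enumerate(counts):
            for j, d in enumerate(SEIR_EVENTS[k]):
                state[j] += c * d
        state = [max(x, 0) for x in state]      # crude guard: a leap can overshoot zero
        states.append(tuple(state))
        incidence.append(counts[2])             # new infections (S -> E) in this step
    return states, incidence

# Values reported in the chapter as the simulation's true parameters (time in years).
params = {"mu": 1, "N": 1000, "beta": 1000, "sigma": 365 / 8, "gamma": 365 / 5}
states, incidence = tau_leap_seir((999, 0, 1, 0), params, dt=1 / 365, nsteps=730, seed=3)
```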

Following the virgin epidemic, the inherent birth/death stochasticity leads to low-amplitude oscillations (Fig. 8.2) according to the resonant periodicity of the SEIR model (see Chap. 9).

8.3 Trajectory Matching

Trajectory matching assumes that the discrepancies between model and data are due to observation error. The event-based stochastic simulation breaks with this assumption because the model discrepancies are due to demographic stochasticity; let us nevertheless see if we can fit the SEIR model to the event-based simulation. We first recall the gradient-function for the system:

Following the ideas introduced in Sect. 3.4, we define a likelihood function to estimate parameters. With the error variance replaced by its maximum likelihood estimate, RSS∕n, the Gaussian log-likelihood is \(\log L = \mbox{const} - \frac{n}{2} \log (\mbox{RSS})\). Here n is the length of the time series, RSS is the residual sum-of-squares, and the constant is \(n(\log (n) - \log (2\pi ) - 1)/2\) (Aitkin et al. 2005).^{Footnote 1}
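The constant can be checked numerically. In the Python sketch below, the observations, predictions, and function names are all made up for illustration. The RSS form of the log-likelihood agrees with the ordinary sum of Normal log-densities once the error variance is set to its maximum likelihood estimate, RSS∕n.

```python
import math

def gauss_loglik_rss(obs, pred):
    """Gaussian log-likelihood with the error variance profiled out (sigma^2 = RSS/n):
    const - (n/2) log(RSS), where const = n (log n - log 2*pi - 1) / 2."""
    n = len(obs)
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    const = n * (math.log(n) - math.log(2 * math.pi) - 1) / 2
    return const - n / 2 * math.log(rss)

def gauss_loglik(obs, pred, sigma2):
    """Ordinary sum of Normal(pred, sigma2) log-densities, for comparison."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - (o - p) ** 2 / (2 * sigma2)
               for o, p in zip(obs, pred))

obs = [1.0, 2.5, 2.9, 4.2]     # made-up observations
pred = [1.2, 2.0, 3.1, 4.0]    # made-up model predictions
rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
print(gauss_loglik_rss(obs, pred), gauss_loglik(obs, pred, rss / len(obs)))
```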

We next estimate parameters:

and plot the deterministic prediction:

The trajectory-matched fit predicts the virgin epidemic and the next, damped epidemic well. As expected, however, it does not predict the subsequent stochastically excited low-amplitude cycles (Fig. 8.3).

In addition to finding parameter estimates we are usually interested in uncertainty and trade-offs among parameters in producing a fit to the data.

8.4 Likelihood Theory 101

We have used maximum likelihood principles in several of our previous analyses of, for example, the chain-binomial, the catalytic, and the TSIR models. We have, however, not discussed likelihood theory in a formal fashion.^{Footnote 2} For our purposes it is useful to summarize, with maximum brevity, the key results of “elementary” likelihood theory with respect to inference (see, for example, appendix A of McCullagh and Nelder 1989):

Let L(D | θ) be the function that calculates the likelihood for a set of data, D; i.e., the probability of observing the data given some values for the parameters, θ. The values that maximize this probability are the maximum likelihood estimates (MLEs) of the parameters, \(\hat{\theta }\).
If ℓ(θ) is the negative log-likelihood (i.e., − log L), then \(\hat{\theta }\) are the values that minimize ℓ. If data points are independent, then the joint log-likelihood is simply the sum of the log-likelihoods of the individual data points.
The MLE is a minimum of ℓ, so the score function, \(U(\theta ) = \partial \ell /\partial \theta \), is zero at the MLE.
The likelihood profile graphs how ℓ(θ) changes with θ. The 95% confidence interval is the set of values of θ for which ℓ(θ) is within \(\chi ^{2}(0.95, p)/2\) of the minimum, where p is the number of parameters. The quantity 2ℓ(θ) is referred to as the deviance, so if we work with the deviance we would use \(\chi ^{2}(0.95, p)\) as the cut-off.
The second derivative of ℓ(θ) with respect to θ, \(\iota (\theta ) = \partial ^{2}\ell /\partial \theta ^{2}\), is called the (observed) Fisher information. The inverted information matrix is an approximation to the variance-covariance matrix of the parameters. We can therefore obtain approximate standard errors as the square roots of the diagonal elements of the inverted information matrix. The approximate correlation matrix is the standardized inverted information matrix.
A matrix of second derivatives is generally referred to as a Hessian matrix. If we call optim(…, hessian=TRUE), R will numerically estimate the Hessian at the minimum. If the function to be minimized is the negative log-likelihood, we can therefore obtain approximate SEs and the approximate correlation matrix from this Hessian.
If we have two alternative models that are nested (meaning that the more complex model contains all the parameters of the simpler one), then we can test for significant model improvement. The difference in the log-likelihoods is approximately \(\chi ^{2}(\mbox{df} = \Delta p)/2\)-distributed, where Δp is the number of extra parameters in the complex model.^{Footnote 3}
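Two of these results, the profile-likelihood interval and the nested-model test, can be illustrated on exponentially distributed waiting times. For these data the MLE of the rate is n divided by the sum of the observations. The Python sketch below uses made-up data and hypothetical function names. It makes three choices of its own:

- the χ²(0.95, 1) quantile is obtained as the squared standard-normal quantile;
- the interval bounds are located by bisection;
- the upper tail of a χ² variable with 1 degree of freedom is computed as erfc(√(x∕2)).

For Δp > 1 a general χ² tail function would be needed.

```python
import math
from statistics import NormalDist

def exp_nll(rate, x):
    """Negative log-likelihood of iid exponential waiting times x with the given rate."""
    return -len(x) * math.log(rate) + rate * sum(x)

def exp_mle(x):
    return len(x) / sum(x)                 # analytical MLE of the rate

def profile_ci(x, level=0.95):
    """Rates whose negative log-likelihood lies within chi2(level, 1)/2 of the minimum."""
    mle = exp_mle(x)
    cutoff = NormalDist().inv_cdf(0.5 + level / 2) ** 2 / 2   # chi2(1) quantile = z^2
    target = exp_nll(mle, x) + cutoff

    def bisect(lo, hi):
        # exactly one end of [lo, hi] has exp_nll above the target
        for _ in range(200):
            mid = (lo + hi) / 2
            if (exp_nll(mid, x) > target) == (exp_nll(lo, x) > target):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    return mle, bisect(mle * 1e-6, mle), bisect(mle, mle * 100)

def lrt_two_rates(x1, x2):
    """Nested test: one shared rate (1 parameter) vs separate rates (2 parameters)."""
    pooled = x1 + x2
    nll_simple = exp_nll(exp_mle(pooled), pooled)
    nll_complex = exp_nll(exp_mle(x1), x1) + exp_nll(exp_mle(x2), x2)
    stat = max(2 * (nll_simple - nll_complex), 0.0)   # twice the log-likelihood difference
    p_value = math.erfc(math.sqrt(stat / 2))           # upper tail of chi2 with 1 df
    return stat, p_value

fast = [0.5, 1.0, 0.8, 1.2, 0.7]    # made-up waiting times
slow = [5.0, 4.5, 6.1, 5.6, 4.9]
mle, lo, hi = profile_ci(fast)
stat, p = lrt_two_rates(fast, slow)
```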

We apply these ideas to our model fit:

The true parameter values used in the simulation were μ = 1, N = 1000, β = 1000, σ = 45.6, and γ = 73. So while the model prediction gives a good fit, the parameter estimates are not particularly accurate. This is where it is useful to use likelihood theory more extensively. From the normalized inverted Hessian we see that several of the parameters are highly (positively or negatively) correlated, some with correlations more extreme than ±0.9. That means that different parameter combinations may provide very similar fits to the data. This is an illustration of identifiability problems. With observations only on the infectious stage, for instance, a relatively short infectious period and high transmission rate will predict a similar trajectory to a relatively short latent period and a lower transmission rate. Furthermore, a smaller population size with a higher birth rate can result in the same susceptible recruitment rate as a larger population with a lower birth rate. For inference it is therefore normally best to inform the analysis with any known biological quantities. For example, if the latent and infectious periods are known from household or clinical studies, it may be best not to attempt to infer these from the time series alone (though, as King et al. (2008) point out for cholera dynamics, conventional wisdom may not always be consistent with dynamical patterns). Moreover, if there are strong correlations, the individual SEs (and CIs derived from them) may be a poor representation of parametric uncertainty. It may then be better to look at pairwise confidence ellipses (e.g., Bolker 2008).
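A small Python sketch shows how the inverted Hessian translates into standard errors and correlations. It uses a straight-line model with known unit error variance and made-up data, not the chapter's SEIR fit. A hand-written central finite-difference Hessian stands in for `optim(…, hessian=TRUE)`. Because the covariate is not centred, the intercept and slope estimates come out with a correlation of about −0.90, a simple instance of the trade-offs discussed above.

```python
import math

def nll(theta, x, y):
    """Negative log-likelihood (up to a constant) of y = a + b*x + N(0, 1) errors."""
    a, b = theta
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / 2

def num_hessian(f, theta, h=1e-3):
    """Central finite-difference Hessian of f at theta."""
    p = len(theta)
    H = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di
                t[j] += dj
                return f(t)
            H[i][j] = (shifted(h, h) - shifted(h, -h) - shifted(-h, h)
                       + shifted(-h, -h)) / (4 * h * h)
    return H

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]                      # made-up data
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
b_hat = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
a_hat = ybar - b_hat * xbar                         # least squares = MLE here

H = num_hessian(lambda t: nll(t, x, y), [a_hat, b_hat])
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
cov = [[H[1][1] / det, -H[0][1] / det],             # inverse of the 2x2 Hessian
       [-H[1][0] / det, H[0][0] / det]]
se = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]   # approximate standard errors
corr = cov[0][1] / (se[0] * se[1])                  # standardized off-diagonal element
```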

8.5 SEIR with Error

We can use ode to integrate the SEIR model and add noise, generating a data set that adheres exactly to the assumption that the dynamics are affected only by observation noise. Let us simulate 10 years of weekly data, assuming measles-like parameters and that 6% of the initial population is susceptible:

We add noise to the data using the jitter-function (Fig. 8.4),

define a Gaussian likelihood function,

and estimate parameters using the jittered observations.

The estimates are

8.6 Boarding School Flu Data

The boarding school flu data set introduced in Sect. 3.6.1 matches prevalence approximately, because the data represent the number of children confined to bed each day. The average stay in bed (3–7 days) may differ somewhat from the infectious period, but the durations are comparable.

We define the gradient functions for a closed SIR epidemic:

and define the likelihood function assuming normally distributed errors

There are two parameters to estimate: β and γ. The time scale is daily, so we set reasonable initial conditions and maximize the likelihood:
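Trajectory matching for a closed SIR epidemic can be sketched in Python without external packages. This is not the chapter's R code, and it does not use the boarding school data. The sketch departs from the chapter in three ways:

- the data are synthetic: a hypothetical school of 1000 with one initial case, β = 1.7 and γ = 0.5 per day, plus Gaussian noise;
- the ODE is integrated with a fixed-step fourth-order Runge–Kutta scheme instead of deSolve;
- a coarse-to-fine grid search stands in for `optim`.

Minimizing the RSS is equivalent to maximizing the Gaussian likelihood with the variance profiled out.

```python
import random

def sir_prevalence(beta, gamma, N, I0, days, substeps=4):
    """Closed SIR in fractions (dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I),
    integrated by fixed-step RK4; returns the predicted number infected, N*I,
    at days 0, 1, ..., days."""
    def grad(S, I):
        flow = beta * S * I
        return -flow, flow - gamma * I

    S, I = 1.0 - I0 / N, I0 / N
    h = 1.0 / substeps
    out = [N * I]
    for _ in range(days):
        for _ in range(substeps):
            a1, b1 = grad(S, I)
            a2, b2 = grad(S + h / 2 * a1, I + h / 2 * b1)
            a3, b3 = grad(S + h / 2 * a2, I + h / 2 * b2)
            a4, b4 = grad(S + h * a3, I + h * b3)
            S += h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
            I += h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
        out.append(N * I)
    return out

def rss(theta, obs, N, I0):
    """Residual sum of squares between observed and predicted prevalence."""
    pred = sir_prevalence(theta[0], theta[1], N, I0, len(obs) - 1)
    return sum((o - p) ** 2 for o, p in zip(obs, pred))

def grid_fit(obs, N, I0):
    """Coarse-to-fine grid search for (beta, gamma) minimizing the RSS."""
    coarse = [(round(1.0 + 0.1 * i, 10), round(0.1 + 0.1 * j, 10))
              for i in range(21) for j in range(10)]
    best = min(coarse, key=lambda th: rss(th, obs, N, I0))
    fine = [(round(best[0] + 0.01 * i, 10), round(best[1] + 0.01 * j, 10))
            for i in range(-10, 11) for j in range(-10, 11)]
    fine = [th for th in fine if th[0] > 0 and th[1] > 0]
    return min(fine, key=lambda th: rss(th, obs, N, I0))

# Synthetic daily data: hypothetical N = 1000, one initial case, Gaussian noise (sd 2).
rng = random.Random(7)
N, I0 = 1000, 1
truth = sir_prevalence(1.7, 0.5, N, I0, 14)
obs = [v + rng.gauss(0, 2) for v in truth]
beta_hat, gamma_hat = grid_fit(obs, N, I0)
R0_hat = beta_hat / gamma_hat
```

With noise this small, the estimates should land close to the generating values. A real analysis would use a proper optimizer rather than a fixed grid.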

The estimated parameters and basic reproductive ratio, R₀, are:

The R₀ estimate is comparable to the estimate we made in Chap. 3. The observed and predicted outbreaks are seemingly a good match (Fig. 8.5):

8.7 Measles

We consider, again, the measles incidence data collected by Doctors Without Borders (MSF) during the 2003–2004 outbreak in Niamey, Niger (Fig. 8.6; Sect. 3.4), but now using data at a daily resolution. We compile the daily incidence in a vector, y.

The challenge with these data is that we need to make the SEIR formulation relevant to data on incidence. The complication is that I represents prevalence (i.e., the current number of infected individuals), while incidence, y, represents the appearance of new cases (i.e., the flux) into the infected class. We can recast the SEIR model to also keep track of cumulative incidence, K. Differencing the K time series at the observation time steps then predicts incidence (y). We define the SEIRK model assuming known latent and infectious periods of 8 and 5 days, respectively.

The resultant gradient function is:
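The K bookkeeping can be sketched in Python. Several elements are assumptions of this sketch: dK∕dt = σE (the flux into I) as the definition of cumulative incidence, the function name, and the illustrative β of 0.4 per day. The latent and infectious periods (8 and 5 days), the N = 11,000 starting value, and the five initial cases follow the text of this section. Daily incidence is the first difference of N·K sampled at whole days, so the predicted cases sum exactly to the final cumulative incidence.

```python
def seirk_incidence(beta, N, days, sigma=1 / 8, gamma=1 / 5, cases0=5, substeps=10):
    """Closed SEIR in fractions plus cumulative incidence K, with dK/dt = sigma*E
    (the flux into I). Returns daily predicted cases (first differences of N*K at
    whole days) and the final state (S, E, I, R, K)."""
    def grad(x):
        S, E, I, R, K = x
        flow = beta * S * I
        return (-flow, flow - sigma * E, sigma * E - gamma * I, gamma * I, sigma * E)

    def rk4(x, h):
        k1 = grad(x)
        k2 = grad(tuple(a + h / 2 * b for a, b in zip(x, k1)))
        k3 = grad(tuple(a + h / 2 * b for a, b in zip(x, k2)))
        k4 = grad(tuple(a + h * b for a, b in zip(x, k3)))
        return tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                     for a, b1, b2, b3, b4 in zip(x, k1, k2, k3, k4))

    x = (1 - cases0 / N, 0.0, cases0 / N, 0.0, 0.0)   # five cases at declaration
    cum = [0.0]
    for _ in range(days):
        for _ in range(substeps):
            x = rk4(x, 1 / substeps)
        cum.append(N * x[4])                          # cumulative cases at day end
    incidence = [b - a for a, b in zip(cum, cum[1:])]
    return incidence, x

# Illustrative transmission rate (per day); N = 11,000 as in the text.
inc, final = seirk_incidence(beta=0.4, N=11000, days=200)
```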

We next define the likelihood function (assuming Poisson-distributed errors) for the unknown transmission rate, β, and initial susceptible number, N. According to the MSF outbreak response protocol, an outbreak is declared once five cases have been confirmed. The (unknown) initial infectious fraction is thus 5∕N.
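The Poisson log-likelihood itself is short. In the Python sketch below (function name hypothetical), `lam` would hold the predicted daily incidence from differencing K, and `y` the observed counts. `math.lgamma(y + 1)` gives log(y!). Predicted means must be strictly positive for the logarithm to be defined, and an optimizer that minimizes, such as R's optim, would be given the negative of this value.

```python
import math

def poisson_loglik(y, lam):
    """Log-likelihood of observed counts y given predicted means lam (independent
    Poisson); lgamma(y + 1) = log(y!). All means must be > 0."""
    return sum(yi * math.log(li) - li - math.lgamma(yi + 1) for yi, li in zip(y, lam))

# Made-up observed counts and predicted incidence.
ll = poisson_loglik([3, 0, 7], [2.5, 0.8, 6.1])
```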

For starting values we assume an initial susceptible number of N = 11,000 and β = 5, and optimize:

The estimated effective reproductive ratio, R_E, is comparable to the estimates obtained in Chap. 3:

8.8 Outbreak-Response Vaccination

Grais et al.’s (2008) objective in fitting a model to the Niamey outbreak data was to evaluate the effectiveness of outbreak-response vaccination (ORV) in reducing the burden of disease during an ongoing outbreak. The ORV campaign began on day 161 after the beginning of the epidemic, with a goal of vaccinating 50% of all children between the ages of 9 months and 5 years. After 10 days, almost 85,000 (57%) of this at-risk group had been vaccinated (without knowledge of previous disease or vaccination status). Assuming vaccination was at random with respect to immune status, we can write a modified SEIR function to study the problem. The vaccine coverage is a fraction (effectively a probability), so we need to translate it into a rate using the relation discussed in Sect. 3.2: r = −log(1 − p)∕D, where D is the length of the campaign. We define two functions to carry out the efficacy calculations. The sivmod-function integrates the SI-model with outbreak-response vaccination, and the retrospec-function compares predicted epidemic trajectories with and without the ORV.
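The coverage-to-rate conversion is a one-liner. In this Python sketch (function name hypothetical), 57% coverage over the 10-day campaign corresponds to a constant rate of about 0.084 per day, and exponentiating back recovers the coverage exactly.

```python
import math

def coverage_to_rate(p, D):
    """Constant per-capita rate that reaches coverage p after D time units:
    1 - exp(-r * D) = p  =>  r = -log(1 - p) / D."""
    return -math.log(1 - p) / D

r = coverage_to_rate(0.57, 10)          # 57% over the 10-day Niamey campaign
coverage_back = 1 - math.exp(-r * 10)   # recovers 0.57
```

A naive rate of p∕D = 0.057 per day would instead imply 1 − exp(−0.57) ≈ 43% coverage over the same 10 days.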

We will discuss S3-class programming more formally in Sect. 12.1. As a preview, however, we define a plot.retro-function for objects of class retro, the class label given to the list returned by the retrospec-function:

Suppose our model is correct and the vaccine elicits protection either instantaneously or after 2 or 4 weeks (for the antibody response to mature). The ORV is then predicted to have reduced the epidemic by 25%, 15%, and 8%, respectively:

We can plot the red1-object to inspect the predicted epidemic curve with and without outbreak-response vaccination (Fig. 8.7). The key insight is that for ORV to work, it needs to be implemented early (Grais et al. 2008).

8.9 ShinyApp

The epimdr-package contains the orv.app, which provides a more detailed sensitivity analysis of outbreak-response vaccination scenarios. The app can be launched from R through:

Notes

1.
If in a hurry we can ignore the constant and minimize \(\frac{n} {2} \log (RSS)\) because it is the relative likelihood that matters.
2.
Bolker (2008) is an excellent broad discussion on estimation for ecologically realistic models using a variety of methods.
3.
If the models are non-nested, formal tests are not available, but information-theoretic rankings of models using AIC, BIC, AIC weights, etc. are useful.

References

Aitkin, M. A., Francis, B., & Hinde, J. (2005). Statistical modelling in GLIM 4 (Vol. 32). New York, NY: Oxford University Press.
Bolker, B. M. (2008). Ecological models and data in R. Princeton: Princeton University Press.
Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25), 2340–2361.
Gillespie, D. T. (2001). Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics, 115(4), 1716–1733.
Grais, R. F., Conlan, A. J. K., Ferrari, M. J., Djibo, A., Le Menach, A., Bjørnstad, O. N., et al. (2008). Time is of the essence: Exploring a measles outbreak response vaccination in Niamey, Niger. Journal of the Royal Society Interface, 5(18), 67–74.
King, A. A., Ionides, E. L., Pascual, M., & Bouma, M. J. (2008). Inapparent infections and cholera dynamics. Nature, 454(7206), 877–880.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models: Vol. 37. Monographs on statistics and applied probability (2nd ed.). London: Chapman and Hall.


Author information

Authors and Affiliations

Center for Infectious Disease Dynamics, Pennsylvania State University, University Park, PA, USA
Ottar N. Bjørnstad



About this chapter

Cite this chapter

Bjørnstad, O.N. (2018). Trajectory Matching. In: Epidemics. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-97487-3_8


DOI: https://doi.org/10.1007/978-3-319-97487-3_8
Published: 31 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97486-6
Online ISBN: 978-3-319-97487-3
eBook Packages: Mathematics and Statistics (R0)
