1 Introduction

1.1 Problem Statement and Description of Data

The mountain pine beetle Dendroctonus ponderosae (MPB) is an insect that burrows, resides, and reproduces in mature pine stands. Native to the forests of the western United States, MPBs have historically played an important role in forest health by attacking weakened trees, thus speeding the development of a younger, healthier forest. However, the recent onset of warm summers and dry conditions has created an epidemic (Williams and Liebhold 2002). In particular, multiple MPB outbreaks have caused widespread tree mortality in conifer forests, including ponderosa and lodgepole pine stands, since the early 1990s (Raffa et al. 2008).

To estimate the extent of MPB damage over a region, ecologists often rely on annual aerial detection surveys (ADS) to analyze spatial and temporal patterns of the damage (Harris et al. 2002, 2003). The primary motivation for this article comes from one such ADS conducted by the Colorado State Forest Service (CSFS) for the Front Range mountains in Colorado during the years 2001–2010. Each year, surveyors flew over the survey area and digitally drew regions on a map to denote damaged areas. The ADS extends from the southern Rocky Mountains in Colorado to southern Wyoming and the Black Hills of South Dakota, but we focus on the North Central Rocky Mountains in Colorado because this area has more consistent pine tree cover. The gridded area in the left panel of Fig. 1 displays the region of interest for this study.

When considered temporally, the ADS data represent a cumulative summary of damaged areas. Notationally, let \(\fancyscript{R}_{it}\) be the \(i\)th damaged region drawn in year \(t\); for example, \(\fancyscript{R}_{it}\) could be one of the highlighted areas in the right panel of Fig. 1. The damaged areas in year \(t\) are then the union of all damaged regions drawn in years up to and including year \(t\). Mathematically, the damaged region in year \(t\) is \(\fancyscript{D}_t = \cup _{t'\le t}\cup _{i=1}^{R_{t'}}\fancyscript{R}_{it'}\), where \(R_t\) is the total number of damaged regions drawn in year \(t\).

Fig. 1

(left) Elevation map of Colorado with overlaid spatial grid of study region and (right) example of Colorado counties impacted by MPB damage contained within the spatial grid of the study region from 2002 to 2010.

Statistically summarizing and modeling ADS data is an interesting challenge, and doing so can help researchers make informed decisions about ground management actions and aerial surveying based on the probable damage in a particular area. Zhu et al. (2005, 2008) and Zheng and Zhu (2008) consider data aggregated to a regular lattice where each grid cell is assigned a binary response (infected or not infected) and use autologistic models to capture the spatio-temporal structure. For our ADS data, however, this type of aggregation would result in information loss: classifying a grid cell as “infected” treats the whole cell as infected when in reality only a portion of it may have been affected. Proportions of damaged area capture the nature of the ADS data better than binary responses.

To make the ADS data more amenable to statistical modeling, and following previous studies of MPB damage, we aggregated the ADS data to a spatial grid with 42 rows and 55 columns (\(G = 2310\) total grid cells). On this grid, each cell represents an area of roughly 16 km\(^2\) (about 4 km on a side). This grid size was chosen for the following reasons. First, a resolution of \(4\) km is fine enough to capture landscape variability as well as climate variation between cells. Second, it aligns with the resolution of the meteorological dataset from the Parameter-elevation Regressions on Independent Slopes Model (PRISM) (Daly et al. 2002) used in the analysis below. The left panel of Fig. 1 displays the spatial grid.

To minimize information loss due to aggregation, we calculated the cumulative damage (rather than a binary summary) for each grid cell in each year \(t=1,\ldots ,10\), where \(t=1\) corresponds to the year \(2001\). That is, for each year, we calculated the proportion of a grid cell that fell within a region drawn as damaged in that year or any previous year. More concretely, let \(\widetilde{y}_{gt}\) represent the cumulative MPB damage in grid cell \(g\) up to year \(t\). The \(\widetilde{y}_{gt}\) are calculated as

$$\begin{aligned} \widetilde{y}_{gt}&= \frac{1}{|\fancyscript{G}_g|}\int _{\fancyscript{G}_g}1\!\!1_{\{{\mathbf {s}}\in \fancyscript{D}_t\}}d{\mathbf {s}} \end{aligned}$$
(1.1)

where \(\fancyscript{G}_g\) represents the spatial region of grid cell \(g\), \(1\!\!1_{\{\cdot \}}\) is an indicator function, and \(\fancyscript{D}_t = \cup _{t'\le t}\cup _{i=1}^{R_{t'}}\fancyscript{R}_{it'}\) is the damaged region up to year \(t\). Some of the observed \(\widetilde{y}_{gt}\) are displayed in Fig. 2. Note that \(\widetilde{y}_{gt} \in [0,1)\); the value \(1\) is excluded because grid cells never reach a “completely damaged” state. Furthermore, the \(\widetilde{y}_{gt}\) are monotonically nondecreasing in \(t\) because, for the ADS data, the damaged region only grows (once damaged, a portion of a grid cell remains damaged).
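As a rough illustration of (1.1), the integral can be approximated by rasterizing a grid cell into fine sub-pixels and counting the share that falls inside \(\fancyscript{D}_t\). The sketch below assumes the drawn regions for one cell are already available as hypothetical per-year boolean masks (the function name and toy masks are illustrative, not part of the original processing). The running logical OR implements the union over \(t'\le t\), which is also why the result can never decrease over time.

```python
import numpy as np

def cumulative_damage(masks):
    """Approximate Eq. (1.1) for one grid cell.

    masks: boolean array of shape (T, H, W); masks[t] marks the sub-pixels of the
    cell covered by any region drawn in year t (a hypothetical rasterization).
    Returns the fraction of the cell inside D_t for each year t.
    """
    masks = np.asarray(masks, dtype=bool)
    # D_t is the union over all years t' <= t: a running logical OR along time.
    union_so_far = np.logical_or.accumulate(masks, axis=0)
    # The normalized integral of the indicator is the share of covered sub-pixels.
    return union_so_far.mean(axis=(1, 2))

T, H, W = 3, 10, 10
masks = np.zeros((T, H, W), dtype=bool)
masks[0, :2, :] = True    # year 1: rows 0-1 drawn as damaged
masks[1, 1:4, :] = True   # year 2: rows 1-3 drawn (row 1 overlaps year 1)
y_tilde = cumulative_damage(masks)   # year 3: nothing new drawn
print(y_tilde)
```

Overlapping drawings are counted once, so the year-2 value is 0.4 rather than 0.5.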

Fig. 2

Observed \(\widetilde{y}_{gt}\) for (a) 2001 (\(t=1\)), (b) 2005 (\(t=5\)), and (c) 2010 (\(t=10\)). Note the strong spatial correlation between sites and the monotonically increasing proportion of damaged area.

1.2 Article Overview and Outline

The goal of this work is to develop a modeling strategy for \(\widetilde{y}_{gt}\) to aid in understanding and predicting MPB damage, with the ultimate aim of informing intervention strategies to prevent further damage. Because the \(\widetilde{y}_{gt}\) lie in \([0,1)\) and are monotonic, the beta regression models advocated by Kieschnick and McCullough (2003) and Ferrari and Cribari-Neto (2004) are not a viable modeling strategy, as they are defined only on the open interval \((0,1)\). More appropriate is the work of Ospina and Ferrari (2010, 2012), Wieczorek and Hawala (2011), and Wieczorek et al. (2012), who develop zero-, one-, and zero-and-one-augmented beta regression models. Perhaps most pertinent to the data described here is the work of Hatfield et al. (2012), who develop a zero-augmented beta regression model with individual-specific latent trajectories to explain the probability of a zero outcome and the mean of a non-zero outcome. None of these approaches for modeling random variables on \([0,1)\), however, accounts for the monotonicity constraints that must be imposed on the \(\widetilde{y}_{gt}\).

In this article, we develop a model to explain and predict both the occurrence of pine beetle damage (a binary outcome) and, given that damage occurred, the percent of the region damaged. We use a stick-breaking representation to account for the monotonicity constraints on the cumulative damage (\(\widetilde{y}_{gt}\)) over time. Specifically, using the stick-breaking representation, the \(\widetilde{y}_{gt}\) are expressed in terms of non-monotonic random variables (say, \(y_{gt}\)) with support on \([0,1)\), and a zero-augmented spatio-temporal beta regression model is used to model the \(y_{gt}\). Following Hatfield et al. (2012), our model uses a beta regression model for proportions on \((0,1)\) and a binary component to model the probability of no MPB damage. Our contribution, beyond the stick-breaking representation, is to add spatial and temporal terms that account for the spatial and temporal variation across the Colorado region and exploit the resulting correlations to aid in prediction.

Temporal evolution of the pine beetle damage is captured using a dynamic linear model where both the probability and extent of damage depend on the percent of damage incurred in neighboring regions in the previous time period. The low rank conditional autoregressive (CAR) models of Hughes and Haran (2013) are used to capture any spatial information not modeled by spatially varying covariates (e.g., slope, elevation, etc.).

In Sect. 2, we use a stick-breaking representation to model the cumulative damages to enforce monotonicity, discuss the prior assumptions made for each parameter as well as outline how to perform statistical inference and prediction. Section 3 shows results of fitting the model to the MPB dataset while Sect. 4 concludes and discusses opportunities for new statistical and applied research.

2 A Zero-Augmented Spatio-Temporal Model for Mountain Pine Beetle Damage

2.1 Statistical Model

Let \(g= 1, \ldots , G = 2310\) denote the grid cells of the \(42\times 55\) spatial lattice shown in the left panel of Fig. 1 and let \(t=1, \ldots , 10\) denote the year where \(t=1\) refers to the year 2001. Let \({\mathbf {x}}'_{gt} = (x_{gt1},\ldots ,x_{gtP})\) denote a vector of \(P\) covariates (e.g., elevation and precipitation; see Sect. 3). To ensure monotonicity of the cumulative damage to a grid cell, let

$$\begin{aligned} \widetilde{y}_{gt}&= \sum _{t'\le t}\left[ y_{gt'}\prod _{\{i: i<t'\}}(1-y_{gi})\right] \end{aligned}$$
(2.1)

where \(y_{gt} \in [0,1)\) are non-monotonic. The representation of \(\widetilde{y}_{gt}\) in (2.1) follows the stick-breaking representation of the Dirichlet process by Sethuraman (1994). Intuitively, the \(y_{gt}\) represent the amount of MPB damage at time \(t\) to the undamaged portion of grid cell \(g\). For example, at time \(t=1\), \(100\times y_{g1}\,\%\) of the grid cell is damaged leaving \(100\times (1-y_{g1})\,\%\) undamaged. At time \(t=2\), \(100 \times y_{g2}\,\%\) of the undamaged portion, \((1-y_{g1})\), of the grid cell is damaged with a cumulative damage of \(\widetilde{y}_{g2} = y_{g1} + y_{g2}(1-y_{g1})\) and \(100\times (1-y_{g1})\times (1-y_{g2})\,\%\) undamaged. We emphasize that \(y_{gt} = 0\) implies that there was no further damage at time \(t\).
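The stick-breaking map in (2.1) is short to compute. The sketch below (function name hypothetical) takes the yearly increments \(y_{g1},\dots ,y_{gT}\) for a single grid cell and reproduces the two-year example above. The increments are made-up values and include a zero year with no further damage.

```python
import numpy as np

def stick_break(y):
    """Cumulative damage y~_{gt} from yearly increments y_{gt} via Eq. (2.1)."""
    y = np.asarray(y, dtype=float)
    # Undamaged share left before year t: prod_{i<t} (1 - y_i), which is 1 before year 1.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - y)[:-1]))
    # Year t removes the fraction y_t of whatever was still undamaged.
    return np.cumsum(y * remaining)

y = [0.2, 0.5, 0.0, 0.25]   # hypothetical increments for one cell
y_tilde = stick_break(y)
print(y_tilde)              # cumulative shares 0.2, 0.6, 0.6, 0.7
```

In year 2, for instance, \(0.2 + 0.5\times 0.8 = 0.6\), matching \(\widetilde{y}_{g2} = y_{g1} + y_{g2}(1-y_{g1})\). The zero in year 3 leaves the cumulative damage unchanged.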

Notice that (2.1) defines a one-to-one relationship between \(\{\widetilde{y}_{gt}\}_t\) and \(\{y_{gt}\}_t\). Hence, under the stick-breaking representation, a model for \(\widetilde{y}_{gt}\) is induced by modeling \(y_{gt}\). We assume \(y_{gt} = (1-z_{gt})b_{gt}\), where \(z_{gt} \in \{0,1\}\) is a Bernoulli random variable with \(\mathbb {P}\text {r}(z_{gt} = 1) = \delta _{gt} \in (0,1)\) and \(b_{gt} \in (0,1)\). Intuitively, \(z_{gt}\) is an indicator of no damage, \(\delta _{gt}\) is the probability that there was no damage, and \(b_{gt}\) is the amount of damage at time \(t\) conditional on the event that there was damage (\(z_{gt} = 0\)).
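To see why the correspondence is one-to-one, note that the sum in (2.1) telescopes to \(1-\widetilde{y}_{gt} = \prod _{i\le t}(1-y_{gi})\). Hence \(y_{gt} = (\widetilde{y}_{gt}-\widetilde{y}_{g(t-1)})/(1-\widetilde{y}_{g(t-1)})\) with \(\widetilde{y}_{g0}=0\), and the denominator is positive because \(\widetilde{y}_{gt}<1\). In particular, \(y_{gt}=0\) (equivalently \(z_{gt}=1\)) exactly when \(\widetilde{y}_{gt}=\widetilde{y}_{g(t-1)}\). Below is a minimal round-trip check with a hypothetical function name and made-up values.

```python
import numpy as np

def increments_from_cumulative(y_tilde):
    """Invert Eq. (2.1): y_t = (y~_t - y~_{t-1}) / (1 - y~_{t-1}), with y~_0 = 0."""
    y_tilde = np.asarray(y_tilde, dtype=float)
    prev = np.concatenate(([0.0], y_tilde[:-1]))
    # Safe division: y~ lies in [0, 1), so 1 - prev > 0.
    return (y_tilde - prev) / (1.0 - prev)

# Forward map in telescoped form: 1 - y~_t = prod_{i<=t} (1 - y_i).
y = np.array([0.2, 0.5, 0.0, 0.25])
y_tilde = 1.0 - np.cumprod(1.0 - y)
y_back = increments_from_cumulative(y_tilde)
print(y_back)
```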

To model the probability of no damage, we assume,

$$\begin{aligned} \text {logit}(\delta _{gt}) = \alpha _\delta + {\mathbf {x}}'_{gt}\varvec{\beta }_\delta +\eta _{\delta g}+\phi _\delta \sum _{g' \in \fancyscript{N}_g} d^{\phi _\delta }_{g'(t-1)}+\theta _\delta d^{\theta _\delta }_{g(t-1)} \end{aligned}$$
(2.2)

where \(\text {logit}(\delta _{gt}) = \log (\delta _{gt}/(1-\delta _{gt}))\), \(\alpha _\delta \) is an intercept, \(\varvec{\beta }_\delta \) is a vector of coefficients associated with \({\mathbf {x}}_{gt}\), \(\eta _{\delta g}\) is a spatially correlated random effect for grid cell \(g\) designed to capture the effect of unmeasured, spatially correlated covariates associated with grid cell \(g\), \(\phi _\delta \) is the temporal effect of damage to the neighbors (\(\fancyscript{N}_g\)) of grid cell \(g\), and \(\theta _\delta \) is the temporal effect of damage to grid cell \(g\).
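In matrix form, the neighbor sum in (2.2) is \(\mathbf {A}\mathbf {d}_{t-1}\), where \(\mathbf {A}\) is the binary adjacency matrix of the lattice (cells that share an edge). The sketch below evaluates \(\delta _{gt}\) on a toy \(3\times 3\) lattice. The grid size, the single zero-valued covariate, the function names, and all parameter values are hypothetical. They are chosen only to show how damage in a neighbor lowers \(\delta _{gt}\) when \(\phi _\delta <0\). For simplicity, the same damage vector is used for both \(d^{\phi _\delta }\) and \(d^{\theta _\delta }\).

```python
import numpy as np

def rook_adjacency(n_rows, n_cols):
    """Binary adjacency for a row-major lattice: cells are neighbors if they share an edge."""
    def path(n):
        return np.eye(n, k=1) + np.eye(n, k=-1)
    # Horizontal neighbors within each row, plus vertical neighbors between rows.
    return np.kron(np.eye(n_rows), path(n_cols)) + np.kron(path(n_rows), np.eye(n_cols))

def prob_no_damage(alpha, X_t, beta, eta, phi, theta, A, d_phi_prev, d_theta_prev):
    """delta_{gt} from Eq. (2.2) for every cell g at a single time t."""
    neighbor_sum = A @ d_phi_prev   # sum over g' in N_g of d^phi_{g'(t-1)}
    lin = alpha + X_t @ beta + eta + phi * neighbor_sum + theta * d_theta_prev
    return 1.0 / (1.0 + np.exp(-lin))   # inverse logit

A = rook_adjacency(3, 3)
G = A.shape[0]
d_prev = np.zeros(G)
d_prev[0] = 0.4   # only the corner cell was damaged at t - 1
delta = prob_no_damage(alpha=2.0, X_t=np.zeros((G, 1)), beta=np.zeros(1),
                       eta=np.zeros(G), phi=-1.0, theta=-0.5,
                       A=A, d_phi_prev=d_prev, d_theta_prev=d_prev)
print(delta.round(3))
```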

In specifying the model, we use \(d^{\phi _\delta }_{gt}\) and \(d^{\theta _\delta }_{gt}\) as measures of damage to grid cell \(g\) at time \(t\) and allow each to take the value of either \(y_{gt}\) or \(\widetilde{y}_{gt}\). Which measure of damage (\(y\) or \(\widetilde{y}\)) to use in (2.2) to capture the temporal dynamics of MPB damage is not entirely clear. On one hand, defining \(d^{\phi _\delta }_{gt} = \widetilde{y}_{gt}\) may be more appropriate because MPBs tend to migrate to a neighboring grid cell only after consuming the resources within their current grid cell. On the other hand, defining \(d^{\phi _\delta }_{gt} = y_{gt}\) may be more appropriate because a large value of \(y_{gt}\) could indicate a high MPB population that is likely to spread to neighboring grid cells. Using both, however, is inappropriate because there is a one-to-one correspondence between \(y\) and \(\widetilde{y}\). We explore which measure of damage to use via model selection in Sect. 3.1.

For non-zero damage, we assume \(b_{gt} \sim \fancyscript{B}(\mu _{gt},\kappa _{gt})\) where \(\fancyscript{B}(\mu _{gt},\kappa _{gt})\) is the beta distribution with mean \(\mu _{gt} \in (0,1)\) and precision parameter \(\kappa _{gt} > 0\). We use the parameterization advocated by Ospina and Ferrari (2012) so that the density function of \(b_{gt}\) is,

$$\begin{aligned} f(b_{gt} \mid \mu _{gt}, \kappa _{gt}) = \frac{\Gamma (\kappa _{gt})}{\Gamma (\mu _{gt}\kappa _{gt})\Gamma ((1-\mu _{gt})\kappa _{gt})}b_{gt}^{\mu _{gt}\kappa _{gt}-1}(1-b_{gt})^{(1-\mu _{gt})\kappa _{gt}-1} \end{aligned}$$

with \(\mathbb {E}(b_{gt}) = \mu _{gt}\) and \(\mathbb {V}\text {ar}(b_{gt}) = \mu _{gt}(1-\mu _{gt})/(\kappa _{gt}+1)\). We model \(\mu _{gt}\) and \(\kappa _{gt}\) in the same way as \(\delta _{gt}\) with,
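The mean/precision form corresponds to the usual beta shapes \(a=\mu _{gt}\kappa _{gt}\) and \(b=(1-\mu _{gt})\kappa _{gt}\). Substituting these into the standard beta moments gives \(\mu _{gt}\) and \(\mu _{gt}(1-\mu _{gt})/(\kappa _{gt}+1)\). The sketch below (hypothetical helper names and values) checks the density above numerically and compares simulated moments to these formulas.

```python
import math
import numpy as np

def beta_shapes(mu, kappa):
    """Map the mean/precision parameters (mu, kappa) to the usual shapes (a, b)."""
    return mu * kappa, (1.0 - mu) * kappa

def beta_logpdf(b, mu, kappa):
    """log f(b | mu, kappa) for the density displayed above (scalar mu, kappa)."""
    a, c = beta_shapes(mu, kappa)
    return (math.lgamma(kappa) - math.lgamma(a) - math.lgamma(c)
            + (a - 1.0) * np.log(b) + (c - 1.0) * np.log1p(-b))

mu, kappa = 0.3, 8.0   # hypothetical values
grid = (np.arange(100_000) + 0.5) / 100_000
total_mass = np.exp(beta_logpdf(grid, mu, kappa)).mean()   # midpoint rule on (0, 1)

rng = np.random.default_rng(0)
draws = rng.beta(*beta_shapes(mu, kappa), size=200_000)
print(total_mass, draws.mean(), draws.var(), mu * (1 - mu) / (kappa + 1))
```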

$$\begin{aligned} \text {logit}(\mu _{gt})&= \alpha _\mu +{\mathbf {x}}'_{gt}\varvec{\beta }_\mu +\eta _{\mu g}+\phi _\mu \sum _{g' \in \fancyscript{N}_g} d^{\phi _\mu }_{g'(t-1)}+\theta _\mu d^{\theta _\mu }_{g(t-1)} \end{aligned}$$
(2.3)
$$\begin{aligned} \log (\kappa _{gt})&= \alpha _\kappa +{\mathbf {x}}'_{gt}\varvec{\beta }_\kappa +\eta _{\kappa g}+\phi _\kappa \sum _{g' \in \fancyscript{N}_g} d^{\phi _\kappa }_{g'(t-1)}+\theta _\kappa d^{\theta _\kappa }_{g(t-1)} \end{aligned}$$
(2.4)

where, as in (2.2), the \(\varvec{\beta }\) parameters represent main effects for the covariates \({\mathbf {x}}_{gt}\), the \(\eta \) parameters represent spatially correlated random effects designed to capture the effect of unmeasured covariates, the \(\phi \) parameters capture possible temporal effects of neighboring grid cells and the \(\theta \) parameters capture the temporal dynamics of the grid cell itself.

2.2 Priors

We use vaguely informative priors for the \(\phi \) and \(\theta \) parameters. Specifically, we assume that all of the \(\phi \) and \(\theta \) parameters are a priori independent \(\mathcal {N}(0,10^2)\) random variates. Vague \(\mathcal {N}(0,10^2)\) priors are used for each of the \(\alpha \) parameters.

For the \(\varvec{\beta }\) parameters, we wish to perform variable selection by learning which covariates are important in explaining MPB damage and shrinking the remaining coefficients. Because we do not know a priori which variables to include in our model, we use the Bayesian LASSO prior of Park and Casella (2008) for \(\varvec{\beta }_\delta \), \(\varvec{\beta }_{\mu }\), and \(\varvec{\beta }_{\kappa }\), which shrinks the coefficients toward zero. Specifically, for \(\varvec{\beta }_{\delta }\) we assume,

$$\begin{aligned} \varvec{\beta }_{\delta }&\sim \mathcal {N}_p\left( {\mathbf {0}},{\mathbf {D}}_{\delta \tau }\right) , \\ {\mathbf {D}}_{\delta \tau }&= \text {diag}\left( \tau _{\delta 1}^2,\dots ,\tau _{\delta p}^2\right) , \\ \tau ^2_{\delta 1},\dots ,\tau ^2_{\delta p}&\sim \prod _{i=1}^p \frac{\lambda ^2_\delta }{2} \exp \left\{ \frac{-\lambda _\delta ^2\tau ^2_{\delta i}}{2}\right\} d\tau ^2_{\delta i}, \\ f\left( \lambda ^2_\delta \right)&\propto \left( \lambda ^2_\delta \right) \exp \left\{ -2\lambda _\delta ^2\right\} , \end{aligned}$$

where \(f(\cdot )\) denotes a density function. Equivalent priors were used for \(\varvec{\beta }_\mu \) and \(\varvec{\beta }_{\kappa }\).
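The hierarchy above is a scale mixture of normals. For a fixed \(\lambda _\delta \), each \(\tau ^2_{\delta i}\) is exponential with rate \(\lambda ^2_\delta /2\) (mean \(2/\lambda ^2_\delta \)), so each \(\beta _{\delta i}\) is marginally Laplace with rate \(\lambda _\delta \) and variance \(2/\lambda ^2_\delta \). The hyperprior \(f(\lambda ^2_\delta )\propto \lambda ^2_\delta e^{-2\lambda ^2_\delta }\) is a gamma density with shape 2 and rate 2. The sketch below checks these facts by simulation. The value \(\lambda ^2_\delta = 4\) and the helper name are arbitrary choices for illustration.

```python
import numpy as np

def draw_beta_given_lambda(lam2, p, size, rng):
    """Draw beta from the normal/exponential scale mixture for a fixed lambda^2."""
    # tau_i^2 has density (lam2/2) exp(-lam2 tau^2 / 2): exponential with mean 2/lam2.
    tau2 = rng.exponential(scale=2.0 / lam2, size=(size, p))
    # beta_i | tau_i^2 ~ N(0, tau_i^2)
    return rng.normal(0.0, np.sqrt(tau2))

rng = np.random.default_rng(1)
beta_draws = draw_beta_given_lambda(lam2=4.0, p=1, size=200_000, rng=rng)
print(beta_draws.var(), np.abs(beta_draws).mean())   # near 2/lam2 = 0.5 and 1/lam = 0.5

# f(lam2) proportional to lam2 * exp(-2 lam2) is Gamma(shape=2, rate=2); NumPy uses scale = 1/rate.
lam2_draws = rng.gamma(shape=2.0, scale=0.5, size=100_000)
print(lam2_draws.mean())   # near shape/rate = 1
```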

In total, there are \(3\times 42\times 55 = 6930\) \(\eta \) parameters, which poses a computational challenge. To help alleviate this problem, we use the sparse reparameterization of a conditional autoregressive model with dimension reduction developed by Hughes and Haran (2013). Specifically, let \({\mathbf {A}}\) be the \(2310\times 2310\) adjacency matrix of the grid cells, with diag\(({\mathbf {A}})=\varvec{0}\), \(\varvec{A}_{ij} = 1\) if cells \(i\) and \(j\) are neighbors (share an edge), and \(\varvec{A}_{ij} = 0\) otherwise.

Let \(\varvec{M}\) be the \(2310\times q\) matrix whose columns are the first \(q\) eigenvectors (ordered by decreasing eigenvalue) of the Moran operator \(\varvec{P^{\perp }AP^{\perp }}\). Here \(\varvec{P^{\perp } = I-X(X'X)^{-1}X'}\) is the projection onto the orthogonal complement of the column space of \({\mathbf {X}}\), and \({\mathbf {X}}\) is the \(2310\times P\) matrix of time-constant covariates. We set \(\eta _g = {\mathbf {m}}_{(g)}'\varvec{\eta }^\star \), where \({\mathbf {m}}_{(g)}\) is the \(g^{th}\) row of \({\mathbf {M}}\) and \(\varvec{\eta }^\star = (\eta ^\star _1,\dots ,\eta ^\star _q)'\) is a vector of coefficients. For \(\varvec{\eta }_{\delta }^\star , \varvec{\eta }_{\mu }^\star \), and \(\varvec{\eta }_{\kappa }^\star \) we use the prior described in Hughes and Haran (2013), which is derived from the intrinsic conditional autoregressive (ICAR) model. Specifically, the prior for \(\varvec{\eta }_{\delta }^\star \) is

$$\begin{aligned} p(\varvec{\eta _\delta }^\star | \tau ) \propto \tau ^{q/2}\exp \left( -\frac{\tau }{2}\varvec{\eta _\delta }^{\star '} \varvec{Q}_s \varvec{\eta _\delta }^\star \right) \end{aligned}$$

where \(\tau \) is a smoothing parameter, \(\varvec{Q}_s=\varvec{M'QM}\), \(\varvec{Q} = \text {diag}(\varvec{A1})-\varvec{A}\), and \(\varvec{1}\) is a vector of 1s. As shown in Hughes and Haran (2013), this sparse parameterization for \(\{\eta _g\}\) has two effects. First, it alleviates confounding between the main effects (the \(\varvec{\beta }\)’s) and the spatial random effects by restricting spatial smoothing to the orthogonal complement of the column space of \(\varvec{X}\). Second, it reduces the dimension of \((\eta _1,\ldots ,\eta _{2310})'\) from \(2310\) to \(q\). Following Hughes and Haran (2013) and preliminary model fits, we chose \(q=250\) (\(\approx \)10 % of the total number of spatial random effects), as the estimates of the \(\eta \) parameters changed very little for \(q > 250\). Only the time-constant covariates were used to construct the Moran basis \(\varvec{M}\). We explored a time-varying basis built from the time-varying covariates but found that the basis functions changed very little over time. Although including the time-varying covariates did not appear to affect this analysis, this may not be the case in all applications.
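The construction of \(\varvec{M}\), \(\varvec{Q}\), and \(\varvec{Q}_s\) can be sketched in a few lines on a small lattice. The \(6\times 7\) grid, the covariate (row index standing in for a landscape variable), \(q=8\), and the helper names are all hypothetical stand-ins for the \(42\times 55\) setting. The sketch follows the description above. Because the retained eigenvectors have nonzero eigenvalues, they lie in the range of \(\varvec{P^{\perp }}\) and are therefore orthogonal to the columns of \(\varvec{X}\).

```python
import numpy as np

def rook_adjacency(n_rows, n_cols):
    """Binary adjacency for a row-major lattice: cells are neighbors if they share an edge."""
    def path(n):
        return np.eye(n, k=1) + np.eye(n, k=-1)
    return np.kron(np.eye(n_rows), path(n_cols)) + np.kron(path(n_rows), np.eye(n_cols))

def moran_basis(X, A, q):
    """Columns: eigenvectors of P_perp A P_perp for the q largest eigenvalues."""
    n = X.shape[0]
    P_perp = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    vals, vecs = np.linalg.eigh(P_perp @ A @ P_perp)   # eigh returns ascending order
    top = np.argsort(vals)[::-1][:q]
    return vecs[:, top]

def icar_log_prior(eta_star, tau, Q_s):
    """log p(eta* | tau) up to an additive constant."""
    q = eta_star.shape[0]
    return 0.5 * q * np.log(tau) - 0.5 * tau * eta_star @ Q_s @ eta_star

n_rows, n_cols, q = 6, 7, 8   # toy lattice, not the 42 x 55 grid
A = rook_adjacency(n_rows, n_cols)
rows = np.repeat(np.arange(n_rows), n_cols).astype(float)
X = np.column_stack([np.ones(n_rows * n_cols), rows])   # intercept + hypothetical covariate
M = moran_basis(X, A, q)
Q = np.diag(A.sum(axis=1)) - A   # Q = diag(A 1) - A
Q_s = M.T @ Q @ M                # q x q prior precision for eta*
print(M.shape, Q_s.shape)
```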

2.3 Statistical Inference and Prediction

Let \(\varvec{\theta }_z = (\alpha _\delta ,\varvec{\beta }_\delta ,\varvec{\eta }^\star _\delta ,\phi _\delta ,\theta _\delta ,\{\tau ^2_{\delta i}\}_i,\lambda _\delta )'\) and \(\varvec{\theta }_{b} = (\alpha _\mu ,\alpha _\kappa ,\varvec{\beta }_\mu ,\varvec{\beta }_{\kappa },\varvec{\eta }^\star _\mu ,\varvec{\eta }^\star _\kappa ,\phi _\mu ,\phi _\kappa ,\theta _\mu ,\theta _\kappa ,\{\tau ^2_{\mu i}\}_i,\{\tau ^2_{\kappa i}\}_i,\lambda _\mu ,\lambda _\kappa )'\) denote the vectors of model parameters associated with \(\{z_{gt}\}\) (the zero-augmented piece) and \(\{b_{gt}\}\) (the beta piece), respectively. Furthermore, let \(\fancyscript{Z}_0 = \{(g,t): z_{gt} = 0\}\) denote the set of indices where \(y_{gt} \ne 0\). Given the stick-breaking weights \(\{y_{gt} = (1-z_{gt})b_{gt}\}_{gt}\), we have the following log-likelihood functions,

$$\begin{aligned} \fancyscript{L}(\varvec{\theta }_z)&= \sum _{g,t}\left[ z_{gt}\log (\delta _{gt}) + (1-z_{gt})\log (1-\delta _{gt})\right] , \end{aligned}$$
(2.5)
$$\begin{aligned} \fancyscript{L}(\varvec{\theta }_b)&= \sum _{(g,t) \in \fancyscript{Z}_0}\log \left[ f(b_{gt}\mid \mu _{gt},\kappa _{gt})\right] , \end{aligned}$$
(2.6)

where the forms for \(\delta _{gt}\), \(\mu _{gt}\), and \(\kappa _{gt}\) are given in Eqs. (2.2), (2.3), and (2.4), respectively. The joint log-likelihood for \((\varvec{\theta }_z,\varvec{\theta }_b)\) is specified similarly to Ospina and Ferrari (2012) and is given by \(\fancyscript{L}(\varvec{\theta }_z,\varvec{\theta }_b) = \fancyscript{L}(\varvec{\theta }_z) + \fancyscript{L}(\varvec{\theta }_b)\). Given these simple forms for \(\fancyscript{L}(\varvec{\theta }_z)\) and \(\fancyscript{L}(\varvec{\theta }_b)\), we estimate \(\varvec{\theta }_z\) and \(\varvec{\theta }_b\) by drawing from their respective posterior distributions using a Gibbs sampler. At each iteration, we first draw \(\varvec{\theta }_z \sim f(\varvec{\theta }_z \mid \varvec{\theta }_b,\{y_{gt}\})\) and then \(\varvec{\theta }_b \sim f(\varvec{\theta }_b\mid \varvec{\theta }_z,\{y_{gt}\})\).
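Because \(b_{gt}\in (0,1)\), the indicator \(z_{gt}\) is determined by the data: \(z_{gt}=1\) exactly when \(y_{gt}=0\). The two log-likelihoods (2.5) and (2.6) can therefore be evaluated directly from the observed \(y_{gt}\). The sketch below uses hypothetical function names and toy values. It flattens the \((g,t)\) index into one axis and assumes \(\delta _{gt}\), \(\mu _{gt}\), and \(\kappa _{gt}\) have already been computed from (2.2) to (2.4).

```python
import numpy as np
from math import lgamma

_lgamma = np.vectorize(lgamma, otypes=[float])   # elementwise log-gamma

def loglik_z(y, delta):
    """Eq. (2.5): Bernoulli log-likelihood, with z_gt = 1 exactly when y_gt = 0."""
    z = (y == 0).astype(float)
    return np.sum(z * np.log(delta) + (1.0 - z) * np.log1p(-delta))

def loglik_b(y, mu, kappa):
    """Eq. (2.6): beta log-density summed over Z_0, the entries with y_gt > 0."""
    keep = y > 0
    b, m, k = y[keep], mu[keep], kappa[keep]
    a, c = m * k, (1.0 - m) * k
    return np.sum(_lgamma(k) - _lgamma(a) - _lgamma(c)
                  + (a - 1.0) * np.log(b) + (c - 1.0) * np.log1p(-b))

# Hypothetical flattened (g, t) values; mu = 0.5 and kappa = 2 give a Uniform(0, 1) beta.
y = np.array([0.0, 0.5, 0.0, 0.2])
delta = np.full(4, 0.6)
mu, kappa = np.full(4, 0.5), np.full(4, 2.0)
joint = loglik_z(y, delta) + loglik_b(y, mu, kappa)
print(joint)
```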

The complete conditional distributions \(f(\varvec{\theta }_z \mid \varvec{\theta }_b,\{y_{gt}\})\) and \(f(\varvec{\theta }_b\mid \varvec{\theta }_z,\{y_{gt}\})\) are not available in closed form. We therefore use an adaptive Metropolis algorithm based on Haario et al. (2001) to update \(\varvec{\theta }_z\) and \(\varvec{\theta }_b\). Specifically, we use Gaussian proposal distributions whose variance is set to the variance of all previous draws. To obtain estimates of the parameters, we ran a chain for 1,000,000 iterations to ensure that the MCMC standard errors were sufficiently small (Flegal et al. 2008).
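A minimal sketch of such an adaptive random-walk Metropolis update is shown below, applied to a toy bivariate normal target rather than the model's posterior. Several details are assumptions taken from the general description in Haario et al. (2001), not from this analysis:

- the proposal covariance is scaled by \(s_d = 2.4^2/d\),
- a small ridge \(\epsilon \) is added to the covariance,
- a fixed proposal is used during a warm-up period,
- the mean and covariance of past draws are tracked with a running (Welford) update.

```python
import numpy as np

def adaptive_metropolis(log_target, x0, n_iter, rng, adapt_after=1000, init_sd=0.5, eps=1e-6):
    """Random-walk Metropolis whose Gaussian proposal covariance is the scaled empirical
    covariance of all previous draws (in the spirit of Haario et al. 2001)."""
    x = np.asarray(x0, dtype=float)
    d = x.size
    lp = log_target(x)
    s_d = 2.4 ** 2 / d   # scaling suggested by Haario et al. (assumed here)
    n, mean, scatter = 0, np.zeros(d), np.zeros((d, d))
    draws = np.empty((n_iter, d))
    for i in range(n_iter):
        if n > adapt_after:
            cov = s_d * (scatter / (n - 1) + eps * np.eye(d))
        else:
            cov = init_sd ** 2 * np.eye(d)   # fixed proposal until enough history
        prop = x + np.linalg.cholesky(cov) @ rng.standard_normal(d)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            x, lp = prop, lp_prop
        draws[i] = x
        # Welford update of the running mean and scatter matrix of all draws so far.
        n += 1
        dev = x - mean
        mean += dev / n
        scatter += np.outer(dev, x - mean)
    return draws

# Toy target (hypothetical): a correlated bivariate normal.
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def log_target(x):
    return -0.5 * x @ Sigma_inv @ x

rng = np.random.default_rng(2)
draws = adaptive_metropolis(log_target, np.zeros(2), n_iter=20_000, rng=rng)
kept = draws[5_000:]
print(kept.mean(axis=0), np.cov(kept.T))
```

In the model itself, this update would be applied blockwise within the Gibbs sampler, once for \(\varvec{\theta }_z\) and once for \(\varvec{\theta }_b\).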

An important component in this study is predicting what regions will be damaged (the \(z_{gt}\) component) and the amount of damage (the \(b_{gt}\) component) for the year \(t^\star = 11\). To make predictions, we obtain draws of \(y_{1t^\star },\dots ,y_{Gt^\star }\) from the joint posterior predictive distribution using the identity,

$$\begin{aligned} \pi (z_{gt^\star },b_{gt^\star },\varvec{\theta }_z,\varvec{\theta }_b\mid \{y_{gt}\}_{gt})&= \pi _{Z_{gt^\star }}(z_{gt^\star }\mid \varvec{\theta }_z,\varvec{\theta }_b,\{y_{gt}\}_{gt})\pi _{B_{gt^\star }}(b_{gt^\star }\mid \varvec{\theta }_z,\varvec{\theta }_b,\{y_{gt}\}_{gt}) \nonumber \\&\qquad \times \pi _{\varvec{\theta }}(\varvec{\theta }_z,\varvec{\theta }_b\mid \{y_{gt}\}_{gt}) , \end{aligned}$$
(2.7)

where we use \(\pi \) to denote a posterior distribution. From (2.7), we can obtain draws from the posterior predictive distribution of \(\{(z_{gt^\star },b_{gt^\star })\}\) by drawing \(z_{gt^\star }\sim \text {Bern}(\delta _{gt^\star })\) and \(b_{gt^\star } \sim \fancyscript{B}(\mu _{gt^\star },\kappa _{gt^\star })\) for each draw of \((\varvec{\theta }_{z},\varvec{\theta }_b)\) obtained from the posterior distribution. Draws from the posterior predictive distribution also give a measurement of the uncertainty associated with the prediction.
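Writing (2.1) recursively gives \(\widetilde{y}_{gt^\star } = \widetilde{y}_{g(t^\star -1)} + y_{gt^\star }(1-\widetilde{y}_{g(t^\star -1)})\). Predictive draws of the cumulative damage in 2011 therefore follow directly from draws of \((z_{gt^\star },b_{gt^\star })\). The sketch below assumes the posterior draws have already been pushed through (2.2) to (2.4) to give arrays of \(\delta _{gt^\star }\), \(\mu _{gt^\star }\), and \(\kappa _{gt^\star }\), with one row per posterior draw. The function name and all input values are hypothetical.

```python
import numpy as np

def posterior_predictive(delta, mu, kappa, y_tilde_prev, rng):
    """One predictive draw of (z, y, y~) at t* per posterior draw (rows of S x G arrays)."""
    z = rng.uniform(size=delta.shape) < delta            # z ~ Bern(delta); True means no damage
    b = rng.beta(mu * kappa, (1.0 - mu) * kappa)         # b ~ B(mu, kappa) in mean/precision form
    y = np.where(z, 0.0, b)                              # y = (1 - z) b
    y_tilde = y_tilde_prev + y * (1.0 - y_tilde_prev)    # recursive form of Eq. (2.1)
    return z, y, y_tilde

S, G = 10_000, 3
rng = np.random.default_rng(3)
delta = np.tile([0.9, 0.5, 0.1], (S, 1))                 # hypothetical delta at t*
mu, kappa = np.full((S, G), 0.2), np.full((S, G), 10.0)
y_tilde_prev = np.array([0.0, 0.3, 0.6])                 # hypothetical damage at t* - 1
z, y, y_tilde = posterior_predictive(delta, mu, kappa, y_tilde_prev, rng)
prob_damage = 1.0 - z.mean(axis=0)                       # Monte Carlo estimate of 1 - delta
print(prob_damage, np.quantile(y_tilde, [0.025, 0.975], axis=0))
```

The empirical quantiles across draws give the predictive uncertainty mentioned above.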

3 Results

We consider \(P=5\) covariates for the vector \({\mathbf {x}}_{gt}\). For each grid cell \(g\), we calculate:

(i) August mean maximum temperature in degrees Celsius;
(ii) January mean minimum temperature in degrees Celsius;
(iii) mean annual precipitation in inches;
(iv) terrain slope in percent rise; and
(v) elevation in feet.

Each of these has been shown to affect mountain pine beetle outbreaks in the western United States (see, for example, Waring and Pitman 1985; Mitchell and Preisler 1991; Negron and Popp 2004). Weather variables were taken from the PRISM dataset, which is publicly available at http://www.prism.oregonstate.edu/. The PRISM data estimate monthly weather over a contiguous grid at a resolution of 0.0416 decimal degrees latitude and longitude (\(\sim \)4 km cells) (Daly et al. 2002), which aligns with the resolution of our gridded MPB data. The weather variables were adjusted to account for a one-year lag between infestation and the time MPB damage is detected in the ADS. That is, for August mean maximum temperature and mean annual precipitation we used data from 1999 to 2008, and for January mean minimum temperature we used data from 2000 to 2009. Slope and elevation for each site were derived from a Digital Elevation Map (DEM) of the state of Colorado in ArcGIS, where slope is calculated as the maximum rate of change in elevation over the distance between a site and its neighboring sites.

3.1 Model Selection

Intuitively, for a grid cell \(g\), the \(\phi \) parameters are associated with the covariate \(\sum _{g' \in \fancyscript{N}_g} d^{\phi }_{g'(t-1)}\) and represent an added effect due to damage to the neighbors of \(g\) in the previous time period. Similarly, the \(\theta \) parameters are associated with the covariate \(d_{g(t-1)}^\theta \) and represent an added effect due to damage at grid cell \(g\) itself in the previous time period. The model postulated in (2.2), (2.3), and (2.4) requires a choice for the covariates \(d_{gt}^{\phi } \in \{y_{gt},\widetilde{y}_{gt}\}\) and \(d_{gt}^{\theta }\in \{y_{gt},\widetilde{y}_{gt}\}\). The question, then, is which measure of damage (\(y\) or \(\widetilde{y}\)) is a better predictor of MPB damage. To answer this question, we fit the proposed model for each combination of \(d_{gt}^{\phi _\delta }, d_{gt}^{\theta _\delta }, d_{gt}^{\phi _\mu }, d_{gt}^{\theta _\mu }, d_{gt}^{\phi _\kappa }\), and \(d_{gt}^{\theta _\kappa }\) (\(2^6 = 64\) models in total).
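The candidate set is the Cartesian product of two choices over six slots. A small sketch of the enumeration is shown below. The slot labels are hypothetical names, and each dictionary would configure one model fit.

```python
from itertools import product

# The six slots whose damage measure must be chosen in (2.2)-(2.4).
SLOTS = ("phi_delta", "theta_delta", "phi_mu", "theta_mu", "phi_kappa", "theta_kappa")
CHOICES = ("y", "y_tilde")   # yearly increment y_gt or cumulative damage y~_gt

candidate_models = [dict(zip(SLOTS, combo))
                    for combo in product(CHOICES, repeat=len(SLOTS))]
print(len(candidate_models))  # 64
```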

In this application, predictive performance was the most important criterion because the ultimate goal is to predict the likelihood that MPBs will appear in a particular grid cell, so that subsequent intervention strategies can be planned. To assess prediction accuracy, we held out the year 2010 (\(t=10\)) and compared model predictions of the occurrence of damage (\(z_{gt^\star }\)) and the amount of damage (\(b_{gt^\star }\)) to the observed \(\widetilde{y}_{gt}\). We compared each model’s predictions of the hold-out sample using three measures: the misclassification rate, the root mean square prediction error (RMSPE), and the continuous ranked probability score (CRPS; Gneiting and Raftery 2007). The misclassification rate was defined as \((2310^{-1})\sum _{g=1}^{2310} 1\!\!1_{\{\ddot{z}_{gt^\star } \ne z_{gt^\star }\}}\), where \(\ddot{z}_{gt^\star } = 1\) if \(\mathbb {P}\text {r}(z_{gt^\star } = 1 \mid \{y_{gt}\}_{gt}) \ge 0.5\) and \(\ddot{z}_{gt^\star } = 0\) otherwise. The RMSPE is calculated as \(\sqrt{(2310^{-1})\sum _{g=1}^{2310}(\widehat{y}_{gt^\star }-y_{gt^\star })^2}\), where \(\widehat{y}_{gt^\star }\) is the posterior predictive mean of \(y_{gt^\star }\). Because the CRPS is only defined for continuous variables, we calculate the CRPS only for those \(b_{gt^\star }\) for which \(z_{gt^\star } = 0\); that is, we calculate the CRPS of the random variable \(b_{gt^\star } \mid z_{gt^\star } = 0\). As the RMSPE and CRPS differences were relatively small, we chose the model based on percent misclassification. As an additional measure, secondary to predictive performance, we compared each model’s fit using the deviance information criterion (DIC) of Spiegelhalter et al. (2002).
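The three hold-out measures can be computed from posterior predictive draws as sketched below. The CRPS uses the sample form of the identity \(\text {CRPS}(F,x) = \mathbb {E}|X-x| - \tfrac{1}{2}\mathbb {E}|X-X'|\) from Gneiting and Raftery (2007). In this analysis it would be applied, cell by cell, to draws of \(b_{gt^\star }\) at cells with observed \(z_{gt^\star }=0\). The function names and toy inputs are hypothetical. The pairwise term needs memory quadratic in the number of draws, so in practice one might thin the draws first.

```python
import numpy as np

def misclassification_rate(prob_no_damage, z_obs):
    """Share of cells where thresholding Pr(z = 1 | data) at 0.5 disagrees with z_obs."""
    z_hat = (np.asarray(prob_no_damage) >= 0.5).astype(int)
    return np.mean(z_hat != np.asarray(z_obs))

def rmspe(y_pred_draws, y_obs):
    """RMSPE of the posterior predictive mean; y_pred_draws has shape (S, G)."""
    return np.sqrt(np.mean((y_pred_draws.mean(axis=0) - y_obs) ** 2))

def crps_from_draws(draws, obs):
    """Sample CRPS for one observation: E|X - x| - 0.5 E|X - X'| (lower is better)."""
    draws = np.asarray(draws, dtype=float)
    term1 = np.mean(np.abs(draws - obs))
    term2 = 0.5 * np.mean(np.abs(draws[:, None] - draws[None, :]))
    return term1 - term2

print(misclassification_rate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))   # third cell is misclassified
print(rmspe(np.array([[0.1, 0.2], [0.3, 0.4]]), np.array([0.2, 0.1])))
print(crps_from_draws([0.0, 1.0], 0.5))
```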

Table 1 displays the top 5 models ranked by misclassification rate. From Table 1, we note that the model with the lowest misclassification rate also has the lowest DIC (we shifted the DIC values so that the minimum observed DIC value is \(0\)). For example, a DIC value of 479.741 means that the model’s DIC was 479.741 greater than that of the first model in Table 1. Ranking the models by DIC would change the ordering in Table 1: apart from the first model in Table 1, the next best models according to DIC had misclassification rates greater than 27 %.

Table 1 Top 5 models ranked by percent misclassification (low to high), with DIC, for various choices of \(d\) in (2.2), (2.3), and (2.4).

Considering the best model in Table 1, the \(\widetilde{y}_{gt}\) are preferred for the \(\phi _\mu \) and \(\phi _\kappa \) coefficients but not the \(\phi _\delta \) coefficient. This result seems to suggest that the cumulative amount of damage (\(\widetilde{y}_{gt}\)) to neighbors of grid cell \(g\) is predictive of the amount of damage but not of the occurrence of damage. Rather, the occurrence of damage is better explained and predicted by the amount of damage (\(y_{gt}\)) incurred at neighboring grid cells in the previous time period.

We note that, in model (2.3), the selected covariates \(d_{gt}^{\phi _\mu }\) and \(d_{gt}^{\theta _\mu }\) differ. Because it is not fully known how MPBs migrate between spatial locations, we allowed the covariates to differ for the \(d_{gt}^{\theta _\mu }\) and \(d_{gt}^{\phi _\mu }\) components of the model. Doing so allowed us to explore differences between within-cell and between-cell temporal correlations in the data. Here, cumulative damage to neighboring grid cells appears to explain the amount of damage in a grid cell better than cumulative damage within the grid cell in the previous time period does.

Prior to concluding this section, we note that comparing the different models based on RMSPE and CRPS was more challenging because the observed spread of RMSPE and CRPS between models was small. For example, the best model according to RMSPE had RMSPE \(= 0.081\) compared to a maximum RMSPE of \(0.095\). This suggests that our model is able to predict the \(z_{gt}\) component better than the \(b_{gt}\) component.

3.2 Model Fit Results

For the best model (row 1 in Table 1), Table 2 displays posterior summaries (medians and 95 % credible intervals) of the main effect coefficients \(\varvec{\beta }\), \(\phi \), and \(\theta \) in each of (2.2), (2.3), and (2.4). Values represent the percent change in the odds (for \(\delta \) and \(\mu \)) or the percent change in \(\kappa \). For rows 1 through 4, values indicate the percent change due to a unit increase in the covariate. For row 5, values indicate the percent change given a 1000-foot increase in elevation. For rows 6 and 7, values indicate the percent change given a 10 % increase in damage.
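Under the logit links in (2.2) and (2.3) and the log link in (2.4), a coefficient \(c\) and a covariate increase \(\Delta \) translate into a percent change of \(100(e^{c\Delta }-1)\) in the odds or in \(\kappa \). Examples are \(\Delta =1000\) for elevation in feet, and \(\Delta =0.1\) for a 10 % increase in damage; the latter assumes damage is measured as a proportion. This map is monotone increasing, so transforming posterior draws and then taking quantiles gives the same result as transforming the quantiles. The sketch below uses hypothetical coefficient draws, not the fitted values.

```python
import numpy as np

def percent_change(coef, increment=1.0):
    """Percent change in the odds (logit link) or in kappa (log link) for a covariate increase."""
    return 100.0 * np.expm1(np.asarray(coef) * increment)

# Hypothetical posterior draws for a per-foot coefficient, summarized per 1000 feet.
rng = np.random.default_rng(4)
coef_draws = rng.normal(4e-4, 1e-4, size=5_000)
summary = np.quantile(percent_change(coef_draws, increment=1000.0), [0.025, 0.5, 0.975])
print(summary.round(1))   # 2.5 %, 50 %, and 97.5 % posterior quantiles of the percent change
```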

As expected, several of the parameters in (2.2) and (2.3) have opposite signs for the same covariate. For example, when the amount of precipitation increases, (i) the mean amount of MPB damage, given that damage occurred, decreases and (ii) the probability of no damage increases. The same holds for August mean maximum temperature: the data indicate that when August maximum temperatures increase, (i) the mean amount of MPB damage decreases and (ii) the probability of no damage increases. The data also show that the higher the August temperature, the more variable the amount of damage.

The result that the mean amount of MPB damage decreases as August temperature increases is the opposite of previous findings: studies by Negron and Popp (2004) and Zhu et al. (2008) show that increases in temperature lead to greater damage. Because multicollinearity may be a concern, we checked the pairwise correlations among the covariates, all of which were less than \(0.65\) in magnitude. We also removed each covariate from the model one at a time. The signs of the remaining covariates stayed the same across models, suggesting that this result is not due to multicollinearity. We hypothesize that this contradiction of previous results occurred because we used monthly rather than daily average temperatures. MPB populations are diminished by multiple days of extreme cold (e.g., less than \(-30\,^{\circ }\mathrm{C}\)), and monthly averages mask such extreme cold or heat events. However, further exploration of this result is needed.

In terms of the landscape effects, elevation has a positive relationship with the mean amount of MPB damage and a negative relationship with the probability of no damage, as expected. However, the effect of slope on \(\delta \) and \(\mu \) has the same sign: slope has a positive relationship with both mean MPB damage (\(\mu \)) and the probability of no damage (\(\delta \)). This result also seems opposite to what intuition might imply; a priori, we might expect that as slope increases, \(\delta \) increases whereas \(\mu \) decreases. Upon closer inspection, this relationship may be explained by the diversity of the tree stands. Because only certain types of trees are able to grow on steep slopes, the probability of no damage increases with slope because MPBs might not infest these types of trees. However, the amount of damage on steep slopes can be more substantial because, conditional on damage occurring, the trees within the grid cell are not resilient against MPB damage.

Table 2 Posterior medians and credible intervals for main effect coefficients \(\varvec{\beta }\), \(\phi \), and \(\theta \) in each of (2.2), (2.3), and (2.4).

For \(\phi \), note that as the cumulative damage to the neighbors of a grid cell in the previous year decreases, the probability of MPB damage within the grid cell increases (\(\delta \) decreases). Additionally, as the neighbors of a grid cell become less damaged, the damage within the grid cell can be more substantial. This result suggests that MPBs tend to consume the resources in neighboring cells before consuming the resources within that grid cell. Hence, if a grid cell’s neighbors have been nearly entirely damaged, MPBs will finish off the resources within that grid cell before migrating to regions with more undamaged trees.
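The neighbor covariate attached to \(\phi \) can be sketched as follows. This assumes, as a hypothetical choice not stated in this section, that the covariate is the mean previous-year cumulative damage over the four rook (edge-sharing) neighbors of a cell on a regular grid; the function names are illustrative.

```python
def neighbor_mean(cum_prev, i, j):
    """Mean previous-year cumulative damage over the rook neighbors of cell (i, j).

    `cum_prev` is a list of rows of cumulative damage fractions in [0, 1] for year t - 1.
    Edge cells simply have fewer neighbors; an isolated cell (1 x 1 grid) returns 0.0.
    """
    n_rows, n_cols = len(cum_prev), len(cum_prev[0])
    values = []
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = i + di, j + dj
        if 0 <= r < n_rows and 0 <= c < n_cols:
            values.append(cum_prev[r][c])
    return sum(values) / len(values) if values else 0.0


def neighbor_covariate(cum_prev):
    """Apply neighbor_mean to every cell, returning a grid of the same shape."""
    return [[neighbor_mean(cum_prev, i, j) for j in range(len(cum_prev[0]))]
            for i in range(len(cum_prev))]
```

Other adjacency definitions (e.g., eight queen neighbors or distance weights) would change only the offsets and the averaging.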

Finally, for \(\theta \) (the temporal effect of MPB damage in the previous year), as the percentage of MPB damage in the previous year increases, the probability of no MPB damage (\(\delta \)) decreases and the mean amount of damage (\(\mu \)) increases. This result is expected and supports the idea that MPBs consume the resources within a grid cell before migrating to other regions with undamaged trees. In the ADS data, if an area is designated as highly damaged, subsequent years of data collection focus on regions near areas that have been previously impacted, capturing how MPBs migrate.

3.3 Predictive Results

Figure 3a displays a map of the estimated probability \((1-\delta _{gt^\star })\) for \(t^\star = 11\) (the year 2011). That is, Fig. 3a displays the posterior predictive probability of MPB damage across the study region. Historically, as seen in Fig. 2, MPB damage has moved predominantly eastward. Although our model continues to predict damage to the east, it notably predicts that damage in 2011 will shift more toward the south-east. That is, Fig. 3a shows a high probability of MPB damage in the south-eastern region of the study area.

Fig. 3

a Predicted probability of MPB damage \((1-\delta _{gt^\star })\) for \(t^\star = 11\) and b posterior median of the predicted damage \(y_{gt^\star }\) given \(z_{gt^\star } = 0\).

Figure 3b displays the posterior median of the posterior predictive distribution of \(y_{gt^\star }\) given \(z_{gt^\star } = 0\), that is, the median amount of MPB damage predicted given that MPB damage occurs in the grid cell. Figure 3b shows that the highest amount of damage is predicted to occur in the south-eastern region. Taken together, Fig. 3a, b highlight regions where intervention strategies may be most effective.
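Both map summaries are Monte Carlo averages over posterior draws for each grid cell. The sketch below assumes the sampler returns, for one cell, draws \(\delta ^{(s)}\) and paired predictive draws \((z^{(s)}, y^{(s)})\), where we read \(z = 0\) as "damage occurred" following the text; the function names are hypothetical. Applying the two functions cell by cell would give maps like panels (a) and (b).

```python
import statistics


def damage_probability(delta_draws):
    """Estimate of 1 - delta: the average of 1 - delta^(s) over posterior draws."""
    return sum(1.0 - d for d in delta_draws) / len(delta_draws)


def conditional_median_damage(z_draws, y_draws):
    """Median of predictive y over draws in which damage occurred (z = 0).

    Returns NaN if no draw has z = 0, i.e. the model never predicts damage for the cell.
    """
    damaged = [y for z, y in zip(z_draws, y_draws) if z == 0]
    return statistics.median(damaged) if damaged else float("nan")
```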

Figure 4 displays the full posterior predictive distribution of \(y_{gt^\star }\) for three randomly selected grid cells. Based on inspection of a larger sample of grid cells, we found that the majority of predictive distributions follow this pattern, namely a peak near zero with a heavy tail (although the heaviness of the tail varies from grid cell to grid cell). This pattern suggests that large amounts of pine beetle damage are possible but unlikely.
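The 95 % highest posterior density intervals in Fig. 4 can be estimated from predictive draws with the usual shortest-interval approach: sort the draws and pick the narrowest window that contains the required fraction of them. The sketch below is a generic version of that technique, not the authors' code; for a strongly skewed, peaked-at-zero distribution it sits much closer to zero than an equal-tailed interval would. It assumes a unimodal distribution, since a multimodal one can have an HPD region made of several intervals.

```python
import math


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the samples (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # number of draws the interval must cover
    best_lo, best_hi = xs[0], xs[k - 1]
    for start in range(1, n - k + 1):
        lo, hi = xs[start], xs[start + k - 1]
        if hi - lo < best_hi - best_lo:  # keep the narrowest window seen so far
            best_lo, best_hi = lo, hi
    return best_lo, best_hi
```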

Fig. 4

Posterior predictive distributions of \(y_{gt^\star }\) for three randomly selected grid cells. Vertical dashed lines denote the posterior predictive mean and the dotted lines denote a 95 % highest posterior density interval.

4 Discussion and Conclusions

This article focused on modeling and predicting aerial detection survey data of MPB damage in Colorado. Specifically, we used a stick-breaking representation to enforce monotonicity constraints on the cumulative damage to grid cells in a region of Colorado. We modeled the resulting stick-breaking weights using a zero-inflated beta regression model in which the probability of zero damage, as well as the mean and dispersion of a beta distribution, vary over space and time.
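A minimal simulation helps show why a stick-breaking representation yields monotone cumulative damage. For illustration we assume the update \(c_t = c_{t-1} + y_t(1 - c_{t-1})\), where \(c_t\) is the cumulative damaged fraction of a grid cell and \(y_t \in [0,1]\) is the stick-breaking weight (the share of the still-undamaged area damaged in year \(t\)). We also assume the zero-inflated beta uses a mean/precision parameterization \(\mathrm{Beta}(\mu \psi , (1-\mu )\psi )\); the precision name `psi` and all parameter values are hypothetical, and the exact form of (2.2) to (2.4) is not reproduced here.

```python
import random


def draw_zib(delta, mu, psi, rng):
    """One zero-inflated beta draw: 0 with probability delta, else Beta(mu*psi, (1-mu)*psi)."""
    if rng.random() < delta:
        return 0.0
    return rng.betavariate(mu * psi, (1.0 - mu) * psi)


def cumulative_damage(weights, c0=0.0):
    """Stick-breaking update c_t = c_{t-1} + y_t * (1 - c_{t-1}) for each weight y_t.

    Each weight takes a share of the still-undamaged area, so the path is
    non-decreasing and never exceeds 1.
    """
    path, c = [], c0
    for y in weights:
        c = c + y * (1.0 - c)
        path.append(c)
    return path


# Ten years of hypothetical weights for one grid cell.
rng = random.Random(1)
weights = [draw_zib(0.4, 0.2, 10.0, rng) for _ in range(10)]
path = cumulative_damage(weights)
```

Modeling the weights rather than the cumulative fractions directly is what lets an unconstrained regression on \(y_t\) respect the monotonicity of the cumulative ADS record.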

The ultimate goal of this work was to build a predictive model for MPB damage using ADS data. In Sect. 3.3, we demonstrated this predictive methodology by predicting the regions where the highest amount of damage will occur. We note that the predictive distributions in Fig. 4 have very heavy tails. This heavy-tailed behavior of MPB damage challenges our assumption of a beta distribution for positive damage. An alternative modeling strategy could use extreme value distributions to more appropriately model the upper tail of MPB damage.

For the ADS data, there are portions of the region where no trees are present; hence, there can be no MPB damage in these grid cells. However, the ADS data do not directly distinguish between zero damage resulting from no trees in a grid cell and a “true” MPB damage of zero. Given the large amount of MPB damage in the spatial region of interest, it may be safe to assume that if a grid cell is never damaged, then no trees are present. However, this assumption is uncertain, so we do not wish to simply discard grid cells that are never damaged. Future work on this project should include methodology to partition the spatial region into areas where MPB damage is possible and areas where it is not.
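As a first step toward such a partition, never-damaged cells can at least be flagged rather than dropped. The sketch below assumes, hypothetically, that cumulative damage fractions are stored as `cum_by_year[t][g]` for year index `t` and cell `g`. Because cumulative damage is non-decreasing, a cell is zero in every year exactly when it is zero in the last year.

```python
def never_damaged(cum_by_year):
    """Indices of grid cells with zero cumulative damage in every survey year.

    Uses only the final year: cumulative damage never decreases, so zero at the
    end implies zero throughout. Flagged cells are candidates for "no host trees",
    not confirmed cases.
    """
    return [g for g, c in enumerate(cum_by_year[-1]) if c == 0.0]
```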

As a final note, one area that requires more attention is the temporal lag structure used in the model. In this work, we accounted for the one-year lag between impact and detection of the infestation. However, other lag structures, such as distributed lag models, may provide further insight into the temporal dynamics of MPB damage.
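A distributed lag replaces the single lagged covariate with a weighted sum over several past years, \(\sum _{l=1}^{L} w_l\, y_{g,t-l}\). The sketch below is a generic illustration; the weights are hypothetical inputs, whereas in a fitted distributed lag model they would be estimated (often under a smoothness constraint). With `weights = [1.0]` it reduces to the one-year lag used in this work.

```python
def distributed_lag(series, weights):
    """Distributed-lag covariate sum_{l=1}^{L} w_l * series[t - l] for each year t.

    `series[t]` is, e.g., the damage in one grid cell in year t, and `weights[l - 1]`
    is the (hypothetical) weight on lag l. Early years use only the lags available.
    """
    n_lags = len(weights)
    out = []
    for t in range(len(series)):
        out.append(sum((weights[l - 1] * series[t - l]
                        for l in range(1, n_lags + 1) if t - l >= 0), 0.0))
    return out
```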