1 Introduction

Circular data, i.e., observations with support on the unit circle, arise in many contexts. Examples include natural directions, such as wind directions (meteorology), animal movement directions (biology) and rock fracture orientations (geology). Another type of circular data arises by wrapping periodic time data with period L (say, day or week) onto a circle with circumference L and then rescaling the circumference to \(2\pi \), that of the unit circle. Two-dimensional directional data may be observed in space and time, along with linear variables, as in marine studies where, for example, wave heights and directions are jointly observed, or in atmospheric modeling where wind fields are represented by wind intensity as well as direction. Because the domain is restricted to the circle, the analysis of circular data must be handled differently from that of linear data; customary statistical summaries are replaced with their circular counterparts. For a discussion of inference with circular variables see, e.g., Fisher (1996), Mardia and Jupp (1999), Jammalamadaka and SenGupta (2001) or the recent paper by Lee (2010).

The contribution of this paper is to extend recent spatial and spatio-temporal circular data models. In particular, Jona Lasinio et al. (2012) consider the wrapped normal approach, developing the wrapped Gaussian process, while Wang and Gelfand (2014) consider the projected normal approach, developing the projected Gaussian process. Here we: (i) extend both processes to the spatio-temporal setting, introducing space–time dependence; (ii) introduce space and time varying covariate information; (iii) show how to implement fully model-based kriging and forecasting; (iv) allow a nugget, which can be time dependent; and (v) provide an extensive comparison between the more sparsely parametrized wrapped Gaussian process and the more flexible projected Gaussian process. We do this illustratively using simulation, as a proof of concept, as well as with data in the form of computer model output for wave directions in the Adriatic Sea off the coast of Italy. The models are fitted under a Bayesian framework, introducing suitable latent variables, enabling full inference.

Modeling of circular data has progressed well beyond the i.i.d. case. Examples include linear models (Harrison and Kanji 1988; Fisher 1996; Kato and Shimizu 2008), linear models in a Bayesian context (Guttorp and Lockhart 1988; Damien and Walker 1999), models for circular time series (Breckling 1989; Fisher and Lee 1992; Coles 1998; Holtzman et al. 2006; Ravindran and Ghosh 2011), and hidden Markov models to address classification issues (Lagona and Picone 2011; Bulla et al. 2012; Mastrantonio et al. 2015). In Kato (2010) a Markov process for circular variables is presented. Jona Lasinio et al. (2012) consider a spatial wrapped Gaussian process. Wang and Gelfand (2013) explore the general projected normal model while in Wang and Gelfand (2014) Bayesian analysis of space–time circular data is developed using projected Gaussian processes. In Wang et al. (2015), directional wave data is modeled jointly with linear wave height data.

The format of the remainder of the paper is as follows. In Sect. 2, we review the wrapping approach and offer a non-separable space–time model for circular data. In Sect. 3, an analogous model is presented using the projected normal process. Section 4 presents several simulation examples giving insight into the inferential performance of the models, while Sect. 5 analyzes the behavior of the models for wave directions. Section 6 extends the modeling approach to enable space–time varying covariates reflecting sea state at a location and time. Some concluding remarks are provided in Sect. 7. Implementation details, further simulated examples and more details on the real data application are available in the Supplementary Online Material, Sections S1, S2 and S3.

2 A brief review of the wrapped modeling approach

Let \(Y \in {\mathbb {R}}\) be a random variable on the real line and let g(y) and G(y) be, respectively, its probability density function and cumulative distribution function. The random variable

$$\begin{aligned} X = Y \hbox { mod } 2\pi , \quad 0 \le X < 2\pi \end{aligned}$$

is the wrapped version of Y having period \(2 \pi \). The probability density function of X, f(x), is obtained by wrapping the probability density function of Y, g(y), around a circle of unit radius via the transformation \(Y= X+2\pi K \), with \(K \in {\mathbb {Z}}\equiv \{0, \pm 1, \pm 2,\ldots \}\), and takes the form

$$\begin{aligned} f(x)=\sum _{k=-\infty }^{\infty }{g(x+2\pi k )}, \end{aligned}$$
(1)

that is, a doubly infinite sum.

Equation (1) shows that \(g(x+2\pi k)\) is the joint density of (X, K). Hence, the marginal distribution of K is \(P(K=k) = \int _{0}^{2\pi } g(x+2\pi k) dx\), the conditional distribution of K given \(X=x\) is \(P(K=k|X=x) = g(x+2\pi k)/\sum _{j= -\infty }^{\infty } g(x+2\pi j)\), and the distribution of \(X|K=k\) has density \(g(x+2\pi k)/\int _{0}^{2\pi } g(x+2\pi k) dx\). The introduction of K as a latent variable facilitates model fitting (Jona Lasinio et al. 2012).
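For illustration, the following sketch (in Python, with our own function names) takes \(g(\cdot )\) to be a normal density and truncates the doubly infinite sum in (1) at \(|k| \le k_{max}\); it returns the approximate wrapped density and the conditional distribution of the winding number K at a single point x.

```python
import numpy as np
from scipy.stats import norm

def wrapped_normal_pdf(x, mu, sigma, k_max=10):
    """Approximate f(x) in (1) for a normal g, truncating the sum at |k| <= k_max."""
    x = np.asarray(x, dtype=float)
    ks = np.arange(-k_max, k_max + 1)
    # sum_k g(x + 2*pi*k) over the retained winding numbers
    return norm.pdf(x[..., None] + 2.0 * np.pi * ks, loc=mu, scale=sigma).sum(axis=-1)

def winding_number_probs(x, mu, sigma, k_max=10):
    """P(K = k | X = x) at a single point x: the full conditional of the latent winding number."""
    ks = np.arange(-k_max, k_max + 1)
    w = norm.pdf(x + 2.0 * np.pi * ks, loc=mu, scale=sigma)
    return ks, w / w.sum()
```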

Following Coles (1998), we can extend the wrapping approach to multivariate distributions. Let \({\mathbf {Y}} = (Y_{1},Y_{2},\ldots ,Y_{p})\sim g(\cdot )\), with \(g(\cdot )\) a p-variate distribution on \({\mathbb {R}}^{p}\) indexed by say \(\varvec{\theta }\) and let \({\mathbf {K}} = (K_{1},K_{2},\ldots ,K_{p})\) be such that \({\mathbf {Y}}= {\mathbf {X}} + 2\pi {\mathbf {K}}\). Then the distribution of \({\mathbf {X}}\) is

$$\begin{aligned} f({\mathbf {X}}) = \sum _{k_{1}= -\infty }^{+\infty }\sum _{k_{2}= -\infty }^{+\infty }\ldots \sum _{k_{p}= -\infty }^{+\infty } g({\mathbf {X}}+ 2 \pi {\mathbf {K}}). \end{aligned}$$
(2)

From (2) we see, as in the univariate case, that the joint density of \(({\mathbf {X}}, {\mathbf {K}})\) is \(g({\mathbf {X}}+ 2 \pi {\mathbf {K}})\). If \(g(\cdot ; \varvec{ \theta } )\) is a p-variate normal density, with \(\varvec{ \theta } = ({\varvec{\mu }} , \varvec{\varSigma } )\), then \({\mathbf {X}}\) has a p-variate wrapped normal distribution with parameters \(({\varvec{\mu }} , \varvec{\varSigma } )\). Here, we introduce the latent random vector of winding numbers \({\mathbf {K}}\) to facilitate model fitting. Mardia and Jupp (1999) point out that only a few values of K are needed to obtain a reasonable approximation of the wrapped distribution and Jona Lasinio et al. (2012) show, when \(g(\cdot ; \varvec{ \theta } )\) is Gaussian, how to choose the set of values of K based on the variance of the associated conditional distribution.

Let \(Y({\mathbf {s}})\) be a Gaussian process (GP) with \({\mathbf {s}} \in {\mathbb {R}}^2\), mean function \(\mu ({\mathbf {s}})\) and covariance function, say, \(\sigma ^{2} \rho (||{\mathbf {s}}_i-{\mathbf {s}}_j ||; \varvec{\psi })\), where \(\varvec{\psi }\) is a set of parameters. For a set of locations \({\mathbf {s}}_{1},{\mathbf {s}}_{2},\ldots ,{\mathbf {s}}_{n}\), \({\mathbf {Y}}=(Y({\mathbf {s}}_{1}),Y({\mathbf {s}}_{2}),\ldots , Y({\mathbf {s}}_{n})) \sim N({\varvec{\mu }}, \sigma ^2 C(\varvec{\psi }))\), where \({\varvec{\mu }}= (\mu ({\mathbf {s}}_{1}),\ldots , \mu ({\mathbf {s}}_{n}))\) and \(C(\varvec{\psi })_{ij}= \rho (||{\mathbf {s}}_{i} - {\mathbf {s}}_{j}||;\varvec{\psi })\). As a consequence, \({\mathbf {X}}=(X({\mathbf {s}}_{1}),X({\mathbf {s}}_{2}),\ldots ,X({\mathbf {s}}_{n})) \sim WrapN({\varvec{\mu }},\sigma ^{2}{\mathbf {C}}(\varvec{\psi })) \) (Jona Lasinio et al. 2012), where \(WrapN(\cdot ,\cdot )\) indicates the wrapped normal distribution.

2.1 Space–time model specification and model fitting

Turning to space and time, suppose we seek \(\{ X({\mathbf {s}},t) \in [0,2 \pi ), {\mathbf {s}} \in {\mathcal {S}} \subseteq {\mathbb {R}}^2 , t\in \mathcal {T} \subseteq {\mathbb {Z}}^+ \}\), a spatio-temporal process of angular variables. We can model \(X({\mathbf {s}},t) \) as a spatio-temporal wrapped Gaussian process through its linear counterpart \(Y({\mathbf {s}},t)\), extending the above approach. We assume that the linear process is a spatio-temporal Gaussian process having non-separable covariance structure with variance \(\sigma ^2\) and the stationary correlation function due to Gneiting [see equation (14) in Gneiting 2002]:

$$\begin{aligned} \hbox {Cor}(Y({\mathbf {s}},t), Y({\mathbf {s}}',t')) \equiv \rho ({\mathbf {h}},u)=\frac{1}{(a|u|^{2\alpha }+1)^{ \tau }} \exp \left( -\frac{c\Vert {\mathbf {h}}\Vert ^{2\gamma }}{(a|u|^{2\alpha }+1)^{ \beta \gamma }}\right) \!, \end{aligned}$$
(3)

where \(({\mathbf {h}},u)\in {\mathbb {R}}^d\times {\mathbb {R}}\), \({\mathbf {h}} = {\mathbf {s}}-{\mathbf {s}}'\) and \(u = t-t' \). Here \(d=2\), a and c are non-negative scaling parameters for time and space, respectively. The smoothness parameters \(\alpha \) and \(\gamma \) take values in (0, 1], the space–time interaction parameter \(\beta \) is in [0, 1], and \(\tau \ge d/2 =1\) is, in fact, fixed at 1 following Gneiting (2002). Attractively, as \(\beta \) decreases toward zero, we tend to separability in space and time.
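For reference, a direct implementation of (3) might look as follows; this is a sketch with our own function and argument names, where the helper assembling the correlation matrix over the observed space–time points assumes planar coordinates and scalar times.

```python
import numpy as np

def gneiting_correlation(h, u, a, c, alpha, gamma, beta, tau=1.0):
    """Eq. (3): h = ||s - s'|| (spatial distance), u = t - t' (temporal lag)."""
    denom = a * np.abs(u) ** (2 * alpha) + 1.0
    return denom ** (-tau) * np.exp(-c * np.asarray(h) ** (2 * gamma) / denom ** (beta * gamma))

def gneiting_matrix(coords, times, a, c, alpha, gamma, beta):
    """Correlation matrix over n space-time points: coords is (n, 2), times is (n,)."""
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    u = times[:, None] - times[None, :]
    return gneiting_correlation(h, u, a, c, alpha, gamma, beta)
```

Setting beta to zero in the sketch reproduces the separable product form.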

We write the linear GP \(Y({\mathbf {s}},t)\) as \(Y({\mathbf {s}},t)=\mu _Y+\omega _Y({\mathbf {s}},t)+ {\tilde{\varepsilon }}_Y({\mathbf {s}},t)\) where \(\mu _Y\) is a constant mean, \(\omega _Y({\mathbf {s}},t)\) is a zero mean space–time GP with covariance function \(\sigma ^2 \rho ({\mathbf {h}},u)\), and \({\tilde{\varepsilon }}_Y({\mathbf {s}},t) \mathop {\sim }\limits ^{iid} N(0,\phi _{Y}^2)\), i.e., pure error. It is convenient to work with the marginalized model where we integrate over all of the \( \omega _{Y}({\mathbf {s}}, t)\) (see Banerjee et al. 2014). That is,

$$\begin{aligned} Y({\mathbf {s}},t)=\mu _Y+{\varepsilon }_Y({\mathbf {s}},t). \end{aligned}$$
(4)

Then, \({\varepsilon }_Y({\mathbf {s}},t)\) is a zero mean Gaussian process with covariance function

$$\begin{aligned} \hbox {Cov}({\varepsilon }_{Y} ({\mathbf {s}}_i, t_j),{ \varepsilon }_{Y} ({\mathbf {s}}_{i^{\prime }}, t_{j^{\prime }})) = \sigma _Y^2\hbox {Cor}({\mathbf {h}}_{i,i^{\prime }}, u_{j,j^{\prime }})+\phi _Y^2 1_{(i = i^{\prime })}1_{( j=j^{\prime })}. \end{aligned}$$

To complete the model specification, we need prior distributions. We suggest the following choices. Since a and c are positive, we take \(a, c\sim G(\cdot ,\cdot )\), where \(G(\cdot ,\cdot )\) denotes a gamma distribution. Since \(\alpha \), \(\beta \), and \( \gamma \) are bounded between 0 and 1, we adopt beta priors (\(B(\cdot ,\cdot )\)). Priors for the variances and the mean direction are given the customary forms, i.e., \(\sigma ^2_Y,\phi ^2_Y\sim IG(\cdot ,\cdot )\), where \(IG(\cdot ,\cdot )\) denotes the inverse gamma, and \(\mu _Y\sim WrapN(\cdot ,\cdot )\). In the sequel, this model will be denoted by WN.

2.2 Kriging and forecasting

We clarify prediction of the process at a new location and time, say \(({\mathbf {s}}_0,t_0)\), given what we have observed. We provide a full predictive distribution, extending Jona Lasinio et al. (2012), who only provide a posterior mean. Let \(\mathcal {D} \subset {\mathbb {R}}^2\times {\mathbb {Z}}^+\) be the set of n observed points. Let \({\mathbf {X}} = \{ X({\mathbf {s}},t), ({\mathbf {s}},t) \in \mathcal {D} \}\) be the vector of observed circular variables, let \({\mathbf {Y}}=\{ Y({\mathbf {s}},t), ({\mathbf {s}},t) \in \mathcal {D} \}\) be the associated linear ones and let \({\mathbf {K}}=\{ K({\mathbf {s}},t), ({\mathbf {s}},t) \in \mathcal {D} \}\) be the associated vector of winding numbers. The predictive distribution we seek is \(g(X({\mathbf {s}}_0, t_0)|{\mathbf {X}})\). We use the usual composition sampling within Markov chain Monte Carlo (MCMC) to obtain samples from it. Here, again, we move from the circular process to the linear one: since \(Y = X + 2\pi K\), a sample from the distribution of \(Y({\mathbf {s}}_0, t_0)|{\mathbf {X}}\) is equivalent to a sample from \(X({\mathbf {s}}_0, t_0),K({\mathbf {s}}_0, t_0)|{\mathbf {X}}\). If we let \({\varvec{\varPsi }}_Y\) be the vector of all parameters, we can write

$$\begin{aligned}&g(X({\mathbf {s}}_0, t_0), K({\mathbf {s}}_0, t_0)|{\mathbf {X}})\nonumber \\&\quad = \sum _{{\mathbf {K}}\in {\mathbb {Z}}^n} \int _{{\varvec{\varPsi }}_Y} g(X({\mathbf {s}}_0, t_0), K({\mathbf {s}}_0, t_0)|{\varvec{\varPsi }}_Y,\varvec{K}, {\mathbf {X}}) g({\varvec{\varPsi }}_Y,\varvec{K} | {\mathbf {X}}) d {\varvec{\varPsi }}_Y. \end{aligned}$$

So, suppose, for each posterior sample of \({\mathbf {K}}\) and \({\varvec{\varPsi }}_Y\) in \( \{ {\mathbf {K}}^*_l, {\varvec{\varPsi }}_{Y,l}^*, l=1,2, \ldots , L\}\), we generate a value from the distribution of \(X({\mathbf {s}}_0, t_0), K({\mathbf {s}}_0, t_0)|\) \({\varvec{\varPsi }}_Y,\varvec{K}, {\mathbf {X}} \). Then, we obtain the set of posterior samples \(\{X^*_l({\mathbf {s}}_0, t_0),\) \( K^*_l({\mathbf {s}}_0, t_0) , l=1,2,\ldots , L \}\) from \(X({\mathbf {s}}_0, t_0), K({\mathbf {s}}_0, t_0)|{\mathbf {X}}\). If we retain the set \(\{ X^*_l({\mathbf {s}}_0, t_0) , l=1,2,\ldots , L \}\), we have samples from the desired predictive distribution.

Therefore, we need to sample from the distribution of \(X({\mathbf {s}}_0, t_0), K({\mathbf {s}}_0, t_0)|\) \({\varvec{\varPsi }}_Y,\varvec{K}, {\mathbf {X}}\) or equivalently \(Y({\mathbf {s}}_0, t_0)|{\mathbf {Y}}, {\varvec{\varPsi }}_Y \). Let \({\mathbf {1}}_m\) be the \(m \times 1\) vector of 1s, let \({{\mathbf {C}}}_{\mathbf {Y}}\) be the correlation matrix of \({{\mathbf {Y}}}\), and let \({{\mathbf {C}}}_{{\mathbf {Y}},Y({\mathbf {s}}_0,t_0)}\) be the correlation vector between \({\mathbf {Y}}\) and \(Y({\mathbf {s}}_0,t_0)\). Then, the joint distribution of \(Y({\mathbf {s}}_0,t_0), {\mathbf {Y}}| {\varvec{\varPsi }}_Y\) is

$$\begin{aligned} \left( \begin{array}{c} Y({\mathbf {s}}_0,t_0) \\ {\mathbf {Y}} \end{array} \right) | {\varvec{\varPsi }}_Y \sim N \left( \left( \begin{array}{c} \mu _Y\\ \mu _Y {\mathbf {1}}_n \end{array} \right) , \sigma _Y^2\left( \begin{array}{cc} 1&{} {{\mathbf {C}}}_{{\mathbf {Y}},Y({\mathbf {s}}_0,t_0)}^{\prime }\\ {{\mathbf {C}}}_{{\mathbf {Y}},Y({\mathbf {s}}_0,t_0)}&{} {\mathbf {C}}_{\mathbf {Y}} \end{array} \right) + \phi _Y^2 {\mathbf {I}}_{n+1} \right) \!. \end{aligned}$$

As a result, the conditional distribution of \(Y({\mathbf {s}}_0,t_0)| {\mathbf {Y}}, {\varvec{\varPsi }}_Y \) is Gaussian with mean

$$\begin{aligned} \hbox {M}_{Y({\mathbf {s}}_0,t_0)} = \mu _Y + \sigma _Y^2{{\mathbf {C}}}_{{\mathbf {Y}},Y({\mathbf {s}}_0,t_0)}^{ \prime } \left( \sigma _Y^2{\mathbf {C}}_{\mathbf {Y}} + \phi _Y^2 {\mathbf {I}}_n \right) ^{-1}\left( {\mathbf {Y}}- \mu _Y {\mathbf {1}}_n \right) \end{aligned}$$

and variance

$$\begin{aligned} \hbox {V}_{Y({\mathbf {s}}_0,t_0)} = \sigma _Y^2+\phi _Y^2 -\sigma _Y^2{{\mathbf {C}}}_{{\mathbf {Y}},Y({\mathbf {s}}_0,t_0)}^{\prime } \left( \sigma _Y^2{\mathbf {C}}_{\mathbf {Y}} + \phi _Y^2 {\mathbf {I}}_n \right) ^{-1} \sigma _Y^2{{\mathbf {C}}}_{{\mathbf {Y}},Y({\mathbf {s}}_0,t_0)}. \end{aligned}$$

Finally, suppose, for each posterior sample, we simulate \(Y^*_l({\mathbf {s}}_0,t_0) \) from \(N(\hbox {M}_{Y({\mathbf {s}}_0,t_0),l}^*, \hbox {V}_{Y({\mathbf {s}}_0,t_0),l}^*)\), where \(\hbox {M}_{Y({\mathbf {s}}_0,t_0),l}^*\) and \( \hbox {V}_{Y({\mathbf {s}}_0,t_0),l}^*\) are \(\hbox {M}_{Y({\mathbf {s}}_0,t_0)}\) and \( \hbox {V}_{Y({\mathbf {s}}_0,t_0)}\) computed with the lth sample. Then, \( X^*_l({\mathbf {s}}_0,t_0) = Y^*_l({\mathbf {s}}_0,t_0) \hbox { mod } 2 \pi \) is a posterior sample from the predictive distribution.
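The composition step for a single posterior draw can be sketched as follows; this is our own illustration, in which \(C_Y\) and \(c_0\) would be built from the Gneiting correlation function and every argument is the lth posterior sample of the corresponding quantity.

```python
import numpy as np

def krige_wrapped_draw(Y, mu_Y, sigma2_Y, phi2_Y, C_Y, c_0, rng):
    """One draw X*(s0,t0): sample Y(s0,t0) | Y, Psi_Y and wrap it.
    Y = X + 2*pi*K uses the current draw of the winding numbers; C_Y is the
    n x n correlation matrix of Y and c_0 the correlation vector with Y(s0,t0)."""
    cov = sigma2_Y * C_Y + phi2_Y * np.eye(len(Y))        # sigma^2 C_Y + phi^2 I_n
    m = mu_Y + sigma2_Y * c_0 @ np.linalg.solve(cov, Y - mu_Y)
    v = sigma2_Y + phi2_Y - sigma2_Y * c_0 @ np.linalg.solve(cov, sigma2_Y * c_0)
    return rng.normal(m, np.sqrt(v)) % (2.0 * np.pi)      # wrap back onto [0, 2*pi)

# usage: rng = np.random.default_rng(); x_star = krige_wrapped_draw(Y, mu, s2, p2, C, c0, rng)
```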

3 The spatio-temporal projected normal process

Let \( (Z_1, Z_2)\) be a bivariate vector normally distributed with mean \({\varvec{\mu }}_Z = (\mu _{Z_1}, \mu _{Z_2})\) and covariance matrix

$$\begin{aligned} {\tilde{{\mathbf {V}}}}=\left( \begin{array}{cc} \sigma _{Z_1}^2 &{} \sigma _{Z_1}\sigma _{Z_2} \rho _z\\ \sigma _{Z_1}\sigma _{Z_2} \rho _z &{} \sigma _{Z_2}^2 \end{array} \right) \!. \end{aligned}$$

The vector \({\mathbf {Z}}\) is mapped into an angular variable \(\varTheta \) by the transformation \(\varTheta = {\hbox {atan}}^* ({Z_2}/{Z_1})\), where the function \(\hbox {atan}^*(S/C)\) is defined as \(\hbox {atan}(S/C)\) if \(C > 0\) and \(S \ge 0\), \(\pi / 2\) if \(C=0\) and \(S > 0\), \(\hbox {atan}(S/C)+\pi \) if \(C < 0\), \(\hbox {atan}(S/C)+2\pi \) if \(C \ge 0\) and \(S < 0\), undefined if \(C=S=0\). \(\varTheta \) is referred to as a projected normal random variable (Mardia 1972, p. 52) with parameters \({\varvec{\mu }}_{Z}\) and \({\tilde{{\mathbf {V}}}}\). Wang and Gelfand (2013) note that the distribution of \(\varTheta \) does not change if we multiply \((Z_1,Z_2)\) by a positive constant, so, following their lead, to identify the distribution we set \(\sigma _{Z_2}^2=1\) and the covariance matrix becomes

$$\begin{aligned} {\mathbf {V}} = \left( \begin{array}{cc} \sigma _{Z_1}^2 &{} \sigma _{Z_1} \rho _z\\ \sigma _{Z_1} \rho _z &{}1 \end{array} \right) \!. \end{aligned}$$

Again, it is convenient to introduce a latent variable. Here, it is \(R = ||{\mathbf {Z}}||\), obtaining the joint density of \((\varTheta , R)\):

$$\begin{aligned} (2 \pi )^{-1} |{\mathbf {V}}|^{-{1/2}}\exp \left( - \frac{ (r(\cos \theta , \sin \theta )^{\prime }-{\varvec{\mu }}_Z)^{\prime } {\mathbf {V}}^{-1}(r(\cos \theta , \sin \theta )^{\prime }- {\varvec{\mu }}_Z)}{2} \right) r. \end{aligned}$$

We can move back and forth between the linear variables and the pair \((\varTheta , R)\) using the transformation \(Z_1 = R \cos \varTheta \), \(Z_2 = R \sin \varTheta \) and the equation \(\varTheta = {\hbox {atan}}^* ({Z_2}/{Z_1})\).
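In code, \(\hbox {atan}^*\) is simply the two-argument arctangent folded onto \([0,2\pi )\); the sketch below (our own) also gives the inverse map, with the caveat that numpy returns 0 at the origin, where \(\hbox {atan}^*\) is undefined.

```python
import numpy as np

def atan_star(z1, z2):
    """Theta = atan*(Z2/Z1): the angle of (Z1, Z2) mapped to [0, 2*pi)."""
    return np.arctan2(z2, z1) % (2.0 * np.pi)

def to_linear(theta, r):
    """Inverse map: Z1 = R cos(Theta), Z2 = R sin(Theta)."""
    return r * np.cos(theta), r * np.sin(theta)
```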

Consider a bivariate spatio-temporal process \({\mathbf {Z}}({\mathbf {s}},t) = (Z_1({\mathbf {s}},t),Z_2({\mathbf {s}},t))\) with constant mean \({\varvec{\mu }}_Z\) and cross covariance function \(\hbox {C}\left( {\mathbf {Z}} ({\mathbf {s}}_i,t_j), {\mathbf {Z}}({\mathbf {s}}_{i^{\prime }},t_{j^{\prime }})\right) =\hbox {Cor}( {\mathbf {s}}_i- {\mathbf {s}}_{i^{ \prime }},t_j-t_{j^{\prime }}) {\mathbf {V}} \) where \(\hbox {Cor}(\cdot ,\cdot )\) is a given space–time correlation function and \({\mathbf {V}}\) is as above. Then the circular process \(\varTheta ({\mathbf {s}},t)\) induced by \({\mathbf {Z}}({\mathbf {s}},t) \) with the \({\hbox {atan}}^*\) transformation is a projected Gaussian process with mean \({\varvec{\mu }}_Z\) and covariance function induced by \(\hbox {C}\left( {\mathbf {Z}} ({\mathbf {s}}_i,t_j), {\mathbf {Z}}({\mathbf {s}}_{i^{\prime }},t_{j^{\prime }})\right) \). More details on the properties of the process can be found in Wang and Gelfand (2014). Now, latent \(R({\mathbf {s}},t)\)s are introduced to facilitate model fitting.

3.1 Model specification and model fitting

We define the bivariate linear process as

$$\begin{aligned} Z_{\ell }({\mathbf {s}}, t) = \mu _{Z_\ell }+ \omega _{Z_\ell }({\mathbf {s}}, t)+ {\tilde{\varepsilon }}_{Z_\ell }({\mathbf {s}}, t), \quad \ell =1,2, \end{aligned}$$
(5)

where \({\varvec{\mu }}_{Z} = (\mu _{Z_1},\mu _{Z_2})^{\prime }\) is the mean level, \({\varvec{\omega }}_{Z}({\mathbf {s}}, t) = (\omega _{Z_1}({\mathbf {s}}, t), \omega _{Z_2}({\mathbf {s}}, t))^{\prime }\) is a bivariate Gaussian process with zero mean and covariance \(\hbox {Cov}({\varvec{\omega }}_{Z}({\mathbf {s}}_i,t_j), {\varvec{\omega }}_{Z}({\mathbf {s}}_{i^{\prime }},t_{j^{\prime }})) = \hbox {Cor}( {\mathbf {h}}_{i,i^{\prime }},u_{j,j^{\prime }}) {\mathbf {V}}\) where \(\hbox {Cor}( {\mathbf {h}}_{i,i^{\prime }},u_{j,j^{\prime }}) \) is defined in (3). Finally, \(\tilde{\varvec{\varepsilon }}_{Z} ({\mathbf {s}}, t)=({\tilde{\varepsilon }}_{Z_1}({\mathbf {s}}, t),{\tilde{\varepsilon }}_{Z_2}({\mathbf {s}}, t)) \) is bivariate pure error with zero mean, independent components, and variance \(\phi _{{Z}}^2\). Marginalizing over the \(\omega \) process in (5) yields

$$\begin{aligned} Z_{\ell }({\mathbf {s}}, t) = \mu _{Z_\ell }+ \varepsilon _{Z_\ell }({\mathbf {s}}, t), \quad \ell =1,2, \end{aligned}$$

where \(\varvec{\varepsilon }_{Z} ({\mathbf {s}}, t)\) is a mean zero bivariate Gaussian process with covariance function \(\hbox {Cov}(\varvec{\varepsilon }_{Z} ({\mathbf {s}}_i, t_j), \varvec{\varepsilon }_{Z} ({\mathbf {s}}_{i^{\prime }}, t_{j^{\prime }})) = \hbox {Cor}({\mathbf {h}}_{i,i^{\prime }},u_{j,j^{\prime }}) {\mathbf {V}}+ \phi _Z^2{\mathbf {I}}_2 1_{(i = i^{\prime })}1_{(j = j^{\prime })} \).

\(\varTheta ({\mathbf {s}},t)=\hbox {atan}^* ({Z_2({\mathbf {s}},t)} / {Z_1({\mathbf {s}},t)})\) is a circular process and, as in the WN setting, correlation between the circular variables is induced by the Gneiting spatio-temporal correlation function. To specify the prior distributions for \(\mu _{Z_1}\), \(\mu _{Z_2}\), \(\sigma _{Z_1}^2\) and \(\phi _{Z}^2\), we adopt the customary normal-inverse gamma specification. That is, \(\mu _{Z_1},\mu _{Z_2} \sim N(\cdot , \cdot )\), \(\sigma _{Z_1}^2,\phi _{Z}^2 \sim IG(\cdot ,\cdot )\) while, since \(\rho _Z\) is a correlation parameter, we adopt a truncated normal: \(\rho _Z \sim N(\cdot , \cdot )I(-1,1)\). In the sequel, this model will be denoted by PN.

We seek the predictive distribution at an unobserved location and time, \(({\mathbf {s}}_0,t_0)\). Let \({\varvec{\varTheta }}\) be the vector of observed circular values and \({\mathbf {Z}}=\{ {\mathbf {Z}}({\mathbf {s}},t), ({\mathbf {s}},t) \in \mathcal {D} \}\) be the associated linear ones. Let \({\mathbf {Z}}({\mathbf {s}}_0,t_0) = (Z_1({\mathbf {s}}_0,t_0),Z_2({\mathbf {s}}_0,t_0))^{\prime }\), \(\mathbf {R} = \{ R({\mathbf {s}},t), ({\mathbf {s}},t) \in \mathcal {D} \}\) and let \({\varvec{\varPsi }}_{Z}\) be all the parameters of the projected model.

Specifically, the predictive distribution we seek is \(\varTheta ({\mathbf {s}}_0,t_0)| {\varvec{\varTheta }}\). If we sample from the distribution of \({\mathbf {Z}} ({\mathbf {s}}_0,t_0)| {\varvec{\varTheta }}\) then \(\varTheta ({\mathbf {s}}_0,t_0)\) \(=\hbox {atan}^* ({Z_2({\mathbf {s}}_0,t_0)} / {Z_1({\mathbf {s}}_0,t_0)})\) is a sample from the desired predictive distribution. We have that

$$\begin{aligned} g({\mathbf {Z}} ({\mathbf {s}}_0,t_0)| {\varvec{\varTheta }}) = \int _{\mathbf {R}} \int _{{\varvec{\varPsi }}_Z} g({\mathbf {Z}} ({\mathbf {s}}_0,t_0)|{\varvec{\varPsi }}_Z,\mathbf {R}, {\varvec{\varTheta }}) g({\varvec{\varPsi }}_Z,{\mathbf {R}}| {\varvec{\varTheta }}) \text {d} {\varvec{\varPsi }}_Z \text {d} {\mathbf {R}}. \end{aligned}$$

So, we need to obtain \( g({\mathbf {Z}} ({\mathbf {s}}_0,t_0)|{\varvec{\varPsi }}_Z,{\mathbf {R}}, {\varvec{\varTheta }})\) and be able to sample from it. We start from the joint distribution of \({\mathbf {Z}}({\mathbf {s}}_0,t_0) , {\mathbf {Z}}| {\varvec{\varPsi }}_Z\):

$$\begin{aligned}&\displaystyle \left( \begin{array}{c} {\mathbf {Z}}({\mathbf {s}}_0,t_0) \\ {\mathbf {Z}} \end{array} \right) | {\varvec{\varPsi }}_Z\nonumber \\&\displaystyle \sim N \left( \left( \begin{array}{c} {\varvec{\mu }}_Z\\ {\mathbf {1}}_n \otimes {\varvec{\mu }}_{Z} \end{array} \right) \!, \left( \begin{array}{cc} 1 &{}{{\mathbf {C}}}_{{\mathbf {Z}},{\mathbf {Z}}({\mathbf {s}}_0,t_0)}^{\prime }\\ {{\mathbf {C}}}_{{\mathbf {Z}},{\mathbf {Z}}({\mathbf {s}}_0,t_0)}&{} {\mathbf {C}}_{\mathbf {Z}} \end{array} \right) \otimes {\mathbf {V}} +\phi _Z^2 {\mathbf {I}}_{2n+2} \right) , \end{aligned}$$

where \({\mathbf {C}}_{\mathbf {Z}}\) and \({{\mathbf {C}}}_{{\mathbf {Z}},{\mathbf {Z}}({\mathbf {s}}_0,t_0)}\) are the analogues of \({\mathbf {C}}_{\mathbf {Y}}\) and \({{\mathbf {C}}}_{{\mathbf {Y}},Y({\mathbf {s}}_0,t_0)}\) for the process \({\mathbf {Z}}({\mathbf {s}},t)\). The conditional distribution of \({\mathbf {Z}}({\mathbf {s}}_0,t_0) | {\mathbf {Z}}, {\varvec{\varPsi }}_Z\) (equivalently \({\mathbf {Z}}({\mathbf {s}}_0,t_0) | {\varvec{\varTheta }}, {\mathbf {R}}, {\varvec{\varPsi }}_Z\)) is bivariate normal with mean

$$\begin{aligned} \hbox {M}_{{\mathbf {Z}}({\mathbf {s}}_0,t_0)} = {\varvec{\mu }}_Z + {{\mathbf {C}}}_{{\mathbf {Z}},{\mathbf {Z}}({\mathbf {s}}_0,t_0)}^ {\prime }\otimes {\mathbf {V}} \left( {\mathbf {C}}_{\mathbf {Z}} \otimes {\mathbf {V}} + \phi _Z^2 {\mathbf {I}}_{2n} \right) ^{-1}({\mathbf {Z}}- {\mathbf {1}}_n \otimes {\varvec{\mu }}_{Z}) \end{aligned}$$

and variance

$$\begin{aligned} \hbox {V}_{{\mathbf {Z}}({\mathbf {s}}_0,t_0)} = {\mathbf {V}}+\phi _Z^2 {\mathbf {I}}_{2}- {{\mathbf {C}}}_{ {\mathbf {Z}},{\mathbf {Z}}({\mathbf {s}}_0,t_0)}^{\prime }\otimes {\mathbf {V}} \left( {\mathbf {C}}_{\mathbf {Z}} \otimes {\mathbf {V}} + \phi _Z^2 {\mathbf {I}}_{2n} \right) ^{-1} {{\mathbf {C}}}_{{\mathbf {Z}}, {\mathbf {Z}}({\mathbf {s}}_0,t_0)}\otimes {\mathbf {V}}. \end{aligned}$$

Using the posterior samples \(\{{\mathbf {R}}_l^*, {\varvec{\varPsi }}_{Z,l}^* , l=1,2,\ldots ,L\}\) we can collect samples of \(\varTheta _l^*({\mathbf {s}}_0,t_0)\) from its posterior predictive distribution.
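A single composition draw mirrors the wrapped case, now with the Kronecker-structured covariance above. The sketch below uses our own argument names and assumes \({\mathbf {Z}}\) is stored as the 2n-vector of pairs stacked by space–time point, consistent with the mean \({\mathbf {1}}_n \otimes {\varvec{\mu }}_{Z}\).

```python
import numpy as np

def krige_projected_draw(Z, mu_Z, V, phi2_Z, C_Z, c_0, rng):
    """One draw Theta*(s0,t0) given a posterior sample of (R, Psi_Z).
    C_Z is the n x n Gneiting correlation matrix, c_0 the correlation vector
    with the new point, and V the 2 x 2 scale matrix."""
    n = C_Z.shape[0]
    Sigma = np.kron(C_Z, V) + phi2_Z * np.eye(2 * n)      # Cov(Z | Psi_Z)
    cross = np.kron(c_0, V)                               # C'_{Z,Z(s0,t0)} kron V, a 2 x 2n block
    m = mu_Z + cross @ np.linalg.solve(Sigma, Z - np.tile(mu_Z, n))
    S = V + phi2_Z * np.eye(2) - cross @ np.linalg.solve(Sigma, cross.T)
    z_new = rng.multivariate_normal(m, S)
    return np.arctan2(z_new[1], z_new[0]) % (2.0 * np.pi)
```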

4 Simulated examples

The Gneiting correlation function (3) has not been widely investigated within a Bayesian framework. The aim of this simulation study is essentially to provide a proof of concept. If space–time dependence, captured through the Gneiting correlation function, is driving an observed spatio-temporal circular dataset, can we learn about this dependence and can we demonstrate improved predictive performance by incorporating it in our modeling? We explore several different choices of parameters in (3).

For each proposed model, we simulated 48 datasets with \(n=240\) (20 locations and 12 time points), with spatial coordinates uniformly generated in \([0,10]\times [0,10]\). Twenty-four datasets for the WN model were simulated from all possible combinations of \((a,c)= \{(1,0.2), (0.2,1) \}\), \(\beta =\{0,0.5,1\}\), \(\alpha =\{ 0.5,0.8 \}\), \(\gamma =\{ 0.5,0.8 \}\) and \((\mu _{Y},\sigma _Y^2, \phi _Y^2) = (\pi ,0.1, 0.01)\). In the other 24 datasets we used the same combinations of correlation parameters but with \((\mu _{Y},\sigma _Y^2, \phi _Y^2) = (\pi ,1, 0.1)\). The datasets cover a wide range of situations in terms of spatio-temporal correlation: strong spatial correlation with weak temporal correlation \(((a,c)=(1,0.2))\), weak spatial correlation with strong temporal correlation \(((a,c)=(0.2,1))\), fully separable spatio-temporal correlation (\(\beta =0\)), non-separable correlation (\(\beta >0\)) and two levels for the smoothing parameters. The difference between the two collections of 24 datasets is that the first 24 have smaller circular variance than the remaining ones, where the circular variance was computed as one minus the mean resultant length, i.e., one minus the resultant length divided by the sample size (Jammalamadaka and SenGupta 2001, p. 15).

The projected normal datasets were built according to the same rationale adopted for the wrapped normal, i.e., we built 24 datasets with small circular variance and 24 datasets with large circular variance. We simulated from unimodal projected distributions adopting the following sets of parameters:

  • all possible combinations of \((a,c)= \{(1,0.2), (0.2,1) \}\), \(\beta =\{0,0.5,1\}\), \(\alpha =\{ 0.5,0.8 \}\), \(\gamma =\{ 0.5,0.8 \}\) with \((\mu _{Z_1}, \mu _{Z_2}, \sigma _{Z_1}^2, \rho _Z,\phi _Z^2)= (2.5,2.5,1,0,0.01) \) which yields a circular variance close to the WN examples with \(\sigma _Y^2=0.1\).

  • all possible combinations of \((a,c)= \{(1,0.2), (0.2,1) \}\), \(\beta =\{0,0.5,1\}\), \(\alpha =\{ 0.5,0.8 \}\), \(\gamma =\{ 0.5,0.8 \}\) with \((\mu _{Z_1}, \mu _{Z_2}, \sigma _{Z_1}^2, \rho _Z , \phi _Z^2)= (0.85,0.85,1,0,0.1) \) which, again, yields a circular variance close to the WN examples with \(\sigma _Y^2=1\).

The parameters for the prior distributions were chosen so that the priors were centered on the “true” values used to simulate each dataset:

  • correlation parameters: \(a=0.2 \Rightarrow a \sim G(2,5)\), \(a=1 \Rightarrow a \sim G(5,4)\), \(c=0.2 \Rightarrow c \sim G(2,5)\), \(c=1 \Rightarrow c \sim G(5,4)\), \(\alpha =0.5 \Rightarrow \alpha \sim B(5,5)\), \(\alpha =0.8 \Rightarrow \alpha \sim B(6,1.5)\), \(\beta =0 \Rightarrow \beta \sim B(1,4)\), \(\beta =0.5 \Rightarrow \beta \sim B(5,5)\), \(\beta =0.9 \Rightarrow \beta \sim B(6,1.5)\), \(\gamma =0.5 \Rightarrow \gamma \sim B(5,5)\), \(\gamma =0.8 \Rightarrow \gamma \sim B(6,1.5)\);

  • parameters of the WN model: \(\mu _Y=\pi \Rightarrow \mu _Y \sim WrapN(\pi ,5)\), \(\sigma _{Y}^2 =0.1 \Rightarrow \sigma _{Y}^2 \sim IG(4.5, 0.55)\), \(\sigma _{Y}^2 =1 \Rightarrow \sigma _{Y}^2 \sim IG(2.01, 4.01)\), \(\phi _{Y}^2 =0.01 \Rightarrow \phi _{Y}^2 \sim IG(2.001, 0.03)\), \(\phi _{Y}^2 =0.1 \Rightarrow \phi _{Y}^2 \sim IG(4.5, 0.55)\);

  • parameters of the PN model: \(\mu _{Z_1} = 2.5 \Rightarrow \mu _{Z_1} \sim N(2.5,5)\), \(\mu _{Z_1} = 0.85 \Rightarrow \mu _{Z_1} \sim N(0.85,5)\) \(\mu _{Z_2} = 2.5 \Rightarrow \mu _{Z_2} \sim N(2.5,5)\), \(\mu _{Z_2} = 0.85 \Rightarrow \mu _{Z_2} \sim N(0.85,5)\), \(\sigma _{Z_1}^2 = 1 \Rightarrow \sigma _{Z_1}^2 \sim IG(2.01,4.01)\), \(\rho _{Z} = 0 \Rightarrow \rho _{Z}\sim N(0,1)I(-1,1)\), \(\phi _{Z}^2 =0.01 \Rightarrow \phi _{Z}^2 \sim IG(2.001, 0.03)\), \(\phi _{Z}^2 =0.1 \Rightarrow \phi _{Z}^2 \sim IG(4.5, 0.55)\).

Among the 240 simulated observations in each dataset, 170 points, chosen between the first and tenth time points, were used for estimation, and the remaining 70 points were set aside for validation purposes. The predictive performance was evaluated using two criteria. We computed an average prediction error (APE), defined as the average circular distance between a validation dataset and the model predicted values, where we adopted as circular distance \(d(\alpha ,\beta )=1-\cos (\alpha -\beta )\) (Jammalamadaka and SenGupta 2001, p. 15). In particular, if the validation set has \(n^*\) observations, the APE is \( \frac{1}{n^*} \sum _{({\mathbf {s}}_0,t_0)}d(\mu ({\mathbf {s}}_0,t_0| {\mathbf {X}}), x({\mathbf {s}}_0, t_0))\) for the models based on the wrapped normal and \( \frac{1}{n^*} \sum _{({\mathbf {s}}_0,t_0)} d(\mu ({\mathbf {s}}_0,t_0| {\varvec{\varTheta }}), \theta ({\mathbf {s}}_0, t_0))\) for the projected normal ones. Here, \(x({\mathbf {s}}_0, t_0)\) and \( \theta ({\mathbf {s}}_0, t_0)\) are the realizations of the processes at \(({\mathbf {s}}_0,t_0)\), and \(\mu ({\mathbf {s}}_0,t_0| {\mathbf {X}})\) and \(\mu ({\mathbf {s}}_0,t_0| {\varvec{\varTheta }})\) are the posterior mean directions.
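The APE computation is straightforward; the sketch below (our own function names) obtains the posterior mean direction from the predictive samples by the usual vector averaging and then averages the circular distances.

```python
import numpy as np

def mean_direction(samples):
    """Posterior mean direction of a set of angular predictive samples."""
    return np.arctan2(np.sin(samples).mean(), np.cos(samples).mean()) % (2.0 * np.pi)

def ape(pred_mean_dirs, held_out):
    """Average prediction error: mean of d(a, b) = 1 - cos(a - b) over the validation set."""
    return np.mean(1.0 - np.cos(np.asarray(pred_mean_dirs) - np.asarray(held_out)))
```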

We also computed the continuous ranked probability score (CRPS) for circular variables as defined in Grimit et al. (2006):

$$\begin{aligned} \text {CRPS}(F, \delta )= E(d(\Delta ,\delta ))-\frac{1}{2}E(d(\Delta ,\Delta ^*)), \end{aligned}$$
(6)

where F is a predictive distribution, \(\delta \) is a holdout value, and \(\Delta \) and \(\Delta ^*\) are independent copies of a circular variable with distribution F. In this form, small values of CRPS are preferred.

For both models we do not know F in closed form, but we can compute a Monte Carlo approximation of (6). For the wrapped model, the CRPS for a held-out space–time point \(({\mathbf {s}}_0,t_0)\) is

$$\begin{aligned} \frac{1}{L}\sum _{l=1}^L d(x_l^*({\mathbf {s}}_0,t_0),x({\mathbf {s}}_0,t_0))-\frac{1}{2L^2}\sum _{l=1}^L\sum _{j=1}^L d(x_l^*({\mathbf {s}}_0,t_0),x_j^*({\mathbf {s}}_0,t_0)) \end{aligned}$$

and for the projected model it is

$$\begin{aligned} \frac{1}{L}\sum _{l=1}^L d( \theta _l^*({\mathbf {s}}_0,t_0),\theta ({\mathbf {s}}_0,t_0))-\frac{1}{2L^2}\sum _{l=1}^L\sum _{j=1}^L d (\theta _l^*({\mathbf {s}}_0,t_0),\theta _j^*({\mathbf {s}}_0,t_0)). \end{aligned}$$
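Both approximations reduce to the same short routine applied to the relevant set of predictive samples; the following is a sketch of the Monte Carlo computation with our own function names.

```python
import numpy as np

def circular_crps(samples, held_out):
    """Monte Carlo approximation of (6) at one held-out point, from L
    posterior predictive samples of the circular variable."""
    samples = np.asarray(samples)
    d_obs = np.mean(1.0 - np.cos(samples - held_out))
    d_pair = np.mean(1.0 - np.cos(samples[:, None] - samples[None, :]))
    return d_obs - 0.5 * d_pair
```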
Fig. 1 Simulation study: CRPS comparing performances of the two proposed models. a WN. b PN

For each of the 48 simulated datasets, the values of the mean CRPS under the two models, computed over the set of points used for model validation, are shown in Fig. 1. For both models we see that the CRPS depends heavily on the variance of the process, but seems unaffected by changes in the other parameters.

A potentially important difference between the two models is the computational time required to fit them. The WN model is computationally more efficient than the PN model (see Supplementary Online Material, Section S1): at each MCMC iteration, fitting the PN requires roughly 8 times as many operations as fitting the WN. If computational time is a relevant issue, then the WN may be more attractive.

5 Real data

We model wave directions obtained as outputs from a deterministic computer model implemented by Istituto Superiore per la Protezione e la Ricerca Ambientale (ISPRA). The computer model starts from a wind forecast model predicting the surface wind over the entire Mediterranean. The hourly evolution of sea wave spectra is obtained by solving energy transport equations using the wind forecast as input. Wave spectra are locally modified using a source function describing the wind energy, the energy redistribution due to nonlinear wave interactions, and energy dissipation due to wave fracture. The model produces estimates every hour on a grid with \(10\times 10\) km cells (Speranza et al. 2004, 2007). The ISPRA dataset has forecasts for a total of 4941 grid points over the Italian Mediterranean. Over the Adriatic Sea area, there are 1494 points.

Fig. 2 Time windows for different sea states used for validation. The four panels represent the observed wave direction over the entire area at: a 12:00 on 5/5/2010 (storm); b 00:00 on 6/5/2010 (transition between storm and calm); c 00:00 on 7/5/2010 (calm); d 12:00 on 7/5/2010 (one-step prediction, calm)

Our aim is to compare the performance of the WN and PN models. From a phenomenological perspective, the PN model is arguably the more natural choice since we are not wrapping a linear scale to obtain the directions. However, the WN model does provide a suitable model and, as suggested above, it may be attractive in terms of computational efficiency and interpretability of parameters. In the selected dataset, all three sea states, calm, transition and storm, are present. The sea state is defined through the wave height (which is also supplied by the computer model output): when this height is below 1 m we have calm; when it is between 1 and 2 m we have transition (between calm and storm); and when it is greater than 2 m we have storm. Wave directions vary more in calm than in storm. Here, we seek to learn about the spatio-temporal structure of the data relying only on the specification of the correlation function. We will use the information given by the wave heights in the models proposed in Sect. 6.1.

We fitted the model using 100 spatial points \(\times \) 10 time points 6 h apart (1000 observations in total) in order to have a dataset including all sea states. Notice that spatial distances are evaluated in kilometers. Then, we developed four validation datasets, each with 350 spatial points and 1 time point. Specifically, we have one dataset for each sea state plus one for a one-step forward prediction. Finally, we used the model fitted over the 1000 points to predict each validation dataset. Three of the datasets are inside the time window used for model estimation, one in calm sea, one in transition and one during a storm. The fourth validation set is at 12:00 on May 7, 2010, 6 h after the last time used for model fitting. The observed circular process in each of these four time windows can be seen in Fig. 2. For each time window and model we computed the mean CRPS and APE, see Table 1. Furthermore, we computed the mean CRPS and APE over the four time windows.

Table 1 Real data example: CRPS and APE for the WN and PN models computed on each validation dataset

Following our discussion in Sects. 2.1 and 3, we used the following priors: \(a\sim G(1.5,1)\), \(c\sim G(1.5,1)\), \(\alpha \sim B(2,2.5)\), \(\beta \sim B(1.1,2)\), \(\gamma \sim B(2,2.5)\), \(\sigma _Y^2\sim IG(2,2)\), \(\phi _Y^2\sim IG(1,0.25)\), \(\mu _Y\sim WrapN(\pi ,10)\), \(\mu _{Z_1} \sim N(0,10)\), \(\mu _{Z_2} \sim N(0,10)\), \(\rho _{Z} \sim N(0,5)I(-1,1)\), \(\sigma _Z^2\sim IG(2,2)\) and \(\phi _Z^2 \sim IG(1, 0.25)\). Notice that all these distributions are weakly informative. Also, the prior for \(\beta \) is centered near 0.1, i.e., close to the separable model. The priors for the decay parameters in space and time were chosen on the basis of the minimum and maximum observed distances in space and time, so that they concentrate their probability mass over the corresponding ranges.

As expected, the predictive capability of the two models, in terms of both CRPS and APE, is poorest in the calm state, the variance being larger than in the other states. On the other hand, for both models prediction is very accurate during a storm or a transition, as we can see in Table 1. The PN always performs better than the WN. The largest difference between the APE values of the two models (0.022) is observed during the calm sea time window.

Table 2 Real data example: mean point estimate (PE) and 95 % credible interval (CI) for the correlation parameters for the WN and PN models

In Table 2 we give credible intervals and posterior mean estimates for the parameters of the correlation function. For both models, a non-separable correlation structure is strongly supported. The point estimates of the spatial (c) and temporal (a) decay are smaller in the PN model. Notice that the data are bimodal whenever the wave directions look like those in Fig. 2c, d, i.e., when over a large region at a given time a storm is rotating or two different weather systems are meeting. Then, scalar statistics, such as the overall mean direction or the overall concentration, may not be informative regarding this behaviour.

In the Supplementary Online Material, we provide the parameter estimates for the wrapped and projected distributions with associated 95 % credible intervals (Table S1). Since \(\mu _Y\) is defined on a circular domain (recall that the prior on \(\mu _Y\) is \(WrapN(\cdot , \cdot )\)), following Jona Lasinio et al. (2012), we can compute a 95 % credible interval as the arc that contains the central 95 % of the posterior samples.

6 Extending the models

In the framework of the wrapped and projected normal models, introducing covariate information to explain the angular response is straightforward. For the wrapped approach we revise the linear version (4) to \(Y({\mathbf {s}},t)=\mu _Y({\mathbf {s}},t)+{\varepsilon }_Y({\mathbf {s}},t)\).

The external variables can be introduced by modeling the mean of the circular process. A linear specification induces a circular likelihood for the regression coefficients that has infinitely many maxima of comparable size, since this model wraps the line infinitely many times around the circle (see, for example, Johnson and Wehrly 1978; Fisher and Lee 1992). To address this problem it is customary to limit the domain of \(\mu _Y({\mathbf {s}}, t)\) using a link function, i.e., \(\mu _Y({\mathbf {s}}, t)= \mathcal {L}({\mathbf {H}}\left( {\mathbf {s}},t \right) \varvec{\eta })\), where \(\mathcal {L}(\cdot ):{\mathbb {R}} \rightarrow I\) is the link function and I is some interval of \({\mathbb {R}}\) of length equal to the circular variable period, in our case \(2 \pi \). We employ the inverse tan link (Fisher and Lee 1992).
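To fix ideas, a link in the spirit of Fisher and Lee (1992) maps the linear predictor through \(2\,\hbox {atan}(\cdot )\), giving an interval of length \(2\pi \); in the sketch below the offset centering that interval is our own illustrative device.

```python
import numpy as np

def inverse_tan_link(linear_predictor, offset=0.0):
    """Maps H(s,t) @ eta from the real line onto an interval of length 2*pi,
    here (offset - pi, offset + pi)."""
    return offset + 2.0 * np.arctan(np.asarray(linear_predictor))
```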

Table 3 Real data example: CRPS and APE for WNR, WNA, PNR and PNA models computed on each validation dataset
Fig. 3 Real data example: CRPS surfaces for the WN (first column) and PN (second column) models, under calm (first row), transition (second row) and storm (third row) states. Scales differ across states

If only categorical covariates are available we do not need a link function; we can adopt an ANOVA representation for the relation between circular response and discrete covariates. This is computationally more efficient (see Supplementary Online Material, Section S1). Illustratively, suppose we have two predictors, with \(m_1\) and \(m_2\) levels, respectively, say \({\mathbf {H}}_1=(H_{1,1}, \ldots ,H_{1,m_1})\) and \({\mathbf {H}}_2=(H_{2,1}, \ldots , H_{2,m_2})\). Then, to simplify the condition ensuring \(\mu _Y\left( {\mathbf {s}},t \right) \in I\), we use the following parametrization:

$$\begin{aligned} \mu _Y\left( {\mathbf {s}},t \right) = \sum _{i=1}^{m_1}\sum _{j=1}^{m_2} \mu _{Y,im_2+j} {1}_{\left( H_1\left( {\mathbf {s}},t \right) =H_{1,i} \right) }{1}_{\left( H_2\left( {\mathbf {s}},t \right) =H_{2,j} \right) }. \end{aligned}$$

We can also introduce the covariates into the specifications for the variances, creating \(\sigma _Y^2({\mathbf {s}},t)\) and \(\phi _Y^2({\mathbf {s}},t)\). Again, we consider ANOVA-type models, e.g., \( \sigma ^2_Y\left( {\mathbf {s}},t \right) = \sum _{i=1}^{m_1}\sum _{j=1}^{m_2} \sigma ^2_{Y,im_2+j} {1}_{\left( H_1\left( {\mathbf {s}},t \right) =H_{1,i} \right) }{1}_{\left( H_2\left( {\mathbf {s}},t \right) =H_{2,j} \right) } \) and \( \phi ^2_Y\left( {\mathbf {s}},t \right) = \sum _{i=1}^{m_1}\sum _{j=1}^{m_2}\phi ^2_{Y,im_2+j} {1}_{\left( H_1\left( {\mathbf {s}},t \right) =H_{1,i} \right) }{1}_{\left( H_2\left( {\mathbf {s}},t \right) =H_{2,j} \right) } \).
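For concreteness, the ANOVA-type cell lookup can be coded as below; this is a sketch with 0-based level indices, where the flattening mirrors the subscript \(im_2+j\) used above.

```python
import numpy as np

def anova_term(h1, h2, cell_values, m2):
    """Return the cell-specific value (mean, sigma^2 or phi^2) for each observation,
    given 0-based levels h1 in {0,...,m1-1} and h2 in {0,...,m2-1}."""
    idx = np.asarray(h1) * m2 + np.asarray(h2)
    return np.asarray(cell_values)[idx]
```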

We investigate two models, both with an ANOVA parametrization for \(\sigma ^2_Y\left( {\mathbf {s}},t \right) \) and \(\phi ^2_Y\left( {\mathbf {s}},t \right) \) while for the mean, one has an ANOVA parametrization (WNA) and the other has a regression form (WNR). Below, we obtain an ANOVA form if we work with sea state and a regression form if we work with wave height. As prior distributions we propose: \(N(\cdot ,\cdot )\) for \(\eta _{Y,i}, i=1,2, \ldots \), that is, a customary prior for a regression coefficient; \(WrapN(\cdot , \cdot )\) for \(\mu _{Y,i}, i = 1,2,\ldots \), the circular equivalent of a normal prior over mean level; and \(IG(\cdot , \cdot )\) for \(\sigma _{Y,i}^2\) and \(\phi _{Y,i}^2, \, i=1,2, \ldots \), that is, the customary prior for a variance. To sample from the predictive distribution, we adopt the same procedure used above for the WN model.

Table 4 Real data example: mean point estimate (PE) and 95 % credible interval (CI) for the correlation parameters of the WNA, WNR, PNA and PNR models
Table 5 Real data example: mean point estimate (PE) and 95 % credible interval (CI) of the parameters of the WNA and WNR models
Table 6 Real data example: mean point estimate (PE) and 95 % credible interval (CI) for the parameters of the PNA and PNR models

To introduce dependence on covariates in the projected normal model, following Wang and Gelfand (2013), we revise Eq. (5) to \(Z_{\ell }({\mathbf {s}}, t) = \mu _{Z_\ell }({\mathbf {s}},t)+ \omega _{Z_\ell }({\mathbf {s}}, t)+{\tilde{\varepsilon }}_{Z_\ell }({\mathbf {s}}, t), \ell =1,2 \) where the mean of the linear bivariate process is a function of space and/or time and \({\tilde{\varepsilon }}_{Z_\ell }({\mathbf {s}},t) \mathop {\sim }\limits ^{iid} N(0, \phi _Z^2({\mathbf {s}},t))\). Then we marginalize over \({\varvec{\omega }}_{Z}({\mathbf {s}},t)\) to obtain \(Z_{\ell }({\mathbf {s}}, t) =\) \( \mu _{Z_\ell }({\mathbf {s}},t)+{\varepsilon }_{Z_\ell }({\mathbf {s}}, t), \ell =1,2\). We write \(\mu _{Z_\ell }({\mathbf {s}},t) = {\mathbf {H}}({\mathbf {s}},t)\varvec{\eta }_{Z_{\ell }}, \ell =1,2\) and \( \phi ^2_{Z}\left( {\mathbf {s}},t \right) = \sum _{i=1}^{m_1}\sum _{j=1}^{m_2} \phi ^2_{Z,im_2+j}{1}_{\left( H_1\left( {\mathbf {s}},t \right) =H_{1,i} \right) }{1}_{\left( H_2\left( {\mathbf {s}},t \right) =H_{2,j} \right) }\), where \(\varvec{\eta }_{Z_{\ell }} = (\eta _{Z_{\ell },1},\eta _{Z_{\ell },2}, \ldots )^{\prime }\). Note that, depending on the types of variables in \({\mathbf {H}}({\mathbf {s}},t)\), continuous or categorical, we can specify a (projected normal) regression (PNR) or (projected normal) ANOVA (PNA). As noted in Wang and Gelfand (2014), there is complex interaction among the parameters in the general projected normal, complicating interpretation of the behavior of the resulting projected normal distributions as we vary them. With the same rationale used for the priors of the WNA and WNR models, we propose \(\eta _{Z_{\ell },i}\sim N(\cdot ,\cdot ),\,l=1,2,\,i=1,2,\ldots \) and \(\phi _{Z,i} \sim IG(\cdot ,\cdot ),\,i=1,2,\ldots \). Here, again, we can sample from the predictive distribution adopting the same procedure as illustrated in Sect. 3.1.

6.1 Application to the wave data

We fitted the new models using the same dataset as in Sect. 5. For the ANOVA representation we used, as a categorical variable, the state of the sea while for the regression setting we used the significant wave height. Adopting the same rationale as in Sect. 5, the prior distributions for the regression coefficients (\(\eta _{Y,j,i}\) and \(\eta _{Z_{\ell },j,i},\,j=0,1,\, i=calm,trans,storm\)) were all N(0, 10). For the ANOVA coefficients, \(\mu _{Y,i}\) and \(\mu _{Z_{\ell },i}\), they were all \(WrapN(\pi ,10)\). For the \(\sigma _{Y,i}^2\), they were all IG(2, 2) and for the \(\phi _{Y,i}\) and \(\phi _{Z,i}\) they were all IG(1, 0.25). The prior distributions for the other parameters were the same as those used in Sect. 5.

From Table 3 we see that the WNA model is generally preferred to the WNR. For the projected models, APE and CRPS are almost indistinguishable between PNA and PNR during transition. With one-step ahead predictions, the two criteria return contradictory choices; PNR is preferred under CRPS, PNA under APE. With the calm sea state, the CRPS chooses PNR while the APE does not yield a clear decision. With the storm state, both criteria are lower with the PNA model. Overall, our selection would be the PNA model but, more importantly, we value the informative comparison our approach enables. In fact, the remarkable improvement of PNA over PNR in storm is likely due to the very high predictability of direction during a storm period. In this regard, the PN models are generally preferred to the WN models except in storm, where WNR, WNA, and PNA are essentially equivalent.

To analyze the local behavior of model fitting, in Fig. 3 we report CRPS surfaces, evaluated in calm, transition and storm, for the two "best average APE" models, the WNA (see Table 3) and the PN (see Table 1). We see that the local behavior of the models is very similar. The worst predictions are found around the Gargano peninsula during calm. This is consistent with the physics of wave movement since, around the peninsula, local winds play a more relevant role, inducing very high variability in wave directions. The same behavior is seen with the other models. In terms of parameter estimation, the WNA and PN models suggest a non-separable model (Tables 2 and 4) with very strong spatial (c) and temporal (a) dependence. WNA suggests that a different nugget is necessary for each sea state: analyzing the credible intervals of these parameters, we observe that the nuggets for the different sea states differ significantly, as their credible intervals do not overlap (Table 5). For the projected normal models (Table 6), all nugget credible intervals substantially overlap, suggesting that one nugget should be enough to model all sea states.

7 Conclusions

We have presented a range of models for spatio-temporal circular data based on the wrapped and projected normal distributions, incorporating space–time dependence, allowing explanatory variables, introducing a nugget, and implementing kriging and forecasting. The models based on the projected normal are more flexible since they allow bimodal and asymmetric distributions, while the wrapped normal is unimodal and symmetric. On the other hand, the wrapped normal models are easier to interpret and are computationally better behaved and more efficient. Predictions obtained under the two models are very close and almost indistinguishable when the data are roughly unimodal and symmetric (see Supplementary Online Material, Section S2). Thus, if fast computation is sought, WN models become attractive.

The projected normal process can be straightforwardly extended to general directional fields on the sphere since the projected normal distribution is well defined in this case, see Mardia and Jupp (1999). The wrapped Gaussian process is not easily extended to a sphere. In fact, we are unaware of any approach to wrap multivariate linear data onto spheres. Conceptually, such wrapping would not appear to be well defined.

Future work will find us enriching wrapped modeling to allow asymmetry through the use of skewed distributions. Skewness is easy to introduce by wrapping skew normal distributions. In a completely different direction, we are also extending the modeling to explore spatio-temporal data consisting of geo-coded locations with periodic (in time) behaviour that can be represented as a circular variable. There, we work with trivariate GPs in space and time, incorporating temporal projection.