
1 Introduction

Although a large variety of spatial data sets (on radioactive contamination, rainfall, winds, etc.) contain measurements with a considerable amount of skewness, the modelling of such data still remains an issue. For instance, with regard to radiological monitoring, in [5], disregarding any physically-based modelling approach, it is argued that mapping algorithms for emergency detection must be developed taking into consideration the skewness in the data. A boost to these developments came from the Spatial Interpolation Comparison (SIC) 2004 (see [11]), in which, whereas the routine scenario could easily be modelled using a Gaussian random field, the emergency scenario, which mimics an accidental release of radioactivity, needed to be modelled taking properly into account that, due to the presence of extreme measurements, the data are positively skewed. Just to cite a few works, to deal with skewed measurements coming from radioactive monitoring, [18] and [15] propose copula-based geostatistical approaches, whereas [9] argues that the structuring of extreme values can be handled in a coherent manner by using the class of Hermitian isofactorial models. Moreover, [4] proposes a Gaussian anamorphosis transformation to deal with skewed data coming from contaminated facilities, and [19] argues in favor of a Bayesian approach, pointing out that both the Gaussian copula and the non-Gaussian χ²-copula models are inappropriate for modelling strongly skewed radioactivity measurements. Other works dealing with skewed radiological measurements are [27], which is concerned with the estimation of the variogram and the development of optimal sampling plans, [7], which proposes a dynamic spatial Bayesian model for non-Gaussian measurements from radioactivity deposition, as well as the works in [22, 28] and [32]. 
On the other hand, a general approach developed to cope with some types of univariate non-Gaussian spatial data (including skew data) has been proposed in [8] by defining a family of transformed Gaussian random fields that provides an alternative to trans-Gaussian kriging.

Whereas in the univariate case, that is, in the presence of just one regionalized variable, spatial modelling and prediction have been extensively studied for different types of non-Gaussian data, in particular skewed data, in the multivariate non-Gaussian context only a limited number of works have been published. Among these, [26] and [25] extend the modelling approach of [10] to multivariate non-Gaussian geostatistical data, whereas [6] proposes a hierarchical Bayesian approach to model Gaussian, count, and ordinal variables by designing a Gibbs sampler with Metropolis-Hastings steps. Other works dealing with multivariate spatial data are those in [33], which explores the use of the Bayesian Maximum Entropy approach in the presence of both continuous and categorical regionalized variables, and in [31], which uses Markov chain Monte Carlo methods for the Bayesian modelling of multivariate counts.

In this paper, to model skewness in a multivariate (that is, in the presence of more than one regionalized variable) geostatistical context, we propose an alternative approach based on the use of the skew-normal distribution. Our modelling approach, which extends some of the ideas in [24] (see also [35]), is based on the skew-normal distribution [2, 3] and on a latent Gaussian factor structure. Just to give some examples, this approach might prove useful in the modelling of the radiological data in [16] or the data related to the Fukushima disaster (data are available from TEPCO at http://www.tepco.co.jp), where more than one radiological measurement has been collected at each sampling site. Apart from providing much greater flexibility than traditional Gaussian random fields, it is possible to show that our model has all its finite-dimensional marginal distributions belonging to the family of closed skew-normal distributions [13, 14]. It must be mentioned that the modelling construction proposed here is substantially different from some of the most popular constructions based on the skew-normal distribution that have recently appeared in the literature to model univariate skewed spatial data, like those, for instance, of [1, 20] and [17] (for a critical discussion on these constructions see [24]).

The paper is organized as follows. The model and its properties are presented in Sect. 2 and in Sect. 3, respectively. In Sect. 4 we present the estimation and prediction procedures and some simulation results, and in the “Conclusion” section we make some final comments. More technical results are presented in the Appendix.

2 A Multivariate Closed Skew-Normal Geostatistical Model

In the following we define a model for multivariate skewed geostatistical data, exploiting the ideas in [24] and in [25], by building the model on an unobserved latent Gaussian spatial factor structure. Let \(y_{i}\left (\mathbf{x}_{k}\right )\), i = 1, …, m, k = 1, …, K, be a set of geo-referenced data measurements relative to m regionalized variables, gathered at K spatial locations \(\mathbf{x}_{k}\). Each of these m measured variables can be viewed as a partial realization of a particular stochastic process \(Y _{i}\left (\mathbf{x}\right )\), i = 1, …, m, \(\mathbf{x} \in \mathbb{R}^{2}\). We assume that these stochastic processes are given by

$$\displaystyle{ Y _{i}\left (\mathbf{x}\right ) =\beta _{i} + Z_{i}\left (\mathbf{x}\right ) +\omega _{i}S_{i}\left (\mathbf{x}\right ),\quad i = 1,\ldots,m, }$$
(1)

where β i and ω i are unknown constants, representing, respectively, an intercept and a scale parameter, and \(Z_{i}\left (\mathbf{x}\right )\) and \(S_{i}\left (\mathbf{x}\right )\) are latent processes. In particular, for every i = 1, …, m, \(Z_{i}\left (\mathbf{x}\right )\) is a mean zero stationary Gaussian process, whereas for every i = 1, …, m, and for each \(\mathbf{x} \in \mathbb{R}^{2}\), \(S_{i}\left (\mathbf{x}\right )\) is an independent random variable distributed as a skew-normal [2], that is, \(S_{i}\left (\mathbf{x}\right ) \sim \mathit{SN}\left (0,1,\alpha _{i}\right )\), which means that, for every \(\mathbf{x} \in \mathbb{R}^{2}\), the density of S i (x) is given by \(f_{S_{i}}(s) = 2\phi _{1}(s;1)\varPhi (\alpha _{i}s)\), for −∞ < s < ∞, where \(\alpha _{i} \in \mathbb{R}\), ϕ 1(⋅ ; 1) is the scalar normal density function with zero mean and unit variance, and Φ(⋅ ) is the scalar N(0, 1) distribution function.
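To make the distributional assumption on S i (x) concrete, the following minimal sketch (ours, not part of the paper) draws from SN(0, 1, α) using the classical stochastic representation of the skew-normal and checks its known mean δ√(2/π) by simulation; the function name and the value of α are illustrative.

```python
import numpy as np

def sample_skew_normal(alpha, size, seed=None):
    """Draw from SN(0, 1, alpha) via the stochastic representation
    S = delta*|U0| + sqrt(1 - delta^2)*U1, with U0, U1 independent
    N(0, 1) and delta = alpha / sqrt(1 + alpha^2)."""
    rng = np.random.default_rng(seed)
    delta = alpha / np.sqrt(1.0 + alpha**2)
    u0 = rng.standard_normal(size)
    u1 = rng.standard_normal(size)
    return delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1

# E[S] = delta*sqrt(2/pi) for S ~ SN(0, 1, alpha): check by simulation
alpha = 2.0
delta = alpha / np.sqrt(1.0 + alpha**2)
s = sample_skew_normal(alpha, size=200_000, seed=0)
print(s.mean(), delta * np.sqrt(2.0 / np.pi))  # the two values are close
```

The same representation can be reused to simulate realizations of the full model (1), since each S i (x) is drawn independently across locations.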

Let us note that, for each i = 1, …, m, and for every \(\mathbf{x} \in \mathbb{R}^{2}\), conditionally on \(Z_{i}\left (\mathbf{x}\right )\), the random variable \(Y _{i}\left (\mathbf{x}\right )\) has a skew-normal distribution, that is,

$$\displaystyle{ Y _{i}\left (\mathbf{x}\right )\vert Z_{i}\left (\mathbf{x}\right ) \sim \mathit{SN}\left (\beta _{i} + Z_{i}\left (\mathbf{x}\right ),\omega _{i}^{2},\alpha _{ i}\right ), }$$
(2)

which means that we can write its density as

$$\displaystyle{f(y_{i}(\mathbf{x})\vert z_{i}(\mathbf{x})) = 2\ \phi _{1}\left (y_{i}(\mathbf{x}) -\beta _{i} - z_{i}(\mathbf{x});\omega _{i}^{2}\right )\varPhi \left (\frac{\alpha _{i}} {\omega _{i}}\big(y_{i}(\mathbf{x}) -\beta _{i} - z_{i}(\mathbf{x})\big)\right ),}$$

where ϕ 1( ⋅  ; σ 2) is the scalar normal density function with zero mean and positive variance σ 2. Moreover, for each i = 1, …, m, and for every \(\mathbf{x} \in \mathbb{R}^{2}\), the (scalar) random variable \(Y _{i}\left (\mathbf{x}\right )\) has a (marginal) skew-normal distribution, that is,

$$\displaystyle{ Y _{i}\left (\mathbf{x}\right ) \sim \mathit{SN}\left (\beta _{i},\varsigma _{i}^{2} +\omega _{ i}^{2},\alpha _{ i}\omega _{i}\Big/\sqrt{\varsigma _{i }^{2 }\left (1 +\alpha _{ i }^{2 } \right ) +\omega _{ i }^{2}}\right ), }$$
(3)

where \(\varsigma _{i}^{2} =\mathrm{ Var}\left [Z_{i}\left (\mathbf{x}\right )\right ]\).

A similar result also holds for the other marginal distributions of the process. Indeed, with some algebra it is possible to show that all finite-dimensional marginal distributions of the (weakly and strongly stationary) multivariate spatial process \(\left (Y _{1}\left (\mathbf{x}\right ),\ldots,Y _{m}\left (\mathbf{x}\right )\right )^{T}\), for \(\mathbf{x} \in \mathbb{R}^{2}\), are closed skew-normal (CSN). This implies, for instance, that, for each i = 1, …, m, the univariate spatial process \(Y _{i}\left (\mathbf{x}\right )\), for \(\mathbf{x} \in \mathbb{R}^{2}\), has all its finite-dimensional marginal distributions in the CSN family (see the Appendix), and that, for any fixed spatial location \(\mathbf{x} \in \mathbb{R}^{2}\), the random vector \(\left (Y _{1}\left (\mathbf{x}\right ),\ldots,Y _{m}\left (\mathbf{x}\right )\right )^{T}\) has a multivariate CSN distribution [13, 14]. In principle, these results make the approach very appealing since, due to the stationarity of the processes, they make it possible to empirically check some of the distributional properties of the model. For instance, for a given set of observations, the empirical distribution of \(y_{i}\left (\mathbf{x}_{k}\right )\), k = 1, …, K, for any given i = 1, …, m, can be compared with the marginal skew-normal distribution in (3).

For the latent part of the model, that is, for the stationary Gaussian processes \(Z_{i}\left (\mathbf{x}\right )\), i = 1, …, m, we assume that

$$\displaystyle{ Z_{i}\left (\mathbf{x}\right ) =\sum _{ p=1}^{P}a_{\mathit{ ip}}F_{p}\left (\mathbf{x}\right ), }$$
(4)

where the a ip are m × P real coefficients, and \(F_{p}\left (\mathbf{x}\right )\), p = 1, …, P, are P ≤ m non-observable spatial processes (common factors) responsible for the cross-correlations in the model. The processes \(F_{p}\left (\mathbf{x}\right )\), p = 1, …, P, are assumed to be zero mean, stationary, and Gaussian with covariance function

$$\displaystyle{\text{Cov}\left [F_{p}\left (\mathbf{x}\right ),F_{q}\left (\mathbf{x} + \mathbf{h}\right )\right ] = \left \{\begin{array}{lcl} \rho (\mathbf{h}),&&p = q,\\ 0, & &p\neq q, \end{array} \right.}$$

where \(\mathbf{h} \in \mathbb{R}^{2}\) and ρ(h) is a real spatial autocorrelation function common to all factors with ρ(0) = 1 and ρ(h) → 0, as \(\left \|\mathbf{h}\right \| \rightarrow \infty \). Similarly to the classical linear factor model, this latent linear structure is responsible for a specific correlation structure among the processes Z i (x). In particular, for each i = 1, …, m, the covariance functions are given by \(\text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{i}\left (\mathbf{x} + \mathbf{h}\right )\right ] =\sum _{ p=1}^{P}a_{\mathit{ip}}^{2}\rho (\mathbf{h})\), whereas the cross-covariance functions are given by \(\text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{j}\left (\mathbf{x} + \mathbf{h}\right )\right ] =\sum _{ p=1}^{P}a_{\mathit{ip}}a_{\mathit{jp}}\rho (\mathbf{h})\). Taking h = 0, we find that \(\text{Var}\left [Z_{i}\left (\mathbf{x}\right )\right ] =\sum _{ p=1}^{P}a_{\mathit{ip}}^{2}\) and \(\text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{j}\left (\mathbf{x}\right )\right ] =\sum _{ p=1}^{P}a_{\mathit{ip}}a_{\mathit{jp}}\).
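This proportional covariance structure is easy to reproduce numerically. The sketch below (our code; the loadings and the Cauchy correlation parameters are illustrative, not taken from the paper) builds the matrix of covariances Cov[Z i (x), Z j (x + h)] as the coregionalization matrix A Aᵀ scaled by ρ(h).

```python
import numpy as np

def cauchy_rho(h, gamma=1.0, eta=1.0):
    """Cauchy autocorrelation: rho(h) = [1 + (||h||/gamma)^2]^(-eta)."""
    return (1.0 + (np.linalg.norm(h) / gamma) ** 2) ** (-eta)

A = np.array([[2.0], [-0.7]])   # loadings: m = 2 variables, P = 1 factor
B = A @ A.T                     # coregionalization matrix A A^T
h = np.array([0.5, 0.0])        # a lag vector with ||h|| = 0.5

# Cov[Z_i(x), Z_j(x + h)] for all pairs (i, j): proportional to A A^T
cross_cov = B * cauchy_rho(h)
print(cross_cov)
```

The diagonal of B gives the latent variances Var[Z i (x)] = Σ p a ip², and every lag h rescales the whole matrix by the same factor ρ(h), which is exactly the proportional covariance property.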

3 Variograms and Cross-Variograms

Let us consider here the correlation structure of the observable processes induced by the latent factor model. For the observable stochastic processes \(Y _{i}\left (\mathbf{x}\right )\), i = 1, …, m, we can show that

$$\displaystyle{\text{E}\left [Y _{i}\left (\mathbf{x}\right )\right ] =\beta _{i} +\omega _{i}\delta _{i}\left (\frac{2} {\pi } \right )^{\frac{1} {2} },\ \ \ \ \ \ \ \text{Var}\left [Y _{i}\left (\mathbf{x}\right )\right ] =\varsigma _{ i}^{2} +\omega _{ i}^{2}\left [1 -\frac{2} {\pi } \delta _{i}^{2}\right ],}$$

where \(\delta _{i} =\alpha _{i}/\sqrt{1 +\alpha _{ i }^{2}}\), and, for \(\mathbf{h}\neq \mathbf{0}\),

$$\displaystyle{ C_{\mathit{ii}}\left (\mathbf{h}\right ) = \text{Cov}\left [Y _{i}\left (\mathbf{x}\right ),Y _{i}\left (\mathbf{x} + \mathbf{h}\right )\right ] =\varsigma _{ i}^{2}\rho (\mathbf{h}). }$$
(5)

Note that if \(\rho (\mathbf{h}) =\rho (-\mathbf{h})\) we have that \(C_{\mathit{ii}}(\mathbf{h}) = C_{\mathit{ii}}(-\mathbf{h})\). Furthermore, \(C_{\mathit{ii}}(\infty ) = 0\) and \(C_{\mathit{ii}}(\mathbf{0})\neq C_{\mathit{ii}}(\mathbf{0}^{+}) =\varsigma _{ i}^{2}\), that is, the covariance function C ii (h) is discontinuous at the origin.
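As a quick sanity check on the mean and variance formulas above, the following Monte Carlo sketch (our code; all parameter values are illustrative) simulates Y = β + Z + ωS at a single location and compares the empirical moments with the theoretical ones.

```python
import numpy as np

# Illustrative parameter values (not from the paper)
rng = np.random.default_rng(1)
beta, omega, alpha, varsigma = 1.0, 0.5, 2.0, 1.0
delta = alpha / np.sqrt(1.0 + alpha**2)

n = 500_000
z = varsigma * rng.standard_normal(n)                # Z(x) ~ N(0, varsigma^2)
u0, u1 = rng.standard_normal((2, n))
s = delta * np.abs(u0) + np.sqrt(1 - delta**2) * u1  # S(x) ~ SN(0, 1, alpha)
y = beta + z + omega * s                             # model (1) at one site

mean_theory = beta + omega * delta * np.sqrt(2.0 / np.pi)
var_theory = varsigma**2 + omega**2 * (1.0 - (2.0 / np.pi) * delta**2)
print(y.mean(), mean_theory)   # close
print(y.var(), var_theory)     # close
```

The simulated mean and variance agree with the displayed formulas up to Monte Carlo error.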

On the other hand, for \(\mathbf{h}\neq \mathbf{0}\), the variogram of the observable \(Y _{i}\left (\mathbf{x}\right )\) takes the form

$$\displaystyle{ \gamma _{\mathit{ii}}\left (\mathbf{h}\right ) = \frac{1} {2}\text{Var}\left [Y _{i}\left (\mathbf{x} + \mathbf{h}\right ) - Y _{i}\left (\mathbf{x}\right )\right ] =\omega _{ i}^{2}\left [1 -\frac{2} {\pi } \delta _{i}^{2}\right ] +\varsigma _{ i}^{2}\left [1 -\rho (\mathbf{h})\right ], }$$
(6)

which, like the covariance function, is discontinuous at the origin. In fact, we have that \(\gamma _{\mathit{ii}}\left (\mathbf{0}\right ) = 0\) and \(\gamma _{\mathit{ii}}\left (\mathbf{0}^{+}\right ) =\omega _{ i}^{2}[1 - (2/\pi )\delta _{i}^{2}]\). Note that \(\gamma _{\mathit{ii}}\left (\infty \right ) = C_{\mathit{ii}}(\mathbf{0})\). To visually assess Formula (6), Fig. 1 shows the form taken by the variogram \(\gamma _{\mathit{ii}}\left (\mathbf{h}\right )\) for different values of the parameters, in the case of a Cauchy spatial autocorrelation function \(\rho (\mathbf{h}) =\big [1 + \left (\left \|\mathbf{h}\right \|/\gamma \right )^{2}\big]^{-\eta }\), with γ = 1 and η = 1. As we can see, the nugget of the variogram decreases for decreasing values of ω and for values of the skewness parameter α departing from zero.

Fig. 1
figure 1

The graphs show the shape of the theoretical variogram \(\gamma _{\mathit{ii}}\left (\mathbf{h}\right )\) given in Formula (6), for a Cauchy autocorrelation function with both parameters equal to 1, and for different values of the other parameters: (left) ω = 0.5, ς = 1; (middle) α = 2, ς = 1; (right) α = 2, ω = 0.5. The solid line in the three graphs corresponds to the same set of parameter values. The line in the first graph corresponding to α = 0 gives the variogram in the case of a Gaussian process
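Formula (6) can also be evaluated directly. The sketch below (our code) reproduces the Cauchy setting of Fig. 1 with γ = η = 1 and the parameter values ω = 0.5, α = 2, ς = 1, and exhibits the nugget discontinuity at the origin.

```python
import numpy as np

def variogram(h_norm, omega, alpha, varsigma, gamma=1.0, eta=1.0):
    """Theoretical variogram (6) with a Cauchy autocorrelation:
    gamma_ii(h) = omega^2*[1 - (2/pi)*delta^2] + varsigma^2*[1 - rho(h)],
    for h != 0, and gamma_ii(0) = 0."""
    delta = alpha / np.sqrt(1.0 + alpha**2)
    rho = (1.0 + (h_norm / gamma) ** 2) ** (-eta)
    nugget = omega**2 * (1.0 - (2.0 / np.pi) * delta**2)
    return np.where(h_norm == 0.0, 0.0, nugget + varsigma**2 * (1.0 - rho))

h = np.linspace(0.0, 10.0, 101)
g = variogram(h, omega=0.5, alpha=2.0, varsigma=1.0)
# g[0] is exactly 0, while g at any positive lag already exceeds the
# nugget omega^2*[1 - (2/pi)*delta^2]: the discontinuity at the origin
print(g[0], g[1])
```

As the text notes, the sill γ ii (∞) equals C ii (0) = ς² + ω²[1 − (2/π)δ²], that is, nugget plus ς².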

For any two stochastic processes \(Y _{i}\left (\mathbf{x}\right )\) and \(Y _{j}\left (\mathbf{x}\right )\), with i ≠ j, it is easy to show that

$$\displaystyle{ C_{\mathit{ij}}\left (\mathbf{h}\right ) = \text{Cov}\left [Y _{i}\left (\mathbf{x}\right ),Y _{j}\left (\mathbf{x} + \mathbf{h}\right )\right ] = \text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{j}\left (\mathbf{x} + \mathbf{h}\right )\right ] =\varsigma _{\mathit{ij}}\rho (\mathbf{h}), }$$
(7)

where \(\varsigma _{\mathit{ij}} =\sum _{ p=1}^{P}a_{\mathit{ip}}a_{\mathit{jp}} = \text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{j}\left (\mathbf{x}\right )\right ]\). Note that \(C_{\mathit{ij}}\left (\mathbf{h}\right ) = C_{\mathit{ji}}\left (\mathbf{h}\right )\) and that if \(\rho (\mathbf{h}) =\rho (-\mathbf{h})\), then \(C_{\mathit{ij}}\left (\mathbf{h}\right ) = C_{\mathit{ij}}\left (-\mathbf{h}\right )\).

For the cross-variogram between \(Y _{i}\left (\mathbf{x}\right )\) and \(Y _{j}\left (\mathbf{x}\right )\), with \(i\neq j\), we obtain

$$\displaystyle{ \gamma _{\mathit{ij}}\left (\mathbf{h}\right ) = \frac{1} {2}\text{Cov}\left [Y _{i}\left (\mathbf{x} + \mathbf{h}\right ) - Y _{i}\left (\mathbf{x}\right ),Y _{j}\left (\mathbf{x} + \mathbf{h}\right ) - Y _{j}\left (\mathbf{x}\right )\right ] =\varsigma _{\mathit{ij}}\left [1 -\rho (\mathbf{h})\right ]. }$$
(8)
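Under the factor structure of Sect. 2, the sill of the cross-variogram (8) is ς ij = Σ p a ip a jp, so its sign follows the sign of the loadings. A short sketch with illustrative loadings (P = 1, Cauchy ρ with γ = η = 1; all values are ours, not the paper's):

```python
# Cross-variogram (8) with P = 1 and illustrative loadings a_11, a_21;
# a negative product of loadings yields a negative cross-variogram.
a11, a21 = 2.0, -0.7
varsigma_12 = a11 * a21                     # sum_p a_1p * a_2p
rho = lambda h: (1.0 + h**2) ** (-1.0)      # Cauchy rho, gamma = eta = 1
gamma_12 = varsigma_12 * (1.0 - rho(2.0))   # evaluated at ||h|| = 2
print(gamma_12)  # approximately -1.12
```

Unlike the direct variogram (6), the cross-variogram has no nugget term, since the skew-normal components S i (x) and S j (x) are independent of each other.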

4 Estimation and Prediction

Assuming that the number P of common factors and the spatial autocorrelation function ρ(h) are known, the model depends on the parameter vector \(\boldsymbol{\vartheta }^{{\ast}} = (\boldsymbol{\beta },\mathbf{A},\boldsymbol{\omega },\boldsymbol{\alpha })\), where \(\boldsymbol{\beta }= \left (\beta _{1},\ldots,\beta _{m}\right )^{T}\), \(\mathbf{A} = \left (\mathbf{a}_{1},\ldots,\mathbf{a}_{m}\right )^{T}\) with \(\mathbf{a}_{i} = \left (\text{a}_{i1},\ldots,\text{a}_{iP}\right )^{T}\), \(\boldsymbol{\omega }= \left (\omega _{1},\ldots,\omega _{m}\right )^{T}\), and \(\boldsymbol{\alpha }= \left (\alpha _{1},\ldots,\alpha _{m}\right )^{T}\). Note that, similarly to the classical factor model, our model is not identifiable. Indeed, there are two groups of orthogonal transformations of the matrix A, given by permutation matrices and by some special reflection matrices, that leave the model unchanged [30]. However, this is the only indeterminacy in the model, and it can easily be dealt with.

In the following, we will further assume that the parameters \(\boldsymbol{\omega }\) and \(\boldsymbol{\alpha }\) are known. In this case, by resorting to Markov chain Monte Carlo (MCMC), and in particular to the Metropolis-Hastings algorithm, a likelihood-based estimation procedure for the parameter \(\boldsymbol{\vartheta }= (\boldsymbol{\beta },\mathbf{A})\) can be developed by exploiting the Monte Carlo Expectation Maximization (MCEM) algorithm. Let \(\mathbf{F} = \left (\mathbf{F}_{1},\ldots,\mathbf{F}_{P}\right )^{T}\), where \(\mathbf{F}_{p} = \left (F_{p}(\mathbf{x}_{1}),\ldots,F_{p}(\mathbf{x}_{K})\right )^{T}\), \(p = 1,\ldots,P\), and let \(\mathbf{y} = \left (\mathbf{y}_{1},\ldots,\mathbf{y}_{m}\right )^{T}\), where \(\mathbf{y}_{i} = \left (y_{i}(\mathbf{x}_{1}),\ldots,y_{i}(\mathbf{x}_{K})\right )^{T}\), i = 1, …, m. Whereas the marginal log-likelihood \(l(\boldsymbol{\vartheta }) =\ln f(\mathbf{y};\boldsymbol{\vartheta })\) is not available in closed form, due to the presence of multidimensional integrals in the derivation of the marginal density \(f(\mathbf{y};\boldsymbol{\vartheta })\), the complete log-likelihood based on the joint distribution \(f(\mathbf{y},\mathbf{F};\boldsymbol{\vartheta })\) is easily given by

$$\displaystyle\begin{array}{rcl} l_{c}(\boldsymbol{\vartheta })& =& \ln f(\mathbf{y},\mathbf{F};\boldsymbol{\vartheta }) =\ln \big (f(\mathbf{y}\vert \mathbf{F};\boldsymbol{\vartheta }) \cdot f(\mathbf{F})\big) \\ & =& \ln \left \{\left (\prod _{i=1}^{m}\prod _{ k=1}^{K}f(y_{\mathit{ ik}};Z_{\mathit{ik}},\beta _{i})\right ) \cdot f(\mathbf{F})\right \} \\ & =& \ln \!\left \{\!\!\left (\prod _{i=1}^{m}\prod _{ k=1}^{K}2\ \phi _{ 1}\!\left (y_{\mathit{ik}}\! -\!\beta _{i}\! -\! Z_{\mathit{ik}};\omega _{i}^{2}\right )\varPhi \!\left (\!\frac{\alpha _{i}} {\omega _{i}}(y_{\mathit{ik}}\! -\!\beta _{i}\! -\! Z_{\mathit{ik}})\!\right )\!\right )\! \cdot \!\left (\prod _{p=1}^{P}f(\mathbf{F}_{ p})\!\right )\!\!\right \},{}\end{array}$$
(9)

where y ik  = y i (x k ) and Z ik  = Z i (x k ). In this situation, the marginal log-likelihood \(l(\boldsymbol{\vartheta }) =\ln f(\mathbf{y};\boldsymbol{\vartheta })\) can be maximized by resorting to the Monte Carlo Expectation Maximization (MCEM) algorithm (see, for instance, [23] and [12]).

At the sth iteration, the MCEM algorithm involves three steps: S-step, E-step and M-step. In the first step (S-step), R s samples F (r), \(r = 1,\ldots,R_{s}\), are drawn from the (filtered) conditional distribution \(f(\mathbf{F}\vert \mathbf{y};\boldsymbol{\vartheta }_{s-1})\), where \(\boldsymbol{\vartheta }_{s-1}\) is the guess of the parameter \(\boldsymbol{\vartheta }\) after the (s − 1)th iteration. These samples can be collected by using an MCMC procedure based on the Metropolis-Hastings algorithm. In the second step (E-step) the following approximation of the conditional expectation of the complete log-likelihood is computed

$$\displaystyle{Q_{s}\left (\boldsymbol{\vartheta },\boldsymbol{\vartheta }_{s-1}\right ) =\hat{ \text{E}}\left [\ln f(\mathbf{y},\mathbf{F};\boldsymbol{\vartheta })\vert \mathbf{y}\right ] = \frac{1} {R_{s}}\sum _{r=1}^{R_{s} }\ln f\big(\mathbf{y},\mathbf{F}^{(r)};\boldsymbol{\vartheta }\big).}$$

The last step (M-step) supplies as the new guess \(\boldsymbol{\vartheta }_{s}\) the value of \(\boldsymbol{\vartheta }\) which maximizes \(Q_{s}\left (\boldsymbol{\vartheta },\boldsymbol{\vartheta }_{s-1}\right )\).

Although convergence results for this algorithm are not available, it is nevertheless possible to show that the “average” complete likelihood which is maximized in the M-step of the MCEM algorithm is concave and admits a unique local (and global) maximum. This result allows us to safely apply standard numerical maximization techniques.
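The three steps can be sketched on a deliberately simplified stand-in model (our construction, not the model of this paper): Y k = β + F k + ε k, with a single iid latent Gaussian factor, known variances τ² and σ², and β the only unknown. In this toy setting the filtered distribution f(F | y; β) is Gaussian, so the S-step draws from it exactly where the paper's model would require a Metropolis-Hastings run, and the M-step has a closed form.

```python
import numpy as np

# Toy stand-in model: Y_k = beta + F_k + eps_k, F_k ~ N(0, tau^2) latent,
# eps_k ~ N(0, sigma^2); tau and sigma known, beta unknown.
rng = np.random.default_rng(2)
tau, sigma, beta_true, K = 1.0, 0.5, 2.0, 200
f_true = tau * rng.standard_normal(K)
y = beta_true + f_true + sigma * rng.standard_normal(K)

beta = 0.0                          # initial guess theta_0
w = tau**2 / (tau**2 + sigma**2)    # posterior weight of the data
sd = np.sqrt(tau**2 * sigma**2 / (tau**2 + sigma**2))
for s in range(50):
    # S-step: R_s exact draws from f(F | y; beta_{s-1}), which here is
    # N(w*(y - beta), sd^2) componentwise (an MH run in the full model)
    R = 200
    F = w * (y - beta) + sd * rng.standard_normal((R, K))
    # E-step + M-step: Q_s is quadratic in beta, so its maximizer is
    # simply the mean of the residuals y_k - mean_r F_k^(r)
    beta = np.mean(y - F.mean(axis=0))

print(beta, y.mean())  # the MCEM estimate approaches the marginal MLE
```

In the toy model the marginal MLE of β is the sample mean of y, and the MCEM iterates converge to it up to Monte Carlo noise; in the paper's model the M-step instead requires the numerical maximization discussed above.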

Assuming all parameters of the model are known, prediction of the observable processes \(Y _{i}\left (\mathbf{x}\right )\) at an unobserved spatial location (or at an unobserved set of spatial locations) can be carried out either by exploiting some of the properties of the CSN distribution, or by implementing some MCMC algorithm. On the other hand, for the prediction of the unobserved common factors \(F_{p}\left (\mathbf{x}\right )\), we need to resort to MCMC algorithms. In the case in which we are interested in predicting a common factor on a large set of spatial locations (for example, on a grid), instead of carrying out an MCMC run at each spatial location, we can carry out an MCMC run only at the sampling points (that is, only at those points for which we gathered observations), and then exploit a linear property, similar to kriging and also similar to that found by [34] in a univariate framework, to obtain predictions at all other spatial locations.

To assess the goodness of the MCEM estimation procedure we performed some simulation studies, the results of which are shown in Fig. 2. For these analyses we considered m = 2 and P = 1, that is, two observable variables and one latent common factor \(F\left (\mathbf{x}\right )\). In the first two simulation experiments we considered a powered exponential (stable) spatial autocorrelation function \(\rho (\mathbf{h}) =\exp \big [-\left (\gamma \left \|\mathbf{h}\right \|\right )^{\eta }\big]\), with \(\gamma = 10^{-5}\) and η = 1.5, whereas in the last two experiments we considered a Cauchy autocorrelation function with γ = 7,000 and η = 1. For any given set of parameter values \(\boldsymbol{\vartheta }^{{\ast}}\) and a given spatial autocorrelation function ρ(h), we simulated 50 realizations from the model over K = 25 equally spaced fixed sampling points located on the nodes of a grid. For each simulated realization, we ran the MCEM estimation algorithm, assuming as unknown only the parameters a 11, a 21, β 1 and β 2. Each time, we considered 800 iterations of the MCEM algorithm, and at each step of the algorithm we considered 800 MCMC samples (of which 400 burn-in). As shown in Fig. 2, despite some possible distortion (which could be due to the modest sample size), the sampling distributions look quite reasonable. However, though our simulation experiments gave us reassuring results, we feel that more effort should be made to fully investigate the theoretical properties of the proposed inferential procedure.

Fig. 2
figure 2

The histograms show the simulated univariate marginal sampling distributions of the MCEM estimator of the parameters a 11, a 21, β 1 and β 2 (from left to right) in a model with m = 2 and P = 1 obtained in four simulation experiments (from top to bottom). The vertical solid lines represent the true parameter values, whereas the vertical dashed lines represent the empirical means over the 50 simulated realizations. For the spatial autocorrelation function ρ(h) we chose a powered exponential model with γ = 0.00001 and η = 1.5 in the first two simulation experiments (first two rows), and a Cauchy model with γ = 7,000 and η = 1 in the last two simulation experiments (last two rows). The parameters α 1 and α 2 were fixed equal to: −1 and 1 (first row); 2 and 2 (second row); −1 and 1 (third row); 2 and 2 (fourth row). For all four simulation experiments, the other parameters were equal to: a 11 = 2, \(a_{21} = -0.7\), β 1 = 1, β 2 = 2, ω 1 = 1, ω 2 = 1

As far as the computational load of our estimation procedure is concerned, implementing our algorithm with the help of the OpenBUGS software [21], using the package R2WinBUGS in R [29], and running it on standard commercial personal computers, the computing times are still demanding. Just to give an example, with 25 observations on a grid, simulated assuming the powered exponential autocorrelation function and the parameter values used to obtain the simulated distributions in the second row of Fig. 2, one iteration of the MCEM algorithm (with an MCMC sample size of 800) took 41 s. Increasing the size of the grid to 49 observations, the computing time increases to 102 s. Let us note that much of the time is needed for the maximization step of the MCEM algorithm. In the former case, the time needed to generate the MCMC sample was less than 1 s, whereas the time needed by the maximization step was 40 s. Thus, obtaining one MCEM estimate, using 800 iterations of the MCEM, takes more than 9 h, and obtaining a simulated distribution of the MCEM estimator based on 50 replicates (that is, one row of Fig. 2) takes several days.

Conclusion

In this work we have proposed and studied a model for the analysis of multivariate geostatistical data showing some degree of skewness. Our geostatistical model based on latent factors can be considered as an extension to skewed non-Gaussian data of the classical geostatistical proportional covariance model.

By framing our model in a hierarchical context, that is, by extending to the multivariate case the model-based geostatistical approach in [10], it would be possible to extend the present work to deal with regionalized variables of different kinds. Instead of assuming that the conditional distributions of \(Y _{i}\left (\mathbf{x}\right )\) given \(Z_{i}\left (\mathbf{x}\right )\) are all skew-normal, we might assume, for different values of i = 1, …, m, that they are of different types. For instance, [25] considers a model in which some of the (conditional) distributions of the observable regionalized variables are Poisson whereas others are Gamma. In this way, we could obtain a model for non-Gaussian data flexible enough to account for observable regionalized variables showing different departures from normality.

On the other hand, a generalization in a different direction might involve the introduction of more spatial scales, as in the classical linear model of coregionalization. This would supply a more flexible spatial autocorrelation structure in which the latent processes Z i (x), which lie behind the level of the observable regionalized variables Y i (x), are not constrained to have proportional covariance and cross-covariance functions. However, the high level of complexity of this generalization would require a large amount of data to be detected, and its estimation would pose serious inferential problems.

As regards the model presented in this work, we developed a computationally intensive likelihood-based inferential procedure, exploiting the capabilities of the MCEM algorithm. It must be noted that with this procedure we estimated just some of the parameters of the model, assuming the others known. In particular, we assumed as known the parameters \(\boldsymbol{\omega }= \left (\omega _{1},\ldots,\omega _{m}\right )^{T}\) and \(\boldsymbol{\alpha }= \left (\alpha _{1},\ldots,\alpha _{m}\right )^{T}\) that characterize the shape of the skew-normal (conditional) distributions. In this way we avoided many of the well-known inferential problems posed by the estimation of the parameters of the skew-normal distribution. Although in this work we did not discuss any inferential procedure for these parameters, they can nevertheless be calibrated by comparing the theoretical marginal distributions and the theoretical variograms with the corresponding empirical counterparts. From a computational perspective, although we checked the feasibility of our estimation procedure for reasonable sample sizes and for different parameter values, it must be remarked that in more complex situations the computational burden might increase considerably.