
1 Introduction

Although a large variety of spatial data sets (on radioactive contamination, rainfall, winds, etc.) contain measurements with a considerable amount of skewness, the modelling of such data still remains an issue. For instance, with regard to radiological monitoring, in [5], disregarding any physically-based modelling approach, it is argued that mapping algorithms for emergency detection must be developed taking into consideration the skewness in the data. A boost to these developments came from the Spatial Interpolation Comparison (SIC) 2004 (see [11]), in which, whereas the routine scenario could easily be modelled using a Gaussian random field, the emergency scenario, which mimics an accidental release of radioactivity, needed to be modelled taking properly into account that, due to the presence of extreme measurements, the data are positively skewed. Just to cite a few works, to deal with skewed measurements coming from radioactive monitoring, [18] and [15] propose copula-based geostatistical approaches, whereas [9] argues that the structuring of extreme values can be handled in a coherent manner by using the class of Hermitian isofactorial models. Moreover, [4] proposes a Gaussian anamorphosis transformation to deal with skewed data coming from contaminated facilities, and [19] argues in favor of a Bayesian approach, pointing out that both the Gaussian copula and the non-Gaussian χ²-copula models are inappropriate for modelling strongly skewed radioactivity measurements. Other works dealing with skewed radiological measurements are [27], which is concerned with the estimation of the variogram and the development of optimal sampling plans, [7], which proposes a dynamic spatial Bayesian model for non-Gaussian measurements from radioactivity deposition, as well as the works in [22, 28] and [32]. 
On the other hand, a general approach developed to cope with some types of univariate non-Gaussian spatial data (including skew data) has been proposed in [8] by defining a family of transformed Gaussian random fields that provides an alternative to trans-Gaussian kriging.

Whereas in the univariate case, that is, in the presence of just one regionalized variable, spatial modelling and prediction have been extensively studied for different types of non-Gaussian data, in particular skewed data, in the multivariate non-Gaussian context only a limited number of works have been published. Among these, [26] and [25] extend the modelling approach of [10] to multivariate non-Gaussian geostatistical data, whereas [6] proposes a hierarchical Bayesian approach to model Gaussian, count, and ordinal variables by designing a Gibbs sampler with Metropolis-Hastings steps. Other works dealing with multivariate spatial data are those in [33], which explores the use of the Bayesian Maximum Entropy approach in the presence of both continuous and categorical regionalized variables, and in [31], which uses Markov chain Monte Carlo methods for the Bayesian modelling of multivariate counts.

In this paper, to model skewness in a multivariate (that is, in the presence of more than one regionalized variable) geostatistical context, we propose an alternative approach based on the use of the skew-normal distribution. Our modelling approach, which extends some of the ideas in [24] (see also [35]), is based on the skew-normal distribution [2, 3] and on a latent Gaussian factor structure. Just to give some examples, this approach might prove useful in the modelling of the radiological data in [16] or the data related to the Fukushima disaster (data are available from TEPCO at http://www.tepco.co.jp), where more than one radiological measurement has been collected at each sampling site. Apart from providing much greater flexibility than traditional Gaussian random fields, it is possible to show that our model has all its finite-dimensional marginal distributions belonging to the family of closed skew-normal distributions [13, 14]. It must be mentioned that the modelling construction proposed here is substantially different from some of the most popular constructions based on the skew-normal distribution that have recently appeared in the literature to model univariate skewed spatial data, like those, for instance, of [1, 20] and [17] (for a critical discussion on these constructions see [24]).

The paper is organized as follows. The model and its properties are presented in Sect. 2 and in Sect. 3, respectively. In Sect. 4 we present the estimation and prediction procedures and some simulation results, and in the “Conclusion” section we make some final comments. More technical results are presented in the Appendix.

2 A Multivariate Closed Skew-Normal Geostatistical Model

In the following we define a model for multivariate skewed geostatistical data, exploiting the ideas in [24] and in [25], by building the model on an unobserved latent Gaussian spatial factor structure. Let \(y_{i}\left (\mathbf{x}_{k}\right )\), i = 1, …, m, k = 1, …, K, be a set of geo-referenced data measurements relative to m regionalized variables, gathered at K spatial locations \(\mathbf{x}_{k}\). Each of these m measured variables can be viewed as a partial realization of a particular stochastic process \(Y _{i}\left (\mathbf{x}\right )\), i = 1, …, m, \(\mathbf{x} \in \mathbb{R}^{2}\). We assume that these stochastic processes are given by

$$\displaystyle{ Y _{i}\left (\mathbf{x}\right ) =\beta _{i} + Z_{i}\left (\mathbf{x}\right ) +\omega _{i}S_{i}\left (\mathbf{x}\right ),\quad i = 1,\ldots,m, }$$
(1)

where β i and ω i are unknown constants, representing, respectively, an intercept and a scale parameter, and \(Z_{i}\left (\mathbf{x}\right )\) and \(S_{i}\left (\mathbf{x}\right )\) are latent processes. In particular, for every i = 1, …, m, \(Z_{i}\left (\mathbf{x}\right )\) is a mean zero stationary Gaussian process, whereas for every i = 1, …, m, and for each \(\mathbf{x} \in \mathbb{R}^{2}\), \(S_{i}\left (\mathbf{x}\right )\) is an independent random variable distributed as a skew-normal [2], that is, \(S_{i}\left (\mathbf{x}\right ) \sim \mathit{SN}\left (0,1,\alpha _{i}\right )\), which means that, for every \(\mathbf{x} \in \mathbb{R}^{2}\), the density of S i (x) is given by \(f_{S_{i}}(s) = 2\phi _{1}(s;1)\varPhi (\alpha _{i}s)\), for −∞ < s < ∞, where \(\alpha _{i} \in \mathbb{R}\), ϕ 1(⋅ ; 1) is the scalar normal density function with zero mean and unit variance, and Φ(⋅ ) is the scalar N(0, 1) distribution function.
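To make the distributional assumption on S i (x) concrete, the following minimal sketch (ours, not part of the paper) draws from SN(0, 1, α) using the classical stochastic representation of the skew-normal and checks its known mean δ√(2/π) by simulation; the function name and the value of α are illustrative.

```python
import numpy as np

def sample_skew_normal(alpha, size, seed=None):
    """Draw from SN(0, 1, alpha) via the stochastic representation
    S = delta*|U0| + sqrt(1 - delta^2)*U1, with U0, U1 independent
    N(0, 1) and delta = alpha / sqrt(1 + alpha^2)."""
    rng = np.random.default_rng(seed)
    delta = alpha / np.sqrt(1.0 + alpha**2)
    u0 = rng.standard_normal(size)
    u1 = rng.standard_normal(size)
    return delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1

# E[S] = delta*sqrt(2/pi) for S ~ SN(0, 1, alpha): check by simulation
alpha = 2.0
delta = alpha / np.sqrt(1.0 + alpha**2)
s = sample_skew_normal(alpha, size=200_000, seed=0)
print(s.mean(), delta * np.sqrt(2.0 / np.pi))  # the two values are close
```

The same representation can be reused to simulate realizations of the full model (1), since each S i (x) is drawn independently across locations.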

Let us note that, for each i = 1, …, m, and for every \(\mathbf{x} \in \mathbb{R}^{2}\), conditionally on \(Z_{i}\left (\mathbf{x}\right )\), the random variable \(Y _{i}\left (\mathbf{x}\right )\) has a skew-normal distribution, that is,

$$\displaystyle{ Y _{i}\left (\mathbf{x}\right )\vert Z_{i}\left (\mathbf{x}\right ) \sim \mathit{SN}\left (\beta _{i} + Z_{i}\left (\mathbf{x}\right ),\omega _{i}^{2},\alpha _{ i}\right ), }$$
(2)

which means that we can write its density as

$$\displaystyle{f(y_{i}(\mathbf{x})\vert z_{i}(\mathbf{x})) = 2\ \phi _{1}\left (y_{i}(\mathbf{x}) -\beta _{i} - z_{i}(\mathbf{x});\omega _{i}^{2}\right )\varPhi \left (\frac{\alpha _{i}} {\omega _{i}}\big(y_{i}(\mathbf{x}) -\beta _{i} - z_{i}(\mathbf{x})\big)\right ),}$$

where ϕ 1( ⋅  ; σ 2) is the scalar normal density function with zero mean and positive variance σ 2. Moreover, for each i = 1, …, m, and for every \(\mathbf{x} \in \mathbb{R}^{2}\), the (scalar) random variable \(Y _{i}\left (\mathbf{x}\right )\) has a (marginal) skew-normal distribution, that is,

$$\displaystyle{ Y _{i}\left (\mathbf{x}\right ) \sim \mathit{SN}\left (\beta _{i},\varsigma _{i}^{2} +\omega _{ i}^{2},\alpha _{ i}\omega _{i}\Big/\sqrt{\varsigma _{i }^{2 }\left (1 +\alpha _{ i }^{2 } \right ) +\omega _{ i }^{2}}\right ), }$$
(3)

where \(\varsigma _{i}^{2} =\mathrm{ Var}\left [Z_{i}\left (\mathbf{x}\right )\right ]\).

A similar result also holds for the other marginal distributions of the process. Indeed, with some algebra it is possible to show that all finite-dimensional marginal distributions of the (weakly and strongly stationary) multivariate spatial process \(\left (Y _{1}\left (\mathbf{x}\right ),\ldots,Y _{m}\left (\mathbf{x}\right )\right )^{T}\), for \(\mathbf{x} \in \mathbb{R}^{2}\), are closed skew-normal (CSN). This implies, for instance, that, for each i = 1, …, m, the univariate spatial process \(Y _{i}\left (\mathbf{x}\right )\), for \(\mathbf{x} \in \mathbb{R}^{2}\), has all its finite-dimensional marginal distributions in the CSN family (see the Appendix), and that, for any fixed spatial location \(\mathbf{x} \in \mathbb{R}^{2}\), the random vector \(\left (Y _{1}\left (\mathbf{x}\right ),\ldots,Y _{m}\left (\mathbf{x}\right )\right )^{T}\) has a multivariate CSN distribution [13, 14]. In principle, these results make the approach very appealing since, due to the stationarity of the processes, they make it possible to empirically check some of the distributional properties of the model. For instance, for a given set of observations, the empirical distribution of \(y_{i}\left (\mathbf{x}_{k}\right )\), k = 1, …, K, for any given i = 1, …, m, can be compared with the marginal skew-normal distribution in (3).

For the latent part of the model, that is, for the stationary Gaussian processes \(Z_{i}\left (\mathbf{x}\right )\), i = 1, …, m, we assume that

$$\displaystyle{ Z_{i}\left (\mathbf{x}\right ) =\sum _{ p=1}^{P}a_{\mathit{ ip}}F_{p}\left (\mathbf{x}\right ), }$$
(4)

where the a ip are m × P real coefficients, and \(F_{p}\left (\mathbf{x}\right )\), p = 1, …, P, are P ≤ m non-observable spatial processes (common factors) responsible for the cross-correlations in the model. The processes \(F_{p}\left (\mathbf{x}\right )\), p = 1, …, P, are assumed to be zero mean, stationary, and Gaussian with covariance function

$$\displaystyle{\text{Cov}\left [F_{p}\left (\mathbf{x}\right ),F_{q}\left (\mathbf{x} + \mathbf{h}\right )\right ] = \left \{\begin{array}{lcl} \rho (\mathbf{h}),&&p = q,\\ 0, & &p\neq q, \end{array} \right.}$$

where \(\mathbf{h} \in \mathbb{R}^{2}\) and ρ(h) is a real spatial autocorrelation function common to all factors with ρ(0) = 1 and ρ(h) → 0, as \(\left \|\mathbf{h}\right \| \rightarrow \infty \). Similarly to the classical linear factor model, this latent linear structure is responsible for a specific correlation structure among the processes Z i (x). In particular, for each i = 1, …, m, the covariance functions are given by \(\text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{i}\left (\mathbf{x} + \mathbf{h}\right )\right ] =\sum _{ p=1}^{P}a_{\mathit{ip}}^{2}\rho (\mathbf{h})\), whereas the cross-covariance functions are given by \(\text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{j}\left (\mathbf{x} + \mathbf{h}\right )\right ] =\sum _{ p=1}^{P}a_{\mathit{ip}}a_{\mathit{jp}}\rho (\mathbf{h})\). Taking h = 0, we find that \(\text{Var}\left [Z_{i}\left (\mathbf{x}\right )\right ] =\sum _{ p=1}^{P}a_{\mathit{ip}}^{2}\) and \(\text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{j}\left (\mathbf{x}\right )\right ] =\sum _{ p=1}^{P}a_{\mathit{ip}}a_{\mathit{jp}}\).
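This proportional covariance structure is easy to reproduce numerically. The sketch below (our code; the loadings and the Cauchy correlation parameters are illustrative, not taken from the paper) builds the matrix of covariances Cov[Z i (x), Z j (x + h)] as the coregionalization matrix A Aᵀ scaled by ρ(h).

```python
import numpy as np

def cauchy_rho(h, gamma=1.0, eta=1.0):
    """Cauchy autocorrelation: rho(h) = [1 + (||h||/gamma)^2]^(-eta)."""
    return (1.0 + (np.linalg.norm(h) / gamma) ** 2) ** (-eta)

A = np.array([[2.0], [-0.7]])   # loadings: m = 2 variables, P = 1 factor
B = A @ A.T                     # coregionalization matrix A A^T
h = np.array([0.5, 0.0])        # a lag vector with ||h|| = 0.5

# Cov[Z_i(x), Z_j(x + h)] for all pairs (i, j): proportional to A A^T
cross_cov = B * cauchy_rho(h)
print(cross_cov)
```

The diagonal of B gives the latent variances Var[Z i (x)] = Σ p a ip², and every lag h rescales the whole matrix by the same factor ρ(h), which is exactly the proportional covariance property.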

3 Variograms and Cross-Variograms

Let us consider here the correlation structure of the observable processes induced by the latent factor model. For the observable stochastic processes \(Y _{i}\left (\mathbf{x}\right )\), i = 1, …, m, we can show that

$$\displaystyle{\text{E}\left [Y _{i}\left (\mathbf{x}\right )\right ] =\beta _{i} +\omega _{i}\delta _{i}\left (\frac{2} {\pi } \right )^{\frac{1} {2} },\ \ \ \ \ \ \ \text{Var}\left [Y _{i}\left (\mathbf{x}\right )\right ] =\varsigma _{ i}^{2} +\omega _{ i}^{2}\left [1 -\frac{2} {\pi } \delta _{i}^{2}\right ],}$$

where \(\delta _{i} =\alpha _{i}/\sqrt{1 +\alpha _{ i }^{2}}\), and, for \(\mathbf{h}\neq \mathbf{0}\),

$$\displaystyle{ C_{\mathit{ii}}\left (\mathbf{h}\right ) = \text{Cov}\left [Y _{i}\left (\mathbf{x}\right ),Y _{i}\left (\mathbf{x} + \mathbf{h}\right )\right ] =\varsigma _{ i}^{2}\rho (\mathbf{h}). }$$
(5)

Note that if \(\rho (\mathbf{h}) =\rho (-\mathbf{h})\) we have that \(C_{\mathit{ii}}(\mathbf{h}) = C_{\mathit{ii}}(-\mathbf{h})\). Furthermore, \(C_{\mathit{ii}}(\infty ) = 0\) and \(C_{\mathit{ii}}(\mathbf{0})\neq C_{\mathit{ii}}(\mathbf{0}^{+}) =\varsigma _{ i}^{2}\), that is, the covariance function C ii (h) is discontinuous at the origin.
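As a quick sanity check on the mean and variance formulas above, the following Monte Carlo sketch (our code; all parameter values are illustrative) simulates Y = β + Z + ωS at a single location and compares the empirical moments with the theoretical ones.

```python
import numpy as np

# Illustrative parameter values (not from the paper)
rng = np.random.default_rng(1)
beta, omega, alpha, varsigma = 1.0, 0.5, 2.0, 1.0
delta = alpha / np.sqrt(1.0 + alpha**2)

n = 500_000
z = varsigma * rng.standard_normal(n)                # Z(x) ~ N(0, varsigma^2)
u0, u1 = rng.standard_normal((2, n))
s = delta * np.abs(u0) + np.sqrt(1 - delta**2) * u1  # S(x) ~ SN(0, 1, alpha)
y = beta + z + omega * s                             # model (1) at one site

mean_theory = beta + omega * delta * np.sqrt(2.0 / np.pi)
var_theory = varsigma**2 + omega**2 * (1.0 - (2.0 / np.pi) * delta**2)
print(y.mean(), mean_theory)   # close
print(y.var(), var_theory)     # close
```

The simulated mean and variance agree with the displayed formulas up to Monte Carlo error.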

On the other hand, for \(\mathbf{h}\neq \mathbf{0}\), the variogram of the observable \(Y _{i}\left (\mathbf{x}\right )\) takes the form

$$\displaystyle{ \gamma _{\mathit{ii}}\left (\mathbf{h}\right ) = \frac{1} {2}\text{Var}\left [Y _{i}\left (\mathbf{x} + \mathbf{h}\right ) - Y _{i}\left (\mathbf{x}\right )\right ] =\omega _{ i}^{2}\left [1 -\frac{2} {\pi } \delta _{i}^{2}\right ] +\varsigma _{ i}^{2}\left [1 -\rho (\mathbf{h})\right ], }$$
(6)

which, like the covariance function, is discontinuous at the origin. In fact, we have that \(\gamma _{\mathit{ii}}\left (\mathbf{0}\right ) = 0\) and \(\gamma _{\mathit{ii}}\left (\mathbf{0}^{+}\right ) =\omega _{ i}^{2}[1 - (2/\pi )\delta _{i}^{2}]\). Note that \(\gamma _{\mathit{ii}}\left (\infty \right ) = C_{\mathit{ii}}(\mathbf{0})\). To visually assess Formula (6), Fig. 1 shows the form taken by the variogram \(\gamma _{\mathit{ii}}\left (\mathbf{h}\right )\) for different values of the parameters, in the case of a Cauchy spatial autocorrelation function \(\rho (\mathbf{h}) =\big [1 + \left (\left \|\mathbf{h}\right \|/\gamma \right )^{2}\big]^{-\eta }\), with γ = 1 and η = 1. As we can see, the nugget of the variogram decreases for decreasing values of ω and for values of the skewness parameter α departing from zero.

Fig. 1
figure 1

The graphs show the shape of the theoretical variogram \(\gamma _{\mathit{ii}}\left (\mathbf{h}\right )\) given in Formula (6), for a Cauchy autocorrelation function with both parameters equal to 1, and for different values of the other parameters: (left) ω = 0.5, ς = 1; (middle) α = 2, ς = 1; (right) α = 2, ω = 0.5. The solid line in the three graphs corresponds to the same set of parameter values. The line in the first graph corresponding to α = 0 gives the variogram in the case of a Gaussian process
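Formula (6) can also be evaluated directly. The sketch below (our code) reproduces the Cauchy setting of Fig. 1 with γ = η = 1 and the parameter values ω = 0.5, α = 2, ς = 1, and exhibits the nugget discontinuity at the origin.

```python
import numpy as np

def variogram(h_norm, omega, alpha, varsigma, gamma=1.0, eta=1.0):
    """Theoretical variogram (6) with a Cauchy autocorrelation:
    gamma_ii(h) = omega^2*[1 - (2/pi)*delta^2] + varsigma^2*[1 - rho(h)],
    for h != 0, and gamma_ii(0) = 0."""
    delta = alpha / np.sqrt(1.0 + alpha**2)
    rho = (1.0 + (h_norm / gamma) ** 2) ** (-eta)
    nugget = omega**2 * (1.0 - (2.0 / np.pi) * delta**2)
    return np.where(h_norm == 0.0, 0.0, nugget + varsigma**2 * (1.0 - rho))

h = np.linspace(0.0, 10.0, 101)
g = variogram(h, omega=0.5, alpha=2.0, varsigma=1.0)
# g[0] is exactly 0, while g at any positive lag already exceeds the
# nugget omega^2*[1 - (2/pi)*delta^2]: the discontinuity at the origin
print(g[0], g[1])
```

As the text notes, the sill γ ii (∞) equals C ii (0) = ς² + ω²[1 − (2/π)δ²], that is, nugget plus ς².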

For any two stochastic processes \(Y _{i}\left (\mathbf{x}\right )\) and \(Y _{j}\left (\mathbf{x}\right )\), with i ≠ j, it is easy to show that

$$\displaystyle{ C_{\mathit{ij}}\left (\mathbf{h}\right ) = \text{Cov}\left [Y _{i}\left (\mathbf{x}\right ),Y _{j}\left (\mathbf{x} + \mathbf{h}\right )\right ] = \text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{j}\left (\mathbf{x} + \mathbf{h}\right )\right ] =\varsigma _{\mathit{ij}}\rho (\mathbf{h}), }$$
(7)

where \(\varsigma _{\mathit{ij}} =\sum _{ p=1}^{P}a_{\mathit{ip}}a_{\mathit{jp}} = \text{Cov}\left [Z_{i}\left (\mathbf{x}\right ),Z_{j}\left (\mathbf{x}\right )\right ]\). Note that \(C_{\mathit{ij}}\left (\mathbf{h}\right ) = C_{\mathit{ji}}\left (\mathbf{h}\right )\) and that if \(\rho (\mathbf{h}) =\rho (-\mathbf{h})\), then \(C_{\mathit{ij}}\left (\mathbf{h}\right ) = C_{\mathit{ij}}\left (-\mathbf{h}\right )\).

For the cross-variogram between \(Y _{i}\left (\mathbf{x}\right )\) and \(Y _{j}\left (\mathbf{x}\right )\), with \(i\neq j\), we obtain

$$\displaystyle{ \gamma _{\mathit{ij}}\left (\mathbf{h}\right ) = \frac{1} {2}\text{Cov}\left [Y _{i}\left (\mathbf{x} + \mathbf{h}\right ) - Y _{i}\left (\mathbf{x}\right ),Y _{j}\left (\mathbf{x} + \mathbf{h}\right ) - Y _{j}\left (\mathbf{x}\right )\right ] =\varsigma _{\mathit{ij}}\left [1 -\rho (\mathbf{h})\right ]. }$$
(8)
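Under the factor structure of Sect. 2, the sill of the cross-variogram (8) is ς ij = Σ p a ip a jp, so its sign follows the sign of the loadings. A short sketch with illustrative loadings (P = 1, Cauchy ρ with γ = η = 1; all values are ours, not the paper's):

```python
# Cross-variogram (8) with P = 1 and illustrative loadings a_11, a_21;
# a negative product of loadings yields a negative cross-variogram.
a11, a21 = 2.0, -0.7
varsigma_12 = a11 * a21                     # sum_p a_1p * a_2p
rho = lambda h: (1.0 + h**2) ** (-1.0)      # Cauchy rho, gamma = eta = 1
gamma_12 = varsigma_12 * (1.0 - rho(2.0))   # evaluated at ||h|| = 2
print(gamma_12)  # approximately -1.12
```

Unlike the direct variogram (6), the cross-variogram has no nugget term, since the skew-normal components S i (x) and S j (x) are independent of each other.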

4 Estimation and Prediction

Assuming that the number P of common factors and the spatial autocorrelation function ρ(h) are known, the model depends on the parameter vector \(\boldsymbol{\vartheta }^{{\ast}} = (\boldsymbol{\beta },\mathbf{A},\boldsymbol{\omega },\boldsymbol{\alpha })\), where \(\boldsymbol{\beta }= \left (\beta _{1},\ldots,\beta _{m}\right )^{T}\), \(\mathbf{A} = \left (\mathbf{a}_{1},\ldots,\mathbf{a}_{m}\right )^{T}\) with \(\mathbf{a}_{i} = \left (\text{a}_{i1},\ldots,\text{a}_{iP}\right )^{T}\), \(\boldsymbol{\omega }= \left (\omega _{1},\ldots,\omega _{m}\right )^{T}\), and \(\boldsymbol{\alpha }= \left (\alpha _{1},\ldots,\alpha _{m}\right )^{T}\). Note that, similarly to the classical factor model, our model is not identifiable. Indeed, there are two groups of orthogonal transformations of the matrix A, given by permutation matrices and by some special reflection matrices, that leave the model unchanged [30]. However, this is the only indeterminacy in the model, and it can easily be dealt with.

In the following, we will further assume that the parameters \(\boldsymbol{\omega }\) and \(\boldsymbol{\alpha }\) are known. In this case, by resorting to Markov chain Monte Carlo (MCMC), and in particular to the Metropolis-Hastings algorithm, a likelihood-based estimation procedure for the parameter \(\boldsymbol{\vartheta }= (\boldsymbol{\beta },\mathbf{A})\) can be developed by exploiting the Monte Carlo Expectation Maximization (MCEM) algorithm. Let \(\mathbf{F} = \left (\mathbf{F}_{1},\ldots,\mathbf{F}_{P}\right )^{T}\), where \(\mathbf{F}_{p} = \left (F_{p}(\mathbf{x}_{1}),\ldots,F_{p}(\mathbf{x}_{K})\right )^{T}\), \(p = 1,\ldots,P\), and let \(\mathbf{y} = \left (\mathbf{y}_{1},\ldots,\mathbf{y}_{m}\right )^{T}\), where \(\mathbf{y}_{i} = \left (y_{i}(\mathbf{x}_{1}),\ldots,y_{i}(\mathbf{x}_{K})\right )^{T}\), i = 1, …, m. Whereas the marginal log-likelihood \(l(\boldsymbol{\vartheta }) =\ln f(\mathbf{y};\boldsymbol{\vartheta })\) is not available in closed form, due to the presence of multidimensional integrals in the derivation of the marginal density \(f(\mathbf{y};\boldsymbol{\vartheta })\), the complete log-likelihood based on the joint distribution \(f(\mathbf{y},\mathbf{F};\boldsymbol{\vartheta })\) is easily given by

$$\displaystyle\begin{array}{rcl} l_{c}(\boldsymbol{\vartheta })& =& \ln f(\mathbf{y},\mathbf{F};\boldsymbol{\vartheta }) =\ln \big (f(\mathbf{y}\vert \mathbf{F};\boldsymbol{\vartheta }) \cdot f(\mathbf{F})\big) \\ & =& \ln \left \{\left (\prod _{i=1}^{m}\prod _{ k=1}^{K}f(y_{\mathit{ ik}};Z_{\mathit{ik}},\beta _{i})\right ) \cdot f(\mathbf{F})\right \} \\ & =& \ln \!\left \{\!\!\left (\prod _{i=1}^{m}\prod _{ k=1}^{K}2\ \phi _{ 1}\!\left (y_{\mathit{ik}}\! -\!\beta _{i}\! -\! Z_{\mathit{ik}};\omega _{i}^{2}\right )\varPhi \!\left (\!\frac{\alpha _{i}} {\omega _{i}}(y_{\mathit{ik}}\! -\!\beta _{i}\! -\! Z_{\mathit{ik}})\!\right )\!\right )\! \cdot \!\left (\prod _{p=1}^{P}f(\mathbf{F}_{ p})\!\right )\!\!\right \},{}\end{array}$$
(9)

where y ik  = y i (x k ) and Z ik  = Z i (x k ). In this situation, the marginal log-likelihood \(l(\boldsymbol{\vartheta }) =\ln f(\mathbf{y};\boldsymbol{\vartheta })\) can be maximized by resorting to the Monte Carlo Expectation Maximization (MCEM) algorithm (see, for instance, [23] and [12]).

At the sth iteration, the MCEM algorithm involves three steps: S-step, E-step and M-step. In the first step (S-step), R s samples F (r), \(r = 1,\ldots,R_{s}\), are drawn from the (filtered) conditional distribution \(f(\mathbf{F}\vert \mathbf{y};\boldsymbol{\vartheta }_{s-1})\), where \(\boldsymbol{\vartheta }_{s-1}\) is the guess of the parameter \(\boldsymbol{\vartheta }\) after the (s − 1)th iteration. These samples can be collected by using an MCMC procedure based on the Metropolis-Hastings algorithm. In the second step (E-step) the following approximation of the conditional expectation of the complete log-likelihood is computed

$$\displaystyle{Q_{s}\left (\boldsymbol{\vartheta },\boldsymbol{\vartheta }_{s-1}\right ) =\hat{ \text{E}}\left [\ln f(\mathbf{y},\mathbf{F};\boldsymbol{\vartheta })\vert \mathbf{y}\right ] = \frac{1} {R_{s}}\sum _{r=1}^{R_{s} }\ln f\big(\mathbf{y},\mathbf{F}^{(r)};\boldsymbol{\vartheta }\big).}$$

The last step (M-step) supplies as the new guess \(\boldsymbol{\vartheta }_{s}\) the value of \(\boldsymbol{\vartheta }\) which maximizes \(Q_{s}\left (\boldsymbol{\vartheta },\boldsymbol{\vartheta }_{s-1}\right )\).

Although convergence results for this algorithm are not available, it is nevertheless possible to show that the “average” complete likelihood which is maximized in the M-step of the MCEM algorithm is concave and admits a unique local (and global) maximum. This result allows us to safely apply standard numerical maximization techniques.
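The three steps can be sketched on a deliberately simplified stand-in model (our construction, not the model of this paper): Y k = β + F k + ε k, with a single iid latent Gaussian factor, known variances τ² and σ², and β the only unknown. In this toy setting the filtered distribution f(F | y; β) is Gaussian, so the S-step draws from it exactly where the paper's model would require a Metropolis-Hastings run, and the M-step has a closed form.

```python
import numpy as np

# Toy stand-in model: Y_k = beta + F_k + eps_k, F_k ~ N(0, tau^2) latent,
# eps_k ~ N(0, sigma^2); tau and sigma known, beta unknown.
rng = np.random.default_rng(2)
tau, sigma, beta_true, K = 1.0, 0.5, 2.0, 200
f_true = tau * rng.standard_normal(K)
y = beta_true + f_true + sigma * rng.standard_normal(K)

beta = 0.0                          # initial guess theta_0
w = tau**2 / (tau**2 + sigma**2)    # posterior weight of the data
sd = np.sqrt(tau**2 * sigma**2 / (tau**2 + sigma**2))
for s in range(50):
    # S-step: R_s exact draws from f(F | y; beta_{s-1}), which here is
    # N(w*(y - beta), sd^2) componentwise (an MH run in the full model)
    R = 200
    F = w * (y - beta) + sd * rng.standard_normal((R, K))
    # E-step + M-step: Q_s is quadratic in beta, so its maximizer is
    # simply the mean of the residuals y_k - mean_r F_k^(r)
    beta = np.mean(y - F.mean(axis=0))

print(beta, y.mean())  # the MCEM estimate approaches the marginal MLE
```

In the toy model the marginal MLE of β is the sample mean of y, and the MCEM iterates converge to it up to Monte Carlo noise; in the paper's model the M-step instead requires the numerical maximization discussed above.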

Assuming all parameters of the model are known, prediction of the observable processes \(Y _{i}\left (\mathbf{x}\right )\) at an unobserved spatial location (or at an unobserved set of spatial locations) can be carried out either by exploiting some of the properties of the CSN distribution, or by implementing some MCMC algorithm. On the other hand, for the prediction of the unobserved common factors \(F_{p}\left (\mathbf{x}\right )\), we need to resort to MCMC algorithms. In the case in which we are interested in predicting a common factor on a large set of spatial locations (for example, on a grid), instead of carrying out an MCMC run at each spatial location, we can carry out an MCMC run only at the sampling points (that is, only at those points for which we gathered observations), and then exploit a linear property, similar to kriging and also similar to that found by [34] in a univariate framework, to obtain predictions at all other spatial locations.

To assess the goodness of the MCEM estimation procedure we performed some simulation studies, the results of which are shown in Fig. 2. For these analyses we considered m = 2 and P = 1, that is, two observable variables and one latent common factor \(F\left (\mathbf{x}\right )\). In the first two simulation experiments we considered a powered exponential (stable) spatial autocorrelation function \(\rho (\mathbf{h}) =\exp \big [-\left (\gamma \left \|\mathbf{h}\right \|\right )^{\eta }\big]\), with \(\gamma = 10^{-5}\) and η = 1.5, whereas in the last two experiments we considered a Cauchy autocorrelation function with γ = 7,000 and η = 1. For any given set of parameter values \(\boldsymbol{\vartheta }^{{\ast}}\) and a given spatial autocorrelation function ρ(h), we simulated 50 realizations from the model over K = 25 equally spaced fixed sampling points located on the nodes of a grid. For each simulated realization, we ran the MCEM estimation algorithm, assuming as unknown only the parameters a 11, a 21, β 1 and β 2. Each time, we considered 800 iterations of the MCEM algorithm, and at each step of the algorithm we considered 800 MCMC samples (of which 400 burn-in). As shown in Fig. 2, despite some possible distortion (which could be due to the modest sample size), the sampling distributions look quite reasonable. However, though our simulation experiments gave us reassuring results, we feel that more effort should be made to fully investigate the theoretical properties of the proposed inferential procedure.

Fig. 2
figure 2

The histograms show the simulated univariate marginal sampling distributions of the MCEM estimator of the parameters a 11, a 21, β 1 and β 2 (from left to right) in a model with m = 2 and P = 1 obtained in four simulation experiments (from top to bottom). The vertical solid lines represent the true parameter values, whereas the vertical dashed lines represent the empirical means over the 50 simulated realizations. For the spatial autocorrelation function ρ(h) we chose a powered exponential model with γ = 0.00001 and η = 1.5 in the first two simulation experiments (first two rows), and a Cauchy model with γ = 7,000 and η = 1 in the last two simulation experiments (last two rows). The parameters α 1 and α 2 were fixed equal to: −1 and 1 (first row); 2 and 2 (second row); −1 and 1 (third row); 2 and 2 (fourth row). For all four simulation experiments, the other parameters were equal to: a 11 = 2, \(a_{21} = -0.7\), β 1 = 1, β 2 = 2, ω 1 = 1, ω 2 = 1

As far as the computational load of our estimation procedure is concerned, implementing our algorithm with the help of the OpenBUGS software [21], using the package R2WinBUGS in R [29], and running it on standard commercial personal computers, the computing times are still demanding. Just to give an example, with 25 observations on a grid, simulated assuming the powered exponential autocorrelation function and the parameter values used to obtain the simulated distributions in the second row of Fig. 2, one iteration of the MCEM algorithm (with an MCMC sample size of 800) took 41 s. Increasing the size of the grid to 49 observations, the computing time increases to 102 s. Let us note that much of the time is needed for the maximization step of the MCEM algorithm. In the former case, the time needed to generate the MCMC sample was less than 1 s, whereas the time needed by the maximization step was 40 s. Thus, obtaining one MCEM estimate, using 800 iterations of the MCEM, takes more than 9 h, and obtaining a simulated distribution of the MCEM estimator based on 50 replicates (that is, one row of Fig. 2) takes several days.

Conclusion

In this work we have proposed and studied a model for the analysis of multivariate geostatistical data showing some degree of skewness. Our geostatistical model based on latent factors can be considered as an extension to skewed non-Gaussian data of the classical geostatistical proportional covariance model.

By framing our model in a hierarchical context, that is, by extending to the multivariate case the model-based geostatistical approach in [10], it would be possible to extend the present work to deal with regionalized variables of different kinds. Instead of assuming that the conditional distributions of \(Y _{i}\left (\mathbf{x}\right )\) given \(Z_{i}\left (\mathbf{x}\right )\) are all skew-normal, we might assume, for different values of i = 1, …, m, that they are of different types. For instance, [25] considers a model in which some of the (conditional) distributions of the observable regionalized variables are Poisson whereas others are Gamma. In this way, we could obtain a model for non-Gaussian data flexible enough to account for observable regionalized variables showing different departures from normality.

On the other hand, a generalization in a different direction might involve the introduction of more spatial scales, as in the classical linear model of coregionalization. This would supply a more flexible spatial autocorrelation structure in which the latent processes Z i (x), which lie behind the level of the observable regionalized variables Y i (x), are not constrained to have proportional covariance and cross-covariance functions. However, the high level of complexity of this generalization would require a large amount of data to be detected, and its estimation would pose serious inferential problems.

As regards the model presented in this work, we developed a computationally intensive likelihood-based inferential procedure, exploiting the capabilities of the MCEM algorithm. It must be noted that with this procedure we estimated just some of the parameters of the model, assuming the others known. In particular, we assumed as known the parameters \(\boldsymbol{\omega }= \left (\omega _{1},\ldots,\omega _{m}\right )^{T}\) and \(\boldsymbol{\alpha }= \left (\alpha _{1},\ldots,\alpha _{m}\right )^{T}\) that characterize the shape of the skew-normal (conditional) distributions. In this way we avoided many of the well-known inferential problems posed by the estimation of the parameters of the skew-normal distribution. Although in this work we did not discuss any inferential procedure for these parameters, they can nevertheless be calibrated by comparing the theoretical marginal distributions and the theoretical variograms with the corresponding empirical counterparts. From a computational perspective, although we checked the feasibility of our estimation procedure for reasonable sample sizes and for different parameter values, it must be remarked that in more complex situations the computational burden might increase considerably.