1 Introduction

Many practical studies in public health, ecology, and other disciplines rely on binary spatial data. However, most conventional spatial analyses were designed for estimation and prediction based on continuous observations. A binary variable, for instance the diagnosis of groundwater pollution, has only two possible outcomes: present (denoted 1) or absent (denoted 0). The logistic regression model is a well-known and well-documented methodology used in many contexts, in particular in the presence of spatial dependence; see, for example, Lin and Clayton (2005), Zhu et al. (2005), Xie et al. (2005), Tayyebi et al. (2010), Wu and Zhang (2013), and Diggle and Giorgi (2016).

In a spatial framework, Paciorek (2007) focused on a large binary dataset and compared penalized likelihood and Bayesian models in terms of fit, speed, and ease of implementation. He also devised an effective Markov chain Monte Carlo (MCMC) sampling scheme to address the slow mixing of MCMC techniques in a generalized linear mixed model (GLMM). Zhu et al. (2008) studied logistic regression analysis of binary lattice data using a spatial–temporal autologistic regression model within a frequentist framework and used Monte Carlo maximum likelihood estimators for parameter estimation. To handle the computational and inferential challenges posed by high-dimensional binary spatial data, Chang et al. (2016) presented a novel calibration method for computer models and applied a generalized principal component-based dimension reduction method. Sengupta et al. (2016) used a reduced-rank spatial random effects model to accommodate remote sensing datasets that can be massive in size and non-stationary in space; they estimated the parameters using an expectation–maximization (EM) algorithm. Nisa et al. (2019) focused on estimation of the propensity score, a method used to reduce bias due to confounding factors when estimating treatment effects from observational data. They incorporated a spatial logistic regression model and used an EM algorithm for maximum likelihood estimation. Hardouin (2019) presented a variational method for parameter estimation in a logistic spatial regression, since the expectations in the E-step of the EM algorithm were not available in closed-form expressions. Zhang et al. (2021) proposed a multivariate skew-elliptical link model for correlated binary responses, which includes the multivariate probit model as a special case.

Intrinsically, inference for a logistic regression model involves a hidden, unobserved process, and in all the aforementioned studies this hidden process has been treated as a convenient Gaussian random field. Nevertheless, in a whole range of applications, non-Gaussianity of the latent component arises explicitly from spatial or spatiotemporal heterogeneities. Active efforts to accommodate departures from Gaussianity have therefore called for strategies that handle the potential weaknesses of transformation methods. Reviews of the most recent studies on this topic are given by Tadayon and Torabi (2019) and Tadayon and Rasekh (2019). Mahmoudian (2018) argued that most previous skewed spatial models were ill-defined according to the consistency condition of the Kolmogorov existence theorem (Billingsley 2008), since their parametrization of the skewed distributions does not directly extend to a spatial random field model. Using the multivariate skew-normal distribution of Sahu et al. (2003) (SSN), they proposed a valid random field model with a skew structure to tackle non-Gaussian features and claimed that this random field is particularly convenient for computation. In addition, Mahmoudian (2018) noted that the skewness induced under this family is not confounded with the spatial correlations.

To the best of our knowledge, the literature on modeling skewness for binary spatial data is very scarce (Hosseini et al. 2011; Afroughi 2015). Such a design is useful when the interest is to capture spatial dependence without resorting to data transformations that may yield inefficient estimates. In this paper, we implement the valid, flexible skew-Gaussian random field introduced by Mahmoudian (2018) to address both spatial dependence and (possible) skewness through a logistic regression model. The remainder of the paper is organized as follows. The following section introduces our proposed spatial logistic regression model based on a valid skew-Gaussian random field and explains our methodology for estimating the model parameters. An analysis of a synthetic dataset is described in Sect. 3. Section 4 analyzes the Meuse heavy metals dataset as an application of our methodology. Finally, the paper ends with some conclusions and final remarks (Sect. 5).

2 The Spatial Model

Logistic regression is a multiple-regression-type model for analyzing the relationship between a binary outcome and independent variables. Let \(\mathbf{Z} \left( \mathbf{S}\right) = {\left( {Z\left( {{s_1}} \right) , \ldots ,Z\left( {{s_n}} \right) } \right) ^\mathrm{T}}\) be an observable vector of spatially dependent binary variables at locations \(\mathbf{S} = {\left( {{s_1}, \ldots ,{s_n}} \right) ^\mathrm{T} }\). In a hierarchical setting, it is conventional to model \(\mathbf{Z}\left( \mathbf{S}\right) \) as Bernoulli variables whose means depend on an underlying spatial process \(\mathbf{Y}\left( \mathbf{S}\right) = {\left( {Y\left( {{s_1}} \right) , \ldots ,Y\left( {{s_n}} \right) } \right) ^\mathrm{T}}\) such that the \(Z\left( {s_i}\right) \) are conditionally independent given the hidden process \(\mathbf{Y}\left( \mathbf{S}\right) \). Following Tadayon and Torabi (2022), the specific hierarchical model we investigate has the representation

$$\begin{aligned}&Z\left( s \right) \left| {Y\left( s \right) } \right. \sim \mathrm{Ber}\left( {p\left( s \right) = \frac{{{e^{Y\left( s \right) }}}}{{1 + {e^{Y\left( s \right) }}}}} \right) \nonumber \\&Y\left( s \right) = \mathbf{x}\left( s \right) ^\mathrm{T}{\varvec{\beta }}+ { \gamma } W\left( s \right) + \varepsilon \left( s \right) , \end{aligned}$$
(1)

where \({\varvec{\beta }}\) is a vector of k unknown parameters with corresponding \(\mathbf{{x}}\left( s \right) = {\left( {{x_1}\left( s \right) , \ldots ,{x_k}\left( s \right) } \right) ^\mathrm{T} }\) a vector of known covariates that captures the large-scale spatial variation, \(\gamma \) is a scale parameter, and \(W\left( \cdot \right) \) accounts for the non-Gaussian features through a valid skewed random field at the latent level. Finally, the white-noise error \(\varepsilon \left( \cdot \right) \sim \mathrm{N}\left( {0,{\tau ^2}} \right) \) is assumed independent of \(W\left( \cdot \right) \). Evidently,

$$\begin{aligned} \mathrm{Pr}\left[ {Z\left( s \right) = z\left| {Y\left( s \right) = y} \right. } \right]= & {} p{\left( s \right) ^z}{\left[ {1 - p\left( s \right) } \right] ^{1 - z}} \\= & {} \frac{1}{{1 + \exp \left\{ { - y\left( {2z - 1} \right) } \right\} }}. \end{aligned}$$
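
As a quick sanity check, the Bernoulli pmf and its compact form above coincide for both outcomes; a minimal Python sketch (the paper's own computations were done in R, and the function name here is ours):

```python
import numpy as np

def bernoulli_logistic_pmf(z, y):
    """P[Z(s)=z | Y(s)=y] with p(s) = e^y / (1 + e^y), written both as the
    usual Bernoulli pmf and in the compact form 1 / (1 + exp(-y*(2z - 1)))."""
    p = np.exp(y) / (1.0 + np.exp(y))
    direct = p**z * (1.0 - p)**(1 - z)
    compact = 1.0 / (1.0 + np.exp(-y * (2 * z - 1)))
    return direct, compact
```

For z = 1 the compact form reduces to the logistic function of y, and for z = 0 to its complement, which is exactly the two-branch pmf.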

We consider \(\mathbf{{W}}\left( \mathbf{{S}} \right) = {\left( {W\left( {{s_1}} \right) , \ldots ,W\left( {{s_n}} \right) } \right) ^\mathrm{T} }\) as the SSN process

$$\begin{aligned} \mathbf{W}\left( \cdot \right) \sim \mathrm{SSN}_n\left[ { - \sqrt{\frac{2}{\pi }} \delta {\mathbf{1}_n},H,\delta {I_n}} \right] , \end{aligned}$$
(2)

with the probability density function

$$\begin{aligned} f\left( \mathbf{{w}} \right) = {2^n}{\phi _n}\left[ {\mathbf{{w}}; - \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n},H + {\delta ^2}{I_n}} \right] {\Phi _n}\left[ {\delta {{\left( {H + {\delta ^2}{I_n}} \right) }^{ - 1}}( {\mathbf{{w}} + \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n}} );\mathbf{{0}},\Delta } \right] , \end{aligned}$$

mean \(\mathbf{0}\) and covariance matrix \(H + \left( {1 - {2}/{\pi }} \right) {\delta ^2}{I_n}\), where \(\mathbf{{w}} \in {{{\mathbb {R}}}^n}\), \(\Delta = {I_n} - {\delta ^2}{\left[ {H + {\delta ^2}{I_n}} \right] ^{ - 1}}\), \(\mathbf{1}_n\) denotes an \(n\times 1\) vector of ones and \(I_n\) is the identity matrix. \({\phi _n}\left( \cdot ;\mu ,\Sigma \right) \) and \({\Phi _n}\left( \cdot ;\mu ,\Sigma \right) \) represent the normal density and the normal cumulative distribution function of \(\mathrm{N}_n\left( {\mu ,\Sigma } \right) \), respectively. The second term in the covariance matrix can be viewed as a nugget effect in geostatistics. The exponential correlation function is chosen for the entries of H such that \({H_{ij}} = \exp \left\{ { - \left\| h \right\| /\psi } \right\} = \exp \left\{ { - {{\left\| {{s_i} - {s_j}} \right\| }}/{\psi }} \right\} \), where \(\psi \) is the range parameter. Therefore, the complete log likelihood function of \({\varvec{\eta }}= \left( {{\varvec{\beta }},\gamma ,{\tau ^2},\delta ,\psi } \right) ^\mathrm{T}\) is given by
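
Realisations of the zero-mean SSN field (2) can be drawn from its normal/half-normal hierarchical representation, \(W\,|\,V \sim \mathrm{N}_n(\delta(V - \sqrt{2/\pi}\,\mathbf{1}_n), H)\) with \(V = |\mathrm{N}_n(\mathbf{0}, I_n)|\). A hedged Python sketch (function names are ours; the paper's computations used R):

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_corr(coords, psi):
    """Exponential correlation matrix H_ij = exp(-||s_i - s_j|| / psi)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / psi)

def sample_ssn_field(coords, delta, psi, rng):
    """One realisation of the zero-mean SSN field via its hierarchical
    representation: W | V ~ N_n(delta*(V - sqrt(2/pi)*1_n), H), where V is
    standard half-normal (V = |N(0, I_n)|)."""
    n = coords.shape[0]
    H = exp_corr(coords, psi)
    L = np.linalg.cholesky(H + 1e-10 * np.eye(n))   # jitter for stability
    v = np.abs(rng.standard_normal(n))              # half-normal part
    mu_v = delta * (v - np.sqrt(2.0 / np.pi))       # conditional mean
    return mu_v + L @ rng.standard_normal(n)        # add N_n(0, H) noise

coords = rng.uniform(0, 10, size=(50, 2))
w = sample_ssn_field(coords, delta=1.5, psi=3.5, rng=rng)
```

By construction the marginal mean is \(\mathbf{0}\) and the covariance is \(H + (1 - 2/\pi)\delta^2 I_n\), matching the moments stated above.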

$$\begin{aligned} \ell \left( {\varvec{\eta }}\right)= & {} - \sum \limits _{i} {\ln \left( {1 + {e^{Y\left( {s_i} \right) }}} \right) } + \sum \limits _{i} {Y\left( {s_i} \right) Z\left( {s_i} \right) }\nonumber \\&- \frac{1}{{2{\tau ^2}}}{\varvec{\varepsilon }}^\mathrm{T}{\varvec{\varepsilon }}- \frac{1}{2}\ln \left| {H + {\delta ^2}{I_n}} \right| - \frac{n}{2}\ln \left( {{\pi ^2}{\tau ^2}} \right) \nonumber \\&- \frac{1}{2}{\left[ {\mathbf{{W}} + \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n}} \right] ^\mathrm{T} }{\left[ {H + {\delta ^2}{I_n}} \right] ^{ - 1}}\left[ {\mathbf{{W}} + \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n}} \right] \nonumber \\&+ \ln {\Phi _n}\left[ {\delta {{\left( {H + {\delta ^2}{I_n}} \right) }^{ - 1}}( {\mathbf{{W}} + \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n}} );\mathbf{{0}},\Delta } \right] , \end{aligned}$$
(3)

where \(\left| {\cdot } \right| \) denotes the determinant. Since the observed-data likelihood is analytically intractable, one can use a natural extension of the EM algorithm that employs Monte Carlo methods (the MCEM algorithm) to estimate the model parameters \({\varvec{\eta }}\). To be self-contained, we recall that the EM algorithm operates on the so-called Q-function, which at the tth E-step is defined by

$$\begin{aligned} Q\left( {{\varvec{\eta }}\left| {{{\varvec{\eta }}^t}} \right. } \right) = E\left[ {\ell \left( {{\varvec{\eta }}} \right) \left| {\mathbf{{Z}},{{\varvec{\eta }}^t}} \right. } \right] = \int {\ell \left( {\varvec{\eta }}\right) f\left( {\mathbf{{W}},{\varvec{\varepsilon }}\left| {\mathbf{{Z}},{{\varvec{\eta }}^t}} \right. } \right) d\mathbf{{W}}d{\varvec{\varepsilon }}}. \end{aligned}$$
(4)

The M-step maximizes Q with respect to \({\varvec{\eta }}\) to obtain \({{\varvec{\eta }}^{t + 1}} = \arg {\max _{{\varvec{\eta }}\in \Theta }}Q\left( {{\varvec{\eta }}\left| {{{\varvec{\eta }}^t}} \right. } \right) \), where \(\Theta \) is the parameter space. When the integral in equation (4) is analytically intractable or very high dimensional, the MCEM algorithm provides a modification of the EM algorithm in which the expectation in the E-step is computed numerically through Monte Carlo simulation. Replacing the conditional expectations in (4) with the corresponding Monte Carlo approximations, we can write

$$\begin{aligned} Q\left( {{\varvec{\eta }}\left| {{{\varvec{\eta }}^t}} \right. } \right) \approx \frac{1}{M}\sum \limits _{m = 1}^M {\ell \left( {{{\varvec{\eta }}};\mathbf{{Z}},{\mathbf{{W}}^{\left( m \right) }},{{\varvec{\varepsilon }}^{\left( m \right) }}} \right) }, \end{aligned}$$

and employ an optimization procedure to maximize \(Q\left( {{\varvec{\eta }}\left| {{{\varvec{\eta }}^t}} \right. } \right) \) with respect to \({\varvec{\eta }}\). These steps are repeated until convergence of the underlying MCMC is indicated by the Gelman–Rubin diagnostics (Gelman and Rubin 1992). At the tth iteration of the MCEM algorithm, we need to calculate conditional expectations of the form \({E_i}\left[ {\mathrm{{g}}\left( {\mathbf{{W}},{\varvec{\varepsilon }}} \right) \left| \mathbf{{Z}} \right. } \right] \), \( i\in \left\{ 1,\ldots ,7\right\} \), for some functions \(\mathrm {g}\) of \(\mathbf{{W}}\) and \({\varvec{\varepsilon }}\). These conditional expectations, shown in Equation (A1) of the Appendix as an extended form of (4), may not have explicit forms and must be replaced by their Monte Carlo approximations. We use the notation \({\mathbb {E}}_{i}^t\left( \cdot \right) \) for the corresponding approximation of the ith conditional expectation \({E}_{i}\left( \cdot \right) \) whenever it does not have a closed form. \({{\mathbb {E}}}_{i}^t\) can be calculated based on samples \(\left\{ {{\mathbf{{W}}^{\left( m \right) }},{{\varvec{\varepsilon }}^{\left( m \right) }}} \right\} _{m = 1}^M\) from the joint distribution \({f_{\mathbf{{W}},{\varvec{\varepsilon }}\left| \mathbf{{Z}},{\varvec{\eta }}\right. }}\) as

$$\begin{aligned} {{\mathbb {E}}}_i^t = {M^{ - 1}}\sum \nolimits _{m = 1}^M {\mathrm{{g}}_i\left( {\mathbf{{Z}},{\mathbf{{W}}^{\left( m \right) }},{{\varvec{\varepsilon }}^{\left( m \right) }};{{\varvec{\eta }}^t}} \right) }. \end{aligned}$$
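
The generic Monte Carlo device behind this approximation can be sketched as follows (a toy illustration, not the paper's implementation; here the test function and sample generator are ours):

```python
import numpy as np

def mc_expectation(g, samples):
    """Monte Carlo approximation E_i^t = M^{-1} * sum_m g(W^(m), eps^(m)),
    given M posterior draws {(W^(m), eps^(m))}."""
    return np.mean([g(W, eps) for W, eps in samples], axis=0)

# toy check against an expectation known in closed form: E[W + eps] = 0
# when both W and eps are standard normal draws
rng = np.random.default_rng(1)
draws = [(rng.standard_normal(3), rng.standard_normal(3)) for _ in range(20000)]
approx = mc_expectation(lambda W, eps: W + eps, draws)
```

In the actual algorithm the draws come from \(f_{\mathbf{W},{\varvec{\varepsilon }}|\mathbf{Z},{\varvec{\eta }}}\) via the MCMC scheme described next, rather than from independent normals as in this toy check.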

For details regarding the updates of the model parameters in the M-step, see the Appendix, where a variational method is used to estimate the parameters. Generating samples from the joint distribution \({f_{\mathbf{{W}},{\varvec{\varepsilon }}\left| \mathbf{{Z}},{\varvec{\eta }}\right. }}\) requires an MCMC algorithm. To that end, we derive the full conditional distributions as follows.

  • \({\varvec{\varepsilon }}\left| {\mathbf{{Z}},\mathbf{{W}},{\varvec{\eta }}} \right. \): According to the details of the variational method described in Appendix, we can write

    $$\begin{aligned} f\left( {{\varvec{\varepsilon }}\left| {\mathbf{{Z}},\mathbf{{W}},{\varvec{\eta }}} \right. } \right)\propto & {} \exp \left\{ { - \frac{1}{{2{\tau ^2}}}\sum \nolimits _i {\varepsilon _i^2} + \sum \nolimits _i {{Z_i}{\varepsilon _i}} - \frac{1}{2}\sum \nolimits _i {{\varepsilon _i}} } \right. \\&\left. {\frac{}{} - \sum \nolimits _i {\lambda \left( {{\theta _i}} \right) \left( {\varepsilon _i^2 + 2\mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}{\varepsilon _i} + 2\gamma {W_i}{\varepsilon _i}} \right) } } \right\} , \end{aligned}$$

    therefore, the full conditional distribution of \({\varepsilon _i}\)s is approximately proportional to a normal density as

    $$\begin{aligned} {\varepsilon _i}\left| {\mathbf{{Z}},\mathbf{{W}},{\varvec{\eta }}} \right. \mathop \simeq \limits ^d \mathrm{{N}}\left[ {\frac{{{Z_i} - 2\lambda \left( {{\theta _i}} \right) \left( {\mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}+ \gamma {W_i}} \right) - 0.5}}{{{\tau ^{ - 2}} + 2\lambda \left( {{\theta _i}} \right) }},\frac{1}{{{\tau ^{ - 2}} + 2\lambda \left( {{\theta _i}} \right) }}} \right] . \end{aligned}$$
  • \(\mathbf {W} \left| { \mathbf{{Z}},{\varvec{\varepsilon }},{\varvec{\eta }}} \right. \): With regard to the hierarchical representation of the SSN distribution based on a normal and a truncated normal distribution, we can rewrite \(\mathbf {W}\) as

    $$\begin{aligned} \mathbf{{W}}\left| {\mathbf{{V}} = \mathbf{{v}}} \right. \sim {\mathrm{{N}}_n}\left[ {{{{\varvec{\mu }}_\mathbf{v}}},H} \right] ,\qquad {{\varvec{\mu }}_\mathbf{v}} = \delta \left( {{\mathbf{v}} - \sqrt{\frac{2}{\pi }} {\mathbf{{1}}_n}} \right) , \end{aligned}$$

    where \(\mathbf{{V}} \sim {\mathrm{{N}}_n}\left[ {\mathbf{{0}},{I_n}} \right] {\mathbf{{I}}_{\left\{ {{{\mathbb {R}}}_+^ n } \right\} }}\left( \mathbf{{V}} \right) \) and \({\mathbf{{I}}_{\left\{ \cdot \right\} }}\left( \cdot \right) \) denotes the indicator function. Therefore,

    $$\begin{aligned} f\left( {\mathbf{{W}}\left| {\mathbf{{Z}},{\varvec{\varepsilon }}} \right. ,{\varvec{\eta }}} \right)\propto & {} f\left( {\mathbf{{Z}}\left| {\mathbf{{W}},{\varvec{\varepsilon }}} \right. ,{\varvec{\eta }}} \right) f\left( {\mathbf{{W}}\left| {\mathbf{{V}},{\varvec{\eta }}} \right. } \right) f\left( {\mathbf{{V}}} \right) \\\propto & {} \exp \left\{ { - \frac{1}{2}\gamma \sum \nolimits _i {{W_i}} - {\gamma ^2}\sum \nolimits _i {\lambda \left( {{\theta _i}} \right) W_i^2} - 2\gamma \sum \nolimits _i {\lambda \left( {{\theta _i}} \right) \mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}{W_i}} } \right. \\&\left. { \frac{}{}- 2\gamma \sum \nolimits _i {\lambda \left( {{\theta _i}} \right) {\varepsilon _i}{W_i}} + \gamma \sum \nolimits _i {{Z_i}{W_i}} } \right\} \\&\times \exp \left\{ { - \frac{1}{2}\left( {\mathbf{{W^\mathrm{T}}}{H^{ - 1}}{} \mathbf{{W}} - 2{\varvec{\mu }}_\mathbf{v}^\mathrm{T} {H^{ - 1}}{} \mathbf{{W}}} \right) } \right\} . \end{aligned}$$

    One can synthesize the above terms to obtain \(\mathbf{{W}}\left| {\mathbf{{Z}},\mathbf{{V}},{\varvec{\varepsilon }},{\varvec{\eta }}} \right. \mathop \simeq \limits ^d {\mathrm{{N}}_n}\left[ {{{\varvec{\mu }}_{\mathbf{{w}}\left| \cdot \right. }},{\Sigma _{\mathbf{{w}}\left| \cdot \right. }}} \right] \), where

    $$\begin{aligned} {{\varvec{\mu }}_{\mathbf{{w}}\left| \cdot \right. }} = {\Sigma _{\mathbf{{w}}\left| \cdot \right. }} \left( { {H^{ - 1}}{\varvec{\mu }}_\mathbf{v} - \frac{{\mathcal{C}}}{2}} \right) ,\qquad {\Sigma _{\mathbf{{w}}\left| \cdot \right. }} = {\left( {{H^{ - 1}} + {{{\mathcal {D}}}}} \right) ^{ - 1}}, \end{aligned}$$

    in which \({{{\mathcal {C}}}}\) is an \(n\times 1\) vector with elements \({c_i} \!=\! \gamma \left( {1 + 4\lambda \left( {{\theta _i}} \right) \mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}+ 4\lambda \left( {{\theta _i}} \right) {\varepsilon _i} - 2{Z_i}} \right) \) and \({{{\mathcal {D}}}}\) is a diagonal matrix as \({{{\mathcal {D}}}} = 2{\gamma ^2}\mathrm{diag} \left( {\lambda \left( {{\theta _1}} \right) , \ldots ,\lambda \left( {{\theta _n}} \right) } \right) \).

  • \(\mathbf {V} \left| { \mathbf{{Z}},\mathbf {W},{\varvec{\varepsilon }},{\varvec{\eta }}} \right. \):

    $$\begin{aligned} f\left( {\mathbf{{V}}\left| {\mathbf{{Z}},\mathbf{{W}},{\varvec{\varepsilon }}} \right. ,{\varvec{\eta }}} \right)\propto & {} f\left( {\mathbf{{W}}\left| {\mathbf{{V}},{\varvec{\eta }}} \right. } \right) f\left( \mathbf{{V}} \right) \\\propto & {} \exp \left\{ { - \frac{1}{2}\left( {{\varvec{\mu }}_\mathbf{v}^\mathrm{T} {H^{ - 1}}{{\varvec{\mu }}_\mathbf{v}} - 2{\varvec{\mu }}_\mathbf{v}^\mathrm{T} {H^{ - 1}}{} \mathbf{{W}} + \mathbf{{V^\mathrm{T}V}}} \right) } \right\} {\mathbf{{I}}_{\left\{ {{{\mathbb {R}}}_+^ n } \right\} }}\left( \mathbf{{V}} \right) , \end{aligned}$$

    hence, \(\mathbf{V} \left| { \mathbf{{Z}},\mathbf {W},{\varvec{\varepsilon }},{\varvec{\eta }}} \right. \sim {\mathrm{{N}}_n}\left[ {\delta {\Sigma _{\mathbf{v}\left| \cdot \right. }}{H^{ - 1}}( {\mathbf{{W}} + \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n}} ),{\Sigma _{\mathbf{v}\left| \cdot \right. }}} \right] {\mathbf{{I}}_{\left\{ {{{\mathbb {R}}}_+^ n } \right\} }}\left( \mathbf{{V}} \right) \), where, completing the square in \(\mathbf{V}\), its covariance matrix can be written as \({\Sigma _{\mathbf{v}\left| \cdot \right. }} = {{({\delta ^2}{H^{ - 1}} + {I_n})}^{ - 1}}\).
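
A single sweep through these three full conditionals can be sketched in Python as follows. The variational function \(\lambda(\cdot)\) is defined in the Appendix, which is not reproduced here; we assume the standard Jaakkola–Jordan form \(\lambda(\theta) = \tanh(\theta/2)/(4\theta)\), and the truncated-normal draw for \(\mathbf{V}\) is replaced by a crude reflection purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def lam(theta):
    """Variational function lambda(theta) from the Appendix; here we ASSUME
    the Jaakkola-Jordan form tanh(theta/2) / (4*theta)."""
    t = np.asarray(theta, dtype=float)
    ts = np.where(np.abs(t) < 1e-6, 1.0, t)            # avoid 0/0 at theta=0
    return np.where(np.abs(t) < 1e-6, 0.125, np.tanh(ts / 2.0) / (4.0 * ts))

def gibbs_sweep(Z, X, W, V, H, eta, theta, rng):
    """One sweep through the three full conditionals of Sect. 2 (a sketch).
    eta = (beta, gamma, tau2, delta); theta holds the variational points."""
    beta, gamma, tau2, delta = eta
    n = len(Z)
    l, xb, ones = lam(theta), X @ beta, np.ones(n)
    Hinv = np.linalg.inv(H)
    # 1) eps_i | Z, W, eta: (approximately) independent normals
    prec = 1.0 / tau2 + 2.0 * l
    eps = (Z - 2.0 * l * (xb + gamma * W) - 0.5) / prec \
          + rng.standard_normal(n) / np.sqrt(prec)
    # 2) W | Z, V, eps, eta: multivariate normal with mean mu_w, cov Sigma_w
    mu_v = delta * (V - np.sqrt(2.0 / np.pi) * ones)
    C = gamma * (1.0 + 4.0 * l * xb + 4.0 * l * eps - 2.0 * Z)
    D = 2.0 * gamma**2 * np.diag(l)
    Sigma_w = np.linalg.inv(Hinv + D)
    mu_w = Sigma_w @ (Hinv @ mu_v - C / 2.0)
    W = rng.multivariate_normal(mu_w, (Sigma_w + Sigma_w.T) / 2.0)
    # 3) V | Z, W, eps, eta: normal truncated to the positive orthant; the
    #    reflection |.| below is a crude positivity device for illustration
    #    only -- a proper truncated multivariate normal sampler should be used
    Sigma_v = np.linalg.inv(delta**2 * Hinv + np.eye(n))
    mu_V = delta * Sigma_v @ Hinv @ (W + np.sqrt(2.0 / np.pi) * delta * ones)
    V = np.abs(rng.multivariate_normal(mu_V, (Sigma_v + Sigma_v.T) / 2.0))
    return W, V, eps

# small demonstration on synthetic inputs
n = 12
coords = rng.uniform(0, 10, size=(n, 2))
H = np.exp(-np.linalg.norm(coords[:, None] - coords[None, :], axis=-1) / 3.5) \
    + 1e-8 * np.eye(n)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = rng.integers(0, 2, n).astype(float)
eta = (np.array([0.5, 1.0]), 1.0, 0.5, 1.5)            # (beta, gamma, tau2, delta)
W, V, eps = gibbs_sweep(Z, X, np.zeros(n), np.abs(rng.standard_normal(n)),
                        H, eta, np.ones(n), rng)
```

In a full MCEM run, this sweep would be iterated to produce the M posterior draws used in the Monte Carlo approximations of the E-step.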

3 Analysis of a Synthetic Dataset

We now assess the performance of the proposed model on a synthetic dataset and compare our results with those obtained by applying the model presented in Hardouin (2019). The contribution of this section is thus twofold. First, the performance of the presented model in estimating the parameters is evaluated using response variables generated from model (1) (via Algorithm 1), and then the effect of sample size (the number of spatial locations) on model performance is examined. Finally, the results are compared with those of the competitor. All computations were performed using the publicly available statistical software R.

To address these goals, we use Algorithm 1 to generate spatially correlated binary data \(Z\left( {{s_i}} \right) \) with \(E\left[ {Z\left( {{s_i}} \right) } \right] = p\left( {{s_i}} \right) \) and \(\rho \left[ {Z\left( {{s_i}} \right) ,Z\left( {{s_j}} \right) } \right] ={H_{ij}} = \exp \left\{ { - \left\| h \right\| /\psi } \right\} \). We ran three distinct simulations, each with \({{{\mathcal {R}}}}=500\) generated datasets, for three sample sizes \(n=200, 400\), and 800. For all three simulations, we set \(M=100\). In each simulation study, the sites are uniformly distributed over the region \(\left( {0,10} \right) \times \left( {0,10} \right) \). The data were simulated from model (1) with \({x_i} \sim N\left( {0,1} \right) \), where the true values of the model parameters are shown in Table 1, which also summarizes the results. Notice that choosing \(\psi =3.5\) in each simulation yields the rough values 0.99 and 0.02 for \(\exp \left\{ { - \left\| h \right\| /\psi } \right\} \), approximating the maximum and minimum dependencies under the presented exponential correlation function, corresponding to the smallest and largest distances between the selected locations, respectively.
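
Algorithm 1 is displayed as a figure in the original article and is not reproduced here; the following is a plausible Python sketch of such a generator under model (1) with the SSN field (2) (function and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_binary_field(n, beta, gamma, tau2, delta, psi, rng):
    """Generate (Z, X, coords) from model (1): a plausible reading of
    Algorithm 1 (not shown here), with W drawn from the SSN field (2)."""
    coords = rng.uniform(0, 10, size=(n, 2))            # sites on (0,10)x(0,10)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    H = np.exp(-d / psi) + 1e-10 * np.eye(n)            # exponential correlation
    L = np.linalg.cholesky(H)
    v = np.abs(rng.standard_normal(n))                  # half-normal part of SSN
    W = delta * (v - np.sqrt(2.0 / np.pi)) + L @ rng.standard_normal(n)
    x = rng.standard_normal(n)                          # covariate x_i ~ N(0,1)
    X = np.column_stack([np.ones(n), x])
    Y = X @ beta + gamma * W + rng.standard_normal(n) * np.sqrt(tau2)
    p = 1.0 / (1.0 + np.exp(-Y))                        # logistic link
    Z = (rng.uniform(size=n) < p).astype(int)           # Bernoulli draws
    return Z, X, coords

Z, X, coords = simulate_binary_field(200, np.array([0.5, 1.0]), 1.0, 0.5, 1.5, 3.5, rng)
```

The parameter values in the call above are placeholders for illustration, not the true values listed in Table 1.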

Table 1 specifically reports the bias criterion for an arbitrary parameter, say \(\vartheta \), as

$$\begin{aligned} \mathrm{Bias}\left( {{\hat{\vartheta }} } \right) = {{{{\mathcal {R}}}}^{ - 1}}\sum \nolimits _{r = 1}^{{{\mathcal {R}}}} {\left( {{{{\hat{\vartheta }} }^{\left( r \right) }} - \vartheta } \right) } \end{aligned}$$

and also the empirical variance of each estimate as

$$\begin{aligned} \mathrm{Evar}\left( {{\hat{\vartheta }} } \right) = {{{{\mathcal {R}}}}^{ - 1}}\sum \nolimits _{r = 1}^{{{\mathcal {R}}}} {{{\left( {{{{\hat{\vartheta }} }^{\left( r \right) }} - {\bar{\vartheta }} } \right) }^2}} \end{aligned}$$
(5)

to assess the performance of the proposed methodology, where \({\bar{\vartheta }} = {{{{\mathcal {R}}}}^{ - 1}}\sum \nolimits _{r = 1}^{{{\mathcal {R}}}} {{{\hat{\vartheta }}^{\left( r \right) }}}\). It is worth mentioning that in Hardouin (2019)'s approach, spatial variation is captured through the term \(\varepsilon \left( \cdot \right) \) with the same exponential correlation function, i.e., \({\varvec{\varepsilon }}\sim {\mathrm{N}}\left( {\mathbf{0},H^*} \right) \), where \({H_{ij}^*} = {\tau ^2}\exp \left\{ { - \left\| h \right\| /\psi } \right\} \). Eventually, the resulting Akaike information criterion (AIC) values were used to compare model performance. This benchmark, one of the most popular criteria for model assessment in the literature, is calculated as \(\mathrm{AIC}= 2[\#\mathrm{model~parameters}-{{\varvec{\ell }}_{y,\lambda }^{~\cdot }}]\). The AIC values corroborate the better performance of the proposed model compared to its competitor. Note that Table 1 compares parameter estimates for data generated from the proposed model with a misspecified model considered in Hardouin (2019), so the parameter estimates are expected to be biased under the misspecified model. To address this issue, we also assess predictive performance. In classification problems, prediction error is commonly defined as the probability of an incorrect classification, also called the misclassification rate (MCR). To compute the MCR, we randomly drop n/10 observations from each simulation; the MCR is then calculated as \({\mathrm{MCR}} = {\left( n/10\right) ^{ - 1}}\sum \nolimits _{i = 1}^{n/10} {I\left( {\widehat{{Z_i}} \ne {Z_i}} \right) }\), where \(I\left( \cdot \right) \) is the indicator function, equal to one when its argument is true. The results, reported as percentages in Table 1, show lower MCRs for the suggested model and also show that the MCR decreases as the sample size increases.
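
The three performance measures above can be coded directly (a straightforward sketch; the function names are ours):

```python
import numpy as np

def bias(est, true_value):
    """Bias(theta-hat) = R^{-1} * sum_r (theta-hat^(r) - theta)."""
    return np.mean(np.asarray(est) - true_value)

def evar(est):
    """Empirical variance (5): R^{-1} * sum_r (theta-hat^(r) - theta-bar)^2."""
    est = np.asarray(est)
    return np.mean((est - est.mean())**2)

def mcr(z_hat, z_held_out):
    """Misclassification rate over the n/10 held-out observations."""
    return np.mean(np.asarray(z_hat) != np.asarray(z_held_out))

# e.g. four replicate estimates of a parameter whose true value is 1.0:
# their bias is the mean deviation from 1.0, and Evar the mean squared
# deviation from their own average
est = [1.1, 0.9, 1.0, 1.2]
```

Note that `evar` measures spread around the estimates' own mean rather than around the true value, which is why a misspecified model can show small Evar yet large bias.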

[Algorithm 1]
Table 1 Bias value and the empirical variance (Evar) of the estimated parameters for the proposed and Hardouin (2019)’s approaches based on three different simulations with \(n=200,400\) and 800

To evaluate the performance of the proposed model under different scales of spatial dependence, we ran another simulation with \(n=800\); in this case, we fixed the parameters \(\beta _0, \beta _1, \gamma , \delta \) and \(\tau ^2\) at the same values as in Table 1 and only changed \(\psi \) to \(\psi =0.1\), which allows the spatial dependence to vary from almost 0 to 0.5. We are aware that in practice this value should be chosen according to the autocorrelation function relative to the size of the domain; however, it was chosen here to assess the performance of the proposed model under low spatial dependence. The results, presented in Table 2, can readily be compared with the corresponding part of Table 1. Clearly, the results substantiate stability in the performance (bias, empirical variance, and MCR) of the proposed model in the case of low spatial dependence.

Table 2 Bias value and the empirical variance (Evar) of the estimated parameters for the proposed model based on a simulation with \(n=800\) and \(\psi =0.1\). MCR = misclassification rate

Finally, since inference may be challenging for identifying the nugget effect components (\({\tau ^2}\) and \(\delta \)), we discuss here to what extent information about these parameters can be recovered from the data. To assess the identifiability of each of these parameters, say \(\delta \), three datasets (of size 200) were generated from the proposed model with different values of \(\delta \) (and fixed values for the other parameters, as described in Table 1), and the estimated values were obtained. The same applies to inference on \({\tau ^2}\). Table 3 indicates that the data allow for meaningful inference on the model's nugget effect components.

Table 3 Identifiability of two parameters \({\tau ^2}\) and \(\delta \) based on three different simulations with \(n=200\)

4 Application: The Meuse Heavy Metals Data

In this section, we illustrate our proposed methodology using a well-known real dataset from the spatial statistics literature. The Meuse dataset, documented in detail by Rikken and Van Rijn (1993) and Burrough and McDonnell (1998) and studied frequently in geostatistical research, comprises heavy-metal measurements in the topsoil of a flood plain along the Meuse River, west of the municipality of Stein, Limburg, the Netherlands. The dataset is available in the R package sp and can be loaded with data(meuse). The measurements consist of 155 soil samples, each composited over a support area of approximately \(15\,m \times 15\,m\), analyzed for their concentrations of the toxic heavy metals zinc, lead, copper, and cadmium in ppm. Figure 1 below depicts a schematic description of the region and sampling locations.

Fig. 1

Study area, coordinates are in RDM, the Dutch topographical map coordinate system. The blue color shows the Meuse River

We chose the binary variable lime as our response of interest Z, and in order to find the covariates most related to the response, the corrected AIC (AICC) introduced by Hoeting et al. (2006) for geostatistical model selection was used. The AICC is given by

$$\begin{aligned} \mathrm{{AICC}} = 2\left[ {n\frac{{p + k + 1}}{{n - p - k - 2}} - {{\varvec{\ell }}_{y,\lambda }^{~\cdot }}} \right] , \end{aligned}$$

where p denotes the number of regression coefficients including an intercept term, k is the number of parameters associated with the autocorrelation function, and n is the number of observed sites. Considering the four variables zinc, lead, copper, and cadmium as potential covariates, we investigated all \(2^4 -1\) feasible nested models, and an overall assessment (not presented here) resulted in a model with the two covariates lead and zinc. Although the simulation study showed that parameter estimation does not depend on the initial values, we use ordinary GLM estimates as initial values for the regression coefficients, i.e., \(\beta _0=-3.34, \beta _{lead}=-0.03\) and \(\beta _{zinc}=0.01\). From \(p(s) = {e^{Y(s)}}/\left( {1 + {e^{Y(s)}}} \right) \) and Equation (1) we can write
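
For concreteness, the AICC is a one-line computation; the sketch below uses this section's values n = 155 sites and p = 3 coefficients (intercept, lead, zinc), and assumes k = 1 for the range parameter of the exponential correlation function:

```python
def aicc(loglik, n, p, k):
    """Corrected AIC of Hoeting et al. (2006):
    AICC = 2 * [ n*(p + k + 1) / (n - p - k - 2) - loglik ]."""
    return 2.0 * (n * (p + k + 1) / (n - p - k - 2) - loglik)

# Meuse-like setting: n = 155, p = 3, k = 1; the log-likelihood value here
# is a placeholder, not a fitted value from the paper
value = aicc(-100.0, 155, 3, 1)
```

As n grows relative to p + k, the penalty term approaches p + k + 1 and the AICC behaves like the ordinary AIC.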

$$\begin{aligned} \gamma W\left( s \right) + \varepsilon \left( s \right) \simeq \ln \left( {\frac{{{\bar{Z}} }}{{1 - {{\bar{Z}}} }}} \right) - \mathbf{{x}}{\left( s \right) ^\mathrm{T} }{\varvec{\beta }}, \end{aligned}$$

where \(\widehat{p\left( s \right) } = {{\bar{Z}}} = 0.284\). Now, define \(V\left( s \right) = \gamma W\left( s \right) + \varepsilon \left( s \right) \), which is approximately given by \(V\left( s \right) = 2.422 + 0.038\textit{lead}\left( s \right) - 0.017 zinc\left( s \right) \). Under the proposed model, \(V\left( s \right) \) is a member of the SSN family (2). The Q-Q plot and histogram of \(V\left( s \right) \) are shown in Fig. 2. As a result of this simple exploratory data analysis, the histogram shows a non-Gaussian feature, which supports implementing the proposed model based on the above-mentioned skew random field. The empirical semi-variogram of \(V\left( s \right) \) is plotted in Fig. 3. The best-fitting model was exponential with \(\textit{nugget effect} = 1.07, \textit{sill} = 6.60\) and \(range = 367.24\). Table 4 displays the model parameter estimates and the corresponding standard errors for the proposed model and the MCRs of both competing models. The reported estimates and MCRs were calculated as means over \(\mathcal{R}=20\) runs of the program.
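
The empirical surrogate for the latent term can be reproduced numerically. Note that the text's fitted line V(s) = 2.422 + 0.038 lead − 0.017 zinc presumably uses unrounded GLM coefficients, while the sketch below uses the rounded starting values quoted above, so its intercept comes out near 2.42 rather than exactly 2.422:

```python
import numpy as np

z_bar = 0.284                               # observed proportion of lime = 1
logit_zbar = np.log(z_bar / (1.0 - z_bar))  # empirical logit, about -0.92

# GLM starting values reported in the text: (beta0, beta_lead, beta_zinc)
beta = np.array([-3.34, -0.03, 0.01])

def V_hat(lead, zinc):
    """Empirical surrogate for gamma*W(s) + eps(s):
    logit(z_bar) - x(s)^T beta, with x(s) = (1, lead(s), zinc(s))."""
    x = np.column_stack([np.ones_like(lead), lead, zinc])
    return logit_zbar - x @ beta
```

Evaluating `V_hat` at lead = zinc = 0 gives logit(0.284) + 3.34, about 2.42, consistent with the intercept of the fitted line up to rounding.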

Fig. 2

The left panel displays the histogram of \(V\left( s \right) \) with its kernel density estimate, and the right panel shows its normal Q-Q plot

Fig. 3

The empirical semi-variogram of \(V\left( s \right) \)

Table 4 The estimated values and standard error of parameters based on the presented model. Misclassification rate (MCR) has been presented for both competitor models

5 Conclusion

The present study concentrated on implementing a valid, flexible skew-Gaussian random field based on the skew-normal family introduced by Sahu et al. (2003) to capture both spatial dependence and (possible) skewness through a logistic regression model. Since directly maximizing the likelihood of the observed data is intractable, a Monte Carlo extension of the EM algorithm was developed to compute the maximum likelihood estimates of the model parameters. Moreover, a simulation study was conducted to assess the performance of the proposed model and to investigate the effect of sample size on the results. Finally, a real data application regarding the presence of lime in the topsoil along the Meuse River was analyzed, in which the concentrations of the toxic heavy metals zinc and lead were considered as covariates.

Overall, the proposed model adds flexibility to the class of spatial logistic regression models often considered in the literature for binary spatial data. It must be mentioned that, in the spatial context, the asymptotic properties of parameter estimators strongly depend on the asymptotic regime considered. Specifically, two regimes arise: first, when the spatial domain is fixed and bounded and the density of the sampling locations increases with n (the fixed or infill-domain regime); second, when the spatial domain is unbounded and grows with the sample size n (the increasing-domain regime). Whereas under the latter regime the maximum likelihood estimators are consistent and asymptotically normal, subject to some regularity conditions (see, for example, Mardia and Marshall (1984)), under the former analogous results do not hold and the model parameters cannot be consistently estimated. Besides this, in the suggested approach the latent factors are independent across locations, which ensures that the mixing conditions are satisfied. In the general case, however, replicates are required to obtain consistent estimates even when the number of locations is large.

An interesting extension of this work is to allow the variance process to depend on covariates, which opens up an opportunity to interpret the tail behavior of the process as a function of known covariates. Another step forward is to let this variance–covariate dependence change over time. On the other hand, in the last decade, with the wide usage of mobile applications and Global Positioning System (GPS) devices, as well as the advancement of remote sensors accompanied by cheap data storage and computational devices, many geo-referenced data are being collected. As a result, there has been growing enthusiasm for modeling spatial big data, and a third interesting extension of this work is to scale the proposed model to big data. Moreover, in this study the exponential correlation function was chosen, although this choice affects the smoothness of the process; one could adopt a more flexible spatial correlation structure and compare the results. We plan to study these approaches in future work.

5.1 Supplementary Material

Supplementary materials contain R codes for simulations and real data application conducted in this paper.