1 Introduction

Different statistical techniques can require extra work to approximate the unknown terms involved, to check the variance of the estimators employed or to derive their sampling distribution. Some of these problems have been solved in the curve estimation setting through the Bootstrap method (Efron 1979), whose key idea is that the relationship between the theoretical distribution and the observed sample is similar to the relationship between a distribution estimator and a secondary sample drawn from it. The original Bootstrap procedures were designed for independent data (De Angelis and Young 1992; Hall 1992), and an adaptation of these procedures to the spatial setting would have wide applicability for addressing a variety of issues. A goodness-of-fit test for the variogram model could be implemented to extend the results in Maglione and Diblasi (2004) for Gaussian data, and even inference on the underlying distribution of the random process could be developed. Bootstrap variants of tests based on their asymptotic distribution, such as the one proposed in Li et al. (2007) for assessment of the properties of the covariance function, could be introduced to improve their speed of convergence. Also, the kernel approaches for characterization of the dependence structure (Hall and Patil 1994) or for prediction (Menezes et al. 2010) could be complemented by deriving the optimal bandwidths and additionally providing estimates of their accuracy.

The parametric Bootstrap methodology can be easily extended to the spatial setting, thus enabling researchers to develop techniques for variogram assessment (Olea and Pardo-Igúzquiza 2011), to construct confidence intervals for the parameters of a distribution estimator (Goovaerts et al. 2005) or to correct a test for the spectral density (Crujeiras et al. 2010). When independence can be assumed for the residuals, the traditional nonparametric resampling approaches can be applied for approximation of the variance of an estimator (Iranpanah et al. 2011) or probabilities derived from it (Hyun-Han and Young-Il 2006). However, for dependent data, it is necessary to design ad hoc procedures in order to guarantee consistency of the results, such as the parametric Bootstrap method for small area estimation (Hall and Maiti 2006). Alternatives of more general use are those based on resampling blocks of data (Hall 1985), subsamples obtained by deleting portions of data (Politis et al. 1999) or marks assigned to the spatial points (Loh 2008).

The aim of this work is to introduce nonparametric Bootstrap approaches that allow us to generate replicates from the available data, at a set of selected locations, by first approximating the joint distribution in a nonparametric way and then randomly drawing samples from it. With this idea, different estimators of the multivariate distribution will be proposed, which are distribution functions themselves, associated with discrete or continuous random variables, so that they can be used as the basis for resampling. We will check that consistency holds for the suggested procedures, provided that the random process is strictly stationary or when this condition is relaxed by admitting a deterministic trend. In addition, numerical studies will be carried out to analyze the behavior of both Bootstrap methods when addressing different problems.

2 Multivariate distribution estimators

To derive the distribution estimators, we will assume that the random process \(\{ Z ( {\rm s} ) \in {I\!R} : {\rm s} \in D \subset {I\!R}^d \}\) can be modeled as:

$$ Z({\rm s}) = \mu ({\rm s}) + Y({\rm s}) $$
(1)

where \(\{ Y({\rm s}) \in {I\!R} : {\rm s} \in D \subset {I\!R}^d \}\) is a zero-mean strictly stationary random process and \(\mu(\cdot)\) represents the deterministic trend, namely, E[Z(s)] = μ(s), for all \({\rm s} \in D. \)

We will denote the multivariate distribution by:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)= P\left( Z \left( {\rm s}_1 \right) \leq x_1, {\ldots}, Z \left( {\rm s}_k \right) \leq x_k \right) $$

for all sets of sites \({\rm s}_1, {\ldots}, {\rm s}_k \in D\) and thresholds \(x_1,{\ldots},x_k \in I\!R, \) with \(k \in I\!N. \)

From the model established for the random process in (1), one has:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right) =F_{{\rm s}_1+{\rm z},{\ldots},{\rm s}_k+{\rm z}} \left( x_1+ \mu \left( {\rm s}_1+ {\rm z} \right)- \mu \left( {\rm s}_1 \right), {\ldots}, x_k+ \mu \left( {\rm s}_k+ {\rm z} \right)- \mu \left( {\rm s}_k \right)\right) $$
(2)

for all \({\rm z} \in {I\!R}^d, \) on account of the stationarity condition of \(Y(\cdot ), \) since:

$$ {\cal P} \left( Y \left( {\rm s}_1 \right) \leq x_1 , {\ldots}, Y \left( {\rm s}_k\right)\leq x_k \right)= {\cal P} \left( Y \left( {\rm s}_1 + {\rm z} \right) \leq x_1 , {\ldots}, Y \left( {\rm s}_k + {\rm z} \right)\leq x_k \right) $$

Relation (2) yields that the multivariate distribution of \(Z(\cdot)\) remains invariant when the spatial locations are subjected to the same translation, by a vector z, and each threshold \(x_j\) is replaced by an appropriate correction, given by \(x_j + \mu \left( {\rm s}_j + {\rm z} \right) - \mu \left( {\rm s}_j \right). \)

Our aim is the estimation of \(F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right), \) for a set of selected sites \({\rm s}_1, {\ldots}, {\rm s}_k \in D\) and thresholds \(x_1,{\ldots},x_k \in I\!R, \) with \(k \in I\!N. \) This issue will be addressed by applying property (2), which yields:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)={\cal P} \left( Z \left( {\rm s}_1 +{\rm z} \right) \leq x_1+ \mu \left( {\rm s}_1 +{\rm z} \right)- \mu \left( {\rm s}_1 \right), {\ldots}, Z \left( {\rm s}_k +{\rm z} \right) \leq x_k+ \mu \left( {\rm s}_k+ {\rm z} \right)- \mu \left( {\rm s}_k \right)\right) = {\cal P} \left( Z \left( {\rm t}_1 \right) \leq x_1+ \mu \left( {\rm t}_1 \right)- \mu \left( {\rm s}_1 \right), {\ldots}, Z \left( {\rm t}_k \right)\leq x_k+ \mu \left( {\rm t}_k \right)- \mu \left( {\rm s}_k \right)\right) $$

for \({\rm t}_j = {\rm s}_j + {\rm z}\) and \({\rm z} \in {I\!R}^d. \)

The relations above allow us to conclude that:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)= {\cal P} \left( X \left( {\rm t}_1 \right) \leq x_1 , {\ldots}, X \left( {\rm t}_k \right)\leq x_k\right) $$
(3)

for \({\rm t}_j = {\rm s}_j + {\rm z}\) and \(X ( {\rm t}_j ) = Z ( {\rm t}_j ) - \mu ( {\rm t}_j ) + \mu({\rm s}_j). \) Consequently, if the set \(\{{\rm t}_{1},{\ldots}, {\rm t}_{k}\}\) represents a translation of the selected locations \(\{{\rm s}_{1},{\ldots}, {\rm s}_{k}\}\) by any vector \({\rm z} \in {I\!R}^d, \) the distribution of the random vector \(\left(Z \left( {\rm s}_1 \right), {\ldots},Z \left( {\rm s}_k \right) \right)\) equals that of \(\left(X \left( {\rm t}_1 \right), {\ldots},X \left( {\rm t}_k \right) \right). \)
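As a minimal numerical sketch of the correction in (3) — not part of the original formulation, and assuming a known trend, here a hypothetical linear one chosen only for illustration — the translated values \(X({\rm t}_j)\) can be computed as:

```python
import numpy as np

def corrected_values(Z_t, t, s, mu):
    """Compute X(t_j) = Z(t_j) - mu(t_j) + mu(s_j), as in relation (3).

    Z_t : observed values at the translated sites t_j
    t, s: arrays of shape (k, d) with translated and selected sites
    mu  : trend function acting on an array of locations
    """
    return Z_t - mu(t) + mu(s)

# Hypothetical linear trend mu(s) = 1 + 2*s_1 + 3*s_2, for illustration only
mu = lambda s: 1.0 + s @ np.array([2.0, 3.0])

s = np.array([[0.1, 0.2], [0.4, 0.5]])   # selected sites
z = np.array([0.3, -0.1])                # translation vector
t = s + z                                # translated sites t_j = s_j + z
Z_t = np.array([1.5, 2.0])               # observed values at t

X = corrected_values(Z_t, t, s, mu)
```

With a linear trend, the correction \(\mu({\rm t}_j)-\mu({\rm s}_j)\) reduces to a function of the translation vector z alone, anticipating the situation of Remark 3.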

Next, several approaches will be described for approximation of the distribution function, based on (3). With this aim, suppose that n data, \(\{Z ( {\rm t}_1 ),{\ldots},Z( {\rm t}_n )\}, \) have been collected, at the respective locations \(\{{\rm t}_1, {\ldots}, {\rm t}_n\}. \) Firstly, we propose constructing a weighted average of the indicator functions obtained for the possible k-combinations \(\{{\rm t}_{i_{1}}, {\ldots}, {\rm t}_{i_{k}}\}\) of the observed sites \(\{{\rm t}_1, {\ldots}, {\rm t}_n\}, \) as follows:

$$\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k} \left( x_1, {\ldots},x_k \right)=\sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} p_{i_1,{\ldots},i_k} I_{\{ X({\rm t}_{i_1}) \leq x_1\}} {\ldots} I_{\{ X({\rm t}_{i_k}) \leq x_k \}} $$
(4)

for some weights \(p_{i_1,{\ldots},i_k} \geq 0, \) with \(\sum_{i_1=1}^{n} {\ldots} \sum_{i_k=1}^{n} p_{i_1,{\ldots},i_k}=1, \) which will be established in (6) and (7), where \(I_A\) denotes the indicator function of the set A, for \({\rm t}_{i_{j}}={\rm s}_{j}+{\rm z} \) and \({\rm z} \in {I\!R}^d. \) In particular, since the latter equality holds for j = 1, we will take \({\rm z}={\rm t}_{i_{1}}-{\rm s}_{1}, \) yielding:

$$ X ( {\rm t}_{i_j} )=Z ( {\rm t}_{i_j} )-\mu ( {\rm s}_j+{\rm t}_{i_1}-{\rm s}_1)+\mu({\rm s}_j) $$
(5)

For selection of the values \(p_{i_1,{\ldots},i_k}, \) the key idea will be to assign more weight to the k-combinations \(\{{\rm t}_{i_1}, {\ldots}, {\rm t}_{i_k}\}\) which are closer to being a translation of \(\{{\rm s}_1, {\ldots}, {\rm s}_k\}\) by \({\rm z}={\rm t}_{i_{1}}-{\rm s}_{1}. \) At first sight, this approach involves computing the lag between each pair of the selected locations, \({\rm s}_j-{\rm s}_{j'}, \) and comparing it with that of the corresponding observed sites, \({\rm t}_{i_{j}}-{\rm t}_{i_{j'}}, \) for j and j′ varying from 1 to k, which amounts to 0.5k(k − 1) comparisons. For the sake of simplicity, we propose solely computing the indispensable lags needed to characterize the closeness of \(\{{\rm t}_{i_1}, {\ldots}, {\rm t}_{i_k}\}\) to being the aforementioned translation. This enables us to consider only the lags \({\rm s}_j-{\rm s}_{j+1}\) and \({\rm t}_{i_{j}}-{\rm t}_{i_{j+1}}, \) for \(j=1,{\ldots},k-1, \) since the remaining ones follow from simple sums or differences of these vectors. Then, a nonparametric approximation of the multivariate distribution would be obtained by taking \(p_{i_1,{\ldots},i_k}\) in (4) as given below:

$$ p_{i_1,{\ldots},i_k}=p_{i_1,{\ldots},i_k}^{(1)}=\frac{ p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1} {\ldots} p_{{\rm s}_{k-1},{\rm s}_k}^{{\rm t}_{i_{k-1}},{\rm t}_{i_k},h_{k-1}}}{\sum_{i_1=1}^{n} {\ldots} \sum_{i_k=1}^{n} p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1} {\ldots} p_{{\rm s}_{k-1},{\rm s}_k}^{{\rm t}_{i_{k-1}}, {\rm t}_{i_k},h_{k-1}}} $$
(6)

with \(p_{{\rm s}_j,{\rm s}_{j+1}}^{{\rm t}_{i_j},{\rm t}_{i_{j+1}},h_j}=K\left( \frac{{\rm s}_j-{\rm s}_{j+1}-({\rm t}_{i_j}-{\rm t}_{i_{j+1}})}{h_j} \right), \) where K represents a d-variate kernel function and \(h_j\) is a bandwidth parameter, for \(j=1,{\ldots},k-1. \) The resulting estimator will be referred to as \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(1)}. \)
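A brute-force sketch of the weights \(p^{(1)}_{i_1,{\ldots},i_k}\) in (6) might read as follows; the product Epanechnikov kernel for K and the per-lag bandwidths are assumptions of this sketch, and the exhaustive enumeration is feasible only for small n and k:

```python
import itertools
import numpy as np

def epanechnikov(u):
    """Product Epanechnikov kernel on R^d."""
    u = np.atleast_1d(u)
    return float(np.prod(np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)))

def weights_p1(s, t, h):
    """Weights p^(1) of estimator (6), by brute force over all n^k
    k-combinations of observed sites (assumes some combination gets
    positive weight; feasible only for small n and k)."""
    n, k = len(t), len(s)
    raw = {}
    for idx in itertools.product(range(n), repeat=k):
        w = 1.0
        for j in range(k - 1):
            lag_s = s[j] - s[j + 1]              # lag of selected sites
            lag_t = t[idx[j]] - t[idx[j + 1]]    # lag of observed sites
            w *= epanechnikov((lag_s - lag_t) / h[j])
        raw[idx] = w
    total = sum(raw.values())
    return {idx: w / total for idx, w in raw.items()}

rng = np.random.default_rng(0)
t = rng.random((4, 2))         # n = 4 observed sites on the unit square
s = t[:2].copy()               # k = 2 selected sites
p = weights_p1(s, t, [2.0])    # one bandwidth h_1 for the single lag
```

Since s coincides here with two observed sites, the combination reproducing that exact pair receives the largest weight.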

One drawback related to estimator \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(1)}\) is the combinatorial explosion that may occur for large k, when used for construction of Bootstrap replicates. This is mainly due to the fact that \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(1)}\) is itself a discrete distribution function, conditional on the sample \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}, \) which takes as many values \((X({\rm t}_{i_1}),{\ldots},X({\rm t}_{i_k}))\) (or, more precisely, vectors of size k) as combinations \(\{{\rm t}_{i_1}, {\ldots}, {\rm t}_{i_k}\}\) can be obtained from the set of the observed locations. Indeed, the number of combinations of this kind amounts to \(n^k, \) so the resampling approach derived from this distribution estimator would entail drawing a vector of size k from a set of \(n^k\) vectors, whose probabilities \(p_{i_1,{\ldots},i_k}^{(1)}\) require computing the terms \(p_{{\rm s}_j,{\rm s}_{j+1}}^{{\rm t}_{i_j},{\rm t}_{i_{j+1}},h_j}, \) for all j varying from 1 to k − 1 and all the combinations of size k from \(\{{\rm t}_1, {\ldots}, {\rm t}_n\}. \)

In view of the latter, our suggestion will be to construct a new distribution estimator as given in (4), with weights:

$$ p_{i_1,{\ldots},i_k}= p_{i_1,{\ldots},i_k}^{(2)}=\frac{ p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1}, {\rm t}_{i_2},h_1} }{\sum_{i_1=1}^{n} \sum_{i_2=1}^{n} p_{{\rm s}_1, {\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1}}\, \frac{ p_{{\rm s}_2,{\rm s}_3}^{{\rm t}_{i_2}, {\rm t}_{i_3},h_2} }{\sum_{i_3=1}^{n} p_{{\rm s}_2,{\rm s}_3}^{{\rm t}_{i_2},{\rm t}_{i_3},h_2}} {\ldots} \frac{ p_{{\rm s}_{k-1},{\rm s}_k}^{{\rm t}_{i_{k-1}},{\rm t}_{i_k},h_{k-1}} }{\sum_{i_k=1}^{n} p_{{\rm s}_{k-1},{\rm s}_k}^{{\rm t}_{i_{k-1}},{\rm t}_{i_k},h_{k-1}}}$$
(7)

The resulting estimator, denoted by \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}, \) will allow us to derive a simpler Bootstrap approach, as we will describe in Sect. 3, for which valid probability statements will be made.

The two aforementioned approaches provide identical estimators of the bivariate distribution F s,s′ and the univariate distribution F s, as follows:

$$ \hat{F}_{{\rm s},{\rm s}'} \left( x_1,x_2\right)= \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \frac{K\left( \frac{{\rm s}-{\rm s}'-({\rm t}_{i}-{\rm t}_{j})}{h_1}\right) I_{\{ X({\rm t}_{i}) \leq x_1 \}} I_{\{ X({\rm t}_{j}) \leq x_2 \}}}{\sum_{i=1}^{n} \sum_{j=1}^{n} K\left( \frac{{\rm s}-{\rm s}'-({\rm t}_{i}-{\rm t}_{j})}{h_1}\right)} \qquad \hat{F}_{\rm s} (x)=\hat{F}_{{\rm s},{\rm s}} (x,x)= \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \frac{K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right) I_{\{ X({\rm t}_{i}) \leq x \}} I_{\{ X({\rm t}_{j}) \leq x \}}}{\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right)} $$
(8)

The consistency of \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) can be established by considering a random design for the spatial locations and a mixed increasing-domain asymptotic structure for the random process, together with appropriate convergence rates for the bandwidth parameters and the increasing scale. A sketch of the proof of this property is outlined in Appendix 1, which shows that the optimal bandwidths \(h_j\) depend on unknown terms. Hence, we propose an alternative mechanism for their selection, based on computing \(h_j\) as the Euclidean distance from \({\rm s}_j-{\rm s}_{j+1}\) to the m-nearest difference \({\rm t}_{i_j}-{\rm t}_{i_{j+1}}, \) for some m.
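The m-nearest-difference bandwidth rule can be sketched as follows; including the zero differences \({\rm t}_i-{\rm t}_i\) among the candidate lags is an assumption of this sketch:

```python
import numpy as np

def mnearest_bandwidth(lag, t, m):
    """Bandwidth h_j: Euclidean distance from the target lag s_j - s_{j+1}
    to its m-th nearest difference t_i - t_l, over all ordered pairs of
    observed sites (the diagonal pairs t_i - t_i are kept here)."""
    diffs = t[:, None, :] - t[None, :, :]               # all pairwise differences
    dists = np.linalg.norm(diffs - lag, axis=-1).ravel()
    return np.sort(dists)[m - 1]                        # m-th smallest distance
```

For instance, with three sites at the corners of the unit triangle, the lag (1, 0) is matched exactly by one pair, so the 1-nearest bandwidth is 0 and the 2-nearest one is 1.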

It is noteworthy that estimator \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) is itself a distribution function, conditional on the sample \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}, \) which takes values \((X({\rm t}_{i_1}),{\ldots},X({\rm t}_{i_k}))\) with respective probabilities \(p_{i_1,{\ldots},i_k}^{(2)}, \) for \(X (t_{i_{j}}) \) defined in (5). However, \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) is a discrete distribution function. Hence, for a continuous random process, the use of a smoother version of the distribution estimator seems to be more appropriate, which can be derived by applying in (4) an integrand of a density, instead of an indicator function, as follows:

$$ \tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k} \left( x_1, {\ldots},x_k \right)= \sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} p_{i_1,{\ldots},i_k}^{(2)} {\cal L} \left( \frac{x_1 - X({\rm t}_{i_1})}{h} \right){\ldots} {\cal L} \left( \frac{x_k - X({\rm t}_{i_k})}{h} \right)$$
(9)

where \({\cal L}(x)=\int_{-\infty}^{x} L(u) du, \,L\) is a univariate kernel function, h is a bandwidth parameter and \(p_{i_1,{\ldots},i_k}^{(2)}\) is defined in (7). Consistency can also be derived for \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{h_1,{\ldots},h_{k-1}}, \) as outlined in Appendix 2.

Unlike \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(i)}, \) for i = 1, 2, estimator \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}\) is a continuous distribution function, conditional on the sample, with density:

$$ \tilde{f}_{{\rm s}_1,{\ldots}, {\rm s}_k} \left( x_1, {\ldots},x_k \right)= \frac{1}{h^k} \sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} p_{i_1,{\ldots},i_k}^{(2)} L \left( \frac{x_1 - X({\rm t}_{i_1})}{h} \right){\ldots} L \left( \frac{x_k - X({\rm t}_{i_k})}{h} \right) $$

The optimal h depends on the bandwidths \(h_j\) as well as on unknown moments of the random process. Hence, we suggest using a cross-validation procedure for its selection in practice, based on the results given in Bowman et al. (1998) and adapted to the spatial data setting, which aims at appropriately characterizing the performance of the multivariate distribution estimator, in the manner described below:

$$ h_{CV,k}={\rm argmin}_{h \in H} \sum\limits_{i_1, {\ldots}, i_k=1}^{n} \int\limits_{-\infty}^{+\infty} {\ldots} \int\limits_{-\infty}^{+\infty} \left( \tilde{F}_{-(i_1,{\ldots},i_k)} \left( x_1,{\ldots},x_k \right) - I_{\{ Z({\rm t}_{i_1}) \leq x_1\}} {\ldots} I_{\{ Z({\rm t}_{i_k}) \leq x_k \}} \right)^2 dx_1 {\ldots} dx_k $$

where H is an adequate set of positive numbers and \(\tilde{F}_{-(i_1,{\ldots},i_k)} \left( x_1,{\ldots},x_k \right)\) is the result of implementing estimator \(\tilde{F}_{{\rm t}_{i_1},{\ldots},{\rm t}_{i_k}}\) at \((x_1,{\ldots},x_k)\) when ignoring \(\{Z({\rm t}_{i_1}),{\ldots},Z({\rm t}_{i_k})\}. \) Proceeding in this way, \(h_{CV,k}\) would provide us with a global bandwidth selector that could be applied at any \((x_1,{\ldots},x_k). \) To simplify the derivation of the cross-validation bandwidth, our alternative proposal will be based on the use of the univariate continuous estimator, by considering:

$$\tilde{F}_{\rm s} (x)=\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \frac{K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right) {\cal L} \left( \frac{x - X({\rm t}_{i})}{h} \right){\cal L} \left( \frac{x - X({\rm t}_{j})}{h} \right)}{\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right)} $$
(10)

since the objective function should involve just one threshold and two bandwidths, to obtain:

$$ h_{CV,1}={\rm argmin}_{h \in H} \sum\limits_{i=1}^{n} \int\limits_{-\infty}^{+\infty} \left( \tilde{F}_{-(i)} \left( x \right) - I_{\{ Z({\rm t}_{i}) \leq x\}} \right)^2 dx $$

where \(Z({\rm t}_i)\) is left out in the implementation of \(\tilde{F}_{{\rm t}_i} (x)\) to produce \(\tilde{F}_{-(i)} \left( x \right). \) Furthermore, the integral can be numerically approximated over a bounded subset instead of \({I\!R}. \)
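A sketch of this univariate cross-validation selector follows; the constant trend (so that X = Z), the product Epanechnikov kernel for K, the standard normal distribution for \({\cal L}\) and the Riemann-sum approximation of the integral on a bounded uniform grid are all assumptions of this sketch:

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def Phi(u):
    """Standard normal cdf, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(u, dtype=float) / math.sqrt(2.0)))

def epan(u):
    """Product Epanechnikov kernel on R^d (product over the last axis)."""
    return np.prod(np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0), axis=-1)

def F_tilde(x, t, Z, h1, h):
    """Univariate smoothed estimator (10), assuming a constant trend (X = Z)."""
    K = epan((t[:, None, :] - t[None, :, :]) / h1)
    Lx = Phi((x - Z) / h)
    return (K * Lx[:, None] * Lx[None, :]).sum() / K.sum()

def h_cv1(t, Z, h1, H, grid):
    """Leave-one-out selector h_CV,1, with the integral approximated by a
    Riemann sum on the bounded, uniformly spaced grid of thresholds."""
    dx = grid[1] - grid[0]
    best, best_err = None, math.inf
    for h in H:
        err = 0.0
        for i in range(len(Z)):
            keep = np.arange(len(Z)) != i   # leave Z(t_i) out
            Fi = np.array([F_tilde(x, t[keep], Z[keep], h1, h) for x in grid])
            err += ((Fi - (Z[i] <= grid)) ** 2).sum() * dx
        if err < best_err:
            best, best_err = h, err
    return best

rng = np.random.default_rng(0)
t = rng.random((15, 2))
Z = rng.normal(size=15)
grid = np.linspace(-3.0, 3.0, 61)
h_star = h_cv1(t, Z, 1.0, [0.2, 0.5, 1.0], grid)
```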

Remark 1

Application of any of the previous distribution approaches requires assuming that the trend function can be estimated. In this respect, different procedures have been proposed for approximation of \(\mu (\cdot)\) (Goovaerts 1997): a parametric approach can be adopted for its estimation, or spatial interpolation techniques can be used to compute the trend.

Remark 2

When the trend function is supposed to be constant, relation (2) can be simplified to yield:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)= F_{{\rm s}_1+{\rm z},{\ldots},{\rm s}_k+{\rm z}} \left( x_1, {\ldots}, x_k \right) $$

for all \({\rm z} \in {I\!R}^d. \) Then, no characterization of the constant trend is necessary for implementation of the distribution estimators, as we could take \(X (t_{i_{j}})=Z (t_{i_{j}}) \) in \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}\) and \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(i)}, \) for i = 1,2.

Remark 3

For the specific case where the difference of trends depends on the lag between the locations involved, namely:

$$ \mu ({\rm s} )- \mu ({\rm s}')= M ({\rm s} - {\rm s}') $$
(11)

for all s,s′ \(\in D\) and some function M, combination of (2) and (11) leads to:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)= F_{{\rm s}_1+{\rm z},{\ldots},{\rm s}_k+{\rm z}} \left( x_1+M({\rm z}), {\ldots}, x_k+M({\rm z})\right) $$

The equality above means that the translation of the spatial locations by vector z, together with the correction of the thresholds by M(z), preserves the value of the distribution function. Then, the distribution approaches would hold for \(X ({\rm t}_{i_{j}})=Z ({\rm t}_{i_{j}}) - M({\rm t}_{i_{1}}-{\rm s}_{1}), \) which demands approximation of function M. This issue can be addressed by adapting the different procedures available for μ, or a nonparametric estimator can even be derived as follows:

$$ \hat{M}({\rm z})=\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} \left( {\rm z} \right) \left( Z\left( {\rm t}_i \right)- Z\left( {\rm t}_j \right)\right)}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} \left( {\rm z} \right)} $$

for some nonnegative values \(w_{i,j}\left( {\rm z} \right)\) satisfying that \(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\left( {\rm z} \right) >0. \) For instance, we can take \(w_{i,j} (z) =I_{\{t_{i}-t_{j}\approx z\}}\) or \(w_{i,j}\left( {\rm z} \right) =G \left( \frac{{\rm z}-\left( {\rm t}_i - {\rm t}_j \right)}{g} \right), \) to yield an empirical or a kernel estimator, respectively, where G is a d-variate kernel function and g is a bandwidth parameter. By assuming appropriate hypotheses, consistency of \(\hat{M}\) could be proved by using similar arguments as those applied in the kernel variogram estimation (García-Soidán 2007).
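A sketch of the kernel variant of \(\hat{M}\), with a product Epanechnikov kernel standing in for G (any d-variate kernel would do):

```python
import numpy as np

def M_hat(z, t, Z, g):
    """Kernel estimator of M(z): a weighted average of the pairwise
    differences Z(t_i) - Z(t_j), with a product Epanechnikov kernel G
    and bandwidth g."""
    u = (z - (t[:, None, :] - t[None, :, :])) / g
    w = np.prod(np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0), axis=-1)
    dZ = Z[:, None] - Z[None, :]
    return (w * dZ).sum() / w.sum()

rng = np.random.default_rng(0)
t = rng.random((20, 2))
Z = rng.normal(size=20)
m_est = M_hat(np.array([0.2, 0.1]), t, Z, 1.0)
```

For a symmetric kernel, swapping the roles of i and j shows that \(\hat{M}(-{\rm z})=-\hat{M}({\rm z}), \) consistent with the antisymmetry of M itself under (11).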

Remark 4

None of the proposed distribution estimators fulfills Kolmogorov’s symmetry condition, namely, that they remain invariant when the locations \({\rm s}_j\) and the corresponding thresholds \(x_j\) are subjected to the same permutation, for \(j=1, {\ldots}, k. \) Hence, a criterion must be established to assign an order to the spatial locations and, therefore, to the thresholds, prior to the implementation of the joint distribution estimator, so as to guarantee uniqueness of the result under permutation.

Among the different options, we propose ordering the sites by decreasing influence on the remainder, measured in terms of proximity, because of the underlying stationarity condition. With this idea, departing from the set \(\{{\rm s}_1, {\ldots}, {\rm s}_k\}, \) we will take \({\rm s}_j, \) for \(j=1,{\ldots},k-1, \) as the closest location to the center of mass (or the d-dimensional mean of the coordinates) of the sites \(\{{\rm s}_j,{\rm s}_{j+1},{\ldots},{\rm s}_k\}. \) To solve the problem of tied distances, preference can be given to the location with the smallest first coordinates. The thresholds \(x_j\) would also be reordered accordingly.
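The ordering criterion of this remark can be sketched as follows, with ties broken lexicographically on the coordinates:

```python
import numpy as np

def order_sites(s, x):
    """Reorder sites and thresholds as in Remark 4: at each step, pick from
    the remaining sites the one closest to their center of mass; ties are
    broken in favor of the lexicographically smallest coordinates."""
    s = np.asarray(s, dtype=float)
    remaining = list(range(len(s)))
    order = []
    while len(remaining) > 1:
        center = s[remaining].mean(axis=0)
        dist = np.linalg.norm(s[remaining] - center, axis=1)
        pick = min(range(len(remaining)),
                   key=lambda a: (dist[a], *s[remaining[a]]))
        order.append(remaining.pop(pick))
    order.append(remaining[0])
    return s[order], [x[i] for i in order]

s2, x2 = order_sites([[0., 0.], [1., 0.], [0., 1.], [10., 10.]], [1, 2, 3, 4])
```

In this example, (0, 1) and (1, 0) are tied at the first step and the tie-break selects (0, 1); the outlying site (10, 10) ends up last.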

3 Bootstrap approaches

The distribution estimation approaches, introduced in Sect. 2, can be used to propose Bootstrap methods for spatial data, so that for a given set of selected sites, \(\{{\rm s}_1, {\ldots}, {\rm s}_k\}, \) a Bootstrap sample \(\{Z^{\ast} ( {\rm s}_1 ), {\ldots} , Z^{\ast}( {\rm s}_k )\}\) can be obtained.

The direct mechanism derived from the discrete estimator \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(1)}\) entails producing each replicate by drawing from a random variable which takes values \((X ( {\rm t}_{i_1} ), {\ldots}, X( {\rm t}_{i_k} ))\) with probabilities \(p_{i_1,{\ldots},i_k}^{(1)}, \) for each \(i_j=1,{\ldots},n\) and \(j=1,{\ldots},k, \) with \(X ( {\rm t}_{i_j} )\) as given in (5). Nevertheless, implementation of this approach, as mentioned in Sect. 2, can have a strong computational cost for large k.

Then, for construction of the replicates, we can instead take estimator \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}, \) although there are again \(n^k\) probabilities \(p_{i_1,{\ldots},i_k}^{(2)}\) to be considered in order to obtain a Bootstrap sample. However, we suggest an alternative option based on proceeding in a sequential way, which is less computationally demanding. The resampling scheme would be performed in the manner described below:

  1. Reorder the locations to take \({\rm s}_j\) as the closest location to the center of mass of \(\{{\rm s}_j,{\rm s}_{j+1},{\ldots},{\rm s}_k\}, \) for \(j=1,{\ldots},k-1. \)

  2. Select the bandwidth \(h_1\) as the Euclidean distance from \({\rm s}_1-{\rm s}_2\) to the m-nearest difference \({\rm t}_{i_1}-{\rm t}_{i_2},\) for some m.

  3. Obtain \((Z^{\ast} ( {\rm s}_1 ), Z^{\ast}( {\rm s}_2 ))\) by drawing from a random variable that assigns the probabilities:

     $$ \frac{p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1}}{\sum_{i_1=1}^{n} \sum_{i_2=1}^{n} p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1}} $$

     to the respective pairs \((X({\rm t}_{i_1}),X({\rm t}_{i_2})). \) Proceeding in this way, a pair of values \((X({\rm t}_{i_1}),X({\rm t}_{i_2}))\) (and, therefore, two indices \(i_1\) and \(i_2\)) is selected in this step.

  4. For j = 3, consider the index \(i_{j-1}\) derived previously and take \(h_{j-1}\) to be the Euclidean distance from \({\rm s}_{j-1}-{\rm s}_j\) to the m-nearest difference \({\rm t}_{i_{j-1}}-{\rm t}_{i_{j}}.\)

  5. For j = 3, consider indices \(i_1\) and \(i_{j-1}\) to obtain \(Z^{\ast} ( {\rm s}_j )\) by resampling from the random variable which takes values \(X({\rm t}_{i_{j}}), \) with respective probabilities:

     $$ \frac{p_{{\rm s}_{j-1},{\rm s}_j}^{{\rm t}_{i_{j-1}},{\rm t}_{i_j},h_{j-1}}}{\sum_{i_j=1}^{n} p_{{\rm s}_{j-1},{\rm s}_j}^{{\rm t}_{i_{j-1}},{\rm t}_{i_j},h_{j-1}}} $$

     An index \(i_j\) is chosen in this step.

  6. Repeat steps 4 and 5 for all j > 3.

Validity of the preceding procedure follows straightforwardly from the fact that the resulting sample satisfies:

$$ {\cal P}^{\ast} \left(Z^{\ast} ( {\rm s}_1 )=X({\rm t}_{i_1}), {\ldots}, Z^{\ast}( {\rm s}_k )=X({\rm t}_{i_k}) \right)= {\cal P}^{\ast} \left( Z^{\ast} ( {\rm s}_1 )=X({\rm t}_{i_1}), Z^{\ast} ( {\rm s}_2 )=X({\rm t}_{i_2}) \right)\cdot \prod\limits_{j=3}^{k} {\cal P}^{\ast} \left( Z^{\ast} ( {\rm s}_j )=X({\rm t}_{i_j}) \left/ Z^{\ast} ({\rm s}_{j'} )=X({\rm t}_{i_{j'}}), j'<j \right. \right) =\frac{ p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1} }{\sum\limits_{i_1=1}^{n} \sum\limits_{i_2=1}^{n} p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1}} \prod\limits_{j=3}^{k}\frac{ p_{{\rm s}_{j-1},{\rm s}_j}^{{\rm t}_{i_{j-1}},{\rm t}_{i_j},h_{j-1}} }{\sum\limits_{i_j=1}^{n} p_{{\rm s}_{j-1},{\rm s}_j}^{{\rm t}_{i_{j-1}},{\rm t}_{i_j},h_{j-1}}}=p_{i_1,{\ldots},i_k}^{(2)}$$

on account of the multiplication rule of probability, where \({\cal P}^{\ast}\) denotes the probability, conditional on the sample \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}. \) Then, proceeding as just indicated, a Bootstrap sample from \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) can be generated.
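Steps 1–6 can be sketched as follows (the initial reordering is assumed already done); the product Epanechnikov kernel for K, the constant trend (so X = Z) and the slight inflation of the m-nearest-difference bandwidths — so that the nearest lags receive strictly positive weight — are implementation choices of this sketch:

```python
import numpy as np

def epan(u):
    """Product Epanechnikov kernel on R^d (product over the last axis)."""
    return np.prod(np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0), axis=-1)

def nearest_h(lag, cand, m):
    """m-nearest-difference bandwidth, slightly inflated so that the m
    closest candidate lags receive strictly positive kernel weight."""
    d = np.sort(np.linalg.norm(cand - lag, axis=-1).ravel())
    return d[m - 1] * (1.0 + 1e-9)

def sequential_bootstrap(s, t, Z, m, rng):
    """One replicate {Z*(s_1),...,Z*(s_k)} drawn sequentially from F^(2)
    (constant trend assumed, so X = Z; sites s already reordered)."""
    n, k = len(t), len(s)
    # steps 2-3: draw the pair (i_1, i_2) jointly
    lag = s[0] - s[1]
    diffs = t[:, None, :] - t[None, :, :]        # all differences t_i - t_l
    w = epan((lag - diffs) / nearest_h(lag, diffs, m))
    i1, i2 = np.unravel_index(rng.choice(n * n, p=(w / w.sum()).ravel()), (n, n))
    idx = [i1, i2]
    # steps 4-6: draw i_j given i_{j-1}, for j = 3,...,k
    for j in range(2, k):
        lag = s[j - 1] - s[j]
        cand = t[idx[-1]] - t                    # differences t_{i_{j-1}} - t_i
        w = epan((lag - cand) / nearest_h(lag, cand, m))
        idx.append(rng.choice(n, p=w / w.sum()))
    return Z[np.array(idx)]

rng = np.random.default_rng(1)
t = rng.random((30, 2))
Z = rng.normal(size=30)
rep = sequential_bootstrap(t[:5].copy(), t, Z, 5, rng)
```

Each conditional draw only involves n candidate indices, so the whole replicate costs on the order of k·n kernel evaluations after the initial pair, instead of the n^k probabilities of the direct scheme.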

On the other hand, if the aim is that of drawing replicates from the continuous distribution estimator \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}, \) we should additionally extract a random sample of size k from the density L, denoted by \(\{V_1, {\ldots},V_k\}\) and independent of \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}. \) Then, the continuous version of the replicates for the random process Z, at locations \(\{{\rm s}_1, {\ldots}, {\rm s}_k\}, \) would be constructed as:

$$ \{X ( {\rm t}_{i_1} )+h V_1, {\ldots}, X ( {\rm t}_{i_k} )+h V_k\} $$

To justify the generation of Bootstrap samples as described, for the continuous estimator, bear in mind that:

$$ {\cal P}^{\ast} \left( X ( {\rm t}_{i_1} ) +h V_1 \leq x_1, {\ldots}, X ( {\rm t}_{i_k} )+h V_k \leq x_k \right) = {\cal P}^{\ast} \left( V_1 \leq \frac{x_1 - X( {\rm t}_{i_1} )}{h}, {\ldots}, V_k \leq \frac{x_k -X( {\rm t}_{i_k} )}{h} \right) ={\rm E}^{\ast} \left[ {\cal L} \left( \frac{x_1 - X( {\rm t}_{i_1} )}{h} \right) {\ldots} {\cal L} \left( \frac{x_k -X( {\rm t}_{i_k} )}{h} \right) \right]= \sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} {\cal P}^{\ast} \left(Z^{\ast} ( {\rm s}_1 )=X ( {\rm t}_{i_1} ), {\ldots}, Z^{\ast}( {\rm s}_k )=X ( {\rm t}_{i_k} ) \right) \cdot{\cal L} \left( \frac{x_1 -X ( {\rm t}_{i_1} )}{h} \right) {\ldots} {\cal L} \left( \frac{x_k -X ( {\rm t}_{i_k} )}{h} \right) =\sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} p_{i_1,{\ldots},i_k}^{(2)} {\cal L} \left( \frac{x_1 -X ( {\rm t}_{i_1} )}{h} \right) {\ldots} {\cal L} \left( \frac{x_k -X ( {\rm t}_{i_k} )}{h} \right)=\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k} \left( x_1,{\ldots}, x_k \right)$$

by the conditions required from the variables V j , where \({\rm E}^{\ast}\) denotes the expectation, conditional on the sample \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}. \)
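Generating the continuous replicates then amounts to perturbing a discrete replicate, as sketched below with L taken as the standard normal density:

```python
import numpy as np

def smooth_replicate(rep, h, rng):
    """Continuous replicate: add h*V_j, with V_j drawn from the density L
    (here the standard normal), independently of the sample."""
    return rep + h * rng.normal(size=len(rep))

rng = np.random.default_rng(0)
rep = np.array([1.0, 2.0, 3.0])      # a discrete replicate {X(t_{i_j})}
out = smooth_replicate(rep, 0.5, rng)
```

Setting h = 0 recovers the discrete replicate, in agreement with (9) degenerating to the indicator-based estimator as the bandwidth vanishes.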

The Bootstrap approaches can be used to approximate unknown parameters, estimate standard errors, make inference on the correlation structure or on the distribution function of the random process. Suppose, for instance, that \(T=T\left( Z( {\rm s}_1 ), {\ldots} , Z( {\rm s}_k ) \right)\) is an estimator of interest, dependent on the data and on the underlying distribution \(F_{{\rm s}_1 , {\ldots} , {\rm s}_k}. \) Denote by \(T^{\ast}\) its Bootstrap counterpart, namely, \(T^{\ast}=T\left( Z^{\ast}( {\rm s}_1 ), {\ldots} , Z^{\ast}( {\rm s}_k ) \right), \) for a Bootstrap sample \(\{Z^{\ast } ( {\rm s}_1 ), {\ldots} , Z^{\ast}( {\rm s}_k )\}\) obtained by either of the resampling methods proposed. Then, the unknown characteristic of T, depending on \(F_{{\rm s}_1 , {\ldots} , {\rm s}_k}, \) can be approximated by that of \(T^{\ast}, \) under the distribution estimator selected. For the latter aim in practice, we can compute the corresponding sample characteristic of B values \(T^{\ast (b)}=T\left( Z^{\ast (b)}( {\rm s}_1 ), {\ldots} , Z^{\ast (b)}( {\rm s}_k ) \right), \) derived for B replicates \(\{Z^{\ast (b)} ( {\rm s}_1 ), {\ldots} , Z^{\ast (b)}( {\rm s}_k )\}, \) for \(b=1,{\ldots},B\) and large B.
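The approximation of a characteristic of T through B replicates can be sketched generically; the iid resampler below is only a stand-in for the spatial schemes above, used to keep the example self-contained:

```python
import numpy as np

def bootstrap_characteristic(T, sampler, B, characteristic=np.var):
    """Approximate a characteristic of the estimator T (by default, its
    variance) from B Bootstrap replicates produced by `sampler`."""
    values = np.array([T(sampler()) for _ in range(B)])
    return characteristic(values)

# Plain iid resampling, standing in for the spatial resampling schemes
rng = np.random.default_rng(0)
data = rng.normal(size=100)
sampler = lambda: rng.choice(data, size=data.size, replace=True)
se_mean = np.sqrt(bootstrap_characteristic(np.mean, sampler, B=500))
```

For the sample mean of 100 standard normal observations, the Bootstrap standard error should come out near 0.1, the usual \(s/\sqrt{n}\) value.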

4 Application examples

We now describe some examples of the practical usefulness of the methodology proposed in this manuscript to generate Bootstrap replicates from \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) (or its continuous counterpart \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}\)), which does not require evaluating the full multivariate distribution estimator. Firstly, these methods are applied to simulated data and, then, an example with a real data set of air quality indicators is presented.

4.1 Numerical studies with simulated data

In order to analyze the performance of the Bootstrap approaches suggested in Sect. 3, we carried out several numerical studies with data simulated on the unit square \(D=[0,1] \times [0,1] \subset {I\!R}^2. \) A complete spatial randomness design was assumed, so the sample locations were uniformly distributed on D. With the spatial locations \({\rm t}_i, \) obtained for \(i= 1,{\ldots}, n\) and n = 50, stationary Gaussian data \(Z({\rm t}_i)\) were generated, by assuming zero mean and by selecting a valid variogram model to specify the spatial dependence, as follows:

$$ Z({\rm s})=\mu({\rm s})+Y({\rm s}),\quad {\rm with}\, \mu({\rm s})=0 \quad {\rm and}\ Y({\rm s}) \sim {\rm SGP}(0,\sigma^2,\rho(.;0.2)) $$

In particular, we considered the isotropic exponential and spherical variograms, with a partial sill \(\sigma^2\) equaling 1 or 2.25 (or asymptotic partial sill, for the exponential model), a range \(\phi = 0.2\) (or practical range, for the exponential model) and a null nugget effect or a nugget effect \(\tau^2 = 0.09. \)
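For reference, data of this kind can be simulated by a Cholesky factorization of the covariance matrix; the sketch below assumes the practical-range convention \(C(d)=\sigma^2 \exp(-3d/\phi)\) for the exponential model, with the nugget added on the diagonal:

```python
import numpy as np

def simulate_gaussian_field(t, sigma2, phi, tau2, rng):
    """Zero-mean Gaussian data with isotropic exponential covariance
    C(d) = sigma2*exp(-3d/phi) (phi = practical range, an assumption of
    this sketch) plus a nugget tau2 on the diagonal."""
    d = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=-1)
    C = sigma2 * np.exp(-3.0 * d / phi) + tau2 * np.eye(len(t))
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(t)))  # jitter for stability
    return L @ rng.normal(size=len(t))

rng = np.random.default_rng(0)
t = rng.random((50, 2))    # complete spatial randomness on the unit square
Z = simulate_gaussian_field(t, sigma2=1.0, phi=0.2, tau2=0.09, rng=rng)
```

With these parameters, each Z(t_i) has variance \(\sigma^2+\tau^2=1.09, \) matching the setting of the first numerical study.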

For the implementation of the resampling algorithm given in the preceding section, we selected k = 15 sites, \(\{{\rm s}_1,{\ldots},{\rm s}_k\}, \) among the set of the sample locations \(\{{\rm t}_1,{\ldots},{\rm t}_n\}, \) avoiding those points too close to the boundaries of the observation region. To generate a Bootstrap sample on these k locations, one needs to derive the weights \(p^{(2)}_{i_1,{\ldots},i_k}, \) as explained in steps 3 and 5 of the algorithm described in Sect. 3. With this purpose, we took K as the Epanechnikov kernel and the bandwidths \(h_1,{\ldots},h_{k-1}, \) based on a balloon estimator, were computed by considering the m-nearest differences in the kernel function and by guaranteeing that 15 %, for \(h_1, \) and 30 %, otherwise, of all distances were used. Given the probabilities \(p^{(2)}_{i_1,{\ldots},i_k}, \) the indices \(i_j\) were then chosen by a classic accept–reject method. In this respect, note that the stochastic processes \(X(\cdot )\) and \(Z (\cdot )\) are the same when the trend function is constant, as pointed out in Remark 2.

The smoother Bootstrap version was obtained by applying the continuous distribution estimator \(\tilde{F}_{{\rm s}_1,{\ldots},{\rm s}_k}\) in (9), where the function \({\cal L}\) was chosen as the standard normal distribution. The corresponding optimal bandwidth was selected by cross-validation, as explained in Sect. 2, among a reasonable set of bandwidth candidates. Then, Bootstrap replicates were generated for the simulated data, under the aforementioned conditions, aiming to analyze the performance of the proposed approaches for the following issues:

  1. (1)

    The estimation of the variance of the spatial process, as a common parameter for the overall process, \({\rm Var}\left[Z({\rm s})\right]={\rm E}\left[Z({\rm s})^2\right]-{\rm E}\left[Z({\rm s})\right]^2. \)

  2. The comparison of the discrete and continuous estimators of the univariate distribution, denoted by \(\hat{F}_{{\rm s}}\) and \(\tilde{F}_{{\rm s}}, \) as given in (8) and (10), respectively.

  3. The approximation of the variogram, as a function modeling the spatial dependence, \(\gamma({\rm t})=0.5{\rm Var}\left[Z({\rm s})-Z({\rm s}+{\rm t})\right]=0.5\cdot{\rm E}\left[\left(Z({\rm s})-Z({\rm s}+{\rm t})\right)^2\right]. \)

The first numerical study was designed to compare the discrete and continuous resampling methods for approximation of the variance of the spatial process Z(s). Proceeding as described above, the theoretical expectations involved were approximated through a sample average obtained from B = 500 Bootstrap replicates and 200 samples. The data were simulated under the exponential model, with σ2 equal to 1 or 2.25, ϕ = 0.2 and τ2 = 0.09, so the theoretical variance, whose true value is Var[Z(s)] = σ2 + τ2, amounts to 1.09 or 2.34. The resulting absolute errors associated with the estimation of the variance of the spatial process are represented in Fig. 1.
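The absolute-error computation behind this study can be sketched as follows, assuming each Bootstrap replicate is stored as a row of k resampled values; the helper name and the data layout are our assumptions.

```python
import numpy as np

def variance_abs_error(replicates, true_var):
    """Plug-in variance Var[Z] = E[Z^2] - (E[Z])^2 computed on each
    Bootstrap replicate (one row per replicate), averaged over the B
    replicates, and compared with the theoretical variance."""
    reps = np.asarray(replicates, dtype=float)          # shape (B, k)
    var_b = (reps ** 2).mean(axis=1) - reps.mean(axis=1) ** 2
    return abs(var_b.mean() - true_var)
```

In the simulations, `true_var` would be σ2 + τ2, i.e. 1.09 or 2.34 depending on the setting.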

Fig. 1

Estimated absolute errors of Var[Z(s)] for 500 discrete and 500 continuous Bootstrap replicates of 200 samples. The data were simulated with the exponential model and each sample size is 50. The true values of Var[Z(s)] are 1.09 and 2.34 in the left and right panels, respectively

The small values displayed in Fig. 1 show good behavior of both Bootstrap approaches, although with some advantage for the smooth version, despite the fact that an additional bandwidth must be estimated. Furthermore, one should bear in mind that the underlying continuous distribution calls for a continuous tool for inference and, in particular, it avoids obtaining repeated values.

Aiming to proceed with a deeper analysis comparing the proposed Bootstrap approaches, a new simulation study was carried out, focused on the estimation of the univariate distribution Fs. Observe that Fs(x) = F(x), for all s, since the trend function has been taken to equal 0 at all locations. Five thresholds x were selected, identifying the quantiles 5, 25, 50, 75 and 95 % as representative of the distribution domain, respectively denoted by Pi, with i = 5, 25, 50, 75, 95. For each x, we derived the discrete and continuous estimators from the sample locations \(\{{\rm t}_1,{\ldots},{\rm t}_n\}, \) given by:

$$ \begin{aligned}\hat{F} (x)&=\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n}\frac{K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right) I_{\{Z({\rm t}_{i}) \leq x \}} I_{\{ Z({\rm t}_{j}) \leq x\}}}{\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right)} \\ \tilde{F} (x) &= \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right) {\cal L} \left( \frac{x - Z({\rm t}_i)}{h} \right) {\cal L} \left( \frac{x - Z({\rm t}_j)}{h} \right)}{\sum_{i=1}^{n} \sum_{j=1}^{n} K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right)} \end{aligned}$$
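These two estimators can be transcribed directly, assuming the Epanechnikov kernel K is applied to inter-site distances and \({\cal L}\) is the standard normal distribution; the helper names and the reduction of the vector difference \({\rm t}_j-{\rm t}_i\) to a distance are our assumptions.

```python
import numpy as np
from math import erf

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def norm_cdf(x):
    # Standard normal distribution function, used for L.
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(x, dtype=float) / np.sqrt(2.0)))

def distribution_estimators(locs, z, x, h1, h):
    """Discrete (F_hat) and continuous (F_tilde) estimators of F(x).
    locs: (n, 2) sample locations; z: (n,) observed values;
    h1: spatial bandwidth; h: smoothing bandwidth of L."""
    locs, z = np.asarray(locs, float), np.asarray(z, float)
    dist = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    K = epanechnikov(dist / h1)
    denom = K.sum()
    ind = (z <= x).astype(float)
    F_hat = (K * ind[:, None] * ind[None, :]).sum() / denom    # indicators
    L = norm_cdf((x - z) / h)
    F_tilde = (K * L[:, None] * L[None, :]).sum() / denom      # smoothed
    return float(F_hat), float(F_tilde)
```

Both estimators share the same spatial kernel weights; only the treatment of the observed values (indicator versus smoothed distribution function) differs, which is the contrast explored in the study below.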

The discrete and the continuous Bootstrap methods were used to approximate the errors of both estimators. With this aim, k locations si were chosen, where B discrete and continuous replicates were generated to derive the Bootstrap analogues of \(\hat{F} (x)\) and \(\tilde{F}(x), \) denoted by \(\hat{F}^{\ast b} (x)\) and \(\tilde{F}^{\ast b} (x)\) for the b-th replicate and \(b=1,{\ldots},B. \) Then, we computed \(( B^{-1} \sum_{b=1}^{B} ( \hat{F}^{\ast b} (x) - \hat{F} (x) )^2 )^{1/2}\) and \((B^{-1} \sum_{b=1}^{B} ( \tilde{F}^{\ast b} (x) - \tilde{F} (x) )^2 )^{1/2}, \) which provide approximations of the Bootstrap standard errors of the discrete and continuous distribution estimators, respectively. Table 1 summarizes the results obtained from 500 Bootstrap replicates and 200 data sets simulated under the exponential model, with σ2 = 1, ϕ = 0.2 and τ2 = 0.09.
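The standard-error approximation just described is simply the root mean squared deviation of the replicate estimates around the original estimate; a minimal sketch:

```python
import numpy as np

def bootstrap_se(replicate_estimates, point_estimate):
    """Bootstrap standard error: square root of the mean squared deviation
    of the B replicate estimates F*b(x) around the original F(x)."""
    reps = np.asarray(replicate_estimates, dtype=float)
    return float(np.sqrt(np.mean((reps - point_estimate) ** 2)))
```

The same function applies to both estimators: pass the B values of \(\hat{F}^{\ast b}(x)\) with \(\hat{F}(x)\), or the B values of \(\tilde{F}^{\ast b}(x)\) with \(\tilde{F}(x)\).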

Table 1 Mean and standard deviation of the standard errors (SE) obtained for the discrete and continuous distribution estimators, for 500 discrete and 500 continuous Bootstrap replicates of 200 samples

The small values presented in Table 1 make clear the good performance of the two Bootstrap approaches. In general terms, it seems advantageous to adopt the continuous estimator \(\tilde{F}\) over the discrete estimator \(\hat{F}, \) regardless of the resampling method considered. On the other hand, both Bootstrap procedures provide similar estimates of the accuracy of \(\tilde{F}, \) while the difference is more evident for approximation of the distribution through \(\hat{F}. \)

Taking into account the foregoing results, the smoother Bootstrap version will be applied in the following studies. We now compare the estimation of the total variance of Z(s) with another spatial approach, namely parametric estimation. To this end, parametric variograms were obtained by selecting valid models and deriving maximum likelihood (ML) estimates to approximate the unknown parameters. The exponential and the spherical models were used for the latter purpose, thus providing two different settings, depending on whether the parametric candidate coincides with the theoretical model or is affected by misspecification. The resulting values are shown in Table 2, where data were simulated under the exponential and the spherical models, with σ2 = 1, ϕ = 0.2 and τ2 = 0.

Table 2 Mean and standard deviation of the absolute errors (AE) associated with the estimation of Var[Z(s)], whose true value is 1, for 500 Bootstrap replicates and the ML estimates

According to the values presented in Table 2, the Bootstrap replicates offer more accurate estimates of Var[Z(s)] than the ML approaches, even when knowledge of the true variogram model is assumed. Surprisingly, prior knowledge of the parametric family is not always advantageous, as illustrated by the case presented in Fig. 2. This can be explained by the fact that, under the parametric approach, the approximation derived for the total variance depends strongly on the appropriate characterization of the model parameters rather than on the overall variogram function.

Fig. 2

The left panel represents one simulated data set with n = 50 sample locations, where each bullet size is proportional to the corresponding measured value. The selected k = 15 locations in the left panel are displayed after being reordered. The right panel represents the variogram estimators, where the theoretical model chosen to generate the data is the exponential one. The bullets in the right panel represent the Bootstrap variogram estimates obtained for 500 replicates

In view of the latter, a further step in this research was to analyze the behavior of the resampling methods when dealing with the estimation of the variogram function γ. To this end, the integrated squared error, ISE \(=\int ( \gamma({\rm t})-\hat{\gamma}({\rm t}) )^2 d{\rm t}, \) was approximated numerically, for each data set and for each of the estimators \(\hat{\gamma}\) implemented, including a valid version obtained through the Bootstrap approach. We started by deriving the empirical Bootstrap estimates of the variogram and then fitting them, through an iterated weighted least squares criterion, to a class of permissible variograms, following the procedure developed in Shapiro and Botha (1991). Proceeding in this way, the validity of the resulting estimator was guaranteed with no prior specification of a parametric model. Figure 2 shows an example of a data set simulated with an exponential model together with the resulting variogram estimators, one given by the Bootstrap approach and the other two obtained by ML.
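The numerical approximation of the ISE can be sketched as a simple quadrature over a grid of lags; the trapezoidal rule below is our own choice, since the paper does not specify the quadrature used.

```python
import numpy as np

def integrated_squared_error(gamma_true, gamma_hat, lags):
    """Approximate ISE = int (gamma(t) - gamma_hat(t))^2 dt over the
    supplied grid of lags, using the trapezoidal rule."""
    lags = np.asarray(lags, dtype=float)
    diff2 = (gamma_true(lags) - gamma_hat(lags)) ** 2
    # trapezoidal rule: average of adjacent ordinates times lag spacing
    return float(np.sum((diff2[1:] + diff2[:-1]) / 2.0 * np.diff(lags)))
```

Here `gamma_true` would be the model used to simulate the data and `gamma_hat` any of the fitted estimators (Bootstrap-based or ML).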

The procedure described above was repeated for 200 samples of size n = 50, for data simulated under the exponential and spherical models, with σ2 = 1, ϕ = 0.2 and τ2 = 0. The ISE values were computed and Table 3 summarizes the results obtained.

Table 3 Mean and standard deviation of the ISE values obtained through the Bootstrap estimator combined with Shapiro and Botha’s method and the ML estimators

The results displayed in Table 3 demonstrate the good performance of the Bootstrap approach when used to estimate the spatial dependence of the random process in terms of the variogram. The ML estimates improve on the resampling procedure for small distances, although the Bootstrap method competes with the parametric approach for lags larger than 0.05.

4.2 Application to environmental monitoring data

In this section we present an application of the Bootstrap methodology to a real data set concerning biomonitoring of arsenic pollution in the Central Region of Portugal, classified as NUTS II (NUTS stands for “Nomenclature of Units for Territorial Statistics”). The measured variable represents the concentrations in moss samples, in micrograms per gram of dry weight. The typical procedure, an alternative to the more expensive solution of determining the amount of pollutant directly, is to plant the moss and collect it some time later, which allows the concentration of arsenic (and other heavy metals) to be measured. More details on this Portuguese project of air pollution analysis can be found in Martins et al. (2012).

The data set was collected in 2006 and can be represented by \(\{({\rm t}_i, Z({\rm t}_i)), i=1,{\ldots}, n\}, \) with n = 98 and Z(ti) identifying the log-transformed concentration of arsenic (As) at location ti. We adopted the log-transformation to reduce the impact of outliers. Afterwards, three gross outliers remained, which were replaced by the average of the remaining values from that year's survey. Table 4 gives the summary statistics for the resulting data, showing that the log-transformation leads to a more symmetric distribution. Furthermore, Fig. 3 presents the spatial representation of the log-transformed data, where each bullet size is proportional to the corresponding measured value.

Table 4 Summary statistics for arsenic pollution levels measured in the Central Region of Portugal (NUTS II)
Fig. 3

Spatial representation of moss data in the Central Region of Portugal (NUTS II). The size of the bullets, representing the sampled locations, is proportional to the measured value. The points marked with numbers are used to generate the Bootstrap replicates. The points identifying the three locations {sA, sB, sC} are the targets of prediction for Z(·)

Aiming to exemplify the usefulness of the proposed Bootstrap approaches in this application, we first estimate a deterministic model for E[Z(s)] = μ(s) with \({\rm s}=({\rm UtmX}, {\rm UtmY}) \in D \subset {I\!R}^2\) and D identifying the region of NUTS II in Portugal. We then assume that the random process Z(s) can be modeled as:

$$ Z({\rm s})=\mu({\rm s})+Y({\rm s}) $$

where \(\{Y({\rm s}): {\rm s} \in D\}\) is a zero-mean strictly stationary random process and μ(s) = α0 + α1 × UtmY. The regression coefficients were estimated, yielding statistically significant values α0 = −37.8 and α1 = 0.0083.
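A sketch of the trend fit, under the assumption that the coefficients were obtained by an ordinary least-squares regression (the fitting method is not stated in the text, so this is one plausible choice; the function name is ours):

```python
import numpy as np

def fit_linear_trend(utm_y, z):
    """Least-squares fit of mu(s) = alpha0 + alpha1 * UtmY;
    returns the pair (alpha0, alpha1)."""
    utm_y = np.asarray(utm_y, dtype=float)
    X = np.column_stack([np.ones_like(utm_y), utm_y])   # design matrix
    coef, *_ = np.linalg.lstsq(X, np.asarray(z, dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])
```

The residuals z − (α0 + α1 × UtmY) would then play the role of the observations of the zero-mean process Y(s).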

To derive the weights \(p^{(2)}_{i_1,{\ldots},i_k}, \) associated with the multivariate distribution \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}, \) one needs to select locations \({\rm s}_1,{\ldots},{\rm s}_k \in D. \) So, we proceeded by choosing si, for \(i=1,{\ldots},k\) and k = 40, among the 98 sample locations, as represented in Fig. 3. The probabilities \(p^{(2)}_{i_1,{\ldots},i_k}\) were obtained as explained in steps 3 and 5 of the resampling algorithm given in Sect. 3, which allowed us to generate Bootstrap samples that take into account the dependence structure of the underlying random process.

With 500 replicates, we estimated the total variance of the process \(Y(\cdot ), \) whose results are given in Table 5, where three different approaches were considered. According to the numerical studies presented in Sect. 4.1, the value 0.886 seems an accurate approximation of Var[Y(s)].

Table 5 Estimates of the variance of the log-transformed arsenic pollution level, by considering the discrete and continuous versions of the Bootstrap replicates and assuming an exponential covariance model fitted to the observed data through ML

To highlight the potential of the Bootstrap techniques within the scope of dependent data, one of their applications, pointed out in Sect. 1, will be addressed. In particular, we will focus on the estimation of the accuracy of a spatial approach, such as the nonparametric spatial predictor proposed in Menezes et al. (2010), under a stochastic sampling design. The use of this kernel-based predictor requires an optimal bandwidth, which can be defined as dependent on the target location, offering better results than when a global optimal bandwidth is adopted. A drawback of the aforementioned predictor is that no estimate of the prediction error is available.

Here, we suggest estimating standard errors through the Bootstrap approaches specified throughout this manuscript. In fact, if a prediction value is obtained for each Bootstrap sample (for a total of B replicates), then a simple method to approximate the unknown standard error is the standard deviation of those prediction values. Results are summarized in Table 6, for three points randomly chosen in the NUTS II region, represented as sA, sB and sC in Fig. 3. The nonparametric predictions were computed as established in Menezes et al. (2010), using optimal local bandwidths equal to 75.5, 36.9 and 90, for sA, sB and sC, respectively.

Table 6 Estimates of the prediction error of the log-transformed arsenic pollution level at locations sA, sB and sC. For each nonparametric prediction (NP Pred), two different estimates of the standard errors (SE) are presented, by considering the discrete and continuous versions of the Bootstrap replicates. Results from OK are given in the two right columns

For each nonparametric prediction, two different estimates of the standard errors are presented, obtained by considering either the discrete or the continuous version of the Bootstrap replicates. As complementary information, we also present the prediction results derived by application of ordinary kriging (OK) in Table 6. These allow us to conclude that the Bootstrap estimators yield smaller values for the standard errors than those obtained through OK. The largest standard errors are associated with location sB, where less information is available, since that area includes fewer sampled data.

As a last note, we have back-transformed the predicted values and added the trend information to obtain estimates of the process Z(s) at the three target locations, leading to \(\hat{Z}({\rm s}_A)=1.153, \,\hat{Z}({\rm s}_B)=0.878\) and \(\hat{Z}({\rm s}_C)=0.354\) (under OK, the corresponding values were 1.187, 0.818 and 0.345). Knowing that 50 % of the locations have an arsenic concentration smaller than 0.57, we can conclude that sC is one of the locations with lower intensity of air pollution, as opposed to sA and sB.

5 Conclusions

In this paper consistent estimators of the multivariate distribution function have been proposed, which can serve as the basis for implementing Bootstrap approaches in the spatial setting. The resampling method derived from the discrete estimator is an adaptation to this setting of the naive Bootstrap described in Efron (1979) and has similar properties, such as consistency or the fact that nearly every sample derived from it contains repeated values. The alternative version, obtained in the current work by applying a continuous distribution estimator, is the analogue of the smoothed Bootstrap approach for independent data (Lejeune and Sarda 1992). An advantage of the second approach is that it entails resampling from a continuous distribution and, therefore, avoids the aforementioned problem of providing repeated data in the replicates. However, an additional uncertainty is introduced, in terms of the bandwidth parameter that must be estimated. For independent data, the question of whether the smoothed Bootstrap is superior to the naive alternative, and for which smoothing parameter, has been analyzed by several authors (Silverman and Young 1987; Hall 1992), but no definitive answers exist. The difficulties in checking the behavior of the resampling approaches increase further in the spatial setting, because of the underlying dependence structure. In this respect, although further research should be developed, the numerical studies conducted in the current work indicate good performance of both procedures, with a slight superiority of the continuous spatial Bootstrap when the bandwidth parameter is appropriately selected.

On the other hand, as pointed out in the introduction, the Bootstrap methodology allows researchers to solve different statistical problems inherent to the estimation process. It should not be regarded as a substitute for other techniques designed to address specific issues, but as a complement that adds extra information. From this perspective, the Bootstrap proposals offer an attractive alternative for resampling in the spatial setting. These approaches aim at reproducing the data dependence structure before deriving subsamples. In this respect, the numerical studies developed with simulated data demonstrate the good behavior of the resampling methods, which can even be advantageous over other procedures for estimation of the variance or the variogram function, although their main value is that they help capture the main features of the underlying spatial process. Consequently, the generation of replicates offers an accurate alternative to derive estimates of the standard error or any other unknown characteristic of the statistical approach considered.