1 Introduction

Different statistical techniques can require extra work to approximate the unknown terms involved, to check the variance of the estimators employed or to derive their sampling distribution. Some of these problems have been solved in the curve estimation setting through the Bootstrap method (Efron 1979), whose key idea is that the relationship between the theoretical distribution and the observed sample is similar to the relationship between a distribution estimator and a secondary sample drawn from it. The original Bootstrap procedures were designed for independent data (De Angelis and Young 1992; Hall 1992), and an adaptation of these procedures to the spatial setting would have wide applicability for addressing a variety of issues. A goodness-of-fit test for the variogram model could be implemented to extend the results in Maglione and Diblasi (2004) for Gaussian data, and even inference on the underlying distribution of the random process could be developed. Bootstrap variants of tests based on their asymptotic distribution, such as the one proposed in Li et al. (2007) for assessment of the properties of the covariance function, could be introduced to improve their speed of convergence. Also, the kernel approaches for characterization of the dependence structure (Hall and Patil 1994) or for prediction (Menezes et al. 2010) could be complemented by deriving the optimal bandwidths and additionally providing estimates of their accuracy.

The parametric Bootstrap methodology can be easily extended to the spatial setting, thus enabling researchers to develop techniques for variogram assessment (Olea and Pardo-Igúzquiza 2011), to construct confidence intervals for the parameters of a distribution estimator (Goovaerts et al. 2005) or to correct a test for the spectral density (Crujeiras et al. 2010). When independence can be assumed for the residuals, the traditional nonparametric resampling approaches can be applied for approximation of the variance of an estimator (Iranpanah et al. 2011) or probabilities derived from it (Hyun-Han and Young-Il 2006). However, for dependent data, it is necessary to design ad hoc procedures in order to guarantee consistency of the results, such as the parametric Bootstrap method for small area estimation (Hall and Maiti 2006). Alternatives of more general use are those based on resampling blocks of data (Hall 1985), subsamples obtained by deleting portions of data (Politis et al. 1999) or marks assigned to the spatial points (Loh 2008).

The aim of this work is to introduce nonparametric Bootstrap approaches that allow us to generate replicates from the available data, at a set of selected locations, by first approximating the joint distribution in a nonparametric way and then randomly drawing samples from it. With this idea, different estimators of the multivariate distribution will be proposed, which are distribution functions themselves, associated with discrete or continuous random variables, so that they can be used as the basis for resampling. We will check that consistency holds for the suggested procedures, provided that the random process is strictly stationary or when this condition is relaxed by admitting a deterministic trend. In addition, numerical studies will be carried out to analyze the behavior of both Bootstrap methods when addressing different problems.

2 Multivariate distribution estimators

To derive the distribution estimators, we will assume that the random process \(\{ Z ( {\rm s} ) \in {I\!R} : {\rm s} \in D \subset {I\!R}^d \}\) can be modeled as:

$$ Z({\rm s}) = \mu ({\rm s}) + Y({\rm s}) $$
(1)

where \(\{ Y({\rm s}) \in {I\!R} : {\rm s} \in D \subset {I\!R}^d \}\) is a zero-mean strictly stationary random process and \(\mu(\cdot)\) represents the deterministic trend, namely, E[Z(s)] = μ(s), for all \({\rm s} \in D. \)

We will denote the multivariate distribution by:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)= P\left( Z \left( {\rm s}_1 \right) \leq x_1, {\ldots}, Z \left( {\rm s}_k \right) \leq x_k \right) $$

for all sets of sites \({\rm s}_1, {\ldots}, {\rm s}_k \in D\) and thresholds \(x_1,{\ldots},x_k \in I\!R, \) with \(k \in I\!N. \)

From the model established for the random process in (1), one has:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right) =F_{{\rm s}_1+{\rm z},{\ldots},{\rm s}_k+{\rm z}} \left( x_1+ \mu \left( {\rm s}_1+ {\rm z} \right)- \mu \left( {\rm s}_1 \right), {\ldots}, x_k+ \mu \left( {\rm s}_k+ {\rm z} \right)- \mu \left( {\rm s}_k \right)\right) $$
(2)

for all \({\rm z} \in {I\!R}^d, \) on account of the stationarity condition of \(Y(\cdot ), \) since:

$$ {\cal P} \left( Y \left( {\rm s}_1 \right) \leq x_1 , {\ldots}, Y \left( {\rm s}_k\right)\leq x_k \right)= {\cal P} \left( Y \left( {\rm s}_1 + {\rm z} \right) \leq x_1 , {\ldots}, Y \left( {\rm s}_k + {\rm z} \right)\leq x_k \right) $$

Relation (2) yields that the multivariate distribution of \(Z(\cdot)\) remains invariant when the spatial locations are subjected to the same translation, by a vector z, and each threshold \(x_j\) is replaced by an appropriate correction, given by \(x_j + \mu \left( {\rm s}_j + {\rm z} \right) - \mu \left( {\rm s}_j \right). \)

Our aim is the estimation of \(F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right), \) for a set of selected sites \({\rm s}_1, {\ldots}, {\rm s}_k \in D\) and thresholds \(x_1,{\ldots},x_k \in I\!R, \) with \(k \in I\!N. \) This issue will be addressed by applying property (2), which yields:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)={\cal P} \left( Z \left( {\rm s}_1 +{\rm z} \right) \leq x_1+ \mu \left( {\rm s}_1 +{\rm z} \right)- \mu \left( {\rm s}_1 \right), {\ldots}, Z \left( {\rm s}_k +{\rm z} \right) \leq x_k+ \mu \left( {\rm s}_k+ {\rm z} \right)- \mu \left( {\rm s}_k \right)\right) = {\cal P} \left( Z \left( {\rm t}_1 \right) \leq x_1+ \mu \left( {\rm t}_1 \right)- \mu \left( {\rm s}_1 \right), {\ldots}, Z \left( {\rm t}_k \right)\leq x_k+ \mu \left( {\rm t}_k \right)- \mu \left( {\rm s}_k \right)\right) $$

for \({\rm t}_j = {\rm s}_j + {\rm z}\) and \({\rm z} \in {I\!R}^d. \)

The relations above allow us to conclude that:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)= {\cal P} \left( X \left( {\rm t}_1 \right) \leq x_1 , {\ldots}, X \left( {\rm t}_k \right)\leq x_k\right) $$
(3)

for \({\rm t}_j = {\rm s}_j + {\rm z}\) and \(X ( {\rm t}_j ) = Z ( {\rm t}_j ) - \mu ( {\rm t}_j ) + \mu({\rm s}_j). \) Consequently, if the set \(\{{\rm t}_{1},{\ldots}, {\rm t}_{k}\}\) represents a translation of the selected locations \(\{{\rm s}_{1},{\ldots}, {\rm s}_{k}\}\) by any vector \({\rm z} \in {I\!R}^d, \) the distribution of the random vector \(\left(Z \left( {\rm s}_1 \right), {\ldots},Z \left( {\rm s}_k \right) \right)\) equals that of \(\left(X \left( {\rm t}_1 \right), {\ldots},X \left( {\rm t}_k \right) \right). \)
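As a minimal numerical sketch of the correction in (3) — not part of the original formulation, and assuming a known trend, here a hypothetical linear one chosen only for illustration — the translated values \(X({\rm t}_j)\) can be computed as:

```python
import numpy as np

def corrected_values(Z_t, t, s, mu):
    """Compute X(t_j) = Z(t_j) - mu(t_j) + mu(s_j), as in relation (3).

    Z_t : observed values at the translated sites t_j
    t, s: arrays of shape (k, d) with translated and selected sites
    mu  : trend function acting on an array of locations
    """
    return Z_t - mu(t) + mu(s)

# Hypothetical linear trend mu(s) = 1 + 2*s_1 + 3*s_2, for illustration only
mu = lambda s: 1.0 + s @ np.array([2.0, 3.0])

s = np.array([[0.1, 0.2], [0.4, 0.5]])   # selected sites
z = np.array([0.3, -0.1])                # translation vector
t = s + z                                # translated sites t_j = s_j + z
Z_t = np.array([1.5, 2.0])               # observed values at t

X = corrected_values(Z_t, t, s, mu)
```

With a linear trend, the correction \(\mu({\rm t}_j)-\mu({\rm s}_j)\) reduces to a function of the translation vector z alone, anticipating the situation of Remark 3.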

Next, several approaches will be described for approximation of the distribution function, based on (3). With this aim, suppose that n data, \(\{Z ( {\rm t}_1 ),{\ldots},Z( {\rm t}_n )\}, \) have been collected, at the respective locations \(\{{\rm t}_1, {\ldots}, {\rm t}_n\}. \) Firstly, we propose constructing a weighted average of the indicator functions obtained for the possible k-combinations \(\{{\rm t}_{i_{1}}, {\ldots}, {\rm t}_{i_{k}}\}\) of the observed sites \(\{{\rm t}_1, {\ldots}, {\rm t}_n\}, \) as follows:

$$\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k} \left( x_1, {\ldots},x_k \right)=\sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} p_{i_1,{\ldots},i_k} I_{\{ X({\rm t}_{i_1}) \leq x_1\}} {\ldots} I_{\{ X({\rm t}_{i_k}) \leq x_k \}} $$
(4)

for some weights \(p_{i_1,{\ldots},i_k} \geq 0, \) with \(\sum_{i_1=1}^{n} {\ldots} \sum_{i_k=1}^{n} p_{i_1,{\ldots},i_k}=1, \) which will be established in (6) and (7), where \(I_A\) denotes the indicator function of the set A, for \({\rm t}_{i_{j}}={\rm s}_{j}+{\rm z} \) and \({\rm z} \in {I\!R}^d. \) In particular, since the latter equality holds for j = 1, we will take \({\rm z}={\rm t}_{i_{1}}-{\rm s}_{1}, \) yielding:

$$ X ( {\rm t}_{i_j} )=Z ( {\rm t}_{i_j} )-\mu ( {\rm s}_j+{\rm t}_{i_1}-{\rm s}_1)+\mu({\rm s}_j) $$
(5)

For selection of the values \(p_{i_1,{\ldots},i_k}, \) the key idea will be to assign more weight to the k-combinations \(\{{\rm t}_{i_1}, {\ldots}, {\rm t}_{i_k}\}\) which are closer to being a translation of \(\{{\rm s}_1, {\ldots}, {\rm s}_k\}\) by \({\rm z}={\rm t}_{i_{1}}-{\rm s}_{1}. \) At first sight, this approach involves computing the lag between each pair of the selected locations, \({\rm s}_j-{\rm s}_{j'}, \) and comparing it with that of the corresponding observed sites, \({\rm t}_{i_{j}}-{\rm t}_{i_{j'}}, \) for j and j′ varying from 1 to k, which amounts to 0.5k(k − 1) comparisons. For the sake of simplicity, we propose solely computing the indispensable lags needed to characterize the closeness of \(\{{\rm t}_{i_1}, {\ldots}, {\rm t}_{i_k}\}\) to being the aforementioned translation. This enables us to consider only the lags \({\rm s}_j-{\rm s}_{j+1}\) and \({\rm t}_{i_{j}}-{\rm t}_{i_{j+1}}, \) for \(j=1,{\ldots},k-1, \) since the remaining ones follow from simple sums or differences of these vectors. Then, a nonparametric approximation of the multivariate distribution would be obtained by taking \(p_{i_1,{\ldots},i_k}\) in (4) as given below:

$$ p_{i_1,{\ldots},i_k}=p_{i_1,{\ldots},i_k}^{(1)}=\frac{ p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1} {\ldots} p_{{\rm s}_{k-1},{\rm s}_k}^{{\rm t}_{i_{k-1}},{\rm t}_{i_k},h_{k-1}}}{\sum_{i_1=1}^{n} {\ldots} \sum_{i_k=1}^{n} p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1} {\ldots} p_{{\rm s}_{k-1},{\rm s}_k}^{{\rm t}_{i_{k-1}}, {\rm t}_{i_k},h_{k-1}}} $$
(6)

with \(p_{{\rm s}_j,{\rm s}_{j+1}}^{{\rm t}_{i_j},{\rm t}_{i_{j+1}},h_j}=K\left( \frac{{\rm s}_j-{\rm s}_{j+1}-({\rm t}_{i_j}-{\rm t}_{i_{j+1}})}{h_j} \right), \) where K represents a d-variate kernel function and \(h_j\) is a bandwidth parameter, for \(j=1,{\ldots},k-1. \) The resulting estimator will be referred to as \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(1)}. \)
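A brute-force sketch of the weights \(p^{(1)}_{i_1,{\ldots},i_k}\) in (6) might read as follows; the product Epanechnikov kernel for K and the per-lag bandwidths are assumptions of this sketch, and the exhaustive enumeration is feasible only for small n and k:

```python
import itertools
import numpy as np

def epanechnikov(u):
    """Product Epanechnikov kernel on R^d."""
    u = np.atleast_1d(u)
    return float(np.prod(np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)))

def weights_p1(s, t, h):
    """Weights p^(1) of estimator (6), by brute force over all n^k
    k-combinations of observed sites (assumes some combination gets
    positive weight; feasible only for small n and k)."""
    n, k = len(t), len(s)
    raw = {}
    for idx in itertools.product(range(n), repeat=k):
        w = 1.0
        for j in range(k - 1):
            lag_s = s[j] - s[j + 1]              # lag of selected sites
            lag_t = t[idx[j]] - t[idx[j + 1]]    # lag of observed sites
            w *= epanechnikov((lag_s - lag_t) / h[j])
        raw[idx] = w
    total = sum(raw.values())
    return {idx: w / total for idx, w in raw.items()}

rng = np.random.default_rng(0)
t = rng.random((4, 2))         # n = 4 observed sites on the unit square
s = t[:2].copy()               # k = 2 selected sites
p = weights_p1(s, t, [2.0])    # one bandwidth h_1 for the single lag
```

Since s coincides here with two observed sites, the combination reproducing that exact pair receives the largest weight.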

One drawback related to estimator \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(1)}\) is the combinatorial explosion that may occur for large k, when used for construction of Bootstrap replicates. This is mainly due to the fact that \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(1)}\) is itself a discrete distribution function, conditional on the sample \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}, \) which takes as many values \((X({\rm t}_{i_1}),{\ldots},X({\rm t}_{i_k}))\) (or, more precisely, vectors of size k) as combinations \(\{{\rm t}_{i_1}, {\ldots}, {\rm t}_{i_k}\}\) can be obtained from the set of the observed locations. Indeed, the number of combinations of this kind amounts to \(n^k, \) so the resampling approach derived from this distribution estimator would entail drawing a vector of size k from a set of \(n^k\) vectors, whose probabilities \(p_{i_1,{\ldots},i_k}^{(1)}\) require computing the terms \(p_{{\rm s}_j,{\rm s}_{j+1}}^{{\rm t}_{i_j},{\rm t}_{i_{j+1}},h_j}, \) for all j varying from 1 to k − 1 and all the combinations of size k from \(\{{\rm t}_1, {\ldots}, {\rm t}_n\}. \)

In view of the latter, our suggestion will be to construct a new distribution estimator as given in (4), with weights:

$$ p_{i_1,{\ldots},i_k}= p_{i_1,{\ldots},i_k}^{(2)}=\frac{ p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1}, {\rm t}_{i_2},h_1} }{\sum_{i_1=1}^{n} \sum_{i_2=1}^{n} p_{{\rm s}_1, {\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1}}\, \frac{ p_{{\rm s}_2,{\rm s}_3}^{{\rm t}_{i_2}, {\rm t}_{i_3},h_2} }{\sum_{i_3=1}^{n} p_{{\rm s}_2,{\rm s}_3}^{{\rm t}_{i_2},{\rm t}_{i_3},h_2}} {\ldots} \frac{ p_{{\rm s}_{k-1},{\rm s}_k}^{{\rm t}_{i_{k-1}},{\rm t}_{i_k},h_{k-1}} }{\sum_{i_k=1}^{n} p_{{\rm s}_{k-1},{\rm s}_k}^{{\rm t}_{i_{k-1}},{\rm t}_{i_k},h_{k-1}}}$$
(7)

The resulting estimator, denoted by \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}, \) will allow us to derive a simpler Bootstrap approach, as we will describe in Sect. 3, for which valid probability statements will be made.

The two aforementioned approaches provide identical estimators of the bivariate distribution F s,s′ and the univariate distribution F s, as follows:

$$ \hat{F}_{{\rm s},{\rm s}'} \left( x_1,x_2\right)= \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \frac{K\left( \frac{{\rm s}-{\rm s}'-({\rm t}_{i}-{\rm t}_{j})}{h_1}\right) I_{\{ X({\rm t}_{i}) \leq x_1 \}} I_{\{ X({\rm t}_{j}) \leq x_2 \}}}{\sum_{i=1}^{n} \sum_{j=1}^{n} K\left( \frac{{\rm s}-{\rm s}'-({\rm t}_{i}-{\rm t}_{j})}{h_1}\right)} \qquad \hat{F}_{\rm s} (x)=\hat{F}_{{\rm s},{\rm s}} (x,x)= \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \frac{K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right) I_{\{ X({\rm t}_{i}) \leq x \}} I_{\{ X({\rm t}_{j}) \leq x \}}}{\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right)} $$
(8)

The consistency of \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) can be established by considering a random design for the spatial locations and a mixed increasing-domain asymptotic structure for the random process, together with appropriate convergence rates for the bandwidth parameters and the increasing scale. A sketch of the proof of this property is outlined in Appendix 1, which shows that the optimal bandwidths \(h_j\) depend on unknown terms. Hence, we propose an alternative mechanism for their selection, based on computing \(h_j\) as the Euclidean distance from \({\rm s}_j-{\rm s}_{j+1}\) to the m-nearest difference \({\rm t}_{i_j}-{\rm t}_{i_{j+1}}, \) for some m.
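The m-nearest-difference bandwidth rule can be sketched as follows; including the zero differences \({\rm t}_i-{\rm t}_i\) among the candidate lags is an assumption of this sketch:

```python
import numpy as np

def mnearest_bandwidth(lag, t, m):
    """Bandwidth h_j: Euclidean distance from the target lag s_j - s_{j+1}
    to its m-th nearest difference t_i - t_l, over all ordered pairs of
    observed sites (the diagonal pairs t_i - t_i are kept here)."""
    diffs = t[:, None, :] - t[None, :, :]               # all pairwise differences
    dists = np.linalg.norm(diffs - lag, axis=-1).ravel()
    return np.sort(dists)[m - 1]                        # m-th smallest distance
```

For instance, with three sites at the corners of the unit triangle, the lag (1, 0) is matched exactly by one pair, so the 1-nearest bandwidth is 0 and the 2-nearest one is 1.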

It is noteworthy that estimator \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) is itself a distribution function, conditional on the sample \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}, \) which takes values \((X({\rm t}_{i_1}),{\ldots},X({\rm t}_{i_k}))\) with respective probabilities \(p_{i_1,{\ldots},i_k}^{(2)}, \) for \(X (t_{i_{j}}) \) defined in (5). However, \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) is a discrete distribution function. Hence, for a continuous random process, the use of a smoother version of the distribution estimator seems to be more appropriate, which can be derived by applying in (4) an integrand of a density, instead of an indicator function, as follows:

$$ \tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k} \left( x_1, {\ldots},x_k \right)= \sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} p_{i_1,{\ldots},i_k}^{(2)} {\cal L} \left( \frac{x_1 - X({\rm t}_{i_1})}{h} \right){\ldots} {\cal L} \left( \frac{x_k - X({\rm t}_{i_k})}{h} \right)$$
(9)

where \({\cal L}(x)=\int_{-\infty}^{x} L(u) du, \,L\) is a univariate kernel function, h is a bandwidth parameter and \(p_{i_1,{\ldots},i_k}^{(2)}\) is defined in (7). Consistency can also be derived for \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{h_1,{\ldots},h_{k-1}}, \) as outlined in Appendix 2.

Unlike \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(i)}, \) for i = 1, 2, estimator \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}\) is a continuous distribution function, conditional on the sample, with density:

$$ \tilde{f}_{{\rm s}_1,{\ldots}, {\rm s}_k} \left( x_1, {\ldots},x_k \right)= \frac{1}{h^k} \sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} p_{i_1,{\ldots},i_k}^{(2)} L \left( \frac{x_1 - X({\rm t}_{i_1})}{h} \right){\ldots} L \left( \frac{x_k - X({\rm t}_{i_k})}{h} \right) $$

The optimal h depends on the bandwidths \(h_j\) as well as on unknown moments of the random process. Hence, we suggest using a cross-validation procedure for its selection in practice, based on the results given in Bowman et al. (1998) and adapted to the spatial data setting, which aims at appropriately characterizing the performance of the multivariate distribution estimator, in the manner described below:

$$ h_{CV,k}={\rm argmin}_{h \in H} \sum\limits_{i_1, {\ldots}, i_k=1}^{n} \int\limits_{-\infty}^{+\infty} {\ldots} \int\limits_{-\infty}^{+\infty} \left( \tilde{F}_{-(i_1,{\ldots},i_k)} \left( x_1,{\ldots},x_k \right) - I_{\{ Z({\rm t}_{i_1}) \leq x_1\}} {\ldots} I_{\{ Z({\rm t}_{i_k}) \leq x_k \}} \right)^2 dx_1 {\ldots} dx_k $$

where H is an adequate set of positive numbers and \(\tilde{F}_{-(i_1,{\ldots},i_k)} \left( x_1,{\ldots},x_k \right)\) is the result of implementing estimator \(\tilde{F}_{{\rm t}_{i_1},{\ldots},{\rm t}_{i_k}}\) at \((x_1,{\ldots},x_k)\) when ignoring \(\{Z({\rm t}_{i_1}),{\ldots},Z({\rm t}_{i_k})\}. \) Proceeding in this way, \(h_{CV,k}\) would provide us with a global bandwidth selector that could be applied at any \((x_1,{\ldots},x_k). \) To simplify the derivation of the cross-validation bandwidth, our alternative proposal will be based on the use of the univariate continuous estimator, by considering:

$$\tilde{F}_{\rm s} (x)=\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \frac{K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right) {\cal L} \left( \frac{x - X({\rm t}_{i})}{h} \right){\cal L} \left( \frac{x - X({\rm t}_{j})}{h} \right)}{\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right)} $$
(10)

since the objective function should involve just one threshold and two bandwidths, to obtain:

$$ h_{CV,1}={\rm argmin}_{h \in H} \sum\limits_{i=1}^{n} \int\limits_{-\infty}^{+\infty} \left( \tilde{F}_{-(i)} \left( x \right) - I_{\{ Z({\rm t}_{i}) \leq x\}} \right)^2 dx $$

where \(Z({\rm t}_i)\) is left out in the implementation of \(\tilde{F}_{{\rm t}_i} (x)\) to produce \(\tilde{F}_{-(i)} \left( x \right). \) Furthermore, the integral can be numerically approximated over a bounded subset instead of \({I\!R}. \)
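A sketch of this univariate cross-validation selector follows; the constant trend (so that X = Z), the product Epanechnikov kernel for K, the standard normal distribution for \({\cal L}\) and the Riemann-sum approximation of the integral on a bounded uniform grid are all assumptions of this sketch:

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def Phi(u):
    """Standard normal cdf, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(u, dtype=float) / math.sqrt(2.0)))

def epan(u):
    """Product Epanechnikov kernel on R^d (product over the last axis)."""
    return np.prod(np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0), axis=-1)

def F_tilde(x, t, Z, h1, h):
    """Univariate smoothed estimator (10), assuming a constant trend (X = Z)."""
    K = epan((t[:, None, :] - t[None, :, :]) / h1)
    Lx = Phi((x - Z) / h)
    return (K * Lx[:, None] * Lx[None, :]).sum() / K.sum()

def h_cv1(t, Z, h1, H, grid):
    """Leave-one-out selector h_CV,1, with the integral approximated by a
    Riemann sum on the bounded, uniformly spaced grid of thresholds."""
    dx = grid[1] - grid[0]
    best, best_err = None, math.inf
    for h in H:
        err = 0.0
        for i in range(len(Z)):
            keep = np.arange(len(Z)) != i   # leave Z(t_i) out
            Fi = np.array([F_tilde(x, t[keep], Z[keep], h1, h) for x in grid])
            err += ((Fi - (Z[i] <= grid)) ** 2).sum() * dx
        if err < best_err:
            best, best_err = h, err
    return best

rng = np.random.default_rng(0)
t = rng.random((15, 2))
Z = rng.normal(size=15)
grid = np.linspace(-3.0, 3.0, 61)
h_star = h_cv1(t, Z, 1.0, [0.2, 0.5, 1.0], grid)
```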

Remark 1

Application of any of the previous distribution approaches requires assuming that the trend function can be estimated. In this respect, different procedures have been proposed for approximation of \(\mu (\cdot)\) (Goovaerts 1997): a parametric approach can be adopted for its estimation, or spatial interpolation techniques can be used to compute the trend.

Remark 2

When the trend function is supposed to be constant, relation (2) can be simplified to yield:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)= F_{{\rm s}_1+{\rm z},{\ldots},{\rm s}_k+{\rm z}} \left( x_1, {\ldots}, x_k \right) $$

for all \({\rm z} \in {I\!R}^d. \) Then, no characterization of the constant trend is necessary for implementation of the distribution estimators, as we could take \(X (t_{i_{j}})=Z (t_{i_{j}}) \) in \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}\) and \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(i)}, \) for i = 1,2.

Remark 3

For the specific case where the difference of trends depends on the lag between the locations involved, namely:

$$ \mu ({\rm s} )- \mu ({\rm s}')= M ({\rm s} - {\rm s}') $$
(11)

for all s,s′ \(\in D\) and some function M, combination of (2) and (11) leads to:

$$ F_{{\rm s}_1,{\ldots},{\rm s}_k} \left( x_1, {\ldots}, x_k\right)= F_{{\rm s}_1+{\rm z},{\ldots},{\rm s}_k+{\rm z}} \left( x_1+M({\rm z}), {\ldots}, x_k+M({\rm z})\right) $$

The equality above means that the translation of the spatial locations by vector z, together with the correction of the thresholds by M(z), preserves the value of the distribution function. Then, the distribution approaches would hold for \(X ({\rm t}_{i_{j}})=Z ({\rm t}_{i_{j}}) - M({\rm t}_{i_{1}}-{\rm s}_{1}), \) which demands approximation of function M. This issue can be addressed by adapting the different procedures available for μ, or a nonparametric estimator can even be derived as follows:

$$ \hat{M}({\rm z})=\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} \left( {\rm z} \right) \left( Z\left( {\rm t}_i \right)- Z\left( {\rm t}_j \right)\right)}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} \left( {\rm z} \right)} $$

for some nonnegative values \(w_{i,j}\left( {\rm z} \right)\) satisfying that \(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\left( {\rm z} \right) >0. \) For instance, we can take \(w_{i,j} (z) =I_{\{t_{i}-t_{j}\approx z\}}\) or \(w_{i,j}\left( {\rm z} \right) =G \left( \frac{{\rm z}-\left( {\rm t}_i - {\rm t}_j \right)}{g} \right), \) to yield an empirical or a kernel estimator, respectively, where G is a d-variate kernel function and g is a bandwidth parameter. By assuming appropriate hypotheses, consistency of \(\hat{M}\) could be proved by using similar arguments as those applied in the kernel variogram estimation (García-Soidán 2007).
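A sketch of the kernel variant of \(\hat{M}\), with a product Epanechnikov kernel standing in for G (any d-variate kernel would do):

```python
import numpy as np

def M_hat(z, t, Z, g):
    """Kernel estimator of M(z): a weighted average of the pairwise
    differences Z(t_i) - Z(t_j), with a product Epanechnikov kernel G
    and bandwidth g."""
    u = (z - (t[:, None, :] - t[None, :, :])) / g
    w = np.prod(np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0), axis=-1)
    dZ = Z[:, None] - Z[None, :]
    return (w * dZ).sum() / w.sum()

rng = np.random.default_rng(0)
t = rng.random((20, 2))
Z = rng.normal(size=20)
m_est = M_hat(np.array([0.2, 0.1]), t, Z, 1.0)
```

For a symmetric kernel, swapping the roles of i and j shows that \(\hat{M}(-{\rm z})=-\hat{M}({\rm z}), \) consistent with the antisymmetry of M itself under (11).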

Remark 4

None of the proposed distribution estimators fulfills Kolmogorov’s symmetry condition, namely, that they remain invariant when the locations \({\rm s}_j\) and the corresponding thresholds \(x_j\) are subjected to the same permutation, for \(j=1, {\ldots}, k. \) Hence, a criterion must be established to assign an order to the spatial locations and, therefore, to the thresholds, prior to the implementation of the joint distribution estimator, so as to guarantee uniqueness of the result under permutation.

Among the different options, we propose ordering the sites by decreasing influence on the remainder, measured in terms of proximity, because of the underlying stationarity condition. With this idea, departing from the set \(\{{\rm s}_1, {\ldots}, {\rm s}_k\}, \) we will take \({\rm s}_j, \) for \(j=1,{\ldots},k-1, \) as the closest location to the center of mass (or the d-dimensional mean of the coordinates) of the sites \(\{{\rm s}_j,{\rm s}_{j+1},{\ldots},{\rm s}_k\}. \) To solve the problem of tied distances, preference can be given to the location with the smallest first coordinates. The thresholds \(x_j\) would also be reordered accordingly.
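The ordering criterion of this remark can be sketched as follows, with ties broken lexicographically on the coordinates:

```python
import numpy as np

def order_sites(s, x):
    """Reorder sites and thresholds as in Remark 4: at each step, pick from
    the remaining sites the one closest to their center of mass; ties are
    broken in favor of the lexicographically smallest coordinates."""
    s = np.asarray(s, dtype=float)
    remaining = list(range(len(s)))
    order = []
    while len(remaining) > 1:
        center = s[remaining].mean(axis=0)
        dist = np.linalg.norm(s[remaining] - center, axis=1)
        pick = min(range(len(remaining)),
                   key=lambda a: (dist[a], *s[remaining[a]]))
        order.append(remaining.pop(pick))
    order.append(remaining[0])
    return s[order], [x[i] for i in order]

s2, x2 = order_sites([[0., 0.], [1., 0.], [0., 1.], [10., 10.]], [1, 2, 3, 4])
```

In this example, (0, 1) and (1, 0) are tied at the first step and the tie-break selects (0, 1); the outlying site (10, 10) ends up last.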

3 Bootstrap approaches

The distribution estimation approaches, introduced in Sect. 2, can be used to propose Bootstrap methods for spatial data, so that for a given set of selected sites, \(\{{\rm s}_1, {\ldots}, {\rm s}_k\}, \) a Bootstrap sample \(\{Z^{\ast} ( {\rm s}_1 ), {\ldots} , Z^{\ast}( {\rm s}_k )\}\) can be obtained.

The direct mechanism derived from the discrete estimator \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(1)}\) entails producing each replicate by drawing from a random variable which takes values \((X ( {\rm t}_{i_1} ), {\ldots}, X( {\rm t}_{i_k} ))\) with probabilities \(p_{i_1,{\ldots},i_k}^{(1)}, \) for each \(i_j=1,{\ldots},n\) and \(j=1,{\ldots},k, \) with \(X ( {\rm t}_{i_j} )\) as given in (5). Nevertheless, implementation of this approach, as mentioned in Sect. 2, can have a strong computational cost for large k.

Then, for construction of the replicates, we can instead take estimator \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}, \) although there are again \(n^k\) probabilities \(p_{i_1,{\ldots},i_k}^{(2)}\) to be considered in order to obtain a Bootstrap sample. However, we suggest an alternative option based on proceeding in a sequential way, which is less computationally demanding. The resampling scheme would be performed in the manner described below:

  1. Reorder the locations to take \({\rm s}_j\) as the closest location to the center of mass of \(\{{\rm s}_j,{\rm s}_{j+1},{\ldots},{\rm s}_k\}, \) for \(j=1,{\ldots},k-1. \)

  2. Select the bandwidth \(h_1\) as the Euclidean distance from \({\rm s}_1-{\rm s}_2\) to the m-nearest difference \({\rm t}_{i_1}-{\rm t}_{i_2},\) for some m.

  3. Obtain \((Z^{\ast} ( {\rm s}_1 ), Z^{\ast}( {\rm s}_2 ))\) by drawing from a random variable that assigns the probabilities:

     $$ \frac{p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1}}{\sum_{i_1=1}^{n} \sum_{i_2=1}^{n} p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1}} $$

     to the respective pairs \((X({\rm t}_{i_1}),X({\rm t}_{i_2})). \) Proceeding in this way, a pair of values \((X({\rm t}_{i_1}),X({\rm t}_{i_2}))\) (and, therefore, two indices \(i_1\) and \(i_2\)) is selected in this step.

  4. For j = 3, consider the index \(i_{j-1}\) derived previously and take \(h_{j-1}\) to be the Euclidean distance from \({\rm s}_{j-1}-{\rm s}_j\) to the m-nearest difference \({\rm t}_{i_{j-1}}-{\rm t}_{i_{j}}.\)

  5. For j = 3, consider indices \(i_1\) and \(i_{j-1}\) to obtain \(Z^{\ast} ( {\rm s}_j )\) by resampling from the random variable which takes values \(X({\rm t}_{i_{j}}), \) with respective probabilities:

     $$ \frac{p_{{\rm s}_{j-1},{\rm s}_j}^{{\rm t}_{i_{j-1}},{\rm t}_{i_j},h_{j-1}}}{\sum_{i_j=1}^{n} p_{{\rm s}_{j-1},{\rm s}_j}^{{\rm t}_{i_{j-1}},{\rm t}_{i_j},h_{j-1}}} $$

     An index \(i_j\) is chosen in this step.

  6. Repeat steps 4 and 5 for all j > 3.

Validity of the preceding procedure follows straightforwardly from the fact that the resulting sample satisfies:

$$ {\cal P}^{\ast} \left(Z^{\ast} ( {\rm s}_1 )=X({\rm t}_{i_1}), {\ldots}, Z^{\ast}( {\rm s}_k )=X({\rm t}_{i_k}) \right)= {\cal P}^{\ast} \left( Z^{\ast} ( {\rm s}_1 )=X({\rm t}_{i_1}), Z^{\ast} ( {\rm s}_2 )=X({\rm t}_{i_2}) \right)\cdot \prod\limits_{j=3}^{k} {\cal P}^{\ast} \left( Z^{\ast} ( {\rm s}_j )=X({\rm t}_{i_j}) \left/ Z^{\ast} ({\rm s}_{j'} )=X({\rm t}_{i_{j'}}), j'<j \right. \right) =\frac{ p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1} }{\sum\limits_{i_1=1}^{n} \sum\limits_{i_2=1}^{n} p_{{\rm s}_1,{\rm s}_2}^{{\rm t}_{i_1},{\rm t}_{i_2},h_1}} \prod\limits_{j=3}^{k}\frac{ p_{{\rm s}_{j-1},{\rm s}_j}^{{\rm t}_{i_{j-1}},{\rm t}_{i_j},h_{j-1}} }{\sum\limits_{i_j=1}^{n} p_{{\rm s}_{j-1},{\rm s}_j}^{{\rm t}_{i_{j-1}},{\rm t}_{i_j},h_{j-1}}}=p_{i_1,{\ldots},i_k}^{(2)}$$

on account of the multiplication rule of probability, where \({\cal P}^{\ast}\) denotes the probability, conditional on the sample \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}. \) Then, proceeding as just indicated, a Bootstrap sample from \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) can be generated.
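Steps 1–6 can be sketched as follows (the initial reordering is assumed already done); the product Epanechnikov kernel for K, the constant trend (so X = Z) and the slight inflation of the m-nearest-difference bandwidths — so that the nearest lags receive strictly positive weight — are implementation choices of this sketch:

```python
import numpy as np

def epan(u):
    """Product Epanechnikov kernel on R^d (product over the last axis)."""
    return np.prod(np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0), axis=-1)

def nearest_h(lag, cand, m):
    """m-nearest-difference bandwidth, slightly inflated so that the m
    closest candidate lags receive strictly positive kernel weight."""
    d = np.sort(np.linalg.norm(cand - lag, axis=-1).ravel())
    return d[m - 1] * (1.0 + 1e-9)

def sequential_bootstrap(s, t, Z, m, rng):
    """One replicate {Z*(s_1),...,Z*(s_k)} drawn sequentially from F^(2)
    (constant trend assumed, so X = Z; sites s already reordered)."""
    n, k = len(t), len(s)
    # steps 2-3: draw the pair (i_1, i_2) jointly
    lag = s[0] - s[1]
    diffs = t[:, None, :] - t[None, :, :]        # all differences t_i - t_l
    w = epan((lag - diffs) / nearest_h(lag, diffs, m))
    i1, i2 = np.unravel_index(rng.choice(n * n, p=(w / w.sum()).ravel()), (n, n))
    idx = [i1, i2]
    # steps 4-6: draw i_j given i_{j-1}, for j = 3,...,k
    for j in range(2, k):
        lag = s[j - 1] - s[j]
        cand = t[idx[-1]] - t                    # differences t_{i_{j-1}} - t_i
        w = epan((lag - cand) / nearest_h(lag, cand, m))
        idx.append(rng.choice(n, p=w / w.sum()))
    return Z[np.array(idx)]

rng = np.random.default_rng(1)
t = rng.random((30, 2))
Z = rng.normal(size=30)
rep = sequential_bootstrap(t[:5].copy(), t, Z, 5, rng)
```

Each conditional draw only involves n candidate indices, so the whole replicate costs on the order of k·n kernel evaluations after the initial pair, instead of the n^k probabilities of the direct scheme.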

On the other hand, if the aim is that of drawing replicates from the continuous distribution estimator \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}, \) we should additionally extract a random sample of size k from the density L, denoted by \(\{V_1, {\ldots},V_k\}\) and independent of \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}. \) Then, the continuous version of the replicates for the random process Z, at locations \(\{{\rm s}_1, {\ldots}, {\rm s}_k\}, \) would be constructed as:

$$ \{X ( {\rm t}_{i_1} )+h V_1, {\ldots}, X ( {\rm t}_{i_k} )+h V_k\} $$

To justify the generation of Bootstrap samples as described, for the continuous estimator, bear in mind that:

$$ {\cal P}^{\ast} \left( X ( {\rm t}_{i_1} ) +h V_1 \leq x_1, {\ldots}, X ( {\rm t}_{i_k} )+h V_k \leq x_k \right) = {\cal P}^{\ast} \left( V_1 \leq \frac{x_1 - X( {\rm t}_{i_1} )}{h}, {\ldots}, V_k \leq \frac{x_k -X( {\rm t}_{i_k} )}{h} \right) ={\rm E}^{\ast} \left[ {\cal L} \left( \frac{x_1 - X( {\rm t}_{i_1} )}{h} \right) {\ldots} {\cal L} \left( \frac{x_k -X( {\rm t}_{i_k} )}{h} \right) \right]= \sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} {\cal P}^{\ast} \left(Z^{\ast} ( {\rm s}_1 )=X ( {\rm t}_{i_1} ), {\ldots}, Z^{\ast}( {\rm s}_k )=X ( {\rm t}_{i_k} ) \right) \cdot{\cal L} \left( \frac{x_1 -X ( {\rm t}_{i_1} )}{h} \right) {\ldots} {\cal L} \left( \frac{x_k -X ( {\rm t}_{i_k} )}{h} \right) =\sum\limits_{i_1=1}^{n} {\ldots} \sum\limits_{i_k=1}^{n} p_{i_1,{\ldots},i_k}^{(2)} {\cal L} \left( \frac{x_1 -X ( {\rm t}_{i_1} )}{h} \right) {\ldots} {\cal L} \left( \frac{x_k -X ( {\rm t}_{i_k} )}{h} \right)=\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k} \left( x_1,{\ldots}, x_k \right)$$

by the conditions required from the variables V j , where \({\rm E}^{\ast}\) denotes the expectation, conditional on the sample \(\{ Z( {\rm t}_1 ), {\ldots} , Z( {\rm t}_n )\}. \)
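Generating the continuous replicates then amounts to perturbing a discrete replicate, as sketched below with L taken as the standard normal density:

```python
import numpy as np

def smooth_replicate(rep, h, rng):
    """Continuous replicate: add h*V_j, with V_j drawn from the density L
    (here the standard normal), independently of the sample."""
    return rep + h * rng.normal(size=len(rep))

rng = np.random.default_rng(0)
rep = np.array([1.0, 2.0, 3.0])      # a discrete replicate {X(t_{i_j})}
out = smooth_replicate(rep, 0.5, rng)
```

Setting h = 0 recovers the discrete replicate, in agreement with (9) degenerating to the indicator-based estimator as the bandwidth vanishes.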

The Bootstrap approaches can be used to approximate unknown parameters, estimate standard errors, make inference on the correlation structure or on the distribution function of the random process. Suppose, for instance, that \(T=T\left( Z( {\rm s}_1 ), {\ldots} , Z( {\rm s}_k ) \right)\) is an estimator of interest, dependent on the data and on the underlying distribution \(F_{{\rm s}_1 , {\ldots} , {\rm s}_k}. \) Denote by \(T^{\ast}\) its Bootstrap counterpart, namely, \(T^{\ast}=T\left( Z^{\ast}( {\rm s}_1 ), {\ldots} , Z^{\ast}( {\rm s}_k ) \right), \) for a Bootstrap sample \(\{Z^{\ast } ( {\rm s}_1 ), {\ldots} , Z^{\ast}( {\rm s}_k )\}\) obtained by either of the resampling methods proposed. Then, the unknown characteristic of T, depending on \(F_{{\rm s}_1 , {\ldots} , {\rm s}_k}, \) can be approximated by that of \(T^{\ast}, \) under the distribution estimator selected. For the latter aim in practice, we can compute the corresponding sample characteristic of B values \(T^{\ast (b)}=T\left( Z^{\ast (b)}( {\rm s}_1 ), {\ldots} , Z^{\ast (b)}( {\rm s}_k ) \right), \) derived for B replicates \(\{Z^{\ast (b)} ( {\rm s}_1 ), {\ldots} , Z^{\ast (b)}( {\rm s}_k )\}, \) for \(b=1,{\ldots},B\) and large B.
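The approximation of a characteristic of T through B replicates can be sketched generically; the iid resampler below is only a stand-in for the spatial schemes above, used to keep the example self-contained:

```python
import numpy as np

def bootstrap_characteristic(T, sampler, B, characteristic=np.var):
    """Approximate a characteristic of the estimator T (by default, its
    variance) from B Bootstrap replicates produced by `sampler`."""
    values = np.array([T(sampler()) for _ in range(B)])
    return characteristic(values)

# Plain iid resampling, standing in for the spatial resampling schemes
rng = np.random.default_rng(0)
data = rng.normal(size=100)
sampler = lambda: rng.choice(data, size=data.size, replace=True)
se_mean = np.sqrt(bootstrap_characteristic(np.mean, sampler, B=500))
```

For the sample mean of 100 standard normal observations, the Bootstrap standard error should come out near 0.1, the usual \(s/\sqrt{n}\) value.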

4 Application examples

We now describe some examples of the practical usefulness of the methodology proposed in this manuscript to generate Bootstrap replicates from \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}\) (or its continuous counterpart \(\tilde{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}\)), which does not require evaluating the full multivariate distribution estimator. Firstly, these methods are applied to simulated data and, then, an example with a real data set of air quality indicators is presented.

4.1 Numerical studies with simulated data

In order to analyze the performance of the Bootstrap approaches suggested in Sect. 3, we carried out several numerical studies with data simulated on the unit square \(D=[0,1] \times [0,1] \subset {I\!R}^2. \) A complete spatial randomness design was assumed, so the sample locations were uniformly distributed on D. With the spatial locations \({\rm t}_i, \) obtained for \(i= 1,{\ldots}, n\) and n = 50, stationary Gaussian data \(Z({\rm t}_i)\) were generated, by assuming zero mean and by selecting a valid variogram model to specify the spatial dependence, as follows:

$$ Z({\rm s})=\mu({\rm s})+Y({\rm s}),\quad {\rm with}\, \mu({\rm s})=0 \quad {\rm and}\ Y({\rm s}) \sim {\rm SGP}(0,\sigma^2,\rho(.;0.2)) $$

In particular, we considered the isotropic exponential and spherical variograms, with a partial sill \(\sigma^2\) equaling 1 or 2.25 (or asymptotic partial sill, for the exponential model), a range \(\phi = 0.2\) (or practical range, for the exponential model) and a null nugget effect or a nugget effect \(\tau^2 = 0.09. \)
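For reference, data of this kind can be simulated by a Cholesky factorization of the covariance matrix; the sketch below assumes the practical-range convention \(C(d)=\sigma^2 \exp(-3d/\phi)\) for the exponential model, with the nugget added on the diagonal:

```python
import numpy as np

def simulate_gaussian_field(t, sigma2, phi, tau2, rng):
    """Zero-mean Gaussian data with isotropic exponential covariance
    C(d) = sigma2*exp(-3d/phi) (phi = practical range, an assumption of
    this sketch) plus a nugget tau2 on the diagonal."""
    d = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=-1)
    C = sigma2 * np.exp(-3.0 * d / phi) + tau2 * np.eye(len(t))
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(t)))  # jitter for stability
    return L @ rng.normal(size=len(t))

rng = np.random.default_rng(0)
t = rng.random((50, 2))    # complete spatial randomness on the unit square
Z = simulate_gaussian_field(t, sigma2=1.0, phi=0.2, tau2=0.09, rng=rng)
```

With these parameters, each Z(t_i) has variance \(\sigma^2+\tau^2=1.09, \) matching the setting of the first numerical study.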

For the implementation of the resampling algorithm given in the preceding section, we selected k = 15 sites, \(\{{\rm s}_1,{\ldots},{\rm s}_k\}, \) among the set of the sample locations \(\{{\rm t}_1,{\ldots},{\rm t}_n\}, \) avoiding those points too close to the boundaries of the observation region. To generate a Bootstrap sample on these k locations, one needs to derive the weights \(p^{(2)}_{i_1,{\ldots},i_k}, \) as explained in steps 3 and 5 of the algorithm described in Sect. 3. With this purpose, we took K as the Epanechnikov kernel and the bandwidths \(h_1,{\ldots},h_{k-1}, \) based on a balloon estimator, were computed by considering the m-nearest differences in the kernel function and by guaranteeing that 15 %, for \(h_1, \) and 30 %, otherwise, of all distances were used. Given the probabilities \(p^{(2)}_{i_1,{\ldots},i_k}, \) the indices \(i_j\) were then chosen by a classic accept–reject method. In this respect, note that the stochastic processes \(X(\cdot )\) and \(Z (\cdot )\) are the same when the trend function is constant, as pointed out in Remark 2.

The smoother Bootstrap version was obtained by applying the continuous distribution estimator \(\tilde{F}_{{\rm s}_1,{\ldots},{\rm s}_k}\) in (9), where the function \({\cal L}\) was chosen as the standard normal distribution. The corresponding optimal bandwidth was selected by cross-validation, as explained in Sect. 2, among a reasonable set of bandwidth candidates. Then, Bootstrap replicates were generated for the simulated data, under the aforementioned conditions, aiming to analyze the performance of the proposed approaches for the following issues:

  1. (1)

    The estimation of the variance of the spatial process, as a common parameter for the overall process, \({\rm Var}\left[Z({\rm s})\right]={\rm E}\left[Z({\rm s})^2\right]-{\rm E}\left[Z({\rm s})\right]^2. \)

  2. The comparison of the discrete and continuous estimators of the univariate distribution, denoted by \(\hat{F}_{{\rm s}}\) and \(\tilde{F}_{{\rm s}}, \) as given in (8) and (10), respectively.

  3. The approximation of the variogram, as a function modeling the spatial dependence, \(\gamma({\rm t})=0.5{\rm Var}\left[Z({\rm s})-Z({\rm s}+{\rm t})\right]=0.5\cdot{\rm E}\left[\left(Z({\rm s})-Z({\rm s}+{\rm t})\right)^2\right]. \)

The first numerical study was designed to compare the discrete and continuous resampling methods for approximation of the variance of the spatial process Z(s). Proceeding as described above, the theoretical expectations involved were approximated through a sample average obtained from B = 500 Bootstrap replicates and 200 samples. The data were simulated under the exponential model, with σ2 equal to 1 or 2.25, ϕ = 0.2 and τ2 = 0.09, so the theoretical variance, whose true value is Var[Z(s)] = σ2 + τ2, amounts to 1.09 or 2.34. The resulting absolute errors associated with the estimation of the variance of the spatial process are represented in Fig. 1.
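The absolute-error computation behind this study can be sketched as follows, assuming each Bootstrap replicate is stored as a row of k resampled values; the helper name and the data layout are our assumptions.

```python
import numpy as np

def variance_abs_error(replicates, true_var):
    """Plug-in variance Var[Z] = E[Z^2] - (E[Z])^2 computed on each
    Bootstrap replicate (one row per replicate), averaged over the B
    replicates, and compared with the theoretical variance."""
    reps = np.asarray(replicates, dtype=float)          # shape (B, k)
    var_b = (reps ** 2).mean(axis=1) - reps.mean(axis=1) ** 2
    return abs(var_b.mean() - true_var)
```

In the simulations, `true_var` would be σ2 + τ2, i.e. 1.09 or 2.34 depending on the setting.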

Fig. 1

Estimated absolute errors of Var[Z(s)] for 500 discrete and 500 continuous Bootstrap replicates of 200 samples. The data were simulated with the exponential model and each sample size is 50. The true values of Var[Z(s)] are 1.09 and 2.34 in the left and right panels, respectively

The small values displayed in Fig. 1 show good behavior of both Bootstrap approaches, although with some advantage for the smooth version, despite the fact that an additional bandwidth must be estimated. Furthermore, one should bear in mind that the underlying continuous distribution calls for a continuous tool for inference and, in particular, it avoids obtaining repeated values.

Aiming to proceed with a deeper analysis comparing the proposed Bootstrap approaches, a new simulation study was carried out, focused on the estimation of the univariate distribution Fs. Observe that Fs(x) = F(x), for all s, since the trend function has been taken to equal 0 at all locations. Five thresholds x were selected, identifying the quantiles 5, 25, 50, 75 and 95 % as representative of the distribution domain, respectively denoted by Pi, with i = 5, 25, 50, 75, 95. For each x, we derived the discrete and continuous estimators from the sample locations \(\{{\rm t}_1,{\ldots},{\rm t}_n\}, \) given by:

$$ \begin{aligned}\hat{F} (x)&=\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n}\frac{K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right) I_{\{Z({\rm t}_{i}) \leq x \}} I_{\{ Z({\rm t}_{j}) \leq x\}}}{\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right)} \\ \tilde{F} (x) &= \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right) {\cal L} \left( \frac{x - Z({\rm t}_i)}{h} \right) {\cal L} \left( \frac{x - Z({\rm t}_j)}{h} \right)}{\sum_{i=1}^{n} \sum_{j=1}^{n} K\left( \frac{{\rm t}_{j}-{\rm t}_{i}}{h_1}\right)} \end{aligned}$$
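These two estimators can be transcribed directly, assuming the Epanechnikov kernel K is applied to inter-site distances and \({\cal L}\) is the standard normal distribution; the helper names and the reduction of the vector difference \({\rm t}_j-{\rm t}_i\) to a distance are our assumptions.

```python
import numpy as np
from math import erf

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def norm_cdf(x):
    # Standard normal distribution function, used for L.
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(x, dtype=float) / np.sqrt(2.0)))

def distribution_estimators(locs, z, x, h1, h):
    """Discrete (F_hat) and continuous (F_tilde) estimators of F(x).
    locs: (n, 2) sample locations; z: (n,) observed values;
    h1: spatial bandwidth; h: smoothing bandwidth of L."""
    locs, z = np.asarray(locs, float), np.asarray(z, float)
    dist = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    K = epanechnikov(dist / h1)
    denom = K.sum()
    ind = (z <= x).astype(float)
    F_hat = (K * ind[:, None] * ind[None, :]).sum() / denom    # indicators
    L = norm_cdf((x - z) / h)
    F_tilde = (K * L[:, None] * L[None, :]).sum() / denom      # smoothed
    return float(F_hat), float(F_tilde)
```

Both estimators share the same spatial kernel weights; only the treatment of the observed values (indicator versus smoothed distribution function) differs, which is the contrast explored in the study below.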

The discrete and the continuous Bootstrap methods were used to approximate the errors of both estimators. With this aim, k locations si were chosen, where B discrete and continuous replicates were generated to derive the Bootstrap analogues of \(\hat{F} (x)\) and \(\tilde{F}(x), \) denoted by \(\hat{F}^{\ast b} (x)\) and \(\tilde{F}^{\ast b} (x)\) for the b-th replicate and \(b=1,{\ldots},B. \) Then, we computed \(( B^{-1} \sum_{b=1}^{B} ( \hat{F}^{\ast b} (x) - \hat{F} (x) )^2 )^{1/2}\) and \((B^{-1} \sum_{b=1}^{B} ( \tilde{F}^{\ast b} (x) - \tilde{F} (x) )^2 )^{1/2}, \) which provide approximations of the Bootstrap standard errors of the discrete and continuous distribution estimators, respectively. Table 1 summarizes the results obtained from 500 Bootstrap replicates and 200 data sets simulated under the exponential model, with σ2 = 1, ϕ = 0.2 and τ2 = 0.09.
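The standard-error approximation just described is simply the root mean squared deviation of the replicate estimates around the original estimate; a minimal sketch:

```python
import numpy as np

def bootstrap_se(replicate_estimates, point_estimate):
    """Bootstrap standard error: square root of the mean squared deviation
    of the B replicate estimates F*b(x) around the original F(x)."""
    reps = np.asarray(replicate_estimates, dtype=float)
    return float(np.sqrt(np.mean((reps - point_estimate) ** 2)))
```

The same function applies to both estimators: pass the B values of \(\hat{F}^{\ast b}(x)\) with \(\hat{F}(x)\), or the B values of \(\tilde{F}^{\ast b}(x)\) with \(\tilde{F}(x)\).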

Table 1 Mean and standard deviation of the standard errors (SE) obtained for the discrete and continuous distribution estimators, for 500 discrete and 500 continuous Bootstrap replicates of 200 samples

The small values presented in Table 1 make clear the good performance of the two Bootstrap approaches. In general terms, it seems advantageous to adopt the continuous estimator \(\tilde{F}\) over the discrete estimator \(\hat{F}, \) regardless of the resampling method considered. On the other hand, both Bootstrap procedures provide similar estimates of the accuracy of \(\tilde{F}, \) while the difference is more evident for approximation of the distribution through \(\hat{F}. \)

Taking into account the foregoing results, the smoother Bootstrap version will be applied in the following studies. We now compare the estimation of the total variance of Z(s) with another spatial approach, namely parametric estimation. To this end, parametric variograms were obtained by selecting valid models and deriving maximum likelihood (ML) estimates to approximate the unknown parameters. The exponential and the spherical models were used for the latter purpose, thus providing two different settings, depending on whether the parametric candidate coincides with the theoretical model or is affected by misspecification. The resulting values are shown in Table 2, where data were simulated under the exponential and the spherical models, with σ2 = 1, ϕ = 0.2 and τ2 = 0.

Table 2 Mean and standard deviation of the absolute errors (AE) associated with the estimation of Var[Z(s)], whose true value is 1, for 500 Bootstrap replicates and the ML estimates

According to the values presented in Table 2, the Bootstrap replicates offer more accurate estimates of Var[Z(s)] than the ML approaches, even when knowledge of the true variogram model is assumed. Surprisingly, prior knowledge of the parametric family is not always advantageous, as illustrated by the case presented in Fig. 2. This can be explained by the fact that, under the parametric approach, the approximation derived for the total variance depends strongly on the appropriate characterization of the model parameters rather than on the overall variogram function.

Fig. 2

The left panel represents one simulated data set with n = 50 sample locations, where each bullet size is proportional to the corresponding measured value. The selected k = 15 locations in the left panel are displayed after being reordered. The right panel represents the variogram estimators, where the theoretical model chosen to generate the data is the exponential one. The bullets in the right panel represent the Bootstrap variogram estimates obtained for 500 replicates

In view of the latter, a further step in this research was to analyze the behavior of the resampling methods when dealing with the estimation of the variogram function γ. To this end, the integrated squared error, ISE \(=\int ( \gamma({\rm t})-\hat{\gamma}({\rm t}) )^2 d{\rm t}, \) was approximated numerically, for each data set and for each of the estimators \(\hat{\gamma}\) implemented, including a valid version obtained through the Bootstrap approach. We started by deriving the empirical Bootstrap estimates of the variogram and then fitting them, through an iterated weighted least squares criterion, to a class of permissible variograms, following the procedure developed in Shapiro and Botha (1991). Proceeding in this way, the validity of the resulting estimator was guaranteed with no prior specification of a parametric model. Figure 2 shows an example of a data set simulated with an exponential model together with the resulting variogram estimators, one given by the Bootstrap approach and the other two obtained by ML.
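The numerical approximation of the ISE can be sketched as a simple quadrature over a grid of lags; the trapezoidal rule below is our own choice, since the paper does not specify the quadrature used.

```python
import numpy as np

def integrated_squared_error(gamma_true, gamma_hat, lags):
    """Approximate ISE = int (gamma(t) - gamma_hat(t))^2 dt over the
    supplied grid of lags, using the trapezoidal rule."""
    lags = np.asarray(lags, dtype=float)
    diff2 = (gamma_true(lags) - gamma_hat(lags)) ** 2
    # trapezoidal rule: average of adjacent ordinates times lag spacing
    return float(np.sum((diff2[1:] + diff2[:-1]) / 2.0 * np.diff(lags)))
```

Here `gamma_true` would be the model used to simulate the data and `gamma_hat` any of the fitted estimators (Bootstrap-based or ML).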

The procedure described above was repeated for 200 samples of size n = 50, for data simulated under the exponential and spherical models, with σ2 = 1, ϕ = 0.2 and τ2 = 0. The ISE values were computed and Table 3 summarizes the results obtained.

Table 3 Mean and standard deviation of the ISE values obtained through the Bootstrap estimator combined with Shapiro and Botha’s method and the ML estimators

The results displayed in Table 3 demonstrate the good performance of the Bootstrap approach when used to estimate the spatial dependence of the random process in terms of the variogram. The ML estimates improve on the resampling procedure for small distances, although the Bootstrap method competes with the parametric approach for lags larger than 0.05.

4.2 Application to environmental monitoring data

In this section we present an application of the Bootstrap methodology to a real data set concerning biomonitoring of arsenic pollution in the Central Region of Portugal, classified as NUTS II (NUTS stands for “Nomenclature of Units for Territorial Statistics”). The measured variable represents the concentrations in moss samples, in micrograms per gram of dry weight. The typical procedure, an alternative to the more expensive solution of determining the amount of pollutant directly, is to plant the moss and collect it some time later, which allows the concentration of arsenic (and other heavy metals) to be measured. More details on this Portuguese project of air pollution analysis can be found in Martins et al. (2012).

The data set was collected in 2006 and can be represented by \(\{({\rm t}_i, Z({\rm t}_i)), i=1,{\ldots}, n\}, \) with n = 98 and Z(ti) identifying the log-transformed concentration of arsenic (As) at location ti. We adopted the log-transformation to reduce the impact of outliers. Afterwards, three gross outliers remained, which were replaced by the average of the remaining values from that year's survey. Table 4 gives the summary statistics for the resulting data, showing that the log-transformation leads to a more symmetric distribution. Furthermore, Fig. 3 presents the spatial representation of the log-transformed data, where each bullet size is proportional to the corresponding measured value.

Table 4 Summary statistics for arsenic pollution levels measured in the Central Region of Portugal (NUTS II)
Fig. 3

Spatial representation of moss data in the Central Region of Portugal (NUTS II). The size of the bullets, representing the sampled locations, is proportional to the measured value. The points marked with numbers are used to generate the Bootstrap replicates. The points identifying the three locations {sA, sB, sC} are the targets of prediction for Z(·)

Aiming to exemplify the usefulness of the proposed Bootstrap approaches in this application, we first estimate a deterministic model for E[Z(s)] = μ(s) with \({\rm s}=({\rm UtmX}, {\rm UtmY}) \in D \subset {I\!R}^2\) and D identifying the region of NUTS II in Portugal. We then assume that the random process Z(s) can be modeled as:

$$ Z({\rm s})=\mu({\rm s})+Y({\rm s}) $$

where \(\{Y({\rm s}): {\rm s} \in D\}\) is a zero-mean strictly stationary random process and μ(s) = α0 + α1 × UtmY. The regression coefficients were estimated, yielding statistically significant values α0 = −37.8 and α1 = 0.0083.
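A sketch of the trend fit, under the assumption that the coefficients were obtained by an ordinary least-squares regression (the fitting method is not stated in the text, so this is one plausible choice; the function name is ours):

```python
import numpy as np

def fit_linear_trend(utm_y, z):
    """Least-squares fit of mu(s) = alpha0 + alpha1 * UtmY;
    returns the pair (alpha0, alpha1)."""
    utm_y = np.asarray(utm_y, dtype=float)
    X = np.column_stack([np.ones_like(utm_y), utm_y])   # design matrix
    coef, *_ = np.linalg.lstsq(X, np.asarray(z, dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])
```

The residuals z − (α0 + α1 × UtmY) would then play the role of the observations of the zero-mean process Y(s).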

To derive the weights \(p^{(2)}_{i_1,{\ldots},i_k}, \) associated with the multivariate distribution \(\hat{F}_{{\rm s}_1,{\ldots}, {\rm s}_k}^{(2)}, \) one needs to select locations \({\rm s}_1,{\ldots},{\rm s}_k \in D. \) So, we proceeded by choosing si, for \(i=1,{\ldots},k\) and k = 40, among the 98 sample locations, as represented in Fig. 3. The probabilities \(p^{(2)}_{i_1,{\ldots},i_k}\) were obtained as explained in steps 3 and 5 of the resampling algorithm given in Sect. 3, which allowed us to generate Bootstrap samples that take into account the dependence structure of the underlying random process.

With 500 replicates, we estimated the total variance of the process \(Y(\cdot ), \) whose results are given in Table 5, where three different approaches were considered. According to the numerical studies presented in Sect. 4.1, the value 0.886 seems an accurate approximation of Var[Y(s)].

Table 5 Estimates of the variance of the log-transformed arsenic pollution level, by considering the discrete and continuous versions of the Bootstrap replicates and assuming an exponential covariance model fitted to the observed data through ML

To highlight the potential of the Bootstrap techniques within the scope of dependent data, one of their applications, pointed out in Sect. 1, will be addressed. In particular, we will focus on the estimation of the accuracy of a spatial approach, such as the nonparametric spatial predictor proposed in Menezes et al. (2010), under a stochastic sampling design. The use of this kernel-based predictor requires an optimal bandwidth, which can be defined as dependent on the target location, offering better results than when a global optimal bandwidth is adopted. A drawback of the aforementioned predictor is that no estimate of the prediction error is available.

Here, we suggest estimating standard errors through the Bootstrap approaches specified throughout this manuscript. In fact, if a prediction value is obtained for each Bootstrap sample (for a total of B replicates), then a simple method to approximate the unknown standard error is the standard deviation of those prediction values. Results are summarized in Table 6, for three points randomly chosen in the NUTS II region, represented as sA, sB and sC in Fig. 3. The nonparametric predictions were computed as established in Menezes et al. (2010), using optimal local bandwidths equal to 75.5, 36.9 and 90, for sA, sB and sC, respectively.

Table 6 Estimates of the prediction error of the log-transformed arsenic pollution level at locations sA, sB and sC. For each nonparametric prediction (NP Pred), two different estimates of the standard errors (SE) are presented, by considering the discrete and continuous versions of the Bootstrap replicates. Results from OK are given in the two right columns

For each nonparametric prediction, two different estimates of the standard errors are presented, obtained by considering either the discrete or the continuous version of the Bootstrap replicates. As complementary information, we also present the prediction results derived by application of ordinary kriging (OK) in Table 6. These allow us to conclude that the Bootstrap estimators yield smaller values for the standard errors than those obtained through OK. The largest standard errors are associated with location sB, where less information is available, since that area includes fewer sampled data.

As a last note, we have back-transformed the predicted values and added the trend information to obtain estimates of the process Z(s) at the three target locations, leading to \(\hat{Z}({\rm s}_A)=1.153, \,\hat{Z}({\rm s}_B)=0.878\) and \(\hat{Z}({\rm s}_C)=0.354\) (under OK, the corresponding values were 1.187, 0.818 and 0.345). Knowing that 50 % of the locations have an arsenic concentration smaller than 0.57, we can conclude that sC is one of the locations with lower intensity of air pollution, as opposed to sA and sB.

5 Conclusions

In this paper consistent estimators of the multivariate distribution function have been proposed, which can serve as the basis for implementing Bootstrap approaches in the spatial setting. The resampling method derived from the discrete estimator is an adaptation to this setting of the naive Bootstrap described in Efron (1979) and has similar properties, such as consistency or the fact that nearly every sample derived from it contains repeated values. The alternative version, obtained in the current work by applying a continuous distribution estimator, is the analogue of the smoothed Bootstrap approach for independent data (Lejeune and Sarda 1992). An advantage of the second approach is that it entails resampling from a continuous distribution and, therefore, avoids the aforementioned problem of providing repeated data in the replicates. However, an additional uncertainty is introduced, in terms of the bandwidth parameter that must be estimated. For independent data, the question of whether the smoothed Bootstrap is superior to the naive alternative, and for which smoothing parameter, has been analyzed by several authors (Silverman and Young 1987; Hall 1992), but no definitive answers exist. The difficulties in checking the behavior of the resampling approaches increase further in the spatial setting, because of the underlying dependence structure. In this respect, although further research should be developed, the numerical studies conducted in the current work indicate good performance of both procedures, with a slight superiority of the continuous spatial Bootstrap when the bandwidth parameter is appropriately selected.

On the other hand, as pointed out in the introduction, the Bootstrap methodology allows researchers to solve different statistical problems inherent to the estimation process. It should not be regarded as a substitute for other techniques designed to address specific issues, but as a complement that adds extra information. From this perspective, the Bootstrap proposals offer an attractive alternative for resampling in the spatial setting. These approaches aim at reproducing the data dependence structure before deriving subsamples. In this respect, the numerical studies developed with simulated data demonstrate the good behavior of the resampling methods, which can even be advantageous over other procedures for estimation of the variance or the variogram function, although their main value is that they help capture the main features of the underlying spatial process. Consequently, the generation of replicates offers an accurate alternative to derive estimates of the standard error or any other unknown characteristic of the statistical approach considered.