Introduction

Site-specific management (SSM) has been developed and applied to annual cropping systems since the late 1980s, and there is an abundance of literature about the impact of soil, microclimate and other landscape-scale influences on crop production. In a review of SSM and related technologies, Plant (2001) stated that “Three criteria that must be satisfied in order for SSM to be justified are, (1) that significant within-field spatial variability exists in factors that influence crop yield, (2) that, causes of this variability can be identified and measured, and (3) that, the information from these measurements can be used to modify crop management practices to increase profit or decrease environmental impact.” In relation to the first two criteria, much of the published work to date has focused on annual cropping systems with mechanical harvesting technology to provide spatial data on yield. The research on perennial crops generally lacks discussion about the effect of variation in the individual plant, on yield or on the delineation of management zones; the latter are based on soil properties and landscape scale factors.

The research that has been done does indicate that perennial crop yields vary spatially. Whelan and McBratney (2001) included a comprehensive review of the available literature (their Table 3), highlighting the spatial structure of the observed variability. The variogram analyses cited had ranges from 22 m (for soil moisture) to 180 m (for soil P). Pozdnyakova et al. (2005) measured and analyzed yields in cranberry bogs at three scales representing within- and between-field variability. The results of their variogram analysis indicated that at both spatial scales a spherical model provided a good fit with ranges of about 3.5 m within fields and 2300 m between fields. Zaman and Schumann (2006) showed that the normalized difference vegetation index (NDVI) and soil organic matter (SOM) could be used to delineate low, medium and high yielding management zones for variable-rate applications of soil amendments within a citrus grove. There was a strong soil influence on productivity as excessive Cu in the low SOM areas of the grove induced Fe deficiency, which was visible as chlorotic disorders of the foliage and stunted tree growth. They fitted spherical variogram models to NDVI, soil and leaf nutrient data with ranges of 234–255 m (the extent of the citrus grove was about 800 m × 800 m). The soil and plant variables recorded were significantly different in each of the delineated management zones. Similarly, Castrignanò et al. (2008) generated maps of the risk of soil salinization for a citrus growing area. The maps categorized the risk as high, medium and low salinity, based on soil properties and distance to the sea. Multi-collocated indicator cokriging and factorial kriging were used; the variogram models had ranges of 750–2000 m. Other recent research involving perennial crops certainly highlights the effect of spatial influences at landscape scales, for example in grapes (e.g. Lamb et al. 2004; Bramley and Hamilton 2004; Johnson et al. 2003) and citrus crops (Tumbo et al. 2002; Whitney et al. 2001a, b).

Tree fruit differs from most annual and some perennial crops in that most of the plants grown within an orchard block are cloned, suggesting that there should be minimal biological variability between them compared with annual crops. However, orchard management and site-specific soil and climate effects are superimposed on this homogeneity. Some management practices such as pruning, thinning and harvesting are done on a per tree basis, whereas other management practices, such as irrigation and fertilizer application are applied uniformly over larger areas, up to entire orchard blocks. Therefore we would expect to see both tree to tree variability, as well as landscape scale effects based on the soil, microclimate and management. Wang et al. (2006) recorded yield, cost and profit data by tree over a 0.87 ha block of 272 pear trees. They used data envelope analysis, a non-parametric linear modeling method used in economics, to determine the revenue capacity efficiency for each tree, and to identify the limiting inputs (soil properties, micro- and macro-nutrients, or other environmental effects). Much of the soil data was aggregated, and spatial differences were compared by dividing the field into 14 blocks. However, even with this coarse blocking factor, spatial effects were evident.

In this paper we used this same unique data set of individual tree yield and trunk size data as Wang et al. (2006), but also included higher spatial resolution soil data and remotely sensed data to address unanswered questions related to management zones for these tree fruit. We also applied spatial statistics that have not been used for precision agriculture, and that might be useful for justifying management zones. Specifically, we wanted to determine whether yield (per tree) and tree sizes were spatially correlated or unrelated over the field, and if correlated to determine the spatial structure. We used the autocorrelation statistic, Moran’s I (described later), which was applied both globally and locally. We also examined whether spatial stationarity (Brunsdon et al. 1998) holds, i.e., whether linear regression models described for the entire orchard block hold equally well for any subset of the block. This has important implications for management zones: if these relationships do not hold, at what spatial scales do the models differ? To address these questions, we investigated spatial stationarity by applying geographically weighted regression (described later).

Description of the orchard site

The research orchard used for this study is described in Wang et al. (2006). It is located near Hood River, Oregon, approximately longitude 121°41′ west and latitude 45°41′ north. Soil samples from the site indicate that the soil texture is a sandy loam with an average of 16 cmol kg−1 organic matter. The mean growing season (March–August) precipitation measured at the site weather station from 1971 to 2004 was 0.208 m (Oregon State University 2005); the tree fruit crops therefore depend on additional water from irrigation. The research orchard is a pear (Pyrus communis) rootstock trial of d’Anjou variety planted on four different rootstocks (Fig. 1). The orchard is approximately 170 m west to east and 60 m north to south in extent. The tree spacing is approximately 3 m along the rows, which run west to east, and 6 m between rows.

Fig. 1 Layout of research orchard with soil sampling locations

Tree and soil measurements

We focused on data that were recorded and related to individual trees. Trunk cross-sectional area (TCSA) was determined from circumference measurements made in 2003 just above the graft line of each tree. Yield data were collected in 2002 and 2003, expressed as total fruit picked (kg) per tree. Remote sensing was used to assess crop vigor during 2002 and 2003. Space Imaging IKONOS imagery (www.spaceimaging.com) was acquired in June, July and August of 2002, and Digital Globe Quickbird imagery (www.digitalglobe.com) was acquired in June of 2003. The imagery was converted to reflectance values based on within scene targets and the normalized difference vegetation index (NDVI; Rouse et al. 1974) was determined as:

$$ NDVI = \frac{{R_{\text{nIR}} - R_{\text{red}} }}{{R_{\text{nIR}} + R_{\text{red}} }}, $$
(1)

where \( R_{\text{nIR}} \) is the reflectance in the near infrared band and \( R_{\text{red}} \) is the reflectance in the red band of the imagery. The 0.6 m panchromatic band was registered to an orthorectified aerial image, and the tree crowns were delineated. The corresponding pixel for each tree was selected from the co-registered NDVI image. Soil samples were taken in 2005 and 2006 near a subset of the trees in the orchard block (Fig. 1). The soil was sampled at a depth of 0–0.015 m from 4 to 5 sites within a 1 m radius of each tree and mixed together to form a composite sample. Soil analysis of these samples was done by a commercial laboratory using standard techniques to determine total exchange capacity (TEC), pH, organic matter and crop nutrients. A subset of the soil samples was analyzed for soil texture (sand, silt and clay).
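As an illustration of Eq. 1, the following minimal sketch computes NDVI from per-tree reflectance values. The function name `ndvi` and the reflectance values are hypothetical and are not taken from the study; the handling of zero denominators is an added assumption.

```python
import numpy as np

def ndvi(r_nir, r_red):
    """Eq. 1 applied element-wise: (R_nIR - R_red) / (R_nIR + R_red)."""
    r_nir = np.asarray(r_nir, dtype=float)
    r_red = np.asarray(r_red, dtype=float)
    total = r_nir + r_red
    # Assumption: pixels where both bands are zero are returned as NaN
    # instead of raising a division-by-zero warning.
    out = np.full(total.shape, np.nan)
    np.divide(r_nir - r_red, total, out=out, where=total != 0)
    return out

# Hypothetical reflectance values for three tree-crown pixels (not study data).
example = ndvi([0.45, 0.50, 0.30], [0.05, 0.10, 0.30])
```

Dense, vigorous canopy reflects strongly in the near infrared and absorbs red light, so larger NDVI values are usually read as greater vigor.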

Statistical methods

For the analysis of spatial heterogeneity and stationarity, we used Moran’s I (Moran 1948), localized application of Moran’s I (Anselin 1995), geographically weighted regression (Fotheringham et al. 2002) and variogram analysis. We briefly introduce and describe our application of these statistical tools and then provide some explanation of our determination and relevance of spatial stationarity.

Moran’s I is an index of the degree of spatial autocorrelation, as described by Cliff and Ord (1981):

$$ I = \frac{{n\sum_{i = 1}^{n} {\sum_{j = 1}^{n} {{\mathbf{W}}_{ij} \left( {x_{i} - \bar{x}} \right)\left( {x_{j} - \bar{x}} \right)} } }}{{\left( {\sum_{i = 1}^{n} {\sum_{j = 1}^{n} {{\mathbf{W}}_{ij} } } } \right)\sum_{i = 1}^{n} {\left( {x_{i} - \bar{x}} \right)^{2} } }}. $$
(2)

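The global autocorrelation (I) is computed for the variable of interest (X) based on the variable mean (\( \bar{x} \)) and the contiguity matrix \( {\mathbf{W}}_{ij} \) for all n spatial units indexed by i and j. Under the null hypothesis of a random spatial distribution, the expected value of I (Eq. 3) and, assuming normality, the expected value of \( I^{2} \) (Eq. 4) give the variance \( \sigma^{2} \left( I \right) = {\text{E}}_{\text{N}} \left( {I^{2} } \right) - \left[ {{\text{E}}\left( I \right)} \right]^{2} \):

$$ {\text{E}}\left( I \right) = - \frac{1}{{\left( {n - 1} \right)}}, $$
(3)
$$ {\text{E}}_{\text{N}} \left( {I^{2} } \right) = \frac{{n^{2} S_{1} - nS_{2} + 3S_{0}^{2} }}{{S_{0}^{2} \left( {n^{2} - 1} \right)}}, $$
(4)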

where

$$ S_{0} = \sum\limits_{i = 1}^{n} {} \sum\limits_{j = 1}^{n} {{\mathbf{W}}_{ij} } , $$
(5)
$${S_{1} = \frac{1}{2}\sum\limits_{i = 1}^{n} {} \sum\limits_{j = 1}^{n} {\left( {\mathbf{W}}_{ij} + {\mathbf{W}}_{ji} \right)}^{2}} , $$
(6)

and

$$ S_{2} = \sum\limits_{i = 1}^{n} {\left( {\sum\limits_{j = 1}^{n} {{\mathbf{W}}_{ij} } + \sum\limits_{j = 1}^{n} {{\mathbf{W}}_{ji} } } \right)^{2}}. $$
(7)

Then the test statistic under a null hypothesis of complete spatial randomness is:

$$ Z = \frac{{I - {\text{E}}\left( I \right)}}{{\sqrt {\sigma^{2} \left( I \right)} }}. $$
(8)

Positive values of I indicate spatial dependence among values. If Z ≥ 1.96 and I ≈ 1.0 then X is strongly spatially structured at the one-sided α = 0.025 significance level; if Z ≤ −1.96 and I ≈ −1.0 then X is uniformly dispersed at α = 0.025; if −1.96 < Z < 1.96 and I ≈ 0 then X is spatially uncorrelated.
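The calculation of the global statistic and its Z score can be sketched as follows. This is a minimal illustration that uses the standard Cliff and Ord form of Moran’s I, with the expected value −1/(n − 1) and the variance under the normality assumption; it is not the ArcGIS implementation used in the study. The function name `morans_i` and the eight-tree transect are hypothetical.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I with its expectation, variance (normality
    assumption) and Z score, for values x and contiguity matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    z = x - x.mean()                      # deviations from the mean
    s0 = w.sum()                          # Eq. 5
    i_obs = n * (z @ w @ z) / (s0 * (z @ z))
    e_i = -1.0 / (n - 1)                  # expected value under randomness
    s1 = 0.5 * ((w + w.T) ** 2).sum()     # Eq. 6
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()   # Eq. 7
    e_i2 = (n**2 * s1 - n * s2 + 3 * s0**2) / (s0**2 * (n**2 - 1))
    var_i = e_i2 - e_i**2                 # variance of I
    z_score = (i_obs - e_i) / np.sqrt(var_i)
    return i_obs, e_i, var_i, z_score

# Hypothetical transect of 8 trees; each tree is contiguous with its
# immediate neighbors along the row (values are not study data).
n_trees = 8
w_row = np.zeros((n_trees, n_trees))
for k in range(n_trees - 1):
    w_row[k, k + 1] = w_row[k + 1, k] = 1.0
i_obs, e_i, var_i, z_score = morans_i(np.arange(1, n_trees + 1), w_row)
```

For this smoothly increasing transect the statistic is strongly positive, which is the pattern expected for a spatially clustered variable.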

Anselin (1995) showed that Moran’s I can be applied locally to evaluate the degree of autocorrelation for a given location, where the summation over j is restricted to a local neighborhood of sample i, resulting in a value \( I_{i} \) computed for each sample. The localized Moran’s I provides a test of locational relevance of the global Moran’s I and thus serves to provide locational information on deviations from an expectation of strict stationarity (Anselin 1995). As with the global Moran’s I, large positive values indicate strong local autocorrelation. In addition, the expected value, variance, Z statistic and probability of Z, p(Z), were computed for each sample. In this paper, we have used the term spatial clustering to indicate neighborhoods of trees or soil samples displaying a high degree of spatial autocorrelation. The size of these neighborhoods was determined based on the dimensions of the kernel used, and the degree of clustering was measured by p(Z). Both the global and localized Moran’s I were computed using ArcGIS software (ESRI, Redlands, CA, USA).
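A minimal sketch of the local statistic is given below, assuming binary weights within a fixed distance band (kernel) and the scaling of Anselin (1995), under which the local values sum to \( S_{0} \) times the global I. The function names, the 3 m band and the transect are hypothetical; the weighting scheme used by ArcGIS may differ.

```python
import numpy as np

def distance_band_weights(coords, radius):
    """Binary weights: 1 for pairs separated by 0 < d <= radius, else 0.
    A sample is never its own neighbor."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return ((d > 0) & (d <= radius)).astype(float)

def local_morans_i(x, w):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum(z^2) / n."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    m2 = (z @ z) / x.size
    return z * (w @ z) / m2

# Hypothetical row of 8 trees at 3 m spacing (values are not study data).
coords = np.column_stack([3.0 * np.arange(8), np.zeros(8)])
w_band = distance_band_weights(coords, radius=3.0)
local_i = local_morans_i(np.arange(1, 9), w_band)
```

Mapping the sign and significance of each local value is what separates areas where neighboring trees are similar from areas where they are not.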

Geographically weighted regression (GWR) provides a tool for identifying when the assumptions of strict stationarity do not hold, as well as a means of mitigating any violations using localized model fitting. For our applications, a local linear regression was fitted to each tree or soil sample. Model parameters were determined based on the samples within the band width (kernel size) surrounding that tree, allowing the relationships being measured to vary over space:

$$ y\left( {\mathbf{g}} \right) = \beta_{0} \left( {\mathbf{g}} \right) + \beta_{1} \left( {\mathbf{g}} \right)x_{1} + \beta_{2} \left( {\mathbf{g}} \right)x_{2} + \cdots + \beta_{n} \left( {\mathbf{g}} \right)x_{n} + \varepsilon \left( {\mathbf{g}} \right), $$
(9)

where (g) refers to a location at which estimates of the GWR parameters are obtained and the β are the regression coefficients. The parameters are determined as:

$$ \beta^{\prime} = \left( {X^{T} {\mathbf{W}}({\mathbf{g}})X} \right)^{ - 1} X^{T} {\mathbf{W}}({\mathbf{g}})Y, $$
(10)

where W(g) is a matrix of weights specific to location g such that observations nearer to g are given greater weight than observations further away. This local linear regression generates a coefficient of determination, \( R^{2} \), and model parameters at each point (tree). Details of the method are described in Fotheringham et al. (2000, 2002). The GWR analysis was done with the GWR3 software (University of Newcastle upon Tyne, Newcastle upon Tyne, England). To determine the optimum band width for localized fitting, we minimized Akaike’s information criterion (AIC; Akaike 1974), following Hurvich et al. (1998) and Fotheringham et al. (2002):

$$ AIC = 2K + n\left[ {\ln \left( {\frac{2\pi RSS}{n}} \right) + 1} \right], $$
(11)

where K is the number of parameters (one for this research), n is the number of samples, and RSS is the residual sum of squares.

Fixed band widths starting with this optimum size were used and then increased until the results were not significant. Adaptive band widths were also used to determine the number of samples to include in a kernel, as an alternative to a fixed distance. We computed the significance for the spatial variation in the estimates of the model parameters using a Monte Carlo significance method (Hope 1968; Fotheringham et al. 2000). This test evaluates the null hypothesis that the global regression model holds, that is, it assesses how likely the observed spatial variation in the local parameter estimates would be if the parameters were in fact constant over space.
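The local fitting of Eqs. 9 and 10 and the criterion of Eq. 11 can be sketched as follows. This is a minimal illustration only: a Gaussian kernel with a fixed band width is assumed, the function names `gwr_fit` and `aic_eq11` are hypothetical, and the code is not the GWR3 implementation used for the analysis.

```python
import numpy as np

def gwr_fit(coords, x, y, bandwidth):
    """Fit one weighted least-squares regression per location (Eq. 10).
    Returns the local coefficients (intercept first) and the RSS."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    betas = np.empty((len(y), X.shape[1]))
    for i, g in enumerate(coords):
        d = np.linalg.norm(coords - g, axis=1)
        # Assumed Gaussian kernel: weight decays with distance from g.
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        xtw = X.T * w                     # X^T W(g) without building diag(w)
        betas[i] = np.linalg.solve(xtw @ X, xtw @ y)
    fitted = np.sum(X * betas, axis=1)    # local prediction at each location
    rss = float(np.sum((y - fitted) ** 2))
    return betas, rss

def aic_eq11(rss, n, k):
    """AIC as written in Eq. 11: 2K + n [ln(2 pi RSS / n) + 1]."""
    return 2 * k + n * (np.log(2 * np.pi * rss / n) + 1)
```

In practice the fit would be repeated over a range of band widths and the one with the smallest AIC retained. A very large band width gives every observation nearly equal weight, so the local estimates approach those of the global regression.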

Variogram analysis (Deutsch and Journel 1998) was used to quantify the spatial scales of yield, TCSA and NDVI. GenStat (Payne et al. 2008) was used to compute the experimental variograms and to fit several models. Variograms of the soil properties were estimated by the residual maximum likelihood (REML) method using the MLREML program of Pardo-Igúzquiza (1997) as this method may require fewer data (e.g., Pardo-Igúzquiza 1998; Kerry and Oliver 2007) than the more usual method of moments variogram estimator. In this approach, linear combinations of the data (generalized increments) are used rather than the original data. The model parameters are calculated directly from the generalized increments of a covariance matrix of the full data. Note that for the REML method there is no experimental variogram, so only the fitted model and parameters are shown.

Stationarity and isotropy

Several definitions of stationarity are used in this study and are either implemented as assumptions or tested for validity. Stationarity implies that a generating process is consistent over the study area for which observations are obtained, and it is essential for the creation of consistent point and interval estimators resulting from spatial processes. Usually, a process is assumed to be stationary, but new methods have been developed to assess whether this is the case. The assumptions of stationarity and isotropy required for the statistical tools used in our study are summarized in Table 1.

Table 1 Stationarity and isotropy assumptions for statistical methods used

Strict stationarity assumes that the distribution of the generating process is consistent over the study area and is invariant to changes in location (Schabenberger and Gotway 2005a). Strict stationarity implies that a single generating process and finite-dimensional single and joint distributions apply to all the data in the set of observations. The assumption can be evaluated by calculating Moran’s I function. In the computation of Moran’s I, only a single sample mean and variance are used as the observations are all assumed to be derived from the same generating distribution regardless of location. If strict stationarity is violated by a spatially dependent process that cannot be observed directly, then estimates based on the assumption might not be consistent or efficient estimators of the parameters of the generating distribution. For an estimator that is computed from spatially generated data, if the variable response is not stationary, the estimator cannot be guaranteed to be consistent or unbiased. Several recent studies highlight this issue for regression analysis in particular (Anselin and Kelejian 1997; Beran and Hall 1992; Lahiri et al. 2002).

The residual maximum likelihood (REML) variogram estimator requires a joint normal distribution and depends on the weaker assumptions of second-order stationarity, where the mean, μ = E[X(s)], is constant for all s, and the a priori variance of the process, \( \sigma^{2} = {\text{E}}\left[ {\left\{ {X\left( {\mathbf{s}} \right) - \mu } \right\}^{2} } \right] \), is assumed to be finite and, as for the mean, the same everywhere. When two points \( {\mathbf{s}}_{i} \) and \( {\mathbf{s}}_{j} \) do not coincide, their covariance, C, depends on their separation and not on their absolute positions, and this applies to any pair of points separated by the lag \( {\mathbf{h}} = {\mathbf{s}}_{i} - {\mathbf{s}}_{j} \) (a vector in both distance and direction), so that

$$ \begin{aligned} C\left( {{\mathbf{s}}_{i} ,{\mathbf{s}}_{j} } \right) &= {\text{E}}\left[ {\left\{ {X\left( {{\mathbf{s}}_{i}} \right) - \mu } \right\}\left\{ {X\left( {{\mathbf{s}}_{j} } \right) - \mu } \right\}} \right] \\ &= {\text{E}}\left[ {\left\{ {X\left( {\mathbf{s}} \right)} \right\}\left\{ {X\left( {\mathbf{s + {\mathbf{h}}}} \right)} \right\} - \mu^{2} } \right] \\ &= C\left( {\mathbf{h}} \right), \\ \end{aligned} $$
(12)

which is also constant for a given h. This constancy of the first and second moments of the process constitutes second-order or weak stationarity.

Sometimes the variance appears to increase indefinitely as the extent of the area increases. The covariance cannot be defined then because we cannot insert a value for μ into Eq. 12. This is a departure from second-order stationarity. Matheron’s (1963) solution to this is the weaker intrinsic hypothesis of geostatistics (Cressie 1993). Although the general mean might not be constant, it would be for small lag distances and so the expected differences would be zero as follows:

$$ {\text{E}}\left[ {X\left( {\mathbf{s}} \right) - X\left( {\mathbf{s}} + {\mathbf{h}} \right)} \right] = 0, $$
(13)

and the expected squared differences for those lags define their variances

$$ {\text{E}}\left[ {\left\{ {X\left( {\mathbf{s}} \right) - X\left( {{\mathbf{s}} + {\mathbf{h}}} \right)} \right\}^{2} } \right] = \text{var} \left[ {X\left( {\mathbf{s}} \right) - X\left( {{\mathbf{s}} + {\mathbf{h}}} \right)} \right] = 2\gamma \left( {\mathbf{h}} \right), $$
(14)

where γ(h) is the semivariance at lag h. As for the covariance, the semivariance depends only on the lag and not on the absolute positions of the data. If the process X(s) is second-order stationary, the semivariance and covariance are equivalent, with γ(h) = C(0) − C(h). However, if the process is intrinsic only, there is no equivalence because the stationary covariance function (Eq. 12) does not exist. The following equation gives Matheron’s (1965) variogram estimator, which is the method commonly used:

$$ \hat{\gamma }\left( {\mathbf{h}} \right) = \frac{1}{{2m\left( {\mathbf{h}} \right)}}\sum\limits_{i = 1}^{{m\left( {\mathbf{h}} \right)}} {\left\{ {x\left( {{\mathbf{s}}_{i} } \right) - x\left( {{\mathbf{s}}_{i} + {\mathbf{h}}} \right)} \right\}^{2} } , $$
(15)

where m(h) is the number of paired comparisons.
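Matheron’s estimator can be sketched for the omnidirectional case as follows. This is a minimal illustration, not the GenStat procedure used in the study: the function name `experimental_variogram` is hypothetical, and the rule of assigning each pair to the nearest multiple of the lag interval is an assumption.

```python
import numpy as np

def experimental_variogram(coords, values, lag, n_lags):
    """Omnidirectional method-of-moments variogram.
    Returns lag centers, semivariances and pair counts m(h)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)      # every pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    # Assumption: a pair belongs to the nearest multiple of the lag interval.
    bins = np.floor(dist / lag + 0.5).astype(int)
    centers = lag * np.arange(1, n_lags + 1)
    gamma = np.full(n_lags, np.nan)               # NaN where no pairs fall
    counts = np.zeros(n_lags, dtype=int)
    for k in range(1, n_lags + 1):
        mask = bins == k
        counts[k - 1] = mask.sum()
        if counts[k - 1] > 0:
            gamma[k - 1] = 0.5 * sqdiff[mask].mean()
    return centers, gamma, counts

# Hypothetical row of 10 trees at 3 m spacing with a linear trend in the values.
row = np.column_stack([3.0 * np.arange(10), np.zeros(10)])
centers, gamma, counts = experimental_variogram(row, np.arange(10.0), 3.0, 3)
```

The example values follow a pure trend, so the semivariance keeps increasing with lag and never reaches a sill, which is the behavior described above for a process that is not second-order stationary.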

If a generating process is stationary for small subregions or geographical subsets and the generating process can be disaggregated into simpler stationary processes where the mean is consistent over short distances or for smaller subsets, then the process is said to exhibit local stationarity (Schabenberger and Gotway 2005b). The local indicators of spatial association (LISA) statistics assume local stationarity and estimate local differences using proximal dyadic pairs of observations. The LISA (e.g. localized Moran’s I) is a local disaggregation of the global Moran statistic that enables the local assessment of spatial dependence and the evaluation of globally or strictly non-stationary dependence. The GWR estimates also assume local stationarity as defined by weighted locally stationary processes using a variety of sampling kernels. Within an estimation kernel, observations are assumed to be generated from a locally stationary process. The methods developed by Fotheringham et al. (2000) can determine statistically significant kernel ranges or sizes within which stationarity is locally consistent. For kriging, local stationarity is also sufficient because it is a local estimator.

Results and discussion

To address our first objective, that is, determining whether yield and tree size (TCSA) are spatially clustered or autocorrelated, we computed Moran’s I for the entire orchard. Table 2 gives Moran’s I for the yields of 2002 and 2003, both by rootstock and for all of the rootstocks combined. Note that the Moran’s I values are positive, with large positive Z values that are significant (p < 0.01). For comparison, Moran’s I for rootstock (which we would expect to be non-clustered) is −0.047 with a Z value of −7.18. The positive values for yield indicate that the yields for 2002 and 2003 are spatially correlated. Likewise, the Moran’s I values (Table 2) for the TCSA measurements also suggest spatial clustering of the tree characteristics, regardless of whether the trees were analyzed according to rootstock or combined. In addition to TCSA, we analyzed NDVI as a surrogate measure of tree vigor. As with the yields and TCSA, global Moran’s I was computed for the available image dates when the trees were in full canopy cover in 2002 and 2003. The Moran’s I and corresponding p(Z) given in Table 3 indicate that NDVI is also spatially clustered or autocorrelated.

Table 2 Global Moran’s I for TCSA and yield by rootstock
Table 3 Global Moran’s I for NDVI for all rootstocks

If these variables are spatially correlated, what are the scales of the spatial structures? We applied the localized Moran’s I to assess the degree of autocorrelation for each tree relative to its neighbors. For this, the orchard was not subdivided on the basis of rootstock and a 30 m kernel was used, in order to include trees both within the row and across rows. The resulting maps of local Moran’s I for yield and TCSA are shown in Fig. 2. Negative values of Moran’s I are plotted as ‘NS’. Positive values are plotted according to the probability of the corresponding Z value. The maps of yield and TCSA show similar patterns for the resulting probabilities, with more significant values of Z (larger positive values of Moran’s I) at both the eastern and western ends of the orchard block, indicating stronger similarities among adjacent trees in these areas. Likewise, localized Moran’s I was computed for NDVI for 2002 and 2003 (Fig. 2). The resulting maps differ somewhat among image dates, but are similar to TCSA, especially for the June 2003 NDVI data. The combined results suggest that there are distinct zones that could be differentiated for management.

Fig. 2 Local Moran’s I, computed with a 30 m kernel (window) and using all rootstocks

Given the evidence of spatial correlation and differences in neighborhood correlations, we wanted to address the question of strict spatial stationarity: do regression models described for the entire block hold for every subset of the block equally? We evaluated this using GWR, and selected models based on measurements that showed large correlation coefficients (not shown). These were the 2003 yield on the 2002 yield, and the yield for each year (as the dependent variable) on TCSA. The GWR was also computed for several image dates for NDVI on TCSA, and for June 2003 NDVI on June 2002 NDVI.

The results of the GWR based on yield and TCSA were somewhat contradictory. The localized \( R^{2} \), intercept and coefficients show obvious patterns (Fig. 3). Likewise, the local \( R^{2} \) values and decrease in AIC generally show an improvement over the global values (Tables 4, 5). The residuals from the GWR appeared random (Moran’s I resulted in a Z score of −1.8, not shown) and slightly smaller than when using global regression as seen in Fig. 3d, which again indicates an improvement in the fit of the model. However, the spatial significance tests failed for the models at some kernel sizes greater than the initial one selected by the minimization of the AIC. For the fixed bandwidths, the tests resulted in p > 0.1 for the coefficient of 2002 yield at bandwidths of 20 m and 30 m, and for most of the 2002 and 2003 yield results on TCSA (Table 4). When using the bandwidth based on the number of samples (adaptive bandwidth), the spatial test was significant at p < 0.1 only for the 2003 yield on 2002 yield, and p was >0.1 for the other two models (Table 5). All of the regression models for GWR on NDVI failed the spatial significance test. Since the poor model fitting overall suggested that the models were not entirely explaining the variance in the dependent variables, other models were fitted with additional variables, such as soil characteristics and elevation, without any noticeable improvements (not shown).

Fig. 3 Local geographically weighted regression applied to the relationship of 2003 yield on 2002 yield

Table 4 Comparison of global regression with GWR results
Table 5 Comparison of global regression with GWR results using an adaptive kernel

The variogram analysis results for TCSA and yield for 2002 and 2003 are shown in Fig. 4. Variograms were computed based on individual rootstocks (not shown) as well as for all rootstocks combined, but were found to be very similar. Likewise, the variograms computed in all directions, across rows, along rows and in the direction of elevation change (130 degrees from grid north) were similar, therefore only the omni-directional results are shown (Fig. 4). After experimenting with different lag intervals for the variogram of TCSA, an interval of 15 m was used which corresponds with the distance between the same rootstock values. The lag value used for the 2002 and 2003 yields was 3 m, i.e., the distance between the trees. The variograms for TCSA and yield have ranges of 30–45 m and large nugget variances (Fig. 4). Variograms determined for NDVI recorded on several dates showed little difference in structure by direction or rootstock (not shown). Figure 5 shows the variogram for each NDVI image date over all directions and all rootstocks. The variogram models for June and July of 2002, and June of 2003 indicate ranges of approximately 14 m, which represents a local neighborhood given the tree spacing (3 m between trees and 6 m between rows). The variogram for NDVI August 2002 has a range that is similar to that for yield 2002.

Fig. 4 Variograms of: (a) trunk cross-sectional area (TCSA), (b) 2002 yield and (c) 2003 yield. Variograms were computed for all trees (n = 278) and all directions. TCSA was computed with a lag value of 15 m, whereas the yields for 2002 and 2003 were computed with lags of 3 m. The model functions are circular (Circ.) and pentaspherical (Pentaspher.), and the parameters are: \( c_{0} \) is the nugget variance, \( c + c_{0} \) is the sill variance and a is the range of spatial dependence

Fig. 5 Variograms of NDVI by date. Variograms were computed for all trees (n = 278) over all directions, and using a lag value of 5 m. The model functions are pentaspherical (Pentaspher.) and exponential (Exp.), and the parameters are: \( c_{0} \) is the nugget variance, \( c + c_{0} \) is the sill variance, a is the range of spatial dependence for the pentaspherical function and a′ is an approximate range for the exponential function based on three times the model’s distance parameter
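The approximate range a′ quoted for the exponential function can be illustrated with a short sketch. The usual parameterization of the exponential model is assumed here, with nugget \( c_{0} \), structured variance c and distance parameter r; the function name and the parameter values are hypothetical.

```python
import numpy as np

def exponential_variogram(h, c0, c, r):
    """Exponential model: gamma(h) = c0 + c * (1 - exp(-h / r)) for h > 0,
    and gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, c0 + c * (1.0 - np.exp(-h / r)), 0.0)

# Hypothetical parameters: no nugget, unit structured variance, r = 5 m.
r = 5.0
fraction_at_3r = float(exponential_variogram(3.0 * r, c0=0.0, c=1.0, r=r))
```

The exponential model approaches its sill asymptotically and so has no finite range. At a lag of three times the distance parameter the structured component has reached about 95% of c, which is why a′ = 3r is used as a working range.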

The TCSA, yield and NDVI are directly related to tree by tree variability, but how does their variation relate to the underlying spatial structure of the orchard’s soil? As for the tree properties, we computed global Moran’s I and variograms for the soil properties. The results of these analyses varied according to the soil property or nutrient. Organic matter (OM), TEC, Zn, Cu and Fe have significant values for the global Moran’s I (Table 6). Variograms of the soil properties were estimated by the residual maximum likelihood (REML) method (Pardo-Igúzquiza 1997, 1998) as there were fewer soil data than tree and NDVI data. Variograms were estimated successfully by REML for Fe, OM, Cu and TEC (Fig. 6); these have longer ranges (31–95 m) than those for the tree properties (14–44 m). Variograms could not be estimated for Zn, B and pH. These results suggest that Cu, Fe, OM and TEC are spatially structured.

Table 6 Global Moran’s I for soil
Fig. 6 Variograms of soil properties estimated by residual maximum likelihood (REML), where Exp. is exponential, and the parameters are: \( c_{0} \) is the nugget variance, \( c + c_{0} \) is the sill variance and a′ is an approximate range for the exponential function based on three times the model’s distance parameter

As an additional test of spatial stationarity, we applied GWR to model TEC on OM. Although this relationship is not of particular interest as a model for prediction, it has a large global \( R^{2} \), and we used it to determine whether a localized model improved the fit, and at what scale. The soil data were interpolated using an inverse distance weighting quadratic function, which is localized and makes no assumptions of stationarity. The results are given in Table 7. Based on minimization of the AIC, a kernel of 13 samples was used. The localized \( R^{2} \) values did show improvement for most of the orchard block compared with the global regression, and the AIC decreased markedly from the global value. The spatial tests (Table 7) indicate significance at p < 0.0001. Overall, these results again suggest that the soil data are non-stationary in the strict sense.

Table 7 Comparison of global regression with GWR results for total exchange capacity on organic matter

The scales of the spatial structure differ somewhat between the tree-based measurements (yield, size and NDVI) and the soil properties. Variograms of yield and TCSA have ranges of approximately 30–45 m. The large nugget variances for yield and tree size indicate considerable variability from tree to tree. Variograms of NDVI for several image acquisition dates indicate a short range of variation of approximately 14–27 m. Variograms of OM, Fe, TEC and Cu have a longer range of spatial dependence, from 31 m to 95 m.

In summary, analyses using both global and localized Moran’s I suggest that yield, TCSA and NDVI are strongly spatially clustered or autocorrelated. The autocorrelation between trees (determined by a localized Moran’s I) is greater in some areas than others. Most soil variables also show significant spatial correlation in the global Moran’s I; the exceptions are pH, soil texture and B. The GWR analyses support the conclusions of non-stationarity, in the strict sense, of the data, but the GWR spatial significance tests failed at some scales, so we were unable to use these analyses to determine the spatial scales.

Conclusions

So what do these results suggest for management of the orchard block? The spatial structure of clustering of tree TCSA, yield and NDVI, as shown by the Moran’s I values, would suggest management zones on a scale of about 30 m. However, the large nugget variances and short ranges of variograms for many of the crop variables measured (e.g. 14–27 m for NDVI) favor management by tree, in spite of the longer scales of variation for some of the soil variables as determined by variogram analysis. The spatial non-stationarity, in the strict sense, in these data has implications in addition to the impacts on management zones. Regression models fitted to the entire dataset might not be fitted optimally if there are spatial clusters in the data. Future assessment of perennial crop datasets such as this might include techniques such as autoregression to build spatial effects directly into the models. In addition, Markov random fields would enable further examination of specific spatial effects at different scales within a model.