Introduction

Maps representing a field and a topic associated with it are called thematic maps (TMs) and aim to inform, by graphical symbols, where a specific geographical phenomenon occurs. TMs have become an essential tool in geospatial science to understand spatial information (Fraser & Congalton, 2019), e.g., digital elevation model, slope map, soil map, aspect map, land use/land cover map, and contour map (Gojiya et al., 2018).

In precision agriculture, TMs are an essential tool to assist analysts in decision-making, as they allow them to identify spatial variability within the field and manage the area in a localized way. TM development involves data collection, analysis, interpretation, and the representation of information on a map, which facilitates identifying similarities and visualizing spatial correlations. One specific case of TMs is contour maps, which are built by connecting points of the same value and are applied to geographical phenomena that show continuity in geographic space. Another is choropleth maps, which use color to show ranges of values of a specific variable within a defined geographic area. Contour and choropleth maps can be built from categorical data (yield, elevation, temperature, precipitation, humidity, and atmospheric pressure) or relative data (density, percentages, and indexes) (Aikes Junior et al., 2021). Usually, both maps are called contour maps.

The advancement of computational technologies allows the creation and analysis of TMs using different techniques, methodologies, and software. For example, geographical information systems (GISs) can store, display, retrieve, and analyze spatial data in a user-friendly way. GISs have been widely used in many studies for spatial and temporal data creation (Gojiya et al., 2018).

Usually, the sampled data are interpolated in a dense and regular grid to generate continuous and smooth TMs. This task is carried out with the aid of interpolation methods. The most used methods in precision agriculture are inverse distance weighted interpolation (IDW; Shepard, 1968) and ordinary Kriging (OK; Cressie, 1993), which differ in how weights are attributed to the samples, and this may influence the estimated values (Reza et al., 2010). IDW has been used because it is quick and straightforward; Kriging has been used because it provides the best linear unbiased estimates, although it is more complex and time-consuming (Mueller et al., 2004). The IDW interpolator assigns weights to the sample points, which are evaluated during the interpolation process. Each sampled point’s influence is inversely proportional to its distance from the point to be estimated, raised to a power (Isaaks & Srivastava, 1989). The value of the chosen power determines the weight factor; that is, the higher this value, the lower the most distant points’ influence.

Kriging has been identified as a best linear unbiased estimator (BLUE) (Diggle & Ribeiro, 2007; Isaaks & Srivastava, 1989). However, it must meet the spatial dependence (SD) modeling requirements (Oliver & Webster, 2015; Cambardella et al., 1994) to perform correctly and be adequately used in creating a TM. The procedure’s performance can be influenced by the variability and spatial structure of the data, the semivariogram model, the search radius, and the number of closest neighboring points used (Reza et al., 2010; Isaaks & Srivastava, 1989). Therefore, the quality of the interpolation depends on the spatial structure of the variable under study (Amaral & Justina, 2019). The deterministic interpolator IDW does not consider SD and the specific behavior of data, leading to less efficiency in mapping the spatial distribution of a given variable than Kriging (stochastic interpolator) (Betzek et al., 2019). However, when there is no SD (Rodrigues et al., 2018; Cambardella et al., 1994), the use of a deterministic interpolator can be more appropriate.

In geostatistics, semivariograms are used not only as an exploratory tool but also to estimate parameters (Diggle & Ribeiro, 2007). After the experimental semivariogram is built, it is necessary to fit a theoretical model representing data variability. The curve-fitting can be done “by eye” by trying different values for the model parameters and visually inspecting the fit to the sample variogram (Diggle & Ribeiro, 2007). However, parametric covariance functions can be used to estimate semivariogram parameters. In this case, the variogram parameter estimates minimize the squared differences between the theoretical model and the experimental variogram (Li et al., 2018).

Betzek et al. (2019) developed computational routines to determine the best interpolator and its parameters for a data set. The routines determine the best semivariogram model (and its parameters) for OK and the best power and number of neighbors used in the IDW interpolator. The interpolation selection index (Bier & Souza, 2017) enables the selection of the best among several existing mathematical and geostatistical models in a simplified and less subjective manner. It was observed that, in some data sets, the routine implemented to select an interpolator may mistakenly select a geostatistical model that does not have spatial dependence or a model that is poorly fitted to the experimental semivariogram.

Therefore, this work aims to adopt criteria to guarantee a minimum spatial dependence in the semivariograms used in the interpolator selection process. To that end, three indices are proposed: (i) the effective spatial dependence index (%ESDI), (ii) the first semivariance significance index (\(\%\gamma \left(1\right)\)), and (iii) the slope of the model ends index (%SMEI).

Materials and methods

AgDataBox (ADB, http://adb.md.utfpr.edu.br; Michelon et al., 2019, Borges et al., 2020, Dall’agnol et al., 2020) web platform provides tools to create, store, recover, manage, exhibit, and analyze geographic and spatial data of TMs focused on precision agriculture. ADB offers farmers, researchers, and service providers focused on precision agriculture the ability to integrate data, software, procedures, and methodologies to contribute to agriculture development in the country using free technologies. This web platform has a microservices architecture (MSA), called ADB-MSA, which consists of a set of resources accessible remotely, through the hypertext transfer protocol (HTTP), to process and store data from an agricultural environment. ADB-MSA allows interoperability of several applications in which data and processing routines are centralized. The following applications, under development, consume ADB-MSA resources: (1) ADB-Mobile; (2) ADB-Map; (3) ADB-Admin; (4) ADB-IoT; (5) ADB-Remote Sensing.

The ADB-Map application is part of the ADB web platform and was employed for: (i) descriptive and exploratory analyses, (ii) data interpolation, (iii) selection of the best interpolation method, and (iv) TM creation. This application aims at mitigating the problem of using different software to create TMs and delineate management zones. In addition, the ADB-Map application provides user-friendly interfaces and procedures. This proposal is in line with the digitalization of agriculture. The functionalities of the ADB-Map application are divided into conceptual modules (Fig. 1).

Fig. 1 Overview of modules that make up the AgDataBox-Map application

ADB’s data interpolation module interpolates data by IDW, OK, moving average, and nearest neighbor. Furthermore, it is possible to select the best interpolation method between OK and IDW, in addition to determining its interpolation parameters (Fig. 2). We improved and implemented new features in the module studied and implemented by Betzek et al. (2019). We developed algorithms that make interpolations with R software, using the packages geoR (Ribeiro & Diggle, 2001) and gstat.

Fig. 2 Architecture of the ADB data interpolation module, representing the components and workflow

Location of the field, data collection, and selection of the coordinate system

Physical and chemical soil attributes were collected based on irregular sampling grids in two agricultural fields located in the municipality of Serranópolis do Iguaçu, western Paraná state, southern Brazil (Fig. 3). The fields have been cultivated under a no-tillage system with a crop succession of soybean and corn. The coordinate system used was the geographic coordinate system (GCS) with the WGS 1984 datum. The sampling points’ locations were obtained by a GNSS receiver (Juno SB, Trimble Navigation Limited, Westminster, CO, USA).

Fig. 3 Location of experimental fields and sampling grids of (a) 100 points in field A-2018 (36.6 ha), (b) 52 points in field A-2019 (20.0 ha), and (c) 73 points in field B-2015 (20.9 ha) in the municipality of Serranópolis do Iguaçu, Paraná state, southern Brazil. The black contour delineates the fields used. Coordinates are in degrees (WGS 1984). The minimum and maximum distances among the sampling points are 41 and 1027 m in field A-2018, 45 and 706 m in field A-2019, and 31 and 838 m in field B-2015

Soil samples were taken from 0 to 0.20 m depth and analyzed in a commercial laboratory. Around each sampling point (located using a GNSS receiver, Juno SB, Trimble Navigation Limited, Westminster, CO, USA), eight subsamples were randomly collected within a circle of 3-m radius divided into four quadrants, two per quadrant. Field A (Fig. 3a, b) was sampled with 100 sampling points in 2018 (36.6 ha) and 52 in 2019 (20.0 ha), and field B (Fig. 3c) was sampled with 73 sampling points (20.9 ha). The minimum and maximum distances among the sampling points are 41 and 1027 m in field A-2018, 45 and 706 m in field A-2019, and 31 and 838 m in field B-2015. Thus, the sampling density corresponds, respectively, to 2.7, 2.6, and 3.5 points ha−1 (Table 1), which was considered enough to identify the spatial variability of the variables in these fields, given that it exceeds the recommended minimum densities, which range from 1 sample ha−1 (Ferguson & Hergert, 2009) to 2.5 samples ha−1 (Doerge, 2000; Journel & Huijbregts, 1978). However, Oliver and Webster (2015) observed that at least 100 to 150 samples are required for a reliable variogram, whereas Clark (1979) recommended at least 30–50 data points to use Kriging. Nevertheless, a density that is sufficient in one case may not be enough in another. We used different numbers of samples meeting at least one of these recommendations (100–150 samples in field A-2018 and 30–50 samples in fields A-2019 and B-2015) to confirm the robustness of ADB’s automated procedure and determine whether it can be employed to decide when to use IDW and when to use OK (i.e., to determine whether the sample density is enough and/or whether SD is detected; the absence of SD is characterized by the pure nugget effect).
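The reported sampling densities follow directly from the point counts and areas given above. The short Python check below is only an illustration of that arithmetic; the function and variable names are hypothetical and are not part of ADB.

```python
# Sampling density (points per hectare) for the three field-years,
# using the point counts and areas reported in the text.
fields = {
    "A-2018": {"points": 100, "area_ha": 36.6},
    "A-2019": {"points": 52, "area_ha": 20.0},
    "B-2015": {"points": 73, "area_ha": 20.9},
}


def sampling_density(points, area_ha):
    """Return the number of sampling points per hectare."""
    return points / area_ha


# Round to one decimal place, as in the text.
densities = {name: round(sampling_density(**f), 1) for name, f in fields.items()}
print(densities)
```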

Table 1 Details of the study fields

Each point sample was composed of eight individual samples (Wollenhaupt et al., 1994). The sampling points were located along an imaginary line among intermediate contour lines with alternated distances and provided a better fit at the smallest lag distances, which is essential in Kriging (Bier & Souza, 2017). The variables obtained from soil analysis were chemical attributes (organic matter (OM; g dm−3), zinc (Zn; mg dm−3), iron (Fe; mg dm−3), manganese (Mn; mg dm−3), phosphorus (P; mg dm−3), potassium (K; cmolc dm−3), copper (Cu; mg dm−3), the potential of hydrogen (pH), calcium (Ca; cmolc dm−3), magnesium (Mg; cmolc dm−3), aluminum (Al; cmolc dm−3), pH of buffer solution by the Shoemaker–McLean–Pratt (SMP) method, potential acidity (H + Al; cmolc dm−3), the sum of bases (SB; cmolc dm−3), base saturation (V%), and aluminum saturation (m%)) and physical attributes (clay (%), sand (%), and silt (%)).

Exploratory data analysis

Data were analyzed using descriptive and exploratory statistics and geostatistics. During the descriptive analysis of data, measures of central tendency (mean and median), dispersion [standard deviation (SD) and coefficient of variation (CV)], and normality tests (Kolmogorov–Smirnov and Anderson–Darling tests at 0.05 significance level) were calculated. Data were considered normally distributed when at least one of the tests did not reject normality. The CV was classified as low when CV ≤ 10%, medium when 10% < CV ≤ 20%, high when 20% < CV ≤ 30%, and very high when CV > 30% (Pimentel-Gomes, 2009). The exploratory data analysis (EDA) was used to detect and remove outliers and inliers. Using the module ADB-Map-Clean of the ADB platform, duplicate, negative, or null points, outliers, and inliers were removed. The outliers were identified as values outside the mean ± 3 SD (Córdoba et al., 2016). The inliers were identified by Moran’s local spatial autocorrelation index (\(I_i\)) (Anselin, 1995).
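The CV classes and the mean ± 3 SD outlier rule described above can be sketched in a few lines of Python. This is an illustration only, not the ADB-Map-Clean code; the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent (assumes the sample standard deviation)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def classify_cv(cv):
    """CV classes quoted in the text (Pimentel-Gomes, 2009)."""
    if cv <= 10:
        return "low"
    if cv <= 20:
        return "medium"
    if cv <= 30:
        return "high"
    return "very high"


def outliers_3sd(values):
    """Return the values lying outside mean +/- 3 standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - 3 * sd, mean + 3 * sd
    return [v for v in values if v < lower or v > upper]


# Hypothetical data: twenty identical readings and one extreme value.
data = [10.0] * 20 + [100.0]
print(classify_cv(cv_percent(data)), outliers_3sd(data))
```

Note that with very small samples a single extreme value inflates the standard deviation so much that the 3 SD rule may not flag it.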

Analysis of spatial dependence

The semivariogram chart is determined from a set of observed values according to Oliver and Webster (2015) in two stages: (i) the calculation of the empirical semivariogram that summarizes spatial relations in data, and (ii) the adjustment of a mathematical model that best represents semivariances’ distribution in each lag distance. Each calculated semivariance for a particular lag (h) is only an estimate of a mean semivariance \(\widehat{\gamma }\left(h\right)\) for that lag. The four main elements are (i) the nugget effect (C0), (ii) the partial sill (C1), (iii) the sill (C0 + C1), and (iv) the range of spatial autocorrelation (Ra).

The Matheron (1963) classic estimator was used to calculate semivariances with at least 30 pairs of points (Journel & Huijbregts, 1978), and the range Ra was limited to half of the maximum distance (MD) among points (cutoff = 0.5*MD). The semivariances’ calculation should not exceed distances among points greater than half of the maximum distance (Clark, 1979). Points located beyond the cutoff are considered non-influential (Isaaks & Srivastava, 1989). Lag size (h) was defined by calculating the number of lags, the relationship between the cutoff, and the shortest distance among the pairs of points. Therefore, the lag h sizes were 43 m (field A-2018), 44 m (field A-2019), and 30 m (field B-2015), while semivariances 102 and 438 in area A-2018, 53 and 180 in area A-2019, and 55 and 182 in area B-2015. A significant limitation to address in this ADB-Map version is that anisotropy’s eventual presence is not considered.
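As an illustration of the classic estimator described above, the following Python sketch computes an empirical semivariogram with a cutoff of half the maximum distance and a minimum number of pairs per lag. It is not the geoR routine used in ADB-Map: the binning rule, the function name, and the assumption that coordinates are in a projected (metric) system are choices made here for illustration.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, lag, cutoff=None, min_pairs=30):
    """Classic (Matheron) estimator.

    points: list of (x, y, z) tuples, coordinates assumed to be in metres.
    Returns a list of (mean_distance, semivariance, n_pairs), one entry per lag
    class that holds at least `min_pairs` pairs.
    """
    pairs = [
        (math.dist(a[:2], b[:2]), (a[2] - b[2]) ** 2)
        for a, b in combinations(points, 2)
    ]
    if cutoff is None:
        # Cutoff = 0.5 * maximum distance among points, as in the text.
        cutoff = 0.5 * max(d for d, _ in pairs)

    bins = {}
    for d, sq_diff in pairs:
        if 0 < d <= cutoff:
            k = math.ceil(d / lag) - 1  # lag class k covers (k*lag, (k+1)*lag]
            bins.setdefault(k, []).append((d, sq_diff))

    result = []
    for k in sorted(bins):
        n = len(bins[k])
        if n >= min_pairs:
            mean_d = sum(d for d, _ in bins[k]) / n
            gamma = sum(sq for _, sq in bins[k]) / (2 * n)
            result.append((mean_d, gamma, n))
    return result


# Hypothetical transect with a linear trend, small enough to check by hand.
transect = [(i, 0, i) for i in range(5)]
print(empirical_semivariogram(transect, lag=1, min_pairs=1))
```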

The mathematical model adjustment should describe the spatial variation to estimate or predict values at unsampled places optimally by Kriging (Oliver & Webster, 2015). Only certain mathematical functions are suitable for this purpose, so, choosing and fitting a model must be done with care (Lark, 2000). We selected the most commonly used theoretical models: spherical, exponential, gaussian, and Matérn’s family (Uribe-Opazo et al., 2012; Isaaks & Srivastava, 1989).

To evaluate the degree of SD of each variable, we used the spatial dependence index (%SDI; Biondi et al., 1994). The %SDI classification of Konopatzki et al. (2012) was adopted: very low for %SDI < 20%; low for 20% ≤ %SDI < 40%; medium for 40% ≤ %SDI < 60%; high for 60% ≤ %SDI < 80%; and very high for %SDI ≥ 80%. This classification has the advantage of having five interpretation levels instead of the three proposed by Cambardella et al. (1994). The classification proposed by Konopatzki et al. (2012) is proportional to the spatial variability (the higher the %SDI, the higher the SD).
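A minimal Python sketch of the five-level classification follows. The formula used for %SDI, the partial sill as a percentage of the sill, is inferred from the worked values in the text (C0 = 1 and C1 = 9 giving 90%) and should be read as an assumption; the function names are hypothetical.

```python
def sdi_percent(nugget, partial_sill):
    """%SDI, assumed here to be the partial sill as a percentage of the sill."""
    return 100 * partial_sill / (nugget + partial_sill)


def classify_sd(index):
    """Five-level classification (Konopatzki et al., 2012) of an SD index in percent."""
    if index < 20:
        return "very low"
    if index < 40:
        return "low"
    if index < 60:
        return "medium"
    if index < 80:
        return "high"
    return "very high"


value = sdi_percent(nugget=1, partial_sill=9)
print(value, classify_sd(value))
```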

Figure 4 shows hypothetical sample points to which the spherical model was fitted by a routine in R. Considering that C0 is 1 and C1 is 9, the associated %SDI is 90%, corresponding to a strong SD. However, all semivariances are in the interval from 7 to 10. In this context, this work presents a new index, the effective spatial dependence index (%ESDI; Eq. 1), a new measure of the degree of SD. This index considers the semivariance (\(\gamma \left(1\right)\)) at the first lag distance (h(1)).

$$\text{\%}ESDI=\frac{C-\gamma \left(1\right)}{C}\text{*}100,$$
(1)

where \(C\) is the sill (nugget effect + partial sill) and \(\gamma \left(1\right)\) is the first semivariance of the semivariogram. The %ESDI was classified using the same classes as %SDI.

The second proposed index is the first semivariance significance index (\(\%\gamma \left(1\right)\); Eq. 2), which expresses the fraction of SD due only to the first semivariance \(\gamma \left(1\right)\).

$$\%\gamma \left(1\right)=\frac{\gamma \left(1\right)-{C}_{0}}{{C}_{1}}*100,$$
(2)

where \({C}_{0}\) is the nugget effect, \({C}_{1}\) is the partial sill, and \(\gamma \left(1\right)\) is the first semivariance of the semivariogram.

Furthermore, we also propose the slope of the model ends index (%SMEI; Eq. 3), which aims to assess the slope between the nugget effect and the last adjusted semivariance. When %SMEI is zero, there is a pure nugget effect, characterizing a lack of SD.

$$\%SMEI=\left(1-\frac{{\gamma }_{Z}\left(0\right)}{{10}^{-10}+{\gamma }_{Z}\left(n\right)}\right)*100=\left(1-\frac{{C}_{0}}{{10}^{-10}+{\gamma }_{Z}\left(n\right)}\right)*100,$$
(3)

where \({\gamma }_{Z}\) is the adjusted theoretical semivariance, \({\gamma }_{Z}\left(0\right)={C}_{0}\) is the nugget effect, and \({\gamma }_{Z}\left(n\right)\) is the last adjusted theoretical semivariance, corresponding to the cutoff. The arbitrary constant \({10}^{-10}\) was included to avoid division by zero.
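Equations 1 to 3 can be evaluated with a few lines of Python. The sketch below uses the hypothetical values C0 = 1 and C1 = 9; the first semivariance of 7, the range of 300 m, and the cutoff of 500 m are assumptions chosen only for illustration, as are the function names. This is not the R code used in ADB-Map.

```python
def spherical(h, nugget, partial_sill, model_range):
    """Spherical semivariogram model, with gamma_Z(0) = C0 as in Eq. 3."""
    if h <= 0:
        return nugget
    if h >= model_range:
        return nugget + partial_sill
    r = h / model_range
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def esdi_percent(sill, gamma1):
    """Eq. 1: effective spatial dependence index."""
    return (sill - gamma1) / sill * 100


def gamma1_percent(gamma1, nugget, partial_sill):
    """Eq. 2: first semivariance significance index."""
    return (gamma1 - nugget) / partial_sill * 100


def smei_percent(nugget, gamma_last):
    """Eq. 3: slope of the model ends index (1e-10 avoids division by zero)."""
    return (1 - nugget / (1e-10 + gamma_last)) * 100


c0, c1 = 1.0, 9.0                    # nugget and partial sill from the example
gamma_1 = 7.0                        # assumed first experimental semivariance
gamma_n = spherical(500.0, c0, c1, model_range=300.0)  # model value at assumed cutoff

print(round(esdi_percent(c0 + c1, gamma_1), 1))
print(round(gamma1_percent(gamma_1, c0, c1), 1))
print(round(smei_percent(c0, gamma_n), 1))
```

With these assumed numbers, %ESDI is far lower than the 90% given by %SDI, which is the situation the new index is meant to expose.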

Fig. 4 Example of semivariogram chart adjusted with the spherical semivariogram model, where \(\gamma \left(1\right)\) is the first semivariance, \({\gamma }_{Z}\) is the adjusted theoretical semivariance, \({\gamma }_{Z}\left(0\right)={C}_{0}\) is the nugget effect, and \({\gamma }_{Z}\left(n\right)\) is the last adjusted theoretical semivariance

Data interpolation

The variables used to generate TMs were interpolated using OK and IDW on a grid with 9 × 9 m pixels. The ADB-Map application automatically sets the pixel size based on the area’s size, as one hundredth of the longest distance (horizontal or vertical). Computational routines were implemented in the R language in the ADB-Map application (Betzek et al., 2019).

Inverse distance weighting

The IDW (Shepard, 1968) deterministic estimator considers the points closest to the location to be estimated more representative than the most distant ones, according to the linear distances to the samples. Twelve different values were used as IDW exponents (p) (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, and 6.0).
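To make the role of the exponent concrete, the sketch below implements a basic IDW estimate in Python and evaluates it for the twelve exponents listed above. It is a generic illustration of the technique, not the ADB-Map routine; the function name, the optional neighbor limit, and the sample values are hypothetical.

```python
import math


def idw(x, y, samples, power=2.0, n_neighbors=None):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xs, ys, value); n_neighbors limits the estimate to the
    closest points when given.
    """
    dists = sorted(
        ((math.hypot(x - xs, y - ys), v) for xs, ys, v in samples),
        key=lambda t: t[0],
    )
    if n_neighbors is not None:
        dists = dists[:n_neighbors]
    if dists[0][0] == 0:
        return dists[0][1]  # estimation point coincides with a sample
    weights = [d ** -power for d, _ in dists]
    return sum(w * v for w, (_, v) in zip(weights, dists)) / sum(weights)


exponents = [0.5 * k for k in range(1, 13)]  # 0.5, 1.0, ..., 6.0
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
for p in exponents:
    # The estimate at x = 0.5 moves toward the nearest sample as p grows.
    print(p, round(idw(0.5, 0.0, samples, power=p), 2))
```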

Ordinary Kriging

Variables’ semivariograms were adjusted using theoretical models (spherical, gaussian, exponential, Matérn 0.5, Matérn 1.0, Matérn 1.5, and Matérn 2.0) by OLS and WLS methods. WLS weights were considered using the same number of pairs in each bin. Twenty-five different parameter sets (five initial values for the partial sill parameter and five for range) were used for each model, totaling 350 adjustments.

Determination of the best semivariogram model and its parameters

Bier and Souza (2017) proposed the interpolation selection index (ISI) to automate the selection of the best interpolation method; the lower the ISI, the better the interpolator. By cross-validation (Faraco et al., 2008; Isaaks & Srivastava, 1989), the mean error (ME) and the standard deviation of the mean error (SDME) are calculated. The ME and SDME values calculated for each parameter set are stored and used to determine the ISI, which compares the deterministic and stochastic interpolation methods and thus identifies the best adjustment for each model analyzed.
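The cross-validation step can be sketched as a leave-one-out loop in Python. Two assumptions should be noted: IDW with a fixed exponent stands in for whichever interpolator is being evaluated, and SDME is taken here as the sample standard deviation of the cross-validation errors, which may differ from the exact definition in the cited works. The ISI formula itself is not reproduced, since it is not stated in this text. All names and data are hypothetical.

```python
import math
import statistics


def idw_predict(x, y, samples, power=2.0):
    """Basic IDW estimate, used here only as an example interpolator."""
    num = den = 0.0
    for xs, ys, v in samples:
        d = math.hypot(x - xs, y - ys)
        if d == 0:
            return v
        w = d ** -power
        num += w * v
        den += w
    return num / den


def loo_errors(samples, predict):
    """Leave-one-out cross-validation errors (observed minus predicted)."""
    errors = []
    for i, (x, y, z) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        errors.append(z - predict(x, y, rest))
    return errors


def me_and_sdme(errors):
    """Mean error and (assumed) standard deviation of the errors."""
    return statistics.mean(errors), statistics.stdev(errors)


samples = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 11.0), (1, 1, 15.0), (2, 1, 14.0)]
errors = loo_errors(samples, idw_predict)
me, sdme = me_and_sdme(errors)
print(f"ME = {me:.3f}, SDME = {sdme:.3f}")
```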

The error comparison index (ECI; Souza et al., 2016) was used to determine the best semivariogram fit for each model \(j\) analyzed; among the stochastic interpolation methods, the lower the ECI, the better the model. The best semivariogram of each model \(j\) was used in the ISI analysis. The reduced mean error (RME) and the standard deviation of the reduced mean error (SDRME) were determined by ordinary Kriging cross-validation.

Computational routines by Betzek et al. (2019) were developed in the statistical software R, using the geoR library and functions implemented directly in the PostgreSQL database, to determine the best interpolator (and its parameters) based on ECI and ISI. These computational routines were reimplemented, optimized, and made available on the ADB platform. In the geostatistics module, seven semivariogram models are tested (spherical, gaussian, exponential, Matérn 0.5, Matérn 1.0, Matérn 1.5, and Matérn 2.0), as well as two statistical methods to optimize the semivariogram adjustment, ordinary least squares (OLS) and weighted least squares (WLS; Cressie, 1985), thus totaling 14 different models. For each model, 25 different parameter sets (five initial values for the partial sill parameter and five for range) are used, totaling 350 different adjustments that are analyzed to find the best one. In the IDW module, a range of values for the exponent (0.5, 1.0, …, n) and a range of values for the number of neighbors (4, 5, …, n) are analyzed. For selecting the best semivariogram model, ISI is used to identify the best value for the exponent and number of neighbors.
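The count of 350 adjustments comes from 7 models × 2 estimation methods × 5 initial partial sills × 5 initial ranges. The Python sketch below enumerates such a grid; how the five initial values are actually chosen in ADB is not stated here, so the evenly spaced values (and all names) are hypothetical.

```python
from itertools import product

MODELS = ["spherical", "gaussian", "exponential",
          "matern_0.5", "matern_1.0", "matern_1.5", "matern_2.0"]
METHODS = ["OLS", "WLS"]


def initial_values(lower, upper, n=5):
    """n evenly spaced starting values in (lower, upper]; a hypothetical choice."""
    step = (upper - lower) / n
    return [lower + step * (i + 1) for i in range(n)]


def build_fit_grid(max_semivariance, cutoff):
    """Enumerate every (model, method, initial partial sill, initial range) set."""
    sills = initial_values(0.0, max_semivariance)
    ranges = initial_values(0.0, cutoff)
    return [
        {"model": m, "method": e, "partial_sill": s, "range": r}
        for m, e, s, r in product(MODELS, METHODS, sills, ranges)
    ]


grid = build_fit_grid(max_semivariance=10.0, cutoff=513.0)
print(len(grid))  # 7 * 2 * 5 * 5 fits
```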

Improving models’ selection using effective spatial dependence (%ESD)

Three problems should be addressed when selecting the best semivariogram:

1. A minimum %ESD should be observed. We propose that %ESDI must be greater than 25%.

2. The selected semivariogram model should have a fraction of SD due only to the first semivariance (%γ(1)) lower than 50%.

3. The slope between the nugget effect and the last adjusted semivariance, estimated by %SMEI, should be greater than 20%. Otherwise, there is an indication of a pure nugget effect.

We propose that the selection of the best interpolator model should not depend only on ISI but also on the criteria presented in Table 2.
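The three thresholds stated above (%ESDI > 25%, %γ(1) < 50%, and %SMEI > 20%) can be applied jointly with a simple check. The Python sketch below is illustrative only; the function names are hypothetical and the index values in the example are assumed.

```python
def failed_criteria(esdi, gamma1, smei):
    """Return the list of criteria that a fitted semivariogram does not meet.

    All three arguments are percentages.
    """
    failed = []
    if not esdi > 25:
        failed.append("%ESDI > 25%")
    if not gamma1 < 50:
        failed.append("%gamma(1) < 50%")
    if not smei > 20:
        failed.append("%SMEI > 20%")
    return failed


def is_acceptable(esdi, gamma1, smei):
    """A model is kept only when all three criteria hold together."""
    return not failed_criteria(esdi, gamma1, smei)


# Assumed index values for one fitted model: good slope, but too much of the
# partial sill is already reached at the first lag.
print(failed_criteria(esdi=30.0, gamma1=66.7, smei=90.0))
```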

Table 2 Criteria to select the best interpolation method

The variable selection process was tested using three methods (Table 3): (i) method 1: best ISI, (ii) method 2 (Fig. 5): the three criteria (Table 2) are applied after geostatistics analysis, (iii) method 3 (Fig. 6): The three criteria are applied during geostatistics analysis.

Table 3 Methods used to select the best interpolation model

The main difference between Methods 2 and 3 lies in when the three criteria are applied. In Method 2, the three criteria are applied to the geostatistical models after the ISI determination step and the indication of the best interpolator (Fig. 5). For each semivariogram model and estimation method (Spherical OLS, Spherical WLS, Exponential OLS, Exponential WLS, etc.), all analyses to estimate semivariogram parameters are considered (5 partial sill intervals * 5 range intervals = 25 analyses). In Method 3 (Fig. 6), a modification was proposed to filter out unsatisfactory geostatistical models before ECI determines the best fit of a semivariogram model. Therefore, when selecting the analyses by ECI, only the models not discarded by the new selection criteria are considered.

Fig. 5 Selection process of the best interpolator between inverse distance weighting and ordinary Kriging by Method 2: the filters using %ESDI, \(\%\gamma \left(1\right)\), and %SMEI were applied after geostatistics analysis

Fig. 6 Selection process of the best interpolator between inverse distance weighting and ordinary Kriging by Method 3: the filters using %ESDI, \(\%\gamma \left(1\right)\), and %SMEI were applied during geostatistics analysis

Selection Methods 2 and 3 can lead to different results. The central aspect of Method 3 is that it allows another ‘fitted model’ to be selected in an interpolator selection analysis. In geostatistical analysis, for each combination of ‘geostatistical model’ (Spherical, Exponential, etc.) vs. ‘estimation method’ (OLS and WLS), 25 ‘fitted models’ (5 partial sill intervals * 5 range intervals) are generated. When the selection criteria are applied by Method 2 and the ‘fitted model’ that was considered the best is eliminated, it is impossible to use another ‘fitted model’ from the same combination of ‘geostatistical model’ vs. ‘estimation method’; in this case, all twenty-five analyses are eliminated. On the other hand, selection by Method 3 makes it possible to use other ‘fitted models’ within the same combination of ‘geostatistical model’ vs. ‘estimation method’.
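The difference in ordering between the two methods can be sketched for a single ‘geostatistical model’ vs. ‘estimation method’ combination. The Python code below is a simplification under stated assumptions: each fitted model is a dictionary with hypothetical keys holding its ECI and the three indices, and the subsequent ISI comparison with IDW is omitted.

```python
def passes(fit):
    """Joint criteria: %ESDI > 25, %gamma(1) < 50, %SMEI > 20."""
    return fit["esdi"] > 25 and fit["gamma1"] < 50 and fit["smei"] > 20


def select_method2(fits):
    """Pick the lowest-ECI fit first, then apply the criteria.

    If the best fit fails, the whole combination is discarded (None).
    """
    best = min(fits, key=lambda f: f["eci"])
    return best if passes(best) else None


def select_method3(fits):
    """Apply the criteria first, then pick the lowest-ECI fit among survivors."""
    kept = [f for f in fits if passes(f)]
    return min(kept, key=lambda f: f["eci"]) if kept else None


# Hypothetical fitted models for one combination (only 2 of the 25 shown).
fits = [
    {"id": 1, "eci": 0.10, "esdi": 10.0, "gamma1": 80.0, "smei": 5.0},
    {"id": 2, "eci": 0.20, "esdi": 40.0, "gamma1": 30.0, "smei": 60.0},
]
print(select_method2(fits))  # best-ECI fit fails the criteria
print(select_method3(fits))  # another fitted model from the same combination
```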

Map’s evaluation

The interpolated maps were compared using the coefficient of relative deviation (CRD) proposed by Coelho et al. (2009). The coefficient expresses the average absolute percent difference between both maps. The choice of a reference map used for comparison is arbitrary. For this study, the map generated by the best interpolator selected by Method 3 was considered the reference for each variable.
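A minimal Python sketch of this comparison is given below, assuming that CRD is computed as the mean of the absolute pixel-wise differences relative to the reference map, expressed in percent, which matches the verbal description above. The function name and the pixel values are hypothetical.

```python
def crd_percent(reference, other):
    """Average absolute percent difference of `other` relative to `reference`.

    Both maps are given as equally long sequences of pixel values; reference
    pixels are assumed to be non-zero.
    """
    if len(reference) != len(other):
        raise ValueError("maps must have the same number of pixels")
    total = sum(abs((o - r) / r) for r, o in zip(reference, other))
    return 100 * total / len(reference)


reference_map = [10.0, 20.0, 30.0, 40.0]   # e.g., map from the reference method
compared_map = [11.0, 18.0, 30.0, 44.0]    # map from another method
print(round(crd_percent(reference_map, compared_map), 2))
```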

Results and discussion

Descriptive statistics

The descriptive analysis of variables (Tables 4, 5, 6) showed that CV varied from 5% (low, pH SMP) to 118% (very high, Al) in field A-2018, from 5% (low, pH SMP and clay) to 123% (very high, aluminum saturation, m%) in field A-2019, and from 4% (low, pH SMP) to 146% (very high, Al) in field B-2015. Variables Al, C, Ca, Cu, Fe, K, Mg, OM, P, pH (CaCl2), pH SMP, V, m%, clay, sand, and silt had points removed as outliers during EDA. Few outliers were found and eliminated, in ten, nine, and twelve variables in fields A-2018, A-2019, and B-2015, respectively. In several cases, variables did not present normality at 5% significance level: (i) field A-2018: Al, Cu, H + Al, K, and P; (ii) field A-2019: Al, m%, P, pH (CaCl2), Zn, and sand; and (iii) field B-2015: Al, C, H + Al, and P.

Table 4 Descriptive statistics of soil attributes in field A-2018 (100 samples)
Table 5 Descriptive statistics of soil attributes in field A-2019 (52 samples)
Table 6 Descriptive statistics of soil attributes in field B-2015 (73 samples)

Selection of the best interpolator model

Method 1

The results of selecting the best interpolator model between IDW and OK using ISI for variables of fields A-2018 (Table 9, Appendix), A-2019 (Table 10, Appendix), and B-2015 (Table 11, Appendix) showed that OK is the best interpolator for 35 variables (9 in field A-2018, 16 in field A-2019, and 10 in field B-2015) and IDW for 15 variables (7 in field A-2018, 3 in field A-2019, and 5 in field B-2015).

During SD analysis, the 50% cutoff limited the range to 513 m (field A-2018), 353 m (field A-2019), and 419 m (field B-2015). Therefore, the corresponding number of lags was twelve (field A-2018), eight (field A-2019), and fourteen (field B-2015), always with a minimum of 30 pairs of points. The first semivariance corresponded to 41 m (field A-2018), 45 m (field A-2019), and 31 m (field B-2015). ISI selected IDW as the best interpolator for (i) field A-2018: H + Al, K, Mn, pH CaCl2, pH SMP, V%, and Zn, (ii) field A-2019: Ca, Cu, K, m%, and SB, and (iii) field B-2015: Ca, Fe, Mg, Mn, and Zn. For the remaining variables, OK was indicated as the best interpolator.

Some variables had their semivariogram models considered unsatisfactory (highlighted in light salmon in Tables 9, 10, 11) because they did not meet the criteria defined in Table 2 (%ESDI > 25%, %γ(1) < 50%, and %SMEI > 20%).

The variables’ spatial dependences (SD, Fig. 7), measured by the traditional %SDI, were classified, on average, as medium (24%), high (20%), and very high (30%). However, using %ESDI (Eq. 1), SD was classified, on average, as medium (22%), high (16%), and very high (12%). This means that the combined share of the high and very high classes dropped from 50 to 28% and that %SDI masks the actual SD.

Fig. 7 Number of variables of each class for %SDI and %ESDI (very low, low, medium, high, and very high) for each field (A-2018, A-2019, and B-2015)

According to the visual inspection of each variable’s semivariogram (Tables 9, 10, 11), there seems to be a lack of adjustment of the model pointed out as the best for some variables in fields A-2018 (K), A-2019 (Al, H + Al, K, m%, pH SMP, and V%), and B-2015 (V%). In other cases, there is an indication of a pure nugget effect in field A-2019 (OM and pH CaCl2) and field B-2015 (Al, Ca, H + Al, P, pH CaCl2, pH SMP, and SB). Clay and silt can also be included in this list (field A-2019). Among the variables with “doubtful” or “pure nugget effect” adjustment, the IDW interpolator was considered the best only for K in fields A-2018 and A-2019 and Ca in field B-2015.

Another aspect observed was that %SDI (Fig. 7) wrongly indicated the presence of strong spatial dependence (high or very high) for some variables in the following areas: (i) field A-2018: K; (ii) field A-2019: Al, H + Al, K, m%, pH SMP, and V%; and (iii) field B-2015: V%. The first semivariance plotted in the semivariograms of these variables shows a high variance of data at the closest distances and that the model was adjusted incorrectly. In these cases, %SDI gives a false impression of having an adequate model with strong spatial dependence.

This kind of problem with semivariogram adjustments is due to the automatic adjustment of the model to the semivariogram made by the geoR package’s routines. The automatic adjustment of models to semivariograms is pointed out in the literature as a notoriously tricky task (Webster & Oliver, 1990; Goovaerts, 1997). Like any method for fitting a variogram model, these routines assume the model’s basic structure in advance and then obtain the optimal coefficients for that predefined structure. Selecting the variogram model and its parameters is the most controversial aspect of geostatistics; the set of valid variogram model shapes is finite, and sometimes the model’s optimal shape cannot be fitted, leading to reduced estimation accuracy (Han et al., 2016). In this sense, this work proposes criteria (using %ESDI, %γ(1), and %SMEI) to improve the semivariogram adjustment process, which are applied through Methods 2 and 3.

Method 2

This method was applied to variables with unsatisfactory semivariogram models (Tables 9, 10, 11). As a result, other semivariogram models were selected for variables in field A-2019 (Al, H + Al, m%, pH CaCl2, m%, and clay). In another case, the IDW interpolator was considered the best for variable SB (field B-2015) (Table 10). It is noteworthy that variables OM and silt, from field A-2019, and C, H + Al, P, pH SMP, and V%, from field B-2015, had all semivariogram models eliminated. In these cases, the IDW interpolator was considered the best one.

Using Method 1, IDW was considered the best interpolator for variable K, in fields A-2018 (Table 9) and A-2019 (Table 10), and for variable Ca, in field B-2015 (Table 11). However, the selection of other semivariogram models was evaluated regardless of whether IDW was identified as the best. This allowed us to verify that another semivariogram model could be selected for variable K, from fields A-2018 and A-2019, and for variable Ca, from field B-2015 (Table 12, Appendix).

It is essential to highlight that the three criteria must be considered together in the semivariogram model selection process. Depending on the semivariogram structure, a wrong model can be selected when the criteria are not applied together (see results in Table 7). This issue was the most important in field A-2019 and the least important in field A-2018.

Table 7 Result of selecting the best interpolator model for ordinary Kriging (OK) with Method 2 using each criterion separately and all together for variables of fields A-2018, A-2019, and B-2015

Method 3

This method, like Method 2, was applied to the variables with unsatisfactory semivariogram models (Tables 9, 10, 11). As a result, some models were eliminated in favor of others. In OM and silt variables, from field A-2019, and in C, H + Al, P, and V% variables, from field B-2015, all geostatistical models were eliminated during the geostatistical analysis (Table 13—Appendix). All other variables had changes in semivariogram parameters in comparison to Method 1.

Other semivariogram models were selected for variables in field A-2019 (Al, m%, pH CaCl2, and clay) and field B-2015 (pH CaCl2, pH SMP, and SB) (Table 12). In other cases, the IDW interpolator was considered the best one: field A-2019 (OM and silt) and field B-2015 (C, H + Al, P, pH SMP, SB, and V%).

Variable V% (field A-2019) kept the model selected by Method 1 (Spherical – OLS or WLS) but with other semivariogram adjusting parameters. In variables H + Al, K, and pH SMP (field A-2019) and Ca, the model selected by Method 1 (Spherical) remained; however, the method of adjusting the semivariogram changed between OLS and WLS. Variables Al, m%, and clay (field A-2019) and Ca and SB (field B-2015) kept the model selected in Method 2. Despite maintaining the models, variables K (field A-2018) and pH SMP (field A-2019) changed the semivariogram adjustment parameters.

As expected, Methods 2 and 3 produced different results. Method 3 allows another ‘fitted model’ to be selected during the geostatistical analysis and, as explained in the Materials and Methods section, is expected to lead to the best interpolator model (IDW or OK).

Comparison of the three methods

When comparing the interpolator selection result for the variables considered with inadequate geostatistical models, it can be noticed that the selected interpolator might change according to the selection method (Table 8).

Table 8 The best interpolation models selected with each of the three methods

For variables OM and silt (field A-2019) and C, H + Al, P, pH SMP, SB, and V% (field B-2015), Method 1 had considered OK the best interpolator, whereas Methods 2 and 3, after applying the selection criteria, considered IDW the best. Most of these variables had all geostatistical models eliminated after applying the selection criteria, except for variables SB and pH SMP (by Method 3) from field B-2015.

Even after the elimination of inappropriate geostatistical models, K (fields A-2018 and A-2019) and Ca (field B-2015) kept IDW as the best interpolator. The other variables, Al, H + Al, m%, pH CaCl2, pH SMP, V%, and clay (field A-2019) and pH CaCl2 (field B-2015), kept OK as the best interpolator, as selected by Method 1. However, other geostatistical models were selected by Methods 2 and 3.

Thematic maps

Thematic maps (TMs; Table 14 in the Appendix) were generated by OK, using the semivariogram selected by each of the three methods, and by IDW, using its best interpolator model. The variables are the same as in Table 8. The interpolator selected with Method 3 was considered the best.
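As a reminder of how an IDW estimate is formed at one grid position, the following minimal sketch weights each sample by the inverse of its distance raised to a power. The function name `idw_estimate`, the default power of 2, and the sample coordinates are assumptions for illustration; they are not the settings used in this study.

```python
import math


def idw_estimate(samples, x, y, power=2.0):
    """Estimate a value at (x, y) from (xi, yi, value) samples by IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            # The position coincides with a sampled point.
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total


# Two hypothetical samples on a line; the estimate at x = 0.5 is pulled
# toward the nearer sample, and more strongly so with a higher power.
points = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_estimate(points, 0.5, 0.0, power=2.0))
print(idw_estimate(points, 0.5, 0.0, power=4.0))
```

Raising the power reduces the influence of the most distant points, which is why the second estimate lies closer to the value of the nearer sample.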

Using CRD to compare the maps generated by the interpolator selected by Method 3 (IDW or OK) versus the best semivariogram model indicated by Method 1 (Fig. 8), it can be seen that:

  • the selection of other interpolator parameters can result in large differences among the maps. In variable Al, from area A-2019, the best interpolator model, selected by Method 3 (Matérn 2—OLS), deviated by 64% from the map selected by Method 1 (Spherical—WLS);

  • the difference was below 5% in eight variables;

  • the difference was from 5 to 10% in seven variables;

  • the difference was over 10% in four variables.

Fig. 8 The coefficient of relative deviation (CRD) between the interpolator selected by Method 3 (IDW or OK) versus the best semivariogram model indicated by Method 1
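The map comparison can be illustrated with a short sketch. It assumes the commonly used definition of the CRD, the mean absolute relative difference between a map and a reference map at the same grid cells, expressed in percent; the function names `crd` and `crd_class` and the sample values are hypothetical, and the class boundaries simply mirror the ranges reported in the text.

```python
def crd(map_values, reference_values):
    """Coefficient of relative deviation between two maps, in percent.

    Both sequences hold interpolated values at the same grid cells; the
    second one is treated as the reference map.
    """
    if len(map_values) != len(reference_values):
        raise ValueError("maps must cover the same grid cells")
    if not reference_values:
        raise ValueError("maps must not be empty")
    if any(r == 0 for r in reference_values):
        raise ValueError("reference map contains zeros; CRD is undefined")
    total = sum(abs((p - r) / r) for p, r in zip(map_values, reference_values))
    return 100.0 * total / len(reference_values)


def crd_class(value):
    """Bin a CRD value into the ranges used in the text."""
    if value < 5.0:
        return "below 5%"
    if value <= 10.0:
        return "5 to 10%"
    return "over 10%"


# Hypothetical three-cell maps: relative deviations of 10%, 10%, and 0%.
reference_map = [10.0, 20.0, 40.0]
candidate_map = [11.0, 18.0, 40.0]
deviation = crd(candidate_map, reference_map)
print(round(deviation, 2), crd_class(deviation))
```

Because the deviation is relative to the reference map, swapping the two arguments generally gives a different value, so the reference should be fixed before maps are compared.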

When comparing the maps generated by the interpolator selected by Method 3 (IDW or OK) versus the best semivariogram model indicated by Method 2 (Fig. 9), it can be seen that:

  • The most significant difference was observed in variable K (field A-2018; 18%);

  • The difference was below 5% in ten variables;

  • The difference was between 5% and 10% in one variable.

Fig. 9 The coefficient of relative deviation (CRD) between the interpolator selected by Method 3 (IDW or OK) versus the best semivariogram model indicated by Method 2

Our study analyzed 50 cases; IDW outperformed OK in 23 of them and, consequently, OK was better than IDW in 27. These results confirm those of Mueller et al. (2004): for sample datasets whose semivariograms did not indicate spatial structure, IDW was a better choice than OK with a nugget model.

Work contribution

Thematic maps in precision agriculture allow identifying the spatial distribution of geographical attributes, soil, and plant productivity (Bazzi et al., 2015). Estimating values for unsampled regions is important to reduce costs with laboratory analysis. More accurate estimates of the interpolated positions contribute to the correct interpretation of the analyzed phenomena, helping the producer in decision-making.

Several precision agriculture applications are available to farmers. However, existing software for creating TMs was developed for generic data handling rather than specifically for precision agriculture (Whelan & Taylor, 2013). Choosing a tool not dedicated to precision agriculture can be challenging (Borges et al., 2020).

As the ADB platform is geared towards precision agriculture, it provides the tools needed to create TMs without depending on several software packages. It allows users with no specific skills to obtain analysis results without getting involved in too many process details, while also allowing experienced users to choose the analysis settings. In its default configuration, the automated routine for interpolator selection calculates 398 deterministic and stochastic models and selects the best among them by ISI. Therefore, this work contributed to improving data interpolation by eliminating the possibility of the automatic selection process choosing a wrong model, resulting in more accurate estimates for the data set.

Characterizing the spatial variability of the soil’s chemical and physical attributes with greater precision allows, for example, the creation of variable-rate prescription maps for fertilizers and for soil and plant correctives. This may optimize the use of fertilizers and other inputs.

Conclusion

Including the three criteria, (i) effective spatial dependence index (%ESDI) > 25%, (ii) first semivariance significance index (\(\%\gamma \left(1\right)\)) < 50%, and (iii) slope of the model ends index (%SMEI) > 20%, improved the selection of the best interpolator compared with using only the interpolator selection index (ISI; Bier and Souza, 2017).

The influence of the methodology on the selection of the best interpolator for the studied thematic maps was compared using three methods: (i) Method 1, best ISI; (ii) Method 2, the three criteria applied after the geostatistical analysis; and (iii) Method 3, the three criteria applied during the geostatistical analysis. Method 3 proved to be the best approach. The coefficient of relative deviation (CRD) varied from 0.1 to 64% when comparing the maps generated by the three methods.

The newly proposed measurement of the effective spatial dependence index (ESDI) of a semivariogram showed better performance than the usual spatial dependence index (%SDI) widely adopted in the literature.

With the methods implemented in the ADB platform, farmers and researchers working in precision agriculture will have a free tool for carrying out analyses in situations where it is difficult to create adequate geostatistical models for thematic map creation.