Keywords

1 Introduction

Increasing efficiency in agricultural production is instrumental to developing both intensive and subsistence agriculture towards the goals of sustainability set by future challenges to primary production for human and animal consumption.

The 2030 Agenda for Sustainable Development of the United Nations issued 17 Goals regarding Sustainable Development (United Nations General Assembly 2015), among which the reduction of hunger and poverty in a world of growing population, while protecting the environment and pursuing social equity and innovation. The ensuing specific general target for agriculture is summarized as “sustainable intensification” of crop production, as set since 2009 among the strategic objectives of the United Nations Food and Agriculture Organization (FAO 2009).

The definition of sustainable intensification implies unit yield increases without an increase—or with a decrease—of adverse environmental impact of cropping techniques (Royal Society 2009). The compatibility of agricultural intensification with sustainability has been questioned (see discussion in Pretty and Bharucha 2014; Struik and Kuyper 2017) to the point of defining sustainable intensification an oxymoron. Pretty and Bharicha (2014) have highlighted the many facets and ambiguities of even the single terms intensification and sustainability. Nevertheless, evidence of shifts to a lower input—higher yield combination exist (Alromeed et al. 2015). Struik and Kuyper (2017) propose that sustainable intensification may be pursued through the “de-intensification” of intensive systems in order to make them more sustainable and the intensification of systems where yield gaps are found so they become more productive.

The issue is more complex than this framework, though. For instance, the increase in inputs in low-yielding conditions may or may not result in a higher yield and therefore may increase or decrease sustainability according to the ability to identify causes of poor yield and apply the appropriate means (Oliver et al. 2010).

In all cases the key to sustainable intensification is increasing resource efficiency and therefore reducing environmental impact per unit surface or per unit yield. To this end, Precision agriculture (PA) is one of the most promising approaches. The core of PA is to detect and manage the spatial variation of crop performance within fields, as opposed to uniform management. Sound agronomy and geospatial technology play a key role in PA, for within-field resource analysis and optimization. Oliver et al. (2010) make a good analysis of the issue by presenting nine case studies related to Precision agriculture, and namely to reduction of inefficiencies linked to the simultaneous presence of high- and low-yield zones in the same field. Authors remark that farmer’s knowledge of their own fields allow them to map and rank areas of different yield performance in a way comparable to technological means; the reasons of poor performance, though, are not correctly identified in many cases and therefore farmers may apply additional fertilizer or other inputs in poor yielding areas and not obtain improvements which justify the extra amount of resources. This way inefficiencies increase rather than decreasing. Authors show how only finding the soil spatial constraints allows to diagnose the causes of poor yield and to plan alternatives for the correct management decisions.

The role of geospatial techniques in such a framework is increasingly recognized and can be key in discriminating cases where poor yielding potential needs to be addressed with low inputs to reduce waste of resources, from other instances where constraints to yield may be removed and an increase in inputs will result in higher yields. Geophysical techniques are especially useful to this end since they allow to map soil features such as texture and impeding layers in depth, while spectroscopic methods are limited to soil surface exploration but allow to detect additional chemical soil properties.

The task is challenging, though, due to complex interactions between permanent and variable properties of soils, crop behavior, weather, climate, and anthropic dynamics. This implies that techniques for the detection of the spatial variability of relevant properties in the system need to be combined with approaches to data treatment and analysis, and with support tools for decision making.

This chapter addresses some open issues and presents approaches to data treatment and interpretation based on the joint analysis of soil and plant spatial behavior.

2 Using Geospatial Techniques for Decision Making in Agriculture

While farmers’ knowledge of field variability has always empirically guided choices to some extent, the wide availability of positioning systems for farm machines has made it possible to implement a spatially-aware form of agriculture which has two main broad fields of application: tractor guidance systems and Precision agriculture.

Tractor guidance systems include the early satellite-assisted driving and the latest automated or self-driving devices, all aimed at reducing overlapping and gaps in farming operations and therefore reduce time, soil compaction and the consumption of fuel, seed, fertilizers and other inputs. Such savings may improve resource efficiency (e.g. 20% in Kharel et al. 2020) and reduce costs and impacts on the environment.

Precision agriculture is a system where the management of crops is not uniform within a field, but site-specific according to the different needs of areas with different characteristics relevant to plant production and to the impact of agronomic techniques on the environment.

Spatial techniques for PA are aimed at two main objectives:

  • detecting and mapping the spatial variation of crop behavior and environmental factors relevant to crop production and impact

  • guiding the differential application in space of agricultural inputs (fertilizers, water, crop protection, …). This is also called variable-rate application.

The earliest schemes of spatialization in Precision agriculture were such as depicted in Fig. 4.1. Classic spatial data collection consists in acquiring georeferenced plant data during the vegetative growth of crops and/or harvest, and produce maps trough spatial statistics methods. Data analysis and application of agronomic know-how are then integrated in a further step to produce geo-referenced instructions for the site-specific management of crops in subsequent years.

Fig. 4.1
A schematic of the agriculture cycle consists of maps, identification of management zone, production of prescription maps, variable rate treatments, and crop data collection at various stages. Variable-rate treatments consist of irrigation and fertilization. Crop data collection is down by drones and vehicles sending data to satellites.

Architecture of the early Precision agriculture cycle

Crop and soil spatial data, and possibly spatial data on meteorological information are used alone or in combination to make decisions for the differential management of areas within a field. Choices may be tactic or strategic.

Tactic decisions are taken at or slightly before the time of input application (“on the go”), and are typically based on one single layer of spatial data—e.g. from a sensor of crop status—which drives choices about the amount of inputs needed to correct a deficit or a stress of plants in a given field area. An example may be irrigation or nitrogen fertilization given in amounts which vary as the machine proceeds in the field with a variable-rate irrigation or fertilization device driven by an electromagnetic sensor of plant water or nitrogen status. A pro of this approach is that in principle it accounts for the actual space and time variability of crops (Casa et al. 2017). A big problem, tough, is that translating information on plant conditions in a management decision is hardly appropriate with on-the-go decisions based on a single sensor.

This depends on three main reasons:

  • one indicator is often not enough to characterize the status of plants.

  • even when the condition of a crop is correctly assessed the question is whether poor performance needs to be addressed with higher or lower inputs. As mentioned in the introduction this depends on what the reasons of bad plant growth, status or yield are. If soil depth or texture in a given area is insufficient for holding enough water, for instance, more irrigation will not improve water stress but it will result in more water deep percolation and/or runoff and therefore more waste of this precious resource without improving yields. The combination of previous spatial knowledge (e.g. soil texture map) with instant measurements may be a way of improving information relevant for management choices. In this case, though, decisions may be less immediate and less continuous (e.g. they will depend on previous zonation to some extent).

  • choices in agriculture are made with a target of amount and quality of produced goods, and responses to inputs are often the result of interactions with other factors. For example full nitrogen nutrition may induce excessive early growth of plants, with over-use of water during vegetative stages and this may leave insufficient water in the soil for the reproductive stage. Therefore decisions mediated by agronomic knowledge may be better than automatic choices.

The latter remark introduces a common issue to tactic and strategic approaches: translating sensor data into agronomic prescriptions remains a bottleneck, and future improvements may include a stronger incorporation of agronomic know-how into spatial approaches or vice-versa. At present, spatial data are increasingly used within agronomic tools:

  • as an added dimension for decisions based on classical agronomic approaches like the water or nutrient balance.

  • jointly with simulation models of plant growth and production in order to predict the spatial variability of crop performance under different conditions.

  • within “decision support systems” where spatial information is used with other tools such as weather analyzers or generators, crop models, market analysis, simulators of environmental impact and more.

This way agronomists produce various possible scenarios of uniform or spatially-aware management in different meteorological conditions, price or regulations framework and thus help decisions makers—from farmers to politicians—analyze pros and cons of each choice (Ritchie and Amato 1990; Alromeed et al. 2015).

In strategic approaches choices are based on a spatial tool called “prescription map”: a set of geo-referenced instructions for the site-specific management of crops. Prescription maps may be issued for one or several inputs (such as irrigation, fertilization, pest control, …) and they may be different for each type and time of application. Prescription maps are based on the structure of variation of relevant plant and terrain attributes and may follow their continuous variation or adopt a division in areas. Such areas are called “uniform management zones” (MZ) and spatial techniques for identifying them are the object of research, regarding both data acquisition methods and data treatment and criteria.

3 Spatial Techniques in Agriculture: Data Acquisition

Although farmers’ knowledge appear to be effective in identifying zones of different crop behavior, the causes of such differences need to be identified before MZs can be established, and they often reside in soil attributes. Among properties to be mapped for Precision agriculture, Nawar et al. (2017) list farmers’ knowledge, morphology, pedology, soil chemistry, yield and vegetation indices, soil properties from proximal or remote sensors.

3.1 Crop Spatial Data

Sensors for crop behavior may be classified based on several criteria. The most important destructive methods are those used for yield mapping, based on weight measurement upon harvest. Many spatial sensors, though, provide non-destructive mapping of crop behavior, where calibration and ground-truthing may be necessary and spatial resolutions vary widely. Technologies range from proximal to remote and span across a wide array of scales.

The most widespread methods are based on transmittance or reflectance of radiation from crop canopies at different wavelengths used alone or in combination:

  • In the visible (VIS 0.4–0.7 μm) and near-infrared (NIR, 0.7–1.3 μm)

  • In other infrared regions such as SWIR, (1.3–2.5 μm) or thermal (TIR 7.0–20.0 μm).

Fluorescence spectroscopy (at 0.68 and 0.74 μm wavelengths) is also increasingly used.

Such methods rely on different physical phenomena interfering with reflection and transmission of radiation. Some are based on known interactions of crop physiology with physical processes or on behaviors under study. The two most exploited are based on distinctive features of plants, and specifically the regulation of chlorophyll concentration and stomata opening in response to stress:

  1. a.

    Plant water status or nitrogen nutrition status are linked to the concentration of a pigment unique to healthy plants: the chlorophyll. The turnover of chlorophyll in leaves is fast therefore environmental or management problems are reflected in changes of chlorophyll levels within a short time. Chlorophyll exhibits a distinctive behavior with regard to solar radiation, and spectroscopy uses the unique feature of chlorophyll of absorbing at the wavelength corresponding to the VIS red color while exhibiting a low absorption (therefore a high reflection and transmission) at the near infrared wavelengths. Measuring reflectance or transmittance of solar radiation in the VIS red and in the NIR has given rise to a few spatial vegetation indices used for the mapping of plant cover and stress and of nitrogen status. The most used are: NDVI = (NIR – VIS-RED)/(NIR + VIS-RED) and Chlorophyll Content Index = NIR/VIS RED. Where NIR = reflectance or transmittance of radiation in the near infrared; VIS RED = reflectance or transmittance of radiation at wavelength/s corresponding to the red color. Each sensor may vary in the specific wavelength selected.

  2. b.

    Stomata are openings on leaf surfaces which allow the exchange of water and carbon dioxide between plants and the atmosphere. Plants are able to finely regulate the status of such openings: a short term response of plants to water stress (or few other stress conditions) is the partial or total closure of stomata which causes a drastic reduction of water loss. Under such circumstances crop systems cannot rely on their main thermoregulation system, that is water passing from liquid to vapour at the plant-atmosphere interface employing latent heat. The consequent increase in temperature of the crop surfaces is therefore an index of stress.

Other methods include x-ray, laser or ultrasound imaging aimed at mapping crop height or biomass cover.

Sensors of crop status are the object of a large body of literature and were recently reviewed by Galieni et al. (2021).

3.2 Detection and Mapping Techniques for Agricultural Soils

3.2.1 Destructive Soil Sampling

Earlier approaches to soil measurement for addressing the spatial variability of crop performance were based on direct measurements of soil properties such as texture, pH, salinity, content of nutrients and possibly hydrologic constants. With this method sample collection is destructive and provides point information. Sampling density is typically low therefore interpolation accuracy is necessary. A great effort in improving interpolation has been devoted to this task in the past years (Nawar et al. 2017).

If methods of interpolation are aimed at assessing the spatial dependency of data with geostatistics, at least 100 data points are needed (Webster and Oliver 2007). This number of samples is small for recent sensor data, but large for traditional methods which are destructive and labor intensive. Traditional methods of point sampling of soil coupled with geostatistics have therefore been applied in research-oriented field campaigns (e.g. Castrignanò et al. 2020) but may not be proposed to farmers in regular practice.

An alternative approach is to apply stratified sampling (also called “targeted sampling” or “surface-response sampling”) where a small number of field regions is identified based on plant behavior, and only one or few samples per region are collected. Ritchie and Amato (1990) identified zones of different crop performance in a maize field on the basis of aerial photographs and plant height, and were able to characterize differences in soil properties by sampling five soil profiles only. Oliver et al. (2010) used targeted soil sampling on the basis of zones of different crop performance as established from farmers’ knowledge and vegetation indices in nine different case studies in Australia. They identified different types of soil constraints in areas of poor plant performance, and this led to envisage different farming strategies on the basis of crop modelling scenarios: where it was possible to remove constraints then yields could be increased and a matching high input of resources was beneficial. Where limiting soil factors could not be overcome, though, potential yields stayed low and the best strategy was to reduce inputs accordingly.

3.2.2 Geospatial Techniques for Agricultural Soils

Geospatial sensor techniques result in a revolution in approaches to the variability of agricultural soils. They provide non-destructive and almost continuous high spatial sampling resolution (e.g. >1500–2000 readings per ha in Mouazen et al. 2007) of electromagnetic soil behavior or other physical properties which are proxies of soil properties relevant to crop behavior. This allows fast zonation but the relations between sensor data and soil properties required for field management need to be established.

As a consequence, after geospatial sensor techniques were introduced in soil science research has focused somewhat less on interpolation methods and more on calibration (Casa et al. 2013).

Based on contact, methods may be remote or proximal, whereas based on sensor type techniques may be classified in two broad categories: spectroscopic: based on radiation just like plant sensors, and geophysical, based on electromagnetic behavior of soil components (Amato and Priori 2020).

The most used spectroscopic methods for agriculture include gamma and VIS–NIR sensors, increasingly used in multispectral and hyperspectral mode to detect different features of the soil surface. Spectroscopes may be remote or proximal. Pros of spectroscopy used in remote mode are the possibility to map large soil surface areas rapidly and at low cost. Problems include the need to measure when/where the soil is bare and the fact that measurements refer to the surface soil layer/s only.

Geophysical methods cannot be used remotely since they rely on proximal or contact sensors, but they have the great advantage of allowing measurements at surface and deep soil layers. This is an important feature for applications in agriculture, since the soil conditions beyond the surface horizons have been identified as crucial for identifying the causes of different crop behavior at the field scale and detecting constraints to agricultural production (Oliver et al. 2010). Therefore geophysical mapping improve the feasibility of precision farming and represent a revolution in the Precision agriculture cycle, where soil mapping can be effectively incorporated in the decision process. Geophysical surveying methods may be broadly classified as those making use of natural Earth fields and methods which need the application of artificially generated energy (Bitella et al. 2015).

Spectroscopic and geophysical methods are able to detect different soil properties, though, and the most informative strategy is a combined use of techniques in order to exploit the characteristics of each sensor.

Example of spectroscopic methods in agriculture-related uses (Selige et al. 2006; Casa et al. 2019; Priori et al. 2016) include gamma-ray spectroscopy measuring the spectrum of gamma-rays emitted by the surface 20–30 cm of soil, which are useful to map soil texture, mineralogy, stoniness and carbonate content. Spectroscopes measuring reflectance in VIS–NIR are used in multispectral or hyperspectral mode to detect texture, organic carbon, calcium carbonates, nitrogen, and water in soils. This kind of fast and low-cost mapping can help delineate Management zones based on soil fertility as reviewed by Nawar et al. (2017) and be of great use to quantify and monitor the effect of agricultural management on environmental variables. As an example, Priori et al. (2016) propose a combination of spectroscopic methods to map soil surface (0–30 cm) carbon stock with a limited number of destructive measurements. They used passive gamma-ray (“The Mole sensor”—Medusa Systems-The Netherlands) in proximal mode to map topsoil spatial variability, and lab VIS–NIR to estimate the carbon stock of fine earth at several points, based on existing spectral libraries. Calibration was then performed on few destructive measurements, at the average density of one sample per hectare. Gamma-ray data and elevation were used for interpolation through geostatistical methods and corrections of VIS–NIR carbon estimation based on gamma-ray mapped stoniness.

Geophysical methods in agricultural production and related environmental issues have a wide range of uses from texture mapping to monitoring soil water and studying plant roots. Many works point at them as an invaluable tool for precision farming (King et al. 2005; Alromeed et al. 2015; Rossi et al. 2015). Methods under study range from radar to seismic (Bitella et al. 2015). But the most widespread technology for agriculture is related to the measurement of electric resistivity (ER) or its inverse bulk electric conductivity (Ec) with galvanic or electro-magnetic induction devices. Principles, advantages and disadvantages have been reviewed in recent years (Samouelian et al. 2005; Bitella et al. 2015; Romero Ruiz et al. 2018).

Their success in agriculture is based:

  • on the sensitivity of electrical resistivity/conductivity to the electrical properties of soil materials, which allow not only to discriminate minerals, but also to detect and quantify water, salt concentration, porosity, and resistive materials such as plant structures (Bitella et al. 2015; Amato et al. 2012).

  • on the possibility to use them statically (galvanic) or on-the-go (galvanic and electro-magnetic induction) on bare or vegetated surfaces, and thus achieve a fast coverage of agricultural and natural fields.

One of the first and classical uses is the mapping of field soil salinity, an important constraint for crops (e.g. Corwin et al. 2003).

Regarding more specific applications, Basso et al. (2010) show a static application of geo-electrical methods for imaging and quantification of the effects of soil tillage methods on soil porosity in layers relevant for plant root growth. A series of works show field and container measurements based on electric resistivity tomography where plant root mass density could be mapped under trees (Amato et al. 2008; Rossi et al. 2010) and to a lower extent under herbaceous crops (Amato et al. 2009). Some examples are shown in Fig. 4.2.

Fig. 4.2
A two-part schematic labeled a and b. Part a is vertical E R topography of minimum tillage, conventional tillage, and no-tillage along with a scale. High porosity areas are marked on the conventional tillage. In part b, a graph plots the E R and L n on the top and a E R tomography volume reconstruction of a citrus orchard root at the bottom.

Examples of applications of static geo-electrical methods in plant-soil systems: a vertical ER tomography in three different tillage systems (redrawn from data in Basso et al. 2010); b results of ER tomography measurements for plant root studies. Top: relation between ER and root biomass density under Alnus glutinosa (L.) trees. Bottom: volume reconstruction of root in a Citrus orchard with 3-D ER tomography (redrawn from data in Rossi et al. 2010)

4 Geospatial Techniques in Agriculture: Data Treatment and Management zones

Identifying Management zones is still one of the major open issues in precision farming. In many instances, the Precision agriculture cycle of Fig. 4.1 still the prevalent approach, where plant sensor data are the only zonation criterion. Problems with this scheme include that:

  • the spatial pattern of plant behavior is variable in time, due to interactions of different soil and terrain features with weather (e.g. Machado et al. 2002). Therefore proper assessment of stable Management zones needs a multi-year analysis (McBratney et al. 2005).

  • crop behavior is the resultant of many factors. A scheme where the spatial pattern of plant behavior is established but causes are not inquired is based on the stability in time of zones of good and poor yield and not on the identification of limiting factors. This may result in wrong management decisions (Oliver et al. 2010).

What changes the perspective is to use terrain attributes relevant to plant production as well, and namely soil properties and topography. This allows to address soil- and morphology-based sources of variability in the field such as soil constraints or different soil texture in selected areas and therefore to identify many of the causes of yield variability and address them appropriately. Also, such sources of variability are often permanent, and this allows to identify stable Management zones in the field in one go or coupled with a reduced number of years of crop performance observation compared with plant-based observations alone.

4.1 Classical Criteria for Identifying Management zones

Management zones are ideally regions where it is appropriate to apply a specific crop input uniformly, but with a rate that is different from that of a different neighboring region; MZs need to be stable in time in order to be used for planning. This definition implies that within MZs the combination of yield-limiting factors is relatively homogeneous, so that optimal use of resources may be pursued with a uniform application of each crop input (Vrindts et al. 2005)  but different from that needed in surrounding MZs. In a simpler way Haghverdi et al. (2015) define MZs as subregions of a field that are homogeneous with respect to soil-landscape attributes. 

Management zones may be identified based on one or more layers of spatial data and may include logistic criteria linked to farm operation, such as the dimension of machines and input application devices or time and economic constraints.

Nawar et al. (2017) review common approaches to MZ and his conceptual framework of classical MZ research, after the collection of spatial data lists the following steps:

  1. a.

    identifying homogeneous areas

  2. b.

    finding the optimal number of classes

  3. c.

    establishing MZs and evaluating the effectiveness of classification.

The individuation of homogeneous areas is considered the most challenging step since it requires choices as to the definition of zone boundary (Nawar et al. 2017). Also, given the many interactions of factors determining crop yield and environmental impact, multiple data layers are often collected and the relevance of each data layer needs to be assessed, in addition to using techniques for data merging, fusion and multivariate analysis.

Research focuses on:

  • statistical techniques to identify the most relevant data layers and their association, such as principal component analysis

  • methods to group data layers, like the calculation of indices where single soil or crop properties may bear different weights

  • clustering techniques including machine learning with parameters set by users or found through fuzzy/neural network methods.

The choice of technique for this step will then imply methods and indicators for finding the optimal number of classes.

The use of multiple data layers often results in improved effectiveness of MZ individuation compared to one layer only (Nawaret al. 2017), although costs are higher. More specifically, it is the joint use of soil and crop data that makes a difference rather than multiple information on crop only or soil only.

Assessing the effectiveness of classification may be performed on the basis of statistical criteria (e.g. the comparison of variability within and between MZs) or by analyzing the performance of uniform management versus variable-rate management with MZs obtained with one or more criteria. The most interesting methods include the production of scenarios through joint use of spatial data and crop modelling, and cost–benefit analyses.

4.1.1 Joint Use of Spatial Data and Crop Modelling for Scenarios

Among pioneers of this approach Ritchie and Amato (1990) used aerial photos of a maize field during water shortage in Michigan to identify areas of different sensitivity to water stress. Five different zones were thus mapped, where plant biometrics and yield as well as soil properties were measured, and differential crop behavior was explained in terms of spatial variability in soil available water in the profile. Management options for this field were then compared based on scenarios produced with the CERES-MAIZE crop model (Jones and Kinry 1986) and a weather generator: yield and use of irrigation water were simulated for 30 years of weather for southern Michigan. Irrigation strategies ranged from uniform watering of the whole field with different criteria, to PA with differential irrigation where timing and amounts of irrigation events were scheduled according to water retention characteristics of each of the five zones. Differential irrigation was the only strategy which allowed to reach the maximum yield and use the lowest amount of irrigation water in all of the five zones.

Oliver et al. (2010) propose the use of crop sensor data or farmers’ knowledge to identify areas of different crop behavior, followed by soil sampling to identify soil constraints, and crop modelling for comparing scenarios as a basis for management decisions.

Alromeed et al. (2015, 2019): used electrical soil mapping and irrigation-oriented modelling for precision irrigation planning. An Automatic Resistivity Profiler (ARP © Geocarta—Paris) was used to obtain resistivity maps at the depths of 50, 100, 200 cm in Southern Italy. The map of electrical resistivity was used for sampling soil at a limited number of sites where soil texture was measured and then translated into total available water (TAW) calculated with the Saxton and Rawls (2006) pedotransfer corrected for gravel. Values of TAW were then applied to the ER map and used as an input for the ISAREG model (Teixeira and Pereira 1992) applied to each of six soil areas within the field. Increasing ER corresponded to increasing coarse soil fraction content and decreasing TAW, which ranged between 216 and 121 mm with respect to the whole 200 cm profile and between 120 and 66 mm over 100 cm. Daily weather data for 15 years (1999–2013) were used for simulations comparing uniform and differential irrigation with different criteria on 6 crop types. Differences in irrigation requirements between soil zones identified with ER mapping were 10–44% and varied with irrigation strategy. Differential irrigation of each area according to its own TAW up to 100 cm allowed to save an average of 20% of water without yield losses compared to uniform irrigation using average TAW.

4.1.2 Cost–Benefit Analysis

Nawar et al. (2017) review the use of economic criteria for farm-scale comparison of uniform and variable-rate agriculture, and show that precision farming allows to increase yield and/or allow savings of inputs with an overall favourable economic balance. The profitability of precision versus uniform management depends, of course, on the degree of variability within a field, but reviewed yield increases range from 1 to 10% and resource savings from 4 to 46% due to precision application of fertilizers. The overall net return is always higher with precision farming. A case study is also presented on nitrogen fertilization of cereals in the UK where different approaches to Management zones delineation for variable-rate farming are compared. The study shows that MZ delineation based on soil data only is less profitable than MZ identification obtained through soil and crop data.

Farmer’s cost–benefit analyses, though, cannot be considered a complete account of economic benefits of a given agricultural management approach, since costs or revenues linked to environmental impacts or ecosystem services should also be taken into account. They would show that Precision agriculture is even more economically viable than uniform management since a lower use of agricultural inputs implies a lower waste of resources and pollution potential.

4.2 Using Soil–Plant Spatial Relations to Identify Management zones

In classical clustering procedures different data layers may have different weights in statistical treatment but their relations in view of agronomic criteria are not usually studied. A different approach is found in Rossi et al. (2015, 2018), Pollice et al. (2019), who used the relation between soil and crop data for preliminary zonation before applying other criteria as summarized in the following paragraphs.

In a study on differential irrigation in an alfalfa field in southern Italy (Rossi et al. 2015) electrical resistivity mapping was performed with automatic profiling (ARP © Geocarta Paris) at three layers: 50, 100 and 200 cm of depth from the soil surface (Fig. 4.3).

Fig. 4.3
Six maps placed in two rows. In the top row, the maps are E R of 0.5 meters, E R of 2 meters, and N D V I of date 1. In the bottom row, E R of 1 meter, digital elevation model and N D V I of date 2.

Electrical resistivity maps up to three depths, elevation model and NDVI measured at 2 dates in a 7-ha alfalfa field in southern Italy. Redrawn from data in Rossi et al. (2015)

Values ranged between 3.7 and 64 Ω m with definite spatial variability. Surface-response sampling was used in order to choose 6 positions in the field corresponding to different ER values spanning across the whole range of values. Traditional soil profile studies and soil lab analyses were performed at the 6 positions, and ER was shown to be a proxy of soil texture, being sensitive to clay and sand, just as shown in many other instances in the literature.

The crop biomass was also sampled at the same sites while vegetation cover and vigor were mapped with proximal VIS–NIR methods for the calculation of NDVI at 4 dates corresponding to different phenological stages of alfalfa (Rossi et al. 2018). Examples of NDVI at 2 dates are mapped in Fig. 4.3.

Both biomass and the NDVI index were strongly correlated with ER. Alfalfa is a forage crop and the whole plant is fed to livestock, hence biomass coincides with crop yield. Therefore in this study the spatial variation in yield was well predicted by ER maps. The best relationship (statistically strongest) was found between vegetation indices and ER at the deepest measured layer (up to 200 cm) and this was explained by pointing out that alfalfa is a perennial crop with a deep root, therefore sensitive to soil changes at depth (Rossi et al. 2015).

The soil-crop relationship from such data was studied using generalized additive models (Rossi et al. 2015, 2018; Pollice et al. 2019) and showed a complex behavior, which is depicted in Fig. 4.4 where a function of the crop index NDVI and field slope is plotted against ER of the deepest layer.

Fig. 4.4
A schematic of the A 1, A 2, A 3, A 4, A 5, and A 6 soil profiles marked from an E R map. A graph on the left plots the s versus v 3 giving A 1, A 2, A 3 at low and A 4, A 5, and A 6 at high. A 5 and A 4 are labeled as hadpans. Another stacked row graph plots the composition of the sand silt and clay of the six soil profiles.

Soil-crop relationship function, soil profiles and fine earth granulometric composition in different regions of the ER map up to 200 cm. Redrawn from data in Rossi et al. (2015, 2018)

This function may be divided in three distinct regions, each with a different soil-crop relation:

Region I: low ER (<12–15 Ω m). This region of the relationship corresponds to the dark blue areas on the soil ER map. Here the crop behavior is not a strong function of ER: the wide gray zone around the function line corresponds to a wide confidence interval due to erratic crop response. Also the inversion of the slope in the soil-crop relation points to problem areas. The study of the soil profile in one of these zones shows that the landscape position and a high clay content result in a hydromorphic soil profile (A1 in Fig. 4.4) indicating waterlogging. This explains the erratic crop behavior: the area may be productive at times of low precipitation but less productive in periods of high rainfall when the soil will hold too much water and impair root physiology and production.

Region II Intermediate ER (approximately between 12 and 25 Ω m) Most of the field falls in this relatively narrow range of ER values (areas in green and yellow on the ER map). Here the crop biomass is very sensitive to ER: the vegetation function increases steeply as ER decreases and confidence intervals are very narrow, therefore the ER-crop relation is statistically strong and vegetation will reliably behave in response to ER. Management can usefully be planned according to ER in this area, with continuously variable application of agricultural inputs or through ER-based zonation of the field using common clustering criteria.

Region III High ER (>25 Ω m). This corresponds to red–black areas in the field. In this area the soil-vegetation relationship is again erratic, and has a lower, variable and even inverted slope. This is a strong suggestion of problem areas, and the study of soil profiles in such spots revealed a high stone percentage or the presence of impeding soil layers such as hardpans.

Authors (Rossi et al. 2018; Pollice et al. 2019) therefore propose a two-step procedure for identifying Management zones where step 1 consists of studying the soil-crop relation and separate zones with different relationships and step 2 consists of applying common clustering criteria for MZ to the separate zones or only to the field areas where the soil-crop relation is strong and reliable. Also, finding areas where the soil-crop relationship changes may help identify zones with soil constraints and provides a spatial indication to study their properties and suggest different management options than what effective in other areas.

Rossi et al. (2018) compared this approach with common clustering criteria used in the literature such as fuzzy clustering applied to the whole field, and point out that fuzzy clustering alone, based on similarity criteria and not on the functional relation of crop and soil data, would result in wrong management decisions in areas I and III of the field and not provide indications for identifying soil constraints.

Datasets in this study showed several features common to spatial data in agricultural research. Among them:

  • different resolution and density or misalignment of data from different sensors (e.g. resistivity and NDVI maps)

  • many sources of data errors

  • lack of normality in the frequency distribution of data

  • non-linear relations between soil and crop data

  • spatial relations of data (autocorrelation, covariance, …)

  • need to account for soil-plant relationship in a way useful for applications.

The treatment of such and other features was addressed in the papers (Rossi et al. 2015, 2018; Pollice et al. 2019) and is the object of the following paragraph of this chapter.

4.2.1 Data Filtering, Spatial Interpolation, Statistical Modeling and Management Zone Delineation

In agriculture, techniques using proximal soil and crop sensors allow to acquire data fast and at extremely high spatial resolutions, and therefore to obtain a large number of data points per unit surface. Depending on the sensor and the terrain some datasets contain a variable amount of systematic and random errors that include georeferencing errors, operator error, few extreme values due to local crop failure, to poor establishment or planter skip. This is very typical of yield maps that require data filtering prior to interpolation (Simbahan et al. 2004), or soil geophysical surveys that can be affected by different sources of noise contamination. Sometimes the goal is to analyse the functional relationships between variables (i.e. yield responsiveness to soil and terrain attributes). To this regard it is worth noticing that different techniques exhibit different data density and spatial distribution, but also that data acquired at different dates are commonly misaligned in space even if measured with the same sensor. Very often, finding a functional relation between yield and soil or sensor surrogates of soil variables requires appropriate treatment of problems related to spatial misalignment (or change of support Problem—COSP, see Gelfand et al. 2010, Chap. 29) and to the large data size. Also sensor data can show a relatively large amount of outliers due to accidental sensor flaws or (this is the case of proximal soil sensors) to rough terrains which prevent optimal coupling between sensor and soil surface. Pre-processing is usually necessary for many sensors to remove outliers, reduce background noise, correct positioning errors and align data acquired at different locations. Data filtering and spatial interpolation on a common lattice covering the field allow to upscale the data to a common support. Here we address three steps frequently taken in sensor data processing: data filtering, spatial interpolation and statistical modeling. We refer to methods by far more computationally efficient than those traditionally used in this field. Indeed, standard variogram modeling and kriging would hardly be feasible even with few thousands data points.

In the following paragraph we will give a brief overview of:

  1. i.

    the median filter as an example of simple but efficient filter for crop and soil sensor data,

  2. ii.

    deterministic interpolators (inverse distance, spline and nearest neighbors) as rapid tools for large sensor datasets

  3. iii.

    statistical modeling for high density misaligned sensor data aimed at facilitating yield map interpretation or as a tool for field zonation.

4.2.1.1 Data Filtering: The Median Filter

Proximal sensing can yield massive amounts of finely spaced data. Depending on the technology and the terrain, some datasets can be affected by a variable amount of unwanted noise. Conductivity meters, for instance, are sensitive to the electrical interference from nearby metal objects. For galvanic sensors a poor contact between soil and electrodes can result in a large number of null values, while hitting rock fragment clusters can spike up resistivity values of several orders of magnitude. This unwanted clutter can be filtered out to facilitate the recognition of a broader pattern. The median filter is a simple yet effective tool to remove extreme values without altering the general trend. A median filter operates by calculating the middle value of an ascending-ordered sequence of numbers within a moving window of a given dimension. Every time an observation, within this windows, departs from the median (above or below a user’s defined threshold) it is replaced by the window’s median. Tabbagh (1988) gives the first example of the use of median filtering for improving on-the-go resistivity survey data quality. The author showed how median filtering de-spikes data without altering the broad pattern. Median filtering algorithms are now available at no cost in many open-source software items such as “R” (https://CRAN.R-project.org). Specifically designed for sub-surface geophysical survey data processing, WuMapPy is an open source python package which provides a median filtering routine (Marty et al. 2015). Median filtering has also been used for plant-based data image processing to improve the accuracy of discrimination and mapping of weed patches in sunflower fields (Peña-Barragán et al. 2007). A data filtering protocol for management zone delineation, which includes median filtering, was described by Córdoba et al. (2016). Median filtering was also used by Mavridou et al. (2019) for the morphological analysis of fruits based on image analysis.

4.2.1.2 Spatial Interpolation

Spatial interpolation can be considered a special case of statistical inference because it implies prediction over a spatial process. Observed values at certain geographical locations (sampling sites) are used to predict the unobserved values at unsampled sites. Interpolation techniques can be classified into two comprehensive categories: deterministic versus stochastic. Deterministic approaches rely on mathematical functions to derive surfaces from sample data points either on the basis of similarity between points or on the degree of smoothing (Adhikary and Dash 2017). Stochastic methods are founded on the statistical properties of sample points, the field is regarded as a random process and the optimality of the smoothing method is established in terms of minimizing some specific criterion (Babak and Deutsch 2009). Deterministic techniques include: inverse distance weighting (IDW) and the use of spline functions. To stochastic interpolators belongs the large family of geostatistical kriging with its many variations such as simple, universal, ordinary kriging which rely on variogram estimation. A number of works have compared deterministic versus stochastic interpolators but mixed outcomes were obtained (Gong et al. 2014; Gotway et al. 1996; Kravchenko 2003; Mueller et al. 2001, 2004). Very often these techniques were used to interpolate sparse data (e.g. soil survey data). A typical soil survey datasets for variogram estimation comprises few hundreds data points over hectares (Webster and Oliver 1992). This data density is greatly overridden with proximal soil sensing. As an example, Rossi et al. (2013) measured soil resistivity with a continuous resistivity profiling equipment over a 3.5 ha. Tempranillo vineyard. Data were acquired along parallel rows spaced approximately 5.60 m. Data density along the rows was very high, with a measure every 20 cm, yielding over 115,000 data points. In this case the computational effort required to estimate variogram parameters might not be compensated by a hypothetical gain in precision. “Light” deterministic techniques are very efficient and allow to process large geophysical datasets within minutes (Rossi et al. 2013). Inverse Distance Weighing is a spatially-weighted average of the sample values within a search neighbourhood (Shepard 1968; Diodato and Ceccarelli 2005). Unlike kriging, no prior information is needed for spatial prediction. IDW exhibits sensitivity to properties of data and data-bases (e.g. skeweness, anisotropy, samples spatial distribution) (Babak and Deutsch 2009). For instance, with IDW the choice of the exponent value (which greatly affects map accuracy) needs to be based on the data skewness coefficient (Kravchenko and Bullock 1999; Weber and Englund 1994). Examples of IDW potential use in Precision agriculture can be found in several works (Souza et al. 2016; Usowicz and Lipiec 2017; Robinson and Metternicht 2006). Other popular deterministic interpolators include spline functions. A nice definition of spline interpolation can be found in McKinley and Levine (1998, p. 1): “[The fundamental idea behind cubic spline interpolation is based on the engineer’s tool used to draw smooth curves through a number of points. This spline consists of weights attached to a flat surface at the points to be connected…The weights are the coefficients on the cubic polynomials used to interpolate the data. These coefficients ‘bend’ the line so that it passes through each of the data points without any erratic behaviour or breaks in continuity]”. Spline interpolation (e.g. cubic spline) has been used in crop science to interpolate climate time-series such as seasonal evapotranspiration (Sadler et al. 2000). Boer et al. (2001) applied thin plate splines to estimate temperatures and precipitations in Jalisco (Mexico-Boer et al. 2001). Rossi et al. (2013) interpolated high density multi-depth soil apparent resistivity spatial data using a cubic spline interpolation. Different soil variables might require specific interpolators. Robinson and Metternicht (2006) showed that while kriging was the best interpolator for electrical conductivity, pH required IDW, organic matter content was best estimated using a cubic spline.

Upon reviewing the literature it is clear that there is no such thing as a “one fits all” interpolation method. As a general rule of thumb for sparse data, uncertainty measures may help increase map accuracy. If a valid sample variogram can be computed, geostatistical kriging is likely to give good predictions in many circumstances. For dense proximal sensing dataset, users may benefit from non-computational-intensive deterministic interpolators to process large datasets in a reasonable time. Using these simple tools, however, always requires a close look to the data. Exploratory analysis is absolutely needed to select the most appropriate interpolator.

One of the difficulties in establishing a functional relationship between variables observed over a common spatial domain comes from the possible difference between the spatial support of the observations (i.e. each variable having its own data points). In Pollice et al. (2019), COSP has been treated through a non-standard approach: multiple spatial data were upscaled by interpolating data points to a square lattice overlaying the studied field. Because of differences in number and location of sampled spots corresponding to each spatial variable, a proportional nearest neighbors neighborhood structure was used to calculate the upscaled values. For each spatial variable the number of neighbors was considered proportional to the samples size at all grid points, where the neighbors’ values of mean, variance and covariance between spatial variables were obtained. Although such statistics were not the main intended product of the upscaling procedure, they were useful as inputs of the model likelihood for model fitting.

This latter example of multiple sensor data interpolation leads to the last topic covered by this paragraph: the statistical modeling aimed at estimating functional relationships between variables.

4.2.1.3 Spatial Modeling

The key concept of Precision agriculture is to deliver crop inputs when and where crops require. In most applications this implies the need of subdividing the field into two or more MZs: management units or zones where crop inputs (i.e. seed/fertilizer rates) are applied at different levels but uniformly within a given unit (Moral et al. 2010). Unsupervised classification techniques such as the fuzzy k-means algorithm are routinely employed in multivariate clustering for automatically identifying homogeneous field zones, also thanks to the availability of freeware user-friendly softwares that perform the task (Fridgen et al. 2004; Paccioretti et al. 2020). Spatial clustering techniques are an invaluable instrument to delineate functional spatial units for variable rate-equipment but do not answer key questions regarding the causes of variation in crop performance and give no clues on crop requirements. Establishing functional relations between plant production and individual soil characteristics can help identify which soil characteristics limit yield in different zones of the field. Shatar and McBratney (1999) compared different empirical methods such as neural networks, projection pursuit regression, generalized additive models and regression trees to model sorghum yield as a function of soil properties. Field-scale yield responsiveness to local field conditions can also be modelled through the boundary-line analysis (Shatar and McBratney 2004). This work features a comparison between individual yield-response functions and a single predictor variable (among soil properties) was chosen at each location on the basis of prediction of the smallest yield value. Simple implements such as correlation analysis can still give some indications about influential variables and subsequent management options albeit more sophisticated modelling techniques such as geographically weighted regression (GWR), specifically designed to handle spatial non-stationarity, may be more suitable if local variation of yield-response function exists (Fotheringham et al. 2003). GWR alone or in combination with temporally weighted regression predicted plant production with a relatively high accuracy (Feng et al. 2021). To select an optimal linear regression model, the shape of the relationship between the target variables should be inspected: a spatially constant relation requires a linear regression model, while GWR is well suited for a spatially varying relationship or even semi-varying (as in mixed geographically weighted regression) when some regression coefficients are globally constant while the others are geographically varying (Yang et al. 2019). Soil–plant relationships often exhibit non-linear features (Shatar and McBratney 1999). Non-linear relationships are conveniently modelled using smooth functional effects in generalized additive models (GAM) (Rossi et al. 2018). When nonlinear spatial patterns emerge in the map of regression residuals, it is possible to add a smooth function of sample coordinates to the model predictor, for instance specifically an anisotropic bivariate smooth represented using tensor product splines can be used. This model term accounts for the nonlinear spatial pattern of the vegetation which is not explained by the predictor variables. GAMs are fitted by penalized likelihood and automatic choice of smoothing parameters may be used minimizing an internal Generalized Cross Validation (GCV) criterion (Wood 2006). The first step is variable selection; starting from a set of alternative candidate models, a stepwise process of variable selection can be followed. The final model selection is performed through comparison of the proportion of the null deviance explained by the model, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Such criteria quantify the model goodness of fit penalizing for model complexity to control over-fitting (Zuur et al. 2009). This kind of models apply however to a very simple scenario: a one-time measurement of two continuous variables. But what if things are more complex: say the variables are temporally and spatially misaligned and sensor data have been acquired at different spatial resolution? What if there are repeated measures that need to be jointly analyzed and the dataset involves several thousand observations? We will need a more sophisticated yet very computationally efficient approach. In the next paragraph we will review a case study reported in details in Pollice et al. (2019) in which the nature of the dataset required to address all these issues together.

In Pollice et al. (2019) the objective was to represent the nonlinear relationship between plant production (indirectly estimated by proximal sensing NDVI measurements) and soil information (represented by continuous electrical resistivity measures at three soil depths) through a smooth function. The specific characteristics of the dataset required to address the lack of correspondence of data sampling location and scale between soil and plant sensor data (spatial misalignement), the problem of the repeated measurements in time and along a depth gradient and the issue of the residual variation of unsampled spatial features. As is usually the case, a model-based solution for data integration is set within the Bayesian framework that allows to consider different sources of uncertainty and to rely on Markov chain Monte Carlo (MCMC) simulations to generate observations from the joint posterior distribution (Gelfand 2019). Hierarchical models are generally denoted through the structure: [data|process, parameters] * [process|parameters] * [parameters]. In the notation the brackets denote the probability distributions while the vertical bar symbolizes a conditional specification. The data driven by the underlying natural and anthropic processes is represented through a stochastic model at the top level of the hierarchy. Likewise a stochastic model will be specified for all the other processes under study. Uncertainty in both levels of the modelling will lead to unknowns/parameters. Models for the parameters are deferred to the third stage of the hierarchical specification. This specification which may appear very simple at a first sight is in fact quite rich: it allows multiple data sources, spatial and dynamic structure for both the data and the processes, measurement error in explanatory variables and much more. A structured distributional regression model was implemented to account for the position and the scale heterogeneity of the fodder yield surrogate variable. This model relies on the assumption that response conditional probability distribution belongs to a parametric family and that a regression specification underlies the estimate of each parameter. Specifically an additive composition of potentially nonlinear effects and an additional overall intercept forms the predictor of each model parameter. The nonlinear part of the predictor is composed of different effects such as spatial fields, interaction surfaces but also nonlinear effects of some continuous covariate (e.g. soil information). All the mentioned effects can be approximated through a linear combination of basis functions (Klein et al. 2015). Standard regression model are inadequate to treat cases in which an explanatory variable (our sensor data) has multiple measurements (e.g. repeated measures in time or space) that need to be jointly matched to a single-valued response variable While standard regression theory assumes that explanatory variables are deterministic or error-free in most cases biological processes do not follow this rule. If repeated observations of explanatory variables are available, they can be used to quantify the variation due to measurement error. This latter is actually a problem for inferences based on regression models. Bayesian measurement error correction addresses the issue by including the unknown true covariate values as additional unknowns to be imputed by MCMC simulations accompanying the estimation of all other parameters in the structured distributional regression model. Replicates are considered as contaminated observations of the unknown true covariate. The lack of independence between nearby observations determined by all sources of spatial variability, including erratic and deterministic components, is accounted for including a nonlinear trend surface within the framework of a structured distributional regression model. In this way the functional relationship under analysis (in our case the effect of soil on vegetation) is cleared of any source of spatial variability. All inferences on the structured additive distributional regression model are based on MCMC simulations performed within the open source software BayesX (Belitz et al. 2015).

Information retrieved from the analysis of yield—response functions, not only facilitate yield pattern interpretation, but can be used as a field zoning criteria itself, as shown in Sect. 4.4.2.1 (Rossi et al. 2018) where a field zonation is proposed, based on the functional relationship between alfalfa NDVI and soil electrical resistivity.

The study illustrated in Sect. 4.4.2 (Rossi et al. 2015, 2018; Pollice et al. 2019) estimates a nonlinear effect of soil (i.e. the surrogate variable soil resistivity) on fodder biomass (i.e. the surrogate variable NDVI), which increases monotonically as resistivity increases, then at certain values of resistivity the shape and sign of the relationship changes with a subsequent decline in NDVI which is steep at first and moderate subsequently. Cut-offs for the smooth function illustrated in Sect. 4.4.2 were defined to classify the field into zones corresponding to a different soil–plant relationship (Rossi et al. 2018). The field zonation illustrated in Sect. 4.4.2 can be considered an informed-clustering that splits the fields into areas of high, low, or season-driven yield responsiveness. Namely, the three zones the relationship was divided in can be considered: zones where fodder biomass is affected by even little changes in soil properties (zone II); zones where soil properties cannot be changed (zone III) or areas when evaluations are needed throughout the crop cycle (zone I). Each part of the smooth function then carries information on the extent and direction of the relations between soil and plant variation. The consequent identification of field zones therefore corresponds to areas where soil constraints are different from the point of view of management. This information is also invaluable as a basis to increase the representativeness and efficiency of ground truth validation destructive sampling which remains necessary for identifying the actual nature of soil constraints in each area. Zones where plant responsiveness to soil variation is high are identified as preferential areas for precision management, because the cause of variation is known and in some cases can be managed (i.e. precision irrigation, precision planting). In such areas a map of soil variation easily translates into a prescription map.

5 Conclusions

Geospatial techniques provide an invaluable tool for making decisions in agriculture, and especially for the spatial monitoring of farming impact on the environment and for spatially-aware management methods such as Precision agriculture with its variable rate application of inputs. They are also a basis for stratification, variation and co-variation criteria for the design and interpretation of soil and agronomic surveys and experiments. One of the most fruitful applications is the identification of uniform zones for management and sampling, and the best use of geospatial techniques for agriculture is a combination of data from different sensors. In order to overcome drawbacks of simple data fusion of different information layers, though, advances in data treatment should rely on agronomic interpretation of data, appropriate statistical treatment and management-oriented modelling of soil-vegetation relationships.