1 Introduction

Soil resources provide many important ecosystem goods and services. However, they are at risk from a variety of threats operating over a broad range of scales. Political awareness that soil is threatened by increasing pressures has been rising for several years (European Commission 2006). Indeed, the demand for soil information is increasing continuously (Richer de Forges and Arrouays 2010). Although soil degradation often proceeds slowly and may be detectable only over long timescales, it is often irreversible. Monitoring soil quality and condition is therefore essential in order to detect adverse changes in soil status at an early stage.

Soil monitoring is the systematic determination of soil properties so as to record their temporal and spatial variations (FAO/ECE 1994). In a recent review, Morvan et al. (2008) defined a soil monitoring network (SMN) as a set of sites/areas where changes in soil characteristics are documented through periodic assessment of an extended set of soil properties. According to this definition, national frameworks for soil monitoring exist in numerous countries and in most member states of the European Union. However, while some countries have achieved uniformity in methodology and coverage, this is far from common even among national systems (Arrouays et al. 2008a; Morvan et al. 2008; van Wesemael et al. 2011). In addition to achieving harmonisation, there are many generic issues that must be addressed by scientists when establishing and operating SMNs, including the requirement for these to be effective for different soil systems. Of particular importance is the requirement for SMNs to detect change in soil over relevant spatial and temporal scales with adequate precision and statistical power (Arrouays et al. 2008b; Desaules et al. 2010; van Wesemael et al. 2011).

In this chapter, we present some of these generic issues, including the design and implementation of soil sampling in space and time, the development of statistical techniques general enough to describe the complex patterns of spatial and temporal variation in soil properties, and harmonisation.

2 Soil Monitoring Objectives

In a review of European SMNs, Arrouays et al. (1998) stressed that their establishment may have several objectives:

  1. Determination of the current characteristics and properties of soils as well as their environmental stresses, which can be considered as an initial assessment of the soil status, often called “baseline” values, although the term “baseline” may be reserved for some assessment of soil state without the impact of human activities, inferred, perhaps, from nearby soils under climax vegetation

  2. Long-term and/or early determination of changes in soils as a consequence of location-, stress- and use-specific factors, through periodic investigations

  3. Assessment of the sensitivity of soils to changes and prediction of their future development

  4. Development and validation of models for the simulation of ecosystem responses and the use of these to estimate responses to actual or predicted changes and stresses and to make regional assessments in concert with survey data

  5. Establishment of reference sites for calibration of environmental measurements

  6. Generation of information about soil trends, to inform future national policies to protect soils from degradation and pollution, including the identification of new threats to soil quality/condition and tests of the effectiveness of existing policies

de Gruijter et al. (2006) grouped the objectives of SMNs into three broad categories, each of which has implications for the design options of a SMN:

  1. Status/ambient monitoring to characterise or quantify the status of soil and follow how its properties change over time, such as topsoil carbon content under different land uses

  2. Trend/effect monitoring to assess the possible effects of pressures or drivers on soils to determine not only status but also whether a change was caused by a specific event or process

  3. Regulatory/compliance monitoring to determine whether soils are failing to meet set standards or targets

3 General Considerations About SMN Design and Construction

The choice of design for a SMN is crucial, especially when assessing large areas and several properties that are driven by numerous controlling factors of various origins and scales.

3.1 Establishing a SMN

Several reviews have highlighted large differences between existing networks (Arrouays et al. 1998; Morvan et al. 2008; Saby et al. 2008b; van Wesemael et al. 2011). The geographical coverage of SMNs is very diverse between and within countries. Three broad approaches to the establishment of SMNs can be distinguished:

  1. The design and construction of purpose-built SMNs

  2. Resampling of the soil at sites where measurements have previously been made for other purposes

  3. Compilation and analysis of soil data that have previously been collected in other soil analysis exercises or experiments

Purpose-built SMNs have been adopted by many countries (e.g. France, UK, Denmark, Austria, Switzerland, Germany), although in most cases the sites have, as yet, been sampled only once and hence remain inventories until sampling is repeated. The sampling design is critical when establishing new SMNs. There is continuing and extensive discussion about the choice between probability sampling, which permits design-based analyses free of any statistical model, and purposive (model-based) sampling schemes, commonly regular grids with some supplementary points, which are analysed by model-based statistical methods (Brus and de Gruijter 1993, 1997; de Gruijter et al. 2006). Probability designs include a random component in the selection of sampling locations, whereas purposive designs select sampling locations such that a specified objective is best served (see Chap. 11). The purposive design often consists of a regular grid, since this ensures that the study region is evenly sampled. This design choice controls the types of statistical analyses that can be performed. Probability designs are generally used to answer questions about behaviour across the whole study area or within a restricted number of subareas using design-based analyses, in which inferences from the data are based upon the probability that particular locations are included in the sample. Designs other than probability designs require model-based analyses (see Chap. 11), in which statistical models of the variation of the property are estimated. If these models truly reflect the variation of the property, it is possible to make localised predictions and maps and to quantify the uncertainty associated with these maps.

The decision about the scale over which results should be reported presents issues in itself. To some extent it should be controlled by the scale at which policymakers require information (see Chaps. 23 and 17). However, some effects may only be observable at particular spatial scales. For example, Wang et al. (2010) demonstrated that effects of climate on soil organic carbon (SOC), which were evident at the provincial scale, were less evident at smaller spatial scales.

Discussions are still ongoing in Europe about the effectiveness of stratified random sampling compared to purposive sampling on a grid. Previous simulations have shown that a 16 × 16 km grid is representative of most soil-type/land cover combinations at European and national scales (Arrouays et al. 2001; Van-Camp et al. 2004; Morvan et al. 2008). In a report about the design and implementation of a future SMN for the UK, Black et al. (2008) provided an extensive review of the advantages, limitations and relative performances of these sampling options. This study compared two purposive designs (grid and optimised grid) and two probability designs (stratified random and stratified cluster random sampling). The stratified random scheme was found to be the most suitable option for some of the specific questions being addressed, particularly in terms of the assessment of status and changes in SOC. In a review of ten national SMNs focused on SOC changes, van Wesemael et al. (2011) showed that most of these SMNs (seven out of ten) are based on stratified random sampling. Indeed, several studies dedicated to sampling schemes for SOC monitoring have pointed out that a stratified sampling design would be more efficient (Walter et al. 2003; Goidts et al. 2009b; Viaud et al. 2010; Meersmans et al. 2011). In view of these studies, there appears to be a consensus that stratified designs should be selected if the aim of the SMN is to determine the average status and change of soil properties over large regions and if the spatial patterns of factors which control the variation of all of the soil properties are known. Major soil groups and land use categories are often suitable factors for the stratification of the design.
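The design-based estimator that accompanies stratified random sampling is simple enough to sketch. The example below is a minimal illustration, assuming simple random sampling within each stratum, strata weighted by their area, and no finite-population correction; the function name, stratum labels and SOC values are hypothetical.

```python
import math
import statistics


def stratified_mean(samples_by_stratum, stratum_areas):
    """Design-based estimate of a regional mean and its standard error
    under stratified simple random sampling (no finite-population correction).

    samples_by_stratum: stratum name -> list of observations (at least 2 each)
    stratum_areas: stratum name -> stratum area, in any consistent unit
    """
    total_area = sum(stratum_areas.values())
    mean = 0.0
    variance = 0.0
    for name, values in samples_by_stratum.items():
        weight = stratum_areas[name] / total_area  # W_h, share of the region
        mean += weight * statistics.fmean(values)
        # W_h^2 * s_h^2 / n_h, with s_h^2 the within-stratum sample variance
        variance += weight ** 2 * statistics.variance(values) / len(values)
    return mean, math.sqrt(variance)


# Hypothetical topsoil SOC values (g/kg) in two land-use strata
soc = {"cropland": [12.0, 14.0, 13.0, 15.0], "grassland": [25.0, 30.0, 28.0]}
areas = {"cropland": 600.0, "grassland": 400.0}
estimate, standard_error = stratified_mean(soc, areas)
```

Stratification pays off when strata such as land use or major soil group explain much of the variation, because only the within-stratum variances enter the standard error.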

However, grid-based surveys have the advantage of achieving good spatial coverage, with proportional representation of the regions of interest. Overall, the grid-based sampling scheme should be more flexible for incorporating unknown future requirements such as the impact of new pressures and monitoring of new soil quality indicators and indicators for which spatial patterns are not yet known. Also, grid-based designs will in general be more appropriate if a key objective is to produce maps of status or change.
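As an illustration of grid-based coverage, the sketch below lays a square grid over a rectangular region. It is a minimal example assuming a rectangular study area; real SMNs clip the grid to national boundaries and may add supplementary points. Randomising the grid origin is an assumption added here: it turns the grid into a systematic probability sample, whereas a fixed origin gives a purely purposive grid. The 16 km spacing echoes the European grid discussed above; the function name and region are hypothetical.

```python
import random


def systematic_grid(xmin, ymin, xmax, ymax, spacing, seed=None):
    """Square sampling grid over a rectangle, with a randomly placed origin.

    Randomising the origin makes the grid a systematic probability sample;
    fixing the origin instead gives a purely purposive grid.
    """
    rng = random.Random(seed)
    x0 = xmin + rng.uniform(0.0, spacing)
    y0 = ymin + rng.uniform(0.0, spacing)
    points = []
    j = 0
    while y0 + j * spacing < ymax:
        i = 0
        while x0 + i * spacing < xmax:
            points.append((x0 + i * spacing, y0 + j * spacing))
            i += 1
        j += 1
    return points


# Hypothetical 160 km x 160 km region sampled at 16 km spacing
sites = systematic_grid(0.0, 0.0, 160.0, 160.0, 16.0, seed=42)
```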

A further consideration is how the designs of different phases of a SMN should relate to each other. With reference to probability sampling, de Gruijter et al. (2006) classified designs according to whether they are static, with all sampling taking place at a fixed set of locations, or whether the set of locations changes for each phase of the survey. Rotational designs are a compromise in which only a proportion of the locations from the previous phase are resampled and new locations are selected for the remainder of the observations. de Gruijter et al. (2006) defined synchronous designs as those in which multiple observations are made at the same time. There are trade-offs between these classes of design. If locations are resampled, then the temporal variation at these sites will be well understood, but the spatial resolution of estimates can be improved if the locations change and more sites are visited. If the measurement approach is destructive or alters the soil properties at the site, then it might not be possible to revisit a particular location. Also, in a static design any bias in the initial sample persists throughout the life of the SMN. Static designs might be required if it is expensive to move and reinstall monitoring devices such as the lysimeters used by Brus et al. (2010). That SMN used a nonsynchronous design because the aim was to estimate the space-time means of the measured indicators. Other surveys favour synchronous designs because estimates of the indicators are required on different dates or because they lead to simple estimators (Brus and Knotters 2008). The aim of model-based surveys is often to produce a series of maps of soil indicators on different dates, and these are most easily predicted from synchronous designs (Marchant et al. 2009). However, the number of samples and the time taken to travel between them might make truly synchronous designs impractical for national-scale SMNs; it may take more than 1 year to complete the sampling, as in the National Soil Inventory of England and Wales (Bellamy et al. 2005).
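The trade-off between revisiting sites and drawing new ones can be made concrete for the estimate of mean change. The sketch below assumes simple random sampling of the same number of sites at both phases and no measurement error. The correlation `rho` between repeat visits and all numbers are hypothetical, and the function name is illustrative only.

```python
import math


def se_mean_change(sd1, sd2, n, rho=0.0, paired=True):
    """Standard error of the estimated mean change between two phases,
    assuming simple random sampling of n sites at each phase.

    paired=True:  the same n sites are revisited (static design); the
                  correlation rho between repeat visits removes part of the variance.
    paired=False: n new, independent sites are drawn at each phase.
    """
    variance = (sd1 ** 2 + sd2 ** 2) / n
    if paired:
        variance -= 2.0 * rho * sd1 * sd2 / n
    return math.sqrt(variance)


# Hypothetical: SD of 10 units at both phases, 100 sites, rho = 0.9
static = se_mean_change(10.0, 10.0, 100, rho=0.9, paired=True)
changing = se_mean_change(10.0, 10.0, 100, paired=False)
```

With strongly correlated repeat visits the static design estimates change far more precisely, whereas changing locations improve spatial coverage. A rotational design lies between these two extremes.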

Regardless of the choice between probability-based and purposive approaches, it is important to estimate, prior to implementation of the scheme, how many measurements will be sufficient to predict status and change of key soil properties with the precision required by policymakers (e.g. Black et al. 2008). The expected errors from a particular purposive sample design can only be determined if the variogram of the status and/or change of each indicator is known. The variogram (Webster and Oliver 2007) is a function which describes the variance and spatial correlation of a property (see Chaps. 10 and 21); it is the central model in much model-based analysis of soil data. It is unlikely that the variograms are known exactly prior to monitoring, but approximate variograms can be estimated from previous surveys of similar indicators in similar circumstances. This approach has been used to design both probabilistic (Brus and Noij 2008) and purposive (Marchant et al. 2009) sampling schemes. Often the required precision of a SMN is unclear because neither the rate of change of an indicator nor the implications of changes are known prior to sampling. In a recent study, Lark (2009) emphasised that the current status of a particular indicator and the rate of change of that indicator are different variables, and so their variability may differ. Lark also examined some plausible statistical models of change in the soil and considered their implications for sampling to estimate mean change in large regions. These results show that taking account of knowledge of soil processes may improve the design of the SMN. Some authors recommend adapting (or calculating) the sampling time interval to make sure that the observed changes will be significantly larger than the differences that might be due to sampling and other methodological issues (Smith 2004; Bellamy et al. 2005; Saby et al. 2008b), while others (e.g. Desaules et al. 2010) argue that, given these uncertainties, reducing the time interval increases the power of the scheme to observe short-term and potentially important changes in the observed trends.
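Because the ordinary kriging variance depends only on the site geometry and the variogram, not on the measured values, a provisional variogram is enough to compare candidate designs before sampling. The sketch below assumes an exponential variogram model with hypothetical parameters (lags in km) and uses only the four sites surrounding the centre of a grid cell; the function names are illustrative and do not correspond to any specific package.

```python
import numpy as np


def exponential_variogram(h, nugget, partial_sill, range_param):
    """Exponential variogram model; gamma is 0 at lag 0 by definition."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + partial_sill * (1.0 - np.exp(-h / range_param))
    return np.where(h > 0.0, gamma, 0.0)


def ordinary_kriging_variance(sites, target, variogram):
    """Ordinary kriging variance at `target` given sample `sites`.

    Only the site geometry and the variogram enter, so the value can be
    computed for a candidate design before any soil is sampled.
    """
    sites = np.asarray(sites, dtype=float)
    n = len(sites)
    lags = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(lags)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(np.linalg.norm(sites - np.asarray(target, dtype=float), axis=1))
    solution = np.linalg.solve(lhs, rhs)  # kriging weights, then Lagrange multiplier
    weights, multiplier = solution[:n], solution[n]
    return float(weights @ rhs[:n] + multiplier)


def vario(h):
    # Hypothetical variogram, e.g. borrowed from an earlier survey
    return exponential_variogram(h, nugget=0.1, partial_sill=0.9, range_param=20.0)


# Cell centres of square grids with 16 km and 32 km spacing
fine = ordinary_kriging_variance([[0, 0], [16, 0], [0, 16], [16, 16]], [8, 8], vario)
coarse = ordinary_kriging_variance([[0, 0], [32, 0], [0, 32], [32, 32]], [16, 16], vario)
```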

Finally, it should be stressed that resources for SMN establishment and operation are always limited to some extent; this affects the actual choice of sampling strategy and places a premium on identifying an optimal scheme that takes account of both the monitoring objectives and the need for resource efficiency. With this limitation in mind, Black et al. (2008) chose to test the design of a UK SMN on SOC status and changes, as these properties are involved in processes controlling a large number of threats to soil (e.g. decline in soil organic matter, erosion, soil biodiversity, compaction, fate of contaminants). Similarly, Yu et al. (2011) assessed the sampling required to detect a change of 1.52 g kg−1 in SOC under various types of land management in South China. Chapter 23 gives a relevant example of designing a cost-effective monitoring scheme for farm-scale soil carbon auditing.
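A common first approximation of the number of sites needed to detect a given mean change is the normal-theory sample-size formula n = ((z₁₋α/₂ + z₁₋β) σ / δ)². This is a generic textbook formula and is not necessarily the method used by Yu et al. (2011). The sketch assumes simple random sampling and a known standard deviation of site-level change. Only the 1.52 g kg−1 target comes from the text; the 5 g kg−1 standard deviation and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def sites_to_detect_change(delta, sd_change, alpha=0.05, power=0.8):
    """Approximate number of sites needed to detect a mean change `delta`
    with a two-sided test, when changes at individual sites have standard
    deviation `sd_change` (normal approximation, simple random sampling)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sd_change / delta) ** 2)


# Target change of 1.52 g/kg (as in Yu et al. 2011); SD of 5 g/kg is hypothetical
n_sites = sites_to_detect_change(1.52, 5.0)
```

Because n grows with (σ/δ)², halving the detectable change roughly quadruples the required number of sites. This is why the precision demanded by policymakers needs to be fixed before the design.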

Resampling inventory sites from past soil mapping surveys allows immediate estimates of change and reduces the opportunity cost of establishing a SMN, as the baseline sampling exercise is already completed. This strategy has been used extensively in Belgium for monitoring SOC (Arrouays et al. 1996; Sleutel et al. 2007; Goidts et al. 2009a, b; Meersmans et al. 2009, 2011). It supports a focus on the change in SOC stock at the point scale. Although Goidts et al. (2009a) resampled within a radius of 11 m of the original sites of the Belgian National Soil Survey (1947–1974), the error related to imprecise relocation of each site was quite large (relative RMSE between 12% and 31%) because of large variability in SOC concentration, bulk density, stone content and sampling depth at very fine spatial scales (i.e. variability within the same field). Consequently, given the response time of SOC to changes in management or land use (of the order of decades), most soil inventories are probably not old enough, and/or the rates of SOC change at individual sites are too small, for changes to be detected by resampling. Nevertheless, the same study showed that the uncertainty due to positioning error was considerably lower when SOC stock changes were studied for homogeneous landscape units (characterised by the same land use, agricultural region and soil type), because multiple locations (9–47) were sampled at this aggregated level (relative RMSE between 1% and 11%). Indeed, other studies have been able to detect significant temporal changes when comparing SOC stocks for areas rather than for individual monitoring sites. For example, Meersmans et al. (2009) studied changes in the vertical heterogeneity of SOC by resampling soil profile pits from the National Soil Survey and comparing the modelled depth distributions of SOC from both time periods within homogeneous land use–soil type combinations in North Belgium. Moreover, Sleutel et al. (2007) related changes in average SOC stock between 1990 and 2000 by municipality (from 190,000 SOC measurements) to agricultural variables (e.g. manure application, crop rotation, land use change) in order to identify the main factors explaining the overall SOC change over time in Flanders (North Belgium). Recently, Meersmans et al. (2011) identified an overall countrywide significant increase under grassland and a decrease under cropland after modelling the spatial distribution of SOC in all agricultural soils across Belgium in 1960, using the initial sites from the National Soil Survey, and data for 629 locations resampled in 2006.
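The reduction in positioning error at the landscape-unit level is what one would expect from averaging. The sketch below assumes independent site errors of equal relative size and equal mean stocks across sites; it is not how Goidts et al. (2009a) computed their figures. Under these assumptions, averaging over n sites divides the relative RMSE by √n. For example, 31% over 9 sites gives about 10%, and 12% over 47 sites gives about 2%, which is of the same order as the 1–11% reported. Real errors need not be independent. A small Monte Carlo check is included, and the function names are hypothetical.

```python
import math
import random


def aggregated_relative_rmse(site_relative_rmse, n_sites):
    """Relative RMSE of a unit-level mean built from n_sites locations,
    assuming independent positioning errors of equal relative size."""
    return site_relative_rmse / math.sqrt(n_sites)


def simulated_relative_rmse(site_relative_rmse, n_sites, n_rep=20000, seed=0):
    """Monte Carlo check of the same quantity with Gaussian site errors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        # the relative error of the unit mean is the mean of the site errors
        err = sum(rng.gauss(0.0, site_relative_rmse) for _ in range(n_sites)) / n_sites
        total += err ** 2
    return math.sqrt(total / n_rep)


# Worst single-site case in the text (31%) averaged over the smallest unit (9 sites)
print(round(aggregated_relative_rmse(0.31, 9), 3))  # 0.103
```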

Analysing the results from existing soil measurement exercises, such as operational soil testing by farmers or fertiliser suppliers, is one potential option for detecting large temporal trends in soil characteristics. Pre-existing data, such as historic soil testing results, have often been used to assess temporal changes at national and regional levels, e.g. for phosphorus by Skinner and Todd (1998) in England and Wales, Cahoon and Ensign (2004) in eastern North Carolina (USA), Wheeler et al. (2004) in New Zealand, Lemercier et al. (2008) in France and Reijneveld et al. (2010) in the Netherlands, and for carbon by Saby et al. (2008a) in France and Reijneveld et al. (2009) in the Netherlands. These studies assessed the change in soil test results with respect to land uses, cropping regimes and soil types. A spatial analysis of a soil test database performed by Baxter et al. (2006) in England and Wales contributed to designing future sampling approaches for monitoring soil properties at the national scale.

Conclusions drawn from these kinds of data may be subject to several sources of bias inherent in an uncontrolled sampling strategy. Farmers’ agronomic motives for soil testing may skew the data towards a larger proportion of extreme values, especially for trace element contents, because farmers are most likely to request trace element tests when they suspect a crop or animal deficiency or toxicity. Moreover, biases may arise from changes in sampling resolution in space and time.

In deciding upon the monitoring approach to be used in a SMN, managers must weigh the efficiency of purpose-built designs against the reduced costs and immediacy of change estimates offered by the other approaches. The benefits of a purpose-built design are likely to be greatest if the SMN has a long lifetime and is to be resampled on several occasions. If soil monitoring is required to quantify an immediate threat in the short term, then the use of existing soil observations becomes more important. If the resampling of an existing inventory is being proposed, then the suitability of the inventory design for soil monitoring must be assessed. If the initial inventory was a non-probability survey, then it will not be possible to apply design-based analyses to the SMN. Model-based analyses will require that the design of the inventory is adequate to estimate a model of the spatial variation in the change of key properties and to predict this change across the study region. The data from existing soil measurement exercises should only be used if they are considered to be representative of the underlying variability of the soil.

3.2 Within-Site Sampling

Site area and number of subsamples. When planning sampling in a site where a soil indicator is expected to change, it is necessary to know how many samples should be taken to demonstrate a given change and after how long this change will be detectable. At the site level, numerous studies have addressed these issues (Hungate et al. 1996; Garten and Wullschleger 1999; Conen et al. 2003, 2004; Saby and Arrouays 2004; Smith 2004). Arrouays et al. (2008a) reviewed within-site variability using data from the literature. One hundred and twenty references were collected, providing information about the short-range variability of soil indicators for sites with areas ranging from 1 m2 to 20 ha. The data were used to derive quantitative estimates of the mean variances, standard deviations and coefficients of variation for all available parameters. The authors examined the possible relationships between within-site variability and site area and/or mean values, and they found a strong relationship between the within-site variability of some parameters and the size of the site area. A marked relative increase in variability was observed for sites with areas >1 ha. This was particularly the case for some trace elements which are known to exhibit large spatial variations over quite short distances (Pb, Cd, Zn and Cu). In view of the increase in variability with site area and its implications for the number of samples that should be collected, they recommended using site areas not exceeding 1 ha to keep the number of subsamples practically feasible. If the aim of the SMN is to report the mean of an indicator over large scales such as soil-landscape units, then within-site variability is less important provided that the effect of this variability on the overall error of the mean is controlled, perhaps by forming a soil sample for analysis by aggregation of aliquots from across the site.
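The link between within-site variability and the number of subsamples can be sketched with the usual coefficient-of-variation formula n = (z·CV/E)², where E is the acceptable relative error of the site mean. The sketch assumes a normal approximation and independent, separately analysed subsamples. The CV values are hypothetical and merely stand in for the kind of literature estimates compiled by Arrouays et al. (2008a). The function name is illustrative.

```python
import math
from statistics import NormalDist


def subsamples_needed(cv_percent, rel_error_percent, confidence=0.95):
    """Number of subsamples so that the confidence-interval half-width of the
    site mean is within rel_error_percent of the mean (normal approximation,
    independent subsamples, each analysed separately)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * cv_percent / rel_error_percent) ** 2)


# Hypothetical within-site CVs: a smaller site vs a larger, more variable one
small_site = subsamples_needed(20.0, 10.0)
large_site = subsamples_needed(30.0, 10.0)
```

Since n scales with CV², the larger within-site variability of sites above about 1 ha quickly inflates the sampling effort.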

Due to resource constraints, most of the national monitoring sites are sampled using composite sampling, i.e. taking subsamples and bulking them. However, as has been stressed, studies of the subsampling error of monitoring sites are crucial for the interpretation of results and changes. In a study of results from the Swiss soil monitoring network (NABO), Desaules et al. (2010) showed that no certified trends can be stated after three measurement campaigns over a period of 10 years. Moreover, these authors stressed that the only way to detect reliable signals and trends earlier is to improve the overall measurement quality (precision and bias) and to shorten the sampling time interval.
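Composite sampling averages out field variability but not measurement error, which is one way to read the argument for improving measurement quality. The sketch below is a minimal illustration assuming perfect mixing, randomly placed aliquots and independent errors. Real composites are not perfectly mixed, and laboratory subsampling adds further error. All variances and the function name are hypothetical.

```python
def composite_variance(field_variance, n_aliquots, lab_variance, n_analyses=1):
    """Variance of a site value obtained from one composite sample.

    Assumes perfect mixing of n_aliquots randomly placed aliquots, so the
    field (short-range) variance is averaged down, while the laboratory
    variance is only reduced by replicate analyses of the composite.
    """
    return field_variance / n_aliquots + lab_variance / n_analyses


# Hypothetical variances (squared units of the indicator)
for k in (5, 25, 100):
    print(k, composite_variance(field_variance=4.0, n_aliquots=k, lab_variance=0.5))
```

Beyond a modest number of aliquots the laboratory term dominates, so further gains in the ability to detect change have to come from better analytical precision or more replicate analyses.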

Sampling depth. In their review of European SMNs, Arrouays et al. (2008a) and Morvan et al. (2008) showed that fixed-depth increments are predominantly used for core sampling (in more than 70% of the SMNs). This sampling method ensures standardisation between sites. It is also the most relevant for some anthropogenic characteristics (e.g. anthropogenic heavy metals, radionuclides, organo-chemicals) and for properties showing a strong gradient near the soil surface where the soil is often sampled over smaller increments.

Pedogenic horizons are often sampled in soil pits outside the monitoring site but close to it. This method of sampling is relevant for some parameters (e.g. particle-size distribution, water retention properties, mineralogy). Horizons are also the most relevant units for linking SMN observations to geographical soil information systems derived from soil mapping activities.

For nearly all the SMNs, the organic layers at the soil surface are sampled separately from the underlying organo-mineral soil, and this is our recommendation.

For organo-mineral layers, we recommend adoption of systematic depths in order to avoid subjectivity in sampling, harmonise sampling protocols and facilitate comparisons between SMNs.

The best practice would be to sample both by depth increments within the site and by pedogenetic horizons in soil pits outside the monitoring area but close to it. Arrouays et al. (2008a) examined, for each European SMN, the depths to which indicators are measured or can be calculated, and found that these were highly variable amongst SMNs.

Another way to compare vertical sampling is to determine, for each SMN, the maximum depth sampled. About 90% of the SMNs provided information down to 20 cm, whereas nearly 65% reached at least 30 cm.

It is very difficult to recommend sampling depths that should be adopted by all SMNs. Moreover, there may be good reasons for accepting a particular depth in a particular SMN, and changing the systematic depths of a national SMN might, in some cases, make it very difficult to use data from previous campaigns to assess changes. For example, data for indicators based on a 0–15 cm sampling depth cannot be compared directly with those for the same indicators based on a 0–30 cm resampling depth. One way to harmonise reporting at the international level could be to report the results on the basis of an equivalent mineral mass (Ellert and Bettany 1995). However, this would require the determination of bulk density at all sites and at each sampling date. General considerations about using soil depth functions or horizons and classes are given in Chap. 9.
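The equivalent-mass idea can be sketched as follows. SOC stocks are accumulated down the profile until a fixed reference soil mass is reached, rather than a fixed depth. This is a simplified illustration, not the exact procedure of Ellert and Bettany (1995). It assumes no stones and SOC uniform within each sampled layer, so the last layer is prorated by mass; published implementations differ in how they interpolate within layers. The profiles and function name are hypothetical.

```python
def soc_stock_equivalent_mass(layers, ref_mass):
    """SOC stock (Mg C/ha) in the upper `ref_mass` Mg/ha of mineral soil.

    layers: list of (thickness_cm, bulk_density_g_cm3, soc_g_kg), top down.
    SOC is assumed uniform within a layer, so a partial layer is prorated by mass.
    """
    stock = 0.0
    remaining = ref_mass
    for thickness, bulk_density, soc in layers:
        mass = thickness * bulk_density * 100.0  # Mg soil per ha in this layer
        used = min(mass, remaining)
        stock += used * soc / 1000.0  # Mg C per ha
        remaining -= used
        if remaining <= 0.0:
            return stock
    raise ValueError("profile is too shallow for the reference mass")


before = [(30.0, 1.30, 15.0), (20.0, 1.40, 8.0)]  # 0-30 cm, 30-50 cm
after = [(30.0, 1.45, 15.0), (20.0, 1.50, 8.0)]  # same SOC contents, compacted
ref_mass = 30.0 * 1.30 * 100.0  # mass of the original 0-30 cm layer, Mg/ha
stock_before = soc_stock_equivalent_mass(before, ref_mass)
stock_after = soc_stock_equivalent_mass(after, ref_mass)
```

In this hypothetical case a fixed 0–30 cm calculation would report a gain (58.5 to 65.25 Mg C/ha) caused only by compaction. The equivalent-mass stocks are identical.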

3.3 Resampling the SMN

One objective when resampling should be to replicate as closely as possible the original sampling methodology and location. This requires that the original methodology is documented completely, but even when this is done, it is likely that variation in detailed procedures will occur, for example, due to differences in practice between different operators. This extends to laboratory testing as well as field sampling. While the availability of the global positioning system (GPS), especially if this incorporates a ground station, means that the longitude and latitude of sampling locations can be precisely recorded and repeat sampling can be exactly located, this does not extend to altitude, and very often the soil surface has been altered and sometimes eroded, leading to uncertainty in the equivalence of sampling exercises. Deviations from sampling and analytical protocols and location errors are likely to be confounded with actual spatial and temporal variation in the indicator being monitored. When making in situ measurements, it is possible in principle to resample a specific location and the soil within it, but where a sample is extracted for laboratory testing, this is clearly impossible. In the latter case, it is essential to establish an adequate sampling scheme that can be applied rigorously at each sampling location, for example, by establishing a grid and removing samples from randomly determined locations at each sampling exercise.
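The grid-plus-random-locations scheme can be planned in advance so that no cell is disturbed twice. The sketch below assumes a site divided into equal cells, for example a 100 m × 100 m site split into 10 × 10 cells. Shuffling once and slicing is equivalent to successive random draws without replacement. The function name and layout are hypothetical.

```python
import random


def plan_campaigns(n_rows, n_cols, per_campaign, n_campaigns, seed=None):
    """Assign distinct grid cells within a site to successive campaigns.

    Each campaign receives `per_campaign` cells drawn at random from those
    not yet sampled, so no cell is disturbed twice.
    """
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    if per_campaign * n_campaigns > len(cells):
        raise ValueError("not enough cells for the planned campaigns")
    rng = random.Random(seed)
    rng.shuffle(cells)
    return [cells[i * per_campaign:(i + 1) * per_campaign] for i in range(n_campaigns)]


# Hypothetical 10 x 10 cell site, 25 cells per campaign, 4 campaigns
plan = plan_campaigns(10, 10, 25, 4, seed=1)
```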

4 Statistical Inference Issues

4.1 Design-Based or Geostatistical Methods

The variation of soil properties is very complex since soil is affected by many processes acting over different spatial and temporal scales. Local factors such as geological anomalies or anthropogenic pollution can distort and disguise the underlying relationships of interest. Therefore statistical analyses are required to test the significance of relationships and to determine the uncertainty associated with estimates and predictions. We described previously how some SMNs such as the Countryside Survey of Great Britain (Firbank et al. 2003) are based on probability sampling, whereas others such as the French National Soil Monitoring Network are based on purposive designs.

There are different statistical methods associated with these different types of designs. Design-based analyses, which are reviewed by Barnett (2002) and de Gruijter et al. (2006), are associated with probability designs. These are well-established statistical techniques which can estimate summary statistics such as the mean, median or probability density function (PDF) of a soil property across the entire study region or a portion of it. They can be used to understand the underlying behaviour in the region and to test hypotheses about the effect of particular factors or threats. An estimation variance is also calculated that quantifies the uncertainty associated with these estimates. Design-based methods can account for different stratifications of the data, compare different temporal phases of SMNs and determine whether a soil property has changed significantly between phases. Kravchenko and Robertson (2011) stressed the importance of performing power analyses before sampling, to predict the sampling requirements, and after sampling, to determine whether observed changes are significant and exactly what can be inferred from the absence of a significant change.
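A power calculation of this kind can be sketched with a normal approximation. Given the standard error of an estimated mean change, for instance from a design-based estimator, the code gives the probability that a two-sided test detects a true change of a given size. The normal approximation and the example numbers are assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist


def power_of_change_test(delta, se, alpha=0.05):
    """Approximate power of a two-sided z-test for a true mean change `delta`
    whose estimate has standard error `se`."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / se
    return std_normal.cdf(shift - critical) + std_normal.cdf(-shift - critical)


# Hypothetical: true change of 1.0 unit, standard error of the estimate 0.8
power = power_of_change_test(1.0, 0.8)
```

After sampling, a low power for a change of practical interest means that a non-significant result says little about whether the soil has changed.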

Soil monitoring networks based on purposive designs are generally analysed by geostatistical techniques which can be used to make local predictions and quantify uncertainties at any site of interest (see Chaps. 10, 11 and 14). Many of the geostatistical methodologies commonly used today can be directly attributed to Matheron (1965) and his analyses of the spatial variation of ore bodies. These methodologies are based on a statistical model known as the variogram, which is fitted to the available data and describes the pattern of spatial variation of the observed variable (see Chap. 21). The fitted variogram is used in kriging (Krige 1966; Chap. 10) to predict the variable across the region and to calculate a measure of the uncertainty associated with the predictions, known as the kriging variance. These methodologies, which are introduced in an accessible manner by Webster and Oliver (2007), rely upon a number of assumptions about the statistical distribution of the soil indicator and the regularity of its variation. Brus and de Gruijter (1997) consider this a disadvantage of geostatistical methods, since the inferences made from them will be invalid when these assumptions are not appropriate. In contrast, design-based methods do not fit models to the data but instead base inferences on the sample design and the probability that a point is included in the sample. The development of techniques to analyse SMNs which are sampled spatially by probability designs but temporally by non-probability designs is an active area of research (Brus et al. 2010).
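The first step of such an analysis is usually the method-of-moments (Matheron) estimate of the variogram, γ(h) = 1/(2N(h)) Σ [z(xᵢ) − z(xⱼ)]², computed for pairs of observations whose separation falls in each lag bin. The sketch below shows only that estimator. Fitting a variogram model to the estimates, by weighted least squares or likelihood methods, is a further step that is not shown. The function name and bin edges are illustrative.

```python
import numpy as np


def empirical_variogram(coords, values, bin_edges):
    """Matheron's method-of-moments variogram estimator.

    gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) over pairs whose separation
    falls in each lag bin (lo, hi]; returns estimates and pair counts.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # each pair counted once
    lags = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    gamma, counts = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (lags > lo) & (lags <= hi)
        n_pairs = int(mask.sum())
        counts.append(n_pairs)
        gamma.append(sq_diff[mask].sum() / (2 * n_pairs) if n_pairs else np.nan)
    return np.array(gamma), np.array(counts)
```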

A major challenge associated with geostatistical techniques, but not design-based techniques, is to ensure that the model of variation is appropriate everywhere in the study region. In the remainder of this section, we focus upon this challenge in various circumstances that might not be consistent with the standard geostatistical model.

4.2 Generalising the Geostatistical Model

Often the variation of a soil property is sufficiently consistent with the assumptions of standard geostatistical models for the methods of Matheron (1965) to produce adequate results. However, in general, the variation of soil indicators is much more complex. Therefore, since the 1960s, many methods have been proposed to generalise the geostatistical model so that, for example, the expectation (Lark et al. 2006b) or variance (Marchant et al. 2009) of the indicator can vary according to covariates such as soil type. Furthermore, in some situations, the kriging variance does not give a sufficiently complete description of the uncertainty associated with the SMN, and model-based geostatistical methods have been introduced to predict the entire PDF of the soil indicator at each site (see Chap. 11). The PDF can then be interrogated to answer specific questions such as “What is the probability that the concentration of the soil indicator exceeds the regulatory threshold?” or “What is the probability that the concentration of the soil indicator has increased?”
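In the simplest case the predictive PDF at a site is taken to be Gaussian, with the kriging prediction as its mean and the kriging variance as its variance, possibly on a log scale. Both questions above then reduce to a normal tail probability. This Gaussian (or lognormal) assumption is strong and is exactly what the more general methods of Chap. 11 relax. The function name and numbers below are hypothetical.

```python
import math
from statistics import NormalDist


def prob_exceed(pred_mean, pred_var, threshold, log_scale=False):
    """P(Z > threshold) for a Gaussian predictive distribution.

    If log_scale is True, pred_mean and pred_var refer to log(Z), so the
    threshold is log-transformed before comparison.
    """
    t = math.log(threshold) if log_scale else threshold
    return 1.0 - NormalDist(pred_mean, math.sqrt(pred_var)).cdf(t)


# Hypothetical: probability that a predicted change (mean 0.3, variance 0.04) is an increase
p_increase = prob_exceed(0.3, 0.04, 0.0)
```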

The geostatistical analysis of national-scale SMNs can be particularly challenging. The vast area covered by these surveys means that the observed variation is the combined effect of processes acting over disparate spatial scales. The number of observations means that efficient computational methods are required to ensure that the statistical analysis is tractable.

4.3 Extreme Observations

Isolated geological anomalies or pollution can lead to outliers or extreme values amongst SMN observations. Outliers are not consistent with standard geostatistical models and can lead to the underlying uncertainty in the SMN being overestimated. Standard kriging methods can also exaggerate the spatial extent of hotspots around outliers. In studies of trace elements, this can mean that excessive remediation is conducted or that areas of potential deficiency are missed. Marchant et al. (2010) addressed this issue in a study of cadmium variation across France. They used robust geostatistical methods, which reduce the influence of outliers, to fit their models of variation. These models were used to identify outliers, which were censored prior to kriging (Hawkins and Cressie 1984). Their methodology separated the variation into geological, diffuse and anomalous components, so that the underlying relationships could be investigated. When these methods were applied to a wider group of trace elements (Saby et al. 2011), soil experts were able to identify the processes contributing to variation at each scale. Figure 22.1 shows how the variation of lead across France is divided between these scales.
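The effect of a single outlier on variogram estimation can be illustrated by comparing the classical Matheron estimator with one well-known robust alternative, the Cressie-Hawkins estimator, which averages the square roots of absolute differences before raising the result to the fourth power. This is a minimal sketch on synthetic data with hypothetical function names; the specific robust estimators and fitting procedure of Marchant et al. (2010) are not reproduced here.

```python
import numpy as np


def lag_abs_differences(coords, z, lo, hi):
    """|z_i - z_j| for all pairs of sites whose separation lies in [lo, hi)."""
    iu = np.triu_indices(len(z), k=1)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)[iu]
    absdiff = np.abs(z[:, None] - z[None, :])[iu]
    return absdiff[(dist >= lo) & (dist < hi)]


def matheron(d):
    """Classical semivariance: mean squared difference / 2."""
    return np.mean(d ** 2) / 2.0


def cressie_hawkins(d):
    """Cressie-Hawkins robust semivariance from absolute pair differences d."""
    n = d.size
    # Fourth power of the mean square-root difference, with the usual bias correction
    return np.mean(np.sqrt(d)) ** 4 / (0.457 + 0.494 / n) / 2.0


# Synthetic transect of 100 sites, then the same data with one large outlier
rng = np.random.default_rng(1)
coords = np.column_stack([np.arange(100.0), np.zeros(100)])
z_clean = rng.normal(0.0, 1.0, size=100)
z_outlier = z_clean.copy()
z_outlier[50] += 20.0

for label, z in [("clean", z_clean), ("outlier", z_outlier)]:
    d = lag_abs_differences(coords, z, 0.5, 1.5)  # nearest-neighbour pairs
    print(label, matheron(d), cressie_hawkins(d))
```

The outlier enters only two of the 99 nearest-neighbour pairs, yet it inflates the Matheron estimate several-fold, whereas the robust estimate changes much less.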

Fig. 22.1

Components of lead variation at the geological (a), diffuse (b) and anomalous localised (c) spatial scales estimated by robust geostatistical methods (Figure reprinted from Arrouays D, Marchant BP, Saby NPA, Meersmans J, Orton TG, Martin MP, Bellamy PH, Lark RM, Kibblewhite M (2012) Generic issues on broad-scale soil monitoring schemes: A review. Pedosphere 22(4):456–469)

Robust methods are not appropriate for compliance monitoring, where the risk of extreme values must be included and a model that can accommodate them is therefore required. Marchant et al. (2011a) demonstrated that copula-based methods can accommodate general statistical distributions, including the extreme value distribution. The PDF of the indicator of interest can then be calculated at any site in the study region, and any relevant measure, such as the probability of exceeding a threshold, can be determined. They used this model to map the probability of cadmium exceeding a regulatory threshold of 0.8 mg kg−1 within France (Fig. 22.2).
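One common way to build such a model is a Gaussian copula: each observation is transformed to a normal score through the marginal cumulative distribution function (CDF), spatial dependence is modelled on the normal scores, and the conditional distribution at a site is transformed back. The sketch below assumes a Gaussian copula with a generalised extreme value (GEV) marginal, and takes the conditional mean and standard deviation of the normal score at the site as given. The copula family, parameter values and function name are assumptions for illustration and are not necessarily those of Marchant et al. (2011a).

```python
from scipy.stats import genextreme, norm


def exceedance_probability(threshold, marginal, cond_mean, cond_sd):
    """P(Z > threshold) at a site under a Gaussian copula model.

    marginal  : frozen scipy.stats distribution of the indicator (any continuous CDF)
    cond_mean : conditional mean of the normal score at the site
    cond_sd   : conditional s.d. of the normal score at the site
    (in a full analysis these two would come from kriging the normal scores)
    """
    threshold_score = norm.ppf(marginal.cdf(threshold))  # threshold on the normal-score scale
    return norm.sf((threshold_score - cond_mean) / cond_sd)


# Hypothetical GEV marginal for Cd in mg/kg. scipy's shape c is minus the usual xi,
# so c < 0 gives a heavy upper tail; these values put the lower bound at 0.
cd_marginal = genextreme(c=-0.2, loc=0.25, scale=0.05)

print(exceedance_probability(0.8, cd_marginal, cond_mean=0.0, cond_sd=1.0))  # no local information
print(exceedance_probability(0.8, cd_marginal, cond_mean=1.5, cond_sd=0.5))  # site near high values
```

With no local information (conditional mean 0, standard deviation 1) the result reduces to the marginal exceedance probability; conditioning on nearby high values shifts and narrows the normal-score distribution and raises the probability.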

Fig. 22.2

Map of the probability that the regulatory cadmium threshold of 0.8 mg kg−1 is exceeded and (inset) PDF (probability density function) for cadmium at specified sites. Predictions are derived from a copula-based model (Figure reprinted from Arrouays D, Marchant BP, Saby NPA, Meersmans J, Orton TG, Martin MP, Bellamy PH, Lark RM, Kibblewhite M (2012) Generic issues on broad-scale soil monitoring schemes: A review. Pedosphere 22(4):456–469)

4.4 Different Sources of Uncertainty

Geostatistical methods can be used to quantify the uncertainty that results from predicting spatial maps from observed data. However, the data obtained from an SMN include other sources of uncertainty (see Chap. 14). In the field, there may be errors in locating observation sites. In the laboratory, there may be measurement error; for some trace elements, many observed values might be less than the detection limit, meaning that they cannot be distinguished from zero. Our discussion focuses on continuous data such as concentrations of elements in the soil, but noncontinuous types of data such as radioactive emission counts require the uncertainty to be expressed in a different manner. There can also be errors in estimating the spatial models. All of these components of uncertainty should be understood if we are to fully appreciate the total uncertainty of a predicted map (see Sect. 14.4.5). Rawlins et al. (2009) considered the relative influence of errors from different sources and the strategies that exist to isolate them. In large-scale SMNs that include many observations, the effects of these uncertainties might well be negligible. However, it is prudent to confirm that this is the case.

4.5 Location Uncertainty

Cressie and Kornak (2003) reviewed methods that account for location errors and suggested novel kriging equations which included such errors. Area-to-point kriging (Kyriakidis 2004) can be used to incorporate the uncertainty that results from data that are averaged over geographical units of varying sizes. The method is based on the assumption that the covariance between any two areal data units is the average of point-to-point covariances between the two units; this assumption allows a point-to-point covariance function to be fitted to represent the variation of the areal data, which can then be used to calculate the area-to-point kriging predictions.
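The regularisation step behind this assumption, computing the covariance between two areal units (or between an areal unit and a point) by averaging point-to-point covariances over discretised units, can be sketched as follows. This is a minimal illustration assuming rectangular units discretised on a regular grid and an exponential point covariance with fixed, hypothetical parameters; it does not fit the point covariance function or solve the full area-to-point kriging system described by Kyriakidis (2004).

```python
import numpy as np


def exp_cov(h, sill=1.0, a=10.0):
    """Exponential point-to-point covariance (parameter values are assumptions)."""
    return sill * np.exp(-h / a)


def discretise_rect(x0, y0, width, height, n=10):
    """Centres of an n x n grid of cells covering a rectangular areal unit."""
    xs = x0 + (np.arange(n) + 0.5) * width / n
    ys = y0 + (np.arange(n) + 0.5) * height / n
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])


def area_to_area_cov(pts_a, pts_b, cov=exp_cov):
    """Covariance between two areal units: average of all point-to-point covariances."""
    h = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    return cov(h).mean()


def area_to_point_cov(pts_a, point, cov=exp_cov):
    """Covariance between an areal unit and a single point."""
    return cov(np.linalg.norm(pts_a - point, axis=1)).mean()


unit_a = discretise_rect(0.0, 0.0, 5.0, 5.0)
unit_b = discretise_rect(20.0, 0.0, 5.0, 5.0)
print(area_to_area_cov(unit_a, unit_a), area_to_area_cov(unit_a, unit_b))
print(area_to_point_cov(unit_a, np.array([2.5, 2.5])))
```

The covariance of a unit with itself is smaller than the point sill, reflecting the smoothing that averaging over an area introduces; this is why a point-support covariance function cannot simply be fitted directly to areal data.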

4.6 Measurement Error and Detection Limit Data

Laboratory errors can be estimated if repeated measurements are made at a small number of sites in a survey (Marchant et al. 2011b). Orton et al. (2009) used information from such repeated measurements to define a simple Gaussian measurement error model, and this was combined with the effects of the micro-scale field variation (which was assumed to be Gaussian on a log scale) using a Bayesian hierarchical modelling approach (Banerjee et al. 2004).
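For example, if the measurement error is assumed to be Gaussian with the same variance at every site, that variance can be estimated by pooling the within-site variances of the repeated measurements; for duplicates this reduces to the sum of squared differences divided by twice the number of pairs. The sketch below shows only this one step, with a hypothetical function name and made-up replicate values; it is not the Bayesian hierarchical model of Orton et al. (2009).

```python
import numpy as np


def pooled_within_site_variance(replicates):
    """Pooled measurement-error variance from repeated analyses at a few sites.

    replicates: one sequence of repeat measurements per site (sites with a
    single measurement contribute nothing). Assumes the error variance is the
    same at every site.
    """
    sum_sq, dof = 0.0, 0
    for values in replicates:
        v = np.asarray(values, dtype=float)
        if v.size < 2:
            continue
        sum_sq += np.sum((v - v.mean()) ** 2)
        dof += v.size - 1
    return sum_sq / dof


# Hypothetical duplicate and triplicate analyses (e.g. log-transformed concentrations)
reps = [[1.02, 0.98], [1.50, 1.46, 1.55], [0.71, 0.69]]
sigma2 = pooled_within_site_variance(reps)
print(sigma2, np.sqrt(sigma2))
```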

When laboratory measurements are reported as being less than a detection limit (DL), it is important to consider how they should be included in a spatial analysis. Commonly, measurements below the DL are incorporated in the analysis by replacing them with some function of the limit (e.g. DL/2). Although this approach is simple and allows analysis through the standard variogram estimation and kriging methods, Helsel (2006) observed that it can lead to biased estimates of the mean and variance. De Oliveira (2005) and Fridley and Dixon (2007) used data augmentation in the Bayesian framework to incorporate DL data in geostatistical prediction: the DL data were replaced by values sampled from below the DL using a Markov chain Monte Carlo method, and their uncertainty and effect on variogram estimation and prediction were determined.
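The bias of substitution, and one way to avoid it, can be illustrated without the spatial component. The sketch below compares DL/2 substitution with maximum likelihood estimation for left-censored lognormal data, in which each non-detect contributes the probability of lying below the DL rather than an imputed value. This is a simpler, non-Bayesian alternative named here for illustration, not the data-augmentation method of De Oliveira (2005) or Fridley and Dixon (2007); the data are synthetic, a single DL is assumed for all samples, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def censored_lognormal_mle(reported, detected, dl):
    """ML estimates of the mean and s.d. of log(Z) when values below dl are censored.

    reported : array of concentrations (entries where detected is False are ignored)
    detected : boolean array, True where the value was at or above dl
    dl       : a single detection limit (assumed common to all samples)
    """
    logs = np.log(reported[detected])
    n_censored = np.count_nonzero(~detected)
    log_dl = np.log(dl)

    def neg_log_lik(params):
        mu, log_sd = params
        sd = np.exp(log_sd)
        ll = norm.logpdf(logs, mu, sd).sum()            # detected values: density
        ll += n_censored * norm.logcdf(log_dl, mu, sd)  # non-detects: P(log Z < log DL)
        return -ll

    start = np.array([logs.mean(), np.log(logs.std() + 1e-6)])
    result = minimize(neg_log_lik, start, method="Nelder-Mead")
    return result.x[0], float(np.exp(result.x[1]))


# Synthetic lognormal data with log-mean 0 and log-s.d. 1; about 70% fall below the DL
rng = np.random.default_rng(42)
true_values = rng.lognormal(mean=0.0, sigma=1.0, size=2000)
dl = np.exp(0.5)
detected = true_values >= dl
reported = np.where(detected, true_values, np.nan)

mu_sub = np.log(np.where(detected, reported, dl / 2)).mean()  # DL/2 substitution
mu_mle, sd_mle = censored_lognormal_mle(reported, detected, dl)
print(mu_sub, mu_mle, sd_mle)
```

With this much censoring, the substitution estimate of the log-mean is noticeably biased, whereas the censored-likelihood estimate should lie close to the true value of 0. A spatial version would replace the independent normal likelihood with that of a censored Gaussian random field.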

4.7 Other Forms of Data

Noncontinuous soil indicators, such as emission counts of radioactive material from the soil or the presence or absence of certain bacteria, can be observed in SMNs. In such cases, interest will typically lie in a quantity that cannot be measured directly: the underlying true quantity of the radioactive contaminant in the soil or the probability that the bacteria are present at each location. Although we can proceed with the analysis as if the measured quantity were our primary focus (e.g. by indicator kriging for binary data), uncertainty in such cases can be more appropriately described by a statistical description of the data-generating mechanism: count data can often be described well by a Poisson distribution, and binary data by a binomial distribution. For describing the uncertainty resulting from such data, the generalised linear mixed model (GLMM; Diggle and Ribeiro 2007) and Bayesian hierarchical modelling approaches (Banerjee et al. 2004) provide powerful extensions of the classical kriging methods (see Sects. 11.3 and 11.4). They provide more flexible statistical representations of the data than the classical approaches, so that the processes that gave rise to the measurements can be modelled more accurately. These general methods appear to offer significant opportunities for representing uncertainty in geostatistical analyses of these types of SMN data.
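The Poisson data model with a log link, which sits at the core of a GLMM for count data, can be sketched as follows. For brevity this fits only the fixed effects (that is, a generalised linear model) by iteratively reweighted least squares; a geostatistical GLMM would add a spatially correlated Gaussian random effect to the linear predictor, which requires more elaborate fitting (e.g. by Markov chain Monte Carlo, as in the Bayesian approaches cited). The data, the covariate and the function name are hypothetical.

```python
import numpy as np


def poisson_glm_irls(X, y, n_iter=50, tol=1e-10):
    """Fit log E[y] = X @ beta to Poisson counts by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response for the log link
        w = mu                   # working weights: Var(y) = mu and d(mu)/d(eta) = mu
        xtw = X.T * w            # X' W without forming the diagonal matrix
        beta_new = np.linalg.solve(xtw @ X, xtw @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Synthetic counts whose log-intensity depends linearly on one hypothetical covariate
rng = np.random.default_rng(7)
covariate = rng.uniform(0.0, 2.0, size=5000)
X = np.column_stack([np.ones_like(covariate), covariate])
counts = rng.poisson(np.exp(1.0 + 0.5 * covariate))
print(poisson_glm_irls(X, counts))
```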

4.8 Uncertainty in Estimating Spatial Models

Parameters in spatial models can usually be separated into two sets: those that represent the expected value or a trend of the primary variable, and those that represent its variance and correlation in space or time. Since Matheron’s work (Matheron 1965), the uncertainty about trend parameters has been accounted for through the ordinary or universal kriging methodologies. For the variance parameters, however, kriging methods have adopted a plug-in approach: first the parameters are estimated, and then the estimated values are plugged into the kriging equations to calculate the prediction and the associated kriging variance. Hence the uncertainty of the estimated spatial model is ignored. Marchant and Lark (2004) used the Fisher information matrix to include variance parameter uncertainty in the resulting spatial predictions within a maximum likelihood framework. Bayesian methods fully incorporate the effects of variance parameter uncertainty through Markov chain Monte Carlo methods (Minasny et al. 2011) (see Sect. 14.4.2).
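The difference between the plug-in approach and one that accounts for variance-parameter uncertainty can be sketched with Monte Carlo draws of the parameters, such as those an MCMC sampler would produce: kriging is repeated for each draw, and the results are combined with the law of total variance. This is a minimal, hypothetical illustration (simple kriging with a known zero mean on a one-dimensional transect, an exponential covariance and made-up parameter draws); it is not the Fisher-information approach of Marchant and Lark (2004).

```python
import numpy as np


def simple_kriging_1d(x, z, x0, sill, a):
    """Simple kriging with known zero mean and exponential covariance on a transect."""
    def cov(h):
        return sill * np.exp(-np.abs(h) / a)

    c = cov(x[:, None] - x[None, :])
    c0 = cov(x - x0)
    w = np.linalg.solve(c, c0)
    return w @ z, sill - w @ c0


def predict_over_parameter_draws(x, z, x0, sill_draws, range_draws):
    """Mean and variance of the prediction averaged over draws of the variance parameters.

    Uses the law of total variance: Var = E[kriging variance] + Var[kriging prediction].
    """
    results = np.array([simple_kriging_1d(x, z, x0, s, a)
                        for s, a in zip(sill_draws, range_draws)])
    preds, variances = results[:, 0], results[:, 1]
    return preds.mean(), variances.mean() + preds.var()


x = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
z = np.array([0.5, -0.2, 1.0, 0.3, -0.4])
plug_in = simple_kriging_1d(x, z, 4.5, sill=1.0, a=2.0)  # point estimates only
rng = np.random.default_rng(3)
sills = rng.lognormal(np.log(1.0), 0.3, size=500)  # hypothetical posterior draws
ranges = rng.lognormal(np.log(2.0), 0.4, size=500)
print(plug_in, predict_over_parameter_draws(x, z, 4.5, sills, ranges))
```

If every draw were identical, the second function would return exactly the plug-in result; spread in the draws adds the between-draw variance of the predictions, which the plug-in approach omits.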

4.9 Inclusion of Temporal Variation

After more than one phase of an SMN has been completed, the model of variation must be modified to quantify temporal variation in addition to spatial variation. Kriging algorithms can then be used to map the change in indicators across the study region. Different spatio-temporal models have been applied in existing monitoring surveys. De Cesare et al. (2001) reviewed the use of space-time covariance models. Papritz and Flühler (1994) suggested that different phases of a survey can be treated as coregionalised variables, and Lark et al. (2006a) used robustly estimated coregionalisation models to determine the sampling requirements for mapping change in metals in part of eastern England. Bellamy et al. (2005) included the rate of change of SOC as a parameter in their model of variation. The challenge is to determine the most appropriate model for a particular SMN. It is important to validate the model once it has been fitted, so that deficiencies can be identified and strategies introduced to rectify them. For example, in a monitoring survey of phosphorus enrichment in the Florida Everglades, Marchant et al. (2009) found that phosphorus was more variable in parts of the study region adjacent to pumping stations that discharge agricultural runoff. They used remotely sensed data on dominant vegetation to identify these regions automatically and adjusted their model to accommodate the larger variability.
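As one concrete example of a space-time covariance model of the kind reviewed by De Cesare et al. (2001), the product-sum model combines a spatial covariance C_s and a temporal covariance C_t, and the variance of the change between two survey phases at a site follows directly from it. The sketch assumes exponential components with unit sill and hypothetical ranges and weights; it is an illustration, not a model fitted to any SMN.

```python
import numpy as np


def product_sum_cov(h, u, k1, k2, k3, range_s=10.0, range_t=5.0):
    """Product-sum space-time covariance C(h, u) = k1*Cs*Ct + k2*Cs + k3*Ct.

    Cs and Ct are unit-sill exponential covariances in space (lag h) and time (lag u).
    With k1 > 0 and k2, k3 >= 0, C is a valid covariance, because sums and
    products of valid covariances with non-negative weights remain valid.
    """
    cs = np.exp(-np.abs(h) / range_s)
    ct = np.exp(-np.abs(u) / range_t)
    return k1 * cs * ct + k2 * cs + k3 * ct


def change_variance(dt, **params):
    """Variance of Z(s, t + dt) - Z(s, t) at a single site: 2 * (C(0, 0) - C(0, dt))."""
    return 2.0 * (product_sum_cov(0.0, 0.0, **params) - product_sum_cov(0.0, dt, **params))


params = dict(k1=0.5, k2=0.3, k3=0.2)  # hypothetical weights
for dt in (1.0, 5.0, 20.0):
    print(dt, change_variance(dt, **params))
```

The k2 term is purely spatial and persists over time, so it cancels in the difference; only the temporally varying components contribute to the variance of change, which grows with the time between phases.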

5 Laboratory Testing Methods

The question of which soil measurement methods to recommend is complex. Most countries have long-established SMNs that use specific testing methods. Changing to a different method would impede comparison with previous results unless parallel analyses are performed using both the national and the new reference methods. An important example relates to the assessment of global SOC stocks or changes, for which the analytical methods often differ in space or time. For example, modern analytical methods such as dry combustion might be used instead of the more common Walkley and Black method (Meersmans et al. 2009). Correction factors are therefore needed to avoid methodological bias when comparing SOC data from sampling campaigns that used different analytical procedures (e.g. Jolivet et al. 1998; Lettens et al. 2007; Meersmans et al. 2009).
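Where archived samples have been analysed by both methods, a correction factor can be estimated from the paired results. The sketch below assumes a purely proportional relationship (least-squares regression through the origin) and treats the older method as error-free; the data, names and resulting factor are hypothetical and are not the published factors of the studies cited. If the bias is not proportional, an intercept or a nonlinear relationship may be needed, and substantial error in both methods calls for an errors-in-variables regression.

```python
import numpy as np


def conversion_factor(method_a, method_b):
    """Estimate f in b ≈ f * a by least squares through the origin, with its standard error.

    method_a, method_b: paired results for the same samples analysed by both methods.
    Assumes a purely proportional bias and negligible error in method_a.
    """
    a = np.asarray(method_a, dtype=float)
    b = np.asarray(method_b, dtype=float)
    f = np.sum(a * b) / np.sum(a * a)
    residuals = b - f * a
    se = np.sqrt(np.sum(residuals ** 2) / (a.size - 1) / np.sum(a * a))
    return f, se


# Hypothetical paired SOC results (g/kg) for archived samples analysed by both methods
walkley_black = [8.1, 12.4, 15.0, 21.3, 30.2, 9.7]
dry_combustion = [10.3, 16.0, 19.1, 27.8, 38.9, 12.5]
f, se = conversion_factor(walkley_black, dry_combustion)
harmonised = f * np.asarray(walkley_black)  # older results on the dry-combustion scale
print(f, se)
```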

Arrouays et al. (2008a) reported information on soil testing techniques gathered by partners from all the European member states. They found that, in some cases, the applied test procedures were not sufficiently detailed; the information provided was often very vague, even after several requests, with partners reporting only the type of extractant or equipment used. Nevertheless, for the SMNs for which this information was available, the testing methods showed numerous differences, indicating that the use of international standards (where they exist) is far from common. Indeed, as international standards are still lacking for numerous soil analyses, standardisation will be one of the main issues in setting up an SMN at an international level. Clearly, there is a widespread need to agree on testing methods and to ensure that these are validated and conducted so as to produce data of known and documented quality.

As a minimum, the following are essential for each testing method employed: a fully documented procedure with details of calibration methods that ensure traceability to international standards; data on the repeatability (within-batch error) and reproducibility (between-batch error) of the method, based on repeated analysis of standard samples (preferably certified reference materials); and a detection limit for the testing method, based on an agreed multiple of the standard error for whole-procedure blanks. In addition, to support continuing quality assurance, standard samples should be analysed within each batch and the results examined using statistical process control charts. Participation in inter-laboratory proficiency exercises is critical. Although using a single laboratory to test all samples ensures consistency in the quality of results, it does not guarantee adequate quality; in this case, as when several laboratories participate in testing, it is imperative that inter-laboratory comparisons are conducted to support and demonstrate sufficient comparability between laboratories.
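Several of these quality measures have simple standard estimators. The sketch below assumes a balanced design in which the same reference sample is analysed the same number of times in each batch. It estimates within- and between-batch variance components by one-way analysis of variance (here the reproducibility standard deviation is taken to include both components), sets Shewhart control limits at the mean ± 3 standard deviations, and takes the detection limit as 3 times the standard deviation of blank results. The multiple of 3 and all function names are assumptions for illustration; the agreed multiple and the exact definitions used in practice may differ.

```python
import numpy as np


def repeatability_reproducibility(batches):
    """Within-batch (repeatability) and within-plus-between-batch (reproducibility) s.d.

    batches: 2-D array, one row per batch, each row holding repeat analyses of the
    same reference sample; a balanced design (equal repeats per batch) is assumed.
    """
    data = np.asarray(batches, dtype=float)
    n_repeats = data.shape[1]
    ms_within = data.var(axis=1, ddof=1).mean()                   # within-batch mean square
    ms_between = n_repeats * data.mean(axis=1).var(ddof=1)        # between-batch mean square
    var_between = max((ms_between - ms_within) / n_repeats, 0.0)  # one-way ANOVA estimate
    return np.sqrt(ms_within), np.sqrt(ms_within + var_between)


def shewhart_limits(reference_results, k=3.0):
    """Lower limit, centre line and upper limit (mean ± k s.d.) for a control chart."""
    x = np.asarray(reference_results, dtype=float)
    centre, sd = x.mean(), x.std(ddof=1)
    return centre - k * sd, centre, centre + k * sd


def detection_limit(blank_results, multiple=3.0):
    """Detection limit as an agreed multiple of the s.d. of whole-procedure blanks."""
    return multiple * np.std(np.asarray(blank_results, dtype=float), ddof=1)


def out_of_control(results, limits):
    """Indices of batch results for the reference sample that fall outside the limits."""
    lower, _, upper = limits
    return [i for i, r in enumerate(results) if r < lower or r > upper]


# Hypothetical repeat analyses of a certified reference material in four batches
crm = [[25.1, 24.8], [25.6, 25.3], [24.7, 24.9], [25.9, 25.4]]
print(repeatability_reproducibility(crm))
limits = shewhart_limits(np.asarray(crm).mean(axis=1))
print(limits, out_of_control([25.2, 27.5], limits))
print(detection_limit([0.02, 0.05, 0.03, 0.04, 0.01]))
```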

Except for those parameters for which a consensus exists, testing method harmonisation remains a very difficult issue. For several parameters, applying several techniques to all, or to a subset, of the archived samples can be a useful way to harmonise data obtained using different or inadequately validated testing methods. This can allow samples taken in previous campaigns to be used to detect changes and to establish pedotransfer functions for predicting indicators that were not measured (e.g. estimation of bulk density from texture and organic matter content).
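As a simple illustration, a bulk-density pedotransfer function can be calibrated on samples for which bulk density, texture and organic matter were all measured, and then applied to samples for which only texture and organic matter are available. The linear form, the units and the synthetic data below are assumptions for illustration; published pedotransfer functions often use other predictors or nonlinear forms.

```python
import numpy as np


def design_matrix(clay, sand, organic_matter):
    """Intercept plus linear terms; a hypothetical functional form for illustration."""
    return np.column_stack([np.ones(len(clay)), clay, sand, organic_matter])


def fit_pedotransfer(clay, sand, organic_matter, bulk_density):
    """Ordinary least-squares coefficients for BD ~ clay + sand + OM."""
    X = design_matrix(clay, sand, organic_matter)
    coef, *_ = np.linalg.lstsq(X, np.asarray(bulk_density, dtype=float), rcond=None)
    return coef


def predict_bulk_density(coef, clay, sand, organic_matter):
    return design_matrix(clay, sand, organic_matter) @ coef


# Synthetic calibration set: clay, sand and OM in %, BD in g/cm3 (all hypothetical)
rng = np.random.default_rng(5)
clay = rng.uniform(5, 45, 200)
sand = rng.uniform(5, 60, 200)
om = rng.uniform(0.5, 8.0, 200)
bd = 1.6 - 0.004 * clay + 0.001 * sand - 0.05 * om + rng.normal(0, 0.05, 200)
coef = fit_pedotransfer(clay, sand, om, bd)
print(coef, predict_bulk_density(coef, [20.0], [30.0], [3.0]))
```

In practice the calibration samples should span the range of soils to which the function will be applied, and its prediction error should be propagated into any derived quantity such as SOC stock.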

Generally, the main cost in soil monitoring is field sampling, so adding further indicator measurements has a relatively low opportunity cost, depending on the complexity of the testing procedure.

6 Archiving Samples

Soils are monitored through long-term networks that require long-term commitment from researchers and from funding agencies. In numerous countries, soils are monitored on the basis of national schemes. Despite these enormous efforts to characterise soils, it is striking that in the European Union, for instance, about 40% of the monitoring programmes do not archive the soil samples that they collect and analyse (Arrouays et al. 2008a; Morvan et al. 2008). However, there are very good reasons to retain samples.

We cannot know what will interest us in the future. When the Broadbalk experiment was established in 1843 at Rothamsted, UK, researchers were certainly not aware that their decision to carefully archive samples taken from the experimental plots would enable the levels of polychlorinated biphenyls in the environment to be monitored a posteriori (Alcock et al. 1993).

New analytical techniques will arise in the future. These will be more precise and/or will allow the use of new tracers of environmental or biogeochemical processes. A number of substances that cannot be detected using current testing methods will become quantifiable. Progress in microbiology and molecular tools has already enabled soil DNA libraries to be built to explore microbial diversity and its long-term changes in relation to global change or other pressures (Dequiedt et al. 2009, 2011; Gardi et al. 2009; Ranjard et al. 2010; Bru et al. 2011). Techniques and standards for soil analyses are evolving continuously. Thus it is good practice to retain soil samples so that they can be tested in the future. However, archiving samples raises some scientific and technical issues concerning changes in sample properties over time: the effects of drying (temperature) and ageing on numerous soil properties, e.g. volatilisation of persistent organic pollutants (Garmouma and Poissant 2004), changes in trace element speciation and bioaccessibility (Martens and Suarez 1997; Martinez et al. 2003; Furman et al. 2007), changes in pH (Prodromou and Pavlatou-Ve 1998), changes in phosphorus solubility (Styles and Coxon 2006) and changes in the identification of microbial communities (e.g. Clark and Hirsch 2008; Tzeneva et al. 2009).

7 Harmonisation Issues

The uniformity of methodologies and the scope of existing monitoring networks vary between national systems. A recent review identified the differences between existing systems and described options for harmonising soil monitoring in the member states and some neighbouring countries of the European Union (Morvan et al. 2008). The present geographical coverage is uneven between and within countries (Morvan et al. 2008; Saby et al. 2008b). The most serious barrier identified, which limits the harmonisation of data from existing SMNs, is the wide variety of soil testing methods that have been employed historically. Harmonisation can be defined as the minimisation of systematic differences between different sources of environmental measurements (Keune et al. 1991). There are some opportunities for harmonising data obtained using different testing methods, for example by regression analysis, but these are limited. Recently, Baume et al. (2011) proposed a universal kriging approach that can merge data from different monitoring networks. The establishment of benchmark sites devoted to harmonisation and inter-calibration has been advocated as a technical solution by many authors (e.g. Theocharopoulos et al. 2001; Wagner et al. 2001; Van-Camp et al. 2004; Kibblewhite et al. 2008; Morvan et al. 2008; Gardi et al. 2009). Cathcart et al. (2008) recently set up 43 benchmark sites in Alberta, Canada, to monitor agricultural soil quality; the sites were selected to be representative of a number of soil chemical and physical properties across the region. However, few studies have yet addressed crucial scientific issues such as how many calibration sites are necessary and how to choose them (Louis et al. 2014).

8 Conclusions

Numerous scientific challenges arise when designing an SMN, especially when assessing large areas and several properties that are driven by numerous controlling factors of various origins and scales.

Three broad approaches to the establishment of SMNs can be distinguished:

  1. The design and construction of purpose-built SMNs
  2. Resampling of the soil at sites where measurements have previously been made for other purposes
  3. Compilation and analysis of soil data that have previously been collected in other soil analysis exercises or experiments

It is essential to establish an adequate sampling protocol that can be applied rigorously at each sampling location and time. The organic layers at the soil surface should be sampled separately from the underlying organo-mineral soil, and organo-mineral soils can be sampled both by depth increments at the site and by pedogenic horizons in soil pits dug outside the monitoring area but close to it. Different statistical methods should be associated with the different types of sampling design. Several studies propose new statistical methods that account for different sources of uncertainty (e.g. location errors, measurement error and detection limits, and estimation of the spatial model). Except for those parameters for which a consensus exists, testing method harmonisation remains a very difficult issue. For several parameters, applying several techniques to all, or to a subset, of the archived samples can be a useful way to harmonise data obtained using different or inadequately validated testing methods. The establishment of benchmark sites devoted to harmonisation and inter-calibration has been advocated as a technical solution. However, few studies have yet addressed crucial scientific issues such as how many calibration sites are necessary and how to locate them.