
1 Introduction

The term pedotransfer function (PTF) was coined by Bouma (1989) as ‘translating data we have into what we need’. Pedotransfer functions are regression functions used to predict soil properties that would be otherwise infeasible to obtain. Typical reasons for this infeasibility include, but are not limited to, the cost, time, difficulty or hazard involved in procuring direct measurements. Each PTF is developed around some insight into a soil’s physical, chemical or biological properties that relates a set of input parameters (predictor properties) to an output parameter (a predicted property).

Pedotransfer functions (PTFs) have multiple uses. They are essential, for example, in soil carbon stock assessment (Chap. 23) based on legacy soil data, where bulk density is usually not measured. PTFs can also be used to estimate soil organic carbon pools required in soil carbon evolution models. In digital soil mapping (Chap. 12), pedotransfer functions are used to provide more useful information on soil attributes or soil functions. Pedotransfer functions can further be used to estimate the soil’s condition or capability (e.g. available water capacity). The predicted properties resulting from PTFs can be used as inputs into process-based simulation models to run scenarios on the effects of different agricultural management on soil functioning, drainage, evapotranspiration and biomass yields.

Some consider prediction of soil attributes from environmental variables (e.g. climate and topographic indices) in a spatial context as pedotransfer functions, but we would caution against that. We call this particular case soil spatial prediction functions (SSPFs) (see Chap. 12 for more details). Pedotransfer functions sensu stricto predict soil attributes from other soil attributes, or S = f(s). There is a possible intersection, or area of overlap, between PTFs and SSPFs (e.g. the spatial component), which results in what we call spatial pedotransfer functions. Figure 7.1 illustrates the differences and possible overlap between PTFs and SSPFs. Pachepsky et al. (2001) and Romano and Palladino (2002) illustrate examples of spatial or contextual pedotransfer functions, which take the form S = f(s,r) rather than S = f(s).

Fig. 7.1 A Venn diagram showing the relationship between pedotransfer functions (PTFs), soil spatial prediction functions (SSPFs) and their intersection, spatial PTFs

2 A Brief History of Pedotransfer Functions

Reviews on the development and the use of PTFs can be found in Pachepsky et al. (1999, 2015) and Wösten et al. (2001). Most of these reviews, however, are limited to the prediction of soil hydraulic properties, which regulate the retention and movement of water and chemicals in soils.

The concept of using empirical relations to predict soil properties can be traced to Briggs and McLane (1907) and Briggs and Shantz (1912) in their work on determining the wilting coefficient. Furthermore, various ‘rules of thumb’ were formulated to estimate various soil properties. Probably because of its particular difficulty and cost of measurement, the most comprehensive research in developing PTFs has been for the estimation of water retention. With the introduction of the concepts of field capacity (FC) and permanent wilting point (PWP) by Veihmeyer and Hendricksen (1927), research during the period of 1950–1980 attempted to correlate particle-size distribution, bulk density and organic matter content with water content at field capacity (FC, θ at −33 kPa), permanent wilting point (PWP, θ at −1500 kPa) and available water content (AWC = FC – PWP). Nielsen and Shaw (1958), for example, presented a parabolic relationship between clay content and PWP from 730 Iowa soils.

In the 1960s various papers dealt with the estimation of FC, PWP and AWC, notably in a series of papers by Salter and Williams (1965a, b, 1966, 1967, 1969). These explored relationships between texture classes and available water capacity, which are now known as class PTFs. Salter and Williams also developed functions relating the particle-size distribution to AWC, now known as continuous PTFs.

In the 1970s more comprehensive research using large databases was developed. A particularly good example is the study by Hall et al. (1977) who used soil samples from England and Wales. Hall et al. (1977) established field capacity, permanent wilting point, available water content and air capacity as a function of textural class, as well as deriving continuous functions estimating these soil water properties. In the USA, Gupta and Larson (1979) developed 12 functions relating particle-size distribution and organic matter content to water content at water potentials ranging from −4 to −1500 kPa.

With the flourishing development of hydraulic models (van Genuchten 1980) and computer modelling of soil water and solute transport (de Wit and van Keulen 1972), the need for hydraulic properties as input to these models became more and more evident. Clapp and Hornberger (1978) derived average values for the parameters of a power-function water retention curve, sorptivity and saturated hydraulic conductivity for different texture classes. In probably the first research of its kind, Bloemen (1980) derived the relationships between parameters of the Brooks and Corey hydraulic model and particle-size distribution.

Lamp and Kneib (1981) introduced the term pedofunction, while Bouma and van Lanen (1986) used the term transfer function. To avoid confusion with the term transfer function, which is used in other disciplines with many different meanings, Bouma (1989) later coined the term pedotransfer function.

From the 1990s to the early 2000s, the development of hydraulic PTFs became a popular topic of research. Results of such research have been reported widely from various countries globally, including the UK (Mayr and Jarvis 1999), Australia (Minasny and McBratney 2000), the Netherlands (Wösten et al. 1995), Germany (Scheinost et al. 1997b) and Iran (Ghorbani and Homaei 2002).

Since the late 2000s, the development of hydraulic PTFs has remained popular (Santra and Das 2008; Twarakavi et al. 2009; Haghverdi et al. 2012). Two trends are worth noting. The first is the development of PTFs for special conditions, such as saline and saline-alkali soils of Iran (Abbasi et al. 2011), permafrost soils of China (Yi et al. 2013) and volcanic ash soils of Japan (Nanko et al. 2014). The second is the use and development of PTFs for continental or global extent, such as the work presented by Dai et al. (2013) for China, Hollis et al. (2012) and Tóth et al. (2015) for Europe and Glendining et al. (2011) for the world.

In addition, some PTFs consider adjustments because of the differences in criteria and measurements from existing pedotransfer functions. For example, as outlined in the previous chapter of this book (Fig. 5.3), sand fractions are different according to the IUSS/Australian classification (particle diameter 20–2000 μm) and the FAO/USDA criteria (particle diameter 50–2000 μm). Padarian et al. (2012) give equations for converting between these two classification systems. On the other hand, Henderson and Bui (2002) established relationships between pH measured in water and pH measured in CaCl2.

Although most PTFs have been developed to predict soil hydraulic properties, they are not restricted to hydraulic properties only. PTFs for estimating soil physical, mechanical, chemical and biological properties have also been developed (Table 7.1). In addition, PTFs have been developed to predict processes such as deep percolation. For example, Selle and Huwe (2005) used a regression tree approach to simplify process-based models to identify key soil and environmental variables which govern percolation. Wessolek et al. (2008) called these hydro-pedotransfer functions, as soil and hydrological variables are used to predict other soil processes. Wessolek et al. (2008) developed empirical functions that predict deep percolation and evapotranspiration from soil conditions, vegetation and land uses.

Table 7.1 Some examples of pedotransfer functions

Pachepsky et al. (2015) reviewed more recent developments in PTFs and identified research gaps that require future work:

  • The need for sufficient upscaling of PTFs. PTFs were mainly generated on point observations, and many applications require simulations on regional or continental extent. An example is saturated hydraulic conductivity which is highly dependent on the measurement support.

  • The need for more regional or specific PTFs for saline soils, calcareous and gypsiferous soils, peat soils, paddy soils, soils with well-expressed shrink-swell behaviour, and soils affected by freeze-thaw cycles.

  • The need for parameters governing biogeochemical processes, such as in soil carbon and nitrogen evolution models, where parameters are related to organic matter pools (e.g. Weihermueller et al. 2013). For these cases, soil heat transfer and water availability inputs can be improved.

  • The need to expand work on the spatial and temporal structure of PTFs which is not well known.

  • The use of PTFs in large-scale projects, where soil information is usually not represented properly. In soil carbon stock assessment studies, bulk density is usually not measured, so PTFs for bulk density are required, which can be a main source of uncertainty (Hollis et al. 2012).

3 Developing Pedotransfer Functions

The basic steps for developing PTFs are simple – in theory, S = f(s), and therefore:

  1. Collect a sufficient data set of soil properties (S and s) that are suspected of having empirical relationships with each other.

  2. Set aside a certain fraction of the data for developing the PTFs, and use the remaining data for testing the performance of the PTFs (e.g. an 80:20 split of the data set).

  3. Choose a modelling method f for analysing the data (e.g. linear regression, neural networks or other machine learning algorithms), and subsequently develop the empirical equations.

  4. Test the empirical equations on the testing data to show their validity.

  5. Calculate the output uncertainty.
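
The five steps above can be sketched in a few lines of Python. This is a minimal illustration only: the data are synthetic, the linear relationship between clay content and water content is invented for the example, and all function names are hypothetical. Step 5 is reduced to a crude interval of plus or minus two test RMSEs, which is an assumption of this sketch and not a recommended uncertainty method.

```python
import math
import random

def make_synthetic_data(n, seed=0):
    """Step 1 stand-in: invented (clay %, water content) pairs, NOT real soil data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        clay = rng.uniform(5.0, 60.0)                        # predictor s
        theta = 0.03 + 0.004 * clay + rng.gauss(0.0, 0.01)   # target S, assumed form
        data.append((clay, theta))
    return data

def split(data, train_fraction=0.8, seed=1):
    """Step 2: shuffle, then split into development and testing sets."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(round(train_fraction * len(rows)))
    return rows[:cut], rows[cut:]

def fit_linear(rows):
    """Step 3: ordinary least squares for S = a + b * s."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def rmse(rows, a, b):
    """Step 4: root-mean-square error of the fitted PTF on a data set."""
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows))

data = make_synthetic_data(200)
train, test = split(data)
a, b = fit_linear(train)
test_rmse = rmse(test, a, b)

# Step 5 (crude): treat the test RMSE as the spread of a new prediction.
prediction = a + b * 30.0
interval = (prediction - 2 * test_rmse, prediction + 2 * test_rmse)
```

In practice the modelling method in step 3 and the uncertainty treatment in step 5 would be chosen to suit the data; the later sections on modelling approaches and uncertainty discuss the options.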

4 Predictors

There are several sources of information that can be used to predict soil properties and that can be considered as input for pedotransfer functions. Here, we will present the use of PTFs and their potential predictors which are sourced from the laboratory, field description (including soil morphology) as well as the soil electromagnetic spectrum.

4.1 Laboratory Data

Laboratory analysis of soil samples is usually conducted to allocate a particular soil profile to an existing soil class. The high cost of laboratory analysis, however, drove the development of empirical relationships that relate more easily or routinely measured soil properties to other attributes that are, for example, more useful for soil management purposes. One of the well-known examples is the estimation of available water capacity from particle-size distribution. The development of pedotransfer functions has been boosted by the availability of large national or regional soil databases, which allows the use of machine learning tools. The most useful variable in predicting soil physical properties is perhaps clay content, as it affects moisture retention, soil strength and many physical and chemical processes. Routine analysis usually lacks physical data. Research is still mainly focused on improving the prediction of hydraulic properties, such as water retention and saturated hydraulic conductivity. Some simpler analyses, however, have also been utilised to estimate more difficult-to-measure properties, such as pH in sodium fluoride, which is an indication of phosphorus sorption capacity (Gilkes and Hughes 1994).

4.2 Field Description and Soil Morphology

Most research has been focused on correlating laboratory-determined soil properties with more difficult-to-measure properties, mainly because of the availability of comprehensive soil survey databases and the presumption that these properties are most appropriate for predictive purposes. However, it has also been recognised for some time that soil morphological description could be used as a predictor (O’Neal 1949, 1952; McKeague et al. 1984; McKenzie and McLeod 1989; McKenzie and Jacquier 1997).

Calhoun et al. (2001) contended that soil morphology and field description have been underutilised in the development of pedotransfer functions. They represented Jenny’s state factors through the variables physiography, parent material, horizon, field texture and structure, as collected in soil surveys, for predicting bulk density. They demonstrated that morphology and field descriptors account for more variability in predicting bulk density than laboratory measurement of particle size and organic carbon. Physiographic description and soil morphological characterisation (slope gradient, position of the slope and horizon classes) were also found to be useful predictors of water retention (Rawls and Pachepsky 2002).

Several studies have been successful in predicting hydraulic conductivity by using soil morphological features (e.g. O’Neal 1952; McKeague et al. 1982). However, the descriptive systems and interpretative guidelines in conventional soil survey have been largely qualitative and only appropriate for a given range of soils. McKenzie et al. (1991) found that several published descriptive systems for inferring hydraulic properties provided poor predictions for a limited range of soils from South Australia. McKenzie and Jacquier (1997) reasoned that good predictive relationships should only be expected when the field criteria used have a logical physical connection with hydraulic properties. They further postulated that predictive systems that develop direct relationships between hydraulic properties and field criteria of physical significance should be superior to systems that rely on classified entities such as horizons or soil series. They devised a simple visual estimate of areal porosity and found that saturated conductivity can be estimated from field texture, grade of structure, areal porosity, bulk density, dispersion index and horizon type. A similar idea was pursued by Lin et al. (1999), who converted morphological properties to scores which are related to water flow. From these studies, it was concluded that morphological descriptors additional to those routinely surveyed may be needed to improve the predictive capacity.

4.3 Handheld and On-the-Go Proximal Soil Sensing and Remote Sensing

4.3.1 Handheld, Stationary, Proximal Soil Sensing

As outlined in Chap. 5, in traditional soil surveys, soil scientists used the visible light spectrum through the Munsell soil colour chart to determine soil colour and the presence of pedological features like mottles or concretions. Furthermore, it was discussed in the previous chapter that developments in spectroscopy have resulted in an increase in the potential for soil analysis, and we will include a short summary of its capability here (Fig. 7.2). Diffuse reflectance spectroscopy in both the visible and near-infrared (vis-NIR; visible 400–700 nm, near-infrared 700–2500 nm) and mid-infrared (MIR; 2500–25,000 nm) ranges allows rapid acquisition of soil information in the field or in the laboratory. Diffuse reflectance infrared spectroscopy is based on the fact that molecules have specific frequencies at which they rotate or vibrate, corresponding to discrete energy levels. Absorption spectra of compounds are a unique reflection of their molecular structure. Spectral signatures of soil materials are characterised by their reflectance at particular wavelengths in the electromagnetic spectrum. Soil spectra in the vis-NIR and MIR ranges can be used to estimate a range of soil physical, chemical and biological properties simultaneously. Good results were reported for measurement of total C, total N, clay and sand content, CEC and microbial activity (Soriano-Disla et al. 2014).

Fig. 7.2 The electromagnetic spectrum and regions useful for soil measurement

Mid-infrared (MIR) spectroscopy usually produces better predictions than vis-NIR. The use of MIR also enables estimation of various soil organic carbon pools that would otherwise be derived from tedious and time-consuming physical fractionation procedures. These pools can be used as inputs in soil carbon evolution models. Vis-NIR spectrometers in particular are used extensively and have gained popularity in soil science because they are available in a portable format, are easy to use in the field and require minimal or even no sample preparation. Reviews on the use of vis-NIR for predicting soil properties can be found in Stenberg et al. (2010) and Soriano-Disla et al. (2014).

Because soil is a complex mixture of materials, it is difficult to assign specific features of the spectra to specific chemical components. Ultraspectral data obtained from infrared spectrometers contain thousands of reflectance values as a function of wavelength. Since there are more predictor variables than observations and predicted soil attributes, as outlined in the previous chapter, methods that reduce the dimension of the spectra are required. Principal component regression and partial least squares (PLS) methods are commonly utilised. Principal component regression reduces the dimension of the spectra via principal component analysis and then forms a linear regression between the principal components and soil attributes (Martens and Naes 1989; Chang et al. 2001). Partial least squares (PLS) (Martens and Naes 1989) extracts successive linear combinations of the spectra, which optimally address the combined goals of explaining response variation and explaining predictor variation. Other machine learning techniques that are capable of variable (wavelength) selection have also been found useful (e.g. Minasny and McBratney 2008; Sarajith et al. 2016).
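
As a rough illustration of principal component regression, the sketch below builds invented "spectra" with more bands than samples, extracts the leading principal axis by power iteration and regresses a soil attribute on the resulting scores. Everything here is an assumption made for the example: the data are synthetic, only one component is retained, and the error is computed on the calibration data itself. A real application would use measured spectra, choose the number of components by validation and normally rely on an established library.

```python
import math
import random

rng = random.Random(42)
n_samples, n_bands = 30, 50   # more predictors (bands) than observations

# Invented data: one latent factor drives both the spectra and the attribute.
loading = [math.sin(0.2 * j) for j in range(n_bands)]
latent = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
spectra = [[t * w + rng.gauss(0.0, 0.05) for w in loading] for t in latent]
attribute = [2.0 + 1.5 * t + rng.gauss(0.0, 0.1) for t in latent]

# Centre each band before the principal component analysis.
band_means = [sum(row[j] for row in spectra) / n_samples for j in range(n_bands)]
centred = [[row[j] - band_means[j] for j in range(n_bands)] for row in spectra]

def first_principal_axis(X, iterations=100):
    """Leading principal axis via power iteration on X^T X (never formed explicitly)."""
    p = len(X[0])
    axis = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(x * a for x, a in zip(row, axis)) for row in X]             # X v
        update = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(u * u for u in update))
        axis = [u / norm for u in update]
    return axis

# Dimension reduction: 50 bands collapse to one score per sample.
axis = first_principal_axis(centred)
scores = [sum(x * a for x, a in zip(row, axis)) for row in centred]

# Linear regression of the soil attribute on the component scores.
mean_y = sum(attribute) / n_samples
mean_s = sum(scores) / n_samples
slope = (sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, attribute))
         / sum((s - mean_s) ** 2 for s in scores))
intercept = mean_y - slope * mean_s
predicted = [intercept + slope * s for s in scores]
rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(attribute, predicted)) / n_samples)
```

PLS differs in that the components are chosen with reference to the response as well as the spectra; that step is not shown here.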

In addition to vis-NIR spectroscopy, the direct measurement of the elemental concentration of soils in the field also became possible using energy-based portable X-ray fluorescence (XRF) devices (Weindorf et al. 2012). Bulk density can also be estimated utilising photogrammetry via a digital single-lens camera or laser scanning (Bauer et al. 2014; Rossi et al. 2008).

4.3.2 On-the-Go Proximal Soil Sensing

While we can collect detailed soil information at limited locations using conventional methods of soil analysis and interpolate resulting values across space and time using geostatistics, in some instances it would be more beneficial if we could directly measure soil information at a fine spatial scale (e.g. measurements every 2–20 m). In this instance, proximal soil sensing offers a cost- and time-effective solution (Viscarra Rossel et al. 2010). Proximal soil sensing acquires information about soil through the use of field-based sensors that are placed in proximity to the soil (within 2 m) or within the soil body, which is in contrast to remote sensing (McBratney et al. 2011a, b). The development and use of on-the-go proximal soil sensing techniques is motivated by the need for high-resolution spatial and temporal soil information. Proximal soil sensors operate on a range of frequencies in the electromagnetic spectrum, from microwaves to gamma rays. These sensing devices either measure soil properties directly or can be used to make inferences via PTFs about specific soil properties. Often sensors are also used simultaneously to overcome the limitations of single-sensor data interpretation (Wong et al. 2010). For example, electromagnetic induction instruments (EMI) are used to measure the soil’s electrical conductivity, a highly valuable soil property that is influenced by soil porosity, moisture content, salinity, temperature and the amount and composition of soil colloids.

Ground-penetrating radar, electrical resistivity as well as electrical conductivity sensors are available to monitor the spatial distribution of soil moisture (Adamchuk et al. 2004). In addition, gamma ray spectrometers have been used to measure the amount of potassium, uranium and thorium in the upper soil profile which is most likely directly related to the parent material the surveyed soil originated from (Dickson and Scott 1997). Local PTFs have been developed to estimate soil attributes (such as clay and organic carbon content) from the sensed variables (e.g. bulk electrical conductivity, gamma K).

As outlined in Chap. 5, portable sensors can now be used in the field on profile and core faces for pedological studies, which is termed digital soil morphometrics (Hartemink and Minasny 2014). Field observation via proximal sensors and PTFs should be fused in an inference system into a powerful approach for estimating a range of soil properties for pedological studies, precision farming or contamination assessment (Horta et al. 2015).

4.3.3 Remote Sensing

The value of remote sensing over proximal sensing is that large spatial extents can be covered quickly with many estimates. Remotely sensed data, whether airborne or satellite sourced, have been shown to be an efficient means of assessing the condition of natural resources at reasonably broad scales (this will be discussed further in Chap. 13). The remotely sensed data can include spectral, radar, thermal and radiometric signals. These reflect the environmental and soil condition and are known to be associated with soil properties. Mulder et al. (2011) reviewed the application of optical and microwave remote sensing for soil and terrain mapping. Soil properties that have been measured include mineralogy, texture, soil iron content, soil moisture content, soil organic carbon content, soil salinity and carbonate content. The use of remote sensing for soil mapping is, however, hampered by vegetation cover. Nevertheless, indicators such as plant functional groups, NDVI and productivity changes can be used as indications of soil properties.

The application of remotely sensed infrared data for mapping soil clay content and mineralogy is demonstrated by Mulder et al. (2013) and Gomez et al. (2015). Some studies demonstrated that time series data collected from remotely sensed data can be used to derive soil hydraulic properties. Dimitrov et al. (2014) derived soil hydraulic parameters, surface roughness and soil moisture of a tilled bare soil plot using measured brightness temperatures at 1.4 GHz (L-band), rainfall and potential soil evaporation. This required a radiative transfer model and a soil hydrologic model combined with an optimisation routine.

5 Modelling Approaches

Approaches to develop PTFs can be purely empirical or physico-empirical. Empirical approaches attempt to find relationships between the predictor and predicted variables using regression analysis or various machine learning models. In a physico-empirical approach, the soil properties are derived based on some physical principles. For example, in water retention curve prediction, Arya and Paris (1981) translated the particle-size distribution into a water retention curve by converting solid mass fractions to water content and pore-size distribution into hydraulic potential by means of the capillary equation. Zeiliguer et al. (2000) proposed an additive model for soil water retention, which assumed that water retention of a soil can be approximated by the sum of the components of water retention of its textural composition.
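
The capillary equation mentioned above relates pore radius to the height of capillary rise, h = 2γ cos θ / (ρ g r). The sketch below shows only that single conversion. The constants are assumed values for water at about 20 °C, the function name is hypothetical, and the step that maps particle size to pore radius in Arya and Paris (1981) is not reproduced.

```python
import math

SURFACE_TENSION = 0.0728   # N/m, water at about 20 degrees C (assumed value)
WATER_DENSITY = 998.0      # kg/m3 (assumed value)
GRAVITY = 9.81             # m/s2

def capillary_head(pore_radius_m, contact_angle_rad=0.0):
    """Capillary rise h = 2*gamma*cos(theta) / (rho*g*r), in metres of water."""
    return (2.0 * SURFACE_TENSION * math.cos(contact_angle_rad)
            / (WATER_DENSITY * GRAVITY * pore_radius_m))

# A pore of 1 micrometre radius holds water against roughly 15 m of head.
head_1_micron = capillary_head(1.0e-6)
```

Smaller pores give larger heads, which is why a pore-size distribution can be translated into a sequence of matric potentials.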

Considering the type of data we wish to predict, we can distinguish single point and parametric PTFs. Single point PTFs predict a soil property, while parametric PTFs predict parameters of a model.

Most survey agencies have their own ‘rule of thumb’ for predicting soil properties. One form is a look-up table, which usually relates field texture class to properties such as clay content, available water capacity, etc. These rules or tables are usually derived from experience and expert knowledge or from means of properties for a particular class in a soil database.

For continuous predicted variables, a range of machine learning models can be used to derive PTFs by finding relationships between the predictor and predicted variables. Many of the modern regression techniques are described in Hastie et al. (2009). The methods range from linear regression, generalised linear models (GLM) and generalised additive models (GAM) to regression trees, random forests, neural networks, genetic programming and fuzzy systems. Most of these tools are available in commercial and open-source projects. R (https://www.r-project.org) and Python (https://www.python.org/) are commonly used by the scientific community, because they offer many free-to-use advanced mathematical and machine learning tools.

The predictive power and interpretability vary between models depending on their complexity. Tables 7.2 and 7.3 provide a guideline for various models. The more complex the model, the more parameters it will have, so users need to be aware of the principle of parsimony: of the models that provide an adequate fit for a set of data, the one with the fewest parameters is to be preferred (Lark 2001). Users should therefore choose the simplest model that can adequately account for the variation in the predicted property. Models with high complexity will appear to fit the data very well; however, they may overfit or include too many parameters, so that the model fits the noise in the data. It is recommended to split the data into a calibration and a validation set, using the calibration data for fitting and then testing or validating the model with the validation set (see Hastie et al. (2009) for more detail). Wösten et al. (2001) compared the performance of three models to predict water content at −33 kPa from basic soil properties using the same data set. They reported that the accuracy of all three methods was similar and suggested that an improvement of fit may not be expected from the use of different models, but from a better set of data.

Table 7.2 Common machine learning algorithms used for developing PTFs
Table 7.3 Comparison of different mathematical predictive models

5.1 Ensemble Models

An alternative to selecting a single predictive model is to use a model ensemble. This consists of creating multiple models and combining them to obtain a single final model. The advantage of this method is that, most of the time, the combined model performs better than any of the individual models in terms of lower error and unaltered bias. This method has been used for almost 200 years, as pointed out in an interesting review by Clemen (1989). Baker and Ellison (2008a) discussed various aspects of the implementation of ensemble methods for soil studies. In soil science, examples include Baker and Ellison (2008b), who used ensemble ANN for PTFs; Kim et al. (2015), who combined two microwave satellite soil moisture products; Malone et al. (2014), who combined estimates of soil properties from soil maps and regression kriging prediction; and Padarian et al. (2014), who generated an ensemble map of soil available water capacity in Australia.

Guber et al. (2009) suggested the use of all available PTFs in a multimodel prediction technique. They used 19 published PTFs as inputs in Richards’ soil water flow equation; the outputs of the 19 simulations were then combined to obtain an improved soil water prediction. The challenge in this type of ensemble method is how to calibrate and choose appropriate weights for each PTF to obtain an optimal prediction.
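
One simple way to weight ensemble members is by the inverse of their mean squared error on a calibration set. The sketch below illustrates that idea only; it is not the scheme of Guber et al. (2009) or of any other study cited above. The three member PTFs and the calibration pairs are invented for the example.

```python
def ptf_a(clay):
    """Hypothetical member PTF (invented coefficients)."""
    return 0.05 + 0.0040 * clay

def ptf_b(clay):
    """Hypothetical member PTF (invented coefficients)."""
    return 0.02 + 0.0050 * clay

def ptf_c(clay):
    """Hypothetical member PTF (invented coefficients)."""
    return 0.10 + 0.0025 * clay

members = [ptf_a, ptf_b, ptf_c]

# Invented calibration pairs: (clay %, measured water content).
calibration = [(10.0, 0.08), (25.0, 0.14), (40.0, 0.21), (55.0, 0.27)]

def mean_squared_error(ptf, rows):
    return sum((y - ptf(x)) ** 2 for x, y in rows) / len(rows)

def inverse_mse_weights(ptfs, rows):
    """Weights proportional to 1/MSE, normalised to sum to one."""
    inverse = [1.0 / mean_squared_error(p, rows) for p in ptfs]
    total = sum(inverse)
    return [v / total for v in inverse]

def ensemble_predict(ptfs, weights, clay):
    """Weighted average of the member predictions."""
    return sum(w * p(clay) for w, p in zip(weights, ptfs))

weights = inverse_mse_weights(members, calibration)
combined = ensemble_predict(members, weights, 30.0)
individual = [p(30.0) for p in members]
```

Because the weights are non-negative and sum to one, the combined prediction always lies between the smallest and largest member prediction.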

6 Characterising PTF Performance

As with all numerical methods, there are questions concerning how well any prediction agrees with real observational data. In the literature, PTFs can be characterised by their accuracy, reliability, uncertainty and validity, as well as their ultimate utility. A brief survey of these concepts follows.

6.1 Accuracy

Accuracy refers to how well a PTF predicts its target property based on inputs taken from the training data. It measures the performance of a PTF on its training data (data the PTF has ‘seen’). Usually accuracy is expressed in terms of error, the difference between observed and predicted values. Weynants et al. (2009), amongst others, used several common statistical measures for evaluating the accuracy of PTFs: the root-mean-square error (RMSE), mean absolute error (MAE), mean error (ME) or bias, coefficient of determination (R2) and the model efficiency. Accuracy in PTFs can also be computed with other statistics, e.g. the concordance correlation coefficient, which measures how closely the model predictions and the measured data fall along a 45-degree line through the origin (a slope of exactly 1) (Lawrence and Lin 1989).
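
The statistics named above can be computed as in the following sketch. Some choices here are assumptions: error is taken as predicted minus observed, R2 is computed as the squared Pearson correlation, model efficiency is taken as one minus the ratio of the error sum of squares to the total sum of squares of the observations, and variances use a divisor of n. The function name and the example numbers are invented.

```python
import math

def accuracy_metrics(observed, predicted):
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # sign convention assumed
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted)) / n
    sse = sum(e * e for e in errors)
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "ME": sum(errors) / n,
        "R2": cov * cov / (var_obs * var_pred),
        "efficiency": 1.0 - sse / (var_obs * n),
        "CCC": 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2),
    }

# Invented example: predictions are perfectly correlated but biased by +1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
metrics = accuracy_metrics(observed, predicted)
```

In this example R2 equals 1 because the predictions are perfectly correlated with the observations, while the concordance correlation coefficient is about 0.71 and the efficiency is 0.2, because both penalise the constant bias.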

6.2 Reliability

Reliability in PTFs refers to a PTF’s performance in making predictions on data outside its original training data (data a PTF has not ‘seen’) (Pachepsky and Rawls 1999). A reliable PTF should produce accurate predictions for seen data (data used in the model development process), as well as unseen data (data that had not been used in the model development process) (Baker and Ellison 2008a). Pachepsky and Rawls (1999) state that the reliability of PTFs can be estimated by cross validation, or using an independent data set. In the cross validation method, the training data set is split into two subsets, a calibration set and a validation set; using two-thirds of the data for calibration and one-third for testing is common practice. However, the results from such cross validation can be biased towards the data set used. If a PTF is intended for prediction over a region, the independent test data set should contain observations that are unbiased (in the statistical sense, i.e. collected using a random sampling approach). PTFs that lack independent validation result in potentially optimistic assumptions about the functions’ predictive performance.
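
A common generalisation of the single calibration and validation split is k-fold cross validation, in which every observation is held out exactly once. The sketch below is an illustration under assumptions: the data are synthetic, the PTF is a simple straight line, and k = 5 is an arbitrary choice. It is not necessarily the procedure used in the studies cited above, and it does not replace testing on an independent, randomly sampled data set.

```python
import math
import random

def fit_line(rows):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def k_fold_rmse(rows, k=5, seed=0):
    """Pooled RMSE over k folds; each row is predicted by a model that never saw it."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    squared_errors = []
    for i in range(k):
        held_out = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        a, b = fit_line(training)
        squared_errors.extend((y - (a + b * x)) ** 2 for x, y in held_out)
    return math.sqrt(sum(squared_errors) / len(squared_errors)), len(squared_errors)

# Synthetic (clay %, water content) pairs; the relationship is invented.
rng = random.Random(7)
samples = []
for _ in range(60):
    clay = rng.uniform(5.0, 60.0)
    samples.append((clay, 0.03 + 0.004 * clay + rng.gauss(0.0, 0.01)))

cv_rmse, n_tested = k_fold_rmse(samples)
```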

6.3 Validity

Validity has to do with how appropriate a particular PTF is in predicting a soil property from a given soil sample. The greater the similarity of a soil sample to the soil used to develop a PTF, the greater the assumed validity of that PTF. Validity can be in terms of the geographical and pedological region over which a PTF’s original training data were collected. If a PTF is used to predict soil properties outside its original data boundaries, its validity is doubtful (Wösten et al. 1999). Not surprisingly, PTFs perform best on soils having similar parent material and pedogenesis to the soils used to develop them (Bruand et al. 2003). Acutis and Donatelli (2003) stated that validity in PTFs is strictly related to the data set used to develop them. They add that when many PTFs are available to predict the same property, knowing which one to choose is a difficult task.

An important mechanism for establishing validity is stratification, or the custom creation of PTFs strictly on the basis of soil type or classification scheme. Stratification has been conducted according to soil horizons (Hall et al. 1977); soil classes (Batjes 1996); textural classes (Tietje and Hennings 1996); hydraulic-functional horizons (Wösten et al. 1986); great soil groups, temperature regime and moisture regime (Pachepsky and Rawls 1999); parent material and horizon morphology (Franzmeier 1991); numerical soil class (Williams et al. 1983); and management units (Droogers and Bouma 1997).

Validity can also refer to the congruence between some input data set and the original training set. Despite knowledge that the validity of a given PTF should not be interpolated or extrapolated beyond the pedological origin or soil type on which it is developed, there is still a lack of appropriate information that adequately describes the calibration data, and, thus, we know very little about where a published PTF may be applied. There is still a lack of a mechanism that can automatically check its validity. Tranter et al. (2009) give a method for determining the valid domain of a PTF based on Mahalanobis distance of the predictor space, cautioning that it is unwise to extrapolate PTFs beyond these bounds. The uncertainty estimates of a PTF can also be a measure of its validity.
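
The idea of a distance-based domain check can be sketched as follows. This is not the exact procedure of Tranter et al. (2009). It simply computes the squared Mahalanobis distance of a new sample from the centre of an invented two-predictor calibration set and compares it with a cut-off. The cut-off used here, the 97.5% quantile of a chi-square distribution with two degrees of freedom, assumes roughly multivariate normal predictors and is a choice made only for this example.

```python
import math

# Invented calibration predictors: (clay %, organic carbon %).
calibration = [(12.0, 0.8), (18.0, 1.1), (25.0, 1.5), (31.0, 1.4),
               (38.0, 2.0), (44.0, 2.3), (22.0, 1.0), (35.0, 1.9)]

def mean_and_covariance(rows):
    """Sample mean and 2x2 covariance, returned as (s11, s12, s22)."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / (n - 1)
    return (m1, m2), (s11, s12, s22)

def squared_mahalanobis(point, mean, cov):
    """d^T * inverse(Sigma) * d for two predictors, using the closed-form 2x2 inverse."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1 = point[0] - mean[0]
    d2 = point[1] - mean[1]
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det

# Chi-square quantile for 2 degrees of freedom has the closed form -2 ln(1 - p).
CUTOFF = -2.0 * math.log(1.0 - 0.975)

centre, covariance = mean_and_covariance(calibration)

def in_domain(point):
    return squared_mahalanobis(point, centre, covariance) <= CUTOFF

typical_sample = in_domain((28.0, 1.5))    # close to the calibration centre
unusual_sample = in_domain((80.0, 0.2))    # very clayey, very low carbon
```

A sample flagged as outside the domain is one for which the PTF would be extrapolating in predictor space.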

6.4 Uncertainty

Uncertainty refers to the variability in a prediction from its mean value. This occurs because PTF inputs and outputs are random variables. Therefore, they have a mean value and a variance. PTF uncertainty is typically reported as the prediction variance. Uncertainty in PTF prediction can be quantified in terms of structural uncertainty due to flaws in the PTF model, uncertainty due to sampling and measurement errors and parameter uncertainty of the PTF.

Vereecken and Herbst (2004) suggest three approaches to handling uncertainty in PTFs: (1) compute the RMSE at 90% confidence; (2) quantify parameter uncertainty using a covariance in PTFs during the calibration process; and (3) use a Monte Carlo analysis to quantify parameter uncertainty associated with sampling effects in the calibration database, e.g. the bootstrap method (Efron and Tibshirani 1993).
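The bootstrap approach in (3) can be sketched as follows. The example assumes a hypothetical linear PTF that predicts bulk density from organic carbon, fitted to simulated calibration data; the coefficients, noise level and number of resamples are illustrative only. Each resample of the calibration set is refitted, and the spread of the refitted coefficients estimates the parameter uncertainty.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical calibration data: bulk density (g/cm3) declining with organic carbon (%)
n = 150
organic_carbon = rng.uniform(0.2, 5.0, n)
bulk_density = 1.60 - 0.08 * organic_carbon + rng.normal(0.0, 0.05, n)

def fit_linear_ptf(x, y):
    """Least-squares fit of y = intercept + slope * x; returns [intercept, slope]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def bootstrap_parameters(x, y, n_boot=1000, rng=rng):
    """Refit the PTF on calibration sets resampled with replacement."""
    params = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, len(x), len(x))
        params[b] = fit_linear_ptf(x[idx], y[idx])
    return params

params = bootstrap_parameters(organic_carbon, bulk_density)
param_cov = np.cov(params, rowvar=False)               # parameter covariance matrix
slope_interval = np.percentile(params[:, 1], [5, 95])  # 90% percentile interval
```

The rows of `params` can also be used directly in a Monte Carlo run of the PTF, so that parameter uncertainty is carried through to the predictions.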

PTF uncertainty can be computed empirically based on the calibration error using the fuzzy k-means with extragrade (FkME) method given by Tranter et al. (2010). It does not seek to distinguish between sources of error but rather expresses uncertainty in the form of a prediction interval determined empirically from the calibration data. The method partitions the predictor space into classes of similar model errors, with each class represented by a prediction interval determined from the empirical distribution of the error. In addition, it identifies those observations that lie outside the convex hull of the calibration data, thus ascertaining the validity of the PTF. Observations outside the convex hull are considered outliers of the calibration data and subsequently have their uncertainty penalised by a simple multiplier.
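The principle of class-wise empirical prediction intervals can be shown with a deliberately simplified sketch. It is not the FkME method: hard quartile classes of a single predictor stand in for the fuzzy classes, and the calibration range stands in for the convex hull (in one dimension the two coincide). The simulated errors, the 90% limits and the penalty multiplier of 2 are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical calibration set: one predictor, with errors that grow with it
clay = rng.uniform(5, 60, 400)
residual = rng.normal(0.0, 0.01 + 0.001 * clay)  # observed minus predicted

# Hard classes from quartiles of the predictor (FkME uses fuzzy classes instead)
edges = np.quantile(clay, [0.25, 0.5, 0.75])
classes = np.digitize(clay, edges)  # class labels 0..3

# Empirical 90% error limits (5th and 95th percentiles) for each class
limits = np.array([
    np.quantile(residual[classes == k], [0.05, 0.95]) for k in range(4)
])

def prediction_interval(prediction, clay_value, penalty=2.0):
    """Return (lower, upper) limits, widened outside the calibration range.

    The multiplier assumes the class limits straddle zero, as they do here.
    """
    lower, upper = limits[np.digitize(clay_value, edges)]
    if clay_value < clay.min() or clay_value > clay.max():
        lower, upper = penalty * lower, penalty * upper
    return prediction + lower, prediction + upper

inside = prediction_interval(0.30, 55.0)   # within the calibration range
outside = prediction_interval(0.30, 80.0)  # beyond it, so the interval is penalised
```

Because the simulated errors grow with clay content, the intervals of the upper classes are wider than those of the lower classes, which is the behaviour the partitioning is meant to capture.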

6.5 Utility

Wösten et al. (2001) stated that the utility of PTFs in modelling is defined as the correspondence between measured and simulated functional soil behaviour. This can be interpreted to mean that the authors advocate validating the final use (utility) of the PTFs, not just the PTF predictions. An example would be to develop some PTFs that predict water retention and conductivity and then to use those predictions in crop simulations to predict seasonal water storage. Thus, the validation occurs at the seasonal water storage level, not at the level of the individual predictions of water retention and conductivity.

7 Spatial Pedotransfer Functions

Most PTFs have been calibrated from point-source data and assume spatial independence. In digital soil mapping, however, we are interested in estimating the spatial distribution of soil properties. Pringle et al. (2007) recommended that an investigator who wishes to apply a PTF in a spatially distributed manner should first establish the spatial scales relevant to the particular study site and then ascertain whether these scales correspond to those that are adequately predicted by the available PTFs. They proposed three aspects of performance in the evaluation of a spatially distributed PTF: (i) the correlation of observed and predicted quantities across different spatial scales, (ii) the reproduction of observed variance across different spatial scales and (iii) the spatial pattern of the model error. In an example predicting water retention across a 5 km transect, they showed that the tested PTFs reproduced the general spatial pattern of soil water retention quite well but underestimated the magnitude of the observed variance. Springer and Cundy (1987) compared the parameters of the Green-Ampt infiltration equation obtained from field measurements with those calculated from PTFs. They showed that the mean and variance of the parameters were not preserved when estimated by PTFs; the variances were always lower. The spatial trends and cross-correlations amongst the parameters were also reduced. When they used the PTF estimates to simulate overland flow, the results differed significantly from those obtained with field-measured parameters.

When measured properties are spatially limited, spatial prediction is required to generate a continuous map. Spatial interpolation methods such as kriging can be combined with PTFs for this purpose in two ways. The first approach interpolates the related soil properties at unvisited locations using kriging and then applies PTFs to the interpolated variables. The second approach applies PTFs to the point measurements and then interpolates the predicted results. Bocneau (1998) compared these approaches for estimating CEC in West Flanders province, Belgium, and found that the two performed almost equally. Sinowski et al. (1997) compared them for estimating the water retention curve and found that the first approach yielded better predictions.
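Why the order of the two steps can matter is illustrated by the sketch below. Inverse-distance weighting along a transect is used as a simple stand-in for kriging, and both PTFs are hypothetical. For an interpolator whose weights sum to one, a linear PTF gives the same result in either order, whereas a nonlinear PTF does not.

```python
import numpy as np

def idw(x_obs, values, x_new, power=2.0):
    """Inverse-distance-weighted interpolation along a transect (kriging stand-in)."""
    dist = np.abs(np.asarray(x_new)[:, None] - np.asarray(x_obs)[None, :])
    weights = 1.0 / np.maximum(dist, 1e-12) ** power
    weights /= weights.sum(axis=1, keepdims=True)  # weights sum to one per location
    return weights @ np.asarray(values)

def linear_ptf(clay):
    """Hypothetical linear PTF."""
    return 2.0 + 0.5 * clay

def nonlinear_ptf(clay):
    """Hypothetical nonlinear (convex) PTF."""
    return 0.01 * clay ** 2

x_obs = np.array([0.0, 100.0, 200.0, 300.0])   # sampled positions (m)
clay_obs = np.array([10.0, 40.0, 20.0, 50.0])  # measured clay (%)
x_new = np.array([50.0, 150.0, 250.0])         # unvisited positions (m)

# Approach 1: interpolate the inputs, then apply the PTF
interp_then_ptf = nonlinear_ptf(idw(x_obs, clay_obs, x_new))
# Approach 2: apply the PTF at the sampled points, then interpolate the output
ptf_then_interp = idw(x_obs, nonlinear_ptf(clay_obs), x_new)

# With the linear PTF the two orders agree
linear_orders_agree = np.allclose(
    linear_ptf(idw(x_obs, clay_obs, x_new)),
    idw(x_obs, linear_ptf(clay_obs), x_new),
)
```

With the convex PTF and positive weights used here, the second approach gives the larger value at every unvisited location. Which approach is more accurate in practice depends on the data, as the contrasting results cited above suggest.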

Heuvelink and Pebesma (1999) discussed the role of support or scale. As most PTFs were derived from point sources, they are not valid at block support. This means that in the situation where the PTF input is available at point support and where output is required at block support, spatial aggregation should take place after the functions are calculated. It is essential to separate spatial aggregation from spatial interpolation. Interpolation should preferably take place before a function or model is executed because this enables a more efficient use of the spatial distribution characteristics of the individual inputs. When a model is executed with interpolated inputs, it is important to account for the uncertainty of the interpolation.

8 Soil Inference Systems

While many similar pedotransfer functions continue to be generated from new or existing data sets, much less effort seems to go into gathering and using the PTFs that are already available. McBratney et al. (2002) proposed a soil inference system that would match the available input with the most appropriate PTF to predict properties with the lowest uncertainty. The soil inference system was proposed as a way of collecting, and making better use of, the pedotransfer functions that have been so abundantly generated. McBratney et al. (2002) demonstrated that a first approach towards building a soil inference system is to create a very rudimentary system in the form of a specially adapted spreadsheet. Such a rudimentary inference system has two essentially new features. Firstly, it contains a suite of published pedotransfer functions, and the output of one PTF can act as the input to other functions (if no measured data are available). Secondly, the uncertainties in the estimates are inputs, and the uncertainties of subsequent calculations are computed. The input consists of the essential soil properties.

The inference engine will work in the following manner:

  1. Predict all the soil properties using all possible combinations of inputs and PTFs.

  2. Select the combination that leads to a prediction with the minimum variance.
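The two steps above can be sketched in a few lines. This is only a minimal illustration, not the implementation of McBratney et al. (2002): the `PTF` record, the function `best_prediction`, and every equation and variance in the example are hypothetical, and each PTF is assumed to carry a single fixed prediction variance, so uncertainty in the inputs is ignored.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class PTF:
    name: str
    target: str                 # property predicted
    inputs: Tuple[str, ...]     # properties required
    func: Callable[..., float]  # prediction equation
    variance: float             # prediction variance assumed for the PTF

def best_prediction(target, available, ptfs):
    """Run every applicable PTF for `target` and keep the minimum-variance result."""
    candidates = []
    for ptf in ptfs:
        if ptf.target == target and all(name in available for name in ptf.inputs):
            value = ptf.func(*(available[name] for name in ptf.inputs))
            candidates.append((ptf.variance, ptf.name, value))
    return min(candidates) if candidates else None

# Hypothetical PTFs for bulk density (g/cm3); coefficients are illustrative only
ptfs = [
    PTF("bd_from_oc", "bulk_density", ("organic_carbon",),
        lambda oc: 1.60 - 0.08 * oc, 0.020),
    PTF("bd_from_oc_clay", "bulk_density", ("organic_carbon", "clay"),
        lambda oc, clay: 1.65 - 0.08 * oc - 0.002 * clay, 0.012),
    PTF("bd_from_depth", "bulk_density", ("depth",),
        lambda depth: 1.30 + 0.002 * depth, 0.005),
]

sample = {"organic_carbon": 2.0, "clay": 30.0}
result = best_prediction("bulk_density", sample, ptfs)
```

Here `bd_from_depth` has the smallest variance but cannot be used because depth was not supplied, so the engine falls back on `bd_from_oc_clay`, the applicable PTF with the lowest variance.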

There have been some attempts at pattern matching of PTFs using a distance metric (Tranter et al. 2009) or a nearest-neighbour algorithm (Nemes et al. 2006). However, there have been no research applications that do what soil inference systems (SINFERS) aim to do: build a system that chains PTF predictions together while accounting for uncertainty.

Morris et al. (2016) built expert system software that uses rules to select appropriate PTFs and predicts new property values and error estimates. SINFERS can use the estimated property values as new inputs, which can trigger more matching patterns and cause more PTFs to ‘fire’ cyclically until the knowledge base is exhausted and SINFERS has inferred everything it can from what it was originally given.
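The cyclic firing of PTFs can be illustrated with a minimal forward-chaining sketch. It is not the software of Morris et al. (2016): it does not choose between competing PTFs or track error estimates, and the first rule able to supply a property simply wins. The rule format and the function `infer_all` are hypothetical, as are the bulk density and saturated water content equations; the porosity rule uses the standard relation with an assumed particle density of 2.65 g/cm3.

```python
def infer_all(known, rules):
    """Fire rules repeatedly until no rule can add a new property."""
    facts = dict(known)
    fired = True
    while fired:
        fired = False
        for target, inputs, func in rules:
            if target not in facts and all(name in facts for name in inputs):
                facts[target] = func(*(facts[name] for name in inputs))
                fired = True
    return facts

# Each rule: (predicted property, required properties, equation)
rules = [
    ("porosity", ("bulk_density",), lambda bd: 1.0 - bd / 2.65),
    ("bulk_density", ("organic_carbon",), lambda oc: 1.60 - 0.08 * oc),
    ("saturated_water_content", ("porosity",), lambda p: 0.95 * p),
]

facts = infer_all({"organic_carbon": 2.0}, rules)
```

Starting from organic carbon alone, the first pass can only infer bulk density; the second pass then infers porosity and saturated water content, and the third pass adds nothing, so the loop stops.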

9 Soil Spectral Inference Systems

As discussed in Sect. 7.4.3, soil spectroscopy and proximal soil sensing research has mainly focused on spectral calibration and prediction of a range of soil properties using multivariate statistics. PTF research, on the other hand, has mainly focused on predicting soil model parameters from other soil properties. There is no real connection between these research areas, although they share the same aim: to predict one soil property from other soil properties, S = f(s).

It is desirable to develop soil spectral calibrations for a complete suite of soil physical, chemical and biological properties. However, this might not be possible, mainly for two reasons: (i) not all soil properties show a spectral response and (ii) the development of a comprehensive soil spectral library is quite challenging. McBratney et al. (2006) proposed a spectral soil inference system (SPEC-SINFERS), where soil diffuse reflectance spectroscopy is linked with PTFs. SPEC-SINFERS uses soil spectra to estimate various basic soil properties, which are then used to infer other important and functional soil properties via pedotransfer functions (Fig. 7.3). An important feature to be considered is the propagation of both input and model uncertainties. Tranter et al. (2008) demonstrated the use of the SPEC-SINFERS approach in predicting volumetric soil water retention. This is clearly a research area that requires further investigation.
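One simple way to propagate both kinds of uncertainty is Monte Carlo simulation, sketched below. This is not the procedure of Tranter et al. (2008). The spectrally predicted clay and organic carbon contents, their standard deviations, the linear PTF and its residual standard deviation are all hypothetical, and the errors are assumed to be independent and normally distributed.

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws = 20000

# Stage 1: spectral calibrations give basic properties with prediction errors
clay = rng.normal(30.0, 3.0, n_draws)           # clay (%), mean and SD assumed
organic_carbon = rng.normal(1.8, 0.2, n_draws)  # organic carbon (%), assumed

def water_content_ptf(clay, organic_carbon):
    """Hypothetical linear PTF for volumetric water content (cm3/cm3)."""
    return 0.10 + 0.004 * clay + 0.02 * organic_carbon

ptf_residual_sd = 0.03  # assumed model error of the PTF itself

# Stage 2: push each draw through the PTF, then add the PTF's own error
input_only = water_content_ptf(clay, organic_carbon)
total = input_only + rng.normal(0.0, ptf_residual_sd, n_draws)

mean_prediction = total.mean()
sd_input_only = input_only.std(ddof=1)  # spread due to input uncertainty alone
sd_total = total.std(ddof=1)            # spread due to input and model uncertainty
```

Comparing `sd_input_only` with `sd_total` shows how much of the final uncertainty comes from the spectral predictions and how much from the PTF; with the values assumed here the PTF error dominates.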

Fig. 7.3 An example of a spectral soil inference system. Soil spectra were used to predict important soil properties, and these properties were, in turn, used to predict other properties by applying well-established PTFs