1 Introduction

The use of spectral information for estimating soil characteristics is a rapidly growing research area, with much of the current effort directed at infrared or visible–near-infrared wavelengths. Visible wavelength light alone has also been shown to be useful (Liles et al. 2013; Ibanez-Asensio et al. 2013). Soil colour attributes have been measured in a number of different ways, ranging from naked-eye comparison with Munsell colour charts (Aitkenhead et al. 2013) to electronic measurement (e.g. Gunal et al. 2008). Proximal sensing of soil with digital cameras has also been used. Mausel et al. (1997) explored the potential of digital photography for identifying spectrally distinct soil types. Levin et al. (2005) used colour indices from digital photography to estimate iron oxide content and textural parameters in sandy soils, whereas Gregory et al. (2006) estimated soil organic matter content using a digital camera with visible and near-infrared wavelength capacity.

There are some examples of research directed towards the engineering design of soil proximal sensing systems, for example Rossel et al. (2008), but most are targeted at soil or remote sensing scientists. Some work has considered the design and practicalities of soil imaging systems from an agricultural perspective (e.g. Chung and Joh 2012). There is also some research that crosses the boundaries between standard digital cameras with visible wavelength range and the use of sophisticated and expensive hyperspectral imaging systems (e.g. Zhao et al. 2012). The field protocols, parameters estimated, data interpretation and presentation of results tend to overlap between these two techniques, and it is mainly the cost of the equipment and sometimes the quality of the results that separates them.

The use of mobile phone cameras with their additional functionality can add processing capacity and other data interpretation and transmission abilities. Moonrungsee et al. (2015), for example, demonstrated colorimetric analysis of soil water using indicators for estimating available phosphorus, while El Kaoutit et al. (2013) achieved something similar for mercury concentrations. Gomez-Robledo et al. (2013) investigated the use of a smartphone camera as a soil colour sensor, using it to determine the Munsell colour of soil samples. Field-based investigation of soil biology has also been attempted; for example, Bogoch et al. (2013) used a smartphone coupled with a basic microscope to detect helminth species from soil samples. Aitkenhead (2013) demonstrated a smartphone app linking camera, image analysis and server-side processing for the estimation of soil carbon.

In this paper, an overview of the use of image colour and texture for characterising soil is given, along with a discussion of image colour calibration and mobile phone sensors. This is followed by the use of spatial covariates and their integration into modelling frameworks for estimating soil characteristics. The development of mobile phone apps that incorporate these modelling frameworks is described, with examples given of systems that have been developed and of ongoing work. Lastly, potential applications are explored.

2 Colour and Soil Character

Traditionally, soil scientists have determined the colour of a sample in the field by matching a soil aggregate against a series of colour patches first produced by Albert H. Munsell in the early twentieth century (www.munsell.com). Lighting is assumed to affect a Munsell colour card patch and a soil of the same colour equally, so that the comparison cancels out the effects of lighting. There is, however, some subjectivity in the Munsell soil colour assessment.

Complexity of soil colour–character relationships means that it is necessary to have information regarding the soil-forming factors (e.g. topography, climate, vegetation, parent material and land use). Modelling using legacy data is an important component of this work. If no legacy data are available that include colour and the parameter(s) of interest, then additional field sampling effort is needed. Soil colour and other parameters are included in several national and international data sets including the ISRIC–World Soil Information data set.

Soil parameters that have been estimated using colour include organic matter content (Aitkenhead et al. 2013; Liles et al. 2013), texture (Ibanez-Asensio et al. 2013), water table depth (Humphrey et al. 2011), iron oxide (Gunal et al. 2008) and others. Recent and ongoing work at the James Hutton Institute in the UK has demonstrated the ability to estimate a number of soil physical and chemical properties using soil colour and spatial covariates.

3 Mobile Phone Sensors

A number of sensors exist as standard in modern mobile phones that can be used to provide sensor data for soil monitoring. Below, we describe the sensors a smartphone/tablet device is equipped with, how they are relevant and how they can be used to further this goal. The long-term goal of much of the work described in this paper is to optimise the use of these sensors and the data they produce for real-time soil and general environmental characterisation—turning the smartphone into a Star Trek-style ‘tricorder’.

3.1 GPS

GPS (Global Positioning System) is a navigation system using satellite signals, with the first fully working system being developed by the US military. Most models of smartphone and tablet have GPS circuitry installed, giving them the same functionality as a standard GPS device. The basic GPS location information is given in latitude/longitude rather than in individual national grid reference systems and so may need to be converted to match spatial data sets.

GPS positioning allows the user’s location to be captured at the time of making other sensor readings. This positional information is then inserted into the header of any photographs that are uploaded and can be extracted and used to determine the parameter values of spatial covariates at the user’s location. This eliminates the need for the user to record anything other than the image/sensor reading that they are interested in and allows automation of site characterisation.
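As an illustration of extracting the position from a photograph header, the sketch below converts EXIF-style GPS values to signed decimal degrees. EXIF stores latitude and longitude as degrees, minutes and seconds (each a rational number) with a separate hemisphere reference letter. The function name and the example coordinates are assumptions made for illustration, and reading the header from a file is left to whichever image library is in use.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) to signed decimal degrees.

    dms: three (numerator, denominator) pairs, the way EXIF stores GPS rationals.
    ref: hemisphere letter 'N', 'S', 'E' or 'W'.
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical header values for a site in north-east Scotland.
latitude = dms_to_decimal(((57, 1), (8, 1), (0, 1)), "N")
longitude = dms_to_decimal(((2, 1), (9, 1), (30, 1)), "W")
print(round(latitude, 5), round(longitude, 5))
```

The resulting latitude/longitude pair would still need converting to the national grid reference system of the spatial data sets before covariates are looked up.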

The accuracy of smartphone GPS locations is lower than that of standard GPS devices, largely due to the limited size of the built-in antenna. With a mid-range smartphone, the reported location is usually within 35–40 m of the true position for more than 95 % of readings. This positional error is smaller than the spatial resolution of most of the spatial data sets that are being used in parallel with the positional information, and so it is considered acceptable for this kind of work.

3.2 Camera

Improvements to digital cameras in smartphones have resulted in high-quality and consistent imaging. The number of pixels in a smartphone camera is now more than is needed simply to determine soil colour. For texture, however, no pixel count is ever fully sufficient, as some soil particles will always be smaller than the imaging capabilities of a commercial digital camera. The spectral range of cameras is an issue as they only provide colour information across broad spectral ranges. This limits their application for spectroscopic analysis. Spectral sensitivity, or the response curve of the camera’s light-detecting sensors to different wavelengths, is another issue, as these response curves vary between devices and so do not produce a uniform colour response.

Without specialist equipment, the minimum focus distance varies from approximately 5 to 20 cm across smartphone/tablet cameras. This means that the finest image pixel resolution that can be achieved is around 10 microns, rising to 100 microns for older models. Smartphone cameras are therefore unable to produce images that capture the full range of silt particle size and cannot acquire images of clay particles.
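The arithmetic behind this limit can be sketched as follows. The scene width, the pixel count, the requirement of two pixels per particle and the USDA particle size limits (clay below 2 µm, silt 2 to 50 µm, sand 50 to 2000 µm) are all assumptions chosen for illustration; other classification systems place the silt/sand boundary elsewhere.

```python
def pixel_size_um(scene_width_mm, pixels_across):
    """Width of soil covered by one pixel, in microns."""
    return scene_width_mm * 1000.0 / pixels_across


def resolvable_part(lower_um, upper_um, pixel_um, pixels_per_particle=2):
    """Report how much of a particle size class can be resolved.

    Assumes a particle must span at least `pixels_per_particle` pixels
    to be distinguished (an illustrative rule of thumb).
    """
    smallest = pixel_um * pixels_per_particle
    if smallest >= upper_um:
        return "none"
    if smallest <= lower_um:
        return "all"
    return "part"


# Assumed USDA size limits in microns.
SIZE_CLASSES = {"clay": (0, 2), "silt": (2, 50), "sand": (50, 2000)}

# Hypothetical close-up: a 40 mm wide scene imaged across 4000 pixels.
pixel = pixel_size_um(40, 4000)
for name, (lower, upper) in SIZE_CLASSES.items():
    print(name, resolvable_part(lower, upper, pixel))
```

Under these assumptions a 10 µm pixel resolves sand fully, silt only in part and clay not at all, which is consistent with the limits described above.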

Automatic image adjustment can present a problem, as the camera’s internal software will attempt to adjust contrast and focus in ways that alter the colour response. There are also implications of the digital sensor array design due to the distribution of spectral filters on the pixel array. This can mean that the true RGB (red, green, blue) characteristics of individual pixels are inaccurate as they contain information from surrounding pixels.

So while cameras on smartphones and tablets can provide imagery of soil, they are unable to satisfy all the requirements in terms of spectral resolution and spatial scale, and are variable in terms of the images that will be acquired. It is necessary therefore to consider methods that can deal with this relatively coarse and inconsistent imagery.

4 Calibrating Image Colour

4.1 Why Calibrate Image Colour?

An ‘absolute’ colour standard is necessary in imaging soil if colour information is to be used as a predictor of soil properties. Without this standardisation, it is impossible to tell whether colour variation is due to differences in the appearance of the soil or in the device used to image it. Spectral response is measured in terms of the quantum efficiency (proportion of incoming light that is detected) at different wavelengths, with the filter/sensor architecture and design usually producing three distinct response curves in the red, green and blue sections of the visible spectrum. The shape of these curves varies between devices and can alter over time in the same device, so calibration is required.

Loss of data from using multispectral instead of hyperspectral imaging systems is considered likely to reduce the accuracy of soil property estimation. Many of the comparisons that have been carried out (few of which have involved soil) have used hyperspectral imaging systems with a different, usually greater, spectral range than the multispectral system. Examples include Garrido-Novell et al. (2012), who looked at automated grading of apples, and Taghizadeh et al. (2011), who examined the quality evaluation of mushrooms.

A number of colour spaces exist (Munsell, RGB, LAB, etc.), often implying a need to convert from the initial colour description of the soil to the colour space of the model/calibration being used. Translation tables between the different colour spaces are readily available online, but this translation can sometimes result in a degradation of the colour information as colour spaces vary in the level of detail with which they cover different parts of the represented colour space.

4.2 Lighting Conditions

The effects of lighting conditions on the digital image are various, difficult to predict in advance and often seen in combination with one another. Lighting intensity is obvious, with cameras operating within a fairly broad range of light intensity. If light levels drop below a certain level, the camera will not produce images with pixel intensities across the full range available, resulting in a loss of data. For light levels that are too high, overexposure and glare from reflective surfaces will produce a restricted intensity range at the upper levels. In photographing soil, we have found that during daylight hours (preferably with the Sun well above the horizon), it is possible to produce adequate photographs.

The spectral distribution of daylight varies not only in maximum intensity but also in distribution. The angle of the Sun above the horizon is a major factor, with daylight being shifted towards the redder end of the spectrum when the Sun is low. Overcast skies also produce a slightly different wavelength distribution, with this variation depending on cloud thickness and other conditions. Below, we have four images of soils photographed with the same device at different dates and times within north-east Scotland. A colour correction card with the James Hutton Institute logo is also shown in each photograph, and it is clear that there is substantial colour variation between the images due to the lighting conditions (Fig. 7.1).

Fig. 7.1 Examples of topsoil images taken of soils under different lighting conditions

4.3 Photography Requirements

A number of effects to be avoided can be easily produced in photography of soil. These include shadows caused by trees or the observer themselves. Image calibration becomes problematic if there are inconsistent lighting levels across the scene being photographed. Blurring caused by the movement of the camera while taking the photograph will be a problem if image analysis is to be carried out but is not an issue if only colour is being measured. Image focus will have a strong influence on image morphology, but not on the soil colour.

Contrast in the image can be a problem for very low or high lighting levels, or if there are highly reflective objects in the image that cause glare. If these problems are avoided, then the automatic colour calibration will resolve variable contrast levels. This means that automatic contrast adjustments made by the camera are more of a help than a hindrance, as they tend to produce image intensity distributions that are suitable for working with.

Some camera-induced image artefacts include faulty or damaged devices where false image signals are caused by misalignment or poor operation of the optical components. If an image contains unevenly distributed colours or rainbow-like image artefacts, it is best to use another camera as these are difficult to remove from the image.

File format effects can also be seen with devices that use Joint Photographic Experts Group (JPEG) image compression although the use of the uncompressed (RAW) file format is becoming more common. The JPEG compression format reduces file size and thus makes it easier to upload and use, but can result in a loss of image data and reduction in image quality. This is a problem with measurements of image morphology, as the compression algorithm introduces image artefacts at the pixel scale that cannot be distinguished from real image features.

4.4 Calibration Methodology

Colour calibration is required to produce a standard ‘true’ colour image that is independent of lighting conditions, camera spectral response and other factors. The way to do this is to determine the relationship of image colour to a standard colour sample within the image and to use this relationship to adjust the colour distribution of the rest of the image. We have developed an approach that uses a colour calibration card containing a standard distribution of RGB pixel values and which can also be used to determine the pixel resolution of the image.

The James Hutton Institute’s app development team has used two different colour calibration cards for different apps. The first used the Institute logo as it provided values across the RGB colour space while at the same time served as promotional material for the Institute (see below). The calibration results achieved with this card were good, but it did not provide a broad range of colour intensity values. The second card contains several greyscale bands, each of which has known RGB ratios while providing a range of reflectance values. This provides a spectral response curve that can be matched to the values received in an image (Fig. 7.2).

Fig. 7.2 Examples of images taken incorrectly and correctly with a colour correction card
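The idea of matching greyscale bands of known value to the values received in an image can be sketched as a per-channel correction curve. This is a minimal illustration rather than the Institute's implementation: it uses piecewise-linear interpolation between bands (an assumption), and the reference and observed band values are hypothetical.

```python
import bisect


def build_channel_curve(observed, reference):
    """Return a function mapping an observed channel value to a corrected one.

    observed:  band values (0-255) measured from the card in the photograph.
    reference: the known printed values of the same bands.
    Observed values are assumed to be distinct.
    """
    pairs = sorted(zip(observed, reference))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def correct(value):
        # Pick the segment containing the value; end segments are extended.
        i = min(max(bisect.bisect_right(xs, value) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        corrected = ys[i] + (value - xs[i]) * slope
        return min(255.0, max(0.0, corrected))

    return correct


def correct_pixel(pixel, curves):
    """Apply one curve per channel to an (r, g, b) pixel."""
    return tuple(curve(value) for value, curve in zip(pixel, curves))


# Hypothetical card with four grey bands, photographed in reddish light.
REFERENCE_GREYS = [32, 96, 160, 224]
curves = (
    build_channel_curve([40, 110, 180, 240], REFERENCE_GREYS),  # red
    build_channel_curve([30, 90, 150, 210], REFERENCE_GREYS),   # green
    build_channel_curve([20, 70, 130, 190], REFERENCE_GREYS),   # blue
)
print(correct_pixel((110, 90, 70), curves))
```

A pixel that reproduces the second grey band in all three channels is mapped back to a neutral grey, which is the behaviour wanted from a lighting correction.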

Recognition of the colour card in the image is necessary and requires identification of the edges of the card in order to isolate the pixels to be used for colour correction. The approach that we have used is to identify rows and columns within the image that contain more than a certain number of ‘white’ pixels, that is, pixels for which the red, green and blue values are all above 95 % of the maximum image intensity. Once these rows and columns have been identified, it is relatively trivial to identify the ‘bounding box’ of the colour correction card, as the card’s outer surround is a large white area. Some trial and error was required to ensure that the threshold number of ‘white’ pixels was set at a value that allowed the correction card to be identified consistently.

Calibration pixel extraction is carried out by selecting specific areas within this bounding box and identifying the mean RGB values from these areas. The RGB calibration curve is developed by calculating the ratios between known colour values for the calibration pixels and the values acquired from the image. This is done for a large number of pixels distributed across the colour space (we used between 200 and 1000, depending on the size of the colour correction card in the image) to allow correction across the full range of RGB values.

The accuracy of the calibration process for RGB values was determined across a number of different lighting conditions by comparing calibration pixels with target values. It is estimated that for imagery acquired under moderate and good lighting conditions, the RGB pixel value error is consistently reduced to less than 10 % of precalibration values. Under lighting conditions that are very dark or very light, the correction is less even but was found always to result in some improvement in the RGB value distribution.
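The card recognition and ratio steps described above might look something like the following sketch. The image is represented as a plain list of rows of (r, g, b) tuples, and the count threshold, function names and demonstration image are assumptions for illustration only.

```python
def find_card_bounding_box(image, white_fraction=0.95, count_threshold=2):
    """Locate the white surround of the card.

    image: list of rows, each a list of (r, g, b) tuples.
    Returns (top, bottom, left, right) as inclusive indices, or None.
    """
    max_intensity = max(max(px) for row in image for px in row)
    cutoff = white_fraction * max_intensity

    def is_white(px):
        return all(channel > cutoff for channel in px)

    height, width = len(image), len(image[0])
    row_counts = [sum(is_white(px) for px in row) for row in image]
    col_counts = [sum(is_white(image[y][x]) for y in range(height))
                  for x in range(width)]
    # Keep rows and columns holding more than the chosen number of white pixels.
    rows = [y for y, n in enumerate(row_counts) if n > count_threshold]
    cols = [x for x, n in enumerate(col_counts) if n > count_threshold]
    if not rows or not cols:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


def mean_rgb(image, top, bottom, left, right):
    """Mean (r, g, b) over an inclusive rectangular area."""
    pixels = [image[y][x] for y in range(top, bottom + 1)
              for x in range(left, right + 1)]
    return tuple(sum(px[c] for px in pixels) / len(pixels) for c in range(3))


def channel_ratios(known_rgb, measured_rgb):
    """Ratio of known to measured value for each channel."""
    return tuple(k / m for k, m in zip(known_rgb, measured_rgb))


def demo_image():
    """Hypothetical 8 x 8 scene: brown soil with a white card patch."""
    soil, white = (120, 80, 50), (250, 250, 250)
    return [[white if 2 <= y <= 5 and 3 <= x <= 6 else soil
             for x in range(8)] for y in range(8)]


box = find_card_bounding_box(demo_image())
print(box, channel_ratios((255, 255, 255), mean_rgb(demo_image(), *box)))
```

In practice the count threshold would be tuned by trial and error, as noted above, and the means would be taken from specific areas inside the bounding box rather than from the whole box.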

5 Image Texture

Several image texture analysis approaches exist that can provide information about the relationships between the spatial distribution of image pixel intensity values and soil characteristics. These include wavelets (detection of specific frequencies in intensity variation within the image), GLCM (grey-level co-occurrence matrix) (spatial relationships of similar greyscale values), edge detection and the calculation of statistical parameters describing intensity values (e.g. range, mean, maximum, standard deviation, entropy) within a moving window of selected size within the image.

Removing non-soil pixels is the first step in the image analysis, followed by the reduction of the image colour space to greyscale. The implementation of image texture mapping with depth down the soil profile is carried out by calculating the GLCM texture parameters across the whole image, at a number of different scales. The image is sequentially reduced in pixel resolution by a factor of 2 (five times) and subjected to texture analysis at each step, resulting in six sets of image texture data. This was done in order to capture variation in image texture with scale, which may be important for characterising the soil texture.
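A minimal sketch of the multi-scale texture calculation is given below for one GLCM parameter (contrast). Several choices here are assumptions: the image is a list of rows of 0-255 grey values, resolution is halved by averaging 2 x 2 blocks, grey values are quantised to eight levels, and only horizontally adjacent pixel pairs are counted.

```python
def downsample_by_two(grey):
    """Halve the resolution by averaging each 2 x 2 block of pixels."""
    height, width = len(grey) // 2, len(grey[0]) // 2
    return [[(grey[2 * y][2 * x] + grey[2 * y][2 * x + 1]
              + grey[2 * y + 1][2 * x] + grey[2 * y + 1][2 * x + 1]) // 4
             for x in range(width)] for y in range(height)]


def glcm_contrast(grey, levels=8, max_value=255):
    """Contrast of a symmetric GLCM built from horizontal neighbours."""
    quantised = [[min(levels - 1, v * levels // (max_value + 1)) for v in row]
                 for row in grey]
    counts, total = {}, 0
    for row in quantised:
        for a, b in zip(row, row[1:]):
            # Count each pair in both directions so the matrix is symmetric.
            counts[(a, b)] = counts.get((a, b), 0) + 1
            counts[(b, a)] = counts.get((b, a), 0) + 1
            total += 2
    if total == 0:
        return 0.0
    return sum((i - j) ** 2 * n for (i, j), n in counts.items()) / total


def multiscale_contrast(grey, reductions=5):
    """Contrast at the original resolution and after each halving."""
    results, current = [], grey
    for step in range(reductions + 1):
        results.append(glcm_contrast(current))
        if step == reductions or len(current) < 2 or len(current[0]) < 2:
            break
        current = downsample_by_two(current)
    return results


# Hypothetical 64 x 64 checkerboard: strong texture at pixel scale only.
checkerboard = [[255 if (x + y) % 2 else 0 for x in range(64)]
                for y in range(64)]
print(multiscale_contrast(checkerboard))
```

The checkerboard shows why scale matters: its contrast is high at full resolution and vanishes after one halving, so a single-scale measurement would miss how texture changes with scale.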

Measuring image scale using the colour correction card allows the image texture parameters to be given values in relation to real scales, which is important when comparing soils with different structural properties. The procedure for this is to measure image texture at the pixel/multipixel resolution, determine the resolution of a single pixel in the image and then fit the curve of measured texture values to a logarithmic range of preselected spatial scales. The scale values used in the work demonstrated here were 40, 80, 160, 320, 640, 1280 and 2560 µm, and the curve fitting was carried out by fitting a third-order polynomial curve to the values (Fig. 7.3).

Fig. 7.3 Examples of image texture-scale curves adjusted to constant scale values. Raw textural measurement of a parameter (contrast) is on the left, and the values derived from fitted curves for different scales are on the right
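The curve-fitting step can be sketched as an ordinary least-squares fit of a third-order polynomial to texture values against the logarithm of scale, followed by evaluation at the preselected scales. The use of base-2 logarithms centred on 320 µm, the function names and the example measurements are assumptions; standard scales outside the measured range are extrapolated by the fitted curve.

```python
import math

STANDARD_SCALES_UM = [40, 80, 160, 320, 640, 1280, 2560]
REFERENCE_SCALE_UM = 320  # centring value, chosen to keep the fit well conditioned


def fit_polynomial(xs, ys, order=3):
    """Least-squares polynomial coefficients, lowest power first."""
    n = order + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Solve the normal equations by Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def texture_at_standard_scales(pixel_um, measured):
    """Map texture values measured at pixel_um, 2 * pixel_um, 4 * pixel_um, ...

    onto the standard scales using a fitted cubic.
    """
    xs = [math.log2(pixel_um * 2 ** k / REFERENCE_SCALE_UM)
          for k in range(len(measured))]
    coeffs = fit_polynomial(xs, measured)
    return {scale: sum(c * math.log2(scale / REFERENCE_SCALE_UM) ** i
                       for i, c in enumerate(coeffs))
            for scale in STANDARD_SCALES_UM}


# Hypothetical contrast values from six resolutions of a 40 µm per pixel image.
print(texture_at_standard_scales(40, [3.5, 4.0, 4.5, 5.0, 5.5, 6.0]))
```

Six measurements are the minimum sensible input for a cubic with four coefficients; with fewer than four the normal equations become singular.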

6 Integration of Site Descriptors

6.1 SCORPAN

The concept of SCORPAN, an acronym of soil, climate, organisms, relief (topography), parent material, age and N (geographical location), is an adaptation of the soil-forming factors concept described by Hans Jenny (Jenny 1994).

Nonlinear relationships between covariates and soil character make the implementation of SCORPAN within a modelling framework difficult. In practice, it is used as a conceptual model rather than as an approach for predicting soil properties (McBratney et al. 2003). Effects of non-SCORPAN drivers can confuse the issue; for example, burial of a soil profile by sediments cannot easily be predicted.

6.2 Spatial Covariates

Examples of covariates that can be derived from spatial data sets and used in SCORPAN-derived predictive models of soil character include elevation and slope (topography), parent material from geological maps, vegetation classes from land cover maps, monthly or annual mean temperature and rainfall (climate). Land management and historical land cover data are also useful. Normalisation of covariate values is often necessary, particularly for parameters that are biased within their distribution (e.g. elevation, slope) or that have discontinuities in value (e.g. aspect, in which the difference between 359° and 0° should have the same impact as between 0° and 1°).
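Two common ways of handling these cases are sketched below. Encoding aspect as a sine/cosine pair and applying a log transform followed by min-max scaling to skewed, non-negative covariates are illustrative choices, not necessarily those used by the authors.

```python
import math


def encode_aspect(aspect_degrees):
    """Represent aspect as (sin, cos) so that 359 and 0 degrees are neighbours."""
    radians = math.radians(aspect_degrees)
    return math.sin(radians), math.cos(radians)


def aspect_distance(a_degrees, b_degrees):
    """Distance between two aspects in the encoded space."""
    (s1, c1), (s2, c2) = encode_aspect(a_degrees), encode_aspect(b_degrees)
    return math.hypot(s1 - s2, c1 - c2)


def normalise_skewed(values):
    """Log-transform non-negative values, then rescale to the range 0-1."""
    logged = [math.log1p(v) for v in values]
    low, high = min(logged), max(logged)
    if high == low:
        return [0.0 for _ in logged]
    return [(v - low) / (high - low) for v in logged]


# The step from 359 to 0 degrees now matches the step from 0 to 1 degree.
print(aspect_distance(359, 0), aspect_distance(0, 1))
# Hypothetical slope values in degrees, dominated by gentle slopes.
print(normalise_skewed([0, 1, 2, 3, 5, 40]))
```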

Location is useful because it allows other information about the soil’s environment to be included in a calibration model. The link with mobile device geolocation is useful, because it provides a system that incorporates image capture, geolocation and either onboard processing or transmission to a processing server. Accuracy requirements of the geolocation are difficult to define as soil varies, but normal operating accuracies of a few tens of metres or less are considered sufficient—the spatial data sets used are not usually of finer resolution than this in any case. Speed of response is also a consideration for real-time soil monitoring in the field. The SOCIT (Soil Organic Carbon Information Technology) app provides an estimate of soil organic matter content within 10–30 s, most of which is taken up by transmitting the image (in compressed form) to the processing server.

6.3 Spatial Data sets

Global data sets that allow covariates to be derived include topography (e.g. SRTM (Shuttle Radar Topography Mission), ASTER GDEM (Global Digital Elevation Model), WorldDEM), climate (e.g. WorldClim, NOAA (National Oceanic and Atmospheric Administration) data), soil (Food and Agriculture Organisation Harmonized World Soil Database (FAO HWSD), which also provides some information on parent material) and land cover (e.g. Joint Research Centre (JRC) Global Land Cover). Many other data sets exist at national and even local level, usually at finer spatial resolution/larger scale than these global ones. A number of high-quality spatial data sets of relevant parameters exist for Scotland and were used in the work described here (see Sect. 7.9.1). Preparation requirements for the data sets include the reclassification of categorical maps, normalisation for bias in the range of values, extraction of additional parameters (e.g. slope and aspect from elevation maps) and spatial coregistration of the multiple data sets used.

The spatial data sets should not be on the device, because trying to put all of the necessary data onto a smartphone or tablet would require a data storage capacity beyond even modern devices. It would also mean that the developer was sharing data acquired from other sources, generally under restricted licence agreements. This would put these data sets onto devices from which they could be extracted, violating intellectual property. A solution is to use server-side processing, with all data and models stored at a single location and with the minimum of functionality on the device itself.

The concept of server-side processing is one that reduces the device-based processing requirements and gives the developer more options, but does introduce the need for developing a framework for passing data between the field device and the server. It also adds complexity to the processing chain while at the same time allowing the information derived to be recorded and stored for later use by the developer. One requirement when working with spatial covariates is that the specific site characteristics must be extracted and fed into any integrative model rapidly. This means that sequential reading of large spatial files to find the correct location is inappropriate, and the spatial data must be organised or split to allow more rapid access.

Once the spatial covariates have been parameterised, they can be linked to the image-derived data to generate input values for models developed to predict soil characteristics. Sample number versus parameter count must be appropriate, with large numbers of model parameters and low sample count resulting in what is known as the ‘curse of dimensionality’. The distribution curves of all parameter values must be as close to normal as possible, either through sample selection or through parameter normalisation. It is useful to attempt to reduce the number of model parameters by checking for high correlation values between input variables. For real-world soil data sets, there are often missing values and outliers due to analytical error that must be estimated using some imputation approach or removed from the data set, respectively.
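The correlation check mentioned above can be sketched as a simple screening pass that keeps a variable only if it is not highly correlated with one already kept. The 0.9 threshold, the function names and the example covariate values are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Return the names of input variables to keep.

    columns: dict mapping variable name to a list of values.
    Variables are considered in dict order; the first of a correlated pair wins.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical covariates for five sites.
covariates = {
    "elevation": [10, 50, 200, 400, 800],
    "mean_temperature": [9.0, 8.7, 7.8, 6.5, 4.0],
    "slope": [2, 15, 3, 20, 5],
}
print(drop_correlated(covariates))
```

Here mean temperature tracks elevation almost perfectly and would be dropped, while slope is retained. Which member of a correlated pair to keep is a judgement that would normally rest on data quality and interpretability rather than on dict order.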

7 Modelling Frameworks

Strong linear correlations between SCORPAN/image data input parameters and soil characteristics of interest are not common, so sophisticated methods of mapping between inputs and outputs are required. These can include multivariate correlation, decision trees, neural networks, Bayesian statistics, partial least squares or a number of others. There is no single method for developing models with complicated, noisy data sets, and so the approach used is generally decided based on preference, software availability and experience with specific approaches or familiarity with similar work. It is not that the methods themselves are not successful—merely that there is rarely a clear winner in terms of capability. In the case of the James Hutton Institute’s app development team, preference is to use neural networks as they are easily implemented, relatively intuitive and sufficiently flexible to be used for almost all soil-related data sets. We have also experimented with partial least squares, multivariate regression and decision trees. These and other approaches may provide an improvement of a few percentage points, but it is difficult to identify when one approach will be better than another.

It is possible to produce good predictive results that turn out to be meaningless due to inadequate model training. One of the most fundamental considerations is the splitting of the available data into training and testing data sets. A simple split into one subset for training and one for testing is valid if done robustly (i.e. the data points in each subset are representative of the full data set while at the same time avoiding the placing of replicates into different subsets). One of the commonly used approaches is k-fold cross-validation, in which the data set is split at random into k approximately equal subsets, and k models are developed, each of which is trained on k − 1 of the subsets and tested on the remaining one. This has the advantage of using all of the data efficiently while at the same time producing an ensemble of models that can be used together at a later point. A further consideration for additional robustness is the testing of the model using a verification subset that is independently developed and unrelated to the training data set. We have used this approach to validate the model developed for the SOCIT app described later.
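The splitting procedure can be sketched as follows. To respect the point about replicates, whole sites rather than individual samples are assigned to subsets; the function name, the site labels and the choice of k are assumptions for illustration.

```python
import random


def k_fold_by_site(sample_sites, k=5, seed=0):
    """Yield (train_indices, test_indices) for k cross-validation folds.

    sample_sites: the site identifier of each sample, in sample order.
    All replicates from one site fall in the same fold, so they never
    appear on both sides of a train/test split.
    """
    sites = sorted(set(sample_sites))
    random.Random(seed).shuffle(sites)
    folds = [sites[i::k] for i in range(k)]
    for fold in folds:
        held_out = set(fold)
        test = [i for i, s in enumerate(sample_sites) if s in held_out]
        train = [i for i, s in enumerate(sample_sites) if s not in held_out]
        yield train, test


# Hypothetical data set: ten samples from seven sites, some with replicates.
sites = ["A", "A", "B", "C", "C", "D", "E", "F", "F", "G"]
for train, test in k_fold_by_site(sites, k=3):
    print("train", train, "test", test)
```

Each sample is tested exactly once across the k folds, and the k fitted models can later be combined as an ensemble.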

8 Mobile Phone Apps

8.1 Server Processing

The principal coding languages and environments for mobile devices are Java and Android Studio (for Android devices) and Objective-C and Xcode (for Apple devices). Additional coding languages may be used for server-side support of applications; there is a large number of these, and each coder will have their own preference, but they include PHP, which is useful for providing a connection between the app and a server-side database and languages such as Visual Basic or Visual C++, which can be used for running software to generate outputs from server-side data sets.

There are two security considerations: protection of the user and their device, and protection of the server. Apps should be designed to use the minimum set of functions required to operate, in order to reduce the risk of exposing the mobile device to electronic attack. For example, WebViews in Android apps support JavaScript, and this can be exploited in malicious attacks. On the server side, the type of security implemented will reflect the application: databases must be protected against Structured Query Language (SQL) injection attacks, white lists can be used to restrict inputs to permitted options, and secure passwords and careful database administration, including mirroring and views, can all be effective.
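Two of the server-side measures, parameterised queries and white lists, can be illustrated with Python's built-in sqlite3 module. The table, column names and values are hypothetical, and a production server would use its own database and language; the point is only that user-supplied values are passed as bound parameters and that any name inserted into the query text comes from a fixed list.

```python
import sqlite3

# Hypothetical white list of columns that a client may request.
ALLOWED_PARAMETERS = {"organic_matter", "ph"}


def is_allowed(parameter):
    """True only for column names on the white list."""
    return parameter in ALLOWED_PARAMETERS


def fetch_estimate(conn, parameter, site_id):
    """Return the requested estimate for one site."""
    if not is_allowed(parameter):
        raise ValueError("parameter not permitted")
    # The column name comes from the white list; the site identifier is
    # bound with a placeholder and is never pasted into the SQL text.
    query = f"SELECT {parameter} FROM estimates WHERE site_id = ?"
    return conn.execute(query, (site_id,)).fetchall()


def make_demo_db():
    """Build a small in-memory database for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE estimates (site_id TEXT, organic_matter REAL, ph REAL)")
    conn.execute("INSERT INTO estimates VALUES (?, ?, ?)", ("site-1", 6.2, 5.4))
    return conn


conn = make_demo_db()
print(fetch_estimate(conn, "organic_matter", "site-1"))
# An injection attempt is treated as a literal identifier and matches nothing.
print(fetch_estimate(conn, "organic_matter", "site-1' OR '1'='1"))
```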

Online processing is the obvious choice for rapid field assessment of soils using the approach detailed here, but is not always possible, usually due to poor mobile phone reception. It is possible to send the imagery at a later date, as the location of the user is irrelevant—it is the location stored in the image that is used.

8.2 User and Design Requirements

User requirements include stability of the app, response speed and accuracy of the soil parameter estimates given. The issue of ergonomics and usability of apps is complex as the diversity of devices increases. An app must be designed to work on both low- and high-resolution devices with screen sizes from 9 to 25 cm and work with landscape and portrait screen orientations. It requires careful design to ensure legibility and that software buttons are large enough to touch. Also, while tools exist to help designers cope with multiple devices, there remains considerable effort required in producing graphics (logos, images and textures) for each of the required resolutions.

Design team expertise requirements for developing this kind of system cover four main areas: (1) soil science, particularly in the subfield of soil modelling; (2) data management; (3) programming (in any one of a number of appropriate languages—we have found Python works well, but there are other options); and (4) app interface development.

The intellectual property of all components in an app must be duly acknowledged and also communicated to the user through the End User License Agreement (EULA). The EULA is intended to make explicit the rights which the owner of the app confers on the user and what the user may and may not do with the app. It is written to satisfy the requirements of any relevant legislation and any health and safety implications.

Agreement to the EULA can be enforced from within the app. On current James Hutton Institute apps, the user is presented with the EULA when the app is first run. The user must click an acceptance, or the app will terminate. After acceptance, the EULA is only displayed if the user clicks on a button to show it.

Keeping the app simple in design means that less effort is required in the development and also avoids confusing the user with overambitious design. A simple design is usually most easily reused for later work if other apps are to be developed. Another important rule is to keep it free, as attempting to make profit from an app that uses underlying spatial data sets can cause legal issues.

A number of criteria exist for measuring the success of any app, and information on these can usually be obtained from analytics available through the app provider. These include the number of downloads of the app itself, the number of times it has been used and feedback that has been sent. Additionally, the availability of user-provided data for later use can also be considered a criterion of success.

Licences associated with the data used in any model/app framework must be considered, to ensure that all requirements are being met. Some form of licence must be considered for the model and app itself, to protect the IP of the developers. Server-side protection of the data is a sensitive issue, and the app design should make it impossible for malicious users to use the app to access the data directly. This is also true of the user-derived data, which should be made invisible unless a deliberate decision is made to share this information.

9 Examples

9.1 SOCIT

The SOCIT app originated through existing work for QMS (Quality Meat Scotland) on estimating soil organic matter in grassland soils based on spatial covariates. A software package for desktop PCs was anticipated, before it was realised that a smartphone app would be a better option and would provide a link with institutional priorities in relation to digital soil mapping and the use of legacy data for improving our understanding of the soils of Scotland.

The Scottish Soils Database provided data on soil organic matter content and colour from hundreds of sites sampled across Scotland. The majority of data used in the database came from NSIS1, the first National Soil Inventory of Scotland. Parameters used from the database included LOI (loss on ignition), spatial location of the sample site and Munsell colour estimated under field conditions.

The decision to use organic matter content (in reality LOI) rather than soil organic carbon content was made for two reasons: primarily, land managers were found to be more familiar with the concept of ‘organic matter content’ than with ‘carbon content’ and stated a preference for using this parameter; secondly, Scottish soils almost all contain very little carbonate (based on the evaluation of the Scottish Soils Database), and so the LOI values could be reasonably assumed to equate to organic matter content. Converting Munsell colour to RGB was carried out using an online conversion table (Boronkay 2013).

Topographical data included elevation, slope, aspect and curvature derived from the 50-m resolution DEM from the UK Ordnance Survey (OS). Land cover data included Land Cover of Scotland 1988 (LCS88) and Land Cover Map 2007 (LCM2007) data sets, reclassified to produce a simple categorisation of ten land cover classes. Soil map information was taken from the 1:250,000 Scottish Soil Map generated by the Macaulay Land Use Research Institute (MLURI). Parent material data were derived from the soil maps. Climate data used included mean monthly temperature and rainfall, from gridded UK Meteorological Office observations between 1971 and 2000.

The app requires rapid access to specific information about sites of interest. To facilitate this, the spatial data were used to produce a set of data strips as separate files, each of which contained the relevant parameter values for a strip of data 100 m wide across the country. These smaller files could then be read quickly to access data relevant for specific locations.

A neural network model was used to estimate soil organic matter content from the various input parameters. This model was kept simple, using the backpropagation error minimisation algorithm and k-fold cross-validation to create a robust consensus model. Validation accuracy measurements for a model trained with all LOI values less than 20 % for agricultural, grassland and forestry soils gave an R² value of 0.79, a root mean square error (RMSE) of 1.58 % and a mean absolute error of 1.12 %.
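The validation statistics quoted above can be reproduced for any set of observed and predicted values. The following is a minimal sketch in Python, assuming that the cross-validation splits the samples into k folds, that the consensus estimate is the mean of the fold models' outputs and that R² is the coefficient of determination (the source does not state which definition was used). All function names are illustrative and are not taken from the SOCIT code.

```python
import math

def k_fold_indices(n_samples, k):
    """Assign sample indices 0..n_samples-1 to k validation folds in turn."""
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds

def consensus(predictions):
    """Assumed consensus rule: mean of the estimates from the k fold models."""
    return sum(predictions) / len(predictions)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def mae(observed, predicted):
    """Mean absolute error, in the units of the observations."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
```

Because RMSE penalises large errors more heavily than MAE, reporting both, as the chapter does, gives some indication of how much of the error comes from a few poorly predicted samples.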

The apps produced by the James Hutton Institute have been designed using the client–server paradigm, where the client is the mobile device and the server is at the James Hutton Institute. The app is designed to enable and guide the user to structure an appropriate request for information and to send that to the server. The server processes the query, runs the required software, generates an output and returns it to the mobile device. The device receives the response to the query, interprets it and processes it into a form suitable for display (Fig. 7.4).

Fig. 7.4 Framework for client–server information flow used in James Hutton Institute apps

The main processing thread of the app, which responds to inputs from the user, must continue to run while a second, so-called asynchronous thread is created to communicate with the server. There are time delays between the client sending the request for information and receiving the result from the server, and if this process were to remain on the main processing thread, the operating system or the user could interpret the wait as a software error. In the time between sending a request and receiving a response, the device must still be usable (e.g. the user might wish to take a phone call), the user must understand that the process is ongoing and the device must be in a state whereby it can receive the response from the server and process it appropriately.
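The threading pattern described above can be illustrated outside a mobile platform. The sketch below uses Python's standard threading and queue modules. The function query_server is a hypothetical stand-in for the network call (it only sleeps to mimic latency and returns a placeholder), and the polling loop stands in for the app's main thread. It is not the code used in the James Hutton Institute apps, which run on the mobile operating systems' own threading facilities.

```python
import queue
import threading
import time

def query_server(request):
    """Hypothetical stand-in for the network call; sleeps to mimic latency."""
    time.sleep(0.05)
    return {"request": request, "organic_matter_percent": None}  # placeholder

def send_in_background(request, results):
    """Start a worker thread so that the calling (main) thread is not blocked."""
    def worker():
        try:
            results.put(("ok", query_server(request)))
        except Exception as exc:  # report failures rather than leave the app waiting
            results.put(("error", str(exc)))
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

def main_loop(results, poll_interval=0.01, timeout=2.0):
    """Stay responsive while waiting; returns (status, payload, ticks)."""
    ticks = 0
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            status, payload = results.get_nowait()
            return status, payload, ticks
        except queue.Empty:
            ticks += 1  # a real app would update a progress indicator here
            time.sleep(poll_interval)
    return "timeout", None, ticks
```

The timeout branch matters in the field, where signal can be lost after the request has been sent: the main loop gives up cleanly and can tell the user, instead of waiting indefinitely.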

Implementation of the neural network model, coordinate transformation and image analysis scripts was made using Python, as was that of the controlling ‘master code’ that coordinated the activity of the various subroutines. The app was tested in the field but was hampered occasionally by the lack of signal. It was found that the model was much more accurate (in terms of RMSE and mean absolute error) when developed for soils under agriculture, seminatural grassland and forestry only. Inclusion of organic soils and heathland areas resulted in a model with poorer prediction ability.

Having selected a location, a small inspection hole is dug to a depth sufficient to expose the subsoil, a supplied colour correction card is placed in the hole, and a photograph is taken. The georeferenced photograph is sent to a server for processing, where code uses the colour correction card to determine the colour of the sample in red/green/blue colour space. The neural network model then uses this colour value, along with attributes determined from the geographical location, to estimate organic matter content which is returned to the user.
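The source does not describe how the server uses the card to correct colour. One simple possibility, shown here purely as an illustration, is to fit a gain and offset for each of the red, green and blue channels by least squares, using card patches whose true values are known, and then to apply that correction to the soil pixels. The patch values and function names below are assumptions.

```python
def fit_channel(observed, reference):
    """Least-squares gain and offset mapping observed patch values to reference values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_r = sum(reference) / n
    var_o = sum((o - mean_o) ** 2 for o in observed)
    cov = sum((o - mean_o) * (r - mean_r) for o, r in zip(observed, reference))
    gain = cov / var_o
    offset = mean_r - gain * mean_o
    return gain, offset

def fit_card(observed_patches, reference_patches):
    """Fit one (gain, offset) pair per RGB channel from the card patches."""
    return [
        fit_channel([p[c] for p in observed_patches],
                    [p[c] for p in reference_patches])
        for c in range(3)
    ]

def correct_pixel(rgb, correction):
    """Apply the fitted correction and clamp each channel to the 0-255 range."""
    return tuple(
        min(255, max(0, round(gain * value + offset)))
        for value, (gain, offset) in zip(rgb, correction)
    )

# Illustrative values only: three grey patches as photographed and as printed.
observed = [(20, 30, 10), (120, 130, 110), (220, 230, 210)]
reference = [(10, 10, 10), (110, 110, 110), (210, 210, 210)]
correction = fit_card(observed, reference)
corrected = correct_pixel((100, 100, 100), correction)
```

A per-channel linear fit needs at least two patches of different brightness. More patches spread across the intensity range would constrain the fit better.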

Getting the colour correction card is relatively easy—you can request it directly using the app, by providing an email address and delivery address. This is useful information for the developers, as it gives an indication of the geographical distribution of people interested in the app. When the address is not in Scotland, we email back to inform the contact that the app does not work where they are. Interpreting the results is straightforward, as the app provides two numbers—estimates of organic matter and organic carbon in the topsoil. The ratio between these two values is variable but normally lies within the range 1.5–2.0 in Scotland (based on values in the Scottish Soils Database).

9.2 Visual Structural Assessment (VSA)

As soil structure affects the ability of roots to penetrate soil and access water and nutrients, it is an important property of soils that is of direct relevance to many land users. A simple, rapid, field-based assessment has been developed that allows users to obtain a measure of structure (Guimaraes et al. 2011).

The basic principle of the method is that soil is naturally found in some sort of aggregate (although aggregates can be difficult to see where soil compaction has occurred) and that larger soil aggregates can be broken into smaller ones. Image analysis techniques can therefore be used to detect and classify aggregates. Where the scale can be determined with reference to some standard of known size (see the colour correction card example above), this information can be used to estimate aggregate structure (size and structural strength). Soil textural and structural parameters might be predictable using an app similar to the SOCIT system described above, although with different image analysis.

Field imagery was acquired using a number of different smartphones and tablets, including Apple and Android devices. An example of the images acquired is given below (note the different colour correction card, which gives better correction accuracy over the full pixel intensity range). Soil analysis was carried out using wet chemistry for a number of exchangeable cations, LOI for organic matter content, laser diffraction analysis for particle distribution and visual structural assessment in the field for the VSA scoring. Spatial data sets used for the VSA model were the same as those used for the development of the SOCIT app model (Fig. 7.5).

Fig. 7.5 Example of image acquired of topsoil profile for visual soil assessment model

Colour calibration is similar to that of the SOCIT app, followed by GLCM image texture analysis and the scaling of image texture as explained earlier. Site descriptor values are derived using the data strips developed for SOCIT, to provide model input/output data. The number of input parameters for the model is greater than that for the soil organic matter model, as image texture analysis provides a larger number of parameters than colour. Image colour has been shown to have some impact on the estimation of soil structure, possibly through the detection of organic matter levels. Of the GLCM parameters derived from the imagery, it appears that contrast provides the strongest link and that it is the variance in contrast (measured in horizontal pixel lines across the profile) at different scales that provides an indication of structure.
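The contrast measure referred to above can be sketched as follows. For each horizontal line of pixels, a normalised co-occurrence matrix of horizontally adjacent grey levels is built, and contrast is the sum of P(i, j)(i − j)² over that matrix. The variance of the per-line contrast is then taken across the image, and the calculation can be repeated on a block-averaged copy to examine a coarser scale. The quantisation to a small number of grey levels and the block-averaging rule are assumptions made for illustration, not details taken from the VSA model.

```python
def glcm_horizontal(row, levels):
    """Normalised co-occurrence matrix for horizontally adjacent pixels in one row.

    Pixel values must be integers in the range 0..levels-1.
    """
    counts = [[0] * levels for _ in range(levels)]
    for left, right in zip(row, row[1:]):
        counts[left][right] += 1
    total = len(row) - 1
    return [[c / total for c in line] for line in counts]

def contrast(glcm):
    """GLCM contrast: sum of P(i, j) * (i - j)^2 over all grey-level pairs."""
    return sum(p * (i - j) ** 2
               for i, line in enumerate(glcm)
               for j, p in enumerate(line))

def contrast_variance(image, levels):
    """Variance of the per-row contrast across all rows of the image."""
    values = [contrast(glcm_horizontal(row, levels)) for row in image]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def downscale(image, factor):
    """Average factor x factor blocks (integer division) to give a coarser scale."""
    height = len(image) // factor
    width = len(image[0]) // factor
    return [
        [
            sum(image[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) // (factor * factor)
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Calling contrast_variance on the original image and on downscale(image, 2), downscale(image, 4) and so on gives one value per scale, which could then be supplied to the model as separate inputs.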

9.3 From Scotland to Europe

The geographical expansion of the SOCIT model concept seemed a natural progression, and one of the data sets to use for this is LUCAS (Tóth et al. 2013). The one disadvantage of this data set is that it does not contain information on soil colour in situ and only has spectroscopy data from dried and milled samples. A proportion of the work carried out so far has involved developing an approach using spectroscopy data to estimate an ‘absolute’ soil colour that can also be derived in the field.

Soil carbon data were the main target of this work, although the other parameters measured for LUCAS have also been investigated. Early results confirmed that splitting the data into mineral/organic subsets decreased the R² values of predictive models but also greatly improved the RMSE and mean error values.

The creation of data ‘strips’ for EUSOCIT has resulted in the generation of 10 rows of data, each 5° of longitude wide and extending from 37°N to 71°N. The first of these rows begins at 15°W, and the last ends at 35°E. Within each row, represented by a folder, there are 35 subfolders, one for each degree of latitude, and within each subfolder there are 1200 files, each of which represents 5° of longitude and 3 arcseconds of latitude. Each file (of which there are 444,000) contains, in sequence, 80 environmental parameter values for each of the 6000 3-arcsecond points along the 5° strip.

Now, we can access the environmental descriptors (topography, climate, soil, land cover and geology) for any location with 3 arcseconds (approximately 90 m or less) precision, with a search time of no more than 3 s. This information can then be formatted and used as input data for the EUSOCIT model to provide an estimate of soil organic matter.
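The lookup implied by this folder structure reduces to a few integer divisions. The sketch below takes the extents and counts from the description above. The rest is assumed for illustration: zero-based indexing from the western and southern edges, subfolders labelled by integer degree of latitude (37 to 71), and the position of a point's 80 values within a file.

```python
import math

LON_MIN, LON_MAX = -15.0, 35.0   # from the text: 15°W to 35°E
LAT_MIN = 37                     # from the text: strips start at 37°N
N_LAT_FOLDERS = 35               # from the text: 35 one-degree subfolders
ROW_WIDTH_DEG = 5                # from the text: each row is 5° of longitude wide
STEPS_PER_DEG = 1200             # 3600 arcseconds per degree / 3 arcseconds
N_PARAMS = 80                    # from the text: 80 values per point

def locate(lat, lon):
    """Return (row, latitude folder, file index, point index) for a location.

    Index conventions here are assumptions, not the EUSOCIT layout itself.
    """
    if not (LON_MIN <= lon < LON_MAX):
        raise ValueError("longitude outside the covered area")
    if not (LAT_MIN <= lat < LAT_MIN + N_LAT_FOLDERS):
        raise ValueError("latitude outside the covered area")
    offset_lon = lon - LON_MIN
    row = int(offset_lon // ROW_WIDTH_DEG)        # which 5° folder
    lat_deg = math.floor(lat)                     # which one-degree subfolder
    # Which 3-arcsecond file within the degree, counted from its southern edge.
    file_index = min(int((lat - lat_deg) * STEPS_PER_DEG), STEPS_PER_DEG - 1)
    # Which of the 6000 points along the strip, counted from its western edge.
    within_row = offset_lon - row * ROW_WIDTH_DEG
    point_index = min(int(within_row * STEPS_PER_DEG),
                      ROW_WIDTH_DEG * STEPS_PER_DEG - 1)
    return row, lat_deg, file_index, point_index

def value_range(point_index):
    """Positions of a point's 80 values within the file's value sequence."""
    start = point_index * N_PARAMS
    return start, start + N_PARAMS
```

Because every location maps directly to one file and one position within it, no spatial index or search is needed, and only a single small file has to be opened per request.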

The other information that is needed for EUSOCIT to work is soil colour. As LUCAS does not contain soil field colour, we have to rely on the spectroscopy data. For this, the visible range values have been extracted for each sample point and converted to RGB values by averaging over the relevant wavelength ranges.
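The band averaging described above can be sketched as follows. The wavelength limits used for red, green and blue are conventional approximate ranges and are an assumption, as the source does not state the limits applied to the LUCAS spectra. The scaling of reflectance (0 to 1) to an 8-bit value is likewise illustrative.

```python
# Assumed band limits in nanometres; not taken from the source.
BANDS_NM = {"red": (620, 750), "green": (495, 570), "blue": (450, 495)}

def band_mean(spectrum, low, high):
    """Mean reflectance of all samples with low <= wavelength < high."""
    values = [r for wavelength, r in spectrum.items() if low <= wavelength < high]
    if not values:
        raise ValueError("no spectral samples fall within the band")
    return sum(values) / len(values)

def spectrum_to_rgb(spectrum):
    """Convert {wavelength_nm: reflectance in 0..1} to an 8-bit (R, G, B) tuple."""
    return tuple(
        round(255 * band_mean(spectrum, *BANDS_NM[name]))
        for name in ("red", "green", "blue")
    )

# Illustrative spectrum only; real spectra have many more wavelengths.
example = {460: 0.2, 480: 0.2, 500: 0.4, 550: 0.4, 650: 0.6, 700: 0.6}
rgb = spectrum_to_rgb(example)
```

A colour derived in this way from dried and milled samples will differ from the colour of moist soil photographed in the field, which is why the chapter refers to developing an ‘absolute’ soil colour that can be obtained from both sources.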

First indications of predictive accuracy for the EUSOCIT model trained with different partitioning give an R² value of 0.82 and a mean absolute error (MAE) value of 2.3 % when using all data, and lower R² values of 0.57–0.65 and MAE values of 0.9–1.2 % (6.3 % for organic soils) when the data are split between different land cover types. This indicates that using several models rather than one, with each model linked to a specific land cover, will produce more robust prediction accuracy. The use of spatial covariates definitely improves model performance over the use of colour alone.

9.4 Potential Applications

Soil colour has been shown to be related to a number of soil properties (e.g. Aitkenhead et al. 2013; Moritsuka et al. 2014). It should be possible to devise a series of apps which would give the user a quick ‘health check’ of their soil against a common set of health or quality indicators (e.g. organic matter content, pH, texture, structure, available water capacity). Additionally, while some of the underlying data sets used by the neural network model for national-scale predictions are of a coarse resolution (i.e. >5 km pixel size), where higher-resolution data sets exist for a specific geographical area, there is potential to use this approach in applications such as precision agriculture.

Extension from mobile devices to custom low-cost sensors is a possible area of development. The type of information would be the same, but it would allow more rugged and field-capable sensors to be used.

Free and rapid estimation of soil characteristics in the field fits well with citizen science activities, as it provides the user with information while at the same time automatically recording estimates on the processing server. The SOCIT app provides a template for future work in this area. The distinction between estimation and direct measurement must be made clear to potential users of these data.

Upload of data from citizen scientists/field surveyors for Web mapping services is an option. With appropriate consideration of data protection issues, it is possible to include Web mapping services on standard app implementations. ESRI has produced a development kit for both platforms and this makes the coding of apps with WMS and other mapping functions more straightforward. An existing online presence giving an indication of what this could look like is MySoil, produced by the British Geological Survey (BGS) under the UK Soil Observatory umbrella (http://www.ukso.org/home.html).

10 Discussion

Points of advice to focus on during the development of a model/app system of the kind described here include the following:

  • Keep the team small and focused on the bare bones of the functionality in the first instance.

  • Multidisciplinary work is important for this kind of project—scientists, software developers and data managers are required.

  • Keep your communications and legal expert colleagues close—they can save a lot of effort and prevent you from reinventing what already exists.

  • Conversely, keep your communications and legal teams at arm’s length where required: their instinct may be to ‘overbrand’ the outputs and make things more legally complex than they really need to be.

  • The apps that are produced must be at all times simple, clear to understand and free to use.

  • The End User License Agreement is vitally important but must not intrude on the user’s experience of the app.

11 Conclusions

What can be achieved using this suite of approaches? Direct estimation of soil characteristics in the field is possible for some soil properties such as organic matter content, texture, structure, pH, nitrogen, base saturation and some elements (Ca, Mg, Fe, Al). Our work, which is ongoing, has shown that these soil properties can be estimated with accuracy levels suitable for soil monitoring requirements. Potassium and phosphorus remain difficult (for us, using the methods described here) to estimate from colour and site descriptors, as do most of the heavier elements that have been measured within soil samples listed in the Scottish Soils Database. The links between model inputs and outputs in these cases need to be further investigated.