1 Introduction

During the last two decades, there has been growing interest in quantifying soil attributes by means of soil sensing techniques (Ramirez-Lopez et al. 2014; Vasques et al. 2008), in particular visible–near-infrared (Vis–NIR) diffuse reflectance spectroscopy, to provide data for digital soil mapping (Viscarra Rossel and Behrens 2010; Wetterlind et al. 2010) and soil morphometrics. Visible and near-infrared spectroscopy (Vis–NIR, 400–2500 nm) can be used to acquire more data rapidly and consequently increase mapping accuracy. Vis–NIR has the potential to analyze several soil attributes simultaneously without a considerable increase in cost (Viscarra Rossel and Lark 2009; Wetterlind et al. 2010) or time, and with less production of reagent residues (Viscarra Rossel et al. 2006; Demattê and da Silva Terra 2014).

Vis–NIR spectra contain information on minerals, organics, water, color, and particle size, which are fundamental components of the soil (Viscarra Rossel and Chen 2011). Reflectance spectroscopy can be very useful for assessing soil variation with depth (across different layers and/or horizons), because the interaction of light with soil attributes reflects intrinsic properties of the soil (Demattê and da Silva Terra 2014).

Recent studies have also investigated the relationships between soil attributes and their spectral reflectance with the aim of predicting physical–chemical soil attributes (Summers et al. 2011). Using a laboratory spectrometer, Demattê et al. (2012) analyzed the relation between the reflected electromagnetic energy and soil attributes. They observed features between 450 and 600 nm caused by subtle differences in absorption intensity, which can be used to separate subhorizons in the field by their color.

Soil reflectance spectra across the whole visible, near-infrared, and shortwave infrared (Vis–NIR–SWIR) range were successfully applied by Dotto et al. (2014), who developed multiple linear regression models to predict the content of sand, silt, and clay. The models produced good results, explaining 77 and 72 % of the variance for sand and clay, respectively. Viscarra Rossel et al. (2006) showed that a spectrometer operating in the visible region of the spectrum (400–700 nm) can be used for soil organic carbon prediction, achieving an RMSE of 0.18 % and an R2 of 0.60. The authors highlighted that predictions using only the visible part of the spectrum can be comparably accurate while requiring less expensive equipment than an infrared spectrometer.

Considering that the visible region of the spectrum (400–700 nm) is used for morphological soil classification in the field, this paper hypothesizes that, even with a spectrometer that acquires only a limited region of the spectrum, the soil signatures collected with such equipment are suitable for estimating soil texture. Thus, the main objective of the paper is to predict soil texture using second derivatives of the reflectance in a limited region of the spectrum (325–1075 nm). It is anticipated that a lower-cost near-infrared camera or spectrometer with a limited spectral range can be used for digital soil morphometrics.

2 Materials and Methods

2.1 Soil Sampling and Laboratory Analysis

Samples were collected in the southwest part of the Marombas river watershed, located near the center of Santa Catarina State, southern Brazil (Fig. 6.1). Parent material in the region consists mainly of basaltic igneous rocks of the Serra Geral Formation. A small area of the watershed, located toward the east, consists of consolidated sedimentary rocks of the Botucatu Formation. The climate is subtropical with mild summers and a mean annual temperature of 16 °C. The Köppen climate classification for the area is Cfb. Annual precipitation is about 1600 mm. The altitude of the watershed varies from 900 to 1300 m above sea level. Natural vegetation belongs to the mixed ombrophilous forest. The total area of the watershed is approximately 950 km2, and land cover consists of 22 % agriculture (garlic, onion, soybeans, and maize), 37 % cultivated forest (Pinus taeda), 33 % natural forest (with Araucaria angustifolia), and 8 % grassland and pasture. Prevalent soil types in the area are Oxisols, Inceptisols, and Entisols (Latossolos, Cambissolos, and Neossolos in the Brazilian classification system).

Fig. 6.1 Study area and sampling locations (dots) in the Marombas river watershed (red polygon). The small inset shows the location of the watershed in southern Brazil

A total of 42 soil profiles were sampled following the GlobalSoilMap specifications (Arrouays et al. 2014). In every profile, samples were collected down to 2 m depth (when possible) at the intervals of 0–5, 5–15, 15–30, 30–60, 60–100, and 100–200 cm. Soil analyses were conducted in the Pedology Laboratory of the Federal University of Santa Maria (Santa Maria, RS, Brazil). Soil organic carbon (SOC) and soil texture were determined for the 237 soil samples after they had been air-dried, ground, and sieved through a 2-mm mesh according to Embrapa (1997). Sand, silt, and clay (g kg−1) were determined by the pipette method, and SOC (g kg−1) by Walkley–Black wet digestion as described by Tedesco et al. (1995).

2.2 Spectral Analysis

In the laboratory, in a controlled setting, the 237 air-dried, ground samples were scanned using a FieldSpec HandHeld II (ASD Inc.) spectrometer, with a spectral acquisition range of 325–1075 nm and a spectral resolution of <3 nm at 700 nm. Scanning was conducted inside a black painted box (dimensions L/750 × H/400 × W/400 mm) to allow controlled illumination. Inside the box, soil samples were placed in a Petri dish. The spectrometer was installed on top of the box with a conical field of view of 10° at a distance of 400 mm from the samples. With this configuration, the spectrometer sampling area in the Petri dish was 40.7 cm2. A 70 W quartz–tungsten–halogen lamp with integrated reflector was placed inside the box as the light source, 400 mm away from the soil sample and inclined 30° from lamp nadir. Four composite scans (each an average of 100 internal scans) were obtained for each sample, one from each quadrant of the Petri dish, by rotating the dish 90° between scans. A reference spectrum from a white Spectralon® panel was collected prior to the first scan and at every new group of samples from a different profile. The final spectrum was calculated by averaging the four composite scans.

2.3 Spectral Data Analysis

This study applied three preprocessing steps to the soil reflectance spectra. First, the spectral regions at the edges (325–400 and 980–1075 nm), which showed a high noise-to-signal ratio confirmed by visual inspection, were removed. Second, the reflectance spectra were smoothed with a Savitzky–Golay second-order polynomial using a moving window of nine values (Savitzky and Golay 1964). Third, to reduce the dimensionality of the data, the reflectance values were averaged across a 5-nm window. This pretreatment reduced each soil spectral curve to 116 reflectance values (400–980 nm), which were then used for modeling.
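The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' processing chain: the input is assumed to be sampled at 1 nm, the function name `preprocess` and the synthetic spectrum are hypothetical, and SciPy's `savgol_filter` stands in for whatever software performed the smoothing.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess(wavelengths, reflectance, lo=400, hi=980,
               window=9, polyorder=2, bin_nm=5):
    """Trim noisy edges, smooth, and average into fixed-width bins.

    Assumes the spectrum is sampled every 1 nm, so a 5-nm bin holds
    five consecutive values.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)

    # Step 1: keep only 400 <= wavelength < 980 nm.
    keep = (wavelengths >= lo) & (wavelengths < hi)
    wl, refl = wavelengths[keep], reflectance[keep]

    # Step 2: Savitzky-Golay smoothing (second-order polynomial, 9 values).
    smooth = savgol_filter(refl, window_length=window, polyorder=polyorder)

    # Step 3: average consecutive groups of `bin_nm` values.
    n_bins = len(wl) // bin_nm
    wl_binned = wl[: n_bins * bin_nm].reshape(n_bins, bin_nm).mean(axis=1)
    refl_binned = smooth[: n_bins * bin_nm].reshape(n_bins, bin_nm).mean(axis=1)
    return wl_binned, refl_binned


# Synthetic spectrum (hypothetical values) covering 325-1075 nm at 1 nm.
wl = np.arange(325, 1076)
rng = np.random.default_rng(0)
spectrum = 0.2 + 0.0003 * (wl - 325) + rng.normal(0, 0.002, wl.size)

wl_b, refl_b = preprocess(wl, spectrum)
print(wl_b.size, wl_b[0], wl_b[-1])
```

Under the 1-nm assumption, the 580 retained values collapse to 116 bins whose centers run from 402 to 977 nm in 5-nm steps.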

Savitzky–Golay second derivatives were calculated on the 116 soil reflectance spectral values using a second-order polynomial across a 9-nm window. This derivative procedure followed the recommendation by Vasques et al. (2008). The modeling dataset comprised the sand, silt, and clay values and the second derivatives of the spectra of the air-dried, ground samples. Models were fitted by partial least-squares regression (PLSR) with The Unscrambler®X 10.3 software (CAMO Inc., Woodbridge, NJ).
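A Savitzky–Golay second derivative of this kind can be sketched with SciPy. The sketch makes two assumptions that the text does not settle: the window is taken as nine binned values, and the spacing between values is 5 nm. The function name `second_derivative` and the test curve are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def second_derivative(reflectance, window=9, polyorder=2, spacing=5.0):
    """Savitzky-Golay second derivative along the last axis.

    `spacing` is the wavelength step between values (assumed 5 nm here),
    so the result is in reflectance units per nm squared.
    """
    return savgol_filter(reflectance, window_length=window,
                         polyorder=polyorder, deriv=2,
                         delta=spacing, axis=-1)


# Sanity check on a parabola, whose second derivative is constant.
wl = 402.0 + 5.0 * np.arange(116)      # assumed bin centers, 402-977 nm
curve = 1e-6 * (wl - 700.0) ** 2       # hypothetical smooth spectrum
d2 = second_derivative(curve)
print(d2[:3])
```

Because the filter fits a second-order polynomial, it recovers the constant second derivative of a parabola exactly (up to floating-point error), which makes this a convenient check of the `delta` scaling.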

2.4 Partial Least-Squares Regression Modeling

For each Vis–NIR spectral pretreatment, a PLSR model was tested. PLSR is the most common algorithm used to calibrate Vis–NIR spectra to soil properties (Viscarra Rossel et al. 2006) where there are many predictor variables that are highly collinear (Viscarra Rossel and Behrens 2010). PLSR handles this multicollinearity and is robust in terms of data noise and missing values (Summers et al. 2011; Viscarra Rossel et al. 2006). The PLSR algorithm integrates the compression and regression steps, and it selects successive orthogonal factors that maximize the covariance between predictor and response variables (Viscarra Rossel and Behrens 2010).
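To make the combined compression and regression steps concrete, the following is a minimal NumPy sketch of single-response PLS regression (the NIPALS PLS1 variant). It is not the implementation in The Unscrambler, and the function names `pls1_fit` and `pls1_predict` as well as the synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; return regression coefficients and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centered, then deflated copies

    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # direction of maximum covariance
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores (orthogonal factors)
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)

    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # coefficients in original X space
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def pls1_predict(X, coef, intercept):
    return np.asarray(X, dtype=float) @ coef + intercept


# Synthetic, noise-free example (hypothetical data).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta + 4.0
coef, intercept = pls1_fit(X, y, n_components=5)
print(np.round(coef, 3), round(intercept, 3))
```

With as many components as predictors, PLS1 reproduces the ordinary least-squares solution; in practice far fewer components than spectral bands are retained, which is how the method copes with collinear predictors.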

The dataset was further partitioned into three subsets according to soil depth. In all PLSR models, the quality of prediction was assessed by randomly dividing each dataset into two groups (70:30 split) for calibration (C) and validation (V). Thus, there were four groups of data formed by soil texture and reflectance second derivatives: whole dataset (i.e., 166C/77V), 0–15 cm (i.e., 59C/25V), 15–60 cm (i.e., 58C/24V), and 60–200 cm (i.e., 51C/20V). For modeling, clay, silt, and sand contents were expressed in g 100 g−1 or %. Models were evaluated based on the coefficient of determination of validation (R2, Eq. (6.1)). Complementary error statistics were also provided, including the root-mean-square error of prediction (RMSEP, Eq. (6.2)) for model accuracy and the mean error (ME, Eq. (6.3)) for model bias:

$$R^{2} = \frac{\sum_{i=1}^{n} (\hat{y}_{i} - \bar{y})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}$$
(6.1)
$$\text{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}$$
(6.2)
$$\text{ME} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_{i} - y_{i})$$
(6.3)

where ŷ = predicted values, ȳ = mean of observed values, y = observed values, and n = number of predicted/observed values with i = 1, 2,…, n.
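As a worked illustration, Eqs. (6.1)–(6.3) can be computed directly with NumPy. The function name `validation_stats` and the three observed/predicted values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np


def validation_stats(y_obs, y_pred):
    """Return R2 (Eq. 6.1), RMSEP (Eq. 6.2), and ME (Eq. 6.3)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_bar = y_obs.mean()                                   # mean of observed values
    r2 = np.sum((y_pred - y_bar) ** 2) / np.sum((y_obs - y_bar) ** 2)
    rmsep = np.sqrt(np.mean((y_pred - y_obs) ** 2))
    me = np.mean(y_pred - y_obs)                           # negative = underestimation
    return r2, rmsep, me


# Hypothetical clay contents in %.
obs = [50.0, 60.0, 70.0]
pred = [52.0, 59.0, 66.0]
r2, rmsep, me = validation_stats(obs, pred)
print(r2, rmsep, me)
```

For these values the numerator of Eq. (6.1) is 64 + 1 + 36 = 101 and the denominator is 100 + 0 + 100 = 200, so R2 = 0.505; the squared errors sum to 21, giving RMSEP = √7 ≈ 2.65, and ME = (2 − 1 − 4)/3 = −1.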

Homogeneity of variance between the soil texture calibration and validation sets was assessed with Levene’s test. Depending on the result (i.e., whether the groups had equal or unequal variances), the means were then compared with Student’s t test under the corresponding variance assumption. All tests used a critical p-value of 0.05 (95 % confidence).
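This two-stage comparison can be sketched with SciPy. The helper name `compare_sets` and the sample values are hypothetical. Note that `scipy.stats.levene` centers on the median by default, which may differ from the variant used in the study, and that `ttest_ind` with `equal_var=False` is Welch's unequal-variance t test.

```python
from scipy import stats


def compare_sets(calibration, validation, alpha=0.05):
    """Levene's test first, then a t test with the matching variance assumption."""
    _, levene_p = stats.levene(calibration, validation)
    equal_var = bool(levene_p >= alpha)      # fail to reject => treat variances as equal
    _, ttest_p = stats.ttest_ind(calibration, validation, equal_var=equal_var)
    return {
        "levene_p": float(levene_p),
        "equal_var": equal_var,
        "ttest_p": float(ttest_p),
        "means_differ": bool(ttest_p < alpha),
    }


# Hypothetical sand contents (%) for a calibration and a validation set.
cal = [4.1, 6.3, 8.0, 5.5, 7.2, 9.4, 3.8, 6.9]
val = [5.0, 6.1, 7.7, 4.9, 8.3]
print(compare_sets(cal, val))
```

A validation set is treated as comparable to its calibration set when `means_differ` is `False`.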

3 Results and Discussion

3.1 Descriptive Statistics

Soil textures in the Marombas river watershed are predominantly clay and silty clay (Fig. 6.2). There are also a few samples of clay loam and silty clay loam. These soils are deeply weathered, with a strong presence of iron oxides with particle diameters below 0.002 mm. Clay content of the 237 samples ranges from 31.79 to 78.48 %, and sand content ranges from 1.38 to 35.48 % (Table 6.1). The mean clay content increases with depth: 51.73, 56.49, and 63.82 % for 0–15, 15–60, and 60–200 cm, respectively. This small increase in clay with depth is due to translocation. The dominant minerals of the basalt are calcic plagioclase and pyroxene, which weathered completely and formed clay minerals through oxidation of the parent material, contributing to this fine texture. The profiles were classified as Oxisols (Latossolos in the Brazilian classification).

Fig. 6.2 Soil texture of the samples following the USDA triangle

Table 6.1 Descriptive statistics of the entire dataset and of the datasets segregated by depth

Sand, silt, and clay contents were tested for normality with the Shapiro–Wilk test at a 0.05 significance level. The test indicated that sand, silt, and clay were normally distributed, and thus, no transformation was applied to the attribute datasets before modeling. To verify whether the calibration and validation datasets were similar, tests of homogeneity of variance (Levene’s test) and comparison of the means (Student’s t test) were carried out at a 0.05 significance level. Levene’s test indicated no homogeneity of variance between the sand datasets for the depths of 0–15 and 60–200 cm (Table 6.2). All remaining groups of data had equal variances between calibration and validation samples. Because of the lack of homogeneity of variance, the Student’s t test for comparison of the means in those two groups (0–15 and 60–200 cm) was carried out under the unequal-variance assumption. Comparison of the means of sand, silt, and clay between calibration and validation sets did not show a significant difference (Table 6.2). Sand, silt, and clay are compositional data, which must sum to 100 %. In this study, we modeled the components independently to study the relative predictability of each content using NIR. Future work will look into the additive log-ratio transformation.
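For reference, the additive log-ratio (ALR) transformation mentioned as future work can be sketched as follows. This is not part of the study's workflow; the function names `alr` and `alr_inverse`, the choice of clay as the denominator, and the example textures are all assumptions made for illustration.

```python
import numpy as np


def alr(composition):
    """Additive log-ratio: log of each part over the last part.

    Rows are (sand, silt, clay); clay is used as the denominator here,
    which is an arbitrary choice. All parts must be strictly positive.
    """
    comp = np.asarray(composition, dtype=float)
    return np.log(comp[..., :-1] / comp[..., -1:])


def alr_inverse(log_ratios, total=100.0):
    """Map log-ratios back to parts that sum to `total`."""
    expz = np.exp(np.asarray(log_ratios, dtype=float))
    denom = 1.0 + expz.sum(axis=-1, keepdims=True)
    parts = np.concatenate([expz, np.ones_like(denom)], axis=-1) / denom
    return total * parts


# Hypothetical textures in % (sand, silt, clay).
texture = np.array([[10.0, 30.0, 60.0],
                    [5.0, 40.0, 55.0]])
z = alr(texture)              # two unconstrained variables per sample
back = alr_inverse(z)         # back-transformed, sums to 100 % by construction
print(z.shape, back.sum(axis=1))
```

Models fitted to the two log-ratios and back-transformed in this way return sand, silt, and clay predictions that sum to 100 %, which independent models for each fraction do not guarantee.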

Table 6.2 Values of p for the tests of homogeneity of variance and comparison of the mean

3.2 Qualitative Description of the Spectral Data

Spectra of all soils were similar, with minor features apparent in the visible and near-infrared regions. Soil reflectance increased toward deeper soil samples (Fig. 6.3a). Samples located near the surface have higher SOC content, which absorbs radiation. Sousa Junior et al. (2011) found similar results on the correlation between soil attributes and their reflectance, showing that soil organic matter has a strong influence on spectral behavior, resulting in a significant negative correlation in all evaluated bands. Organic matter can also mask reflectance features (Demattê et al. 2012).

Fig. 6.3 Reflectance data and 1st and 2nd derivatives. Data collected at 6 depths in soil profile number 1

The SOC content varied from 0.03 to 8.32 % in the dataset of 237 samples. The high SOC content is due to the constant supply of new organic material in vegetated areas. Because of the altitude of the region, the annual average temperature is around 16 °C, which helps maintain a high SOC content in the top layers. The clayey texture also plays a role by physically protecting organic carbon from decomposition. The 71 samples from the depth of 60–200 cm had SOC contents of 0.03 to 3.78 %, indicating a decrease of SOC with depth.

First and second derivatives of the spectra highlighted features related to the mineralogical composition of the soil samples (Fig. 6.3b, c). According to Torrent and Barrón (2002), the reflectance of weathered Oxisols shows features related to the presence of the iron oxides goethite and hematite around 480 and 530 nm, respectively. Those features are a product of various electronic or vibrational transitions in the atoms and molecules of minerals. In the case of Oxisols, this is of decisive influence for morphological description and soil color determination. Summers et al. (2011) found some contributions from the visible (400–700 nm) and near-infrared region (700–1300 nm) in the clay absorption feature at 2200 nm and the features at 1400 and 1900 nm, indicating there may be some covariation between the clay content and the color of the soil. The second derivative spectra showed similar behavior at all depths, except for differences caused by the amount of SOC. Samples with higher amounts of SOC showed smaller amplitude. Another effect of increasing amounts of SOC is the obliteration of a concavity feature around 880 nm, which is related to the presence of iron oxides (Fig. 6.3a). Demattê et al. (2004) reported that the depth of this concavity is related to the degree of crystallization of iron oxides, and that the presence of SOC diminishes this spectral feature.

The second derivative (Fig. 6.3c) indicates the absence of goethite, judging from the concavity around 450–480 nm. On the other hand, a strong peak in the second derivative values near 540–560 nm is related to the hematite content of the samples (Fig. 6.3c). These features can be used for modeling soil texture from the spectral signature with PLSR.

3.3 Development of Calibration Model

Overall, the best PLSR predictive values were achieved for clay (mean R2 = 0.58), followed by silt (mean R2 = 0.56), with the worst predictive values achieved for sand (mean R2 = 0.24) (Table 6.3).

Table 6.3 Statistics of PLSR modeling

Considering the PLSR results separately for each of the four datasets, the best predictive values were achieved by modeling soil clay content using data from all depths. When this whole dataset was used, R2 = 0.69, RMSEP(%) = 5.39, and ME(%) = −0.01. A small bias is expected when validation is carried out on samples very similar to the ones used for calibration. The worst results for clay prediction were obtained for soil samples from 60 to 200 cm, with R2 = 0.46, RMSEP(%) = 6.56, and a clay underestimation of ME(%) = −0.86. These results are somewhat the opposite of what was expected. Since SOC is lower at this depth, it was expected that less interference from organic molecules in the spectra would allow a better prediction of clay content. However, one has to bear in mind that the 60–200 cm dataset had only 51 samples for calibration and 20 samples for validation of the models, with similar clay content, thus causing the model to underperform because the information was not sufficiently representative. Clay variability remained high in this dataset, as shown by the range values of 46.37 and 37.57 % for 51C and 20V, respectively (Table 6.1).

For sand prediction, poor performance with R2 = 0.09, RMSEP(%) = 4.14, and ME(%) = 0.26 was found for soil samples from 60 to 200 cm depth. This might also be due to the smaller amount of information in this dataset. Nevertheless, when modeled using the whole dataset (237 samples), PLSR for sand prediction also achieved poor results, with R2 = 0.30, RMSEP(%) = 5.47, and ME(%) = 0.59. This suggests that sand prediction in Oxisols using a limited spectral region could be a challenge. Future work should rely on datasets with a broader range of sand content. Model calibration might demand larger datasets to cope with soil variability. In addition, the high clay content might coat the sand particles, making sand prediction more difficult.

In PLSR modeling, specific regions of the spectrum may be important for modeling soil attributes. Such regions are identified by large PLS regression coefficients. The regression coefficients for the three soil attributes are shown in Fig. 6.4. The magnitude of a regression coefficient, whether negative or positive, represents the importance of the reflectance band in explaining the variance in the soil analysis data. Positive peaks are due to the component of interest, while negative peaks correspond to interfering components (Haaland and Thomas 1988). Bands with a coefficient near zero have no predictive capability.
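Ranking bands by the magnitude of their regression coefficients can be sketched in a few lines. The function name `top_bands`, the assumed band centers, and the coefficient values are hypothetical and serve only to illustrate the idea; they are not the coefficients of the fitted models.

```python
import numpy as np


def top_bands(wavelengths, coefficients, k=5):
    """Return the k bands with the largest absolute regression coefficient."""
    wl = np.asarray(wavelengths, dtype=float)
    b = np.asarray(coefficients, dtype=float)
    order = np.argsort(np.abs(b))[::-1][:k]      # largest |coefficient| first
    return [(float(wl[i]), float(b[i])) for i in order]


# Assumed band centers (402-977 nm, 5-nm steps) and made-up coefficients.
wl = 402.0 + 5.0 * np.arange(116)
coef = np.zeros(116)
coef[wl == 462.0] = 0.8      # positive peak
coef[wl == 867.0] = -0.5     # negative peak
coef[wl == 547.0] = 0.3      # smaller positive peak

ranked = top_bands(wl, coef, k=3)
print(ranked)
```

The sign is kept alongside each band so that positive and negative peaks can be interpreted separately.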

Fig. 6.4 Regression coefficients of the partial least-squares regression model with whole dataset for soil attributes: a sand, b silt, and c clay

For sand prediction, regression coefficients with positive values can be found at 432, 512, 582, and 882 nm. A significant negative peak can be seen at 457 nm. Looking at the whole spectrum of clay regression coefficients (Fig. 6.4c), the peaks are much better defined than those for sand and silt (Fig. 6.4a, b). This could be due to the strong expression of iron oxide characteristics (i.e., soil color within 400–700 nm) in the analyzed Oxisol samples. For clay prediction, positive regression coefficients were found at 462, 547, 627, and 752 nm, whereas negative coefficients were located at 492, 512, 587, 662, and 867 nm. This last negative peak around 867 nm could be associated with the higher amounts of SOC at the soil surface. The presence of organic material diminishes the expression of the iron oxide concavity around 880 nm, which in turn makes it more difficult for the PLSR models to predict clay content. All the negative and positive peaks of the regression coefficients mark spectral regions that deserve more attention for variable selection, and possibly for model recalculation focused on the more significant variables.

4 Conclusions

Soil attribute prediction with PLSR using a limited spectral region (325–1075 nm) performed poorly for sand. The results were more promising for silt and clay.

The application of the visible and part of the near-infrared region (400–980 nm) for clay prediction in Oxisols achieved relatively good results when the whole dataset (n = 237) was used for modeling with no stratification by depth, with R2 = 0.69, RMSEP(%) = 5.39, and ME = −0.01 %. Regression coefficients showed a good relation to the spectral behavior of weathered soils in the visible and near-infrared region. They should be used in future studies as a filtering approach for selecting more significant variables (i.e., spectral regions) for modeling.