1 Introduction

Soil organic matter (SOM) is a key indicator of soil quality (Schmidt et al. 2011; Zhou et al. 2016; Hengl et al. 2017; Chen et al. 2020; Zhou et al. 2021). It is essential to ecosystem functions such as soil fertility and carbon sequestration, since SOM drives carbon decomposition and nitrogen (N) mineralisation in the soil (Fang et al. 2005; Yuste et al. 2011; Zhu et al. 2014; Schillaci et al. 2017; Bai and Zhou 2020; Saby et al. 2020). Therefore, precise knowledge of the spatial distribution of SOM and its potential controls is vital for sustainable soil use and management, as well as for climate change mitigation.

However, SOM varies spatially and is influenced by various factors such as soil properties, climate, terrain, and anthropogenic activities (Wiesmeier et al. 2011; Ji et al. 2012; Zhang et al. 2012; Zhou et al. 2016; Liang et al. 2019; Yang et al. 2020; Cambou et al. 2021; Hu et al. 2021). Therefore, the spatial estimation of SOM and the identification of its potential controls have been extensively studied (Hengl 2009; Marchetti et al. 2012; Piccini et al. 2014; Boubehziz et al. 2020). Among these studies, Zhang et al. (2012) estimated the spatial distribution of SOM in Miyun County, Beijing, China, with covariates such as terrain indices and categorical variables, using ordinary kriging (OK), multiple linear stepwise regression, and regression kriging. They concluded that the spatial patterns of SOM in their survey area were affected by the combined effects of topography, soil texture, and soil type. Dai et al. (2014) mapped the spatial variation of SOM content on the Tibetan Plateau in China by combining artificial neural networks and OK. Mao et al. (2014) mapped SOM in urban soils in Xuzhou City, China, with OK. Wiesmeier et al. (2011) applied random forest (RF) to map SOM stocks in a semi-arid area of the Inner Mongolia Autonomous Region, China. They found that land use, soil group, and geology were the key factors determining the amount of SOM in the surveyed region. Wu et al. (2009) produced spatial maps of SOM in Haining City in Southeast China using OK and cokriging. Guo et al. (2015) used RF plus residuals kriging to estimate the spatial variation of SOM in Danzhou City, Hainan Province, in southern China. Their study revealed that geological unit, precipitation, and terrain played a crucial role in controlling SOM variation. Zhou et al. (2016) revealed the scale-specific controls of SOM in the Northeast and North China Plain using discrete wavelet transforms.
Their results indicated that the correlation between environmental factors and the original SOM differs considerably across spatial scales. Bogunovic et al. (2018) used OK to map the spatial distribution of SOM in the Baranja region of eastern Croatia and reported strong relationships between the spatial variation of SOM and geological or landform factors. Using co-ordinary kriging, Medhioub et al. (2019) mapped the spatial pattern of SOM in southern Tunisia with the help of covariates extracted from remote sensing images.

Dongting Lake is the second largest freshwater lake in China, with an area of 3879.2 km². Large areas of farmland and wetland surround the lake. The region has a history of more than 8000 years of rice cultivation and is well known as the hometown of fish and rice in China. It is also one of the most important bases for commodity grain production in China. However, with the increasing population and the rapid development of industry and urbanisation, land use types and soil management practices have changed greatly in the area around Dongting Lake, with marked effects on the ecological environment. SOM is very sensitive to changes linked to anthropogenic activities, climate, land use, terrain, and soil properties. Accurate information on the spatial variability of SOM and the identification of its potential controls are necessary for better soil management and agricultural practices. However, to our knowledge, among the substantial body of research on the spatial patterns and potential controls of SOM, few studies have conducted extensive surveys at the county scale, and fewer still have involved factors such as rotation system, soil plough depth, and cropping system. Moreover, no study has analysed the spatial variation of SOM or identified its potential controls in the Dongting Lake region.

To fill these gaps, 8040 topsoil samples (0–0.2 m) were collected from Yueyang County in the eastern Dongting Lake region in southern China. Two widely used and efficient methods were then applied: OK was used to estimate the spatial variation of SOM, and RF was used to build a model that predicts SOM content from covariates and to identify the main controls of SOM variation in the survey area. The main aims of the current research were (1) to explore the spatial variability of SOM content; (2) to build a model to quantitatively predict SOM content from various sources of covariates with the RF method; and (3) to identify potential controls of SOM based on the relative importance of variables. These results could contribute to more efficient soil management and to climate change mitigation through measures such as carbon sequestration.

2 Methods and materials

2.1 Study area

The survey region covers the whole of Yueyang County (112° 55′ 27″ E–113° 45′ 54″ E, 28° 32′ 16″ N–29° 42′ 59″ N) and is located to the east of Dongting Lake, the second largest freshwater lake in China. Yueyang County has an area of 2809 km² (Fig. 1). The climate is a typical subtropical monsoon climate, with an annual average temperature of 17 °C and an annual mean precipitation of 1331.5 mm. The main soil types are fluvo-aquic soil, red earth, paddy soil, and purplish soil. The main land use types in the study area are arable land, forest land, and industrial, mining, and residential land. Topographically, the landform of the county slopes down in steps from the northeast part to the southeast part. The proportions of mountains, hills, plains, and water can be roughly assigned as 12:11:24:13:40.

Fig. 1 Map of sampling locations

2.2 Sampling and chemical analysis

The survey was conducted between 2007 and 2014, and 8040 topsoil (0–20 cm) samples were collected (Fig. 1). The sampling locations were recorded using a portable global positioning system. Other information, including altitude, slope, aspect, land use, vegetation type, soil texture, and soil plough depth, was also recorded during sampling. Each sample was a composite of five sub-samples taken within a radius of 5 m and had a total weight of 3 kg. The samples were air-dried in the laboratory and passed through a 2-mm sieve for the measurement of SOM by the potassium dichromate wet combustion procedure (Faurescu et al. 2010). The soil organic carbon (SOC) content of each sample was determined using the potassium dichromate oxidation method (with external heating) and then multiplied by the coefficient 1.724 to obtain the SOM content. The soil pH was measured using the glass electrode method (pHS-3C, REX, Shanghai, China). Available K was extracted with ammonium lactate solution and determined by spectrophotometry and flame photometry.
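
The SOC-to-SOM conversion described above can be illustrated with a minimal Python sketch. Only the coefficient 1.724 comes from the text; the function name, the constant name (1.724 is commonly called the Van Bemmelen factor), and the example value are hypothetical and are not part of the original workflow.

```python
# Conventional SOC-to-SOM conversion coefficient quoted in the text.
VAN_BEMMELEN_FACTOR = 1.724


def soc_to_som(soc_g_per_kg, factor=VAN_BEMMELEN_FACTOR):
    """Convert soil organic carbon (g/kg) to soil organic matter (g/kg)."""
    if soc_g_per_kg < 0:
        raise ValueError("SOC content cannot be negative")
    return soc_g_per_kg * factor


# Hypothetical SOC reading, not a value from the survey.
example_som = soc_to_som(20.0)
print(round(example_som, 2))
```

The coefficient is kept as a default argument so that a different conversion factor can be substituted if a study calls for one.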

2.3 Spatial interpolation

Ordinary kriging (OK) (Webster and Oliver 2007) was used to estimate the spatial pattern of SOM in the survey region using ArcGIS (Version 10.3, ESRI Inc., USA) with all the soil samples collected. OK is one of the most widely used methods for mapping soil properties worldwide (Simbahan et al. 2006; Chabala et al. 2017; Vasu et al. 2017; Hu et al. 2019; Hu et al. 2020a; Xia et al. 2019; Xia et al. 2020). Its core assumption is that the mean estimation error equals zero, and its optimisation target is to minimise the variance of the estimation error (Goovaerts 1997). Experimental semi-variograms were calculated to indicate the spatial dependence of SOM using the following equation (Webster and Oliver 2007):

$$ \gamma^{\ast}(h)=\frac{1}{2N(h)}\sum_{i=1}^{N(h)}\left[Z\left(x_i\right)-Z\left(x_i+h\right)\right]^2 \tag{1} $$

where γ*(h) represents the experimental semi-variance, N(h) is the number of point pairs separated by the lag distance h, and Z(xi) and Z(xi + h) are the SOM contents at locations xi and xi + h, respectively. Based on the experimental variogram, a suitable model can be fitted using weighted least squares, with prior values set for model parameters such as range, nugget, and sill during fitting and interpolation. Some studies have shown that the accuracy of OK with different semi-variance functions is similar (Xie et al. 2011; Qiao et al. 2018). Therefore, the parameters of the best-fitting semi-variance function in this study were fitted in GS+, recorded, and then used for mapping in ArcGIS (Version 10.3, ESRI Inc., USA).
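
As an illustration of Eq. (1), the following Python sketch computes an experimental semi-variogram by binning point pairs into distance lags. It is a simplified stand-in, not the GS+ or ArcGIS implementation used in this study; the function name, the binning rule (bin k holds distances from k × lag_width up to, but not including, (k + 1) × lag_width), and the toy data are assumptions made for the example.

```python
import math
from collections import defaultdict


def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Return a list of (mean_distance, semivariance, n_pairs), one per lag bin.

    coords: list of (x, y) tuples; values: observations Z(x_i) at those points.
    Pairs separated by lag_width * n_lags or more are ignored.
    """
    sq_diff_sum = defaultdict(float)
    dist_sum = defaultdict(float)
    pair_count = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            k = int(h // lag_width)  # index of the lag bin for this pair
            if k >= n_lags:
                continue
            sq_diff_sum[k] += (values[i] - values[j]) ** 2
            dist_sum[k] += h
            pair_count[k] += 1
    result = []
    for k in sorted(pair_count):
        n_pairs = pair_count[k]
        # Eq. (1): half the mean squared difference over the N(h) pairs.
        gamma = sq_diff_sum[k] / (2 * n_pairs)
        result.append((dist_sum[k] / n_pairs, gamma, n_pairs))
    return result


# Toy transect with made-up values (not survey data).
toy_coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
toy_values = [1.0, 2.0, 4.0, 7.0]
for row in experimental_semivariogram(toy_coords, toy_values, 1.0, 3):
    print(row)
```

The double loop visits every pair once, so the cost grows with the square of the number of samples; for a dataset of several thousand points, a vectorised or tree-based neighbour search would be preferable.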

2.4 Random forest

RF was proposed by Breiman (2001) as an extension of CART (classification and regression trees) to improve prediction ability (Fu et al. 2020; Jia et al. 2020; Wang et al. 2020). It is based on regression trees but makes predictions from many trees in an entire forest instead of a single tree. A large number of trees helps to ensure model stability (500 trees were used in this study). For each tree, the data are divided into a bootstrap training subset, known as the “in-bag” data, and the remaining part, known as the “out-of-bag” data. The latter are used to estimate prediction errors (Breiman 2001). The RF method can capture both linear and non-linear relationships between the dependent variable and the covariates. In addition, it provides a measure of variable importance, calculated from how much worse the prediction becomes when the values of a given variable are randomly permuted (Prasad et al. 2006). In this study, the dataset was divided into a training dataset and an independent validation dataset at a ratio of 2:1 when predicting SOM content with the RF method. The model was trained with the training dataset and then independently validated with the validation dataset.
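
The idea behind permutation-based variable importance can be sketched in a few lines of Python. This is a simplified illustration that works with any fitted predictor; it is not the out-of-bag procedure of the R “randomForest” package used in this study, and the function names, the toy predictor, and the toy data are hypothetical.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of predict(row) against the targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(targets)


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in MSE when each column is shuffled, one column at a time.

    rows must be a list of lists (one inner list of covariates per sample).
    """
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mse(predict, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy predictor that depends on the first covariate only (made-up data).
toy_rows = [[i, (i * 7) % 5] for i in range(20)]
toy_targets = [2 * i for i in range(20)]
print(permutation_importance(lambda r: 2 * r[0], toy_rows, toy_targets))
```

In the toy case the second covariate is ignored by the predictor, so shuffling it leaves the error unchanged and its importance is zero, whereas shuffling the first covariate degrades the prediction.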

2.5 Data collection and analysis

Information on 17 factors that may affect soil organic matter content was collected. As shown in Table 1, these factors fall into four categories: (1) factors related to soil properties: soil group (SG), soil texture (ST), soil plough depth (SD), pH, available phosphorus (AP), and available potassium (AK); (2) factors related to climate: annual temperature, annual precipitation, and NDVI; (3) factors related to anthropogenic activities: land use (Landuse), rotation system (RS), cropping system (CS), and population density (Pop); and (4) factors related to terrain: elevation (Elev), slope, landform, and soil erosion degree (SE). The information on soil group, soil texture, soil plough depth, rotation system, cropping system, Elev, slope, landform, and soil erosion degree was recorded during soil sampling. The pH, available phosphorus, and available potassium were measured in the laboratory. The maps of land use in 2015, annual temperature in 2014, annual precipitation in 2014, population density in 2014, and NDVI in 2015 were downloaded from the Resource and Environmental Data Cloud Platform (REDC) (http://www.resdc.cn/Default.aspx), with a resolution of 1 km.

Table 1 Environmental variables used in this study to model and predict SOM (g kg⁻¹)

Summary statistics were computed in RStudio (R Development Core Team 2013). The experimental semi-variogram for SOM was fitted using GS+ software (Version 7.0) (Robertson 1998). The random forest model was implemented with the “randomForest” package in RStudio (Liaw and Wiener 2002).

3 Results and discussion

3.1 Exploratory data analysis

The summary statistics of the physico-chemical properties of the soil samples are listed in Table 2, and the histogram of SOM is presented in Fig. 2. The SOM content in Yueyang County was approximately normally distributed (Fig. 2), with a skewness of 0.077 and a kurtosis of 0.84. The SOM content in the survey region varied from 4.00 to 446.60 g kg⁻¹ and had an average of 33.17 g kg⁻¹ (Table 2), which is clearly higher than the mean SOM content (24.65 g kg⁻¹) of farmland soil in China (Yang et al. 2017); this indicates a high-fertility soil condition in the survey region. The mean values for pH, AP, and AK were 5.41, 22.91 g kg⁻¹, and 112.20 g kg⁻¹, respectively. The coefficient of variation (CV, %) (Sokal and Rohlf 1981) was used as an indicator of the overall variation of SOM. Following Wilding (1985), variation was classified into three categories: low (CV = 0–15%), moderate (CV = 16–35%), and high (CV > 36%). The CV of the SOM content in our study area was 31.81%, which indicates moderate SOM variation. This suggests that SOM variation in the study area was affected by both natural factors and anthropogenic activities.
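
The CV calculation and the three variation classes can be sketched as follows. The function names are hypothetical, the use of the sample standard deviation is an assumption (the text does not say which estimator was used), and the handling of the gaps between the quoted class limits (15–16% and 35–36%) is likewise an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def variation_class(cv_percent):
    """Classify a CV (%) as low, moderate, or high."""
    # Assumption: 15 % and 35 % are treated as inclusive upper bounds so that
    # values falling in the gaps between the quoted ranges are still classified.
    if cv_percent <= 15.0:
        return "low"
    if cv_percent <= 35.0:
        return "moderate"
    return "high"


# The CV reported for SOM in the study area.
print(variation_class(31.81))
```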

Table 2 Physico-chemical properties of studied soils

Fig. 2 Histogram of the values of SOM in Yueyang County (the red dashed line represents the mean value of SOM)

3.2 SOM spatial pattern

As presented in Fig. 2 and Table 2, the SOM content in the study area showed an approximately normal distribution. An exponential semi-variance model for SOM in the study area was fitted using GS+ software with 1000 iterations (Fig. 3). The parameter values of the fitted exponential model are presented in Table 3. The ratio of the nugget variance (C0) to the total variance (sill, C0+C) was calculated to assess the degree of spatial correlation. A lower ratio indicates a stronger spatial correlation. As suggested by Cambardella et al. (1994), the value of C0/(C0+C) can be classified into three classes: strong spatial dependence (below 25%), moderate spatial dependence (25–75%), and weak spatial dependence (above 75%). The C0/(C0+C) ratio of SOM in the survey region is 9.94%, which indicates strong spatial dependence of SOM in the study area.
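
The nugget-to-sill classification described above can be written as a short Python sketch. The function names are hypothetical, and the nugget and partial sill in the example are placeholder values rather than the fitted parameters of Table 3.

```python
def nugget_to_sill_ratio(nugget, partial_sill):
    """Return C0 / (C0 + C) as a percentage."""
    sill = nugget + partial_sill
    if sill <= 0:
        raise ValueError("sill must be positive")
    return nugget / sill * 100.0


def spatial_dependence_class(ratio_percent):
    """Classify spatial dependence from the nugget-to-sill ratio (%)."""
    if ratio_percent < 25.0:
        return "strong"
    if ratio_percent <= 75.0:
        return "moderate"
    return "weak"


# Placeholder variogram parameters (not the values in Table 3).
ratio = nugget_to_sill_ratio(1.0, 9.0)
print(spatial_dependence_class(ratio))

# The ratio reported for SOM in the survey region.
print(spatial_dependence_class(9.94))
```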

Fig. 3 Semi-variogram of SOM in Yueyang County

Table 3 Parameters of the semivariogram model

The map of SOM content in the study area is presented in Fig. 4. Although the low value of C0/(C0+C) confirms strong spatial dependence of SOM in the study area, areas with high and low SOM contents were intermixed and discretely distributed. In general, high SOM values were mainly located in the northeast and southwest parts of the survey region, whereas low SOM values were primarily distributed in the western and southeastern parts. The northeast and southwest parts of the survey region are very flat, and much of the farmland there lies in the reclamation area of Dongting Lake. Therefore, land in these parts has a relatively high fertility grade. The terrain in the western and southeastern parts of the survey region is dominated by mountains and hills and has relatively poor soil fertility.

Fig. 4 Spatial distribution map of SOM in the study area produced by ordinary kriging

To evaluate the prediction accuracy of OK, we present the prediction standard error map of OK (Fig. 5). The standard error of the OK method ranged between 5.83 and 10.47 g kg⁻¹ in the study area. The largest standard errors were found along the boundary of the study area and were attributed to the edge effects of the interpolation method. Near the edges of the study area, fewer measured samples are available for estimating values at unvisited locations, which can increase the prediction error (Goovaerts 1997; Webster and Oliver 2007).

Fig. 5 Prediction standard error map of SOM of ordinary kriging interpolation results

3.3 Modelling SOM with RF

In this study, a model was constructed to predict SOM content using RF based on 17 covariates: SG, ST, SD, pH, AP, AK, Temperature, Precipitation, NDVI, Landuse, RS, CS, Pop, Elev, Slope, Landform, and SE. As presented in Fig. 6, the developed model showed a good ability to estimate SOM concentration in the survey region, with an R² of 0.74, an RMSE of 0.16 g kg⁻¹, and a bias of 0.002 g kg⁻¹. However, as with many machine learning methods and even classical statistical methods, a small number of extremely low values tended to be overestimated, while some extremely high values tended to be underestimated in our prediction results. In practice, overestimation of SOM may lead farmers to be overconfident in the fertility of their farmland, and underestimation of SOM may lead farmers to take additional but unnecessary measures, such as applying fertilisers, to improve soil fertility. Both situations could bring economic losses or even negative effects on the environment. Therefore, in further work, a penalty function could be introduced into the RF algorithm to reduce the impact of low-value overestimation and high-value underestimation on the prediction results (Costa 2003).
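
The three validation statistics quoted above can be computed from paired observed and predicted values as in the following Python sketch. The text does not state which definitions were used, so two assumptions are made here: R² is taken as one minus the ratio of the residual to the total sum of squares, and bias is taken as the mean of predicted minus observed. The function name and toy data are hypothetical.

```python
import math


def validation_metrics(observed, predicted):
    """Return (r2, rmse, bias) for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n  # mean error, predicted minus observed
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination
    return r2, rmse, bias


# Made-up values: every prediction is 0.5 too high.
print(validation_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))
```

A positive bias under this sign convention means that the model overestimates on average; if R² is instead defined as the squared correlation between observed and predicted values, the two definitions can give different numbers for a biased model.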

Fig. 6 Relationships between the observed and predicted values of SOM using Random Forest method

3.4 Potential controls of SOM

As described above, one of the important advantages of the RF algorithm is that it can quantify the relative importance of predictors using the percentage increase in mean squared error (%IncMSE). In this study, four kinds of covariates (soil properties, climate factors, anthropogenic factors, and terrain factors including Elev) were used to construct the model for predicting SOM. The relative importance of these predictors is presented in Fig. 7.

3.4.1 Soil properties

As shown in Fig. 7, factors relating to soil properties, such as soil group, available phosphorus, available potassium, soil texture, and soil plough depth, were identified as the primary controls of SOM in the study area. Several studies have confirmed that soil nutrients, such as P and K, are closely related to SOM content (Tao et al. 2012; Wang et al. 2013; Debicka et al. 2016; Hu et al. 2020b). Tao et al. (2012) reported that when the content of soil total phosphorus was maintained at a low level, soil samples with higher SOM tended to have lower AP contents.

Fig. 7 Relative variable importance for modelling SOM in soils in the study area (RS: rotation system; AK: available potassium; Elev: elevation; SG: soil group; AP: available phosphorus; Pop: population density; CS: cropping system; ST: soil texture; SD: soil plough depth; SE: soil erosion degree; NDVI: normalised difference vegetation index. The brown colour represents factors related to soil properties, the blue colour represents factors related to climate, the red colour represents factors related to anthropogenic activities, and the green colour represents factors related to the terrain)

Soil group has been shown by many previous studies to be an important driver of SOM variation (Hu et al. 2018; Fan et al. 2020). Our study also confirmed the finding of many previous researchers that paddy soils have relatively higher SOM content than most other soil groups (Hu et al. 2018; Fan et al. 2020) (Fig. 8). This may be attributed to several reasons. Firstly, compared with other land, a larger amount of straw and stubble may be returned to paddy soil, which is an important input of SOM (Jiang et al. 2018). This practice is widely encouraged and very popular in southern China, including the regions around Dongting Lake. In addition, frequent flooding of paddy fields gives rise to anoxic conditions in paddy soils, which slows the decomposition of SOM and leads to SOM accumulation (Fan et al. 2020). Finally, the repeated redox alternations during paddy rice cultivation could enhance the formation of amorphous material. For example, Fe oxyhydroxide has a high reactivity for SOM adsorption and could strengthen the sequestration of SOM through its large specific surface area (Huang et al. 2018).

Fig. 8 SOM content in different soil groups (g kg⁻¹)

The effect of soil texture on the spatial distribution of SOM has been confirmed by numerous studies, such as Don et al. (2009), Mirzaee et al. (2016), and Schillaci et al. (2017). Generally, the SOM content is positively correlated with silt and clay content but negatively correlated with sand content (Wang et al. 2014). Regarding plough depth, Alcántara et al. (2016) found that deep ploughing could increase agricultural SOM stocks by expanding the storage space for SOC-rich material. Fan et al. (2020) also reported that SOM decomposition was notably influenced by changes in SOM constituents related to soil depth.

3.4.2 Climate factors

As presented in Fig. 7, climate factors (annual precipitation, annual temperature, and NDVI) were identified as controls of secondary importance for SOM variability. Many studies have found that the decomposition of SOM is sensitive to temperature (Jenkinson and Rayner 1977; Schimel et al. 1994; Kirschbaum 1995; Davidson and Janssens 2006). A widely held view is that the decomposition of SOM is positively correlated with temperature (Hartley and Ineson 2008; Nianpeng et al. 2013). The index of temperature sensitivity (Q10) has been widely used to represent the response of SOM decomposition to temperature variation (Davidson and Janssens 2006; Conant et al. 2011).
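
For reference, Q10 is commonly defined as the factor by which a rate increases for a 10 °C rise in temperature. The Python sketch below applies the standard two-point form of this definition; the function name and the example rates are hypothetical, and no Q10 values were measured in this study.

```python
def q10(rate_low, rate_high, temp_low, temp_high):
    """Q10 from decomposition rates measured at two temperatures (deg C).

    Uses the two-point form: (rate_high / rate_low) ** (10 / (temp_high - temp_low)).
    """
    if temp_high == temp_low:
        raise ValueError("the two temperatures must differ")
    if rate_low <= 0 or rate_high <= 0:
        raise ValueError("rates must be positive")
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


# Made-up rates: decomposition doubles between 10 and 20 deg C.
print(q10(1.0, 2.0, 10.0, 20.0))
```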

Precipitation is another major factor controlling the net primary production of terrestrial vegetation, which in turn affects the input of SOM to the soil. Wang et al. (2014) revealed a positive relationship between SOM and precipitation in grasslands in Northwest China. Other researchers have also reported a significant positive relationship between SOM and precipitation in forest ecosystems (Baritz et al. 2010; Wiesmeier et al. 2011). NDVI has been used to represent the fractional vegetation cover and the presence of vegetation in the survey region (Rouse et al. 1974). The biomass produced by vegetation is a critical input for SOM accumulation in soil.

However, in this study, the correlations between SOM and precipitation (R² = 0.002), temperature (R² = 0.003), and NDVI (R² = 0.001) were not significant. Two factors may contribute to this. The main reason is that the study area covers only 2716 km². The range of latitude and longitude in the study area is very small; therefore, precipitation and temperature differ little among locations in the study area. Many soil samples had similar or even almost the same annual precipitation and temperature. Climate factors are expected to play a more important role in the control of SOM variability at large spatial scales (Sun et al. 2020). In addition, temperature had relatively lower importance than precipitation. This may stem from the low temperature sensitivity of SOM decomposition in the study area. The average annual temperature of the study area was higher than 17 °C, which indicates that the temperature sensitivity was much lower than that within a low-temperature range (below 10 °C) (Kirschbaum 1995).

The resolution of the NDVI image used in this study is 1 km, which is relatively coarse. Given the high density of soil sampling in this study, many soil samples had similar or even the same NDVI values, which may also weaken the relationship between SOM and NDVI in the study area.

3.4.3 Anthropogenic factors

Our results revealed that anthropogenic factors, such as rotation system, cropping system, population density, and land use, were also important variables controlling SOM variation in the study area (Fig. 7). In this study area, soils with a double cropping system had the highest content of SOM (Fig. 9a). Additionally, land with a rotation system of wheat, fruit, wheat-corn, rice, or rice-rice had higher SOM than land with other kinds of rotation systems (Fig. 9b). Different soil management practices were employed on land with different rotation and cropping systems, which led to variations in SOM content. For example, cropping rotations with high-residue-producing crops, combined with the maintenance of surface residue cover under reduced tillage, lead to more input of SOM to soils (Havlin et al. 1990; Lesoing and Doran 2019). In contrast, low SOM concentrations can usually be expected in land with cropping rotations that include a high proportion of root crops requiring rigorous tillage, such as potatoes (Götze et al. 2016). Moreover, long-term application of fertiliser has also been reported to increase SOM in soils (He et al. 2018). Large amounts of chemical and organic fertilisers are applied to arable soil, which increases its SOM content (Yang et al. 2017).

Fig. 9 SOM content in different cropping systems (CS) (a), rotation systems (RS) (b), and land uses (c)

Regarding land use, our results indicate that arable land, forest, grassland, and residential land had similar SOM contents that were higher than those of other land uses, and that unused land had the lowest SOM content (Fig. 9c). Different land use types have different effects on soil fertility and productivity. Changes in land use and land management have a great impact on the balance of SOM in soil (Guggenberger et al. 1994). Liu et al. (2015) found that SOM concentration in forest was clearly higher than in grassland. No significant linear relationship was detected between SOM content and population density in the study area, which may stem from the same reasons described above for annual precipitation and temperature.

In addition, when the effect of cropping system on SOM is examined within the same soil group, the ranking of SOM content differs among soil groups (Fig. 10). In fluvo-aquic soils, SOM content in areas with a double cropping system was higher than that in areas with a single cropping system. In paddy soils, areas with single and double cropping systems had higher SOM content than other areas. In purplish soils, areas with a single cropping system had slightly higher SOM content than other areas; the quadruple cropping system was excluded from this comparison because its sample size in purplish soils was too small (N = 2). In red earths, areas with a single cropping system and with other cropping systems had higher SOM content than areas with a double cropping system. These results suggest that high-intensity tillage may reduce the SOM content of the soil, which would adversely affect soil quality. Similar effects have been reported by many other researchers (Bayer et al. 2002; Pittelkow et al. 2015; Shakoor et al. 2020).

Fig. 10 SOM content in the same kind of subgroup soil in different cropping systems (CS)

3.4.4 Terrain factors and the elevation

Slope and Elev were also found to be important factors regulating SOM content (Fig. 7). Terrain attributes are typically assumed to affect the spatial distribution of SOM by redistributing environmental variables and soil materials and by further regulating the conditions for vegetation growth, nutrient migration, and carbon cycling (Sun et al. 2010). Terrain attributes are also principal factors governing the process of soil formation (McBratney et al. 2003; Zhou et al. 2016; Yan et al. 2020), and therefore, terrain factors have been widely used for predicting soil properties (Zhou et al. 2016; Yang et al. 2014; Caubet et al. 2019; Chen et al. 2019; Peng et al. 2019; Hu et al. 2020c). Many studies have confirmed terrain attributes as controlling factors of SOM (Schillaci et al. 2017; Zhu et al. 2018; Yu et al. 2020). The influence of altitude is complicated and likely indirect. Liu et al. (2011) reported that wetter and warmer conditions could improve biomass productivity in areas with relatively lower elevation, which then leads to more SOM input. Conversely, lower temperatures are generally expected at high altitudes, which slow the rate of SOM decomposition and result in relatively high SOM contents (Leifeld et al. 2005). Many researchers have reported that SOM content increased linearly with altitude (Raich et al. 2006; Girardin et al. 2010; Dieleman et al. 2013). Other factors, such as slope, landform, and soil erosion degree, are related to the erosion and deposition of soil. Strong soil erosion tends to occur in areas with steep slopes and higher precipitation, which accelerates the loss of soil nutrients, including SOM, and usually leads to higher SOM concentrations at downslope sites or in valleys (Nan et al. 2012). The study conducted by Zhong and Xu (2009) revealed a negative relationship between slope and SOM content.

3.5 Limitations and implications

As with other studies, our study has several limitations. Firstly, bacterial and fungal diversity plays a critical role in controlling SOM decomposition in soil (Yuste et al. 2011), but this factor was not taken into account in our prediction model. Including information on bacteria and fungi may therefore improve the model performance in further work. Secondly, we used maps downloaded from the REDC Platform, which have a resolution of 1 km. However, according to Barthold et al. (2008), an elevation map with a resolution of 90 m is probably too coarse to capture some key topographic processes, and even an elevation map at a 30-m resolution was too coarse to capture the spatial variation of soil properties such as soil potassium. Therefore, the data used in our study may be too coarse to improve the prediction of SOM. In future studies, maps of land use, population density, precipitation, and temperature with finer resolution should be used to build the model, and a higher model accuracy and higher importance for these variables can be expected. Thirdly, 17 factors were used in this study to construct a model to estimate SOM content, and some of the factors were closely related to each other. Multi-collinearity should therefore be addressed in further work, especially when many covariates are included in the model, which may improve model performance. Finally, some information, such as landform, was recorded in the field based on expert knowledge. This may deviate from actual conditions and have negative effects on the results.

4 Conclusions

In the current study, we employed OK to estimate the spatial variability of SOM in Yueyang County in the eastern Dongting Lake Plain in southern China. Then, we used RF to build a model to predict SOM content and to identify potential controls of SOM variation. The mean content of SOM was 33.17 g kg⁻¹, which was clearly higher than the mean SOM content in farmland soils in China (Yang et al. 2017) and indicates a high-fertility soil status in the survey region. High values of SOM were mostly found in the northeast and southwest parts of the study region, whereas low values of SOM were largely situated in the western and southeastern parts.

The developed RF model performed well in estimating SOM content from other covariates. Available phosphorus, precipitation, soil group, rotation system, available potassium, altitude, and slope played an important role in predicting SOM content, which indicates that these variables have greater effects on the accumulation of SOM. The results of this study could provide new insights into the prediction of SOM and could contribute to the sustainable development of agriculture and better regulation of soil quality in the study area.

5 Acknowledgements

Thanks to Prof. Qinke Yang for providing constructive suggestions for improving our manuscript.