1 Introduction

Soils store about 1500 Pg (1 Pg = \(10^{15}\) g) of organic carbon in the upper 1 m, about twice the amount present in the atmosphere (Batjes 1996). Thus, even a small fraction of this organic carbon pool being converted into greenhouse gases and released into the atmosphere could trigger a significant positive feedback to global warming (Melillo et al. 2002; Stockmann et al. 2013; Schuur et al. 2015). A comprehensive evaluation of the spatial variation, stocks, and environmental controls of soil organic carbon (SOC) may therefore enable better parameterization of the carbon cycle models used to predict feedbacks of soil carbon to global environmental change. This, in turn, can help support new measures for improved carbon management and climate change mitigation policy (Yang et al. 2008; Liu et al. 2011; Scharlemann et al. 2014; Li et al. 2018).

Soils in alpine regions are of great importance in the global carbon cycle, mainly because of their high organic carbon density and the potential carbon losses under a warming climate (Prietzel et al. 2016; Chen et al. 2016a; Zhu et al. 2018). It has been suggested that the warming of the climate system in a mountainous region is likely to be elevation-dependent (Pepin et al. 2015); thus, the organic carbon loss rate in higher elevation zones may be enhanced accordingly. Additionally, as carbon losses are highly dependent on the size of the standing organic carbon stock (Yang et al. 2010), an accurate estimation of the regional organic carbon stocks in alpine soils should be regarded as a major research priority in soil science. However, owing to insufficient field measurements resulting from limited accessibility, as well as strong topography-induced spatial heterogeneity, the estimation of SOC stocks in alpine regions is still characterized by relatively large uncertainties (Chen et al. 2016a; Zhu et al. 2017; Qin et al. 2018). To reduce these uncertainties, a comprehensive soil sampling procedure is necessary, and appropriate upscaling and soil modeling methods should also be introduced to facilitate the estimation of SOC.

The most commonly used methods for upscaling site-level measurements of SOC to the regional scale include kriging interpolation based on geostatistical models, classical statistical models that assume linear or non-linear relationships, and the more recently adopted machine learning techniques (Meersmans et al. 2008; Prasad et al. 2018a, b; Ghorbani et al. 2019). Ordinary kriging interpolation in particular has been used to predict the spatial patterns of SOC in regions with relatively weak local heterogeneity (Yang et al. 2010; Li et al. 2018). However, in regions characterized by complex landscapes, where the SOC profile is likely to vary sharply over very short distances, interpolation-based approaches (e.g., kriging) must be further calibrated with other environmental covariates that influence the SOC in order to improve the prediction accuracy (Song et al. 2016). Classical statistical models, on the other hand, such as the general linear model (Rial et al. 2016), its extension the generalized linear model (Yang et al. 2008), linear mixed models (Doetterl et al. 2013), and generalized additive models (de Brogniez et al. 2015), have also been employed to predict the SOC content. In recent years, with the rapid development of artificial intelligence approaches, machine learning techniques have evolved. Among them, artificial neural networks (Yang et al. 2014), support vector machines (Ding et al. 2016), boosted regression trees (Yang et al. 2016), and random forest models (Akpa et al. 2016; Kouadio et al. 2018) have been the most widely used techniques in mapping and modeling SOC and related soil parameters. These machine learning techniques have been shown to perform better than the other methods, since they are well suited to fitting the non-linear relationships between SOC and its related environmental covariates (Grimm et al. 2008; Ding et al. 2016).

As for the environmental controls of SOC in a mountainous region, the topography, which substantially shapes the climatic, edaphic, and ecological processes, may also be regarded as a fundamental control parameter (Yimer et al. 2006; Lozano-García et al. 2016; Zhu et al. 2017). Generally, topographic factors do not directly affect the SOC accumulation, whereas they may shape the spatial patterns of SOC through directly affecting climatic factors, which can further alter the plant production and decomposition rates of the organic matter. Although many studies have highlighted the role of topography in influencing SOC at hillslopes and catchment scales (e.g., Lozano-García and Parras-Alcantára 2014; Chen et al. 2016a; Bangroo et al. 2017; Zhu et al. 2018), only a few have quantified how topography can affect the SOC through exerting its direct or indirect influence on other climatic, vegetative, and edaphic factors, especially in semiarid alpine regions. This knowledge, if well researched, is likely to enable researchers to better understand the topography-climate-vegetation-carbon relationships under complex topography. Furthermore, it will help reduce uncertainties in modeling the relative direction of change and the size of the carbon-climate feedback in a semiarid alpine region.

The Qilian Mountains, the focus of this study, range in elevation from 1600 to 5500 m and form a typical semiarid alpine region in northwestern China. These mountains have a long history of land use activities focused on grazing pastures. However, owing to their importance in providing the main source of water flow for the oases in the middle and lower reaches (Yang et al. 2017), human activities in the study area have been relatively limited since the establishment of the National Nature Reserve of the Qilian Mountains in 1988. Across the mountain range, the topography-induced vegetation patterns are characterized by both vertical zonality and horizontal patchiness. Within each vegetation and elevation zone, the aspect- and slope-induced microclimates shape community occurrence and plant production at a smaller scale (Zhu et al. 2017; Zhu et al. 2018). These complex topography-induced vegetation patterns could further promote a much stronger spatial heterogeneity in the SOC profile within a relatively short distance (Qin et al. 2016; Zhu et al. 2017). Taken together, the unique vegetation patterns and strong spatial heterogeneity, along with the limited anthropogenic disturbances, make the Qilian Mountains an ideal semiarid alpine region for evaluating the spatial variability, stocks, and environmental controls of SOC.

Considering this importance and the paucity of such information for semiarid alpine regions, this study conducted topography-based soil sampling across a typical basin in the middle Qilian Mountains. The main objectives of this study are threefold: (1) to investigate the distribution of SOC density with respect to topographic factors, (2) to accurately estimate the regional SOC stocks using a more appropriate upscaling method, and (3) to determine how topography shapes the spatial patterns of SOC density by directly influencing the other environmental controls.

2 Materials and methods

2.1 Study area

This study was carried out in the Yingluoxia basin (98°–101°E, 38°–42°N), located in the middle Qilian Mountains. This region lies at the northern edge of the Qinghai-Tibetan Plateau (Fig. 1a). The Yingluoxia basin covers an area of 1.05 × \(10^{4}\) km\(^{2}\) with elevation ranging from 1755 to 5051 m ASL (Fig. 1a). The basin is characterized by a typical semiarid alpine climate with regional mean annual temperature and precipitation of around − 4.6 °C and 396 mm, respectively. The spatial patterns of climatic factors mainly depend on the elevation, with a temperature lapse rate of 0.58 °C per 100 m and a precipitation increase of about 15.5–22.1 mm per 100 m (Chen et al. 1992). Vegetation is strongly stratified by elevation, varying from desert steppe (< 2500 m) through forest steppe (2500–3300 m), subalpine shrub-meadow (3300–3600 m), and alpine meadow (3600–4000 m) to the alpine desert zone (> 4000 m) (Fig. 1b, c). Dominant species in the study area include Stipa breviflora, Agropyron cristatum, Polygonum viviparum, and Kobresia spp. for grasslands; Potentilla fruticosa, Berberis diaphana, Caragana jubata, and Salix gilashanica for shrublands; and Sabina przewalskii and Picea crassifolia for forests.

Fig. 1

Distribution of sampling sites across the Yingluoxia basin in the Qilian Mountains, northwestern China (a). Distribution of vegetation type in the basin (b). Photographs of the eight vegetation types with elevation, aspect, and slope added in the brackets (c). The S, NW, N, E, and SW in the brackets in c represent the south-, northwest-, north-, east-, and southwest-facing slopes, respectively

The soil parent materials are mainly eolian deposits and residual, slope, diluvial, alluvial, and fluvioglacial deposits made of limestone, sand-shale stone, or conglomerate. According to the Chinese soil genetic classification, the soils along the elevation gradient are defined as sierozems, kastanozems, chernozems, gray cinnamon, subalpine meadow soils, subalpine shrubby meadow soils, alpine meadow soils, and alpine frigid desert soils (Chen et al. 1992). These roughly correspond to luvic calcisols, kastanozems, chernozems, haplic greyzems, mollic leptosols, eutric leptosols, gelic leptosols, and glacic cryosols, respectively, according to the World Reference Base for Soil Resources 2014 (IUSS Working Group WRB 2015). In the alpine desert zone, the soil depths are generally less than 20 cm, and on the slopes of the shrubland and forest zones, rock (mainly limestone) appears below 60 cm. In the steppe and desert steppe zones, the soil profile contains lime concretions below the 60-cm depth.

2.2 Soil sampling and analysis

During the summer period of 2012–2015, researchers conducted soil sampling campaigns across the middle of the Qilian Mountains, and a total of 138 sampling sites were selected according to the topography and its vegetation type (Fig. 1). At each sampling site, the geographic and topographic information (i.e., the longitude, latitude, elevation, aspect, and slope) was recorded by a hand-held GPS and a compass. Following this, five soil profiles in a 10-m × 10-m plot were excavated, and the bulk density samples of 5.00 cm in diameter and 5.05 cm in height were collected by using the core ring method (100 cm3 core volume) at a depth of 5, 15, 30, and 50 cm. Soil samples with three replicates were collected for an analysis of the soil chemical properties next to each soil profile using a soil auger (3.5 cm in diameter) at 0–10, 10–20, 20–40, and 40–60 cm. Finally, the soil samples from each plot were pooled to generate a composite sample from each of the four depth intervals.

In this study, the soil samples collected in the field were first sealed in aluminum specimen boxes and then transported to the research laboratory. Soil samples were then air-dried and then passed through a 2-mm sieve before any chemical analysis. The SOC content was determined by means of the wet oxidation with dichromate according to the Walkley-Black method (Nelson and Sommers 1982). Soil texture (i.e., sand, silt, and clay content) was determined by a laser diffraction approach using a Mastersizer 2000 (Malvern Instruments, Malvern, England). Soil bulk density was calculated as the ratio of the oven-dry soil mass (after 24 h desiccation at 105 °C) to the core volume (100 cm3). Soil pH was measured in a 1:2.5 soil-to-deionized water mixture by a pH electrode (PB-10, Sartorius, Germany). SOC density (SOC amount per unit area) for a given depth at each site was calculated using Eq. (1) as (Yang et al. 2008):

$$ \mathrm{SOCD}=\sum \limits_{i=1}^n{C}_i{D}_i{B}_i\left(1-{G}_i\right)/100 $$
(1)

where SOCD is the SOC density (kg C m−2), n is the number of layers divided in each soil profile, and Ci, Di, Bi, and Gi represent the SOC content (g kg−1), thickness (cm), bulk density (g cm−3), and fractional percentage (%) of gravel larger than 2 mm in diameter of layer i, respectively.
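Eq. (1) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `soc_density` and the example profile values are hypothetical, and the gravel term is assumed to be supplied as a fraction between 0 and 1 so that the factor (1 − G) is dimensionally meaningful.

```python
def soc_density(layers):
    """Return SOC density (kg C m^-2) summed over layers, following Eq. (1).

    Each layer is a tuple (C, D, B, G):
      C: SOC content (g kg^-1)
      D: layer thickness (cm)
      B: bulk density (g cm^-3)
      G: gravel (> 2 mm) share, ASSUMED here to be a fraction in [0, 1]
    """
    total = 0.0
    for c, d, b, g in layers:
        if not 0.0 <= g <= 1.0:
            raise ValueError("gravel share must be a fraction between 0 and 1")
        total += c * d * b * (1.0 - g) / 100.0
    return total


# Hypothetical profile for the four sampled depth intervals (illustrative values only).
example_profile = [
    (40.0, 10.0, 1.0, 0.00),  # 0-10 cm
    (30.0, 10.0, 1.1, 0.05),  # 10-20 cm
    (20.0, 20.0, 1.2, 0.10),  # 20-40 cm
    (10.0, 20.0, 1.3, 0.10),  # 40-60 cm
]
socd_0_60 = soc_density(example_profile)
```

The division by 100 converts g kg−1 × cm × g cm−3 into kg C m−2, so each layer's contribution can be summed directly to give the 0–60 cm density.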

2.3 Environmental variables

The elevation, aspect, slope, longitude, and latitude data in raster format at a 90-m resolution were derived from the SRTM 90-m digital elevation model (DEM) (USGS 2006) using ArcGIS 10.2 (ESRI Inc., USA). The monthly composite normalized difference vegetation index (NDVI) products at a 30-m resolution (2011 to 2014) were derived from the HJ/CCD (Li et al. 2017). The vegetation type data at the 1:100,000 scale were also obtained from the Cold and Arid Regions Science Data Center at Lanzhou (http://westdc.westgis.ac.cn). The monthly NDVI products were resampled to a 90-m resolution and subsequently averaged over the growing season (from May to September) to generate seasonal values (Ding et al. 2016).

In the present study area, the mean values of annual temperature and precipitation mainly depended on the topographic factors. These were obtained for each sampling site according to the following empirical equations (Zhao et al. 2005, 2006):

$$ \mathrm{MAT}=20.957-0.00549H-0.166Y+0.0089X,{R}^2=0.98 $$
(2)
$$ \mathrm{MAP}=1680.6235+0.119H-75.264Y+12.405X,{R}^2=0.92 $$
(3)

where MAT, MAP, H, Y, and X are the mean annual temperature (°C), mean annual precipitation (mm), elevation (m), latitude (°), and longitude (°) at the sampling sites, respectively. Note that the latitude, longitude, and elevation for each sampling site were recorded with a hand-held GPS.
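As an illustration of Eqs. (2) and (3), the following Python sketch evaluates both empirical equations for one site. The function names and the example coordinates are hypothetical and chosen only to fall inside the basin's stated range; the coefficients are copied from the equations above.

```python
def mean_annual_temperature(elevation_m, latitude_deg, longitude_deg):
    """Mean annual temperature (deg C) from Eq. (2)."""
    return (20.957 - 0.00549 * elevation_m
            - 0.166 * latitude_deg + 0.0089 * longitude_deg)


def mean_annual_precipitation(elevation_m, latitude_deg, longitude_deg):
    """Mean annual precipitation (mm) from Eq. (3)."""
    return (1680.6235 + 0.119 * elevation_m
            - 75.264 * latitude_deg + 12.405 * longitude_deg)


# Hypothetical site: 3000 m elevation, 38.5 deg N, 100 deg E.
example_mat = mean_annual_temperature(3000.0, 38.5, 100.0)
example_map = mean_annual_precipitation(3000.0, 38.5, 100.0)
```

Because both equations are linear in elevation, the elevation coefficients give the implied gradients directly: about 0.55 °C of cooling and about 11.9 mm of additional precipitation per 100 m of ascent.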

The potential solar insolation for each sampling site was calculated based on the following equation (McCune and Keon 2002):

$$ \mathrm{Rad}=0.339+0.808\cos (Y)\cos (S)-0.196\sin (Y)\sin (S)-0.482\cos (A)\sin (S),{R}^2=0.98 $$
(4)

where Rad represents the potential solar insolation (MJ cm−2 year−1), and Y, S, and A are the latitude (°), slope (°), and aspect (°) at each sampling site, respectively.
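A minimal Python sketch of Eq. (4) follows, assuming that both latitude terms use the site latitude Y and that all angles recorded in degrees are converted to radians before the trigonometric functions are applied. The function name is hypothetical.

```python
import math


def potential_insolation(latitude_deg, slope_deg, aspect_deg):
    """Potential solar insolation (MJ cm^-2 year^-1) following Eq. (4).

    Angles are given in degrees and converted to radians internally.
    """
    lat = math.radians(latitude_deg)
    slope = math.radians(slope_deg)
    aspect = math.radians(aspect_deg)
    return (0.339
            + 0.808 * math.cos(lat) * math.cos(slope)
            - 0.196 * math.sin(lat) * math.sin(slope)
            - 0.482 * math.cos(aspect) * math.sin(slope))


# Hypothetical comparison at 38.5 deg N on a 20 deg slope.
north_facing = potential_insolation(38.5, 20.0, 0.0)
south_facing = potential_insolation(38.5, 20.0, 180.0)
```

With aspect measured clockwise from north, the last term lowers the insolation on north-facing slopes and raises it on south-facing ones, while on flat ground (S = 0) the aspect has no effect.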

2.4 Model inputs

In accordance with previous studies, the vegetation index and topographic factors are reasonably good indicators of SOC under complex topographic systems, especially in alpine regions (Yang et al. 2008; Song et al. 2016; Yang et al. 2016). In addition, it is important to note that an ideal model should have covariates that are reasonably well correlated with the dependent variable (e.g., SOC), while the covariates themselves should have negligible correlation with each other. In this study, a pre-analysis showed that the climatic factors were highly dependent on the topographic factors, as per Eqs. (2)–(4). Consequently, to minimize the uncertainties in the estimation of SOC, and to ensure that the computational efficiency of the models was not compromised by redundant inputs, this study used only the satellite-derived vegetation index (i.e., the NDVI), elevation, aspect, slope, and latitude as model inputs to predict the spatial patterns of SOC density at 0–10, 10–20, 20–40, and 40–60 cm.

2.5 Model development process

In this study, both a multiple linear regression (MLR) and a random forest (RF) regression model were developed using site-level measurements to predict the spatial patterns of SOC density at depths of 0–10, 10–20, 20–40, and 40–60 cm across the study basin. The MLR model is a classical regression method belonging to the general linear model, in which the SOC density is modeled as a linear combination of the respective environmental covariates. By contrast, the RF regression model (Breiman 2001) is a relatively new machine learning technique, especially in soil science. Despite this, it has been increasingly used, for example, in digital soil mapping (Vågen et al. 2013; Heung et al. 2016), forecasting soil moisture (Prasad et al. 2018a, 2018b), modeling coffee yield with respect to soil fertility parameters (Kouadio et al. 2018), and the prediction of soil electrical conductivity (Ghorbani et al. 2019). According to these studies, the RF regression model exhibits a relatively good performance in predicting SOC, mainly because it can accommodate the non-linear relationships between SOC and environmental covariates (Grimm et al. 2008; Yang et al. 2016; Akpa et al. 2016). In addition, the RF model is characterized by two randomization procedures, bootstrapping and random input selection, and carries out a bagging of the predictions, which can improve its prediction ability (Suuster et al. 2012; Heung et al. 2016).

The best-fit equation of the MLR model used in this study can be stated as follows:

$$ \ln \left(\mathrm{SOCD}\right)={a}_0+{a}_1 NDVI+{a}_2Y+{a}_3H+{a}_4A+{a}_5 AS $$
(5)

where SOCD is the SOC density (kg C m−2), a0, a1, a2, a3, a4, and a5 are the intercept, regression coefficient for the NDVI, latitude (Y), elevation (H), aspect (A), and the interaction between aspect and slope (AS), respectively.

The regression coefficients, which describe the response variable as a function of a set of explanatory variables, were estimated using the ordinary least squares method. In addition, we log-transformed the SOC density to meet the normal distribution requirements of the MLR approach, which also helped avoid negative (i.e., false) predictions of SOC density.

It should be noted that the elevation term in Eq. (5) first had to be transformed to ensure a linear relationship between elevation and ln(SOCD), and thereby to improve the performance of the MLR model. The transformation was conducted according to Eqs. (6)–(9) for depths of 0–10, 10–20, 20–40, and 40–60 cm, respectively, as follows:

$$ \mathrm{For}\ 0-10\ \mathrm{cm},H=2.0248\exp \left(-0.5{\left|\left(h-3385.9358\right)/797.3748\right|}^{2.1113}\right) $$
(6)
$$ \mathrm{For}\ 10-20\ \mathrm{cm},H=1.8412\exp \left(-0.5{\left|\left(h-3402.6992\right)/775.0846\right|}^{1.7827}\right) $$
(7)
$$ \mathrm{For}\ 20-40\ \mathrm{cm},H=2.4320\exp \left(-0.5{\left|\left(h-3426.2270\right)/910.6737\right|}^{1.0000}\right) $$
(8)
$$ \mathrm{For}\ 40-60\ \mathrm{cm},H=2.2589\exp \left(-0.5{\left|\left(h-3472.1136\right)/830.1236\right|}^{1.0000}\right) $$
(9)

In Eqs. (6)–(9), H and h refer to the transformed and real (raw) elevation (m), respectively. These equations were obtained by regressing ln(SOCD) at the four selected depths against the raw elevation (Fig. 2a–d). In addition, as the raw values of aspect are a relatively less useful variable in building models (Zhu et al. 2017), the aspect was transformed with the cosine function as a data pre-processing step before the MLR modeling procedure.
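The two pre-processing transformations can be sketched in Python as follows. The parameter values are copied from Eqs. (6)–(9); the dictionary keys and function names are hypothetical, and the aspect is assumed to be recorded in degrees clockwise from north.

```python
import math

# (amplitude, centre in m, width in m, exponent) taken from Eqs. (6)-(9).
ELEVATION_PARAMS = {
    "0-10": (2.0248, 3385.9358, 797.3748, 2.1113),
    "10-20": (1.8412, 3402.6992, 775.0846, 1.7827),
    "20-40": (2.4320, 3426.2270, 910.6737, 1.0000),
    "40-60": (2.2589, 3472.1136, 830.1236, 1.0000),
}


def transform_elevation(raw_elevation_m, depth):
    """Return the transformed elevation H for a raw elevation h (m)."""
    amplitude, centre, width, exponent = ELEVATION_PARAMS[depth]
    scaled = abs((raw_elevation_m - centre) / width)
    return amplitude * math.exp(-0.5 * scaled ** exponent)


def transform_aspect(aspect_deg):
    """Cosine-transformed aspect: +1 for due north, -1 for due south."""
    return math.cos(math.radians(aspect_deg))


# Transformed elevation at the lowest point of the basin, 0-10 cm layer.
h_low = transform_elevation(1755.0, "0-10")
```

Each curve is bell-shaped and symmetric around its centre (about 3390–3470 m), so H is largest near the elevation where ln(SOCD) peaks and decays toward both the desert steppe and the alpine desert zones.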

Fig. 2

Relationships between SOC density (SOCD) and elevation at 0–10 (a), 10–20 (b), 20–40 (c), and 40–60 cm (d). Note that the SOC density is log-transformed to meet the normal distribution premise of the fitted models, and also to avoid negative predictions of SOC density at very low or very high elevations

To fit the RF regression model in this study, we employed the same model inputs as in the MLR model. As for the key parameters of the RF regression model, one third of the total number of covariates was used to grow each tree of the random forest, with the minimum size of terminal nodes set to 5 and the number of trees in the forest set to 1000. The “lm” function and the “randomForest” package in R were used for modeling (R version 3.5.1, R Development Core Team 2018).

2.6 Model evaluation criteria

For any predictive model, it is important to evaluate performance with both visual measures (e.g., plots of modeled and real values) and overall statistical metrics. To validate the accuracy of the MLR and RF regression models in predicting SOC at the four depths, this study applied a 10-fold cross-validation procedure (Ali et al. 2018; Yin et al. 2018). In this approach, the original dataset was randomly divided into ten equal-sized subsets. A single subset was retained as the validation data for testing the model, and the remaining nine subsets were used as the training dataset. The procedure was then repeated ten times, with each of the ten subsets used exactly once as the validation data. The ten sets of results were then averaged to produce a single estimate of model performance. The advantage of this procedure, which follows an earlier 50-fold implementation (Yin et al. 2018), over repeated random sub-sampling is that all of the observations are used in both the training and the validation process, and each observation is used for validation exactly once.
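The fold construction described above can be sketched in plain Python. This is an illustration rather than the authors' R implementation: the function names and the seed are hypothetical, and because 138 sites cannot be split into ten exactly equal subsets, the folds here contain 13 or 14 sites each.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_validation_splits(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 138 sampling sites, as in this study; the seed is arbitrary.
splits = list(train_validation_splits(138, k=10, seed=0))
```

Repeating the whole procedure with different seeds corresponds to repeated runs of the 10-fold cross-validation, with the evaluation statistics averaged across runs.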

After each round of the 10-fold cross-validation procedure, the root mean square error (RMSE), coefficient of determination (\(R^2\)), and Lin’s concordance correlation coefficient (LCCC) (Lin 1989) were calculated for the two prescribed models, according to the following equations (Yang et al. 2016):

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({P}_i-{O}_i\right)}^2} $$
(10)
$$ {R}^2=\frac{\sum \limits_{i=1}^n{\left({P}_i-\overline{O}\right)}^2}{\sum \limits_{i=1}^n{\left({O}_i-\overline{O}\right)}^2} $$
(11)
$$ \mathrm{LCCC}=\frac{2r{\sigma}_O{\sigma}_P}{{\sigma_O}^2+{\sigma_P}^2+{\left(\overline{O}-\overline{P}\right)}^2} $$
(12)

where Pi and Oi are the prediction and observation at sampling site i, respectively; n, σO, σP, and r are the sample size, standard deviation of the observations, standard deviation of the predictions, and Pearson correlation coefficient between predictions and observations, respectively; \( \overline{O} \) and \( \overline{P} \) are the means of observations and predictions, respectively. To ensure the precision of the above statistics for the two models, we repeated the 10-fold cross-validation procedure 100 times and then calculated the means. The predictions of SOC become increasingly accurate as the RMSE approaches 0 and \(R^2\) approaches 1. The LCCC represents the degree to which the predicted and observed values follow the 45° line.
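The three statistics in Eqs. (10)–(12) can be computed with a short Python sketch. It treats σO and σP as standard deviations (so that their squares are variances, as in the usual form of Lin's coefficient) and assumes population (1/n) moments; the function names and the example data are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(predicted, observed):
    """Root mean square error, Eq. (10)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Coefficient of determination in the form given by Eq. (11)."""
    o_bar = _mean(observed)
    explained = sum((p - o_bar) ** 2 for p in predicted)
    total = sum((o - o_bar) ** 2 for o in observed)
    return explained / total


def lccc(predicted, observed):
    """Lin's concordance correlation coefficient, Eq. (12).

    Uses the identity r * sigma_O * sigma_P = covariance(O, P).
    """
    n = len(observed)
    o_bar, p_bar = _mean(observed), _mean(predicted)
    var_o = sum((o - o_bar) ** 2 for o in observed) / n
    var_p = sum((p - p_bar) ** 2 for p in predicted) / n
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (o_bar - p_bar) ** 2)


# Hypothetical data: predictions perfectly correlated but biased by +1.
observed = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]
biased_lccc = lccc(biased, observed)
```

The biased example shows why the LCCC complements the correlation coefficient: the predictions are perfectly correlated with the observations, yet the constant offset moves them off the 45° line and pulls the LCCC below 1.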

To further quantify the relative uncertainties in the bootstrapped predictions generated by trees or rules from different models in the RF regression, we calculated the interquartile range (i.e., the difference between the 75th and 25th percentiles) of the predicted SOC by performing 100 simulations of the RF regression for each pixel (Ding et al. 2016). As for the uncertainties in the MLR model, the 50% confidence interval of the regression results was first calculated, and then the lower limit was subtracted from the upper limit to obtain the interquartile range for each pixel. The summed interquartile ranges were then used to assess the uncertainty of the SOC for each vegetation type. In addition, the relative uncertainty was obtained by dividing the interquartile range by the mean value for each pixel.
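For a single pixel, the relative uncertainty can be sketched in Python as the interquartile range of the simulated predictions divided by their mean. The function name and the example values are hypothetical, and the quartile interpolation rule (the "inclusive" method of the standard library) is an assumption, since the source does not state which rule was used.

```python
import statistics


def relative_uncertainty(predictions):
    """Interquartile range of the predictions divided by their mean."""
    q1, _median, q3 = statistics.quantiles(predictions, n=4, method="inclusive")
    return (q3 - q1) / statistics.mean(predictions)


# Hypothetical set of simulated SOC densities (kg C m^-2) for one pixel.
pixel_predictions = [1.0, 2.0, 3.0, 4.0, 5.0]
pixel_uncertainty = relative_uncertainty(pixel_predictions)
```

In the study, 100 RF simulations per pixel would take the place of the five illustrative values; multiplying the result by 100 expresses it as a percentage.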

2.7 Statistical analyses

Analysis of variance (ANOVA) was used to examine the variation in SOC density with respect to the topographic factors. Duncan’s new multiple range test was performed for multiple comparisons when significant differences were detected by the ANOVA. In addition, general linear models (GLMs) were used to quantify the contribution of aspect, elevation, and slope to the overall variation in SOC density. Furthermore, to explore the relationships between SOC and the topography-induced climatic factors, as well as the vegetative and edaphic factors, a Pearson correlation analysis was first conducted. Then, the relative importance of each control on SOC density was quantified by structural equation modeling (SEM) according to the known relationships between SOC and its key controls. The SEM is a multivariate statistical method that can test the plausibility of a hypothetical model based on a priori information regarding the relationships among particular variables. The SEM can also partition the direct and indirect effects that one environmental factor may have on another and estimate the size of the multiple effects (Chen et al. 2016b).

Considering the non-linear relationships between SOC density and mean annual temperature and precipitation in this study, the two climatic factors were transformed with functions derived from the equations fitted between SOC density and each factor before the Pearson correlation and SEM analyses were conducted. Likewise, the real (raw) elevation and aspect were transformed with Eqs. (6)–(9) and the cosine function, respectively, and then standardized before the GLM analyses were performed. Both the ANOVA and the GLM analyses were performed with the GLM procedure in SAS 9.2 (SAS Institute, Inc., Cary, NC, USA). The results of the Pearson correlation analysis were obtained and visualized with the “cor” function and the “corrplot” package, respectively, in R. The SEM was performed using AMOS 20.0 software (IBM Corp., Armonk, New York, USA).

3 Results

3.1 Spatial distribution of SOC density

In general, the spatial distribution of SOC density was significantly shaped by the topographic factors (Table 1 and Fig. 2). Specifically, the SOC density at 3000–3500 m was close to that at 3500–4000 m, and it was significantly larger than that at higher or lower elevation zones (P < 0.01). The fitted lines in Fig. 2 further indicated that SOC density peaked at an elevation of ~ 3400 m. Along the aspect gradient, the SOC density decreased by 28%, 36%, 42%, and 46% as the aspect changed from north (0–45°) to south (135–180°) at depths of 0–10, 10–20, 20–40, and 40–60 cm, respectively. SOC density varied less with slope, although it still showed a slightly decreasing trend with increasing slope (Table 1).

Table 1 Spatial distribution of SOC density

3.2 SOC stock estimation

The \(R^2\) values of the RF regression were higher than those of the MLR except at a depth of 0–10 cm, where the \(R^2\) value of the MLR was higher than that of the RF regression by about 0.01 (Fig. 3a–h). The average uncertainties in the MLR model reached about 23.91%, 28.52%, 18.99%, and 21.47% for the 0–10, 10–20, 20–40, and 40–60 cm depths, respectively, with large uncertainties mainly occurring in regions with lower SOC values (Fig. 4a–d). By contrast, the RF regression provided a more robust prediction across most parts of the basin, with average uncertainties of about 6.48%, 6.40%, 7.94%, and 8.07% at the four depths, respectively (Fig. 4e–h).

Fig. 3

Model evaluation for both the MLR (a–d) and the RF regression (e–h) in predicting SOC density (SOCD) at 0–10, 10–20, 20–40, and 40–60 cm. Note that the predicted SOC densities in the scatter plots are averages of the predictions from 100 runs of the 10-fold cross-validation

Fig. 4

Relative uncertainties of SOC density at a resolution of 90 m × 90 m across the basin for different depths, estimated by the MLR (a–d) and the RF regression (e–h). The relative uncertainties are calculated as the ratio between the interquartile range (difference between the 75th and 25th percentiles) and the mean

According to the SOC density map estimated by the RF regression (Fig. 5a–h), the total SOC stock at 0–60 cm was about 219.33 Tg C (1 Tg = \(10^{12}\) g), with an average density of 21.25 kg C m−2 (Table 2). The SOC stocks at a depth of 0–60 cm in the alpine meadow, subalpine shrub, and subalpine meadow were estimated to be about 77.78, 39.59, and 38.94 Tg C, respectively, which together accounted for about 71.27% of the total stock over the basin (Table 2).
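The reported share and the consistency between the total stock and the average density can be checked with a few lines of Python. The variable names are hypothetical; all numbers are taken from the paragraph above.

```python
# Values reported in the text (Tg C and kg C m^-2).
total_stock_tg = 219.33
mean_density_kg_m2 = 21.25
stocks_tg = {
    "alpine meadow": 77.78,
    "subalpine shrub": 39.59,
    "subalpine meadow": 38.94,
}

# Combined share of the three vegetation types, in percent of the basin total.
combined_tg = sum(stocks_tg.values())
share_percent = 100.0 * combined_tg / total_stock_tg

# Area implied by stock / density: 1 Tg = 1e9 kg and 1 km^2 = 1e6 m^2.
implied_area_km2 = total_stock_tg * 1e9 / mean_density_kg_m2 / 1e6
```

The three vegetation types sum to 156.31 Tg C, which reproduces the reported 71.27%, and the implied mapped area of roughly 1.03 × 10^4 km2 is close to the stated basin area.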

Fig. 5

SOC density (SOCD) at different depths estimated by the MLR (a–d) and the RF regression (e–h) at a resolution of 90 m × 90 m across the basin

Table 2 Summary of estimated density and stock of SOC (with interquartile range) to a depth of 60 cm by the RF regression for different vegetation types across the basin

3.3 Environmental controls of SOC density

The general linear model (GLM) indicated that topography could explain about 58.11%, 54.66%, 56.98%, and 55.32% of the total variation in SOC density at 0–10, 10–20, 20–40, and 40–60 cm, respectively (Table 3). Of the topographic factors examined in this study, elevation and aspect explained the two largest proportions (i.e., 44.39% and 11.25%, respectively) of the variation in SOC density at 0–10 cm, while the slope and the interaction between aspect and slope explained only about 1.07% and 1.40%, respectively. Similar proportions contributed by each topographic factor were also obtained at depths of 10–20, 20–40, and 40–60 cm (Table 3).

Table 3 Contribution of topographic factors to the total variation of SOC density at different depths

The SOC density showed somewhat different correlations with the climatic, vegetative, and edaphic factors at the four depths (Fig. 6a–d). Specifically, the SOC density at 0–10 cm was significantly positively correlated with mean annual precipitation (correlation coefficient = 0.59), clay (0.42) and silt (0.21) content, and vegetation type (0.65), while negatively correlated with mean annual temperature (− 0.66), solar insolation (− 0.29), soil pH (− 0.39), and sand content (− 0.41) (Fig. 6a). Similar correlations were also found for deeper layers (Fig. 6b–d).

Fig. 6

Pearson correlation coefficients (r) for SOC density (SOCD) at 0–10 (a), 10–20 (b), 20–40 (c), and 40–60 cm (d) and other environmental controls. Correlation coefficients larger than r(0.05) are statistically significant at the 0.05 level. The Veg, MAP, MAT, and Rad represent vegetation type, mean annual precipitation, mean annual temperature, and solar insolation, respectively. As the vegetation type is not a numerical variable, we first rank it from 1 to 8 before correlation analysis, which sequentially represent the montane desert steppe, alpine desert, montane steppe, alpine meadow, subalpine meadow, montane forest, subalpine shrub, and montane meadow steppe. The rank is conducted according to the average NDVI value under each vegetation type, and higher code value corresponds to higher vegetation cover and potentially higher plant production

The structural equation modeling (SEM) further showed that the climatic, vegetative, and edaphic factors were able to explain about 54%, 50%, 56%, and 62% of the total variation in SOC density at a depth of 0–10, 10–20, 20–40, and 40–60 cm. Specifically, the solar insolation appeared to have a direct effect on SOC density for a depth of 20–40 cm, and it also had an indirect effect on SOC density by negatively influencing vegetation type for all considered depths. Mean annual precipitation indirectly affected SOC density through its positive correlation with vegetation type (Fig. 7a–d). Mean annual temperature exerted the strongest direct effect on SOC density, with a standardized coefficient (β) of about − 0.43, − 0.33, and − 0.35 at a depth of 0–10, 10–20, and 20–40 cm, respectively (Fig. 7a–c). As for the vegetation type, which was strongly affected by mean annual temperature (β = − 0.65), the results showed the strongest direct effect on SOC density at a depth of 40–60 cm (Fig. 7d). In addition, the clay content also had a significant direct effect on SOC density at 10–20 and 40–60 cm.

Fig. 7

Structural equation modeling (SEM) examining the climatic, vegetative, and edaphic effects on SOC density (SOCD) at 0–10 (a), 10–20 (b), 20–40 (c), and 40–60 cm (d). Single-headed arrows represent the hypothesized direction of causation. Numbers next to single-headed arrows are standardized path coefficients, which indicate the effect size of the relationship. Double-headed arrows indicate covariance between related variables. Red and purple arrows indicate negative and positive relationships, respectively, and arrow width is proportional to the strength of the relationship. The proportion of variance explained (r2) appears below each response variable in the model. Only statistically significant (P < 0.05) relationships are shown. Veg, MAP, MAT, and Rad represent vegetation type, mean annual precipitation, mean annual temperature, and solar insolation, respectively. *P < 0.05; **P < 0.01; ***P < 0.001
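Under the usual path-analysis convention, an indirect effect is the product of the standardized coefficients along a path, and the total effect is the direct coefficient plus all indirect effects. The sketch below applies this to the effect of mean annual temperature on SOC density at 0–10 cm. The two MAT coefficients are those reported above. The coefficient from vegetation type to SOC density is not given in the text, so the value used here is a hypothetical placeholder, and `total_effect` is a helper defined only for this illustration.

```python
def total_effect(direct, indirect_paths):
    """Direct coefficient plus the product of coefficients along each indirect path."""
    total = direct
    for path in indirect_paths:
        product = 1.0
        for beta in path:
            product *= beta
        total += product
    return total


beta_mat_socd = -0.43  # reported: direct effect of MAT on SOCD at 0-10 cm
beta_mat_veg = -0.65   # reported: effect of MAT on vegetation type
beta_veg_socd = 0.30   # HYPOTHETICAL placeholder, not reported in the text

indirect_via_veg = beta_mat_veg * beta_veg_socd
mat_total = total_effect(beta_mat_socd, [[beta_mat_veg, beta_veg_socd]])

print(f"indirect effect via Veg: {indirect_via_veg:.3f}")
print(f"total effect of MAT:     {mat_total:.3f}")
```

With a positive path from vegetation type to SOC density, the indirect effect has the same sign as the direct effect, so the two reinforce each other.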

4 Discussion

4.1 Spatial distribution of SOC density

In mountainous regions, SOC generally tends to increase with elevation, mainly due to the lower decomposition rate at higher elevations (Parras-Alcántara et al. 2015; Zhao et al. 2017). For example, Prietzel and Christophel (2014) suggested that soils at high-elevation sites, such as the forest soils of the German Alps, are likely to have a particularly large SOC density. In this study, however, we found that SOC density increased with elevation below ~ 3400 m and decreased with elevation above that height (Fig. 2). A similar result was obtained by Chen et al. (2016a), who found that SOC density on north-facing slopes increased significantly with elevation from 2650 to 3400 m but decreased significantly from 3400 to 3700 m. Hence, although elevation is a good indicator of SOC in alpine regions, it should first be calibrated to ensure a linear relationship with SOC, as suggested by Eqs. (6)–(9) in this study, so as to improve the accuracy of the linear models used to predict SOC.
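The need for such a calibration can be illustrated with a small sketch. It does not reproduce Eqs. (6)–(9); it assumes one simple transformation, the negative absolute distance from the ~3400 m turning point reported above, and uses hypothetical SOC densities with a hump-shaped elevation profile.

```python
from math import sqrt

TURNING_POINT_M = 3400  # elevation of maximum SOC density reported in the text


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fold_elevation(elevation_m, peak_m=TURNING_POINT_M):
    """ASSUMED transformation: larger values lie closer to the turning point."""
    return -abs(elevation_m - peak_m)


# HYPOTHETICAL hump-shaped data (elevation in m, SOC density in kg C m^-2).
elevation = [2600, 2800, 3000, 3200, 3400, 3600, 3800, 4000]
socd = [8, 11, 15, 18, 21, 17, 13, 9]

r_raw = pearson_r(elevation, socd)
r_folded = pearson_r([fold_elevation(e) for e in elevation], socd)

print(f"r with raw elevation:    {r_raw:.2f}")
print(f"r with folded elevation: {r_folded:.2f}")
```

On these illustrative data the raw elevation is only weakly correlated with SOC density because the rising and falling limbs cancel, whereas the folded variable is strongly and linearly related to it.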

The SOC density on the south-facing slopes was 44% lower than that on the north-facing slopes across the basin (Table 1). Similar results were obtained in several recent studies, which showed that SOC was higher on north-facing slopes (Lozano-García et al. 2016; Qin et al. 2016). In addition, in regions where the natural pattern of vegetation type is strongly shaped by aspect, the spatial variability of SOC may be further enhanced. Zhu et al. (2017) highlighted the importance of aspect in shaping the spatial patterns of SOC at the hillslope scale in the forest steppe zone of the Xishui Forest Reserve, northwestern China, where SOC on the north-facing slopes was larger than that on the south-facing slopes by a factor of 3.2.

Slope is also likely to affect the variability of SOC at the hillslope scale (Zhu et al. 2018), because it may alter the distribution of SOC along the hillslope through soil erosion and decomposition processes (Zhang et al. 2013; Ma et al. 2016). In this study, we found that SOC showed a slightly decreasing trend with increasing slope. This trend was consistent with that of many previous studies (Karchegani et al. 2012; Ajami et al. 2016), although the variation of SOC with slope in this study was smaller (Table 1). This was probably associated with the limited soil erosion along hillslopes in the study area, mainly due to higher vegetation cover, fewer extreme precipitation events, and fewer disturbances to soils as a result of limited human activities. In addition, the effect of vegetation type may mask the slope effect on SOC: a previous study conducted only in grasslands suggested that, among the examined topographic factors, slope position could contribute almost 10% of the total variation in SOC (Zhu et al. 2018).

4.2 SOC stock estimation

The 10-fold cross-validation and the uncertainty analysis suggested that the RF regression performed better (higher R2 values and lower uncertainties) in predicting SOC density than the MLR (Figs. 3 and 4), even though we had pre-processed the raw elevation and aspect data for the MLR. This could be ascribed to two advantages of the RF regression. First, compared with linear regression models, the RF can better deal with the non-linear relationships between SOC and environmental covariates (Grimm et al. 2008; Akpa et al. 2016). Second, as a machine learning technique, the RF is not constrained by statistical assumptions such as normality and independence, and it also overcomes limitations of parametric and nonparametric statistical methods, such as those related to spatial autocorrelation (Breiman 2001; Were et al. 2015). In this study, although we used only topographic factors and NDVI as predictors in the RF regression, the R2 value obtained by the 10-fold cross-validation still reached 0.55–0.60, which was close to that of previous studies conducted at a similar spatial scale (Wiesmeier et al. 2011; Yang et al. 2016). Taken together, our results highlight that the RF regression with topographic and vegetative factors as model inputs is an effective approach for spatially explicit SOC prediction in semiarid alpine regions.
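For readers unfamiliar with the validation metric, the sketch below shows how a 10-fold cross-validated R2 can be computed from out-of-fold predictions. To keep it self-contained, a single-predictor least-squares line stands in for the model; it is neither the RF nor the MLR used in this study, and the NDVI and SOC density values are synthetic. The helper names `fit_line` and `kfold_r2` are introduced only for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def kfold_r2(xs, ys, k=10, seed=0):
    """R2 computed from predictions made only on held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# SYNTHETIC data: SOC density loosely increasing with NDVI, plus noise.
rng = random.Random(42)
ndvi = [rng.uniform(0.1, 0.8) for _ in range(100)]
socd = [2.0 + 8.0 * v + rng.gauss(0.0, 0.8) for v in ndvi]

r2_cv = kfold_r2(ndvi, socd, k=10)
print(f"10-fold cross-validated R2 = {r2_cv:.2f}")
```

Because every prediction comes from a model that did not see the corresponding sample, this R2 is a more conservative measure of predictive skill than an R2 computed on the fitting data.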

Based on the spatially gridded SOC predicted by the RF regression, we calculated the SOC stock across the basin. The total SOC stock was estimated at 219.33 Tg C for the upper 60 cm, with an average density of 21.25 kg C m−2 (Table 2). This SOC density in the middle Qilian Mountains was much higher than that of other regions under similar semiarid climates. For example, the mean SOC density in the semiarid Loess Plateau regions of China was estimated at 7.70 kg C m−2 at 0–100 cm (Liu et al. 2011). In the semiarid Horqin grasslands, the SOC density at 0–100 cm was about 6.84 kg C m−2 (Li et al. 2018). Albaladejo et al. (2013) found that SOC density reached 7.18 kg C m−2 in a semiarid region in Spain. Fernández-Romero et al. (2014) suggested that the maximum SOC density in the forest soils of Mediterranean natural areas reached 15.87 kg C m−2. In the mountainous regions of northern Iran, the SOC density was estimated at about 18.58 kg C m−2 (Ajami et al. 2016).

The higher SOC density in this study area was largely associated with the cold climate of alpine environments and with the poorly drained conditions in valleys, which together enhanced the accumulation of organic matter in soils. Our result confirmed the importance of alpine soils in the regional carbon cycle, given their much higher carbon density in comparison with the surrounding zonal regions. In addition, we found that the organic carbon stored in soils of the alpine meadow, subalpine shrub, and subalpine meadow together accounted for 71.27% of the total stock across the basin. The large contribution of these alpine vegetation types to the regional carbon pool further suggests that protection of vegetation in high-elevation zones should be given priority in carbon management, because of the potentially large carbon losses under global environmental change.
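The figures quoted in the two paragraphs above can be related to one another with simple arithmetic. The sketch below assumes that the total stock equals the mean density multiplied by the mapped area, which gives an implied area as a consistency check rather than a reported value. It also expresses the density of this study as a multiple of the literature values. Note that the 21.25 kg C m−2 refers to the upper 60 cm, whereas the Loess Plateau and Horqin values refer to 0–100 cm, so those two ratios are conservative.

```python
KG_PER_TG = 1e9      # 1 Tg = 1e12 g = 1e9 kg
M2_PER_KM2 = 1e6

total_stock_tg = 219.33   # Tg C, upper 60 cm (Table 2)
mean_density = 21.25      # kg C m^-2, upper 60 cm (Table 2)

# ASSUMPTION: total stock = mean density x mapped area.
implied_area_km2 = total_stock_tg * KG_PER_TG / mean_density / M2_PER_KM2

# SOC densities (kg C m^-2) quoted from the cited studies.
literature_density = {
    "Loess Plateau, 0-100 cm": 7.70,
    "Horqin grasslands, 0-100 cm": 6.84,
    "semiarid Spain": 7.18,
    "Mediterranean forest soils (maximum)": 15.87,
    "northern Iran mountains": 18.58,
}
ratio_to_literature = {
    region: mean_density / density
    for region, density in literature_density.items()
}

# Stock held under alpine meadow, subalpine shrub, and subalpine meadow.
alpine_stock_tg = 0.7127 * total_stock_tg

print(f"implied area: {implied_area_km2:.0f} km2")
for region, ratio in ratio_to_literature.items():
    print(f"{region}: {ratio:.2f} times lower than this study")
print(f"alpine vegetation stock: {alpine_stock_tg:.2f} Tg C")
```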

4.3 The topography-climate-vegetation-carbon relationships in semiarid alpine regions

In this study, we found that the climatic, vegetative, and edaphic factors were significantly correlated with SOC density at different depths (Fig. 6). These results were similar to those of many previous studies (Feng et al. 2002; Wynn et al. 2006; Prietzel and Christophel 2014; Song et al. 2016; Zhu et al. 2019). However, in semiarid alpine regions where human activities are limited, topography is considered a fundamental factor affecting the spatial patterns of SOC density (Chen et al. 2016a; Zhu et al. 2019). In this study, the GLM analyses showed that topography could explain about 58.11% of the total variation in SOC density for the upper 10 cm (Table 3). This proportion was close to that obtained from the SEM analyses, which showed that the climatic, vegetative, and edaphic factors together explained 54% of the total variation in SOC density at 0–10 cm (Fig. 7a). Generally, elevation, aspect, and slope do not directly affect SOC density. Instead, they directly shape the patterns of mean annual precipitation, mean annual temperature, and solar insolation in semiarid alpine regions (Zhao et al. 2005, 2006; McCune and Keon 2002), which in turn alter plant production and the decomposition rate of soil organic matter and thus contribute to the variability in SOC density (Qin et al. 2016; Zhu et al. 2018). In this study area, zonal climatic factors such as mean annual precipitation and temperature largely depend on elevation (Eqs. (2) and (3)), while solar insolation on the hillslope mainly depends on aspect and slope (Eq. (4)). That is to say, topography substantially affects both the zonal climatic conditions and the local microclimate in the study area.

The SEM analyses in this study suggested that the topography-induced variation in mean annual temperature exerted the largest control on SOC density, mainly through its direct effect and also through an indirect effect mediated by vegetation type (Fig. 7). This was mainly because temperature not only directly affected the decomposition rate of organic matter in soils but also lowered plant production at very high elevations, owing to the shorter growing season and lower accumulated temperature. The SEM analyses also showed that, although the topography-induced variation in precipitation did not directly affect SOC density, it exerted an indirect effect through its positive correlation with vegetation type. This could be ascribed to the semiarid climate of our sampling basin, where water availability is considered a dominant environmental control of plant production (Yang et al. 2008; Qin et al. 2016). Higher precipitation may stimulate the growth of forest and shrub species, which are characterized by much higher plant production than the steppe and desert steppe, and subsequently enhance the litter input into soils.

In addition, soil temperature and soil moisture, as affected by microclimates along the hillslope, may contribute to the spatial variability of SOC density at a smaller scale (Qin et al. 2016; Zhu et al. 2019). Importantly, this variability may not be explained by the zonal climatic factors, i.e., the SOC density could still vary significantly under the same mean annual temperature and precipitation. In this study, we used solar insolation, calculated from aspect and slope, to reflect this variability at the local scale. The SEM analyses suggested that solar insolation exerted a direct negative effect on SOC density at 20–40 cm, as well as an indirect effect through its negative correlation with vegetation type. The direct effect of solar insolation was mainly associated with its control on heat patterns, characterized by higher soil temperature on the south-facing slopes than on the north-facing slopes (Qin et al. 2016), which shaped the decomposition rate of organic matter in soils. The indirect effect was mainly associated with the response of vegetation type to aspect. In the middle Qilian Mountains, the aspect-induced variability in solar insolation under such a semiarid climate significantly affected soil water conditions at the hillslope scale, which in turn sharply shaped the vegetation type along the aspect gradient (Zhu et al. 2017, 2019). As a result, steppe and desert steppe mainly occupied the south-facing slopes, while shrublands and forests were mainly distributed on the north-facing slopes. These distinctive patterns of vegetation type, primarily resulting from the aspect-induced variability in solar insolation, further enhance the variability of SOC density within the same elevation/climatic zone.

5 Conclusions

The high SOC density found in this study highlights the importance of alpine soils for the regional carbon balance. The better performance of the RF regression confirmed its effectiveness in estimating SOC stock in semiarid alpine regions. The GLM and SEM analyses suggested that elevation and aspect play fundamental roles in shaping the spatial patterns of SOC density. In addition, the elevation-induced variation in mean annual temperature appeared to be the most important control of SOC, highlighting the potentially large carbon losses that could result from elevation-dependent warming under future climate change.