Introduction

Forests are critical ecosystems that play an indispensable role in maintaining the health of our planet (Maier et al. 2021). As the world grapples with climate change and its associated challenges, the importance of understanding and effectively managing these intricate ecosystems has grown exponentially. One significant transformation in forest ecology and management is the shift towards the comprehensive mapping and spatial prediction of forest characteristics (Tian et al. 2023). This evolution has been fueled by the realization that precise, up-to-date information about forests is crucial for informed decision-making, conservation efforts, and mitigating the impacts of climate change.

The measurement and assessment of forest growing stock volume (GSV) have emerged as pivotal components in this endeavour. GSV is a key indicator of forest health, carbon sequestration, and ecosystem vitality. Accurate assessments of GSV are indispensable for tracking carbon budgets, understanding carbon cycling in forested landscapes, and devising strategies to combat climate change (McRoberts et al. 2007). The imperative to assess this parameter and subsequently map it lies at the core of modern forest management and conservation practices.

Remote sensing has revolutionized the way we gather data about our natural environment, particularly in the context of forests (Engler et al. 2013; Caffaratti et al. 2021; Li et al. 2020; Sesnie et al. 2023; Uniyal et al. 2022; Hu et al. 2016). Traditional field-based methods, while valuable, are often costly, time-consuming, and spatially limited. Remote sensing, with its ability to capture vast landscapes from above, has emerged as a cornerstone tool for mapping various forest properties efficiently and accurately. Moreover, remote sensing technologies have provided researchers with the means to monitor forests on a global scale (Xie et al. 2008), offering insights into their structure, composition, and dynamics.

In recent years, the fusion of remote sensing data with advanced machine learning (ML) approaches has opened new horizons in the field of forest science (Ahmadi et al. 2020; Cho et al. 2023; Grabska et al. 2020; Liu et al. 2020; Wang et al. 2023). Among the satellites with high spatial resolution, Sentinel and Landsat are popular sources of remote sensing data (Singh et al. 2021; Tripathi and Tiwari 2020). A comparison of Landsat and Sentinel for mapping forest properties reveals that both have their strengths. For instance, Clark (2020) found that Sentinel-2 demonstrated a strong capability for mapping forest alliances, with higher overall accuracy than Landsat 8. Similarly, Astola et al. (2019) recommended Sentinel-2 as the principal data source for forest resource assessment in the boreal region. Similar results were also obtained for GSV mapping (Chrysafis et al. 2017; Mura et al. 2018). These studies collectively suggest that while both satellites have their advantages, Sentinel-2 may be more suitable for certain forest mapping applications.

At the same time, the uncertainty assessment of spatial predictions for forest characteristics is a critical component in understanding the reliability of predictive models. One key aspect is the consideration of model input data, acknowledging the inherent variability and potential errors in data sources. The selection of appropriate modelling techniques and explanatory variables also plays a pivotal role, as different approaches may yield varying levels of accuracy and uncertainty (Araza et al. 2022; Persson and Ståhl 2020; Suleymanov et al. 2024). Holdaway et al. (2014) emphasized the need to quantify and incorporate measurement error and model uncertainty in plot-based estimates of forest carbon stock and carbon change. Transparently communicating uncertainty is crucial for facilitating informed decision-making and ensuring the responsible application of forest management strategies. Ultimately, a rigorous uncertainty assessment enhances the credibility and applicability of spatial predictions for forest characteristics.

Despite the increased interest in mapping forest parameters using modern ML techniques, the uncertainty of predictions is often overlooked. Accompanying forest maps with uncertainty estimates is crucial, as it provides decision-makers with a realistic understanding of the potential variability in predicted outcomes, enabling more informed and risk-aware forest management strategies. Therefore, the main objectives of this study were: (i) to compare the accuracy of Sentinel-2A and Landsat 8 data for spatial modelling of GSV; (ii) to identify important explanatory variables; and (iii) to assess the uncertainty of spatial predictions.

Materials and methods

Study area

The study area is the natural park “Kandry-Kul” in the Republic of Bashkortostan (Russia), located at 54°39′10″ N latitude and 54°03′50″ E longitude, with an area of about 5200 hectares (Fig. 1). Within the nature reserve, there is a village, a recreational facility, a children’s camp, and camping zones, all of which have a detrimental impact on the environment. Additionally, the territory encompasses agricultural lands and newly planted trees, resulting in an extremely heterogeneous landscape. The dominant tree species in the study area include birch (Betula pendula) and pine (Pinus sylvestris) (Volkov et al. 2023). Forests cover about 1000 hectares of the park.

Fig. 1 The location of the study area (a, b) and the location of sampling points (red points)

The climate in the study area is classified as moderate continental or warm-summer humid continental (Dfb) according to the Köppen climate classification (Beck et al. 2018). Winters are cold, with an average temperature in January of -13.8 °C, reaching an absolute minimum of -50 °C. Steady snow cover lasts for about 134 days, with an average snow height of 28 cm. Winter rainfall averages around 103 mm. July experiences the highest temperatures, with an average of +18.4 °C and an absolute maximum of +40 °C. The frost-free period in the region spans 123 days.

Field data

The field studies were conducted in the summer of 2018. To investigate GSV levels, a single trial plot was set up in each forested site. The boundaries and area of each plot were determined by the natural boundaries of forest stands that were uniform in terms of tree species composition and age, as well as in terms of density and forest growth conditions. A total of 217 sites were selected. Then, within each sample plot, we measured the diameter at breast height, tree height and tree density. These measurements were used to estimate the volume of individual trees using auxiliary tables from the reference manual “All-Union Standards for Forest Taxation” (Anuchin 1982). We calculated the volume of each tree species using diameter and height measurements and summed the values of all trees in a plot to estimate the GSV (volume of trees per unit area, m³/ha).

Remote sensing data and pre-processing

In this study, cloud-free remote sensing data were used for the spatial modelling of GSV. The data were acquired from Sentinel-2A and Landsat 8 and were synchronized with the fieldwork (summer 2018). Sentinel-2A is equipped with a multispectral sensor capable of capturing data in thirteen spectral bands, ranging from visible light to the near-infrared (NIR) and short-wave infrared (SWIR) regions. Its technical characteristics include a spatial resolution of 10 m for the visible and NIR bands, 20 m for the SWIR bands, and three bands at 60 m spatial resolution. Because of their coarse resolution (60 m), bands 1, 9 and 10 were not used as explanatory variables.

The Landsat 8 satellite is equipped with the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS), which collectively capture high-resolution imagery across visible, near-infrared, and thermal bands. With a revisit time of approximately 16 days, Landsat 8 continues the legacy of its predecessors by contributing to the long-term record of Earth observation data, fostering scientific research, and supporting various applications related to natural resource management and environmental assessment. The spectral bands from both satellites were scaled to surface reflectance by applying the image-based, dark-object subtraction (DOS) atmospheric correction (Chavez 1988). Considering the uniform and relatively small size of the trial plots, we extracted the remote sensing data from the single pixel corresponding to the plot center.

Additionally, several spectral indices, including NDVI (Normalized Difference Vegetation Index) (Rouse et al. 1974), GRVI (Green-Red Vegetation Index) (Tucker 1979) and LSWI (Land Surface Water Index) (Xiao et al. 2004), were computed. To mitigate multicollinearity and redundancy, we intentionally limited the creation of additional spectral indices, opting for a focused selection of widely used and accepted metrics. This approach not only streamlined our analysis but also facilitated comparisons with other studies, supporting the reproducibility and consistency of our findings within the broader scientific community. The indices are defined in Eqs. (1–3) as follows:

$$\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}$$
(1)
$$\text{GRVI} = \frac{\text{Green} - \text{Red}}{\text{Green} + \text{Red}}$$
(2)
$$\text{LSWI} = \frac{\text{Red} - \text{SWIR}}{\text{Red} + \text{SWIR}}$$
(3)

where Red, Green, NIR and SWIR are spectral reflectance measurements acquired in red, green, NIR and SWIR regions, respectively.
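The three indices share the same normalized-difference form, which the following minimal Python sketch makes explicit. The study itself was carried out in R; the function names and the reflectance values below are hypothetical and serve only as an illustration. LSWI is implemented exactly as printed in Eq. 3 (Red and SWIR); note that Xiao et al. (2004) define LSWI with NIR in place of Red, so the band pairing should be checked against the actual processing chain.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


def spectral_indices(red, green, nir, swir):
    """Compute the three indices from the surface reflectance of one pixel."""
    return {
        "NDVI": normalized_difference(nir, red),    # Eq. 1
        "GRVI": normalized_difference(green, red),  # Eq. 2
        "LSWI": normalized_difference(red, swir),   # Eq. 3, as printed in the text
    }


# Hypothetical reflectance values for a single vegetated pixel (assumption)
pixel = spectral_indices(red=0.05, green=0.08, nir=0.40, swir=0.20)
for name, value in pixel.items():
    print(f"{name}: {value:.3f}")
```

All three indices are bounded between -1 and 1 for non-negative reflectances, which is one reason normalized differences are convenient as model covariates.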

Variable selection

Covariate selection was implemented using the recursive feature elimination (RFE) algorithm. The process starts by training an ML model on the entire set of features in the dataset. Each feature is then assigned an importance score based on its contribution to the model’s predictive performance. RFE iteratively eliminates the least important feature, retrains the model on the reduced feature set, and evaluates performance until a predetermined number of features is reached or performance stabilizes. This systematic elimination and retraining process allows RFE to identify the most influential features.
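The elimination loop can be sketched in a few lines of Python. This is an illustrative outline, not the implementation used in the study (which was done in R): the ranking and scoring steps are passed in as callables, and the toy importance scores, the toy error function and the feature list below are hypothetical stand-ins for a fitted model and its cross-validated RMSE.

```python
def recursive_feature_elimination(features, rank_features, score_features,
                                  min_features=1):
    """Generic RFE loop.

    rank_features(subset)  -> dict mapping feature name to importance
    score_features(subset) -> error of a model trained on subset (lower is better)
    Returns the subset with the lowest error and the full elimination history.
    """
    current = list(features)
    history = []
    while True:
        history.append((list(current), score_features(current)))
        if len(current) <= min_features:
            break
        importance = rank_features(current)
        weakest = min(current, key=lambda name: importance[name])
        current.remove(weakest)  # drop the least important feature, then retrain
    best_subset, _ = min(history, key=lambda item: item[1])
    return best_subset, history


# Hypothetical importance scores and error function (assumptions, not study results)
TOY_IMPORTANCE = {"NDVI": 0.9, "LSWI": 0.6, "B08": 0.5, "B02": 0.05, "B03": 0.02}
USEFUL = {"NDVI", "LSWI", "B08"}


def toy_rank(subset):
    return {name: TOY_IMPORTANCE[name] for name in subset}


def toy_score(subset):
    # Error falls with each useful feature and rises slightly with each noisy one
    useful = sum(1 for name in subset if name in USEFUL)
    noise = len(subset) - useful
    return 100.0 - 10.0 * useful + 2.0 * noise


best, history = recursive_feature_elimination(list(TOY_IMPORTANCE), toy_rank, toy_score)
print(best)
```

In practice, the scoring callable would wrap a cross-validated model fit, so the history corresponds to an error-versus-number-of-covariates curve.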

Machine learning approach

We used a random forest (RF) approach for the digital mapping of GSV levels. The RF algorithm is a powerful ML method for classification and regression tasks. Its core principle revolves around creating a forest of decision trees and aggregating their predictions to enhance overall accuracy and reduce overfitting (Breiman 2001). RF operates as follows. First, it draws a random sample of the training data (bootstrapping) for each tree in the forest. Next, multiple decision trees are grown independently on these samples, each considering a random subset of features at every split point. When making predictions, RF aggregates the results of all the individual trees. For regression tasks, this typically involves averaging the outputs of the trees. RF offers several advantages, including high predictive accuracy, resistance to overfitting, and robustness against noisy data. The algorithm can also handle large datasets with numerous features.

The RF model was tuned over two hyperparameters: ntree (number of trees), with values of 100, 250, 500 (ranger package default), 750 and 1000; and mtry (number of covariates considered at any given split), with values of √covariates (ranger default) and 25, 33.3 and 50% of the number of covariates. The model with the lowest root mean squared error (RMSE) after a 10-fold cross-validation procedure was selected for the digital mapping of GSV values.
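The tuning grid and the selection rule can be illustrated with a short Python sketch (the study itself used R). Several elements are assumptions made for illustration: the covariate count of 13, the rounding down of fractional mtry values, and the `toy_cv_rmse` function, which merely stands in for fitting an RF and computing its 10-fold cross-validated RMSE.

```python
import math
from itertools import product

NTREE_VALUES = (100, 250, 500, 750, 1000)


def mtry_candidates(n_covariates):
    """sqrt(p) plus 25, 33.3 and 50% of p, rounded down (rounding is an assumption)."""
    raw = (math.sqrt(n_covariates), n_covariates / 4,
           n_covariates / 3, n_covariates / 2)
    return sorted({max(1, math.floor(value)) for value in raw})


def tuning_grid(n_covariates):
    """All (ntree, mtry) combinations to evaluate."""
    return list(product(NTREE_VALUES, mtry_candidates(n_covariates)))


def fold_sizes(n_samples, k=10):
    """Sizes of k nearly equal cross-validation folds."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]


def select_best(grid, cv_rmse):
    """Pick the combination with the lowest cross-validated RMSE."""
    return min(grid, key=lambda params: cv_rmse(*params))


def toy_cv_rmse(ntree, mtry):
    # Hypothetical error surface; a real run would train and validate an RF here
    return 60.0 + abs(ntree - 500) / 100 + abs(mtry - 4)


grid = tuning_grid(n_covariates=13)  # hypothetical covariate count
sizes = fold_sizes(217)              # 217 plots split into 10 folds
best = select_best(grid, toy_cv_rmse)
print(len(grid), sizes, best)
```

With 217 plots, each fold holds 21 or 22 observations, so every candidate model is validated on roughly a tenth of the data at a time.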

Uncertainty assessment

In our study, we implemented a quantile regression forest (QRF) approach for uncertainty assessment in the prediction of GSV (Meinshausen 2006). QRF is an ensemble method that combines the principles of RF with quantile regression. It leverages the strength of decision trees to account for complex relationships within the data and adaptively captures variations at different quantiles of the response distribution. Unlike traditional regression models (such as RF) that estimate the conditional mean of the response variable, QRF focuses on modelling the entire conditional distribution, allowing us to capture the variability and uncertainty inherent in our predictions.

In our analysis, we calculated the quantiles at 0.05 and 0.95, representing the lower and upper bounds of the prediction distribution, respectively. This allowed us to derive a prediction interval at the 90% confidence level as presented in Eq. 4, providing a robust measure of uncertainty around our GSV predictions. A 90% prediction interval is a statistical concept representing a range of values that is anticipated to encompass a forthcoming observation with a 90% confidence level.

$$PI90 = Q95 - Q5$$
(4)

where PI90 is 90% prediction interval; Q95 and Q5 are 95th and 5th percentiles, respectively.
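As a simplified illustration of Eq. 4, the Python sketch below derives the 5th and 95th percentiles from a set of response values and returns the interval width. It assumes equally weighted values and linear interpolation between order statistics, whereas a quantile regression forest weights the training observations according to the leaves they share with the prediction point; the function names and the input values are hypothetical.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def prediction_interval_90(values):
    """Return (Q5, Q95, PI90) for a collection of response values."""
    q5 = quantile(values, 0.05)
    q95 = quantile(values, 0.95)
    return q5, q95, q95 - q5


# Hypothetical GSV values (m3/ha) associated with one prediction location
gsv_values = list(range(0, 101))
q5, q95, pi90 = prediction_interval_90(gsv_values)
print(q5, q95, pi90)
```

A wider PI90 at a location signals a more dispersed conditional distribution and therefore a less certain prediction.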

For the estimation of the prediction uncertainty, we used the prediction interval coverage probability (PICP) (Shrestha and Solomatine 2006) as presented in Eq. 5. PICP evaluates if the probability assigned to the prediction interval is equal to the frequency of empirical test data within the prediction interval. Ideally, in our case, for a 90% prediction interval, we desire a PICP of 90%. If PICP is less than the confidence level, the uncertainty is underestimated; and if a PICP is greater than the confidence level, the uncertainty is overestimated.

$$PICP = \frac{\text{count}\left(L_{i} < O_{i} < U_{i}\right)}{n} \times 100$$
(5)

where the numerator is the number of observations Oi that fall within their prediction interval; Li and Ui are the predicted lower and upper limits of GSV, respectively; and n is the number of GSV observations.
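Eq. 5 translates directly into code. The following Python sketch is an illustration only (the study used R); the function name and the example values are hypothetical. It uses strict inequalities, as in Eq. 5.

```python
def picp(observed, lower, upper):
    """Prediction interval coverage probability, in percent (Eq. 5)."""
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(1 for o, lo, up in zip(observed, lower, upper) if lo < o < up)
    return 100.0 * inside / len(observed)


# Hypothetical example: ten observations, one of which falls outside its interval
observed = [120, 150, 90, 200, 175, 60, 310, 140, 165, 130]
lower = [80, 100, 50, 150, 120, 30, 150, 100, 110, 90]
upper = [180, 210, 140, 260, 230, 110, 280, 200, 220, 190]
coverage = picp(observed, lower, upper)
print(coverage)
```

Here the seventh observation (310) exceeds its upper limit (280), so the coverage equals the nominal 90%; a lower value would indicate intervals that are too narrow.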

Model training and evaluation

Validation criteria, including RMSE and the coefficient of determination (R²), were used to evaluate and determine the models’ performance. Their equations are given below (Eqs. 6–7):

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}\left(O_{i}-P_{i}\right)^{2}}{n}}$$
(6)
$$R^{2} = \left(\frac{\sum_{i=1}^{n}\left(O_{i}-O_{avg}\right)\left(P_{i}-P_{avg}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_{i}-O_{avg}\right)^{2}\times\sum_{i=1}^{n}\left(P_{i}-P_{avg}\right)^{2}}}\right)^{2}$$
(7)

where Oi and Pi are observed and predicted values of GSV, Oavg and Pavg are the average values, and n is the number of samples.
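Both metrics are straightforward to compute. The Python sketch below is an illustration rather than the study's R code: it implements RMSE as in Eq. 6 and R² as the squared Pearson correlation between observed and predicted values, which is the form Eq. 7 expresses. The function names and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error (Eq. 6)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values (Eq. 7)."""
    n = len(observed)
    o_avg = sum(observed) / n
    p_avg = sum(predicted) / n
    covariance = sum((o - o_avg) * (p - p_avg) for o, p in zip(observed, predicted))
    ss_observed = sum((o - o_avg) ** 2 for o in observed)
    ss_predicted = sum((p - p_avg) ** 2 for p in predicted)
    return (covariance / math.sqrt(ss_observed * ss_predicted)) ** 2


# Hypothetical observed and predicted GSV values (m3/ha)
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

Because R² in this form is a squared correlation, it measures how well predictions track observations linearly and is insensitive to systematic bias, which is why it is reported together with RMSE.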

The statistical analyses, spatial modelling process and validation were performed using the “ranger” and other basic packages in R programming language.

Results

Forest growing stock volume

Descriptive statistics of GSV in the study area are presented in Table 1. GSV values ranged from a minimum of 30 to a maximum of 340 m³/ha across the sampled plots, with a mean of approximately 160 m³/ha. The standard deviation (SD) of 67 m³/ha and the coefficient of variation (CV) of 42% indicated that the variability in GSV within our study area was substantial.

Table 1 Descriptive statistics of GSV in the study area (n = 217)

Notes: 1 Standard deviation; 2 Coefficient of variation.

Variable selection and importance

Since Sentinel-2A showed significantly better accuracy, we present the optimal number of covariates and the relative importance of variables only for this satellite. Figure 2 displays the optimal number of covariates used in the RF prediction of GSV values. The RFE analysis determined that the optimal number of variables to include in the fitted RF model was seven, indicating that not all variables contributed to the spatial prediction.

Fig. 2 RMSE values for different numbers of Sentinel-2A covariates included in the RF model as determined by the RFE technique

The relative importance of Sentinel-2A explanatory variables is shown in Fig. 3. Notably, the variable importance analysis indicated that NDVI, an index commonly used to assess vegetation health and density, emerged as the most influential predictor. Following NDVI, the spectral index LSWI displayed a high level of importance. Additionally, the B08 band, corresponding to the NIR spectrum of the Sentinel-2A satellite imagery, featured prominently in the variable importance analysis.

Fig. 3 Estimation of the importance of variables in the RF model using Sentinel-2A data

Model performance

The performance of the RF models for both remote sensing datasets, based on 10-fold cross-validation, is shown in Table 2. The RF model using Sentinel-2A data yielded an RMSE of 56.6 m³/ha and an R² value of 0.53, which was significantly better than the model using Landsat 8 data (RMSE = 71.2 m³/ha, R² = 0.23). The scatter plot of observed vs. predicted GSV values using Sentinel-2A data is shown in Fig. 4. The estimation of the prediction uncertainty with the PICP revealed that the models based on both Sentinel-2A and Landsat 8 data underestimated the uncertainty associated with GSV predictions (88 and 83%, respectively).

Table 2 Performance analysis of RF predictive models using remote sensing data

Fig. 4 Scatter plot of observed vs. predicted GSV values using Sentinel-2A data. The 1:1 line is indicated in green

Spatial distribution of GSV and its uncertainty

Figure 5 shows the maps of the predicted GSV values with accompanying uncertainty using the Sentinel-2A data. According to the generated map, predicted GSV values ranged from 51 to 302 m³/ha. As expected, the highest GSV values were found in areas with developed vegetation in the eastern part, while the lowest values were found in anthropogenic areas, as well as in areas without forests and shrubs.

Fig. 5 Spatial distribution of GSV using Sentinel-2A data (top), and its uncertainty (bottom) expressed as the width of the 90% prediction intervals (PI90). Note that GSV values and their uncertainty are not displayed for water surfaces

The uncertainty map (Fig. 5), generated using a 90% prediction interval for GSV mapping based on the Sentinel-2A data, provides valuable insights into the variability and uncertainty of GSV values across the study area. A 90% prediction interval is a range of values that is expected to contain a future observation with a probability of 90%. The PI90 widths on the uncertainty map ranged from 90 to 240 m³/ha, while for Landsat 8 they ranged from 200 to 250 m³/ha. This was expected, as the covariate set based on Sentinel-2A data resulted in more accurate predictions. The analysis revealed that the eastern areas tended to exhibit the highest uncertainty, while other parts were characterized by lower uncertainty. In general, higher uncertainty was associated with dense vegetation and lower uncertainty was observed in non-forested areas, suggesting that uncertainty is related to the explanatory variables. Areas with dense vegetation were expected to have higher uncertainty because accurate field measurement of GSV becomes more difficult as GSV values increase.

Discussion

Performances of predictions

Choosing suitable explanatory variables for the digital mapping of forest properties is an essential step. In our investigation of GSV mapping, the use of Sentinel-2A and Landsat 8 satellite data as separate covariate sets revealed notable disparities in accuracy, with Sentinel-2A demonstrating superior performance over Landsat 8. The enhanced accuracy observed with Sentinel-2A can be attributed to its higher spatial resolution and advanced sensor capabilities, allowing for finer details and more precise discrimination of GSV levels compared to Landsat 8. Similar findings were reported in other studies. For instance, a Sentinel-2-based model achieved higher accuracy compared to Landsat 8 for GSV mapping in Huairou District, China (Zhou and Feng 2023). Similarly, Korhonen et al. (2017) demonstrated that Sentinel-2 was slightly better than Landsat 8 in the estimation of canopy cover and leaf area index.

In heterogeneous landscapes characterized by diverse land cover types and varying vegetation patterns, as in this study, a higher spatial resolution allowed for a more detailed and nuanced representation of the landscape. The forests of the reserve are significantly disturbed, as they have been partially cut down and built over with buildings and recreational facilities (Volkov et al. 2023). Anthropogenic areas often experience significant disturbances and land-use changes, such as deforestation, urbanization, and agricultural practices. These activities can lead to a fragmentation of the natural landscape and disrupt the spatial continuity of forested areas. Also, in anthropogenic areas, land management practices can be highly variable, leading to differences in forest composition, density, and age structure.

Although the RF model explained only 53% of the GSV variation, its RMSE was comparable to or slightly lower than those reported in GSV mapping studies conducted in other regions. For instance, Jiang et al. (2020) reported comparable error metrics, with an RMSE of 65.1 m³/ha using a stepwise RF approach in Northern China. Similarly, Mauya et al. (2019) attained an RMSE of 72.6 m³/ha using ALOS PALSAR-2, Sentinel-1 and Sentinel-2 data in Tanzania, whereas Suleymanov et al. (2024) obtained an RMSE of 76 m³/ha using Sentinel-2 data and a QRF approach in the Ural mountains (Russia). In another study, Zharko et al. (2020) achieved an RMSE of 91 m³/ha using Sentinel-2 data in the Russian Southern Taiga region.

Variable importance analysis

Our findings underscore the critical role of NDVI in the ensemble predictive model, suggesting that changes in vegetation health and greenness have a substantial impact on GSV patterns. Numerous studies have demonstrated the importance of NDVI for the spatial modelling of many environmental variables, such as soil (Burgheimer et al. 2006; Singh et al. 2004), vegetation characteristics (Huang et al. 2021; Suleymanov et al. 2020), and even living organisms (Pettorelli et al. 2011). NIR bands are known for their sensitivity to vegetation characteristics (Astola et al. 2019; Tsuchikawa et al. 2022), making the B08 band an essential contributor to predictive accuracy. Similar results were reported by Nasiri et al. (2022), where NDVI and the Sentinel-2 B08 band were the most important variables in the spatial modelling of forest canopy cover using an RF method. Moreover, the significant contribution of the LSWI index highlights the relevance of hydrological information in our predictive model. These findings collectively emphasize the importance of remote sensing data, particularly NDVI and NIR spectra, in predicting forest properties.

Uncertainty of predictions

Besides the superior accuracy in GSV mapping, our study revealed that the uncertainty associated with predictions was reduced when using Sentinel-2A, although the models based on both sensors underestimated the uncertainty. This reduced uncertainty can also be attributed to the finer spatial resolution and advanced sensor capabilities of Sentinel-2A, enabling a more accurate representation of GSV. Hence, our findings highlight the significance of considering spatial resolution as a key factor when selecting satellite data for studies in diverse and heterogeneous landscapes.

Uncertainty in GSV mapping studies can arise from various sources, including the limited ability of explanatory variables to account for all variation in GSV, potential errors in field measurements, and spatial autocorrelation (Gonzalez et al. 2010; Kangas et al. 2018). Earlier, Xu et al. (2021) demonstrated that the accuracy of tree species classification depends on the spatial resolution of the remote sensing data. Performance was better using spatial data with a resolution from 4 to 10 m, while at 30 m accuracy fell. This can be explained by the fact that finer spatial resolution allows for better discrimination of features, while coarser resolution may lead to mixed pixels and reduced accuracy, especially in heterogeneous landscapes. We conclude that finer spatial resolution yielded higher prediction accuracy and a more robust estimate of prediction uncertainty. Thus, the increased level of detail provided by Sentinel-2A facilitates better discrimination and characterization of distinct forest features, capturing variations within the heterogeneous landscape more effectively. This spatial refinement becomes particularly crucial when dealing with complex ecosystems, enabling the identification and differentiation of subtle changes in GSV values.

Earlier, Chen et al. (2017) demonstrated that a high level of anthropogenic disturbance (e.g., a high percentage of built-up area or a high degree of tree patch fragmentation) introduced a high variation in forest carbon estimation. Errors in field measurements or inconsistencies in ground truth data can also propagate uncertainties into the model outputs. Moreover, spatial modelling of natural components in the urban environment is associated with additional sources of uncertainty (Gu and Townsend 2017; Tigges and Lakes 2017). Thus, Richardson and Moskal (2014) recommended collecting large numbers of points to achieve a high degree of certainty in complex urban areas. We assume that the collection of additional field observations, especially under mature dense forests, and the use of ultra-high-resolution remote sensing data will lead to more accurate results and, consequently, lower uncertainty.

Conclusion

Accurate GSV mapping is crucial for informed land management decisions, climate change mitigation, and sustainable resource utilization. Moreover, given the importance of forests to the world’s health, such investigations need to be conducted across various geographical regions with different species. This study aimed to digitally map GSV, with accompanying uncertainty, across birch (Betula pendula) and pine (Pinus sylvestris) stands in the forest-steppe zone with a warm-summer continental climate. We used the RF technique and tested data from different satellites, and the main findings are as follows:

  1. Of the two RF models, based on either Sentinel-2A or Landsat 8 data, the Sentinel-2A model exhibited the higher accuracy (RMSE = 56.6 m³/ha, R² = 0.53) in comparison with the Landsat 8 model (RMSE = 71.2 m³/ha, R² = 0.23).

  2. The analysis of variable importance demonstrated that NDVI was the most influential predictor. Furthermore, LSWI and the B08 band also played prominent roles in the spatial predictions.

  3. Higher uncertainty in GSV predictions was observed in areas with dense and mature stands, whereas the lowest values were located in urban zones and areas without vegetation. The PICP suggested that the uncertainty associated with GSV predictions was underestimated using both Sentinel-2A and Landsat 8 data.

These results affirm that not only does Sentinel-2A offer improved accuracy in GSV mapping, but it also provides a more robust and less uncertain basis for decision-making in forest management. The combined assessment of accuracy and uncertainty reinforces the suitability of Sentinel-2A for applications in heterogeneous landscapes, emphasizing its role in supporting informed and reliable decisions in the field of forestry. This knowledge can serve as a valuable foundation for future endeavours involving GSV mapping in similar environmental contexts.