Introduction

Quantifying aboveground biomass (AGB) is a crucial step in accounting for the total carbon stored in forests. As research on the effects of carbon dioxide on climate and on carbon sequestration has progressed, this estimate has become increasingly important. Tropical forests hold the majority of the AGB, about 55 percent of the biomass stored in terrestrial vegetation, and also have one of the highest rates of carbon sequestration per unit of land area (Lawrence et al., 2022; Harris et al., 2021; Réjou‐Méchain et al., 2017). Additionally, more than 10% of all anthropogenic greenhouse gas emissions come from deforestation and degradation in the tropics (Lewis et al., 2015). Quantifying AGB in the tropics is therefore a key component of climate mitigation policy.

Field-based AGB estimation is regarded as the most accurate method for determining AGB; however, it is not practical at a regional scale in the tropics because of the challenging terrain and logistics involved. Integrating field inventories with remote sensing, using both active and passive sensors, has recently emerged as an effective alternative for AGB estimation. However, both active and passive sensors have their own drawbacks. Light detection and ranging (LiDAR) can accurately estimate AGB because it penetrates the forest canopy and captures stand structural information. But the data collected by LiDAR, both airborne and spaceborne, are limited and sparse, which makes it difficult to produce wall-to-wall AGB maps. Another active sensor, synthetic aperture radar (SAR), operates only at a particular band frequency, which suggests that it might not be appropriate for AGB estimation in all types of forests, specifically in the tropics (Gao et al., 2018). The ALOS-PALSAR L-band, which currently has the greatest ability to penetrate canopy layers, has a saturation limit of 150 Mg/ha (Joshi et al., 2015; Patenaude et al., 2005; Rodríguez-Veiga et al., 2017). The same applies to the Sentinel-1 C-band, which has a shorter wavelength than the ALOS-PALSAR L-band, lower canopy penetration, and comparatively less sensitivity to AGB (Kumar et al., 2019). Consequently, optical datasets, e.g., Landsat-8, Sentinel-2, Planet and Gaofen-1, have been widely used because of their suitable spectral bands, rich time series and wide data availability. Previous studies using optical datasets have explained 25–80% of the variation in AGB in the tropics (Askar et al., 2018; Dube & Mutanga, 2015; Foody et al., 2003; Pandit et al., 2018).

The European Space Agency (ESA) launched Sentinel-2A and Sentinel-2B in 2015 and 2017, respectively, with improved spectral, spatial, and temporal resolutions. The multispectral imager (MSI) of Sentinel-2 records incoming radiation in 13 spectral bands (including additional bands in the vegetation red-edge region) at 10 m, 20 m, and 60 m spatial resolutions with a 5-day revisit period. This increases the likelihood of obtaining cloud-free satellite images in the tropics and will be useful for long-term monitoring, because Sentinel-2's band configuration is analogous to that of the Landsat series, which has been providing optical data for the past five decades (Ghosh et al., 2021). This makes Sentinel-2 an ideal spaceborne platform for estimating AGB.

Surface reflectance and vegetation indices, when used directly, often saturate because of the complex layered nature of the canopy in tropical forests, although they are effective in homogeneous forest cover (Ghosh & Behera, 2018). A large number of strategies have been developed in response. One of these is image transformation, e.g., texture image generation. Texture generated from both optical and SAR data has been shown to surpass the normal saturation level of AGB estimation, as it measures local variations in an image rather than relying on pixel information directly (Csillik et al., 2019; Ghosh & Behera, 2018; Kelsey & Neff, 2014; Pierre Ploton et al., 2012). Many studies have used textures to improve the accuracy of biomass modeling at local, regional and national scales (Csillik et al., 2019; Dang et al., 2019; Ghosh & Behera, 2018). The gray-level co-occurrence matrix (GLCM) is an approach to texture analysis that has been applied by numerous researchers in the recent past and is regarded as both effective and promising for AGB estimation (Csillik et al., 2019; Dang et al., 2019; Kelsey & Neff, 2014). In addition, it has been emphasized that care should be taken in selecting and combining multiple predictor variables, and that nonparametric regression functions (a machine learning approach) should be used to develop the AGB model (Jiang et al., 2021; Turton et al., 2022). Nonparametric methods take into account the nonlinear relationships between the predictor variables and AGB and can delineate patterns in big datasets (Turton et al., 2022). One of the nonparametric techniques deemed most reliable and robust is random forest (RF) (Degenhardt et al., 2019; Rodriguez-Galiano et al., 2012; Sarker, 2021).

Even though it is very difficult to map AGB in the tropics using spaceborne data, experiments have been conducted to make progress and lower estimation uncertainty (Csillik et al., 2019; Fararoda et al., 2021; Ghosh & Behera, 2018; Han et al., 2021; Jiang et al., 2022; Li et al., 2020; Nandy et al., 2021). The goal of this study is to develop an effective random forest regression framework for AGB estimation that uses Sentinel-2 spectral and textural features in conjunction with topographical factors and the available GEDI canopy height product (Potapov et al., 2021) in a tropical forest landscape. The methodology described in this study will be useful for estimating AGB at a regional scale and at fine resolution.

Materials and Methods

Study Area

The study was conducted in a landscape named RV Nagar-Sileru (Raghavendra Nagar-Sileru region), which is situated in the Eastern Ghats of northern Andhra Pradesh, India. It is located between 17° 46' N and 18° 02' N latitude and between 82° 02' E and 82° 18' E longitude (Fig. 1). The site represents a unique vegetation composition in the tracts of Gudem Valley (Reddy et al., 2008). The natural vegetation of the area is composed of three major forest types (semi-evergreen, moist deciduous and dry deciduous) along with grasslands and savannah (Reddy et al., 2015). The area receives an average annual rainfall of 1600–1800 mm. The maximum temperature reaches 35 °C in summer and the minimum temperature falls to 4 °C in winter. Elevation ranges between 800 and 1500 m above sea level. The area is identified as an ecologically sensitive area of the Eastern Ghats of India because of its rich biodiversity (Reddy et al., 2010).

Fig. 1 The geographic location of the study area (RV Nagar-Sileru) and distribution of forest inventory plots

Plot Establishment

Field work was conducted in the study area between October 2019 and December 2020. A total of 60 plots of 0.1 ha each (31.63 m × 31.63 m), distributed across forest types, were established for tree inventories following the methods provided in Reddy (2021). Within each plot, all trees with a girth at breast height (GBH) ≥ 30 cm were measured at 1.37 m above the ground and identified to the species level. A total of 2083 trees were inventoried across the 60 plots. Species identification is a critical component of AGB estimation in the tropics. Maximum care was taken to identify all individuals to species level, and their identities were confirmed using the Flora of Andhra Pradesh (Pullaiah & Chennaiah, 1997) and by referring to specimens in the herbarium of the Department of Botany, Andhra University, Visakhapatnam. A total of 104 tree species were identified (the species list is given in Online resource 1), of which Xylia xylocarpa, Terminalia alata, Grewia tiliifolia, Mallotus philippensis and Pterocarpus marsupium were the five most dominant in the study area.

Field AGB Estimation

The entire process of field AGB estimation was performed with the BIOMASS package (Réjou‐Méchain et al., 2017) in R, using diameter at breast height (DBH, in cm), tree height (in m) and species-wise wood density (g/cm³) as per Chave et al. (2014). Prior to calculation, field-measured GBH values were converted to DBH. We used the retrieveH function to predict individual tree height from DBH. The retrieveH function was given an argument called coord, which takes the geographic coordinates of the plot as input to assign the bioclimatic predictor variable E for computing height from the pantropical diameter-height allometric model of Chave et al. (2014). The getWoodDensity function was used to retrieve mean wood density values from global databases (Chave et al., 2009; Zanne et al., 2009). Finally, the computeAGB function was employed to calculate the AGB of each individual. The total AGB per plot was the sum of the AGB of all stems inside the plot, which was then converted to megagrams per hectare (Mg/ha). Species-level biomass was also calculated as the sum of the biomass of all individuals of a given species to identify the top contributors to AGB at species level. Mangifera indica, Terminalia alata, Xylia xylocarpa, Pterocarpus marsupium, Anogeissus latifolia, Protium serratum, Syzygium cumini, Schleichera oleosa, Mallotus philippensis and Grewia tiliifolia were the top 10 AGB contributor species in the study area.
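The calculation chain described above can be sketched in a few lines. The study itself used the BIOMASS package in R; the Python below is only an illustrative approximation that applies the published Chave et al. (2014) height and biomass equations directly, assumes circular stems for the GBH to DBH conversion, and omits any bias correction the package may apply. The tree values and the value of E are hypothetical.

```python
import math

PLOT_AREA_HA = 0.1  # plot size stated in the text (31.63 m x 31.63 m)


def gbh_to_dbh(gbh_cm):
    """Convert girth at breast height to diameter, assuming a circular stem."""
    return gbh_cm / math.pi


def chave_height(dbh_cm, e_value):
    """Tree height (m) from the pantropical diameter-height model of Chave et al. (2014)."""
    ln_d = math.log(dbh_cm)
    return math.exp(0.893 - e_value + 0.760 * ln_d - 0.0340 * ln_d ** 2)


def chave_agb_kg(dbh_cm, height_m, wood_density):
    """Tree AGB (kg) from the pantropical model of Chave et al. (2014)."""
    return 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976


def plot_agb_mg_per_ha(trees, e_value, plot_area_ha=PLOT_AREA_HA):
    """Sum tree-level AGB over a plot and scale it to Mg/ha."""
    total_kg = 0.0
    for gbh_cm, wood_density in trees:
        dbh = gbh_to_dbh(gbh_cm)
        height = chave_height(dbh, e_value)
        total_kg += chave_agb_kg(dbh, height, wood_density)
    return total_kg / 1000.0 / plot_area_ha  # kg -> Mg, then per hectare


# Hypothetical trees: (GBH in cm, wood density in g/cm^3). E is also made up.
example_trees = [(30.0, 0.60), (95.0, 0.72), (150.0, 0.55)]
example_plot_agb = plot_agb_mg_per_ha(example_trees, e_value=0.3)
```

Dividing by the plot area of 0.1 ha is what turns the per-plot sum into the Mg/ha values reported for the 60 plots.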

The AGB values for the 60 field plots ranged from 35.97 to 450.27 Mg/ha with a mean of 162.65 Mg/ha. The coefficient of variation of AGB was 53.97%. The average AGB was consistent with other studies in this region (Srinivas & Sundarapandian, 2019). In addition, the uncertainty in the AGB calculation was quantified using a Monte Carlo simulation that propagates error through the various steps of the analysis (height, wood density, diameter, allometric equation). The AGBmonteCarlo function was applied for the uncertainty calculation. We obtained a mean AGB of 163.48 Mg/ha, a median AGB of 163.54 Mg/ha and a standard deviation of 3.46 Mg/ha, with a confidence interval (CI, 5%) ranging from 157.09 to 170.23 Mg/ha.
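The idea behind the Monte Carlo error propagation can be illustrated with a simplified sketch. This is not the BIOMASS implementation: it perturbs only diameter, height and wood density with normally distributed errors of hypothetical magnitude, leaves out the allometric model error that the study also propagated, and summarises the simulations with a 2.5 to 97.5 percentile interval, which is an assumption about how the interval is defined.

```python
import random
import statistics


def agb_kg(dbh_cm, height_m, wood_density):
    """Tree AGB (kg) from the pantropical model of Chave et al. (2014)."""
    return 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976


def simulate_plot_agb(trees, n_iter=1000, seed=42, plot_area_ha=0.1,
                      dbh_cv=0.02, height_cv=0.10, wd_sd=0.05):
    """Propagate measurement errors to plot-level AGB (Mg/ha).

    trees holds (DBH in cm, height in m, wood density in g/cm^3) tuples.
    The error magnitudes are hypothetical defaults, not values from the study.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_iter):
        total_kg = 0.0
        for dbh, height, wd in trees:
            # Draw one perturbed value per input, bounded away from zero.
            d = max(rng.gauss(dbh, dbh_cv * dbh), 1.0)
            h = max(rng.gauss(height, height_cv * height), 1.0)
            w = max(rng.gauss(wd, wd_sd), 0.1)
            total_kg += agb_kg(d, h, w)
        totals.append(total_kg / 1000.0 / plot_area_ha)
    totals.sort()
    lower = totals[int(0.025 * n_iter)]
    upper = totals[int(0.975 * n_iter) - 1]
    return {
        "mean": statistics.mean(totals),
        "median": statistics.median(totals),
        "sd": statistics.stdev(totals),
        "interval": (lower, upper),
    }


# Hypothetical plot with three trees.
summary = simulate_plot_agb([(20.0, 15.0, 0.60), (35.0, 22.0, 0.70), (50.0, 27.0, 0.55)])
```

Fixing the seed makes the simulation repeatable, which is convenient when comparing the effect of different assumed error magnitudes.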

Satellite Data and Predictor Variables

The Copernicus Sentinel data hub (https://scihub.copernicus.eu) was used to download the orthorectified Sentinel-2 Level-2A satellite data acquired on November 1, 2020. The image, pre-processed by the data provider to Top of the Atmosphere (ToA) reflectance, was masked with a vector layer representing the study area. It has 13 spectral bands: four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution. This study focused on the spectral bands in the visible, near-infrared, shortwave infrared, and red-edge wavelengths.

The red-edge bands (band 5, band 6, band 7 and band 8a) and shortwave infrared bands (band 11 and band 12) were resampled from 20 m to 10 m resolution. Principal component analysis was then applied to the multispectral bands to reduce dimensionality without compromising the variability among them. The first and second principal components (PC1 and PC2), which together explain 90% of the variance in the dataset, were chosen for texture processing using the gray-level co-occurrence matrix (GLCM) method. Seven GLCM components were computed in a 3 × 3 processing window following Haralick et al. (1973). Vegetation indices are essential components of AGB estimation, so we calculated the normalized difference vegetation index (NDVI) following Rouse et al. (1974) and the Visible Atmospherically Resistant Index (VARI) following Gitelson et al. (2002). NDVI and VARI were chosen after a correlation test of several vegetation indices against field-estimated AGB, in which they showed the strongest relationships. Other indices were not considered, to avoid unnecessary multicollinearity. Leaf area index (LAI) was derived as a proxy for biophysical parameters using the Biophysical processor in the SNAP toolbox (https://earth.esa.int/eogateway/tools/snap), which follows the PROSAIL radiative transfer method (Jacquemoud et al., 2009). In addition, the global canopy height product (Potapov et al., 2021) and SRTM DEM elevation, slope and aspect were considered as predictor variables for AGB estimation. The list of predictor variables employed in the analysis, with their descriptions, is given in Online resource 2.
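To make the index and texture calculations concrete, the sketch below computes NDVI, VARI and a few GLCM statistics for a single 3 × 3 window. It is a minimal illustration under several assumptions: the window is already quantised to integer gray levels (principal components are continuous and would need quantising first), a single horizontal pixel offset is used, and the four statistics shown are common Haralick-type measures that are not necessarily the seven components used in the study.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def vari(green, red, blue):
    """Visible Atmospherically Resistant Index."""
    return (green - red) / (green + red - blue)


def glcm(window, levels, offset=(0, 1)):
    """Symmetric, normalised co-occurrence matrix of an integer-valued window."""
    d_row, d_col = offset
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1  # pair counted in both directions
                counts[j][i] += 1  # so the matrix is symmetric
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def glcm_features(p):
    """A few texture statistics derived from a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


# Hypothetical 3 x 3 window quantised to four gray levels (0-3).
window = [[0, 0, 1],
          [1, 2, 2],
          [3, 3, 2]]
features = glcm_features(glcm(window, levels=4))
```

In a full workflow the window would slide across each principal component image so that every pixel receives its own set of texture values.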

Feature Extraction

Geolocation coordinates obtained using absolute GPS (a portable GPS device) often introduce geolocation uncertainty and spatial mismatch between field plots and remote sensing data (Zhang et al., 2019). To reduce this uncertainty, we followed the methods of Carreiras et al. (2013) and Ghosh and Behera (2018) and extracted remote sensing data by creating a 30 m radius buffer around the central point of each plot. The mean of the pixel values inside each buffer was taken as the central pixel value of the plot. In addition, we calculated neighborhood statistics (minimum, maximum, average) with a 3 × 3 processing window for each predictor variable to account for more variability (Online resource 2). A total of 71 predictor variables were chosen for model development after a series of experiments with the data matrix (Online resource 2).
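The two extraction steps can be illustrated on a small grid. The sketch below is an assumption-laden simplification rather than the authors' workflow: the raster is a plain list of rows on a local grid with 10 m pixels, a pixel is included when its centre falls within the buffer radius, and windows are clipped at the raster edge.

```python
def buffer_mean(raster, center_xy, radius_m=30.0, pixel_size_m=10.0):
    """Mean of the pixels whose centres lie within radius_m of the plot centre.

    Pixel (row, col) is assumed to have its centre at
    x = (col + 0.5) * pixel_size_m and y = (row + 0.5) * pixel_size_m.
    """
    cx, cy = center_xy
    values = []
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            x = (col + 0.5) * pixel_size_m
            y = (row + 0.5) * pixel_size_m
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius_m ** 2:
                values.append(value)
    if not values:
        raise ValueError("buffer does not cover any pixel centre")
    return sum(values) / len(values)


def neighborhood_stats(raster, row, col):
    """Minimum, maximum and mean of the 3 x 3 window centred on (row, col)."""
    window = [raster[r][c]
              for r in range(max(row - 1, 0), min(row + 2, len(raster)))
              for c in range(max(col - 1, 0), min(col + 2, len(raster[0])))]
    return min(window), max(window), sum(window) / len(window)


# Hypothetical 7 x 7 predictor raster with a plot centred on pixel (3, 3).
grid = [[row * 7 + col for col in range(7)] for row in range(7)]
plot_value = buffer_mean(grid, center_xy=(35.0, 35.0))
plot_min, plot_max, plot_avg = neighborhood_stats(grid, 3, 3)
```

With 10 m pixels, a 30 m radius buffer averages over a few dozen pixels, which is why a modest GPS offset changes the extracted value less than a single-pixel lookup would.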

Random Forest Model

Random forest (RF) is a nonparametric machine learning method that has been employed frequently in remote sensing applications. It is known for its high predictive accuracy, which it achieves by capturing complex and nonlinear relationships among predictor variables. In addition, it ranks the importance of each variable by computing the average decrease in the impurity index. In the analysis, we applied RF as a regression function to model AGB. We set two user-defined parameters: the number of decision trees (ntrees) to 500 and the number of attributes considered at each split (mtry) to 10. The values were determined based on previous studies and analysis (Immitzer et al., 2016; Mohammadpour et al., 2022; Wang et al., 2015). We experimented with four combinations of the final datasets, namely Model 1, Model 2, Model 3 and Model 4 (Table 1). Model 1 includes the spectral and biophysical variables, while Model 2 includes the texture variables. The variables of Models 1 and 2 are combined in Model 3. The final Model 4 was provided with all variables, including the physical parameters. The dataset was randomly split into 70% for model development and 30% for validation. We applied a tenfold cross-validation method (Kuhn & Johnson, 2013) to the training dataset (70%) to select the best model for each variable combination (Model 1 to Model 4).
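The data partitioning described above can be sketched as follows. This illustration covers only the 70/30 split and the construction of the ten cross-validation folds for 60 plots; the random forest fit itself (500 trees, mtry of 10) is left out, and the seed, function names and fold assignment scheme are assumptions made for the example.

```python
import random


def train_test_split(indices, train_fraction=0.7, seed=0):
    """Randomly split plot indices into training and validation subsets."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def k_folds(indices, k=10):
    """Yield (fit, holdout) index lists; fold sizes differ by at most one."""
    indices = list(indices)
    for fold in range(k):
        holdout = indices[fold::k]
        fit = [i for i in indices if i not in holdout]
        yield fit, holdout


# 60 field plots, as in the study: 42 for model development, 18 for validation.
train_idx, test_idx = train_test_split(range(60))
folds = list(k_folds(train_idx, k=10))

for fit_idx, holdout_idx in folds:
    # A regression model would be fitted on fit_idx and scored on holdout_idx here.
    pass
```

With 42 training plots, each fold holds out only four or five plots, so the cross-validation statistics of a single fold are noisy and are best read as averages over all ten folds.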

Table 1 Model parameters and variable selections

Model Validation

The validation set (30%) was used to compare predicted AGB (model based) and observed AGB (field based) using Pearson’s correlation coefficient (r), root mean square error (RMSE), relative RMSE (rRMSE, in %), mean absolute error (MAE) and bias. The formulas for the selected validation metrics are provided in Table 2 and have been widely used in similar studies. In addition, Welch’s ANOVA was employed to analyze the difference between the means of observed and predicted AGB.
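For readers without access to Table 2, the sketch below shows commonly used definitions of these metrics. The exact formulas in the study may differ; in particular, expressing rRMSE relative to the mean observed AGB and defining bias as the mean of predicted minus observed values are assumptions.

```python
import math


def validation_metrics(observed, predicted):
    """Pearson's r, RMSE, rRMSE (%), MAE and bias for paired AGB values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    covariance = sum((o - mean_obs) * (p - mean_pred)
                     for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return {
        "r": covariance / math.sqrt(ss_obs * ss_pred),
        "rmse": rmse,
        "rrmse_pct": 100.0 * rmse / mean_obs,  # assumed: relative to mean observed AGB
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,  # assumed sign: positive means overestimation
    }


# Hypothetical observed and predicted AGB values (Mg/ha).
metrics = validation_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
```

Under these definitions, an RMSE of 69.18 Mg/ha with an rRMSE of 41.3% implies a mean observed AGB of roughly 167 Mg/ha in the validation set, which is in line with the plot-level mean reported earlier.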

Table 2 List of validation parameters

AGB Prediction Map

We prepared an AGB map of the study area using the best-fitting model. The map was resampled to 30 m resolution using the nearest neighbor method of the resample function (raster package in R) so that each pixel represents an area of approximately 0.1 ha.
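The effect of nearest neighbor resampling from 10 m to 30 m can be shown on a toy grid. The study used the resample function of the raster package in R; the Python below is a simplified stand-in that assumes the two grids are perfectly aligned and the coarsening factor is an odd integer, so the nearest input pixel is the one directly under each output cell centre.

```python
def resample_nearest(raster, factor=3):
    """Coarsen a grid by an odd integer factor using the nearest input pixel."""
    n_rows = len(raster) // factor
    n_cols = len(raster[0]) // factor
    offset = factor // 2  # input pixel lying under the output cell centre
    return [[raster[r * factor + offset][c * factor + offset]
             for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical 6 x 6 map at 10 m resolution, coarsened to 2 x 2 at 30 m.
fine_map = [[row * 6 + col for col in range(6)] for row in range(6)]
coarse_map = resample_nearest(fine_map, factor=3)

# Area of one 30 m pixel in hectares (1 ha = 10,000 m^2).
pixel_area_ha = (30 * 30) / 10_000
```

A 30 m pixel covers 0.09 ha, which is why it is described as approximately matching the 0.1 ha field plots. Nearest neighbor resampling keeps one original pixel value per output cell rather than averaging, so the range of the map is preserved.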

A methodological flowchart is presented in Fig. 2.

Fig. 2 Methodological flowchart for estimating AGB

Results

Model Development and Variable Importance

The results obtained from tenfold cross-validation of the four models (Model 1–Model 4) are compared in Table 3. Model 4 performed best, with an RMSE of 89.15 Mg/ha, R² of 0.61 and MAE of 74.25 Mg/ha. Except for Model 1, all models had R² values greater than 0.5, indicating a reasonable fit to the training datasets. We calculated the relative importance of the predictor variables in Model 4, which combines all the variables. Texture variables emerged as the most important, followed by spectral, biophysical and physical (topography and stand structure) variables (Fig. 3). All the variables contributed positively to the final model, and their combination outperformed the individual models (Table 3).

Table 3 Model derived cross-validation statistics

Fig. 3 Variable importance of predictor variables (category-wise) in percentage

Model Validation

The overall validation results are given in Table 4. The observed and predicted AGB (Mg/ha) for all the models are shown as scatter plots in Fig. 4 to examine the correlation for each model. Correlation coefficients ranged from 0.47 to 0.72 (p < 0.05), and RMSE ranged from 69.18 to 75.60 Mg/ha. A comparison of the correlation coefficient and RMSE among the models is given in Fig. 5. Model 1 had the weakest correlation (r = 0.47), while Model 4 had the strongest (r = 0.72). Among the individual models, texture (Model 2) outperformed spectral (Model 1) with a coefficient difference of 0.15. A decrease of 3% in rRMSE was observed when texture was added to the spectral model, and a difference of 5% in rRMSE was observed between Model 1 and Model 4 (Table 4). A very similar bias was recorded for Model 2 (6.73 Mg/ha) and Model 4 (6.31 Mg/ha), which demonstrates the importance of texture in the final model. Welch’s ANOVA (p = 0.98, p > 0.05) indicates no statistically significant difference between the mean AGB values predicted by the models and the observed values. Based on its correlation coefficient of 0.72, RMSE of 69.18 Mg/ha and rRMSE of 41.3%, we selected Model 4 to prepare the final AGB map (Fig. 6). The AGB map ranges from 107.3 to 243.02 Mg/ha with a mean value of 168.4 Mg/ha.

Table 4 Validation of models using testing datasets

Fig. 4 Scatterplots of observed and predicted AGB values along with correlation coefficient (r) and RMSE values

Fig. 5 Performance metrics of the four models in predicting AGB: (a) correlation coefficient (r), (b) RMSE (Mg/ha)

Fig. 6 AGB map predicted using the best-fitting model

Discussions

The main objective of this study was to assess the effectiveness of Sentinel-2 for modeling AGB and to achieve a generalized prediction method for landscape-level AGB mapping. We trained a random forest model with field-obtained AGB data and Sentinel-2-derived variables, namely GLCM textures, principal components of spectral bands, vegetation indices (NDVI, VARI) and a biophysical parameter (LAI), along with topographical and stand structural information derived from SRTM and the global canopy height product, respectively. The model explained AGB variability significantly (r = 0.72, RMSE = 69.18 Mg/ha, MAE = 58.22 Mg/ha) with an uncertainty of 41.3% (rRMSE) (Fig. 4, Table 4), and it was consistent with analogous studies reported previously for heterogeneous forested landscapes (Fararoda et al., 2021; Kelsey & Neff, 2014; Lamulamu et al., 2022; Li et al., 2020).

Random forest, which has been used widely for AGB prediction from remote sensing, was chosen because of its robustness in handling large numbers of predictor variables and nonlinear datasets with high predictive capability. Texture emerged as the major contributor to the AGB model in the variable importance analysis (Fig. 3) and was effective in overcoming the saturation limit of 150 Mg/ha. A likely explanation is that texture quantifies the spatial variation between pixels and provides information about the local variability and heterogeneity of an image, which effectively captures the heterogeneity of a densely canopied tropical forest.

However, the highest AGB value predicted by the model (Model 4), 243.3 Mg/ha, is lower than the highest AGB observed in the field (450.27 Mg/ha). Similarly, the lowest predicted AGB value of 107.3 Mg/ha is higher than the lowest value observed in the field (35.97 Mg/ha). Overestimation of low AGB values and underestimation of high AGB values is difficult to avoid with random forest models, as they do not consider spatial patterns during modeling (Heuvelink & Webster, 2022; Turton et al., 2022), and it has been reported in many previous studies (Ghosh & Behera, 2018; Jiang et al., 2021; Lamulamu et al., 2022; Li et al., 2020). It can, however, be reduced by collecting more samples at the extremes of the AGB range and by incorporating spatial statistics when mapping AGB (Jiang et al., 2021).

Generalizing an AGB model based on spaceborne remote sensing is often challenging, but it is necessary for mapping AGB at broader scales (regional, landscape and national) with wall-to-wall coverage. It also helps in building reliable AGB prediction maps where field sample representation is low. Generalization often comes with uncertainty, which can be attributed to many limitations. Guitet et al. (2015) classified these limitations as (i) saturation at more than 150 Mg/ha; (ii) spatial mismatches between field and remote sensing data; (iii) problems with the representativeness of the data, which are generally undersized (a pixel larger than the plot represents homogeneity instead of the actual heterogeneity); (iv) dilution bias (upscaling from field data to pixel level); and (v) high uncertainty because of scarce field observations. Other uncertainties may arise from the spatial resolution of sensors, sampling error, errors in allometric equations, and errors associated with prediction models. Some of these uncertainties are addressed and discussed below.

As optical remote sensing data do not measure AGB directly, indirect variables with different characteristics are recommended. Selecting and combining variables for AGB modeling is an important processing step for better prediction as well as for reducing uncertainty. All the predictor variables used here contributed positively to the model; texture variables contributed predominantly (83.34% relative importance), followed by spectral variables and vegetation indices (12.19% relative importance) and topography and stand structure (4.46% relative importance) (Fig. 3). Our results indicate that texture parameters are better predictors of AGB in a mixed forest landscape that has high canopy heterogeneity and shows high local variation in an image. This aligns well with other studies (Asner et al., 2014; Csillik et al., 2019; Kelsey & Neff, 2014; Pierre Ploton et al., 2012; Suraj Reddy et al., 2017). The study also supports the use of spectral bands and vegetation indices derived from Sentinel-2 in AGB mapping, which have previously been shown to be essential for AGB estimation (Muhe & Argaw, 2022; Pandit et al., 2018). Similarly, topographical parameters were considered as predictor variables because they are key drivers of forest structure and composition. Topography acts as an environmental filter and determines the growth and survival of trees by constraining nutrients and hydraulic mechanisms in a landscape (Jucker et al., 2018; Rodrigues et al., 2020). Canopy height is a direct measurement of stand structure and is closely related to AGB in the tropics. The global canopy height product of Potapov et al. (2021) was selected as a proxy for canopy height measurements and stand structural parameters for the different forest types in the landscape. Because the predictor variables differ in character, combining them is essential to reduce uncertainty and enhance the performance of AGB models, and this improvement is evident in the final model (Table 4 and Fig. 6).

In our analysis, each field sampling plot represents an area of 1000 m², which is larger than the Sentinel-2 pixel area of 100 m². Thus, the values extracted for each plot account for the variability among the pixels it covers, which allows better integration of field data with remote sensing variables. Extracting variables using buffers and neighborhood statistics captured the variability and heterogeneity at plot level, which reduced the uncertainty from geolocation mismatches and captured the high local variance in the study area. Our findings therefore support the technique applied here, and we suggest that it be considered when extracting remote sensing variables.

Previous studies by Asner et al. (2014), Csillik et al. (2019), Li et al. (2020), Ploton et al. (2017), Pierre Ploton et al. (2012) and Suraj Reddy et al. (2017) conclude that the spatial resolution of optical imagery plays a significant role in AGB mapping in the tropics, as finer resolution captures complex canopy texture heterogeneity more effectively than coarse resolution. Those studies recommended very high resolution (VHR) optical data for AGB mapping. However, VHR optical data are costly and have restricted accessibility, and researchers from developing nations with limited resources find them difficult to obtain. In contrast, Sentinel-2 is freely available and provides data in 13 spectral bands at spatial resolutions of 10 m, 20 m and 60 m with a 5-day revisit period. Thus, based on our results and analysis, we believe Sentinel-2 can be a suitable replacement for VHR data for AGB mapping in the tropics. Moreover, Jha et al. (2021) compared Sentinel-2 with other available high to medium resolution satellite sensors (e.g., Landsat 8, GaoFen, SPOT) for large-scale AGB mapping and found that Sentinel-2 outperformed the others.

This study has its own limitations. The final AGB map demonstrated moderate accuracy (r = 0.72). A possible explanation is the small sample size available to build and validate the models. More training samples at extremely high and low AGB values could help the satellite data capture the complete range of AGB. However, the study demonstrated that the integration of texture, spectral, and physical variables with field inventory datasets can overcome the saturation limit of AGB estimation in a tropical forest with dense canopy closure. The introduction of neighborhood statistics in the analysis significantly reduced uncertainty by capturing more variability.

In addition, we recommend comparing the AGB map with other map products, if available, so that it can address REDD+ targets effectively.

Conclusions

Estimating AGB is a crucial factor in determining a forest's functional diversity and carbon sequestration. Sentinel-2 is currently one of the few spaceborne platforms that combine high spatial, spectral, and temporal resolution with data provided at no cost. To derive vegetation biophysical properties, the Sentinel-2 Multispectral Imager (MSI) collects surface reflectance data in the visible to shortwave infrared bands. The addition of red-edge bands has made it an essential mapping and monitoring tool for Earth observation. Moreover, its spectral bands are analogous to those of Landsat, which allows it to build on the five-decade archive of spectral information from the Landsat series. Although optical data face saturation limits when predicting AGB, image transformation techniques such as GLCM texture generation, applied in combination with spectral reflectance and physical parameters, can overcome the saturation limit, as reported in this study. Our study demonstrates a landscape-level AGB model that uses spectral and textural information from Sentinel-2 along with topographical and structural information on the forest. The results are promising and should encourage the model to be reproduced in order to improve prediction accuracy and reduce uncertainty. The wide availability of Sentinel-2 data, together with technological advancement, makes it suitable for national-level biomass mapping. The methods used in the study, such as image transformation techniques and the calculation of image statistical parameters, are described in detail in the Materials and Methods section. All analyses were performed in open-source environments.