Introduction

The global carbon cycle is a major issue in global climate change research. As the largest terrestrial ecosystem, the forest ecosystem and carbon exchange within it have an important influence on the global carbon balance. Estimating aboveground forest biomass is thus the most important step in measuring carbon stocks and fluxes from forests and helps to determine the contribution of forests to the global carbon cycle (Vicharnakorn et al. 2014). An estimate of forest biomass also provides an important reference point for global carbon and carbon cycle research (Dong et al. 2013; IPCC 2000).

The amount of carbon sequestered by a forest can be inferred from its biomass accumulation because approximately 50% of forest dry biomass is carbon (Brown 1997a, b; Vicharnakorn et al. 2014). The bulk of biomass assessments are performed on aboveground biomass (AGB) of trees since it generally represents the greatest fraction of the total living biomass in a forest and does not pose significant logistical problems during field measurements (IPCC 2007; Vicharnakorn et al. 2014). Forest inventories can be used to determine the spatiotemporal distribution of aboveground forest carbon pools, but they require substantial amounts of time and money and are limited to five years (Goetz et al. 2009).

Satellite data make it possible to monitor and map forests and to trace changes in forest biomass and carbon stock (Eckert 2012). A great deal of research has been devoted to remote sensing-based estimations of biomass, as interdependencies between remote sensing data and biomass have been increasingly discovered for empirical methods when compared to more complicated approaches based on physical mechanism models (Liu et al. 2015).

A common approach is to apply regression analyses to the reflectance channels and spectral and textural indices based on information from sampling sites (Steininger 2000; Castillo-Santiago et al. 2010). Recently, nonparametric algorithms have been explored to estimate forest attributes because of advantages such as flexibility and ability to describe nonlinear dependencies compared with parametric algorithms (Sironen et al. 2010). One particularly useful advantage of nonparametric algorithms is that they are free from assumptions of any given probability distribution and observations are assumed to be independent (Sironen et al. 2010). Machine-learning algorithms are groups of data-mining and nonparametric-based algorithms that use numerous independent variables in classification and regression applications.

Support vector machines (SVMs) are a family of classification and regression techniques that use statistical learning theory (Walton 2008). SVMs have emerged in recent times as a popular technique for data mining, with a great number of applications such as tissue classification (Furey et al. 2000; Pavlidis et al. 2004), shape extraction and classification (Cai et al. 2001; Du and Sun 2004), protein recognition (Zien et al. 2000), bakery process data (Rousu et al. 2003), hyperspectral data (Gualtieri and Cromp 1998), crop classification (Perales et al. 2003), and regression problems (Mukherjee et al. 1997; Pontil et al. 1998; Sivapragasam et al. 2001; Gao et al. 2003; Bray and Han 2004).

Support vector regression (SVR) is the application of SVM in regression, where the output is in real or continuous numbers (Mustakim et al. 2016). The aim of this study was to assess SPOT-HRG satellite data and SVM model regression as tools for estimating aboveground tree carbon stock in Darabkola forest in Mazandaran Province, Iran.

Materials and methods

Study area

The study area was located in the Hyrcanian Forests, District One of Darabkola’s forests, in northern Iran (Fig. 1). Darabkola’s forest, with an area of about 2612 ha, is a natural, nd mature forest with uneven-aged and dense to semidense stands. These comprise mixed hardwood types including Fagus dominant, mixed Fagus, CarpinusFagus, FagusCarpinus, and CarpinusTilia. Elevations range from 140 to 880 m, a.s.l., and the general slope in the study area is northerly, with subtle differences in slope aspects. Logging is performed by selecting individual trees in an attempt to approximate natural silviculture. The investigation was carried out in only one part of Darabkola’s District One, about 1224 ha, where Persian beech (Fagus orientalis) is a dominant species (Fagus dominant type).

Fig. 1
figure 1

Location of plots in Darabkola’s forestry district, Mazandaran Province, northern Iran

SPOT data (satellite for earth observation)

For this study, the images of the visible and near-infrared (VNIR) and short wave infrared or middle infrared (SWIR) subsystems were used from the SPOT HRG satellite data, acquired on 14 July 2009. The VNIR imagery contains three bands (green, red and near-infrared) with a spatial resolution of 10 m. The SWIR imagery has a spatial resolution of 20 m.

Field inventory data

We demarcated sampling plots on areas supporting relatively heterogeneous forest types. To reduce the effect of topography on illumination, we randomly selected plots with only a northern aspect. Plots were 30 × 30 m (0.36 ha) (Fig. 1) distributed in a cluster with north–south and east–west directions. The central position of each plot was accurately registered using a high quality handheld ground control point (STONEX SD7 model) device several times to get an accurate position. In each plot, for all trees with breast-height diameter greater than 7.5 cm, we recorded tree species, diameter at breast height (DBH). For some trees in each plot, we also measured height. The volume per hectare in each plot in natural forest was calculated based on tariff tables (Anonymous 2005) for the Darabkola Forest and using Eq. (1) for a forest plantation.

$$ B_{A} = d^{2} \frac{\uppi}{4}, $$
(1)

where B A is base area and d is diameter at breast height. To determine the amount of biomass, densities of all species were measured in the laboratory, and then using Eq. (2) in closed forests with wet weather, the amount of biomass in the plot was calculated (FAO 1997).

$$ B = V_{\text{OB}} W_{\text{D}} B_{\text{EF}} , $$
(2)

where, B is biomass, V OB is the volume of trees, W D is tree density by species, B EF is a biomass expansion factor (FAO 1997). For calculating the amount of stored carbon, biomass values were multiplied by conversion coefficients, which are generally considered to be 50% (Eckert 2010; Goetz et al. 2009; Gibbs et al. 2007).

Image processing

The visible and near-infrared (VNIR) and short wave infrared or middle infrared (SWIR). Images were geometrically orthorectified using a 10-m resolution digital elevation model (DEM) and 37 control points obtained by a handheld GPS (3–5 m accuracy). The obtained root mean square errors (RMSEs) of imagery were less than 0.59 pixels for VNIR and 0.5 pixel for SWIR images. Geometric precision of the images was also verified using a road vector layer and unused collected GPS control points to rectify the accuracy of geometric rectification. Also, COST atmospheric correction method was used in this research. Studies have shown that the green and then red and near-infrared bands have the greatest darkness and that the amount of the SWIRv band is close to zero. With the atmospheric correction bands, atmospheric error rate was reduced.

Some commonly used vegetation indices (Table 1) for quantifying the vegetation attributes and enhancing the biophysical characteristics were generated using the VNIR and SWIR imagery. Principal component analysis (PCA) is normally performed in two standard ways: (1) using all bands and (2) selecting bands having the highest correlations. In this study, both methods were applied separately for VNIR and SWIR bands.

Table 1 Vegetation indices used of this study

Characteristics for each of the bands of green, red, near infrared and SWIR using a gray level co-occurrence matrix (GLCM) were used to assess the texture of the image. GLCM is a tabulation of different constituents of pixel values in an image (Dutta et al. 2012). In this study, to assess the characteristics of texture matrix, occurrence and co-occurrence were used. In total, 30 independent variables and predictors were used in the analysis to predict the forest attributes.

Extraction of spectral values

In remote sensing, each pixel of the digital images has a numerical value that reflects the spectral behavior of the corresponding phenomena on the Earth’s surface. In this study, postproduction of synthetic bands and measurement of the characteristics of the samples through field operation were performed, and the spectral values corresponding to plots of the original and synthetic bands were extracted. Then the obtained values and the corresponding spectral values of the corresponding pixels in the satellite data as well as the statistical relations were examined.

Modeling

In this study, an SVM was for modeling. An SVM is a nonlinear generalization of the generalized portrait algorithm developed by Vapnik (1995). Generally, SVMs focus on the boundary between classes and map the input space created by independent variables using a nonlinear transformation according to a kernel function. Linear, polynomial, radial basis function (RBF), and sigmoid are the four most commonly used kernel types. RBF is the most popular kernel used in SVMs (Cortez and Morsis 2007; Durbha et al. 2007). The simplicity of the method is one of its main advantages over other data mining techniques, such as artificial neural networks (ANNs). Thus, only a few parameters need to be adjusted by the users to optimize the model.

A prerequisite for SVM to achieve better results is to determine the parameters that play a key role in achieving high accuracy and provide better performance (Wang et al. 2009). The specified grid search using v-fold cross-validation (Durbha et al. 2007) is the most commonly used method to find appropriate values for the parameters, i.e., epsilon (ε) and capacity (C) with fixed a gamma that would produce high-accuracy results. A brief description of the proposed methods was summarized by Durbha et al. (2007). In this study, four different cores, including linear, polynomial, circular and radial basis function were used. Equation (3) was used to calculate the scale where \( n \) is equal to the number of variables (Hsu et al. 2010).

$$ \upgamma = \, 1/n,. $$
(3)

To choose the best ε and C values, we used the grid search method (Hsu et al. 2010), wherein C values ranged from 1 to 50, equal to the range of input variables (Mattera and Haykin 1999), and ε values from 0.1 to 0.5.

Model validation

The validity of the model was determined in a number of ways. Using 30% of samples, models were selected according to the significance of RMSE (in reverse order) and the t test for bias (based on the t_Student). RMSE is the arithmetic mean of square errors, which was applied to the regression models to determine the accuracy of the estimate (Eq. 4).

$$ R_{m} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\mathop C\limits^{{}}_{i} - C_{i} } \right)^{2} }}{n}} , $$
(4)

where R m is the root mean square error, Ĉ i is the carbon estimate based on the regression model for each plot of sample, C i is the carbon measured in the control plots on the ground, and n is the number of control samples. After the model was obtained, the precision (P r) of the calculation prediction needed to be examined using Eq. (5) (Li 2010; Dong et al. 2013).

$$ P_{\text{r}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left[ {1 - {\text{abs}}\left( {\frac{{\hat{C}_{i} - C_{i} }}{{C_{i} }}} \right) \times 100} \right]}}{n}. $$
(5)

Bias (B i , disambiguation) or systematic error is a way of estimating regression models used to determine authenticity. Biases for all models of the study were calculated using Eq. (6).

$$ B_{i} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\hat{C}_{i} - C_{i} } \right)}}{n}. $$
(6)

Student’s t test was used to assess the significance of the bias and the significance was analyzed by applying Eq. (7) (Ranta et al. 1991).

$$ t = \frac{{B_{i} }}{{{\raise0.7ex\hbox{${S_{\text{D}} }$} \!\mathord{\left/ {\vphantom {{S_{\text{D}} } {\sqrt n }}}\right.\kern-0pt} \!\lower0.7ex\hbox{${\sqrt n }$}}}}, $$
(7)

where S D is the standard deviation of the residuals (Ĉ i C i ).

Results

Statistical analysis of field data and the extracted spectral bands

Of the 165 plots, 117 were used for modeling, while 48 were utilized for validation. Overall, 81 plots were in natural forest areas and 84 plots in afforestation. Average carbon storage was 206 tons per hectare, which was greater than the average carbon stored in afforestation (151 tons per hectare) and less than in natural forest at 264 tons of carbon per hectare. The maximum carbon stored in natural forests was 495.88 tons per ha, whereas the minimum of 55.8 tons per ha was estimated for afforested stands (Table 2).

Table 2 Specifications and a descriptive assessment model for aboveground carbon stock

The best value obtained for ε was equal to 0.1; however, the value of the parameter C fluctuated from 13 to 50. Excluding the linear model, the value of gamma remained unchanged for all models according to the equation used in the calculation. The parameters for prediction and modeling kernels in each run were used to obtain the best results. The specified grid search for different models in the entire study area, including natural forest and afforestation area, are given in Table 3.

Table 3 SVM analysis results to determine the best parameters using the grid search for carbon stocks characteristic for the entire region, natural forests, and afforestation

The results of SVR implementations determined by cross validation for carbon stock estimations are given in Table 4. The results of the carbon stock estimation for the entire region showed a lower relative RMSE and bias using a three-degree polynomial kernel type.

Table 4 The results validate the various models in the entire region and afforestation area

In carbon stock estimation using SVR in afforestation, the best results based on relative RMSE and bias were obtained using a polynomial kernel type. The validation results showed that the three-degree polynomial model in the afforestation with the remaining 67.05 t/ha root mean square error and estimation accuracy 64.3% is the best model for predicting carbon stocks in this area (Table 5).

Table 5 The results validate the various models in the natural forest

Evaluation of the best model in the natural forest showed that the third degree polynomial had the lowest root mean square error (100.81 t/ha) among the four models, accuracy of the best forecast model for the forest area was 57.37, lower than the accuracy calculated for the best selected models in the entire area and the forest range (Table 5).

The correlation between the number of artificial bands, PCA, texture analysis and indexes are shown in Table 6. Results showed the most of the bands were negatively correlated.

Table 6 Correlation between artificial and carbon bands in the entire region

Using the best model forecast map, the distribution of carbon in Table 6 in the region was prepared to show the spatial distribution of these variables in the region. This map can be used to identify areas with high carbon storage and apply the necessary measures. Areas marked in red have the lowest amount of carbon stored in the study area, and dark green colors represent areas with high carbon stocks in the entire area of the study (Fig. 2).

Fig. 2
figure 2

Carbon distribution map of the study

The average amount of carbon stored related to classes IV and V are located mostly within natural forest and also points to the low carbon often seen in the afforestation area.

Discussion

Forests play an important role in global change on Earth. Deforestation and forest degradation can result in carbon emission to the atmosphere, thus affecting global climate and environmental change (Hansen et al. 2015). Current concerns for global change and ecosystem functioning require accurate biomass estimation and examination of its dynamics (Zhou and Hemstrom 2009). Carbon sequestration potential in terms of plant species and in siting management practices, is different. Remote sensing data provides nondestructive and continuous detection information on the biomass of forest ecosystem.

In this study for estimating carbon stock using SPOT-HRG satellite data, correlations between carbon stock reflectance in SPOT-HRG spectral bands and forest attribute were low and negative. These negative correlation values for stand volume and biomass are in line with previous studies (Shataee et al. 2012). This negative value means that with increasing age of stands and number of canopy stories, the amount of shadow increases, leading to a reduction of the overall spectral response. An important point is the negative correlation between the bands and the indicators obtained from SPOT-HRG with the carbon indicator, implying that as carbon storage increases indicator values go down. This negative value in the natural forest means that with increasing age of stands and number of canopies, the amount of shadow increases, leading to a reduction in the overall spectral response. Thus, spectral reflection in the infrared section is lowered, and the increase in biomass leads to reduced spectral reflection. Naturally, the carbon that has accumulated over time reduces the reflection, allowing the indicator to appear in satellite images. A wide range of spectral indicators, along with reflection values of different bands, were considered. The results indicated that the red band and the RVI have the highest correlation with biomass, R 2 = 0.48 and R 2 = 0.58, respectively.

The results showed that the average carbon stored in natural forest is higher than in the afforestation, which has young trees (approximately 23 years) with low diameter, thus lower volume and lower carbon stock than in older trees.

In this study derivatives of the NIR band and NIRs, including the NDVI index, are highly correlated with above ground carbon stock, similar to those of Gao et al. (2000), Todd et al. (1998) and Tucker (1979), who showed that the NDVI index is not useful for forest areas where vegetation is dense.

With respect to statistical applications, textural analyses are very useful since they provide much more information compared with spectral characteristics, especially for cases where large variations are caused in the information spectrum by a heterogeneous stand forest (Wulder et al. 1998). Texture measures are very sensitive to the spectral reflectance of forest canopy. Observations by Eckert (2012) confirmed that the more homogeneous a forest canopy structure is, the stronger the correlation between biomass and textural parameters; the issue is more relevant when high-resolution images are used (Tuominen and Pekkarinen 2005). Kajisa et al. (2009) used generalized linear model (GLM) to merge the data obtained from field statistics and remote sensing data to present a model for estimating biomass in the forest. Among other factors, this study also uses textural characteristics extracted from satellite images, which showed that the highest average was obtained for the infrared band, in line with the findings of Sohrabi et al. (2010), who compared various bands for estimating the stored volume in each segment and reported that the infrared band was best.

In previous studies, linear regression (Mitchard et al. 2009; Chen et al. 2009) and machine learning such as artificial neural network (ANNs) (Wijaya et al. 2010; Amini and Sumantyo 2009) were used to estimate aboveground biomass (AGB). The present study for modeling used support vector machine (SVM) regression. SVM is an important statistical learning algorithm to estimate forest parameters using remote sensing data (Shataee et al. 2012). Its advantage is its ability to use less training sample data to produce relatively higher accuracy for classification or estimation than other approaches.

On the basis of our results, the third-degree polynomial model was the best model for the entire study area, natural forest and afforestation with R MSE = 77.69 t/ha, 100.81 t/ha and 67.05 t/ha, respectively, to estimate carbon stocks. The results of this study showed that SPOT-HRG satellite data can be useful in estimating carbon stocks.