Introduction

Forests have significant implications for the environment, for example in the protection of biological diversity and in climate change. Forest species maps provide useful information for driving ecosystem models, preserving vegetation and managing forests (Liang and Zeng 2009; Wang et al. 2010). It is well documented that biomass and net productivity differ considerably among tree species (Ustin et al. 2010). However, accurately mapping the distribution of vegetation species through ground investigation is difficult and time-consuming. Remote sensing techniques provide powerful and efficient tools for this problem, and several studies have analysed the potential of different remote sensing sensors for vegetation classification. Multispectral sensors (such as the Thematic Mapper (TM) on the Landsat satellites) have been widely used for forest classification and analysis (Lu 2005; Lu et al. 2008; Zaw Htun et al. 2011). Because multispectral sensors differ in spectral and spatial resolution, they can distinguish vegetation at different levels of geometrical detail. With low-resolution multispectral data such as MODIS, the analysis is generally limited to discriminating between forested and non-forested areas (Sedano et al. 2005). With medium-resolution sensors such as TM, the level of geometrical detail increases and vegetation types can be classified. High geometrical resolution sensors such as IKONOS and SPOT 5 allow a more detailed analysis. In particular, very high resolution imagery such as WorldView-2 and GeoEye captures more contextual information about the canopy (such as textural or object-based information), which has been used to distinguish tree species (Immitzer et al. 2012; Leempoel et al. 2013).
However, because of the limited spectral information acquired by these multispectral sensors, they do not permit the detailed analysis needed to distinguish trees at the species level (Jose and David 1996; Gong et al. 1998; Chen et al. 2007).

Recently, hyperspectral remote sensing sensors, which provide a significant enhancement of spectral measurement capabilities over conventional multispectral sensors, have been widely used to detect vegetation characteristics. Compared with multispectral data, hyperspectral data have much higher spectral resolution and show great potential for detecting vegetation stress (Smith et al. 2004), measuring the chlorophyll content and leaf area index (LAI) of vegetation (Zhao et al. 2007), and classifying and mapping vegetation species (Clark et al. 2005; Zhao et al. 2007; Hestir et al. 2008; Huang and Asner 2009; Kozoderov and Dmitriev 2011). For classification problems, hyperspectral images have been used in a variety of forest applications, ranging from discrimination between forest and other land covers to more detailed analyses distinguishing different tree species. These studies confirmed that hyperspectral data can yield much higher classification accuracies than multispectral images.

However, it must be noted that analysing hyperspectral images to classify vegetation species is much more complex than analysing multispectral data. The first task in processing hyperspectral images for vegetation species classification is to select suitable features to distinguish the different species. Feature extraction and feature selection methods have been used for this purpose, selecting optimal bands or an optimal feature subset from the hyperspectral data; examples include genetic search algorithms (Vaiphasa et al. 2007), principal component analysis (Bajorski 2011) and the minimum noise fraction (Jouan 2007). Classifiers such as maximum likelihood, decision trees and random forests have become common in tree species classification. Support Vector Machines (SVM), introduced by Vapnik (1998), are among the most effective recent classifiers; they can handle classification problems in high-dimensional feature spaces and have been widely applied to tree species classification. However, these methods use only per-pixel spectral information and ignore the neighbourhood of each pixel. Because of large variations in growing conditions caused by differences in geology, lithology, soil, elevation, historical background, local climatic factors and the land abandonment process itself, a large variety of heterogeneous vegetation communities can occur within a study area. Heterogeneous vegetation communities are challenging to classify with spectral classifiers because different vegetation types may have very similar spectral responses. Neighbourhood information may therefore be useful for species classification, especially in high spatial resolution images.

In this paper, we used contextual and spectral features to classify tree species in an airborne hyperspectral image with an SVM classifier. The objectives of this paper are to: (1) test whether combining spectral and contextual information improves the accuracy of tree species classification; (2) compare the effectiveness of different SVM kernel functions for tree species classification.

Data Set Description

The study site is located in a natural reserve area in Liangshui, Heilongjiang Province, Northeast China. The forest is dominated by larch, red pine, birch, conifer and poplar. The hyperspectral image of Liangshui was acquired by the Compact Airborne Spectrographic Imager (CASI-1500) hyperspectral sensor on August 23, 2009. The CASI imagery provides 144 bands at 2.3 nm spectral resolution and 1.5 m spatial resolution, covering the visible and near-infrared range from 350 to 1,050 nm. A natural colour composite image of the study area is given in Fig. 1. The image was georeferenced using position and orientation system (POS) data, which include inertial measurement unit (IMU) and Global Positioning System (GPS) measurements.

Fig. 1 Natural colour composite image of the research area (R: 680 nm, G: 550 nm, B: 450 nm)

Five tree species (fir, red pine, larch, birch and willow) and three non-forest classes (water, built-up areas and cloud) were located and marked during the ground truth survey. The tree species classification scheme is shown in Table 1. The sampling unit used in this paper was a pixel, and the samples were selected from the CASI hyperspectral image based on the field survey. In total, 18,540 ground truth sample pixels were selected from the CASI hyperspectral image. Accuracy was estimated with 10-fold cross-validation, and the average overall accuracy was computed from the confusion matrices of the ten classification runs.
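The accuracy protocol above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes simple (unstratified) random folds, a hypothetical `fit_predict(X_train, y_train, X_test)` callable standing in for the SVM, and it reads "average overall accuracy" as the mean of the per-fold overall accuracies, each taken from that fold's confusion matrix.

```python
import numpy as np


def kfold_indices(n, k=10, seed=0):
    """Shuffle sample indices 0..n-1 and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are reference classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()


def cross_validated_oa(X, y, fit_predict, n_classes, k=10, seed=0):
    """Mean overall accuracy over k folds, plus the per-fold values."""
    folds = kfold_indices(len(y), k, seed)
    per_fold = []
    for i, test in enumerate(folds):
        # Train on the other k-1 folds, evaluate on fold i.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_pred = fit_predict(X[train], y[train], X[test])
        per_fold.append(overall_accuracy(confusion_matrix(y[test], y_pred, n_classes)))
    return float(np.mean(per_fold)), per_fold
```

For the eight classes used here, `n_classes=8` and `X` would hold one feature vector per ground truth pixel.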

Table 1 Distribution of ground truth samples among investigated classes

Methodology

The overall method used in this study is shown as a flowchart in Fig. 2. First, the minimum noise fraction (MNF) transformation is applied to reduce the dimensionality of the CASI image; then the grey-level co-occurrence matrix (GLCM) is used to extract textural information. The MNF features are combined with the textural features and used to classify tree species with SVMs using different kernels.
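As a sketch of the feature-combination step, the snippet below stacks MNF and texture layers into one feature vector per labelled pixel. With the 20 MNF components and six GLCM measures described below, this gives the 26 features mentioned in the Discussion; the array shapes and the use of -1 to mark unlabelled pixels are assumptions made for illustration.

```python
import numpy as np


def stack_features(mnf_cube, texture_cube, labels):
    """Combine per-pixel MNF and texture layers and keep only labelled pixels.

    mnf_cube: (rows, cols, n_mnf); texture_cube: (rows, cols, n_tex);
    labels: (rows, cols) integer class codes, -1 for unlabelled pixels (assumption).
    """
    features = np.concatenate([mnf_cube, texture_cube], axis=2)
    mask = labels >= 0
    # Boolean indexing on the first two axes yields a (pixels, features) matrix.
    return features[mask], labels[mask]
```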

Fig. 2 Flowchart of the tree species classification with hyperspectral imagery

Minimum Noise Fraction Transform

MNF analysis was first proposed by Green et al. (1988). The MNF transform is used to determine the inherent dimensionality of image data, to segregate noise in the data and to reduce the computational requirements of subsequent processing. The MNF transform is essentially two cascaded principal component transformations. The first transformation, based on an estimated noise covariance matrix, decorrelates and rescales the noise in the data. This first step produces transformed data in which the noise has unit variance and no band-to-band correlation. The second step is a standard principal components transformation of the noise-whitened data. For further spectral processing, the inherent dimensionality of the data is determined by examining the eigenvalues and the associated images. The data space can be divided into two parts: one associated with large eigenvalues and coherent eigenimages, and a complementary part with near-zero eigenvalues and noise-dominated images (Jouan 2007; Nielsen 2011). By using only the coherent portion, the noise is separated from the data, which improves spectral processing results. Based on the MNF results, the first 20 components, whose cumulative contribution reached 95 %, were selected and used as input to the classifiers.
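The two cascaded steps can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the processing chain used in the study: the noise covariance is estimated from differences between horizontally adjacent pixels (one common choice; the text does not specify the noise estimator), and components are retained until their cumulative eigenvalue contribution reaches a threshold, mirroring the 95 % criterion above.

```python
import numpy as np


def mnf(cube, var_threshold=0.95):
    """Minimum noise fraction transform of a (rows, cols, bands) image cube.

    Returns the retained MNF components and all eigenvalues (descending).
    """
    cube = np.asarray(cube, dtype=float)
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    X = X - X.mean(axis=0)

    # Noise covariance estimate (assumption): differences between horizontally
    # adjacent pixels carry twice the per-pixel noise variance.
    diff = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    noise_cov = np.cov(diff, rowvar=False) / 2.0

    # Step 1: noise whitening, so the noise has unit variance and no
    # band-to-band correlation.
    n_vals, n_vecs = np.linalg.eigh(noise_cov)
    whiten = n_vecs / np.sqrt(n_vals)
    Xw = X @ whiten

    # Step 2: standard principal components of the noise-whitened data.
    vals, vecs = np.linalg.eigh(np.cov(Xw, rowvar=False))
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Keep the leading components up to the cumulative contribution threshold.
    cum = np.cumsum(vals) / vals.sum()
    k = min(int(np.searchsorted(cum, var_threshold)) + 1, bands)
    components = (Xw @ vecs[:, :k]).reshape(rows, cols, k)
    return components, vals
```

The number of components kept depends on the data and on the noise estimate, so a different estimator could select a different count than the 20 reported here.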

Texture-Based Features

The texture-based features were extracted from the grey-level co-occurrence matrix (GLCM). The GLCM describes the distance and angular spatial relationships between grey levels within an image sub-region of a specified size. It quantifies texture by counting how often pairs of grey levels co-occur at a given offset within a user-defined moving kernel, producing a matrix of co-occurrence frequencies for that kernel. When computing GLCM texture measures, the window size should be chosen to best capture the target classes; the optimal window size depends on the image spatial resolution and the tree canopy size. In this paper, the semivariogram method was used to determine the optimal window size (Onojeghuo and Blackburn 2011), which was found to be 7 × 7 pixels. The following GLCM texture measures were calculated (Onojeghuo and Blackburn 2011):

$$ CON={\displaystyle \sum_{i, j}{\left( i- j\right)}^2 p\left( i, j\right)} $$
(1)
$$ DIS={\displaystyle \sum_{i, j} p\left( i, j\right)\left| i- j\right|} $$
(2)
$$ ASM={\displaystyle \sum_{i, j} p{\left( i, j\right)}^2} $$
(3)
$$ ENT=-{\displaystyle \sum_{i, j} p\left( i, j\right) \log \left( p\left( i, j\right)\right)} $$
(4)
$$ HOM={\displaystyle \sum_{i, j}\frac{p\left( i, j\right)}{1+{\left( i- j\right)}^2}} $$
(5)
$$ COR={\displaystyle \sum_{i, j} p\left( i, j\right)\frac{\left( i-{\mu}_i\right)\left( j-{\mu}_j\right)}{\sigma_i{\sigma}_j}} $$
(6)

where p(i, j) is the normalized co-occurrence frequency of grey levels i and j, with i and j the row and column indices of the GLCM; CON is the contrast, DIS the dissimilarity, ASM the angular second moment, ENT the entropy, HOM the homogeneity and COR the correlation of the GLCM; and μ_i, μ_j and σ_i, σ_j are the means and standard deviations of the row and column marginal distributions of the GLCM.
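A direct NumPy sketch of the six texture measures in Eqs. (1)–(6) follows. Several choices are assumptions not stated in the text: grey levels come from linear quantization of a single band, a single offset of one pixel to the right is used, the GLCM is made symmetric, entropy uses the natural logarithm, and correlation is set to 1 for a constant window, where it is undefined. The default 7 × 7 window matches the optimal size reported above.

```python
import numpy as np


def quantize(band, levels):
    """Linearly rescale a band to integer grey levels 0 .. levels-1."""
    band = np.asarray(band, dtype=float)
    lo, hi = band.min(), band.max()
    q = np.floor((band - lo) / (hi - lo + 1e-12) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def glcm(window, levels, dx=1, dy=0):
    """Normalised, symmetric GLCM p(i, j) of a quantised window for one offset."""
    P = np.zeros((levels, levels))
    rows, cols = window.shape
    for r in range(max(0, -dy), min(rows, rows - dy)):
        for c in range(max(0, -dx), min(cols, cols - dx)):
            P[window[r, c], window[r + dy, c + dx]] += 1
    P = P + P.T  # count each grey-level pair in both directions
    return P / P.sum()


def glcm_features(P):
    """The six measures of Eqs. (1)-(6) for a normalised GLCM P."""
    i, j = np.indices(P.shape)
    nz = P > 0  # skip empty cells so log(0) is never evaluated
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    if sd_i > 0 and sd_j > 0:
        cor = np.sum(P * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        cor = 1.0  # constant window: correlation undefined, convention assumed here
    return {
        "CON": np.sum(P * (i - j) ** 2),
        "DIS": np.sum(P * np.abs(i - j)),
        "ASM": np.sum(P ** 2),
        "ENT": -np.sum(P[nz] * np.log(P[nz])),
        "HOM": np.sum(P / (1.0 + (i - j) ** 2)),
        "COR": cor,
    }


def texture_image(band, win=7, levels=16):
    """Per-pixel GLCM measures from a win x win moving window (reflect-padded edges)."""
    q = quantize(band, levels)
    h = win // 2
    padded = np.pad(q, h, mode="reflect")
    out = np.zeros(q.shape + (6,))
    for r in range(q.shape[0]):
        for c in range(q.shape[1]):
            feats = glcm_features(glcm(padded[r:r + win, c:c + win], levels))
            out[r, c] = list(feats.values())
    return out
```

For a two-level checkerboard window, every horizontal pair differs, so CON = 1, ASM = 0.5, HOM = 0.5 and COR = -1; a constant window gives CON = 0 and ASM = 1.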

Classification Method and Accuracy Assessment

The support vector machine (SVM) was used for tree species classification in this paper. SVM classifiers have developed rapidly over the last 10 years and have been successfully applied to several remote sensing problems. Consider a binary classification problem, and assume that the training set consists of Q vectors $x_p \in \mathbb{R}^q$ with corresponding targets $y_p \in \{-1, +1\}$, where +1 and -1 denote the labels of the two classes.

The SVM approach separates the two classes by means of an optimal hyperplane defined by a weight vector w and a bias b; in the nonlinear case, the data are first mapped into a higher-dimensional feature space in which this hyperplane is sought. The optimal hyperplane is the one that minimizes a cost function combining two criteria: margin maximization and error minimization. It is defined by minimizing (7) subject to the constraints (8):

$$ \varPsi \left( w,\xi \right)=\frac{1}{2}{\left\Vert w\right\Vert}^2+ C{\displaystyle \sum_{p=1}^Q{\xi}_p} $$
(7)
$$ {y}_p\cdot \left( w\cdot {x}_p+ b\right)\ge 1-{\xi}_p,\quad p=1,\cdots, Q $$
(8)

where $\xi_p \ge 0$ are the so-called slack variables.
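To make (7) and (8) concrete: for a fixed w and b, the smallest slack satisfying (8) is $\xi_p = \max(0, 1 - y_p(w \cdot x_p + b))$, so the cost (7) can be evaluated directly. The sketch below does this for given parameters on toy data; it only evaluates the objective and does not solve the optimization.

```python
import numpy as np


def slack_variables(w, b, X, y):
    """Smallest xi_p >= 0 satisfying y_p * (w . x_p + b) >= 1 - xi_p, i.e. constraint (8)."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)


def primal_objective(w, b, X, y, C):
    """Cost function (7): 0.5 * ||w||^2 + C * sum_p xi_p."""
    xi = slack_variables(w, b, X, y)
    return 0.5 * float(w @ w) + C * float(xi.sum())
```

With w = (1, 0) and b = 0, a point at x = (0.5, 0) labelled +1 lies inside the margin and gets ξ = 0.5, so a larger C penalizes it more heavily.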

The constant C, called the cost parameter, is a regularization parameter that controls the shape of the discriminant function and, consequently, the decision boundary when the data are nonseparable. The above optimization problem can be reformulated through a Lagrange functional whose Lagrange multipliers are found by means of a dual optimization, leading to a quadratic programming solution. In the nonlinear case, the SVM uses kernel functions to obtain nonlinear decision boundaries. Commonly used SVM kernels include the polynomial, radial basis function (RBF) and sigmoid kernels. The SVM classifier can also be easily extended to multiclass problems with the one-against-one and one-against-all strategies (Vapnik 1998).
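The kernels named above, and the way a trained binary SVM and the one-against-one strategy produce a label, can be sketched as follows. The kernel formulas follow the (γ, r, d) parameterisation used by LIBSVM. The support vectors, the bias b and `alpha_y` (the products α_p y_p of the Lagrange multipliers and labels from the dual problem) would come from training and are hypothetical inputs here; LIBSVM itself uses one-against-one voting for multiclass problems.

```python
import numpy as np
from collections import Counter


# Kernel functions in the (gamma, r, d) parameterisation used by LIBSVM.
def linear_kernel(u, v):
    return float(np.dot(u, v))


def poly_kernel(u, v, gamma=1.0, r=0.0, d=2):
    return float((gamma * np.dot(u, v) + r) ** d)


def rbf_kernel(u, v, gamma=1.0):
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def sigmoid_kernel(u, v, gamma=1.0, r=0.0):
    return float(np.tanh(gamma * np.dot(u, v) + r))


def decision_value(x, support_vectors, alpha_y, b, kernel):
    """Binary SVM decision f(x) = sum_p alpha_p * y_p * K(x_p, x) + b."""
    return sum(ay * kernel(sv, x) for sv, ay in zip(support_vectors, alpha_y)) + b


def one_against_one(x, pairwise):
    """pairwise maps (class_a, class_b) to a decision function; f(x) > 0 votes class_a."""
    votes = Counter()
    for (a, b), f in pairwise.items():
        votes[a if f(x) > 0 else b] += 1
    return votes.most_common(1)[0][0]
```

For the eight classes in this study, one-against-one would train 8 × 7 / 2 = 28 binary classifiers, each voting for one of its two classes.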

Several SVM programs have been developed and made publicly available. In this study, we used the LIBSVM program (Hsu et al. 2001). We chose the linear, quadratic polynomial, cubic polynomial, sigmoid and RBF kernels to test the effect of different kernels on tree species discrimination. The SVM requires two types of parameters: (1) the kernel function type and its parameters; (2) the cost parameter C. The kernel parameters differ between kernel functions; Table 2 lists the parameters for each kernel function. The values of these parameters were determined following the guidance of Hsu et al. (2001). Specifically, the values of γ, r, d and C were varied systematically from low to high. For each combination of γ, r, d and C, the prediction accuracy of the trained SVM model was estimated through cross-validation, and the combination giving the highest prediction accuracy was used for tree species classification.
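The parameter search described above amounts to an exhaustive grid search scored by cross-validation. A minimal sketch is shown below; `cv_accuracy` is a hypothetical callable that would train and cross-validate an SVM for one parameter combination, and the exponentially spaced grids for C and γ are an assumption modelled on the ranges suggested in Hsu et al.'s practical guide, since the exact values searched are not reported in this text.

```python
import itertools


def grid_search(cv_accuracy, grid):
    """Try every combination in grid; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(**params)  # e.g. mean 10-fold overall accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Assumed exponential grids for the RBF kernel: C = 2^-5 .. 2^15, gamma = 2^-15 .. 2^3.
rbf_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 2)],
    "gamma": [2.0 ** k for k in range(-15, 4, 2)],
}
```

For the polynomial and sigmoid kernels, entries for r (and d) would be added to the grid in the same way.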

Table 2 The kernel parameters with different kernel function

To evaluate the effectiveness of the proposed tree species classification strategy and achieve the goals of this paper, three sets of experiments were defined: (1) tree species classification with SVM using MNF features; (2) tree species classification with SVM using MNF and texture-based features; (3) SVM classification with five different kernels: linear, quadratic polynomial, cubic polynomial, sigmoid and RBF.

Results

The overall accuracies of the SVM classification with different kernels and features are given in Table 3. The table shows that the best classification result among all combinations is obtained with the linear kernel function using MNF and texture-based features. Fig. 3 shows the classification results obtained with the SVM using the linear kernel function and MNF and texture-based features.

Table 3 The overall accuracy of SVM classifier with different kernel and features
Fig. 3 Classification results of the SVM with the linear kernel function using MNF and texture-based features

The classification accuracy of the SVM varies with the kernel function. On average, the linear kernel function gives the best classification results, followed by the RBF and sigmoid kernels, while the polynomial kernel functions give the worst results. This indicates that polynomial kernels are not well suited to MNF and texture-based features for tree species classification in this case.

Considering the features used in the SVM algorithm, the overall accuracy obtained with MNF combined with texture-based features is higher than that obtained with MNF features alone, but the increase is small for all kernel functions.

The SVM method with linear, RBF and sigmoid kernel functions performs well in tree species classification with both MNF features and MNF combined with texture-based features when using the CASI hyperspectral image. However, the kernel function also influences the classification results, and the choice of kernel function may be related to the feature types. In this paper, we find that with MNF and texture-based features, the linear kernel function gives the best result. The feature types may also influence hyperspectral image classification results: combining spectral features with contextual features extracted from hyperspectral images can improve classification. In this paper, MNF features combined with texture-based features increased classification accuracy, although the increase was small.

Discussion

Tree species classification at the crown level in forests with high tree species diversity is difficult when only spectral or only textural information is available. Airborne hyperspectral sensors provide imagery with both high spatial and high spectral resolution, which offers major advantages for tree species classification. The results of this paper show that combining hyperspectral information with textural information can improve the accuracy of tree species classification, consistent with other reports (Immitzer et al. 2012). Overall classification accuracies of 75–90 % have been achieved by several research groups in tree species classification. The accuracy obtained in this paper (85.92 %) is in line with those reported in comparable studies (Clark et al. 2005; Hestir et al. 2008; Zhang et al. 2006; White et al. 2010).

However, combining MNF features with texture features did not greatly improve the overall accuracy. This result differs from that of Onojeghuo and Blackburn (2011), who found that texture information greatly increased the overall accuracy. The main reason is that the vegetation types in the two studies are dissimilar. In our experiments, fir, red pine and larch are all coniferous trees with almost the same appearance, and the same holds for birch and willow, which are both broadleaved trees. In contrast, Onojeghuo and Blackburn (2011) used textural information to distinguish broadleaved trees, coniferous trees, grassland and reedbeds, four types that differ markedly in appearance. Whether textural features increase accuracy slightly or substantially may therefore depend on the vegetation types: if the species belong to the same type, textural features may give little improvement, whereas if they belong to different types, textural features may improve accuracy considerably.

SVM is an advanced machine learning algorithm for classification, but its classification accuracy varies with the kernel function. The choice of a suitable kernel function depends on the number of features, the number of samples and the distribution of the features (Keerthi and Lin 2003). According to Keerthi and Lin (2003), if the feature distribution is not known, the RBF kernel is a reasonable first choice: it nonlinearly maps samples into a higher-dimensional space, so it can handle cases in which the relation between class labels and attributes is nonlinear. Furthermore, the linear kernel is a special case of the RBF kernel, and the sigmoid kernel behaves like the RBF kernel for certain parameters (Lin and Lin 2003). Compared with the RBF kernel, the polynomial kernel has more hyper-parameters, which increases the complexity of parameter selection in the SVM. The differences in kernel parameters between kernel functions can be seen in Table 2. This may explain why, in this paper, the RBF, linear and sigmoid kernels achieved almost the same accuracy, which was higher than that of the polynomial kernels. Another possible reason for the different accuracies of different kernel functions is the input features. In this paper, there are 26 features and 18,540 samples. Because the number of samples is much larger than the number of features, the linear kernel may already be able to separate the eight classes with hyperplanes in the original feature space.

Conclusion

The results indicate that hyperspectral images enable effective recognition of forest tree species. Spectral and contextual features were used as input to an SVM classifier, and the effectiveness of different kernel functions was compared for forest tree species classification. The classification results indicate that the SVM method with linear, RBF and sigmoid kernel functions performs well in tree species classification with the CASI hyperspectral image, and that the linear kernel function gives the best result. MNF features combined with texture-based features increase the classification accuracy, although only slightly.