1 Introduction

Hyperspectral images (HSI) are widely used in fields such as agriculture, the military, and medicine because of the rich information they contain [1,2,3,4]. Increases in spatial and spectral resolution mean that HSI contains more fine-grained feature information. As the amount of information increases, stronger feature extraction capabilities are needed, and multi-feature extraction and fusion have proven to be an effective way to improve feature extraction [5,6,7].

To make full use of the information contained in hyperspectral data, multi-feature based strategies have been widely applied in current popular methods. He et al. [8] proposed a multi-scale 3D deep convolutional neural network that jointly learns 2D multi-scale spatial features and 1D spectral features from HSI data in an end-to-end manner, and achieved better results on large-scale data sets. However, a Deep Neural Network (DNN) has difficulty fusing multiple features obtained by different methods because of the constraints of the network structure. Some methods use traditional extractors to obtain multiple shallow features and improve the utilization of HSI information by enhancing feature diversity, such as Extended Morphological Profile (EMP), Gabor, Local Binary Patterns (LBP), Scale-invariant Feature Transform (SIFT), etc. [9,10,11,12]. Li et al. [13] used linear divergence analysis (LDA) to extract spectral features and adaptive weighted filters (AWFs) to extract spatial information. After multiple iterations, these were fused with LBP features. The experimental results showed that the method extracts feature information more effectively. Zhang et al. [14] used Principal Component Analysis (PCA), Extended Morphological Profile (EMP), Differential Morphological Profiles (DMP), and Gabor filters to extract different features, and used a Support Vector Machine (SVM) and Gabor filters for fusion and classification, achieving good performance. The aforementioned methods combine traditional feature extractors, but the extracted features have limited representational ability. Liu et al. [15] first extracted EMP features from HSI whose dimensionality had been reduced by PCA, and then extracted Boolean Map based Saliency (BMS) visual features and fused them. This reduces data redundancy and improves feature extraction. However, BMS cannot extract edge information well because it pays little attention to object edges. Overall, the above methods do not fully extract the spatial features.

Once multi-feature extraction has laid the groundwork for HSI classification, the use of a high-performance classifier is also crucial. Deep forest is a new deep model proposed by Zhou et al. [11] as an alternative to DNN. Deep forest uses non-differentiable modules to build a deep model, and the modules can be decision trees, random forests, etc. Deep forest inherits the advantages of the decision-tree ensemble approach, has fewer hyper-parameters than deep neural networks, and its model complexity can be determined automatically in a data-dependent way. Deep forest consists of multi-grained scanning and a cascade forest. Multi-grained scanning further enhances the representational learning ability, potentially enabling deep forest to be contextually or structurally aware, while the number of cascade levels can be determined automatically, so that the model complexity adapts to the data. Yin et al. [12] applied deep forest to HSI classification and demonstrated its feasibility for this task. Liu et al. [13] proposed a deep multi-grained scanning cascade forest (dgcForest), which improves deep forest through deep multi-grained scanning to make it more suitable for HSI classification. However, dgcForest extracts features using only sliding windows of different sizes and does not make full use of the spatial information in HSI.

Extended morphological profile (EMP) is a simple and commonly used feature map method that can denoise images [19]. Saliency detection can highlight spatial scene features [20]. However, saliency detection weakens the boundary information of each area in HSI. Edge detection can detect the edge features of a feature map, which makes up for this deficiency of saliency detection. Thus, combining the three features is well motivated.

In this paper, we propose a multi-feature fusion based deep forest method for HSI classification. The main contributions of this paper are summarized as follows.

  1) Three deep multi-grained scanning branches in dgcForest are used to deeply extract EMP features, saliency features and edge features, which complement each other, to make full use of the spatial information of HSI.

  2) To enhance the features, a voting fusion method is used to fuse the three deeply extracted features, which links the features more closely than a decision-level fusion method.

2 Proposed Method

Reasonable multi-feature selection and efficient multi-feature extraction and fusion are essential for improving HSI classification. In this section, a novel framework is proposed to extract multiple complementary features in depth and fuse them effectively to improve the classification of HSI. The framework is shown in Fig. 1. Firstly, PCA is introduced to decrease the redundancy of HSI in the spectral channels. Secondly, following the roles played by EMP and Visual Saliency Detection (VSD) in previous studies, the EMP operation is applied to the global PCA image to remove noise and smooth the image. VSD is then applied to the EMP image to extract the more salient information, thereby reducing the impact of irrelevant background on HSI classification. However, VSD has difficulty detecting the edges of ground objects, which degrades the classification of pixels in edge areas. Therefore, edge detection (ED) is introduced to compensate for the loss of edge information caused by VSD. ED locates an edge by detecting dramatic changes in the gray value around a pixel, that is, by calculating the derivative of the gray value. Edge information is extracted from the EMP images in three steps. First, the EMP feature map is denoised by Gaussian blur. Second, a discrete differential operator, the Sobel operator, is used to calculate the approximate gradient of the image, and Non-Maximum Suppression (NMS) is used to keep the points with the locally largest gradient. Finally, edge points are selected by thresholding, and the edge maps are formed from all the edge points. The calculation process is shown in Eq. 1.

$$ \left\{ {\begin{array}{*{20}c} {C\left( E \right) = DUAL\_THRESHOLD\left( {NMS\left( {G_{f} } \right)} \right)} \\ {G_{f} = \left( {arctan\left( {\frac{{G_{x} }}{{G_{y} }}} \right),\sqrt {G_{x}^{2} + G_{y}^{2} } } \right)} \\ {G_{x} ,G_{y} = \frac{\partial f}{{\partial x}},\frac{\partial f}{{\partial y}}} \\ \end{array} } \right. $$
(1)

In Eq. 1, \({\text{C}}\left( {\text{E}} \right)\) represents the operation of edge detection on the EMP feature map, \({\text{DUAL}}\_{\text{THRESHOLD}}\) indicates dual threshold filtering in Canny edge detection (CED), \({\text{NMS}}\) is non-maximum suppression, and \({\text{G}}_{{\text{f}}}\) is the gradient of the pixel \(\left( {{\text{x}},{\text{ y}}} \right)\), built from the horizontal gradient \({\text{G}}_{{\text{x}}}\) and the vertical gradient \({\text{G}}_{{\text{y}}}\).
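The steps behind Eq. 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in the experiments: the 3x3 binomial blur, the function names, and the threshold values are all hypothetical choices, and the gradient direction is computed with the common `arctan2(G_y, G_x)` convention, which may differ from the ratio order written in Eq. 1.

```python
import numpy as np

def gaussian_blur(f):
    """Denoise with a separable 3x3 binomial kernel (an assumed stand-in for the Gaussian blur)."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    p = np.pad(f.astype(float), 1, mode="edge")
    rows = k[0] * p[:, :-2] + k[1] * p[:, 1:-1] + k[2] * p[:, 2:]
    return k[0] * rows[:-2] + k[1] * rows[1:-1] + k[2] * rows[2:]

def sobel_gradients(f):
    """Approximate G_x = df/dx and G_y = df/dy with 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = f.shape
    p = np.pad(f.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = p[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return gx, gy

def non_max_suppression(mag, gx, gy):
    """Keep a pixel only if it is a local maximum along its gradient direction."""
    h, w = mag.shape
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # direction folded into [0, 180)
    p = np.pad(mag, 1, mode="constant")             # p[r + 1, c + 1] == mag[r, c]
    out = np.zeros_like(mag)
    for r in range(h):
        for c in range(w):
            a = angle[r, c]
            if a < 22.5 or a >= 157.5:   # gradient roughly horizontal
                n1, n2 = p[r + 1, c], p[r + 1, c + 2]
            elif a < 67.5:               # roughly along the main diagonal
                n1, n2 = p[r, c], p[r + 2, c + 2]
            elif a < 112.5:              # roughly vertical
                n1, n2 = p[r, c + 1], p[r + 2, c + 1]
            else:                        # roughly along the anti-diagonal
                n1, n2 = p[r, c + 2], p[r + 2, c]
            if mag[r, c] >= n1 and mag[r, c] >= n2:
                out[r, c] = mag[r, c]
    return out

def dual_threshold(nms, low, high):
    """Keep strong pixels, plus weak pixels 8-connected to an already kept pixel."""
    candidate = nms >= low
    edges = nms >= high
    h, w = edges.shape
    while True:
        p = np.pad(edges, 1, mode="constant")
        grown = np.zeros_like(edges)
        for dr in range(3):
            for dc in range(3):
                grown |= p[dr:dr + h, dc:dc + w]
        new_edges = grown & candidate
        if np.array_equal(new_edges, edges):
            return edges
        edges = new_edges

def canny_sketch(f, low, high):
    """C(E) = DUAL_THRESHOLD(NMS(G_f)) applied to a single 2D feature map f."""
    gx, gy = sobel_gradients(gaussian_blur(f))
    mag = np.hypot(gx, gy)  # sqrt(G_x^2 + G_y^2)
    return dual_threshold(non_max_suppression(mag, gx, gy), low, high)
```

On a real EMP feature map the two thresholds would have to be tuned; the source does not state the values used.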

Although VSD and ED can extract semantic features, which are high-level features compared with texture features, traditional methods have difficulty extracting enough useful information from HSI. To extract multiple features effectively, our proposed method uses the efficient feature extractor of dgcForest, deep multi-grained scanning, to extract these three features separately. Firstly, each pixel of HSI is set as the center pixel, and a sub-block is obtained by choosing a neighbor region of this pixel with size \(\left( {2\;{*}\;{\text{w}} - {\text{l}}} \right){*}\left( {2\;{*}\;{\text{w}} - {\text{l}}} \right)\), where \({\text{w}}\) and \({\text{l}}\) are the size and step length of the multi-grained scanning window of dgcForest. This choice ensures that each neighbor region obtained by deep multi-grained scanning contains the central pixel. Next, these sub-blocks, which contain global and local spatial information, are fed into deep multi-grained scanning, and the three multi-grained scanning branches output three class probability vectors in parallel (random forest, the core component of deep multi-grained scanning, outputs a vector of the probabilities that the sample belongs to each class). Then, the three feature vectors are simply added to obtain the final deep feature vector containing global and local information. Finally, the fused vector is sent to the cascade forest to obtain the final prediction.
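The neighbor-region construction and the additive fusion described above can be illustrated with a short NumPy sketch. The function names, the reflect padding at image borders, and the requirement that \(2w - l\) be odd are assumptions made for illustration; the source does not specify how border pixels are handled.

```python
import numpy as np

def neighbor_region(image, row, col, w, l):
    """Return the (2w - l) x (2w - l) block centred on pixel (row, col).

    image has shape (H, W) or (H, W, C); borders use reflect padding (an assumption).
    """
    size = 2 * w - l
    if size % 2 == 0:
        raise ValueError("2*w - l must be odd so that the block has a unique centre")
    half = size // 2
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="reflect")
    # In padded coordinates the centre pixel sits at (row + half, col + half).
    return padded[row:row + size, col:col + size]

def window_starts(w, l):
    """Start offsets (along one axis) of a w-wide window slid with step l over the block."""
    size = 2 * w - l
    return list(range(0, size - w + 1, l))

def fuse_by_addition(prob_emp, prob_saliency, prob_edge):
    """Add the class probability vectors produced by the three scanning branches."""
    return prob_emp + prob_saliency + prob_edge
```

With, for example, `w = 3` and `l = 1`, the block is 5 x 5, its centre has offset 2, and the windows start at offsets 0, 1 and 2, so every window of width 3 covers the centre pixel. This is the property that the block size \(2w - l\) is chosen to guarantee.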

Fig. 1. The flowchart of our proposed method: mfdForest.

3 Experimental Results

This section presents the classification accuracy of several state-of-the-art methods and of three variants of our method that use different types of features. The performance of our proposed algorithm is demonstrated through extensive experiments on two well-known HSI datasets, using three evaluation metrics: average accuracy (AA), overall accuracy (OA) and the Kappa coefficient (Kappa). On both datasets, 10% of each category is selected as the training set and the remaining data are used for testing, as described in Table 1. The effects of the individual contributions of the proposed approach are also summarized in this section. All experiments were implemented on Ubuntu 16.04, with a Xeon E5-1603 CPU and a GeForce GTX 1080 Ti GPU.
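For reference, the three metrics can be computed from a confusion matrix as in the following sketch. It uses the standard definitions of OA, AA and Cohen's Kappa; the function names are illustrative, and it assumes that every class has at least one true sample in the test set.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts the samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def oa_aa_kappa(cm):
    """Return overall accuracy, average accuracy and the Kappa coefficient."""
    cm = cm.astype(float)
    total = cm.sum()
    oa = np.trace(cm) / total                     # fraction of correctly labelled samples
    per_class = np.diag(cm) / cm.sum(axis=1)      # accuracy of each class (assumes no empty class)
    aa = per_class.mean()                         # unweighted mean over classes
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```

Because AA weights every class equally while OA weights every sample equally, the two can move differently when a change mainly affects small classes.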

Table 1. Details of the two data sets.

To ensure the validity of the experiments, the parameters involved in our method are kept consistent with those of the comparison methods: ACGAN [18], ResNet [19], and dgcForest [14]. The parameters involved are the number of principal components in PCA, the number of morphological opening and closing operations in EMP, the size of the neighbor region in dgcForest, the number of decision trees in each random forest and completely random forest, and the number of random forests and completely random forests in deep multi-grained scanning and in the cascade forest. To keep the experiments consistent, these parameters are set to the same values. The only difference is that the number of cascade forest layers is increased to 2. Table 2 shows the parameter settings. The number of features selected by the first layer of the cascade forest differs between datasets. Except for the experiment on selecting different numbers of features, all experiments use the optimal number for the corresponding dataset.

Table 2. Parameters setting of proposed method.

To evaluate the contribution of edge detection, two sets of experiments are prepared. It can be seen from Fig. 2, Fig. 3, Table 3 and Table 4 that adding feature extraction branches improves OA, AA, and Kappa on both datasets. EMP, EMP + Saliency, and mfdForest denote variants of our algorithm that use different feature extraction branches: EMP uses only the extended morphological profile feature, EMP + Saliency uses the extended morphological profile and Boolean Map Saliency Detection features, and mfdForest uses the extended morphological profile, Boolean Map Saliency Detection, and edge detection features together.

In Fig. 2 and Fig. 3, our algorithm achieves a higher OA than the other methods on both data sets. Although our method has a lower accuracy than ResNet on the Indian Pines data set when fewer than three branches are used, the three-branch version still achieves the best performance. Moreover, the accuracy of our method increases as more extraction branches are used. As can be seen from Table 3 and Table 4, the classification accuracy increases as the morphological features, saliency features, and edge features are added in turn. This shows that each feature we chose, and the feature fusion method we used, plays a significant role. In Table 3 and Table 4, our method is also superior to the other methods in OA, AA and Kappa, although AA changes only slightly on the Indian Pines data set when two extraction branches are used. This shows that saliency detection does not improve classification for some classes, which is an issue we need to address in future work.

In terms of running time in Table 3 and Table 4, which is the total time of training and testing, our improved algorithm is slower than dgcForest, but the running time does not increase by a multiple, and the accuracy in both cases is higher than that of dgcForest, indicating that the three extracted features improve the classification accuracy and complement each other.

Fig. 2. OA of different methods in Indian Pines.

Fig. 3. OA of different methods in Salinas.

Table 3. Accuracy comparison of different algorithms on the Indian Pines data set.
Table 4. Accuracy comparison of different algorithms on the Salinas data set.

4 Conclusion

In this paper, we introduce a multi-feature fusion based deep forest that extracts and fuses three feature maps of the HSI to improve classification accuracy. For a given pixel, even if one feature map does not improve its classification, one of the other two features may have a better effect, so the three features can work together to get the best result for each pixel. However, using three branches increases the size of the model, which makes the running time longer than that of dgcForest. This problem can be solved by parallel computing, and we will run the three branches of mfdForest in parallel in future work.