
1 Introduction

Hyperspectral data are three-dimensional images that combine spatial information with abundant spectral information, which has driven great progress of hyperspectral imaging in the field of remote sensing [1, 2]. Multidimensional spectral bands can improve the representation of ground objects. However, not all bands play an important role in hyperspectral image processing [3], because high-dimensional hyperspectral data can cause many problems, such as information redundancy, noisy bands and the Hughes phenomenon. Therefore, an effective dimension-reduction method is necessary for hyperspectral data. Traditional dimensionality-reduction methods fall into two types: band selection and feature extraction [5,6,7]. Compared with feature extraction, band selection preserves the original information of the bands as much as possible.

Research on band selection can be divided into two directions: supervised band selection [8, 9] and unsupervised band selection [10,11,12]. Supervised methods select a spectral band subset that has a strong correlation with the class labels. However, labelling hyperspectral images is time-consuming, so unsupervised band selection is more widely applicable. In this paper, we propose an unsupervised band selection method that selects a low-redundancy band subset with the maximum amount of information.

The unsupervised band selection process mainly involves two components: a selection criterion and a search strategy. Existing unsupervised methods can be divided into ranking-based methods [13], clustering-based methods [17], and greedy algorithms [15]. However, most of these methods only consider the properties of the band combination while ignoring the latent structure between spectral bands in the high-dimensional space [3], which leads to a lack of global considerations. In addition, most band selection methods do not consider the correlation between pairwise bands from the spatial point of view, although spatial information helps to make the measurement of spectral redundancy more comprehensive. In [3], Yuan et al. proposed a multi-graph model that measures the similarity between spectral bands by spectral clustering. However, spectral clustering has very high computational complexity and memory requirements. In view of this, this paper proposes a double-graph structure that describes the complex correlations between spectral bands using both the pixel information and the neighbourhood information of pixels.

In addition, selecting a low-redundancy band subset from a graph model with traditional search methods is challenging. The determinantal point process (DPP) [16] is a probabilistic model that can select a diverse subset of features, and the k-DPP [17], an improved version of the DPP, can select a low-redundancy band subset of dimension k. However, the original k-DPP can only sample from a single graph model, without full use of the spatial information of hyperspectral images, and it does not consider the physical properties of each band during selection.

To solve the above problems, a double-graph DPP model is proposed that makes full use of the pixel information and its neighbourhood information. To obtain a high-performance band subset, a spectral band selection criterion, maximum band subset information [4], is used, meaning that the selected spectral band subset carries a rich amount of information.

2 The Proposed Framework

2.1 Hyperspectral Data Representation

A hyperspectral dataset is represented by \( B = \left\{ {b_{1} ,b_{2} ,b_{3} , \ldots ,b_{l} } \right\} \in R^{n \times l} \), where n is the number of pixels in each spectral band and l is the total number of spectral bands; \( b_{i} \left( {1 \le i \le l} \right) \) denotes the i-th spectral band. Given a pixel \( p_{x}^{j} \), we use \( \hat{p}_{x}^{j} \) to denote the mean of the neighbourhood of \( p_{x}^{j} \). The neighbourhood information can then be written as a new dataset \( {\hat{B}} = \left\{ {\hat{b}_{1} ,\hat{b}_{2} ,\hat{b}_{3} , \ldots ,\hat{b}_{l} } \right\} \in R^{s \times l} \), where s is the dimension of each \( \hat{b}_{i} \left( {1 \le i \le l} \right) \).
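As an illustration of this representation (our own sketch, not code from the paper), the following builds B and \( \hat{B} \) from a hyperspectral cube with numpy, taking \( \hat{p}_{x}^{j} \) to be the mean over a small square window so that here s = n; the function names and the window size are our own choices:

```python
import numpy as np

def neighbourhood_mean(img, win=3):
    """Mean over a win x win spatial window for one band (edge-padded)."""
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    H, W = img.shape
    out = np.zeros((H, W))
    for dy in range(win):
        for dx in range(win):
            out += padded[dy:dy + H, dx:dx + W]
    return out / (win * win)

def build_band_matrices(cube, win=3):
    """Flatten a cube of shape (H, W, l) into B (raw pixels, n x l)
    and B_hat (neighbourhood means, s x l with s = n here)."""
    H, W, l = cube.shape
    B = cube.reshape(H * W, l)
    B_hat = np.stack(
        [neighbourhood_mean(cube[:, :, i], win).ravel() for i in range(l)],
        axis=1,
    )
    return B, B_hat
```

Each column of the two matrices is then one vertex of the corresponding graph in Sect. 2.2.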

2.2 Double Graph Model

In [3], a multi-graph structure built with spectral clustering is proposed that can effectively represent the relationships between bands. In this section, we construct a double-graph model to capture the complex relationships between pairwise spectral bands together with the statistical properties of the spatial domain. In each graph, every vertex represents a band \( b_{i} \) and every edge corresponds to the correlation between a pair of bands. The correlations of the vertices can be described by an adjacency matrix, which we define as a Gram matrix. The adjacency matrix of the bands is defined as follows:

$$ L_{B} = B^{T} B $$
(1)

The adjacency matrix of the neighbourhood information is expressed as follows:

$$ L_{{\hat{B}}} = \hat{B}^{T} \hat{B} $$
(2)

\( L_{B} \) and \( L_{{\hat{B}}} \) are both l × l positive semidefinite similarity matrices. According to reference [18], it is clear that

$$ \det \left( {L_{Y} } \right) = {\text{Vol}}^{2} \left( {\left\{ {{\text{b}}_{i} } \right\}_{i \in Y} } \right) $$
(3)

where \( { \det }\left( {{\text{L}}_{Y} } \right) \) equals the squared k-dimensional volume of the parallelepiped spanned by the columns \( \left\{ {{\text{b}}_{i} } \right\}_{i \in Y} \) of B corresponding to the bands in subset Y, and k is the size of subset Y.
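Equations (1)–(3) can be checked numerically. The snippet below (our illustration, with arbitrary random data) builds the Gram matrix of Eq. (1), verifies that it is positive semidefinite, and confirms the determinant-volume identity of Eq. (3) via a QR decomposition of the selected band columns:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 6))        # n = 100 pixels, l = 6 bands
L_B = B.T @ B                            # Eq. (1): l x l Gram matrix

# L_B is symmetric positive semidefinite: all eigenvalues >= 0
eigvals = np.linalg.eigvalsh(L_B)
assert eigvals.min() >= -1e-9

# Eq. (3): det(L_Y) equals the squared volume spanned by the selected bands.
Y = [0, 2, 5]
L_Y = L_B[np.ix_(Y, Y)]
# the same volume via the QR decomposition of the selected columns:
# det(B_Y^T B_Y) = det(R)^2 = (prod of diagonal entries of R)^2
R = np.linalg.qr(B[:, Y], mode="r")
vol_sq = np.prod(np.diag(R)) ** 2
assert np.isclose(np.linalg.det(L_Y), vol_sq)
```

A large determinant thus corresponds to a band subset whose columns are close to mutually orthogonal, i.e. low-redundancy.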

2.3 Search Criterion

To select a band subset with rich information, the evaluation criteria of maximum information entropy [4] is used here. The information entropy of the spectral band subset is defined as:

$$ {\text{max }}H_{k} = - \frac{1}{k}\sum\nolimits_{i = 1}^{k} {\sum\nolimits_{x = 1}^{n} {P\left( {p_{x}^{i} } \right)\log P\left( {p_{x}^{i} } \right)} } $$
(4)

where \( p_{x}^{i} \) denotes the pixel values of the i-th spectral band and k is the size of the band subset. Equation (4) serves as the evaluation criterion for the performance of each band subset, and is called MI for short.
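Per band, Eq. (4) reduces to the Shannon entropy of the pixel-value distribution. A minimal sketch, assuming \( P\left( {p_{x}^{i} } \right) \) is estimated with a 256-bin histogram (a common choice; the paper does not specify the estimator):

```python
import numpy as np

def band_entropy(band, bins=256):
    """Shannon entropy of one spectral band from its pixel-value histogram."""
    hist, _ = np.histogram(band, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                         # drop empty bins: 0 * log(0) := 0
    return -np.sum(p * np.log2(p))

def subset_entropy(B, Y):
    """Eq. (4): mean entropy H_k over the bands indexed by Y (columns of B)."""
    return np.mean([band_entropy(B[:, i]) for i in Y])
```

Since the per-band entropies do not depend on which subset a band lands in, they can be computed once and reused as a lookup table, as noted in Sect. 3.3.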

2.4 MI-DPP Band Selection Method

The k-DPP [14] can be expressed as follows:

$$ \sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} { \det } \left( {L_{{Y^{\prime } }} } \right) = \det \left( {L + I} \right)\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} {\text{P}_{L} } \left( {Y^{\prime } } \right) $$
(5)

where \( \text{P}_{L} \left( {Y^{\prime } } \right) \) denotes the probability that the DPP assigns to a band subset \( Y^{\prime } \) of dimension k. According to Eq. (5), the k-DPP can be expressed as follows:

$$ P_{L}^{k} \left( Y \right) = \frac{{{ \det }\left( {{\text{L}}_{Y} } \right)}}{{\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} { \det } \left( {L_{{Y^{\prime } }} } \right)}} $$
(6)

According to reference [13], we obtain

$$ P_{L}^{k} \left( Y \right) = \frac{1}{{e_{k}^{N} }}\det \left( {L + I} \right)P_{L} \left( Y \right) = \frac{1}{{e_{k}^{N} }}\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} {P^{{V_{{Y^{\prime } }} }} \left( Y \right)\prod\nolimits_{{n \in Y^{\prime } }} {\lambda_{n} } } $$
(7)

Equation (7) is the mathematical expression of the k-DPP, and shows that \( P_{L}^{k} \) is a mixture of elementary DPPs.
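The normaliser \( e_{k}^{N} \) in Eq. (7) is the k-th elementary symmetric polynomial of the eigenvalues of L. It can be evaluated stably with the standard recursion, sketched below (function name ours):

```python
import numpy as np

def elem_sym_poly(lam, k):
    """Elementary symmetric polynomials e_0..e_k of the eigenvalues lam,
    via the recursion E[j][n] = E[j][n-1] + lam[n-1] * E[j-1][n-1],
    where E[j][n] is e_j over the first n eigenvalues."""
    N = len(lam)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0
    for j in range(1, k + 1):
        for n in range(1, N + 1):
            E[j, n] = E[j, n - 1] + lam[n - 1] * E[j - 1, n - 1]
    return E

lam = np.array([1.0, 2.0, 3.0])
E = elem_sym_poly(lam, 2)
# e_2 over all three eigenvalues: 1*2 + 1*3 + 2*3 = 11
assert np.isclose(E[2, 3], 11.0)
```

The same table of partial sums also drives the eigenvector-selection phase of k-DPP sampling.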

With full use of the spatial information, our algorithm can be written as follows:

$$ P_{{L_{B} ,\,L_{{\hat{B}}} }}^{k} \left( Y \right) = \frac{u}{{e_{B,\,k}^{N} }}\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} {P^{{V_{{Y^{\prime } }}^{B} }} \left( Y \right)\prod\nolimits_{{n \in Y^{\prime } }} {\lambda_{n}^{B} } } + \frac{{\hat{u}}}{{e_{{\hat{B},k}}^{N} }}\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} {P^{{V_{{Y^{\prime } }}^{{\hat{B}}} }} \left( Y \right)\prod\nolimits_{{n \in Y^{\prime } }} {\lambda_{n}^{{\hat{B}}} } } $$
(8)

where u and \( \hat{u} \) are scalars that balance the weights of the two k-DPPs. Equation (8) shows that our algorithm is a mixture of two elementary DPPs, \( P^{{V_{Y}^{B} }} \) and \( P^{{V_{Y}^{{\hat{B}}} }} \), where \( \lambda_{n}^{B} \) and \( \lambda_{n}^{{\hat{B}}} \) are the eigenvalues of the adjacency matrices \( L_{B} \) and \( L_{{\hat{B}}} \), respectively. \( e_{B,k}^{N} = \sum\nolimits_{\left| Y \right| = k} {\prod\nolimits_{n \in Y} {\lambda_{n}^{B} } } \) and \( e_{{\hat{B},k}}^{N} = \sum\nolimits_{\left| Y \right| = k} {\prod\nolimits_{n \in Y} {\lambda_{n}^{{\hat{B}}} } } \) are the elementary symmetric polynomials of the eigenvalues of \( L_{B} \) and \( L_{{\hat{B}}} \), respectively. \( P_{{L_{B} ,L_{{\hat{B}}} }}^{k} \left( Y \right) \) is the probability of sampling the band subset Y. Combined with the search criterion MI introduced in Sect. 2.3, we call the proposed algorithm MI-DPP. Details of the sampling process of MI-DPP are summarized in Algorithm 1.

Algorithm 1. The sampling process of MI-DPP.

We repeat Algorithm 1 α times to obtain α spectral band subsets Y of dimension k, and then select, among them, the subset that maximizes the search criterion \( H_{k} \). With the proposed MI-DPP, the selected band subset enjoys both low redundancy and high information.
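The paper's Algorithm 1 is given as a figure and is not reproduced here. As a rough, hedged sketch of the machinery it builds on, the standard two-phase k-DPP sampler (first choose k eigenvectors, then sample one band per step) can be written as below; function names and implementation details are our own, not the authors' code. Drawing from the mixture in Eq. (8) then amounts to sampling from the k-DPP of \( L_{B} \) with probability u or from that of \( L_{{\hat{B}}} \) with probability \( \hat{u} \), repeating α times, and keeping the subset with the largest \( H_{k} \):

```python
import numpy as np

def elem_sym_poly(lam, k):
    """Partial elementary symmetric polynomials E[j][n] of eigenvalues lam."""
    N = len(lam)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0
    for j in range(1, k + 1):
        for n in range(1, N + 1):
            E[j, n] = E[j, n - 1] + lam[n - 1] * E[j - 1, n - 1]
    return E

def sample_kdpp(L, k, rng):
    """Sample a size-k band subset from a k-DPP with PSD kernel L."""
    lam, vecs = np.linalg.eigh(L)
    E = elem_sym_poly(lam, k)
    # Phase 1: choose k eigenvectors, scanning eigenvalues from last to first
    chosen, j = [], k
    for n in range(len(lam), 0, -1):
        if j == 0:
            break
        if rng.random() < lam[n - 1] * E[j - 1, n - 1] / E[j, n]:
            chosen.append(n - 1)
            j -= 1
    V = vecs[:, chosen]
    # Phase 2: sample one item per step, then project V off that coordinate
    Y = []
    while V.shape[1] > 0:
        p = np.sum(V ** 2, axis=1) / V.shape[1]
        i = rng.choice(len(p), p=p / p.sum())
        Y.append(int(i))
        # eliminate coordinate i from the span of V, drop one column
        col = np.nonzero(np.abs(V[i, :]) > 1e-12)[0][0]
        v = V[:, col] / V[i, col]
        V = V - np.outer(v, V[i, :])
        V = np.delete(V, col, axis=1)
        if V.shape[1] > 0:               # re-orthonormalise remaining columns
            V, _ = np.linalg.qr(V)
    return sorted(Y)
```

Under this sketch, each draw costs one eigendecomposition plus the two sampling phases, matching the complexity discussed in Sect. 3.3.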

3 Experimental Result

In this section, we test the performance of the proposed algorithm and compare it with several well-known unsupervised selection algorithms, including MIC [18] and Lscore [19].

3.1 Hyperspectral Datasets

AVIRIS sensor: Indian Pines dataset

The Indian Pines dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992 [20]. The image covers the Indian Pines test site in north-western Indiana and has 220 spectral bands covering the spectral range 0.4–2.5 \( \upmu{\text{m}} \). Twenty water-absorption bands ([104–108], [150–163], and 220) are removed, and the remaining 200 spectral bands form the dataset. It has 145 × 145 spatial pixels and a ground truth with 16 classes.

3.2 Experimental Parameter Setup

In this section, we introduce the parameter settings. The weight scalars are set to u = \( \hat{u} \) = 0.5, and the number of iterations is α = 5. The number of selected bands increases from 10 to 100 in steps of 10. We use the SVM algorithm [21] to test the performance of the selected bands. The parameters C and \( \gamma \) of the RBF kernel are searched in the ranges \( 10^{\{1:4\}} \) and \( 2^{\{-4:4\}} \), respectively, and determined by five-fold cross-validation. In all experiments, 10% of the samples are randomly selected for training, and the rest are used for testing. Table 1 lists the numbers of training and test samples for the Indian Pines dataset. The experimental results are evaluated with three widely used indexes: overall accuracy (OA), average accuracy (AA) and the kappa coefficient [22].
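This evaluation protocol can be sketched with scikit-learn (the paper's experiments were run in Matlab, so this is only an illustrative re-implementation; `evaluate_bands` and its inputs X — the labelled pixels restricted to the selected bands — and y are our own names):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def evaluate_bands(X, y, seed=0):
    """Train an RBF-kernel SVM on 10% of the labelled pixels and report
    overall accuracy (OA) on the rest, mirroring the protocol above."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.1, stratify=y, random_state=seed)
    grid = {"C": [10.0 ** e for e in range(1, 5)],       # C in 10^{1:4}
            "gamma": [2.0 ** e for e in range(-4, 5)]}   # gamma in 2^{-4:4}
    clf = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```

AA and kappa would be computed from the test-set confusion matrix in the same way; only OA is shown here for brevity.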

Table 1. The numbers of training and test samples for each category of the Indian Pines image

3.3 Runtime Analysis

As shown in [18], the k-DPP has a time complexity of \( {\mathcal{O}}\left( {Nk^{3} } \right) \), and it takes \( {\mathcal{O}}\left( {N^{3} } \right) \) time to decompose each adjacency kernel matrix of the bands. The MI-DPP algorithm therefore takes \( {\mathcal{O}}\left( {nN^{3} + nNk^{3} } \right) \) time in total to sample k bands, where n is the number of similarity kernel matrices. The computation time increases greatly when N and k are large; however, N and k are normally small in the selection process. We pre-calculate the information entropy of each band and build a lookup table for the Indian Pines dataset, which takes only 0.05 s. All experiments are carried out in Matlab on an Intel CPU E5-1620 @ 3.5 GHz with 32 GB RAM.

In Fig. 1, the computing time of the three algorithms is shown as the number of selected spectral bands increases from 10 to 100. The calculation time of the proposed MI-DPP is clearly the lowest of the three when the dimension of the band subset is low. As the number of selected bands increases, the calculation time of MIC and Lscore does not change significantly, whereas that of MI-DPP increases with the dimension of the selected subset. In general, the proposed MI-DPP takes less time than the other band selection algorithms.

Fig. 1.
figure 1

Runtime of MI-DPP, MIC, Lscore on the Indian Pines

3.4 Classification Results

In this subsection, we test the performance of the selected spectral band subsets on the Indian Pines image. In Fig. 2, we compare the classification accuracy of the band subsets selected by the different algorithms. When the dimension of the band subset is in the range 10–70, the proposed MI-DPP is significantly better than the other algorithms, and Lscore performs the worst. When the dimension is in the range 70–100, the performances of the three algorithms are similar, because the amount of information in the selected bands grows as their number increases. In the following, we analyse the performance of MI-DPP at low dimensions in detail. We set the dimension of the selected band subset to 10 and perform the classification tests; the results are shown in Table 2. In Fig. 3, we show the classification maps of the three band selection algorithms on Indian Pines. Among the three algorithms, Lscore obtains the worst result, while the band subset selected by the proposed MI-DPP achieves significantly improved classification accuracy in all categories. This is because the spectral bands selected by MI-DPP are more diverse and less redundant, and therefore provide more abundant information.

Fig. 2.
figure 2

Classification accuracy of the three band selection algorithms on Indian Pines

Table 2. Classification accuracy on the Indian Pines image for the band sets selected by Lscore, MIC and the proposed MI-DPP
Fig. 3.
figure 3

Classification maps of three band selection algorithms on Indian Pines

4 Conclusion

In this paper, an unsupervised band selection algorithm called MI-DPP is proposed for hyperspectral images. We use the maximum-information evaluation criterion to assess spectral band performance, aiming to retain the maximum band information. A double-graph model is used to capture the complex relationships between bands, and an improved k-DPP is proposed to sample a diverse, low-redundancy band subset from the double-graph structure. The experimental results on the Indian Pines dataset show that the proposed algorithm performs well in the band selection task.