
1 Introduction

Hyperspectral data are three-dimensional images that combine spatial information with abundant spectral information, which has driven great progress of hyperspectral imaging in the field of remote sensing [1, 2]. Multidimensional spectral bands can improve the representation of ground objects. However, not all bands play an important role in hyperspectral image processing [3], because high-dimensional hyperspectral data can cause many problems, such as information redundancy, noisy bands and the Hughes phenomenon. Therefore, an effective dimension-reduction method is necessary for hyperspectral data. Traditional dimensionality-reduction methods fall into two types: band selection and feature extraction [5,6,7]. Compared with feature extraction, band selection preserves the original information of the bands as much as possible.

Research on band selection can be divided into two directions: supervised band selection [8, 9] and unsupervised band selection [10,11,12]. Supervised methods select a spectral band subset that has a strong correlation with the class labels. However, labelling hyperspectral images is time-consuming, so unsupervised band selection is more widely applicable. In this paper, we propose an unsupervised band selection method that selects a low-redundancy band subset with the maximum amount of information.

The unsupervised band selection process mainly involves two components: a selection criterion and a search strategy. Existing unsupervised methods can be divided into ranking-based methods [13], clustering-based methods [17], and greedy algorithms [15]. However, most of these methods only consider the properties of the band combination while ignoring the latent structure between spectral bands in the high-dimensional space [3], which leads to a lack of global considerations. In addition, most band selection methods do not consider the correlation between pairwise bands from the spatial point of view, although spatial information helps to make the measurement of spectral redundancy more comprehensive. In [3], Yuan et al. proposed a multi-graph model that measures the similarity between spectral bands by spectral clustering. However, spectral clustering has very high computational complexity and memory requirements. In view of this, this paper proposes a double-graph structure that describes the complex correlations between spectral bands using both the pixel information and the neighbourhood information of pixels.

In addition, selecting a low-redundancy band subset from a graph model with traditional search methods is challenging. The determinantal point process (DPP) [16] is a probabilistic model that can select a diverse subset of features, and the k-DPP [17], an improved version of the DPP, can select a low-redundancy band subset of dimension k. However, the original k-DPP can only sample from a single graph model, without full use of the spatial information of hyperspectral images, and it does not consider the physical properties of each band during selection.

To solve the above problems, a double-graph DPP model is proposed that makes full use of the pixel information and its neighbourhood information. To obtain a high-performance band subset, a spectral band selection criterion, maximum band subset information [4], is used, meaning that the selected spectral band subset carries a rich amount of information.

2 The Proposed Framework

2.1 Hyperspectral Data Representation

A hyperspectral dataset is represented by \( B = \left\{ {b_{1} ,b_{2} ,b_{3} , \ldots ,b_{l} } \right\} \in R^{n \times l} \), where n is the number of pixels in each spectral band and l is the total number of spectral bands; \( b_{i} \left( {1 \le i \le l} \right) \) denotes the i-th spectral band. Given a pixel \( p_{x}^{j} \), we use \( \hat{p}_{x}^{j} \) to denote the mean of the neighbourhood of \( p_{x}^{j} \). The neighbourhood information can then be written as a new dataset \( {\hat{B}} = \left\{ {\hat{b}_{1} ,\hat{b}_{2} ,\hat{b}_{3} , \ldots ,\hat{b}_{l} } \right\} \in R^{s \times l} \), where s is the dimension of each \( \hat{b}_{i} \left( {1 \le i \le l} \right) \).
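As an illustration of this representation (our own sketch, not code from the paper), the following builds B and \( \hat{B} \) from a hyperspectral cube with numpy, taking \( \hat{p}_{x}^{j} \) to be the mean over a small square window so that here s = n; the function names and the window size are our own choices:

```python
import numpy as np

def neighbourhood_mean(img, win=3):
    """Mean over a win x win spatial window for one band (edge-padded)."""
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    H, W = img.shape
    out = np.zeros((H, W))
    for dy in range(win):
        for dx in range(win):
            out += padded[dy:dy + H, dx:dx + W]
    return out / (win * win)

def build_band_matrices(cube, win=3):
    """Flatten a cube of shape (H, W, l) into B (raw pixels, n x l)
    and B_hat (neighbourhood means, s x l with s = n here)."""
    H, W, l = cube.shape
    B = cube.reshape(H * W, l)
    B_hat = np.stack(
        [neighbourhood_mean(cube[:, :, i], win).ravel() for i in range(l)],
        axis=1,
    )
    return B, B_hat
```

Each column of the two matrices is then one vertex of the corresponding graph in Sect. 2.2.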

2.2 Double Graph Model

In [3], a multi-graph structure built with spectral clustering is proposed that can effectively represent the relationships between bands. In this section, we construct a double-graph model to capture the complex relationships between pairwise spectral bands together with the statistical properties of the spatial domain. In each graph, every vertex represents a band \( b_{i} \) and every edge corresponds to the correlation between a pair of bands. The correlations of the vertices can be described by an adjacency matrix, which we define as a Gram matrix. The adjacency matrix of the bands is defined as follows:

$$ L_{B} = B^{T} B $$
(1)

The adjacency matrix of the neighbourhood information is expressed as follows:

$$ L_{{\hat{B}}} = \hat{B}^{T} \hat{B} $$
(2)

\( L_{B} \) and \( L_{{\hat{B}}} \) are both l × l positive semidefinite similarity matrices. According to reference [18], it is clear that

$$ \det \left( {L_{Y} } \right) = {\text{Vol}}^{2} \left( {\left\{ {{\text{b}}_{i} } \right\}_{i \in Y} } \right) $$
(3)

where \( { \det }\left( {{\text{L}}_{Y} } \right) \) equals the squared k-dimensional volume of the parallelepiped spanned by the columns \( \left\{ {{\text{b}}_{i} } \right\}_{i \in Y} \) of B corresponding to the bands in subset Y, and k is the size of subset Y.
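Equations (1)–(3) can be checked numerically. The snippet below (our illustration, with arbitrary random data) builds the Gram matrix of Eq. (1), verifies that it is positive semidefinite, and confirms the determinant-volume identity of Eq. (3) via a QR decomposition of the selected band columns:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 6))        # n = 100 pixels, l = 6 bands
L_B = B.T @ B                            # Eq. (1): l x l Gram matrix

# L_B is symmetric positive semidefinite: all eigenvalues >= 0
eigvals = np.linalg.eigvalsh(L_B)
assert eigvals.min() >= -1e-9

# Eq. (3): det(L_Y) equals the squared volume spanned by the selected bands.
Y = [0, 2, 5]
L_Y = L_B[np.ix_(Y, Y)]
# the same volume via the QR decomposition of the selected columns:
# det(B_Y^T B_Y) = det(R)^2 = (prod of diagonal entries of R)^2
R = np.linalg.qr(B[:, Y], mode="r")
vol_sq = np.prod(np.diag(R)) ** 2
assert np.isclose(np.linalg.det(L_Y), vol_sq)
```

A large determinant thus corresponds to a band subset whose columns are close to mutually orthogonal, i.e. low-redundancy.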

2.3 Search Criterion

To select a band subset with rich information, the evaluation criteria of maximum information entropy [4] is used here. The information entropy of the spectral band subset is defined as:

$$ {\text{max }}H_{k} = - \frac{1}{k}\sum\nolimits_{i = 1}^{k} {\sum\nolimits_{x = 1}^{n} {P\left( {p_{x}^{i} } \right)\log P\left( {p_{x}^{i} } \right)} } $$
(4)

where \( p_{x}^{i} \) denotes the pixel values of the i-th spectral band and k is the size of the band subset. Equation (4) serves as the evaluation criterion for the performance of each band subset, and is called MI for short.
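Per band, Eq. (4) reduces to the Shannon entropy of the pixel-value distribution. A minimal sketch, assuming \( P\left( {p_{x}^{i} } \right) \) is estimated with a 256-bin histogram (a common choice; the paper does not specify the estimator):

```python
import numpy as np

def band_entropy(band, bins=256):
    """Shannon entropy of one spectral band from its pixel-value histogram."""
    hist, _ = np.histogram(band, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                         # drop empty bins: 0 * log(0) := 0
    return -np.sum(p * np.log2(p))

def subset_entropy(B, Y):
    """Eq. (4): mean entropy H_k over the bands indexed by Y (columns of B)."""
    return np.mean([band_entropy(B[:, i]) for i in Y])
```

Since the per-band entropies do not depend on which subset a band lands in, they can be computed once and reused as a lookup table, as noted in Sect. 3.3.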

2.4 MI-DPP Band Selection Method

The k-DPP [14] can be expressed as follows:

$$ \sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} { \det } \left( {L_{{Y^{\prime } }} } \right) = \det \left( {L + I} \right)\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} {\text{P}_{L} } \left( {Y^{\prime } } \right) $$
(5)

where \( \text{P}_{L} \left( {Y^{\prime } } \right) \) denotes the probability that the DPP assigns to a band subset \( Y^{\prime } \) of dimension k. According to Eq. (5), the k-DPP can be expressed as follows:

$$ P_{L}^{k} \left( Y \right) = \frac{{{ \det }\left( {{\text{L}}_{Y} } \right)}}{{\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} { \det } \left( {L_{{Y^{\prime } }} } \right)}} $$
(6)

According to reference [13], we obtain

$$ P_{L}^{k} \left( Y \right) = \frac{1}{{e_{k}^{N} }}\det \left( {L + I} \right)P_{L} \left( Y \right) = \frac{1}{{e_{k}^{N} }}\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} {P^{{V_{{Y^{\prime } }} }} \left( Y \right)\prod\nolimits_{{n \in Y^{\prime } }} {\lambda_{n} } } $$
(7)

Equation (7) is the mathematical expression of the k-DPP, and shows that \( P_{L}^{k} \) is a mixture of elementary DPPs.
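The normaliser \( e_{k}^{N} \) in Eq. (7) is the k-th elementary symmetric polynomial of the eigenvalues of L. It can be evaluated stably with the standard recursion, sketched below (function name ours):

```python
import numpy as np

def elem_sym_poly(lam, k):
    """Elementary symmetric polynomials e_0..e_k of the eigenvalues lam,
    via the recursion E[j][n] = E[j][n-1] + lam[n-1] * E[j-1][n-1],
    where E[j][n] is e_j over the first n eigenvalues."""
    N = len(lam)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0
    for j in range(1, k + 1):
        for n in range(1, N + 1):
            E[j, n] = E[j, n - 1] + lam[n - 1] * E[j - 1, n - 1]
    return E

lam = np.array([1.0, 2.0, 3.0])
E = elem_sym_poly(lam, 2)
# e_2 over all three eigenvalues: 1*2 + 1*3 + 2*3 = 11
assert np.isclose(E[2, 3], 11.0)
```

The same table of partial sums also drives the eigenvector-selection phase of k-DPP sampling.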

With full use of the spatial information, our algorithm can be written as follows:

$$ P_{{L_{B} ,\,L_{{\hat{B}}} }}^{k} \left( Y \right) = \frac{u}{{e_{B,\,k}^{N} }}\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} {P^{{V_{{Y^{\prime } }}^{B} }} \left( Y \right)\prod\nolimits_{{n \in Y^{\prime } }} {\lambda_{n}^{B} } } + \frac{{\hat{u}}}{{e_{{\hat{B},k}}^{N} }}\sum\nolimits_{{\left| {Y^{\prime } } \right| = k}} {P^{{V_{{Y^{\prime } }}^{{\hat{B}}} }} \left( Y \right)\prod\nolimits_{{n \in Y^{\prime } }} {\lambda_{n}^{{\hat{B}}} } } $$
(8)

where u and \( \hat{u} \) are scalars that balance the weights of the two k-DPPs. Equation (8) shows that our algorithm is a mixture of two elementary DPPs, \( P^{{V_{Y}^{B} }} \) and \( P^{{V_{Y}^{{\hat{B}}} }} \), where \( \lambda_{n}^{B} \) and \( \lambda_{n}^{{\hat{B}}} \) are the eigenvalues of the adjacency matrices \( L_{B} \) and \( L_{{\hat{B}}} \), respectively. \( e_{B,k}^{N} = \sum\nolimits_{\left| Y \right| = k} {\prod\nolimits_{n \in Y} {\lambda_{n}^{B} } } \) and \( e_{{\hat{B},k}}^{N} = \sum\nolimits_{\left| Y \right| = k} {\prod\nolimits_{n \in Y} {\lambda_{n}^{{\hat{B}}} } } \) are the elementary symmetric polynomials of the eigenvalues of \( L_{B} \) and \( L_{{\hat{B}}} \), respectively. \( P_{{L_{B} ,L_{{\hat{B}}} }}^{k} \left( Y \right) \) is the probability of sampling the band subset Y. Combined with the search criterion MI introduced in Sect. 2.3, we call the proposed algorithm MI-DPP. Details of the sampling process of MI-DPP are summarized in Algorithm 1.

Algorithm 1. The sampling process of MI-DPP.

We repeat Algorithm 1 α times to obtain α spectral band subsets Y of dimension k, and then select, among them, the subset that maximizes the search criterion \( H_{k} \). With the proposed MI-DPP, the selected band subset enjoys both low redundancy and high information.
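The paper's Algorithm 1 is given as a figure and is not reproduced here. As a rough, hedged sketch of the machinery it builds on, the standard two-phase k-DPP sampler (first choose k eigenvectors, then sample one band per step) can be written as below; function names and implementation details are our own, not the authors' code. Drawing from the mixture in Eq. (8) then amounts to sampling from the k-DPP of \( L_{B} \) with probability u or from that of \( L_{{\hat{B}}} \) with probability \( \hat{u} \), repeating α times, and keeping the subset with the largest \( H_{k} \):

```python
import numpy as np

def elem_sym_poly(lam, k):
    """Partial elementary symmetric polynomials E[j][n] of eigenvalues lam."""
    N = len(lam)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0
    for j in range(1, k + 1):
        for n in range(1, N + 1):
            E[j, n] = E[j, n - 1] + lam[n - 1] * E[j - 1, n - 1]
    return E

def sample_kdpp(L, k, rng):
    """Sample a size-k band subset from a k-DPP with PSD kernel L."""
    lam, vecs = np.linalg.eigh(L)
    E = elem_sym_poly(lam, k)
    # Phase 1: choose k eigenvectors, scanning eigenvalues from last to first
    chosen, j = [], k
    for n in range(len(lam), 0, -1):
        if j == 0:
            break
        if rng.random() < lam[n - 1] * E[j - 1, n - 1] / E[j, n]:
            chosen.append(n - 1)
            j -= 1
    V = vecs[:, chosen]
    # Phase 2: sample one item per step, then project V off that coordinate
    Y = []
    while V.shape[1] > 0:
        p = np.sum(V ** 2, axis=1) / V.shape[1]
        i = rng.choice(len(p), p=p / p.sum())
        Y.append(int(i))
        # eliminate coordinate i from the span of V, drop one column
        col = np.nonzero(np.abs(V[i, :]) > 1e-12)[0][0]
        v = V[:, col] / V[i, col]
        V = V - np.outer(v, V[i, :])
        V = np.delete(V, col, axis=1)
        if V.shape[1] > 0:               # re-orthonormalise remaining columns
            V, _ = np.linalg.qr(V)
    return sorted(Y)
```

Under this sketch, each draw costs one eigendecomposition plus the two sampling phases, matching the complexity discussed in Sect. 3.3.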

3 Experimental Result

In this section, we test the performance of the proposed algorithm and compare it with several well-known unsupervised selection algorithms, including MIC [18] and Lscore [19].

3.1 Hyperspectral Datasets

AVIRIS sensor: Indian Pines dataset

The Indian Pines dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992 [20]. The image covers the Indian Pines test site in north-western Indiana and has 220 spectral bands covering the spectral range 0.4–2.5 \( \upmu{\text{m}} \). Twenty water-absorption bands ([104–108], [150–163], and 220) are removed, and the remaining 200 spectral bands form the dataset. It has 145 × 145 spatial pixels and a ground truth with 16 classes.

3.2 Experimental Parameter Setup

In this section, we introduce the parameter settings. The weight scalars are set to u = \( \hat{u} \) = 0.5, and the number of iterations is α = 5. The number of selected bands increases from 10 to 100 in steps of 10. We use the SVM algorithm [21] to test the performance of the selected bands. The parameters C and \( \gamma \) of the RBF kernel are searched in the ranges \( 10^{\{1:4\}} \) and \( 2^{\{-4:4\}} \), respectively, and determined by five-fold cross-validation. In all experiments, 10% of the samples are randomly selected for training, and the rest are used for testing. Table 1 lists the numbers of training and test samples for the Indian Pines dataset. The experimental results are evaluated with three widely used indexes: overall accuracy (OA), average accuracy (AA) and the kappa coefficient [22].
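This evaluation protocol can be sketched with scikit-learn (the paper's experiments were run in Matlab, so this is only an illustrative re-implementation; `evaluate_bands` and its inputs X — the labelled pixels restricted to the selected bands — and y are our own names):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def evaluate_bands(X, y, seed=0):
    """Train an RBF-kernel SVM on 10% of the labelled pixels and report
    overall accuracy (OA) on the rest, mirroring the protocol above."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.1, stratify=y, random_state=seed)
    grid = {"C": [10.0 ** e for e in range(1, 5)],       # C in 10^{1:4}
            "gamma": [2.0 ** e for e in range(-4, 5)]}   # gamma in 2^{-4:4}
    clf = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```

AA and kappa would be computed from the test-set confusion matrix in the same way; only OA is shown here for brevity.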

Table 1. The numbers of training and test samples for each category of the Indian Pines image

3.3 Runtime Analysis

As shown in [18], the k-DPP has a time complexity of \( {\mathcal{O}}\left( {Nk^{3} } \right) \), and it takes \( {\mathcal{O}}\left( {N^{3} } \right) \) time to decompose each adjacency kernel matrix of the bands. The MI-DPP algorithm therefore takes \( {\mathcal{O}}\left( {nN^{3} + nNk^{3} } \right) \) time in total to sample k bands, where n is the number of similarity kernel matrices. The computation time increases greatly when N and k are large; however, N and k are normally small in the selection process. We pre-calculate the information entropy of each band and build a lookup table for the Indian Pines dataset, which takes only 0.05 s. All experiments are carried out in Matlab on an Intel CPU E5-1620 @ 3.5 GHz with 32 GB RAM.

In Fig. 1, the computing time of the three algorithms is shown as the number of selected spectral bands increases from 10 to 100. The calculation time of the proposed MI-DPP is clearly the lowest of the three when the dimension of the band subset is low. As the number of selected bands increases, the calculation time of MIC and Lscore does not change significantly, whereas that of MI-DPP increases with the dimension of the selected subset. In general, the proposed MI-DPP takes less time than the other band selection algorithms.

Fig. 1.
figure 1

Runtime of MI-DPP, MIC, Lscore on the Indian Pines

3.4 Classification Results

In this subsection, we test the performance of the selected spectral band subsets on the Indian Pines image. In Fig. 2, we compare the classification accuracy of the band subsets selected by the different algorithms. When the dimension of the band subset is in the range 10–70, the proposed MI-DPP is significantly better than the other algorithms, and Lscore performs the worst. When the dimension is in the range 70–100, the performances of the three algorithms are similar, because the amount of information in the selected bands grows as their number increases. In the following, we analyse the performance of MI-DPP at low dimensions in detail. We set the dimension of the selected band subset to 10 and perform the classification tests; the results are shown in Table 2. In Fig. 3, we show the classification maps of the three band selection algorithms on Indian Pines. Among the three algorithms, Lscore obtains the worst result, while the band subset selected by the proposed MI-DPP achieves significantly improved classification accuracy in all categories. This is because the spectral bands selected by MI-DPP are more diverse and less redundant, and therefore provide more abundant information.

Fig. 2.
figure 2

Classification accuracy of the three band selection algorithms on Indian Pines

Table 2. Classification accuracy on the Indian Pines image for the band sets selected by Lscore, MIC and the proposed MI-DPP
Fig. 3.
figure 3

Classification maps of three band selection algorithms on Indian Pines

4 Conclusion

In this paper, an unsupervised band selection algorithm called MI-DPP is proposed for hyperspectral images. We use the maximum-information evaluation criterion to assess spectral band performance, aiming to retain the maximum band information. A double-graph model is used to capture the complex relationships between bands, and an improved k-DPP is proposed to sample a diverse, low-redundancy band subset from the double-graph structure. The experimental results on the Indian Pines dataset show that the proposed algorithm performs well in the band selection task.