
1 Introduction

Hyperspectral imaging systems record per-pixel reflectance in a large number of narrow wavelength bands over a wide range of the electromagnetic spectrum, thereby providing much higher spectral resolution than typical gray-level and RGB images and better capturing material-specific information. More recently, thanks to the development and decreasing cost of hyperspectral cameras, hyperspectral images (HSIs) have become affordable and popular in many computer vision and multimedia applications [1], such as face recognition, iris recognition, and medical diagnosis [24].

However, the numerous spectral bands also cause problems [5]. Compared to conventional gray-level and RGB images, HSIs carry much more information but incur increased computational and storage cost owing to the large number of spectral bands. It is therefore important to find a way to efficiently process and analyze HSIs with their hundreds of contiguous spectral bands. Image compression algorithms offer an effective solution to this problem.

Many conventional and classical image compression methods can address the aforementioned problems. Pixel coding-based methods such as JPEG [6], SPIHT [7], and EZW [8] are among the most widely applied compression methods. Du [9] proposed deploying PCA [10, 11] in JPEG 2000 for gray-level image processing. 3D coding-based algorithms, such as the 3D reversible integer lapped transform [12] and the DWT coupled with Tucker decomposition [13], have proved effective for both spectral and spatial information. However, these methods do not consider the internal characteristics of the HSI, which can result in the loss of information that is important for subsequent image analysis (e.g., classification and recognition). To address this issue, tensor representations [14] have been applied to image compression to preserve both the neighborhood relationships across the spatial dimension and the global correlation along the spectral dimension. Typical examples are multilinear principal component analysis (MPCA) [15], a generalization of PCA, concurrent subspace analysis [16], tensor canonical correlation analysis [17], and other tensor-based HSI compression methods [18, 19]. Although these methods may at first glance appear suitable for HSI compression, they were actually designed for groups of tensor objects and may not be effective for compressing a single tensor.

Conventional compression methods usually process an HSI by spectral channel or by pixel; our framework is quite different. To fully exploit the structure of the HSI, we propose a novel patch-based low-rank tensor decomposition (PLTD) method, which combines a patch-based tensor representation with a tensor extension of dictionary learning [20]. First, the HSI is partitioned into local tensor patches. Then, exploiting the nonlocal similarity over the spatial domain [21, 22], similar tensor patches are grouped together by clustering, and a fourth-order tensor is constructed for each cluster. Since the tensor data are assumed to be redundant, each such tensor is finally decomposed into a low-dimensional coefficient tensor and dictionary matrices by low-rank decomposition. In this way, both spatial and spectral redundancy are removed by the proposed framework.

The remainder of this paper is organized as follows. Section 2 presents the detailed PLTD algorithm for HSI compression and reconstruction. Experiments and analysis are provided in Sect. 3, and Sect. 4 concludes the paper.

2 Patch-Based Low-Rank Tensor Decomposition (PLTD) for HSI Compression

The compression process of the PLTD framework can be summarized as blocking, clustering, tensor dictionary learning, and low-rank decomposition. It realizes simultaneous compression of both the spatial and spectral modes. The reconstructed HSI is obtained by reassembling the recovered patches, each of which is the tensor product of the coefficient tensor and the dictionaries of its cluster. The overall process is illustrated in the flow chart in Fig. 1. The detailed theory of the framework is given in the following sections; for notation and multilinear algebra, we refer the reader to [23].

Fig. 1. Flow chart of the PLTD algorithm.

2.1 Patch-Based Tensor Representation

Given a third-order tensor HSI \( {\mathcal{I}} \in {\text{R}}^{{L^{W} \times L^{H} \times L^{S} }} \), we partition it into non-overlapping blocks of spatial size \( l^{W} \times l^{H} \) (\( l^{W} < L^{W} \), \( l^{H} < L^{H} \)). This yields a group of 3D patches \( \{ {\mathcal{P}}_{i,j} \}_{{1 \le i \le n^{W} ,1 \le j \le n^{H} }} \subset {\text{R}}^{{l^{W} \times l^{H} \times L^{S} }} \), where \( n^{W} = L^{W} /l^{W} \) and \( n^{H} = L^{H} /l^{H} \), so the total number of patches is \( n = n^{W} n^{H} \). Each patch is a cube that preserves all the spectral information of the original HSI.
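As an illustration, the blocking step can be written in a few lines of NumPy. This is a minimal sketch under the assumptions stated above (non-overlapping patches, spatial dimensions divisible by the block size); the 8 × 8 block size and the random data are illustrative only.

```python
import numpy as np

def extract_patches(hsi, lw, lh):
    """Split an HSI of shape (LW, LH, LS) into non-overlapping
    patches of size lw x lh x LS (assumes LW % lw == LH % lh == 0)."""
    LW, LH, LS = hsi.shape
    nW, nH = LW // lw, LH // lh
    return (hsi.reshape(nW, lw, nH, lh, LS)   # split both spatial axes
               .transpose(0, 2, 1, 3, 4)      # bring patch indices together
               .reshape(nW * nH, lw, lh, LS)) # n = nW * nH patches

# Example: a 512 x 512 x 33 cube with 8 x 8 spatial blocks -> 4096 patches
patches = extract_patches(np.random.rand(512, 512, 33), 8, 8)
print(patches.shape)  # (4096, 8, 8, 33)
```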

To make full use of the local information in the spatial and spectral domains and to simplify the derivation of the algorithm, we divide the patches into several clusters according to nonlocal similarity. Each cluster is then reformed as a fourth-order tensor and can be processed individually, as sketched below.
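The paper does not fix a particular clustering algorithm; continuing the sketch above, k-means on the vectorized patches is one plausible choice, with an illustrative cluster count.

```python
import numpy as np
from sklearn.cluster import KMeans

K = 32                                    # number of clusters (illustrative)
flat = patches.reshape(len(patches), -1)  # vectorize each lw x lh x LS patch
labels = KMeans(n_clusters=K, n_init=10).fit_predict(flat)

# Stack each cluster into a fourth-order tensor of shape (lw, lh, LS, n_k)
clusters = [np.stack(patches[labels == k], axis=-1) for k in range(K)]
```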

2.2 Tensor Dictionary Learning

Dictionary learning was originally used for image restoration. The objective function of the dictionary learning model can be given as:

$$ \mathop {\hbox{min} }\limits_{{\varvec{D},\varvec{Z}}} \parallel \varvec{A} - \varvec{DZ}\parallel ,\quad s.t.\;{\mathcal{O}}(\varvec{z}_{i} ) \le s,\; i = 1, \ldots ,n $$
(1)

where \( \varvec{D} = [\varvec{d}_{1} , \ldots ,\varvec{d}_{m} ] \in {\text{R}}^{d \times m} \) (\( m > d \)) is a redundant dictionary; \( \varvec{A} = [\varvec{a}_{1} , \ldots ,\varvec{a}_{n} ] \in {\text{R}}^{d \times n} \) is a collection of n sample images, where each originally 2D image is vectorized into a long vector \( \varvec{a}_{i} \); and \( \varvec{Z} = [\varvec{z}_{1} , \ldots ,\varvec{z}_{n} ] \in {\text{R}}^{m \times n} \) is the coefficient matrix. \( {\mathcal{O}}( \cdot ) \) denotes a sparsity-controlling operator such as the \( l_{0} \) or \( l_{1} \) norm.
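As a concrete reference point, the classical model of Eq. (1) can be solved with scikit-learn's `DictionaryLearning`. The sketch below uses made-up sizes, and note that scikit-learn stores samples as rows, i.e., the transpose of the column convention used above.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

d, n, m, s = 64, 500, 128, 5      # signal dim, samples, atoms (m > d), sparsity
A = np.random.rand(n, d)          # rows are samples (transposed convention)

dl = DictionaryLearning(n_components=m, transform_algorithm='omp',
                        transform_n_nonzero_coefs=s, max_iter=10)
Z = dl.fit_transform(A)           # sparse coefficients, shape (n, m)
D = dl.components_                # dictionary, shape (m, d)
print(np.linalg.norm(A - Z @ D))  # reconstruction error ||A - DZ||
```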

The tensor form of dictionary learning can be written as:

$$ \mathop {\hbox{min} }\limits_{{\varvec{D}^{W} ,\varvec{D}^{H} ,\varvec{D}^{S} ,{\mathcal{Z}}_{i} }} \mathop \sum \nolimits_{i = 1}^{n} \parallel {\mathcal{P}}_{i} - {\mathcal{Z}}_{i} \times_{1} \varvec{D}^{W} \times_{2} \varvec{D}^{H} \times_{3} \varvec{D}^{S} \parallel ,\quad s.t.\;{\mathcal{O}}({\mathcal{Z}}_{i} ) \le s $$
(2)

In this function, \( \varvec{D}^{W} \in {\text{R}}^{{l^{W} \times d^{W} }} \), \( \varvec{D}^{H} \in {\text{R}}^{{l^{H} \times d^{H} }} \), and \( \varvec{D}^{S} \in {\text{R}}^{{L^{S} \times d^{S} }} \), with \( l^{W} < d^{W} \), \( l^{H} < d^{H} \), and \( L^{S} < d^{S} \). When applied to the HSI compression problem, \( {\mathcal{P}}_{i} \) is the original data, and \( {\mathcal{Z}}_{i} \), \( \varvec{D}^{W} \), \( \varvec{D}^{H} \), and \( \varvec{D}^{S} \) constitute the compressed data.
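The reconstruction in Eq. (2) relies on the n-mode product. A minimal NumPy implementation, applied to one patch with illustrative sizes, might look as follows.

```python
import numpy as np

def mode_dot(tensor, matrix, mode):
    """n-mode product: multiply `matrix` into axis `mode` of `tensor`."""
    t = np.moveaxis(tensor, mode, 0)          # bring target mode to the front
    out = matrix @ t.reshape(t.shape[0], -1)  # contract along that mode
    out = out.reshape((matrix.shape[0],) + t.shape[1:])
    return np.moveaxis(out, 0, mode)

# Reconstruct one patch: P_i ~ Z_i x1 D^W x2 D^H x3 D^S (sizes illustrative)
lW, lH, LS, dW, dH, dS = 8, 8, 33, 12, 12, 40
Zi = np.random.rand(dW, dH, dS)
DW, DH, DS = (np.random.rand(lW, dW), np.random.rand(lH, dH),
              np.random.rand(LS, dS))
Pi_hat = mode_dot(mode_dot(mode_dot(Zi, DW, 0), DH, 1), DS, 2)
print(Pi_hat.shape)  # (8, 8, 33)
```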

To describe the clustering algebraically, we denote the clusters as \( \left\{ {{\mathcal{P}}_{k,j} } \right\}_{j = 1}^{{n_{k} }} \) (\( k = 1, \ldots ,K \)), where K is the number of clusters and \( n_{k} \) is the number of patches in cluster k. We reform cluster k into a fourth-order tensor \( {\mathcal{P}}^{(k)} \in {\text{R}}^{{l^{W} \times l^{H} \times L^{S} \times n_{k} }} \). Similarly, we define the corresponding coefficient tensors of cluster k as \( \left\{ {{\mathcal{Z}}_{k,j} } \right\}_{j = 1}^{{n_{k} }} \) (\( k = 1, \ldots ,K \)) and stack them into a fourth-order tensor \( {\mathcal{Z}}^{(k)} \in {\text{R}}^{{d^{W} \times d^{H} \times d^{S} \times n_{k} }} \).

In addition, it has been verified that for an nth-order tensor \( {\mathcal{T}} \) there exist smallest index subsets \( I^{1} , \ldots ,I^{n} \) (\( I^{i} \in {\text{R}}^{{m^{i} }} \), \( i = 1, \ldots ,n \)) such that the tensor value \( t\left( {i_{1} , \ldots ,i_{n} } \right) = 0 \) for all \( \left( {i_{1} , \ldots ,i_{n} } \right) \notin I^{1} \times I^{2} \times \cdots \times I^{n} \). Under this condition, the tensor \( {\mathcal{T}} \) is sparse [24, 25]. We denote by \( idt({\mathcal{T}}) \) the intrinsic dense tensor of \( {\mathcal{T}} \), composed of all the nonzero entries of \( {\mathcal{T}} \). According to this analysis, the objective function can be reformed as follows:

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{\varvec{D}^{W} ,\varvec{D}^{H} ,\varvec{D}^{S} ,{\mathcal{Z}}^{(k)} }} \mathop \sum \nolimits_{k = 1}^{K} \parallel {\mathcal{P}}^{\left( k \right)} - {\mathcal{Z}}^{\left( k \right)} \times_{1} \varvec{D}^{W} \times_{2} \varvec{D}^{H} \times_{3} \varvec{D}^{S} \parallel \\ & s.t.\;{\mathcal{O}}_{t} ({\mathcal{Z}}_{k,j} ) \prec (m_{k}^{W} ,m_{k}^{H} ,m_{k}^{S} ),\quad j = 1, \ldots ,n_{k} ,\;k = 1, \ldots ,K \end{aligned} $$
(3)

The constraint means that the numbers of nonzero indices along the first three modes are less than \( m_{k}^{W} \), \( m_{k}^{H} \), and \( m_{k}^{S} \), respectively, while the fourth mode is unconstrained. This constrains the tensor \( {\mathcal{Z}}^{\left( k \right)} \) to be sparse in the spatial and spectral domains.
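To make the \( idt( \cdot ) \) operation concrete, the sketch below extracts the intrinsic dense tensor of a fourth-order coefficient tensor, assuming (as the constraint implies) that the nonzero entries lie on a sub-grid of indices along the first three modes.

```python
import numpy as np

def idt(T):
    """Keep only the indices along the first three modes whose slices
    contain a nonzero entry; the fourth mode is left unchanged."""
    keep = [np.where(np.any(T != 0, axis=tuple(a for a in range(4) if a != m)))[0]
            for m in range(3)]
    keep.append(np.arange(T.shape[3]))
    return T[np.ix_(*keep)]

# Nonzeros confined to a 4 x 4 x 6 sub-grid of a 12 x 12 x 40 x 100 tensor
Z = np.zeros((12, 12, 40, 100))
Z[:4, :4, :6, :] = np.random.rand(4, 4, 6, 100)
print(idt(Z).shape)  # (4, 4, 6, 100)
```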

2.3 Low-Rank Tensor Decomposition

Each patch is a linear combination of a rather small number of dictionary atoms, which makes the three dictionaries highly redundant. Considering this fact, we split the dictionaries so that each sub-dictionary consists of only those atoms that are utilized by a single cluster. For example, the dictionaries of cluster k can be reformulated as \( \varvec{D}_{k}^{W} \in {\text{R}}^{{l^{W} \times m_{k}^{W} }} \), \( \varvec{D}_{k}^{H} \in {\text{R}}^{{l^{H} \times m_{k}^{H} }} \), and \( \varvec{D}_{k}^{S} \in {\text{R}}^{{L^{S} \times m_{k}^{S} }} \), where \( m_{k}^{W} \), \( m_{k}^{H} \), and \( m_{k}^{S} \) are the numbers of atoms utilized in cluster k along the width, height, and spectral modes, respectively. At the same time, we can replace the coefficient tensor with its intrinsic dense tensor. As a result, the objective function can be reformulated as:

$$ \mathop { \hbox{min} }\limits_{{\varvec{D}_{k}^{W} ,\varvec{D}_{k}^{H} ,\varvec{D}_{k}^{S} ,idt({\mathcal{Z}}^{(k)} )}} \mathop \sum \nolimits_{k = 1}^{K} \parallel {\mathcal{P}}^{\left( k \right)} - idt({\mathcal{Z}}^{\left( k \right)} ) \times_{1} \varvec{D}_{k}^{W} \times_{2} \varvec{D}_{k}^{H} \times_{3} \varvec{D}_{k}^{S} \parallel $$
(4)

The coefficient tensor and dictionaries in each cluster are much smaller than the original ones, and the coefficient tensor is now dense, so there is no longer any need to impose the sparsity constraint. Replacing \( idt({\mathcal{Z}}^{(k)} ) \) with \( {\mathcal{S}}^{(k)} \) for simplicity, the objective function is equivalent to the following:

$$ \mathop {\hbox{min} }\limits_{{\varvec{D}_{k}^{W} ,\varvec{D}_{k}^{H} ,\varvec{D}_{k}^{S} ,{\mathcal{S}}^{(k)} }} \mathop \sum \nolimits_{k = 1}^{K} \parallel {\mathcal{P}}^{\left( k \right)} - {\mathcal{S}}^{(k)} \times_{1} \varvec{D}_{k}^{W} \times_{2} \varvec{D}_{k}^{H} \times_{3} \varvec{D}_{k}^{S} \parallel $$
(5)

This problem is much easier since it has no constraints. Considering that the fourth-order core tensor is matched with only three dictionaries, the function can be approximated as a low-rank tensor decomposition problem by adding a fourth factor matrix:

$$ \mathop {\hbox{min} }\limits_{{\varvec{U}_{k}^{(1)} ,\varvec{U}_{k}^{(2)} ,\varvec{U}_{k}^{(3)} ,\varvec{U}_{k}^{(4)} ,{\mathcal{G}}^{(k)} }} \mathop \sum \nolimits_{k = 1}^{K} \parallel {\mathcal{P}}^{\left( k \right)} - {\mathcal{G}}^{(k)} \times_{1} \varvec{U}_{k}^{(1)} \times_{2} \varvec{U}_{k}^{(2)} \times_{3} \varvec{U}_{k}^{(3)} \times_{4} \varvec{U}_{k}^{(4)} \parallel $$
(6)

Matrices \( \varvec{U}_{k}^{(1)} \), \( \varvec{U}_{k}^{(2)} \), and \( \varvec{U}_{k}^{(3)} \) are the three dictionaries, and \( {\mathcal{G}}^{(k)} \times_{4} \varvec{U}_{k}^{(4)} \) is equal to the coefficient tensor \( {\mathcal{S}}^{(k)} \). Note that \( \varvec{U}_{k}^{(3)} \in {\text{R}}^{{L^{S} \times m_{k}^{S} }} \) with \( L^{S} > m_{k}^{S} \), so the spectral bands are compressed and the redundant spectral information is removed. The detailed steps are exhibited in Algorithm 1.
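Equation (6) is a standard Tucker decomposition of each cluster tensor. As one possible realization (not the paper's own implementation), the sketch below uses TensorLy's `tucker` routine with illustrative ranks.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

Pk = tl.tensor(np.random.rand(8, 8, 33, 120))  # cluster tensor (lW, lH, LS, n_k)
core, (U1, U2, U3, U4) = tucker(Pk, rank=[6, 6, 10, 30])

# U1, U2, U3 act as the three dictionaries; folding U4 back into the core
# recovers the dense coefficient tensor S^(k) = G^(k) x4 U4.
Sk = tl.tenalg.mode_dot(core, U4, mode=3)
Pk_hat = tl.tenalg.multi_mode_dot(Sk, [U1, U2, U3], modes=[0, 1, 2])
print(np.linalg.norm(Pk - Pk_hat) / np.linalg.norm(Pk))  # relative error
```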

3 Experiments and Analysis

In this section, we present the experiments undertaken on a natural scene hyperspectral dataset and provide the experimental analysis.

3.1 Experimental Setup

We employed five image compression methods for comparison: MPCA [15], PCA [10], JPEG [6], PCA+JPEG, and KSVD [26]. Among these, JPEG and KSVD were originally proposed for gray-level image processing; when dealing with HSIs, JPEG treats them as a group of gray-level images and compresses the bands one by one. PCA treats all the spectral values of one pixel as a sample, i.e., it compresses the band information only. MPCA takes the gray-level image of each band as a sample and compresses the spatial redundancy only. The PCA+JPEG method successively compresses an image by PCA and then by JPEG.

The HSI has to be clipped so that the spatial dimensions are divisible by the patch size. For JPEG, the spatial dimensions have to be divisible by 8, since the DCT transform splits the original gray-level image into 8 × 8 blocks. Considering these limitations, we adopted the clipping step in all the comparison methods. Furthermore, to alleviate the influence of data diversity, we normalized the hyperspectral data so that all the values lie in the interval [0, 255].
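For completeness, here is a minimal sketch of the preprocessing described above (cropping to a multiple of the patch size and rescaling to [0, 255]); this is a hypothetical helper, not the authors' code.

```python
import numpy as np

def clip_and_normalize(hsi, lw, lh):
    """Crop spatial dims to multiples of the patch size, then map to [0, 255]."""
    LW, LH, _ = hsi.shape
    hsi = hsi[:LW - LW % lw, :LH - LH % lh, :]
    return (hsi - hsi.min()) / (hsi.max() - hsi.min()) * 255.0
```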

We employed five indices for the image compression performance evaluation: the peak signal-to-noise ratio (PSNR), the structural similarity (SSIM) index, the feature similarity (FSIM) index, the erreur relative globale adimensionnelle de synthèse (ERGAS) [27], and the spectral angle mapper (SAM) [28]. For PSNR, SSIM, and FSIM, larger values indicate better performance; for ERGAS and SAM, smaller values are better.
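As a reference, below are minimal NumPy versions of two of these indices (PSNR over the whole cube and the mean spectral angle); they follow the standard definitions, which may differ in detail from the implementations used in the experiments.

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB (larger is better)."""
    return 10 * np.log10(peak ** 2 / np.mean((ref - rec) ** 2))

def sam(ref, rec, eps=1e-12):
    """Mean spectral angle (radians) between per-pixel spectra (smaller is better)."""
    a = ref.reshape(-1, ref.shape[-1])
    b = rec.reshape(-1, rec.shape[-1])
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))
```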

3.2 Performance on Natural Scene HSI

To begin with, we chose an image, named scene 8 in this paper, from a natural scene HSI database. The original dimensions of scene 8 are 1018 × 1340 × 33; we clipped it to 512 × 512 × 33 and normalized its values to the range [0, 255]. The results and comparisons are listed in Table 1. The compression ratio is the ratio of the compressed data size to the original data size. The compression ratios of the six methods could not be made exactly equal, but we kept them as close as possible.
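Under this definition, the PLTD compression ratio can be counted as the total size of the per-cluster core tensors and sub-dictionaries over the original size. The accounting below is a sketch of this reading, not the paper's exact bookkeeping.

```python
def pltd_ratio(original, cores, dicts_per_cluster):
    """Compressed size (cores + dictionaries over all clusters) / original size."""
    compressed = sum(c.size for c in cores) + sum(
        D.size for ds in dicts_per_cluster for D in ds)
    return compressed / original.size
```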

Table 1. Comparison of compression and reconstruction results on hyperspectral scene 8.

Note that the above comparisons were carried out at a single compression ratio. Would the other methods outperform the PLTD algorithm at other ratios? To answer this question, we varied the compression ratio from 0 to 0.25, which was adequate to test these methods, and compared the compression results. The comparisons are exhibited in Fig. 2.

Fig. 2. Comparison of the compression and reconstruction results when the compression ratio varies from 0 to 0.25.

According to Fig. 2, although there are certain intervals in which the MPCA or PCA algorithms perform better, the PLTD algorithm obtains the best performance in most cases. The exceptions occur because PCA compresses only the spectral information and MPCA only the spatial information, while PLTD compresses both. Therefore, most of the time, the tensor-based PLTD algorithm performs better than the other methods.

In summary, the experiments on the hyperspectral scene 8 image confirm that the proposed method outperforms the other compared methods.

4 Conclusion

In this paper, a new compression method called patch-based low-rank tensor decomposition (PLTD) has been proposed. The method combines patch-based tensor representation, nonlocal similarity, and tensor dictionary learning to effectively compress HSIs. Through clustering, a large problem is decomposed into several small problems that can be quickly solved by a low-rank decomposition algorithm. Since the decomposition discards both spectral and spatial redundancy and retains the most discriminative information, the compressed data can improve the efficiency and accuracy of subsequent analysis. To the best of our knowledge, this is the first time a patch-based tensor structure has been proposed for HSI compression, and the experiments confirm that the proposed method outperforms the compared methods and is applicable in various fields.