1 Introduction

Compared with multispectral images, hyperspectral images (HSIs) contain richer spectral information, consisting of hundreds of different spectral bands, as shown in Fig. 1a. These continuous narrow spectral bands can reflect more detailed, latent, and discriminative land cover information [1, 3, 8]. As shown in Fig. 1b, the spectral differences among materials are clearly visible in part of the bands of an HSI. In recent years, HSI processing has been widely used in target detection [8], precision agriculture [13], military applications [20], environmental monitoring [26], etc.

Fig. 1
figure 1

(a) Visual hyperspectral data cube; (b) Spectral curve of some land covers in Indian Pines dataset

HSI classification plays a key role in HSI applications. It aims at assigning a specific label to each pixel so that ground objects are described more precisely [6, 14]. During the last decade, many efforts have been made to accomplish this task. SVM-based HSI classification frameworks are the most classical methods; Camps-Valls, Melgani, and their colleagues made excellent contributions in this area [2, 23]. In addition, multinomial logistic regression [21], adaptive artificial immune networks [19], Gaussian processes [31], graph-based models [12, 29] and subspace projection [38] are commonly used classifiers in the HSI area. All the above-mentioned approaches have achieved good classification performance in different aspects by combining their own advantages with the characteristics of HSI. Recently, deep learning architectures have demonstrated promising performance in various fields such as image segmentation [28], video analysis [37] and HSI classification. To the authors' knowledge, Chen et al. [5] first introduced deep learning into HSI classification by utilizing multilayer stacked autoencoders to extract high-level features. Thereafter, many improved deep learning approaches were developed to obtain better performance. For instance, Zhang et al. [42] utilized a recurrent neural network (RNN) to extract local spatial sequential features for HSI classification. Cheng et al. [7] further improved the classification accuracy by exploring the features generated from different layers of a convolutional neural network (CNN) and designing a unified metric-learning-based framework to alternately learn spectral-spatial features.

As we know, the superiority of deep learning depends on a large amount of labeled data. However, reliable labeled samples are limited in the HSI area because of the high time and labor cost of labeling. As stated by Pan et al. [25], some traditional approaches still outperform deep-learning-based frameworks. As a traditional machine learning technique, sparse representation (SR) has made remarkable achievements in many research areas such as dimensionality reduction [15], image annotation [16] and video semantic recognition [17]. The outstanding advantage of this technique is that it does not require a large amount of samples to train the model, and no training at all is required in the residual-based classification model. Thus, many SR-based methods have been developed for HSI classification. For example, Chen et al. [4] introduced the Laplacian constraint and the joint representation of test samples into the sparsity model, which effectively utilized the neighboring spatial information. Yuan et al. [39] presented an SR method that contains more discriminative information by utilizing the set-to-set distance. Fu et al. [11] proposed a shape-adaptive SR classification method, which sufficiently exploits the spatial information by constructing a shape-adaptive local region for each target pixel. In general, the aforementioned SR-based methods can be categorized as full-sized-dictionary-based SR: the dictionary is composed of the entire set of training samples and grouped by classes. Meanwhile, another family of techniques based on dictionary learning was proposed [30, 33, 35], which learns a more representative dictionary from the training samples.

Both types of SR methods have achieved good accuracy in HSI classification. However, when adopting the full-sized dictionary, the sparsity model may cost much time to solve for the coefficients with greedy pursuit algorithms, since within each iteration every target pixel has to be compared with all atoms in the dictionary to search for the most approximate one [4, 11, 36, 39]. For the dictionary learning methods, although the learned dictionary may have fewer atoms than the full-sized dictionary, the learning procedure is time-consuming. In addition, the learned dictionary is in general not clearly grouped by classes, so it cannot be directly used to make a classification decision, and an extra classifier is required to deal with the obtained coefficients [30, 33].

Motivated by the issues mentioned above, we aim to design a simple and effective dictionary that is both thinner than the full-sized dictionary and still grouped by classes, and whose construction involves no complex updating or iteration. Accordingly, in this paper, we propose a compact-dictionary-based sparse representation (CDSR) method for HSI classification. Similar to the full-sized dictionary, the designed compact dictionary is still grouped by classes. However, it no longer consists of the entire set of training samples but only of some specific classes of training samples. These classes are determined by the known neighboring labels of the target pixel and by the spectral similarity between the target pixel and the center of each class of training samples. Besides, a spatial location expanding strategy is developed to further exploit the spatial information. During classification, the compact dictionary is adaptively determined for each test pixel and its size is smaller than that of the full-sized dictionary. Also, no iteration is required in the formation of the compact dictionary, in contrast to dictionary learning methods. Therefore, the time consumption of CDSR for HSI classification is reduced.

To sum up, the major contributions of our work are as follows. 1) A customized compact dictionary is constructed for each test pixel to eliminate the interference of unrelated classes and reduce the computational burden of HSI classification. 2) A spatial location expanding classification strategy is designed: first, the unlabeled neighbors of labeled pixels are classified; then these neighbors become labeled pixels; this process continues until all test pixels are classified. The expanding strategy makes full use of the spatial contextual information during the generation of the compact dictionary.

The rest of the paper is organized as follows. In Section 2, some related works are briefly reviewed. In Section 3, the detailed descriptions of the proposed compact dictionary based sparse representation are provided. In Section 4, experimental results on three commonly used hyperspectral images are presented. Finally, a conclusion and future work are given in Section 5.

2 Related works

Most HSI classification tasks completed by SR rest on the observation that hyperspectral pixels of the same class lie in a low-dimensional subspace in many cases [4, 41]. Consequently, each pixel x in a hyperspectral image can be sparsely represented over an appropriate dictionary constructed from, or learned from, training samples. Let \(\mathbf{x}\in R^{B}\) be a B-dimensional spectral vector representing a pixel in the HSI, \(\mathbf{D}\in R^{B\times N}\) be the dictionary and \(\boldsymbol{\alpha}\in R^{N}\) be the sparse coefficient vector; then the above process can be formulated as

$$ \mathbf{x}=\mathbf{D}\boldsymbol{\alpha} $$
(1)

where B is the number of bands of HSI and N represents the number of atoms in D.

In an HSI, the pixels within a window of appropriate size belong to the same class with high probability. Hence, using a region instead of a single spectral pixel better describes the land cover information, and (1) can be modified as

$$ \mathbf{X}=\mathbf{D}\mathbf{A} $$
(2)

where X = [x1,x2,⋯ ,xQ] ∈ RB×Q is the neighboring matrix, stacked from the neighboring spectral vectors within a \(\sqrt {Q}\times \sqrt {Q}\) window centered at the target pixel x; \(\mathbf{A}\in R^{N\times Q}\) is the corresponding coefficient matrix. For the dictionary, in [4] it is designed as D = [D1,⋯ ,Dc,⋯ ,DC] ∈ RB×N, where C represents the number of land cover categories, N denotes the number of atoms in the dictionary, \(\mathbf {D}_{c}=[\mathbf {d}_{1}^{c},\mathbf {d}_{2}^{c},\cdots ,\mathbf {d}_{N_{c}}^{c}]\in R^{B\times N_{c}}\) is the subdictionary constructed from the Nc training pixels of the c-th class, and \({\sum }_{c = 1}^{C}N_{c}=N\). To obtain the sparse coefficient matrix A, (2) can be reformulated as the following optimization problem:

$$ \hat{\mathbf{A}}=\arg\min\limits_{\mathbf{A}}\| \mathbf{X}-\mathbf{D}\mathbf{A}\|_{F},\quad s.t.\quad\| \mathbf{A}\|_{row,0}\le K $$
(3)

where ∥⋅∥F denotes the Frobenius norm, K represents the sparsity constraint, and ∥⋅∥row,0 is called the joint sparsity norm, which guarantees that the nonzero rows share the same indices. This means that the neighboring pixels and the target pixel share the same sparse support but have different weight values, because the pixels within the window are assumed to belong to the same class. By applying the Simultaneous Orthogonal Matching Pursuit (SOMP) algorithm [4], an approximate solution can be obtained. Then, the class of the target pixel x can be directly determined by the minimal total residual:

$$ Class(x)=\arg\min\limits_{c}\| \mathbf{X}-\mathbf{D}_{c}\hat{\mathbf{A}}_{c} \|_{F},\quad c = 1,2,\cdots,C. $$
(4)
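As a concrete illustration, the joint model (2)–(4) can be sketched in a few lines of NumPy. The `somp` and `classify` functions below are simplified stand-ins (unit-norm atoms and a per-atom class-label array are assumed for convenience), not the exact implementation of [4]:

```python
import numpy as np

def somp(X, D, K):
    """Simultaneous OMP: all Q columns of X share one sparse support.

    X : (B, Q) neighboring matrix, D : (B, N) dictionary with unit-norm
    atoms (assumed), K : sparsity level.  Returns the (N, Q) coefficient
    matrix with at most K nonzero rows (the shared support)."""
    R = X.copy()                          # current residual
    support, A_sub = [], None
    for _ in range(K):
        # score each atom by its total correlation with the residual
        scores = np.abs(D.T @ R).sum(axis=1)
        scores[support] = -np.inf         # never reselect an atom
        support.append(int(np.argmax(scores)))
        # least-squares fit of X on the atoms selected so far
        A_sub, *_ = np.linalg.lstsq(D[:, support], X, rcond=None)
        R = X - D[:, support] @ A_sub
    A = np.zeros((D.shape[1], X.shape[1]))
    A[support] = A_sub
    return A

def classify(X, D, class_of_atom, K):
    """Minimal-total-residual rule of Eq. (4); class_of_atom[n] is the
    class label of the n-th dictionary atom."""
    A = somp(X, D, K)
    classes = np.unique(class_of_atom)
    residuals = [np.linalg.norm(X - D[:, class_of_atom == c] @ A[class_of_atom == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

For example, with an identity dictionary whose first two atoms belong to class 0, a neighboring matrix concentrated on the first atom is assigned to class 0.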

The model reviewed above is the classical joint sparse representation classification (JSRC). Since then, many efforts have been made, mainly focused on three aspects corresponding to the three parts of (2): the neighboring matrix X, the sparse coefficient matrix A, and the dictionary D. The shape-adaptive joint sparse representation (SASR) [11], weighting prior sparse representation (WPSR) [32] and spatial-aware dictionary learning (SADL) [30] are representative methods that follow these three schemes, respectively.

2.1 SASR

SASR considers that the fixed-size square window used in JSRC may be inappropriate, since the center pixel and the neighboring pixels in the same square window may belong to different classes, especially in complex areas of an HSI. Thus, SASR designs a shape-adaptive window Ω for each pixel by utilizing the local polynomial approximation (LPA) filtering technique and the intersection of confidence intervals. Then, a shape-adaptive neighboring matrix can be constructed as Xsa = [x1,x2,⋯ ], xi ∈ Ω, and SASR can be formulated as follows:

$$ \hat{\mathbf{A}}=\arg\min\limits_{\mathbf{A}}\| \mathbf{X}_{sa}-\mathbf{D}\mathbf{A}\|_{F},\quad s.t.\quad\| \mathbf{A}\|_{row,0}\le K. $$
(5)

The final class of test pixel can be determined by the minimal residual:

$$ Class(x)=\arg\min\limits_{c}\| \mathbf{X}_{sa}-\mathbf{D}_{c}\hat{\mathbf{A}}_{c} \|_{F},\quad c = 1,2,\cdots,C. $$
(6)

2.2 WPSR

Similar to SASR, WPSR also notes the possible differences among pixels within the neighboring window, but it does not directly modify the expression of the neighboring matrix; instead, it introduces weighting coefficients into the sparse coefficients and relaxes the sparsity constraint. The model of WPSR can be expressed as follows:

$$ \min\limits_{\mathbf{A}}\frac{1}{2}\|\mathbf{X}-\mathbf{D}\mathbf{A}\|_{F}^{2}+\lambda_{1}\|\mathbf{A}\|_{1}+\lambda_{2}\sum\limits_{i,j}w_{i,j}\|\boldsymbol{a}_{i}-\boldsymbol{a}_{j}\|_{2}^{2} $$
(7)

where λ1 and λ2 are regularization parameters and wi,j is the weight coefficient used to measure the spectral similarity among pixels in the neighboring window; a larger weight indicates higher similarity. In WPSR, wi,j is obtained by the sparse subspace clustering method [9]. Once the sparse coefficient matrix is solved, the label of the test pixel can be determined by the same minimal residual rule as in (4) and (6).

2.3 SADL

Different from many SR-based classifiers, whose dictionaries are directly composed of the complete set of training samples and grouped by classes, SADL is a data-driven model whose fundamental goal is to build a representative and discriminative dictionary that better represents the test pixel. The dictionary of SADL is learned from the samples, rather than directly composed of them.

Let \(\{\mathbf {y}_{i}\}_{i = 1}^{I}\) denote the set of all spectral vectors in the HSI. To integrate spatial information into the dictionary learning, the spectral vectors are separated into groups according to the spatial context, i.e., Y = [Y1,⋯ ,Yg,⋯ ,YG]. The details of SADL can be formulated as follows:

$$\min\limits_{\mathbf{D},\mathbf{A}}\|\mathbf{Y}-\mathbf{D}\mathbf{A}\|_{F}^{2}+\sum\limits_{g = 1}^{G}\gamma_{g}\|\mathbf{A}_{g}\|_{2,1} \quad s.t \quad \forall n \quad \|\mathbf{d}_{n}\|_{2}\leq 1 $$
(8)

where γg is the regularization parameter for the g-th group, Ag is the corresponding sparse matrix for Yg, and ∥Ag∥2,1 denotes the sum of the l2 norms of the rows of Ag. The dictionary D and sparse matrix A can be solved by an alternating iterative scheme. However, note that the label of a test pixel cannot be determined by the minimal residual rule, since the learned D has no grouping structure. Accordingly, an extra classifier must be involved to deal with the sparse coefficients.

In this paper, our attention also focuses on the construction of the dictionary. In the construction procedure, complex updating and iteration are avoided compared with the dictionary learning technique, and the constructed dictionary is expected to be thinner than the full-sized dictionary while still keeping the grouped structure.

3 Proposed CDSR for HSI classification

In this section, we present the details of the proposed approach from three aspects, i.e., the determination of the spatial label set, the determination of the spectral label set, and the final classification model. The flowchart of the proposed CDSR is shown in Fig. 2.

Fig. 2
figure 2

The flowchart of the proposed CDSR

3.1 Determination of spatial label set

Generally, many spatial-spectral classifiers integrate spatial information based on the assumption that neighboring pixels tend to have similar spectral characteristics [10, 41]. However, they may ignore the fact that the labels of neighboring pixels can provide the target pixel with an explicit reference. In this paper, this explicit information is used to shrink the size of the dictionary.

By scanning a p × p window centered at the target pixel, it is easy to obtain a spatial label set Sspa composed of the known labels within the window. However, limited by the number of training samples and their spatial distribution, the obtained Sspa is sometimes empty if the test data are classified in random order. Therefore, a spatial location expanding strategy, which spreads the limited local label information to a wider region, is proposed to deal with this case. The expanding strategy is summarized in the following steps:

  1. Step 1:

    index the positions of pixels adjacent to the label-known pixels;

  2. Step 2:

    classify the unprocessed neighboring test pixels obtained by step 1;

  3. Step 3:

    repeat step 1 and 2 until all unprocessed test pixels are handled.
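The three steps above can be sketched as follows. Here `classify_pixel` is a hypothetical stand-in for the per-pixel CDSR classifier, an 8-neighborhood is assumed, and -1 marks unclassified test pixels; labels assigned in one expansion round become available to the next, exactly as in Steps 1–3:

```python
import numpy as np

def expand_classify(labels, classify_pixel):
    """Spatial location expanding strategy (sketch).

    labels : 2-D int array; -1 for unclassified test pixels, >=0 for
    known labels.  classify_pixel(r, c, labels) returns a label."""
    H, W = labels.shape
    while (labels == -1).any():
        # Step 1: index unlabeled pixels adjacent to label-known pixels
        frontier = []
        for r in range(H):
            for c in range(W):
                if labels[r, c] != -1:
                    continue
                neigh = labels[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
                if (neigh != -1).any():
                    frontier.append((r, c))
        if not frontier:  # isolated pixels: classify the remainder directly
            frontier = [(r, c) for r in range(H) for c in range(W)
                        if labels[r, c] == -1]
        # Step 2: classify this round's frontier before expanding again
        new_labels = [classify_pixel(r, c, labels) for r, c in frontier]
        for (r, c), lab in zip(frontier, new_labels):
            labels[r, c] = lab
    # Step 3 is the while-loop: repeat until all test pixels are handled
    return labels
```

Collecting the whole frontier before writing any label keeps one round's decisions independent of each other, matching the round-by-round expansion in Fig. 3.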

The graphic illustration of spatial location expanding strategy is shown in Fig. 3.

Fig. 3
figure 3

Illustration of spatial location expanding strategy. Green blocks represent the training pixels, the blocks with other colors represent the test pixels that should be classified after each expansion

3.2 Determination of spectral label set

The obtained Sspa is beneficial for determining a relatively small scope for the compact dictionary, but it tends to be invalid in certain cases. For instance, as shown in Fig. 4, the true label of the target pixel is not contained in Sspa because there are no label-known pixels of the same class in the scanning window. This case often appears in regions that are not adjacent to training data, and the target pixels there will suffer from misclassification. As a result, spectral information is introduced to obtain a more reliable scope for the compact dictionary.

Fig. 4
figure 4

Improper circumstances for determining the scope of compact dictionary just considering spatial information. (i.e., Sspa does not contain the true label of test sample)

Similar to Sspa, a spectral label set Sspe is obtained by measuring the spectral similarity between the neighboring matrix X of the target pixel and each subdictionary Dc. Since the spectral similarity must be measured for every test sample, an averaging strategy, also used in [34], is applied to X and Dc to speed up this procedure. The detailed procedure for obtaining Sspe is as follows:

  1. Step 1:

    apply the average strategy to X and Dc to obtain the neighboring center u and the subdictionary center vc, c = 1,2,⋯ ,C, i.e.,

    $$ \boldsymbol{u}=\frac{1}{Q}\sum\limits_{q = 1}^{Q}\boldsymbol{x}_{q} $$
    (9)
    $$ \boldsymbol{v}^{c}=\frac{1}{N_{c}}\sum\limits_{i = 1}^{N_{c}}\boldsymbol{d}^{c}_{i},\qquad c = 1,2,\cdots,C. $$
    (10)
  2. Step 2:

    use the spectral angle mapper (SAM) [43] to measure the spectral similarity between u and vc for its simplicity and efficiency, i.e.,

    $$ sam(\boldsymbol{u},\boldsymbol{v}^{c})=\arccos\frac{{\sum}_{b = 1}^{B}({u}_{b}\times {v_{b}^{c}})}{\sqrt{{\sum}_{b = 1}^{B}(u_{b})^{2}}\sqrt{{\sum}_{b = 1}^{B}({v_{b}^{c}})^{2}}}, \qquad c = 1,2,\cdots,C\quad $$
    (11)

    where \(u_{b}\) and \({v_{b}^{c}}\) denote the bth entries of u and vc, respectively.

  3. Step 3:

    sort all sam(u,vc) in ascending order, and use the classes corresponding to the J smallest sam values to form the spectral label set Sspe, where J ≤ C; J is called the spectral similarity level for convenience of the following description.
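The three steps can be condensed as below. This is a sketch: the clipping of the cosine is added only for numerical safety and is not part of Eq. (11):

```python
import numpy as np

def spectral_label_set(X, subdicts, J):
    """Steps 1-3 for S_spe: average X and each sub-dictionary, then keep
    the J classes with the smallest spectral angle (Eqs. (9)-(11)).

    X : (B, Q) neighboring matrix; subdicts : list of (B, N_c) per-class
    sub-dictionaries; J : spectral similarity level (J <= C)."""
    u = X.mean(axis=1)                                     # Eq. (9): neighboring center
    angles = []
    for Dc in subdicts:
        v = Dc.mean(axis=1)                                # Eq. (10): subdictionary center
        cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))  # Eq. (11): SAM
    return set(np.argsort(angles)[:J].tolist())            # J smallest angles
```

For instance, a neighborhood concentrated on the first spectral band yields the class whose center points in that direction.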

3.3 Incorporation of spatial and spectral information to design the compact dictionary

The final scope for the target pixel is determined by the union of Sspa and Sspe, and an index used to form the compact dictionary can be defined by

$$ {\Lambda} =Index(S_{spa}\cup S_{spe}) $$
(12)

where Index(⋅) is designed for getting the subdictionary index of corresponding class in full-sized dictionary D. Then, the compact dictionary \(\tilde {\mathbf {D}}\) can be determined as

$$ \tilde{\mathbf{D}}=\mathbf{D}_{{\Lambda} } $$
(13)

where \(\tilde {\mathbf {D}}\in R^{B\times \tilde {N}}\), \(\tilde {N}\) is the number of atoms in \(\tilde {\mathbf {D}}\), and DΛ denotes the part of D corresponding to the index Λ. Compared with the overcomplete dictionaries designed in many methods [4, 11, 39], the proposed compact dictionary becomes undercomplete when the number of training samples is small. In fact, as stated in [24], an overcomplete dictionary is not necessarily required for classification tasks, although it plays a great role in reconstruction tasks. For example, the methods proposed in [24] and [27] achieve good classification performance with undercomplete dictionaries.
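Eqs. (12)–(13) amount to slicing the full-sized dictionary by class membership, which can be sketched as follows (a per-atom class-label array is assumed, as the paper's D is grouped by classes):

```python
import numpy as np

def compact_dictionary(D, class_of_atom, S_spa, S_spe):
    """Eqs. (12)-(13): keep only the atoms whose class lies in
    S_spa ∪ S_spe.  class_of_atom[n] is the class of the n-th atom of
    the full-sized dictionary D (B x N).  Returns the compact
    dictionary D_Lambda and the class labels of its retained atoms."""
    scope = S_spa | S_spe                       # Eq. (12): union of the two label sets
    idx = np.isin(class_of_atom, list(scope))   # Lambda: indices of retained atoms
    return D[:, idx], class_of_atom[idx]        # Eq. (13): D_Lambda
```

No iteration or update is involved; the compact dictionary is just a column selection, which is why its construction adds negligible cost.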

After obtaining the compact dictionary, we use (3) and (4) with the corresponding compact dictionary \(\tilde {\mathbf {D}}\) to complete the classification task, i.e.,

$$ \hat{\mathbf{A}}=\arg\min\limits_{\mathbf{A}}\| \mathbf{X}-\tilde{\mathbf{D}}\mathbf{A}\|_{F},\quad s.t.\quad\| \mathbf{A}\|_{row,0}\le K $$
(14)
$$ Class(x)=\arg\min\limits_{c}\| \mathbf{X}-\tilde{\mathbf{D}}_{c}\hat{\mathbf{A}}_{c} \|_{F},\quad c\in S_{spa}\cup S_{spe}. $$
(15)

For a clearer presentation, the detailed procedure of CDSR is summarized in Algorithm 1.

figure g

4 Experiments and analysis

4.1 Comparison methods and metrics

To demonstrate the performance of the proposed CDSR, several related methods are used for comparison, i.e., SVM [23], SVM-CK [2], OMP [4], JSRC [4], SASR [11], SADL [30] and NLSS-RNN [42]. SVM and SVM-CK are classical and popular SVM-based HSI classifiers; OMP, JSRC, SASR and SADL are SR-based methods, where SADL employs dictionary learning while the others adopt the full-sized dictionary; NLSS-RNN is a popular deep-learning-based approach, in which a novel local spatial sequential feature is constructed and then fed into an RNN for classification. In addition, the above methods can also be divided into two categories according to their use of spatial information; for a clearer presentation, they are organized in Table 1. In this paper, three commonly used quantitative metrics are used to evaluate the performance of these methods. 1) Overall accuracy (OA): the ratio of correctly classified test samples to total test samples, i.e., the number of correctly classified test samples divided by the total number of test samples. 2) Average accuracy (AA): the mean classification accuracy over all individual classes. 3) Kappa coefficient (KA): a statistical criterion in terms of omission and commission errors, used to measure the degree of agreement between the classification result and the ground truth. It can be computed from the confusion matrix; detailed information is available in [18].
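For reference, the three metrics can be computed from the confusion matrix as follows (a generic sketch, not tied to any particular toolbox; rows index true classes, columns predicted classes):

```python
import numpy as np

def accuracy_metrics(y_true, y_pred):
    """OA, AA and kappa coefficient from the confusion matrix."""
    classes = np.unique(y_true)
    M = np.array([[np.sum((y_true == t) & (y_pred == p)) for p in classes]
                  for t in classes], dtype=float)
    n = M.sum()
    oa = np.trace(M) / n                          # overall accuracy
    aa = np.mean(np.diag(M) / M.sum(axis=1))      # average per-class accuracy
    pe = (M.sum(axis=0) @ M.sum(axis=1)) / n**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)                  # kappa coefficient
    return oa, aa, kappa
```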

Table 1 The characteristics of comparison methods

4.2 Experimental settings and results

For the SVM-based methods, we choose the radial basis function (RBF) as the kernel function; the regularization term and kernel parameter are obtained by cross-validation. For OMP and JSRC, the optimal parameters are set to be the same as in [4]. The parameters of SASR and SADL are also set the same as in the original papers [11, 30], respectively. For the proposed CDSR, the neighboring window size remains the same as in JSRC (i.e., 7 × 7, 15 × 15 and 11 × 11 for Indian Pines, Salinas and Pavia University, respectively), since there is no difference between JSRC and CDSR in describing the test samples with neighboring information; the sparsity constraint K is set to 3 by referring to [11]. The scanning window size and spectral similarity level J are determined by three-fold cross-validation. In the following, the optimal experimental results obtained by the different methods with the above parameters on three HSI datasets are presented, and all results are average values over ten runs. For NLSS-RNN, the detailed data listed in the following tables are derived from the original paper [42].

Indian Pines dataset

The Indian Pines dataset was collected over the Indian Pines test site in north-western Indiana by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. The ground materials are classified into 16 classes, the spatial size of this scene is 145 × 145, and its spatial resolution is 20 m. The original data contain 220 bands of the same size corresponding to different narrow wavelengths; 20 polluted bands were removed, so only 200 bands were used in the experiment.

We randomly extract 10% of the labeled pixels from each class as the training set and use the rest as test samples, as shown in Table 2. The classification accuracies obtained by the various classifiers are also shown in Table 2. As can be seen, the accuracies of the spectral-spatial classifiers (i.e., SVM-CK, JSRC, SASR, SADL, NLSS-RNN) are obviously higher than those of the spectral-only classifiers (i.e., SVM, OMP). The reason is that the integration of spatial information can enhance the robustness of a single spectral pixel to noise. Moreover, it is easy to observe that the proposed CDSR outperforms the other classifiers in terms of OA, AA and KA, and in most land cover categories CDSR achieves the highest accuracy among all classifiers. In particular, there is no misclassification for classes 1, 4, 6, 7, 8, 13 and 14. This effectiveness of CDSR comes from the fact that the spatial location expanding strategy provides a more precise scope for the compact dictionary, since the Indian Pines dataset is composed of many smooth regions, and the fewer classes contained in the compact dictionary mean less interference from unrelated classes in the sparse coefficients. Besides, the superior performance of CDSR can be partly attributed to the fact that it not only exploits the label information of the training samples, but also takes full advantage of the class information of the already classified test samples during the classification.

Table 2 Overall, average, and per class accuracy (%), and kappa coefficient of indian pines data set with different methods

In order to further verify the performance of the proposed CDSR, we test all mentioned methods using training samples obtained by manual block sampling as shown in Fig. 5b (i.e., the training samples of each class are gathered together rather than distributed over all regions of the same class). It can be seen from Fig. 5d–j that many regions without training data suffer from misclassification for all methods. The main reason is that the training samples selected by block sampling are unable to cover well the spectral variations of the same land-cover class in different regions, and the mentioned spatial-spectral methods only utilize local spatial information. Nevertheless, CDSR still achieves the best results compared with the other classifiers. Also, by introducing spectral information, CDSR can handle those regions that are not adjacent to training data (e.g., most of the pixels enclosed by the red line are recognized correctly, as shown in Fig. 5j).

Fig. 5
figure 5

Classification maps of Indian Pines in block sampling and the corresponding overall accuracy. (a) Ground Truth, (b) Training data, (c) Test data, (d) SVM (64.74%), (e) SVM-CK (62.38%), (f) OMP (59.02%), (g) JSRC (72.54%), (h) SASR (75.54%), (i) SADL (67.47%), (j) CDSR (79.32%)

Salinas dataset

The scene of the Salinas dataset is located in Salinas Valley, California. 16 land covers are labeled in this area as ground truth. The spatial size of the Salinas image is 512 × 217, and its spatial resolution is 3.7 m per pixel. After discarding 20 water absorption bands, the remaining 200 bands were used as the input of the classifiers.

The Salinas dataset consists of many homogeneous regions and its spatial resolution is very high. In order to better demonstrate the superior performance of the proposed method, we extract only 1% of the label-known pixels as training samples and use the rest of the label-known pixels to test the classification model, as shown in Table 3. The visual classification maps are shown in Fig. 6. It can be observed that almost all classifiers, except the proposed CDSR, make obvious misclassifications in the upper-left area of their classification maps, marked in yellow and blue. The reason lies in the fact that the spectral differences of these two areas (corresponding to the untrained grapes and untrained vineyard land covers) are very small; the local spatial information used in the spectral-spatial classifiers (i.e., SVM-CK, JSRC, SASR, SADL, CDSR) is beneficial for eliminating the influence of noisy samples but contributes little to increasing the discrimination of spectral features among land covers. CDSR, in contrast, can exclude from the decision range many extraneous classes that are unrelated to the target pixel in terms of spatial location by using the designed compact dictionary, and the introduced spatial location expanding strategy can spread the local explicit label information to a wider region. Therefore, CDSR obtains a smoother classification map than the other spectral-spatial approaches. Moreover, the detailed data for the quantitative metrics are presented in Table 3. It is obvious that the proposed method outperforms the other algorithms: even compared to SASR, SADL and NLSS-RNN, the overall accuracy of CDSR is increased by 0.87%, 2.89% and 2.15%, respectively.

Table 3 Overall, average, and per class accuracy (%), and kappa coefficient of salinas data set with different methods
Fig. 6
figure 6

Groundtruth map and classification maps of different methods on Salinas dataset (1% training data for each class in random sampling). (a) Ground Truth, (b) SVM, (c) SVM-CK, (d) OMP, (e) JSRC, (f) SASR, (g) SADL, (h) CDSR

University of Pavia dataset

The Pavia University (PaviaU) dataset, captured by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor, is of size 610 × 340 with 115 spectral bands; 12 noisy bands were removed to alleviate their negative effect on classification. This dataset possesses very high spatial resolution (1.3-meter pixels), and a ground truth labeled with 9 classes is provided.

By referring to [42], we randomly extract 9% of the labeled pixels from each class as a balanced training set, and use the remaining 91% as the test set (see Table 4). The ground truth map and classification maps of University of Pavia are displayed in Fig. 7. It can be seen that the Pavia University dataset consists of many long and scattered regions, which are more complicated than in the previous two datasets. Looking closely at Fig. 7b and d, there are many salt-and-pepper noise points in the homogeneous regions, because the corresponding classifiers only take the spectral information into consideration. Furthermore, we can observe that SASR, SADL and CDSR obtain slightly different classification maps, all of which are very close to the ground truth map. Hence, for a more effective comparison, the detailed quantitative metrics are also given in Table 4. It is easy to find that CDSR achieves the highest accuracy for classes 1, 2, 6 and 8. Also, it should be noted that CDSR does not perform as well as NLSS-RNN in terms of OA, AA and KA. The reason may be that the small and scattered regions in PaviaU are unfavorable for the spatial expanding strategy, and a relatively large neighboring window size was set in CDSR to balance the other regions; this window size is not suitable for the small and scattered areas. For this reason, classes 4 and 9 obtain relatively lower results. In spite of this, compared with the SR-based methods, CDSR still achieves a competitive overall result on this dataset.

Table 4 Overall, average, and per class accuracy (%), and kappa coefficient of university of pavia data set with different methods
Fig. 7
figure 7

Groundtruth map and classification maps of different methods on University of Pavia dataset (9% training data for each class in random sampling) (a) Ground Truth, (b) SVM, (c) SVM-CK, (d) OMP, (e) JSRC, (f) SASR, (g) SADL, (h) CDSR

4.3 Computational complexity and running time

For the proposed CDSR, the time-consuming steps consist of three parts: searching the scanning window, calculating the spectral angle mapper, and solving the sparse coefficients by the SOMP algorithm. For a classification task with M test samples, the computational complexity of determining the spatial label set by searching the scanning window is O(p2M). The time complexity of calculating the spectral angle mapper is O(3CMB). By referring to [11], we know that the main operation of SOMP lies in scalar multiplication, and its number of computations is \({\sum }_{k}(QB(\tilde {N}-k)+ 2k^{2}B+k^{3}+kBQ),k = 1,2,\cdots ,K\). Therefore, the final computational complexity is \(O(M(QKB\tilde {N}+ 2K^{3}B+K^{2}BQ+ 3CB+p^{2}))\).
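The stated multiplication count makes the benefit of the compact dictionary easy to check numerically; the helper below simply evaluates the sum, so shrinking \(\tilde{N}\) directly lowers the per-pixel cost (the example figures are illustrative, not taken from the experiments):

```python
def somp_mult_count(Q, B, N_tilde, K):
    """Scalar-multiplication count of SOMP given in the text:
    sum over k = 1..K of Q*B*(N_tilde - k) + 2*k^2*B + k^3 + k*B*Q."""
    return sum(Q * B * (N_tilde - k) + 2 * k**2 * B + k**3 + k * B * Q
               for k in range(1, K + 1))

# e.g. a compact dictionary with 10x fewer atoms needs far fewer multiplications:
# somp_mult_count(Q=49, B=200, N_tilde=100, K=3) vs. N_tilde=1000
```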

In this part, we also compare the running time of the methods mentioned in Section 4.1 in the environment of MATLAB R2014a on an Intel Core i5-4590 CPU at 3.30 GHz with 4 GB RAM. Because NLSS-RNN needs a totally different environment and the original paper does not provide the corresponding data, its running time is not listed in Table 5 along with the other methods. It can be seen that the SVM-based methods are less time-consuming, to which the C interface called in their source code makes a certain contribution. In addition, it should also be noted that the proposed CDSR is the fastest among all SR-based spectral-spatial classifiers, including the full-sized-dictionary-based JSRC and SASR as well as the dictionary-learning-based SADL. The reason is that the compact dictionary greatly shortens the solution time of the sparse coefficients, and no additional iteration or update operations are required in the construction of the compact dictionary. Although there are no detailed timing data for NLSS-RNN, we know that it requires 1000 iterations to ensure convergence according to the original paper [42]; therefore, its training time should not be short. Overall, the proposed CDSR is efficient.

Table 5 The average running time (seconds) for ten runs on indian pines, salinas and university of pavia datasets

4.4 Influence of parameters

Both the scanning window size and the spectral similarity level are key parameters that influence the scale of the compact dictionary. Thus, we analyze the influence of these two parameters on the classification accuracies for the Indian Pines, Salinas and Pavia University datasets. From Fig. 8a, it can be seen that the overall accuracy generally decreases with increasing scanning window size; in particular, when the scanning window size is expanded to 7 × 7 or larger, the accuracy varies little. This can be explained by the fact that the compact dictionary contains more classes as the scanning window grows, and when the window size is large enough, the number of classes contained in the dictionary no longer fluctuates greatly. From Fig. 8b, it can be observed that the classification accuracy gradually decreases as the spectral similarity level increases, which can be explained by the fact that the compact dictionary degrades toward the full-sized dictionary with increasing spectral similarity level.

Fig. 8
figure 8

Overall accuracy with varying scanning window size and spectral similarity level on Indian Pines, Salinas and Pavia University

5 Conclusion

In this paper, we proposed a compact-dictionary-based sparse representation (CDSR) method for HSI classification, which unites a spatial location expanding strategy with spectral information. Experiments on three HSI datasets verify the performance of the proposed method in terms of accuracy and efficiency. In future work, we will try to further improve the classification accuracy under block sampling by incorporating global spatial information to eliminate the influence caused by the spectral variations of the same land-cover class in different regions.