A remote sensing image classification method based on sparse representation

Wu, Shulei; Chen, Huandong; Bai, Yong; Zhu, Guokang

doi:10.1007/s11042-016-3320-7

A remote sensing image classification method based on sparse representation

Published: 20 February 2016

Volume 75, pages 12137–12154, (2016)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Multimedia Tools and Applications Aims and scope Submit manuscript

A remote sensing image classification method based on sparse representation

Download PDF

Shulei Wu^1,2,
Huandong Chen²,
Yong Bai¹ &
…
Guokang Zhu³

558 Accesses
10 Citations
Explore all metrics

Abstract

With the development of remote sensing image applications, sparse-based representation classification approaches have been investigated for better classification accuracy. This paper introduces an improved classification method based on sparse representation by representing the test samples through a dictionary. The key components of our proposed method rely on the feature dictionary construction, sparse representation and image reconstruction. The dictionary is obtained by training samples according to their class for a sparse linear combination. The sparse representation for the image is expressed as sparse coefficients by solving an optimization problem. We describe the method of constructing a dictionary by computing a best matrix to represent all data vectors. We also describe the algorithm used to solve for the sparse representation. Finally, we discuss the way of using the sparse vector to reconstruct the image for classification. In the experiments, the proposed method is applied to two real high spatial resolution images for the classification in comparison to Backpropagation Neural Network, Support Vector Machine, Classification and Regression Trees and K-means. The experimental results show that the proposed method performs better than the benchmark methods in terms of classification accuracy.

Classification Modeling of Multi-Featured Remote Sensing Images Based on Sparse Representation

Hyperspectral Image Classification Using a New Dictionary Learning Approach with Structured Sparse Representation

Hyperspectral image classification based on spatial and spectral features and sparse representation

Article 01 December 2014

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The remote sensing imagery of high spatial resolution (HSR) provides useful geometric and detailed information which can precisely represent the Earth’s surface. Due to the increasing applications of HSR remote sensing imagery, a major issue of land-cover classification is how to improve the accuracy of the image processing. The common HSR remote sensing imagery is obtained from satellites, such as IKONOS, QuickBird, WorldView-2 and Pleiades. The availability of HSR increases the possibility of accurate Earth observations [27] and makes it possible to be widely used. However, urban landscapes become more complicated and have many different objects with similar spectral features. The increasing resolution does not facilitate improvement of the classification accuracy in the same level. Therefore, it is necessary to explore more effective approaches and incorporate the spatial features to deal with the HSR images.

This paper focuses on the problem of classification of a given high-resolution image according to different objects. Our approach is motivated by those research works [9, 36, 39, 40] on sparse signal representation, which suggest that the linear relationship among high-resolution signal elements. We propose an improved strategy to train a dictionary [36] by utilizing the sparsity of the input samples and construct a sparse model to classify the pixels in remote sensing image through adopting the error residual for sparse representation [9, 39]. The sparse vector representing the atoms for the test spectral pixels can be recovered by solving an optimization problem [40]. The classes of the test pixels can then be determined by the characteristics of the recovered sparse vector.

The remainder of this study is organized as follows. In Section 3, the sparse-based representation classification method is introduced. In Section 4, the results of our experiments and the analyses are described, and the effectiveness of our proposed method is demonstrated. Section 5 summarizes this work and draws the future work.

2 Related Work

Toward the classification, various classification approaches have been developed in order to improve the accuracy of the classification, including Independent Component Analysis [30], Artificial Neural Networks [15], Back Propagation Neural Network (BPNN) [4, 16, 45], Hierarchical Hybrid Fuzzy-Neural Network [37], K-Nearest Neighbor [43], likelihood classifier [31], Support Vector Machine (SVM) [3, 6, 25], Classification and Regression Trees (CART) [8], K-means [23, 34] and decision tree classification [20]. Giacinto et al. [14] proposed an approach to the automatic design of effective neural network ensembles, to select the subset formed by the most error-independent nets. Conventional cluster technique such as K-means [23, 34] has been used for image segmentation over years. Luo et al. [23] proposed a spatial constrained K-means approach to solve the image segmentation problem. Back-propagation neural network [16] algorithm, which is a gradient-based method, was explored for classification of multispectral image data. A variation of the SVM-based algorithms [41] put forward a set of tools for structured classification, and generalized the traditional non-structured classification approaches.

However, the above traditional classifiers are inadequate for HSR imagery [17]. In this context, the features [1, 11, 18, 19, 29, 32, 35] were used to enhance the spectral information and raise the classification accuracy. Ouma and Tateishi [29] presented a pre-classification filtering method based on unsupervised multiresolution non-linear image filtering that combines spectral and textural image characteristics. The local texture characteristics were extracted via wavelet decomposition. Huang et al. [18] proposed some statistical measures to extract some structural features and used different classifiers including maximum likelihood classifier, BPNN, probability neural network based on expectation–maximization training, and SVM to process the hybrid spectral-structural features after the steps of spatial feature extraction and dimension reduction. Pingel et al. [32] developed the Morphological Filter algorithm to be competitive with other ground filtering algorithms for LIDAR and established a baseline performance for a progressive morphological filter implemented in its simplest form.

Researchers proposed to exploit spatial information for complementing the spectral feature space and enhancing separability of the spectrally similar classes [5, 11, 42]. Dópido et al. [11] developed a semisupervised self-learning framework in which the machine learning algorithm itself selects the most useful and informative unlabeled samples for hyperspectral image classification. However, this method was dependent on the assumption that the pixels with similar spectral signature belong to the same class. This might be possible for hyperspectral images, but not for multispectral images, since they contain many spectral ambiguities (e.g., roofs and roads, water and shadow). Bruzzone et al. [5] proposed a pixel-based system, which was aimed at obtaining accurate and reliable maps both by preserving the geometrical details in the images and by properly considering the spatial-context information, for the supervised classification of high spatial resolution images. Tuia et al. [42] presented a classification method for very high resolution images by exploiting efficient multisource information, both spectral and spatial through the combination of SVMs and composite kernels. Fauvel et al. [13] used kernel methods which deal with the joint use of the spatial and the spectral information through a support vector machine formulation.

Moreover, this is meaningful for classification of land cover, but not sufficient for applications of urban mapping, since the impervious surfaces need to be elaborated into more detailed objects (e.g., tree, residential area, and water). Therefore, it is desired to explore more effective algorithms, such as sparse representation and compressive sensing. Sparse representation has been an extremely powerful tool in many classical signal processing applications.

For sparse representation, Chen and Donoho proposed the so-called Basis Pursuit (BP) algorithm [7]. BP is a principle for decomposing a signal into an optimal super position of dictionary elements, and optimal means to have the smallest l ¹ norm of coefficients among all such decompositions. Mallat and Zhang [24] used an over-complete redundant dictionary for signal representation. They gave rise to the Matching Pursuit (MP) algorithm for the sparse reconstruction, and pointed out that the stronger a sparse signal is, the more accurate the reconstruction will be. The MP algorithm is a greedy algorithm, but is different from the BP algorithm. MP is a local optimization algorithm, of which the final result may not converge and may not necessarily to find the global optimal solution. Differently, Tropp and Gilbert presented the Orthogonal Matching Pursuit (OMP) algorithm [38] to deal with the convergence problem to obtain the most matching signal. In optimization, OMP selects an atom set to conduct the orthogonal optimization Gram-Schmidt in each iteration. In OMP, fewer samples are required and less iterative times are needed to achieve the optimal result compared with MP. Olshausen [28] pointed out that each image has a sparse nature.

Whereafter, the theory of the sparse theory developed rapidly. It has been adopted and employed effectively in the field of image processing [2, 12]. Furthermore, a better performance of sparse optimization algorithm [33] was proposed. Donoho and Candes presented the concept of compressive sensing [10] based on the sparse theory to further develop the sparse signal representation theory.

In very recent years, sparse representation has been further studied in literature [21, 22, 26, 42, 44, 45]. A nonlocal weighted joint sparse representation classification method [46] was proposed to improve the remote sensing image classification result, with different weights, for different neighboring pixels around the central test pixel, and the simultaneous orthogonal matching pursuit technique. Moody et al. [26] presented a technical method of land-cover unsupervised classification in multispectral satellite imagery, using sparse representations in learned dictionaries: clustering on sparse approximations and applying a Hebbian learning rule to build multispectral, multi-resolution dictionaries. In [22], Zhang et al. proposed a hyperspectral image anomaly detection approach using background joint sparse representation, which adaptively selects the most representative background bases for the local region. Zhang et al. [21] put forward a superpixel-level sparse representation classification resolution with multitask learning for hyperspectral imagery. Their proposed algorithm exploited the class-level sparsity prior for multiple-feature fusion, and the correlation and distinctiveness of pixels in a spatial local region. Yu et al. [44] proposed a remote sensing image classification method based on sparse component analysis, whose classification result is more reliable and more accurate.

3 Image Classification Model

In this paper, we focus on classification of a given high-resolution image using DigitalGlobe’s WorldView-2 satellite imagery. The main contribution of this paper is to develop an efficient solution for image classification by using nearest neighbor joint sparse linear combination to build the feature dictionary and applying pursuit algorithm joint sparse representation for image reconstruction. In this section, we mainly introduce the key components of our proposed method: the feature dictionary construction, sparse representation and image reconstruction. The idea of constructing the dictionary is to find a best matrix to represent all data vectors through extracting features directly from the data itself by nearest neighbor. We select randomly the training data set to construct the feature dictionary according to their classes by a sparse linear combination. So we describe the algorithm used to solve for the sparse representation. In our method, the sparse coefficients of test samples are divided into several groups, corresponding to the dictionary components representing specific classes. The test samples of image are represented by the sparse representation. We then discuss how to determine the class of the test pixel. The proposed classification model is shown in Fig. 1. The proposed classification model mainly consists of three steps: (1) feature dictionary construction, (2) sparse representation, (3) classification decision.

3.1 Sparse representation model

Let f be a pixel observation from an input signal with l − dimension for classification. In the sparse representation model, test spectral pixels, which lie approximately in several subspaces, are approximately represented by a few training examples. Suppose we have T distinct classes, and any one training sample for each class have n training data. This training sample can be trained to k dictionary elements. And test samples can be modeled to the T subspaces according to the T classes from the dictionary D. If the pixel f belongs to the ith class, we can represent f through these training data as a linear combination for the ith class. Thus, the test pixel f can be expressed as

$$ f=D\alpha =\left[\begin{array}{ccc}\hfill {d}_1^i\hfill & \hfill \cdots {d}_j^i\cdots \hfill & \hfill {d}_n^i\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\alpha}_1^i\hfill \\ {}\hfill \vdots \hfill \\ {}\hfill {\alpha}_n^i\hfill \end{array}\right]={d}_1^i{\alpha}_1^i+{d}_2^i{\alpha}_2^i+\cdots +{d}_n^i{\alpha}_n^i,\ 1\le i\le T,\ 1\le j\le n, $$

(1)

where D = {d ⁱ_j } ^n,T_{j = 1,i = 1} is a feature dictionary which totally has n training data from the input sample of ith class and α ⁱ is a sparse vector. The coefficients of the sparse representations α can be decomposed to T pieces, each α ⁱ is a sparse vector which has only a few nonzero entries. Therefore, the sparse representation of the test pixel f can also be expressed as a linear combination of only the K dictionary atoms α _k (k = 1, . . . , K) which is a vector with K (K = ‖α‖₀) nonzero entries. Thus, f can be written as

$$ f=D\alpha =\left[\begin{array}{ccc}\hfill {d}_1^i\hfill & \hfill \cdots \hfill & \hfill {d}_K^i\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\alpha}_1^i\hfill \\ {}\hfill \vdots \hfill \\ {}\hfill {\alpha}_K^i\hfill \end{array}\right]={d}_1^i{\alpha}_1^i+{d}_2^i{\alpha}_2^i+\cdots +{d}_K^i{\alpha}_K^i,\ 1\le i\le T $$

(2)

where K denotes the number of nonzero elements in the vector α. Next, we will train a dictionary from a set of input samples. And we also will introduce how to obtain the sparse vector α and how to classify test samples from the sparse vector α.

3.2 Feature space construction

We consider a method for constructing the dictionary that produces sparse representations for the training examples. For sparse representation, it is a procedure of computing the representation coefficients based on the given examples and dictionary. Here, we will construct the feature dictionary from the input examples. In the proposed model, assume that the pixels of spectral features belonging to the same class approximately lie in the same subspace. The construction strategy of the feature dictionary is to model the best centers based on the training examples to express the most distinct characteristics of the presented objects.

Given a remote sensing image with Q channels and N × M pixels as an input signal be such a set F = {f ^l_i,j } ^Q,N,M_{l = 1,i = 1,j = 1} (1 ≤ l ≤ Q, 1 ≤ i ≤ N, 1 ≤ j ≤ M), where l = 1, 2, ⋯, Q, and N, M is the number of rows and columns respectively. Suppose there be T distinct classes contained in the image in accordance with different plants or objects, and any one class has n training data. We select T types of representative samples from the training dataset, and input them into a sample set, S = (s ₁, ⋯, s _i, ⋯, s _T) (1 ≤ i ≤ T), where s _i is a subset corresponding to the ith class. It contains n data points [x ⁱ₁ , x ⁱ₂ , ⋯, x ⁱ_n ] with l bands, where x ⁱ is a data point in the subset s _i. Then, we construct the feature vector D = [d ₁, d ₂, ⋯, d _K], which can be viewed as a dictionary including a total of K (K < < NM) elements, where D ∈ ℝ ^Q × K. In addition, associated with this feature matrix, we have a class index table W = [I ₁, ⋯, I _i, ⋯ I _K], where 1 ≤ I _i ≤ T, and I _i records the class label of the feature pixel i, i = 1, 2, ⋯, K, that is, I _i indicates the class which the dictionary element d _i belongs to. For the given training set of image patches, each is reshaped as a two-dimensional vector. For better description, this image is rewritten as F = {f ^l_j } ^Q,NM_{l = 1,j = 1} , (1 ≤ l ≤ Q, 1 ≤ j ≤ NM), which can be represented as a sparse linear combination of these feature vectors. The representation of F may be approximate, that is F ≈ Dα, which satisfies the constrain ‖F − Dα‖₂ ≤ ε . The vector α involves the representation coefficients of the image F. We can write f _j = Dα _j, where α _j = e _j is a vector from the trivial basis, with all zero elements except the one in the pth position. The index p is selected such that

$$ \forall p\ne q{\left\Vert {f}_i-D{\alpha}_p\right\Vert}_2^2\le {\left\Vert {f}_i-D{\alpha}_q\right\Vert}_2^2. $$

(3)

For the sparse representation of the data set F, the minimization of error is computed in order to search the best possible dictionary D with K items. And it could alternatively be met by considering

$$ <D,W,\alpha >= \arg \underset{\alpha }{ \min }{\left\Vert F-D\alpha \right\Vert}_2^2\ s.t\ \forall j,\ {\alpha}_j={e}_j. $$

(4)

Algorithm 1 (Training a dictionary).
Task: Find a best matrix to represent all data vectors for constructing a dictionary by nearest neighbor. Input: A remote sensing image with Q channels and N × M pixels F = {f ^l_i,j } ^Q,N,M_{l = 1,i = 1,j = 1} , 1 ≤ l ≤ Q, 1 ≤ i ≤ N, 1 ≤ j ≤ M. Initialization: Randomly select k (k = K/T) data points from a sample subset s _i as the initial representatives φ ⁱ = [φ ⁱ₁ , φ ⁱ₂ , ⋯, φ ⁱ_k ], set i = 1 and repeat it until i reaches T. 1: Compute the k best centers from these n data points for each training sample s _i, and w ⁱ_j records the index of the best possible point for each data sample, w ⁱ_j = {p\| ∀ p ≠ q, ‖x ⁱ_j − φ ⁱ_p ‖₂ < ‖x ⁱ_j − φ ⁱ_q ‖₂}, 1 ≤ p, q ≤ k, 1 ≤ j ≤ n, 1 ≤ i ≤ T. 2: The representatives [ψ ⁱ₁ , ψ ⁱ₂ , ⋯, ψ ⁱ_k ] is obtained by the following formula: ψ ⁱ = {x ⁱ_p \| ∀ p ≠ q, ‖x ⁱ_p − x ⁱ_m ‖₂ < ‖x ⁱ_q − x ⁱ_m ‖₂}, x ⁱ_p , x ⁱ_q , x ⁱ_m ∈ φ ⁱ. 3: Update φ ⁱ by φ ⁱ = ψ ⁱ. 4: Go back to step 2. Repeat it until ψ ⁱ is equal to φ ⁱ. Output: A dictionary and a class vector.

3.3 Reconstruction and classification

We describe the way we use the sparse vector α for a test sample f _j (1 ≤ j ≤ NM) when reconstruct and classify it. At the moment, the dictionary D is obtained and known. Every image patch f _j could be represented sparsely over this dictionary. And the representation α _j satisfying Dα _j = f _j is obtained by solving the following optimization problem:

$$ {\widehat{\alpha}}_j= \arg\ \min {\left\Vert {\alpha}_j\right\Vert}_0\ s.t\ D{\alpha}_j={f}_j. $$

(5)

In order to solve the problem of searching the sparsest representation of f _j, the equality constraint in (5) can be formulated to an inequality one as

$$ {\widehat{\alpha}}_j= \arg\ \min {\left\Vert {\alpha}_j\right\Vert}_0\ s.t\ {\left\Vert D{\alpha}_j-{f}_j\right\Vert}_2\le \varepsilon, $$

(6)

where ε is the error. The above problem can also be considered as minimizing the approximation error within a certain sparsity level. We can compute the error residual as r _j = f _j − Dα _j. Notice that the above optimization problem can be replaced by

$$ {\widehat{\alpha}}_j= \arg \kern0.2em \min {\left\Vert D{\alpha}_j-{f}_j\right\Vert}_2\ s.t\ {\left\Vert {\alpha}_j\right\Vert}_0\le L, $$

(7)

where L express the sparsity level for the approximation error. Compute the residual for the ith class, that is, the error between the test sample f _j and the reconstruction from training samples in the ith class. The class of f _j can be determined by the recovered sparse vector $ {\widehat{\alpha}}_j $ as

$$ c\left({f}_j\right)= \arg\ \underset{i}{ \max}\left|{D}^i{\widehat{\alpha}}_j^i\right|,\kern0.1em s.t\kern0.1em \min \parallel {f}_j-{D}^i{\widehat{\alpha}}_j^i{\parallel}_2,\forall i,\kern0.1em 1\le i\le T, $$

(8)

where $ {\widehat{\alpha}}_j^i $ denotes the portion of the recovered sparse coefficients corresponding to the training samples in the ith class.

Eventually, we can obtain the final classification result $ \widehat{F} $ for the image F as (9).

$$ \widehat{F}=\left\{{f}_j\left|{f}_j\right.= colo{r}_{W\left({j}_0\right)},\forall c\left({f}_j\right)\in W\left({j}_0\right),1\le W\left({j}_0\right)\le T,1\le j\le NM\right\} $$

(9)

Algorithm 2 (Reconstruction and Classification).
Task: Construct the image and determine the class of the test pixels. Input: A normalized feature dictionary D, class vector W and sparsity level L. Initialization: Set j = 1 and repeat it until j reaches NM. 1: Choose the index j ₀, 1 ≤ j ₀ ≤ K, such that $ \left\|{\varphi_{j_0}}^T{r}_{j_0}\right\| $ is maximized. We say that j ₀ is the index of the maximum value of the product of the residual r _j and the atom φ _j, j = 1, 2, ⋯, NM, from the class index table W, i.e. j ₀ = arg max_{j = 1 … NM}\|〈r _j, φ _j〉\|. 2: Update the index set by I _j = I _j − 1 ∪ {j ₀} and the size incremental matrix by $ {A}_j={A}_{j-1}\cup \left\{{\varphi}_{j_0}\right\} $, when the product of the residual r _j and the atom φ _j is the maximum value. Then, remove the current column vector $ {\varphi}_{j_0} $ from the dictionary D, denoted by $ D=D\backslash \left\{{\varphi}_{j_0}\right\} $. 3: In this step, first, decompose A _j by A _j = UZV ^T. Then we can obtain the orthogonal vectors U and V ^T, and the singular value vector Z whose diagonal elements are called singular value. In addition to the diagonal elements, the value of its elements is zero. Therefore, we calculate the sparse coefficient by $ {\alpha}_j=V\times {\scriptscriptstyle \frac{1}{Z}}\times {U}^T $ and recompute α _j by α _j = α _j × f _j. 4: The residual is updated by the formula as $ {r}_j={f}_j-{D}_{j_0}{\alpha}_j $, s. t ‖α _j‖₀ ≤ L. 5: The class of f _j can be determined by the recovered sparse vector $ {\widehat{\alpha}}_j $ as $ c\left({f}_j\right)= \arg \underset{i}{ \max}\left\|{D}^i{\widehat{\alpha}}_j^i\right\| $, s. t $ \min {\left\Vert {f}_j-{D}^i{\widehat{\alpha}}_j^i\right\Vert}_2 $, ∀ i, 1 ≤ i ≤ T. 6: The final classification result $ \widehat{F} $ for the image F is obtained as $ \widehat{F}=\left\{{f}_j\Big\|{f}_j= colo{r}_{W\left({j}_0\right)},\forall c\left({f}_j\right)\in W\left({j}_0\right),1\le W\left({j}_0\right)\le T,1\le j\le NM\right\} $. 7: j = j + 1 Output: The coloured classification image .

4 Experimental results and analysis

In this paper, we focus on classification using DigitalGlobe’s WorldView-2 satellite imagery. The sensor provides the highest resolution commercially available multispectral data and has eight multispectral bands: four standard bands (red, green, blue, and near-infrared 1) and four new bands. Ordered from shorter to longer wavelength, the list of bands is coastal blue, blue, green, yellow, red, red edge, near-infrared 1 (NIR1), and near-infrared 2 (NIR2).

In this section, two data sets are applied for the experiment. We adopt just three bands (Red, Green, and Blue) shown in Fig. 2. We illustrate the effectiveness of the proposed classification method by comparing it with other traditional classifiers, which can be divided into two categories: supervised methods and unsupervised methods according to the previous works of researchers in this field. The first category is supervised method which focuses on learning feature representation and whose training samples with identity labels are required, for example, BPNN, SVM and CART. The second category is unsupervised method, which mainly focuses on feature extraction, such as K-means. The experiments aim to compare the performance of the proposed method with the other four classifiers. Thus, the average accuracy (AA), overall accuracy (OA), Kappa (Ka), producer’s accuracy (PA), and user’s accuracy (UA) are used as the accuracy statistical parameters. For each image, we quantitatively and visually compare and evaluate the classification results of these methods.

4.1 Experiment I for the image 1

The first dataset in our experiments was obtained from DigitalGlobe, which was acquired on 17 May 2010, as shown in Fig. 2a. It contains eight typical classes, including the bare land, residential area, grass, tree, and four different crops, which are labeled as: 1-bare land, 2-residential area, 3-grass, 4-tree, 5-crop1, 6-crop2, 7-crop3, and 8-crop4, respectively. Please refer to Fig. 3a. We randomly select around 11 % samples with ground truth class labels to train the classifiers, and use the rest as testing samples for evaluation. The number of training and testing samples for each class is shown in Table 1.

Table 1 The training and testing sets for each class (labeled as 1-bare land, 2-residential area, 3-grass, 4-tree, 5-crop1, 6-crop2, 7-crop3, and 8-crop4 in Fig. 3a)

Full size table

In order to verify the superiority of our proposed method, we make classification to this image by employing the proposed method, BPNN, SVM, CART and K-means. Figure 3 shows the classification results of the five classifiers. Thereafter, we analyze and compare their experimental results. The classification maps are shown in Fig. 3b-f. It is clear that in the K-means map, shown in Fig. 3f, all kinds of objects are grievously misclassified; as for the BPNN method, shown in Fig. 3e, the objects illustrated with highlight colors, such as the bare lands, residential areas, and crops2, are very easy to recognize, whereas the green-colored objects, such as the crops1, crops2, crops3, crops4, and especially the grasses and trees, are seriously misclassified; also, many crop4 pixels are wrongly labeled as grasses in the classification map; for the SVM and CART classification result in Fig. 3c, d, it can be clearly seen that there is severe misclassification among these classes. Relatively, through the comparison of our proposed method with the other four methods, our proposed method has achieved a great improvement, namely, the better distinction of objects, particularly the bare lands, residential areas, grasses, trees, crops3, and crops4. Only crops1 and crops2 have been classified with confusions. The result of the proposed method is shown in Fig. 3b.

The classification accuracies for each class using different classifiers are provided in Table 2. In this Table, AA, OA, Ka, PA, and UA are the statistics of the confusion matrix. Table 3 lists the confusion matrix of the proposed method. AA is the mean of the eight class accuracies. OA is computed as the ratio between the correctly classified testing samples and all the testing samples. Ka coefficient is a quantitative analysis for the classification precision and degree of agreement between the classification map and the ground truth based on the confusion matrix.

Table 2 The classification accuracies for different methods in Fig. 2a

Full size table

Table 3 The confusion matrix for the proposed method in Fig. 2a

Full size table

Combining the classification maps in Fig. 3b-f with the accuracy statistics in Tables 2 and 3, we can see that according to the ground truth, some pixels of crop2 are wrongly labeled as crop1, while some pixels of crop1 are misclassified as crop2 and residential area in the classification map of the proposed method; crop1 and crop4 cannot be identified in the BPNN map; there are several colors for each object in the SVM, CART and K-means map, for example, some pixels of the bare land are misclassified as tree and crop3 in the SVM map, and some pixels of tree are wrongly labeled as residential area, crop1 and crop2 in the CART map.

From Table 2, we can observe that the proposed method has achieved the highest PA and UA, and performed better in AA, OA, and Kappa coefficient. The best classification results for different objects are achieved by the proposed method. Moreover, we can see that the proposed method performs well in many aspects. The highest accuracies are achieved for the proposed method. The OA values of the proposed method, BPNN, SVM, CART and K-means are 91.22 %, 67.17 %, 20.55 %, 10 % and 51.27 %. The Ka values are 0.8984, 0.6038, 0.0785, 0.0286 and 0.3929. The AA values are 90.28 %, 57.41 %, 10.53 %, 10 % and 47.88 %. Compared with the BPNN, SVM, CART and K-means classifier, the PA values of the proposed method for each class are increased by at least 8 %, 51.5 %, 55.75 %, and 16 % respectively, and the UA values for each class are increased by at least 3.35 %, 65 %, 14.689 %, and 11.49 % respectively. Besides, the PA values of the proposed method for all classes are increased averagely by 32.88 %, 75.75 %, 80.28 %, and 42.41 % respectively, and the UA values for all classes are increased averagely by 33.79 %, 73.34 %, 75.99 %, and 48.47 %. The best classification results of different objects are achieved for the proposed method. Table 3 shows the confusion matrix of Fig. 2a.

4.2 Experiment II for the image 2

In order to verify the stability of the proposed classification method, we select another HSR image shown of WorldView-2 in Fig. 2b. We consider the WorldView-2 true-color image with 1.8-m spatial resolution of a suburban area, which has eight bands. We adopt just three bands (Red, Green, and Blue). This image contains six main kinds of objects: 1-lake, 2-tree, 3-short bush, 4-road, 5-grass, and 6-residential area, as shown in Fig. 4a. The training and testing samples are chosen from the reference data.

We also apply the proposed method, BPNN, SVM, CART and K-means classifier to classify the image 2. The classification maps are shown in Fig. 4b-f. By comparing the classification map in Fig. 4b-f with the original image in Fig. 4a, we can see that some pixels of residential area and grass are misclassified as road in Fig. 4b; some pixels of lake, short bush and residential area are labeled as tree, some pixels of road are labeled as residential area, and some pixels of residential area are labeled as road in Fig. 4c; some pixels of lake and grass are labeled as tree, some pixels of lake and road are labeled as residential area, and some pixels of residential area are labeled as road in Fig. 4d; the obvious error is the misclassification of grass as residential area, some pixels of road are misclassified as residential area, and the lake and bush are both misclassified as tree in Fig. 4e. In Fig. 4f, the tree and residential area are obviously muddled, some pixels of bush are misclassified as the grass, some pixels of grass are misclassified as the lake, and the tree is seriously misclassified as the lake.

From the results in Tables 4 and 5, we can see that the highest accuracies are also achieved for the proposed method. The OA values of the proposed method, BPNN, SVM, CART and K-means are 92.5 %, 83.86 %, 65.71 %, 65.21 % and 66 % respectively. The Ka values are 0.9082, 0.8030, 0.5896, 0.5853 and 0.5872 respectively. The AA values are 91.88 %, 82.89 %, 64.17 %, 72.17 % and 64.08 % respectively. Compared with the BPNN, SVM, CART and K-means classifiers, the PA values of the proposed method for each class are increased by at least 1.5 %, 3.5 %, 11 % and 11.5 %, respectively. Moreover, compared with the BPNN, SVM, CART and K-means, the PA values of the proposed method for all objects are increased averagely by 8.99 %, 23.71 %, 19.71 % and 27.79 % respectively, and the UA values for all classes are increased averagely by 9.95 %, 18.42 %, 20.52 % and 33.06 % respectively. The best classification results of different objects are also obtained for the proposed method. Table 5 shows the confusion matrix of Fig. 2b.

Table 4 The classification accuracies for different methods in Fig. 2b

Full size table

Table 5 The confusion matrix for the proposed method in Fig. 2b

Full size table

Finally, we compare the different methods in terms of computational cost, which is the CPU time computed by Matlab function and used to evaluate those methods. As can be seen from Table 6 and Fig. 5, the proposed method takes about 112 s in the first dataset and about 48 s in the second dataset to train the dictionary and make a decision; the computational cost of the proposed method is almost the same as SVM, more than the method of BPNN and CART, but far less than K-means.

Table 6 Computational cost (CPU time) of the different methods in Fig. 5

Full size table

5 Conclusion and Future Work

In this paper, we tackle the problem of the classification of the HSR remote sensing image using the proposed method based on sparse representation by representing the test samples through a dictionary. The dictionary is obtained by training samples according to their classes for a sparse linear combination. We discuss the specific idea of constructing the dictionary. That is how to compute a best matrix to represent all data vectors by nearest neighbor. We also describe the algorithm used to solve for the sparse representation. We then discuss how to construct the image and how to determine the classes of the test pixels. The experimental results indicate that our method has performed better and achieved higher accuracies in the verified four real remote sensing images.

In the future work, we plan to explore more properties as feature and work toward combining the proposed approach with spectral and spatial features, both at the feature and decision levels, to improve classification accuracy. And we also need to speed up the proposed method. The end goal of this work is to detect yearly and seasonal changes in vegetation cover. Additionally, we also explore how to construct dictionaries to expand to images from the same area in different seasons, and make use of them for change detection.

References

Aguera F, Aguilar JF, Aguilar AM (2008) Using texture analysis to improve perpixel classification of very high resolution images for mapping plastic greenhouses. ISPRS J Photogramm Remote Sens 63:635–646
Article Google Scholar
Aharon M, Elad M, Bruckstein A (2006) The K-SVD: an algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans Sign Process 54:4311–4322
Article Google Scholar
Benediktsson JA, Palmason JA, Sveinsson JR (2005) Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans Geosci Remote Sens 43:480–491
Article Google Scholar
Bischof H, Schneider W, Pinz AJ (1992) Multispectral classification of Landsat-images using neural networks. IEEE Trans Geosci Remote Sens 30:482–490
Article Google Scholar
Bruzzone L, Carlin L (2006) A multilevel context-based system for classification of very high spatial resolution images. IEEE Trans Geosci Remote Sens 44:2587–2600
Article Google Scholar
Bruzzone L, Chi M, Marconcini M (2006) A novel transductive SVM for the semisupervised classification of remote sensing images. IEEE Trans Geosci Remote Sens 44:3363–3373
Article Google Scholar
Chen S, Donoho D (1994) Basis pursuit. In: IEEE Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers 1, 41–44
Chou PA (1991) Optimal partitioning for classification and regression trees. IEEE Trans Pattern Anal Mach Intell 4:340–354
Article Google Scholar
Cotter SF, Rao BD, Engan K, Kreutz-Delgado K (2005) Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Trans Signal Process 53:2477–2488
Article MathSciNet Google Scholar
Donoho DL (2006) Compressed sensing. IEEE Trans Inf Theory 52:1289–1306
Article MathSciNet MATH Google Scholar
Dópido I, Li J, Marpu PR, Plaza A (2013) Semisupervised self-learning for hyperspectral image classification. IEEE Trans Geosci Remote Sens 51:4032–4044
Article Google Scholar
Elad M, Aharon M (2006) Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process 15:3736–3745
Article MathSciNet Google Scholar
Fauvel M, Chanussot J, Benediktsson JA (2012) A spatial-spectral kernel-based approach for the classification of remote-sensing images. Pattern Recogn 45:381–392
Article Google Scholar
Giacinto G, Roli F (2001) Design of effective neural network ensembles for image classification processes. Image Vis Comput 19:699–707
Article Google Scholar
Goel PK, Prasher SO, Patel RM, Landry JA, Bonnell RB, Viau AA (2003) Classification of hyperspectral data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn. Comput Electron Agric 39:67–93
Article Google Scholar
Heermann PD, Khazenie N (1992) Classification of multispectral remote sensing data using a back-propagation neural network. IEEE Trans Geosci Remote Sens 30:81–88
Article Google Scholar
Huang X, Zhang L (2013) An SVM ensemble approach combining spectral, structural, and semantic features for the classification of high-resolution remotely sensed imagery. Geosci Remote Sens IEEE Trans 51(1):257–272
Article Google Scholar
Huang X, Zhang L, Li P (2007) Classification and extraction of spatial features in urban areas using high-resolution multispectral imagery. IEEE Geosci Remote Sens Lett 4:260–264
Article Google Scholar
Inglada J (2007) Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features. ISPRS J Photogramm Remote Sens 62:236–248
Article Google Scholar
Jiang LH, Wang WS, Yang XR, Xie NF, Cheng YP (2011) Classification methods of remote sensing image based on decision tree technologies. Comput Comput Technol Agric 344:353–358
Google Scholar
Li J, Zhang H, Zhang L (2015) Efficient superpixel-oriented multi-task joint sparse representation classification for hyperspectral imagery. IEEE Trans Geosci Remote Sens 53(10): 5338–5351
Li J, Zhang H, Zhang L, Ma L (2015) Hyperspectral anomaly detection by the use of background joint sparse representation. IEEE J Sel Top Appl Earth Obs Remote Sens 8(6):2523–2533
Luo M, Ma YF, Zhang HJ (1992) A spatial constrained k-means approach to image segmentation. In: Information, Communications and Signal Processing, 2003 and Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint Conference of the Fourth International Conference on, 2, 738–742
Mallat SG, Zhang Z (1993) Matching pursuits with time-frequency dictionaries. IEEE Trans Signal Process 41:3397–3415
Article MATH Google Scholar
Melgani F, Bruzzone L (2004) Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans Geosci Remote Sens 42:1778–1790
Article Google Scholar
Moody DI, Brumby SP, Rowland JC, Altmann GL (2014) Land cover classification in multispectral imagery using clustering of sparse approximations over learned feature dictionaries. J Appl Remote Sens 8(1):084793–084793
Moser G, Serpico SB, Benediktsson JA (2013) Land-cover mapping by Markov modeling of spatial-contextual information in veryhigh-resolution remote sensing images. Proc IEEE 101(3):631–651
Article Google Scholar
Olshausen BA (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607–609
Article Google Scholar
Ouma OY, Tateishi R (2008) Urban-trees extraction from QuickBird imagery using multiscale spectex-filtering and non-parametric classification. ISPRS J Photogramm Remote Sens 63:333–351
Article Google Scholar
Palmason JA, Benediktsson JA, Sveinsson JR, Chanussot J (2005) Classification of hyperspectral data from urban areas using morphological preprocessing and independent component analysis. In: Proc. 2005 Int. Conf. Geoscience and Remote Sensing Symp. (IGARSS), pp 25–29
Paola JD, Schowengerdt RA (1995) A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification. IEEE Trans Geosci Remote Sens 33(4):981–996
Article Google Scholar
Pingel JT, Clarke CK, MaBride AW (2013) An improved simple morphological filter for the terrain classification of airborne LIDAR data. ISPRS J Photogramm Remote Sens 77:21–30
Article Google Scholar
Rakotomamonjy A (2011) Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms. Signal Process 91:1505–1526
Article MATH Google Scholar
Ray S, Turi RH (1999) Determination of number of clusters in k-means clustering and application in colour image segmentation. In: Proceedings of the 4th international conference on advances in pattern recognition and digital techniques, pp 137–143
Reis S, Tasdemir K (2011) Identification of hazelnut fields using spectral and Gabor textural features. ISPRS J Photogramm Remote Sens 66:652–661
Article Google Scholar
Rubinstein R, Bruckstein AM, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98:1045–1057
Article Google Scholar
Stoeva S, Nikov A (2000) A fuzzy backpropagation algorithm. Fuzzy Sets Syst 112(1):27–39
Article MathSciNet MATH Google Scholar
Tropp J, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. Inf Theory IEEE Trans 53(12):4655–4666
Article MathSciNet MATH Google Scholar
Tropp JA, Gilbert AC, Strauss MJ (2006) Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Process Spec Issue Sparse Approximations Signal Image Process 86:572–588
MATH Google Scholar
Tropp JA, Wright SJ (2010) Computational methods for sparse solution of linear inverse problems. Proc IEEE 98:948–958
Article Google Scholar
Tuia D, Jordi MM, Kanevski M, Camps G (2011) Structured output SVM for remote sensing image classification. J Sign Process Syst 65:301–310
Article Google Scholar
Tuia D, Ratle F, Pozdnoukhov A, Camps-Valls G (2010) Multisource composite kernels for urban-image classification. IEEE Geosci Remote Sens Lett 7:88–92
Article Google Scholar
Yang J, Su M, Yu P (2010) A novel K-nearest neighbor classifier based on adaptive metric formed by features extracted by nonparametric feature extraction mode. Int J Adv Inf Technol 4(2):89–103
Google Scholar
Yu X, Cao T, Yang C, Chen H, Wu S (2009) Remote sensing image classification based on sparse component analysis. Prog Geophys 24:2274–2279
Google Scholar
Yu XC, Dai S, Hu D, Jiang QY (2011) HHFNN based on lasso function and its application remote sensing image classification. Chin J Geophys 54(6):1672–1678
Google Scholar
Zhang H, Li J (2014) A nonlocal weighted joint sparse representation classification method for hyperspectral imagery. IEEE J Sel Top Appl Earth Obs Remote Sens 7:2056–2065
Article Google Scholar

Download references

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61163042 61503235 and 41272359), and funded by Key Discipline (Cartography and Geographic Information System of Hainan Normal University).

Author information

Authors and Affiliations

College of Information Science and Technology, Hainan University, No. 58 Renming Road, Haikou, 570228, China
Shulei Wu & Yong Bai
College of Information Science and Technology, Hainan Normal University, No. 99 South Longkun Road, Haikou, 571158, China
Shulei Wu & Huandong Chen
College of Computer Science and Technology, Shanghai University of Electric Power, No. 2588 Changyang Road, Yangpu District, Shanghai, 200090, China
Guokang Zhu

Authors

Shulei Wu
View author publications
You can also search for this author in PubMed Google Scholar
Huandong Chen
View author publications
You can also search for this author in PubMed Google Scholar
Yong Bai
View author publications
You can also search for this author in PubMed Google Scholar
Guokang Zhu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yong Bai.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wu, S., Chen, H., Bai, Y. et al. A remote sensing image classification method based on sparse representation. Multimed Tools Appl 75, 12137–12154 (2016). https://doi.org/10.1007/s11042-016-3320-7

Download citation

Received: 21 July 2015
Revised: 19 January 2016
Accepted: 27 January 2016
Published: 20 February 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s11042-016-3320-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A remote sensing image classification method based on sparse representation

Abstract

Similar content being viewed by others

Classification Modeling of Multi-Featured Remote Sensing Images Based on Sparse Representation

Hyperspectral Image Classification Using a New Dictionary Learning Approach with Structured Sparse Representation

Hyperspectral image classification based on spatial and spectral features and sparse representation

1 Introduction

2 Related Work