1 Introduction

Hyperspectral sensors capture more than one hundred spectral bands which provide rich spectral information regarding the physical nature of different materials. For instance, the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) system can capture 224 spectral channels with a spectral resolution of around 10 nm, covering the wavelength range from 0.4 to 2.5 \(\upmu \)m. The wide spectral coverage and fine resolution of hyperspectral data make it possible to distinguish objects in the image more accurately. However, the high dimensionality of the data also presents challenges to image classification. Specifically, traditional image processing tools for the analysis of gray-level or color images may not be appropriate for hyperspectral images. For instance, the Hughes phenomenon may arise due to the well-known curse of dimensionality: the accuracy of classification algorithms may decrease significantly as the dimensionality of the data increases [1]. In order to make full use of the rich spectral information provided by the high spectral dimension, many different hyperspectral image classification algorithms have been developed in recent years [2].

Initially, many algorithms were designed to classify each pixel of the hyperspectral image based on its spectrum only [2]. These methods are known as pixel-wise methods and can be divided into two categories: spectral feature extraction and spectral classification. Spectral feature extraction aims at reducing the spectral dimension of the data using linear and nonlinear transformations such as principal component analysis (PCA) [3] and independent component analysis (ICA) [4]. In addition to spectral feature extraction, spectral classification methods such as Bayesian estimation techniques [5], neural networks [6], decision trees [7], and genetic algorithms [8] have also been investigated to learn the class distributions in high-dimensional spaces by inferring the nonlinear boundaries between classes in feature space. Among these methods, support vector machines (SVMs) [9] have shown robust classification performance when only a limited number of training samples is available.

In recent years, it has been found that the integration of spectral and spatial information in the image analysis can further improve the classification results. Specifically, a hyperspectral pixel is classified based on both the feature vector of this pixel and feature values extracted from the pixel’s neighborhood. Morphological filters [10, 11] and other types of local filtering approaches [12–15] have been investigated to develop novel spatial feature extraction and classification methods. Zhang et al. [16, 17] investigated several frameworks which aim at combining multiple features to improve the classification accuracy. These methods have been demonstrated to show promising results in terms of classification accuracy. However, local processing techniques such as the recently proposed edge-preserving filtering based method [15] only consider local neighborhoods. Although the local neighborhoods can be defined using different scales of filtering operations, this kind of method cannot make full use of the deep and global spatial correlations among hyperspectral pixels.

Another approach which can make full use of the spatial information is based on image segmentation [18]. Segmentation based classification usually consists of the following two steps: First, the hyperspectral image is segmented into non-overlapping homogeneous regions. Then, the classification result is obtained from the pixel-wise classification, followed by majority voting within the segmented regions (see the sketch below). To make this approach applicable, accurate and automatic hyperspectral image segmentation is required. Different techniques have been successfully applied for hyperspectral image segmentation, such as watershed [19], partitional clustering [20], and hierarchical segmentation [21]. Although these approaches can usually lead to an improvement of classification accuracy, the segmentation algorithm may be time-consuming.
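For concreteness, the majority-voting step of this segmentation based strategy can be written as the following minimal Python sketch. It assumes that a pixel-wise label map and a segmentation map with integer region identifiers are already available; the array and function names are illustrative and not taken from the cited works.

```python
import numpy as np

def majority_vote(pixel_labels, segment_ids):
    """Refine a pixel-wise classification map by majority voting within
    each segmented region (illustrative sketch; labels are non-negative ints)."""
    refined = pixel_labels.copy()
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        # Assign the most frequent class inside the region to all of its pixels
        counts = np.bincount(pixel_labels[mask])
        refined[mask] = counts.argmax()
    return refined
```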

In this paper, a novel spectral–spatial hyperspectral image classification method is introduced based on KNN searching in a novel feature space [22]. The main contributions of the paper are twofold: The first contribution is the extension of KNN searching to the non-local filtering of images, which can make full use of the spatial correlation among adjacent pixels. The second contribution is the extension of the KNN based filtering algorithm to spectral–spatial hyperspectral image classification. Specifically, the KNN based filtering algorithm is used to refine the initial probability maps obtained by a pixel-wise classifier. The resulting classification map is obtained by assigning each pixel the label with the highest probability. This probability optimization based scheme is similar to our previous works which optimize the probabilities by using edge-preserving filtering [15] and extended random walkers [23] (a global optimization method). Compared with these two methods, the major advantage of the proposed KNN method is that it can make full use of the non-local spatial information of the hyperspectral image without needing to solve a global energy optimization problem. In this work, it is shown that such a KNN based non-local filtering scheme is able to improve the classification accuracies effectively. Experiments performed on two real hyperspectral data sets demonstrate the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section 2 describes the proposed KNN based image filtering algorithm. Section 3 introduces the proposed KNN based spectral–spatial classification method in detail. Section 4 gives the results and discussions. Finally, conclusions are given in Sect. 5.

2 KNN Based Non-local Image Filtering

The k-nearest-neighbor (KNN) classifier is one of the simplest and most widely used nonparametric classification methods. Although it has been successfully used for hyperspectral image classification, KNN is usually utilized as a pixel-wise classifier in these studies, which rely heavily on the choice of distance metric and feature space [24–26]. Different from these works, in this paper, KNN is used to search for similar non-local pixels for image filtering rather than to achieve a direct classification of each pixel.

Rather than searching for the nearest neighbors in the pixel value domain, the non-local principle can be implemented by computing the K nearest neighbors in a feature space which includes both the pixel value and the spatial coordinates. Specifically, the feature vector F(i) is defined as follows:

$$\begin{aligned} F(i)=(I(i),\lambda {}\cdot {}l(i),\lambda {}\cdot {}h(i)); \end{aligned}$$
(1)

where I(i) refers to the normalized pixel value, and l(i) and h(i) refer to the normalized longitude and latitude of pixel i, respectively. \(\lambda \) controls the balance between the pixel value and the spatial coordinates in the KNN searching process. In order to perform the KNN search efficiently, the Fast Library for Approximate Nearest Neighbors (FLANN) is adopted to compute the K nearest neighbors in the defined feature space [27].
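As a minimal illustration of how the feature space in (1) can be constructed and searched, the following Python sketch uses scikit-learn's NearestNeighbors as a stand-in for FLANN (the library actually adopted in this work); the helper names and the assignment of l and h to the column and row coordinates are assumptions made for illustration only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors  # stand-in for FLANN

def build_feature_space(I, lam):
    """Build F(i) = (I(i), lam*l(i), lam*h(i)) as in Eq. (1).
    I is the normalized one-band guidance image; l and h are taken here
    as the normalized column and row coordinates (an assumption)."""
    rows, cols = I.shape
    l, h = np.meshgrid(np.linspace(0, 1, cols), np.linspace(0, 1, rows))
    return np.stack([I.ravel(), lam * l.ravel(), lam * h.ravel()], axis=1)

def knn_indices(F, K):
    """Return, for every pixel, the indices of its K nearest neighbors in F."""
    nn = NearestNeighbors(n_neighbors=K).fit(F)
    _, idx = nn.kneighbors(F)
    return idx
```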

Generally, the KNN filtering process involves a guidance image I, an input image P, and an output image O. Both I and P are given beforehand according to the application. As shown in Fig. 1, P is one of the probability maps estimated with support vector machines (SVM), and I is the first principal component of the hyperspectral image. Given I and P, the KNN based non-local filtering method can be defined as follows:

$$\begin{aligned} O(i)=\frac{1}{K}\sum _{j\in \omega _{i}}{P(j)}; \end{aligned}$$
(2)

where \(\omega _{i}\) denotes the set of K nearest neighbors of pixel i found in the feature space defined in (1), and O(i) is the filtering output. As shown in Fig. 1, when \(\lambda =0\), the spatial distances between different pixels are not considered in the filtering operation, and thus, the KNN filtering cannot effectively transfer the spatial structures of I to P. By contrast, by modeling the spatial coordinates and the pixel value in the same feature space, the spatial structures of the guidance image can be used to refine the boundaries in the input image (see Fig. 1d). This property makes it possible to apply the KNN filtering to spectral–spatial hyperspectral image classification.
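Building on the sketch above, the filtering operation in (2) reduces to averaging the input map P over the precomputed neighbor indices; again, this is an illustrative sketch rather than the reference implementation.

```python
def knn_filter(P, idx):
    """KNN filtering as in Eq. (2): each output pixel is the mean of the
    input probability map P over its K nearest neighbors in feature space."""
    O = P.ravel()[idx].mean(axis=1)   # idx has shape (num_pixels, K)
    return O.reshape(P.shape)

# Typical usage (lam and K values are placeholders):
# F = build_feature_space(I, lam=3.0)
# idx = knn_indices(F, K=40)
# O = knn_filter(P, idx)
```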

Fig. 1

An example of KNN filtering. a Input image P, b guidance image I, c filtering result O with \(\lambda =0\), d filtering result O with \(\lambda =3\)

3 Spectral–Spatial Hyperspectral Image Classification with KNN

In this section, the information about spatial structures defined by the KNN filtering algorithm described above is used to improve the classification results for a hyperspectral image. A probability optimization based spectral–spatial classification scheme built on KNN filtering is adopted here. The schematic of the proposed classification method is given in Fig. 2. Specifically, the proposed method consists of the following steps:

Fig. 2

A schematic of the proposed KNN based spectral–spatial hyperspectral image classification method

  1.

    SVM classification: Given a d-dimensional hyperspectral image \({\mathbf {x}}=({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_i)\in {{\mathbb {R}}^{d\times {i}}}\) and \(\tau \) training samples \(T_\tau \equiv \left\{ ({\mathbf {x}}_1,c_1),\ldots ,({\mathbf {x}}_\tau ,c_\tau )\right\} \in ({\mathbb {R}}^{d}\times {{\mathcal {L}}_C})^\tau \), a pixel-wise classification is performed on the hyperspectral image with the pixel-wise SVM classifier [28], where \({\mathcal {L}}_C=\left\{ 1,\ldots ,N\right\} \) is the set of labels and N is the number of classes in the hyperspectral image.

  2.

    KNN filtering: In this step, principal component analysis is first adopted to compute a one-band representation of the hyperspectral image, i.e., I. Here, the first principal component is adopted as the guidance image because it gives an optimal representation of the hyperspectral image in the mean squared sense, and thus, contains most of the salient information in the hyperspectral image (see Fig. 2). Then, based on (2), the proposed KNN filtering is performed on the initial probability map \(P_n\) for each class n, with I serving as the guidance image. Finally, the classification result is obtained by assigning each pixel the label with the highest probability (see the sketch after this list).
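A minimal end-to-end sketch of the two steps is given below. It assumes that the hyperspectral image is stored as a rows × cols × bands NumPy array and reuses the helpers sketched in Sect. 2; the SVM and PCA calls rely on scikit-learn rather than LIBSVM, and all function and variable names are illustrative rather than taken from the reference implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def classify(cube, train_idx, train_labels, K=40, lam=5.0):
    """Sketch of the pipeline: pixel-wise SVM probabilities, KNN filtering
    guided by the first principal component, then per-pixel argmax.
    K and lam defaults follow the values reported in Sect. 4."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)

    # Step 1: pixel-wise SVM with probability outputs
    svm = SVC(kernel='rbf', probability=True).fit(X[train_idx], train_labels)
    probs = svm.predict_proba(X)  # (num_pixels, N); columns follow svm.classes_

    # Guidance image I: first principal component rescaled to [0, 1]
    pc1 = PCA(n_components=1).fit_transform(X).reshape(rows, cols)
    pc1 = (pc1 - pc1.min()) / (pc1.max() - pc1.min())

    # Step 2: KNN filtering of each class probability map
    F = build_feature_space(pc1, lam)
    idx = knn_indices(F, K)
    filtered = np.stack([knn_filter(probs[:, n].reshape(rows, cols), idx)
                         for n in range(probs.shape[1])], axis=-1)

    # Final label: class with the highest filtered probability
    return svm.classes_[filtered.argmax(axis=-1)]
```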

4 Experimental Results and Discussion

4.1 Experiments Performed on the Indian Pines Image

4.1.1 Data Set

In this experiment, the proposed classification algorithm is tested on a hyperspectral image of a rural area (the Indian Pines image). The Indian Pines image was collected by the AVIRIS sensor over the Indian Pines region in Northwestern Indiana in 1992. This scene, with a size of 145 by 145 pixels and a spatial resolution of 20 m per pixel, was acquired over a mixed agricultural area. It is composed of 220 spectral channels in the wavelength range from 0.4 to 2.5 \(\upmu \)m. Before classification, some spectral bands (no. 104–108, 150–163, and 220) were removed from the data set due to noise and water absorption, leaving a total of 200 spectral channels to be used in the following experiments. The Indian Pines data contains 16 classes, which are detailed in Table 1. For illustrative purposes, Fig. 3 shows the three-band false color composite and the ground-truth map available for the scene. In order to assess the classification performance, 10 % of the labeled samples were randomly selected from the reference data as training samples in our experiments.
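For reproducibility, the band removal and the random selection of training samples described above can be sketched as follows; the array names are illustrative, and a per-class 10 % draw is assumed here.

```python
import numpy as np

# Remove the noisy / water-absorption bands (1-based band numbers from the text)
bad = np.r_[104:109, 150:164, [220]] - 1        # convert to 0-based indices
cube = np.delete(raw_cube, bad, axis=2)         # raw_cube: 145 x 145 x 220 (name illustrative)

# Randomly draw 10 % of the labeled samples of each class as the training set
rng = np.random.default_rng(0)
train_mask = np.zeros(labels.shape, dtype=bool)  # labels: ground-truth map, 0 = unlabeled
for c in np.unique(labels[labels > 0]):
    pix = np.flatnonzero(labels == c)
    chosen = rng.choice(pix, size=max(1, int(0.1 * pix.size)), replace=False)
    train_mask.flat[chosen] = True
```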

Fig. 3

Indian Pines data set: a three-band color composite; b ground-truth classification map; c color code of different classes

Fig. 4

Classification results obtained with a \(\lambda =0\) (OA = 71.79 %), b \(\lambda =1\) (OA = 94.29 %), c \(\lambda =5\) (OA = 96.24 %), d \(\lambda =100\) (OA = 95.44 %), e \(K=5\) (OA = 89.82 %), f \(K=10\) (OA = 92.61 %), g \(K=40\) (OA = 96.24 %), and h \(K=400\) (OA = 86.94 %) [K is fixed as 40 for (a–d), and \(\lambda \) is fixed as 5 for (e–h)]

4.1.2 Influence of Parameters to the Classification Performance

In this experiment, the influence of the two parameters \(\lambda \) and K on the performance of the proposed classification algorithm is analyzed. Figure 4 shows the classification maps and overall accuracies obtained by the proposed algorithm with different values of \(\lambda \) and K. From Fig. 4a, it can be seen that the result contains serious “noise” and the corresponding accuracy is quite low (OA = 71.79 %) when the coefficient \(\lambda \) is set to 0. The reason is that the spatial coordinates are not considered in the KNN filtering operation when \(\lambda =0\). Furthermore, the classification result tends to be oversmoothed and the corresponding accuracy decreases when \(\lambda \) is very large. This means that the pixel value and the spatial coordinates are both important factors for the improvement of classification accuracy. The parameter K has a similar influence on the classification result. For example, when K is quite large, the proposed filtering method may oversmooth the classification result, and thus, decrease the accuracy dramatically (OA = 86.94 %). However, when K is relatively small, only a small number of non-local pixels are considered in the averaging operation. In this situation, the classification accuracy also cannot be effectively improved (OA = 89.82 %). In this paper, \(K=40\) and \(\lambda =5\) are set as the default parameters, which give the best performance in this experiment (OA = 96.24 %). In order to ensure the optimality of the two parameters, an adaptive setting scheme will be investigated in future work.

4.1.3 Comparison of Different Classification Methods

Fig. 5

Indian Pines data set: a–f classification maps for the SVM method [28] (OA = 81.72 %), the EMP method [11] (OA = 91.83 %), the LMLL method [29] (OA = 92.88 %), the LBP method [30] (OA = 92.17 %), the EPF method [15] (OA = 95.48 %), and the proposed KNN method (OA = 96.23 %)

Table 1 Class names, number of training and test samples, global and class-specific classification accuracies in percentage for the Indian Pines image

Figure 5a–f shows the classification results obtained with the Support Vector Machines (SVM) method [28], the Extended Morphological Profiles (EMP) method [11], the logistic regression and multilevel logistic (LMLL) method [29], the loopy belief propagation (LBP) method [30], the edge-preserving filtering (EPF) method [15], and the proposed K nearest neighbors based (KNN) method, respectively. The SVM classification is performed with the Gaussian radial basis function (RBF) kernel, using the LIBSVM library [28]. The optimal parameters \(C \) and \(\gamma \) were determined by fivefold cross validation. The default parameters given in [15, 29, 30] are adopted for the LMLL, LBP, and EPF methods. As shown in this figure, all spectral–spatial methods can significantly reduce the noise in the classification map, resulting in more homogeneous and meaningful regions. For example, with the proposed KNN method, the classification result of the pixel-wise SVM can be improved significantly because of the noise reduction. In order to evaluate the improvement more objectively, the number of training and test samples, and the global and individual classification accuracies of the different classification methods are presented in Table 1. Three measures of accuracy are used: (1) overall accuracy (OA), which measures the percentage of correctly classified pixels; (2) average accuracy (AA), which measures the mean of the percentages of correctly classified pixels for each class; (3) the kappa coefficient (kappa), which measures the percentage of agreement (correctly classified pixels) corrected by the amount of agreement that would be expected purely by chance. Furthermore, the accuracies are calculated as averages over 10 repeated experiments. Table 1 shows that the proposed method can effectively improve the OA, AA, and kappa of SVM. Furthermore, the individual classification accuracies are also improved by the proposed KNN method for almost all of the classes. For example, the accuracy of the corn-no till class has been improved from 75.01 to 100 %. Compared with the EMP, LMLL, LBP, and EPF methods, the proposed KNN method gives comparable performance in terms of OA, AA, and kappa. This shows that the proposed KNN method can effectively improve the classification accuracy.
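To make the three accuracy measures concrete, the following sketch computes OA, AA, and the kappa coefficient from a confusion matrix; the helper name is illustrative and integer class labels in 0..num_classes-1 are assumed.

```python
import numpy as np

def accuracy_measures(y_true, y_pred, num_classes):
    """Compute OA, AA, and kappa from the confusion matrix of 1-D label arrays."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                   # rows: true class, cols: predicted
    total = cm.sum()
    oa = np.trace(cm) / total                           # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))          # mean of per-class accuracies
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```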

4.2 Experiments Performed on the Botswana Image

4.2.1 Data Set

In this experiment, the proposed classification algorithm is tested on a hyperspectral image of a woodlands area (the Botswana image). The Botswana image was collected by the NASA EO-1 satellite over the Okavango Delta, Botswana, on May 31, 2001. This scene, with a size of 1476 by 256 pixels and a spatial resolution of 30 m per pixel, was acquired to study the impact of flooding on vegetation in this area. It is composed of 242 spectral channels in the wavelength range from 0.4 to 2.5 \(\upmu \)m. Before classification, uncalibrated and noisy bands that cover water absorption features were removed, and the remaining 145 bands were retained as candidate spectral features: [10–55, 82–97, 102–119, 134–164, 187–220]. The Botswana data consists of 14 identified classes representing the land cover types in seasonal swamps, occasional swamps, and drier woodlands, which are detailed in Table 2. For illustrative purposes, Fig. 6 shows the three-band false color composite and the ground-truth map available for the study area. Similar to the experiments performed on the Indian Pines data set, 10 % of the samples were randomly selected from the reference data as training samples in our experiments.

Fig. 6

Botswana data set: a three-band color composite; b ground-truth classification map; c color code of different classes

Fig. 7

Botswana data set: a–f classification maps for the SVM method [28] (OA = 91.76 %), the EMP method [11] (OA = 96.34 %), the LMLL method [29] (OA = 97.37 %), the LBP method [30] (OA = 97.11 %), the EPF method [15] (OA = 97.13 %), and the proposed KNN method (OA = 98.81 %)

4.2.2 Comparison of Different Classification Methods

Figure 7a–f shows the classification results obtained by the SVM, EMP, LMLL, LBP, EPF, and KNN methods, respectively. Furthermore, the number of training and test samples, and the global and individual classification accuracies for the different methods are presented in Table 2. The accuracies are calculated as averages over 10 repeated experiments. It can be seen that the proposed method can effectively improve the OA, AA, and kappa of SVM. Moreover, the KNN method shows the best classification performance in terms of OA, AA, and kappa.

Table 2 Class names, number of training and test samples, global and class-specific classification accuracies in percentage for the Botswana image

5 Conclusions

Although hyperspectral imaging provides rich spectral information, increasing the capability to distinguish different objects in a scene, the large number of spectral channels presents challenges to image classification. Instead of processing each pixel independently without considering information about spatial structures, the proposed KNN based image filtering algorithm incorporates spatial information into the classifier, and thus, the pixel-wise classification accuracies can be improved significantly, especially in areas where structural information is important to distinguish between classes. The proposed method has two main contributions: First, a simple yet effective feature vector construction methodology combining the values and spatial coordinates of different pixels is applied for the joint filtering of images. Second, the proposed KNN based filtering algorithm is applied to spectral–spatial hyperspectral image classification. In the experiments, it was shown that the proposed spectral–spatial classifier can lead to competitive classification accuracies when compared to other previously proposed spectral–spatial classification techniques. In conclusion, the proposed KNN based classification method succeeds in taking advantage of spatial and spectral information simultaneously.