1 Introduction

Remote sensing image classification has been a hot topic for the last twenty years because such images contain a wealth of useful information that plays an important role in social and economic development. Remote sensing images combine imaging and spectral techniques and are widely used in both civil and military applications. However, remote sensing image classification is a complex process that may be affected by many factors, for instance the availability of high-quality images, the choice of a proper classification approach, and the analytical ability of the researcher. For a particular study, it is often difficult to identify the best classifier, owing to the lack of selection guidelines and of classification models suited to the available bands. Hence, many researchers have made great efforts [1–10] to improve classification accuracy. In the literature, supervised, semi-supervised and unsupervised learning are the three popular paradigms for remote sensing image classification, with representatives such as maximum-likelihood classifiers, neural networks and neuro-fuzzy models [11–13]. However, hyper-spectral images suffer from the Hughes phenomenon [14], so processing such high-dimensional data is time consuming.

The purpose of classification is to estimate the land-cover category of each geographic region in a remote sensing image. It is usually formulated as a segmentation task in which an appearance model first filters each pixel and thresholding strategies then infer the affiliation of the pixel in the current frame. Hence, effectively modeling the appearance of the target region and accurately inferring the affiliation from all ground-based ancillary data are the two main steps of a successful classification system. Although a variety of classification algorithms have been proposed, remote sensing image classification still cannot meet the requirements of most practical applications. Many elegant features from the field of pattern recognition can be used to discriminate categories from the image and ancillary data. However, extracting useful information and building a model are difficult because remote sensing images exhibit complex spectral characteristics, high-dimensional data, and band-selection issues. Hence, traditional models do not always achieve robust classification.

In recent years, Quan et al. [15] proposed a multiscale segmentation method that combines a probabilistic neural network (PNN) with a multiscale autoregressive model. A hybrid classifier was proposed by Zhang et al. [16] for polarimetric synthetic aperture radar (SAR) images. Graph-based learning has also become an emerging research topic in image classification; related work can be found in [17–20]. In addition, multiview-learning-based image annotation is becoming another hot topic in image processing [21, 22]. However, for hyper-spectral remote sensing image classification, the challenge is to develop approaches that are powerful enough to exploit the intricate details. While the growing number of spectral channels enables discrimination among a large number of cover classes, many traditional algorithms fail on these data because of mathematical or practical limitations. For instance, the maximum-likelihood and other covariance-based classifiers require, for each class, at least as many training samples as the number of bands plus one, which poses a severe field-sampling problem for multi-band hyper-spectral images. In high-dimensional data analysis, such as hyper-spectral remote sensing imagery, dimension reduction plays an important role in all supervised or unsupervised classification approaches that require the estimation of second-order statistics. The aim of dimension reduction is to map a set of high-dimensional data into a low-dimensional space while preserving the intrinsic structure of the data. Well-known dimension reduction methods reported in the literature include principal component analysis (PCA) and its generalization kernel PCA [23, 24], locally linear embedding (LLE) [25] and spatial and spectral oriented dimension reduction (SASO-DR) [26]. However, dimension reduction can cause an undesirable loss of information. In addition, many researchers regard the kernel trick as one of the best tools for high-dimensional data classification [27–30] and pattern recognition [31–33]. The kernel trick [34] projects the original data into a feature space in which the data become linearly (or approximately linearly) separable. Fauvel et al. [27] proposed a spatial-spectral kernel-based approach in which spatial and spectral information are jointly used for classification. A kernel-based block matrix decomposition approach for remote sensing image classification can be found in [35].

The main goal of this paper is to establish a new and efficient classification model for remote sensing image processing based on mathematical theory. The model is a three-step process with adjustable parameters. (1) The information of the same areas associated with each pixel is modeled as the within-class set, which generates the within-class scatter matrix Sw; at the same time, the information of different areas associated with the mean pixel of each homogeneous area is modeled as the between-class set, which generates the between-class scatter matrix Sb. (2) A projection matrix \(\mathbf W \;(\mathbf{W }=[\alpha \cdot \mathbf{W1 },(1-\alpha )\cdot \mathbf{W2 }])\) is obtained by solving an optimization problem with the Fisher linear discriminant analysis (FLDA) criterion in both the null space and the range space of Sw, where \(\alpha \; (0\le \alpha \le 1)\) is a non-negative constant that balances the null space and the range space of Sw. (3) The projection matrix W projects the original data into a new low-dimensional feature space, and a support vector machine (SVM) (or KNN) classifier is then applied. The advantage of the proposed model is that the spatial information of the remote sensing image is fully exploited during classification. We therefore abbreviate it as the parameterized null-range-Sw model.

The remainder of this paper is organized as follows. Section 2 briefly reviews the formulations of FLDA, KNN and SVM. In Sect. 3, the derivation process of the proposed model is described in detail. The effectiveness of the model is demonstrated in Sect. 4 by experiments on several real remotely sensed images. Finally, Sect. 5 concludes this paper.

2 Review of FLDA, KNN and SVM

2.1 Fisher Linear Discriminant Analysis (FLDA)

The main idea of FLDA is to perform dimension reduction while preserving as much class-discriminative information as possible. Linear discriminant analysis (LDA) aims to find the optimal projection matrix such that the class structure of the original high-dimensional space is preserved in the low-dimensional space. However, FLDA cannot be applied directly when Sw is singular (i.e., has zero eigenvalues). Hence, many methods have been proposed to overcome this problem, such as LDA/QR [36], the null subspace method [37], the range subspace method [38] and median-based methods [39, 40].

In this subsection, we first introduce some important notation used in this paper. Let c be the number of classes, \(N_i\) the number of samples in the \(i\mathrm{th}\) class, N the total number of samples over all classes, \(A_j^i\) the \(j\mathrm{th}\) sample of the \(i\mathrm{th}\) class and \(m_i\) the mean of the \(i\mathrm{th}\) class samples.

$$\begin{aligned} N= & {} \sum _{i=1}^{c}N_i, \end{aligned}$$
(1)
$$\begin{aligned} m_i= & {} \frac{1}{N_i}\sum _{j=1}^{N_i}A_j^i,\quad (i=1, \ldots , c). \end{aligned}$$
(2)

The optimal projection matrix \(W=[w_1,w_2,\ldots ,w_r]\) can be obtained by maximizing the following criterion [41], where r is at most \(\min (c-1,\; N)\).

$$\begin{aligned} J(W)=\displaystyle \frac{W^T\mathbf{Sb }W}{W^T\mathbf{Sw }W}, \end{aligned}$$
(3)

where Sb and Sw are the between-class and within-class scatter matrices, respectively, and \(m_0\) is the mean of all class means.

$$\begin{aligned} \mathbf{Sb }= & {} \sum _{i=1}^{c}(m_i-m_0)(m_i-m_0)^T, \end{aligned}$$
(4)
$$\begin{aligned} \mathbf{Sw }= & {} \sum _{i=1}^{c}\sum \limits _{j=1}^{N_i}(A_j^i-m_i)(A_j^i-m_i)^T, \end{aligned}$$
(5)
$$\begin{aligned} m_0= & {} \displaystyle \frac{1}{c}\sum _{i=1}^{c}m_i. \end{aligned}$$
(6)

Pan et al. [42] gave a spectral regression discriminant analysis for hyperspectral image classification. Ghosh et al. [43] proposed a context-sensitive technique for unsupervised change detection in multitemporal remote sensing images. Bandos et al. [44] analyzed the classification of hyperspectral remote sensing image with LDA in the presence of a small ratio between the number of training samples and the number of spectral features.

2.2 K-Nearest Neighbor (KNN)

K-nearest neighbor is a nonparametric approach to classification. It does not require prior knowledge such as prior probabilities or class-conditional probabilities. It operates directly on the samples and is categorized as an instance-based classification method. Details can be found in [45, 46].
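As an illustration only, a KNN classifier on projected features can be obtained with scikit-learn; the arrays below are synthetic placeholders standing in for the projected training and test samples, and k is the neighborhood size swept in the experiments of Sect. 4.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-ins for the projected training/test features (rows are samples);
# in this paper they would be the data projected by W from Sect. 3.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(60, 5)), np.repeat([0, 1, 2], 20)
X_test = rng.normal(size=(10, 5))

knn = KNeighborsClassifier(n_neighbors=4)   # k is swept in Figs. 3 and 6
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
```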

2.3 Support Vector Machine (SVM)

In this subsection, we briefly review the support vector machine. Given a labeled training set \(\{(x_1,y_1), \ldots , (x_N,y_N)\}\), where \(x_i\in {\mathbb {R}}^{d}\) and \(y_i\in \{-1,+1\}\), and a nonlinear mapping \(\Phi (\cdot )\) to a (generally higher-dimensional) space, \(\Phi : {\mathbb {R}}^{d}\rightarrow {\mathbb {H}}\), the SVM algorithm solves

$$\begin{aligned} \min \limits _{w, \xi _i, b}\left\{ \displaystyle \frac{1}{2}\Vert w\Vert ^2+C\sum _{i}\xi _i\right\} , \end{aligned}$$
(7)

constrained to

$$\begin{aligned}&y_i(\langle \Phi (x_i),w\rangle +b)\ge 1-\xi _i,\quad \forall i=1, 2, \ldots , N. \end{aligned}$$
(8)
$$\begin{aligned}&\xi _i\ge 0,\quad \forall i=1, 2, \ldots , N. \end{aligned}$$
(9)

where w and b define a linear classifier in the feature space. According to Cover's theorem [47], the nonlinear mapping \(\Phi \) makes the transformed samples more likely to be linearly separable in the feature space. The parameter C controls the generalization capability of the classifier and must be selected by the user, and the \(\xi _i\) are positive slack variables that allow for permitted errors.

In practice, the primal problem (7) is solved through its Lagrangian dual problem (10) because of the high dimensionality of the vector w.

$$\begin{aligned} \max \limits _{\alpha _i}\left\{ \sum _{i}\alpha _i-\displaystyle \frac{1}{2} \sum _{i,j}\alpha _i\alpha _jy_iy_j\langle \Phi (x_i),\Phi (x_j)\rangle \right\} , \end{aligned}$$
(10)

constrained to \(0\le \alpha _i\le C\) and \(\sum _{i}\alpha _iy_i=0,\; i=1, 2, \ldots , N\), where the auxiliary variables \(\alpha _i\) are the Lagrange multipliers corresponding to the constraints in (8). All mappings \(\Phi \) appear only in the form of inner products, so a kernel function K can be defined as in (11).

$$\begin{aligned} K(x_i, x_j)=\langle \Phi (x_i),\Phi (x_j)\rangle . \end{aligned}$$
(11)

By introducing (11) into (10), and then solving the dual problem, we can obtain the solution \(w=\sum \nolimits _{i=1}^{N}y_i\alpha _i\Phi (x_i)\). For any test vector x, the decision function can be obtained as below.

$$\begin{aligned} f(x)=\mathrm{sgn}\left( \sum _{i=1}^{N}y_i\alpha _iK(x_i, x)+b\right) , \end{aligned}$$
(12)

where b can easily be obtained from any \(\alpha _i\) that is neither 0 nor C. Details can be found in [48]. Applications of the SVM technique to remote sensing images can be found in [27–29, 49–52].
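As a hedged illustration, the soft-margin kernel SVM of Eqs. (7)–(12) is available off the shelf; the sketch below uses scikit-learn's SVC (which solves the dual problem (10) internally), and the data arrays are synthetic placeholders rather than the data sets of Sect. 4.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-ins for (projected) samples scaled to [0, 1], one row per sample.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((60, 10)), np.repeat([0, 1], 30)
X_test = rng.random((10, 10))

# RBF (Gaussian) kernel SVM; C and gamma (the g of Sect. 4) would be tuned
# by cross validation as described in the experiments.
svm = SVC(kernel="rbf", C=10.0, gamma=1.0)
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

# An explicit kernel matrix, as in Eq. (11), can also be supplied:
#   svm = SVC(kernel="precomputed"); svm.fit(K_train, y_train)
```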

3 Proposed Parameterized Null-Range-Sw (P-NRSw) Model

3.1 P-NRSw Model

Our motivation comes from preserving the fine details of remote sensing images. In the proposed model, the same-class areas and the different-class areas of the image are captured by the within-class scatter matrix (Sw) and the between-class scatter matrix (Sb), respectively. To distinguish the areas of a remote sensing image, we want the differences within the same areas to be as small as possible and, conversely, the differences between different areas to be as large as possible. Inspired and motivated by the idea of FLDA, we propose the P-NRSw model to deal with the same and different areas of remote sensing images.

From a mathematical point of view, the proposed P-NRSw model consists of three subsequent steps. First, the P-NRSw model projects the original space onto the null space of Sw using an orthogonal basis of null(Sw), and in this projected space a transformation that maximizes the between-class scatter is computed (yielding the projection matrix W1). At the same time, the model projects the original space onto the range space of Sw using a basis of range(Sw), and in this transformed space the between-class scatter is again maximized (yielding the projection matrix W2). Second, an appropriate parameter \(\alpha \; (0\le \alpha \le 1)\) is selected and the projection matrix is constructed as \(\mathbf{W }=[\alpha \cdot \mathbf{W1 },(1-\alpha )\cdot \mathbf{W2 }]\), where \(\alpha \) is a non-negative constant that balances the null space and the range space of Sw. Finally, the projection matrix W projects the original data into a new low-dimensional feature space, and an SVM (or KNN) classifier is applied. The details are given below.

We first give a brief description of the important notation used in this section; for convenience, it is listed in Table 1.

Table 1 Notation description

Consider an original multi-band remote sensing image with N samples (from different class areas), where each sample contains d bands. Under this assumption, a data matrix X can be constructed whose columns \(x_1\) to \(x_N\) consist of the band values of the remote sensing image. The dataset therefore consists of N samples \(\{(x_i,y_i)\}_{i=1}^N\), where \(x_i\in {\mathbb {R}}^{d}\), \(y_i\in \{1, 2, \ldots , c\}\) denotes the class label of the ith sample, N is the sample size, d is the data dimensionality, and c is the number of classes. Let the data matrix \(X=[x_1, x_2, \ldots , x_N]\) be partitioned into c classes as \(X=[X_1, X_2, \ldots , X_c]\), where \(X_i\in {\mathbb {R}}^{d \times {n_i}}\), \(n_i\) is the size of the ith class \(X_i\) and \(\sum _{i=1}^cn_i=N\). Define the matrices \(H_w\) and \(H_b\) as follows:

$$\begin{aligned} H_w= & {} \frac{1}{\sqrt{N}}\left[ X_1-m_1e_1^T, \ldots , X_c-m_ce_c^T\right] , \end{aligned}$$
(13)
$$\begin{aligned} H_b= & {} \frac{1}{\sqrt{N}}\left[ \sqrt{n_1}(m_1-m_0), \ldots , \sqrt{n_c}(m_c-m_0)\right] , \end{aligned}$$
(14)

where \(m_i\) is the centroid of the ith class, \(m_0\) is the global centroid, \(e_i\) is the vector of all ones of length \(n_i\). \(H_w\in {\mathbb {R}}^{d \times N}\), \(H_b\in {\mathbb {R}}^{d \times c}\). Then Sw and Sb can be expressed as follows:

$$\begin{aligned} \mathbf{Sw }= & {} H_wH_w^T, \end{aligned}$$
(15)
$$\begin{aligned} \mathbf{Sb }= & {} H_bH_b^T. \end{aligned}$$
(16)
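A minimal NumPy sketch of Eqs. (13)–(16) is given below; it assumes the data matrix X stores samples as columns (d × N) with integer labels y, as in the notation above, and is meant as an illustration rather than the authors' implementation.

```python
import numpy as np

def scatter_factors(X, y):
    """Build H_w (Eq. 13) and H_b (Eq. 14) from a d x N data matrix X
    whose columns are samples, with labels y in {1, ..., c}."""
    d, N = X.shape
    classes = np.unique(y)
    m0 = X.mean(axis=1, keepdims=True)            # global centroid
    Hw_blocks, Hb_cols = [], []
    for c in classes:
        Xi = X[:, y == c]
        ni = Xi.shape[1]
        mi = Xi.mean(axis=1, keepdims=True)       # class centroid m_i
        Hw_blocks.append(Xi - mi)                 # X_i - m_i e_i^T
        Hb_cols.append(np.sqrt(ni) * (mi - m0))
    Hw = np.hstack(Hw_blocks) / np.sqrt(N)        # d x N
    Hb = np.hstack(Hb_cols) / np.sqrt(N)          # d x c
    return Hw, Hb

# Sw = Hw @ Hw.T and Sb = Hb @ Hb.T, as in Eqs. (15)-(16).
```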

Chen et al. [37] find solution vectors in null (Sw), whereas Yu and Yang [38] restrict the search to range (Sb). However, these methods ignore the fact that useful solution vectors may come from both spaces. Hence, the proposed P-NRSw model seeks solution vectors in both spaces. Consider the singular value decomposition (SVD) of \(\mathbf{Sw }\in {\mathbb {R}}^{d\times d}\):

$$\begin{aligned} \mathbf{Sw }=U_w\Sigma _wU_w^T=[\underset{s}{\underbrace{U_{w1}}} \underset{d-s}{\underbrace{U_{w2}}}]\left[ \begin{array}{l@{\quad }l} \Sigma _{w1}&{}0\\ 0&{}0 \end{array}\right] \left[ \begin{array}{c@{\quad }c} U_{w1}^T\\ U_{w2}^T \end{array}\right] , \end{aligned}$$
(17)

where \(s=\hbox {rank}(\mathbf{Sw })\), null \((\mathbf{Sw })=\hbox {span}(U_{w2})\), \(U_w\in {\mathbb {R}}^{d \times d}\), \(U_{w1}\in {\mathbb {R}}^{d \times s}\), \(U_{w2}\in {\mathbb {R}}^{d \times (d-s)}\). In the space transformed by \(U_{w2}\), let the between-class scatter matrix be \(\widetilde{S}_b=U_{w2}^T\mathbf{Sb }U_{w2}\), \(\widetilde{S}_b\in {\mathbb {R}}^{(d-s) \times (d-s)}\). Then a basis of range \((\widetilde{S}_b)\) can be found from the SVD of \(\widetilde{S}_b\) as in (18).

$$\begin{aligned} \widetilde{S}_b=\widetilde{U}_b\widetilde{\Sigma }_b\widetilde{U}_b^T =[\underset{r1}{\underbrace{\widetilde{U}_{b1}}} \underset{d-s-r1}{\underbrace{\widetilde{U}_{b2}}}]\left[ \begin{array}{l@{\quad }l} \widetilde{\Sigma }_{b1}&{}0\\ 0&{}0 \end{array}\right] \left[ \begin{array}{c@{\quad }c} \widetilde{U}_{b1}^T\\ \widetilde{U}_{b2}^T \end{array}\right] , \end{aligned}$$
(18)

where \(r1=\hbox {rank}(\widetilde{S}_b)\), \(\widetilde{U}_b\in {\mathbb {R}}^{(d-s) \times (d-s)}\), \(\widetilde{U}_{b1}\in {\mathbb {R}}^{(d-s) \times r1}\), \(\widetilde{U}_{b2}\in {\mathbb {R}}^{(d-s) \times (d-s-r1)}\). In the space transformed by the basis \(\widetilde{U}_{b1}\) of range \((\widetilde{S}_b)\), let Y be the matrix whose columns are the eigenvectors corresponding to the nonzero eigenvalues of

$$\begin{aligned} S_b^*=\widetilde{U}_{b1}^TU_{w2}^T\mathbf{Sb }U_{w2}\widetilde{U}_{b1}. \end{aligned}$$
(19)

On the other hand, in the space transformed by the basis \(U_{w1}\), let the between-class scatter matrix be \(\widehat{S}_b=U_{w1}^T\mathbf{Sb }U_{w1}\), \(\widehat{S}_b\in {\mathbb {R}}^{s \times s}\). Then a basis of range \((\widehat{S}_b)\) can be found from the SVD of \(\widehat{S}_b\) as in (20).

$$\begin{aligned} \widehat{S}_b=\widehat{U}_b\widehat{\Sigma }_b\widehat{U}_b^T =[\underset{r2}{\underbrace{\widehat{U}_{b1}}} \underset{s-r2}{\underbrace{\widehat{U}_{b2}}}] \left[ \begin{array}{ll} \widehat{\Sigma }_{b1}&{}0\\ 0&{}0 \end{array}\right] \left[ \begin{array}{c@{\quad }c} \widehat{U}_{b1}^T\\ \widehat{U}_{b2}^T \end{array}\right] , \end{aligned}$$
(20)

where \(r2=\hbox {rank}(\widehat{S}_b)\), \(\widehat{U}_b\in {\mathbb {R}}^{s \times s}\), \(\widehat{U}_{b1}\in {\mathbb {R}}^{s \times r2}\), \(\widehat{U}_{b2}\in {\mathbb {R}}^{s \times (s-r2)}\). In the space transformed by the basis \(\widehat{U}_{b1}\) of range \((\widehat{S}_b)\), let Z be the matrix whose columns are the eigenvectors corresponding to the nonzero eigenvalues of

$$\begin{aligned} S_b^\star =\widehat{U}_{b1}^TU_{w1}^T\mathbf{Sb }U_{w1}\widehat{U}_{b1}. \end{aligned}$$
(21)

Therefore, the two projection matrices are \(\mathbf{W1 }=U_{w2}\widetilde{U}_{b1}Y\) and \(\mathbf{W2 }=U_{w1}\widehat{U}_{b1}Z\), which come from the null space of Sw and the range space of Sw, respectively. In order to obtain stronger discriminant information, the parameter \(\alpha \) is introduced into W1 and W2 in the proposed model (viz. \(\mathbf{W }=[\alpha \cdot \mathbf{W1 }, (1-\alpha )\cdot \mathbf{W2 }]\)). Finally, the classification results on the remote sensing test set are obtained through W and SVM (or KNN). Figure 1 shows a block diagram of our method; a code sketch of this construction is given after the figure caption below.

Fig. 1 Flowchart of our approach
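The following NumPy sketch summarizes the construction of Eqs. (17)–(21) and the combined projection W. It is a simplified illustration, not the exact implementation used in the experiments: the rank decisions use a numerical tolerance `tol`, which is an assumption not specified in the text.

```python
import numpy as np

def p_nrsw_projection(Sw, Sb, alpha=0.5, tol=1e-10):
    """Sketch of Eqs. (17)-(21): W1 from null(Sw), W2 from range(Sw),
    combined as W = [alpha * W1, (1 - alpha) * W2]."""
    Uw, sw_vals, _ = np.linalg.svd(Sw)                     # Eq. (17); Sw is symmetric PSD
    s = int(np.sum(sw_vals > tol * max(sw_vals.max(), 1.0)))  # s = rank(Sw)
    Uw1, Uw2 = Uw[:, :s], Uw[:, s:]                        # bases of range(Sw) / null(Sw)

    def discriminant_directions(U):
        Sb_t = U.T @ Sb @ U                                # \tilde{S}_b or \hat{S}_b
        Ub, sb_vals, _ = np.linalg.svd(Sb_t)               # Eq. (18) / (20)
        r = int(np.sum(sb_vals > tol * max(sb_vals.max(), 1.0)))
        Ub1 = Ub[:, :r]
        Sb_star = Ub1.T @ Sb_t @ Ub1                       # Eq. (19) / (21)
        evals, evecs = np.linalg.eigh(Sb_star)
        Y = evecs[:, evals > tol]                          # eigenvectors, nonzero eigenvalues
        return U @ Ub1 @ Y

    W1 = discriminant_directions(Uw2)                      # directions from null(Sw)
    W2 = discriminant_directions(Uw1)                      # directions from range(Sw)
    return np.hstack([alpha * W1, (1.0 - alpha) * W2])
```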

3.2 Analysis of Computational Complexities

In this subsection, we analyze the computational complexity of the discussed methods. The computational complexity of the SVD depends on which parts need to be explicitly computed. We use flop counts, where one flop (floating-point operation) represents roughly one addition (subtraction) or one multiplication (division) [53]. For the SVD of a matrix \(H\in {\mathbb {R}}^{p\times q}\) with \(p\gg q\), \(H=U\Sigma V^T=[\underset{q}{\underbrace{U_1}}\underset{p-q}{\underbrace{U_2}}]\Sigma V^T\), where \(U\in {\mathbb {R}}^{p\times p}\), \(\Sigma \in {\mathbb {R}}^{p\times q}\) and \(V\in {\mathbb {R}}^{q\times q}\), the complexities (in flops) can be roughly estimated as in Table 2.

Table 2 Complexities description

For the multiplication of a \(p1\times p2\) matrix by a \(p2\times p3\) matrix, \(2\,p1\,p2\,p3\) flops are counted. Compared with the traditional FLDA, the null subspace approach [37], the range subspace method [38] and popular morphology-based feature extraction methods, which involve careful design and complex steps such as a series of opening and closing operations, our proposed method is more efficient and computationally straightforward: it requires only an SVD and a simple sorting of eigenvectors, a simplicity the morphology-based approach cannot offer. As a result, the proposed P-NRSw is more discriminative for remote sensing image classification, which is confirmed by the experiments in the following section.

4 Experimental Results and Analysis

In this section, we demonstrate the effectiveness of the proposed P-NRSw model on remote sensing image classification tasks using two publicly available data sets, namely the San Francisco data of the 2012 GRSS data fusion contest (2012 GRSS data) and the KSC-AVIRIS data.

It is well known that the choice of kernel function is very important for achieving good performance in classification tasks. The polynomial kernel [PK, Eq. (22)], the Gaussian kernel [GK, Eq. (23)] and the sigmoid kernel [SK, Eq. (24)] are three commonly used kernel functions, and all three are used in our experiments to evaluate the efficiency of the proposed model.

$$\begin{aligned} k(x,y)= & {} (g\cdot x\cdot y+1)^3. \end{aligned}$$
(22)
$$\begin{aligned} k(x,y)= & {} \mathrm{{exp}}(-g\cdot ||x-y||^2). \end{aligned}$$
(23)
$$\begin{aligned} k(x,y)= & {} \mathrm{{tan}}h(g\cdot x\cdot y-1). \end{aligned}$$
(24)
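For reference, these three kernels can be written directly as below; the minus sign in the Gaussian kernel follows the standard RBF definition, and g is the width/scale parameter tuned in the experiments.

```python
import numpy as np

# The three kernels of Eqs. (22)-(24); x and y are 1-D feature vectors.
def polynomial_kernel(x, y, g):
    return (g * np.dot(x, y) + 1.0) ** 3              # Eq. (22)

def gaussian_kernel(x, y, g):
    return np.exp(-g * np.linalg.norm(x - y) ** 2)    # Eq. (23)

def sigmoid_kernel(x, y, g):
    return np.tanh(g * np.dot(x, y) - 1.0)            # Eq. (24)
```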

In the SVM, the penalty term C and the kernel width g need to be tuned; the libsvm package [54] was used. Each original remote sensing data set was scaled to [0, 1] using per-band range stretching. The within-class classification accuracy (WCA) and the total classification accuracy (TCA) criteria [26] are used in our comparison.

$$\begin{aligned} \textit{WCAi}= & {} \frac{P_i}{M_i}\times 100, \end{aligned}$$
(25)
$$\begin{aligned} \textit{TCA}= & {} \frac{\sum \nolimits _{i=1}^{c}P_i}{P}\times 100. \end{aligned}$$
(26)

In these equations, \(P_i\) denotes the number of correctly classified samples in the \(i\mathrm{th}\) class, \(M_i\) is the number of samples in the \(i\mathrm{th}\) class, c is the number of classes, and P is the total number of samples in the test data. In addition, the kappa coefficient \((\kappa )\) was calculated from the confusion matrix in our experiments.
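The accuracy measures of Eqs. (25)–(26) and the kappa coefficient can all be computed from the confusion matrix; a minimal sketch, assuming rows indexed by true class and columns by predicted class, is:

```python
import numpy as np

def classification_scores(conf):
    """WCA_i (Eq. 25), TCA (Eq. 26) and kappa from a c x c confusion matrix."""
    conf = np.asarray(conf, dtype=float)
    P_i = np.diag(conf)                       # correctly classified samples per class
    M_i = conf.sum(axis=1)                    # samples per class
    P = conf.sum()                            # total test samples
    wca = 100.0 * P_i / M_i                   # Eq. (25)
    tca = 100.0 * P_i.sum() / P               # Eq. (26)
    p_o = P_i.sum() / P                       # observed agreement
    p_e = (conf.sum(axis=1) @ conf.sum(axis=0)) / P ** 2   # chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e)
    return wca, tca, kappa
```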

4.1 2012 GRSS Data

The data fusion contest has been organized annually by the Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society (GRSS) since 2006; a detailed description can be found in [55]. In our experiments, the data set is a subset of \(477\times 342\) pixels of the original image. We consider only seven classes, namely grass, road, roof, shadow, trail, tree, and water, to characterize this area. The distributions of the training and test sets are shown in Fig. 2, and the class definitions and the number of samples for each experiment are listed in Table 3.

Fig. 2 The images of the training set (a) and test set (b)

Table 3 Number of the training and test samples of 2012 GRSS data

In the SVM, the parameters \(C\;(1\le C\le 10)\) and \(g\; (2^{-10}\le g\le 2^{10})\) are determined by a five-fold cross-validation strategy; a sketch of this search is given below. The performance indicators (PIS) are WCA\(i\; (i=1, \ldots , c)\), TCA, and \(\kappa \). The results for the three kernels mentioned above \((\alpha =0.5)\) are reported in Tables 4, 5 and 6, respectively.
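As an illustration of this tuning step (not the authors' exact code), a five-fold grid search over the stated ranges of C and g can be written with scikit-learn; the grid resolution and the synthetic placeholder data below are assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder training data; in practice these are the P-NRSw-projected samples.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((70, 8)), np.repeat(np.arange(7), 10)

param_grid = {
    "C": np.arange(1, 11),                # 1 <= C <= 10
    "gamma": 2.0 ** np.arange(-10, 11),   # 2^-10 <= g <= 2^10 (coarse grid, assumed step)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # five-fold cross validation
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```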

Table 4 TCA (%) and \(\kappa \) of 2012 GRSS data (Polynomial kernel)
Table 5 TCA (%) and \(\kappa \) of 2012 GRSS data (Gaussian kernel)
Table 6 TCA (%) and \(\kappa \) of 2012 GRSS data (Sigmoid kernel)

According to the results in Tables 4, 5 and 6, the proposed P-NRSw method gives slightly better results in terms of total classification accuracy and kappa value. No matter which value of C is used, P-NRSw almost always gives slightly better results. When the sigmoid kernel is used, all four methods give worse results, so the choice of kernel function is very important for achieving good classification. From a practical point of view, the Gaussian kernel is the best choice. Figure 3 shows the classification results obtained with the KNN classifier for different k values.

Fig. 3 TCA for different k values (2012 GRSS data, \(\alpha =0.5\))

According to the results in Fig. 3, the proposed P-NRSw achieves the best classification results in terms of total classification accuracy. Compared with SVM, the KNN classifier performs best on the 2012 GRSS data. In addition, the influence of the parameter \(\alpha \) is investigated in the following experiment, whose results are shown in Fig. 4. Figure 4 shows that the TCA rises at first, then remains stable, and then begins to descend slowly; the best TCA is obtained at \(\alpha =0.5\).

Fig. 4 TCA for different values of the parameter \(\alpha \) for P-NRSw with Gaussian kernel (2012 GRSS data)

In order to further analyze the effectiveness of P-NRSw model, the KSC-AVIRIS data is used for the next experiment.

4.2 KSC-AVIRIS Data

A detailed description of the KSC-AVIRIS data can be found in [35, 55]. The image is \(614\times 512\) pixels. Figure 5 shows the original image with bands 11, 21, and 31. This data set contains many singular points, so in the experiments each singular point was replaced by the mean of its surrounding pixels (a sketch of this pre-processing step is given after Table 7). Fifty samples are randomly taken from each class as training samples and the rest are used as test samples (see Table 7). In order to evaluate the generalization power of the methods more accurately, we adopt a five-fold cross-validation strategy.

Fig. 5 KSC-AVIRIS data (bands 11, 21, 31) acquired over KSC

Table 7 Class codes, names, and number of the training and test samples (KSC-AVIRIS)
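The text does not specify how the surrounding mean is formed; a plausible sketch, assuming a 3×3 spatial neighbourhood applied band by band and a pre-computed boolean mask of singular pixels, is:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def repair_singular_pixels(cube, bad):
    """Replace flagged pixels by the mean of their 3x3 neighbourhood.

    cube : (rows, cols, bands) float array; bad : (rows, cols) boolean mask
    marking the singular pixels (how they are detected is not specified here).
    """
    repaired = cube.copy()
    for b in range(cube.shape[2]):
        local_mean = uniform_filter(cube[:, :, b], size=3, mode="nearest")
        repaired[:, :, b] = np.where(bad, local_mean, cube[:, :, b])
    return repaired
```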

In the SVM, the parameters C and g (from \(2^{-10}\) to \(2^{10}\), with step \(2^{0.2}\)) are determined by a five-fold cross-validation strategy. In the following experiments, \(\alpha =0.5\). The optimal parameters and results for the different kernels are shown in Tables 8, 9 and 10.

Table 8 TCA (%) and \(\kappa \) of KSC-AVIRIS data (Polynomial kernel)
Table 9 TCA (%) and \(\kappa \) of KSC-AVIRIS data (Gaussian kernel)
Table 10 TCA (%) and \(\kappa \) of KSC-AVIRIS data (Sigmoid kernel)

From Tables 8, 9 and 10, the proposed P-NRSw performs better than the other three methods in terms of total classification accuracy and kappa coefficient. In addition, the best result (TCA \(=\) 93.1155%, \(\kappa =0.9229\)) is obtained with the Gaussian kernel. Figure 6 shows the classification results obtained with the KNN classifier for different k values.

Fig. 6 TCA for different k values (KSC-AVIRIS data, \(\alpha =0.5\))

According to Fig. 6, no matter which k is used, P-NRSw almost always gives a better result, so its robustness is significant. However, compared with SVM, the results obtained with the KNN classifier on the KSC-AVIRIS data are not ideal. The best classification result is obtained with k \(=\) 4. Table 11 shows the classification result for each class.

Table 11 Classification accuracies (%) and \(\kappa \) of KSC-AVIRIS data (KNN, k=4)

The results in Table 11 clearly show the superiority of the proposed method. In particular, for the “Willow swamp”, “Cabbage palm hammock”, “Cabbage palm/oak”, “Graminoid marsh”, “Spartina marsh”, “Cattail marsh”, “Salt marsh”, and “Mud flats” classes, P-NRSw produces better classification results.

In addition, according to Tables 4, 5, 6, 8, 9 and 10, P-NRSw shows better performance than N(Sw), R(Sb), and SVM in terms of TCA and \(\kappa \). As reported above, the classification accuracies achieved by P-NRSw are higher, so the parameter \(\alpha \), which links the null space and the range space of Sw, plays an important role in classification. Hence, the influence of \(\alpha \) is also investigated in the following experiment; Fig. 7 displays the TCA for different values of \(\alpha \). The TCA rises at first and then begins to descend quickly; the best TCA is obtained at \(\alpha =0.7\). Recall that the value of \(\alpha \) controls the contribution of the null space of Sw relative to the range space of Sw.

Fig. 7 TCA for different values of the parameter \(\alpha \) for P-NRSw with Gaussian kernel (KSC-AVIRIS data)

Comparing the classification results in Tables 4, 5, 6, 8, 9 and 10, the Gaussian kernel gives the best result. From the above discussion, P-NRSw outperforms the other three methods. In addition, from a theoretical point of view, P-NRSw offers a stable way to handle remote sensing image classification problems.

In order to further verify the effectiveness of the proposed model, we carried out the following experiment. Table 12 gives the total classification accuracy (TCA), kappa coefficient and time cost on the different data sets, where #1 and #2 denote the 2012 GRSS data set and the KSC-AVIRIS data set, respectively. All methods were implemented in MATLAB R2010b on a desktop PC equipped with an Intel Core 2 i3 CPU (2.40 GHz) and 2 GB of RAM.

Table 12 TCA (%), \(\kappa \) and times (s) for N(Sw), R(Sb), SVM, P-SVM and P-NRSw of different data sets

According to Table 12, P-NRSw shows better performance than P-SVM, SVM, R(Sb) and N(Sw) in terms of TCA and kappa value. However, it is worth noting that the elapsed time of the P-NRSw algorithm is longer than that of the other algorithms, because the most time-consuming step is the decomposition of Sw and Sb. In addition, compared with the P-SVM method, the proposed P-NRSw method provides higher classification accuracy with respect to TCA.

5 Conclusion and Future Work

In this paper, a new and efficient remote sensing image classification model, P-NRSw, has been proposed. The goal of the proposed model is to make full use of detailed information to find an effective projection matrix that improves the performance of image classification. For remote sensing images, P-NRSw exploits within-class similarity and between-class diversity to build a mathematical model; then, based on FLDA, the SVM (or KNN) classifier is applied to obtain the classification results. Experimental results on two data sets confirm the effectiveness of the proposed P-NRSw, which provides efficient discriminant information for remote sensing image classification tasks. The authors realize that more work must be done to further improve the classification results in the future.