Introduction

Because of the key role of classification in hyperspectral imaging, many efforts have recently been made to design more accurate classifiers (Camps-Valls et al. 2014; Canty 2014). The support vector machine (SVM) is one of the best classifiers and has shown good performance in hyperspectral image classification even with limited reference data (Vapnik 2000). However, in the SVM and other pixel-wise classifiers, only the information of each pixel is used for classification; the abundant spatial information of hyperspectral images, especially high-resolution ones, and the relationship between neighbouring pixels are not considered. Therefore, salt-and-pepper errors appear in the classification map (Fauvel et al. 2013). Furthermore, recent studies have shown that spatial information can be used to achieve better classification accuracy (Benediktsson and Ghamisi 2015). In addition to the spectral features, texture, shape and size features can be applied to better distinguish pixels belonging to different classes (Khodadadzadeh et al. 2014; Golipour et al. 2016; Wang et al. 2016a, b). An improved SVM classifier with multiple kernels, based on spectral and spatial features, was proposed by Wang and Duan (2018); in that method, the spectral kernel is constructed from each pixel's spectral features, and the spatial kernel is modelled using the extended morphological profile method. In some works, Markov random fields (MRFs) have been used to model the spatial correlation between adjacent pixels. An MRF intuitively represents the spatial information used for classification. In these works, the classification results are refined using spatial contextual information and by minimizing a suitable energy function (Solberg et al. 1996; Tarabalka et al. 2010a; Zhang et al. 2011; Wang et al. 2016a, b).
In another work (Wang and Duan 2019), the authors proposed a combination of algebraic multigrid (AMG), hierarchical segmentation (HSEG) and MRF techniques for spectral-spatial classification of hyperspectral images. In this method, a novel segmentation approach is developed by combining AMG-based marker selection with the conventional HSEG algorithm to construct a set of unsupervised segmentation maps at multiple scales, and an improved MRF energy function is proposed for multiscale information fusion. Some other works are based on feature extraction. Extended attribute profiles (EAPs) and extended multi-attribute profiles (EMAPs) have been used to extract spatial features and model spatial information (Dalla Mura et al. 2010); the spatial features are generated using morphological attribute filters and multilevel analysis. The extinction profile (EP) is an impressive recent feature extraction (FE) method for hyperspectral images (Fang et al. 2018a); in FE by EP, the spatial information and the geometrical characteristics are represented better than in previous spectral-spatial methods. In Fang et al. (2018b), a novel local covariance matrix representation method is presented in which the correlation between different spectral bands and the spatial-contextual information are described when conducting FE from hyperspectral images.

Another important category of spectral-spatial classification methods is object-based classification (Li and Wan 2015; Zehtabian and Ghassemian 2015; Fang et al. 2015). After extracting objects from the image, a feature or a set of features is assigned to each object. A classifier is then applied to classify the image object by object, and all pixels of an object share the label assigned to that object.

The other well-known schemes in this category are those that utilize texture features. Texture is one of the main image features revealing spatial information. Gabor filters, gray-level co-occurrence matrices (GLCMs) and the wavelet transform are three main methods used to extract texture features (Guo et al. 2014; Wang et al. 2016a, b; Mirmehdi 2008). In Huo and Tang (2011), the spectral features and the extracted texture features are stacked into one feature vector and used for classification. An extended version of the stacked-features approach can be found in Mirzapour and Ghassemian (2015), where various combinations of spectral, texture and shape features are stacked together. Four types of spectral features, i.e. the original spectral data and features extracted by linear discriminant analysis (LDA), principal component analysis (PCA) and non-parametric weighted feature extraction (NWFE) (Jia et al. 2013), were considered as the spectral part of the stacked features. More than 50 different combinations were evaluated on three hyperspectral data sets using an SVM classifier, and the best combination for each data set was reported. In Fang et al. (2017), the spectral, texture and shape features are combined by a multiple-feature adaptive sparse representation (MFASR) method; the sparse coefficients of the multiple-feature matrix are used to determine the label of each pixel.

Multi-scale superpixel-based spectral-spatial classification (Li et al. 2016) is another novel technique. Superpixel-based classification and segmentation are performed at each scale, where the classification accuracy is improved using the segmentation results; the final multi-scale classification result is obtained by majority voting.

Some other widely used methods are based on combining pixel-wise classification with segmentation or clustering techniques. Tarabalka et al. (2010b) used an SVM classifier for pixel-wise classification and a watershed transformation to generate the segmentation map. Improved classification results were obtained by combining the classification and segmentation maps through a majority-voting approach within the watershed regions. Seifi Majdar and Ghassemian (2017a) investigated the efficiency of this method in a functional data analysis framework. A three-step spectral-spatial classification method based on joint bilateral filtering and graph-cut segmentation was proposed by Wang et al. (2016a, b), in which the regions obtained by the spectral-spatial segmentation are properly labelled.

The prominent potential of neural networks (NNs) for hyperspectral image classification has recently been proven. In conventional NNs, the typical pooling layers are fixed and cannot be adapted for feature downsampling. Moreover, the sampling locations of traditional convolutional kernels cannot be changed based on the complex spatial structures in hyperspectral images. Zhu et al. (2018) proposed a deformable CNN-based hyperspectral image classification method in which deformable convolutional sampling locations with adaptive size and shape can be adjusted according to the spatial information. In Fang et al. (2018c), a squeeze multibias network (SMBN) is proposed for hyperspectral image classification. In this method, a multibias module is placed behind the convolutional layer to decouple the feature maps into multiple maps according to the magnitudes of the responses; the combination of the response maps by the subsequent layer is then used to reach better classification accuracy.

Integrating the spectral data and the spatial information in a probabilistic framework (Liu and Lu 2016) is one of the newest spectral-spatial classification approaches. The per-pixel probability is estimated separately for the spectral data and the spatial information, and the joint probability is then obtained from these two probabilities. This method yields outstanding results, but some issues remain to be addressed: it is applied only in a fixed-size window and does not consider the border effects between regions. In another work, a spectral-spatial classification method was proposed in a probabilistic framework. First, spectral and texture features were classified separately by a probabilistic SVM classifier to estimate the per-pixel probability. Then, the total probability was calculated as a linear combination of these probabilities for each pixel. Finally, each pixel was assigned to the class with the maximum total probability (Seifi Majdar and Ghassemian 2017b). Texture features are the only spatial information applied in this method to enhance the classification results; shape features and contextual information were not used.

To address the above issues and further enhance the classification results, a new combination of spectral, texture and shape features, as well as contextual information, in a probabilistic framework is proposed to improve the classification of hyperspectral images, especially with limited training samples. Texture features are extracted by Gabor filters and shape features are represented by MPs. The spectral and spatial features are fed separately into a probabilistic support vector machine (SVM) classifier to estimate the per-pixel probability. These probabilities are combined to calculate the total probability, in which three weights determine the contribution of each. Finally, the classification result obtained in the previous step is refined by majority voting within the shape-adaptive (SA) neighbourhood of each pixel. Instead of a simple majority vote, we apply the majority vote in the probabilistic framework, where the reliability of the labels in the region is also considered. The main contributions of this article are threefold: (1) the weighted combination of spectral, texture and shape features and contextual information in a probabilistic framework, which did not appear in previous works, in which all extracted spatial information is combined with the spectral data using probability distribution functions; (2) unlike some spectral-spatial classification methods that use a fixed-window neighbourhood, the contextual information in an SA neighbourhood is used here to enhance the classification results; (3) in previous works, the numbers of similar and dissimilar labels in the neighbourhood are used to refine the label of the central pixel, but the reliability of the neighbourhood labels is not considered. In the proposed method, both the number of labels and their reliability are considered to determine the label of the central pixel in an SA neighbourhood.
The remainder of this article is organized as follows. In the next section, the methods used to extract the texture and shape features are introduced briefly. The Proposed classification method section explains the proposed method. The hyperspectral data sets and the experimental results are presented in the Experimental results and discussions section, and the conclusion is drawn in the Conclusion section.

Main components of the proposed method

In this study, we exploit spatial information using Gabor filters as texture features and MPs as shape features. In the following subsections, we introduce them briefly.

Texture feature extraction using Gabor filters

Gabor filters have been widely used in image processing, pattern recognition and computer vision (Mirmehdi 2008). They provide accurate time-frequency localization and are robust against changes in the contrast and brightness of images. Two-dimensional Gabor filters are used to extract texture features; Daugman (1985) first extended Gabor filters to two dimensions. A two-dimensional Gabor wavelet consists of a complex plane wave modulated by an elliptical Gaussian envelope (Zhang et al. 2012). The 2-D Gabor function can be represented as follows:

$$ {\displaystyle \begin{array}{c}{G}_{s,d}\left(x,y\right)={G}_{\overrightarrow{k}}\left(\overline{x}\right)=\frac{{\left\Vert \overrightarrow{k}\right\Vert}^2}{\delta^2}\cdot {e}^{-\frac{{\left\Vert \overrightarrow{k}\right\Vert}^2{\left\Vert \overline{x}\right\Vert}^2}{2{\delta}^2}}\cdot \left[{e}^{i\overrightarrow{k}\cdot \overline{x}}-{e}^{-\frac{\delta^2}{2}}\right],\\ {}\overrightarrow{k}=\left(\pi /2{f}^s\right)\cdotp {e}^{i\left(\frac{\pi d}{8}\right)},\kern0.5em f=2.\end{array}} $$
(1)

where (x, y) denotes the spatial position of a pixel in the image, \( \overrightarrow{k} \) is the frequency vector, and d and s are the direction and scale, respectively. The number of oscillations under the Gaussian envelope is determined by δ = 2π. The convolution of the image I with the Gabor function at a specific scale and direction generates the Gabor texture features:

$$ {F}_{s,d}\left(x,y\right)={G}_{s,d}\left(x,y\right)\ast I\left(x,y\right) $$
(2)

The Gabor texture features of one pixel are given as follows:

$$ {v}_{texture}\left(x,y\right)=\left[{F}_{1,1}\left(x,y\right),\dots, {F}_{s,d}\left(x,y\right)\right]\in {R}^{sd} $$
(3)
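As an illustration, the filter bank of Eqs. (1)-(3) can be sketched in Python; the discrete kernel size and the use of the magnitude of the complex response as the feature value are our assumptions, not prescribed by the text:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(scale, direction, size=15, delta=2 * np.pi, f=2.0):
    """Complex 2-D Gabor kernel following Eq. (1).
    'size' is a hypothetical truncation width for the discrete filter."""
    k_mag = (np.pi / 2) / f**scale              # ||k|| for this scale
    theta = np.pi * direction / 8               # orientation from d
    kx, ky = k_mag * np.cos(theta), k_mag * np.sin(theta)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    envelope = (k_mag**2 / delta**2) * np.exp(-k_mag**2 * r2 / (2 * delta**2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-delta**2 / 2)
    return envelope * carrier

def gabor_features(image, n_scales=5, n_directions=12):
    """Stack |G_{s,d} * I| responses into one feature vector per pixel
    (Eqs. 2-3); returns an (H, W, s*d) array."""
    feats = [np.abs(fftconvolve(image, gabor_kernel(s, d), mode='same'))
             for s in range(1, n_scales + 1)
             for d in range(n_directions)]
    return np.stack(feats, axis=-1)
```

With the parameters used later in the article (s = 5, d = 12), this yields the 60-dimensional texture vector per pixel.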

Shape features representation by morphological profiles

Mathematical morphology, developed by Matheron (1975) and Serra (1983), is a powerful methodology that provides operators to extract image components used to represent region shapes. Erosion and dilation, the two fundamental morphological operations, act on an image using a structuring element (SE) of arbitrary size and shape to evaluate the geometrical structures of the image. More complex morphological operations, opening and closing, are obtained by combining erosion and dilation: the morphological opening is defined as an erosion of an image followed by a dilation, while the morphological closing has the opposite definition, a dilation followed by an erosion. Reconstruction is a morphological transformation that tends to restore the shape of the objects remaining after a morphological opening or closing. When opening-by-reconstruction or closing-by-reconstruction operators act on an image, they preserve the objects that can contain the SE, while the other objects are completely removed (Soille 2013). Applying these operators to a grayscale image I with m disk-shaped SEs of radius λ ∈ {1, 2, …, m} leads to a stack of (2m + 1) features,

$$ \mathbf{MP}\left(\mathbf{I}\right)=\left\{{\Pi}_i:\kern0.5em \begin{array}{l}{\Pi}_i={\Pi}_{\phi_{\lambda }}\kern0.5em \mathrm{with}\kern0.5em \lambda =m-i+1,\kern1em \forall i\in \left[1,m\right]\\ {}{\Pi}_i=\mathbf{I},\kern10em i=m+1\\ {}{\Pi}_i={\Pi}_{\gamma_{\lambda }}\kern0.5em \mathrm{with}\kern0.5em \lambda =i-m-1,\kern1em \forall i\in \left[m+2,2m+1\right]\end{array}\right. $$
(4)

where \( {\Pi}_{\phi_{\lambda }} \) and \( {\Pi}_{\gamma_{\lambda }} \) are the closing- and opening-by-reconstruction operators with a disk-shaped SE of radius λ, respectively.
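A minimal sketch of the profile in Eq. (4), under the assumption that geodesic reconstruction is implemented iteratively with 8-connectivity (the text does not prescribe an implementation):

```python
import numpy as np
from scipy import ndimage as ndi

def _reconstruct(marker, mask, dilate=True):
    """Iterative geodesic reconstruction (by dilation if dilate=True,
    by erosion otherwise) with a 3x3 (8-connected) footprint."""
    footprint = np.ones((3, 3))
    prev, cur = None, marker.copy()
    while prev is None or not np.array_equal(cur, prev):
        prev = cur
        if dilate:
            cur = np.minimum(ndi.grey_dilation(prev, footprint=footprint), mask)
        else:
            cur = np.maximum(ndi.grey_erosion(prev, footprint=footprint), mask)
    return cur

def disk(radius):
    """Disk-shaped structuring element of the given radius."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2

def opening_by_reconstruction(img, radius):
    return _reconstruct(ndi.grey_erosion(img, footprint=disk(radius)), img, True)

def closing_by_reconstruction(img, radius):
    return _reconstruct(ndi.grey_dilation(img, footprint=disk(radius)), img, False)

def morphological_profile(img, m=25):
    """Stack of 2m+1 images per Eq. (4): m closings, the original, m openings."""
    closings = [closing_by_reconstruction(img, r) for r in range(m, 0, -1)]
    openings = [opening_by_reconstruction(img, r) for r in range(1, m + 1)]
    return np.stack(closings + [img] + openings, axis=-1)
```

With m = 25 (as used in the experiments), `morphological_profile` produces the 51-feature stack per pixel.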

Probabilistic SVM

SVM is a versatile method for classifying hyperspectral images with linear or nonlinear models. The nonlinear SVM model, using kernel functions, provides better classification accuracy than the linear model (Camps-Valls and Bruzzone 2005). In this article, all the spectral and spatial features are fed into a multiclass one-versus-one SVM classifier with a polynomial kernel, K(x, y) = (γxTy + r0)d. For simplicity and fair comparison, the default values are used as kernel parameters: d = 3, γ = 1/(number of features), and r0 = 0. The SVM classifier cannot directly generate probability estimates; therefore, techniques that combine all pairwise comparisons are used to estimate the per-pixel probability (Wu et al. 2004). The popular LIBSVM library is employed to implement the SVM classifier and to estimate the per-pixel probability (Chang and Lin 2011).
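As an illustration, an equivalent probabilistic one-versus-one polynomial-kernel SVM can be set up with scikit-learn, whose `SVC` class wraps LIBSVM and implements the pairwise-coupling probability estimates of Wu et al. (2004); the toy data here are hypothetical stand-ins for the pixel feature vectors:

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn's SVC is built on LIBSVM

# Hypothetical toy data: 120 "pixels", 20 features, K = 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = rng.integers(0, 3, size=120)

# Polynomial kernel with the default parameters given in the text:
# d = 3, gamma = 1/(number of features), r0 = 0.
clf = SVC(kernel='poly', degree=3, gamma=1.0 / X.shape[1], coef0=0.0,
          probability=True, random_state=0)
clf.fit(X, y)
p = clf.predict_proba(X)  # per-pixel class probabilities, shape (n, K)
```

Each row of `p` is a probability distribution over the K classes, which is the quantity combined in the next section.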

Proposed classification method

The flowchart of the proposed classification method is shown in Fig. 1. We have a d-band hyperspectral image X = {xi ∈ Rd, i = 1, 2, …, n}, where n is the number of pixels. First, Gabor and MP features are extracted from the first principal component (PC) of the original data (Zhang et al. 2012; Mirzapour and Ghassemian 2015). In the second step, the spatial features (Gabor and shape features) as well as the spectral data are classified separately using the probabilistic SVM classifier; thus, three per-pixel probabilities are obtained for each pixel of the image.

Fig. 1
figure 1

Flowchart of the proposed classification method

Suppose there are K different classes {w1, w2, …, wK} in the image. The probability estimates based on the spectral, texture and shape features are computed as follows:

$$ {\boldsymbol{p}}^{\mathrm{spc}}\left({\boldsymbol{x}}_i\right)=\left\{{p}_k^{\mathrm{spc}}\left({\boldsymbol{x}}_i\right)={p}^{\mathrm{spc}}\left(y=k|{\boldsymbol{x}}_i\right),k=1,2,\dots K;\kern0.5em i=1,2,\dots n\right\} $$
(5)
$$ {\boldsymbol{p}}^{\mathrm{txt}}\left({\boldsymbol{x}}_i\right)=\left\{{p}_k^{\mathrm{txt}}\left({\boldsymbol{x}}_i\right)={p}^{\mathrm{txt}}\left(y=k|{\boldsymbol{x}}_i\right),k=1,2,\dots K;\kern0.5em i=1,2,\dots n\right\} $$
(6)
$$ {\boldsymbol{p}}^{\mathrm{shp}}\left({\boldsymbol{x}}_i\right)=\left\{{p}_k^{\mathrm{shp}}\left({\boldsymbol{x}}_i\right)={p}^{\mathrm{shp}}\left(y=k|{\boldsymbol{x}}_i\right),k=1,2,\dots K;\kern0.5em i=1,2,\dots n\right\} $$
(7)

where ‘spc’, ‘txt’, and ‘shp’ abbreviate spectral, texture, and shape, respectively.

Three probability distributions are now obtained: the spectral distribution calculated from the spectral data and the spatial distributions computed from the texture and shape features. In the third step, the spectral, texture and shape probability distributions are combined, with three positive weights, ω = {ω1, ω2, ω3}, ωj > 0, determining the contribution of each: for each feature, a larger ωj means a more important role in the combination. The total probability distribution is therefore defined as follows:

$$ {\boldsymbol{p}}^{\mathrm{total}}\left({\boldsymbol{x}}_i\right)={\omega}_1{\boldsymbol{p}}^{\mathrm{spc}}\left({\boldsymbol{x}}_i\right)+{\omega}_2{\boldsymbol{p}}^{\mathrm{txt}}\left({\boldsymbol{x}}_i\right)+{\omega}_3{\boldsymbol{p}}^{\mathrm{shp}}\left({\boldsymbol{x}}_i\right)=\sum \limits_{j=1}^3{\omega}_j{\boldsymbol{p}}^{{\mathrm{feature}}_j}\left({\boldsymbol{x}}_i\right) $$
(8)

where feature = {spc, txt, shp}, subject to the constraint

$$ \sum \limits_{j=1}^3{\omega}_j=1\kern0.75em ,\kern0.5em {\omega}_j>0 $$
(9)
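The combination of Eqs. (8)-(9) amounts to a convex combination of the three per-pixel distributions; a small sketch with purely illustrative numbers:

```python
import numpy as np

def total_probability(p_spc, p_txt, p_shp, weights):
    """Weighted combination of the three probability distributions (Eq. 8);
    the weights must be positive and sum to one (Eq. 9)."""
    w = np.asarray(weights, dtype=float)
    assert np.all(w > 0) and np.isclose(w.sum(), 1.0)
    return w[0] * p_spc + w[1] * p_txt + w[2] * p_shp

# Toy example for one pixel with K = 3 classes (numbers are illustrative only).
p_spc = np.array([0.6, 0.3, 0.1])
p_txt = np.array([0.2, 0.7, 0.1])
p_shp = np.array([0.5, 0.4, 0.1])
p_tot = total_probability(p_spc, p_txt, p_shp, (0.4, 0.3, 0.3))
```

Because the weights sum to one, the combined vector remains a valid probability distribution.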

In the final step, the spatial contextual information is applied to enhance the classification performance. For this purpose, instead of defining a fixed-size window, a shape-adaptive (SA) window is assumed for each pixel xi (Fu et al. 2016). The label of the central pixel xi can be modified by two properties of the neighbouring pixels. The first is the inter-pixel class dependence assumption, whereby a pixel with a specific label tends to have neighbouring pixels with the same label (Tarabalka et al. 2010b). The second is the classification reliability of the neighbouring pixels, whereby neighbouring pixels with higher reliability have more influence on the label of pixel xi (Negri et al. 2014). Negri et al. (2014) determined the classification reliability of a pixel x as follows:

$$ {f}_{SVM}\left(\boldsymbol{x}\right)=\left\langle \boldsymbol{w},\boldsymbol{x}\right\rangle +b $$
(10)

where w is the vector orthogonal to the separating hyperplane and \( \frac{b}{\left\Vert w\right\Vert } \) determines the offset of the hyperplane from the origin along w. The notations ⟨·,·⟩ and ‖·‖ represent the inner product and the vector norm, respectively. In a given neighbourhood of a pixel xi, the influence of each neighbouring pixel is proportional to its reliability.

In our proposed method, the classification reliability of each pixel is determined by the per-pixel probability obtained previously. Accordingly, for a specific class, a pixel with probability near one has high classification reliability, and a pixel with probability near zero has low reliability. For instance, Fig. 2 shows the per-pixel probabilities of pixels x1 and x2, assuming there are only two classes in the scene. As shown in this figure, the probability that pixel x1 belongs to class 1 is high, while the probability that pixel x2 belongs to class 1 is lower. As a result, the classification reliability of pixel x1 in class 1 is higher than that of pixel x2: we can say with high reliability that the red pixel belongs to class 1, while for the blue pixel the degree of reliability is low.

Fig. 2
figure 2

Example of using probability estimates of one pixel to determine the classification reliability

The general idea of applying spatial contextual information to regularize the initial classification maps is given in Fig. 3. In this procedure, an SA neighbourhood is first considered for each pixel xi. Then, the total probabilities of pixel xi and of its neighbouring pixels, obtained in the previous step, are summed:

$$ {\boldsymbol{p}}^{\mathrm{SA}}\left({\boldsymbol{x}}_i\right)={\boldsymbol{p}}^{\mathrm{total}}\left({\boldsymbol{x}}_i\right)+\sum \limits_{{\boldsymbol{x}}_j\in {N}_i}{\boldsymbol{p}}^{\mathrm{total}}\left({\boldsymbol{x}}_j\right) $$
(11)
Fig. 3
figure 3

The procedure of applying spatial contextual information to refine the classification maps (the numbers in the tables were arbitrarily selected)

where Ni is the set of neighbouring pixels of a given pixel xi. Finally, pixel xi is assigned to the class with the highest probability:

$$ label\left({\boldsymbol{x}}_i\right)=\underset{k=1,\dots, K}{\arg \max }\ {p}_k^{\mathrm{SA}}\left({\boldsymbol{x}}_i\right) $$
(12)

It should be noted that this procedure is applied only when at least one neighbouring pixel carries a different label.
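The refinement of Eqs. (11)-(12) can be sketched as follows; representing the SA neighbourhoods as precomputed index sets is our assumption about the data layout, not something the text specifies:

```python
import numpy as np

def sa_probabilistic_vote(p_total, neighbourhoods):
    """Refine labels by summing the total probabilities over each pixel's
    shape-adaptive neighbourhood (Eq. 11) and taking the argmax (Eq. 12).
    'neighbourhoods[i]' is the index set of pixel i's SA window,
    including i itself."""
    n, K = p_total.shape
    labels = np.empty(n, dtype=int)
    for i in range(n):
        p_sa = p_total[neighbourhoods[i]].sum(axis=0)
        labels[i] = int(np.argmax(p_sa))
    return labels
```

Summing probabilities (rather than counting labels) is what lets a few highly reliable neighbours outweigh a larger number of unreliable ones.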

Our proposed way of using the contextual information of the neighbouring pixels resembles the majority-voting model but has one main advantage. Simple majority voting is based only on the inter-pixel class dependence assumption: all pixels within a region are assigned the most frequent label of that region, and the reliability of the neighbouring pixels is not considered. In contrast, in our proposed majority voting in the probabilistic framework, the regularization of the classification map is based on both the number of labels and the reliability of the neighbourhood patterns. The former reflects that a pixel with a specific label tends to have neighbouring pixels with the same label, while the latter means that neighbouring pixels with higher classification reliability have more influence on refining the classification map. When the probabilities of a pixel and its neighbouring pixels are summed, these two assumptions are considered simultaneously.

The details of the proposed spectral-spatial classification method are summarized in Fig. 1. The method is evaluated on each data set with different schemes of training samples in the next section. The only drawback of the proposed method is the selection of the best weight ωj for each feature; in this article, the best set of weights is derived experimentally, and the effects of the weights on the classification results are presented in the next section.

Experimental results and discussions

In this article, three hyperspectral data sets are used to evaluate the proposed method and to compare it with some recent spectral-spatial classification methods: (1) the Indian Pines data set is a low-spatial-resolution image containing agricultural and forest land covers, gathered by the AVIRIS sensor over the Indian Pines test site; (2) the Pavia University data set was captured by the ROSIS sensor over Pavia with high spatial resolution and contains urban structures; (3) the Salinas data set contains agricultural land cover with high spatial resolution, acquired by the AVIRIS sensor over Salinas Valley.

Hyperspectral data sets

(1) Indian Pines data set: the Indian Pines data set is a hyperspectral image with high spectral resolution but low spatial resolution (20 m). It covers non-urban, agricultural/forest land and was captured by the AVIRIS sensor over north-western Indiana in June 1992. It has 145 × 145 pixels and 224 spectral bands, of which 24 water-absorption and noisy bands (104–108, 150–163, 220–224) were excluded; the remaining 200 bands are used in the experiments. The image contains 16 different classes; six classes with few samples are removed, and the remaining ten classes are used in the experiments. The false-colour and ground-truth images are shown in Fig. 4.

(2) Pavia University data set: the Pavia University data set, with very high spatial resolution (1.3 m), was acquired by the ROSIS-3 sensor over the urban area of Pavia University, northern Italy. It consists of 610 × 340 pixels and 115 spectral bands, of which the 12 noisiest bands were omitted; the remaining 103 spectral bands were used in our experiments. The false-colour and ground-truth images are shown in Fig. 5.

(3) Salinas data set: the Salinas data set has both high spectral resolution and high spatial resolution (3.7 m) and was captured by the AVIRIS sensor over Salinas Valley, California. It comprises 512 × 217 pixels, 16 classes and 224 bands, of which 20 water-absorption bands (108–112, 154–167 and 224) were discarded. The false-colour and ground-truth images are shown in Fig. 6.

Fig. 4
figure 4

Indian Pines image: a false-colour image b 16-class ground truth

Fig. 5
figure 5

Pavia University image: a false-colour image b ground truth

Fig. 6
figure 6

Salinas image: a false-colour image b ground truth

General description

To evaluate the performance of the proposed method, it is applied to the three data sets described above. In all data sets, the texture and shape features are extracted by Gabor filters and MPs, respectively.

In the Gabor function, the direction and scale parameters are d = 12 and s = 5 (Zhang et al. 2012); therefore, 5 × 12 = 60 Gabor filters are applied to the first PC, and the texture features of each pixel are represented by 60 values, vtexture ∈ R60. The first PC is also used to extract the MP features. The parameter m is arbitrarily chosen as 25; thus, disk-shaped SEs of radius λ ∈ {1, 2, …, 25} lead to a stack of 51 features (Mirzapour and Ghassemian 2015). We use the first PC because it contains as much of the variability of the image as possible and carries the most spatial information.

The probabilistic SVM with a polynomial kernel, K(x, y) = (γxTy + r0)d, is used in all classifications, with the default kernel parameters \( d=3,\gamma =\frac{1}{number\ \mathrm{of}\ \mathrm{features}},\mathrm{and}\ {r}_0=0 \). In all experiments, the training samples of each class are randomly selected from the entire scene to train the classifier, and the remaining samples are used for testing. Each experiment is run ten times with ten different sets of training samples, and the average results are reported. Three schemes for the number of training samples, called set A, set B and set C, are considered: (1) an ill-posed classification problem, i.e. ni = 10 < N < d; (2) a poorly posed classification problem, i.e. ni = 50 < d < N; (3) a proportional scheme in which 1% of the samples of each class are selected as the training set; here ni is the number of training samples of class i, N is the total number of training samples, and d is the number of spectral bands of the hyperspectral image. All schemes correspond to limited-training-sample problems. Schemes 1 and 2 are used for the Indian Pines and Pavia University data sets, and scheme 3 only for the Salinas data set, to make a fair comparison between the proposed method and the other spectral-spatial methods. The number of training samples in each scheme for the three data sets is shown in Table 1.
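The three training-sample schemes reduce to a per-class random draw; a sketch under our assumptions (the actual per-class counts for each data set are those of Table 1):

```python
import numpy as np

def sample_training(labels, scheme, rng=None):
    """Randomly draw training indices per class.
    scheme: ('fixed', n_i) for sets A/B (n_i = 10 or 50 per class),
            ('fraction', f) for set C (f = 0.01 of each class)."""
    rng = np.random.default_rng(rng)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if scheme[0] == 'fixed':
            n = min(scheme[1], idx.size)
        else:
            n = max(1, int(round(scheme[1] * idx.size)))
        train.extend(rng.choice(idx, size=n, replace=False))
    return np.array(train)
```

Running this with ten different seeds and averaging the resulting accuracies reproduces the evaluation protocol described above.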

Table 1 The number of spectral bands and the total number of training samples for three data sets

Three measures of accuracy were used: overall accuracy (OA), average accuracy (AA) (Kianisarkaleh and Ghassemian 2016) and the kappa coefficient (κ) (Cohen 1960). The experiments are conducted on a computer with an Intel Pentium (R) Dual-Core 2.2 GHz processor and 4 GB of RAM running the 64-bit Windows 7 operating system, and MATLAB version 2018a is used to implement the algorithms.
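The three accuracy measures can be computed from the confusion matrix, as in the following sketch:

```python
import numpy as np

def classification_scores(y_true, y_pred):
    """Overall accuracy (OA), average per-class accuracy (AA) and
    Cohen's kappa, computed from the confusion matrix."""
    classes = np.unique(y_true)
    K = classes.size
    cm = np.zeros((K, K))
    for t, p in zip(y_true, y_pred):
        cm[np.searchsorted(classes, t), np.searchsorted(classes, p)] += 1
    n = cm.sum()
    oa = np.trace(cm) / n
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

OA weights classes by their sample counts, AA averages the per-class recalls, and kappa corrects OA for chance agreement, which is why the three measures are reported together.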

Results

First, the effect of different values of ω1, ω2 and ω3 on the proposed weighted combination of multiple features in the probabilistic framework is evaluated (Fig. 7). This experiment demonstrates that the classification results change considerably for different values of these weights. With set A, for the Indian Pines image, the shape probability distribution has a greater effect than the texture and spectral distributions. Almost the same effect is found for the Salinas image, whereas for the Pavia University image the texture probability distribution is more effective. With set B, the classification results for the three data sets are almost the same; in this condition, the shape and texture probability distributions are more effective than the spectral distribution. It can be seen that the effect of the weights increases as the number of training samples decreases.

Fig. 7
figure 7

The evaluation of the weighted combination with some different values of ω1, ω2 and ω3 on three data sets and with training samples set A and set B, in each case the best weights are given

Our experiments with different sets of training samples demonstrate that various factors, such as the hyperspectral image, its spatial resolution and the number of training samples, affect the optimum weights ω1, ω2 and ω3 needed to reach the best classification results. The best choices of these parameters were derived experimentally and are given for each case in Fig. 7. Automatic selection of the weights in different situations can be investigated in future work.

Next, the classification results obtained with the spectral, texture and shape features on the three hyperspectral images, under the two training-sample schemes (sets A and B), are compared with the results of the proposed method to show its superiority.

(1) Indian Pines image: the pixel-wise classification results of the spectral, texture and shape features and of the proposed method with limited training samples are given in Table 2. The low spatial resolution and the highly mixed pixels of the Indian Pines image lead to low classification accuracies with the spectral and texture features, especially with set A. Among these features, the shape features give better results than the spectral and texture features with set A, whereas the texture features give better results with set B. The OA, AA and κ of the shape features with set A are 74.75%, 77.74% and 71.16%, respectively, and those of the texture features with set B are 88.20%, 91.68% and 86.36%. By comparison, the OA, AA and κ of the proposed method are 84.86%, 89.80% and 82.63% with set A and 96.54%, 97.70% and 95.97% with set B, a considerable improvement: the proposed method improves the OA by about 25%, 27% and 10% relative to the pixel-wise spectral, texture and shape classifications with set A, respectively. Similar improvements, though of lesser degree, are also found with set B. The classification maps for sets A and B are shown in Fig. 8.

(2) Pavia University image: as for the Indian Pines image, the classification accuracies of the spectral, texture and shape features and of the proposed method are given in Table 3. Among all features, the shape features perform best under all training-sample schemes. The proposed method improves the classification accuracies by about 10% with set A and about 5% with set B compared with the shape features. The classification maps for sets A and B are shown in Fig. 9.

  3)

    Salinas image: as for the previous data sets, the OA, AA and κ of the classification results with set A and set B are listed in Table 4. The benefits of the proposed weighted combination method in the probabilistic framework are also evident for this data set. The classification maps are given in Fig. 10.
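The OA, AA and κ values reported above are standard accuracy measures derived from a confusion matrix. As a minimal illustration (not the authors' code), they can be computed as follows, assuming rows of the matrix are reference classes and columns are predictions:

```python
import numpy as np

def classification_scores(confusion):
    """Compute OA, AA and Cohen's kappa from a square confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    # Overall accuracy (OA): fraction of correctly classified samples.
    oa = np.trace(confusion) / total
    # Average accuracy (AA): mean of the per-class accuracies.
    per_class = np.diag(confusion) / confusion.sum(axis=1)
    aa = per_class.mean()
    # Kappa: agreement corrected for chance, using the marginal totals.
    expected = (confusion.sum(axis=1) * confusion.sum(axis=0)).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa
```

For example, the two-class matrix [[8, 2], [1, 9]] gives OA = AA = 0.85 and κ = 0.7.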

Table 2 Classification results of Indian Pines image
Fig. 8

Indian Pines image: the ground truth image and comparison of the classification maps of spectral, texture and shape features and proposed method with training samples set A and set B

Table 3 Classification results of Pavia University image
Fig. 9

Pavia University image: the ground truth image and comparison of the classification maps of spectral, texture and shape features and proposed method with training samples set A and set B

Table 4 Classification results of Salinas image
Fig. 10

Salinas image: the ground truth image and comparison of the classification maps of spectral, texture and shape features and proposed method with training samples set A and set B

The simplest conclusion to draw from the previous experiment is that the classification results of the proposed method depend on the data set and the number of training samples. We could not find a single set of weights that gives the best results for all data sets under the different training-sample schemes, but properly chosen weights improve the classification results considerably; in this article, the optimum weights are selected experimentally. The benefits of the proposed combination method are more pronounced with set A (ill-posed training samples) than with set B (poorly-posed training samples).
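The weighted combination of the three per-pixel class-probability distributions can be sketched as follows. This is a simplified illustration under the assumption of a linear opinion pool; the exact fusion rule, and the function and variable names used here, are not taken from the article:

```python
import numpy as np

def combine_probabilities(p_spectral, p_texture, p_shape, weights):
    """Weighted combination of three per-pixel class-probability
    arrays of shape (n_pixels, n_classes). A linear opinion pool is
    assumed here; the article's exact fusion rule may differ."""
    w_spec, w_tex, w_shape = weights
    combined = w_spec * p_spectral + w_tex * p_texture + w_shape * p_shape
    # Renormalize so each pixel's class probabilities sum to one.
    combined /= combined.sum(axis=1, keepdims=True)
    return combined

# The label of each pixel is then the class with the highest
# combined probability: labels = combined.argmax(axis=1)
```

Since no single weight triple works best everywhere, the weights would be tuned per data set, as done experimentally in this article.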

Overall time complexity analysis

Since the run times of the competing methods are not available for comparison, the overall time complexity of the proposed method is studied here. The most time-consuming parts are texture feature extraction by Gabor filters, shape feature extraction by the morphological profile (MP) and spatial contextual information extraction by the shape-adaptive (SA) algorithm. Because the SVM classifier is used with a polynomial kernel, without tuning the kernel parameters, classifying all pixels based on the spectral, Gabor and MP features is not time-consuming. In addition, combining the probability distributions and applying the spatial contextual information of the neighbouring pixels are not complex operations, so they are not counted in the total time complexity. Denoting the time complexities of Gabor feature extraction, MP extraction, the shape-adaptive algorithm and SVM classification by O(Gabor), O(MP), O(SA) and O(SVM), respectively, the overall time complexity of the proposed method is:

$$ \left(n-N\right)\times \left[O\left(\mathrm{Gabor}\right)+O\left(\mathrm{MP}\right)+O\left(\mathrm{SA}\right)+3\times O\left(\mathrm{SVM}\right)\right] $$
(13)

where n is the number of pixels in the image and N is the number of training samples.
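Eq. (13) can be read as a per-pixel cost model: each of the (n − N) test pixels incurs the three feature-extraction costs plus three SVM evaluations (one per feature type). A small sketch, where the t_* arguments are placeholder unit costs standing in for O(Gabor), O(MP), O(SA) and O(SVM):

```python
def overall_cost(n, N, t_gabor, t_mp, t_sa, t_svm):
    """Cost model of Eq. (13): each of the (n - N) test pixels pays
    for Gabor, MP and shape-adaptive extraction plus three SVM
    evaluations (spectral, Gabor and MP features)."""
    return (n - N) * (t_gabor + t_mp + t_sa + 3 * t_svm)
```

For example, with n = 100, N = 10 and unit costs of 1, the total is 90 × 6 = 540 cost units; the model scales linearly in the number of test pixels.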

Comparison with some recent spectral–spatial classification methods

The proposed method, with set B for the Indian Pines and Pavia University images and set C for the Salinas image, is compared with some recent spectral–spatial classification methods in order to evaluate its capability in hyperspectral image classification (Tables 5, 6 and 7). For the Indian Pines and Pavia University images, the proposed method is quantitatively compared with the extended morphological profiles (EMP) (Benediktsson et al. 2005), the edge-preserving filter (EPF) (Kang et al. 2014), the SVM composite kernel (SVM-CK) (Camps-Valls et al. 2006), the generalized composite kernel-based multivariate logistic regression (GCK-MLR) (Li et al. 2013), superpixel-based classification via multiple kernels (SC-MK) (Fang et al. 2015), multiple nonlinear feature learning with multivariate logistic regression (MNFL) (Li et al. 2015), EPs with a stacking manner (EPs-stacking) (Ghamisi et al. 2016) and EPs-fusion (EPs-F) (Fang et al. 2018a, b, c). The spatial information in EMP and EPF is extracted by morphological profiles and edge filtering, respectively. In the SVM-CK method, the spectral and spatial features are combined by a composite kernel in which the weight of each feature is selected manually. Similarly, in the GCK-MLR method, a generalized composite kernel is constructed from the spectral–spatial information. The SC-MK method uses a superpixel approach to extract the spatial and spectral information. The MNFL method combines various extracted features for better classification of hyperspectral images. The EPs-F method uses a fusion framework to draw out the spatial information of the EPs. For the Salinas image, the proposed method is compared with the extended morphological profiles (EMP) (Benediktsson et al. 2005), logistic regression via variable splitting and augmented Lagrangian with a multilevel logistic prior (LORSAL-MLL) (Li et al. 2011), pixel-wise sparse representation classification (SRC-pixelwise), the joint sparse representation model (JSRM) (Chen et al. 2011), nonlocal weighted sparse representation (NLW-SR) (Zhang et al. 2014), single-scale adaptive sparse representation (SASR), multiscale separate sparse representation (MSSR), multiscale joint sparse representation (MJSR) and multiscale adaptive sparse representation (MASR) (Fang et al. 2014). In the LORSAL-MLL method, the spatial information of the hyperspectral image is exploited by a multilevel logistic prior-based segmentation technique. SRC-pixelwise is a sparse representation-based classifier in which only the spectral information is used for classification, whereas JSRM is a sparse representation-based classifier in which the spatial context is utilized within one fixed single scale. In the NLW-SR method, a dynamic weight based on spectral similarity within the local neighbourhood region is assigned to each pixel. The SASR method is a single-scale sparse representation classifier that modifies JSRM with an adaptive atom selection strategy. The MSSR and MJSR are two multiscale sparse representation-based classifiers. The MASR simultaneously represents pixels at multiple scales via an adaptive sparse strategy, exploiting the correlations among the scales and representing the pixels of each scale by an appropriate representation. Since the codes of these methods were not available to us, the classification accuracies of the competing methods were taken from the corresponding articles; the details and parameter settings of the different methods can be found there. It can be seen that the proposed method outperforms the competing methods on the Indian Pines and Salinas data sets. On Pavia University, the proposed method is better than the other methods, although its results are close to those of the EPs-F method.
The OA of the proposed method is 98.33%, while the OA of the EPs-F method is 98.67%. EPs-F is an effective spatial–spectral feature extraction method for hyperspectral images (HSIs) that performs particularly well on Pavia University, a high-resolution hyperspectral image. The results in these tables demonstrate the performance of the proposed spectral–spatial classification method: the weighted combination of the probability distributions contains useful information about the texture and shape features of the image, and the spatial contextual information of the neighbouring pixels significantly improves the classification accuracy. Moreover, according to the results presented in Tables 5, 6 and 7, the proposed method with set B for the Indian Pines and Pavia University images and set C for the Salinas image, a poorly-posed classification problem, outperforms most of the competing methods.

Table 5 Indian Pines: classification accuracies of the proposed method compared with some recent spectral–spatial classification methods for 50 training samples (set B)
Table 6 Pavia University: classification accuracies of the proposed method compared with some recent spectral–spatial classification methods for 50 training samples (set B)
Table 7 Salinas: classification accuracies of the proposed method compared with some recent spectral–spatial classification methods for 1% training samples (set C)

Conclusion

In this article, a novel weighted combination of spectral and spatial information in a probabilistic framework was proposed for the classification of hyperspectral images, especially with limited training samples. For each pixel, the three probabilities obtained from the spectral, texture and shape features were combined, with three weights determining the contribution of each one. The spatial contextual information of the neighbouring pixels in a SA region was then applied to further improve the classification results. The main contributions of this article are threefold: 1) the weighted combination of the spectral, texture and shape features and the contextual information in a probabilistic framework, which did not appear in previous works, whereby all extracted spatial information is combined with the spectral data by using probability distribution functions; 2) unlike some spectral–spatial classification methods that use a fixed-window neighbourhood, the contextual information in a SA neighbourhood is used to enhance the classification results; 3) whereas previous works refine the label of the central pixel using only the numbers of similar and dissimilar labels in the neighbourhood, the reliability of the neighbourhood labels is also considered here. The proposed method was evaluated on three data sets with different limited training samples. The results showed that using a probabilistic framework to combine the spectral and spatial information is a simple and robust technique for enhancing classification accuracy in hyperspectral images. Finally, comparison with some recent spectral–spatial classification methods demonstrated the better performance of the proposed method.