Introduction

The use of hyperspectral data has increased significantly with the introduction of recent advanced sensors. Hyperspectral images with high spectral and spatial resolution provide great advantages for image processing and classification. For instance, they help to distinguish landscape objects, monitor land and water resources, and conduct research at lower scales. This advantage derives from the capacity of specific, narrow spectral bands to capture absorption and reflection features (Kamal and Phinn 2011). Optimal selection of wavelengths, number of bands, and spatial and spectral resolution are major problems in the use of hyperspectral imagery for pattern recognition (Jasani and Stein 2002; Bajcsy and Groves 2004). Although each individual band of a hyperspectral image reveals different characteristics of the objects, these images contain a high degree of redundancy due to the correlation between adjacent spectral bands (Lee and Landgrebe 1993; Kavzoglu and Mather 2000). This redundancy affects all types of analyses and can result in increased processing time and inter-class confusion (Kavzoglu and Mather 2002). Therefore, fewer bands providing the highest separability among landscape features are required to ensure the highest possible classification accuracy. Redundancy and irrelevancy of spectral bands are the main causes of failure in the classification process (i.e. an insufficient accuracy level). Hence, reduction of data dimensionality, either by selecting features from the original spectral space or by estimating features from a transformed feature space, is required to conduct a successful hyperspectral image classification.

The use of a large number of spectral bands can have an adverse effect on traditional classifiers, particularly when limited ground reference data are available. Also, classifiers utilizing only first-order statistics show poor performance compared to classifiers using second-order statistics (e.g. the covariance matrix) (Lee and Landgrebe 1993). Using more features than the optimal number may decrease the accuracy reached by the classifiers, which is called the “curse of dimensionality” (Hughes 1968). Data in a high-dimensional feature space can be described by a subspace of lower dimensionality (Kavzoglu and Mather 2002). When statistical classifiers are used for high-dimensional data, the required sample size reaches a level that cannot be provided in most studies. Supervised statistical classification methods require a priori knowledge of certain aspects of land use/land cover (LULC) classes, including their statistical distribution. Two crucial features of ground reference samples for a successful classification are their size and representativeness (Kavzoglu 2009). Sample size is directly related to the number of spectral bands and the underlying assumptions of the algorithm used in modelling the dataset. Mather (1999) states that sample size should be at least 30p pixels per class (preferably more), where p is the number of spectral bands. From this point of view, a 200-band hyperspectral image needs at least 6000 samples for each class to learn the characteristics of the dataset, which cannot be collected in most cases. In the current dataset, samples for LULC classes range from 478 to 2455 pixels, none of which satisfies the minimum requirement of conventional statistical classifiers.
Advanced non-parametric classifiers, including neural networks, random forest, support vector machines and decision trees, have recently been suggested for the processing of hyperspectral imagery, mainly because they require a smaller number of training samples to distinguish classes from each other. Principal component analysis (PCA) and feature selection algorithms are generally applied in the literature to reduce the number of spectral bands (Hirosawa et al. 1996; Kavzoglu and Mather 2000, 2002; Agarwal et al. 2007). PCA is a commonly used transformation method for the analysis of remotely sensed images. It defines new, uncorrelated dimensions ordered by the variability in the data that they capture.
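
The 30p rule quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the band count and per-class sample range are the values stated in the text.

```python
def min_samples_per_class(n_bands: int, factor: int = 30) -> int:
    """Minimum training pixels per class under the 30p rule (Mather 1999)."""
    return factor * n_bands


# Values quoted in the text: 200 bands, 478 to 2455 reference pixels per class.
required = min_samples_per_class(200)
smallest_class, largest_class = 478, 2455

print(required)                    # 6000
print(largest_class >= required)   # False: even the largest class falls short

# Largest number of bands each class size could support under the same rule.
print(smallest_class // 30, largest_class // 30)  # 15 81
```

Read this way, the smallest class would only support about 15 bands under the rule, which is one way to see why some form of band reduction is needed before statistical classifiers are applied.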

Pixel-based classification has become deficient with the vast increase in the spectral information and spatial resolution of images. Each pixel is assigned to one of the categories considering its spectral, textural and contextual information. Thematic maps produced through a pixel-based classification can be noisy, since some pixels in the image can be atypical or mixed. Object-based image analysis (OBIA) offers unique advantages and has been found robust in handling hyperspectral imagery with high spatial resolution. It is also effective in eliminating the ‘salt and pepper effect’. Instead of pixel-specific information, OBIA considers spectral, spatial and textural features of objects that are formed by merging similar or homogeneous adjacent pixels in the image. OBIA is much closer to human vision than per-pixel analysis (Addink et al. 2007). Many researchers have reported that object-based classification outperforms pixel-based classification (Gao et al. 2006; Kamal and Phinn 2011; Duro et al. 2012). The crucial stage of OBIA is segmentation, in which the image is partitioned into homogeneous parts, called image objects, that intrinsically have a strong correlation with real-world features. Segmentation reduces the level of detail and complexity, and makes image content more suitable for delineation (Lang 2008). Segmentation methods are usually categorised as pixel-based, edge-based and region-based. Multi-resolution segmentation, introduced by Baatz and Schäpe (2000), has been one of the most widely used methods in the literature. It requires the setting of three main parameters: scale, shape and compactness. Because of its impact on segmentation quality and subsequent classification accuracy, the selection of the scale parameter is of crucial importance (Kim et al. 2011; Johnson 2013; Kavzoglu et al. 2017). Unsupervised and supervised scale selection methods have been proposed to compute the optimum scale value for a given image.
The ESP-2 tool, developed by Drăguţ et al. (2014), was used to calculate the optimum value for the scale parameter. It should be noted that the tool accepts a maximum of 30 input bands for estimating the scale parameter, which also requires a reduction in the number of bands.

In this study, the performances of parametric and non-parametric classifiers were tested on 200-band AVIRIS hyperspectral imagery, namely the publicly available Indian Pines data, using pixel- and object-based classification approaches. The nearest neighbour (NN) algorithm was selected as the parametric classifier, while random forest (RF) was selected as the non-parametric classifier. Due to the correlation between the spectral bands and the limited ground reference data, the dimension of the dataset was reduced by applying PCA and sequential forward selection based on the Jeffries–Matusita (JM) distance. Performances of the methods were compared using overall accuracy and the Kappa coefficient, and performance differences were analysed using McNemar’s test, which is based on the Chi squared distribution.

Study Area and Dataset

The Indian Pines scene recorded by the AVIRIS sensor on June 12, 1992 was used in this study. The study site covers a mixed forest-agricultural area in Northwestern Indiana, USA. The ground truth data, which delineate 16 classes, were gathered by Landgrebe and his students in June 1992 (Jackson and Landgrebe 2001). The image bundle with additional materials including calibration information is available online at https://purr.purdue.edu/publications/1947/1. The image is 145 by 145 pixels in size, with 16-bit radiometric resolution and 20-m spatial resolution. The data set contains 220 spectral bands ranging from 0.4 to 2.5 µm, 20 of which (bands 104–108, 150–163, 220), covering the region of water absorption, were removed. The data set is freely available for the purpose of conducting scientific research. There are 16 LULC classes in the original Indian Pines image, but 7 classes were discarded due to their limited sample sizes. As a result, 9 LULC classes were selected for this research study (Fig. 1). Approximately 30% of the samples for each class were randomly chosen from the ground truth as training samples, and the rest were used in the validation process. The data set is regarded as a challenging classification problem for two main reasons. Firstly, the crops in the study site (mainly corn and soybeans) were very early in their growth cycle (about 5% canopy cover), and secondly, the imagery has a moderate spatial resolution of 20 m, causing a high number of mixed pixels (Plaza et al. 2009).

Fig. 1 Ground reference data for Indian Pines image

Methodology

Dimensionality Reduction

The dimension of the 200-band dataset was reduced using sequential forward selection (SFS) and principal component analysis (PCA). The SFS method begins by seeking the best individual band and then evaluates the other bands one at a time to locate the second band that performs best with the first one. The search continues in the same way and ends when the desired number of bands has been selected. This strategy may reach a sub-optimal solution because selected bands cannot be discarded later, and there may be interactions between the selected bands (Kavzoglu and Mather 2002). In this study, the Jeffries–Matusita (JM) distance, which is a saturating transformation applied to the Bhattacharyya distance, was used as a fitness measure to evaluate the quality of band combinations. Finally, the SFS process was applied with the JM distance to seek the optimum 30 bands for the 200-band test image.
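
The greedy search described above can be sketched in a few lines of Python. This is a minimal illustration, not the in-house program used in the study. It assumes the common form JM = 2(1 − exp(−B)), treats the bands as independent so that the Bhattacharyya distance B reduces to a sum of per-band terms (the full form uses class covariance matrices), and uses the mean pairwise JM distance as the criterion. The class names A, B, C and all statistics are made up.

```python
import math
from itertools import combinations


def jm_distance(stats_a, stats_b, bands):
    """JM distance between two classes over the given bands.

    stats_x[k] is the (mean, variance) of band k for that class.
    ASSUMPTION: diagonal covariance, so B is a sum of per-band terms.
    """
    b = 0.0
    for k in bands:
        (m1, v1), (m2, v2) = stats_a[k], stats_b[k]
        v = 0.5 * (v1 + v2)
        b += (m1 - m2) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return 2.0 * (1.0 - math.exp(-b))  # saturates at 2 for well-separated classes


def mean_pairwise_jm(class_stats, bands):
    """Average JM distance over all class pairs (criterion assumed here)."""
    pairs = list(combinations(class_stats, 2))
    total = sum(jm_distance(class_stats[a], class_stats[b], bands) for a, b in pairs)
    return total / len(pairs)


def sequential_forward_selection(class_stats, n_bands, n_select):
    """Greedily add the band that maximises the criterion; never remove one."""
    selected, remaining = [], list(range(n_bands))
    while len(selected) < n_select:
        best = max(remaining, key=lambda k: mean_pairwise_jm(class_stats, selected + [k]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-band (mean, variance) statistics for three classes, four bands.
class_stats = {
    "A": [(10.0, 4.0), (20.0, 4.0), (30.0, 4.0), (40.0, 4.0)],
    "B": [(10.5, 4.0), (26.0, 4.0), (30.0, 4.0), (43.0, 4.0)],
    "C": [(11.0, 4.0), (20.0, 4.0), (38.0, 4.0), (45.0, 4.0)],
}
selected = sequential_forward_selection(class_stats, n_bands=4, n_select=2)
print(selected)  # [2, 1]
```

In this toy case band 2 separates class C from the other two, and band 1 is added next because it separates A from B, a pair that band 2 cannot distinguish. The example also shows the limitation noted above: once band 2 is chosen it is never reconsidered.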

Principal component analysis (PCA) has been one of the most commonly applied methods for reducing the size of multi-dimensional data sets. It has been applied for many purposes, including feature extraction and data compression. PCA is applied with the intention of removing the redundancy existing in the data set. Components are calculated and ranked in order of importance. Thus, data sets can be described or visualised by a smaller number of components with limited loss of information. Component loadings show the relative positions of the variables along the new component axes. Given a matrix X with each column representing a pixel spectrum, the PCA dimensionality reduction is obtained by Y = PX. Here, the rows of P are the n eigenvectors corresponding to the n largest eigenvalues of the covariance matrix of X. The resulting data matrix Y contains columns representing the lower-dimensional spectra that still convey most of the information in the original spectra (Yuan 2012).
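
The projection Y = PX can be written out step by step. The formulation below is the standard one and is given only as a sketch. The symbols \( \mu \), \( C \), \( e_k \), \( \lambda_k \), N (number of pixels) and p (number of bands) are notation introduced here for illustration, and the mean-centring step is a common convention that is assumed rather than taken from the text.

$$ \mu = \frac{1}{N}\sum\limits_{i = 1}^{N} x_{i} ,\qquad C = \frac{1}{N - 1}\sum\limits_{i = 1}^{N} \left( x_{i} - \mu \right)\left( x_{i} - \mu \right)^{T} $$

$$ C e_{k} = \lambda_{k} e_{k} ,\qquad \lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{p} $$

$$ P = \left[ e_{1} , e_{2} , \ldots , e_{n} \right]^{T} ,\qquad y_{i} = P\left( x_{i} - \mu \right) $$

$$ \text{retained variance} = \frac{\sum\nolimits_{k = 1}^{n} \lambda_{k} }{\sum\nolimits_{k = 1}^{p} \lambda_{k} } $$

Under this reading, the number of components n is typically chosen as the smallest value for which the retained-variance ratio exceeds a chosen threshold.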

Object-Based Image Analysis (OBIA)

The main idea behind OBIA is that image objects, created by grouping adjacent pixels with similar characteristics, are taken into consideration instead of the millions of pixels that make up the image. One of the important advantages of the object-based classification approach is that it allows analyses that consider many features of the objects, including neighbourhood, texture, shape and size. Image objects are created in the segmentation process, which is the first step of object-based classification. The selection of the segmentation parameters is crucially important for determining object sizes appropriate to the real Earth objects. If the parameters are not selected properly, segments that are too large or too small are produced, which results in under- or over-segmentation. The scale parameter is regarded as the most important parameter determining the object sizes (Kim et al. 2011; Myint et al. 2011; Kavzoglu et al. 2017). Therefore, many studies have been conducted to analyse the effect of segmentation scale (e.g. Addink et al. 2007; Duro et al. 2012; Kavzoglu and Yildiz 2014). Although several supervised and unsupervised scale selection methods have been introduced in the literature, the scale parameter is usually determined by a trial-and-error strategy. While supervised methods use manually segmented reference maps to compare segmentation results, unsupervised ones estimate quality scores or indices for the segmented image. In this study, the ESP-2 tool developed by Drăguţ et al. (2014), an unsupervised scale estimation method, was employed to estimate optimal scale values for the images. The tool, which is embedded in the eCognition Developer software, quickly estimates the scale parameter. It automatically segments the image at scale values that increase by a user-defined step and calculates the local variance (LV) of the resulting objects. The calculated local variance is then plotted against the scale parameter.
According to Drăguţ et al. (2010), “the thresholds in rates of change of LV (RoC–LV) indicate the scale levels at which the image can be segmented in the most appropriate manner, relative to the data properties at the scene level”. Accordingly, the RoC–LV value is calculated for each scale parameter using Eq. (1). The RoC–LV graph consists of continuous and abrupt peaks and decays, and the first peak in the chart theoretically indicates the optimum scale parameter.

$$ RoC = \left[ {\frac{{L - \left( {L - 1} \right)}}{L - 1}} \right]*100 $$
(1)

where L is the local variance of the target level and L − 1 is the local variance of the next lower level. In the RoC–LV graph, the scale value at which the most significant change is observed is regarded as the optimum scale parameter. The ESP-2 tool is an improved version that allows the use of multiple layers, up to 30 input bands. An important point to note here is that the 200 bands of the hyperspectral image cannot all be used in the ESP-2 tool; therefore, a dimensionality reduction process must be applied. In order to overcome this problem, the JM separability analysis was applied to estimate the best 30-band combination to be used in the ESP-2 tool.
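
The RoC–LV calculation in Eq. (1) and the first-peak rule can be sketched as follows. This is an illustrative reading, not the ESP-2 implementation: the function names, the local-variance values and the simple local-maximum test are all assumptions made for the example.

```python
def rate_of_change(local_variance):
    """RoC-LV (%) per level: 100 * (LV at level - LV at lower level) / LV at lower level.

    The first level has no lower neighbour, so the result has one fewer entry.
    """
    return [
        100.0 * (curr - prev) / prev
        for prev, curr in zip(local_variance, local_variance[1:])
    ]


def first_peak(values):
    """Index of the first interior local maximum, or None if there is none."""
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] >= values[i + 1]:
            return i
    return None


# Hypothetical local-variance values for scale parameters 10 to 16.
scales = [10, 11, 12, 13, 14, 15, 16]
lv = [50.0, 52.0, 53.0, 56.0, 57.0, 57.5, 59.0]

roc = rate_of_change(lv)   # roc[i] belongs to scales[i + 1]
peak = first_peak(roc)
print(scales[peak + 1])    # 13
```

The end points of the curve are not treated as peaks in this sketch, which is a design choice of the example rather than something stated in the text.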

Random Forest

Random forest (RF) is a non-parametric method that can be applied to continuous and categorical data. RF is an improved version of the bagging ensemble, and it has been reported to be superior to conventional classifiers (e.g. Pal 2005; Kavzoglu et al. 2015). According to Breiman (2001), “random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest”. The RF classifier, in fact, consists of a number of decision tree classifiers, each of which is trained with a bootstrapped subset of the input samples. In order to improve diversity among the decision tree classifiers in the ensemble, different training samples are constructed by applying the bagging method. While about 2/3 of the samples, named in-bag samples, are used for training each individual tree, the remaining 1/3, named out-of-bag samples, are used to measure the prediction performance of the RF model. A voting approach, generally simple majority voting, is applied to combine the outputs of the different classifiers to make a final prediction for a new sample (Ghosh and Joshi 2014; Belgiu and Drăguţ 2016). The number of trees (the ensemble size) and the number of input features to be used at each node are the main user-defined parameters of the RF algorithm.
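
Two of the mechanisms described above, bootstrap sampling with out-of-bag samples and majority voting, can be illustrated without any tree-building code. The sketch below uses only the Python standard library; the function names and the example labels are chosen for illustration, and no decision trees are actually trained.

```python
import random
from collections import Counter


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; unseen indices are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(tree_predictions):
    """Combine per-tree class labels for one sample by simple majority voting."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 1000
in_bag, oob = bootstrap_split(n, rng)
unique_in_bag = len(set(in_bag))
print(unique_in_bag / n, len(oob) / n)  # roughly 0.63 and 0.37

# Hypothetical votes from three trees for a single pixel.
print(majority_vote(["woods", "corn-notill", "woods"]))  # woods
```

The expected share of distinct in-bag samples is 1 − (1 − 1/n)^n, which approaches about 0.632 for large n; this is the figure that is commonly rounded to 2/3.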

Performance Evaluation

Assessing the quality or accuracy of a thematic map is one of the important steps of the classification process. To date, a variety of accuracy measures, including map-level and class-level metrics, have been suggested and applied for this purpose (e.g. Foody 2002; Liu et al. 2007; Congalton and Green 2009; Warrens 2015). In this study, standard and widely used accuracy metrics (i.e. overall accuracy, Kappa coefficient, user’s and producer’s accuracies) were used for assessing overall and individual class accuracies for the produced thematic maps. In addition to these measures, a non-parametric test, namely McNemar’s test, was applied to compare the classification errors of two classifiers. The test, which is based on the Chi squared distribution, was used to analyse the statistical significance of the differences between the overall accuracies produced by the methods. It is based on a confusion matrix, and the test statistic (\( \chi^{2} \)) can be calculated by the following equation (Foody 2004).

$$ \chi^{2} = \frac{{\left( {\left| {f_{12} - f_{21} } \right| - 1} \right)^{2} }}{{f_{12} + f_{21} }} $$
(2)

where \( f_{12} \) indicates the number of samples correctly classified by classifier-1 but misclassified by classifier-2, \( f_{21} \) indicates the number of samples misclassified by classifier-1 but correctly classified by classifier-2, and the −1 term is the continuity correction. If the estimated statistic is greater than the critical table value (3.84 at the 95% confidence level), the null hypothesis is rejected. In other words, the accuracy difference between the classifiers is statistically significant.
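
Eq. (2) is straightforward to compute once the two discordant counts are known. The following sketch shows one way to obtain them from per-sample predictions and to apply the 3.84 threshold. The function names are illustrative, and the counts in the example are hypothetical, not taken from the study.

```python
def discordant_counts(truth, pred1, pred2):
    """Return (f12, f21): samples that only classifier 1, or only classifier 2, got right."""
    f12 = sum(p1 == t and p2 != t for t, p1, p2 in zip(truth, pred1, pred2))
    f21 = sum(p1 != t and p2 == t for t, p1, p2 in zip(truth, pred1, pred2))
    return f12, f21


def mcnemar_statistic(f12, f21):
    """Chi squared McNemar statistic with continuity correction, as in Eq. (2)."""
    if f12 + f21 == 0:
        raise ValueError("no discordant samples; the statistic is undefined")
    return (abs(f12 - f21) - 1) ** 2 / (f12 + f21)


CRITICAL_95 = 3.84  # critical value quoted in the text for the 95% level

# Hypothetical discordant counts.
chi2 = mcnemar_statistic(f12=40, f21=75)
print(round(chi2, 2))       # 10.05
print(chi2 > CRITICAL_95)   # True: the difference would be judged significant

# Tiny made-up example of counting discordant samples from label lists.
truth = ["woods", "woods", "corn", "corn"]
print(discordant_counts(truth, ["woods", "corn", "corn", "woods"],
                        ["woods", "woods", "woods", "woods"]))  # (1, 1)
```

Only the samples on which the two classifiers disagree in correctness enter the statistic, so two maps with similar overall accuracies can still differ significantly if their errors fall on different pixels.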

Results

In order to overcome two basic limitations, namely the small amount of available training data and the 30-band input limit of the ESP-2 tool, PCA and sequential forward selection (SFS) based on the JM distance were employed for the AVIRIS hyperspectral imagery. With the reduced number of spectral bands, it was possible to apply the ESP-2 tool for optimal scale selection. Dimensionality reduction was first performed by applying the PCA method to the Indian Pines scene, and the first 19 principal components (PCs), representing 98.5% of the variance in the data, were selected. Thus, the size of the hyperspectral dataset was reduced by about 91%. Afterwards, the SFS strategy using the JM distance as a fitness function was applied to the dataset with an in-house program written in MATLAB (R2013). As a result, the best 30-band combination was determined. Both datasets were classified by the NN and RF methods using the training data (Fig. 1), and the related thematic maps were produced. In the application of RF, the number of trees and the number of features at each node were determined by cross-validation. Accordingly, while the number of trees took values between 150 and 250, the optimal number of features varied between 4 and 5. The selected 19 PCs and the 30-band combination were used separately to determine optimal scale values for the segmentation processes. As can be seen from Fig. 2, scale values of 13 and 14 were determined for the 19 PCs and the 30-band combination, respectively, by assessing the peaks of the RoC–LV graphs.

Fig. 2 Scale selection using ESP-2 tool for a 19 PCA components, b 30 bands by JM. Dotted vertical lines indicate position of the optimal scale parameters

The multi-resolution segmentation process was conducted with these scale values, and segments were created for each case. Segmented images were then utilized to perform classification using the NN and RF classifiers, and the corresponding thematic maps of the study area were produced. It should be mentioned that object-based classification of the images was performed with the Definiens eCognition (9.1) software. The accuracies of the thematic maps generated by the pixel- and object-based classifications were estimated using the validation dataset, which accounts for about 70% of the ground reference data. In order to perform an objective accuracy assessment for the pixel- and object-based classifications, accuracy metrics were estimated for the validation pixels of each LULC class on the thematic maps. The results are given in Table 1. While 72% overall accuracy was achieved for the 19 PCs by NN with pixel-based classification, an overall accuracy of 78% was calculated for object-based classification. The difference was much higher for the 30-band JM dataset (an increase from 67 to 84%). When the results of the RF classifier were considered, the accuracy increased significantly in both cases, by up to 13% in terms of overall accuracy. Interestingly, the RF classifier produced more accurate results for the lower-dimensional data (i.e. 19 PCs). It is obvious from the results that using first-order statistics in classification, as in the case of NN, produced inferior classification performance. Another reason could be the use of limited training data for the classification process.

Table 1 Pixel- and object-based classification results

The non-parametric RF classifier outperformed the conventional NN classifier in all cases and produced significantly more accurate results in object-based classification. Considering that the hyperspectral dataset used in this study poses a challenging classification task, the accuracy level (88%) achieved by RF with object-based classification is quite promising when compared to the results presented in the literature (e.g. Yang et al. 2014; Ghamisi and Benediktsson 2015).

Statistical comparison of the NN and RF results with regard to the utilized dataset and the classification approach (i.e. pixel-based and object-based) was also conducted using McNemar’s test (Table 2). In the table, statistical test results for the datasets constructed with PCA components and feature selection are given separately to allow sound comparisons of the classification results. When the calculated test results were analysed, it was observed that the performance differences in all pairwise comparisons considered were statistically significant at the 95% level of confidence. For example, in the pairwise comparison of the object-based classification results of NN (O-NN) and RF (O-RF) using the PCA dataset, the calculated test statistic was 88.03, which is higher than the McNemar’s table value (i.e. 3.84 at the 95% confidence level). Similarly, the calculated test statistic for the pixel-based classification results of NN (P-NN) and RF (P-RF) was 4.93, which is greater than the critical table value. Thus, it can be said that the RF classifier produced significantly different classification results for both object- and pixel-based classification compared to the NN classifier for the PCA dataset. In other words, the differences between the overall classification accuracies of the RF and NN classifiers (i.e. 10% for object-based and 3% for pixel-based classification) were statistically significant. Furthermore, the estimated overall accuracies for object-based classification were statistically different from those of pixel-based classification for all pairwise comparisons in Table 2. The results clearly show that the use of object-based classification significantly improved the estimated classification accuracies of NN and RF for the reduced-size datasets.
These findings support the above-mentioned classification results: the use of the object-based approach and the non-parametric RF classifier led to statistically significant improvements in classification accuracy.

Table 2 McNemar’s test results for PCA and JM selected bands using pixel- and object-based classifications

The thematic maps generated by object-based and pixel-based classification for the first 19 PCs are shown in Fig. 3. As expected for pixel-based classification, the maps in Fig. 3a, b have a salt-and-pepper appearance and lower classification accuracies. While some classes, including woods, hay-windrowed and grass-trees, were well discriminated with high individual accuracies (estimated user’s and producer’s accuracies were up to 95%), some classes, including soybean-mintill, corn-notill and soybean-clean, were confused with other classes (estimated user’s and producer’s accuracies were lower than 60%). The quality of the segmentation and the resulting object-based classification can be easily seen from the result produced by the RF method (Fig. 3d). In particular, fields of woods (user’s accuracy of 97.78%, producer’s accuracy of 95.00%), soybean-notill (user’s accuracy of 92.78%, producer’s accuracy of 88.36%) and corn-mintill (user’s accuracy of 96.11%, producer’s accuracy of 77.58%) were much more accurately identified.

Fig. 3 Thematic maps produced for the first 19 PCs by a pixel-based NN, b pixel-based RF, c object-based NN and d object-based RF

The thematic maps produced by object- and pixel-based classification for the 30-band dataset constructed through feature selection are given in Fig. 4. Similar to the thematic maps in Fig. 3, pixel-based classifications produced lower accuracies with a salt-and-pepper appearance, while object-based classifications, especially the one produced by the RF classifier, yielded more accurate maps with better characterization of the LULC fields. In particular, woods, hay-windrowed and grass-trees pixels were accurately classified with the RF classifier (estimated user’s and producer’s accuracies were up to 95%).

Fig. 4 Thematic maps produced for the 30 bands of feature selection by a pixel-based NN, b pixel-based RF, c object-based NN and d object-based RF

Conclusions

In this study, a hybrid classification system consisting of a series of consecutive operations, including dimensionality reduction and OBIA using a non-parametric classifier, was evaluated for the classification of the Indian Pines hyperspectral image. For this purpose, PCA and JM distance methods were employed to reduce the dimension of the hyperspectral imagery. OBIA was applied to generate segmented objects that were subject to classification through the NN and ensemble-based RF classifiers. In addition, pixel-based classification using both methods was utilized as a benchmark for performance comparison. The results of this study revealed some important findings. First, both feature extraction and feature selection methods were found to be effective for reducing the dimensionality of hyperspectral imagery; the estimated reduction rates were about 85 and 91% with the JM distance and PCA methods, respectively. As a result, both the complexity of the classification model and the processing time required for supervised classification were significantly reduced. Second, the use of OBIA improved the classification accuracy by up to 13%. This level of improvement was found to be statistically significant by the McNemar’s test results. In addition, although some LULC classes were spectrally similar and difficult to distinguish, the results of object-based image classification appeared more uniform and intact. Third, the estimated accuracy results revealed that the RF classifier could produce more accurate results than the NN classifier for both object- and pixel-based classifications. Furthermore, the conventional NN classifier produced better classification performance in object-based classification with increasing feature set size (i.e. 30 bands), but the best classification performance for the RF classifier was obtained with the use of 19 principal components.
In other words, the use of a non-parametric algorithm significantly improved the accuracy of both object- and pixel-based image analyses in the case of hyperspectral imagery with a limited training dataset. All in all, classification of hyperspectral imagery requires advanced non-parametric classifiers and OBIA together with feature reduction techniques to handle high dimensionality with correlated spectral bands and limited ground reference data.