Introduction

Interest in fusing remotely sensed data from multiple airborne and satellite imaging sensors has increased substantially in recent years. The data involved are derived from sensors including long-wave infrared (LWIR), hyperspectral, multispectral, light detection and ranging (LiDAR), and synthetic aperture radar (SAR) sensors.

Many urban applications rely on remote sensing fusion techniques. Examples include urban land cover classification (Bigdeli et al., 2021; Li et al., 2015; Lu et al., 2015; Samadzadegan et al., 2017), heat island detection (Bulatov et al., 2020; Heldens et al., 2013), and urban infrastructure monitoring (Tarighat et al., 2021). However, urban land cover classification remains challenging because of the variety of land covers, such as diverse building materials and vegetation types (Lu et al., 2015). Accordingly, classifying urban land cover with only one type of remotely sensed data, such as multispectral images, may not yield accurate and satisfactory results (Abdi et al., 2017). Therefore, numerous urban studies have used multi-resolution and multi-sensor data of the same region simultaneously (Abdi et al., 2017; Bigdeli et al., 2021; Lu et al., 2015; Samadzadegan et al., 2017). Many of these efforts focused on urban land cover classification (Abdi et al., 2017; Bigdeli et al., 2021; Samadzadegan et al., 2017; Zhang et al., 2012; Zhong et al., 2017).

Many works have been published on decision-level data fusion for urban land cover classification, particularly between hyperspectral LWIR and high-resolution visible images (Bigdeli et al., 2021; Samadzadegan et al., 2017; Zhang et al., 2012). For instance, Bigdeli et al. (2021) proposed a decision-level method based on a fuzzy integral and a modified PSO for urban land cover mapping using thermal infrared hyperspectral data and high-resolution RGB images. Lu et al. (2015) demonstrated a decision-level fusion approach for classifying residential land cover using the same types of data. Moreover, Abdi et al. (2017) proposed a decision-level urban classification system that performed better than single-sensor classification. Nevertheless, few studies have focused on feature-level fusion of hyperspectral LWIR and very high-resolution visible images for urban land cover classification. This scarcity may stem from the different nature of the two datasets (high-resolution RGB images and thermal infrared hyperspectral data). Because of these differences, however, the two data sources can be used as complementary sources (Dalponte et al., 2008). Hence, feature-level fusion of hyperspectral LWIR and RGB data is expected to give better urban land cover classification results.

Hyperspectral remote sensing images play a critical role in discriminating spectrally similar land cover classes (Feng et al., 2019; Hänsch et al., 2020). This data source is known for its very high spectral resolution and typically contains hundreds of observation bands. This spectral richness suits urban applications that require strong discrimination in the spectral domain (Hasani et al., 2017). However, classifying such a high-dimensional feature space is difficult. Conventional parametric classifiers are strongly affected by the Hughes phenomenon, the loss of accuracy that occurs when the number of features grows relative to the number of training samples (Lu et al., 2015; Ma et al., 2013).

Several methods have been recommended for improving the classification of high-dimensional data, including classifier parameter determination (Liu et al., 2014; Phan et al., 2017) and feature selection (Samadzadegan et al., 2017). Numerous researchers have investigated how to optimize these techniques. They found that parameters and features are interdependent, so the highest accuracy may come from selecting features and determining parameters simultaneously (O'Boyle et al., 2008). Grid search is the conventional way to determine classifier parameters, which strongly affect classifier performance (Hsu et al., 2003a). In addition, the feature-subset selection approach determines both classification accuracy and computational cost (Lin et al., 2008).

Du et al. (2017) used PSO to optimize SVM parameters for precipitation prediction and reported that their approach outperformed traditional models. Feature selection is the other fundamental step in classifying high-dimensional data. Taşkın et al. (2017) proposed a feature selection algorithm based on high-dimensional model representation and reported improvements in both classification accuracy and computation time. Parameter values can affect feature-subset selection and vice versa. Other studies have therefore shown that the best classification performance is obtained by determining the classifier parameters and selecting the features simultaneously with an optimization technique (Samadzadegan et al., 2012). Recently, Abdulrahman (2021) used PSO to optimize the SVM parameters and the feature subset jointly. The two interact in complex ways and together determine the final classification outcome.

On the other hand, Marwaha et al. (2015) used two datasets acquired over Thetford Mines, Québec, Canada: an airborne LWIR hyperspectral image and high-resolution RGB data, together with a ground-truth image. The authors compared pixel-based and object-based approaches for classifying airborne hyperspectral thermal data. Their results on the thermal hyperspectral image confirm that the object-oriented algorithm performs better on objects with regular geometry and well-defined edges. However, its performance drops for less well-defined, more confused patterns.

The technique proposed here therefore classifies hyperspectral thermal and visible images using an optimized feature-level image fusion approach. It generates several features: spectral and textural features, a vegetation index, the morphological building index (MBI), and principal components (PCs) of the hyperspectral thermal (LWIR) data. These features are used to classify all objects and to discriminate the classes (bare soil, roofs, roads, grass, trees, etc.) in a complex area. The proposed technique uses PSO to determine the SVM classifier parameters and to select the feature subset, which improves the classification performance of the combined thermal hyperspectral and visible data.

Optimum Hybrid Classification of Hyperspectral Thermal Imagery and VIS Data

The flow chart in Fig. 1 summarizes the proposed PSO-based hybrid classification of hyperspectral thermal imagery and VIS data. It consists of three main parts: hybrid feature space generation, an SVM-based classification engine, and optimization with Binary PSO.

Fig. 1

Flow chart of the suggested technique implemented in the paper

In order to fuse the hyperspectral thermal imagery and visible data, a hybrid feature space, including the spectral and textural features, is produced:

  1. Preprocessing is applied to the hyperspectral thermal data to remove noisy data from the inputs, and the principal components are then extracted.

  2. These principal components are added to the original hyperspectral bands.

  3. Gabor features, the vegetation index, and the MBI are extracted from the visible data, which comprise the spectral feature space.

The hybrid feature space is formed by concatenating the extracted features. Finally, the data are normalized to the range [0, 1] to reduce numerical difficulties.
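The normalization step can be sketched as follows. The sketch assumes per-feature min-max scaling, because the text states only that the data are mapped to [0, 1]. The function name and the sample values are hypothetical.

```python
def min_max_normalize(samples):
    """Rescale every feature (column) of `samples` to [0, 1].

    `samples` is a list of rows, one row per pixel and one column per
    feature of the hybrid feature space. A constant column is mapped to
    0.0 so that no division by zero occurs.
    """
    n_features = len(samples[0])
    lows = [min(row[j] for row in samples) for j in range(n_features)]
    highs = [max(row[j] for row in samples) for j in range(n_features)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in samples
    ]


# Hypothetical pixels: a thermal radiance value and a GLCM contrast value
pixels = [[800.0, 0.1], [1000.0, 0.3], [900.0, 0.2]]
scaled = min_max_normalize(pixels)
```

Scaling each column separately keeps features with large numeric ranges, such as radiance values, from dominating the Euclidean distances used by the RBF kernel.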

The SVM parameters consist of (1) the regularization parameter C, which balances minimizing the training error against limiting model complexity, and (2) the kernel parameters (Wu et al., 2007). SVM is chosen as the classifier because of its robustness in high-dimensional spaces (Melgani et al., 2004). However, applying SVM to high-dimensional data raises two primary difficulties: determining the SVM parameters and choosing the feature subset. Grid search is a conventional model-selection technique. It performs an exhaustive search and chooses the combination of parameter values with the best performance (Hsu et al., 2003b). However, reliable model selection with fine grids is time-consuming for real-valued parameters. Choosing the optimum feature subset is another crucial step in classifying high-dimensional datasets with SVM (Lin et al., 2008; Tan et al., 2008). Therefore, optimized parameter values and an appropriate feature subset should be carefully selected before SVM is used to classify such a hybrid feature space.

Therefore, this work employs Binary PSO, a robust population-based optimization algorithm, to determine the SVM parameters and select the features simultaneously.
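For comparison with the BPSO approach, the conventional grid search over the two RBF-SVM parameters can be sketched as follows. The scoring function is a hypothetical placeholder for training an SVM and measuring its accuracy on held-out samples. The exponentially spaced grids are an illustrative choice, not values used in this work.

```python
import itertools
import math


def grid_search(evaluate, c_values, gamma_values):
    """Try every (C, gamma) pair and return the best pair and its score.

    `evaluate(C, gamma)` must return a score to maximise, e.g. the kappa
    coefficient of an RBF-SVM trained with those parameters and tested on
    held-out samples. Here it is a placeholder supplied by the caller.
    """
    best_params, best_score = None, float("-inf")
    for c, gamma in itertools.product(c_values, gamma_values):
        score = evaluate(c, gamma)
        if score > best_score:
            best_params, best_score = (c, gamma), score
    return best_params, best_score


# Exponentially spaced candidate values
c_grid = [2.0 ** k for k in range(-5, 16, 2)]      # 2^-5, 2^-3, ..., 2^15 (11 values)
gamma_grid = [2.0 ** k for k in range(-15, 4, 2)]  # 2^-15, 2^-13, ..., 2^3 (10 values)


def toy_score(c, gamma):
    """Hypothetical stand-in for SVM accuracy, peaking at C = 2^3, gamma = 2^-3."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 3) ** 2)


best, score = grid_search(toy_score, c_grid, gamma_grid)
n_evaluations = len(c_grid) * len(gamma_grid)  # 110 SVM trainings for this grid
```

Even this coarse grid needs 110 SVM trainings. Adding feature selection would multiply the search by the 2^N possible subsets of N features, so exhaustive search does not scale to simultaneous parameter determination and feature selection.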

Hybrid Feature Space Generation

The feature space is a fundamental factor in the decision process. It directly affects computational complexity and processing time, and it also determines the accuracy of the results. Hence, the first step of the proposed approach is to generate an appropriate feature space from the hyperspectral thermal imagery and visible data.

Textural Features

Spatial features can significantly improve classification accuracy because neighbouring pixels are spatially correlated (Janalipour and Mohammadzadeh, 2018). Feature extraction becomes even more relevant for imagery with high spatial resolution. This paper uses both occurrence and co-occurrence statistics. Occurrence statistics describe the local variance within 3 × 3, 5 × 5, 7 × 7, and 9 × 9 windows. Co-occurrence statistics use grey-level co-occurrence matrices (GLCMs) with window sizes of 5, 7, and 9 and offsets of 1, 2, and 3 pixels, respectively. Contrast and homogeneity features are obtained from each band of the visible imagery. Furthermore, Gabor features are derived in the spatial domain. The equations for each textural feature can be found in Haralick et al. (1973). Table 1 gives the equations for the vegetation index, the morphological building index, and the Gabor features.
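The two families of textural features can be sketched as follows. The sketch rests on these assumptions:

- The local variance is the population variance of a window cropped at the image borders.
- The GLCM is symmetric and normalized.
- Grey levels have already been quantized to integers 0..levels-1.
- Contrast and homogeneity use the common definitions sum p(i,j)(i-j)^2 and sum p(i,j)/(1+(i-j)^2).

In practice, the GLCM would be computed in a sliding window around each pixel for every window-size and offset pair listed above. The sketch computes it for a single window. All function names are illustrative.

```python
def local_variance(img, size):
    """Occurrence statistic: grey-level variance in a size x size window
    centred on each pixel. Windows are cropped at the image border."""
    h, w, r = len(img), len(img[0]), size // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            mean = sum(vals) / len(vals)
            out[i][j] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return out


def glcm(window, levels, dx, dy):
    """Symmetric, normalised grey-level co-occurrence matrix of a window
    (integer grey levels 0..levels-1) for the pixel offset (dx, dy)."""
    h, w = len(window), len(window[0])
    m = [[0.0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                a, b = window[y][x], window[y2][x2]
                m[a][b] += 1
                m[b][a] += 1  # count each pair in both directions
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m] if total else m


def contrast(p):
    """Sum of p(i, j) * (i - j)^2: large for abrupt grey-level changes."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """Sum of p(i, j) / (1 + (i - j)^2): large for smooth, uniform texture."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# A 4 x 4 window quantised to 4 grey levels, offset of 1 pixel to the right
window = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 2, 2, 2],
          [2, 2, 3, 3]]
p = glcm(window, levels=4, dx=1, dy=0)
```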

Table 1 Textural features

Vegetation Index (VI)

Based on the biophysical properties of vegetated land, a normalized ratio between the averaged LWIR channels and the red band of the visible data is used as the VI (Fig. 2b).
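The text does not give the exact VI formula; Table 1 lists it. The sketch below therefore assumes an NDVI-style normalized difference between the per-pixel mean of the LWIR channels and the red band. Treat the formula, the function name, and the pre-scaling of both inputs to a comparable range as assumptions.

```python
def vegetation_index(lwir_bands, red, eps=1e-12):
    """Hypothetical VI: normalised difference between mean LWIR and red.

    ASSUMPTION: the NDVI-like form (L - R) / (L + R) is used, where L is
    the per-pixel mean over the LWIR channels and R is the red band. Both
    inputs are assumed co-registered, on the same grid, and rescaled to a
    comparable non-negative range (e.g. [0, 1]) beforehand.
    """
    n = len(lwir_bands)
    h, w = len(red), len(red[0])
    vi = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            lwir_mean = sum(band[i][j] for band in lwir_bands) / n
            vi[i][j] = (lwir_mean - red[i][j]) / (lwir_mean + red[i][j] + eps)
    return vi


# Two LWIR channels and one red band on a 1 x 2 grid (hypothetical values)
lwir = [[[0.5, 0.2]], [[0.5, 0.2]]]
red = [[0.25, 0.6]]
vi = vegetation_index(lwir, red)
```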

Fig. 2

Visible data a; vegetation index b; morphological building index c

Morphological Building Index (MBI)

This work calculates the morphological building index (MBI) to identify residential areas (Fig. 2c). The MBI detects buildings by describing their spectral-spatial characteristics with a suite of morphological operators. It exploits the fact that taller structures cast relatively long shadows and therefore show strong local contrast with their roofs. The white top-hat transform highlights bright houses and buildings. Applied with a set of multidirectional linear structuring elements (SEs), it yields the MBI in an unsupervised manner.
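The idea can be sketched in a simplified, single-scale form. The sketch assumes a brightness image as input, for example the per-pixel maximum of the visible bands, and averages white top-hat responses over four linear SE directions. The full MBI typically combines several SE lengths and morphological profiles, so this illustrates the principle and is not the index used in this work. SE pixels that fall outside the image are ignored.

```python
# Linear SE directions as (row step, column step): 0, 90, 45 and 135 degrees
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def line_offsets(length, dy, dx):
    """Pixel offsets of a symmetric linear SE of odd `length` along (dy, dx)."""
    r = length // 2
    return [(k * dy, k * dx) for k in range(-r, r + 1)]


def rank_filter(img, offsets, op):
    """Apply `op` (min = erosion, max = dilation) over the SE at each pixel;
    SE pixels that fall outside the image are ignored."""
    h, w = len(img), len(img[0])
    return [[op(img[i + oy][j + ox] for oy, ox in offsets
                if 0 <= i + oy < h and 0 <= j + ox < w)
             for j in range(w)]
            for i in range(h)]


def white_top_hat(img, offsets):
    """Image minus its opening (erosion followed by dilation)."""
    opened = rank_filter(rank_filter(img, offsets, min), offsets, max)
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(img, opened)]


def simple_mbi(brightness, length=5):
    """Mean white top-hat over the four linear SE directions (single scale)."""
    tops = [white_top_hat(brightness, line_offsets(length, dy, dx))
            for dy, dx in DIRECTIONS]
    h, w = len(brightness), len(brightness[0])
    return [[sum(t[i][j] for t in tops) / len(tops) for j in range(w)]
            for i in range(h)]


# Hypothetical brightness image: a bright 3 x 3 roof on a dark 9 x 9 background
img = [[0.0] * 9 for _ in range(9)]
for i in range(3, 6):
    for j in range(3, 6):
        img[i][j] = 10.0
mbi = simple_mbi(img, length=5)
```

A roof narrower than the SE in every direction is removed by each opening, so it survives in the top-hat. Uniform areas wider than the SE, such as large bright fields or roads along the SE direction, are suppressed.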

Classification Based on SVM

The proposed algorithm identifies each land cover class with a binary SVM classifier, a learning method derived from statistical learning theory. The SVM computes the optimal separating hyperplane, which maximizes the margin between the classes. If the samples are not linearly separable in the original space, kernel functions map the data into a higher-dimensional space where a linear decision function can be applied (Abe, 2005). After classification, majority filtering is applied to remove salt-and-pepper noise from the binary map.
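The post-classification majority filter can be sketched as a sliding-window mode filter. The 3 × 3 window, the border cropping, and the tie rule are assumptions. Under the tie rule, a pixel keeps its label if that label is among the most frequent ones.

```python
from collections import Counter


def majority_filter(labels, size=3):
    """Replace each label by the most frequent label in its size x size window.

    Windows are cropped at the image border. On a tie, the pixel keeps its
    own label if it is among the most frequent ones (a design choice).
    The input map is not modified.
    """
    h, w, r = len(labels), len(labels[0]), size // 2
    out = [row[:] for row in labels]
    for i in range(h):
        for j in range(w):
            counts = Counter(labels[y][x]
                             for y in range(max(0, i - r), min(h, i + r + 1))
                             for x in range(max(0, j - r), min(w, j + r + 1)))
            top = max(counts.values())
            winners = [lab for lab, c in counts.items() if c == top]
            if labels[i][j] not in winners:
                out[i][j] = winners[0]
    return out


# Binary map (1 = class present) with one isolated "salt" pixel
binary_map = [[0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]]
cleaned = majority_filter(binary_map)
```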

Consider a dataset of n samples \(\{ (x_{i} ,y_{i} )|i = 1,...,n\}\), where \(x_{i} \in \Re^{k}\) is a feature vector with k components and \(y_{i} \in \{ - 1,\;1\}\) is the label of \(x_{i}\). The SVM searches a high-dimensional space for the hyperplane \(w \cdot \varphi (x) + b = 0\) that separates the data of classes \(-1\) and \(1\) with the largest margin. Here, w is the weight vector, which is orthogonal to the hyperplane, and b is the offset. \(\varphi\) is a mapping function that projects the data into a high-dimensional space in which they become linearly separable with a low training error. Maximizing the margin is equivalent to minimizing the norm of w, so the SVM solves the following minimization problem:

$$\text{Minimize}:\ \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i = 1}^{n} \xi_{i}$$
$$\text{Subject to}:\ y_{i}\left(w \cdot \varphi(x_{i}) + b\right) \ge 1 - \xi_{i} \ \text{ and } \ \xi_{i} \ge 0, \ \text{ for } i = 1,\dots,n$$
(1)

Here, C is the regularization parameter, which balances the cost of misclassifying training data against maximizing the margin, and \(\xi_{i}\) are slack variables.

Solving the minimization problem of Eq. (1) yields the decision function:

$$f(x) = \sum_{x_{i} \in \text{SV}} y_{i}\,\alpha_{i}\,\varphi(x_{i}) \cdot \varphi(x) + b$$
(2)

The parameters \(\alpha_{i}\) are the Lagrange multipliers obtained from the optimization. SV denotes the set of support vectors, i.e., the training samples whose Lagrange multipliers are strictly greater than zero. A kernel function computes the dot product between a pair of samples in the feature space without evaluating \(\varphi\) explicitly. The Gaussian RBF kernel used in this work is a standard kernel, given by Eq. (3):

$$K_{\text{Gaussian}}(x_{i}, x_{j}) = \exp\left(-\frac{\left\| x_{i} - x_{j} \right\|^{2}}{2\sigma^{2}}\right)$$
(3)
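Eqs. (2) and (3) combine into a short sketch. The dot product \(\varphi(x_i) \cdot \varphi(x)\) is replaced by the RBF kernel, and the predicted class is the sign of f(x). The support vectors, multipliers, and bias below are hand-picked for illustration. They are not the output of a trained model.

```python
import math


def rbf_kernel(xi, xj, sigma):
    """Gaussian RBF kernel of Eq. (3): exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_function(x, support_vectors, labels, alphas, b, sigma):
    """Eq. (2) with phi(xi) . phi(x) replaced by K(xi, x) (the kernel trick)."""
    return sum(y * a * rbf_kernel(sv, x, sigma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


def predict(x, support_vectors, labels, alphas, b, sigma):
    """Class label in {-1, +1}: the sign of the decision function."""
    f = decision_function(x, support_vectors, labels, alphas, b, sigma)
    return 1 if f >= 0 else -1


# Hand-picked illustrative model (NOT obtained by training)
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, 1]
alphas = [1.0, 1.0]
model = (svs, ys, alphas, 0.0, 1.0)  # support vectors, labels, alphas, b, sigma
```

LIBSVM, used later in this work, parameterizes the RBF kernel as exp(-gamma * ||x_i - x_j||^2). In the notation of Eq. (3), gamma therefore equals 1/(2 sigma^2).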

In the proposed approach, the classification module plays a fundamental role in evaluating the fitness function. The training data are used to train the SVM, and the trained SVM is then assessed on test (unseen) data. The evaluation is carried out by generating a confusion matrix and calculating the accuracy indicators.
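The two accuracy indicators used in this work, overall accuracy and the kappa coefficient, can be computed from a confusion matrix as sketched below. Rows hold reference classes and columns hold predicted classes, a convention chosen here. The labels and counts are hypothetical.

```python
def confusion_matrix(reference, predicted, classes):
    """Rows = reference (ground-truth) class, columns = predicted class."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(reference, predicted):
        m[index[t]][index[p]] += 1
    return m


def overall_accuracy(m):
    """Fraction of correctly classified samples (diagonal / total)."""
    total = sum(sum(row) for row in m)
    return sum(m[k][k] for k in range(len(m))) / total


def kappa(m):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with p_e the chance agreement
    expected from the row and column totals."""
    total = sum(sum(row) for row in m)
    p_o = sum(m[k][k] for k in range(len(m))) / total
    rows = [sum(row) for row in m]
    cols = [sum(row[k] for row in m) for k in range(len(m))]
    p_e = sum(r * c for r, c in zip(rows, cols)) / total ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical test pixels for two classes
ref = ["roof", "roof", "roof", "roof", "road", "road", "road", "road"]
pred = ["roof", "roof", "roof", "road", "road", "road", "road", "roof"]
cm = confusion_matrix(ref, pred, ["roof", "road"])
print(overall_accuracy(cm), kappa(cm))  # 0.75 0.5
```

Kappa is lower than overall accuracy here because half of the agreement between the two balanced classes would be expected by chance alone.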

Using the Binary PSO for Simultaneous SVM Parameter Determination and Feature Subset Selection

The Particle Swarm Optimization (PSO) algorithm is a population-based algorithm that simulates the social behaviour of a flock of birds. In PSO, a group of individuals (particles) is spread through the search space as candidate solutions. The particles iteratively improve their solutions and move toward better positions. Each move depends on the particle's current velocity and on its own (cognitive) and social experience (Engelbrecht, 2007).

Let \({X}_{i}^{t}=\{{x}_{i1},{x}_{i2},\dots ,{x}_{iD}\}\) and \({V}_{i}^{t}=\{{v}_{i1},{v}_{i2},\dots ,{v}_{iD}\}\) denote the position and velocity of the ith particle in the D-dimensional search space at time t. At each iteration, particle i updates its velocity and position as follows:

$$V_{i}^{t} = V_{i}^{t - 1} + c_{1} r_{1} \left( {{\text{pbest}}_{i} - X_{i}^{t - 1} } \right) + c_{2} r_{2} \left( {{\text{gbest}} - X_{i}^{t - 1} } \right)$$
(4)
$$X_{i}^{t} = X_{i}^{t - 1} + V_{i}^{t}$$
(5)

Here, \({pbest}_{i}=\{{p}_{i1},\dots , {p}_{iD}\}\) is the personal best position of particle i, and \(gbest=\{{g}_{1},\dots ,{g}_{D}\}\) is the global best position of all particles. \({c}_{1}\) and \({c}_{2}\) are the cognitive and social acceleration coefficients, respectively, and \({r}_{1}\), \({r}_{2}\) are random numbers in [0, 1]. Binary PSO (BPSO) was introduced so that PSO can handle binary search spaces, in which solutions are represented as binary strings. It modifies the position-update step of standard PSO by means of a sigmoid function.
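Eqs. (4) and (5) translate directly into an update loop. The sketch below draws r1 and r2 separately for each dimension, a common choice. It also offers an optional velocity clamp, which is not part of Eq. (4). The defaults c1 = c2 = 2 are illustrative values, not ones taken from Table 2.

```python
import random


def pso_step(positions, velocities, pbest, gbest, c1=2.0, c2=2.0,
             v_max=None, rng=random):
    """One in-place update of Eqs. (4)-(5) for all particles.

    positions, velocities, pbest: lists of D-dimensional lists, one per particle.
    gbest: the best position found by the swarm so far.
    """
    for x, v, p in zip(positions, velocities, pbest):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()  # fresh draws in [0, 1)
            v[d] = v[d] + c1 * r1 * (p[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])  # Eq. (4)
            if v_max is not None:  # optional clamp, not part of Eq. (4)
                v[d] = max(-v_max, min(v_max, v[d]))
            x[d] = x[d] + v[d]  # Eq. (5)


# One step for a single particle in 2-D, with a seeded generator
rng = random.Random(42)
X = [[1.0, -1.0]]
V = [[0.0, 0.0]]
P = [[0.5, -0.5]]  # personal best
G = [0.0, 0.0]     # global best
pso_step(X, V, P, G, rng=rng)
```

After the step, the particle has moved toward both its personal best and the global best, because both pull in the same direction in this example.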

Each solution is a binary string that lets the method determine the SVM parameters and select the feature subset simultaneously in the fusion of hyperspectral thermal and visible data. The string has four parts: spectral features, structural features, and the two SVM parameters, regularization and kernel (Fig. 3). The lengths of the first and second parts equal the numbers of spectral (nhyper) and structural (nLidar) features, respectively. The regularization and kernel parameters are real-valued and are encoded in binary to be compatible with the binary feature-selection parts. The lengths of the regularization (nc) and kernel (nk) parts depend on the range of each parameter and the required precision.

Fig. 3

Representation of solution for BPSO

Each particle (solution) is evaluated with a fitness function. In the binary solution, the first and second parts encode feature selection. The ith feature is selected when the ith bit is ‘1’ and is excluded from the hybrid feature space when the bit is ‘0’. The bits of the third and fourth parts are converted to real values to determine the SVM parameters with Eq. (6).

$$p = \min_{p} + \frac{{\max_{p} - \min_{p} }}{{2^{l} - 1}} \times d$$
(6)

Here, p is the real value encoded by the bit string, and \(\min_{p}\) and \(\max_{p}\) are its minimum and maximum values, which are set by the user. For each parameter, l is the length of the bit string and d is its decimal value.
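The particle layout and Eq. (6) can be sketched as a decoder. The order of the four parts, the most-significant-bit-first reading of each parameter string, and all names and ranges are assumptions for illustration. Fig. 3 defines the actual layout.

```python
def bits_to_int(bits):
    """Decimal value d of a bit string, most significant bit first (assumed)."""
    d = 0
    for bit in bits:
        d = (d << 1) | bit
    return d


def decode_parameter(bits, p_min, p_max):
    """Eq. (6): p = p_min + (p_max - p_min) / (2^l - 1) * d."""
    l = len(bits)
    return p_min + (p_max - p_min) / (2 ** l - 1) * bits_to_int(bits)


def decode_particle(particle, n_spectral, n_structural, n_c, n_k,
                    c_range, k_range):
    """Split a particle into feature mask, C and kernel parameter.

    Assumed layout: [spectral mask | structural mask | C bits | kernel bits].
    Returns the indices of the selected features (over the concatenated
    spectral + structural feature vector) and the two decoded parameters.
    """
    n_feat = n_spectral + n_structural
    if len(particle) != n_feat + n_c + n_k:
        raise ValueError("particle length does not match the layout")
    selected = [i for i, bit in enumerate(particle[:n_feat]) if bit == 1]
    c = decode_parameter(particle[n_feat:n_feat + n_c], *c_range)
    k = decode_parameter(particle[n_feat + n_c:], *k_range)
    return selected, c, k


# Hypothetical example: 3 spectral + 2 structural features, 4 bits per parameter
particle = [1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1]
selected, C, k = decode_particle(particle, 3, 2, 4, 4,
                                 c_range=(0.0, 15.0), k_range=(0.0, 15.0))
print(selected, C, k)  # [0, 2, 4] 15.0 1.0
```

Each parameter is resolved in steps of (max_p - min_p)/(2^l - 1). For example, 10 bits over [0, 1023] give a step of exactly 1, so a longer string buys precision at the cost of a larger search space.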

The fitness function is designed around two criteria: high classification accuracy and a small number of selected features. The objective function in Eq. (7) combines these two aims into a single fitness function, which turns the multi-criteria problem into a single-objective one.

$$f = \rho \times \left( {1 - {\text{accuracy}}} \right) + \left( {1 - \rho } \right) \times \frac{{N_{f} }}{N}$$
(7)

where f is the fitness value and \(\rho\) is a constant weight in [0, 1]. Accuracy is measured by the kappa coefficient (Congalton & Green, 2019). \(N_{f}\) is the number of selected features, and N is the total number of features, including spectral and structural features.
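Eq. (7) is a cost to be minimized, because both terms shrink as accuracy rises and the subset gets smaller. The sketch below transcribes it directly, uses the kappa coefficient as the accuracy term, and scores two hypothetical candidate solutions.

```python
def fitness(kappa, n_selected, n_total, rho=0.8):
    """Eq. (7): weighted cost to be minimised.

    rho weights the classification error (1 - kappa); 1 - rho weights the
    fraction of features kept. rho = 0.8 is the value reported later in
    this paper; the function itself accepts any rho in [0, 1].
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return rho * (1.0 - kappa) + (1.0 - rho) * n_selected / n_total


# Two hypothetical candidates out of N = 100 features
compact = fitness(kappa=0.90, n_selected=30, n_total=100)   # about 0.14
accurate = fitness(kappa=0.92, n_selected=60, n_total=100)  # about 0.184
```

With rho = 0.8, the slightly more accurate candidate still scores worse because it keeps twice as many features. Lowering rho increases the pressure toward small subsets.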

As shown in Fig. 1, initial solutions are generated randomly in the first iteration, and each particle is then evaluated with Eq. (7). The particle with the best fitness, i.e., high classification accuracy with a small feature subset, is chosen as the global best of the population. Each particle also compares its current position with all positions it has previously visited and keeps the best one as its personal best. The particle's velocity is then updated with Eq. (8). Next, the sigmoid function of Eq. (9) is applied to the velocity vector to determine the new position, which represents a new feature subset and new SVM parameters. Finally, Eq. (10) gives \({x}_{\mathrm{id}}^{t}\), the dth component of the new position of particle i (a feature-selection bit or an SVM-parameter bit).

$$v_{{{\text{id}}}}^{t} = v_{{{\text{id}}}}^{t - 1} + c_{1} r_{1} \left( {p_{{{\text{id}}}} - x_{{{\text{id}}}}^{t - 1} } \right) + c_{2} r_{2} \left( {g_{d} - x_{{{\text{id}}}}^{t - 1} } \right)$$
(8)
$$s\left( {v_{{{\text{id}}}}^{t} } \right) = \frac{1}{{1 + {\text{e}}^{{ - v_{{{\text{id}}}}^{t} }} }}$$
(9)
$$x_{\text{id}}^{t} = \begin{cases} 1, & \text{if } s\left(v_{\text{id}}^{t}\right) > \rho_{\text{id}} \\ 0, & \text{otherwise} \end{cases}$$
(10)

where \({\rho }_{\mathrm{id}}\) is a random number drawn from [0, 1], not to be confused with the weight \(\rho\) of Eq. (7). The algorithm starts from initial positions and velocities. At each iteration, Eq. (8) updates the velocity components of all particles, and the sigmoid function maps them to the range [0, 1]. A binary string is then constructed as each particle's new position. This procedure is repeated until a termination criterion, such as a maximum number of iterations, is met. Over the iterations, swarm intelligence improves the fitness, which combines classification accuracy and feature-space dimensionality.
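Eqs. (8)-(10) together give the BPSO loop sketched below. Several details are assumptions: the velocity clamp at ±6, the zero initial velocities, c1 = c2 = 2, and the toy objective. In the proposed method, the objective would decode each particle, train the SVM on the selected features, and return the fitness of Eq. (7).

```python
import math
import random


def sigmoid(v):
    """Eq. (9): maps a velocity component to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso_step(positions, velocities, pbest, gbest, c1=2.0, c2=2.0,
              v_max=6.0, rng=random):
    """One in-place BPSO update of every particle using Eqs. (8)-(10)."""
    for x, v, p in zip(positions, velocities, pbest):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            vel = v[d] + c1 * r1 * (p[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])  # Eq. (8)
            v[d] = max(-v_max, min(v_max, vel))  # velocity clamp (assumption)
            x[d] = 1 if sigmoid(v[d]) > rng.random() else 0  # Eq. (10)


def bpso_minimize(fitness_fn, n_bits, n_particles=20, n_iter=50, seed=0):
    """Minimise `fitness_fn` over bit strings of length `n_bits`."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    V = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_f = [fitness_fn(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(n_iter):  # termination: maximum number of iterations
        bpso_step(X, V, pbest, gbest, rng=rng)
        for i, x in enumerate(X):
            f = fitness_fn(x)
            if f < pbest_f[i]:  # update personal best
                pbest[i], pbest_f[i] = x[:], f
                if f < gbest_f:  # update global best
                    gbest, gbest_f = x[:], f
    return gbest, gbest_f


# Hypothetical toy objective: number of bits differing from a target pattern
target = [1, 0, 1, 1, 0, 0, 1, 0]
best, best_f = bpso_minimize(
    lambda bits: sum(b != t for b, t in zip(bits, target)), n_bits=len(target))
```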

Experiments and Results

Remote Sensing data

The proposed methodology was applied to a dataset provided for the 2014 IGARSS Data Fusion Contest. The dataset consists of thermal data acquired by Telops Inc. and high-resolution visible images collected by another sensor. It covers an urban area with roads, gardens, and residential and commercial buildings around Thetford Mines, Québec, Canada.

The LWIR data were collected by the Telops Hyper-Cam, an airborne long-wave infrared hyperspectral imager (Fig. 4a and d). The Telops Hyper-Cam is a Fourier-transform spectrometer (FTS) with 84 spectral bands in the 7.8–11.5 μm wavelength range.

Fig. 4

Dataset for the 2014 Data Fusion Contest: a false-colour RGB composite of the LWIR image, b colour data, c training labels. The bottom row shows the data released for testing in the second phase: d LWIR data, e colour data, f ground truth (color figure online)

The two series of airborne data were collected on May 21, 2013, by two distinct sensors (LWIR and visible) with a small temporal gap, from an average height of about 800 m above ground. The images have average spatial resolutions of about 1 m for the LWIR data and 0.1 m for the visible data. To reduce the resolution difference between the two datasets, the visible images were resampled to 0.2 m. The infrared measurements underwent end-to-end radiometric calibration using two internal calibration blackbodies. The initial inputs were radiometrically and geometrically corrected. The visible data, in contrast, are uncalibrated, with high spatial resolution but sparse ground coverage compared with the LWIR hyperspectral images of the same area (Fig. 4b and e). The visible data were geo-referenced and co-registered with the thermal data.

Classification Results

An SVM classifier with an RBF kernel was used to assess the quality of the hybrid feature space, with LIBSVM (Chang, 2001) run through its MATLAB interface. The choice of evaluation measure is a critical issue in classification, so both the kappa coefficient and the overall accuracy are used to determine classification accuracy. Following Congalton & Green (2019), the Khat (kappa) statistic is computed from the confusion matrix to estimate classification correctness. Fig. 5 shows the resulting classification maps.

Fig. 5

a Vegetation and tree, b grey roof, c soil, d roads, e red roof and concrete roof, f primary classification map (color figure online)

Simultaneous Parameter Determination and Feature Selection Based on BPSO

The hybrid feature space improves classification accuracy, but it contains several correlated and redundant features that degrade the classification output. In addition, the SVM parameters strongly affect classification performance, and they interact with the selection of the feature subset. This work therefore tunes the SVM parameters and selects the feature subset simultaneously with BPSO, using the settings given in Table 2. The length of the binary string is chosen according to the complexity and dimensionality of the search space, and the remaining parameters are set according to the experimental conditions.

Table 2 BPSO parameter values

Fig. 6 shows the BPSO convergence curves for the VIS, LWIR, and hybrid feature spaces as the fitness value of the best particle in each generation. The weight parameter ρ in Eq. (7) is fixed at 0.8, so 80% of the fitness is assigned to accuracy and 20% to the dimensionality of the feature space.

Fig. 6

The value of fitness for the global best in every cycle of BPSO

Fig. 6 reveals an increase in the fitness values (i.e., classification performance), which are relatively higher for the hybrid imagery than for the VIS and LWIR features. However, as discussed above, the fitness function combines two distinct components: the kappa coefficient and the feature-space dimension. To examine the variation in classification accuracy alone, Fig. 7 shows the kappa coefficient of the global best over the iterations.

Fig. 7

Kappa coefficient of the global best in each BPSO iteration

Fig. 8 shows the number of features selected by the global best at each iteration and thus how the dimensionality of the feature space varies. It also shows that fewer features are selected for the hybrid image than the total of 102 features selected separately for VIS (37 features) and LWIR (65 features).

Fig. 8

The quantity of features chosen for the global best in each cycle of BPSO

These results show increased classification accuracy together with a substantial reduction in feature-space dimension. Table 3 lists the features selected by the proposed approach for the hybrid imagery, the structural feature space, and the spectral feature space.

Table 3 Chosen features in the suggested approach

Table 4 reports the results of the proposed approach for the spectral, structural, and hybrid feature spaces. For each space, it lists the number of selected features, the regularization and kernel parameter values, and the classification accuracies on the test and validation datasets.

Table 4  Output for the selection of features and determination of parameters simultaneously depending on BPSO for VIS and LWIR feature space and hybrid image

Table 4 shows that applying the proposed approach to the hybrid image gives better performance than any other dataset. Fig. 9 presents detailed per-class accuracy assessments.

Fig. 9

Classification outcomes for each class depending on the proposed method

Finally, Fig. 10 compares the performance of the proposed approach with previously published work on the 2014 IEEE GRSS Data Fusion Contest dataset (IEEE, 2014). The comparison shows that our approach outperforms these earlier reports. All analyses used the same test data, and the training dataset was also provided by Telops Inc.

Fig. 10

Comparison of the kappa coefficient of previous studies with our method

Conclusion

This research proposed a BPSO-based hybrid classification framework for fusing hyperspectral thermal and visible data. Spectral and spatial features were extracted from the visible and LWIR data. The SVM classifier, which is well suited to high-dimensional spaces, was optimized by determining its parameters and selecting the feature subset simultaneously. The results revealed that the BPSO-based hybrid classification system can improve classification performance by up to seven percent. It can also increase per-class accuracy by removing unnecessary features. The hybrid classification system therefore achieved more accurate results in a less complex feature space. In addition, every class in the hybrid system showed improved or equal accuracy compared with classifying the visible and LWIR data separately. Finally, the achieved results were compared with those previously reported for the 2014 IEEE GRSS Data Fusion Contest. The comparison confirmed the encouraging results and the superiority of our approach.