1 Introduction

The state of the globe is changing more rapidly than ever. Assessing these changes and their effects is imperative for protecting the planet. Hyperspectral remote sensors record comprehensive spectral responses from ground objects over hundreds of contiguous, narrow electromagnetic spectrum bands. The pixels of the obtained hyperspectral image (HSI) can be visualized as high-dimensional vectors containing values corresponding to the spectral reflectance. The high-dimensional values of each pixel capture minute spectral differences by which ground entities can be distinguished [1]. HSI is successfully used in various fields, such as land use, exploration of minerals, detection of water pollution, and so on [2]. The classification or distinction of objects is the main aim of almost all HSI applications. In HSI classification, a class category is determined for each pixel. Although HSI is particularly rich in information, HSI classification faces several challenges. It requires considerable computing overhead and high memory expenses due to the large number of spectral bands in the HSI data [3]. Additionally, collecting labelled samples for the ground truth image is costly, labour-intensive, and time-consuming. Therefore, HSI datasets have a limited number of labelled samples. The high dimensionality and limited number of labelled samples of HSI data lead to the “curse of dimensionality” challenge: for a fixed, limited number of training samples, the classification accuracy diminishes as the dimensionality of the data increases [4]. Moreover, the neighbouring bands in an HSI usually have a strong correlation with each other; in other words, the HSI has redundant data [5]. Data redundancy can make models unstable [6]. Because of these difficulties, dimensionality reduction (DR) becomes a necessary pre-processing step for HSI applications. DR approaches eliminate unnecessary bands and preserve crucial data [7].

DR approaches are divided into two classes: feature extraction (FE) and feature selection, also referred to as band selection (BS). FE methods map the data from a high-dimensional to a low-dimensional space under specific constraints; principal component analysis (PCA) [8] and independent component analysis (ICA) [9] are examples. The data produced by FE methods lose their physical significance, since the original band properties are changed [10]. The BS strategy instead determines a subset of the original bands that provides relevant data.

Various BS methods are available in the literature [1, 7]. BS methods can be categorized into filter, wrapper, and embedded methods. A filter method determines the relevance of each band based on some criterion and selects bands based on this relevance; it is independent of the training process. In the filter method, the emphasis is placed on analyzing the merits of individual bands, and the benefits of combinations of bands are not considered [11]. Wrapper methods start with a subset of bands and evaluate its importance using a prediction algorithm; this process is repeated for different subsets until the optimal subset is reached [12]. In embedded feature selection, the selection process is integrated into the model learning procedure [11].

For highly correlated features (bands), a subset of bands with better classification performance can be obtained by incorporating the benefits of band combinations into the band selection process [11, 13]. Wrapper methods consider the advantages of combining bands. These methods involve an exhaustive or heuristic search strategy to identify a subset of bands from the original data [14]. An exhaustive search generates every possible subset of features to identify the best one. Although this goal is desirable, such a search is an NP-hard combinatorial problem [15].

Metaheuristic techniques are used to solve such combinatorial problems more efficiently. They can investigate many potential band subsets to identify a nearly optimal subset within a reasonable time [16]. Exploration (or diversification) and exploitation (or intensification) are the two primary concerns in metaheuristic search algorithms, and the two must be precisely balanced to obtain optimal results in a reasonable amount of time [17]. Exploration is the ability of a metaheuristic algorithm to search new and diverse regions of the solution space, and exploitation is its ability to use the best solutions found so far and refine them locally. For HSI band selection, several nature-inspired metaheuristics have already been thoroughly investigated, including the genetic algorithm (GA) [11], particle swarm optimisation (PSO) [18], grey wolf optimisation (GWO) [19], artificial bee colony (ABC) [20], wind-driven optimisation (WDO) [21], whale optimisation [22], and others.

A self-adaptive differential evolution method has been used for BS [23]. In [18], a BS scheme is presented utilizing PSO, where the objective function is created under the constraint of the minimum estimated abundance covariance. A PSO-based approach for BS that also chooses the appropriate number of bands has been presented [24]. Another PSO-based BS method for HSI target detection was investigated in [25]. To devise a hybrid BS strategy, the authors of [26] merged GA and PSO. GA has been used for BS with an objective function formed from a weighted combination of entropy and image gradient [13]. A modified Lévy flight-based GA variant is used to avoid becoming stuck in local optima [27]. A binary form of Cuckoo Search (CS) has been employed for wrapper BS [28]. The authors of [20] applied improved subspace decomposition and the artificial bee colony to BS. A wind-driven optimization (WDO) model has been refined using PSO, and its use for band selection has been studied in [21]. A hybrid BS approach combining WDO and CS has also been presented [29]. Grey wolf optimization (GWO), an algorithm inspired by the behaviour of grey wolves, has been used for BS [19]. Recently, a modified GWO algorithm has been presented for BS, where chaotic operations are applied for indexing the wolves [30]. A modified discrete gravitational search algorithm creates band subsets under a requirement that increases the information conveyed by each element in a subset and reduces duplicated data between groups [31]. The capability of an ant colony algorithm for band selection has been extended using a pre-filter and a dynamic information update technique [32]. In [33], the potential of moth-flame optimization for BS has been investigated. A membrane whale optimization and wavelet support vector machine ensemble technique for BS is presented in [34].

However, these algorithms also have limitations. Metaheuristic search algorithms for BS may get trapped in a local optimum, making it difficult to reach the global optimum. Redundant and irrelevant bands can also create more local optima in the solution space, and these local optima can significantly reduce the effectiveness of metaheuristic algorithms in attaining the global optimum. In addition, metaheuristic algorithms can converge slowly because of local optima. Only when exploitation and exploration are effectively balanced does an algorithm produce optimal results within a reasonable timeframe.

Thus, it is worth investigating combinations of two or more metaheuristic algorithms to balance exploration and exploitation. A hybrid metaheuristic algorithm integrates two metaheuristic strategies so that the merits of both can be attained.

The crow search algorithm (CSA) and PSO are population-based metaheuristic algorithms with different search procedures: PSO is motivated by the collective behaviour of bird flocks, whereas CSA is driven by the behaviour of crow groups. They update the individuals in the population differently to explore the search space. PSO emphasizes exploitation, whereas CSA places more focus on exploration.

Therefore, this paper proposes wrapper-based BS methods using the hybridization of PSO and CSA.

In the proposed hybrid methods, PSO and CSA exchange valuable information at each iteration so that the strengths of both are incorporated into the search process. These methods aim to speed up convergence and select the band set that yields the best classification performance. The main contributions of this paper are as follows:

1) To the best of our knowledge, this is the first application of CSA for band selection from HSI data.

2) Two hybrid models, HPSOCSA_SP and HPSOCSA_SLP, are proposed based on the hybridization of PSO and CSA. The goal of these hybridizations is that the competencies of PSO and CSA can be achieved in a single search process.

3) In HPSOCSA_SP, the population is split into two equal parts; PSO is applied to one part and CSA to the other.

4) In HPSOCSA_SLP, the top-performing half of the members is selected based on fitness, and PSO and CSA are applied to the selected population sequentially.

5) The capability of the proposed models is thoroughly examined on four benchmark HSI datasets in terms of class accuracy, average accuracy, overall accuracy, kappa score, precision, recall, and F1 score.

6) The proposed methods achieve good results when compared with contemporary metaheuristic approaches, attaining better classification accuracy with fewer bands.

The rest of the paper is organized into three sections. Section 2 includes the description of the proposed model. Section 3 presents the experimental results and comparison. Section 4 concludes this paper.

2 Methodology

This section explains the technological background of PSO and CSA, followed by descriptions of the proposed band selection methods.

2.1 Particle swarm optimization

PSO is a population-based metaheuristic optimization technique [35]. The collective behaviour of bird flocks forms the basis for the operating principle of PSO [36]. In PSO, a group of particles (prospective solutions) forms a population. In pursuit of the optimal solution, particles move in the search space at a particular velocity. The best-known position of each particle and the best-known position of the group are used to determine the new locations of the particles [37].

Consider that n bands need to be selected, and that the position of a particle keeps the indices of the selected bands. The position vector (of size n) of particle i at iteration itr is represented by \({x}_{pso}^{i,itr}\). The velocity of the ith particle in iteration itr is determined by the following expression:

$$v^{i,itr+1}=w\,v^{i,itr}+c_1 r_1\left(x_{pB}^{i}-x_{pso}^{i,itr}\right)+c_2 r_2\left(x_{gBest}-x_{pso}^{i,itr}\right)$$
(1)

where \(x_{gBest}\) is the current optimum position of the group, \({x}_{pB}^i\) is the personal best position of the ith particle, and w is the inertia weight. The contributions of the local and global bests are governed by the cognitive factor (c1) and the social factor (c2), respectively, and r1 and r2 are random numbers between 0 and 1 [25, 38].

Based on the changed velocity, the position of the ith particle is determined by the following:

$$x_{pso}^{i,itr+1}=x_{pso}^{i,itr}+v^{i,itr+1}$$
(2)

The ith particle updates its personal best using Eq. (3)

$$x_{pB}^{i,itr+1}=\begin{cases}x^{i,itr+1} & \text{if } Fitness\left(x^{i,itr+1}\right) \text{ is better than } Fitness\left(x_{pB}^{i,itr}\right)\\ x_{pB}^{i,itr} & \text{otherwise}\end{cases}$$
(3)
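Eqs. (1)-(3) translate directly into a few lines of array code. Below is a minimal illustrative sketch of one PSO iteration in Python/NumPy (the experiments in Section 3.3 were run in MATLAB, so this is not the paper's implementation); the fitness function is assumed to be a minimization objective, such as the classification error defined later in Eqs. (10)-(11).

```python
# A minimal sketch of one PSO iteration, Eqs. (1)-(3); illustrative only.
# `fitness` is assumed to map a position vector to a scalar to be minimized.
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, x_pbest, fit_pbest, x_gbest, fitness, w=0.7, c1=2.0, c2=2.0):
    """Update a population x of shape (N, n) for one iteration."""
    N, _ = x.shape
    r1, r2 = rng.random((N, 1)), rng.random((N, 1))
    v = w * v + c1 * r1 * (x_pbest - x) + c2 * r2 * (x_gbest - x)  # Eq. (1)
    x = x + v                                                      # Eq. (2)
    fit = np.array([fitness(p) for p in x])
    better = fit < fit_pbest                                       # Eq. (3): keep improvements
    x_pbest, fit_pbest = x_pbest.copy(), fit_pbest.copy()
    x_pbest[better], fit_pbest[better] = x[better], fit[better]
    return x, v, fit, x_pbest, fit_pbest
```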

2.2 Crow search algorithm

CSA is a population-based metaheuristic optimization method [39]. Its fundamental principle is based on the behaviour of crow flocks. Crows live in groups. They explore food spots and memorize the best ones they find. When searching for new food sites, a crow frequently follows another crow to discover its food site and steal from it. In addition, when a crow detects that another crow is following it, it flees to a random location rather than to its food source [40].

Consider a d-dimensional search space and N crows. At iteration itr, the position of each crow i is denoted by a vector \({x}^{i,itr}=\left[{x}_1^{i,itr},{x}_2^{i,itr},\dots, {x}_d^{i,itr}\right]\). Each crow represents a potential solution to the problem. A variable X depicts the locations of all crows, as shown in Eq. (4).

$$X=\begin{bmatrix}x_1^1 & x_2^1 & \cdots & x_d^1\\ x_1^2 & x_2^2 & \cdots & x_d^2\\ \vdots & \vdots & \ddots & \vdots\\ x_1^N & x_2^N & \cdots & x_d^N\end{bmatrix}$$
(4)

The fitness of each crow’s position is estimated using an objective function. Each crow memorizes the position that is currently the fittest. At iteration itr, the memory of each crow i is denoted by a vector \({M}^{i,itr}=\left[{M}_1^{i,itr},{M}_2^{i,itr},\dots, {M}_d^{i,itr}\right]\). A matrix MEM, which keeps the memory of all crows, is represented as:

$$MEM=\begin{bmatrix}M_1^1 & M_2^1 & \cdots & M_d^1\\ M_1^2 & M_2^2 & \cdots & M_d^2\\ \vdots & \vdots & \ddots & \vdots\\ M_1^N & M_2^N & \cdots & M_d^N\end{bmatrix}$$
(5)

In other words, the MEM variable keeps the locations of the food that each crow has hidden.

At any iteration itr, the ith crow generates a new position by following the jth crow (selected randomly). Two states are possible in this situation:

State 1: The jth crow is unaware that the ith crow is following it.

$$x^{i,itr+1}=x^{i,itr}+r_i\times FL^{i,itr}\times\left(M^{j,itr}-x^{i,itr}\right)$$
(6)

where FL represents the flight length (the distance covered by a crow in a single flight) and ri is a random value between 0 and 1.

State 2: The jth crow is aware that the ith crow is following it.

$${x}^{i, itr+1}= Any\ Random\ Position$$
(7)

In CSA, a variable called the awareness probability (AP) measures the likelihood that a crow is conscious of being pursued. On the basis of AP, Eqs. (6) and (7) are merged as follows:

$$x^{i,itr+1}=\begin{cases}x^{i,itr}+r_i\times FL^{i,itr}\times\left(M^{j,itr}-x^{i,itr}\right) & \text{when } r_j\ge AP^{j,itr}\\ \text{Any Random Position} & \text{otherwise}\end{cases}$$
(8)

The ith crow updates its memory using Eq. (9)

$$M^{i,itr+1}=\begin{cases}x^{i,itr+1} & \text{if } Fitness\left(x^{i,itr+1}\right) \text{ is better than } Fitness\left(M^{i,itr}\right)\\ M^{i,itr} & \text{otherwise}\end{cases}$$
(9)

where Fitness(·) returns the fitness value of a position.
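Eqs. (6)-(9) can likewise be expressed compactly. The following is a hedged Python/NumPy sketch of one CSA iteration; FL, AP, and the bound arguments lb/ub are illustrative parameters, and fitness is again a minimization objective.

```python
# A minimal sketch of one CSA iteration, Eqs. (6)-(9); illustrative only.
# `lb` and `ub` bound the search space used for the random-flight state.
import numpy as np

rng = np.random.default_rng(0)

def csa_step(x, mem, fit_mem, fitness, lb, ub, FL=2.0, AP=0.1):
    """Update a population x of shape (N, n) for one iteration."""
    N, n = x.shape
    x = x.copy()
    for i in range(N):
        j = rng.integers(N)                         # crow i follows a random crow j
        if rng.random() >= AP:                      # crow j is unaware: Eq. (6)
            x[i] += rng.random() * FL * (mem[j] - x[i])
        else:                                       # crow j is aware: Eq. (7)
            x[i] = lb + rng.random(n) * (ub - lb)   # flee to a random position
    fit = np.array([fitness(p) for p in x])
    better = fit < fit_mem                          # memory update: Eq. (9)
    mem, fit_mem = mem.copy(), fit_mem.copy()
    mem[better], fit_mem[better] = x[better], fit[better]
    return x, fit, mem, fit_mem
```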

2.3 Proposed methodology

Assume that the HSI data cube is represented as \(H\in \mathbb{R}^{D_1\times D_2\times D_3}\), in which D1 and D2 represent the spatial dimensions and D3 the spectral dimension. In other words, H has D3 bands, and each band has D1 × D2 pixels. L represents the count of labelled samples obtained from the ground truth of H. The band selection algorithm aims to find a subset of n bands (n ∈ [1, D3]) that provides high classification accuracy.

This paper proposes three methods for band selection:

1) CSA-based BS,

2) hybridization of PSO and CSA by split population (HPSOCSA_SP) for BS, and

3) hybridization of PSO and CSA by select population (HPSOCSA_SLP) for BS.

Figure 1 shows the workflow of the process, which includes preprocessing, BS based on the proposed method, classifiers based on selected bands, and performance evaluation.

Fig. 1 Workflow of the proposed process

2.3.1 Preprocessing

This part consists of two steps: mapping and initialization. In mapping, the input 3-dimensional HSI data cube is converted into a 2D matrix: the data cube H is transformed into a 2-dimensional matrix (called “Feature”) of size L × D3. The initialization step initializes the parameters of the applied metaheuristic algorithm, such as the search-space boundary, population size (N), and maximum number of iterations (itrMax).

Objective function

In this paper, the classification error of the classifier is used as the objective function, which is therefore a minimization objective. The fitness of each member of the population is calculated based on this objective function. The classification error (CE) is computed from the overall accuracy (OA) using Eqs. (10)-(11).

$$OA=\frac{\sum_{i=1}^r{t}_i}{T}$$
(10)
$$CE=1- OA$$
(11)

where r represents the number of classes in the HSI data, ti is the number of samples of the ith class that are correctly classified (i = 1, 2, …, r), and T is the total number of labelled samples.
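As a concrete illustration, the wrapper objective of Eqs. (10)-(11) can be sketched as follows, using scikit-learn's SVC as a stand-in for the SVM classifier mentioned in Section 3.3; the rounding and clipping of band indices is an assumption, since the metaheuristics operate on continuous positions.

```python
# A sketch of the wrapper objective, Eqs. (10)-(11): the classification error
# of a classifier trained on the selected bands only. scikit-learn's SVC is a
# stand-in for the SVM used in Section 3.3; index handling is an assumption.
import numpy as np
from sklearn.svm import SVC

def classification_error(position, X_train, y_train, X_test, y_test):
    """CE = 1 - OA for the band subset encoded by a (possibly real-valued) position."""
    bands = np.clip(np.round(position).astype(int), 0, X_train.shape[1] - 1)
    bands = np.unique(bands)                                 # drop duplicate band indices
    clf = SVC(kernel="rbf").fit(X_train[:, bands], y_train)
    oa = np.mean(clf.predict(X_test[:, bands]) == y_test)    # Eq. (10)
    return 1.0 - oa                                          # Eq. (11)
```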

2.3.2 CSA-based BS

Suppose X (of size N × n) and Fit (of size N) store the position and corresponding fitness value of each crow in the population, respectively. In addition, the best-known position of each population member and the fitness of that position are maintained in MEM (of size N × n) and FitM (of size N), respectively. The proposed CSA-based BS starts with random population initialization such that the position matrix has size N × n, where N is the number of crows (population size) and n is the number of bands to be selected. The position of a crow keeps the indices of the selected bands. Each crow represents a population member, and its memorized position denotes a possible solution (a subset of selected bands). Algorithm 1 includes the pseudo-code of the CSA-based BS.

Algorithm 1 Crow search algorithm-based HSI band selection

2.3.3 Hybrid approaches involving PSO and CSA for BS

Metaheuristic techniques begin by exploring local optima and then move towards the global optimum to solve optimization problems [41]. Balancing exploration and exploitation across iterations is therefore crucial for reaching the global optimum. From Eqs. (1)-(2), it can be observed that in PSO the movement of particles depends on their best positions and the group’s best position. This assists in exploiting known areas but gives limited ability to explore unknown areas of the search space. Equation (8) shows that in CSA, when a crow becomes aware that another is following it, it moves to a random position in the search space. As a result, CSA is more effective at exploring new areas.

Two hybrid approaches (HPSOCSA_SP and HPSOCSA_SLP) based on PSO and CSA are proposed for band selection. In the proposed hybrid methods, the exploitation capability of PSO and the exploration capability of CSA are involved in the search process. PSO and CSA exchange information via population sharing to find an optimal solution in a reasonable time.

HPSOCSA_SP randomly splits the population into two equal parts. In this approach, half of the members are processed by PSO and the rest by CSA in each iteration. A member processed by PSO in one iteration may or may not be processed by CSA in the next, and vice versa. HPSOCSA_SLP selects the top-performing half of the members based on fitness and sequentially applies PSO and CSA in each iteration.

Thus, HPSOCSA_SP is more random than HPSOCSA_SLP regarding the information sharing between PSO and CSA through population members.

HPSOCSA_SP

The proposed HPSOCSA_SP randomly generates the initial population. The entire population is then split into two equal parts, with members assigned to each part at random. PSO is applied to one part, and CSA is applied to the other. Figure 2 shows the workflow of HPSOCSA_SP, which includes population splitting, the PSO process, the CSA process, and population integration.

Fig. 2 Workflow of the HPSOCSA_SP

The population is initialized randomly during preprocessing. Then, population splitting, PSO, CSA, and population integration form a new population for the next generation.

Population splitting

In this step, the entire population is divided into two equal parts (Part A and Part B) of size N1 = N/2. PSO progresses on Part A and CSA on Part B. Suppose idx, idx1, and idx2 are three vectors, where idx = {N randomly generated integers in [1, N]}, idx1 = {first half of idx}, and idx2 = {second half of idx}. In the proposed method, the memory of a crow is treated the same as a particle’s best-known position, because both perform the same role.

Part A

The position matrix (xpso) for Part A is derived from X by using the following:

$${x}_{pso}=X{\left[ idx1\right]}_{N1\times n}$$
(12)

The velocity matrix, (vpso) for Part A is determined from v using the following:

$${v}_{pso}=v{\left[ idx1\right]}_{N1\times n}$$
(13)

The fitness of position matrix (Fitpso) for Part A is determined from Fit using the following:

$${Fit}_{pso}= Fit{\left[ idx1\right]}_{N1}$$
(14)

The personal best position matrix (xpBest) for Part A is determined from XpB using the following:

$${x}_{pBest}={X}_{pB}{\left[ idx1\right]}_{N1\times n}$$
(15)

The fitness of personal best position (FitPpso) for Part A is determined from FitP using the following:

$${FitP}_{pso}= FitP{\left[ idx1\right]}_{N1}$$
(16)

Part B

The position matrix (xcsa) for Part B is determined from X using the following:

$${x}_{csa}=X{\left[ idx2\right]}_{N1\times n}$$
(17)

The velocity matrix (vcsa) for Part B is determined from v using the following:

$${v}_{csa}=v{\left[ idx2\right]}_{N1\times n}$$
(18)

The fitness of position matrix (Fitcsa) for Part B is determined from Fit using the following:

$${Fit}_{csa}= Fit{\left[ idx2\right]}_{N1}$$
(19)

The memory matrix (Mcsa) for Part B is determined from XpB using the following:

$${M}_{csa}={X}_{pB}{\left[ idx2\right]}_{N1\times n}$$
(20)

The fitness of memory matrix (FitMcsa) for Part B is determined from FitP using the following:

$${FitM}_{csa}= FitP{\left[ idx2\right]}_{N1}$$
(21)

PSO process

The PSO process updates the population Part A. This process updates the values of xpso, vpso, Fitpso, xpBest, and FitPpso based on PSO.

CSA process

The CSA process updates the population Part B. This process updates the values of xcsa, Fitcsa, Mcsa and FitMcsa. The values of vcsa stay the same after this process.

Population integration

In this step, the updated Part A and Part B are combined into a single population of size N. The values of X, v, Fit, XpB, and FitP are updated using the following:

$$X={\left[\begin{array}{c}{x}_{pso}\\ {}{x}_{csa}\end{array}\right]}_{N\times n}$$
(22)
$$v={\left[\begin{array}{c}{v}_{pso}\\ {}{v}_{csa}\end{array}\right]}_{N\times n}$$
(23)
$$Fit={\left[\begin{array}{c}{Fit}_{pso}\\ {}{Fit}_{csa}\end{array}\right]}_N$$
(24)
$${X}_{pB}={\left[\begin{array}{c}{x}_{pB est}\\ {}{M}_{csa}\end{array}\right]}_{N\times n}$$
(25)
$$FitP={\left[\begin{array}{c}{FitP}_{pso}\\ {}{FitM}_{csa}\end{array}\right]}_N$$
(26)

After integration, the global best solution (XBest) is updated based on the fitness values of XpB.
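The split and integration bookkeeping of Eqs. (12)-(26) amounts to index selection and concatenation; a compact NumPy sketch is shown below, where a random permutation is assumed for generating idx.

```python
# A sketch of the split (Eqs. (12)-(21)) and integration (Eqs. (22)-(26))
# steps of HPSOCSA_SP; a random permutation is assumed for idx.
import numpy as np

def split_population(X, v, Fit, X_pB, FitP, rng):
    N = X.shape[0]
    idx = rng.permutation(N)                  # random assignment of members
    idx1, idx2 = idx[: N // 2], idx[N // 2:]
    part_a = (X[idx1], v[idx1], Fit[idx1], X_pB[idx1], FitP[idx1])  # processed by PSO
    part_b = (X[idx2], v[idx2], Fit[idx2], X_pB[idx2], FitP[idx2])  # processed by CSA
    return part_a, part_b

def integrate(part_a, part_b):
    """Stack the two updated halves back into one population of size N."""
    return tuple(np.concatenate([a, b]) for a, b in zip(part_a, part_b))
```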

In brief, HPSOCSA_SP involves preprocessing, followed by a sequential repetition of population-splitting, PSO, CSA, and population integration itrMax times. After the whole process, the values of XBest represent the indices of the selected bands. Algorithm 2 includes the pseudo-code of the HPSOCSA_SP. The flowchart of HPSOCSA_SP is presented in Fig. 3.

Algorithm 2 HPSOCSA_SP-based HSI band selection

Fig. 3 Flowchart of the HPSOCSA_SP process

HPSOCSA_SLP

Figure 4 illustrates the workflow of HPSOCSA_SLP. The process of HPSOCSA_SLP consists of four steps: selection, PSO process, CSA process, and population-integration.

Fig. 4 Workflow of the HPSOCSA_SLP

The population is randomly initialized during the preprocessing stage. A new population for the upcoming generation is created using selection, PSO, CSA, and population integration. The details of these steps are given below.

Selection

This step selects the N1 top-performing members. The members are sorted based on their fitness values so that the first member has the best fitness value and the last member has the worst, and the first half of the population is picked. These selected members are viewed as nobles.

The fitness-of-personal-best vector FitP is ordered as follows:

$$\left[{FitP}_{order}, idx3\right]= sort\left( FitP\right)$$
(27)

where FitPorder contains the sorted fitness values of the personal bests, and idx3 holds the corresponding indices into the FitP vector. Since the first half of the sorted population is to be selected for the PSO operation, another vector, idx4, is maintained, which keeps the first N1 values of idx3.

Using idx4, the position matrix X, fitness vector Fit, velocity matrix v, personal best-known matrix XpB, and FitP are arranged as follows:

Nobles

$${x}_{pso}=X{\left[ idx4\right]}_{N1\times n}$$
(28)
$${Fit}_{pso}= Fit{\left[ idx4\right]}_{N1}$$
(29)
$${v}_{pso}=v{\left[ idx4\right]}_{N1\times n}$$
(30)
$${x}_{pBest}={X}_{pB}{\left[ idx4\right]}_{N1\times n}$$
(31)
$${FitP}_{pso}= FitP{\left[ idx4\right]}_{N1}$$
(32)
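The selection of nobles in Eqs. (27)-(32) reduces to a single argsort; a short NumPy sketch:

```python
# A sketch of the selection step, Eqs. (27)-(32): order members by the
# fitness of their personal bests (ascending, since CE is minimized) and
# keep the top half as "nobles".
import numpy as np

def select_nobles(X, v, Fit, X_pB, FitP):
    idx3 = np.argsort(FitP)          # Eq. (27)
    idx4 = idx3[: len(idx3) // 2]    # indices of the N1 best members
    return X[idx4], v[idx4], Fit[idx4], X_pB[idx4], FitP[idx4]   # Eqs. (28)-(32)
```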

PSO process

In this step, PSO updates the particles of the selected population, creating the PSO-generated population. The updated values of xpso, Fitpso, vpso, xpBest, and FitPpso are also stored in xcsa, Fitcsa, vcsa, Mcsa, and FitMcsa, respectively, as this population becomes the input for the CSA process.

CSA process

In this step, the members of PSO-generated population are updated by CSA. In other words, the values of xcsa, Fitcsa, Mcsa, and FitMcsa are updated. After this process, the values of vcsa remain unchanged. The resulting population is termed the CSA-generated population.

Population integration

This step integrates the PSO-generated population and the CSA-generated population into a single population (of size N). The position matrix X, velocity matrix v, fitness vector Fit, personal best XpB, and fitness of personal best FitP are updated using Eqs. (22)-(26).

In brief, HPSOCSA_SLP involves preprocessing, followed by a sequential repetition of selection, PSO, CSA, and population integration itrMax times. After the whole process, the values of XBest represent the indices of the selected bands. Algorithm 3 includes the pseudo-code of the HPSOCSA_SLP. The flowchart of HPSOCSA_SLP is presented in Fig. 5.

Algorithm 3 HPSOCSA_SLP-based HSI band selection

Fig. 5 Flowchart of the HPSOCSA_SLP process

3 Results and discussion

3.1 Dataset

The effectiveness of the proposed band selection methods was evaluated using four benchmark HSI datasets: Indian Pines (IP), Kennedy Space Center (KSC), Botswana (BW), and Pavia University. Table 1 lists the details of these HSI datasets [42].

Table 1 Summarized details of the HSI datasets

3.2 Performance evaluation measure

The usefulness of the proposed methods has been evaluated using the following quantitative metrics: class accuracy (CA), average accuracy (AA), OA, Kappa coefficient, precision, recall, and F1 score [43,44,45].

$${C}_i=\frac{t_i}{T_i}$$
(33)
$$AA=\frac{\sum_{i=1}^r{C}_i}{r}$$
(34)
$$Kappa=\frac{p_0-{p}_c}{1-{p}_c}$$
(35)
$${p}_c=\frac{1}{T^2}\sum_{i=1}^r{s}_{t,i}\cdot {s}_{c,i}$$
(36)
$$Precision\ (P)=\frac{TP}{TP+ FP}$$
(37)
$$Recall\ (R)=\frac{TP}{TP+ FN}$$
(38)
$$F1\ score=\frac{2\ast Precision\ast Recall}{Precision+ Recall}$$
(39)

where Ci is the class accuracy of the ith class, ti is the count of samples of the ith class that are classified correctly, Ti is the total count of samples in the ith class, T is the total count of labelled samples in the image, r is the total count of classes, \(p_0\) represents the observed agreement, pc represents the chance agreement, \(s_{t,i}\) is the true count of samples of the ith class, and \(s_{c,i}\) is the count of samples classified into the ith class [46]. TP, FP, and FN represent true positives, false positives, and false negatives, respectively.
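All of Eqs. (33)-(39) can be computed from a single confusion matrix; a minimal sketch follows, returning per-class precision, recall, and F1 (macro-averaging them for reporting is an assumption about how the tables were compiled).

```python
# A sketch of Eqs. (33)-(39) from a confusion matrix C, where C[i, j] counts
# samples of true class i predicted as class j.
import numpy as np

def evaluate(C):
    T = C.sum()
    t = np.diag(C)                                       # correct counts per class
    OA = t.sum() / T                                     # Eq. (10)
    CA = t / C.sum(axis=1)                               # Eq. (33)
    AA = CA.mean()                                       # Eq. (34)
    p_c = (C.sum(axis=1) * C.sum(axis=0)).sum() / T**2   # Eq. (36)
    kappa = (OA - p_c) / (1 - p_c)                       # Eq. (35)
    precision = t / C.sum(axis=0)                        # Eq. (37), per class
    recall = t / C.sum(axis=1)                           # Eq. (38), per class
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (39), per class
    return OA, CA, AA, kappa, precision, recall, f1
```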

3.3 Experimental setup

The experiments were executed on an Intel Xeon 2.2 GHz CPU with 64 GB RAM using MATLAB. For each dataset, the labelled pixels (samples) are divided into two parts: training data and testing data. The training data consists of 10% of the samples from each land-cover class, chosen at random, with the rest making up the testing data. The HSI with the selected bands is then classified using a support vector machine (SVM) classifier. The CA, AA, OA, and Kappa are computed on the testing data. This experiment is repeated 10 times for each dataset, and the average performance is reported. In addition, the proposed methods are compared with two alternative metaheuristic-based feature selection methods, GA and PSO; GA is a population-based metaheuristic algorithm [47]. The parameter values of the mentioned algorithms are presented in Table 2. All parameters of PSO and CSA carry over to HPSOCSA_SP and HPSOCSA_SLP.

Table 2 Parameter values of GA, PSO and CSA
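The per-class 10%/90% split described above can be reproduced with a stratified split; a short sketch with placeholder data (the real experiments used MATLAB, so scikit-learn is a stand-in here).

```python
# A sketch of the stratified 10% train / 90% test split described above,
# using scikit-learn as a stand-in for the MATLAB setup; the data below are
# random placeholders, not an HSI dataset.
import numpy as np
from sklearn.model_selection import train_test_split

features = np.random.rand(1000, 200)          # L x D3 labelled pixels (placeholder)
labels = np.random.randint(0, 16, size=1000)  # class ids (placeholder)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, train_size=0.10, stratify=labels, random_state=0)
```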

3.4 Experimental results

In this part, the results of the proposed approaches are presented. Figure 6 presents the OA (in %) of the proposed methods and the other tested methods for selected numbers of bands.

Fig. 6 Overall accuracy for a selected number of bands: (a) Indian Pines, (b) KSC, (c) Botswana, (d) Pavia University

Figure 6 shows that CSA, HPSOCSA_SP, and HPSOCSA_SLP outperform GA and PSO on all four datasets. For Indian Pines (Fig. 6a), when n was 10, CSA achieved the best OA (76.20%); in addition, HPSOCSA_SLP outperforms the other competitors in the majority of instances. For the KSC dataset (Fig. 6b), when n was 5, HPSOCSA_SP achieved the best OA (88.53%), and HPSOCSA_SP produced higher OA than the others in most cases for this dataset. When n was 5, HPSOCSA_SP achieved the best OA (89.02%) for the Botswana dataset, shown in Fig. 6c. When n was 25, HPSOCSA_SLP achieved the best OA (87.57%) on the Pavia University dataset (Fig. 6d).

The classification results obtained by the different methods (n = 10) are presented in Tables 3, 4, 5, and 6 for the Indian Pines, KSC, Botswana, and Pavia University datasets, respectively. Table 3 demonstrates that the proposed models (CSA-based, HPSOCSA_SP, and HPSOCSA_SLP) obtained OA values exceeding 75% for the Indian Pines data; the CSA-based BS approach obtained the highest OA, AA, and Kappa values for Indian Pines. As illustrated in Table 4, HPSOCSA_SP outperformed the other techniques for the KSC data, obtaining an OA of 86.48%, an AA of 78.73%, and a Kappa score of 0.8491. Table 5 demonstrates that the proposed models achieved higher class-specific accuracy in most of the classes for the Botswana data; HPSOCSA_SP performs better than the other techniques, attaining an OA of 85.56%, an AA of 82.3%, and a Kappa score of 0.8432. Table 6 shows that the CSA-based, HPSOCSA_SP, and HPSOCSA_SLP methods outperform the other methods for the Pavia University dataset; HPSOCSA_SLP achieved an OA of 85.34%, an AA of 77.52%, and a Kappa score of 0.8011.

Table 3 Classification results of Indian Pines for 10 bands
Table 4 Classification results of KSC for 10 bands
Table 5 Classification results of Botswana for 10 bands
Table 6 Classification results of Pavia for 10 bands

Table 7 reports the precision, recall, and F1 score on all four datasets. The results show that CSA achieved the highest precision, recall, and F1 score of 73.07%, 58.34%, and 62.53% on Indian Pines. HPSOCSA_SP obtained the highest precision (84.39%), recall (78.72%), and F1 score (80.51%) on KSC. On the Botswana dataset, HPSOCSA_SP achieved the highest recall (82.30%) and F1 score (84.11%), while HPSOCSA_SLP achieved the highest precision of 88.95%. Among the models tested on Pavia, HPSOCSA_SLP had the highest recall (77.52%) and F1 score (80.45%), while HPSOCSA_SP achieved the highest precision of 86.68%.

Table 7 Precision, recall and F1 score on all four datasets for 10 bands

The classification maps (CMs) created for the Indian Pines, KSC, Botswana, and Pavia datasets using different methods are displayed in Figs. 7, 8, 9, and 10, respectively.

Fig. 7 Classification map for the Indian Pines dataset: (a) ground truth, (b) GA, (c) PSO, (d) CSA, (e) HPSOCSA_SP, (f) HPSOCSA_SLP

Fig. 8 Classification map for the KSC dataset: (a) ground truth, (b) GA, (c) PSO, (d) CSA, (e) HPSOCSA_SP, (f) HPSOCSA_SLP

Fig. 9 Classification map for the Botswana dataset: (a) ground truth, (b) GA, (c) PSO, (d) CSA, (e) HPSOCSA_SP, (f) HPSOCSA_SLP

Fig. 10 Classification map for the Pavia dataset: (a) ground truth, (b) GA, (c) PSO, (d) CSA, (e) HPSOCSA_SP, (f) HPSOCSA_SLP

3.5 Convergence analysis

The convergence graphs of the proposed methods (CSA-based, HPSOCSA_SP, and HPSOCSA_SLP) and of GA and PSO for the four datasets are presented in Fig. 11. In Fig. 11a, for Indian Pines, HPSOCSA_SP converges earlier than the other methods, and the CSA-based BS method reaches the global solution. The convergence rate of HPSOCSA_SLP is better than those of GA, PSO, and CSA; moreover, HPSOCSA_SP and HPSOCSA_SLP come closer to the reachable global optimum. In Fig. 11b, HPSOCSA_SLP quickly converged and obtained a near-global optimum for the KSC data, while the CSA-based BS method converged late but eventually reached the global optimum. In Fig. 11c, HPSOCSA_SLP converges earlier for Botswana than the other methods, and the CSA-based BS method again reaches a global solution. In Fig. 11d, HPSOCSA_SLP quickly converged and obtained a near-global optimum for the Pavia data, while the CSA-based BS method converged later but eventually reached the global optimum.

Fig. 11 Convergence graphs of the proposed methods on (a) Indian Pines, (b) KSC, (c) Botswana, (d) Pavia

Figure 11 shows that the CSA-based method achieves the global optimum on all four datasets. The HPSOCSA_SP method outperforms the other methods in convergence speed on Indian Pines, while the HPSOCSA_SLP method demonstrates faster convergence than the others on KSC, Botswana, and Pavia University.

3.6 Discussion

Analyzing Tables 3, 4, 5, 6, and 7, it can be concluded that the CSA-based algorithm performs best on the Indian Pines dataset in terms of OA, AA, Kappa score, precision, recall, and F1 score, while HPSOCSA_SP is the best algorithm on the KSC dataset for the same metrics. In terms of recall and F1 score, HPSOCSA_SP is the top-performing algorithm on Botswana, while HPSOCSA_SLP is the best on Pavia. The highest OA on the Botswana and Pavia datasets is achieved by HPSOCSA_SP and HPSOCSA_SLP, respectively, whereas the highest precision on those datasets is achieved by HPSOCSA_SLP and HPSOCSA_SP. Thus, in most cases, a hybrid approach (HPSOCSA_SP or HPSOCSA_SLP) achieves the best result.

Based on the convergence graphs in Fig. 11, we can conclude that the CSA-based BS method reached the global optimum but took longer to converge. The HPSOCSA_SP method shows superior convergence speed on Indian Pines compared to the other methods, while the HPSOCSA_SLP method converges faster than the others on KSC, Botswana, and Pavia University. In most cases, the hybrid approaches, especially HPSOCSA_SLP, converge quickly compared to the other methods. This suggests that information sharing between PSO and CSA via population sharing, which balances exploitation and exploration in each iteration, works well.

4 Conclusion

The proposed research was conducted to find an optimal band subset with a high convergence rate for HSI data. In this direction, three metaheuristic-based BS methods for HSI have been applied: the first is based on CSA, and the other two (HPSOCSA_SP and HPSOCSA_SLP) are based on the hybridization of PSO and CSA. All three methods start with random population initialization. PSO and CSA have been hybridized to bring balance between exploration and exploitation into the search process. In HPSOCSA_SP, the population is divided equally in half; PSO is used on one part, while CSA is used on the other. Based on fitness, HPSOCSA_SLP selects the best-performing half of the members, and the selected population receives successive applications of PSO and CSA. The resulting parts are then integrated.

In the experiments, the benchmark HSI datasets Indian Pines, KSC, Botswana, and Pavia University have been used. The CSA-based, HPSOCSA_SP, and HPSOCSA_SLP methods achieved OA values exceeding 75% for the Indian Pines data (n = 10). On the KSC dataset, HPSOCSA_SP obtains an OA of 86.48%, an AA of 78.73%, and a Kappa score of 0.8491. On the Botswana dataset, HPSOCSA_SP attains an OA of 85.56%, an AA of 82.3%, and a Kappa score of 0.8432. On the Pavia dataset, HPSOCSA_SLP attains an OA of 85.34%, an AA of 77.52%, and a Kappa score of 0.8011. The outcomes demonstrate that the proposed methods perform better than PSO and GA and offer stable results. Compared to the other methods, the CSA-based method attains the global optimum on all four datasets. On Indian Pines, HPSOCSA_SP converges quickly compared to the other methods, and HPSOCSA_SLP converges faster than the other methods on KSC, Botswana, and Pavia University.