1 Introduction

Recent advances in imaging techniques allow hyperspectral sensors capable of gathering spectral information in hundreds of narrow and contiguous bands, covering a broad range of wavelength in the spectrum. The increasing availability of hyperspectral images has gained huge success in the field of remote sensing. Hyperspectral image is three-dimensional (3-D) image cube (the third dimension denotes spectral band) comprises huge amount of spectral as well as spatial information to recognize and discriminate spectrally unique materials [30]. The enormous spectral bands of hyperspectral images provide redundant information, further entail huge storage cost and heavy computational burden. Besides, spectral bands are strongly correlated, i.e., more number of bands may not always result in significant discriminating capability for classification purpose. Therefore, reducing the dimensionality is a vital process to discard the redundant bands and make it less time-consuming for classification [4, 29, 32].

Generally, reduction in the dimensionality can be achieved with either feature extraction or feature selection (also known as band selection). Feature extraction is the process of reducing the dimension by mapping the data from original higher dimensional space to new lower dimensional space. Approaches like, Principal component analysis (PCA), Independent component analysis (ICA), Discriminant analysis, and transform based approaches have been widely used [14, 24, 43, 44]. However, these approaches generally alter the physical meaning of original data since the data in the low-dimensional space correspond to their linear combinations but do not original individual bands. Band selection approaches to pick set of most suitable bands from pool of bands may be preferred when the physical meaning of bands needs to be preserved [28, 31, 36]. Recently, many band selection approaches have been proposed [15, 45]. There exists two most commonly used ways for selection of optimal band subset. The first strategy includes the individual band evaluation, and the second strategy includes the band subset evaluation. Individual evaluation consists of clustering based approaches and ranking based approaches. In the individual band evaluation, the score (which signifies the importance of the band) of an individual band is measured according to its degree of relevance. Band clustering approaches form cluster of similar bands and select representative band [25, 46], whereas, in ranking based band selection approaches, the bands are selected based on the ranking [8, 47]. A hierarchical clustering approach is most commonly used band clustering approach, which is based on Ward’s linkage [18]. This technique employs mutual information (MI) between the bands to pick most informative bands. Kamandar and Ghassemian [36] proposed a novel band selection technique that maximizes relevance of selected bands and minimizes redundancy between them (mRMR), which further leads to excellent classification performance. Jia et al. [12] proposed a new hyperspectral band selection approach based on Relief-F algorithm and discards less informative bands.

Intelligent algorithms try to evaluate a complete subset of bands rather than individually selecting the bands. Generating such subsets is known to be an NP-hard problem. Therefore, such problems are solved by employing nature-inspired algorithms. In band selection, popular nature-inspired algorithms include Firefly algorithm [38], Artificial Bee Colony(ABC) [48], Wind Driven Optimization (WDO) [27], Genetic Algorithms (GA) [16], Particle Swarm Optimization (PSO) [37], Cuckoo Search (CS) optimization algorithm [33, 34], Gray Wolf Optimization (GWO) [22], and so on. These approaches consider the band selection problem as a combinatorial optimization problem which is solved by formulating an appropriate fitness function or objective function. The objective function assesses the band subsets and provides the degree of their goodness. Objective function needs careful determination as it has an influence on the performance of the system. It can be dependant or independent on the on learning algorithm. Hence, the objective function can be modelled by dependant evaluation criteria (depends on a predefined learning algorithm) [20, 39]or independent evaluation criteria such as such as, information measures (divergence, entropy or mutual information) [7, 42], distance measures (Bhattacharya distance, Kullback–Leibler divergence, Jeffries–Matusita distance, Hausdorff distance, and Spectral Angle Mapper (SAM)) [13, 22], and dependency measures (correlation measures, similarity measures) [6, 53]. Selection of an effective search strategy is a crucial task in the band selection problem. To optimize the defined objective function, an appropriate optimization algorithm must be chosen to converge to the global optimal solution and escape out of the local optimum.

Tschannerl et al. [42] proposed a modified discrete gravitational search algorithm based a novel unsupervised band selection approach employing maximum information minimum redundancy (MIMR) criterion which outperformed original MIMR with clonal selection algorithm. ABC approach is utilized to address problem of band selection [48]. An improved subspace decomposition is suggested to obtain intuitive band subspaces and ABC is employed as a search strategy. Xu et al. proposed a hybrid band selection approach by combining differential evolution (DE) and PSO in order to effectively combine characteristics of PSO and DE [49]. Sawant et al. [27] used a novel metaheuristic algorithm known as WDO, which is inspired atmospheric motion. WDO has excellent exploitation ability, but loses population diversity as gravitational force used in WDO pulls particles back to the origin. To address this limitation, PSO is combined with WDO and a novel band selection approach based on WDO and PSO is presented in [27]. Medjahed [21] proposed a binary version of CS algorithm to ensure that the selected band subset had a significant classification capability. Medjahed et al. [22] utilized a highly efficient metaheuristic algorithm called as GWO to reduce the dimensionality of hyperspectral image. This procedure has found a better band combination by optimizing five different objective functions. However, these algorithms also have their own limitations. A random population initialization strategy used in above algorithms may lead to the sluggish optimization response and fail to obtain the global optimum solution.

Recently, developing hybrid meta-heuristic algorithms has been gaining increasing attentions for researchers [10, 41, 49]. By combining two or more meta-heuristics, hybrid optimization approach can expand the performance of individual algorithm, and they seem to be more effective in solving complex optimization problems [1, 5, 9]. WDO and CS are population based meta-heuristic algorithms with different search mechanisms. WDO inspires from atmospheric motion and utilizes actual physical equations to search the solution space, whereas CS uses the Levy flight random walk to generate new solutions. The internal search mechanisms results in different characteristics of the two algorithms, i.e., WDO favours global exploration while CS is good at local exploitation. Considering the ability of these two algorithms, a hybrid approach based on WDO and modified CS, called WDOMCS, is proposed to speed up the convergence rate and enhance the optimization performance of the hybrid algorithm, aimed for selecting suitable bands with better classification performance. The main contributions of this manuscript are summarized as follows:

  1. 1)

    The proposed hybrid optimization algorithm utilizes the Chebyshev chaotic map for generation of initial population which helps circumventing the problem of premature convergence.

  2. 2)

    By population division, WDO and CS employ information sharing mechanism which decreases the risk of falling into a local optimal solution.

  3. 3)

    An adaptive step size is employed in Lévy flight random walk which helps enhancing the global searching ability of the standard CS algorithm.

  4. 4)

    A band selection approach for hyperspectral image classification based on WDOMCS is proposed, and the classification accuracy is the optimal with fewer bands.

The remaining part of the manuscript is organized as follows, a brief description about wind driven optimization, cuckoo search algorithm and chaotic map is provided in Section 2. Section 3 discusses the proposed WDOMCS algorithm in detail. Experimental analysis of the standard datasets is discussed in Section 4 and lastly, Section 5presents the conclusions.

2 Technical background

In this section, the technical background of wind driven optimization, cuckoo search algorithm and chaotic map is discussed.

2.1 Wind driven optimization

WDO is population based nature inspired metaheuristic optimization algorithm proposed by Bayraktar et al. [2, 3]. It is easy to implement and highly effective in solving multi-modal and multidimensional problem. The inspiration for WDO comes from atmospheric motion where wind blows in order to balance the imbalances in air pressure. Air flows from high pressure point to low pressure point at a velocity which is proportional to pressure gradient. Based on the above theory, WDO algorithm is derived. It starts with Newton’s second law of motion which states that the acceleration of an air particle as produced by a total force (Ft) is directly proportional to the magnitude of the total force, in the same direction as the total force according to Eq. (1).

$$ \rho a=\sum {\overrightarrow{F}}_t $$
(1)

Where, ρ is the air density for an infinitesimal air particle and a is the acceleration of an air particle. The relationship of air pressure with the air particle’s density and temperature is given by the ideal gas law,

$$ P=\rho RT $$
(2)

Where, P is the pressure, R is the universal gas constant and T is the temperature.

There are various forces that are responsible for movement of air particles mainly including pressure gradient force (FPG), friction force (FF), gravitational force (FG) and Coriolis force (FC).The physical equations of the above mentioned forces are expressed as,

$$ {F}_{\mathrm{PG}}=-\nabla P\delta V $$
(3)
$$ {F}_{\mathrm{F}}=-\rho {\alpha}_f\boldsymbol{u} $$
(4)
$$ {F}_{\mathrm{G}}=\rho \delta Vg $$
(5)
$$ {F}_{\mathrm{C}}=-2\Omega \times \boldsymbol{u} $$
(6)

Where, ∇P represents the pressure gradient, δV is finite volume of the air particle, urepresents the velocity vector of the wind, and αf is the friction coefficient, grepresents the gravitational acceleration, and Ω is rotation of the Earth.

By including all four forces FG, FPG, FC, and FFin the right hand side of Eq. (1) and Eq. (1) can be expressed as,

$$ \rho \frac{\Delta \boldsymbol{u}}{\Delta \mathrm{t}}=\rho \updelta Vg+\left(-\nabla P\updelta V\right)+\left(-2\Omega \times \boldsymbol{u}\right)+\left(-\rho {\alpha}_f\boldsymbol{u}\right) $$
(7)

The acceleration a in Eq. (1) is expressed as, \( a=\frac{\Delta \boldsymbol{u}}{\Delta t} \). By assuming ∆t = 1 for simplicity and for dimensionless and weightless air particle, δV = 1, the Eq. (7) can be expressed as,

$$ \rho \Delta \boldsymbol{u}=\rho g+\left(-\nabla P\right)+\left(-2\Omega \times \boldsymbol{u}\right)+\left(-\rho {\alpha}_f\boldsymbol{u}\right) $$
(8)

By substituting Eq. (2) in Eq. (8),

$$ \frac{P}{RT}\Delta \boldsymbol{u}=\frac{P}{RT}g+\left(-\nabla P\right)+\left(-2\Omega \times \boldsymbol{u}\right)+\left(-\frac{P}{RT}{\alpha}_f\boldsymbol{u}\right) $$
(9)

For simplicity, pressure P is replaced by pressure value of air particle at current location Pcurr, and then Eq. (9) can be represented as,

$$ \Delta \boldsymbol{u}=g+\left(-\nabla P\frac{RT}{P_{\mathrm{curr}}}\right)+\left(-2\Omega \times \boldsymbol{u}\frac{RT}{P_{\mathrm{curr}}}\right)+\left(-{\alpha}_f\boldsymbol{u}\right) $$
(10)

In WDO algorithm, velocity and position of air particle update their corresponding values at each iteration. Hence, change in the velocity u can be written as, u = unew − ucurr, where, ucurr is the velocity of air particle at current iteration and unew is the velocity of air particle at next iteration. Then the velocity update equation will be,

$$ {\boldsymbol{u}}_{\mathrm{new}}=\left(\left(1-{\alpha}_f\right){\boldsymbol{u}}_{\mathrm{curr}}\right)-g+\left(-\nabla P\frac{RT}{P_{\mathrm{curr}}}\right)+\left(-2\Omega \times \boldsymbol{u}\frac{RT}{P_{\mathrm{curr}}}\right) $$
(11)

Consider, xcurr is current location of air particle, xnew is new location of air particle, g→ and ∇P are vectors, which can be broken down in direction and magnitude as g = |g|(0 − xcurr),− ∇ P = |Popt − Pcurr|(xnew − xcurr), Popt is optimum pressure, Pcurr is the pressure of current location. The influence of the Coriolis force (Ω × u) is replaced by the velocity influence from another dimension, \( {\boldsymbol{u}}_{\mathrm{curr}}^{\mathrm{other}\ \dim } \) . Then, all the coefficients are combined together, c = −2RT and the velocity update equation in Eq. (11) will be expressed as,

$$ {\boldsymbol{u}}_{\mathrm{new}}=\left(\left(1-{\alpha}_f\right){\boldsymbol{u}}_{\mathrm{curr}}\right)-g\left({x}_{\mathrm{curr}}\right)+\left[ RT\left|1-\frac{1}{i}\right|\left({x}_{\mathrm{new}}-{x}_{\mathrm{curr}}\right)\right]+\left[\frac{-c{\boldsymbol{u}}_{\mathrm{curr}}^{o\mathrm{ther}\ \dim }}{i}\right] $$
(12)

Then the air particle location update equation will be,

$$ {x}_{\mathrm{new}}={x}_{\mathrm{curr}}+\left({\boldsymbol{u}}_{\mathrm{new}}\times \Delta t\right) $$
(13)

Where, i is rank among all air parcels based on their pressure values and xnew is the new updated location for next iteration. WDO is similar to other nature inspired optimization algorithms with few control variables that need to be adjusted. When standard WDO is applied to solve the hyperspectral band selection problem, the description of WDO parameters used in the algorithm is given in Table 1 and the hyperspectral band selection using standard WDO is given as Algorithm 1.

Table 1 Description of WDO parameters
figure a

2.2 Cuckoo search algorithm

CS is population based bio-inspired metaheuristic algorithm proposed by Yang and Deb [51] as the solution for the global optimization problems. The inspiration for CS comes from the breeding behavior of some cuckoo species. Cuckoo bird put sits eggs in the nests of other bird. If a cuckoo egg is discovered by host bird, then host bird either throws the foreign eggs away or abandons the nest and constructs a new nest. In the standard CS algorithm, each egg of the host bird in a nest represents a solution, and a cuckoo egg denotes a new solution. If a new solution is better than the one in the nest, the worse one will be replaced with new solution. Therefore, the development of the CS algorithm is based the following three main rules: (1) Each cuckoo places one egg in an arbitrarily selected nest. (2) The best nest with good quality eggs is transferred to the next generation. (3) The number of available host’s nests is fixed. The host bird determines the cuckoo egg by a probability of pa ϵ [0, 1]. If the host bird finds a cuckoo egg, the host bird can abandon the nest or throw the egg away, and construct a completely new nest.

In the standard CS algorithm, initially m number of random nest locations are generated representing potential solution, Xi for a cuckoo i. A new solution Xi +1 is generated by adopting Lévy flight random walk as follows:

$$ {\boldsymbol{X}}_{\boldsymbol{i}+\mathbf{1}}={\boldsymbol{X}}_{\boldsymbol{i}}+\boldsymbol{\alpha} \bigoplus \boldsymbol{levy}\left(\boldsymbol{\lambda} \right) $$
(14)

Where, α > 0 is a step size scaling factor; the product of ⊕ means entry wise multiplication; levy (λ) is a random walk with the random step length which is drawn from a Lévy distribution. The Levy flight step size coefficient α, which controls the scale of the flight by multiplication and is related to the size of the solution space of the objective function. When a Lévy step is generated using a random number generator, it is first multiplied by α before it is used to generate a new egg. The cuckoo laying eggs corresponds to a new solution set and this is analogous to generating new solutions for a cuckoo i. The levy (λ) is calculated using Mantegna’s algorithm as follows:

$$ \boldsymbol{levy}\ \left(\boldsymbol{\lambda} \right)=\frac{\boldsymbol{\mu}}{{\left|\boldsymbol{\nu} \right|}^{\mathbf{1}/\boldsymbol{\lambda}}} $$
(15)

Where, λ is fixed parameter with value 3/2. Furthermore, μ and ν are drawn from a normal distribution with zero means and an associated variance, as follows:

$$ {\boldsymbol{\sigma}}_{\boldsymbol{\mu}}={\left[\frac{\boldsymbol{\varGamma} \left(\mathbf{1}+\boldsymbol{\lambda} \right)\boldsymbol{\sin}\left(\boldsymbol{\pi} \boldsymbol{\lambda} /\mathbf{2}\right)}{\boldsymbol{\varGamma} \left(\left(\mathbf{1}+\boldsymbol{\lambda} /\mathbf{2}\right)\boldsymbol{\lambda} {\mathbf{2}}^{\left(\boldsymbol{\lambda} -\mathbf{1}\right)/\mathbf{2}}\right)}\right]}^{\mathbf{1}/\boldsymbol{\lambda}},{\boldsymbol{\sigma}}_{\boldsymbol{\nu}}=\mathbf{1} $$
(16)

Where, Γ() is the standard Gamma function.

As standard CS is applied to solve the hyperspectral band selection problem, each cuckoo represents a band of hyperspectral data and each nest represents a solution of the problem (set of selected bands). Discovery probability pa indicates each band has probability of pa to be successful. The hyperspectral band selection using standard CS is given as Algorithm 2:

figure b

2.3 Chaotic map

Chaos is the behavior of nonlinear systems which exhibits randomness and ergodic behavior. Chaotic map is an exceptional solution to the dynamic behavior of the systems that are highly sensitive to the random initial seeds [35]. Most of metaheuristic optimization algorithms involve random initialization of seeds and are highly dependent on initial conditions. So, if the initial solutions are not well selected, they lead to local minimum and poor results. The randomness at initial step leads the algorithm to premature convergence and easily gets trapped into a local optimal solution. Yang proposed a chaotic map based global optimization algorithm which employed chaotic variables instead of random variables, and improved the efficiency of the algorithm [50]. Ergodicity and non-repetition of chaos enables the chaotic map in maintaining the diversity in the population and tackles the problem of premature convergence.

A chaotic map is a mathematical function that shows some kind of chaotic behavior over time characterized by,

$$ {\boldsymbol{X}}_{\boldsymbol{t}+\mathbf{1}}=\boldsymbol{f}\left({\boldsymbol{X}}_{\boldsymbol{t}}\right) $$
(17)

In the chaotic sequence given in equation (4), chaotic variable Xt +1 at time t + 1 depends on chaotic variable Xt at time t. There are several chaotic maps [40], such as iterative chaotic map with infinite collapses (ICMIC), tent map, logistic map, Chebyshev map, gauss map. The Chebyshev chaotic map used in this study is based on preliminary trials. Chebyshev map is defined as:

$$ {\boldsymbol{X}}_{\boldsymbol{t}+\mathbf{1}}=\boldsymbol{\cos}\left(\boldsymbol{a}.{\boldsymbol{\cos}}^{-\mathbf{1}}\left({\boldsymbol{X}}_{\boldsymbol{t}}\right)\right) $$
(18)

Where, a is finite constant.

3 Proposed band selection approach

Consider the hyperspectral image dataset which is represented as, χϵRH × W × N, where, H and W are the height and width of hyperspectral image respectively and N is the total number of spectral bands or the feature dimensions. Assume that k classes in the hyperspectral data are denoted as, Ω = [Ω1, Ω2, Ω3, …..Ωk] and n be the selected number of bands denoted as, Xi= [b1, b2, b3, …..bn]. The following subsection discusses in detail the proposed WDOMCS band selection algorithm, and the objective function.

3.1 Objective function or fitness function

The goal of the proposed band selection approach is to explore a finite set of possible subset of bands that maximize the classification accuracy. In this study, the classification accuracy of the classifier is used as an objective function. Classification accuracy is defined as follows [17]:

$$ \boldsymbol{E}\left(\boldsymbol{i}\right)=\boldsymbol{Accuracy}=\frac{\sum_{\boldsymbol{k}=\mathbf{1}}^{\mid \boldsymbol{H}\times \boldsymbol{W}\mid}\boldsymbol{Evaluate}\left({\boldsymbol{p}}_{\boldsymbol{k}}\right)}{\boldsymbol{total}\ \boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{pixels}} $$
(19)

Where, assess (pi) is the function used to classify pixels pk.

$$ \boldsymbol{Evaluate}\left({\boldsymbol{p}}_{\boldsymbol{k}}\right)=\left\{\begin{array}{c}\mathbf{1}\kern0.75em \boldsymbol{if}\ \boldsymbol{classify}\ \left({\boldsymbol{p}}_{\boldsymbol{k}}\right)=\Omega \\ {}\mathbf{0}\kern0.75em \boldsymbol{otherwise}\kern4.75em \end{array}\right. $$
(20)

Classify (pk) is the function that gives the class of pk. For the pixel pk with true class, the function assess (pk) = 1, and 0 otherwise.

3.2 The proposed WDOMCS band selection algorithm

Although the standard WDO and CS have their own features and benefits, they also have some limitations. CS has relatively strong local exploration capabilities, but algorithm suffers from premature convergence and gets trapped into local optimal solution. The WDO algorithm has strong exploitation ability, thus combining the searching capabilities of both algorithms seem to be good approach which will certainly lead to the development of better performance algorithms.

The standard WDO and CS start with random population initialization which may cause repeated calculations and can easily fall into local optimal solution. For the optimization problem, the final solution is very sensitive to the initial population. By using the chaotic map, the property of the non-repetition and ergodicity accelerate the search speed by exploring all search space efficiently. Furthermore, a scaling factor α of Lévy flight distribution used in Eq. (14) is fixed in a standard CS algorithm. This parameter has a great influence on the global optimal solution. A large value of α allows cuckoo to explore search space rigorously at the risk of slower convergence. A smaller value of scaling factor causes an algorithm to get stuck into a local optimum solution, thereby leading to premature convergence. Therefore, the selection of the appropriate step size is indispensable to avoid local optimal solution and to enhance the speed of convergence. The scaling factor αnew can be formulated as,

$$ {\boldsymbol{\alpha}}_{\boldsymbol{new}}=\left\{\begin{array}{c}{\propto}_{\boldsymbol{L}}+\frac{\left({\propto}_{\boldsymbol{U}}-{\propto}_{\boldsymbol{L}}\right)\left({\boldsymbol{f}}_{\boldsymbol{i}}\left(\boldsymbol{t}\right)-{\boldsymbol{f}}_{\boldsymbol{w}}\left(\boldsymbol{t}\right)\right)}{{\boldsymbol{f}}_{\boldsymbol{avg}}\left(\boldsymbol{t}\right)-{\boldsymbol{f}}_{\boldsymbol{w}}\left(\boldsymbol{t}\right)},{\boldsymbol{f}}_{\boldsymbol{i}}\left(\boldsymbol{t}\right)\le {\boldsymbol{f}}_{\boldsymbol{w}}\left(\boldsymbol{t}\right)\\ {}\frac{\propto_{\boldsymbol{U}}}{\boldsymbol{t}},\kern9.5em {\boldsymbol{f}}_{\boldsymbol{i}}\left(\boldsymbol{t}\right)>{\boldsymbol{f}}_{\boldsymbol{w}}\left(\boldsymbol{t}\right)\end{array}\right. $$
(21)

Where,

t is the iteration number.

L and ∝U are predefined lower and upper limits of step size, respectively.

favg(t) is the average fitness value of all nests.

fi(t) is the fitness value of ith nest.

fw(t) is the worst fitness value.

The new nest location is computed from Lévy flight and Eq. (1) is re-written as,

$$ {\boldsymbol{X}}_{\boldsymbol{i}}\left(\boldsymbol{t}+\mathbf{1}\right)={\boldsymbol{X}}_{\boldsymbol{i}}\left(\boldsymbol{t}\right)+{\boldsymbol{\alpha}}_{\boldsymbol{i}}\left(\boldsymbol{t}+\mathbf{1}\right)\bigoplus \boldsymbol{levy}\left(\boldsymbol{\lambda} \right) $$
(22)

It is evident from this adaptation strategy, the moment current solution reaches a near optimum solution, the cuckoos upgrade their exploration around the current best solution using a smaller step size. On the other hand, a large step size is used when the current solution is less satisfactory than the average fitness value in order to inspire the cuckoos to explore a search space more rigorously. The term t is involved to encourage more exploitation. Also, the position and fitness value of the new cuckoo egg remain unchanged if it goes out of the search space. In this manner, CS can make full use of the previous excellent host bird’s nests. Therefore, inclusion of adaptive Lévy flight in the standard CS can help the algorithm quickly converge to the global optimal solution.

In this manuscript, a hybrid metaheuristic algorithm called WDOMCS is proposed aiming at combining both the exploration of CS and exploitation of WDO. In general, hybrid optimization algorithms can utilize the characteristics of the original algorithm. The main goal of hybridization is to complement each other’s characteristics. The hybridization approach is as follows: In the beginning, the initial population is generated using Chebyshev chaotic map, and the whole population is divided into two equal subgroups, Group A and Group B. WDO and modified version of CS are adopted for these two subgroups independently. By division, these two subgroups can share suitable information and utilize each other’s pros, thus avoid premature convergence, and obtain best optimal solution. The steps involved in the WDOMCS are presented in Algorithm 3. The flowchart of WDOMCS process is shown in Fig.1.

Fig. 1
figure 1

Flowchart of WDOMCS process

figure cfigure c

4 Result and discussion

The proposed WDOMCS algorithm was tested on three different hyperspectral remote sensing datasets, Indian pines, Pavia University and Botswana [11] for assessment of the effectiveness of the proposed band selection approach. Details of the datasets and comprehensive results are discussed in following subsections.

4.1 Dataset description and experimental setup

Table 2 summarizes the details of the standard bench-mark hyperspectral datasets, Indian pines, Pavia University and Botswana.

Table 2 Details of Hyperspectral datasets

All the experiments are conducted using MATLAB platform version R2018b on Intel Xeon processor 2.90 GHz CPU along with 128 GB RAM in Windows 10 (64 bit) environment. The performance of the proposed method is compared with those of the other five metaheuristic band selection approaches, i.e. GA, PSO, GWO, WDO, and CS.

In the beginning, 10% number of samples of each land cover class from the Indian pine dataset, Pavia University and Botswana datasets are randomly chosen as training samples for validation of the efficacy of the proposed technique with less amount of labelled data. The rest of the samples are considered as testing data. The experimentation is conducted ten times for the evaluation of the average of Overall Accuracy (OA), Average Accuracy (AA), class-wise accuracy, and kappa coefficient κ. To perform the classification task using selected bands, one-against-all Support Vector Machine (SVM) is employed as classifier [23]. The SVM classifier is selected due to its excellent performance when dealing with small-sized training data. To compare fairly, we have conducted the experiments on the final bands searched by their methods, but using the same classifier of SVM. In SVM, Radial Basis Function (RBF) is used as its kernel function. The parameters C and γ in SVM are determined using fivefold cross validation method. The parameters of each algorithm are summarized in Table 3.

Table 3 Parameters setting of each algorithm

4.2 Experimental analysis

This section discusses the classification results obtained by the proposed approach and commonly used metaheuristic approaches, comparison of the proposed band selection approach with commonly used band selection approaches, WDOMCS parameter analysis and convergence analysis.

4.2.1 Classification performance

The classification results obtained by the proposed band selection approach WDOMCS are compared with other metaheuristic approaches to assess the effectiveness of the WDOMCS. GA, PSO, GWO, WDO, CS, and WDOMCS are executed by selecting 20 bands on the Indian Pines, Pavia University and Botswana dataset. The OA, AA and κ are measured in every case alongside the individual class accuracies. Results are presented in Tables 4, 5 and 6 and the best value for each metric is recorded in bold. It is observed from Table 4 that, GA performs worst on Indian Pines dataset, whereas PSO and GWO show similar performances. It is observed from Table 5 that, GA and GWO perform worst on Pavia University dataset; however, PSO shows satisfactory performance. On the other hand, Table 6 shows that, performance of GA and PSO is inferior to that of GWO. Moreover, the performance of GWO is almost similar to that of CS.

Table 4 Classification accuracies (%) obtained by the proposed method and competing approaches for Indian pines dataset
Table 5 Classification accuracies (%) obtained by the proposed method and competing approaches for Pavia University dataset
Table 6 Classification accuracies (%) obtained by the proposed method and competing approaches for Botswana dataset

According to the results in Tables 4, 5 and 6, the optimization abilities of WDO and CS are better than GA, PSO and GWO and the OA value exceeded 80% for Indian Pines dataset, 94% for Pavia University dataset and 92% for Botswana dataset. WDO and CS require a few control parameters; however, the exploration ability of both algorithms is restricted to the initial setting. In WDO, air particles simultaneously walk around the universe, and explore the entire search space without getting trapped at the boundary. In MCS, adaptive Lévy flight strategy helps to maintain balance between exploration and exploitation and enhances the searching ability to obtain global best solution. Although, the WDO and CS give better results, they have not obtained theoretical optimal value. Thus, WDOMCS obtains the highest fitness value for all datasets, which illustrates that the excellent optimization ability of WDOMCS in comparison with other metaheuristic approaches. In contrast to this excellent performance, the classification performance obtained by the standard WDO and CS algorithm with the random initialization is inferior to that of proposed WDOMCS approach. This substantial difference indicates that, the classification performance obtained by the standard WDO and CS algorithm is deteriorated due to the randomness introduced in the selection of initial population. It is evident from Tables 4, 5 and 6, the proposed WDOMCS algorithm has strong capability to obtain an optimal solutions. Among all metaheuristic approaches, the proposed WDOMCS algorithm has achieved excellent classification performance in terms of OA, AA and κ. The proposed WDOMCS algorithm has achieved the maximum OA of 81.67% for Indian Pines dataset, 94.17% for Pavia University dataset, and 92.64% for Botswana dataset. It can be inferred from Tables 4, 5 and 6 that, the use of the initialization strategy, adaptive Lévy flight strategy and hybridization of WDO and CS have greater influence on the optimization ability of an algorithm.

The classification maps produced by all the above mentioned band selection approaches on Indian Pines, Pavia University and Botswana dataset are shown in Figs. 2, 3 and 4, respectively. The proposed approach WDOMCS shows a smooth classification map than other competing band selection approaches. For instance, the region of class “Woods” as shown in Fig. 2f (denoted by brown arrow) is smoother than those of the other approaches. Similarly, the region of class “Meadows” as shown in Fig. 3f (denoted by red arrow) is smoother than other approaches.

Fig. 2
figure 2

Classification map obtained by various methods for Indian Pines dataset. a GA (b) PSO (c) GWO (d) WDO (e) CS (f) WDOMCS

Fig. 3
figure 3

Classification map obtained by various methods for Pavia University dataset. a GA (b) PSO (c) GWO (d) WDO (e) CS (f) WDOMCS

Fig. 4
figure 4

Classification map obtained by various methods for Botswana dataset. a GA (b) PSO (c) GWO (d) WDO (e) CS (f) WDOMCS

4.2.2 Comparison of the proposed band selection approach with commonly used band selection approaches

To validate the effectiveness of the proposed WDOMCS approach, the classification performance of the WDOMCS is compared with commonly used band selection approaches such as Ward’s linkage strategy using divergence (WaLuDi) [18], minimum-redundancy maximum-relevance (mRMR) [19], and Relief algorithm [12]. Figure 5a, b and c present the OA, AA and κ for three hyperspectral datasets using above mentioned band selection approaches. It is observed that the classification accuracy by the proposed WDOMCS shows significant improvement that by using the full spectral bands, which is higher than 4% for Indian Pines, 13% for Pavia University and 1% for Botswana dataset. More importantly, the number selected of bands is greatly reduced in comparison with original spectral bands. Among all the band selection approaches, WDOMCS attains excellent classification performance by selecting most discriminative bands. Although the number of selected bands is the same, OA is lower than 80% for Indian Pines, less than 90% for Pavia University and Botswana datasets when using Relief, mRMR, and WaLuDi approaches. Similarly, AA is less than 62% for Indian Pines, lower than 89% for Pavia University and Botswana datasets when using Relief, mRMR, and WaLuDi approaches. Furthermore, kappa coefficient is lower than 76% for Indian Pines, less than 88% for Pavia University and Botswana datasets when using Relief, mRMR, and WaLuDi approaches. In short, we can say that WDOMCS has the excellent searching ability and optimization performance, which makes it better choice to solve the band selection problem.

Fig. 5
figure 5

Overall statistics (OA, AA, k in %) for (a) Indian Pines dataset (b) Pavia University dataset (c) Botswana dataset

4.2.3 WDOMCS parameter analysis

The population size Ps, number of iterations Tare the two main parameters influencing the performance of WDOMCS. Both parameters have been analysed in Fig. 6a and b for a fixed number of 20 bands on the Botswana dataset. As expected, classification accuracy increases with increase in Ps and T. They are chosen to be as minimal as possible to attain maximum optimisation capacity with minimal computational effort. Based on these results, Ps is set to 20 and T is set to 100.

Fig. 6
figure 6

Sensitivity of WDOMCS parameters (a) population size (b) number of iterations

4.2.4 Convergence analysis

In this section, optimization performance of the WDOMCS is discussed with respect to convergence analysis. The proposed WDOMCS includes hybridization of the standard WDO and CS algorithm. In addition, Chebyshev chaotic map based population initialization phase and one modification to the standard CS algorithm were adopted to speed up the convergence process. The ergodicity and stochastic properties exhibited by chaos helps the algorithm to reach its global best optimal solution. A modification to the standard CS algorithm includes the use of an adaptive Lévy flight strategy based on the fitness value of previous iteration, which will helps in increasing the convergence rate by exploring around the most likely solution space. In this section, the influence of each modification on the standard algorithms (M1: WDO algorithm with Chebyshev chaotic initialization, M2: CS algorithm with Chebyshev chaotic initialization, M3: CS algorithm with adaptive Lévy flight strategy) are studied. The convergence plots using the above three modifications and the standard WDO, CS and WDOMCS algorithm for Botswana dataset are shown in Fig. 7. It is evident from Fig. 7, the use of Chebyshev chaotic map based population initialization have reached a near attainable global solution without bringing any substantial enhancement to the convergence rate in comparison with the standard WDO. Similarly, CS algorithm Chebyshev chaotic map based population initialization attained maximum fitness value without getting trapped in the local optimum. Whereas, the convergence rate of M2 is slower than that of standard CS algorithm. The Chebyshev chaotic map based population initialization strategy can improve the searching ability because it generate non-repeated solution and improve the quality of the initial feasible solution. The use of adaptive Lévy flights strategy in M3provided more stable convergence rate. The adaptive step size encourages the cuckoos to explore more rigorously to find a new solution. However, in comparison with modification M2, it failed to attain the global optimal solution. Therefore, the inclusion of all three modifications to the standard WDO and CS algorithm lead to a more stable and reliable optimization performance as expected.

Fig. 7
figure 7

Convergence behaviour of WDOMCS algorithm on Botswana dataset

All the approaches except WDOMCS have obtained almost same solution at the end iteration; however convergence speed slightly varies. As can be clearly seen from Fig. 7, WDOMCS algorithm converges quickly and reaches global optimal solution. The Chebyshev chaotic map generates non-random population and significantly increases the convergence rate. Adaptive step size helps algorithm to come out of local optima, overcomes premature convergence and improves quality of solution significantly. Furthermore, hybridization of the WDO and CS leads to exchange of useful information across the search space. Therefore, with these modifications, WDOMCS obtains optimal solution quickly in comparison with other metaheuristic approaches. The fitness curve of WDOMCS proved that, the exploitation capability of the algorithm has been enhanced which succeed in reducing premature convergence problem as well as improving the convergence rate.

5 Conclusion

In this paper, a novel hybrid optimization approach is proposed based on WDO and MCS algorithm for solving the problem of band selection in hyperspectral image. In WDOMCS, the randomness of initial population is modified through use of the chaotic map. Furthermore, the searching ability of CS algorithm is improved by adaptively adjusting the Levy flight step size. WDOMCS can enhance the ability of exploitation and avoid getting trapped into local optimal solution by dividing the population, grouping update and combining the population to share the useful information between individuals. The experimental results have been compared with other band selection techniques optimized by GA, PSO, GWO, WDO and CS. Compared to GA, PSO, GWO, WDO and CS algorithm, the WDOMCS is more effective in obtaining better solutions and convergence speed of WDOMCS is higher. This hybridization presented a better optimization performance and robustness and significantly enhances the optimization capability of original WDO and CS algorithms.

Although in this paper the hybrid WDOMCS algorithm was implemented for solving the problem of hyperspectral band selection, there are still many aspects worthy of our study. Firstly, we will focus on various initialization strategies in order to further improve the performance of optimization. Secondly, multi-objective optimization model is needed to find an optimal trade-offs between conflicting issues such as, information preserving and redundancy reducing for choosing optimal band subsets. Also, we will further explore various strategies for combining spectral and spatial information to improve the classification performance. Lastly, we will further explore our proposed approach to various multimedia applications such as, image segmentation [52], image detection [26], and image dehaze [54].