1 Introduction

Data collection and storage technologies have undergone a revolutionary transformation driven by the rapid development of information technology, image processing, natural language processing [1, 2], and other fields. As the core part of Knowledge Discovery in Databases (KDD) [3], data mining [4,5,6] aims to automatically extract hidden information from vast amounts of collected data. The acquired data sets are usually high dimensional and contain irrelevant and redundant features. Processing them directly consumes substantial computational resources and ultimately degrades the performance of the learning algorithm. Feature selection (FS), a representative data preprocessing method, has therefore received wide attention in recent decades: it performs well in lowering the feature dimension of the data [7] and increasing prediction accuracy.

Filter [8, 9], wrapper [10, 11], and hybrid [12,13,14,15] methods are the three main categories [16] of conventional feature selection techniques. The filter approach assesses and selects feature subsets directly from the characteristics of the training data [5], without involving a learning algorithm; its computation is therefore fast, but the resulting subsets are generally less accurate. The wrapper method builds many models on different subsets of input features and keeps the features whose model performs best according to the chosen performance metric; consequently, wrappers are usually better than filters at selecting smaller feature subsets. Hybrid methods combine two or more feature selection methods to inherit the advantages of filters and wrappers and ultimately obtain a better feature subset [7, 17,18,19].

Researchers widely favor wrapper methods because various search strategies can be used to improve search efficiency and accuracy and obtain a better subset. Early search strategies, such as branch-and-bound search [20], improve efficiency by pruning many redundant features. Sequential Forward Search (SFS) builds feature subsets by adding features to an initially empty set [21]; Sequential Backward Search (SBS) builds them by removing features [22] from the complete set [23]. Population-based search strategies are particularly suitable for feature selection because a group of non-dominated solutions (feature subsets) can be found in a single run. Meta-heuristic search can quickly obtain high-quality solutions [9] across the whole search space, so it is increasingly used to solve FS.

Despite the long-standing preference for deterministic optimization methods such as gradient descent, owing to their precision and repeatability, such methods may not always be sufficient for complex problems [24,25,26]. In such cases, evolutionary methodologies such as genetic algorithms, which explore a wide range of potential solutions, offer a promising alternative [27]. However, these approaches can be computationally expensive and time-consuming. Moreover, optimizing complex systems often requires consideration of multiple objectives [28, 29], as single-objective optimization may not always yield the desired outcomes [30]. Nevertheless, multi-objective optimization is not always necessary or appropriate [31], and a combination of single-objective and multi-objective approaches may prove more effective depending on the problem at hand [32]. It is therefore crucial to evaluate the problem requirements carefully before selecting an optimization methodology.

At present, many Meta-heuristic Algorithms (MAs) have been applied to practical problems, such as Particle Swarm Optimization (PSO) [33, 34], Ant Colony Optimization (ACO) [35], Grey Wolf Optimization (GWO) [36], Harris Hawks Optimizer (HHO) [37], Grasshopper Optimization Algorithm (GOA) [38], Differential Evolution (DE) [39], Slime Mould Algorithm (SMA) [40, 41], Colony Predation Algorithm (CPA) [42], Runge Kutta optimizer (RUN) [43], Rime Optimization Algorithm (RIME) [44], Hunger Games Search (HGS) [45], and Sine Cosine Algorithm (SCA) [46]. No single algorithm, however, is appropriate for every problem, owing to the diversity and complexity of practical problems, so it is crucial to research optimization techniques for various situations. Accordingly, many improved MAs have been proposed, including Nelder-Mead simplex [47], Opposition-based Marine Predators Algorithm (MPA-OBL) [48], Random Learning Slime Mould Optimization (ISMA) [49], Improved Seagull Optimization Algorithm (ISOA) [50], Enhanced Elephant Herding Optimization (EEHO) [51], Improved Archimedes Optimization Algorithm (I-AOA) [52], Opposition-based Levy Flight Chimp Optimizer (ICHOA) [53], Multi-population Cooperative Coevolutionary Whale Optimization Algorithm (MCCWOA) [54], Adaptive Chaotic Grey Wolf Optimization (ACGWO) [55], Memory-based Harris Hawks Optimization (MEHHO) [56], and Hybrid Wind Driven-based Fruit Fly Optimization (WDFO) [57]. MAs have been applied to many problems, such as bankruptcy prediction [58], economic emission dispatch [59], economic load dispatch [60], feature selection [61,62,63], constrained multi-objective optimization [64], global optimization [65], large-scale complex optimization [66], feed-forward neural networks [67], scheduling optimization [68, 69], multi-objective optimization [70], and dynamic multi-objective optimization [71].

The use of MAs for feature selection has also attracted much investigation. Al-Tashi et al. [72] proposed a discrete BGWOPSO, combining GWO with PSO, to perform feature selection on text. Abd et al. [73] proposed a hybrid of the Marine Predators Algorithm (MPA) and KNN to solve feature selection problems. Xue et al. [74] proposed an Adaptive Particle Swarm Optimization (SAPSO) and applied it to FS. Samy et al. [75] proposed a new binary WOA that uses the Optimal Path Forest (OPF) technique as an objective function to select the optimal subset for classification. Hussien et al. [76] proposed two binary WOA variants for feature selection, which reduced system complexity and improved system performance. Tubishat et al. put forward a Dynamic Butterfly Optimization Algorithm (DBOA), an enhanced Butterfly Optimization Algorithm (BOA) [77]. Fang et al. [78] proposed a hybrid Nonlinear Binary Grasshopper Whale Optimization Algorithm (NL-BGWOA) for feature selection and achieved good results on most high-dimensional datasets. Hassanien et al. [79] proposed a method for recognizing human emotions using an elephant herding optimization algorithm and support vector regression and verified its effectiveness. Houssein et al. [80] proposed an improved Beluga Whale Optimization (BWO) algorithm based on dynamic candidate solutions to solve feature selection problems of different dimensions. Barshandeh et al. [81] used an extended Learning-Automata mechanism to re-integrate the Jellyfish Search algorithm and the Marine Predator Algorithm (MPA), verifying the advantages of the proposed algorithm on 38 test functions and 10 data sets. Emary et al. [82] presented an extension of Ant-lion Optimization (ALO) with Levy flight (LALO) for feature selection. Sayed et al. [83] proposed a new Chaotic Dragonfly Algorithm (CDA) for feature selection, embedding chaotic maps in the dragonfly search iterations. Abualigah et al. [84] used an Improved Particle Swarm Optimization Algorithm (FSPSOTC) as an FS method, and experiments validated that it improved the effectiveness of text clustering techniques by processing a new subset of informative features. Thom et al. [85] presented a discrete Coyote Optimization Algorithm (COA) called Binary COA (BCOA), which uses a hyperbolic transfer function to select the optimal feature subset of a dataset for classification. Rajalaxmi et al. [86] developed a wrapper-based Binary Improved Grey Wolf Optimization (BIGWO) method for classifying Parkinson's disease with optimal feature sets. Li et al. [87] proposed an Improved Sticky Binary PSO (ISBPSO) algorithm for feature selection.

These methods are effective approaches to FS. WOA [88], a population-based MA, mathematically models the bubble-net hunting behavior of humpback whales. Compared with other optimizers, WOA has few model parameters, fast convergence, and easy implementation. It has therefore been widely used in engineering technology [89,90,91,92] and other fields.

Although WOA performs well on simple nonconvex and similar problems, it struggles on complex optimization problems and tends to fall into local optima. Searching around the best individual and searching around random individuals are WOA's two basic search modes. The shrinking-encirclement and spiral updates around the best individual give the algorithm strong local search ability and fast convergence; however, they also make it hard to escape local optima in later stages. Introducing feasible strategies that enhance WOA's global exploration and local exploitation is therefore a direction worth studying.

This research develops a comprehensive learning whale optimization algorithm to address the imbalance between WOA's exploration and exploitation. First, sine chaos is introduced into WOA: the sine chaos strategy generates a chaotic sequence, instead of pseudorandom numbers, to produce the initial population. Experiments show that sine chaos yields a more uniformly distributed initial population, which helps the algorithm obtain better results in the early stage of evolution. This work also proposes a comprehensive learning (CL) strategy to enrich the diversity of the swarm. The CL strategy contains three mutually exclusive equations [93], providing additional search behaviors that help the algorithm escape local optima and maintain the balance between exploration and exploitation. Given the strong search performance of SCLWOA, this paper applies it to FS in high-dimensional feature spaces. By changing SCLWOA's encoding to a binary form for feature selection, its effectiveness is verified on 12 high-dimensional UCI medical data sets. K-Nearest Neighbor (KNN) is adopted as the classifier for the experimental evaluation; KNN shows excellent training speed and classification accuracy [94, 95] and is thus widely used to classify small-sample data. The specific contributions of this paper are as follows:

  1. Propose an SCLWOA with sine chaos and comprehensive learning mechanisms to strike a new balance between exploration and exploitation for WOA.

  2. SCLWOA has better global search capability than other established and improved optimization algorithms.

  3. The improved binary version based on SCLWOA has excellent performance in feature optimization compared to other algorithms.

The structure of this article is as follows: Section 2 describes the exploration and exploitation processes of the standard WOA in detail. Section 3 introduces the improved SCLWOA, incorporating the sine chaos and CL strategies, and proposes BSCLWOA for feature selection tasks. Section 4 presents the experimental setup and the analysis of the results. Finally, Sect. 5 gives the conclusions and future work.

2 An Overview of Whale Optimization Algorithm (WOA)

WOA is an MA inspired by the predatory behavior of humpback whales, proposed in 2016. It consists of two main stages: exploration and exploitation. The term "exploration" refers to the algorithm's ability to search regions of the global space that have not yet been visited and may contain the optimal solution, whereas "exploitation" refers to its ability to focus the search locally within regions that have already been developed [96]. During the exploitation phase, WOA mathematically models the bubble-net behavior of humpback whales.

2.1 Exploration

In the exploration phase, positions are updated with reference to a randomly selected individual. This allows the algorithm to search globally for a better solution [88]:

$$D = \left| {C \cdot X_{rand}^{j} - X^{j} } \right|$$
(1)
$$X_{t + 1}^{j} = X_{rand}^{j} - A \cdot D$$
(2)

where \(X_{rand}\) is a random agent selected from the current population, and C and A are two coefficients computed by Eqs. (3) and (4):

$$A = 2a \cdot r - a$$
(3)
$$C = 2 \cdot r$$
(4)

where \(a\) is a parameter decreasing linearly from 2 to 0 during the evolutionary process, and \(r\) is a random number in [0, 1].

2.2 Exploitation

In the exploitation stage, WOA adopts a bubble-net search mechanism embodied in two methods: the shrinking encirclement strategy and the spiral position-update mechanism.

2.2.1 Contraction Encirclement Mechanism

If A lies within [−1, 1], the individual's new position is set to a point between its current position and the prey (the best solution found so far):

$$D = \left| {C \cdot X_{best}^{j} - X^{j} } \right|$$
(5)
$$X_{t + 1}^{j} = X_{best}^{j} - A \cdot D$$
(6)

where \(X_{best}\) represents the current optimal solution.

2.2.2 Spiral Update Position

Humpback whales herd their prey with bubble nets while constantly updating their positions. This process first calculates the distance between the current individual and the best individual:

$$D^{\prime } = \left| {X_{best}^{j} - X^{j} } \right|$$
(7)
$$X_{t + 1}^{j} = D^{\prime } \cdot e^{bl} \cdot cos\left( {2\pi l} \right) + X_{best}^{j}$$
(8)

where \(b\) is a constant defining the shape of the logarithmic spiral, and \(l\) is a random number in [−1, 1]. To balance the two search strategies, a random parameter \(p \in \left[ {0,1} \right]\) is introduced as follows:

$$X_{t + 1}^{j} = \left\{ {\begin{array}{*{20}l} {X_{best}^{j} - A \cdot D} \hfill & {if\,p < 0.5} \hfill \\ {D^{\prime } \cdot e^{bl} \cdot cos\left( {2\pi l} \right) + X_{best}^{j} } \hfill & {if\,p \ge 0.5} \hfill \\ \end{array} } \right.$$
(9)

WOA’s pseudo-code is shown in Algorithm 1.

Algorithm 1 Pseudo-code of WOA
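To make the update rules concrete, the following minimal Python sketch implements one WOA iteration over the whole population, following Eqs. (1)–(9). It is illustrative only: for simplicity the coefficients are drawn per whale rather than per dimension, and the spiral constant b = 1 is an assumed value.

```python
import numpy as np

def woa_step(X, X_best, a, b=1.0):
    """One WOA position update (sketch of Eqs. (1)-(9)).
    X: (n, dim) whale positions; X_best: (dim,) best-so-far;
    a: scalar decreased linearly from 2 to 0 over the run."""
    n, dim = X.shape
    X_new = np.empty_like(X)
    for i in range(n):
        r = np.random.rand()
        A = 2 * a * r - a                       # Eq. (3)
        C = 2 * np.random.rand()                # Eq. (4)
        p = np.random.rand()
        if p < 0.5:
            if abs(A) < 1:                      # shrink toward the best agent, Eqs. (5)-(6)
                D = np.abs(C * X_best - X[i])
                X_new[i] = X_best - A * D
            else:                               # search around a random agent, Eqs. (1)-(2)
                X_rand = X[np.random.randint(n)]
                D = np.abs(C * X_rand - X[i])
                X_new[i] = X_rand - A * D
        else:                                   # spiral update around the best, Eqs. (7)-(8)
            l = np.random.uniform(-1, 1)
            D_prime = np.abs(X_best - X[i])
            X_new[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best
    return X_new
```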

3 Proposed Comprehensive Learning WOA Algorithm

The proposed SCLWOA adds two main fruitful strategies: (1) the idea of chaos optimization is introduced into WOA, and the initial population is generated uniformly by exploiting the randomness, ergodicity, and regularity of sine chaos; (2) the CL mechanism is added to enrich individual diversity through three mutually exclusive equations. The algorithm can thus avoid falling into local optima. In addition, SCLWOA is extended to a binary version to solve the feature selection problem.

3.1 Sine Chaotic Mapping

Global optimization refers to searching the solution space for the minimum or maximum fitness of an objective function. The global optimum is more likely to be reached if diverse, randomly chosen points across the solution space can be supplied; stochasticity is therefore key to global optimization problems [97]. Chaotic mapping is characterized by uncertainty, irreducibility, and unpredictability: seemingly random, irregular motion arising in a deterministic system. Common chaotic maps include the Logistic, Sine, Gaussian, Tent, and Cubic maps. We use a simple sine map to initialize the population; its mathematical form is given by Eq. (10).

$$X(t + 1) = \sin \left( 2/X(t) \right)$$
(10)
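As an illustration, the sketch below initializes a population with the sine map of Eq. (10) in place of pseudorandom numbers. The chaotic seed x0 = 0.7 and the mapping of the sequence from [−1, 1] into the search bounds are our assumptions, since the paper does not state them.

```python
import numpy as np

def sine_chaos_init(n, dim, lb, ub, x0=0.7):
    """Chaotic initialization via the sine map of Eq. (10).
    n: population size; dim: problem dimension; lb, ub: search bounds."""
    seq = np.empty(n * dim)
    x = x0
    for k in range(n * dim):
        x = np.sin(2.0 / x)                 # Eq. (10); values stay within [-1, 1]
        seq[k] = x
    # Map the chaotic values from [-1, 1] into the search range [lb, ub].
    return lb + (seq.reshape(n, dim) + 1.0) / 2.0 * (ub - lb)
```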

3.2 Comprehensive Learning Strategy

Balancing the exploration and exploitation phases is crucial to whether the algorithm can find the optimal agent. Compared with other optimizers, WOA exploits well but is weak in global search. A comprehensive learning mechanism is therefore incorporated to strengthen WOA's search behavior. It is realized through three mutually exclusive equations:

The Levy distribution is a continuous probability distribution over non-negative random variables; Levy flights are good at exploring large unknown search spaces and jumping out of local optima. In Eq. (11), a Levy flight mechanism is therefore introduced into WOA to escape local optima:

$$X_{t + 1}^{j} = X_{t}^{j} + rand*\left( {X_{t}^{j} - X_{best}^{j} \oplus Levy(\gamma )} \right)$$
(11)

where \(\gamma\) is the Levy flight parameter, and \(\oplus\) denotes element-wise multiplication.
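The paper does not specify how the Levy-distributed step is sampled. The sketch below uses Mantegna's algorithm, a common choice in the literature, with the exponent set to 1.5 as an assumed default.

```python
import numpy as np
from math import gamma

def levy(dim, beta=1.5):
    """Levy-distributed step via Mantegna's algorithm (an assumption;
    beta plays the role of the paper's gamma parameter)."""
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.randn(dim) * sigma
    v = np.random.randn(dim)
    return u / np.abs(v) ** (1 / beta)   # heavy-tailed step of dimension dim
```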

The mean-value scheme adopted in Eq. (12) helps avoid premature convergence. In each dimension, solutions are updated along a differential vector pointing toward the overall average of all solutions. Learning from this average behavior makes effective use of the information carried by all solutions, maintains population diversity, and helps avoid local optima. The update equation is given as Eq. (12).

$$X_{t + 1}^{j} = X_{mean} (j) + rand*\left( {X_{mean} (j) - X_{best}^{j} } \right)$$
(12)

Equation (13) guides individuals toward the current optimum; by learning from the best solution, it enhances local search ability and yields better solutions:

$$X_{t + 1}^{j} = X_{t}^{j} + exp(rand)*\left( {X_{best}^{j} - X_{t}^{j} } \right)$$
(13)
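One possible reading of the CL dispatch is sketched below. The random-threshold scheme gating the three equations via r1 and r2 is our assumption about the mechanism; the values r1 = 0.1 and r2 = 0.4, and the triggering condition p > 0.8 for the CL update, follow the settings examined in Sect. 4.3. The levy() helper is the one sketched above.

```python
import numpy as np

def cl_update(X_i, X_best, X_mean, r1=0.1, r2=0.4):
    """One comprehensive-learning update (sketch of Eqs. (11)-(13)).
    X_i: current agent; X_best: best-so-far; X_mean: population mean."""
    dim = X_i.size
    r = np.random.rand()
    if r < r1:                                   # Eq. (11): Levy flight escape
        return X_i + np.random.rand() * (X_i - X_best * levy(dim))
    elif r < r1 + r2:                            # Eq. (12): learn from the population mean
        return X_mean + np.random.rand() * (X_mean - X_best)
    else:                                        # Eq. (13): move toward the best agent
        return X_i + np.exp(np.random.rand()) * (X_best - X_i)
```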

The pseudo-code of SCLWOA is given in Algorithm 2. The detailed flowchart is shown in Fig. 1.

Fig. 1
figure 1

Flowchart of SCLWOA

Algorithm 2 Pseudo-code of SCLWOA

3.3 Binary SCLWOA

Feature selection is a search optimization problem. For a feature set of size dim, the search space contains \(2^{dim} - 1\) possible non-empty subsets, a huge space for exhaustive search. As an improvement of the original WOA, the proposed SCLWOA greatly improves search performance; this study therefore applies it to obtain a better feature subset.

An effective population-based strategy searches the potential feature space by encoding feature subsets as binary individuals. Feature selection is a discrete combinatorial optimization problem, so the primary task of the method is to discretize the search strategy. The transfer function and position-update equation adopted in this paper are given in Eqs. (14) and (15). Under the SCLWOA-based FS strategy, each individual \(x_i = (x_{i,1} ,x_{i,2} ,...,x_{i,dim} )\) is regarded as a feature subset, where the dimensionality dim equals the number of features in the original data set. If a feature is chosen, the corresponding dimension takes the value 1; otherwise, it is 0 [98].

$$T\left( {x_{i}^{j} (t)} \right) = \frac{1}{{1 + e^{{ - x_{i}^{j} \left( t \right)}} }}$$
(14)
$$x_{i}^{k} \left( {t + 1} \right) = \left\{ {\begin{array}{*{20}c} 0 & {If\,rand < T\left( {x_{i}^{k} (t + 1)} \right)} \\ 1 & {If\,rand \ge T\left( {x_{i}^{k} (t + 1)} \right)} \\ \end{array} } \right.$$
(15)
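For illustration, the binarization of Eqs. (14)–(15) can be written compactly as follows; the 0/1 assignment keeps the orientation exactly as printed in Eq. (15).

```python
import numpy as np

def binarize(X):
    """Map continuous positions to binary feature masks (Eqs. (14)-(15))."""
    T = 1.0 / (1.0 + np.exp(-X))                 # sigmoid transfer, Eq. (14)
    r = np.random.rand(*X.shape)
    return np.where(r < T, 0, 1)                 # Eq. (15): 0 if rand < T, else 1
```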

The fitness function measures the merit of a selected subset, and its choice directly affects whether the algorithm can eventually find the optimal subset. FS aims to select the subset with the fewest features while preserving classification accuracy. Under these two objectives, the adopted fitness function is given in Eq. (16):

$$Fitness = \alpha \gamma_{R} (D) + \beta \frac{|R|}{{|D|}}$$
(16)

where \(\gamma_{R} (D)\) is the classification error rate of the KNN classifier, |D| is the number of features in the original data set, |R| is the number of selected features, and \(\alpha \in \left[ {0,1} \right],\,\beta = 1 - \alpha\). This paper selects \(\alpha =0.05\) through experiments.
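A minimal sketch of this fitness evaluation follows, using scikit-learn's KNN and tenfold cross-validation as in Sect. 4.7. The neighborhood size k = 5 is an assumed value, and the use of scikit-learn is our substitution for brevity (the paper's implementation is in MATLAB).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.05, k=5):
    """Fitness of a binary feature mask per Eq. (16); alpha = 0.05 as in the
    paper, beta = 1 - alpha. mask: (dim,) array of 0/1 feature indicators."""
    if mask.sum() == 0:                  # an empty subset cannot classify
        return 1.0
    knn = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(knn, X[:, mask == 1], y, cv=10).mean()
    error = 1.0 - acc                    # gamma_R(D): classification error rate
    return alpha * error + (1 - alpha) * mask.sum() / mask.size   # Eq. (16)
```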

3.4 Computational Complexity Analysis

The time complexity of SCLWOA is mainly determined by four steps: population initialization, sine chaos, fitness calculation, and the position update incorporating the comprehensive learning strategy. Let n denote the number of agents, Dim the dimensionality of the agents, and T the maximum number of iterations. The time spent on fitness evaluation depends mainly on the chosen optimization problem and is taken as \(O(T*n)\). Population initialization depends primarily on the number of agents and the dimensionality, giving \(O(n*Dim)\); the sine chaos step likewise costs \(O(n*Dim)\). The position update costs \(O\left(T*n*Dim\right)\), and the comprehensive learning strategy adds another \(O(T*n*Dim)\). Finally, the time complexity of SCLWOA is \(O((T+3)*n*Dim+T*n)\) (Table 1).

Table 1 Parameter settings of SCLWOA

4 Experimental Results and Analysis

To confirm the performance of SCLWOA, sufficient targeted experiments are conducted in this paper, and the comparative results are discussed in a comprehensive analysis. A crucial component of evaluating the effectiveness of various models or algorithms is ensuring fair comparison in the AI literature [99, 100]. To ensure impartiality, all models must use identical datasets that accurately represent the problem domain [101]. Additionally, it is crucial to employ appropriate assessment measures while accounting for any potential biases in the data or the assessment process [102, 103]. By including these aspects, one may reach more meaningful judgments about the effectiveness of models and develop AI systems comprehensively [104]. To decrease the influence of external factors, every task in this work is conducted in the same setting. Regarding the parameter settings of the meta-heuristic algorithms, 30 search agents were used, and 300,000 function evaluations were performed. To reduce the influence of experimental contingency, each algorithm was run 30 times on each benchmark function. The results in this paper evaluate performance according to the Average Value of the Optimal Function (AVG) and the Standard Deviation (STD). The best result in each experiment is shown in bold for clarity. A non-parametric Wilcoxon signed-rank test at a significance level of 5% is adopted to determine whether the method is statistically significant. The symbols '+', '=', and '-' indicate that the performance of SCLWOA is better than, equal to, or weaker than that of the other competitors, respectively.
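For reproducibility, the per-function statistics can be summarized as in the sketch below, which computes AVG/STD over repeated runs and applies SciPy's Wilcoxon signed-rank test; the sign convention assumes minimization (lower AVG is better).

```python
import numpy as np
from scipy.stats import wilcoxon

def summarize(runs_a, runs_b, alpha=0.05):
    """AVG/STD of repeated runs of algorithm A, plus a Wilcoxon signed-rank
    test against algorithm B; '+', '-', '=' follow the paper's convention."""
    avg, std = float(np.mean(runs_a)), float(np.std(runs_a))
    _, p = wilcoxon(runs_a, runs_b)
    if p >= alpha:
        mark = '='                        # no statistically significant difference
    else:
        mark = '+' if avg < np.mean(runs_b) else '-'
    return avg, std, p, mark
```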

Section 4.1 presents a qualitative analysis of SCLWOA, including the balance between exploration and exploitation and the diversity of the population during the iterative process. In Sect. 4.2, we investigate the initialization ability of the sine chaos strategy and the influence of the CL strategy on the final search for the global optimal solution. In Sect. 4.3, given the multi-parameter nature of the CL strategy, we conduct a parameter sensitivity analysis. In Sect. 4.4, SCLWOA is compared with seven other standard MAs in terms of convergence speed and accuracy on the benchmark functions. In Sect. 4.5, eight WOA variants are selected for comparison. Section 4.6 compares SCLWOA with seven current superior advanced algorithms. In Sect. 4.7, 12 data sets are selected from the UCI machine learning repository to test the performance of binary SCLWOA in feature selection. In Sect. 4.8, the advantages and disadvantages of SCLWOA are summarized and analyzed based on the experimental results.

All experiments were performed on a 2.50 GHz Intel i7 CPU equipped with 16 GB RAM and Windows 10 OS, programmed in MATLAB R2019b.

4.1 The Qualitative Analysis

This section presents a comprehensive qualitative analysis of SCLWOA. Figure 2 shows the outcomes on unimodal, multimodal, hybrid, and composition functions. Column (a) shows the three-dimensional location distribution for each function; column (b) the two-dimensional search history of SCLWOA on each function; column (c) the trajectory of the search agent in the first dimension; and column (d) the average fitness during the optimization process, recorded as a convergence curve.

Fig. 2
figure 2

The function results of F1, F4, F21, F22, F25, and F26 obtained by SCLWOA. a Their three-dimensional graph, b these functions’ two-dimensional history distribution, c their trajectory in the first dimension, d their average fitness

The red dots in Fig. 2b represent the locations of the global optimal solutions. The explored locations are widely distributed, indicating that SCLWOA has strong exploration ability and covers more of the global space. Moreover, the search points are most concentrated near the optimal locations, indicating that SCLWOA can intensify its search around the optimum and finally obtain the global optimal solution. In Fig. 2c, the individuals' trajectories fluctuate strongly during the iterative process, again indicating the high exploration ability of SCLWOA.

In MAs [105,106,107], the exploration–exploitation balance is a common concern [108]. An algorithm's balance is assessed over the whole search process by analyzing how its effort is divided between exploration and exploitation; another way to measure it is the population diversity metric. The experiments above have shown that SCLWOA has strong exploration ability, so further study of its exploration–exploitation behavior is warranted. Figure 3 analyzes the balance and diversity of SCLWOA and WOA.

Fig. 3
figure 3

a SCLWOA’s balance analysis, b WOA’s balance analysis, c their diversity analysis

The cross-sectional comparison of columns (a) and (b) in Fig. 3 shows that SCLWOA devotes substantially more effort to the exploration phase, whether on unimodal, multimodal, hybrid, or composition functions. The diversity analysis in Fig. 3c confirms this: the curve of SCLWOA decreases more slowly than that of the original WOA, because SCLWOA sacrifices part of its convergence rate to gain population diversity. Throughout the evaluation process, the fluctuation of SCLWOA is always larger than that of WOA. Especially in the middle and late iterations, it still maintains considerable population diversity, which effectively helps the algorithm avoid local optima and seek higher-quality solutions. This also gives SCLWOA better robustness on problems of different natures. From the analysis of balance and diversity, it is clear that, by combining sine chaos and CL, SCLWOA further balances exploration and exploitation, finds better-quality individuals, and maintains a notable convergence rate.

4.2 Cross-evaluation of the Proposed SCLWOA

This section further verifies that SCLWOA has better search performance compared to the single policy variant and the original WOA.

Sections 3.1 and 3.2 introduced two strategies, sine chaos and comprehensive learning (CL), into the original WOA. This section tests and compares the performance of their linear combinations. In Table 2, "1" means the mechanism is selected, and "0" means it is not. We refer to WOA combined with the CL strategy as CLWOA and to the fusion of WOA and the sine chaos strategy as SWOA. In this section, 30 functions from CEC 2017 are used to evaluate the performance of the SCLWOA optimizer; see Table 17 in "Appendix" for the details of the functions. Dim is the dimension of the function, and the "search range" is the boundary of the search space of the function.

Table 2 WOA with one or two mechanisms

The horizontal comparison in Table 3 shows that SCLWOA, which integrates sine chaos and CL, is significantly better than the original WOA. F9 and F30 in Fig. 4 provide further evidence: the sine-chaos initialization shared by SWOA and SCLWOA clearly outperforms the other algorithms at the starting stage, and SCLWOA finds the best solution in the end. These results show that a good initial solution affects the evolutionary algorithm's search for the global optimum, accelerating population convergence and improving the accuracy of the final solution. CLWOA, which incorporates the Levy flight, significantly improves the exploration capability of the original WOA, allowing it to escape local optima and thus obtain better solutions, especially on multimodal problems. The CL strategy, which considers all three behaviors, also increases overall population diversity, enabling the algorithm to cope with more complex problems. SCLWOA combines the advantages of both strategies while retaining the fast convergence of the original WOA. The final Wilcoxon signed-rank results in Table 3 confirm that SCLWOA gives the best solution for most problems relative to the single-mechanism variants. Bold indicates that the method achieved the best results among all methods compared in this experiment. In summary, SCLWOA, incorporating the sine chaos and CL strategies, is the best-performing variant.

Table 3 Results of variant WOAs with Wilcoxon sign rank test
Fig. 4
figure 4

Convergence curves of SCLWOA and the single-mechanism variants

4.3 Influence of Parameters

This section examines the effects of the algorithm parameters, including the number of evaluations and the population size. The proposed comprehensive learning strategy is the key to balancing the exploration and exploitation phases and has a large impact on the performance of the algorithm. The parameter p determines the stage at which the CL strategy is executed, so analyzing its value is worthwhile. The CL strategy includes three mutually exclusive equations to balance the exploration and exploitation phases of the optimizer; the parameters r1 and r2 determine which equation participates in the update. Because the CL strategy profoundly affects the balance of exploration and exploitation, the values of r1 and r2 also need to be verified. The parameter comparison experiments in this section are conducted under the evaluation framework on the 30 test functions of CEC 2017, with 30 search agents (n), dimension 30 (Dim), 300,000 evaluations (T), and 30 repetitions per test function. The default parameter settings are shown in Table 1.

Section 3.2 noted that WOA is strong in exploitation but prone to falling into local optima (LO), so the CL strategy is introduced to reduce this risk. Four ranges of p are evaluated to find the best stage at which to trigger the CL strategy: \(p>0.6,p>0.7,p>0.8,p>0.9\), with \(p>0.8\) as the default. The remaining optimizer parameters are shown in Table 1, and the results for different values of p are shown in Fig. 5. The range of p has no significant impact on the overall performance of SCLWOA; we therefore keep the default and execute the CL strategy when \(p>0.8\).

Fig. 5
figure 5

Results of SCLWOA with different CL parameters

The remaining parameter comparisons are performed with the default \(p>0.8\). The r1 and r2 values tested on the benchmark functions are listed in Table 4, and the detailed parameter settings of the remaining optimizers are given in Table 1.

Table 4 Main parameter setting of CL in SCLWOA

From Fig. 5, it is easy to see that the algorithm adopting the benchmark r1 and r2 values performs more strongly on the unimodal function F2, the multimodal function F6, and the hybrid function F14, and maintains a lead on the remaining functions. The experimental results suggest that \(r1=0.1,r2=0.4\) best exploits the ability of the CL strategy within the algorithm.

4.4 Comparison with Other Well-known MAs

To further assess the optimization performance of SCLWOA, we selected seven well-known MAs to participate in the comparison: BA [109], WOA [88], PSO [110], SCA [46], FA [111], GWO [36], and HHO [37]. Table 5 shows their parameter settings. In this section, the 30 benchmark functions from IEEE CEC2017, recently adopted by much of the literature in the evolutionary computation community, are used for comparison. All experiments in this section are carried out under the same evaluation framework: the population size is 30, the dimension is 30, and the maximum number of evaluations is 300,000. Each function is executed 30 times.

Table 5 The mentioned algorithms’ parameters setting

In this paper, the final numerical result of each algorithm is determined from the AVG and STD, where '+', '−', and '=' stand for better than, worse than, and equal to the benchmark optimizer, respectively. The Wilcoxon signed-rank test at the significance level of 0.05 is adopted to determine whether the method is statistically significant; a test value below 0.05 indicates a significant difference. Moreover, we conducted the Friedman [112] test to visualize the final experimental statistics and give the final average ranking.

Table 6 shows each algorithm's optimal function values and standard deviations on the IEEE CEC2017 function test. Bold indicates that the method achieved the best results among all methods compared in this experiment. Observing the final ARV and RANK values, SCLWOA ranks first with a minimum average rank of 1.8667, a considerable improvement over WOA's 5.1667. HHO is the best of the other algorithms, but there is still a clear gap in search performance compared with SCLWOA, reflecting the powerful search capability of SCLWOA. Although SCLWOA is not optimal on the multimodal functions F5 to F10, its values are close to the best. SCLWOA also performs well on the hybrid and composition problems; on the composition functions F23 to F30, it is demonstrably superior to the high-performance optimizers GWO, HHO, and WOA.

Table 6 Results of SCLWOA and the original meta-heuristic algorithms

Table 7 shows the p-values of SCLWOA against the other MAs on the 30 benchmark functions, where '+', '−', and '=' indicate the numbers of functions whose p-values are greater than, less than, or equal to 0.05, respectively. The results show that, for most functions, the p-values of SCLWOA against the other MAs are far below 0.05, indicating that SCLWOA has clear advantages over the other MAs.

Table 7 The p-value of Wilcoxon test results of SCLWOA and other MAs

Figure 6 shows the convergence curves during the iterative process. BA converges faster on the F1 function but quickly falls into a local optimum trap. In contrast, SCLWOA still shows outstanding exploitation ability in the later stage, mainly because the CL strategy helps the algorithm maintain good population diversity and explore more regions. The convergence plots of F13, F15, and F19 show that SCLWOA converges quickly and finally obtains the optimal solution. SCLWOA is slightly better than HHO on F30; it has a better initial position at the beginning of the run, which mainly benefits from the sine chaos strategy.

Fig. 6
figure 6

Results of SCLWOA and the other MAs

4.5 Compared with the Improved WOA Variants

In this section, we further compare the performance of SCLWOA with eight advanced WOA variants on CEC 2017: CWOA [113], CCMWOA [114], LWOA [115], IWOA [116], BWOA [117], BMWOA, OBWOA [118], and ACWOA [119].

Based on the final ranking in Table 8, SCLWOA still ranks first among the WOA variants. Bold indicates that the method achieved the best results among all methods compared in this experiment. SCLWOA achieves the global optimum on most of the tested functions, benefiting from the population diversity maintained by the CL strategy, which helps the algorithm explore more promising regions as the number of evaluations grows. LWOA ranked first on the unimodal function F2 and the multimodal functions F4 and F5; although SCLWOA did not converge to the best there, it ranked second. On the composition functions, the STD values of CCMWOA and BWOA reach 0, indicating more stable performance than the other algorithms on those functions. However, SCLWOA maintains low STD values on all functions, indicating better robustness and the ability to handle more complex problems.

Table 8 Results of SCLWOA and other WOA variants

Table 9 reports the results of the Wilcoxon signed-rank test. The p-values of SCLWOA are much lower than 0.05 against the other WOA variants on most functions, demonstrating that SCLWOA is significantly improved compared with the variants. Nevertheless, the p-value is 1 on some composition functions because of the random behavior of the initial population.

Table 9 The p-value results of the Wilcoxon test of SCLWOA and other WOA variants

Figures 7 and 8 visualize the convergence curves of each optimizer during the evaluation process. They make it easy to see how SCLWOA frequently escapes traps and locates superior solutions where other algorithms prematurely enter local optima. This is because the Levy flight mechanism in the CL strategy helps the algorithm avoid local optima and search a broader space. On F7, SCLWOA already has relatively high-quality solutions in the initial stage, attributable to the sine-chaos initialization, which scatters the initial solutions more evenly and gives a better chance of finding the global optimum. Considering the combined performance above, SCLWOA is the superior variant.

Fig. 7
figure 7

Results of SCLWOA with other WOA variants

Fig. 8
figure 8

Convergence curves for population sizes of 10, 20, 60, and 100 on F2

4.6 Compared with State-of-the-art Advanced Evolutionary Algorithms

To further verify SCLWOA's powerful exploration ability, other well-known advanced evolutionary algorithms were included in the comparison: CGPSO [120], SCADE [121], OBSCA [122], CLSCA [123], CGSCA, CMFO, and OBLGWO [124]. The experiments in this section use the same evaluation framework, with a population size of 30 and 300,000 evaluations, and each algorithm is executed ten times. The results are assessed with the Wilcoxon signed-rank test; a p-value below 0.05 implies a significant improvement over the other algorithms. The Friedman test ranks the final performance of the algorithms according to the Average Ranking Value (ARV).

As shown in Table 10, SCLWOA again ranks first in comparison with the advanced evolutionary algorithms. Bold indicates that the method achieved the best results among all methods compared in this experiment. SCLWOA achieves the global minimum on the unimodal function F3 and the multimodal functions F5 and F9. On F4, F6, F7, and F8 it does not reach the global optimum but stays closest to it. This is due to the Levy flight incorporated in the CL strategy, which provides the ability to jump out of local optima, together with the optimal-vector learning behavior that balances the exploration and exploitation capabilities. SCLWOA has an absolute advantage on the hybrid functions, showing that it can solve problems well in different situations with better robustness.

Table 10 Comparison results of SCLWOA and advanced heuristic algorithms

Table 11 shows the p-values of the Wilcoxon signed-rank test. For most competing algorithms, the p-values are far below 0.05 across the 30 functions, indicating the strong search performance of SCLWOA.

Table 11 The p-value of the Wilcoxon test between the SCLWOA and advanced heuristic algorithms

As seen in Fig. 9, the overall performance of SCLWOA is outstanding. On the multimodal function F9, SCLWOA is ahead of its rivals already in the early stage, because the sine chaos mechanism provides better initial random agents: the individuals are more evenly scattered in the global space and have a better chance of approaching the global optimal solution. On F1, F12, and F13, CGPSO shows better initial convergence but soon falls into local optima, whereas SCLWOA tends to converge to better solutions because the CL strategy improves population diversity and the Levy flight mechanism lets the algorithm jump out of local optima and explore more of the space. On the other functions, SCLWOA shows its powerful global search ability; by learning from the optimal scheme, it can obtain the optimal value while maintaining convergence speed.

Fig. 9
figure 9

Results of SCLWOA with other advanced heuristic algorithms

4.7 Feature Selection Experiment

This section studies the proposed SCLWOA more comprehensively in a binary form according to the feature selection rules. The experiment selected 12 medical data sets from the UCI database and used a KNN classifier to assess the quality of the selected optimal feature subsets. To avoid possible bias [125] in the feature selection process on high-dimensional data, we carried out tenfold cross-validation [126] for each data set; the results reported in the tables are averages.

4.7.1 K-nearest Neighbor Classification

As a supervised machine learning algorithm, k-nearest neighbor is mainly applied to classification and regression problems [127]. Its core idea is to compute the distances between the test sample and all training samples, usually using the straight-line (Euclidean) distance given in Eq. (17). The K training samples closest to the test sample are then selected, and the majority class among them is assigned as the final class of the test sample.

$$D(x,y) = \sqrt {\mathop \sum \limits_{i = 1}^{n} (x_{i} - y_{i} )^{2} }$$
(17)
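A direct sketch of this classification rule, using the Euclidean distance of Eq. (17), is given below.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k):
    """Plain KNN: assign the majority class among the k training samples
    nearest to x_test under the Euclidean distance of Eq. (17)."""
    d = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))   # Eq. (17)
    nearest = np.argsort(d)[:k]                          # indices of k nearest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                     # majority vote
```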

4.7.2 The Preparatory Work

Section 3.3 proposed the binary BSCLWOA to improve the performance of the classifier. This section performs a reliability study to evaluate carefully whether the proposed BSCLWOA performs reliably on feature selection problems compared with other well-known optimizers. The experiment collected 12 data sets from the UCI data repository [128], as shown in Table 12.

Table 12 Description of the datasets

To assess the performance of the optimizers more comprehensively, the selected data sets contain both low- and high-dimensional features. As the table shows, the number of samples in the benchmark data sets ranges from 70 to 2500, the feature dimensionality from 15 to 16,000, and the number of classes from 2 to 7. Lung_Cancer, Prostate_Tumor, Tumors_11, and Tumors_14 are all high-dimensional, small-sample data sets; evaluation results on such data are sensitive to extreme samples and therefore pose a sterner test of classification accuracy.

4.7.3 Results and Analysis

This paper selects six common binary heuristic algorithms for comparison: BGWO [129], BALO [130], BBA [131], BSSA [132], BWOA [133], and BQGWO. The experimental results cover the final fitness value, classification accuracy, the number of selected features, and running time, reported in Tables 13, 14, 15, and 16.

Table 13 Comparison of average fitness values between BSCLWOA and other binary heuristic algorithms
Table 14 Comparison of KNN mean error between BSCLWOA and other binary heuristic algorithms
Table 15 Comparative results between BSCLWOA and other binary heuristic algorithms on the mean value of selected feature numbers
Table 16 Comparison of average running time between BSCLWOA and other binary heuristic algorithms

In the above tables, the best results among all compared methods are shown in bold for ease of observation. Table 13 shows that BSCLWOA has a significant advantage over the other six competitors on all 12 test data sets: its average fitness values, and the final ARV, are much lower than those of its peers, indicating superior performance. Comparing the STD values across the data sets, except for the better performance of BBA and BSSA on segment and Tumors_14, the STD of BSCLWOA is much lower than that of the other algorithms on the remaining data sets, indicating better stability.

According to the final RANK values in Table 14, the classification accuracy obtained by BSCLWOA optimization still far exceeds that of the other algorithms. BSCLWOA achieved a zero classification error rate on four of the 12 test data sets; on the others, its evaluation error rate is much lower than its rivals'. Compared with the standard WOA, the average error rate of the proposed algorithm is also reduced by nearly 0.02, a substantial improvement.

The ultimate goal of feature selection is to improve prediction accuracy and reduce the dimensionality [134] of the prediction task. Obtaining the optimal feature subset by eliminating features with little or no predictive information, as well as strongly correlated redundant features [135], is the core of this work. Table 15 shows that BSCLWOA obtains the feature subset with the minimum dimensionality on each data set, indicating better feature selection capability. Combined with the classification error rates in Table 14, it consistently selects fewer features while keeping a low error rate.

To further demonstrate the performance of the proposed algorithm on high-dimensional data sets, we compare the classification error rates and the numbers of selected features on the six high-dimensional data sets in the experiment. From Fig. 10, BSCLWOA maintains the minimum classification error rate on all six high-dimensional data sets. In Fig. 11, comparing the feature dimensionality reduction of the seven algorithms, the proposed BSCLWOA is also more satisfactory.

Fig. 10
figure 10

Mean error of BSCLWOA and other binary algorithms on 6 high-dimensional data sets

Fig. 11
figure 11

Feature numbers of BSCLWOA and other binary algorithms on six high-dimensional data sets

The time-consumption comparison in Table 16 shows that BSCLWOA ranks sixth, taking longer than most of the binary optimizers, because both the sine chaos and CL strategies add to the time cost. Although BSCLWOA consumes more time, the trade-off is worthwhile given the overall performance in Tables 13, 14, and 15: BSCLWOA outperforms the other six binary optimizers in handling the feature selection problem. How to reduce SCLWOA's computation time while preserving performance remains a direction for our future research.

Convergence plots are considered a reasonable measure of an optimization algorithm's performance. The best fitness values during the iterative process are presented as convergence curves below to make the experimental results more intuitive. Figure 12 shows the convergence curves on the 12 data sets; the Y-axis shows the average fitness value over ten independent runs, and the X-axis the number of iterations. BGWO is close to BSCLWOA except on Breast, Dermatology, and segment. The converged value of BSCLWOA is much smaller than that of the other algorithms on all data sets, especially the high-dimensional ones. This benefits from the varied update methods provided by the CL strategy, which ensure population diversity and give the algorithm more opportunities to explore optimal regions.

Fig. 12
figure 12

Convergence curves of BSCLWOA and other binary algorithms on 12 data sets

Handling the balance between the global exploration and local exploitation phases is a significant factor that makes BSCLWOA superior to the other algorithms. The experimental results show that this powerful search capability enables BSCLWOA to perform excellently on a wide range of complex problems.

4.8 Experiment Summary

The experiments of this study have been described above. Qualitative and parametric experiments designed specifically for SCLWOA demonstrate that it outperforms the original WOA in balancing exploration and exploitation. The mechanism-comparison results show that the sine chaos and comprehensive learning strategies significantly strengthen the algorithm's optimization capacity. However, the sine chaos initialization does not work well on some test functions; this is not to say that its optimization results are unsatisfactory, but the sine chaos initialization retains a certain randomness. SCLWOA also performs stably in comparison with other metaheuristics, such as WOA and the improved WOA versions, finding the best solution in most cases. However, because SCLWOA emphasizes exploration capability, it makes certain sacrifices in convergence speed: compared with faster-converging algorithms, it often needs more time, but the sacrifice in speed buys accuracy on FS tasks. How to further accelerate convergence while preserving SCLWOA's accuracy is therefore a question we will continue to consider in the next phase. The binary variant of SCLWOA, BSCLWOA, shows excellent performance in feature selection: the experimental results show that it usually finds effective and relatively small feature subsets for classification, consistent with the purpose of feature selection, so using BSCLWOA to settle the feature selection problem is meaningful. Furthermore, the proposed BSCLWOA can also be applied to more cases in future work, such as optimization of machine learning models [136], fine-grained alignment [137], Alzheimer's disease identification [138], medical signals [139,140,141], power distribution network [142], MRI reconstruction [143], renewable energy generation [144], location-based services [145, 146] and information retrieval services [147,148,149,150], and iris or retinal vessel segmentation [151, 152].

5 Conclusions and Future Directions

In this paper, SCLWOA is proposed to remedy the exploration–exploitation imbalance of the original WOA. The sine chaos and CL strategies are integrated into WOA, and the resulting algorithm is applied to the CEC2017 global optimization problems. The global optimization performance of SCLWOA is verified by comparison with other advanced algorithms. The experimental results show that the enhanced SCLWOA has excellent exploration ability, which helps the algorithm jump out of LO values and accurately explore more promising regions in most cases. In addition, we map SCLWOA into binary space with a transfer function, where it also shows strong optimization performance on feature selection tasks. On selected UCI data sets of different dimensionality, BSCLWOA obtains the most accurate feature subsets with the fewest features compared with the other algorithms, which has important implications for reducing data dimensionality and improving computational performance. Given SCLWOA's outstanding exploration and global optimization capabilities, we will apply it to other industrial engineering problems in the future.