1 Introduction

Feature selection in high dimensional data is a preprocessing step that has been widely used to improve the performance of learning algorithms in many fields (Mafarja et al. 2019). This preprocessing technique selects relevant features, removes other features, decreases the time required to learn the predictors and finally leads to an easier interpretation by simplifying the models (Thaher et al. 2020; Yan et al. 2019).

Nowadays, feature selection is an important task in machine learning, since it strongly affects the performance of high-dimensional data classification. Data from many different sources can be used to solve present-day problems, yet selecting features from these large and complex data sources is one of the most challenging tasks. Real-world data have a continuous form; therefore, useful knowledge has to be extracted from these data to solve problems. Large quantities of high-dimensional data create great opportunities as well as new challenges for conventional data mining and knowledge discovery research (Kriegel et al. 2009). The most important challenges in high-dimensional data analysis are limitations in time, memory, and computational cost (Rao et al. 2019).

The data comprise samples described by several features, so selecting a subset of samples or features is advisable to reduce the data volume and ease computation (Hamidzadeh et al. 2015). This reduction can be achieved through feature extraction or feature selection (Huang et al. 2018). Feature selection is particularly advantageous because it simplifies the interpretation of the classification model (Ramírez-Gallego et al. 2017). In other words, the complexity of classifiers decreases when a proper set of selected features is used. Furthermore, it leads to faster convergence toward optimal results, which improves classifier performance (Li et al. 2017; da Silva et al. 2017).

Many features of a problem may be useless or carry little information. Some may even mislead the classifier and reduce its accuracy because of noise. Although failing to remove these features does not cause any loss of information, it raises the computational cost and increases the response time of the system; in terms of storage, it also means keeping a lot of useless information alongside the useful data (Ramírez-Gallego et al. 2017; Lee and Kim 2016). Moreover, a feature selection method may be tied to a specific problem, so it cannot necessarily be applied to other problems (Fong et al. 2013).

Exact mathematical algorithms have become less attractive for this task because they are too complex and time consuming. Instead, metaheuristic algorithms, which imitate the life and evolution of living creatures, have become more widely applicable (Arora and Anand 2019). Metaheuristic methods can be used to search for globally optimal solutions (Nayar et al. 2019), and they solve complex, nonlinear, and uncertain problems more quickly than classical algorithms (Arora and Anand 2019). Unlike traditional algorithms, metaheuristics do not need to restart when new data arrive or the environment changes. Owing to their nature, selecting features with evolutionary algorithms reduces the required time while keeping an acceptable classification accuracy rate (Kumar and Bharti 2019). They can be applied to any problem that can be formulated, and they can be integrated with other optimization techniques. Their biggest advantage is that they solve problems quickly and with a reasonable accuracy rate, without resorting to conventional mathematical solutions (Mafarja et al. 2019).

Metaheuristic algorithms rely on two concepts: exploration and exploitation. Exploration is the capability of searching the problem space without paying attention to the results obtained so far, while exploitation concentrates on those results. To reach the best performance in problem solving, these capabilities should be balanced. Most evolutionary algorithms suffer from getting trapped in local optima and from premature convergence, so several studies have addressed these issues (Arora and Anand 2019). When stringent conditions and random numbers are used in the implementation to increase the convergence rate, the algorithms may become locked into local optima, and their accuracy and efficiency may decrease (Fong et al. 2013; Fong et al. 2016; Gandomi and Yang 2014).

One of the newest and most powerful evolutionary optimization methods is the cuckoo search algorithm, which is particularly capable of finding globally optimal points. This algorithm simulates the nesting and egg-laying behavior of the cuckoo bird to solve optimization problems (Gandomi et al. 2013). The skill of laying eggs that resemble the host bird's eggs improves over generations and prevents the host birds from detecting them; as a result, the chicks hatched from these eggs grow up in the host nest. In this algorithm, the cuckoos look for the best place to lay their eggs, i.e., the globally best solution for survival (Gandomi et al. 2013; El Aziz and Hassanien 2016).

Similar to other evolutionary algorithms, the cuckoo optimization algorithm suffers from excessive randomness, so this study uses chaotic theory, levy flight and the disruption operator to limit its random behavior. These enhancements improve the exploration phase and feature selection in complex data. By improving the cuckoo optimization algorithm for feature selection, a feature subset with maximum feature elimination and the highest classification accuracy rate is obtained.

This paper discusses data preprocessing and its most important task: selecting the best feature set for further processing. The CCOALFDO algorithm is a wrapper-based feature selection method built on an integrated framework that combines chaos theory and the cuckoo optimization algorithm with levy flight, opposition-based learning and the disruption operator to select the most relevant features from 20 high-dimensional datasets. The K-NN classifier (with K = 5) is used to evaluate the produced feature subsets. Experimental results on these datasets show the efficiency of the proposed method over the comparative algorithms.

In summary, the contributions of this study include:

  1. Using chaotic theory, levy flight and the disruption operator to reduce the random selection of features and avoid getting stuck in local optima.

  2. Opposition-based learning searches the opposite points of the current solutions, so the answers cover more of the problem space and better results are obtained.

  3. The disruption operator prevents the solutions from accumulating too much in one place, which enhances population diversity and leads to a better search of the problem space.

  4. Selecting informative features efficiently and reaching convergence in a reasonable time.

  5. The CCOALFDO method is defined as a supervised feature selection method that maximizes the classification accuracy rate and minimizes the feature set size.

  6. In comparison with five state-of-the-art evolutionary algorithms, the CCOALFDO algorithm ranked first in accuracy rate and second in reducing the number of features.

In Sect. 2, the related literature on feature selection is reviewed. Section 3 presents the background concepts, and the proposed method is explained thoroughly in Sect. 4. Section 5 reports the experimental results and analyses the performance of the different feature selection algorithms. Finally, Sect. 6 contains the conclusions and future work.

2 Related works

Feature selection is a very important preprocessing and knowledge discovery tool for data analysis, and it is applied to many domains. In addition, it is used as an important technique for high-dimensional data analysis, since it has shown its promising ability to enhance a variety of other learning tasks, such as data clustering, document analysis, image processing and video processing, to name a few (Huda et al. 2016; Pes et al. 2017; Hossain et al. 2016; Yadav et al. 2018; Zhang et al. 2019; Luo et al. 2018; Bannigidad and Gudada 2019).

Feature selection problems face a tradeoff between the feature set size and the classification quality. In previous studies, the size of the selected feature subspace usually remained fixed. For a dataset with k features, there are \( k!/\left( {\left( {k - s} \right)!s!} \right) \) different ways to select a subspace of s features (Nayar et al. 2019; Mafarja et al. 2019). Choosing s, the size of the feature subspace, is a great challenge: when s is too large, the subset may contain redundant features, and when it is too small, some useful information will be lost (Bostani and Sheikhan 2017).
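To give a concrete sense of scale, with k = 100 features and a subspace of s = 10 features the count above is already astronomical:

$$ \frac{100!}{\left( {100 - 10} \right)!\;10!} \approx 1.73 \times 10^{13} $$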

Depending on the label information usage in various datasets, feature selection methods can be classified into three categories (Solorio-Fernández et al. 2020): supervised, semi-supervised and unsupervised.

The supervised learning methods use the label information of the data to guide the process of dimensionality reduction. According to some criteria, these methods measure the importance and relevance of the features by utilizing the labeled data to train the feature selection models (Zare et al. 2019; Zhong 2020; Ang et al. 2015). Since unlabeled data consist of samples and features without any information about the natural grouping of the data, unsupervised methods select features without paying attention to labels (Solorio-Fernández et al. 2020).

Semi-supervised feature selection, the second category, combines a small amount of labeled data with unlabeled data, whereas the unsupervised category lacks this additional information (Ang et al. 2015; Song et al. 2016; Xu et al. 2018; Yang et al. 2018).

Feature selection algorithms are also divided into three groups according to how the feature subsets are evaluated: filters, wrappers, and hybrid or embedded methods (Ramírez-Gallego et al. 2017; Fong et al. 2013).

In filter algorithms, the evaluation does not depend on any learner model, and features and patterns are treated independently, so there is no need to re-evaluate the patterns as the features evolve; these algorithms rely on simple and clear criteria for feature evaluation (Li et al. 2017). The results of wrapper-based algorithms, which use machine learning techniques, are more reliable, but their computational cost is so high that special methods are needed to reduce it (Kumar and Bharti 2019). Compared with a filter algorithm, a wrapper-based feature selection method has a more complex structure and a higher accuracy rate, since it works in conjunction with the classifier (Ramírez-Gallego et al. 2017; Fong et al. 2013; Peng and Fan 2017). To remain flexible in solving problems, hybrid algorithms combine the filter and wrapper approaches, which enables them to provide more favorable results (Thaher et al. 2020).

Features are divided into four groups: irrelevant features, redundant features, weakly relevant but non-redundant features and strongly relevant features (Li et al. 2017). In order to reach an optimal feature set, we need algorithms that can choose some strongly relevant and non-redundant features.

Some redundant features may have significant statistical relations with other features; conversely, a seemingly irrelevant feature may become highly relevant when combined with other features. That is why, in some studies, the feature set is determined by computing minimal redundancy and maximal relevance (mRMR), i.e., based on mutual dissimilarity with other features (Ding and Peng 2005; Masud 2010).

In past decades, researchers have introduced many evolutionary algorithms, such as the PSO algorithm (Fong et al. 2016; Mistry et al. 2017; Dara et al. 2017; Qi et al. 2017), the ACO algorithm (Shunmugapriya and Kanmani 2017; Sivagaminathan and Ramakrishnan 2007; Sweetlin et al. 2017; Harde and Sahare 2016), the bee colony algorithm (da Silva et al. 2017; Shunmugapriya and Kanmani 2017), the bat algorithm (Jayabarathi et al. 2018; Cheng et al. 2016), the frog leaping algorithm (Hu et al. 2016; Dai 2015) and the krill herd optimization algorithm (Rodrigues 2014; Wang et al. 2019).

Recently, metaheuristic algorithms like ant colony optimization (ACO) (Beltramo et al. 2019), simulated annealing (SA) (Yan et al. 2019), and Henry gas solubility optimization (HGSO) (Neggaz et al. 2020) have become more popular due to their good performance in solving feature selection problems. Fong, for example, used a particle-swarm-based metaheuristic search to select features; this method is suitable for feature spaces with high correlation, and it can be applied with other classification methods as a fitness function (Fong et al. 2013).

Another metaheuristic algorithm is the crow search algorithm (CSA), for which a V-shaped binary variant has been proposed. However, like other evolutionary algorithms, it suffers from a low convergence rate and entrapment in local optima (De Souza 2018). Anter introduced a hybrid crow search algorithm for feature selection that combines a chaos function with the fuzzy c-means objective function as the cost function (Anter and Ali 2020).

Thaher proposed an algorithm, called HHO, to select a suitable feature set based on the hunting process performed by hawks chasing rabbits. This algorithm is a greedy search, improved with levy flight, that operates in continuous space and is then transferred to binary space (Thaher et al. 2020).

Huang proposed a distributed PSO–SVM approach, which uses binary PSO for feature selection and standard PSO for parameter optimization of the SVM (Huang and Dun 2008). A genetic algorithm (GA) was also proposed to reduce the number of features by optimizing the feature subset without degrading the classification accuracy (Himabindu et al. 2019).

The whale optimization algorithm (WOA) is another recent algorithm used in feature selection problems, although it was not successful in reducing the number of features (Hussien et al. 2019). Mafarja used eight different transfer functions to improve the WOA; the results showed that most V-shaped transfer functions performed better than S-shaped ones in minimizing the number of features with less information loss (Mafarja et al. 2019).

A multi-strategy ensemble algorithm tried to enhance the feature selection process by tuning the gray wolf optimization (GWO) parameters (Hussien et al. 2019). Anter applied the GWO algorithm only to medical datasets (Anter et al. 2019).

Another study, expecting further improvement in the accuracy rate, combined GA-based selection with SA into the so-called GA–SA algorithm to avoid getting stuck in local optima (Yan et al. 2019).

The previously mentioned studies did not succeed in balancing the classification accuracy rate against the feature reduction rate. This paper introduces a new wrapper feature selection method, in which irrelevant features are discarded by a cuckoo optimization algorithm (COA) enhanced with chaotic theory, levy flight and the disruption operator. The aim of this paper is thus to provide a method that improves the classification accuracy rate and optimally reduces the number of features simultaneously.

3 Background

In this section, cuckoo optimization algorithm is described in Sect. 3.1, the definition of chaotic function is given in Sect. 3.2, levy flight algorithm is described in Sect. 3.3, the definition of opposition-based learning is given in Sect. 3.4 and finally in Sect. 3.5, disruption operator is briefly reviewed.

3.1 Cuckoo optimization algorithm

Choosing an optimal subspace of the dataset is critical to speed up and improve the classification, especially for high-volume data. In this study, the cuckoo algorithm, introduced by Yang in 2009 (Gandomi et al. 2013), is used for selecting features. Cuckoos are interesting birds that have a special strategy for aggressive reproduction: they lay their eggs in the nests of other birds, called hosts. The cuckoo eggs have no chance to grow if the host birds find them in the nest and throw them out. However, if a host bird does not find the cuckoo egg, it hatches into a cuckoo chick, which grows faster than the host's chicks and throws them out of the nest. Thus, finding the best place with similar eggs for egg laying is essential for the cuckoo generation to survive.

Cuckoo search algorithm rules include:

  • The number of nests (habitats) is fixed.

  • A cuckoo lays between 5 and 10 eggs at a time and then chooses a nest.

  • The best nests are carried over to the next generation.

  • For an Nvar-dimensional problem, an array with 1 row and Nvar columns is used.

The position of a habitat (nest) in the search space is represented as follows:

$$ {\text{Habitat}} = \left[ {x_{1} , x_{2} , \ldots ,x_{\text{Nvar}} } \right] $$
(1)

Habitats that are more suitable for egg laying are desirable; therefore, a fitness function is applied to evaluate every habitat, and the profit function is described as follows:

$$ {\text{fitness}} = f_{p} \left( {\text{habitat}} \right) = f_{p} \left( {x_{1} ,x_{2} , \ldots ,x_{\text{Nvar}} } \right) $$
(2)

In the optimization algorithm, the search space must be bounded; \( {\text{var}}_{\text{hi}} \) is the upper bound and \( {\text{var}}_{\text{low}} \) the lower bound. Every cuckoo has an egg laying radius (ELR), which is determined by the proportion between the number of eggs of that cuckoo and the total number of eggs, and α tunes the maximum value of the ELR. The ELR is formulated in Eq. 3:

$$ {\text{ELR}} = \alpha \times \frac{\text{Number of current cuckoo's eggs}}{\text{Total number of eggs}} \times \left( {{\text{var}}_{\text{hi}} - {\text{var}}_{\text{low}} } \right) $$
(3)
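As a rough illustration (not the authors' implementation), Eq. 3 can be coded directly; the argument names below are placeholders:

```python
def egg_laying_radius(alpha, n_cuckoo_eggs, n_total_eggs, var_hi, var_low):
    """Eq. 3: ELR scales the search bounds by the cuckoo's share of all eggs."""
    return alpha * (n_cuckoo_eggs / n_total_eggs) * (var_hi - var_low)

# e.g. a cuckoo holding 7 of 60 eggs in a [0, 1] search range, with alpha = 2
print(egg_laying_radius(alpha=2.0, n_cuckoo_eggs=7, n_total_eggs=60,
                        var_hi=1.0, var_low=0.0))   # -> 0.2333...
```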

In this algorithm, the first egg-laying place is selected randomly, but the next generations definitely use the best solutions found so far.

3.2 Chaotic function

Chaos is a deterministic dynamic function that is sensitive to its initial conditions and parameters. The nature of chaos is apparently random and unpredictable; however, it also possesses an element of regularity (Alatas et al. 2009).

Chaos is a nonlinear function that can change the behavior of an algorithm. Chaotic maps represent a pseudo-random deterministic process that is non-converging and bounded (dos Santos and Mariani 2008).

Chaos can remedy the premature convergence of evolutionary algorithms by helping them avoid local minima. Many chaotic maps exist; ten one-dimensional maps that can be used as the initial position pattern in cuckoo optimization algorithms are introduced in Table 1 (Sayed et al. 2019; Saremi et al. 2014; Hamidzadeh and Namaei 2018; Hamidzadeh et al. 2017):

Table 1 Chaotic maps

3.3 Levy flight

A levy flight random walk is suggested as a perturbation to perform a local search (Emary and Zawbaa 2018). The levy flight process represents an optimal random search pattern and is frequently observed in nature (Viswanathan et al. 2008). The distribution has infinite mean and infinite variance and follows a simple power law (Eq. 4). S is the step length in the levy flight algorithm, given by Eq. 5. The parameters u and ν are drawn from the normal distributions in Eq. 6, the scale \( \sigma_{u} \) is calculated by Eq. 7 with \( 1 \le \beta \le 2 \), and \( \varGamma \left( \cdot \right) \) is the gamma function (Yang and Deb 2009; Syberfeldt 2014).

$$ L\left( S \right)\sim\left| S \right|^{ - 1 - \beta } \quad 0 < \beta \le 2 ,\,S \to \infty $$
(4)
$$ S = \frac{u}{{\left| v \right|^{{\frac{1}{\beta }}} }} $$
(5)
$$ u \sim N(0,\sigma_{u}^{2} ) \, ,\nu \sim N(0,\sigma_{v}^{2} ) $$
(6)
$$ \sigma_{u} = \left( {\frac{{\varGamma \left( {1 + \beta } \right)*\sin \left( {\pi \beta /2} \right)}}{{\varGamma \left[ {\left( {1 + \beta } \right)/2} \right]*\beta *2^{{\left( {\beta - 1} \right)/2}} }}} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 \beta }}\right.\kern-0pt} \!\lower0.7ex\hbox{$\beta $}}}} ,\sigma_{v} = 1 $$
(7)
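The step-length recipe in Eqs. 5–7 is Mantegna's algorithm; a minimal Python sketch is given below, where beta = 1.5 is only a common illustrative choice, not a value prescribed here:

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5, rng=None):
    """Draw a levy-distributed step S = u / |v|^(1/beta) per Eqs. 5-7."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_u = ((gamma(1 + beta) * sin(pi * beta / 2)) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size=dim)   # u ~ N(0, sigma_u^2), Eq. 6
    v = rng.normal(0.0, 1.0, size=dim)       # v ~ N(0, 1), i.e. sigma_v = 1 (Eq. 7)
    return u / np.abs(v) ** (1 / beta)       # Eq. 5

print(levy_step(dim=5))
```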

3.4 Opposition-based learning

Opposition-based learning (OBL) is a useful way to explore the problem space better. It can increase efficiency by searching for a solution and its opposite at the same time. Although this strategy raises the computational load of the algorithm, it helps considerably in achieving faster convergence (Oliva and Abd Elaziz 2020; Tizhoosh 2005; Aladeemy et al. 2020).

The OBL strategy in D-dimensional space is given by Eq. 8.

$$ \bar{x}_{j} = u_{j} + l_{j} - x_{j} \quad j = 1,2 \, , \ldots ,D $$
(8)

In Eq. 8, x is a habitat in the search space and \( \bar{x} \) is its opposite, generated within the interval [l, u]. The opposite number is calculated for each dimension, indexed by j, so that \( \bar{x} \) represents the opposite habitat in the D-dimensional space.
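Eq. 8 amounts to reflecting every coordinate across the centre of its bounds; a small sketch (with illustrative bounds) is:

```python
import numpy as np

def opposite_habitat(x, lower, upper):
    """Eq. 8: the opposite habitat x_bar_j = u_j + l_j - x_j for every dimension j."""
    return upper + lower - x

x = np.array([0.2, 0.7, 0.9])
print(opposite_habitat(x, lower=np.zeros(3), upper=np.ones(3)))   # -> [0.8 0.3 0.1]
```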

3.5 Disruption operator

The disruption operator (Dop) is implemented to enhance the population diversity. This phenomenon was inspired by astrophysics. Harwit observed that when a set of gravitationally bound particles (with total mass m) comes very close to a much more massive object (with mass M), the set is torn apart; the same happens to a solid body held together by gravitational forces when it approaches a much more massive object.

This operator is used so that the CCOALFDO algorithm searches the problem space more thoroughly and maintains the balance between the exploration and exploitation processes. Dop is formulated in Eq. 9:

$$ D_{\text{op}} = \left\{ {\begin{array}{*{20}c} {D_{{is_{i,j} }} \times \delta \left( { - 2,2} \right)} & { {\text{if}} D_{{is_{{i,{\text{best}}}} }} \ge 1} \\ {1 + D_{{is_{{i,{\text{best}}}} }} \times \delta \left( {\frac{{ - 10^{ - 4} }}{2},\frac{{10^{ - 4} }}{2}} \right) } & {\text{otherwise}} \\ \end{array} } \right. . $$
(9)

In Eq. 9, \( D_{{is_{i,j} }} \) represents the Euclidean distance between the ith solution and its nearest neighboring jth solution, and \( D_{{is_{{i,{\text{best}}}} }} \) is the Euclidean distance between the ith solution and the best solution. Also, \( \delta \left( {a,b} \right) \) is a random number generated within the interval [a, b] (Neggaz et al. 2020).
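A hedged sketch of Eq. 9, where dist_nearest and dist_best are the two Euclidean distances defined above and the uniform draws play the role of δ(a, b):

```python
import numpy as np

def disruption_factor(dist_nearest, dist_best, rng=None):
    """Eq. 9: strong scatter when the best solution is far, tiny perturbation when near."""
    rng = np.random.default_rng() if rng is None else rng
    if dist_best >= 1:
        return dist_nearest * rng.uniform(-2, 2)
    return 1 + dist_best * rng.uniform(-1e-4 / 2, 1e-4 / 2)
```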

4 Proposed chaotic cuckoo optimization algorithm with levy flight and disruption operator for feature selection

In this section, the proposed wrapper-based chaotic cuckoo optimization algorithm with levy flight and disruption operator (CCOALFDO) is introduced as a feature selection method. In the basic cuckoo optimization algorithm, the cuckoos move in the search space and can modify their positions to any point in the space, i.e., the continuous space. Given the nature of feature selection problems, the solutions are limited to the binary values {0, 1}.

With Npop cuckoos, one can build an Npop × Nvar matrix. In the basic algorithm, the cuckoos select their first egg-laying habitats randomly; in the proposed algorithm, these habitats are built with a chaotic function, which speeds up the convergence of the algorithm.

Searching for the optimal solution in a chaotic way among the local minima, using ergodicity, regularity and semi-stochastic properties, leads with high probability to the global optimal solution. The chaotic variables used in the CCOALFDO algorithm are generated by the Gauss/mouse map described in Table 1. In general, the chaotic variables are assumed to be generated by an arbitrary one-dimensional map whose values either already lie in the range (0, 1) or are scaled into this range.

The initial value of the Gauss/mouse chaotic function is a random number, and the subsequent values of the sequence are calculated by Eq. 10:

$$ \begin{aligned} & x_{0} = {\text{rand}} \\ & x_{i + 1} = \left\{ {\begin{array}{*{20}c} 1 & {x_{i} = 0} \\ {\frac{1}{{\bmod \left( {x_{i} ,1} \right)}}} & {\text{otherwise}} \\ \end{array} } \right. \\ \end{aligned} $$
(10)
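A sketch of the chaotic initialisation is shown below. Eq. 10 is implemented here as the fractional part of 1/x (the conventional Gauss/mouse form, which keeps the sequence inside (0, 1)); this reading, and the Npop × Nvar matrix shape, are interpretations of the text rather than the authors' exact code:

```python
import numpy as np

def gauss_mouse_sequence(length, rng=None):
    """x_0 = rand; x_{i+1} = 1 if x_i = 0, else frac(1/x_i)  (interpretation of Eq. 10)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random()
    seq = np.empty(length)
    for i in range(length):
        x = 1.0 if x == 0 else (1.0 / x) % 1.0
        seq[i] = x
    return seq

# chaotic starting positions for Npop cuckoos over Nvar features (illustrative sizes)
n_pop, n_var = 50, 100
init_positions = gauss_mouse_sequence(n_pop * n_var).reshape(n_pop, n_var)
```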

In order to find a habitat for the next generation, cuckoos search the surrounding area for the best one.

Although many different evolutionary variants exist in the literature, the cuckoo optimization algorithm still suffers from premature convergence and from generating inefficient results. The levy flight method is used to solve these problems and enable the CCOALFDO algorithm to generate more efficient results, while the disruption operator preserves the population diversity of the habitats. In this way, the COA, which by itself cannot perform a global search well, searches more effectively and is not trapped in local minima. Thus, random walks drawn from the levy stable distribution are used in the proposed method to define the steps. The step size is calculated by Eq. 11, where the factor 0.01 is a typical length scale used to avoid exaggerating the step size and making the solutions jump out of the design domain.

$$ {\text{Step}}\;{\text{size}} = 0.01 \times S $$
(11)

Eq. 12 presents the global random search performed with the concept of levy flights, in which \( x_{i}^{\left( t \right)} \) is the current state of cuckoo i in its search for a habitat after t iterations. A levy flight is applied to generate the new state \( x_{i}^{{\left( {t + 1} \right)}} \) from \( x_{i}^{\left( t \right)} \), as given in Eq. 12, where the ELR is described in Eq. 3.

$$ x_{\text{i}}^{{\left( {{\text{t}} + 1} \right)}} = x_{\text{i}}^{{\left( {\text{t}} \right)}} + {\text{ELR}} \times {\text{levy}}\left( {\text{s}} \right) $$
(12)

For every habitat, the OBL strategy in D-dimensional space is calculated as in Eq. 8, which doubles the number of habitats. The disruption operator is then applied to each habitat and its opposite; as mentioned earlier, Dop is calculated by Eq. 9. This ensures that the answers are not too close together and keeps the balance between the exploration and exploitation processes. The inclusion of OBL and Dop increases the diversity of the habitats.

Since the continuous feature space must be transformed into the corresponding binary space, the new position of a habitat is obtained from the following formula, in which \( y^{j,t+1} \) refers to the cuckoo with index j after t iterations:

$$ y^{{{\text{j}},{\text{t}} + 1}} = {\text{f}}\left( x \right) = \left\{ {\begin{array}{*{20}c} 1 & {{\text{if}}\left( {w\left( {x^{{{\text{j}},{\text{t}} + 1}} } \right)} \right) \ge {\text{rand}}} \\ 0 & { {\text{otherwise}}} \\ \end{array} } \right. $$
(13)

\( w\left( {x^{j,t + 1} } \right) \) is calculated by Eq. 14.

$$ w\left( a \right) = \frac{1}{{1 + {\text{e}}^{{10\left( {{\text{a}} - 0.5} \right)}} }} $$
(14)

To solve feature selection problems, the continuous (free) positions must be transformed into their corresponding binary solutions.
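A compact sketch of this binarisation (Eqs. 13–14): each continuous coordinate is squashed by w(a) = 1/(1 + e^{10(a − 0.5)}) and compared with a uniform random number:

```python
import numpy as np

def binarize(position, rng=None):
    """Eqs. 13-14: map a continuous habitat to a 0/1 feature-selection vector."""
    rng = np.random.default_rng() if rng is None else rng
    w = 1.0 / (1.0 + np.exp(10.0 * (position - 0.5)))     # Eq. 14
    return (w >= rng.random(position.shape)).astype(int)  # Eq. 13

print(binarize(np.array([0.1, 0.9, 0.5, 0.3])))
```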

For example, if the value of a bit equals 1, its corresponding feature is selected in the feature subset, while 0 indicates that it is not selected (Fig. 1); in the example of Fig. 1, the second and fourth features are selected for the feature subset. The fitness function is described in Eq. 15:

$$ {\text{fitness}}_{t} = { \hbox{max} }\left( {\left( {1 - b} \right) \times {\text{ACC}} + b \times \left( {1 - \frac{m}{n}} \right)} \right) $$
(15)

In Eq. 15, n is the total number of features, m is the length of the selected feature subset, and b is a parameter that balances the classification accuracy against the feature reduction; it can take values in \( \left[ {0,1} \right] \) and is set to b = 0.2 in the experiments. The ACC parameter is the classification accuracy rate obtained by calling a fast classifier such as KNN to evaluate the feature subsets; the accuracy is based on the Euclidean distance measure and the K parameter equals 5. ACC is described in Eq. 16:

Fig. 1
figure 1

Example of selected features in each sample

$$ {\text{ACC}} = {\text{correct}}\;{\text{classification}}\;{\text{rate}}\;{\text{with}}\;K\;{\text{Neighbors}}\;{\text{Classifier}}\;\left( {N\_{\text{neighbors}} = 5} \right) $$
(16)
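A hedged sketch of the fitness evaluation in Eqs. 15–16 using scikit-learn's 5-nearest-neighbour classifier with the Euclidean metric; b = 0.2 follows the text, while the train/validation arrays and all names are illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X_train, y_train, X_val, y_val, b=0.2):
    """Eq. 15: trade KNN accuracy (Eq. 16) against the selected-feature ratio m/n."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:                      # an empty subset cannot be evaluated
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
    knn.fit(X_train[:, selected], y_train)
    acc = knn.score(X_val[:, selected], y_val)  # ACC of Eq. 16
    m, n = selected.size, mask.size
    return (1 - b) * acc + b * (1 - m / n)
```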

During every iteration, fitness values are assigned to the cuckoos' habitats in the search space. These positions are evaluated in each iteration, and the best position is selected as the best solution.

Figure 2 shows the framework of the CCOALFDO algorithm for problem optimization. In general, the proposed algorithm improves the behavior of the cuckoo optimization algorithm by using the chaotic map, levy flight, opposition-based learning and the disruption operator. Each of these operators has its own task. The chaotic map improves the starting positions in the first step. The levy flight operation drives the updating mechanism, since the cuckoo moves have the largest effect on the convergence of solutions toward the global solution. Opposition-based learning is an intelligent way of searching the opposite side of the space and enhancing the exploration power, whereas the disruption operator enhances the diversification of the whole population, which improves the exploration ability of CCOALFDO and the convergence to the optimal solution. Finally, a fast classifier such as KNN is called to evaluate each candidate solution and generate its fitness value based on the reduced dataset.

Fig. 2
figure 2

The framework of the proposed CCOALFDO method

In the CCOALFDO algorithm, two criteria are used to terminate the search:

  1. the maximum iteration (or generation) number;

  2. no significant improvement in the accuracy rate during the last 20 iterations.

figure a

The pseudo-code of CCOALFDO is shown in Algorithm 1. The algorithm starts by generating a random population of solutions with the chaotic formula; then the iterations start. The basic task in each iteration is the determination of the best solution and of the coefficients for the next step. In the following phase, the solutions in the population are updated using levy flight, and the opposite position of each nest is calculated as a new nest. The Dop formula is then applied to all habitats and their opposites to place them in more suitable positions. This process is repeated until a satisfying stopping condition is reached. Finally, the algorithm returns the best solution.
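To make the flow of Algorithm 1 concrete, the fragment below sketches one possible main loop; it assumes the helper sketches from the previous sections (gauss_mouse_sequence, levy_step, egg_laying_radius, opposite_habitat, disruption_factor, binarize, fitness) are in scope, and the control parameters, equal egg shares and elitist survivor selection are illustrative simplifications rather than the authors' exact settings:

```python
import numpy as np

def ccoalfdo(X_train, y_train, X_val, y_val,
             n_pop=50, max_iter=100, alpha=2.0, patience=20, rng=None):
    """Illustrative CCOALFDO loop built from the hedged helper sketches above."""
    rng = np.random.default_rng() if rng is None else rng
    n_var = X_train.shape[1]

    # 1. chaotic initialisation of habitats in [0, 1]^n_var (Eq. 10)
    pop = gauss_mouse_sequence(n_pop * n_var, rng).reshape(n_pop, n_var)

    def evaluate(h):
        return fitness(binarize(h, rng), X_train, y_train, X_val, y_val)

    fit = np.array([evaluate(h) for h in pop])
    best = pop[int(np.argmax(fit))].copy()
    best_fit, stall = fit.max(), 0

    for _ in range(max_iter):
        # 2. levy-flight move (Eqs. 3, 11, 12); equal egg shares assumed
        elr = egg_laying_radius(alpha, 1, n_pop, 1.0, 0.0)
        steps = np.array([levy_step(n_var, rng=rng) for _ in range(n_pop)])
        pop = np.clip(pop + elr * 0.01 * steps, 0.0, 1.0)

        # 3. opposition-based learning doubles the population (Eq. 8)
        pop = np.vstack([pop, opposite_habitat(pop, 0.0, 1.0)])

        # 4. disruption operator perturbs every habitat (Eq. 9)
        for i in range(len(pop)):
            d_best = np.linalg.norm(pop[i] - best)
            d_near = np.linalg.norm(np.delete(pop, i, axis=0) - pop[i], axis=1).min()
            pop[i] = np.clip(pop[i] * disruption_factor(d_near, d_best, rng), 0.0, 1.0)

        # 5. keep the n_pop fittest habitats and track the best solution
        fit = np.array([evaluate(h) for h in pop])
        keep = np.argsort(fit)[-n_pop:]
        pop, fit = pop[keep], fit[keep]
        if fit.max() > best_fit:
            best, best_fit, stall = pop[int(np.argmax(fit))].copy(), fit.max(), 0
        else:
            stall += 1
            if stall >= patience:        # second stopping criterion
                break
    return binarize(best, rng), best_fit
```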

The computational complexity of the proposed algorithm depends on several components, namely the cuckoo optimization algorithm (COA), the chaotic map (CM), the levy flight (LF) and the disruption operator (DO). Therefore, the complexity of the CCOALFDO algorithm is calculated by Eq. 17.

$$ \theta ({\text{CCOALFDO}}) = K_{s} \theta \left( {\text{COA}} \right) + N \times \theta \left( {\text{CM}} \right) + \left( {N - k_{s} } \right)\theta \left( {\text{LF}} \right) + \theta \left( {\text{DO}} \right) $$
(17)

where \( \theta \left( {\text{COA}} \right) \) is calculated by Eq. 18.

$$ \theta \left( {\text{COA}} \right) = \theta \left( {t\left( {{\text{Dim}} \times N + C \times N + N{ \log }N} \right)} \right) $$
(18)

and \( \theta \left( {\text{CM}} \right) \) is calculated by Eq. 19.

$$ \theta \left( {\text{CM}} \right) = \theta \left( {t \times N} \right) $$
(19)

and \( \theta \left( {\text{LF}} \right) \) is described as Eq. 20.

$$ \theta \left( {\text{LF}} \right) = \theta \left( {t\left( {{\text{Dim}} \times N + C \times N} \right)} \right) $$
(20)

And finally, \( \theta \left( {\text{DO}} \right) \) is calculated by Eq. 21.

$$ \theta \left( {\text{DO}} \right) = \theta \left( {t \times N} \right) $$
(21)

In Eqs. 17 to 21, t represents the number of iterations, Dim is the number of features, N represents the number of feature subsets, Ks represents the number of habitats updated in every iteration of the CCOALFDO algorithm, and C is the cost of the objective function.

5 Experimental results

In this section, we perform experiments on various high-dimensional datasets to compare the proposed CCOALFDO method with five other feature selection methods. First, in Sect. 5.1, the applied datasets are introduced concisely. Then, in Sect. 5.2, the experimental setup and parameters are discussed. Finally, in Sect. 5.3, the performance of the proposed method is evaluated against other relevant methods.

5.1 Datasets

The results presented in the experiments are based on 15 real-world datasets from the UCI machine learning repository (Lichman 2013) and 5 medical datasets (Statnikov et al. 2005), which are listed in Table 2 and cover different sizes and dimensions. Table 2 gives the number of samples, features and classes for each dataset. In all datasets, missing values are replaced with the median of the corresponding feature.

Table 2 Experimental datasets (Lichman 2013; Statnikov et al. 2005)

5.2 Experimental setup and parameter settings

The experiments were conducted on an Intel Core i3 platform with 4 GB RAM and a 2.13 GHz CPU. The programming environment is MATLAB R2014a under the Windows 10 operating system.

To compare against the CCOALFDO algorithm, we used several other evolutionary algorithms: the whale optimization algorithm (WOA) (Mafarja and Mirjalili 2018), the gray wolf optimizer (GWO) (Emary et al. 2016), particle swarm optimization (PSO) (Eberchart and Kennedy 1995), the crow search algorithm (CSA) (Sayed et al. 2019) and the cuckoo optimization algorithm (COA) (Rajabioun 2011). Table 3 shows the parameter settings of the compared algorithms.

Table 3 State-of-the-art metaheuristic algorithm parameter settings

In k-fold cross-validation, K − 1 folds are used for training and validation, while the last fold is used for testing. The performance of the proposed method is confirmed by applying fivefold cross-validation, which splits the dataset into five equal portions.

The training part is used to train the applied classifier during the optimization and at the final evaluation, the validation part assesses the performance of the classifier during training, and the testing part evaluates the finally selected features with the trained classifier. Based on fivefold cross-validation, 80% of the training partition is used for training, while 20% is used for fitness-function validation.
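A small sketch of this protocol with scikit-learn: stratified fivefold cross-validation holds one fold out for testing, and the remaining training partition is split 80/20 into training and fitness-validation parts (the data arrays here are placeholders, not a paper dataset):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

X = np.random.rand(200, 50)                 # placeholder data
y = np.random.randint(0, 2, 200)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    X_trval, X_test = X[train_idx], X[test_idx]
    y_trval, y_test = y[train_idx], y[test_idx]
    # 80% of the training partition trains the classifier, 20% validates the fitness
    X_train, X_val, y_train, y_val = train_test_split(
        X_trval, y_trval, test_size=0.2, stratify=y_trval, random_state=0)
```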

5.3 Result interpretation

In this section, we report and analyze the experimental results of the proposed CCOALFDO method and the compared feature selection methods. To provide a fair comparison, each method produces 10 solutions and its average percentages are reported against the other methods. All experiments use the same parameters for initialization and evolution. For the CCOALFDO algorithm, the population size is 50 and the number of iterations is 100. It is noteworthy that all experiments were repeated for 30 independent runs to obtain meaningful results.

The accuracy rate (ACC) is obtained by using the KNN algorithm for classification and dividing the number of correctly classified samples by the overall number of samples. In the KNN algorithm, the K parameter equals 5. The optimal result is the one with the maximum accuracy rate and the minimum number of features.

Table 4 presents the results of the proposed algorithm on each dataset. SF is the percentage of selected features, and ACC is the accuracy rate in percent obtained with threefold cross-validation.

Table 4 Experiments with chaotic maps on different datasets

According to Table 4, the Gauss/mouse map obtains similar or even higher accuracy rates than the other maps on all datasets, except for insignificantly lower accuracy rates on the D3, D5, D7, D8, D13 and D19 datasets. The results are also shown graphically in Fig. 3, where the red bar is usually higher than the other colors.

Fig. 3
figure 3

Comparison of the classification accuracy rate of 10 chaotic maps in 20 datasets

The Gauss/mouse map results of CCOALFDO are not much higher than those of the other maps; nevertheless, this map is used when comparing the CCOALFDO algorithm with the other five algorithms on each dataset. Figure 4 depicts the convergence of the proposed method with the Gauss/mouse map. As seen in Fig. 4, larger datasets such as D2, D6, D12 and D19 need more than 100 iterations to increase the accuracy rate, and with more iterations they might reach a better accuracy rate. For the smallest datasets, such as D7, D9 and D10, more iterations are not needed, since the curves become flat after about 50 iterations.

Fig. 4
figure 4

Convergence curve for CCOALFDO with Gauss/mouse map on 20 datasets

Table 5 presents the accuracy and selected-feature rate results. Because the random behavior of the cuckoos is limited, the proposed algorithm converges faster on most datasets.

Table 5 Accuracy and selected feature rate comparing result

According to Table 6, considering the average scores over the datasets, the CCOALFDO algorithm shows the best overall performance in accuracy rate. The results are shown graphically in Fig. 5. Comparing the accuracy rates of the different algorithms, the proposed algorithm is more accurate than the other algorithms on most datasets. In terms of ACC, the proposed CCOALFDO method achieves the best performance on the D2, D3, D4, D5, D6, D8, D11, D12, D13, D15, D16, D17, D18, D19 and D20 datasets. On the remaining datasets it does not achieve the best accuracy rate, which may be due to their large number of classes. Figure 5 shows the boxplots of the classification accuracy rate for CCOALFDO compared with the 5 other evolutionary algorithms, namely WOA, GWO, PSO, CSA and COA, which were implemented and tested on the twenty datasets in the same environment.

Table 6 Accuracy rate and standard deviation comparing result
Fig. 5
figure 5

Accuracy rate classification comparison between the CCOALFDO algorithm and five other evolutionary algorithms

The Wilcoxon statistical test is also used to examine the proposed method more precisely. It is a non-parametric statistical test applied to determine the statistical significance of differences between algorithms (Statnikov et al. 2005). The Wilcoxon test is conducted at the 5% significance level to verify whether there is a statistical difference between the obtained accuracy rates; in other words, a p value < 0.05 indicates that the proposed approach differs significantly.
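For instance, two vectors of per-run accuracy rates can be compared with SciPy's Wilcoxon rank-sum test (the arrays below are placeholders, not the reported results):

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
acc_ccoalfdo = rng.normal(0.90, 0.01, 30)   # 30 independent runs (placeholder values)
acc_baseline = rng.normal(0.87, 0.02, 30)

stat, p_value = ranksums(acc_ccoalfdo, acc_baseline)
print(f"p = {p_value:.4f}")                 # p < 0.05 -> statistically significant difference
```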

According to the Wilcoxon test results, the outcome of the proposed method is meaningfully distinguishable from the others: considering accuracy, the CCOALFDO method performs significantly better than the competing methods.

Table 7 compares the results of CCOALFDO with WOA, GWO, PSO, CSA and COA, where bold values represent p values ≥ 0.05. According to the results, the proposed algorithm shows significant improvements over the other methods.

Table 7 p-value of the Wilcoxon rank sum test for the classification accuracy rate based on CCOALFDO and five other evolutionary algorithms

The comparison between the accuracy rate of CCOALFDO and the five other evolutionary algorithms is presented in Figs. 5 and 6. These boxplots report the accuracy rates of 30 independent runs for CCOALFDO and the five other evolutionary algorithms. The boxplots demonstrate the superiority of the proposed algorithm over the other algorithms: it produces more compact boxes with a higher median than all of them.

Fig. 6
figure 6

Boxplots of CCOALFDO compared to 5 other evolutionary algorithms based on accuracy rate classification

As Table 8 and Fig. 7 show, CCOALFDO is the second-best method with respect to the number of selected features. On the D2, D6, D12, D15, D17, D19 and D20 datasets, it obtains a significantly better SF rank while also obtaining the best ACC rank in Table 6. The results show that the proposed method reduces features better on larger datasets. To evaluate the quality of each algorithm, the standard deviation is also measured. CCOALFDO achieves the smallest standard deviation on most datasets, which demonstrates the robustness of the algorithm and establishes it as the most stable method in terms of feature selection quality.

Table 8 Selected features rate comparing results
Fig. 7
figure 7

Comparison selected feature rates of the CCOALFDO algorithm with five other evolutionary algorithms

Considering the p-values in Table 9, the proposed CCOALFDO significantly outperforms the others in most cases. The Wilcoxon test shows the superiority of the proposed method over the WOA, PSO, CSA and COA algorithms. The results also indicate that the GWO algorithm obtained higher p-values than the others and achieved a better rank than the CCOALFDO algorithm in feature reduction; however, its accuracy rate was lower.

Table 9 p-value of the Wilcoxon rank sum test for the selected features rate based on CCOALFDO and five other evolutionary algorithms

According to Tables 6 and 8, the superiority of the proposed method over the others in terms of classification accuracy rate and feature reduction rate is clear. On all 20 real-world datasets, the CCOALFDO algorithm achieved the best rank in accuracy rate. Several factors create this advantage. The first is the chaotic function, which enhances the initial positions of the cuckoos, helps maintain the population, and improves the search ability of the levy flight for finding the best solution. The second is the DO operator, which avoids getting trapped in local optima. Finally, OBL improves the exploration ability, which guarantees the diversity of the population. As a result, the convergence performance improves, the chance of being trapped in local minima decreases, and the quality of the selected features is guaranteed.

Table 10 lists the average CPU time, measured in seconds per run, obtained with the original feature set and with the selected feature set. In all cases, the CCOALFDO algorithm deletes the redundant and noisy features, so the classifier achieves better time performance while keeping better classification performance.

Table 10 Comparing runtime and accuracy rates between selected feature set and original feature set

Table 11 indicates the average CPU time, measured in seconds per run, taken by the CCOALFDO algorithm and the five other evolutionary algorithms. This measure indicates the speed of the algorithms in selecting features from a given dataset. Here, GWO is the fastest algorithm, while the proposed algorithm ranks fourth. Because of the additional search steps in the CCOALFDO algorithm, it needs more computation time, but this is still reasonable given its high accuracy rate and good feature selection rate. Although the CCOALFDO algorithm needs further improvement regarding time complexity, this can be addressed by applying opposition-based learning and the disruption operator to only a small part of the cuckoos or under a specific condition. Figure 8 presents these results graphically.

Table 11 Comparison between CCOALFDO and other metaheuristics based on time (s)
Fig. 8
figure 8

Comparison of the computation time of the CCOALFDO algorithm with five other evolutionary algorithms

6 Conclusion and future works

Feature selection is an ever-evolving frontier in data mining and machine learning: the more machine learning develops, the broader the scope of feature selection research becomes. In this study, a new feature selection algorithm based on the cuckoo optimization algorithm is presented. The proposed CCOALFDO algorithm turns the random variables of the optimizer into chaotic behavior. Moreover, levy flight is used to handle uncertainty and update the cuckoos' steps in high-dimensional space. In addition, the opposition-based learning technique and the disruption operator are used to improve the exploration ability and ensure the diversity of the population, thereby speeding up the convergence rate. The proposed method was evaluated on 20 datasets, and the results were compared with the WOA, GWO, PSO, CSA and COA algorithms. The results reveal that the presented method is able to exploit the information structure of a large number of features and accordingly enhance the accuracy rate of classification tasks. By controlling the random behavior of the algorithm, a better feature set is selected for each dataset, and the classification accuracy rate improves because redundant features are eliminated. In other words, the CCOALFDO algorithm seeks a compromise between two objectives: maximizing the classification accuracy rate and reducing the number of features. The classification time is also reduced after decreasing the number of features and choosing a suitable feature set, while providing a more accurate classification. Unlike other evolutionary algorithms, applying levy flight, opposition-based learning and the disruption operator to the whole population increases the solution diversity; however, it also increases the computational time, an issue that can be handled in future work. Future studies can also focus on other measures, such as precision or G-mean, on the use of CCOALFDO for feature selection in online applications, and on evaluating CCOALFDO on further real-world problems to confirm its stability.