Abstract
Feature Selection (FS) is a pre-processing step in most big data processing applications. Its purpose is to remove inconsequential and redundant features from data to determine a final set of data properties that best describe the data as a whole. The FS process is an NP-hard problem. It tries to determine the optimal subset, i.e., produces all conceivable solutions to acquire only the best. In the last few years, metaheuristic algorithms (MAs) have been coined as an ideal solution for FS problems, particularly in high-dimensional data cases. This work is an extension of our previous effort in finding an effective solution to FS problems by applying a recently developed metaheuristic algorithm called the Black Widow Optimization (BWO) algorithm. We combine our previous algorithm, the Binary Black Widow Algorithm (BBWO), with a Hill-Climbing Algorithm to solve the slow convergence problem of the BBWO. The newly developed algorithm, BBWO-HCA, is tested using 28 UCI datasets and compared with six well-regarded algorithms in the domain. The test results show that the BBWO-HCA outperforms our previous BBWO solution and almost all comparable solutions tested.
Supported by Wilfrid Laurier University research fund.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
- Feature Selection
- Evolutionary Algorithm
- Metaheuristic Algorithm
- Classification
- Machine Learning
- Data Mining
1 Introduction
It has become a challenge for researchers and developers to cope with the explosive growth of available data, the dimensions of which are expanding daily. Feature selection (FS) is a pre-processing step in most big data processing and machine learning applications, particularly in data mining applications. It is used to remove noisy and inconsequential features to determine a subset of features that best represent and portray the data, thus, boosting the quality of the data obtained.
Classical search approaches, such as random search and complete search have been used to solve FS problems [17]. While these methods ensure the optimal solution for small datasets, their execution is impractical for large datasets. FS is an NP-hard problem, it tries to determine the optimal subset. For example, if a dataset contains n features, then \(2^n\) solutions must be formulated and assessed, i.e., the problem complexity is \(O(\infty )\). It also requires an enormous amount of computational power and an excessive amount of time.
In the last few years, metaheuristic algorithms (MAs) have been identified as an ideal solution to FS problems, particularly in cases involving high-dimensional data. Researchers used the Simulated Annealing algorithm, Ant Colony Optimization algorithm, Particle Swarm Optimization algorithm, Genetic Algorithm, etc. to solve FS problems and have obtained valuable results. For example, see [1, 4, 5, 8, 11, 12, 18].
MAs are the most appropriate alternative method for addressing the limitations of lengthy, far-reaching searches that entail high computational cost. Despite some desirable results, however, most MAs are impeded by the limitations imposed by a local optimum and a disproportion between the explorative and exploitative scope of the algorithm. Moreover, each dataset has a different number of features, and no single method is the most appropriate for FS problems, i.e., one can still find room for improvements. These observations motivated this work to look for means to overcome the limitations described and develop a novel FS solution.
We selected a recent algorithm, the Black Widow Optimization algorithm (BWO) [9], to study FS problems due to its success in optimizing engineering design problems. BWO is a nature-inspired algorithm that mimics the black widow’s life cycle. It is inspired by the singular mating behaviour exhibited by the black widow spider, a process that includes an exclusive stage called cannibalism. The BWO approach is designed to deliver rapid convergence and to avoid local optima, and, because BWO maintains equilibrium between the exploration and exploitation stages [9], a property that most MAs applied to FS problems are lacking [14], the BWO is particularly appropriate for solving several kinds of optimization problems that involve a number of local optima.
This work is an extension of our previous effort in trying to find an effective solution to FS problems by applying the BWO algorithm. In our previous effort, we modified the BWO algorithm to solve feature selection problems and developed the Binary Black Widow Optimization (BBWO) algorithm [3].
Despite the competitive results of the BBWO, its performance can be further improved by enhancing the slow convergence caused by the use of a population of solutions and a lack of local exploitation. In this article, we describe an improved version of the BBWO. We combined the BBWO with the Hill-Climbing Algorithm (HCA). The newly developed algorithm, the BBWO-HCA, is tested using 28 UCI datasets and compared with six well-regarded algorithms in the domain. The algorithms are Binary particle swarm optimization (BPSO) [10], Binary multi-verse optimization algorithm (BMWO) [2], Binary grey wolf optimizer algorithm (BGWO) [1, 7], Binary moth-flame optimization algorithm (BMFO) [19], Binary whale optimization algorithm (BWOA) [11], and Binary bat algorithm (BBAT) [13].
The rest of this article is organized as follows: In Sect. 2, the proposed algorithm is presented. In Sect. 3, the experiment setup, test results, and result discussion are presented. Finally, in Sect. 4, the research is concluded and some future works are identified.
2 BBWO-HCA Feature Selection Algorithm
The results presented in [3] show that the BBWO produces impressive results and, in some cases, is competitive with the best-known algorithms. The results also reveal that the BBWO performance can be further improved by enhancing the slow convergence due to the use of a population of solutions and a lack of local exploitation. The BBWO-HCA aims to increase the exploitation process of the BBWO by incorporating it with a local metaheuristic algorithm based on the Hill-Climbing Algorithm (HCA). HCA is a well-known local search algorithm. It has been tested on various problems and has shown to be an effective and efficient method that can produce sound results [6]. The BBWO-HCA’s main steps, selection, procreation, and mutation, are described in the subsections below.
2.1 Solutions Representation
In the BBWO-HCA, each solution represents a single black widow. All possible solutions to all FS problems are envisioned in terms of the attributes of the black widow spider. In programming terms, this is equivalent to saying each spider is represented by a class and spider attributes are class instance variables, or each spider is an array and spider attributes are array values. The spider population is modeled as an \(N_{var}\) dimensional array, i.e., an array of spider objects, and the FS problem becomes an \(N_{var}\) dimensional optimization problem.
The BBWO-HCA algorithm uses binary values to represent a population of solutions \((N_{pop})\). In binary representation, a solution is shown by a one-dimensional array. The length of the array varies in accordance with the feature number of the original dataset. For example, if S features are contained in the dataset, the solution length is S. The cell value in the array will be ‘1’ or ‘0’. The value ‘1’ indicates that the corresponding feature is selected, whereas ‘0’ indicates that the feature is not selected. In general, when the number of features is \(N_f\) and the population size is \(\mid N_{pop}\mid \), the array size of the problem will be \(N_f\) \(\times \) \(\mid N_{pop} \mid \).
2.2 Initialization
The population of solutions offered by the BBWO-HCA is randomly generated by assigning a value of either “0” or “1” to each cell of the solution. The process begins by initializing the population size and the number of features. The algorithm then arbitrarily assigns either ‘0’ or ‘1’ by looping through each solution in the population. This process is repeated until all solutions in the population have been initialized.
2.3 Fitness Function and Evaluation
Each solution is evaluated according to a fitness function. The function employed is shown in Eq. 1. A similar function is used by [1, 15]. The k-nearest neighbour algorithm (KNN) [16] is used in the solution evaluation, i.e., the KNN classifier determines the accuracy of the solution.
In Eq. 1, \(\gamma _{R}(D)\) represents the classification error rate of the KNN classifier, \(\mid R \mid \) is the cardinality of the selected subset, \(\mid C \mid \) is the total number of the original features in the dataset, and \(\alpha , \beta \) are two weight parameters corresponding to the importance of classification quality and subset length, \(\alpha \in [0,1]\) and \(\beta = (1-\alpha )\). A similar approach is adapted by [7, 11, 19].
After initializing the population of solutions, we assign to each solution (widow) a fitness value, which represents the quality of the solution. The fitness value of each solution is calculated using the fitness function and is evaluated using the KNN classifier. This is because the BBWO-HCA is a wrapper-based FS approach.
2.4 Transformation Function
The positions of the search agents generated from the standard BWO are continuous values. This cannot be directly applied to our problem because it contradicts the binary nature of the FS on selection or non-selection (0 or 1).
The sigmoidal function in 2 and 3, which is considered a form of the transformation function, is used in our proposed method as a part of the reproduction process to convert any continuous values to binary equivalents. The performance of the transformation function has been investigated and adopted by many researchers, e.g., [1, 7, 19].
where each of \(z_{s_{w}}\) is a continuous value (feature) in the search agent for the S-shaped function, specifically in the solution w at dimension d (w = 1,...,d), and is a random number drawn from the uniform distribution \(\in \) [0,1]. The \(z_{binary}\) value can be 0 or 1 depending on the value of \( rand\) compared to the values of \(z_{s_{w}}\), where e is a mathematical constant known as Euler’s number.
2.5 Reproduction Process
BWO is inspired by Darwin’s natural selection theory, which is defined as generational descent accompanied by modification where species are subtly adjusted over time and new species arise as a result.
In the BBWO-BCA algorithm, the procreation process begins and parents (in pairs) are selected randomly to perform the procreating steps by mating to bring forth the new generation. An array known as alpha will be generated to complete further reproduction. Offspring \(c_1\) and \(c_2\) will be produced by taking \(\alpha \) with the following equation in which \(w_1\) and \(w_2\) are parents.
2.6 Cannibalism Process
The BWO includes an exclusive stage, cannibalism. Cannibalism can be classified into three kinds: sexual cannibalism where the husband gets eaten by the female black widow during or after mating, sibling-cannibalism where the weaker siblings are eaten by the stronger siblings, and mother cannibalism where the mother is eaten by her strongest child. The BBWO-HCA uses this concept of cannibalism and determines the weak or strong spiders by calculating and evaluating their fitness values. The best solutions (surviving spiders) from the reproduction process will be selected and stored in population two, i.e., \(pop_2\).
2.7 Mutation Process and New Population Generation
The procedure of mutations begins by randomly selecting a number of solutions (widows) from the \(pop_1\) which will be mutated individually. Two cells from each selected solution are randomly exchanged, and the new mutation solutions will be kept in \(pop_3\). The new generation can finally be generated as a combination of \(pop_2\) and \(pop_3\), which will then be evaluated to return the optimal solution \((W^{*})\) of values bearing the N dimension.
In the BBWO-HCA, the cannibalism rate \((C_R)\), the procreation rate \((P_r)\), and the mutation rate \((M_r)\) are used as parameters. The value of the \((C_R)\) is determined by the fitness values obtained by Eq. 1, and the \(P_r\) and \(M_r\) are identical to those of the standard BWO.
2.8 HCA Steps
The algorithm uses the best solution \((W^{*})\) of the BBWO as an initial solution for the HCA. The solution is modified by selecting one feature randomly and flipping the value of that feature, i.e., if the feature value is “0” it is changed to “1” (which indicates adding one feature), and if the value is “1” it is changed to “0” (which indicates deleting one feature). If the fitness value of the modified solution is improved, it will replace the old one, otherwise, it discards the new solution.
Next, the HCA iteration counter and BBWO best solution \((W^{*})\) are updated, and the stopping criteria of the BBWO is checked. If the BBWO stopping condition is met, i.e., the max iterations are reached, the algorithm stops and returns the best solution \((W^{*})\), otherwise, a new iteration for the BBWO starts.
The pseudocode for the BBWO is shown in Fig. 1 and the additional steps involved in implementing the HCA are shown in Fig. 2. Together, they form the pseudocode for the BBWO-HCA.
3 Experiment Setup and Results
28 well-known datasets from the University of California Irvine (UCI)Footnote 1 machine learning repository have been used to investigate the performance and strength of our proposed methods. The dataset is randomly split into 80% for the training set and 20% for the test set. These rates are widely accepted data partition rates. The datasets vary in the number of features and instances. Table 1 presents a brief description of the datasets. Each row in the table represents the number of features, objects, classes, and the domain to which each of these datasets belong.
The performance of our proposed method, the BBWO-HCA, is compared with six well-respected binary FS algorithms: (BPSO [10], BMVO [2], BGWO [1, 7], BMFO [19], BWOA [11], BBAT [13]) based on the two evaluation criteria, classification accuracy and the number of features selected.
To ensure an impartial comparison and a correct evaluation between our proposed method and other FS algorithms, we re-implemented the six FS algorithms using the same parameters values as illustrated in Table 2 and the same transformation function as explained in Sect. 2.4. The algorithms are run independently multiple times and the average accuracy and the average number of features selected are reported.
3.1 BBWO-HCA vs. BBWO Results and Discussion 1
Table 3 shows the comparison between our two algorithms (BBWO-HCA and BBWO algorithms) based on the two evaluation criteria (the classification accuracy and feature selected). The best classification accuracy and the lower number of features selected are highlighted in bold.
The results show that the BBWO-HCA is more efficient than the BBWO in terms of maximizing classification accuracy. The BBWO-HCA outperforms the BBWO in 15 datasets and obtains the same results in 13 datasets in terms of classification accuracy. When considering the average accuracy for all datasets, the performance of the BBWO-HCA is better than the BBWO. This is shown in Fig. 3.
In terms of minimizing the total number of features selected, the results show that the BBWO-HCA obtains better results than the BBWO in 26 datasets, the same results in one dataset, and worse results in one dataset. The results also show that on average, the BBWO-HCA is more efficient than the BBWO in this regard. This is shown in Fig. 4.
3.2 BBWO-HCA vs. Six FS Algorithms, Results and Discussion 2
We compared the BBWO-HCA results with six FS algorithms (BPSO, BMVO, BGWO, BMFO, BWOA, BBAT). The results are presented in Tables 4 and 5. The test results reveal that the BBWO-HCA outperforms all six algorithms unless an algorithm already reached the best possible solution, in which case the BBWO-HCA results are the same as the other algorithm. For example, the BBWO-HCA produced better results than the BPSO in 26 datasets and the same results in two datasets.
In to the number of features selected, the test results reveal that the BBWO-HCA outperforms all six FS algorithms for all 28 datasets tested, see Table 5. These results are depicted pictorially in Figs. 5 and 6, and they show that the BBWO-HCA is an effective algorithm for solving FS problems.
4 Conclusion and Future Works
Recently, a novel algorithm, the Black Widow Algorithm (BWO), has been developed to solve optimization problems. BWO is derived from nature; it mimics the singular mating behaviour exhibited by the black widow spider.
Initially, we developed the BBWO algorithm based on the BWO for solving FS problems. In this work, we further improved the BBWO by combining it with the Hill-Climbing Algorithm. The newly developed algorithm, BBWO-HCA, is tested using 28 UCI datasets and compared with six well-regarded algorithms in the domain. The test results show that the BBWO-HCA outperforms the BBWO and almost all comparable algorithms for most datasets tested.
This work opened the door for further FS and optimization studies. Examples of such studies are:
-
The test results of the BBWO-HCA can be further analyzed to determine the impact of the dataset size, number of features in the dataset, number of instances, etc. on the performance of the algorithm.
-
Combining the BBWO with other algorithms and studying the outcomes of these new combinations are open future works.
-
The BBWO and BBWO-HCA can be applied to various other areas of study to solve many other real-world optimization problems such as text mining, clustering, image processing, and routing problems.
Notes
- 1.
The datasets can be downloaded here: https://archive.ics.uci.edu/ml/datasets.php.
References
Abdel-Basset, M., El-Shahat, D., El-henawy, I., de Albuquerque, V.H.C., Mirjalili, S.: A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Syst. Appl. 139, 112824 (2020)
Al-Madi, N., Faris, H., Mirjalili, S.: Binary multi-verse optimization algorithm for global optimization and discrete problems. Int. J. Mach. Learn. Cybern. 10(12), 3445–3465 (2019). https://doi.org/10.1007/s13042-019-00931-8
Al-Saedi, A., Mawlood-Yunis, A.R.: Binary black widow optimization algorithm for feature selection problems. In: Simos, D.E., Rasskazova, V.A., Archetti, F., Kotsireas, I.S., Pardalos, P.M. (eds.) Learning and Intelligent Optimization (LION 2022). LNCS, vol. 13621, pp. 93–107. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-24866-5_7
Alweshah, M., Alkhalaileh, S., Albashish, D., Mafarja, M., Bsoul, Q., Dorgham, O.: A hybrid mine blast algorithm for feature selection problems. Soft. Comput. 25(1), 517–534 (2020). https://doi.org/10.1007/s00500-020-05164-4
Arora, S., Anand, P.: Binary butterfly optimization approaches for feature selection. Expert Syst. Appl. 116, 147–160 (2019)
Dordaie, N., Navimipour, N.J.: A hybrid particle swarm optimization and hill climbing algorithm for task scheduling in the cloud environments. ICT Express 4(4), 199–202 (2018)
Emary, E., Zawbaa, H.M., Hassanien, A.E.: Binary grey wolf optimization approaches for feature selection. Neurocomputing 172, 371–381 (2016)
Hammouri, A.I., Mafarja, M., Al-Betar, M.A., Awadallah, M.A., Abu-Doush, I.: An improved dragonfly algorithm for feature selection. Knowl.-Based Syst. 203, 106131 (2020)
Hayyolalam, V., Kazem, A.A.P.: Black widow optimization algorithm: a novel meta-heuristic approach for solving engineering optimization problems. Eng. Appl. Artif. Intell. 87, 103249 (2020)
Mafarja, M., Jarrar, R., Ahmad, S., Abusnaina, A.A.: Feature selection using binary particle swarm optimization with time varying inertia weight strategies. In: Proceedings of the 2nd International Conference on Future Networks and Distributed Systems, pp. 1–9 (2018)
Mafarja, M., Mirjalili, S.: Whale optimization approaches for wrapper feature selection. Appl. Soft Comput. 62, 441–453 (2018)
Mirjalili, S.: The ant lion optimizer. Adv. Eng. Softw. 83, 80–98 (2015)
Mirjalili, S., Mirjalili, S.M., Yang, X.S.: Binary bat algorithm. Neural Comput. Appl. 25(3), 663–681 (2014)
Mostafa, R.R., Ewees, A.A., Ghoniem, R.M., Abualigah, L., Hashim, F.A.: Boosting chameleon swarm algorithm with consumption AEO operator for global optimization and feature selection. Knowl.-Based Syst. 246, 108743 (2022)
Neggaz, N., Houssein, E.H., Hussain, K.: An efficient henry gas solubility optimization for feature selection. Expert Syst. Appl. 152, 113364 (2020)
Syriopoulos, P.K., Kotsiantis, S.B., Vrahatis, M.N.: Survey on KNN methods in data science. In: Simos, D.E., Rasskazova, V.A., Archetti, F., Kotsireas, I.S., Pardalos, P.M. (eds.) Learning and Intelligent Optimization (LION 2022). LNCS, vol. 13621, pp. 379–393. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-24866-5_28
Venkatesh, B., Anuradha, J.: A review of feature selection and its methods. Cybern. Inf. Technol. 19(1), 3–26 (2020)
Yang, X.S.: A new metaheuristic bat-inspired algorithm. In: Gonzáilez, J.R., Pelta, D.A., Cruz, C., Terrazas, G., Krasnogor, N. (eds.) Nature Inspired Cooperative Strategies for Optimization (NICSO 2010). Studies in Computational Intelligence, vol. 284, pp. 65–74. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12538-6_6
Zawbaa, H.M., Emary, E., Parv, B., Sharawi, M.: Feature selection approach based on moth-flame optimization algorithm. In: 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 4612–4617 (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Al-saedi, A., Mawlood-Yunis, AR. (2023). Binary Black Widow with Hill Climbing Algorithm for Feature Selection. In: Dorronsoro, B., Chicano, F., Danoy, G., Talbi, EG. (eds) Optimization and Learning. OLA 2023. Communications in Computer and Information Science, vol 1824. Springer, Cham. https://doi.org/10.1007/978-3-031-34020-8_20
Download citation
DOI: https://doi.org/10.1007/978-3-031-34020-8_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34019-2
Online ISBN: 978-3-031-34020-8
eBook Packages: Computer ScienceComputer Science (R0)