1 Introduction

Any discussion of data mining must briefly touch on the broader concept of Knowledge Discovery in Databases (KDD), the process of extracting useful information from large-scale datasets. Knowledge discovery comprises four main stages: data warehousing, preprocessing, data mining, and evaluation [1, 2]. This process is essential in information acquisition, machine learning, pattern discovery, data visualization, databases, statistics, and artificial intelligence.

Data mining involves multiple tasks such as regression, classification, clustering, deviation and change detection, dependence modeling, and summarization [3]. For data mining to yield the best results, the data must first be prepared through reduction, transformation, normalization, discretization, integration, feature extraction, cleaning, and feature selection. Feature selection (also called variable or subset selection) is the focus of this study; it refers to the process of choosing relevant and essential features (features are also known as attributes, properties, characteristics, or dimensions). It discards irrelevant, redundant, and noisy features that would otherwise reduce classification accuracy and thus degrade both the algorithm's performance and the quality of its output. Consequently, a faulty feature selection result complicates the learning process and raises computational costs [4,5,6].

Feature selection can encounter many problems. The most significant and most commonly encountered is the curse of dimensionality, which arises when the number of features exceeds the number of samples, lowering accuracy and slowing learning. To address this problem, datasets must be reduced to a smaller set of attributes and samples that remain faithful to the original matrix while eliminating noise and redundancy, thereby enhancing discriminative power and classification performance; this process is called dimensionality reduction [2, 7]. Feature selection techniques are applied in several domains such as image processing [8], signal processing [9], pattern recognition [10], text clustering [11,12,13], and machine learning [14, 15].

Feature selection approaches are usually categorized into three broad groups, the wrapper, filter, and embedded models, depending on how the selection algorithm and the model-building process are combined. The wrapper model employs a learning algorithm to find and evaluate the optimal subset, and therefore yields predictors with better performance. Filter approaches are rankers: they rank attributes by their relevance to the output variable using an evaluator that is independent of any learning algorithm. A filter method has adequate generalization capacity and low computational cost, and it can handle large-scale datasets. The embedded approach combines the benefits of the wrapper and filter methods while trying to eliminate their disadvantages. It starts, like the filter method, by independently seeking an optimal subset, and then employs a linear classifier such as a Support Vector Machine (SVM) to refine that subset by finding locally correlated features with better local discrimination, producing a final optimal subset at a lower computational cost than the wrapper model [16].
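For illustration, the sketch below (not from the cited works; it assumes scikit-learn and NumPy, and the function names are our own) contrasts a filter-style ranking, which scores features independently of any classifier, with a wrapper-style evaluation, which scores a candidate subset by training a classifier on it.

```python
# Illustrative sketch: filter vs. wrapper evaluation of features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif   # filter: model-independent relevance score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def filter_rank(X, y, top_k):
    """Filter model: rank features by relevance to the target, independent of any classifier."""
    scores = mutual_info_classif(X, y)
    return np.argsort(scores)[::-1][:top_k]                  # indices of the top_k features

def wrapper_score(X, y, subset):
    """Wrapper model: evaluate a candidate subset with the learning algorithm itself."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, subset], y, cv=5).mean()   # cross-validated accuracy
```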

Feature Selection (FS) problems are real-world problems that affect classification accuracy and learning speed. Metaheuristic algorithms are high-level procedures designed to find a good enough solution within a search space. Such algorithms balance two conflicting criteria when determining the best solution: exploration and exploitation of the search space. Exploration searches the whole space for good enough solutions, whereas exploitation refines those solutions toward the optimum. The native Sine Cosine Algorithm suffers from a weak exploration strategy, which degrades its performance while searching the space. However, it can be enhanced or modified through hybridization, producing a new version of the metaheuristic that improves performance by balancing exploration and exploitation of the search space. This is the motivation behind our attempt to create a predictive model based on a hybridization approach for solving feature selection problems by reducing the number of features and removing weakly relevant and irrelevant ones; in practice, an optimal subset is likely to contain only strongly relevant features.

The objectives of the proposed feature selection approach are to reduce dimensionality and eliminate noise from the data, which increases learning speed, simplifies the induced rules, eases visualization of the data, and improves predictive accuracy. This study therefore aims to achieve maximal classification accuracy with a minimal number of features. It assesses the ability of hybridizing metaheuristic algorithms to create a new feature selection approach that solves the feature selection problem by improving search performance. To evaluate the new feature selection method, classification accuracy; best, worst, and mean fitness; Standard Deviation (Std); and the average number of selected features are used as evaluation criteria. The study is significant in that it addresses FS problems by building a new hybrid feature selection approach (SCAGA), which discards redundant, irrelevant, noisy, and weak features from the original dataset, thereby increasing learning speed, simplifying rules, easing dataset visualization, and improving predictive accuracy for the classification task.

The remainder of this paper is organized as follows: Sect. 2 surveys previous studies and related works. Section 3 introduces the procedures and methodology and discusses the proposed scheme. Section 4 presents the evaluation criteria. Section 5 reports and discusses the results, and Sect. 6 presents the conclusions and future work.

2 Related works

The Feature Selection (FS) technique is one technique employed to enhance prediction accuracy in search-space problems [17,18,19]. Search approaches may be summarized as exhaustive search, probabilistic search, heuristic search, and hybrid search algorithms [20]. Metaheuristic algorithms aim to reduce time consumption by searching only a particular path toward the optimal solution [2]. Metaheuristic search is typically applied to real-world problems across a wide range of computer science domains [21]. Heuristics are also well suited to other aspects of massive data, such as diversity and speed [22].

Different metaheuristic methods have been applied to feature selection problems [23]. The Genetic Algorithm (GA) is the most extensively investigated metaheuristic. Both population-based and single-solution-based metaheuristic algorithms have been proposed [24]. Single-solution algorithms such as hill climbing and simulated annealing have been used. Scatter, random, harmony, and hill-climbing searches share a main disadvantage: they are very sensitive to their initial solutions and often fall into local optima [25]. Feature subsets have also been selected using the spider monkey optimization approach: an initial population is generated for the dataset, fitness is assessed using an SVM classifier's accuracy, and a stopping criterion is tested to decide whether to continue or stop; the final subset of attributes with the highest classification accuracy is taken as the optimal result [26].

A hybrid binary technique combining coral reefs optimization and simulated annealing for attribute selection (BCROSAT) achieves maximal accuracy while selecting a minimal number of features on most of the datasets tested [27]. Instance selection is a method that reduces the size of the original training data; combining instance selection with feature extraction greatly reduces the computation time needed to train the classifier [28]. A novel chaotic salp swarm technique serves as both a global optimizer and an FS algorithm, proving efficient for both FS and global optimization problems [29]. FS is the procedure of statistically identifying the most relevant features to improve the predictive capability of classifiers [30]. One method for FS to enhance document clustering uses particle swarm optimization; this approach has also been applied to improve the current implementation of Bayesian calibration for building energy simulation [31].

The Water Wave Optimization (WWO) feature selection technique builds a text FS method based on WWO (WWOTFS) [32]. A hybrid approach based on binary chemical reaction optimization and a Tabu search optimization algorithm has also been developed for FS: once the four essential reactions are performed in each iteration, the best solution is checked, and Tabu search is then applied to explore its neighbors as a local search step. An enhanced FS method based on the Ant Colony Optimization (FACO) algorithm with an SVM classifier has been used to solve FS problems [33]. With the growing volume of data in networks and the increasing number of features, network security is threatened by additional attacks such as APT and DDoS. To detect anomalies in networks quickly, classification techniques are widely used in anomaly detection. However, datasets contain many irrelevant and redundant features that prevent classification algorithms from producing efficient anomaly detection classifiers. To improve classifier performance, the ant colony optimization method searches for the optimal feature subset; it selects relevant features independently of the classifier, which efficiently reduces the complexity of the classification algorithm and improves classification accuracy.

A new technique for feature subset selection in machine learning, FSS-MGSA (Feature Subset Selection by Modified Gravitational Search Algorithm), has been presented. FSS-MGSA is a search algorithm based on the law of gravity and mass interactions, and it can be applied when domain knowledge is not available [34]. The binary bare-bones particle swarm optimization (BPSO) method was motivated by the desire to design a global search technique with few parameters that performs well on feature selection problems and is easy to implement; FS methods have likewise been used to find feature subsets with maximal classification ability [35]. In 2014, Moradi et al. presented a two-phase hybrid approach for feature selection: in the first phase, the original feature set is reduced using a filter model, and in the second phase, a wrapper model selects the best feature subset from the reduced set [36].

In the FS technique combining binary PSO and GA for determining coronary artery disease with a support vector machine (BPSO-FST), each particle consists of 23 binary cells, one per feature in the dataset. A cell's value indicates whether the corresponding feature is selected: 1 means the feature is selected, and 0 means it is not [37]. Feature selection identifies strongly predictive fields within a database and reduces the number of fields presented to the computational process. It affects several aspects of pattern classification, including the accuracy of the learning procedure of classifiers such as the support vector machine [38]. An FS technique based on an improved binary-coded Ant Colony Optimization algorithm (MPACO) was established to increase classification accuracy while reducing redundant features [39]. A novel FS technique using PSO has been applied to cancer microarray datasets; it classifies these high-dimensional datasets after solving the feature selection problem, with the Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Probabilistic Neural Network (PNN) used as classifiers and evaluators [40].

Experimental results showed that a heuristic FS method for text categorization using Chaos Optimization and a Genetic technique (CGFSO) found compact feature subsets that yielded the maximum classification accuracy, and its performance was faster than that of other traditional approaches [41]. Genetic algorithms and particle swarm optimization can be combined in various ways. In the hybrid GA-PSO (HGAPSO) FS method, hybridization is achieved by integrating the standard PSO velocity and position updates with selection, mutation, and crossover from the genetic algorithm [42]. FS based on the antlion optimization algorithm follows the wrapper model, whose main characteristic is using the classifier to guide the feature selection process. Wrapper-based feature selection can be characterized by three chief items: the classification method, the feature-evaluation criteria, and the search method.

One of the first hybrid approaches was proposed by Oh and Lee, who embedded local search methods inside a GA to improve the search by concentrating on the most promising regions discovered by the genetic procedure [43]. Recently, hybrid metaheuristic techniques have shown high performance in solving hard combinatorial optimization problems. For example, combinations such as the hybrid of PSO and GA [44] and of ant colony optimization with a genetic algorithm have been suggested. Simulated Annealing (SA) combined with a genetic algorithm is another example of a local search embedded inside an operative algorithm to balance exploration and exploitation [45]. A new wrapper-based method hybridizing simulated annealing with a crossover operator was also proposed [46]. Moreover, Talbi, Jourdan, Garcia-Nieto, and Alba proposed hybridizing GA with the PSO technique, using SVM as the classifier. Table 1 shows an overview of different FS approaches with their details.

Table 1 Overview of the classification and optimization methods used for feature selection

3 Procedures and methodology

This section provides a detailed description of the methodology used to meet the research objectives. In general, wrapper approaches use a learning method as a classifier to assess the usefulness of a feature subset and thereby obtain the best predictive performance. Wrapper-based approaches achieve better quality measurements than filter-based methods, yielding feature subsets tuned to the learning algorithm that will use them. Choosing the appropriate method requires technical details and a good understanding of the existing algorithms, which is impractical in most situations; within this field, no metaheuristic-based method is capable of solving all FS problems [20]. For newly encountered, unknown datasets, selecting a suitable method is even more intricate. Nevertheless, current algorithms can be enhanced to improve performance by better balancing the search. This incentive motivates us to build a predictive wrapper model based on a hybridization technique for solving the feature selection problem.

3.1 Binary version of Sine Cosine Algorithm (SCA)

Recently, metaheuristic algorithms have proven to deliver high performance in solving real-world problems. Feature selection problems are binary search problems, so the proposed method adapts the continuous Sine Cosine Algorithm (SCA) into a binary version that handles feature selection in the binary domain. The SCA begins with random positions and uses 5 search agents (Xi). Positions are updated according to Eq. (3.1) [47].

$$ x_{ij}^{t + 1} = \begin{cases} x_{ij}^{t} + r_{1}\,\sin (r_{2})\,\left| r_{3}\,Xb_{j}^{t} - x_{ij}^{t} \right| & \text{if}\;R_{1} < 0.5 \\ x_{ij}^{t} + r_{1}\,\cos (r_{2})\,\left| r_{3}\,Xb_{j}^{t} - x_{ij}^{t} \right| & \text{if}\;R_{1} \ge 0.5, \end{cases} $$
(3.1)
$$ r_{1} = a - t\,\frac{a}{T_{\max }} $$
(3.2)

The fitness of an SCA solution improves as the classification performance on the validation data increases and, simultaneously, as the number of selected features decreases; both objectives are combined in a single function.

$$ f_{\theta } = \omega \cdot E + (1 - \omega )\frac{\sum\nolimits_{i} \theta_{i} }{n}, $$
(3.3)

where θ is a vector of size n whose 0/1 elements denote unselected/selected features, n is the number of features in the dataset, E is the classifier error rate, and ω is a constant (set to 0.05) that controls the trade-off between classification performance and the number of selected features. The number of decision variables equals the number of features in the given dataset. Each variable is limited to the range [0, 1], where a value approaching 1 indicates that the corresponding feature is a candidate for selection in the classification [28, 29, 44]. When computing an individual's fitness, a threshold decides which features are actually used, as in Eq. (3.4):

$$ f_{ij} = \begin{cases} 1 & \text{if}\; X_{ij} > 0.5 \\ 0 & \text{otherwise}, \end{cases} $$
(3.4)

where Xij is the value of search agent i at dimension j. While updating a search agent's position, the new value in some dimensions may violate the [0, 1] constraint; hence a simple truncation rule is used to keep each variable within its limits. Each candidate solution is represented as a one-dimensional binary vector, obtained by mapping the continuous values into the [0, 1] interval and thresholding them at 0.5, with upper bound ub = 1 and lower bound lb = 0.
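The following sketch outlines how Eqs. (3.1), (3.2), and (3.4) could be realized; it is a simplified NumPy illustration (assuming a = 2 and r3 drawn from [0, 2], as in the original SCA, with Xb denoted X_best), not the authors' code.

```python
import numpy as np

def sca_update(X, X_best, t, T_max, a=2.0):
    """One SCA position update (Eqs. 3.1-3.2), followed by truncation to [0, 1]."""
    n_agents, n_feats = X.shape
    r1 = a - t * a / T_max                                 # Eq. (3.2): decreases linearly over iterations
    r2 = np.random.uniform(0, 2 * np.pi, (n_agents, n_feats))
    r3 = np.random.uniform(0, 2, (n_agents, n_feats))
    R1 = np.random.rand(n_agents, n_feats)                 # random switch between sine and cosine branches
    step = np.abs(r3 * X_best - X)
    X_new = np.where(R1 < 0.5,
                     X + r1 * np.sin(r2) * step,
                     X + r1 * np.cos(r2) * step)
    return np.clip(X_new, 0.0, 1.0)                        # simple truncation to the [0, 1] limits

def binarize(X, threshold=0.5):
    """Eq. (3.4): a feature is selected when its continuous value exceeds 0.5."""
    return (X > threshold).astype(int)
```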

3.2 Initial population

The Sine Cosine Algorithm (SCA) begins with random positions so that it can converge toward the global optimum. It then calculates the fitness value for every individual and passes the best location found to the FS process as the candidate feature set. Every solution is represented as a one-dimensional binary vector whose length equals the number of features in the original dataset. Each cell of the vector is labeled 0 or 1: a value of 1 indicates that the feature is selected, and 0 indicates that it is ignored.
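A minimal initialization sketch (NumPy-based, illustrative only): continuous positions are drawn uniformly from [0, 1] and thresholded at 0.5 to obtain the initial binary selection vectors.

```python
import numpy as np

def init_population(n_agents, n_features, rng=np.random.default_rng(42)):
    """Random initial positions in [0, 1]; thresholding gives the binary selection vectors."""
    positions = rng.random((n_agents, n_features))    # one row per search agent
    binary = (positions > 0.5).astype(int)            # 1 = feature selected, 0 = feature ignored
    return positions, binary
```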

3.3 Classifier

The K-Nearest Neighbor (KNN) classifier predicts the class of a sample from the classes of its nearest neighbors, with its parameters typically tuned by a trial-and-error process. KNN is used as part of the fitness function in all the experiments because of its excellent classification performance.
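A sketch of how the KNN classifier could supply the error rate E used later in the fitness function (assuming scikit-learn; the hold-out split and the function name are our own choices):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def knn_error_rate(X, y, subset, k=5, test_size=0.2, seed=0):
    """Error rate E of a KNN classifier (K = 5) trained on the selected feature subset only."""
    Xs = X[:, subset]
    X_tr, X_te, y_tr, y_te = train_test_split(Xs, y, test_size=test_size, random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    return 1.0 - clf.score(X_te, y_te)
```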

3.4 Fitness function

In this study, the fitness function is applied to assess each feature subset in the search space of the Sine Cosine Algorithm, using K-Nearest Neighbor (KNN) as the classifier with K = 5. The proposed fitness function is calculated using Eq. (3.5):

$$ f_{\theta } = \omega \cdot E + (1 - \omega )\frac{\sum\nolimits_{i} \theta_{i} }{n}, $$
(3.5)

where θ is a vector of size n whose 0/1 elements denote unselected/selected features, n is the number of features in the dataset, E is the classifier error rate, and ω is a constant that controls the trade-off between classification performance and the number of selected features.
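A direct transcription of Eq. (3.5) as a small helper (illustrative; ω = 0.05 as stated in Sect. 3.1):

```python
import numpy as np

def fitness(theta, error_rate, omega=0.05):
    """Eq. (3.5): f = omega * E + (1 - omega) * (#selected / n).
    theta is the 0/1 feature vector; omega is the trade-off constant reported in the paper."""
    theta = np.asarray(theta)
    return omega * error_rate + (1.0 - omega) * theta.sum() / theta.size
```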

3.5 The crossover operator

Crossover is the leading exploration operator in the genetic algorithm. It searches the region of possible solutions around the present solutions [34]. Binary crossover operators exchange bits between two selected parents to reproduce two new individuals; both differ from their parents yet retain some of their parents' features. The type of crossover operator depends on the encoding method, so several variants exist, such as one-point, two-point, and uniform crossover. Uniform crossover is more exploratory and better suited to small populations, while two-point crossover suits large populations. In general, recombining parts of good individuals gives a better chance of producing better individuals.

In this study, the crossover operator is designed using three formulas, single-point, two-point, and uniform crossover, which enhance the worst solution selected by the Sine Cosine Algorithm by recombining it with the best solution obtained in the previous iteration. The appropriate formula is chosen in each iteration by a roulette-wheel selection function so as to exploit the strengths of each crossover type. The crossover is utilized as an internal operator within the Sine Cosine Algorithm, as shown in Fig. 1; Algorithm 3.1 shows the crossover operator's main steps, and a code-level sketch is given below.

Fig. 1 The proposed SCAGA method

Algorithm 3.1 The main steps of the crossover operator (pseudocode)
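The sketch below illustrates one plausible realization of the three crossover formulas and the roulette-wheel choice among them (NumPy-based; the equal operator weights are an assumption, since the paper does not give them):

```python
import numpy as np

def single_point(p1, p2, rng):
    c = rng.integers(1, p1.size)                      # cut point
    return np.concatenate([p1[:c], p2[c:]])

def two_point(p1, p2, rng):
    c1, c2 = np.sort(rng.choice(np.arange(1, p1.size), size=2, replace=False))
    child = p1.copy()
    child[c1:c2] = p2[c1:c2]                          # swap the middle segment
    return child

def uniform(p1, p2, rng):
    mask = rng.random(p1.size) < 0.5                  # pick each bit from either parent
    return np.where(mask, p1, p2)

def crossover(worst, best, weights=(1.0, 1.0, 1.0), rng=np.random.default_rng()):
    """Recombine the worst SCA solution with the best one from the previous iteration.
    The crossover formula is chosen by roulette-wheel selection over the three operators."""
    ops = (single_point, two_point, uniform)
    p = np.asarray(weights, dtype=float)
    idx = rng.choice(len(ops), p=p / p.sum())
    return ops[idx](np.asarray(worst), np.asarray(best), rng)
```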

3.6 The mutation operator

The mutation operator is used to produce new individuals with features not present in their predecessors. Mutation can be applied to integer, binary, or real representations and comes in several types. In general, mutations are generated by randomly selecting one or more bits and flipping their values with a certain probability (pm = 0.02). Here, the mutation operator acts as an internal function within the Sine Cosine Algorithm (SCA): it is applied after the crossover operator to generate a new solution and improve the algorithm's exploration ability. Equation (3.6) expresses the mutation operator:

$$ X_{i}^{t + 1} = Mutation(X_{i}^{t} ). $$
(3.6)
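A bit-flip sketch consistent with the description above (pm = 0.02 as stated; illustrative only):

```python
import numpy as np

def mutate(x, pm=0.02, rng=np.random.default_rng()):
    """Bit-flip mutation (Eq. 3.6): each bit is flipped independently with probability pm."""
    x = np.asarray(x)
    flip = rng.random(x.size) < pm
    return np.where(flip, 1 - x, x)
```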

Metaheuristic algorithms balance two contradictory criteria: exploration of the search space and exploitation to refine the best solutions found. The native Sine Cosine Algorithm exhibits a weak exploration strategy, which degrades its performance while searching the space. The main steps of the proposed framework are shown in Fig. 1. To improve the exploration strategy of the Sine Cosine Algorithm, and thereby its performance when solving feature selection problems, we use the Genetic Algorithm as an internal function within the SCA; the resulting hybrid feature selection method is named SCAGA.


The wrapper model uses the classifier as the evaluation criterion in the FS method, guided by an optimization technique. Here, the SCA is used as the FS optimizer to balance classification accuracy (to be maximized) against the number of selected features (to be minimized) across all solutions. At the start, the population of solutions evolves under the SCA, whose positions are updated using the sine and cosine functions according to Eq. (3.1).

The proposed binary Sine Cosine Algorithm then generates feature subsets, which are evaluated with the fitness function, and the iteration counter is updated. The parameter R1 switches between the sine and cosine functions at random, as applied in Eq. (3.1). The genetic algorithm component then improves the current solution: the crossover operator (embedded within the SCA) generates new offspring from the best and candidate solutions, which are further refined by the mutation operator. The mutation operator acts as an internal function within the SCA to prevent the algorithm from falling into local optima and to reach an optimal or near-optimal solution.

The mutation operator is controlled by the mutation probability. Choosing the optimal mutation rate is a common problem in this area: it has to be set low, and pm = 0.02 proved to be the best value during parameter tuning. If the rate is set too high, the search degenerates into a random search and fails to converge to an optimal solution. All features lie within the range [0, 1], so positions are corrected using ub = 1 and lb = 0. The fitness of each new solution generated by the genetic operators is then computed. After the SCAGA search terminates, we take the best solution, apply the evaluation measurements, and report the results. An example of the solution representation is shown in Fig. 2, where ten features are given: a 1 denotes a selected feature, and a 0 denotes an unselected feature. A code-level outline of the whole loop is given after Fig. 2.

Fig. 2 The solution representation of the feature selection problem
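To make the overall flow concrete, the following outline ties the earlier sketches together (init_population, sca_update, binarize, knn_error_rate, fitness, crossover, mutate). It is a simplified reconstruction under our own design assumptions, not the authors' implementation.

```python
import numpy as np

def scaga(X, y, n_agents=5, max_iter=80, rng=np.random.default_rng(0)):
    """Outline of the hybrid SCAGA loop: SCA position updates plus GA crossover/mutation
    applied to the worst agent, with a KNN-based fitness (smaller is better)."""
    n_feats = X.shape[1]

    def evaluate(pos_row):
        theta = binarize(pos_row)
        if theta.sum() == 0:                      # guard: empty subsets get the worst possible fitness
            return 1.0
        return fitness(theta, knn_error_rate(X, y, np.flatnonzero(theta)))

    positions, _ = init_population(n_agents, n_feats, rng)
    fit = np.array([evaluate(p) for p in positions])
    best_pos, best_fit = positions[fit.argmin()].copy(), fit.min()

    for t in range(1, max_iter + 1):
        positions = sca_update(positions, best_pos, t, max_iter)    # SCA step, truncated to [0, 1]
        fit = np.array([evaluate(p) for p in positions])

        worst = fit.argmax()                                        # GA refinement of the worst agent
        child = mutate(crossover(binarize(positions[worst]), binarize(best_pos), rng=rng), rng=rng)
        child_fit = evaluate(child.astype(float))
        if child_fit < fit[worst]:                                  # accept offspring only if it improves
            positions[worst], fit[worst] = child.astype(float), child_fit

        if fit.min() < best_fit:                                    # track the global best solution
            best_pos, best_fit = positions[fit.argmin()].copy(), fit.min()

    return binarize(best_pos), best_fit
```

Under this design, the GA operators intervene only on the worst agent in each iteration, so the extra cost over the native SCA is one crossover, one mutation, and one additional fitness evaluation per iteration.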

4 Evaluation criteria

The proposed hybrid feature selection method SCAGA is run 20 times with 80 iterations per run; 20 runs proved sufficient for the results to stabilize and allow testing of both the stability and the statistical significance of the FS approach. The informative feature subsets are assessed with the evaluation criteria (classification accuracy; mean, best, and worst fitness; Standard Deviation (Std); and average selected size) so that the best feature subsets are obtained. The proposed FS method achieved maximal classification accuracy with a minimal number of features. The evaluation criteria are explained as follows:

  • Classification accuracy is used to evaluate the performance of the feature selection method on the dataset given to the classifier. It can be calculated by Eq. (4.1) [34]:

    $$ Test = \frac{1}{N}\mathop \sum \limits_{j = 1}^{N} \sqrt {\mathop \sum \limits_{i = 1}^{K} \left( {A_{i} - E_{i} } \right)^{2} } $$
    (4.1)

    where K is the number of test sample points and Ai, Ei are actual and expected class labels for data point i.

  • Best fitness represents the smallest fitness value obtained over the M independent runs of an optimization algorithm, as in Eq. (4.2) [48]:

    $$ Best = \min_{i = 1}^{M} g_{*}^{i} . $$
    (4.2)
  • Worst fitness represents the maximum among the best solutions found over the M runs of each optimization algorithm, as in Eq. (4.3) [48]:

    $$ Worst = \max_{i = 1}^{M} g_{*}^{i} . $$
    (4.3)
  • Mean fitness represents the average of the best solutions acquired over the M runs of an optimization algorithm, as in Eq. (4.4) [48]:

    $$ Mean \, = \frac{1}{M} \mathop \sum \limits_{i = 1}^{M} g_{*}^{i} . $$
    (4.4)
  • Standard Deviation (Std) represents the variation of the best solutions found over the M diverse runs of each optimization algorithm, as in Eq. (4.5) [48]:

    $$ Std = \sqrt {\frac{1}{M - 1}\sum_{i = 1}^{M} \left( g_{*}^{i} - Mean \right)^{2} } $$
    (4.5)
  • Average selection size (the average number of features selected) represents the ratio of the number of selected features to the total number of features, and is formulated as Eq. (4.6) [20]:

    $$ {\text{Average selection size }} = \frac{1}{M}\mathop \sum \limits_{i = 1}^{M} \frac{{size(g_{*}^{i} )}}{D} , $$
    (4.6)

    where size(x) is the number of selected features in the vector x, D is the number of features in the original dataset, and \(g_{*}^{i}\) is the optimal solution obtained in run number i.
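As a worked illustration of Eqs. (4.2)–(4.6), the helper below aggregates the best fitness value and the selected-subset size recorded from each of the M runs (NumPy-based; the dictionary layout is our own choice):

```python
import numpy as np

def summarize_runs(fitness_per_run, selected_sizes, n_features):
    """Evaluation criteria over M independent runs (Eqs. 4.2-4.6)."""
    g = np.asarray(fitness_per_run)                    # best fitness g_*^i found in each run i
    return {
        "best":  g.min(),                              # Eq. (4.2)
        "worst": g.max(),                              # Eq. (4.3)
        "mean":  g.mean(),                             # Eq. (4.4)
        "std":   g.std(ddof=1),                        # Eq. (4.5), with the 1/(M-1) correction
        "avg_selection_size": np.mean(np.asarray(selected_sizes) / n_features),  # Eq. (4.6)
    }
```

For instance, calling this helper with the 20 per-run best fitness values and the corresponding subset sizes yields the criteria of the kind reported in Tables 5–9.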

5 The results and discussion

In this section, the proposed method's performance is evaluated and compared with similar methods on several feature selection datasets. All experiments are conducted under the same conditions: the maximum number of iterations is 80, and the number of search agents is 5. All results are averages over 20 runs, computed in Matlab on an Intel Core i5 computer with a 2.50 GHz CPU and 4.00 GB of RAM running a 64-bit operating system.

5.1 Datasets and parameters

Sixteen datasets, including two high-dimensional ones, were collected from the University of California Irvine (UCI) Machine Learning Repository, available at https://archives.ics.uci.edu/ml/datasets.html [49]. The datasets are described in Table 2. The proposed hybrid method is a wrapper-based procedure: every solution in the population is represented as a binary index vector over the features in the dataset, and only the optimal solution and its fitness, attaining maximal classification accuracy with minimal features, are retained. The parameter settings are summarized in Table 3.

Table 2 Datasets description
Table 3 Experimental parameter settings of the proposed method

When the relevant features are known in advance, the selected features can be validated against this prior knowledge. In most real-world problems, however, the relevant features are unknown beforehand, so classification performance on test datasets must be used to indicate quality and to provide an unbiased evaluation of the final method. Since FS is naturally multiobjective, results are compared both in terms of the number of selected features and the classification accuracy obtained.

To compare the diverse FS methods with our proposed hybrid method, the following indicators are used. First, classification accuracy describes how accurate the classifier is given the selected feature set; it is formulated in Eq. (4.1). Second, best fitness represents the most optimistic solution gained, formulated in Eq. (4.2). Third, worst fitness represents the worst among the solutions obtained from the optimization runs, formulated in Eq. (4.3). Fourth, mean fitness indicates the average of the solutions obtained from running an optimizer over 20 diverse runs, formulated in Eq. (4.4). Fifth, Standard Deviation (Std) refers to the variation of the optimum solutions acquired from running the stochastic optimizer over 20 diverse runs, formulated in Eq. (4.5). Sixth and finally, average selection size represents the average ratio of the number of selected features to the total number of features, defined by Eq. (4.6).

5.2 Experimental results and discussions

In the proposed SCAGA method, a Genetic Algorithm is embedded inside the Sine Cosine Algorithm as an internal function to improve the SCA's exploration ability. The performance of SCAGA was compared with the native SCA and with approaches published in the literature, namely PSO and ALO, on two critical goals: classification accuracy and average selection size. SCAGA was also compared with other FS methods using the following evaluation criteria: worst fitness, mean fitness, best fitness, and Std. All evaluation results are averages over 20 runs of the framework.

As shown in Table 4, the proposed hybrid method (SCAGA) is considerably better than the SCA, ALO, and PSO methods on both goals: the number of selected features and classification accuracy. The comparison shows that SCAGA outperforms SCA, PSO, and ALO on all datasets in terms of classification accuracy. Notably, the optimal solution is obtained on the M-of-n dataset, where the classification accuracy reaches one, as clearly stated in Table 4. This means the proposed method can be considered a suitable FS method for both small-scale and high-dimensional datasets.

Table 4 Comparison between the SCAGA method and other optimization methods in terms of classification accuracy

As Table 4 shows, the proposed SCAGA method achieved maximal classification accuracy on all datasets used, so SCAGA outperforms SCA, PSO, and ALO. Furthermore, the average number of selected features in Table 5 indicates that SCAGA also outperforms the other methods across all datasets.

Table 5 The percentage of the selected features for the comparative methods

As shown in Table 6, the proposed SCAGA obtained the best results for the best-fitness criterion compared with the other methods.

Table 6 Results of best fitness

Table 7 shows that the proposed method (SCAGA) never yields the worst fitness value when compared with the other methods; the bold font marks the worst value in Table 7. The best results are also obtained for mean fitness, as shown in Table 8.

Table 7 Results of worst fitness
Table 8 Results of mean fitness

Table 9 presents the results of the Std evaluation, which reflect the variation of the optimal solutions acquired from running the stochastic optimization over 20 diverse runs; the bold font marks the best value in Table 9. The SCAGA method outperforms the native SCA and related methods from the literature on ten datasets. The table compares the results of the proposed hybrid feature selection method (SCAGA), the native Sine Cosine Algorithm (SCA), and other feature selection approaches drawn from the literature, namely Antlion Optimization (ALO) and Particle Swarm Optimization (PSO). Our results show that SCAGA performs significantly better than SCA, ALO, and PSO, which are commonly used in wrapper-based feature selection. Figure 3 shows that the proposed SCAGA method escapes local optima better than the native Sine Cosine Algorithm.

Table 9 The most remarkable solutions in terms of standard deviation for all optimizers
Fig. 3 Performance of the proposed method in preventing the local optima problem

Figure 4a and b summarizes the empirical results obtained from the proposed methods (SCA and SCAGA). The SCAGA method performs well as a multiobjective optimization method: it achieves two conflicting goals, maximum classification accuracy with the least number of selected attributes, on all datasets. All evaluation results fall within [0, 1], with classification accuracy at its maximum values and average selection size at its minimum values.

Fig. 4 Comparison between the different proposed methods in terms of a accuracy of classification and b average selected size

A high-dimensional dataset is one with a large number of features, which leads to the curse of dimensionality; in such datasets the number of features can exceed the number of observations, making the computations very difficult. FS has therefore become a critical stage in analyzing high-dimensional datasets.

The SCAGA method is also designed for high-dimensional data and has been shown to be effective at discarding redundant and irrelevant features (see Fig. 5). To evaluate the proposed FS method on high-dimensional datasets, we use two massive datasets, Krvskp.EW (3196 objects with 36 attributes) and Waveform.EW (5000 objects with 40 attributes). As shown in Fig. 5, the proposed SCAGA method obtained better results than the other methods (SCA, PSO, and ALO), just as it did on the low-dimensional datasets. It achieves the two conflicting targets, maximum classification accuracy with the least number of features, on all datasets, which means the proposed method can be considered a suitable FS method for high-dimensional datasets.

Fig. 5 Comparison between the different proposed FS methods on high-dimensional datasets: a classification accuracy and b average selected size

6 Conclusion and future work

In this paper, an enhanced version of the Sine Cosine Algorithm (SCA) with a wrapper model, called SCAGA, is proposed to solve feature selection problems. SCAGA uses the genetic algorithm's crossover operator to generate new solutions and then applies the genetic algorithm's mutation operator to enhance them, improving the exploration ability of the SCA. This allows a wider search that prevents falling into local optima and finds better solutions. The reported results show that the proposed hybrid SCAGA significantly improves the performance of the native SCA on the FS problem and demonstrate the robustness of the proposed method for real-world problems with unknown and challenging search spaces. The proposed SCAGA obtained better results than the other methods (SCA, PSO, and ALO) on all datasets, achieving the two contradictory goals, maximal classification accuracy with a minimal number of features, on both small and high-dimensional datasets. For future work, the proposed approach can be applied to other datasets to generalize it to different domains, such as fault diagnosis on wind turbine test-rig datasets, and other new optimizers, such as the Arithmetic Optimization Algorithm (AOA), can be applied to the feature selection problem.