
1 Introduction

Today, the relevance of feature selection (FS) in machine learning can hardly be overstated. FS is of considerable importance in real-life applications such as medicine, astronomy and biology, to mention but a few. The goal is to choose a subset of the available features by eliminating unnecessary ones [1]. High dimensionality imposes learning difficulties because irrelevant information degrades the models learned from a dataset. Real-world datasets are entangled with many irrelevant and misleading features, and FS is adopted to eliminate such impediments. More formally, the objective of FS is to select a relevant subset of q features from a set of p features (q < p) in a given dataset. To extract sufficient information, for example from an image set, it is appropriate to eliminate features with no predictive information and to avoid redundant features.

2 Related Works on Feature Selection

Efficient processing and retrieval of features rely on the number of relevant features extracted [2]. Hamdani et al. developed a multi-objective feature selection algorithm using the non-dominated sorting-based multi-objective GA II (NSGAII); however, it was not compared with any other algorithm [3].

Our work focuses on multi-objective feature selection with the gravitational search algorithm (FSMOGSA), which is a new approach to feature selection. Tian et al. [4] proposed multi-objective optimisation of short-term hydrothermal scheduling using a non-dominated sorting gravitational search algorithm with chaotic mutation. Bhowmik and Chakraborty proposed a solution of optimal power flow using a non-dominated sorting multi-objective opposition-based gravitational search algorithm (NSMOOGSA) [5]. In 2013, Bing Xue et al. proposed PSO for feature selection and classification as a multi-objective approach and investigated two PSO-based multi-objective feature selection algorithms [6].

Our FSMOGSA aims to maximise classification performance while minimising the number of features, achieving strong performance through a less complex method. It finds the non-dominated (Pareto-front) solutions and groups them into subsets of indexed non-dominated solutions.

2.1 Basic Gravitational Search Algorithm

The gravitational search algorithm was introduced in 2009 by Rashedi et al. [7]; the candidate solutions of an optimisation problem are regarded as agents. All agents attract one another in the solution space due to the force of gravity, and lighter agents are attracted towards (converge on) the heavier agents, which represent the better solutions, following the laws of motion. Given a system of N agents, the position of the ith agent is:

$$ X_{i} = (x_{i}^{1} , \ldots ,x_{i}^{d} , \ldots ,x_{i}^{n} ),\quad {\text{for}}\,{\text{i}} = 1, 2, \ldots ,{\text{N}} $$
(1)

where \( x_{i}^{d} \) is the position of the ith agent in the dth dimension and n is the dimension of the space. The force acting on agent i from agent j in dimension d at time t is given by Eq. (2):

$$ F_{ij}^{d} (t) = G(t)\frac{{M_{pi} (t) \times M_{aj} (t)}}{{R_{ij} (t) + \varepsilon }}(x_{j}^{d} (t) - x_{i}^{d} (t)) $$
(2)

where \( M_{aj} \) is the active gravitational mass of agent j, \( M_{pi} \) is the passive gravitational mass of agent i, G(t) is the gravitational constant at time t, \( \varepsilon \) is an infinitesimally small value, and \( R_{ij}(t) = \left\| X_{i}(t), X_{j}(t) \right\|_{2} \) is the Euclidean distance between agents i and j.

The total force acting on agent i in dimension d is a randomly weighted sum of the forces exerted by the other agents, Eq. (3):

$$ \,\mathop F\nolimits_{i}^{d} (t) = \sum\limits_{j = 1,j \ne i}^{N} {\mathop {rand}\nolimits_{j} \mathop F\nolimits_{ij}^{d} \left( t \right)} , $$
(3)
The acceleration of the ith agent is then:

$$ a_{i}^{d} (t) = \frac{{F_{i}^{d} (t)}}{{M_{ii} }} $$
(4)

2.2 Velocity and Position of Particles

The next velocity of a given agent is obtained by adding its acceleration to its current velocity, as in Eq. (5), and the next position of the agent is obtained from Eq. (6).

$$ v_{i}^{d} (t + 1) = rand_{i}^{d} v_{i}^{d} (t) + a_{i}^{d} (t) $$
(5)
$$ x_{i}^{d} (t + 1) = x_{i}^{d} (t) + v_{i}^{d} (t + 1) $$
(6)

where \( rand_{i}^{d} \) is a random number between 0 and 1, \( v_{i}^{d}(t) \) is the current velocity, \( v_{i}^{d}(t+1) \) the next velocity, \( x_{i}^{d}(t+1) \) the next position, \( x_{i}^{d}(t) \) the current position, and \( a_{i}^{d}(t) \) the acceleration of the ith agent in the dth dimension at time t.
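To make the update rules in Eqs. (2)–(6) concrete, the following is a minimal sketch of one GSA iteration in Python/NumPy. It is an illustration only: the gravitational constant G supplied by the caller, the use of normalised fitness as mass, and the assumption that fitness is being minimised are choices of this sketch, not details taken from the paper.

```python
import numpy as np

def gsa_step(X, V, fitness, G, eps=1e-10):
    """One basic GSA iteration following Eqs. (2)-(6) (illustrative sketch).

    X       : (N, n) positions of the N agents in n dimensions
    V       : (N, n) current velocities
    fitness : (N,) fitness values (minimisation assumed)
    G       : gravitational constant G(t) for this iteration
    """
    N, n = X.shape
    best, worst = fitness.min(), fitness.max()
    # Normalised fitness used as mass (single-objective counterpart of Eqs. (11)-(12))
    m = (fitness - worst) / (best - worst if best != worst else eps)
    M = m / (m.sum() + eps)

    # Total force on each agent: pairwise forces of Eq. (2), randomly weighted as in Eq. (3)
    F = np.zeros_like(X)
    for i in range(N):
        for j in range(N):
            if j == i:
                continue
            R = np.linalg.norm(X[i] - X[j])          # Euclidean distance R_ij(t)
            F[i] += np.random.rand() * G * M[i] * M[j] * (X[j] - X[i]) / (R + eps)

    a = F / (M[:, None] + eps)                       # acceleration, Eq. (4)
    V_new = np.random.rand(N, n) * V + a             # velocity update, Eq. (5)
    X_new = X + V_new                                # position update, Eq. (6)
    return X_new, V_new
```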

3 Multi-objective Gravitational Search Algorithm and Pareto Front

This method operates on the concept of dominance over a set of optimal solutions called the Pareto front. A multi-objective optimisation entails maximising or minimising multiple conflicting objective functions. Among the training subsets, a subset possessing fewer features is presumed to achieve a better objective-function value, so the features with the best fitness are mostly chosen from such subsets. Reducing the number of irrelevant features has a positive effect on the performance of the entire process. The minimisation is expressed below:

$$ \min F(x) = [f_{1} (x),\,f_{2} (x), \ldots ,f_{M} (x)] $$
(7)
$$ g_{i} (x) \le 0,\quad i = 1,2, \ldots m $$
(8)
$$ h_{i} (x) = 0,\quad i = 1,2, \ldots l $$
(9)

where x is the vector of decision variables, \( f_{i} (x) \) is the ith objective function of x, M is the number of objective functions to be minimised, and \( g_{i} (x) \) and \( h_{i} (x) \) are the constraint functions of the problem. In a minimisation task, solution \( x_{1} \) dominates solution \( x_{2} \) if the following condition holds:

$$ \forall m \in [1,M]:\;f_{m} (x_{1} ) \le f_{m} (x_{2} )\quad {\text{and}}\quad \exists \,n \in [1,M]:\;f_{n} (x_{1} ) < f_{n} (x_{2} ) $$
(10)

For \( m,n \in [1,2, \ldots ,M] \).
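For clarity, here is a minimal sketch of the dominance test of Eq. (10) and of extracting a Pareto front from a set of objective vectors, assuming all objectives are minimised; the function names and the small example are illustrative, not part of the paper.

```python
import numpy as np

def dominates(f1, f2):
    """True if objective vector f1 dominates f2 under Eq. (10): f1 is no worse
    in every objective and strictly better in at least one (minimisation)."""
    f1, f2 = np.asarray(f1), np.asarray(f2)
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))

def pareto_front(F):
    """Indices of the non-dominated rows of F, an (N, M) array of objective values."""
    return [i for i in range(len(F))
            if not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i)]

# Example: the third point is dominated by the second, so the front is [0, 1].
# pareto_front(np.array([[0.10, 5], [0.20, 3], [0.30, 4]]))
```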

3.1 The Main Optimisation Process of FSMOGSA

If a given solution is not dominated by any other solution, it is called a Pareto-optimal solution. The collection of all Pareto-optimal solutions forms the Pareto front. The basic principles for choosing dominant or non-dominated solutions in our algorithm are based on:

  (a) the number of individuals a given individual dominates;

  (b) the Pareto front on which an individual is located;

  (c) the number of individuals that dominate a given individual solution.

Multi-objective tasks arise when optimal decisions must be made between two or more conflicting objectives in a solution space. Hence, we extend the particle mass (fitness) equation, which is effective for a single objective, to a form suitable for multiple objectives, as shown below:

$$ M_{i} (t) = \left\| \varepsilon \right\| + \sum\limits_{k = 1}^{K} \left[ m_{i}^{k} (t) \right]^{2} / \sum\limits_{j = 1}^{N} \sum\limits_{k = 1}^{K} \left[ m_{j}^{k} (t) \right]^{2} $$
(11)
$$ m_{i}^{k} (t) = \frac{{fit_{i}^{k} (t) - worst^{k} (t)}}{{best^{k} (t) - worst^{k} (t)}},\quad {\text{for}}\;k \in \left[ {1,K} \right] $$
(12)

where \( ||\varepsilon || \) is an infinitesimally small error value, \( m_{i}^{k} (t) \) is the normalised fitness value of the ith agent in the kth objective; \( fit_{i}^{k} (t) \) is the fitness value of the ith agent in the kth objective; K is the number of objectives; \( best^{k} (t) \) is the best fitness of all agents in the kth objective; \( worst^{k} (t) \) is the worst fitness of all the agents in the kth objective (Fig. 1).
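A brief sketch of the multi-objective mass computation of Eqs. (11) and (12), again assuming each objective is minimised; here `fit` is an (N, K) array of per-objective fitness values and `eps` stands in for the small error term ||ε||, both choices being illustrative.

```python
import numpy as np

def multiobjective_mass(fit, eps=1e-6):
    """Masses M_i(t) from Eqs. (11)-(12) (illustrative sketch).

    fit : (N, K) array, fit[i, k] = fitness of agent i on objective k
          (minimisation assumed, so best = column minimum, worst = column maximum).
    """
    best = fit.min(axis=0)                        # best^k(t) for each objective k
    worst = fit.max(axis=0)                       # worst^k(t) for each objective k
    denom = np.where(best == worst, eps, best - worst)
    m = (fit - worst) / denom                     # normalised fitness m_i^k(t), Eq. (12)
    M = eps + (m ** 2).sum(axis=1) / ((m ** 2).sum() + eps)   # Eq. (11)
    return M
```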

Fig. 1. The FSMOGSA process.

The particles are initialised, and the fitness value, velocity, acceleration and position of each particle are calculated and updated. The non-dominated solutions are then chosen, followed by random mutation to produce a new population for the next optimisation round.

3.1.1 The fitness function in Eq. (11) improves the performance of the FSMOGSA algorithm by prudently minimising the convergence rate of the agents in the process. The error factor \( \|\varepsilon\| \) in the fitness equation stabilises the motion of the agents; it assumes infinitesimally small values within (0, 1). The chosen features in the training set fall into four categories: false negative, false positive, true negative and true positive. After the fitness function has been applied, the error rate is further evaluated by Eq. (13) below:

$$ ErrorRate(\psi ) = \frac{Fn + Fp}{Fn + Fp + Tn + Tp} $$
(13)

where Fn is the number of false negatives, Fp false positives, Tn true negatives and Tp true positives. This error rate can be adjusted to minimise error during feature selection.
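As a concrete illustration, the error rate of Eq. (13) can be computed directly from the four confusion counts; the example numbers below are purely illustrative.

```python
def error_rate(fn, fp, tn, tp):
    """Classification error rate of Eq. (13): misclassified samples over all samples."""
    return (fn + fp) / (fn + fp + tn + tp)

# Example: error_rate(fn=3, fp=2, tn=45, tp=50) == 0.05
```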

3.1.2 The second purpose is to reduce the number of features by choosing only the most highly ranked ones, leaving out redundant features. Since this work is a multi-objective feature selection algorithm, a function beyond Eq. (13) is needed: one that serves the dual purpose of minimising the classification error rate while also guaranteeing a minimal number of features with high classification performance. It is given in Eq. (14):

$$ Fitness_{function} = \frac{{F_{selected} }}{{F_{All} }} \cdot \alpha + \frac{{ER_{selected} }}{{ER_{All} }} \cdot (1 - \alpha ) $$
(14)

where \( F_{selected} \) is the number of selected features, \( F_{All} \) is the number of all available features, \( \alpha \) is a weighting constant within (0, 1), \( ER_{selected} \) is the classification error rate of the selected feature subset, and \( ER_{All} \) is the classification error rate using all available features of the training set. A markedly negative occurrence in swarm intelligence (SI) optimisation is stagnation, where the swarm agents become confined in a local optimum.
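A minimal sketch of the combined objective of Eq. (14) follows; the default value α = 0.1 is only an illustrative choice, since the text only states that α lies in (0, 1).

```python
def fsmogsa_fitness(n_selected, n_all, er_selected, er_all, alpha=0.1):
    """Combined fitness of Eq. (14): weighted feature ratio plus weighted error ratio.

    n_selected, n_all    : number of selected features and total number of features
    er_selected, er_all  : error rate of the subset and error rate using all features
    alpha                : weight in (0, 1) trading feature count against error rate
    """
    return alpha * (n_selected / n_all) + (1 - alpha) * (er_selected / er_all)
```

Smaller values of this fitness correspond to subsets that are both small and accurate, which is the minimisation target of the algorithm.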

3.2 Random Mutation to Generate New Agents

After every iteration a mutation process is applied to the population to randomly generate a new solution population. Because of these random factors, a random mapping process is employed in the creation of new agents to overcome premature convergence in the FSMOGSA algorithm. Whenever a new agent dominates an existing agent, the newly generated agent replaces it. In other words, the masses are updated and the agent with the heavier mass is kept. Equations (15), (16) and (17) below are the mutation equations:

$$ \upzeta_{i}^{d} = [x_{i}^{d} (t) - x_{\hbox{min} }^{d} (t)]/[x_{\hbox{max} }^{d} (t) - x_{\hbox{min} }^{d} (t)] $$
(15)
$$ \upeta_{i}^{d} =\uplambda \upzeta _{i}^{d} \left( {1 -\upzeta_{i}^{d} } \right),\upzeta_{i}^{d} \in [0,1] $$
(16)
$$ xc_{i}^{d} (t) = \eta_{i}^{d} [x_{\hbox{max} }^{d} (t) - x_{\hbox{min} }^{d} (t)] + x_{\hbox{min} }^{d} (t) $$
(17)

where \( \upzeta_{i}^{d} \) represents the normalised position of the ith agent in the dth dimension; \( \uplambda \) is a constant; \( \upeta_{i}^{d} \) is the transformed value by random mutation; \( xc_{i}^{d} (t) \) is the new position of the ith agent [8].

Equation (17) determines the position of a particle undergoing mutation. At the end of the mutation process, the velocity and position of the offspring population are updated and the solutions are ranked; another optimisation pass is then carried out to select the next set of Pareto solutions. The quantity \( \upzeta_{i}^{d} \) in Eq. (15) is randomly generated within the interval [0, 1] as the process starts, and \( \lambda \) is a constant in Eq. (16).

The mutation begins by randomly choosing a particle, say \( p_{1} \), from the current population. Then, from a given Pareto front, two further particles \( p_{2} \) and \( p_{3} \) lying within a bound are chosen. Using Eq. (16), the mutation factor of particle \( p_{1} \) is evaluated in each dimension d from 1 to n, and a newly mutated particle is produced. The next step is the substitution of \( p_{1} \) with the newly mutated particle. When the mutation process is over, the fitness values of the new population are evaluated, and the error rate of the chosen features is re-evaluated with Eqs. (13) and (14). Every abnormally copied code of a feature results in a new feature (a mutant); the process is carried out randomly rather than in any fixed order.
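The sketch below implements the mutation mapping of Eqs. (15)–(17) for a single agent. The logistic-map constant λ = 4 is an illustrative assumption, since the text only states that λ is a constant.

```python
import numpy as np

def random_mutation(x, x_min, x_max, lam=4.0):
    """Mutate an agent position x (length-n vector) via Eqs. (15)-(17).

    x_min, x_max : per-dimension lower/upper bounds over the current population
    lam          : the constant lambda of Eq. (16) (assumed here to be 4.0)
    """
    span = np.where(x_max == x_min, 1e-12, x_max - x_min)
    zeta = (x - x_min) / span                  # Eq. (15): normalised position in [0, 1]
    eta = lam * zeta * (1.0 - zeta)            # Eq. (16): mutation factor
    return eta * (x_max - x_min) + x_min       # Eq. (17): new (mutated) position
```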

3.3 Indexed Non-dominated Solutions (Pareto Front Subsets)

A multi-objective method, unlike a single-objective one, searches for sets of optimally non-dominated solutions. These solutions are indexed (grouped) into non-dominated sets according to the individual feature types. This helps to separate extraneous features from the relevant ones and improves the classification performance, because the number of features is minimised.

Definition 3.3.1

Given different indexed feature sets, let F be a finite feature space. For every feature f in F there is a set of features \( S_{f} \), and the collection G of the sets \( S_{f} \) is known as an indexed collection of feature sets, that is, a collection of feature sets indexed by F in different dimensions. The collection G is written \( \left\{ {S_{f} } \right\}_{f \in F} \); these finite indexed non-dominated feature sets represent the Pareto fronts. Letting G be the indexed sets of non-dominated solutions, \( \left\{ {S_{f} } \right\}_{f \in F} = \left\{ {\left\{ {S_{{f_{1} }} } \right\}_{{f_{1} \in F}} ,\left\{ {S_{{f_{2} }} } \right\}_{{f_{2} \in F}} , \ldots ,\left\{ {S_{{f_{n} }} } \right\}_{{f_{n} \in F}} } \right\} \), hence \( \left\{ {S_{f} } \right\}_{f \in F} \subset G \).

Generally, for finite sets of non-dominated solutions, the union and intersection are \( \bigcup\limits_{i = 1}^{n} {G_{i} } = \left\{ {x \in U:\exists \,i \in \left\{ {1,2, \ldots ,n} \right\}:x \in G_{i} } \right\} \) and \( \bigcap\limits_{i = 1}^{n} {G_{i} } = \left\{ {x \in U:\forall \,i \in \left\{ {1,2, \ldots ,n} \right\}:x \in G_{i} } \right\} \), respectively.

The idea of indexed sets of non-dominated solutions is adopted here in two aspects:

(i) Disjoint Non-dominated Solutions (Pareto Fronts)

For some indexed sets of non-dominated solutions there are finite indexed Pareto fronts \( \left\{ {G_{1}, G_{2}, \ldots, G_{n} } \right\} \) whose arbitrary intersection is empty: \( G_{1} \cap G_{2} \cap \ldots \cap G_{n} = \emptyset \). This implies \( \bigcap\limits_{f \in F} {U_{f} } = \left\{ {x\,|\,x \in U_{f}, \forall f \in F} \right\} = \emptyset \), and such fronts are called "distinct or disjoint indexed non-dominated solutions" in \( F \); they are ranked.

(ii) Connected or Intersecting Non-dominated Solutions (Pareto Fronts)

If the subsets of non-dominated solutions share one or more common features across the indexed non-dominated solutions, then the arbitrary intersection is non-empty. That is, \( G_{1} \cap G_{2} \cap \ldots \cap G_{n} \ne \emptyset \), so \( \bigcap\limits_{f \in F} {U_{f} } = \left\{ {x\,|\,x \in U_{f}, \forall f \in F} \right\} \ne \emptyset \); these are called "connected or intersecting indexed non-dominated solutions".

This segregates the non-dominated solutions, and the irrelevant solutions are discarded.
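To illustrate the indexing described above, here is a small sketch that collects the features used by each indexed Pareto front and tests whether the fronts are disjoint (case (i)) or connected (case (ii)). Representing a solution as a set of selected feature indices is an illustrative choice of this sketch, not something prescribed by the paper.

```python
def front_feature_sets(fronts):
    """Collect, for each indexed Pareto front, the set of features used by its
    solutions. `fronts` is a list of fronts; each front is a list of solutions,
    and each solution is a set of selected feature indices."""
    return [set().union(*front) for front in fronts]

def fronts_disjoint(feature_sets):
    """True if the arbitrary intersection of the indexed fronts is empty
    (case (i), 'disjoint'); False corresponds to case (ii), 'connected'."""
    return len(set.intersection(*map(set, feature_sets))) == 0

# Example: both fronts use feature 2, so they intersect ('connected').
# fs = front_feature_sets([[{0, 2}, {1, 2}], [{2, 5}]])
# fronts_disjoint(fs)  -> False
```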

In this algorithm, the finite collection \( \left\{ {S_{f} } \right\}_{f \in F} \) of indexed non-dominated feature sets thus represents the Pareto fronts.

3.4 K-Nearest Neighbor (K-NN) Classifier

The K-nearest neighbor (K-NN) classifier is employed here to evaluate our method because of its simplicity. The introduction of the K-NN method in 1951 by Fix and Hodges has contributed greatly to the development of new algorithms. One role of the K-NN algorithm here is the classification of new features after the random mutation. Using the attributes and training samples obtained, K-NN evaluates the classification produced by our FSMOGSA method. As this is a multi-objective task, the K-NN step is a supervised learning task in which new indexed non-dominated solution sets are evaluated in the K-neighborhood and classified based on their ranking in the solution space.
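As a sketch of how a candidate feature subset can be scored with a K-NN classifier (assuming scikit-learn is available; the choices of K = 5 and 10-fold cross-validation are illustrative and not taken from the paper):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def knn_error_rate(X, y, selected, k=5, folds=10):
    """Cross-validated classification error of a candidate feature subset.

    X        : (n_samples, n_features) data matrix (NumPy array)
    y        : (n_samples,) class labels
    selected : iterable of selected feature column indices
    """
    clf = KNeighborsClassifier(n_neighbors=k)
    accuracy = cross_val_score(clf, X[:, list(selected)], y, cv=folds).mean()
    return 1.0 - accuracy     # error rate, usable as ER_selected in Eq. (14)
```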

4 Experiments

The experiments were performed on data sets from the UCI open-access repository, and three other algorithms were compared with our FSMOGSA method. Two of them are single-objective methods, namely the gravitational search algorithm (GSA) and binary particle swarm optimisation (BPSO); the third is a multi-objective algorithm, non-dominated solutions particle swarm optimisation for feature selection (NSPSOFS). Four different data sets were used in the experiment to assess the efficiency of FSMOGSA. The experiments were run on a computer with a 32-bit Windows 7 operating system, an Intel® Core™ Duo 3.00 GHz processor and 4 GB of RAM, using MATLAB (R2012a) (Table 1).

Table 1. Description of the four data sets.

All four methods were tested on each of the four data sets and their results were compared with one another. The outcome of each algorithm on each data set shows the error rate with respect to the number of features obtained.

Table 2 shows the percentage of features selected and the error values of the four methods; the best result (error value) obtained on each data set is highlighted in bold. Every data set has a different number of features, so the numbers of iterations differ. The error values of the FSMOGSA algorithm are clearly the lowest. As the number of features increases, the error value also increases. The Iris data set is omitted here because of its small number of features.

Table 2. Performance comparison of the error values with the percentage number of features.

Figure 2(a–d) shows the results of the experiment on each data set.

Fig. 2. (a) Iris data set: the error rate was lowest for FSMOGSA, followed by NSPSOFS; this data set has few features. (b) Vehicle data set: FSMOGSA again has the lowest error rate with a reduced number of features. (c) Wine data set: the error rate is lower for NSPSOFS than for our FSMOGSA, but the difference is negligible. (d) Ionosphere data set: FSMOGSA showed the best result among the four algorithms.

The FSMOGSA algorithm shows a high degree of stability in keeping the error rate low while minimising the number of features, which meets our objectives. Since our multi-objective goal was to maximise performance by reducing the error rate while simultaneously reducing the number of features, we used the fitness function in Eq. (14) to achieve these goals. While the generation of new particles increased the chances of obtaining optimal sets of non-dominated solutions, the indexed sets facilitated the ranking and selection of the best solution sets. The experimental validation of FSMOGSA is a good indication of its efficiency when applied to feature selection in a multi-objective task, compared with most existing feature selection methods. It also indicates that FSMOGSA searches for non-dominated solutions by integrating small feature-set sizes with classifier performance, yielding optimised indexed non-dominated (Pareto-front) solutions with higher classification accuracy than the other three methods.

5 Conclusion

The experimental validation indicates that our method is the more efficient one. The best results in the multi-objective validation were obtained by our FSMOGSA algorithm, with NSPSOFS second best; both are hybrid methods. This shows that hybrid methods perform better than the regular methods. From the experiments, FSMOGSA was found to be clearly superior to the other methods in reducing the error rate and maximising overall performance by minimising the number of irrelevant features. To pursue even more efficient performance, we suggest a binary GSA hybrid method as future work.