Abstract
The performance of classification algorithms in machine learning depends on the features used to describe the labeled examples presented to the inducers, so the problem of feature subset selection has received considerable attention. Genetic approaches to this problem usually follow the wrapper approach: the inducer is treated as a black box that evaluates candidate feature subsets. These evaluations can take considerable time, and the traditional approach may be impractical for large data sets. This paper describes a hybrid of a simple genetic algorithm and a method based on class separability, applied to the selection of feature subsets for classification problems. The proposed hybrid was compared against each of its components and two widely used feature selection wrappers. The objective of this paper is to determine whether the proposed hybrid offers advantages over the other methods in terms of accuracy or speed on this problem. The experiments used a Naive Bayes classifier and public-domain and artificial data sets. The results suggest that the hybrid usually finds compact feature subsets that give the most accurate results, while beating the execution time of the other wrappers.
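The abstract does not give implementation details, but the following sketch illustrates the general shape of such a hybrid: a simple genetic algorithm over feature-inclusion bit strings, a per-feature class-separability score used to bias the search, and a Naive Bayes wrapper as the fitness function. Everything specific below (the Gaussian Naive Bayes inducer, the Fisher-style separability ratio, population size, operator rates, tournament selection, and elitist replacement) is an illustrative assumption, not the configuration reported in the paper.

```python
# Minimal sketch of a GA/class-separability hybrid for feature subset selection.
# All parameters and the separability measure are illustrative assumptions.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def separability_scores(X, y):
    """Per-feature separability proxy: between-class over within-class variance
    (a simple Fisher-style ratio; the paper's exact measure may differ)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = sum((y == c).mean() * (X[y == c].mean(axis=0) - overall_mean) ** 2
                  for c in classes)
    within = sum((y == c).mean() * X[y == c].var(axis=0) for c in classes)
    return between / (within + 1e-12)

def fitness(mask, X, y):
    """Wrapper evaluation: cross-validated Naive Bayes accuracy on the subset."""
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=5).mean()

def ga_feature_selection(X, y, pop_size=20, generations=30,
                         crossover_rate=0.9, mutation_rate=None, seed=None):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    mutation_rate = mutation_rate or 1.0 / n
    # Bias the initial population toward features with high separability.
    scores = separability_scores(X, y)
    p_include = 0.05 + 0.9 * scores / (scores.max() + 1e-12)
    pop = rng.random((pop_size, n)) < p_include
    fits = np.array([fitness(ind, X, y) for ind in pop])
    for _ in range(generations):
        # Binary tournament selection.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(fits[a] > fits[b], a, b)]
        # Uniform crossover and bit-flip mutation.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate:
                swap = rng.random(n) < 0.5
                children[i, swap] = parents[i + 1, swap]
                children[i + 1, swap] = parents[i, swap]
        children ^= rng.random((pop_size, n)) < mutation_rate
        child_fits = np.array([fitness(ind, X, y) for ind in children])
        # Elitist replacement: keep the best of parents and children.
        combined = np.vstack([pop, children])
        combined_fits = np.concatenate([fits, child_fits])
        best = np.argsort(combined_fits)[-pop_size:]
        pop, fits = combined[best], combined_fits[best]
    return pop[np.argmax(fits)], fits.max()
```

Given NumPy arrays `X` and `y`, a call such as `ga_feature_selection(X, y)` would return a boolean inclusion mask and its cross-validated accuracy; the seeding of the population with separability-biased bit strings is one plausible way to combine the filter and wrapper components.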
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cantú-Paz, E. (2004). Feature Subset Selection, Class Separability, and Genetic Algorithms. In: Deb, K. (eds) Genetic and Evolutionary Computation – GECCO 2004. GECCO 2004. Lecture Notes in Computer Science, vol 3102. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24854-5_96
Print ISBN: 978-3-540-22344-3
Online ISBN: 978-3-540-24854-5