Abstract
The purpose of this paper is to discuss feature selection methods. We present two common feature selection approaches: statistical methods and artificial intelligence techniques. Statistical methods are presented as antecedents of classification methods, with specific techniques for variable selection, because we intend to apply the feature selection techniques to classification problems. We examine the artificial intelligence approaches from different points of view. We also present the use of information theory to build decision trees; instead of Quinlan’s Gain, we discuss alternative criteria for tree construction. We introduce two new feature selection measures: the MLRelevance formula and the PRelevance. These criteria maximize the heterogeneity among elements that belong to different classes and the homogeneity among elements that belong to the same class. Finally, we compare different feature selection methods by means of the classification of two medical data sets.
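The abstract does not reproduce the MLRelevance or PRelevance formulas, but it positions them against Quinlan’s Gain, the standard information-theoretic criterion for decision-tree induction. As background, a minimal sketch of that baseline criterion follows; the function names are illustrative, not from the paper:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Quinlan's Gain: reduction in class entropy obtained by
    partitioning the examples on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for val, lab in zip(feature_values, labels) if val == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder
```

A feature that perfectly separates the classes attains the maximum gain (the full class entropy), while a feature independent of the class yields a gain near zero — the kind of relevance ranking the paper’s new measures are designed to refine.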
Keywords
- Feature Selection
- Feature Subset
- Feature Selection Method
- Irrelevant Feature
- Artificial Intelligence Technique
References
Koller, D., Sahami, M.: Toward Optimal Feature Selection. Computer Science Department, Stanford University, Stanford (1997)
Grau, R.: Estadística aplicada con ayuda de paquetes de software. Editorial Universitaria, Jalisco (1994)
Michie, D., Spiegelhalter, D.J., Taylor, C.C.: Machine Learning, Neural and Statistical Classification. Springer, Heidelberg (1994)
Bello, R.: Métodos de Solución de Problemas para la Inteligencia Artificial. Universidad Central de Las Villas, Santa Clara (1998)
Blum, A., Langley, P.: Selection of relevant features and examples in machine learning. Artificial Intelligence 97, 245–271 (1997)
John, G., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Proceedings of the 11th International Conference on Machine Learning, New Brunswick, NJ (1994)
Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the 9th International Conference on Machine Learning, Aberdeen, Scotland (1992)
Almuallim, H., Dietterich, T.G.: Learning with many irrelevant features. In: Proceedings of AAAI 1992, MIT Press, Cambridge (1992)
Langley, P., Sage, S.: Oblivious decision trees and abstract cases. In: Working Notes of the AAAI 1994 Workshop on Case-Based Reasoning, Seattle (1994)
Quinlan, J.R.: Induction of Decision Trees. Machine Learning 1, 81–106 (1986)
Quinlan, J.R.: Improved Use of Continuous Attributes in C4.5. Journal of Artificial Intelligence Research 4, 77–90 (1996)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont (1984)
Brender, J.: Measuring quality of medical knowledge. In: Proceeding of the Twelfth International Congress of the European Federation for Medical Informatics (1994)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Quinlan, J.R.: See5/C5.0 (2002)
Mántaras, R.L.: A Distance-Based Attribute Selection Measure for Decision Tree Induction. Machine Learning (1991)
Chegis, I., Yablonskii, S.: K-Testor. Trudy Matematicheskogo Instituta imeni V.A. Steklova 51, 270–360, Moscow (1958)
Zhuravlev, Y.I., Tuliaganov, S.E.: Measures to Determine the Importance of Objects in Complex Systems, vol. 12, pp. 170–184, Moscow (1972)
Aizenberg, N.N., Tsipkin, A.I.: Prime Tests. Doklady Akademii Nauk 4, 801–802 (1971)
Ruiz-Shulcloper, J., Cortés, M.L.: K-testores primos. Revista Ciencias Técnicas Físicas y Matemáticas 9, 17–55 (1991)
Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic, Dordrecht (1991)
Komorowski, J., et al.: A Rough Set Perspective on Data and Knowledge. In: Klosgen, W. (ed.) The Handbook of Data Mining and Knowledge Discovery, Oxford University Press, Oxford (1999)
Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine (2003)
Aha, D.W.: Case-Based Learning Algorithm (1991)
Jobson, J.D.: Applied Multivariate Data Analysis. Categorical and Multivariate Methods, vol. 2. Springer, Heidelberg (1992)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Piñero, P., Arco, L., García, M.M., Caballero, Y., Yzquierdo, R., Morales, A. (2003). Two New Metrics for Feature Selection in Pattern Recognition. In: Sanfeliu, A., Ruiz-Shulcloper, J. (eds) Progress in Pattern Recognition, Speech and Image Analysis. CIARP 2003. Lecture Notes in Computer Science, vol 2905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24586-5_60
DOI: https://doi.org/10.1007/978-3-540-24586-5_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20590-6
Online ISBN: 978-3-540-24586-5
eBook Packages: Springer Book Archive