Abstract
A major difficulty in bioinformatics stems from the size of the datasets, which frequently contain large numbers of variables. In this study, we present a two-step procedure for feature selection. In the first, “filtering” stage, a relatively small subset of features is identified on the basis of several criteria. In the second stage, the importance of the selected variables is evaluated based on the frequency of their participation in relevant patterns, and low-impact variables are eliminated. This step is applied iteratively until arriving at a Pareto-optimal “support set”, which balances the conflicting criteria of simplicity and accuracy.
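The two-stage idea can be illustrated with a toy sketch on binary data. Everything specific here is an assumption made for illustration and is not taken from the article: the filtering score (difference in feature frequency between classes), the definition of a "pattern" (a conjunction of at most two literals that covers observations of one class only), the stopping rule (stop when coverage by patterns would drop), and the function names `filter_stage`, `pure_patterns`, `coverage`, and `prune_stage`.

```python
from itertools import combinations, product


def filter_stage(X, y, k):
    """Stage 1 (assumed criterion): keep the k features whose
    frequency differs most between the two classes."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def separation(j):
        return abs(sum(r[j] for r in pos) / len(pos)
                   - sum(r[j] for r in neg) / len(neg))

    ranked = sorted(range(len(X[0])), key=separation, reverse=True)
    return sorted(ranked[:k])


def pure_patterns(X, y, features, max_degree=2):
    """Enumerate conjunctions of literals over `features` that cover
    at least one observation and only observations of a single class."""
    patterns = []
    for degree in range(1, max_degree + 1):
        for feats in combinations(features, degree):
            for values in product((0, 1), repeat=degree):
                covered = [i for i, row in enumerate(X)
                           if all(row[f] == v for f, v in zip(feats, values))]
                if len({y[i] for i in covered}) == 1:
                    patterns.append((feats, values, covered))
    return patterns


def coverage(X, patterns):
    """Fraction of observations covered by at least one pattern."""
    covered = set()
    for _, _, indices in patterns:
        covered.update(indices)
    return len(covered) / len(X)


def prune_stage(X, y, features, max_degree=2):
    """Stage 2 (toy version): repeatedly drop the feature that takes part
    in the fewest patterns, as long as coverage does not decrease."""
    features = list(features)
    patterns = pure_patterns(X, y, features, max_degree)
    baseline = coverage(X, patterns)
    while len(features) > 1:
        counts = {f: 0 for f in features}
        for feats, _, _ in patterns:
            for f in feats:
                counts[f] += 1
        weakest = min(features, key=lambda f: (counts[f], f))
        trial = [f for f in features if f != weakest]
        trial_patterns = pure_patterns(X, y, trial, max_degree)
        if coverage(X, trial_patterns) < baseline:
            break  # removing this feature would cost accuracy
        features, patterns = trial, trial_patterns
    return features


# Hypothetical data: feature 0 determines the class, the rest are noise.
X = [(1, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1),
     (0, 0, 0, 1), (0, 1, 1, 0), (0, 0, 0, 0)]
y = [1, 1, 1, 0, 0, 0]

candidates = filter_stage(X, y, k=3)
support = prune_stage(X, y, candidates)
print(support)  # [0]
```

In this sketch, coverage stands in for accuracy and the number of retained features for simplicity; a fuller treatment would compare candidate subsets on both criteria rather than stopping at the first drop in coverage.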
Additional information
All authors contributed equally to this manuscript.
Cite this article
Alexe, G., Alexe, S., Hammer, P.L. et al. Pattern-based feature selection in genomics and proteomics. Ann Oper Res 148, 189–201 (2006). https://doi.org/10.1007/s10479-006-0084-x
DOI: https://doi.org/10.1007/s10479-006-0084-x