Abstract
The increased dimensionality of genomic and proteomic data produced by microarray and mass spectrometry technology makes training and testing general classification methods difficult. Such data demand specialized analysis, and one of the common ways to handle high dimensionality is to identify the most relevant features in the data. Wrapper feature selection is one of the most common and effective techniques for this task. Although efficient, wrapper methods are limited by the fact that their results depend on the search strategy. In theory, when a complex search is used, choosing the best subset of features may take much longer and may be impractical in some cases. Hence we propose a new wrapper feature selection method for big data based on a random search using a genetic algorithm and prior information. The new approach was tested on two biological datasets and compared with two well-known wrapper feature selection approaches; the results illustrate that our approach gives the best performance.
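To make the idea of a genetic-algorithm wrapper concrete, the following is a minimal sketch and not the chapter's actual implementation. Everything in it is an assumption made for illustration: the synthetic data, the nearest-centroid classifier used as the wrapped learner, the holdout split, the size penalty, and in particular the form of the "prior information", which is taken here to be a simple class-mean difference score that biases the initial population.

```python
import random

def make_data(n=60, n_features=20, informative=(0, 1, 2), seed=0):
    """Synthetic two-class data (hypothetical): only `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = (i // 2) % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # shift class 1 on the informative features
        X.append(row)
        y.append(label)
    return X, y

def holdout_accuracy(X_tr, y_tr, X_va, y_va, mask):
    """Wrapper evaluation: nearest-centroid accuracy using only the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [r for r, lab in zip(X_tr, y_tr) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for row, lab in zip(X_va, y_va):
        dist = {c: sum((row[j] - mu) ** 2 for j, mu in zip(cols, cen))
                for c, cen in centroids.items()}
        correct += min(dist, key=dist.get) == lab
    return correct / len(X_va)

def prior_scores(X, y):
    """Assumed prior: absolute difference of class means per feature, scaled to [0, 1]."""
    scores = []
    for j in range(len(X[0])):
        v0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        v1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(v1) / len(v1) - sum(v0) / len(v0)))
    top = max(scores) or 1.0
    return [s / top for s in scores]

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=15,
              p_mut=0.05, penalty=0.01, seed=1):
    """Genetic search over binary feature masks, initialised from the prior."""
    rng = random.Random(seed)
    prior = prior_scores(X_tr, y_tr)

    def fitness(mask):
        # Validation accuracy minus a small penalty per selected feature.
        return holdout_accuracy(X_tr, y_tr, X_va, y_va, mask) - penalty * sum(mask)

    # Prior-biased initialisation: inclusion probability between 0.1 and 0.9.
    pop = [[int(rng.random() < 0.1 + 0.8 * p) for p in prior] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = [(fitness(m), m) for m in pop]

        def pick():
            # Binary tournament selection.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: keep the best mask found so far
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]         # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)

X, y = make_data()
X_tr, y_tr, X_va, y_va = X[0::2], y[0::2], X[1::2], y[1::2]
best, score = ga_select(X_tr, y_tr, X_va, y_va)
print("selected features:", [j for j, g in enumerate(best) if g], "fitness:", round(score, 3))
```

The fitness function is what makes this a wrapper: each candidate subset is scored by training and evaluating a classifier on it, which is also why the cost of the search strategy dominates on high-dimensional data. In practice the wrapped learner and the evaluation protocol would be chosen to match the data at hand.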
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Bouaguel, W. (2016). A New Approach for Wrapper Feature Selection Using Genetic Algorithm for Big Data. In: Lavangnananda, K., Phon-Amnuaisuk, S., Engchuan, W., Chan, J. (eds) Intelligent and Evolutionary Systems. Proceedings in Adaptation, Learning and Optimization, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-319-27000-5_6
DOI: https://doi.org/10.1007/978-3-319-27000-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26999-3
Online ISBN: 978-3-319-27000-5
eBook Packages: Engineering, Engineering (R0)