Abstract
We propose a Genetic Algorithm (GA) approach combined with Support Vector Machines (SVM) for the classification of high-dimensional microarray data. The approach is coupled with a fuzzy-logic-based pre-filtering technique. The GA evolves gene subsets whose fitness is evaluated by an SVM classifier. Using archive records of "good" gene subsets, a frequency-based technique is introduced to identify the most informative genes. Our approach is assessed on two well-known cancer datasets and shows results competitive with six existing methods.
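The abstract does not spell out how the frequency-based step works, so the following is only a minimal sketch under stated assumptions: the archive is taken to be a list of gene-index subsets that scored well, and a gene's importance is taken to be the number of archived subsets that contain it. The function name `rank_genes_by_frequency`, the `top_k` parameter, and the toy archive are all hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_genes_by_frequency(
    archive: Iterable[Iterable[int]], top_k: int
) -> List[Tuple[int, int]]:
    """Rank genes by how many archived subsets contain them.

    `archive` holds gene-index subsets that scored well (assumed format).
    Returns up to `top_k` (gene_index, count) pairs, most frequent first.
    """
    counts: Counter = Counter()
    for subset in archive:
        # Use a set so a gene is counted at most once per subset.
        counts.update(set(subset))
    # Sort by descending count; break ties by gene index for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy archive of "good" subsets (made-up gene indices, not real data).
archive = [[3, 7, 12], [7, 12, 40], [7, 19], [3, 7, 40]]
top_genes = rank_genes_by_frequency(archive, top_k=3)
```

In a full pipeline the archive would presumably be filled by the GA with subsets that reach high SVM accuracy. That part is omitted here because the abstract gives no parameters for it.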
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huerta, E.B., Duval, B., Hao, JK. (2006). A Hybrid GA/SVM Approach for Gene Selection and Classification of Microarray Data. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_4
DOI: https://doi.org/10.1007/11732242_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33237-4
Online ISBN: 978-3-540-33238-1
eBook Packages: Computer Science (R0)