Abstract
Feature selection in machine learning and data mining aims to maximize classifier accuracy using the smallest possible number of features. Its use in microarray data mining is promising. However, identifying and selecting feature genes from microarray data sets is usually hard, because such data combine multi-class categories and high-dimensional features with a small number of samples. Good selection approaches that reduce incomprehensibility and optimize prediction accuracy are therefore becoming necessary, since they help identify the genes relevant to sample classification when a large number of genes is investigated. In this paper, we propose a new feature selection method for microarray data sets. The method combines the Gain Ratio (GR) algorithm for gene filtering with the Improved Gene Expression Programming (IGEP) algorithm for feature selection. A Support Vector Machine (SVM) with leave-one-out cross-validation (LOOCV) was used to evaluate the proposed method on eight microarray data sets from the literature. The experimental results show that the proposed method selects a small number of features while achieving higher classification accuracy than other existing feature selection approaches.
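The filtering and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how expression values are discretized, so a binary split at each gene's median is assumed here; a nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library; the IGEP selection stage is not shown. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio = information gain / split information for one discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the feature's own value distribution
    return info_gain / split_info if split_info > 0 else 0.0


def binarize_at_median(column):
    """ASSUMED discretization: 1 if the value is above the column median, else 0."""
    s = sorted(column)
    mid = len(s) // 2
    median = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    return [1 if x > median else 0 for x in column]


def rank_genes(X, y, top_k):
    """Indices of the top_k columns of X by gain ratio (ties broken by index)."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((gain_ratio(binarize_at_median(column), y), j))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:top_k]]


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT an SVM): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(model, row):
    """Label of the class centroid closest to row (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: each sample is held out exactly once."""
    hits = 0
    for i in range(len(X)):
        model = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += nearest_centroid_predict(model, X[i]) == y[i]
    return hits / len(X)


# Made-up toy data: 6 samples x 3 genes; only gene 0 separates the classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.9],
    [0.9, 4.0, 0.1],
    [3.0, 5.1, 0.8],
    [3.2, 2.9, 0.3],
    [2.9, 4.2, 0.7],
]
y = ["A", "A", "A", "B", "B", "B"]

selected = rank_genes(X, y, top_k=1)
X_selected = [[row[j] for j in selected] for row in X]
accuracy = loocv_accuracy(X_selected, y)
print(selected, accuracy)
```

In this sketch the genes are ranked once on all samples before cross-validation, which keeps the example short; a stricter protocol would repeat the ranking inside each leave-one-out fold so that the held-out sample never influences which genes are kept.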
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Alanni, R., Hou, J., Azzawi, H., Xiang, Y. (2019). New Gene Selection Method Using Gene Expression Programing Approach on Microarray Data Sets. In: Lee, R. (eds) Computer and Information Science. ICIS 2018. Studies in Computational Intelligence, vol 791. Springer, Cham. https://doi.org/10.1007/978-3-319-98693-7_2
DOI: https://doi.org/10.1007/978-3-319-98693-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98692-0
Online ISBN: 978-3-319-98693-7
eBook Packages: Intelligent Technologies and Robotics (R0)