Abstract
Lung cancer is a heterogeneous disease characterized by uncontrolled cell growth and is a major cause of cancer-related deaths. Early diagnosis is important for treatment and for patient survival. In this study, through statistical analysis of cancerous protein sequences, we identified the mutated genes associated with the etiology of lung cancer. Our analysis revealed that the most frequently mutated genes are TP53, EGFR, KMT2D, PDE4DIP, ATM, ZNF521, DICER1, CTNNB1, RUNX1T1, SMARCA4, FBXW7, NF1, PIK3CA, STK11, NTRK3, APC, PTPRB, BRCA2, MYH11 and AMER1. We observed that abnormal mutations in these genes contribute to variations in the composition of amino acid sequences. These variations were described in various feature spaces using statistical and physicochemical properties of amino acids. The resulting features provided sufficient discriminative power for developing effective lung cancer classification models (LCCMs). The main advantage of the proposed novel approach is its effective use of the discriminant information of mutated genes. Experimental results showed that the SVM model performed best in the split amino acid composition feature space. The study thus explores a new dimension of early lung cancer classification, using discriminant information revealed through statistical analysis of the mutated genes. It is anticipated that the proposed approach will be useful to practitioners and domain experts for early lung cancer diagnosis and prognosis.
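As an illustration of the split amino acid composition feature space named in the abstract, the following minimal Python sketch divides a protein sequence into N-terminal, middle and C-terminal segments and concatenates the 20 residue frequencies of each into a 60-dimensional vector. The function names and the `terminal_length` value of 25 are assumptions made for illustration only; the abstract does not state the segment lengths or the implementation used in the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Return the 20-dimensional relative frequency vector of a segment."""
    counts = Counter(segment)
    # Count only standard residues so ambiguous symbols (e.g. X) are ignored.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def split_amino_acid_composition(sequence, terminal_length=25):
    """Concatenate the compositions of the N-terminal, middle and C-terminal parts.

    terminal_length is an illustrative assumption, not a value from the study.
    """
    sequence = sequence.upper()
    if len(sequence) < 2 * terminal_length:
        raise ValueError("sequence shorter than the two terminal segments")
    n_term = sequence[:terminal_length]
    middle = sequence[terminal_length:len(sequence) - terminal_length]
    c_term = sequence[len(sequence) - terminal_length:]
    return composition(n_term) + composition(middle) + composition(c_term)


# Toy sequence (not real data): 25 alanines, 10 cysteines, 25 aspartates.
features = split_amino_acid_composition("A" * 25 + "C" * 10 + "D" * 25)
```

A vector of this kind could then be passed to a classifier such as a support vector machine; which library and kernel the authors used is not stated in the abstract.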
Cite this article
Sattar, M., Majid, A. Lung Cancer Classification Models Using Discriminant Information of Mutated Genes in Protein Amino Acids Sequences. Arab J Sci Eng 44, 3197–3211 (2019). https://doi.org/10.1007/s13369-018-3468-8
DOI: https://doi.org/10.1007/s13369-018-3468-8