Abstract
Lung cancer is the most common cause of cancer death in men and the second leading cause of cancer death in women worldwide. Because early detection can aid in the complete cure of the disease, demand for techniques that detect cancer nodules at an early stage is increasing. The cure rate and prognosis of lung cancer depend primarily on early detection and diagnosis. Knowledge discovery and data mining have numerous applications in the business and scientific domains and can provide useful information in healthcare systems. Therefore, the present work aimed to compare several prediction models, as well as the features to be used, with the help of the Weka and RapidMiner tools. Both classification and association rule techniques were implemented. The results were satisfactory, particularly for the Naive Bayes model, which obtained an accuracy of 95.03% with 10-fold cross-validation and 94.59% with a 66% percentage split.
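The two evaluation protocols named in the abstract, 10-fold cross-validation and a 66% percentage split, can be illustrated with a minimal sketch. This is not the authors' Weka or RapidMiner workflow: it is a plain-Python categorical Naive Bayes with add-one smoothing, run on synthetic toy data rather than the lung cancer dataset. All function names are hypothetical, and the folds are not stratified, which may differ from the tools' defaults.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model (counts only)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    seen_values = defaultdict(set)       # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values


def predict_nb(model, row):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            num = value_counts[(i, label)][value] + 1
            den = count + len(seen_values[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def count_hits(train_idx, test_idx, rows, labels):
    """Train on train_idx and count correct predictions on test_idx."""
    model = train_nb([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    return sum(predict_nb(model, rows[i]) == labels[i] for i in test_idx)


def k_fold_indices(n, k, rng):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def percentage_split_indices(n, train_fraction, rng):
    """Shuffle once, then cut into a training part and a held-out part."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]


def cross_val_accuracy(rows, labels, k, rng):
    """Pooled accuracy: total correct predictions over all folds / n."""
    hits = sum(count_hits(tr, te, rows, labels)
               for tr, te in k_fold_indices(len(rows), k, rng))
    return hits / len(rows)


def split_accuracy(rows, labels, train_fraction, rng):
    """Accuracy on the held-out part of a single percentage split."""
    train, test = percentage_split_indices(len(rows), train_fraction, rng)
    return count_hits(train, test, rows, labels) / len(test)


def make_toy_data(n, rng):
    """Synthetic data (assumption): 3 binary features that agree with the label 80% of the time."""
    rows, labels = [], []
    for _ in range(n):
        label = rng.choice(["YES", "NO"])
        row = tuple(int((label == "YES") == (rng.random() < 0.8)) for _ in range(3))
        rows.append(row)
        labels.append(label)
    return rows, labels


if __name__ == "__main__":
    rng = random.Random(0)
    rows, labels = make_toy_data(300, rng)
    print("10-fold CV accuracy:", cross_val_accuracy(rows, labels, 10, rng))
    print("66% split accuracy: ", split_accuracy(rows, labels, 0.66, rng))
```

Cross-validation uses every instance for testing exactly once, so its estimate tends to be less sensitive to one particular shuffle than a single percentage split, which is one reason the two protocols can report slightly different accuracies for the same model.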
This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope: UIDB/00319/2020.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sousa, R., Sousa, R., Peixoto, H., Machado, J. (2023). Prediction Models Applied to Lung Cancer Using Data Mining. In: Braubach, L., Jander, K., Bădică, C. (eds) Intelligent Distributed Computing XV. IDC 2022. Studies in Computational Intelligence, vol 1089. Springer, Cham. https://doi.org/10.1007/978-3-031-29104-3_22
DOI: https://doi.org/10.1007/978-3-031-29104-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29103-6
Online ISBN: 978-3-031-29104-3
eBook Packages: Intelligent Technologies and Robotics (R0)