Abstract
Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. For instance, a clinical pattern might indicate that a female patient with diabetes or hypertension is more likely to suffer a stroke within the next 5 years. A physician can then learn valuable knowledge from the data mining process. Here, we present a study investigating the application of artificial intelligence and data mining techniques to prediction models of breast cancer. An artificial neural network, a decision tree, logistic regression, and a genetic algorithm were compared, with the accuracy and positive predictive value of each algorithm used as the evaluation indicators. The data analysis incorporated 699 records acquired from breast cancer patients at the University of Wisconsin, with nine predictor variables and one outcome variable, and was followed by tenfold cross-validation. The results revealed that the accuracy of the logistic regression model was 0.9434 (sensitivity 0.9716, specificity 0.9482), that of the decision tree model 0.9434 (sensitivity 0.9615, specificity 0.9105), that of the neural network model 0.9502 (sensitivity 0.9628, specificity 0.9273), and that of the genetic algorithm model 0.9878 (sensitivity 1, specificity 0.9802). The accuracy of the genetic algorithm was significantly higher than the average predicted accuracy of 0.9612. The predicted outcome of the logistic regression model was higher than that of the neural network model, but the difference was not significant. The average predicted accuracy of the decision tree model was 0.9435, the lowest of the four predictive models. The standard deviation of the tenfold cross-validation results was rather unreliable.
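The evaluation indicators and the tenfold cross-validation mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the convention that label 1 denotes the positive (malignant) class, and the majority-class baseline `fit_majority` are all hypothetical. Any of the four models could be plugged in through the `fit` argument, which is assumed to return a function mapping one record to a 0/1 prediction.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a trained model is a function from one record to 0/1.
Predictor = Callable[[Sequence[int]], int]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating label 1 as the positive class (assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and PPV; NaN when a denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
    }


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices round-robin into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit: Callable[[list, list], Predictor], k: int = 10, seed: int = 0):
    """Mean and sample standard deviation of per-fold accuracy (requires k >= 2)."""
    accs = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(X[i]) for i in test_idx]
        accs.append(evaluate([y[i] for i in test_idx], preds)["accuracy"])
    mean = sum(accs) / len(accs)
    sd = (sum((a - mean) ** 2 for a in accs) / (len(accs) - 1)) ** 0.5
    return mean, sd


def fit_majority(X_train, y_train) -> Predictor:
    """Hypothetical baseline: always predict the most frequent training label."""
    label = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return lambda x: label


# Usage: metrics for five toy predictions (two TP, one FN, one TN, one FP).
toy_metrics = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

With 699 records, `k_fold_indices(699, 10)` yields nine folds of 70 records and one of 69, so every record is held out exactly once. Reporting the spread of the per-fold accuracies alongside the mean is what allows the reliability of a cross-validated estimate to be judged.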
This study indicated that the genetic algorithm model yielded better results than the other data mining models in analyzing the breast cancer patient data, in terms of overall classification accuracy and the expression and complexity of the classification rules. The results showed that the genetic algorithm described in the present study produced accurate classifications of the breast cancer data, and that the classification rules it identified were more acceptable and comprehensible.
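To illustrate how a genetic algorithm can yield a compact, readable classification rule, here is a minimal Python sketch. It is not the procedure used in the study: the rule encoding (one threshold per predictor plus a minimum number of satisfied conditions), the assumed 1 to 10 integer scale of the predictors, the complexity penalty, the operators (tournament selection, one-point crossover, random-reset mutation, elitism) and all parameter values are hypothetical choices, and the data are synthetic.

```python
import random
from typing import List, Sequence

LOW, HIGH = 1, 10  # assumed integer scale of every predictor
OFF = HIGH + 1     # a threshold no value can reach, i.e. the condition is disabled


def predict(rule: List[int], x: Sequence[int]) -> int:
    """Rule = [t_1, ..., t_n, min_hits]: positive if at least min_hits active x_j >= t_j."""
    *thresholds, min_hits = rule
    hits = sum(1 for xj, tj in zip(x, thresholds) if tj != OFF and xj >= tj)
    return 1 if hits >= min_hits else 0


def fitness(rule: List[int], X, y, penalty: float = 0.01) -> float:
    """Training accuracy minus a small penalty per active condition (rule complexity)."""
    acc = sum(predict(rule, x) == label for x, label in zip(X, y)) / len(y)
    active = sum(1 for t in rule[:-1] if t != OFF)
    return acc - penalty * active


def random_rule(n: int, rng: random.Random) -> List[int]:
    return [rng.randint(LOW, OFF) for _ in range(n)] + [rng.randint(1, n)]


def mutate(rule: List[int], rng: random.Random, rate: float) -> List[int]:
    """Reset each gene to a random legal value with probability `rate`."""
    n = len(rule) - 1
    out = rule[:]
    for i in range(n):
        if rng.random() < rate:
            out[i] = rng.randint(LOW, OFF)
    if rng.random() < rate:
        out[n] = rng.randint(1, n)
    return out


def crossover(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """One-point crossover of two parent rules."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def tournament(pop, scores, rng: random.Random, size: int = 3) -> List[int]:
    """Return the fittest of `size` randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), size)
    return pop[max(picks, key=lambda i: scores[i])]


def evolve_rule(X, y, pop_size=40, generations=50, rate=0.1, elite=2, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [random_rule(n, rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scores = [fitness(r, X, y) for r in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        history.append(scores[ranked[0]])
        nxt = [pop[i][:] for i in ranked[:elite]]  # elitism: copy the best rules unchanged
        while len(nxt) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            nxt.append(mutate(crossover(parent_a, parent_b, rng), rng, rate))
        pop = nxt
    scores = [fitness(r, X, y) for r in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best])
    return pop[best], history


# Synthetic demo: the outcome depends only on the first of three predictors.
data_rng = random.Random(42)
X = [[data_rng.randint(LOW, HIGH) for _ in range(3)] for _ in range(200)]
y = [1 if x[0] >= 6 else 0 for x in X]
best_rule, history = evolve_rule(X, y)
```

An evolved rule such as `[6, OFF, OFF, 1]` reads directly as "positive if the first predictor is at least 6", which is the kind of comprehensible rule the abstract refers to. The per-condition penalty is one simple way to trade accuracy against rule complexity, and elitism guarantees that the best fitness recorded in `history` never decreases.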
© 2015 Springer Science+Business Media, New York
About this protocol
Cite this protocol
Liou, DM., Chang, WP. (2015). Applying Data Mining for the Analysis of Breast Cancer Data. In: Fernández-Llatas, C., García-Gómez, J. (eds) Data Mining in Clinical Medicine. Methods in Molecular Biology, vol 1246. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1985-7_12
DOI: https://doi.org/10.1007/978-1-4939-1985-7_12
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-1984-0
Online ISBN: 978-1-4939-1985-7
eBook Packages: Springer Protocols