Abstract
The rapid growth of Big Data has opened a new era for data analytics and knowledge discovery. Data mining algorithms are used to analyze different types of data that reside in complex information networks, and researchers focus on producing usable knowledge by exploiting opportunities in various domains (e.g., healthcare, social media, energy). Epidemics and disease outbreaks have raised concerns about effective infectious disease management in communities around the world. These concerns encourage the use of AI methods for management and prevention, in order to mitigate disease spread and contain outbreaks. This work engages in predictive analytics, using classification, and in descriptive analytics, using association rule mining and clustering. These techniques are widely used in healthcare and medicine, either for predicting outbreaks or for extracting usable information from healthcare and medical data. Data analysis involves several steps: data extraction, cleaning, preprocessing, transformation, interpretation and evaluation. The experimental part of this chapter uses widely used healthcare-related datasets retrieved from the UCI Machine Learning Repository. This chapter offers a literature review on data mining in epidemics and discusses all the aforementioned concepts in detail. It also presents a complete process/cycle of the steps required to analyze data retrieved from healthcare and medical sources. The research questions addressed can be summarized as follows: Q1. Which types of analytics are pervasive in the domains of medicine and healthcare? Q2. How is data mining performed in the fields of healthcare and medicine? Q3. Which techniques and methods are widely used? These questions are discussed and elaborated through a concise, informative and educational narrative.
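The analysis steps named in the abstract (cleaning, transformation, classification, evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the feature names, the choice of a 1-nearest-neighbour classifier, and the helper names `clean`, `scale`, `predict` and `accuracy` are hypothetical and are not taken from the chapter or from any UCI dataset.

```python
from math import dist

# Hypothetical toy records: (temperature_c, heart_rate, label).
# None marks a missing value. These are NOT data from the chapter.
raw = [
    (36.6, 70, "healthy"),
    (38.9, 110, "infected"),
    (None, 80, "healthy"),
    (39.4, 120, "infected"),
    (36.9, 75, "healthy"),
    (38.5, 105, "infected"),
]

def clean(rows):
    """Cleaning: drop instances that have any missing value."""
    return [r for r in rows if None not in r]

def scale(rows):
    """Transformation: min-max scale each numeric feature to [0, 1]."""
    cols = list(zip(*[r[:-1] for r in rows]))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        (tuple((v - l) / (h - l) for v, l, h in zip(r[:-1], lo, hi)), r[-1])
        for r in rows
    ]

def predict(train, x):
    """Classification: label of the nearest training instance (1-NN)."""
    return min(train, key=lambda item: dist(item[0], x))[1]

def accuracy(train, test):
    """Evaluation: fraction of test instances classified correctly."""
    hits = sum(predict(train, x) == y for x, y in test)
    return hits / len(test)

data = scale(clean(raw))
train, test = data[:3], data[3:]  # naive split, for illustration only
score = accuracy(train, test)
print(f"instances kept: {len(data)}, accuracy: {score:.2f}")
```

For brevity the sketch scales the data before splitting it. In a real study the scaling parameters would normally be computed on the training split only, so that no information from the test instances leaks into the model.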
Abbreviations
- API (Application Programming Interface): A set of functions and protocols that enables data transmission and communication between software applications.
- Data mining algorithms: Mathematical and computational expressions of patterns found in datasets.
- Data point: An observation derived from a set of one or more measurements, presented either numerically or graphically.
- Feature: An attribute or variable of a dataset that can be used for analysis.
- Instance: A subset of the overall dataset or a single row of data.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nousi, C., Belogianni, P., Koukaras, P., Tjortjis, C. (2022). Mining Data to Deal with Epidemics: Case Studies to Demonstrate Real World AI Applications. In: Lim, CP., Vaidya, A., Jain, K., Mahorkar, V.U., Jain, L.C. (eds) Handbook of Artificial Intelligence in Healthcare. Intelligent Systems Reference Library, vol 211. Springer, Cham. https://doi.org/10.1007/978-3-030-79161-2_12
DOI: https://doi.org/10.1007/978-3-030-79161-2_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79160-5
Online ISBN: 978-3-030-79161-2
eBook Packages: Intelligent Technologies and Robotics (R0)