Mining Data to Deal with Epidemics: Case Studies to Demonstrate Real World AI Applications

Handbook of Artificial Intelligence in Healthcare

Abstract

The massive growth of Big Data has kickstarted a new era for data analytics and knowledge discovery. Data mining algorithms are employed to analyze different types of data residing in complex information networks, and researchers focus on producing usable knowledge by taking advantage of opportunities in various domains (e.g., healthcare, social media, and energy). Epidemics and disease outbreaks have raised concerns about effective infectious disease management in communities around the world. These concerns encourage the use of AI methods for management and prevention, in order to mitigate disease spread and contain outbreaks. This work engages in predictive analytics, utilizing classification, and in descriptive analytics, utilizing association rule mining and clustering. These techniques are widely used in healthcare and medicine, either for predicting outbreaks or for extracting usable information from healthcare and medical data. Data analysis involves several steps: data extraction, cleaning, preprocessing, transformation, interpretation, and evaluation. The experimental part of this chapter uses widely used healthcare-related datasets retrieved from the UCI Machine Learning Repository. The chapter offers a literature review on data mining in epidemics and discusses all the aforementioned concepts. It also presents a complete cycle of the steps required to analyze data retrieved from healthcare and medical sources. The research questions addressed can be summarized as follows: Q1. Which are the pervasive types of analytics in the domains of medicine and healthcare? Q2. How is data mining performed in the fields of healthcare and medicine? Q3. Which are the widespread techniques and methods utilized? These questions are discussed and elaborated through a concise, informative, and educational narration.
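As a rough illustration of the analysis steps named in the abstract (cleaning, transformation, classification, evaluation), the following minimal sketch runs them on a tiny made-up dataset. It is not the chapter's method or data: the records, the feature names (`temperature_c`, `cough_days`), the labels, and the choice of a k-nearest-neighbors classifier with leave-one-out evaluation are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy records: (temperature_c, cough_days, label).
# None marks a missing value. These numbers are invented for illustration.
RAW = [
    (36.6, 0, "healthy"),
    (36.9, 1, "healthy"),
    (37.0, 0, "healthy"),
    (None, 2, "healthy"),   # incomplete instance, dropped during cleaning
    (38.5, 4, "infected"),
    (39.1, 6, "infected"),
    (38.8, 5, "infected"),
    (39.4, 7, "infected"),
]

def clean(rows):
    """Cleaning: drop instances that contain a missing value."""
    return [r for r in rows if None not in r]

def min_max_scale(rows):
    """Transformation: rescale every feature to the range [0, 1]."""
    features = [r[:-1] for r in rows]
    lows = [min(col) for col in zip(*features)]
    highs = [max(col) for col in zip(*features)]
    scaled = []
    for r in rows:
        values = tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(r[:-1], lows, highs)
        )
        scaled.append((values, r[-1]))
    return scaled

def knn_predict(train, point, k=3):
    """Classification: majority label among the k nearest training instances."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k=3):
    """Evaluation: hold out each instance in turn and predict it from the rest."""
    hits = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, point, k) == label
    return hits / len(data)

cleaned = clean(RAW)
scaled = min_max_scale(cleaned)
accuracy = leave_one_out_accuracy(scaled, k=3)
print(f"{len(cleaned)} instances kept, leave-one-out accuracy = {accuracy:.2f}")
```

Scaling before the distance computation matters here because the two features have different ranges; without it, the feature with the larger numeric spread would dominate the neighbor search.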

Abbreviations

Term: Definition

API (Application Programming Interface): A set of functions and protocols that enables the data transmission and communication between software applications.

Data mining algorithms: Mathematical and computational expressions of patterns found in datasets.

Data point: An observation derived from a set of one or more measurements, presented either numerically or graphically.

Feature: An attribute or variable of a dataset that can be used for analysis.

Instance: A subset of the overall dataset or a single row of data.

Author information

Corresponding author

Correspondence to Christos Tjortjis.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this chapter

Nousi, C., Belogianni, P., Koukaras, P., Tjortjis, C. (2022). Mining Data to Deal with Epidemics: Case Studies to Demonstrate Real World AI Applications. In: Lim, CP., Vaidya, A., Jain, K., Mahorkar, V.U., Jain, L.C. (eds) Handbook of Artificial Intelligence in Healthcare. Intelligent Systems Reference Library, vol 211. Springer, Cham. https://doi.org/10.1007/978-3-030-79161-2_12
