Abstract
Big Data is one of the most fundamental phenomena shaping the digital society. It is crucial not only to collect and analyze vast amounts of data but also to do so intelligently. We believe this requires a suitable interplay between the knowledge already available in the given application domain (background knowledge) and the knowledge gained inductively from data by means of various data analysis techniques. We call this a knowledge-based approach to data analysis, or intelligent data analysis. In this chapter, we focus on two main types of the knowledge-based approach. We start by introducing the semantic modelling of data analytics processes, which can efficiently cover the explicit form of background knowledge. The main focus here is on the conceptualization of domain knowledge shared between the domain expert and the data scientist, and on the modelling of data mining workflows in order to achieve reproducibility and reusability. The second situation is typical for medical applications, where most of the background knowledge tends to stay tacit. In such a situation, the human-in-the-loop approach is a way to perform data analysis intelligently. For both types of knowledge-based data analysis, specific case studies are presented to show how intelligent data analysis works in practice.
Acknowledgements
This work was supported by the Slovak Research and Development Agency under grants no. APVV-16-0213 and APVV-17-0550.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bednár, P., Paralič, J., Babič, F., Sarnovský, M. (2021). Knowledge-Based Approaches to Intelligent Data Analysis. In: Paralič, J., Sinčák, P., Hartono, P., Mařík, V. (eds) Towards Digital Intelligence Society. DISA 2020. Advances in Intelligent Systems and Computing, vol 1281. Springer, Cham. https://doi.org/10.1007/978-3-030-63872-6_4
DOI: https://doi.org/10.1007/978-3-030-63872-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63871-9
Online ISBN: 978-3-030-63872-6
eBook Packages: Intelligent Technologies and Robotics (R0)