Abstract
Heart disease refers to the condition in which the heart cannot pump the required amount of blood to the entire body. Heart disease (HD) is the leading cause of death worldwide. Early prediction of heart disease can save lives: if cardiovascular or heart disease is predicted in advance, the person can be warned beforehand and death can be prevented. Machine learning (ML) has contributed substantially to distinguishing people with heart disease from the healthy population. This paper proposes three heart disease prediction (HDP) models, namely LOFS-ANN, LOFS-SVM, and LOFS-DT, which combine a lion optimization-based feature selection (LOFS) method with three ML-based classifiers. The datasets used are from the UCI Machine Learning Repository. The comparative analysis shows that LOFS-ANN performs best of the three models, with an AUC of 97.1% and an accuracy of 90.5%. A statistical comparison with the competing models indicates that LOFS-ANN has significant potential for predicting heart disease.
Keywords
- Heart disease
- Artificial neural network (ANN)
- UCI Cleveland
- Feature selection (FS)
- Support vector machine (SVM) and area under the curve (AUC)
1 Introduction
Heart disease (HD) is the leading cause of death worldwide. The WHO reported that cardiovascular diseases caused an estimated 17.7 million deaths worldwide in 2015 [1]. Early prediction of HD can help save lives by enabling timely warnings and precautionary measures. Machine learning (ML) techniques play a crucial role in heart disease prediction (HDP) from previously collected patient data [2], and a wide range of ML techniques is available for developing heart disease predictors [3]. Patient datasets contain numerous attributes, not all of which are useful for predicting heart disease. Feature selection (FS) helps improve prediction accuracy by removing irrelevant and non-contributing attributes [4,5,6,7,8]. Bio-inspired algorithms are gaining popularity for FS [9]. This study uses the lion optimization (LO) algorithm, which is inspired by the social behavior of lions [10]. Lion optimization for feature selection (LOFS) has not yet been applied in the ML-based HDP domain. To keep the research focused, the following research goals are established:
- R1: To identify and report the best ML-based HDP model among the proposed models for predicting heart disease effectively.
- R2: To establish the statistical validity of the results.
The paper is organized as follows. Section 2 discusses the literature related to this study. Section 3 presents the experimental methods and setup. Section 4 reports the experimental results. Section 5 concludes the paper and outlines future work.
2 Literature Review
This section surveys prior work on HDP using machine learning techniques. The survey is summarized in Table 1.
3 Research Methodology
This section briefly describes the research methodology adopted for this work, including the experimental methods and setup.
This work uses three datasets from the UCI repository [15]. The dataset attributes are described in Table 2. Each patient dataset is partitioned into training and testing sets in a 70:30 ratio. The lion optimization algorithm for feature selection (LOFS) [14] is then applied to select the most significant features. The features selected by LOFS for each of the three datasets are listed in Table 3. Only the selected features are then fed to the ML-based classifiers for training. Three well-known classification algorithms [2] are selected for heart disease prediction (HDP): artificial neural network (ANN) [16], support vector machine (SVM) [17], and decision tree (DT) [18, 19]. The performance of all three classifiers is recorded on all three datasets. Figure 1 depicts the proposed experimental model.
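The pipeline steps above (70:30 split, feature selection, training on the selected features only) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the LOFS implementation used in this paper. The binary search below borrows only two ideas, loosely, from the lion optimization algorithm [10]: it moves candidate feature masks toward the best mask found so far ("hunting") and flips a random bit ("roaming"). It omits prides, nomads, mating, and migration. As a further assumption, a mask's fitness is the accuracy of a simple nearest-centroid classifier on an inner validation split of the training data, minus a small per-feature penalty, rather than ANN, SVM, or DT accuracy. All names and parameters (`split_70_30`, `lion_inspired_fs`, `alpha`, `pop_size`, `iters`) and the synthetic data are illustrative.

```python
import random


def split_70_30(X, y, seed=0):
    """Shuffle rows and split them 70:30 into (train, test) parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])


def project(X, mask):
    """Keep only the columns whose mask entry is True."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


def nearest_centroid_accuracy(Xtr, ytr, Xva, yva):
    """Fit one centroid per class on (Xtr, ytr); return accuracy on (Xva, yva)."""
    sums, counts = {}, {}
    for row, label in zip(Xtr, ytr):
        total = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            total[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return sum(predict(row) == label for row, label in zip(Xva, yva)) / len(yva)


def lion_inspired_fs(X, y, pop_size=8, iters=20, alpha=0.01, seed=0):
    """Hypothetical, simplified population-based search over binary feature masks."""
    rng = random.Random(seed)
    n = len(X[0])
    # Inner split of the training data, used only to score candidate masks.
    Xtr, ytr, Xva, yva = split_70_30(X, y, seed)

    def score(mask):
        if not any(mask):
            return float("-inf")  # an empty feature subset is never acceptable
        acc = nearest_centroid_accuracy(project(Xtr, mask), ytr, project(Xva, mask), yva)
        return acc - alpha * sum(mask) / n  # slightly penalize larger subsets

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(iters):
        new_pop = []
        for mask in pop:
            # "Hunting": take each bit from the best mask with probability 0.5.
            cand = [b if rng.random() < 0.5 else a for a, b in zip(mask, best)]
            # "Roaming": flip one randomly chosen bit to keep exploring.
            j = rng.randrange(n)
            cand[j] = not cand[j]
            # Greedy replacement: keep the candidate only if it is no worse.
            new_pop.append(cand if score(cand) >= score(mask) else mask)
        pop = new_pop
        best = max(pop + [best], key=score)
    return best


# Synthetic data: column 0 depends on the label, columns 1-4 are pure noise.
rng = random.Random(1)
X, y = [], []
for i in range(200):
    label = i % 2
    X.append([label + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(4)])
    y.append(label)

X_train, y_train, X_test, y_test = split_70_30(X, y)
mask = lion_inspired_fs(X_train, y_train)  # selection uses the training part only
test_acc = nearest_centroid_accuracy(project(X_train, mask), y_train,
                                     project(X_test, mask), y_test)
print(mask, test_acc)
```

Feature selection runs only on the 70% training part, and the held-out 30% is touched once at the end. This mirrors the order described above and keeps the test set out of the selection loop.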
For performance evaluation, the ROC curve, AUC, and accuracy are considered [2, 3, 11,12,13, 16,17,18,19,20,21].
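Two of these measures, accuracy and AUC, can be computed in plain Python as a small illustration. This sketch assumes binary labels coded as 0/1 and a real-valued score per sample, such as a predicted probability. The decision threshold of 0.5 used for accuracy is also an assumption. AUC is computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This equals the area under the ROC curve. The function names and example values are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(accuracy(y_true, y_pred))  # 0.75
print(auc(y_true, scores))       # 0.75
```

The pairwise loop costs O(P·N) for P positive and N negative samples. That is adequate for datasets of a few hundred records; a rank-based formula scales better for larger data.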
4 Results and Discussion
This section reports the experimental results and the inferences drawn from their analysis.
4.1 Finding the Best ML-Based HDP Model (R1)
LOFS-ANN, LOFS-SVM, and LOFS-DT are compared to find the best performer. First, the AUC values of all candidate models are recorded on all three datasets and reported in Table 4. Next, the accuracy of each model is recorded (see Table 5). LOFS-ANN also performs best on the accuracy criterion. The results are plotted in Fig. 2 for a visual comparison.
To address R1, ROC curves are also used for performance evaluation. The ROC plots for the three datasets, namely the UCI Heart Disease Dataset (Cleveland) [15], UCI Statlog (Heart), and UCI Heart Failure Clinical Dataset, are shown in Figs. 3, 4, and 5, respectively.
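An ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered from the highest score to the lowest. The sketch below assumes 0/1 labels, at least one sample of each class, and real-valued scores. Samples with tied scores are consumed together, so each distinct threshold adds one point. The trapezoidal area under the resulting points equals the AUC. The function names and example values are illustrative.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), starting at (0, 0) and ending at (1, 1)."""
    P = sum(1 for t in y_true if t == 1)
    N = len(y_true) - P
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])  # highest score first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Consume every sample tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / N, tp / P))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```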
The experimental results show that LOFS-ANN predicts heart disease more accurately than the other models.
Response to R1: The proposed LOFS-ANN performs best among the proposed models on all three datasets.
4.2 Statistical Justification (R2)
To obtain statistical evidence, Friedman's test is conducted [20]. The test indicates whether the performance differences observed for goal R1 are statistically significant. It is conducted at a significance level of 5%. The results show that the p-value is less than 0.05 (see Fig. 6), so the differences among the three models are statistically significant. Together with the comparative results in Sect. 4.1, this supports the conclusion that the proposed LOFS-ANN-based HDP model outperforms LOFS-SVM and LOFS-DT.
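Friedman's test ranks the models within each dataset and checks whether the models' rank sums differ more than chance would allow. The sketch below computes the statistic, assuming higher scores are better and using the chi-square approximation with k - 1 degrees of freedom for k models. The closed-form tail probability used here is valid only for an even number of degrees of freedom. That covers the three-model case in this paper (2 degrees of freedom). No tie correction is applied. The score matrix is hypothetical and is not taken from Tables 4 or 5. The function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values so that 1 = largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[d][m] = score of model m on dataset d (higher is better)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    # chi2_F = 12 / (n k (k + 1)) * sum_j R_j^2 - 3 n (k + 1), without tie correction
    chi2 = 12 * sum(R * R for R in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    return chi2, rank_sums


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


# Hypothetical scores: rows = datasets, columns = (ANN, SVM, DT).
scores = [
    [0.95, 0.90, 0.85],
    [0.93, 0.88, 0.80],
    [0.97, 0.91, 0.89],
]
chi2, rank_sums = friedman_statistic(scores)
p_value = chi2_sf_even_df(chi2, df=len(scores[0]) - 1)
print(chi2, rank_sums, p_value)  # 6.0 [3.0, 6.0, 9.0] and p close to 0.0498
```

With only three datasets, the chi-square approximation is coarse. For that reason, exact critical values, or a post-hoc procedure such as the Nemenyi test, are often used alongside it.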
Response to R2: There is statistical evidence supporting the findings of this paper.
5 Conclusion
Heart disease is the leading cause of death worldwide. If it is predicted well in advance and the patient is forewarned, lives can be saved. ML classification algorithms are being used to predict heart disease. The accuracy of a heart disease predictor improves when an appropriate subset of features, those well correlated with the target, is selected from the full feature set. In this paper, the lion optimization-based feature selection (LOFS) method has been used to select the most significant features from three datasets: UCI Heart Disease Dataset (Cleveland), UCI Statlog (Heart), and UCI Heart Failure Clinical Dataset. The resulting data are used to train three classifiers (ANN, SVM, and DT), yielding three HDP models: LOFS-ANN, LOFS-SVM, and LOFS-DT. The performance of these proposed models is then compared. The author concludes that ANN combined with LOFS performs best for heart disease prediction.
In future work, the author proposes to replicate this study on larger clinical datasets to develop more accurate heart disease predictors for the biomedical domain.
References
World Health Organization (WHO) (2017) Cardiovascular diseases (CVDs): key facts. http://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). Accessed 22 Mar 2022
Goyal S (2023) Software measurements with machine learning techniques: a review. Recent Adv Comput Sci Commun 16:1–17. https://doi.org/10.2174/2666255815666220407101922
Safdar S, Zafar S, Zafar N et al (2018) Machine learning based decision support systems (DSS) for heart disease diagnosis: a review. Artif Intell Rev 50:597–623. https://doi.org/10.1007/s10462-017-9552-8
Goyal S (2022) FOFS: firefly optimization for feature selection to predict fault-prone software modules. In: Nanda P, Verma VK, Srivastava S, Gupta RK, Mazumdar AP (eds) Data engineering for smart systems. Lecture Notes in Networks and Systems, vol 238. Springer, Singapore. https://doi.org/10.1007/978-981-16-2641-8_46
Amin MS, Chiam YK, Varathan KD (2019) Identification of significant features and data mining techniques in predicting heart disease. Telemat Inform 36:82–93. https://doi.org/10.1016/j.tele.2018.11.007
Prakash S, Sangeetha K, Ramkumar N (2019) An optimal criterion feature selection method for prediction and effective analysis of heart disease. Cluster Comput 22(s5):11957–11963. https://doi.org/10.1007/s10586-017-1530-z
Gokulnath CB, Shantharajah SP (2019) An optimized feature selection based on genetic approach and support vector machine for heart disease. Cluster Comput 22(s6):14777–14787. https://doi.org/10.1007/s10586-018-2416-4
Darwish A (2018) Bio-inspired computing: algorithms review, deep analysis, and the scope of applications. Future Comput Inform J 3(2):231–246. https://doi.org/10.1016/j.fcij.2018.06.001
Yazdani M, Jolai F (2016) Lion optimization algorithm (LOA): a nature-inspired metaheuristic algorithm. J Comput Design Eng 3(1):24–36. https://doi.org/10.1016/j.jcde.2015.06.003
Haq AU, Li JP, Memon MH, Nazir S, Sun R (2018) A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mobile Inform Syst
Bharti R, Khamparia A, Shabaz M, Dhiman G, Pande S, Singh P (2021) Prediction of heart disease using a combination of machine learning and deep learning. Comput Intell Neurosci
Benhar Charles V, Surendran D, SureshKumar A (2022) Heart disease data based privacy preservation using enhanced ElGamal and ResNet classifier. Biomed Signal Process Control 71(Part B):103185. https://doi.org/10.1016/j.bspc.2021.103185
Fitriyani NL, Syafrudin M, Alfian G, Rhee J (2020) HDPM: an effective heart disease prediction model for a clinical decision support system. IEEE Access 8:133034–133050. https://doi.org/10.1109/ACCESS.2020.3010511
Goyal S (2022) Genetic evolution-based feature selection for software defect prediction using SVMs. J Circuits Syst Comput 31(11):2250161. https://doi.org/10.1142/S0218126622501614
Goyal S (2022) 3PcGE: 3-parent child-based genetic evolution for software defect prediction. Innovations Syst Softw Eng. https://doi.org/10.1007/s11334-021-00427-1
UCI Machine Learning Repository: Heart Disease Data Set. archive.ics.uci.edu. http://archive.ics.uci.edu/ml/datasets/Heart?
Goyal S (2021) Effective software defect prediction using support vector machines (SVMs). Int J Syst Assur Eng Manag. https://doi.org/10.1007/s13198-021-01326-1
Goyal S (2022) Static code metrics-based deep learning architecture for software fault prediction. Soft Comput pp 1–33. https://doi.org/10.1007/s00500-022-07365-5
Goyal S (2021) Predicting the defects using stacked ensemble learner with filtered dataset. Autom Softw Eng 28:14. https://doi.org/10.1007/s10515-021-00285-y
Goyal S (2021) Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction. Artif Intell Rev. https://doi.org/10.1007/s10462-021-10044-w
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Goyal, S. (2023). Predicting the Heart Disease Using Machine Learning Techniques. In: Fong, S., Dey, N., Joshi, A. (eds) ICT Analysis and Applications. Lecture Notes in Networks and Systems, vol 517. Springer, Singapore. https://doi.org/10.1007/978-981-19-5224-1_21
DOI: https://doi.org/10.1007/978-981-19-5224-1_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-5223-4
Online ISBN: 978-981-19-5224-1
eBook Packages: Engineering; Engineering (R0)