Abstract
Living beings are exposed to many hazards over the course of their lives. Owing to its high mortality rate, heart disease (HD) is among the leading hazards. It is one of the world's most critical diseases because of its complex diagnosis and expensive treatment. It has significantly affected the health care sectors of developing as well as developed countries. Inadequate preventive measures, diagnostic shortcomings, inefficient medical support, and a lack of medical staff and advancements have had severe impacts on developing countries. This paper presents the state of the art of intelligent solutions for HD detection, together with an empirical analysis of machine learning algorithms on electrocardiogram-based arrhythmia datasets for disease detection. A critical investigation is performed using eight machine learning algorithms, namely Support Vector Machine, K-Nearest Neighbors, Random Forest, Extra Tree, Bagging, Decision Tree, Linear Regression, and Adaptive Boosting, under imbalanced and balanced class paradigms. The performance of these algorithms is tested with four metrics, namely precision, recall, accuracy, and f1-score. The empirical analysis offers an interesting insight into the structure of the dataset. Initially, in the imbalanced binary class problem, the majority class has higher accuracy than the minority class because the model's training dataset contains far more majority class tuples than minority class tuples. The paper uses the Synthetic Minority Over-sampling Technique for data balancing. Balancing increases not only the overall accuracy of the algorithms but also the individual accuracy of the classes. Hence, the accuracy of the minority class is not sacrificed.
1 Introduction
The heart is an essential organ of the human body, also known as the body's engine room. It is a lattice of muscles that pumps blood through the body; in other words, it is the central processing area of the cardiovascular system [1]. The cardiovascular system is a complex blood circulation network composed of blood vessels (i.e., arteries), capillaries, and veins [2]. Any abnormality or obstruction in healthy blood flow or circulation may lead to several severe heart disorders or diseases. These are commonly known as cardiovascular diseases (CVDs) and remain among the deadliest diseases in the world. CVDs comprise various diseases such as brain vascular diseases, heart diseases, and blood vessel diseases [3]. The Global Atlas on Cardiovascular Disease Prevention and Control of the World Health Organization (WHO) reports that CVDs are one of the foremost causes of disability and death across the globe [4].
Several reports from the WHO indicate an increase in CVDs worldwide, which is an alarming sign. In 2012, approximately 17.5 million people died from CVDs globally, more than from any other life-threatening disease [5]. The WHO also reports that around 17.9 million people die from CVDs each year, which is 31% of all deaths worldwide. Of these deaths, 85% are due to either heart attack or stroke [6].
CVDs can be categorized into several types, some of which are listed below [7]:
1. Coronary Heart Disease: This disease is caused by blockage or damage in the main blood vessel.
2. Heart Failure: In this condition, the heart struggles to pump sufficient blood.
3. Cardiomyopathy: It is a hereditary or acquired heart muscle disease.
4. Hypertensive Heart Disease: This disease is caused by high blood pressure (BP).
5. Ischemic Heart Disease: This disease narrows the heart arteries; thus, only a minimal amount of oxygen and blood reaches the heart.
6. Valvular Heart Disease: This disease is caused by defects or damage in any of the heart valves.
7. Inflammatory Heart Disease: This disease is caused by bacterial or viral infections.
In the medical domain, machine learning plays an important role in knowledge extraction and medical data analysis. Its large-scale data handling and diverse processing capabilities make it a major tool for dealing with complex problems in both real-time and offline scenarios [8,9,10,11]. For heart disease detection, algorithms such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Naive Bayes (NB), Extra Tree (ET), Bagging (BAG), Decision Tree (DT), Linear Regression (LR), Adaptive Boosting (ADB), Linear Discriminant Analysis (LDA), Convolutional Neural Network (CNN), Quadratic Discriminant Analysis (QDA), Multi-layer Perceptron (MLP), Ensemble Classifiers, Artificial Neural Network (ANN), Boosting, and so on have been used. Apart from these, various hybrid models have also been introduced, and they have achieved great success [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60].
Accurate diagnosis and prediction are vital issues not only for hospitals but also for practitioners. These issues should be taken care of while building a smart healthcare solution. Developments in computing technologies have enabled various facilities for real-time data collection and storage. Thus, a huge amount of health data is collected, which is very useful for clinical investigations.
The principal objectives of this study are:
- To critically analyze and summarize the state-of-the-art research articles on heart disease detection over ECG datasets.
- To critically investigate the empirical analysis of machine learning algorithms on imbalanced electrocardiogram-based arrhythmia datasets for heart disease detection.
- To find out the impact of class balancing on the performance of machine learning algorithms under the imbalanced class paradigm.
- To give a machine learning-based direction toward robotic or smart machine-based solutions for social well-being.
The rest of this study is organized as follows. Section 2 presents the state-of-the-art articles on the detection of heart diseases over ECG datasets. Section 3 discusses the materials and methods used for the preliminary study, namely the dataset description, model setup, and statistical analysis. Section 4 presents the results of the experimental evaluation. A comprehensive discussion of the empirical analysis is given in Sect. 5. Finally, Sect. 6 gives the concluding comments and the anticipated scope.
2 Related Work
The emergence of IoT, with its massive data coverage and functionality, encourages researchers to work in the healthcare domain. It also motivates big companies to build health-centric solutions for human well-being. From the algorithmic perspective, massive development has been seen in the last couple of decades. From time to time, various researchers have proposed possible solutions for making the healthcare domain smart, but plenty of scope is still left. This scope calls for improvements from both the design and the algorithmic perspectives. Such improvements will help not only in building a smarter world but also in taking the healthcare domain to new heights by enabling smarter robotics-based solutions.
The state-of-the-art articles on heart disease detection over ECG datasets are summarized in Table 1 [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60].
Table 1 gives a quick insight into the state-of-the-art methods with respect to year, research objectives, methods used, datasets and their types, accuracy of the methods, and their respective references. It is clear from the literature that plenty of machine learning algorithms, such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Naive Bayes (NB), Extra Tree (ET), Bagging (BAG), Decision Tree (DT), Linear Regression (LR), Adaptive Boosting (ADB), Linear Discriminant Analysis (LDA), Convolutional Neural Network (CNN), Quadratic Discriminant Analysis (QDA), Multi-layer Perceptron (MLP), Ensemble Classifiers, Artificial Neural Network (ANN), Boosting, and various hybrid models, have been used for the classification of heart disease [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60]. These methods are applied either to ECG signals or to numerical datasets. It is evident from the literature that this is an active area of research with plenty of opportunities. Thus, there is still a lot of scope for finding more effective methods with clear interpretations. Keeping this in mind, we present an empirical examination of the state-of-the-art algorithms.
3 Materials and Methods
This section describes the materials and methods used for the preliminary study. It consists of three subsections, which discuss the dataset description, model setup, and statistical parameters, respectively.
3.1 Data
For this empirical investigation, two open-source electrocardiogram (ECG) datasets have been used. The first dataset has been extracted from PhysioNet's repository under the MIT-BIH arrhythmia database [61]. This dataset has been recorded using nine electrodes (E1-E9) mounted on various locations of the human body. The second dataset, named Heart Disease Data, has been extracted from the UCI (University of California-Irvine) repository. It consists of four data repositories (Cleveland, Hungary, Switzerland, and the VA Long Beach) and is also an arrhythmia dataset (Coronary Artery Disease) [62]. This dataset has been recorded using thirteen parameters, i.e., AGE, SEX, CP, TRESTBPS, CHOL, FBS, RESTECG, THALACH, EXANG, OLDPEAK, SLOPE, CA, and THAL. The first dataset (PhysioNet's arrhythmia dataset) consists of 74,501 instances of 9 attributes, whereas the second dataset (UCI's arrhythmia dataset) contains 403 instances of 14 attributes.
In Figs. 1 and 2, the visualization of the PhysioNet’s arrhythmia dataset and UCI's arrhythmia dataset has been exhibited, respectively. Both the datasets have been used for heart disease detection purposes.
The class-wise distribution of PhysioNet's arrhythmia dataset (based on nine attributes) and UCI's arrhythmia dataset (based on thirteen attributes) is exhibited in Tables 2 and 3, respectively. These tables give the class-wise summary of all relevant attributes in terms of minimum, maximum, mean, and standard deviation.
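A class-wise summary of this kind can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the function name `classwise_summary` and the toy rows are hypothetical and are not values from either dataset, and the sample standard deviation (n - 1 denominator) is assumed because the text does not say which variant was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def classwise_summary(rows, labels):
    """Return {label: [(min, max, mean, std), ...]}, one tuple per attribute."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    summary = {}
    for label, members in grouped.items():
        columns = list(zip(*members))  # transpose: one tuple per attribute
        summary[label] = [
            # Assumption: sample standard deviation; 0.0 for a single sample.
            (min(col), max(col), mean(col), stdev(col) if len(col) > 1 else 0.0)
            for col in columns
        ]
    return summary


# Hypothetical toy data: two attributes, two classes (not taken from the paper).
rows = [(1.0, 10.0), (3.0, 14.0), (2.0, 20.0), (6.0, 30.0)]
labels = [0, 0, 1, 1]
summary = classwise_summary(rows, labels)
```

Each entry of `summary[label]` corresponds to one attribute column, in the same order as the columns of `rows`.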
In Figs. 3 and 4, the histograms of PhysioNet's and UCI's arrhythmia datasets are shown. These histograms help in understanding the distribution of each dataset in terms of its spread and shape over the real-time samples (signal/data). This information is beneficial for further investigation.
3.2 Model Setup
The setup of the empirical analysis of machine learning algorithms on the electrocardiogram-based arrhythmia datasets is exhibited in Fig. 5.
This critical analysis consists of six fundamental steps.
- In step 1, the ECG-based arrhythmia dataset (from the PhysioNet and UCI repositories) is supplied as input.
- In step 2, the dataset is cleaned to eliminate missing values and unusual objects.
- In step 3, class balancing is performed using the Synthetic Minority Over-sampling Technique (SMOTE).
- In step 4, the class-balanced, preprocessed data are fed as input to the machine learning algorithms (i.e., SVM, KNN, RF, ET, BAG, DT, LR, and ADB) with a tenfold cross-validation method.
- In step 5, the statistical evaluation of all the machine learning algorithms used is computed.
- In step 6, the output of the machine learning algorithms is obtained.
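The tenfold cross-validation in step 4 can be illustrated with a small index splitter. This is a minimal sketch and not the authors' implementation: the function name `stratified_kfold` is hypothetical, and stratification (keeping the class ratio roughly equal in every fold) is an assumption, since the text only states that tenfold cross-validation is used.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


# Hypothetical imbalanced labels: 80 majority and 20 minority samples.
labels = [0] * 80 + [1] * 20
splits = list(stratified_kfold(labels, k=10))
```

Each of the ten splits holds out a different tenth of the data for evaluation while the remaining nine tenths are used for training.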
3.2.1 Data Balancing Using SMOTE
Class imbalance is a well-known and vital issue that may influence the performance of machine learning algorithms. This empirical analysis has been conducted to find out the impact of class imbalance on the performance of various machine learning algorithms. From this analysis, we have seen that in the binary class problem the majority class has higher accuracy than the minority class, because most of the training samples come from the majority class and only a small number come from the minority class. Thus, there is a significant need for an effective method that can manage the issue of class imbalance. In this context, SMOTE (Synthetic Minority Oversampling Technique) [63, 64] has been used to deal with the class imbalance problem on the ECG-based arrhythmia datasets (from the PhysioNet and UCI repositories).
The class balancing outcomes for PhysioNet's arrhythmia dataset and the UCI arrhythmia dataset are exhibited in Tables 4 and 5, respectively. These tables give, class by class, the SMOTE percentage and the total number of samples at each setting.
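The core idea of SMOTE [63] is to create each synthetic minority sample by interpolating between a minority sample and one of its nearest minority neighbours. The snippet below is a simplified sketch of that idea only, not the implementation used in this study: the function name `smote`, the toy points, and the parameter values are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself),
        # ranked by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples with two attributes each.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.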
3.2.2 Hyper-tuning of Machine Learning Algorithms
Hyper-tuning is one of the best ways to select the hyperparameters of machine learning algorithms. It allows various hyperparameters to be evaluated under various settings to find the best-suited values for the given problem [65, 66]. Therefore, we have hyper-tuned each machine learning algorithm to find its best parameters. These parameters have been used to train the machine learning models.
In Table 6, the hyperparameters and their selection criteria are shown. Based on these selection criteria, the best hyperparameters have been chosen and used in the experimental evaluation.
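One common way to evaluate hyperparameters under various settings is an exhaustive grid search. The sketch below assumes that strategy purely for illustration, since the text does not name the search procedure; the grid, the parameter names, and the stand-in scoring function `toy_score` are all hypothetical and would be replaced by a cross-validated model score in practice.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) over every combination in param_grid."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a KNN-like model (not the settings of Table 6).
param_grid = {"n_neighbors": [1, 3, 5, 7], "weights": ["uniform", "distance"]}


def toy_score(params):
    """Stand-in for a cross-validated accuracy; peaks at n_neighbors = 5."""
    bonus = 0.01 if params["weights"] == "distance" else 0.0
    return 0.9 - 0.01 * abs(params["n_neighbors"] - 5) + bonus


best_params, best_score = grid_search(param_grid, toy_score)
```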
3.3 Statistical Analysis
To validate the evaluation results of the machine learning algorithms, four statistical measures, i.e., precision, f1-score, recall, and accuracy, have been utilized. These statistical measures play an important role in establishing the accuracy and suitability of machine learning algorithms [67]. The mathematical formulation of these measures is given in Eqs. 1, 2, 3 and 4, respectively.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\text{F1-score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
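The four measures follow directly from the confusion-matrix counts using their standard definitions. The following is a minimal sketch; the function name `classification_metrics` and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, f1-score and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts for one class of a binary problem.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Zero denominators are mapped to 0.0 here as a convention for the sketch; other conventions (for example, leaving the value undefined) are equally possible.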
4 Results
To obtain the results, a critical analysis has been performed using eight machine learning algorithms (i.e., Support Vector Machine, K-Nearest Neighbors, Random Forest, Extra Tree, Bagging, Decision Tree, Linear Regression, and Adaptive Boosting) with a tenfold cross-validation method, as shown in Fig. 6.
To understand the correlation among the attributes of the arrhythmia datasets from the PhysioNet (74,501 instances of 9 attributes) and UCI (403 instances of 14 attributes) repositories, the correlation coefficients for both datasets have been calculated and are shown in Figs. 7 and 8, respectively.
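A pairwise correlation matrix of this kind can be computed as sketched below. The Pearson coefficient is assumed here because the text does not specify which correlation coefficient was used; the function names are hypothetical, and the toy values attached to the UCI attribute names are illustrative only and are not taken from the dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict {a: {b: r_ab}} over all pairs of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy values (not real records from the UCI data).
columns = {
    "AGE": [40, 50, 60, 70],
    "THALACH": [180, 170, 150, 140],
    "CHOL": [200, 240, 210, 250],
}
matrix = correlation_matrix(columns)
```

The sketch assumes that no column is constant, since a constant column has zero variance and would make the coefficient undefined.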
This critical investigation has been conducted in two parts. In the first part, a critical analysis of the hyper-tuned results of all eight machine learning algorithms on the ECG-based arrhythmia datasets (from the PhysioNet and UCI repositories) has been performed. In the second part, class balancing (using SMOTE) and hyper-tuning are applied together to the same arrhythmia datasets, to find out the impact of class balancing on the results. The performance of all algorithms has been evaluated using four validation measures, i.e., accuracy, recall, f1-score, and precision. This empirical investigation aims to find out the impact of class balancing on the performance of machine learning algorithms, which will be very helpful in developing a machine learning-based direction toward robotic or smart machine-based solutions for social well-being.
Tables 7, 8, 9 and 10 present the performance evaluation results of the eight machine learning algorithms under different validation criteria (cross-validation policy, i.e., 1-fold, 3-fold, 5-fold, and 10-fold) based on four performance evaluators (i.e., precision, f1-score, recall, accuracy) for the ECG-based arrhythmia datasets (from the PhysioNet and UCI repositories).
From this empirical analysis, it is clear that in the imbalanced binary class problem the majority class achieves higher accuracy than the minority class, because most of the training samples come from the majority class and only a small number come from the minority class. After class balancing with SMOTE, however, not only the overall accuracy of the algorithms but also the individual accuracy of the classes increases. Therefore, the accuracy of the minority class is not sacrificed.
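The gap between overall accuracy and minority-class accuracy can be made concrete with a small example. The labels and predictions below are hypothetical and only illustrate the effect; they are not results from Tables 7 to 10, and the function name `per_class_accuracy` is an assumption.

```python
def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of samples of that label predicted correctly}."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical imbalanced test set: 90 majority (0) and 10 minority (1) samples.
y_true = [0] * 90 + [1] * 10
# A model biased toward the majority class: it finds only 2 of 10 minority cases.
y_pred = [0] * 90 + [0] * 8 + [1] * 2

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_class = per_class_accuracy(y_true, y_pred)
```

In this toy case the overall accuracy is 0.92, while the minority class is recognized correctly only 20% of the time, which is why class-wise results are reported alongside overall accuracy.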
Table 7 shows the hyper-tuned results of all eight machine learning algorithms on the PhysioNet’s ECG-based arrhythmia dataset, whereas Table 8 shows the hyper-tuned results of all eight machine learning algorithms after class balancing. Similarly, Table 9 shows the hyper-tuned results of all eight machine learning algorithms on UCI's ECG-based arrhythmia dataset, whereas Table 10 shows the hyper-tuned results of all eight machine learning algorithms after class balancing.
5 Discussion
The class-wise performance results of the eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) under different validation criteria (cross-validation policy, i.e., 1-fold, 3-fold, 5-fold, and 10-fold) based on four performance evaluators (i.e., precision, f1-score, recall, accuracy) for the ECG-based arrhythmia datasets (from the PhysioNet and UCI repositories) are presented in Figs. 9, 10, 11 and 12.
Figure 9 shows the class 1 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on the PhysioNet’s ECG-based arrhythmia dataset, whereas Fig. 10 shows the class 2 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on the PhysioNet’s ECG-based arrhythmia dataset.
Similarly, Fig. 11 shows the class 0 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on UCI's ECG-based arrhythmia dataset, whereas Fig. 12 shows the class 1 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on UCI's ECG-based arrhythmia dataset.
In Fig. 13, the average accuracy with a tenfold cross-validation policy of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) over the ECG-based arrhythmia dataset of PhysioNet and UCI has been presented.
6 Conclusion
The growing rate of heart illnesses has become a serious matter of concern across the globe. This concern has drawn the attention of researchers seeking to reduce the mortality and morbidity rates of heart diseases. From time to time, various researchers have proposed algorithmic solutions for heart data analysis. These solutions play a vital role not only in building smart robotic solutions but also in minimizing the impact of the diseases through effective decision making. This empirical investigation aims to critically analyze and summarize the state-of-the-art articles on heart disease detection. Apart from this, we have also performed an empirical analysis of machine learning algorithms using electrocardiogram datasets to find out the impact of class balancing on their performance, which gives a machine learning-based direction toward robotic or smart machine-based solutions. From this experimental analysis, it is clear that in the imbalanced binary class problem the majority class achieves higher accuracy than the minority class, because most of the training samples come from the majority class and only a small number come from the minority class. After class balancing with the Synthetic Minority Over-sampling Technique (SMOTE), however, not only the overall accuracy of the algorithms but also the individual accuracy of the classes increases. Therefore, the accuracy of the minority class is not sacrificed. Thus, class imbalance is one of the vital issues that must be taken care of before building medical solutions, where everything depends on the accuracy of the algorithm.
In the future, this critical analysis will be extended further from both the algorithmic and the data perspectives.
References
Nashif, S.; Raihan, M.R.; Islam, M.R.; Imam, M.H.: Heart disease detection by using machine learning algorithms and a real-time cardiovascular health monitoring system. World J. Eng. Technol. 6(4), 854–873 (2018)
Stefanovska, A.: Physics of the human cardiovascular system. Contemp. Phys. 40(1), 31–55 (1999)
Mendis, S.; Puska, P.; Norrving, B.; World Health Organization: Global atlas on cardiovascular disease prevention and control. World Health Organization, Geneva (2011)
Najafi, F.; Jamrozik, K.; Dobson, A.J.: Understanding the ‘epidemic of heart failure’: a systematic review of trends in determinants of heart failure. Eur. J. Heart Fail. 11(5), 472–479 (2009)
World Health Organization: Hearts: technical package for cardiovascular disease management in primary health care (2020)
World Health Organization: Global action plan for the prevention and control of noncommunicable diseases 2013–2020 (2013)
Nikhar, S.; Karandikar, A.M.: Prediction of heart disease using machine learning algorithms. Int. J. Adv. Eng. Manag. Sci. 2(6), 239484 (2016)
Ketu, S.; Mishra, P.K.: Hybrid classification model for eye state detection using electroencephalogram signals. Cogn. Neurodyn. (2021). https://doi.org/10.1007/s11571-021-09678-x
Ketu, S.; Mishra, P.K.: Performance analysis of machine learning algorithms for IoT-based human activity recognition. In: Advances in Electrical and Computer Technologies (pp. 579–591). Springer, Singapore (2020)
Ketu, S.; Mishra, P.K.: Enhanced Gaussian process regression-based forecasting model for COVID-19 outbreak and significance of IoT for its detection. Appl. Intell. 51(3), 1492–1512 (2021)
Ketu, S.; Mishra, P.K.: Scalable kernel-based SVM classification algorithm on imbalance air quality data for proficient healthcare. Complex Intell. Syst. (2021). https://doi.org/10.1007/s40747-021-00435-5
Yu, S.N.; Lee, M.Y.: Bispectral analysis and genetic algorithm for congestive heart failure recognition based on heart rate variability. Comput. Biol. Med. 42(8), 816–825 (2012)
Martis, R.J.; Acharya, U.R.; Mandana, K.M.; Ray, A.K.; Chakraborty, C.: Application of principal component analysis to ECG signals for automated diagnosis of cardiac health. Expert Syst. Appl. 39(14), 11792–11800 (2012)
Pal, D.; Mandana, K.M.; Pal, S.; Sarkar, D.; Chakraborty, C.: Fuzzy expert system approach for coronary artery disease screening using clinical parameters. Knowl.-Based Syst. 36, 162–174 (2012)
Yu, S.N.; Lee, M.Y.: Conditional mutual information-based feature selection for congestive heart failure recognition using heart rate variability. Comput. Methods Programs Biomed. 108(1), 299–309 (2012)
Kim, J.K.; Lee, J.S.; Park, D.K.; Lim, Y.S.; Lee, Y.H.; Jung, E.Y.: Adaptive mining prediction model for content recommendation to coronary heart disease patients. Clust. Comput. 17(3), 881–891 (2014)
Melillo, P.; De Luca, N.; Bracale, M.; Pecchia, L.: Classification tree for risk assessment in patients suffering from congestive heart failure via long-term heart rate variability. IEEE J. Biomed. Health Inform. 17(3), 727–733 (2013)
Lainscsek, C.; Sejnowski, T.J.: Electrocardiogram classification using delay differential equations. Chaos Interdiscip J. Nonlinear Sci. 23(2), 023132 (2013)
Mašetic, Z.; Subasi, A.: Detection of congestive heart failures using c4.5 decision tree. Southeast Eur. J. Soft Comput. 2(2), 74 (2013)
Guidi, G.; Pettenati, M.C.; Melillo, P.; Iadanza, E.: A machine learning system to improve heart failure patient assistance. IEEE J. Biomed. Health Inform. 18(6), 1750–1756 (2014)
Liu, G.; Wang, L.; Wang, Q.; Zhou, G.; Wang, Y.; Jiang, Q.: A new approach to detect congestive heart failure using short-term heart rate variability measures. PLoS ONE 9(4), e93399 (2014)
Vafaie, M.H.; Ataei, M.; Koofigar, H.R.: Heart diseases prediction based on ECG signals’ classification using a genetic-fuzzy system and dynamical model of ECG signals. Biomed. Signal Process. Control 14, 291–296 (2014)
Long, N.C.; Meesad, P.; Unger, H.: A highly accurate firefly based algorithm for heart disease prediction. Expert Syst. Appl. 42(21), 8221–8231 (2015)
Tay, D.; Poh, C.L.; Kitney, R.I.: A novel neural-inspired learning algorithm with application to clinical risk prediction. J. Biomed. Inform. 54, 305–314 (2015)
Acharya, U.R.; Fujita, H.; Sudarshan, V.K.; Sree, V.S.; Eugene, L.W.J.; Ghista, D.N.; San Tan, R.: An integrated index for detection of sudden cardiac death using discrete wavelet transform and nonlinear features. Knowl.-Based Syst. 83, 149–158 (2015)
Abdar, M.; Kalhori, S.R.N.; Sutikno, T.; Subroto, I.M.I.; Arji, G.: Comparing performance of data mining algorithms in prediction heart diseases. Int. J. Electr. Comput. Eng. 5(6), 1569–1576 (2015)
Saxena, K.; Sharma, R.: Efficient heart disease prediction system. Procedia Comput. Sci. 85, 962–969 (2016)
Samuel, O.W.; Asogbon, G.M.; Sangaiah, A.K.; Fang, P.; Li, G.: An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst. Appl. 68, 163–172 (2017)
Bashir, S.; Qamar, U.; Khan, F.H.: IntelliHealth: a medical decision support application using a novel weighted multi-layer classifier ensemble framework. J. Biomed. Inform. 59, 185–200 (2016)
Fujita, H.; Acharya, U.R.; Sudarshan, V.K.; Ghista, D.N.; Sree, S.V.; Eugene, L.W.J.; Koh, J.E.: Sudden cardiac death (SCD) prediction based on nonlinear heart rate variability features and SCD index. Appl. Soft Comput. 43, 510–519 (2016)
Taslimitehrani, V.; Dong, G.; Pereira, N.L.; Panahiazar, M.; Pathak, J.: Developing EHR-driven heart failure risk prediction models using CPXR (Log) with the probabilistic loss function. J. Biomed. Inform. 60, 260–269 (2016)
Weng, C.H.; Huang, T.C.K.; Han, R.P.: Disease prediction with different types of neural network classifiers. Telematics Inform. 33(2), 277–292 (2016)
Altan, G.; Kutlu, Y.; Allahverdi, N.: A new approach to early diagnosis of congestive heart failure disease by using Hilbert-Huang transform. Comput. Methods Programs Biomed. 137, 23–34 (2016)
Masetic, Z.; Subasi, A.: Congestive heart failure detection using random forest classifier. Comput. Methods Programs Biomed. 130, 54–64 (2016)
Leema, N.; Nehemiah, H.K.; Kannan, A.: Neural network classifier optimization using differential evolution with global information and back propagation algorithm for clinical datasets. Appl. Soft Comput. 49, 834–844 (2016)
Arabasadi, Z.; Alizadehsani, R.; Roshanzamir, M.; Moosaei, H.; Yarifard, A.A.: Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm. Comput. Methods Programs Biomed. 141, 19–26 (2017)
Dolatabadi, A.D.; Khadem, S.E.Z.; Asl, B.M.: Automated diagnosis of coronary artery disease (CAD) patients using optimized SVM. Comput. Methods Programs Biomed. 138, 117–126 (2017)
Tayefi, M.; Tajfard, M.; Saffar, S.; Hanachi, P.; Amirabadizadeh, A.R.; Esmaeily, H.; Taghipour, A.; Ferns, G.A.; Moohebati, M.; Ghayour-Mobarhan, M.: hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm. Comput. Methods Programs Biomed. 141, 105–109 (2017)
Mustaqeem, A.; Anwar, S.M.; Khan, A.R.; Majid, M.: A statistical analysis based recommender model for heart disease patients. Int. J. Med. Inform. 108, 134–145 (2017)
Mahajan, R.; Viangteeravat, T.; Akbilgic, O.: Improved detection of congestive heart failure via probabilistic symbolic pattern recognition and heart rate variability metrics. Int. J. Med. Inform. 108, 55–63 (2017)
Sudarshan, V.K.; Acharya, U.R.; Oh, S.L.; Adam, M.; Tan, J.H.; Chua, C.K.; Chua, K.P.; San Tan, R.: Automated diagnosis of congestive heart failure using dual tree complex wavelet transform and statistical features extracted from 2 s of ECG signals. Comput. Biol. Med. 83, 48–58 (2017)
Zhang, J.; Lafta, R.L.; Tao, X.; Li, Y.; Chen, F.; Luo, Y.; Zhu, X.: Coupling a fast fourier transformation with a machine learning ensemble model to support recommendations for heart disease patients in a telehealth environment. IEEE Access 5, 10674–10685 (2017)
Mokeddem, S.A.: A fuzzy classification model for myocardial infarction risk assessment. Appl. Intell. 48(5), 1233–1250 (2018)
Boon, K.H.; Khalil-Hani, M.; Malarvili, M.B.: Paroxysmal atrial fibrillation prediction based on HRV analysis and non-dominated sorting genetic algorithm III. Comput. Methods Programs Biomed. 153, 171–184 (2018)
Zheng, Y.; Guo, X.; Qin, J.; Xiao, S.: Computer-assisted diagnosis for chronic heart failure by the analysis of their cardiac reserve and heart sound characteristics. Comput. Methods Programs Biomed. 122(3), 372–383 (2015)
Rasmy, L.; Wu, Y.; Wang, N.; Geng, X.; Zheng, W.J.; Wang, F.; Wu, H.; Xu, H.; Zhi, D.: A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set. J. Biomed. Inform. 84, 11–16 (2018)
Aborokbah, M.M.; Al-Mutairi, S.; Sangaiah, A.K.; Samuel, O.W.: Adaptive context aware decision computing paradigm for intensive health care delivery in smart cities—a case analysis. Sustain. Cities Soc. 41, 919–924 (2018)
Pławiak, P.: Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system. Expert Syst. Appl. 92, 334–349 (2018)
Tan, J.H.; Hagiwara, Y.; Pang, W.; Lim, I.; Oh, S.L.; Adam, M.; Tan, R.S.; Chen, M.; Acharya, U.R.: Application of stacked convolutional and long short-term memory network for accurate identification of CAD ECG signals. Comput. Biol. Med. 94, 19–26 (2018)
Bozkurt, B.; Germanakis, I.; Stylianou, Y.: A study of time-frequency features for CNN-based automatic heart sound classification for pathology detection. Comput. Biol. Med. 100, 132–143 (2018)
Miao, F.; Cai, Y.P.; Zhang, Y.X.; Fan, X.M.; Li, Y.: Predictive modeling of hospital mortality for patients with heart failure by using an improved random survival forest. IEEE Access 6, 7244–7253 (2018)
Dominguez-Morales, J.P.; Jimenez-Fernandez, A.F.; Dominguez-Morales, M.J.; Jimenez-Moreno, G.: Deep neural networks for the recognition and classification of heart murmurs using neuromorphic auditory sensors. IEEE Trans. Biomed. Circuits Syst. 12(1), 24–34 (2017)
Jin, B.; Che, C.; Liu, Z.; Zhang, S.; Yin, X.; Wei, X.: Predicting the risk of heart failure with EHR sequential data modeling. IEEE Access 6, 9256–9261 (2018)
Yahaya, L.; Oye, N.D.; Garba, E.J.: A Comprehensive review on heart disease prediction using data mining and machine learning techniques. Am. J. Artif. Intell. 4(1), 20–29 (2020)
Subhadra, K.; Vikas, B.: Neural network based intelligent system for predicting heart disease. Int. J. Innov. Technol. Exploring Eng. (IJITEE) 8(5), 484–487 (2019)
Ayatollahi, H.; Gholamhosseini, L.; Salehi, M.: Predicting coronary artery disease: a comparison between two data mining algorithms. BMC Public Health 19(1), 1–9 (2019)
Padmanabhan, M.; Yuan, P.; Chada, G.; Nguyen, H.V.: Physician-friendly machine learning: A case study with cardiovascular disease risk prediction. J. Clin. Med. 8(7), 1050 (2019)
Lakshmanarao, A.; Swathi, Y.; Sri, P.; Sundareswar, S.: Machine learning techniques for heart disease prediction. Int. J. Sci. Technol. Res. 8(11), 374–377 (2019)
Reddy, P.K.; Reddy, T.S.; Balakrishnan, S.; Basha, S.M.; Poluru, R.K.: Heart disease prediction using machine learning algorithm. Int. J. Innov. Technol. Explor. Eng. 8(10), 2603–2606 (2019)
Annepu, D.; Gowtham, G.: Cardiovascular disease prediction using machine learning techniques. Int. Res. J. Eng. Technol. 6(4), 3963–3971 (2019)
MIT-BIH Arrhythmia Database Available Online: https://www.physionet.org/physiobank/database/mitdb/
Heart Disease Data Set Available Online: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V.: SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 61, 863–905 (2018)
Bardenet, R.; Brendel, M.; Kégl, B.; Sebag, M.: Collaborative hyperparameter tuning. In: International Conference on Machine Learning, pp. 199–207 (2013)
Yogatama, D.; Mann, G.: Efficient transfer learning method for automatic hyperparameter tuning. In: Artificial Intelligence and Statistics, pp. 1077–1085 (2014)
Goutte, C.; Gaussier, E.: A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: European Conference on Information Retrieval, pp. 345–359. Springer, Berlin (2005)
Cite this article
Ketu, S., Mishra, P.K. Empirical Analysis of Machine Learning Algorithms on Imbalance Electrocardiogram Based Arrhythmia Dataset for Heart Disease Detection. Arab J Sci Eng 47, 1447–1469 (2022). https://doi.org/10.1007/s13369-021-05972-2
DOI: https://doi.org/10.1007/s13369-021-05972-2