Empirical Analysis of Machine Learning Algorithms on Imbalance Electrocardiogram Based Arrhythmia Dataset for Heart Disease Detection

Ketu, Shwet; Mishra, Pramod Kumar

doi:10.1007/s13369-021-05972-2


Research Article-Computer Engineering and Computer Science
Published: 15 July 2021

Volume 47, pages 1447–1469, (2022)

Arabian Journal for Science and Engineering


Shwet Ketu¹ &
Pramod Kumar Mishra¹


Abstract

Living beings are exposed to many hazards over the course of their lives. Owing to its high mortality rate, heart disease (HD) is among the leading ones. It is one of the world's most critical diseases because of its complex diagnosis and expensive treatment, and it affects the health care sectors of developing and developed countries alike. Inadequate preventive measures, diagnostic shortcomings, inefficient medical support, and a lack of medical staff and advancements have had severe impacts on developing countries. This paper reviews state-of-the-art intelligent solutions for HD detection and presents an empirical analysis of machine learning algorithms on electrocardiogram-based arrhythmia datasets for disease detection. A critical investigation is performed using eight machine learning algorithms (Support Vector Machine, K-Nearest Neighbors, Random Forest, Extra Tree, Bagging, Decision Tree, Linear Regression, and Adaptive Boosting) under imbalanced and balanced class paradigms. The performance of these algorithms is tested with four metrics: precision, recall, accuracy, and f1-score. The empirical analysis offers an interesting insight into the structure of the dataset. Initially, in the binary class problem, the majority class has higher accuracy than the minority class because the model's training dataset contains many more majority-class tuples than minority-class tuples. The paper uses the Synthetic Minority Over-sampling Technique for data balancing. This increased not only the overall accuracy of the algorithms but also the individual accuracy of each class. Hence, the accuracy of the minority class is not sacrificed.


1 Introduction

The heart is an essential organ of the human body, often described as its engine room. It is a lattice of muscles that pumps blood through the body; in other words, it is the central processing area of the cardiovascular system [1]. The cardiovascular system is a complex network of blood circulation composed of blood vessels (i.e., arteries), capillaries, and veins [2]. Any abnormality or obstruction in healthy blood flow may lead to severe complications in the form of heart disorders or diseases. These are commonly known as cardiovascular diseases (CVDs) and remain among the deadliest diseases worldwide. CVDs comprise various diseases, such as brain vascular diseases, heart diseases, and blood vessel diseases [3]. The World Health Organization (WHO) Global Atlas on Cardiovascular Disease Prevention and Control reports that CVDs are one of the foremost causes of disability and death across the globe [4].

Several WHO reports have indicated a worldwide increase in CVDs, which is a very dangerous sign. Approximately 17.5 million people died from CVDs in 2012, more than from any other life-threatening disease [5]. WHO also reports that around 17.9 million people die from CVDs each year, which is 31% of all deaths worldwide. Of these deaths, 85% are due to heart attack or stroke [6].

CVDs can be categorized into several types, some of which are listed below [7]:

1. Coronary Heart Disease: This disease is caused by blockage or damage in the main blood vessels.
2. Heart Failure: In this condition, the heart struggles to pump sufficient blood.
3. Cardiomyopathy: It is a hereditary or acquired heart muscle disease.
4. Hypertensive Heart Disease: This disease is caused by high blood pressure (BP).
5. Ischemic Heart Disease: This disease narrows the heart's arteries, so that only a minimal amount of oxygen and blood reaches the heart.
6. Valvular Heart Disease: This disease is caused by defects or damage in any of the heart valves.
7. Inflammatory Heart Disease: This disease is caused by bacterial or viral infections.

In the medical domain, machine learning plays an important role in knowledge extraction and medical data analysis. Its capacity to handle large volumes of data and its diverse processing capabilities make it a major player in dealing with complex problems in both real-time and offline scenarios [8,9,10,11]. For heart disease detection, algorithms such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Naive Bayes (NB), Extra Tree (ET), Bagging (BAG), Decision Tree (DT), Linear Regression (LR), Adaptive Boosting (ADB), Linear Discriminant Analysis (LDA), Convolutional Neural Network (CNN), Quadratic Discriminant Analysis (QDA), Multi-layer Perceptron (MLP), Ensemble Classifiers, Artificial Neural Network (ANN), and Boosting have been used. In addition, various hybrid models have been introduced and have achieved great success [12–60].

Accurate diagnosis and prediction are vital issues not only for hospitals but also for practitioners, and they should be taken care of while building a smart healthcare solution. Developments in computing technologies have enabled real-time data collection and storage. As a result, a huge amount of health data is collected, which is very useful for clinical investigations.

The principal objectives of this study are:

To critically analyze and summarize the state-of-the-art research articles on heart disease detection over ECG datasets.
To empirically analyze machine learning algorithms on imbalanced electrocardiogram-based arrhythmia datasets for heart disease detection.
To determine the impact of class balancing on the performance of machine learning algorithms under imbalanced class paradigms.
To give a machine learning-based direction toward robotic or smart machine-based solutions for social well-being.

The rest of this study is organized as follows. Section 2 presents state-of-the-art articles on the detection of heart diseases over ECG datasets. Section 3 discusses the materials and methods used for the preliminary study, namely the dataset description, model setup, and statistical analysis. Section 4 presents the results of the experimental evaluation. A comprehensive discussion of the empirical analysis follows in Sect. 5. Finally, Sect. 6 gives concluding comments and the anticipated scope of future work.

2 Related Work

The emergence of the IoT, with its massive data coverage and functionality, encourages researchers to work in the healthcare domain. It also motivates big companies to build health-centric solutions for human well-being. From the algorithmic perspective, massive development has been seen in the last couple of decades. Researchers have repeatedly proposed solutions for making the healthcare domain smart, but plenty of scope remains for improvement in both design and algorithms. Such improvements will help not only in building a smarter world but also in taking the healthcare domain to new heights by enabling smarter, robotics-based solutions.

The state-of-the-art articles on heart disease detection over ECG datasets are summarized in Table 1 [12–60].

Table 1 The state-of-the-art on heart disease detection over the ECG dataset


Table 1 gives a quick insight into the state-of-the-art methods with respect to year, research objectives, methods used, datasets and their types, accuracy of the methods, and the respective references. It is clear from the literature that many machine learning algorithms, such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Naive Bayes (NB), Extra Tree (ET), Bagging (BAG), Decision Tree (DT), Linear Regression (LR), Adaptive Boosting (ADB), Linear Discriminant Analysis (LDA), Convolutional Neural Network (CNN), Quadratic Discriminant Analysis (QDA), Multi-layer Perceptron (MLP), Ensemble Classifiers, Artificial Neural Network (ANN), and Boosting, as well as various hybrid models, have been used for the classification of heart disease [12–60]. These methods are applied either to ECG signals or to numerical datasets. The literature also shows that this is an active area of research with plenty of opportunities, and there is still considerable scope for finding more effective methods with clear interpretations. With this in mind, we present an empirical examination of the state-of-the-art algorithms.

3 Materials and Methods

This section describes the materials and methods used for the preliminary study. It consists of three subsections, which discuss the dataset description, model setup, and statistical parameters, respectively.

3.1 Data

For this empirical investigation, two open-source electrocardiogram (ECG) datasets have been used. The first was extracted from PhysioNet’s repository under the MIT-BIH arrhythmia database [61]. It was recorded using nine electrodes (E1-E9) mounted at various locations on the human body. The second was extracted from the UCI (University of California-Irvine) repository, where it is named Heart Disease Data (it consists of four data repositories: Cleveland, Hungary, Switzerland, and the VA Long Beach), and is also an arrhythmia dataset (Coronary Artery Disease) [62]. It was recorded using thirteen parameters, i.e., AGE, SEX, CP, TRESTBPS, CHOL, FBS, RESTECG, THALACH, EXANG, OLDPEAK, SLOPE, CA, and THAL. The first dataset (PhysioNet’s arrhythmia dataset) consists of 74,501 instances of 9 attributes, whereas the second dataset (UCI's arrhythmia dataset) contains 403 instances of 14 attributes.

Figures 1 and 2 visualize PhysioNet’s arrhythmia dataset and UCI's arrhythmia dataset, respectively. Both datasets have been used for heart disease detection.

The class-wise distributions of PhysioNet’s arrhythmia dataset (based on nine attributes) and UCI's arrhythmia dataset (based on thirteen attributes) are given in Tables 2 and 3, respectively. These tables describe each attribute class-wise in terms of its minimum, maximum, mean, and standard deviation.

Table 2 Attributes description of PhysioNet’s arrhythmia dataset


Table 3 Attributes description of UCI's arrhythmia dataset


Figures 3 and 4 show histograms of PhysioNet’s and UCI's arrhythmia datasets. These histograms help in understanding the spread and shape of each dataset's distribution over real-time samples (signal/data). This information is beneficial for further investigation.

3.2 Model Setup

The setup for the empirical analysis of machine learning algorithms on the electrocardiogram-based arrhythmia datasets is shown in Fig. 5.

This critical analysis consists of six fundamental steps.

In step 1, the ECG-based arrhythmia dataset (from the PhysioNet and UCI repositories) is supplied as input.
In step 2, the dataset is cleaned to eliminate missing values and unusual objects.
In step 3, class balancing is performed using the Synthetic Minority Over-sampling Technique (SMOTE).
In step 4, the class-balanced, preprocessed data are fed as input to the machine learning algorithms (i.e., SVM, KNN, RF, ET, BAG, DT, LR, and ADB) with a tenfold cross-validation method.
In step 5, the statistical evaluation of all the machine learning algorithms used is computed.
In step 6, the output of the machine learning algorithms is received.
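
The tenfold cross-validation used in step 4 can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function name `stratified_kfold` and the toy labels are hypothetical, and stratification (keeping the class ratio similar in every fold) is an assumption, since the paper does not state how the folds were formed.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class ratios preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal the shuffled indices of each class round-robin into k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the other k-1 folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy imbalanced labels: 90 samples of class 0 and 10 of class 1 (illustrative only).
toy_labels = [0] * 90 + [1] * 10
splits = list(stratified_kfold(toy_labels, k=10))
```

With these toy labels, every test fold holds ten samples, exactly one of which belongs to the minority class, so each fold mirrors the 9:1 class ratio of the full set.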

3.2.1 Data Balancing Using SMOTE

Class imbalance is a well-known and vital issue that may influence the performance of machine learning algorithms. This empirical analysis has been conducted to find out the impact of class imbalance on the performance of various machine learning algorithms. We observed that in the binary class problem the majority class has higher accuracy than the minority class because, during training, most samples come from the majority class and only a small number come from the minority class. Thus, there is a significant need for an effective method to manage class imbalance. In this context, SMOTE (Synthetic Minority Over-sampling Technique) [63, 64] has been used to deal with the class imbalance problem on the ECG-based arrhythmia datasets (from the PhysioNet and UCI repositories).
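
The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a simplified standard-library illustration and not the implementation used in the study: the function name `smote`, its parameters, and the toy points are hypothetical, the base sample is drawn at random rather than iterated over systematically, and at least two minority samples are assumed.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature minority class (illustrative values, not taken from either dataset).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class instead of being exact duplicates.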

The class balancing outcomes for PhysioNet’s arrhythmia dataset and the UCI arrhythmia dataset are given in Tables 4 and 5, respectively. These tables list, class by class, the SMOTE percentage and the total number of samples at each setting.

Table 4 Result of class imbalance problem using SMOTE on PhysioNet’s arrhythmia dataset


Table 5 Result of class imbalance problem using SMOTE on UCI's arrhythmia dataset


3.2.2 Hyper-tuning of Machine Learning Algorithms

Hyper-tuning is a way to select the hyper-parameters of machine learning algorithms. It lets us evaluate candidate hyper-parameters under various settings to find the values best suited to the given problem [65, 66]. We therefore hyper-tuned each machine learning algorithm, and the selected parameters were used to train the models.
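
One common way to carry out such a search is an exhaustive grid search, sketched below with the standard library only. The paper does not state which search strategy was used, so grid search is an assumption here; the names `grid_search`, `knn_grid`, and `toy_score` are hypothetical, and `toy_score` merely stands in for a mean cross-validated accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return the best-scoring parameter combination and its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # Try every combination of the candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a KNN-style model (not the grid reported in Table 6).
knn_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}


def toy_score(params):
    # Stand-in for a cross-validated accuracy; highest at n_neighbors=5, weights="distance".
    bonus = 0.02 if params["weights"] == "distance" else 0.0
    return 0.90 - 0.01 * abs(params["n_neighbors"] - 5) + bonus


best_params, best_score = grid_search(knn_grid, toy_score)
```

In practice, `score_fn` would train the model with the given parameters and return its cross-validated score, so the cost of the search grows with the product of the grid sizes.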

Table 6 lists the hyper-parameters along with the selection criteria. Based on these criteria, the best hyper-parameters were chosen and used in the experimental evaluation.

Table 6 Hyper-parameter selection


3.3 Statistical Analysis

To validate the evaluation results of the machine learning algorithms, four statistical measures, i.e., precision, f1-score, recall, and accuracy, have been used. These measures play an important role in establishing the accuracy and suitability of machine learning algorithms [67]. Their mathematical formulation is given in Eqs. 1, 2, 3 and 4, respectively.

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \tag{1}$$

$$f1 = \frac{2 \times \left( \text{precision} \times \text{recall} \right)}{\text{precision} + \text{recall}} \tag{2}$$

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{3}$$

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \times 100\% \tag{4}$$

where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
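
As a worked example of Eqs. 1 to 4, the following short sketch computes the four measures from confusion-matrix counts. The function name `classification_metrics` and the counts are illustrative assumptions and are not taken from the paper's results; the zero-denominator guards are a convenience added here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, f1-score, and accuracy (in percent) from counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Illustrative counts: 40 true positives, 45 true negatives,
# 5 false positives, 10 false negatives.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
# precision = 40/45 ≈ 0.889, recall = 40/50 = 0.8, f1 ≈ 0.842, accuracy = 85.0
```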

4 Results

For the results, a critical analysis has been performed using eight machine learning algorithms (i.e., Support Vector Machine, K-Nearest Neighbors, Random Forest, Extra Tree, Bagging, Decision Tree, Linear Regression, and Adaptive Boosting) with a tenfold cross-validation method, as shown in Fig. 6.

To understand the correlation among the attributes of the arrhythmia datasets from PhysioNet (74,501 instances of 9 attributes) and UCI (403 instances of 14 attributes), the correlation coefficients for both datasets have been calculated and are shown in Figs. 7 and 8, respectively.
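
A pairwise correlation matrix of this kind can be computed as in the minimal sketch below. The paper does not name the coefficient, so the Pearson coefficient is assumed here; the helper names `pearson` and `correlation_matrix` are hypothetical, and the column values are invented for illustration (only the attribute names come from the UCI dataset description).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict holding the coefficient for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented values for three attributes (illustrative only).
toy_columns = {
    "AGE": [40, 50, 60, 70],
    "THALACH": [180, 170, 160, 150],
    "CHOL": [200, 240, 210, 260],
}
corr = correlation_matrix(toy_columns)
```

In this toy data, THALACH falls linearly as AGE rises, so their coefficient is -1, while each attribute correlates perfectly (+1) with itself.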

This critical investigation has been conducted in two parts. In the first part, the hyper-tuned results of all eight machine learning algorithms on the ECG-based arrhythmia datasets (from the PhysioNet and UCI repositories) are analyzed. In the second part, class balancing (using SMOTE) and hyper-tuning are applied together to the same arrhythmia datasets to find out the impact of class balancing on the results. The performance of all algorithms has been evaluated using four validation measures, i.e., accuracy, recall, f1-score, and precision. This empirical investigation aims to find out the impact of class balancing on the performance of machine learning algorithms, which will be very helpful in developing a machine learning-based direction toward robotic or smart machine-based solutions for social well-being.

Tables 7, 8, 9 and 10 present the performance of the eight machine learning algorithms under different validation criteria (cross-validation policy, i.e., 1-fold, 3-fold, 5-fold, and 10-fold) based on the four performance evaluators (i.e., precision, f1-score, recall, accuracy) for the ECG-based arrhythmia datasets (from the PhysioNet and UCI repositories).

Table 7 Performance analysis of hyper-tuned machine learning algorithm over PhysioNet’s arrhythmia dataset


Table 8 Performance analysis of hyper-tuned machine learning algorithm over PhysioNet’s arrhythmia dataset after class balancing


Table 9 Performance analysis of hyper-tuned machine learning algorithm over UCI’s arrhythmia dataset


Table 10 Performance analysis of hyper-tuned machine learning algorithm over UCI’s arrhythmia dataset after class balancing


From this empirical analysis, it is clear that in the binary class problem the majority class achieves higher accuracy than the minority class because, during training, most samples come from the majority class and only a small number from the minority class. Class balancing with SMOTE, however, increases not only the overall accuracy of the algorithms but also the individual accuracy of each class. Therefore, the accuracy of the minority class is not sacrificed.
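
Why overall accuracy can hide poor minority-class performance is easy to see in a toy example. The sketch below uses invented labels (95 majority and 5 minority samples) and a hypothetical degenerate classifier that always predicts the majority class; none of these numbers come from the paper's tables, and the helper name `per_class_accuracy` is an assumption.

```python
def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples, computed separately for each true class."""
    totals, correct = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + int(truth == pred)
    return {label: correct[label] / totals[label] for label in totals}


# Invented imbalanced labels: class 0 is the majority, class 1 the minority.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

overall_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
class_accuracy = per_class_accuracy(y_true, y_pred)
# overall_accuracy is 0.95, yet class_accuracy is {0: 1.0, 1: 0.0}
```

The overall accuracy of 95% looks strong even though not a single minority sample is recognized, which is why class-wise results are reported alongside the overall figures.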

Table 7 shows the hyper-tuned results of all eight machine learning algorithms on the PhysioNet’s ECG-based arrhythmia dataset, whereas Table 8 shows the hyper-tuned results of all eight machine learning algorithms after class balancing. Similarly, Table 9 shows the hyper-tuned results of all eight machine learning algorithms on UCI's ECG-based arrhythmia dataset, whereas Table 10 shows the hyper-tuned results of all eight machine learning algorithms after class balancing.

5 Discussion

The class-wise performance results of the eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) under different validation criteria (cross-validation policy, i.e., 1-fold, 3-fold, 5-fold, and 10-fold) based on the four performance evaluators (i.e., precision, f1-score, recall, accuracy) for the ECG-based arrhythmia datasets (from the PhysioNet and UCI repositories) are presented in Figs. 9, 10, 11 and 12.

Figure 9 shows the class 1 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on the PhysioNet’s ECG-based arrhythmia dataset, whereas Fig. 10 shows the class 2 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on the PhysioNet’s ECG-based arrhythmia dataset.

Similarly, Fig. 11 shows the class 0 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on UCI's ECG-based arrhythmia dataset, whereas Fig. 12 shows the class 1 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on UCI's ECG-based arrhythmia dataset.

In Fig. 13, the average accuracy with a tenfold cross-validation policy of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) over the ECG-based arrhythmia dataset of PhysioNet and UCI has been presented.

6 Conclusion

The growing rate of heart illness has become a serious concern across the globe. This concern has drawn researchers' attention to reducing the mortality and morbidity of heart diseases. Researchers have repeatedly proposed algorithmic solutions for heart data analysis. These solutions play a vital role not only in building smart robotic solutions but also in minimizing the impact of the diseases through effective decision making. This empirical investigation aimed to critically analyze and summarize state-of-the-art articles on heart disease detection. In addition, we performed an empirical analysis of machine learning algorithms on electrocardiogram datasets to determine the impact of class balancing on their performance, which gives a machine learning-based direction toward robotic or smart machine-based solutions. From this experimental analysis, it is clear that in the binary class problem the majority class achieves higher accuracy than the minority class because, during training, most samples come from the majority class and only a small number from the minority class. Class balancing with the Synthetic Minority Over-sampling Technique (SMOTE), however, increases not only the overall accuracy of the algorithms but also the individual accuracy of each class. Therefore, the accuracy of the minority class is not sacrificed. Thus, class imbalance is a vital issue that must be addressed before building medical solutions, where everything depends on the accuracy of the algorithm.

In the future, this critical analysis will be extended from both the algorithmic and the data perspective.

References

Nashif, S.; Raihan, M.R.; Islam, M.R.; Imam, M.H.: Heart disease detection by using machine learning algorithms and a real-time cardiovascular health monitoring system. World J. Eng. Technol. 6(4), 854–873 (2018)
Stefanovska, A.: Physics of the human cardiovascular system. Contemp. Phys. 40(1), 31–55 (1999)
Mendis, S.; Puska, P.; Norrving, B.; World Health Organization: Global atlas on cardiovascular disease prevention and control. World Health Organization, Geneva (2011)
Najafi, F.; Jamrozik, K.; Dobson, A.J.: Understanding the ‘epidemic of heart failure’: a systematic review of trends in determinants of heart failure. Eur. J. Heart Fail. 11(5), 472–479 (2009)
World Health Organization. (2020). Hearts: technical package for cardiovascular disease management in primary health care.
World Health Organization. (2013). Global action plan for the prevention and control of noncommunicable diseases 2013–2020.
Nikhar, S.; Karandikar, A.M.: Prediction of heart disease using machine learning algorithms. Int. J. Adv. Eng. Manag. Sci. 2(6), 239484 (2016)
Ketu, S.; Mishra, P.K.: Hybrid classification model for eye state detection using electroencephalogram signals. Cogn. Neurodyn. (2021). https://doi.org/10.1007/s11571-021-09678-x
Ketu, S.; Mishra, P.K.: Performance analysis of machine learning algorithms for IoT-based human activity recognition. In: Advances in Electrical and Computer Technologies (pp. 579–591). Springer, Singapore (2020)
Ketu, S.; Mishra, P.K.: Enhanced Gaussian process regression-based forecasting model for COVID-19 outbreak and significance of IoT for its detection. Appl. Intell. 51(3), 1492–1512 (2021)
Ketu, S.; Mishra, P.K.: Scalable kernel-based SVM classification algorithm on imbalance air quality data for proficient healthcare. Complex Intell. Syst. (2021). https://doi.org/10.1007/s40747-021-00435-5
Yu, S.N.; Lee, M.Y.: Bispectral analysis and genetic algorithm for congestive heart failure recognition based on heart rate variability. Comput. Biol. Med. 42(8), 816–825 (2012)
Martis, R.J.; Acharya, U.R.; Mandana, K.M.; Ray, A.K.; Chakraborty, C.: Application of principal component analysis to ECG signals for automated diagnosis of cardiac health. Expert Syst. Appl. 39(14), 11792–11800 (2012)
Pal, D.; Mandana, K.M.; Pal, S.; Sarkar, D.; Chakraborty, C.: Fuzzy expert system approach for coronary artery disease screening using clinical parameters. Knowl.-Based Syst. 36, 162–174 (2012)
Yu, S.N.; Lee, M.Y.: Conditional mutual information-based feature selection for congestive heart failure recognition using heart rate variability. Comput. Methods Programs Biomed. 108(1), 299–309 (2012)
Kim, J.K.; Lee, J.S.; Park, D.K.; Lim, Y.S.; Lee, Y.H.; Jung, E.Y.: Adaptive mining prediction model for content recommendation to coronary heart disease patients. Clust. Comput. 17(3), 881–891 (2014)
Melillo, P.; De Luca, N.; Bracale, M.; Pecchia, L.: Classification tree for risk assessment in patients suffering from congestive heart failure via long-term heart rate variability. IEEE J. Biomed. Health Inform. 17(3), 727–733 (2013)
Lainscsek, C.; Sejnowski, T.J.: Electrocardiogram classification using delay differential equations. Chaos Interdiscip J. Nonlinear Sci. 23(2), 023132 (2013)
Mašetic, Z.; Subasi, A.: Detection of congestive heart failures using c4.5 decision tree. Southeast Eur. J. Soft Comput. 2(2), 74 (2013)
Guidi, G.; Pettenati, M.C.; Melillo, P.; Iadanza, E.: A machine learning system to improve heart failure patient assistance. IEEE J. Biomed. Health Inform. 18(6), 1750–1756 (2014)
Liu, G.; Wang, L.; Wang, Q.; Zhou, G.; Wang, Y.; Jiang, Q.: A new approach to detect congestive heart failure using short-term heart rate variability measures. PLoS ONE 9(4), e93399 (2014)
Vafaie, M.H.; Ataei, M.; Koofigar, H.R.: Heart diseases prediction based on ECG signals’ classification using a genetic-fuzzy system and dynamical model of ECG signals. Biomed. Signal Process. Control 14, 291–296 (2014)
Long, N.C.; Meesad, P.; Unger, H.: A highly accurate firefly based algorithm for heart disease prediction. Expert Syst. Appl. 42(21), 8221–8231 (2015)
Tay, D.; Poh, C.L.; Kitney, R.I.: A novel neural-inspired learning algorithm with application to clinical risk prediction. J. Biomed. Inform. 54, 305–314 (2015)
Acharya, U.R.; Fujita, H.; Sudarshan, V.K.; Sree, V.S.; Eugene, L.W.J.; Ghista, D.N.; San Tan, R.: An integrated index for detection of sudden cardiac death using discrete wavelet transform and nonlinear features. Knowl.-Based Syst. 83, 149–158 (2015)
Abdar, M.; Kalhori, S.R.N.; Sutikno, T.; Subroto, I.M.I.; Arji, G.: Comparing performance of data mining algorithms in prediction heart diseases. Int. J. Electr. Comput. Eng. 5(6), 1569–1576 (2015)
Saxena, K.; Sharma, R.: Efficient heart disease prediction system. Procedia Comput. Sci. 85, 962–969 (2016)
Samuel, O.W.; Asogbon, G.M.; Sangaiah, A.K.; Fang, P.; Li, G.: An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst. Appl. 68, 163–172 (2017)
Bashir, S.; Qamar, U.; Khan, F.H.: IntelliHealth: a medical decision support application using a novel weighted multi-layer classifier ensemble framework. J. Biomed. Inform. 59, 185–200 (2016)
Fujita, H.; Acharya, U.R.; Sudarshan, V.K.; Ghista, D.N.; Sree, S.V.; Eugene, L.W.J.; Koh, J.E.: Sudden cardiac death (SCD) prediction based on nonlinear heart rate variability features and SCD index. Appl. Soft Comput. 43, 510–519 (2016)
Taslimitehrani, V.; Dong, G.; Pereira, N.L.; Panahiazar, M.; Pathak, J.: Developing EHR-driven heart failure risk prediction models using CPXR (Log) with the probabilistic loss function. J. Biomed. Inform. 60, 260–269 (2016)
Weng, C.H.; Huang, T.C.K.; Han, R.P.: Disease prediction with different types of neural network classifiers. Telematics Inform. 33(2), 277–292 (2016)
Altan, G.; Kutlu, Y.; Allahverdi, N.: A new approach to early diagnosis of congestive heart failure disease by using Hilbert-Huang transform. Comput. Methods Programs Biomed. 137, 23–34 (2016)
Masetic, Z.; Subasi, A.: Congestive heart failure detection using random forest classifier. Comput. Methods Programs Biomed. 130, 54–64 (2016)
Leema, N.; Nehemiah, H.K.; Kannan, A.: Neural network classifier optimization using differential evolution with global information and back propagation algorithm for clinical datasets. Appl. Soft Comput. 49, 834–844 (2016)
Arabasadi, Z.; Alizadehsani, R.; Roshanzamir, M.; Moosaei, H.; Yarifard, A.A.: Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm. Comput. Methods Programs Biomed. 141, 19–26 (2017)
Dolatabadi, A.D.; Khadem, S.E.Z.; Asl, B.M.: Automated diagnosis of coronary artery disease (CAD) patients using optimized SVM. Comput. Methods Programs Biomed. 138, 117–126 (2017)
Tayefi, M.; Tajfard, M.; Saffar, S.; Hanachi, P.; Amirabadizadeh, A.R.; Esmaeily, H.; Taghipour, A.; Ferns, G.A.; Moohebati, M.; Ghayour-Mobarhan, M.: hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm. Comput. Methods Programs Biomed. 141, 105–109 (2017)
Mustaqeem, A.; Anwar, S.M.; Khan, A.R.; Majid, M.: A statistical analysis based recommender model for heart disease patients. Int. J. Med. Inform. 108, 134–145 (2017)
Mahajan, R.; Viangteeravat, T.; Akbilgic, O.: Improved detection of congestive heart failure via probabilistic symbolic pattern recognition and heart rate variability metrics. Int. J. Med. Inform. 108, 55–63 (2017)
Sudarshan, V.K.; Acharya, U.R.; Oh, S.L.; Adam, M.; Tan, J.H.; Chua, C.K.; Chua, K.P.; San Tan, R.: Automated diagnosis of congestive heart failure using dual tree complex wavelet transform and statistical features extracted from 2 s of ECG signals. Comput. Biol. Med. 83, 48–58 (2017)
Zhang, J.; Lafta, R.L.; Tao, X.; Li, Y.; Chen, F.; Luo, Y.; Zhu, X.: Coupling a fast Fourier transformation with a machine learning ensemble model to support recommendations for heart disease patients in a telehealth environment. IEEE Access 5, 10674–10685 (2017)
Mokeddem, S.A.: A fuzzy classification model for myocardial infarction risk assessment. Appl. Intell. 48(5), 1233–1250 (2018)
Boon, K.H.; Khalil-Hani, M.; Malarvili, M.B.: Paroxysmal atrial fibrillation prediction based on HRV analysis and non-dominated sorting genetic algorithm III. Comput. Methods Programs Biomed. 153, 171–184 (2018)
Zheng, Y.; Guo, X.; Qin, J.; Xiao, S.: Computer-assisted diagnosis for chronic heart failure by the analysis of their cardiac reserve and heart sound characteristics. Comput. Methods Programs Biomed. 122(3), 372–383 (2015)
Rasmy, L.; Wu, Y.; Wang, N.; Geng, X.; Zheng, W.J.; Wang, F.; Wu, H.; Xu, H.; Zhi, D.: A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set. J. Biomed. Inform. 84, 11–16 (2018)
Aborokbah, M.M.; Al-Mutairi, S.; Sangaiah, A.K.; Samuel, O.W.: Adaptive context aware decision computing paradigm for intensive health care delivery in smart cities—a case analysis. Sustain. Cities Soc. 41, 919–924 (2018)
Pławiak, P.: Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system. Expert Syst. Appl. 92, 334–349 (2018)
Tan, J.H.; Hagiwara, Y.; Pang, W.; Lim, I.; Oh, S.L.; Adam, M.; Tan, R.S.; Chen, M.; Acharya, U.R.: Application of stacked convolutional and long short-term memory network for accurate identification of CAD ECG signals. Comput. Biol. Med. 94, 19–26 (2018)
Bozkurt, B.; Germanakis, I.; Stylianou, Y.: A study of time-frequency features for CNN-based automatic heart sound classification for pathology detection. Comput. Biol. Med. 100, 132–143 (2018)
Miao, F.; Cai, Y.P.; Zhang, Y.X.; Fan, X.M.; Li, Y.: Predictive modeling of hospital mortality for patients with heart failure by using an improved random survival forest. IEEE Access 6, 7244–7253 (2018)
Dominguez-Morales, J.P.; Jimenez-Fernandez, A.F.; Dominguez-Morales, M.J.; Jimenez-Moreno, G.: Deep neural networks for the recognition and classification of heart murmurs using neuromorphic auditory sensors. IEEE Trans. Biomed. Circuits Syst. 12(1), 24–34 (2017)
Jin, B.; Che, C.; Liu, Z.; Zhang, S.; Yin, X.; Wei, X.: Predicting the risk of heart failure with EHR sequential data modeling. IEEE Access 6, 9256–9261 (2018)
Yahaya, L.; Oye, N.D.; Garba, E.J.: A comprehensive review on heart disease prediction using data mining and machine learning techniques. Am. J. Artif. Intell. 4(1), 20–29 (2020)
Subhadra, K.; Vikas, B.: Neural network based intelligent system for predicting heart disease. Int. J. Innov. Technol. Explor. Eng. 8(5), 484–487 (2019)
Ayatollahi, H.; Gholamhosseini, L.; Salehi, M.: Predicting coronary artery disease: a comparison between two data mining algorithms. BMC Public Health 19(1), 1–9 (2019)
Padmanabhan, M.; Yuan, P.; Chada, G.; Nguyen, H.V.: Physician-friendly machine learning: A case study with cardiovascular disease risk prediction. J. Clin. Med. 8(7), 1050 (2019)
Lakshmanarao, A.; Swathi, Y.; Sri, P.; Sundareswar, S.: Machine learning techniques for heart disease prediction. Int. J. Sci. Technol. Res. 8(11), 374–377 (2019)
Reddy, P.K.; Reddy, T.S.; Balakrishnan, S.; Basha, S.M.; Poluru, R.K.: Heart disease prediction using machine learning algorithm. Int. J. Innov. Technol. Explor. Eng. 8(10), 2603–2606 (2019)
Annepu, D.; Gowtham, G.: Cardiovascular disease prediction using machine learning techniques. Int. Res. J. Eng. Technol. 6(4), 3963–3971 (2019)
MIT-BIH Arrhythmia Database. Available online: https://www.physionet.org/physiobank/database/mitdb/
Heart Disease Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V.: SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 61, 863–905 (2018)
Bardenet, R.; Brendel, M.; Kégl, B.; Sebag, M.: Collaborative hyperparameter tuning. In: International Conference on Machine Learning, pp. 199–207 (2013)
Yogatama, D.; Mann, G.: Efficient transfer learning method for automatic hyperparameter tuning. In: Artificial Intelligence and Statistics, pp. 1077–1085 (2014)
Goutte, C.; Gaussier, E.: A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: European Conference on Information Retrieval, pp. 345–359. Springer, Berlin (2005)

Author information

Authors and Affiliations

Department of Computer Science, Institute of Science, Banaras Hindu University, Varanasi, India
Shwet Ketu & Pramod Kumar Mishra

Corresponding author

Correspondence to Shwet Ketu.

About this article

Cite this article

Ketu, S., Mishra, P.K. Empirical Analysis of Machine Learning Algorithms on Imbalance Electrocardiogram Based Arrhythmia Dataset for Heart Disease Detection. Arab J Sci Eng 47, 1447–1469 (2022). https://doi.org/10.1007/s13369-021-05972-2

Received: 07 October 2020
Accepted: 05 July 2021
Published: 15 July 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s13369-021-05972-2
