1 Introduction

The heart is an essential organ of the human body and is often described as its engine room. It is a muscular organ that pumps blood throughout the body; in other words, it is the central unit of the cardiovascular system [1]. The cardiovascular system is a complex blood-circulation network composed of blood vessels, namely arteries, capillaries, and veins [2]. Any abnormality or obstruction in healthy blood flow may lead to a range of severe heart disorders. These disorders are commonly known as cardiovascular diseases (CVDs) and remain among the deadliest diseases worldwide. CVDs comprise various conditions, such as brain vascular diseases, heart diseases, and blood vessel diseases [3]. The Global Atlas on Cardiovascular Disease Prevention and Control of the World Health Organization (WHO) reports that CVDs are among the foremost causes of disability and death across the globe [4].

Several WHO reports indicate that CVDs are increasing worldwide, which is an alarming trend. In 2012, approximately 17.5 million people died from CVDs globally, more than from any other life-threatening disease [5]. A WHO report also states that around 17.9 million people die from CVDs each year, which is 31% of all deaths worldwide. Of these deaths, 85% are due to heart attack or stroke [6].

CVDs can be categorized into several types, some of which are listed below [7]:

  1. Coronary Heart Disease: This disease is caused by blockage of or damage to the main blood vessels of the heart.

  2. Heart Failure: In this condition, the heart struggles to pump sufficient blood.

  3. Cardiomyopathy: This is a hereditary or acquired disease of the heart muscle.

  4. Hypertensive Heart Disease: This disease is caused by high blood pressure (BP).

  5. Ischemic Heart Disease: This disease narrows the heart's arteries, so that too little blood and oxygen reach the heart.

  6. Valvular Heart Disease: This disease is caused by defects in or damage to any of the heart valves.

  7. Inflammatory Heart Disease: This disease is caused by bacterial or viral infections.

In the medical domain, machine learning plays an important role in knowledge extraction and medical data analysis. Its ability to handle large volumes of data and its diverse processing capabilities make it a major tool for dealing with complex problems in both real-time and offline scenarios [8,9,10,11]. For heart disease detection, algorithms such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Naive Bayes (NB), Extra Tree (ET), Bagging (BAG), Decision Tree (DT), Linear Regression (LR), Adaptive Boosting (ADB), Linear Discriminant Analysis (LDA), Convolutional Neural Network (CNN), Quadratic Discriminant Analysis (QDA), Multi-layer Perceptron (MLP), Ensemble Classifiers, Artificial Neural Network (ANN), and Boosting, among others, have been used. In addition, various hybrid models have been introduced and have achieved great success [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60].

Accurate diagnosis and prediction are vital not only for hospitals but also for practitioners, and they must be addressed when building a smart healthcare solution. Advances in computing technologies have enabled real-time data collection and storage. As a result, a huge amount of health data is collected, which is very useful for clinical investigations.

The principal objectives of this study are:

  • To critically analyze and summarize the state-of-the-art research articles on heart disease detection over ECG datasets.

  • To empirically evaluate machine learning algorithms on imbalanced electrocardiogram-based arrhythmia datasets for heart disease detection.

  • To determine the impact of class balancing on the performance of machine learning algorithms under class imbalance.

  • To give a machine learning-based direction toward robotic or smart machine-based solutions for social well-being.

The rest of this study is organized as follows. Section 2 reviews state-of-the-art articles on the detection of heart diseases over ECG datasets. Section 3 discusses the materials and methods used for the preliminary study, namely the dataset description, model setup, and statistical analysis. Section 4 presents the results of the experimental evaluation. A comprehensive discussion of the empirical analysis is given in Sect. 5. Finally, Sect. 6 offers concluding comments and the anticipated future scope.

2 Related Work

The emergence of IoT, with its massive data coverage and functionality, encourages researchers to work in the healthcare domain. It also motivates large companies to build health-centric solutions for human well-being. From the algorithmic perspective, massive development has been seen over the last couple of decades. Researchers have repeatedly proposed solutions for making the healthcare domain smarter, but plenty of scope for improvement remains in both design and algorithms. Such improvements will help not only in building a smarter world but also in taking the healthcare domain to new heights by enabling robotics-based smart solutions.

The state-of-the-art articles on heart disease detection over the ECG dataset have been shown in Table 1 [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60].

Table 1 The state-of-the-art on heart disease detection over the ECG dataset

Table 1 lists state-of-the-art articles on heart disease detection over the ECG dataset and gives a quick insight into the methods in terms of year, research objectives, methods used, datasets and their types, accuracy of the methods, and the respective references. The literature shows that a wide range of machine learning algorithms, such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Naive Bayes (NB), Extra Tree (ET), Bagging (BAG), Decision Tree (DT), Linear Regression (LR), Adaptive Boosting (ADB), Linear Discriminant Analysis (LDA), Convolutional Neural Network (CNN), Quadratic Discriminant Analysis (QDA), Multi-layer Perceptron (MLP), Ensemble Classifiers, Artificial Neural Network (ANN), and Boosting, as well as various hybrid models, have been used for the classification of heart disease [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60]. These methods are applied to either ECG signals or numerical datasets. The literature also makes it evident that this is an active area of research with plenty of opportunities, and there is still considerable scope for finding more effective methods with clear interpretations. With this in mind, we present an empirical examination of the state-of-the-art algorithms.

3 Materials and Methods

This section describes the materials and methods used for the preliminary study. It consists of three subsections, which cover the dataset description, model setup, and statistical parameters, respectively.

3.1 Data

For this empirical investigation, two open-source electrocardiogram (ECG) datasets have been used. The first dataset is taken from the MIT-BIH arrhythmia database in PhysioNet’s repository [61]. It was recorded using nine electrodes (E1-E9) mounted at various locations on the human body. The second dataset is the Heart Disease Data from the UCI (University of California-Irvine) repository, which consists of four data repositories (Cleveland, Hungary, Switzerland, and the VA Long Beach) and is also treated as an arrhythmia dataset (Coronary Artery Disease) [62]. It is described by thirteen parameters, i.e., AGE, SEX, CP, TRESTBPS, CHOL, FBS, RESTECG, THALACH, EXANG, OLDPEAK, SLOPE, CA, and THAL. The first dataset (PhysioNet’s arrhythmia dataset) consists of 74,501 instances of 9 attributes, whereas the second dataset (UCI's arrhythmia dataset) contains 403 instances of 14 attributes.

Figs. 1 and 2 visualize PhysioNet’s arrhythmia dataset and UCI's arrhythmia dataset, respectively. Both datasets are used for heart disease detection.

Fig. 1 Visualization of PhysioNet’s arrhythmia dataset

Fig. 2 Visualization of UCI's arrhythmia dataset

The class-wise distributions of PhysioNet’s arrhythmia dataset (based on nine attributes) and UCI's arrhythmia dataset (based on thirteen attributes) are given in Tables 2 and 3, respectively. These tables summarize each relevant attribute per class in terms of its minimum, maximum, mean, and standard deviation.

Table 2 Attributes description of PhysioNet’s arrhythmia dataset
Table 3 Attributes description of UCI's arrhythmia dataset
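
As an illustration, a class-wise summary of the kind reported in Tables 2 and 3 could be computed along the following lines. This is a minimal sketch rather than the code used in the study; the function name `classwise_summary` and the toy values are assumptions made for the example, and the sample standard deviation is assumed.

```python
from collections import defaultdict
from statistics import mean, stdev


def classwise_summary(rows, labels):
    """Return {label: [(min, max, mean, std) for each attribute]}."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    summary = {}
    for label, members in grouped.items():
        columns = list(zip(*members))  # transpose rows into attribute columns
        summary[label] = [
            # stdev() needs at least two values, so fall back to 0.0 otherwise
            (min(col), max(col), mean(col), stdev(col) if len(col) > 1 else 0.0)
            for col in columns
        ]
    return summary


# Hypothetical toy data (not taken from either dataset): two attributes, two classes
rows = [[1.0, 10.0], [3.0, 14.0], [2.0, 20.0], [6.0, 30.0]]
labels = [0, 0, 1, 1]
summary = classwise_summary(rows, labels)
```

Here `summary[0][0]` holds the minimum, maximum, mean, and standard deviation of the first attribute for class 0.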

Figs. 3 and 4 show histograms of PhysioNet’s and UCI's arrhythmia datasets, respectively. These histograms help in understanding the distribution of each dataset, in terms of its spread and shape, over the sampled signal or data. This information is beneficial for the further investigation.

Fig. 3 Histogram representation of PhysioNet’s arrhythmia dataset

Fig. 4 Histogram representation of UCI's arrhythmia dataset

3.2 Model Setup

The setup for the empirical analysis of machine learning algorithms on the electrocardiogram-based arrhythmia datasets is shown in Fig. 5.

Fig. 5 Setup for experimental analysis of machine learning algorithms

This analysis consists of six fundamental steps.

  • In step 1, the ECG-based arrhythmia datasets (from PhysioNet and UCI repositories) are supplied as input.

  • In step 2, the datasets are cleaned to eliminate missing values and anomalous entries.

  • In step 3, class balancing is performed using the Synthetic Minority Over-sampling Technique (SMOTE).

  • In step 4, the class-balanced, preprocessed data are fed as input to several machine learning algorithms (i.e., SVM, KNN, RF, ET, BAG, DT, LR, and ADB) with a tenfold cross-validation method.

  • In step 5, all the machine learning algorithms used are evaluated by means of statistical measures.

  • In step 6, the output of the machine learning algorithms is obtained.

3.2.1 Data Balancing Using SMOTE

Class imbalance is a well-known and vital issue that can influence the performance of machine learning algorithms. This empirical analysis has been conducted to find out the impact of class imbalance on the performance of various machine learning algorithms. We observed that, in the binary class problem, the majority class achieves higher accuracy than the minority class because, during training, most samples come from the majority class and only a small number come from the minority class. Thus, there is a significant need for an effective method to manage class imbalance. In this context, an oversampling technique, SMOTE (Synthetic Minority Oversampling Technique) [63, 64], has been used to deal with the class imbalance problem on the ECG-based arrhythmia datasets (from PhysioNet and UCI repositories).
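
The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: the function name `smote`, its parameters, and the toy points are hypothetical, and Euclidean distance is assumed.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points, each on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two attributes each
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class.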

The class balancing outcomes for PhysioNet’s arrhythmia dataset and UCI's arrhythmia dataset are shown in Tables 4 and 5, respectively. These tables give, for each class, the SMOTE percentage and the total number of samples at each setting.

Table 4 Result of class imbalance problem using SMOTE on PhysioNet’s arrhythmia dataset
Table 5 Result of class imbalance problem using SMOTE on UCI's arrhythmia dataset

3.2.2 Hyper-tuning of Machine Learning Algorithms

Hyper-tuning is an effective way to select the hyperparameters of machine learning algorithms. It allows various hyperparameters to be evaluated under different settings in order to find the values best suited to the given problem [65, 66]. Therefore, we have hyper-tuned each machine learning algorithm to find its best parameters. The selected parameters have then been used to train the machine learning models.

Table 6 shows the hyperparameters together with their selection criteria. Based on these criteria, the best hyperparameters have been chosen and used in the experimental evaluation.

Table 6 Hyper-parameter selection
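
Hyper-tuning of this kind is often carried out as an exhaustive grid search. The sketch below illustrates the idea for a single hyperparameter of a k-nearest-neighbour classifier. It is an illustration only: the candidate values, the toy data, and the use of leave-one-out accuracy as the selection criterion are assumptions, and they do not reproduce the settings of Table 6.

```python
import math
from collections import Counter
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # strict comparison keeps the first best setting
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy data: two attributes, binary class label
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y = [0, 0, 0, 1, 1, 1]


def knn_loo_accuracy(params):
    """Leave-one-out accuracy of a k-nearest-neighbour classifier on (X, y)."""
    k = params["k"]
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        neighbours = sorted(
            (math.dist(xi, xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )[:k]
        predicted = Counter(label for _, label in neighbours).most_common(1)[0][0]
        correct += predicted == yi
    return correct / len(X)


best_params, best_score = grid_search({"k": [1, 3, 5]}, knn_loo_accuracy)
```

On this toy data, `k = 5` performs worst because each left-out sample then has more neighbours from the other class than from its own, which is exactly the kind of setting a grid search is meant to rule out.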

3.3 Statistical Analysis

To validate the evaluation results of the machine learning algorithms, four statistical measures, i.e., precision, f1-score, recall, and accuracy, have been used. These measures play an important role in establishing the accuracy and suitability of machine learning algorithms [67]. Their mathematical formulations are given in Eqs. 1, 2, 3 and 4, respectively.

$${\text{Precision}} = \frac{{\left( {{\text{TP}}} \right)}}{{\left( {{\text{TP}} + {\text{FP}}} \right)}}$$
(1)
$$f1 = \frac{{2 \times \left( {{\text{precision}} \times {\text{recall}}} \right)}}{{{\text{precision}} + {\text{recall}}}}$$
(2)
$${\text{Recall}} = \frac{{\left( {{\text{TP}}} \right)}}{{\left( {{\text{TP}} + {\text{FN}}} \right)}}$$
(3)
$${\text{Accuracy}} = \frac{{\left( {{\text{TP}} + {\text{TN}}} \right)}}{{\left( {{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}} \right)}} \times 100\%$$
(4)

where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
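
Eqs. 1 to 4 translate directly into code. The following minimal sketch computes the four measures from confusion-matrix counts; the function name `classification_metrics` and the example counts are hypothetical, and returning 0.0 for an undefined ratio is an assumption of this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, f1-score, and accuracy (in %) as in Eqs. 1 to 4."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for one class of a binary problem
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these counts, precision is 40/45, recall is 40/50, and accuracy is 85%.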

4 Results

To obtain the results, a critical analysis has been performed using eight machine learning algorithms (i.e., Support Vector Machine, K-Nearest Neighbors, Random Forest, Extra Tree, Bagging, Decision Tree, Linear Regression, and Adaptive Boosting) with a tenfold cross-validation method, as shown in Fig. 6.

Fig. 6 Machine learning: a quick look

To understand the correlation among the attributes of the arrhythmia datasets from the PhysioNet (74,501 instances of 9 attributes) and UCI (403 instances of 14 attributes) repositories, the correlation coefficients for both datasets have been calculated and are shown in Figs. 7 and 8, respectively.

Fig. 7 Correlation coefficient matrix for PhysioNet’s arrhythmia dataset

Fig. 8 Correlation coefficient matrix for UCI’s arrhythmia dataset
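
A correlation coefficient matrix such as those in Figs. 7 and 8 can be computed attribute by attribute. The sketch below assumes the Pearson correlation coefficient, which the text does not state explicitly; the function names and the toy columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients between all attribute columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical attribute columns: the second rises with the first, the third falls
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
matrix = correlation_matrix(columns)
```

The resulting matrix is symmetric with ones on the diagonal; values close to +1 or -1 indicate strongly related attributes.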

This critical investigation has been conducted in two parts. In the first part, the hyper-tuned results of all eight machine learning algorithms on the ECG-based arrhythmia datasets (from PhysioNet and UCI repositories) are analyzed. In the second part, class balancing (using SMOTE) and hyper-tuning are applied together to the same arrhythmia datasets (from PhysioNet and UCI repositories) to find out the impact of class balancing on the results. The performance of all algorithms has been evaluated using four validation measures, i.e., accuracy, recall, f1-score, and precision. This empirical investigation aims to find out the impact of class balancing on the performance of machine learning algorithms, which will be very helpful in developing a machine learning-based direction toward robotic or smart machine-based solutions for social well-being.

Tables 7, 8, 9 and 10 present the performance evaluation results of the eight machine learning algorithms under different validation criteria (cross-validation policy, i.e., 1-fold, 3-fold, 5-fold, and 10-fold), based on the four performance evaluators (i.e., precision, f1-score, recall, and accuracy), for the ECG-based arrhythmia datasets (from PhysioNet and UCI repositories).

Table 7 Performance analysis of hyper-tuned machine learning algorithm over PhysioNet’s arrhythmia dataset
Table 8 Performance analysis of hyper-tuned machine learning algorithm over PhysioNet’s arrhythmia dataset after class balancing
Table 9 Performance analysis of hyper-tuned machine learning algorithm over UCI’s arrhythmia dataset
Table 10 Performance analysis of hyper-tuned machine learning algorithm over UCI’s arrhythmia dataset after class balancing
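
The k-fold cross-validation policy behind these tables splits the data into k parts and uses each part once for testing. A minimal sketch of such a split is given below. It assumes a stratified split, so that each fold keeps roughly the original class ratio; the text does not state whether stratification was used, and the function name `stratified_kfold` and the toy labels are hypothetical.

```python
from collections import defaultdict


def stratified_kfold(labels, n_splits):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal the indices of every class round-robin over the folds
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    for k in range(n_splits):
        test_indices = sorted(folds[k])
        train_indices = sorted(
            index for j, fold in enumerate(folds) if j != k for index in fold
        )
        yield train_indices, test_indices


# Hypothetical imbalanced labels: eight majority and two minority samples
labels = [0] * 8 + [1] * 2
splits = list(stratified_kfold(labels, n_splits=2))
```

Each sample appears in exactly one test fold, and averaging a measure over the folds gives the cross-validated estimate.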

This empirical analysis makes clear that, in the binary classification problem with class imbalance, the majority class achieves higher accuracy than the minority class because, during training, most samples come from the majority class and only a small number come from the minority class. Class balancing using SMOTE, however, increases not only the overall accuracy of the algorithms but also the individual accuracy of each class. Therefore, the accuracy of the minority class is not sacrificed.
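
The gap between overall and class-wise accuracy on imbalanced data can be illustrated with a small example. The sketch below uses hypothetical labels and a deliberately degenerate model that always predicts the majority class; it does not reproduce any result from Tables 7 to 10.

```python
def per_class_accuracy(y_true, y_pred):
    """Return overall accuracy and the fraction of correct predictions per class."""
    totals, hits = {}, {}
    for truth, predicted in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + (truth == predicted)
    overall = sum(hits.values()) / len(y_true)
    return overall, {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced labels: 90 majority (0) and 10 minority (1) samples
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # degenerate model that always predicts the majority class
overall, by_class = per_class_accuracy(y_true, y_pred)
```

The overall accuracy is 0.9 even though no minority sample is classified correctly, which is why class-wise results are reported alongside the overall accuracy.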

Table 7 shows the hyper-tuned results of all eight machine learning algorithms on the PhysioNet’s ECG-based arrhythmia dataset, whereas Table 8 shows the hyper-tuned results of all eight machine learning algorithms after class balancing. Similarly, Table 9 shows the hyper-tuned results of all eight machine learning algorithms on UCI's ECG-based arrhythmia dataset, whereas Table 10 shows the hyper-tuned results of all eight machine learning algorithms after class balancing.

5 Discussion

The class-wise performance results of the eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) under different validation criteria (cross-validation policy, i.e., 1-fold, 3-fold, 5-fold, and 10-fold), based on the four performance evaluators (i.e., precision, f1-score, recall, and accuracy), for the ECG-based arrhythmia datasets (from PhysioNet and UCI repositories) are presented in Figs. 9, 10, 11 and 12.

Fig. 9 Performance result of Class 1 based on hyper-tuned versus hyper-tuned after class balancing over PhysioNet’s arrhythmia dataset

Fig. 10 Performance result of Class 2 based on hyper-tuned versus hyper-tuned after class balancing over PhysioNet’s arrhythmia dataset

Fig. 11 Performance result of Class 0 based on hyper-tuned versus hyper-tuned after class balancing over UCI’s arrhythmia dataset

Fig. 12 Performance result of Class 1 based on hyper-tuned versus hyper-tuned after class balancing over UCI’s arrhythmia dataset

Figure 9 shows the class 1 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on the PhysioNet’s ECG-based arrhythmia dataset, whereas Fig. 10 shows the class 2 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on the PhysioNet’s ECG-based arrhythmia dataset.

Similarly, Fig. 11 shows the class 0 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on UCI's ECG-based arrhythmia dataset, whereas Fig. 12 shows the class 1 result of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) on UCI's ECG-based arrhythmia dataset.

In Fig. 13, the average accuracy with a tenfold cross-validation policy of eight machine learning algorithms (Hyper-tuned vs. Hyper-tuned After Class Balancing) over the ECG-based arrhythmia dataset of PhysioNet and UCI has been presented.

Fig. 13 Average accuracy of hyper-tuned versus hyper-tuned after class balancing over the PhysioNet and UCI arrhythmia datasets

6 Conclusion

The growing rate of heart illness has become a serious concern across the globe. This concern has drawn the attention of researchers seeking to reduce the mortality and morbidity of heart diseases. Over time, researchers have proposed various algorithmic solutions for heart data analysis. These solutions play a vital role not only in building smart robotic solutions but also in minimizing the impact of the diseases through effective decision making. This empirical investigation aims to critically analyze and summarize state-of-the-art articles on heart disease detection. In addition, we have performed an empirical analysis of machine learning algorithms on electrocardiogram datasets to find out the impact of class balancing on their performance, which will give a machine learning-based direction toward robotic or smart machine-based solutions. The experimental analysis makes clear that, in the binary classification problem with class imbalance, the majority class achieves higher accuracy than the minority class because, during training, most samples come from the majority class and only a small number come from the minority class. Class balancing using the Synthetic Minority Over-sampling Technique (SMOTE), however, increases not only the overall accuracy of the algorithms but also the individual accuracy of each class. Therefore, the accuracy of the minority class is not sacrificed. Thus, class imbalance is a vital issue that has to be addressed before building medical solutions in which everything depends on the accuracy of the algorithm.

In the future, this critical analysis will be extended from both the algorithmic and the data perspectives.