Abstract
Heart disease is a widespread global concern, underscoring the critical importance of early detection to minimize mortality. Although coronary angiography is the most precise diagnostic method, its discomfort and cost often deter patients, particularly in the disease's initial stages. Hence, there is a pressing need for a non-invasive and dependable diagnostic approach. In the contemporary era, machine learning has pervaded various aspects of human life, playing a significant role in revolutionizing the healthcare industry. Decision support systems based on machine learning, leveraging a patient's clinical parameters, offer a promising avenue for diagnosing heart disease. Early detection remains pivotal in mitigating the severity of heart disease. The healthcare sector generates vast amounts of patient and disease-related data daily. Unfortunately, practitioners frequently underutilize this valuable resource. To tap into the potential of this data for more precise heart disease diagnoses, a range of machine learning algorithms is available. Given the extensive research on automated heart disease detection systems, there is a need to synthesize this knowledge. This paper aims to provide a comprehensive overview of recent research on heart disease diagnosis by reviewing articles published by reputable sources between 2014 and 2022. It identifies challenges faced by researchers and proposes potential solutions. Additionally, the paper suggests directions for expanding upon existing research in this critical field.
1 Introduction
Heart disease refers to conditions that affect the functioning of the heart and blood vessels. Heart disease is a major cause of death worldwide. About 31% of global deaths occur due to this disease. According to the World Health Organization report, approximately 17.9 million people worldwide die each year due to this disease. According to the American Heart Association reports, 121.5 million American adults were affected by this disease in 2016 [1]. Early detection of heart disease can reduce the chance of the disease progressing to a more severe stage by providing appropriate treatment [2].
With the advent of machine learning, decision support systems have become useful tools in many fields such as manufacturing, marketing, education, weather forecasting, transportation, and healthcare [3]. In the past few decades, machine learning has influenced the healthcare sector to a great extent and various automated decision support systems have been developed for the prediction and diagnosis of diseases [4].
Heart disease is a deadly disease, and timely diagnosis can reduce its severity and hence save lives. A decision support system developed using machine learning methods can help in diagnosing the disease using non-invasive tests. Researchers have made many efforts in this direction and research is still going on [5, 6]. This paper provides a detailed survey of the various decision support systems developed for heart disease diagnosis. In developing these decision support systems, researchers have used machine learning and deep learning methods. The performance parameters used to evaluate these systems and the validation methods used are also presented. Researchers have utilized several heart disease datasets available online for validating their systems, and the details of these datasets are also discussed. The challenges faced by researchers are identified, and some feasible solutions are suggested.
1.1 Motivation
Several decision support systems for heart disease diagnosis have been developed in recent years. To enhance these systems, it is critical to understand how they were developed and what problems researchers faced. It is also crucial to discover what improvements can be made to existing systems.
1.2 Research Questions
This paper aims to answer the following research questions:
(1) What methods have been used to develop a decision support system for heart disease diagnosis?
(2) What are the performance parameters and validation methods to evaluate the performance of systems?
(3) What are various online available heart disease datasets?
(4) What are the issues and challenges faced by researchers in developing automated diagnostic systems?
(5) What strategies can be used to overcome challenges?
(6) What are the possible improvements in heart disease diagnostic systems that can be done in the future?
1.2.1 Data Sources
The authors have done a survey of various articles from 2014 to 2022. An extensive search has been performed using the following keywords:
- “Heart disease diagnosis using machine learning”
- “Heart disease prediction using machine learning”
- “Heart disease diagnosis using deep learning”
- “Heart disease prediction using deep learning”
- “Intelligent system to diagnose heart disease”
- “Decision support system for heart disease diagnosis”
- “Automated heart disease diagnosis”.
The number of articles studied is shown in Fig. 1.
Authors have surveyed articles from renowned digital libraries like Springer, IEEE, Hindawi, and Elsevier. The number of articles reviewed from different libraries is shown in Fig. 2.
A list of some good journals under various digital libraries used for the study is shown in Fig. 3.
The remainder of the paper is organized as follows: Sect. 2 contains a survey of heart disease diagnostic systems developed using machine learning methods. Section 3 contains a survey of heart disease diagnostic systems developed using deep learning methods. Section 4 describes various heart disease datasets available online. Section 5 describes the performance parameters used to evaluate decision support systems. Section 6 describes validation methods used to perform experiments. Section 7 contains issues and challenges faced by various researchers. Section 8 presents the conclusion and future work.
2 Machine Learning Methods for Heart Disease Diagnosis
Researchers have developed several approaches for detecting heart disease using machine learning techniques over the past decade; these approaches are discussed in this section. Ghadiri Hedeshi and Saniee Abadeh [7] diagnosed heart disease by extracting rules using the PSO (particle swarm optimization) algorithm. Multiple rules were extracted in each run of PSO. The authors worked on the dataset created by combining datasets of Cleveland, Long Beach, Hungarian, and Switzerland. The dataset contained 920 records and heart disease was diagnosed using 13 features. An accuracy of 85.76% was obtained. Bashir et al. [8] developed a system for the prediction of heart disease using the ensemble mechanism. The authors performed experiments on five datasets. The absence of disease was indicated by specifying 0 as the class label and the presence of disease was indicated by specifying 1 as the class label. The interquartile range method was used to detect outliers. The classification was performed using DT (decision tree), NB (naive Bayes), SVM (support vector machine), and memory-based learner classifiers. Classifier results were aggregated with a majority vote to increase accuracy. Accuracy of 85.81% was obtained on the Cleveland dataset, 80.15% on the SPECTF dataset, 82.40% on the SPECT dataset, 86.12% on the Eric dataset, and 88.52% on the Statlog dataset.
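As an illustration of two steps mentioned above, outlier detection with the interquartile range and aggregation of classifier outputs by majority vote, a minimal sketch is given here. It is not the implementation of Bashir et al. [8]; the function names, the 1.5 multiplier, and the sample values are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # sample quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def majority_vote(predictions):
    """Combine per-classifier labels (0 = no disease, 1 = disease) for one record."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical blood pressure readings; 300 is an obvious outlier.
readings = [120, 125, 130, 128, 122, 127, 300]
outliers = iqr_outliers(readings)

# Hypothetical votes from three classifiers (e.g. DT, NB, SVM) for one patient.
label = majority_vote([1, 0, 1])
```

With an odd number of binary classifiers a tie cannot occur; with an even number, a tie-breaking rule would have to be chosen explicitly.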
Tomar and Agarwal [9] developed a system using LSTSVM (least square twin support vector machine). Features were selected based on their F-score. Experiments were performed on the Statlog dataset and 85.59% accuracy was achieved. Olaniyi et al. [10] proposed a model using MLP (multilayer perceptron) and SVM (support vector machine). The backpropagation algorithm was used for training the MLP. A learning rate of 0.32 was used in MLP. SVM provided an accuracy of 87.5% and multilayer perceptron provided an accuracy of 85%. Marateb and Goudarzi [11] diagnosed heart disease using a fuzzy rule-based system developed by NFC (Neuro-Fuzzy Classifier). The scaled conjugate gradient algorithm was used to sharply reduce the root mean square error and increase the learning speed of NFC. In their research, the authors used the Cleveland dataset. As a part of data preprocessing, discretization was performed to convert continuous values of features into discrete values. The classification process included both fuzzification and defuzzification. SFS (sequential feature selection) and MLR (multiple logistic regression) algorithms were used to identify significant features. The validation of the system was done by the hold-out method. NFC performance was measured in three configurations: without feature selection, in combination with MLR, and in combination with SFS. The results demonstrated that NFC in combination with MLR provided the highest accuracy of 84%. Khanna et al. [12] achieved 84.7% accuracy with SVM on the Cleveland dataset.
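F-score based feature ranking, as mentioned above for Tomar and Agarwal [9], can be sketched as follows. The sketch assumes the commonly used two-class definition of the F-score (squared distance of the class means from the overall mean, divided by the sum of the within-class sample variances); the surveyed paper may differ in detail, and the feature names and values are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(columns, labels):
    """Return (feature names sorted by decreasing F-score, score per feature)."""
    scores = {}
    for name, values in columns.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[name] = f_score(pos, neg)
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: one feature separates the classes, the other does not.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "informative": [5, 6, 7, 1, 2, 3],
    "noise": [1, 5, 3, 3, 1, 5],
}
ranking, scores = rank_features(columns, labels)
```

A larger F-score suggests that the feature discriminates better between the two classes; features below a chosen threshold would then be dropped.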
Long et al. [13] developed a system using IT2FLS (interval type-2 fuzzy logic system). Chaos firefly and rough set feature selection algorithms were used to optimize IT2FLS. The system achieved 88.3% accuracy on the Statlog dataset. Miranda et al. [14] developed a model to detect the risk of heart disease using NB classifier. The results of blood and urine tests were used to develop the model. Data was collected from Mayapada hospital. Prediction attributes were selected by conducting interview sessions. Data contained 38 attributes of 60589 patients. Feature selection was performed using the BE (backward elimination) method. Records with incomplete data were removed. After data cleaning, data from 50528 patients were retained. Data normalization was performed by converting numerical data into categorical data. After data cleaning and normalization, classification was performed using the NB Classifier. The system achieved 80% accuracy. Verma et al. [15] performed feature selection by combining CFS (correlation-based feature selection) with PSO. K-means clustering algorithm was used to remove outliers. Four Classifiers MLP (multilayer perceptron), MLR (multinomial logistic regression), FURIA (fuzzy unordered rule induction algorithm), and C4.5 were used to develop the prediction model. Experiments were performed on data collected from IGMC (Indira Gandhi Medical College), Shimla. MLR achieved an accuracy of 88.4%.
Jabbar et al. [16] used RF (random forest) to predict heart disease. The feature set was reduced using chi-square method. The system achieved 83.70% accuracy. Liu et al. [17] used relief and RS (rough set) methods to select features relevant to heart disease diagnosis. The relief method assigns weights to features and selects important features. The output of the relief was used as the input of RS to reduce the set of features. In their research, the authors used an ensemble classifier using boosting mechanism. C4.5 was used as a weak classifier in the ensemble classifier. System performance was validated with the jackknife test on the Statlog dataset. The system achieved 92.59% accuracy. Buchan et al. [18] predicted disease based on risk factors such as high cholesterol, physical inactivity, high blood pressure, and an unhealthy diet. The authors used electronic medical records of patients which are unstructured data. The authors had to combine natural language processing and machine learning to make predictions based on unstructured data. The authors used i2b2 Heart Disease Risk Factors Challenge data set containing records of 296 diabetic patients. Some of the risk factors for diabetics are common to heart disease. This was a challenge facing the researchers. The authors used Apache cTAKES for natural language processing. The information extracted by cTAKES was used to provide training to the model. Feature selection was performed using PCA (principal component analysis) and MI (mutual information). After feature selection, classification was performed using MaxEnt (maximum entropy), SVM, and NB classifiers. The system achieved 77.4% F1-Score.
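The jackknife test used by Liu et al. [17] is commonly understood as leave-one-out validation: each record is held out once, the model is built on the remaining records, and the held-out record is then classified. The sketch below assumes that interpretation and, for brevity, swaps in a 1-nearest-neighbour classifier in place of the boosted C4.5 ensemble; the data points are hypothetical.

```python
import math


def nearest_neighbour_label(train_x, train_y, query):
    """Label of the training point closest to `query` (Euclidean distance)."""
    idx = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[idx]


def jackknife_accuracy(xs, ys):
    """Leave-one-out accuracy: every record is tested exactly once."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # all records except the i-th
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbour_label(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-feature records forming two well-separated groups.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
ys = [0, 0, 0, 1, 1, 1]
accuracy = jackknife_accuracy(xs, ys)
```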
Mdhaffar et al. [19] combined the technique of CEP (complex event processing) with statistical methods. The system collected the health parameters of the patients through wearable sensors. CEP processed the input data by executing threshold-based analysis rules. Threshold values vary from patient to patient and were calculated automatically based on historical data. A Raspberry Pi 3 was used to format the collected data. A NoSQL database was used to store the data. CEP can trigger alarms for predicting heart failure. The system can generate prediction reports which can be used by cardiologists. The system achieved 84.75% precision. Babic et al. [20] diagnosed heart disease using descriptive and predictive analysis. Predictive analysis was done using NB, DT, SVM, and ANN (artificial neural network) whereas descriptive analysis was done using decision and association rules. Important features for prediction were selected using some statistical methods.
The analysis was performed on three datasets: the Z-Alizadeh Sani dataset, the South African dataset, and a combined dataset created from the Hungary, Cleveland, Long Beach, and Switzerland datasets. On the Z-Alizadeh Sani dataset, the best accuracy of 86.67% was achieved with SVM. The best accuracy of 73.87% was achieved with DT on the South African dataset. On the combined dataset, ANN achieved the best accuracy of 89.93%.
Davari Dolatabadi et al. [21] diagnosed heart disease from ECG (electrocardiogram) signals obtained from the Long-Term ST Database. The database included ECG recordings of eighty individuals representing events of ST-segment changes. HRV (heart rate variability) signals were extracted from the ECG signals. PCA was applied to the extracted features to select important features. Features selected with PCA were used by SVM classifier for diagnosis achieving an accuracy of 99.2%. Kumar and Inbarani [22] diagnosed heart disease by ECG signals. The authors acquired the data on ECG signals from the database of MIT-BIH Arrhythmia. Discrete wavelet transform was applied to remove noise from ECG signals and perform feature extraction. The authors proposed NRSC (neighborhood rough set classifier) to perform the diagnosis. Euclidean distance was used as the distance metric to define the neighborhood. The system diagnosed the disease by classifying the signal as a normal or abnormal heartbeat. The system achieved 99.32% accuracy.
Shah et al. [23] proposed a system combining PPCA (probabilistic principal component analysis) with SVM. PPCA was used to reduce the dimensionality of the features. RBF (radial basis function) based SVM was used to classify a smaller set of features. The system achieved 85.82% accuracy on the Hungarian dataset, 82.18% accuracy on the Cleveland dataset, and 91.30% accuracy on the Switzerland dataset.
Qin et al. [24] used RF, SVM, MLP, LR (logistic regression), GBDT (gradient boosting decision tree), AdaBoost (adaptive boosting), and KNN (k-nearest neighbor) classifiers for detecting heart disease. The authors proposed an ensemble algorithm based upon multiple feature selection to select relevant features so that detection accuracy can be improved. Maximum accuracy of 93.70% was achieved on the Z-Alizadeh dataset. Nalluri et al. [25] diagnosed heart disease using a hybrid system. SVM and MLP classifiers were used to perform the classification. Three evolutionary algorithms, GSA (gravity search algorithm), FA (firefly algorithm), and PSO, were used to optimize the parameters. In MLP, momentum and learning rate were optimized. In SVM, margins were optimized. The system was validated on five datasets of cardiovascular disease. The system obtained an accuracy of 94.1% on the Cleveland dataset, 90.74% on the Statlog dataset, 89.5% on the SPECT dataset, 90.6% on the SPECTF dataset, and 91.4% on the Eric dataset. Alizadehsani et al. [26] used three classifiers for detecting the stenosis of three coronary arteries. During the detection of stenosis in each artery, the features to be used were selected by SVM. Each of these three classifiers was only able to predict blockage in an individual artery. The final prediction of heart disease was made by combining the results of the three classifiers. The authors achieved 88.77% accuracy on the Hungarian dataset, 93.06% on the Cleveland dataset, and 96.40% on the Z-Alizadeh dataset.
Verma et al. [27] presented the use of NB, C4.5, and MLP for CAD diagnosis. The authors collected the data of 335 individuals from IGMC, Shimla, India. Disease severity was detected with 77.6% accuracy using C4.5, 73.73% accuracy using NB, and 71.94% accuracy using MLP.
Dhanaseelan and Jeya Sutha [28] proposed HCFI (frequent itemset algorithm based upon hashing) to detect heart disease. The algorithm efficiently detected disease by removing unnecessary features. The algorithm worked in two steps. In the first step, the transaction was initiated. In the second step, frequent itemsets were generated. The authors evaluated HCFI on the Cleveland dataset. David and Belcy [29] used RF, DT, and NB classifiers. The system provided the best accuracy of 81% with RF on the Statlog dataset. Haq et al. [30] used three feature selection algorithms, MRMR (minimal redundancy maximal relevance), relief, and LASSO (least absolute shrinkage and selection operator), for the selection of significant features. The classification was performed using seven classifiers: LR, SVM, NB, ANN, DT, RF, and KNN. The combination of RF and relief provided an accuracy of 85%.
Vijayashree and Sultana [31] classified heart disease using SVM. Feature selection was performed using improved PSO. PSO was improved by selecting optimal weights. A fitness function optimized using a support vector machine helped in the selection of optimal weights. The system achieved 84.36% accuracy on the Cleveland dataset. Dwivedi [32] predicted heart disease using six classifiers ANN, LR, SVM, CT (classification tree), KNN, and NB. LR achieved the highest accuracy of 85%. Dogan et al. [33] predicted heart disease using genetic and epigenetic data from the Framingham dataset. A model was constructed using random forest and 78% accuracy was achieved. Saqlain et al. [34] diagnosed heart disease using RBF Kernel SVM. The accuracy of diagnosis was increased by performing feature selection using three methods. Different feature subsets were created by combining features using the fisher score-based algorithm. After creating different feature subsets, forward and reverse feature selection algorithms were used to select the feature subset. The system achieved an accuracy of 92.68%, 81.19%, 84.52%, and 82.7% for Switzerland, Cleveland, Hungarian and SPECTF datasets, respectively. Abdar et al. [35] developed a system using three types of SVM. GA (genetic algorithm) and PSO were used for feature selection and model optimization. Experiments were performed on the Z-Alizadeh dataset and 93.08% accuracy was achieved.
Ayatollahi et al. [36] diagnosed heart disease using SVM and ANN. Data were collected from the Aja University of Medical Sciences. Twenty-five features were used for disease diagnosis. SVM diagnosed with greater accuracy than ANN. SVM provided a sensitivity of 92.23%. Latha and Jeeva [37] used majority voting with NB, BN (bayes network), MLP, and RF to develop an ensemble model to predict heart disease with 85.48% accuracy. Khennou et al. [38] performed heart disease prediction using KNN and SVM. A combined dataset of Cleveland, Switzerland, and Hungarian was used to evaluate the results and maximum accuracy of 87% was achieved with SVM. Magesh and Swarnalatha [39] used CDTL (cluster-based decision tree learning) to optimize the set of features and performed heart disease classification using these optimized features. The authors used a Cleveland heart disease dataset that was divided into multiple datasets using class labels. After that data was preprocessed and different class pairs were created. After that decision tree was applied to each dataset and decision attributes were selected from each cluster. Interconnecting features were extracted from these decision attributes and classifiers were applied to these extracted features. The system achieved an accuracy of 89.30% using an optimized random forest classifier.
Khourdifi and Bahaj [40] developed a system using the SVM, KNN, MLP, RF, and NB classifiers. FCBF (fast correlation-based feature selection) method was used to select relevant features for classification based on the correlation between the features. The selected subset of features was further optimized by ACO (ant colony optimization) and PSO (particle swarm optimization) methods. In PSO, different individuals of the population called particles work together to find a globally optimum solution. ACO optimizes the feature set by selecting the features that have less similarity with other features, hence reducing redundancy in the selected features. The system was validated on the Cleveland dataset. The system provided the best accuracy of 99.65% with KNN. Mohan et al. [41] developed HRFLM (hybrid random forest with the linear method) for the classification of heart disease. Feature selection was performed using a decision tree by entropy value. The system achieved 88.7% accuracy on the Cleveland dataset. Ali et al. [42] predicted heart failure using a stacked model. The stacked model was developed using two models of SVM. One model was used for feature selection and the other model was used for classification. L1 regular linear SVM was used for feature selection. L2 regular RBF (radial basis function) kernel SVM was used for classification. The system achieved 92.22% accuracy on the Cleveland dataset.
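The cooperative search performed by PSO, in which each particle is pulled toward its own best position and toward the best position found by the swarm, can be sketched as below. This is a generic continuous global-best PSO minimizing a toy function, not the feature-subset optimization of Khourdifi and Bahaj [40]; the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the bounds, and the objective are assumptions.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, dim=2)
```

For feature selection, the objective would typically be replaced by a validation error computed on the feature subset that a particle encodes, and the positions would be mapped to binary inclusion flags.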
Li et al. [43] developed a heart disease diagnosis system using DT, KNN, ANN, LR, SVM, and NB classifiers. The Cleveland dataset was used for performing experiments. Missing values in the dataset were removed. Standard scaling and min–max scaling were applied to the dataset as preprocessing. The authors used standard feature selection methods, including MRMR, relief, LL (local learning), and LASSO, to select relevant features and increase the accuracy of the system. The authors also proposed a new feature selection method, FCMIM (fast conditional mutual information), based on conditional mutual information. FCMIM used the mutual information value of features that are more compatible with the target class and compatible with already selected features. The system achieved maximum accuracy of 92.37%.
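The two scaling steps named above can be illustrated with a short sketch. It assumes the usual definitions (min–max scaling maps a feature to [0, 1]; standard scaling subtracts the mean and divides by the population standard deviation) and uses hypothetical cholesterol values; it is not taken from Li et al. [43].

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def standard_scale(values):
    """Centre values on zero mean and scale them to unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical serum cholesterol values in mg/dl.
chol = [200, 240, 180, 260, 220]
chol_minmax = min_max_scale(chol)
chol_standard = standard_scale(chol)
```

In practice the scaling parameters would be estimated on the training split only and then reused on the test split, so that no information leaks from the test data.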
Fitriyani et al. [44] validated their proposed system on the Cleveland and the Statlog datasets. Outliers were detected and eliminated in the dataset using DBSCAN (density-based spatial clustering of applications with noise). The Training dataset was balanced using a hybrid SMOTE-ENN method. SMOTE (synthetic minority oversampling technique) oversampled the minority class and ENN (edited nearest neighbor) removed undesired overlapped samples while ensuring a balanced distribution of class. PCC (Pearson’s correlation coefficient) and the information gain method were used to remove irrelevant features. The Weka V3.8 tool was used for performing the experiments. XGBoost (extreme gradient boosting) classifier was used for predicting heart disease. The system achieved 95.90% accuracy on the Statlog dataset and 98.40% accuracy on the Cleveland dataset.
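The oversampling half of SMOTE-ENN can be sketched as follows: a synthetic minority record is placed at a random point on the line segment between a minority record and one of its nearest minority neighbours. This is a simplified illustration with hypothetical data and parameter names, not the procedure of Fitriyani et al. [44], and the ENN cleaning step is omitted.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating toward neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself).
        others = [s for s in minority if s is not base]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class records with two numeric features.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_samples = smote(minority, n_new=6)
```

Because every synthetic record is a convex combination of two existing minority records, it always lies inside the region already spanned by the minority class.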
Almustafa [45] predicted heart disease using different classifiers. Naive Bayes, k-nearest neighbors, decision tree J48, SVM, JRip, stochastic gradient descent, decision tables, and AdaBoost classifiers were used for prediction. The author used a combined dataset from Hungarian, Cleveland, Long Beach, and Switzerland, available online on Kaggle, to conduct the experiments. A total of 14 features were used out of 76 features. Out of these 14 features, the relevant features were selected using the classifier subset evaluator method. Decision tree, KNN, and JRip provided the best results with accuracies of 98.04%, 99.70%, and 97.26%, respectively. The author also performed sensitivity analysis on decision tree and naive Bayes classifiers. The sensitivity analysis of the decision tree was performed by taking the PCF (pruning confidence factor) as a parameter and the sensitivity analysis of naive Bayes was done by taking the training size as a parameter. The decision tree was chosen to perform the sensitivity analysis because of its maximum accuracy and naive Bayes was chosen because of its low accuracy. The decision tree performed best with a PCF value of 0.35. Naive Bayes performed best with 80% training size.
Tama et al. [46] developed a system using a stacked ensemble model. The stacked model was constructed using GB (gradient boosting), RF, and XGBoost classifiers, and dimensionality reduction was performed using PSO. The system achieved 93.55% accuracy on the Statlog dataset, 86.49% accuracy on the Cleveland dataset, and 91.18% accuracy on the Hungarian dataset. Terrada et al. [47] used DT, ANN, and AdaBoost classifiers to diagnose heart disease and performed experiments on three datasets. ANN provided the best accuracy of 94%. Verma [48] developed an ensemble model using J48, CART (classification and regression tree), and RF classifiers. The model was validated on the Z-Alizadeh dataset. The system achieved 84.82% accuracy. Javid et al. [37] developed an ensemble model using the voting mechanism for heart disease prediction. KNN, RF, SVM, GRU (gated recurrent unit), and LSTM (long short-term memory) classifiers were used to develop the ensemble model. Experiments were performed on the Cleveland dataset and 85.71% accuracy was achieved.
Joloudari et al. [49] developed models using DT (decision tree), RT (random tree), CHAID (chi-squared automatic interaction detection), and SVM. Important features were selected based on feature ranking. The random tree provided the best accuracy of 91.47%. Mienye et al. [50] partitioned the dataset into different segments. Different models were developed on the partitioned datasets using CART (classification and regression tree). An ensemble model was developed by combining different CART models. The system achieved 91% accuracy on the Framingham dataset. Spencer et al. [51] combined four datasets, i.e., Cleveland, Hungarian, Long Beach, and Switzerland, into one dataset. After creating the integrated dataset, important features were selected using the chi-square method. Heart disease classification was done using the Bayes net algorithm, achieving 85% accuracy. Gazeloğlu [52] achieved 84.81% accuracy in heart disease classification using the SVM classifier. Budholiya et al. [53] developed a system using the XGBoost classifier. The hyperparameters of XGBoost were optimized using Bayesian optimization. The system achieved 91.8% prediction accuracy. Amin et al. [54] developed different models for predicting heart disease using DT, KNN, NB, SVM, LR, and ANN classifiers. An ensemble model was also developed by applying voting on naive Bayes and logistic regression. Significant features were selected using the brute force method. The system achieved maximum accuracy of 87.4% using the voting mechanism.
(L et al. 2021) developed models to predict heart disease using RF, NB, DT, AdaBoost, LR, GB (gradient boosting), and XGBoost classifiers. Relevant features were selected using GA. The performance of the models was optimized using hyperparameter optimization. Feature selection increased the accuracy of all the classifiers. DT achieved 88.7% accuracy, RF achieved 90.7% accuracy, AdaBoost achieved 85.5% accuracy, NB achieved 62.7% accuracy, LR achieved 70.4% accuracy, KNN achieved 84.5% accuracy, XGBoost achieved 85.2% accuracy and GB achieved 86.8% accuracy. Gárate-Escamila et al. [55] developed a system using random forest where the optimal set of features was selected using chi-square and PCA. The system provided an accuracy of 99% on the Hungarian dataset, 98.7% on the Cleveland dataset, and 99.4% on the Cleveland-Hungarian dataset. Arul Jothi et al. [56] used DT and KNN classifiers to predict heart disease. KNN achieved 67% accuracy and DT achieved 81% accuracy. Valarmathi and Sheela [57] optimized random forest classifiers using randomized search, grid search, and genetic algorithm. Important features for diagnosis were selected using the SFS (sequential forward selection) algorithm. The optimized random forest provided 80.2% accuracy on the Z-Alizadeh dataset. Bahani et al. [58] developed a system to predict heart disease by FCRLC (fuzzy rule-based classification system with fuzzy clustering and linguistic modifiers). The system achieved 83.17% accuracy on the Cleveland dataset.
Shorewala [59] developed a stacked model using KNN, SVM, and RF for heart disease detection. The LASSO algorithm was utilized for feature selection. The model was evaluated using a Cardiovascular Disease dataset having records from 70,000 patients, and it was shown to be 75.1% accurate. (L et al. 2021) developed an optimized model using RF. The features were chosen using GA. The model provided 90.7% accuracy on the Z-Alizadeh dataset. Rani et al. [60] developed a hybrid system using NB, SVM, RF, LR, and AdaBoost classifiers. Features were selected using GA and RFE (recursive feature elimination) algorithms. SMOTE and standard scaler methods were also used for data preprocessing. Missing values were imputed using MICE (multivariate imputation by chained equations). The system achieved maximum accuracy of 86.6% with RF (random forest). Rani et al. [61] selected features by finding out the feature importance using the ET (extra tree) classifier. The classification was done using KNN, XGBoost, SVM-Linear (support vector machine with linear kernel), and SVM-RBF (support vector machine with radial basis function kernel). Hyperparameter optimization was done using grid search. The system provided an accuracy of 95.16% with SVM-RBF on the Z-Alizadeh Sani dataset.
Patro et al. [62] used a support vector machine optimized using a Bayesian algorithm and achieved 93.3% accuracy on the Cleveland dataset. Louridi et al. [63] filled missing data using MICE, mean, KNN, and mode algorithms. Class balancing was also done in the dataset. Accuracy of 95.83% was achieved using the stacking algorithm. Ghosh et al. [64] selected features using relief and LASSO methods. Hybrid classifiers were developed by combining boosting and bagging methods. The best accuracy of 99.05% was achieved with RFBM (random forest bagging method). Nawaz et al. [65] developed a prediction model for heart disease using KNN, SVM, RF, ANN, and GDO (gradient descent optimization). GDO achieved maximum accuracy of 98.54%. Chang et al. [66] developed a system using RF and achieved 83% accuracy. Archana et al. [67] developed a hybrid method using NB and RF. Features were selected using the relief algorithm. An accuracy of 93% was obtained on the Cleveland dataset. Nagavelli et al. [68] detected heart disease using the XGBoost classifier and achieved 95.9% accuracy on the Cleveland dataset. Records with missing values were not used. Gao et al. [69] performed experiments with SVM, RF, DT, KNN, NB, and ensemble algorithms. Features were selected using LDA (linear discriminant analysis) and PCA methods. The ensemble algorithm with DT gave the maximum accuracy of 98.6%. The heart disease diagnosis systems developed by the researchers using machine learning methods are summarized in Table 1.
3 Deep Learning Methods for Heart Disease Diagnosis
Researchers have proposed several methods for identifying heart disease using deep learning techniques; these approaches are discussed here. Choi et al. [72] developed a model to detect heart failure using RNN (recurrent neural network). The authors analyzed the relationship between temporal events in EHR (electronic health records) using the GRU of RNN. EHR was obtained from Sutter PAMF (Palo Alto Medical Foundation). EHR events were represented by a set of one-hot vectors. An N-dimensional vector was used to represent N events. In each N-dimensional vector, one dimension was 1, indicating the occurrence of the event, and the rest were 0. At each timestamp t, the vector Xt was given as input to the GRU, which stored its state in the hidden layer h. The state of the hidden layer changes with each timestamp. Logistic regression was applied to the vector of the final state and a scalar value was produced representing the patient's risk score. The model achieved an AUC value of 0.777. Arabasadi et al. [73] proposed a system for CAD diagnosis using a neural network. Weights of the neural network were optimized by GA. The backpropagation algorithm was used to train the ANN. GA used 100 chromosomes as the initial population. The fitness value of the chromosomes was calculated using the RMSE (root mean square error) of the untrained ANN. The roulette wheel algorithm was used in GA for selection. Two-point crossover was used with a crossover probability of 1. The mutation was performed using Gaussian mutation. Each chromosome contained all weights of the neural network and each gene in the chromosome contained one weight of the neural network. Feature selection was performed by SVM. The system was evaluated on the Z-Alizadeh Sani dataset. The system achieved an accuracy of 93.85%.
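The three genetic operators named above for Arabasadi et al. [73] (roulette wheel selection, two-point crossover, and Gaussian mutation) can be sketched on chromosomes that hold network weights. This is an illustrative sketch only: the conversion of RMSE into a selection weight, the mutation rate, the standard deviation `sigma`, and the toy population are assumptions, not values from the surveyed paper.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points of the parents."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def gaussian_mutation(chrom, rng, rate=0.1, sigma=0.5):
    """Add zero-mean Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in chrom]


rng = random.Random(0)
# Hypothetical population: 4 chromosomes, each holding 6 network weights.
population = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
# Hypothetical RMSE per chromosome; a lower error gives a larger selection weight.
rmse = [0.40, 0.25, 0.60, 0.30]
fitness = [1.0 / (1.0 + e) for e in rmse]

parent1 = roulette_select(population, fitness, rng)
parent2 = roulette_select(population, fitness, rng)
child1, child2 = two_point_crossover(parent1, parent2, rng)
mutant = gaussian_mutation(child1, rng)
```

Repeating selection, crossover, and mutation until a new population is filled gives one generation; in a hybrid scheme, the best chromosome would then be used as the initial weights for backpropagation.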
Samuel et al. [74] developed an ANN model to predict the risk of heart failure. The network weights were optimized using a fuzzy approach, and an accuracy of 91.10% was achieved. Kim and Kang [75] developed a system to predict the risk level of heart disease using ANN. Relevant features contributing to the diagnosis were selected by performing a correlation analysis of the features. The correlated features were coupled by connecting them to the hidden layer of the neural network. Only relevant features were used as inputs to the neural network to predict the disease. Experiments were performed on the dataset collected in KNHANES-VI (6th Korea National Health and Nutrition Examination Survey). The system provided an accuracy of 82.51%. Caliskan and Yuksel [76] proposed a DNN (deep neural network) model for CAD diagnosis by combining a softmax classifier and two autoencoders. The authors validated the model on four different datasets. The system achieved 85.2% accuracy on the Cleveland dataset, 84% on the Long Beach dataset, 92.2% on the Switzerland dataset, and 83.5% on the Hungarian dataset. Poornima and Gladis [77] proposed a hybrid classifier for the prediction of heart disease. Features were selected using OLPP (orthogonal locality preserving projection). The classification was performed using ANN. There were 4 neurons in the input layer, 100 neurons in the hidden layer, and 5 neurons in the output layer of the neural network. The weights of the connections between neurons ranged from −10 to 10. The network weights were set using LM (Levenberg–Marquardt) and GSO (group search optimization); of the two sets of weights obtained by LM and GSO, the better one was used in the network. The authors used three datasets (Cleveland, Hungarian, and Switzerland) to validate the results. The system achieved 94% accuracy on the Cleveland dataset, 98% on the Hungarian dataset, and 87% on the Switzerland dataset.
Malav and Kadam [78] predicted heart disease using ANN and K-means on the Cleveland dataset. The dataset was first clustered using K-means, and the output of K-means was then given as input to the ANN for classification. K-means reduced the convergence time of the ANN. The system achieved 89.53% sensitivity and 93.52% precision.
Tan et al. [79] proposed a system for CAD diagnosis with a stacked model of LSTM and CNN using ECG signals. The ECG signals were obtained from the PhysioNet database, and only lead 2 signals were used. The dataset consists of ECG signals from 7 CAD patients and 40 healthy individuals. The system achieved an accuracy of 99.85%. Miao and Miao [80] developed a DNN model for heart disease diagnosis and achieved 83.67% accuracy on the Cleveland dataset. Ali et al. [81] developed a system for diagnosing heart disease using DNN. A DNN should neither overfit nor underfit: irrelevant features in the training data lead to overfitting, whereas an insufficient number of features may lead to underfitting. To select relevant features, the authors used the chi-square statistical method. The network configuration was optimized using an exhaustive grid search. The authors used the Cleveland dataset for the experiments and achieved 93.33% accuracy.
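As an illustration of chi-square feature scoring of the kind mentioned for Ali et al. [81], the sketch below computes the Pearson chi-square statistic between one categorical feature and the class label from their contingency table. The function name and the toy data are hypothetical, and neither the authors' pipeline nor their grid search is reproduced here.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of a categorical feature against class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts per (value, class)
    feature_totals = Counter(feature)          # row totals
    label_totals = Counter(labels)             # column totals
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            expected = f_count * c_count / n   # count expected under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


# Toy data (assumption): one feature that mirrors the label, one unrelated to it.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 1, 0, 0, 0, 0]
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]

informative_score = chi_square_score(informative, labels)
uninformative_score = chi_square_score(uninformative, labels)
```

A higher score indicates a stronger dependence between the feature and the class, so features can be ranked by score and the top-ranked ones kept.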
Meshref [82] achieved 84.25% accuracy in heart disease diagnosis using ANN. The attribute subset selection method was used to select features. Verma and Mathur [83] used the correlation and cuckoo search methods to select important features and developed a DNN for the diagnosis of heart disease. For an individual diagnosed with the disease, the severity of the disease was reported using case-based reasoning. An accuracy of 85.48% was obtained using this approach.
Javeed et al. [84] developed two hybrid systems, FWAFE-ANN and FWAFE-DNN, for heart disease diagnosis. The authors proposed the FWAFE (floating window with adaptive size for feature elimination) algorithm for feature selection, which was used in both systems. In this feature selection method, a floating window was used to eliminate features. The window size was varied from 1 to n-1, and the features that resided in the window were eliminated. Feature selection was done by evaluating the performance of the system for different subsets of features. After feature selection, classification was done using ANN in the FWAFE-ANN hybrid system and DNN in the FWAFE-DNN hybrid system. The authors used the Cleveland dataset and the hold-out validation method with 70% training data and 30% testing data. FWAFE-ANN achieved 91.11% accuracy and FWAFE-DNN achieved 93.33% accuracy.
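The floating-window elimination described for Javeed et al. [84] can be sketched as follows. This is only one plausible reading of the description: it assumes that the window covers contiguous feature positions and slides across all start positions for each size from 1 to n-1, and that a caller-supplied `evaluate` function scores each remaining subset. The toy scoring function is hypothetical and stands in for training and testing a classifier.

```python
def window_subsets(n_features):
    """Yield the feature indices left after removing each contiguous window."""
    for size in range(1, n_features):                 # window sizes 1 .. n-1
        for start in range(n_features - size + 1):    # slide the window
            removed = set(range(start, start + size))
            yield [i for i in range(n_features) if i not in removed]


def select_features(n_features, evaluate):
    """Return the remaining subset with the highest score under `evaluate`."""
    return max(window_subsets(n_features), key=evaluate)


def toy_score(subset):
    """Toy evaluation (assumption): features 0 and 3 help, every feature has a cost."""
    return 10 * len({0, 3} & set(subset)) - len(subset)


best_subset = select_features(5, toy_score)
```

In practice `evaluate` would return a validation accuracy for the classifier trained on the given subset, so the cost of the search grows with the number of candidate windows.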
Pan et al. [85] developed a system for heart disease prediction using an enhanced deep learning assisted convolutional neural network. The dataset was pre-processed by removing missing values and applying scaling methods. Deep learning was used for feature selection. The classification was performed using MLP and BN. The system achieved 94.9% accuracy. Ali et al. [94] proposed OCI-DBN (optimally configured and improved deep belief network) for heart disease prediction. Important features were selected using the Ruzzo–Tompa approach, in which features were selected by computing the fitness of each feature. The configuration of the deep belief network was optimized using a stacked genetic algorithm. The system achieved 94.61% accuracy. Dutta et al. [86] used the data collected in the National Health and Nutrition Examination Survey from 1999 to 2016. The data was highly imbalanced, containing 1300 records of heart patients and 35,779 records of healthy individuals. Relevant features were selected using LASSO before being passed to a CNN (convolutional neural network) model for CAD diagnosis, which provided 79.5% accuracy.
Paragliola and Coronato [87] proposed a system to identify the risk of cardiac events due to hypertension. This system was developed using LSTM and CNN and used ECG signals as inputs. The system achieved 98% accuracy. Cherian et al. [88] predicted heart disease using an ANN model. The feature set was reduced using PCA. A hybrid approach combining LA (lion algorithm) and PSO was used to optimize the weights of the neural network. Results were validated on the Statlog dataset, and 87.09% accuracy was achieved. Salhi et al. [89] selected important features using a correlation matrix. The authors used ANN to diagnose heart disease using the selected features, and a maximum accuracy of 93% was achieved. Murugesan et al. [90] developed a super learner by combining three bioinspired algorithms with ANN. Three sets of features were selected using the CSO (cat swarm optimization), BFO (bacterial foraging optimization), and KH (krill herd) algorithms. A BPNN (backpropagation neural network) was trained using the features selected by each algorithm. An accuracy of 86.36% was achieved on the Statlog dataset and an accuracy of 84% on the Cleveland dataset. Bharti et al. [91] developed a DNN model to detect heart disease. Dropout layers were used to prevent overfitting. The DNN achieved 94.2% accuracy on the Cleveland dataset. Mehmood et al. [92] used LASSO to select features and applied the selected features to a CNN, achieving 97% accuracy. Koppu et al. [93] proposed a model in which preprocessing was first done using spline interpolation to fill in missing data and entropy-correlation to detect outliers. Optimal features were selected using F-DA (fitness-oriented dragonfly optimization algorithm). The selected features were applied to a DBN (deep belief network), achieving 84.44% accuracy. The heart disease diagnosis systems developed by the researchers using deep learning methods are summarized in Table 2.
4 Online Available Heart Disease Datasets
In the literature, researchers used various online available clinical datasets to develop models for heart disease diagnosis. The details of the various heart disease datasets available online are given in this section.
4.1 Cleveland, Hungarian, Switzerland, and Long Beach Heart Disease Datasets
All four datasets are available in the UCI (University of California, Irvine) repository. These datasets have a total of 76 features including continuous and categorical features. All the researchers used 14 of the 76 features. Thirteen of these 14 features are prognostic features and one feature differentiates between the presence and absence of disease.
The presence of the disease is indicated by a value of 1 to 4 indicating the level of disease severity. The absence of disease is indicated by the value 0. All four datasets have some missing values. The Cleveland dataset is the most widely used dataset by researchers [95].
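When the task is treated as binary classification, the 0 to 4 severity code described above is typically collapsed into presence versus absence. A minimal sketch, assuming the target is stored as an integer per record; the function name and the sample values are hypothetical:

```python
def to_binary_target(severity):
    """Map the 0-4 severity code to 0 (disease absent) or 1 (disease present)."""
    if severity not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected severity code: {severity}")
    return int(severity > 0)


# Hypothetical target column with mixed severity levels.
raw_targets = [0, 2, 1, 0, 4, 3]
binary_targets = [to_binary_target(s) for s in raw_targets]
```

Keeping the original five levels instead turns the problem into multi-class severity prediction, which is why reported accuracies on the same dataset are not always directly comparable.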
4.2 Statlog Heart Disease Dataset
There are 13 predictive features in this dataset and one feature indicates the presence or absence of heart disease. There are 297 instances in the dataset. The presence of heart disease is indicated by 1 and the absence of heart disease is indicated by 0. There are no missing values in this dataset [101].
4.3 Framingham Heart Disease Dataset
This dataset is available on Kaggle. It contains 15 predictive features and 4240 examples. Values of some features are missing in this dataset [100].
4.4 SPECTF Heart Disease Dataset
This dataset includes features extracted from cardiac SPECT (single photon emission computed tomography) images. The dataset consists of a total of 44 attributes, based on which heart disease is diagnosed. It contains records of 267 individuals and is available in the UCI repository [96].
4.5 Z-Alizadeh Sani Dataset
This dataset contains records of 303 individuals. The dataset contains 54 predictive features classified into four categories: ECG, demographic, symptom and examination, and laboratory. Based on these predictive features, a person is classified as normal or as a CAD patient. There are no missing values in this dataset [97].
The various online available heart disease datasets are summarized in Table 3. Figure 4 shows the datasets along with the number of features (Table 4).
5 Performance Parameters
Various performance parameters used to evaluate the classification performance of the system are as follows:
- Accuracy
- Sensitivity
- Specificity
- Precision
- F-Measure
- AUROC (area under receiver operating characteristics curve)
The performance parameters are calculated from the numbers of true positives, false positives, true negatives, and false negatives. If a patient has the disease and the model predicts it, the result is a true positive; if the model fails to predict it, the result is a false negative. If an individual does not have the disease and the model classifies the individual correctly, the result is a true negative; otherwise, it is a false positive. Performance parameters are shown in Fig. 5 and the usage percentage of these parameters in the studied literature is shown in Fig. 6.
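The count-based parameters above can be computed directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical, and AUROC is omitted because it requires predicted scores at varying thresholds and not a single set of counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (assumes non-zero denominators)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)    # recall, true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    precision = tp / (tp + fp)      # positive predictive value
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts: 45 patients with the disease, 55 without.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these counts the accuracy is 0.85 while the sensitivity is about 0.89 and the specificity about 0.82, which shows why accuracy alone can hide how a model treats each class.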
6 Validation Methods
Most researchers have used one or both of the following methods to validate the results:
6.1 Hold-Out Validation
In this method, the dataset is divided into two parts. One part is used for training the system and the other part is used for testing it. Most researchers have used 70% of the data for training and 30% for testing. However, some researchers have used other percentage splits as well.
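A hold-out split of this kind can be sketched in a few lines. The example assumes a random shuffle before splitting and a fixed seed for repeatability, neither of which is specified in the reviewed studies; the function name is hypothetical.

```python
import random


def hold_out_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# 100 hypothetical record identifiers split 70/30.
train, test = hold_out_split(range(100))
```

Because the estimate depends on a single split, results obtained this way can vary noticeably with the seed on small datasets.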
6.2 K-Fold Validation
In this method, the dataset is divided into k groups. Classification performance is evaluated over k iterations, using k-1 groups for model training and one group for model testing. In each iteration, a different group is selected for testing and the remaining groups are used for training. The performance of the classifier is calculated by taking the average performance over the k iterations. Most researchers have used 10-fold validation (k = 10). However, some researchers have also used 2-fold, 3-fold, and 5-fold methods.
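The k-fold procedure can be sketched as an index generator plus an averaging loop. This is a minimal illustration with hypothetical function names: folds are taken as contiguous blocks without shuffling or stratification, and `evaluate` is a caller-supplied placeholder for training on one index set and scoring on the other.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / k


# 25 hypothetical samples split into 10 folds.
folds = list(k_fold_indices(25, 10))
```

Every sample appears in exactly one test fold, so the averaged score uses all of the data for testing once, at the cost of training the model k times.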
8 Conclusion and Future Work
Machine learning and deep learning approaches have been used to develop several decision support systems for the detection of heart disease. These systems were accurate to varying degrees. A system's accuracy is determined by the feature selection method, classifier, and preprocessing methods used. To design a high-performance decision support system for heart disease diagnosis, effective preprocessing approaches, feature selection, and classifiers are required. The majority of the researchers used data from the UCI repository, which is available online. The Cleveland and Z-Alizadeh Sani datasets are the most popular and widely used.
The following suggestions, based on the review of the literature, should be considered in future research for more accurate heart disease detection:
1. By combining several machine learning algorithms and mining unstructured data available in enormous quantities in healthcare organisations, more hybrid models for reliable prediction of heart disease can be developed.
2. In heart disease prediction, classification algorithms have received greater attention than association rules. Association rule mining should therefore also be explored in future research to achieve better results.
3. The majority of studies used online available datasets to train and test prediction models. Real-time data could be collected from a large number of heart disease patients at reputable medical institutes around the country and used to train and evaluate prediction algorithms.
4. For a more accurate diagnosis, highly skilled cardiologists should be consulted to prioritize the features based on their impact on the patient's health and to add more vital heart disease attributes.
Data Availability
This review manuscript has no associated data.
References
Kumar R, Rani P (2020) Comparative analysis of decision support system for heart disease. Adv Math Sci J. https://doi.org/10.37418/amsj.9.6.15
Rajkumar R, Anandakumar K, Bharathi A (2016) Coronary artery disease (CAD) prediction and classification—a survey. ARPN J Eng Appl Sci 11
Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. SN Comput Sci. https://doi.org/10.1007/s42979-021-00592-x
Patel S, Patel A (2016) A big data revolution in health care sector: opportunities, challenges and technological advancements. Int J Inf Sci Tech. https://doi.org/10.5121/ijist.2016.6216
Malakar AK, Choudhury D, Halder B et al (2019) A review on coronary artery disease, its risk factors, and therapeutics. J Cell Physiol. https://doi.org/10.1002/jcp.28350
Masetic Z, Subasi A (2016) Congestive heart failure detection using random forest classifier. Comput Methods Progr Biomed. https://doi.org/10.1016/j.cmpb.2016.03.020
Ghadiri Hedeshi N, Saniee Abadeh M (2014) Coronary artery disease detection using a fuzzy-boosting PSO approach. Comput Intell Neurosci. https://doi.org/10.1155/2014/783734
Bashir S, Qamar U, Khan FH, Javed MY (2014) MV5: a clinical decision support framework for heart disease prediction using majority vote based classifier ensemble. Arab J Sci Eng. https://doi.org/10.1007/s13369-014-1315-0
Tomar D, Agarwal S (2014) Feature selection based least square twin support vector machine for diagnosis of heart disease. Int J Bio-Sci Bio-Technol. https://doi.org/10.14257/ijbsbt.2014.6.2.07
Olaniyi EO, Oyedotun OK, Adnan K (2015) Heart diseases diagnosis using neural networks arbitration. Int J Intell Syst Appl 7:75–82
Marateb HR, Goudarzi S (2015) A noninvasive method for coronary artery diseases diagnosis using a clinically-interpretable fuzzy rule-based system. J Res Med Sci 20:214
Khanna D, Sahu R, Baths V, Deshpande B (2015) Comparative study of classification techniques (SVM Logistic Regression and Neural Networks) to predict the prevalence of heart disease. Int J Mach Learn Comput. https://doi.org/10.7763/ijmlc.2015.v5.544
Long NC, Meesad P, Unger H (2015) A highly accurate firefly based algorithm for heart disease prediction. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2015.06.024
Miranda E, Irwansyah E, Amelga AY et al (2016) Detection of cardiovascular disease risk’s level for adults using naive bayes classifier. Healthc Inform Res. https://doi.org/10.4258/hir.2016.22.3.196
Verma L, Srivastava S, Negi PC (2016) A hybrid data mining model to predict coronary artery disease cases using non-invasive clinical data. J Med Syst. https://doi.org/10.1007/s10916-016-0536-z
Jabbar MA, Deekshatulu BL, Chandra P (2016) Prediction of heart disease using random forest and feature subset selection. In: Snášel V, Abraham A, Krömer P, Pant M, Muda A (eds) Advances in intelligent systems and computing. Springer, Cham
Liu X, Wang X, Su Q et al (2017) A hybrid classification system for heart disease diagnosis based on the RFRS method. Comput Math Methods Med. https://doi.org/10.1155/2017/8272091
Buchan K, Filannino M, Uzuner Ö (2017) Automatic prediction of coronary artery disease from clinical narratives. J Biomed Inform. https://doi.org/10.1016/j.jbi.2017.06.019
Mdhaffar A, Bouassida Rodriguez I, Charfi K et al (2017) CEP4HFP: complex event processing for heart failure prediction. IEEE Trans Nanobiosci. https://doi.org/10.1109/TNB.2017.2769671
Babic F, Olejar J, Vantova Z, Paralic J (2017) Predictive and descriptive analysis for heart disease diagnosis. In: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017
Davari Dolatabadi A, Khadem SEZ, Asl BM (2017) Automated diagnosis of coronary artery disease (CAD) patients using optimized SVM. Comput Methods Progr Biomed. https://doi.org/10.1016/j.cmpb.2016.10.011
Kumar SU, Inbarani HH (2017) Neighborhood rough set based ECG signal classification for diagnosis of cardiac diseases. Soft Comput. https://doi.org/10.1007/s00500-016-2080-7
Shah SMS, Batool S, Khan I et al (2017) Feature extraction through parallel probabilistic principal component analysis for heart disease diagnosis. Phys A Stat Mech its Appl. https://doi.org/10.1016/j.physa.2017.04.113
Qin CJ, Guan Q, Wang XP (2017) Application of ensemble algorithm integrating multiple criteria feature selection in coronary heart disease detection. Biomed Eng—Appl Basis Commun. https://doi.org/10.4015/S1016237217500430
Nalluri MSR, Kannan K, Manisha M, Roy DS (2017) Hybrid disease diagnosis using multiobjective optimization with evolutionary parameter optimization. J Healthc Eng. https://doi.org/10.1155/2017/5907264
Alizadehsani R, Hosseini MJ, Khosravi A et al (2018) Non-invasive detection of coronary artery disease in high-risk patients based on the stenosis prediction of separate coronary arteries. Comput Methods Progr Biomed. https://doi.org/10.1016/j.cmpb.2018.05.009
Verma L, Srivastava S, Negi PC (2018) An intelligent noninvasive model for coronary artery disease detection. Complex Intell Syst. https://doi.org/10.1007/s40747-017-0048-6
Dhanaseelan R, Jeya Sutha M (2018) Diagnosis of coronary artery disease using an efficient hash table based closed frequent itemsets mining. Med Biol Eng Comput. https://doi.org/10.1007/s11517-017-1719-6
David HBF, Belcy SA (2018) Heart disease prediction using data mining techniques. ICTACT J SOFT Comput 9:1824–1830
Haq AU, Li JP, Memon MH et al (2018) A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob Inf Syst. https://doi.org/10.1155/2018/3860146
Vijayashree J, Sultana HP (2018) A machine learning framework for feature selection in heart disease classification using improved particle swarm optimization with support vector machine classifier. Progr Comput Softw. https://doi.org/10.1134/S0361768818060129
Dwivedi AK (2018) Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput Appl. https://doi.org/10.1007/s00521-016-2604-1
Dogan MV, Grumbach IM, Michaelson JJ, Philibert RA (2018) Integrated genetic and epigenetic prediction of coronary heart disease in the Framingham heart study. PLoS ONE. https://doi.org/10.1371/journal.pone.0190549
Saqlain SM, Sher M, Shah FA et al (2019) Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines. Knowl Inf Syst. https://doi.org/10.1007/s10115-018-1185-y
Abdar M, Książek W, Acharya UR et al (2019) A new machine learning technique for an accurate diagnosis of coronary artery disease. Comput Methods Prog Biomed. https://doi.org/10.1016/j.cmpb.2019.104992
Ayatollahi H, Gholamhosseini L, Salehi M (2019) Predicting coronary artery disease: a comparison between two data mining algorithms. BMC Public Health. https://doi.org/10.1186/s12889-019-6721-5
Latha CBC, Jeeva SC (2019) Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Informat Med Unlocked. https://doi.org/10.1016/j.imu.2019.100203
Khennou F, Fahim C, Chaoui H, Chaoui NEH (2019) A machine learning approach using predictive analytics to identify and analyze high risks patients with heart disease. Int J Mach Learn Comput. https://doi.org/10.18178/ijmlc.2019.9.6.870
Magesh G, Swarnalatha P (2021) Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction. Evol Intell. https://doi.org/10.1007/s12065-019-00336-0
Khourdifi Y, Bahaj M (2019) Heart disease prediction and classification using machine learning algorithms optimized by particle swarm optimization and ant colony optimization. Int J Intell Eng Syst. https://doi.org/10.22266/ijies2019.0228.24
Mohan S, Thirumalai C, Srivastava G (2019) Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2923707
Ali L, Niamat A, Khan JA et al (2019) An optimized stacked support vector machines based expert system for the effective prediction of heart failure. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2909969
Li JP, Haq AU, Din SU et al (2020) Heart disease identification method using machine learning classification in E-healthcare. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3001149
Fitriyani NL, Syafrudin M, Alfian G, Rhee J (2020) HDPM: an effective heart disease prediction model for a clinical decision support system. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3010511
Almustafa KM (2020) Prediction of heart disease and classifiers’ sensitivity analysis. BMC Bioinformat. https://doi.org/10.1186/s12859-020-03626-y
Tama BA, Im S, Lee S (2020) Improving an intelligent detection system for coronary heart disease using a two-tier classifier ensemble. Biomed Res Int. https://doi.org/10.1155/2020/9816142
Terrada O, Hamida S, Cherradi B et al (2020) Supervised machine learning based medical diagnosis support system for prediction of patients with heart disease. Adv Sci Technol Eng Syst. https://doi.org/10.25046/AJ050533
Jinny SV, Mate YV (2021) Early prediction model for coronary heart disease using genetic algorithms, hyper-parameter optimization and machine learning techniques. Health Technol (Berl). https://doi.org/10.1007/s12553-020-00508-4
Joloudari JH, Joloudari EH, Saadatfar H et al (2020) Coronary artery disease diagnosis; ranking the significant features using a random trees model. Int J Environ Res Public Health. https://doi.org/10.3390/ijerph17030731
Mienye ID, Sun Y, Wang Z (2020) An improved ensemble learning approach for the prediction of heart disease risk. Informat Med Unlocked. https://doi.org/10.1016/j.imu.2020.100402
Spencer R, Thabtah F, Abdelhamid N, Thompson M (2020) Exploring feature selection and classification methods for predicting heart disease. Digit Heal. https://doi.org/10.1177/2055207620914777
Gazeloğlu C (2020) Prediction of heart disease by classifying with feature selection and machine learning methods. Prog Nutr. https://doi.org/10.23751/pn.v22i2.9830
Budholiya K, Shrivastava SK, Sharma V (2020) An optimized XGBoost based diagnostic system for effective prediction of heart disease. J King Saud Univ—Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2020.10.013
Amin MS, Chiam YK, Varathan KD (2019) Identification of significant features and data mining techniques in predicting heart disease. Telemat Inform. https://doi.org/10.1016/j.tele.2018.11.007
Gárate-Escamila AK, Hajjam El Hassani A, Andrès E (2020) Classification models for heart disease prediction using feature selection and PCA. Informat Med Unlocked. https://doi.org/10.1016/j.imu.2020.100330
Arul Jothi K, Subburam S, Umadevi V, Hemavathy K (2021) Heart disease prediction system using machine learning. Mater Today Proc. https://doi.org/10.1016/j.matpr.2020.12.901
Valarmathi R, Sheela T (2021) Heart disease prediction using hyper parameter optimization (HPO) tuning. Biomed Signal Process Control. https://doi.org/10.1016/j.bspc.2021.103033
Bahani K, Moujabbir M, Ramdani M (2021) An accurate fuzzy rule-based classification systems for heart disease diagnosis. Sci African. https://doi.org/10.1016/j.sciaf.2021.e01019
Shorewala V (2021) Early detection of coronary heart disease using ensemble techniques. Informat Med Unlocked. https://doi.org/10.1016/j.imu.2021.100655
Rani P, Kumar R, Ahmed NMOS, Jain A (2021) A decision support system for heart disease prediction based upon machine learning. J Reliab Intell Environ. https://doi.org/10.1007/s40860-021-00133-6
Rani P, Kumar R, Jain A (2021) Coronary artery disease diagnosis using extra tree-support vector machine: ET-SVMRBF. Int J Comput Appl Technol. https://doi.org/10.1504/IJCAT.2021.119772
Patro SP, Nayak GS, Padhy N (2021) Heart disease prediction by using novel optimization algorithm: a supervised learning prospective. Informat Med Unlocked. https://doi.org/10.1016/j.imu.2021.100696
Louridi N, Douzi S, El Ouahidi B (2021) Machine learning-based identification of patients with a cardiovascular defect. J Big Data. https://doi.org/10.1186/s40537-021-00524-9
Ghosh P, Azam S, Jonkman M et al (2021) Efficient prediction of cardiovascular disease using machine learning algorithms with relief and lasso feature selection techniques. IEEE Access. https://doi.org/10.1109/ACCESS.2021.3053759
Nawaz MS, Shoaib B, Ashraf MA (2021) Intelligent cardiovascular disease prediction empowered with gradient descent optimization. Heliyon. https://doi.org/10.1016/j.heliyon.2021.e06948
Chang V, Bhavani VR, Xu AQ, Hossain M (2022) An artificial intelligence model for heart disease detection using machine learning algorithms. Healthc Anal. https://doi.org/10.1016/j.health.2022.100016
Archana KS, Sivakumar B, Kuppusamy R et al (2022) Automated cardioailment identification and prevention by hybrid machine learning models. Comput Math Methods Med. https://doi.org/10.1155/2022/9797844
Nagavelli U, Samanta D, Chakraborty P (2022) Machine learning technology-based heart disease detection models. J Healthc Eng. https://doi.org/10.1155/2022/7351061
Gao XY, Amin Ali A, Shaban Hassan H, Anwar EM (2021) Improving the accuracy for analyzing heart diseases prediction based on the ensemble method. Complexity. https://doi.org/10.1155/2021/6663455
Verma P (2020) Ensemble models for classification of coronary artery disease using decision trees. Int J Recent Technol Eng. 8:940–944. https://doi.org/10.35940/ijrte.F7250.038620
Javid I, Alsaedi AKZ, Ghazali R (2020) Enhanced accuracy of heart disease prediction using machine learning and recurrent neural networks ensemble majority voting method. Int J Adv Comput Sci Appl. https://doi.org/10.14569/ijacsa.2020.0110369
Choi E, Schuetz A, Stewart WF, Sun J (2017) Using recurrent neural network models for early detection of heart failure onset. J Am Med Informat Assoc. https://doi.org/10.1093/jamia/ocw112
Arabasadi Z, Alizadehsani R, Roshanzamir M et al (2017) Computer aided decision making for heart disease detection using hybrid neural network-genetic algorithm. Comput Methods Progr Biomed. https://doi.org/10.1016/j.cmpb.2017.01.004
Samuel OW, Asogbon GM, Sangaiah AK et al (2017) An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2016.10.020
Kim JK, Kang S (2017) Neural network-based coronary heart disease risk prediction using feature correlation analysis. J Healthc Eng. https://doi.org/10.1155/2017/2780501
Caliskan A, Yuksel ME (2017) Classification of coronary artery disease data sets by using a deep neural network. EuroBiotech J. https://doi.org/10.24190/issn2564-615x/2017/04.03
Poornima V, Gladis D (2018) A novel approach for diagnosing heart disease with hybrid classifier. Biomed Res. https://doi.org/10.4066/biomedicalresearch.38-18-434
Malav A, Kadam K (2018) A hybrid approach for Heart Disease Prediction using Artificial Neural Network and K-means. Int J Pure Appl Math 118
Tan JH, Hagiwara Y, Pang W et al (2018) Application of stacked convolutional and long short-term memory network for accurate identification of CAD ECG signals. Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2017.12.023
Miao KH, Miao JH (2018) Coronary heart disease diagnosis using deep neural networks. Int J Adv Comput Sci Appl. https://doi.org/10.14569/IJACSA.2018.091001
Ali L, Rahman A, Khan A et al (2019) An automated diagnostic system for heart disease prediction based on χ2 statistical model and optimally configured deep neural network. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2904800
Meshref H (2019) Cardiovascular disease diagnosis: a machine learning interpretation approach. Int J Adv Comput Sci Appl. https://doi.org/10.14569/ijacsa.2019.0101236
Verma L, Mathur MK (2019) Deep learning based model for decision support with case based reasoning. Int J Innov Technol Explor Eng 8
Javeed A, Rizvi SS, Zhou S et al (2020) Heart risk failure prediction using a novel feature selection method for feature refinement and neural network for classification. Mob Inf Syst. https://doi.org/10.1155/2020/8843115
Pan Y, Fu M, Cheng B et al (2020) Enhanced deep learning assisted convolutional neural network for heart disease prediction on the internet of medical things platform. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3026214
Dutta A, Batabyal T, Basu M, Acton ST (2020) An efficient convolutional neural network for coronary heart disease prediction. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2020.113408
Paragliola G, Coronato A (2021) An hybrid ECG-based deep network for the early identification of high-risk to major cardiovascular events for hypertension patients. J Biomed Inform. https://doi.org/10.1016/j.jbi.2020.103648
Cherian RP, Thomas N, Venkitachalam S (2020) Weight optimized neural network for heart disease prediction using hybrid lion plus particle swarm algorithm. J Biomed Inform. https://doi.org/10.1016/j.jbi.2020.103543
Salhi DE, Tari A, KM-T (2021) Using machine learning for heart disease prediction. In: Senouci MR, Boudaren MEY, SF, MM (eds) Advances in computing systems and applications. Springer, Cham
Murugesan S, Bhuvaneswaran RS, Khanna Nehemiah H et al (2021) Feature selection and classification of clinical datasets using bioinspired algorithms and super learner. Comput Math Methods Med. https://doi.org/10.1155/2021/6662420
Bharti R, Khamparia A, Shabaz M et al (2021) Prediction of heart disease using a combination of machine learning and deep learning. Comput Intell Neurosci. https://doi.org/10.1155/2021/8387680
Mehmood A, Iqbal M, Mehmood Z et al (2021) Prediction of heart disease using deep convolutional neural networks. Arab J Sci Eng. https://doi.org/10.1007/s13369-020-05105-1
Koppu S, Maddikunta PKR, Srivastava G (2020) Deep learning disease prediction model for use with intelligent robots. Comput Electr Eng. https://doi.org/10.1016/j.compeleceng.2020.106765
Ali SA, Raza B, Malik AK et al (2020) An Optimally configured and improved deep belief network (OCI-DBN) approach for heart disease prediction based on Ruzzo–Tompa and stacked genetic algorithm. IEEE Access. https://doi.org/10.1109/ACCESS.2020.2985646
Cleveland Dataset (1988) Cleveland Dataset. In: V.A. Med. Center, Long Beach Clevel. Clin. Found. https://archive.ics.uci.edu/ml/datasets/heart+disease
SPECTF Dataset (2001) SPECTF Dataset. https://archive.ics.uci.edu/ml/datasets/SPECTF+Heart. Accessed 11 May 2022
Z-Alizadeh Sani Dataset (2017) Z-Alizadeh Sani Dataset. https://archive.ics.uci.edu/ml/datasets/Z-Alizadeh+Sani. Accessed 11 Apr 2022
Rani P, Kumar R, Jain A (2022) A novel hybrid imputation method to predict missing values in medical datasets. In: Marriwala N, Tripathi C, Jain S, Kumar D (eds) Lecture notes in networks and systems. Springer, Singapore
Rani P, Kumar R, Jain A (2022) A hybrid approach for feature selection based on correlation feature selection and genetic algorithm. Int J Softw Innov. https://doi.org/10.4018/ijsi.292028
Framingham Dataset Framingham_Dataset. https://www.kaggle.com/captainozlem/framingham-chd-preprocessed-data. Accessed 11 May 2022
Statlog Dataset Statlog_Dataset. http://archive.ics.uci.edu/ml/datasets/statlog+(heart)
Funding
There is no funding involved with this study.
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests.
About this article
Cite this article
Rani, P., Kumar, R., Jain, A. et al. An Extensive Review of Machine Learning and Deep Learning Techniques on Heart Disease Classification and Prediction. Arch Computat Methods Eng 31, 3331–3349 (2024). https://doi.org/10.1007/s11831-024-10075-w
DOI: https://doi.org/10.1007/s11831-024-10075-w