
1 Introduction

The knowledge discovery process has become increasingly complex as data sets grow in size and complexity. Data mining procedures are used to extract meaningful information for effective knowledge discovery. These procedures can be classified as descriptive or predictive. Descriptive procedures summarize past or recent events and require post-processing methods to validate their results. Predictive procedures, on the other hand, predict the patterns and properties of previously unseen data. Commonly used data mining procedures are Clustering, Classification, Association, Outlier Detection, Prediction, and Regression.

Classification is used to discover knowledge in terms of distinct classes. It builds a model that describes and distinguishes data classes from a training data set and identifies the category to which a new observation belongs. A decision tree expresses such a classification model in a tree-like structure. This type of mining, in which the data set is partitioned into smaller subsets while the associated Decision Tree (DT) is built incrementally, belongs to supervised learning. The benefits of decision trees, illustrated by the short code sketch after this list, are:

  • Easy integration, owing to an intuitive representation of the data,

  • Investigative discovery of knowledge,

  • High accuracy,

  • Easy interpretability, and

  • Exclusion of unimportant features.
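
As a concrete illustration of these properties, the following minimal sketch trains a single J48 tree with the Weka Java API (the toolkit used later in this paper), prints the learned tree, which can be read directly, and classifies an observation. The file name heart-train.arff is an assumed placeholder, and the classified record is simply reused from the training set for brevity.

```java
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainTreeSketch {
    public static void main(String[] args) throws Exception {
        // Load a training set; "heart-train.arff" is an assumed placeholder file.
        Instances train = DataSource.read("heart-train.arff");
        train.setClassIndex(train.numAttributes() - 1);   // last attribute is the class

        // Build a J48 decision tree (Weka's C4.5-style learner).
        J48 tree = new J48();
        tree.buildClassifier(train);

        // The tree is directly human-readable, which is why DTs are easy to interpret.
        System.out.println(tree);

        // Classify an observation; here a training record stands in for an unseen one.
        Instance newObs = train.instance(0);
        double predicted = tree.classifyInstance(newObs);
        System.out.println("Predicted class: " + train.classAttribute().value((int) predicted));
    }
}
```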

Because of the above-mentioned benefits, decision tree classifiers are used for knowledge extraction in areas such as education [33, 40], tourism [18, 34], healthcare [30, 31] and others. The healthcare industry generates a colossal amount of information from which it is extremely difficult to extract useful knowledge. The decision tree is an efficient method for extracting effective knowledge from this mass of information and supporting reliable healthcare decisions. It has been used for effective decision making in medical areas such as cancer detection, heart disease diagnosis and others [9, 11, 32, 45]. The foremost goal of this paper is to present a brief overview of the algorithms for developing decision trees and then to compare these algorithms for predicting heart disease on the basis of performance measures.

Heart diseases are a major cause of death worldwide, accounting for more than 17.6 million deaths per year as of 2016, a toll expected to exceed 23.6 million by 2030 [3]. India too is witnessing an alarming rise in the occurrence of heart disease (HD) [12]. Researchers have developed various decision tree algorithms to diagnose and treat heart diseases effectively. Son, Kim, Kim, Park and Kim [41] used decision trees and a rough set approach to develop a model for heart failure. Chaurasia and Pal [8], Sa [35], and Amin, Chiam, and Varathan [4] developed HD prediction systems by combining decision trees with other data mining algorithms. Mathan, Kumar, Panchatcharam, Manogaran, and Varadharajan [22] presented forecast frameworks for heart diseases using decision tree classifiers. Wu, Badshah and Bhagwat [45] developed a prediction model for HD survivability. Saxena, Johri, Deep and Sharma [37] developed an HD prediction system using KNN and decision tree algorithms. Shekar, Chandra and Rao [38] developed a classifier that provides optimized features for predicting the type of HD using a decision tree and a genetic algorithm. Vallée, Petruescu, Kretz, Safar and Blacher [43] evaluated the role of the APWV index in predicting HD. Pathak and Valan [29] proposed a forecasting model for HD diagnosis by integrating a rule-based approach with a decision tree. Sturts and Slotman [42] used decision tree analysis to predict risks for patients re-admitted within 30 days of hospital discharge for CHF. Given the importance of decision trees in healthcare, this paper presents a brief overview and comparison of seven DT algorithms, based on various evaluation measures, for diagnosing heart disease.

2 Material and Method

2.1 Overview of Decision Tree Algorithms

A decision tree algorithm is a supervised learning method that is implemented in a serial or parallel style depending on the data volume, the available memory and the required scalability. The DT algorithms considered in this study are J48, Decision stump, LMT, Hoeffding tree, Random forest, Random tree and REPTree. These are the most widely used algorithms for predicting various diseases (Table 1). Brief descriptions are given below, followed by a short code sketch that instantiates the seven learners.

Table 1. Different algorithms applied in various areas.

  a. The J48 algorithm develops a decision tree by classifying the class attribute based on the input elements.

  b. The Hoeffding tree algorithm learns incrementally from very large data streams.

  c. A Random tree algorithm draws a random tree from a set of possible trees, where the distribution over trees is considered uniform.

  d. A Random forest algorithm builds multiple decision trees using a bagging approach.

  e. The Logistic model tree (LMT) combines tree induction with linear logistic regression.

  f. Decision stump builds simple binary decision stumps for both nominal and numeric classification tasks.

  g. The REPTree algorithm generates a regression or decision tree using information gain or variance reduction.
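
Assuming the comparison is run through the Weka Java API (the GUI workflow described in Sect. 3 is equivalent), the seven learners can be instantiated uniformly as in the sketch below; only default parameters are shown here, which is an assumption rather than the study's exact configuration.

```java
import weka.classifiers.Classifier;
import weka.classifiers.trees.DecisionStump;
import weka.classifiers.trees.HoeffdingTree;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.LMT;
import weka.classifiers.trees.REPTree;
import weka.classifiers.trees.RandomForest;
import weka.classifiers.trees.RandomTree;

public class TreeLearners {
    /** The seven decision tree learners compared in this study, with default settings. */
    public static Classifier[] learners() {
        return new Classifier[] {
            new J48(),            // C4.5-style pruned tree
            new DecisionStump(),  // one-level binary stump
            new LMT(),            // logistic model tree
            new HoeffdingTree(),  // incremental tree for data streams
            new RandomForest(),   // bagged ensemble of random trees
            new RandomTree(),     // single tree using random attribute subsets
            new REPTree()         // reduced-error-pruned tree using info gain / variance
        };
    }
}
```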

2.2 Data Set

To attain the second goal of this paper, three data sets, drawn from the Cleveland, Hungarian and Switzerland heart disease databases, are used to evaluate the performance measures of the DT algorithms. Sixty data records were taken as input for training and fifty for testing. As shown in Table 2, fourteen attributes are considered when evaluating the performance measures; a short sketch of this data preparation follows Table 2.

Table 2. Description of the input attributes.
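
One possible way to assemble the training and test sets described above is sketched here. The ARFF file name and the use of the first 60 and the next 50 records are assumptions made for illustration, since the exact sampling procedure is not detailed in the text.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DataSetSketch {
    public static void main(String[] args) throws Exception {
        // Assumed file name; the study combines Cleveland, Hungarian and Switzerland records.
        Instances all = DataSource.read("heart-combined.arff");
        all.setClassIndex(all.numAttributes() - 1);     // 14th attribute: heart disease yes/no

        // 60 records for training and 50 for testing, as used in the comparison.
        Instances train = new Instances(all, 0, 60);
        Instances test  = new Instances(all, 60, 50);

        System.out.println("Training records: " + train.numInstances());
        System.out.println("Testing records:  " + test.numInstances());
    }
}
```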

3 Decision Tree Analysis

The performance measures are generated using the data mining tool Weka 3.9.3. Data pre-processing is done with the ReplaceMissingValues filter, which scans all records and replaces missing values. Confusion matrices are then built by applying the considered DT algorithms with two classes, Class 1 = YES (heart disease present) and Class 2 = NO (heart disease not present), where True Positive (TP) denotes a correct positive prediction, False Positive (FP) an incorrect positive prediction, True Negative (TN) a correct negative prediction, False Negative (FN) an incorrect negative prediction, P the positive samples and N the negative samples.
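
A minimal sketch of this pipeline through the Weka Java API is given below. The paper itself uses the Weka 3.9.3 tool, so this is an assumed, equivalent workflow with placeholder file names, shown here for a single classifier (Random forest).

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class ConfusionMatrixSketch {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("heart-train.arff");  // assumed file names
        Instances test  = DataSource.read("heart-test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // Replace missing values, as done with the ReplaceMissingValues filter in Weka.
        ReplaceMissingValues filter = new ReplaceMissingValues();
        filter.setInputFormat(train);
        Instances cleanTrain = Filter.useFilter(train, filter);
        Instances cleanTest  = Filter.useFilter(test, filter);

        RandomForest rf = new RandomForest();
        rf.buildClassifier(cleanTrain);

        // Evaluate on the test set and print the 2x2 confusion matrix (YES / NO classes).
        Evaluation eval = new Evaluation(cleanTrain);
        eval.evaluateModel(rf, cleanTest);
        System.out.println(eval.toMatrixString("Confusion matrix"));
    }
}
```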

These confusion matrices are then used to compute the accuracy measures with the following equations:

$$ \text{TP rate} = \frac{TP}{TP + FN} $$
(1)
$$ \text{FP rate} = \frac{FP}{FP + TN} $$
(2)
$$ \text{Accuracy} = \frac{TP + TN}{P + N} $$
(3)
$$ \text{Error rate} = \frac{FP + FN}{P + N} $$
(4)
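
As a worked check of these definitions, the sketch below applies Eqs. (1)-(4) to the counts of a 2x2 confusion matrix; the counts used are arbitrary placeholders, not results from the study.

```java
public class AccuracyMeasures {
    public static void main(String[] args) {
        // Placeholder counts for a 2x2 confusion matrix (not the study's results).
        double tp = 20, fn = 5;   // P = TP + FN positive samples
        double tn = 18, fp = 7;   // N = TN + FP negative samples
        double p = tp + fn, n = tn + fp;

        double tpRate    = tp / (tp + fn);        // Eq. (1)
        double fpRate    = fp / (fp + tn);        // Eq. (2)
        double accuracy  = (tp + tn) / (p + n);   // Eq. (3)
        double errorRate = (fp + fn) / (p + n);   // Eq. (4)

        System.out.printf("TP rate=%.3f  FP rate=%.3f  Accuracy=%.3f  Error rate=%.3f%n",
                tpRate, fpRate, accuracy, errorRate);
    }
}
```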

4 Results: Comparison

Table 3 presents a working comparison of the considered algorithms.

Table 3. Working comparison of decision tree algorithms.

Table 4 shows the performance measures computed for the data using Eq. (3) and Eq. (4).

Table 4. Values of correctly classified instances (CCI) and incorrectly classified instances (ICI).

From Fig. 1 it can be observed that, for the considered data sets, Random forest shows the highest accuracy and the lowest error rate.

Fig. 1. Graph showing accuracy and error rate.

From Table 5 it is clear that the TP rate for class = NO is higher for Decision stump, Hoeffding tree, J48, Random forest, LMT and Random tree, which means these algorithms successfully identify patients who do not have heart disease.

Table 5. Class accuracy.

5 Conclusion and Future Scope

The primary goal of this paper was to compare the most widely used decision tree algorithms and to determine an efficient method for predicting heart disease on the basis of the computed performance measures: Accuracy, True Positive rate, Error rate and False Positive rate. The algorithms evaluated in the study were Hoeffding tree, Decision stump, LMT, J48, Random tree, Random forest and REPTree. From the results it is clear that Random tree and Random forest are efficient methods for generating decision trees. Random forest performs best because each tree splits on a random subset of the features and the trees can be built in parallel (see the configuration sketch below); the algorithm also handles high dimensionality, outliers and non-linear data, and gives quick predictions. However, it is less interpretable and can tend to overfit. In the future, the performance evaluation can consider more attributes responsible for heart diseases. Beyond healthcare, the framework can also be used for evaluating performance in other domains.
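
As a closing illustration of the properties noted above, the sketch below configures Weka's RandomForest (assuming the Weka 3.8+/3.9 API, where it is a bagging-based ensemble) to build its trees in parallel; each tree already considers only a random subset of attributes at every split by default. The specific parameter values are arbitrary.

```java
import weka.classifiers.trees.RandomForest;

public class RandomForestConfig {
    public static void main(String[] args) {
        RandomForest rf = new RandomForest();
        // Build 100 trees; each tree splits on a random subset of attributes by default.
        rf.setNumIterations(100);
        // Train the trees in parallel, exploiting the algorithm's natural parallelism.
        rf.setNumExecutionSlots(4);   // arbitrary choice of 4 threads
        System.out.println(java.util.Arrays.toString(rf.getOptions()));
    }
}
```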