Abstract
Classification is a data mining technique in the machine learning domain. Algorithms such as k-nearest neighbors, support vector machines, random forests, logistic regression, and decision trees are commonly used to solve classification problems. Of these, logistic regression and decision trees are perhaps the most widely used. This study compares the performance of these two techniques in classifying observations from two different datasets. It aims to identify cases in which one algorithm is preferable to the other and to explore the advantages and disadvantages of each. After a data discovery phase, which included checking for missing values and removing redundant predictors, logistic regression and decision tree models were built for both datasets. The models were compared using their ROC curves, and their predictive ability was assessed from their confusion matrices.
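The two evaluation tools named in the abstract, the confusion matrix and the ROC curve, can be illustrated with a minimal sketch. The labels and scores below are hypothetical placeholders, not data or results from the chapter, and the function names `confusion_matrix` and `roc_auc` are chosen here for illustration only. The area under the ROC curve is computed as the fraction of positive/negative pairs in which the positive observation receives the higher score, with ties counted as one half.

```python
from typing import Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) after thresholding the scores."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities from two already-fitted models.
y_true = [1, 1, 1, 0, 0, 0]
logit_scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]  # assumed logistic regression output
tree_scores = [0.8, 0.8, 0.3, 0.8, 0.3, 0.3]   # assumed decision tree leaf probabilities

for name, scores in [("logistic regression", logit_scores),
                     ("decision tree", tree_scores)]:
    print(name, confusion_matrix(y_true, scores), round(roc_auc(y_true, scores), 3))
```

In this toy example both models give the same confusion matrix at a threshold of 0.5 but different areas under the ROC curve. That is one reason to look at both summaries: the confusion matrix describes a single threshold, while the ROC curve covers all of them.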
Appendix
See Table 1.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Roy, D., Chatterjee, S. (2022). Classification Through Data Mining Algorithm. In: Bansal, J.C., Engelbrecht, A., Shukla, P.K. (eds) Computer Vision and Robotics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-8225-4_7
DOI: https://doi.org/10.1007/978-981-16-8225-4_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8224-7
Online ISBN: 978-981-16-8225-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)