Classification Through Data Mining Algorithm

Roy, Debasish; Chatterjee, Subhojit

doi:10.1007/978-981-16-8225-4_7

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Classification is a data mining technique in the machine learning domain. Various algorithms, such as K-nearest neighbor, support vector machines, random forest, logistic regression, and decision trees, are used to solve the classification problem. Of these, logistic regression and decision trees are perhaps the most widely used classification techniques. This study compares the performance of these two techniques in classifying observations from two different datasets. It aims to identify cases where one would prefer a particular algorithm over the other and to explore the advantages and disadvantages of each. After a data discovery phase, which included checking for missing values and removing redundant predictors, logistic regression and decision tree models were built for both datasets. The models were compared based on their ROC curves, and their predictive ability was obtained from their confusion matrices.
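
The two evaluation tools named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the chapter's code: the function names `confusion_matrix` and `roc_auc`, the 0.5 threshold, and the labels and scores below are all assumptions made up for illustration. The area under the ROC curve is computed here with the pairwise rank formulation (the fraction of positive/negative pairs in which the positive case receives the higher score), which is one common way to summarize a ROC curve in a single number.

```python
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_score: List[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels at a given score threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_auc(y_true: List[int], y_score: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities from two models
# (illustrative values only, not results from the chapter).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a_scores = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1]
model_b_scores = [0.6, 0.5, 0.7, 0.3, 0.4, 0.8, 0.9, 0.2]

for name, scores in [("A", model_a_scores), ("B", model_b_scores)]:
    tp, fp, fn, tn = confusion_matrix(y_true, scores)
    accuracy = (tp + tn) / len(y_true)
    print(name, (tp, fp, fn, tn), accuracy, roc_auc(y_true, scores))
```

The confusion matrix depends on the chosen threshold, whereas the ROC-based summary looks at the ranking of scores across all thresholds, which is why the two are often reported together.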

Author information

Authors and Affiliations

Statistics Department, Amity University, Action Area II, Rajarhat, Newtown, Kolkata, 700135, India
Debasish Roy & Subhojit Chatterjee

Editor information

Editors and Affiliations

South Asian University, New Delhi, Delhi, India
Jagdish Chand Bansal
Department of Industrial Engineering and Computer Science, Stellenbosch University, Stellenbosch, South Africa
Andries Engelbrecht
Babu Banarasi Das University, Lucknow, Uttar Pradesh, India
Praveen Kumar Shukla

Appendix

See Table 1.

Table 1 Complexity parameter (CP) table showing the cross-validation error (xerror) and the size of a tree obtained corresponding to different CP values with respect to the deep/parent tree
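
The table body is not reproduced here, so the following Python sketch only illustrates how a CP table of this kind is commonly read when pruning a tree. It assumes columns similar to those printed by R's rpart (CP, number of splits, xerror, and a standard error column xstd, which the caption does not mention). The `CpRow` type, both helper functions, and every number are hypothetical and are not values from the chapter.

```python
from typing import List, NamedTuple


class CpRow(NamedTuple):
    cp: float        # complexity parameter
    n_splits: int    # tree size, measured as number of splits
    xerror: float    # cross-validation error relative to the root tree
    xstd: float      # standard error of xerror (assumed column)


def pick_min_xerror(table: List[CpRow]) -> CpRow:
    """Row with the smallest cross-validation error."""
    return min(table, key=lambda row: row.xerror)


def pick_one_se(table: List[CpRow]) -> CpRow:
    """Smallest tree whose xerror is within one standard error of the minimum."""
    best = pick_min_xerror(table)
    cutoff = best.xerror + best.xstd
    candidates = [row for row in table if row.xerror <= cutoff]
    return min(candidates, key=lambda row: row.n_splits)


# Illustrative values only.
cp_table = [
    CpRow(0.200, 0, 1.00, 0.030),
    CpRow(0.050, 1, 0.82, 0.028),
    CpRow(0.010, 3, 0.70, 0.026),
    CpRow(0.005, 6, 0.68, 0.026),
    CpRow(0.001, 12, 0.71, 0.027),
]

print(pick_min_xerror(cp_table))
print(pick_one_se(cp_table))
```

Choosing the row with minimum xerror favors predictive accuracy, while the one-standard-error rule trades a small increase in error for a smaller tree. Which rule the authors applied is not stated on this page.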

About this paper

Cite this paper

Roy, D., Chatterjee, S. (2022). Classification Through Data Mining Algorithm. In: Bansal, J.C., Engelbrecht, A., Shukla, P.K. (eds) Computer Vision and Robotics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-8225-4_7

DOI: https://doi.org/10.1007/978-981-16-8225-4_7
Published: 15 March 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8224-7
Online ISBN: 978-981-16-8225-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
