Automated Document Categorization Model

Patra, Rakhi

doi:10.1007/978-3-030-50641-4_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 907)

Abstract

The aim of this work is to build a generic document clustering model that automatically groups related documents. The model combines unsupervised and supervised learning and assumes no prior knowledge of the given domain. No manual effort is required to create the training document set; instead, the proposed model generates the training documents automatically and then uses them to categorize text documents. The process is broadly divided into two steps. In step 1, an initial classification is performed in an unsupervised way: the K-means algorithm is applied to the unlabeled documents to prepare the training dataset. Text documents are represented as feature vectors in which the extracted keywords serve as features, and selected representative documents serve as the initial centroids. In step 2, a supervised classifier is trained on the documents categorized in step 1. A Naive Bayes classifier is used as the statistical text classifier, with word frequencies as features.
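
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the abstract does not specify the keyword extraction method, the distance measure, or the smoothing scheme, so the sketch assumes whitespace tokenization, squared Euclidean distance, and Laplace smoothing. All function names (`tokenize`, `kmeans`, `train_nb`, `predict_nb`, `categorize`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (assumed stand-in for keyword extraction)."""
    return text.lower().split()


def to_vector(text, vocab):
    """Term-frequency vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, iterations=10):
    """Step 1: K-means started from representative documents as centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assign every document to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_nb(vectors, labels, n_classes):
    """Step 2: multinomial Naive Bayes on word frequencies, Laplace-smoothed."""
    n_docs, n_terms = len(vectors), len(vectors[0])
    log_prior, log_lik = [], []
    for k in range(n_classes):
        members = [v for v, lab in zip(vectors, labels) if lab == k]
        # Smoothing the prior as well avoids log(0) for an empty cluster.
        log_prior.append(math.log((len(members) + 1) / (n_docs + n_classes)))
        totals = [sum(col) for col in zip(*members)] if members else [0] * n_terms
        denom = sum(totals) + n_terms
        log_lik.append([math.log((t + 1) / denom) for t in totals])
    return log_prior, log_lik


def predict_nb(model, vector):
    """Return the class with the highest log posterior score."""
    log_prior, log_lik = model
    scores = [lp + sum(c * ll for c, ll in zip(vector, row))
              for lp, row in zip(log_prior, log_lik)]
    return max(range(len(scores)), key=scores.__getitem__)


def categorize(docs, seed_indices, new_doc):
    """Cluster docs from seed documents, train on the clusters, classify new_doc."""
    vocab = sorted({word for d in docs for word in tokenize(d)})
    vectors = [to_vector(d, vocab) for d in docs]
    labels = kmeans(vectors, [vectors[i] for i in seed_indices])
    model = train_nb(vectors, labels, len(seed_indices))
    return labels, predict_nb(model, to_vector(new_doc, vocab))


# Hypothetical corpus; documents 0 and 1 act as the representative seeds.
docs = [
    "stock market shares trading",
    "football match goal team",
    "market shares investors trading",
    "team wins football match",
    "investors buy stock shares",
    "goal scored in football match",
]
labels, predicted = categorize(docs, [0, 1], "football team goal")
print(labels, predicted)
```

The cluster labels produced by `kmeans` play the role of the automatically generated training set, so no document is labeled by hand before the classifier is trained.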

Author information

Authors and Affiliations

Department of Software Engineering, Birla Institute of Technology and Science, Pilani, Pilani, 333031, Rajasthan, India
Rakhi Patra

Corresponding author

Correspondence to Rakhi Patra.

Editor information

Editors and Affiliations

School of Computer Science and Engineering, National Institute of Science and Technology (Autonomous), Berhampur, Odisha, India
Santosh Kumar Das
School of Computer Science and Engineering, National Institute of Science and Technology (Autonomous), Berhampur, Odisha, India
Shom Prasad Das
Department of Information Technology, Techno India College of Technology, Kolkata, West Bengal, India
Nilanjan Dey
Founder and Head of the Egyptian Scientific Research Group (SRGE), Information Technology Department, Cairo University, Faculty of Computer and Artificial Intelligence, Giza, Egypt
Aboul-Ella Hassanien

About this chapter

Cite this chapter

Patra, R. (2021). Automated Document Categorization Model. In: Das, S., Das, S., Dey, N., Hassanien, AE. (eds) Machine Learning Algorithms for Industrial Applications. Studies in Computational Intelligence, vol 907. Springer, Cham. https://doi.org/10.1007/978-3-030-50641-4_2

DOI: https://doi.org/10.1007/978-3-030-50641-4_2
Published: 19 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50640-7
Online ISBN: 978-3-030-50641-4
eBook Packages: Intelligent Technologies and Robotics (R0)
