An Efficient Information Retrieval Technique for Document Classification

Santhosh Ramchander, N.; Hegde, Nagaratna P.

doi:10.1007/978-981-16-9705-0_6

N. Santhosh Ramchander &
Nagaratna P. Hegde

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 283)

Abstract

Information retrieval has become a significant research area because of the rapid growth in text data, document images, audio and video files that are uploaded to the Internet in various forms. Information retrieval is used for browsing, searching and retrieving documents from very large datasets. Most conventional image retrieval systems rely on added metadata such as titles or descriptions. There is a need for efficient machine learning and information retrieval algorithms to access the required documents from a large set of text documents. In this paper, we present an efficient image retrieval method that uses text and visual features. The method applies various visual semantic features to different queries through keyword expansion. We evaluated the method on sample and standard datasets, and the results show improvements in re-ranking and in document retrieval precision.
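The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: expand the query keywords, score each item on both text and visual features, re-rank by the combined score, and measure precision. The synonym table, the two-dimensional visual vectors, the weight `alpha`, and all function names are hypothetical and chosen for illustration; they are not taken from the chapter.

```python
import math
from typing import Dict, List, Set


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> Set[str]:
    """Return the query terms plus related keywords from a (hypothetical) synonym table."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms.update(synonyms.get(term, []))
    return terms


def text_score(terms: Set[str], description: str) -> float:
    """Fraction of the expanded query terms that occur in the item's text."""
    words = set(description.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two visual feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(items: List[dict], terms: Set[str], query_vec: List[float],
           alpha: float = 0.5) -> List[str]:
    """Order item ids by a weighted sum of text and visual similarity (alpha is an assumption)."""
    scored = [
        (alpha * text_score(terms, it["text"])
         + (1 - alpha) * cosine(query_vec, it["visual"]), it["id"])
        for it in items
    ]
    return [item_id for _, item_id in sorted(scored, key=lambda p: p[0], reverse=True)]


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for item_id in ranked[:k] if item_id in relevant) / k


# Toy data, invented for this example only.
items = [
    {"id": "d1", "text": "scanned invoice page", "visual": [1.0, 0.0]},
    {"id": "d2", "text": "handwritten letter", "visual": [0.0, 1.0]},
    {"id": "d3", "text": "printed bill receipt", "visual": [0.9, 0.1]},
]
synonyms = {"invoice": ["bill", "receipt"]}

terms = expand_query("invoice", synonyms)
ranked = rerank(items, terms, query_vec=[1.0, 0.0])
p_at_2 = precision_at_k(ranked, relevant={"d1", "d3"}, k=2)
```

In this toy run, keyword expansion lets `d3` match on "bill" and "receipt" even though it never contains the word "invoice", which is the kind of effect the abstract attributes to keyword expansion. How the chapter actually weights text against visual features is not stated in the abstract.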

Author information

Authors and Affiliations

OUCE, Osmania University, Hyderabad, India
N. Santhosh Ramchander
Vasavi College of Engineering, Hyderabad, India
Nagaratna P. Hegde

Corresponding author

Correspondence to N. Santhosh Ramchander.

Editor information

Editors and Affiliations

School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Reshetnev Siberian State University of Science and Technology, Krasnoyarsk, Russia
Margarita N. Favorskaya
Department of Computer Science and Engineering, Vasavi College of Engineering, Hyderabad, Telangana, India
T. Adilakshmi

About this paper

Cite this paper

Santhosh Ramchander, N., Hegde, N.P. (2022). An Efficient Information Retrieval Technique for Document Classification. In: Satapathy, S.C., Bhateja, V., Favorskaya, M.N., Adilakshmi, T. (eds) Smart Intelligent Computing and Applications, Volume 2. Smart Innovation, Systems and Technologies, vol 283. Springer, Singapore. https://doi.org/10.1007/978-981-16-9705-0_6

DOI: https://doi.org/10.1007/978-981-16-9705-0_6
Published: 22 May 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-9704-3
Online ISBN: 978-981-16-9705-0
eBook Packages: Intelligent Technologies and Robotics (R0)
