An Efficient Information Retrieval Technique for Document Classification

  • Conference paper
Smart Intelligent Computing and Applications, Volume 2

Abstract

Information retrieval has become a significant research area because of the rapid growth in text data, document images, and audio and video files uploaded to the Internet in various forms. Information retrieval is used for browsing, searching, and retrieving documents from a very large dataset. Most conventional image retrieval systems rely on added metadata such as titles or descriptions. There is a need for efficient machine learning and information retrieval algorithms to access the required documents from a large set of text documents. In this paper, we present a method for efficient image retrieval using text and visual features. The method applies various visual semantic features to different queries through keyword expansion. We evaluated the method on sample and standard datasets, and the results show improved re-ranking and improved precision in document retrieval.
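As an illustration of the precision measure mentioned in the abstract, the following is a minimal sketch of precision at k, a common way to score a ranked result list before and after re-ranking. The evaluation protocol used in the paper is not given in this excerpt, so this is not the authors' procedure. The function name `precision_at_k`, the document ids, and the relevance set are all hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def precision_at_k(ranked: Sequence[Hashable],
                   relevant: Iterable[Hashable],
                   k: int) -> float:
    """Return the fraction of the top-k ranked items that are relevant.

    Assumption: the denominator is always k, even if fewer than k
    items were returned (a common convention, not taken from the paper).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    hits = sum(1 for doc in ranked[:k] if doc in relevant_set)
    return hits / k


# Hypothetical data: an initial ranking, a re-ranked list, and the
# set of documents judged relevant for one query.
initial = ["d3", "d7", "d1", "d9", "d4"]
reranked = ["d1", "d3", "d4", "d7", "d9"]
relevant = {"d1", "d3", "d4"}

print(precision_at_k(initial, relevant, 3))   # two of the top three are relevant
print(precision_at_k(reranked, relevant, 3))  # all of the top three are relevant
```

Comparing the two values for the same query and the same k gives a simple check of whether a re-ranking step moved relevant documents toward the top of the list.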

Author information

Corresponding author

Correspondence to N. Santhosh Ramchander.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Santhosh Ramchander, N., Hegde, N.P. (2022). An Efficient Information Retrieval Technique for Document Classification. In: Satapathy, S.C., Bhateja, V., Favorskaya, M.N., Adilakshmi, T. (eds) Smart Intelligent Computing and Applications, Volume 2. Smart Innovation, Systems and Technologies, vol 283. Springer, Singapore. https://doi.org/10.1007/978-981-16-9705-0_6
