Content Based Document Image Retrieval Using Computer Vision and AI Techniques

Conference paper, ICT for Intelligent Systems (ICTIS 2023)

Abstract

This work presents a technique for determining whether a given image is present in a database of PDF files. Content-based document image retrieval is achieved by comparing a query image with the images extracted from the PDF documents, to check whether a similar image appears in any of the searched documents. Typically, images are ranked for retrieval by how closely the representative features of the query image match those of the PDF images. Texture features and Scale Invariant Feature Transform (SIFT) features are extracted both from the images in the PDF documents and from the query image, and are compared to determine whether the query image is present in a PDF file. If an image from a searched PDF matches the query image, that PDF is returned as the result.
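The matching step described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation. The names `glcm`, `texture_features`, `ratio_test_matches`, `search_pdfs` and the `ImageFeatures` record are hypothetical. The sketch assumes that images have already been extracted from the PDFs and quantized to a few grey levels, and that SIFT descriptors are computed elsewhere (for example with OpenCV's `cv2.SIFT_create()` and its `detectAndCompute` method). The texture features used here are three of Haralick's grey-level co-occurrence statistics (contrast, angular second moment, inverse difference moment) for a one-pixel horizontal offset. Descriptor matching uses Lowe's nearest-neighbour distance-ratio test. The 0.75 ratio and the `min_matches` threshold are assumed values, not parameters taken from the paper.

```python
import math
from typing import Dict, List, NamedTuple, Sequence

# A grayscale image as rows of pixel values already quantized to 0..levels-1.
Image = List[List[int]]
Descriptor = Sequence[float]


class ImageFeatures(NamedTuple):
    """Hypothetical record holding the features extracted from one image."""
    texture: List[float]            # output of texture_features()
    descriptors: List[Descriptor]   # SIFT descriptors, assumed computed elsewhere


def glcm(img: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized symmetric grey-level co-occurrence matrix for offset (dx, dy)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                m[i][j] += 1  # count the pair in both directions so the
                m[j][i] += 1  # matrix equals its own transpose
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m] if total else m


def texture_features(img: Image, levels: int) -> List[float]:
    """[contrast, angular second moment, inverse difference moment] (Haralick)."""
    p = glcm(img, levels)
    cells = [(i, j) for i in range(levels) for j in range(levels)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    asm = sum(p[i][j] ** 2 for i, j in cells)
    idm = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells)
    return [contrast, asm, idm]


def ratio_test_matches(query: List[Descriptor], cand: List[Descriptor],
                       ratio: float = 0.75) -> int:
    """Count query descriptors whose nearest candidate is clearly closer
    than the second nearest (Lowe's ratio test, brute-force search)."""
    good = 0
    for q in query:
        dists = sorted(math.dist(q, c) for c in cand)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            good += 1
    return good


def search_pdfs(query: ImageFeatures, pdfs: Dict[str, List[ImageFeatures]],
                min_matches: int = 10) -> List[str]:
    """Names of PDFs with at least one image passing the SIFT match threshold,
    ordered by the texture distance of their closest passing image."""
    hits = []
    for name, images in pdfs.items():
        dists = [math.dist(query.texture, img.texture) for img in images
                 if ratio_test_matches(query.descriptors, img.descriptors) >= min_matches]
        if dists:
            hits.append((min(dists), name))
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    checker = [[0, 1], [1, 0]]
    print(texture_features(checker, levels=2))  # [1.0, 0.5, 0.5]
```

In this sketch SIFT matching decides whether a PDF contains the query image, and texture distance only orders the PDFs that pass. The nearest-neighbour search is brute force for clarity. A practical system would more likely use an indexed matcher such as OpenCV's FLANN-based matcher.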

Author information

Correspondence to Harsh Bharat Shah.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Shah, H.B., Joglekar, J.V. (2023). Content Based Document Image Retrieval Using Computer Vision and AI Techniques. In: Choudrie, J., Mahalle, P.N., Perumal, T., Joshi, A. (eds) ICT for Intelligent Systems. ICTIS 2023. Smart Innovation, Systems and Technologies, vol 361. Springer, Singapore. https://doi.org/10.1007/978-981-99-3982-4_19
