
Empirical Study of Image Captioning Models Using Various Deep Learning Encoders

Conference paper in: Machine Learning and Computational Intelligence Techniques for Data Engineering (MISP 2022)

Abstract

Image captioning is the generation of a caption for a given image. It is a growing and challenging research topic in the field of computer vision, and many approaches have been proposed. Initially, template-based methods were used, in which a fixed template was filled in with detected image objects and their attributes. Retrieval-based approaches were also used, in which images similar to the query image were retrieved and a caption was composed from the captions of those images. Both approaches suffer from the limitation of missing important objects. The more recent approach to image captioning uses an encoder-decoder architecture, and such methods have produced remarkable results. However, it is not easy to isolate the impact of the encoder alone on the captioning task. In this paper, we compare the performance of image captioning models built with various image encoders, namely the Visual Geometry Group networks (VGG16 and VGG19), Residual Networks (ResNet), and InceptionV3, each paired with a Gated Recurrent Unit (GRU) decoder for text generation. The results are compared on the basis of the Bilingual Evaluation Understudy (BLEU) score on the Flickr8K dataset. ResNet provides a better BLEU score than VGG16, VGG19, and InceptionV3 when implemented on Flickr8K.
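The decoder shared by all the compared models is a GRU. As a minimal sketch of a single GRU step in numpy (the weight names `Wz`, `Uz`, etc. and the function `gru_step` are our own illustrative choices, not identifiers from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU step: x is the current input embedding, h the previous
    hidden state, p a dict of weights/biases for the update (z),
    reset (r), and candidate gates."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])             # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    return (1 - z) * h + z * h_cand

# In an encoder-decoder captioner, h would be initialized from the CNN
# encoder's image features (e.g. ResNet's pooled output projected to the
# hidden size), and gru_step would be applied once per generated word.
```

Swapping VGG16, VGG19, ResNet, or InceptionV3 only changes how the initial image feature vector is produced; the decoder step above stays the same, which is what makes the encoder comparison controlled.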
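The models are scored with BLEU on Flickr8K, which provides several reference captions per image. As a simplified sketch of the unigram case (BLEU-1) with clipped precision and the brevity penalty, assuming plain whitespace tokenization (the function name `bleu1` is ours; the paper's evaluation would typically use a standard library implementation):

```python
from collections import Counter
import math

def bleu1(candidate, references):
    """BLEU-1: modified (clipped) unigram precision times the brevity
    penalty. candidate is a token list; references is a list of token lists."""
    cand_counts = Counter(candidate)
    # Clip each candidate token's count by its maximum count in any reference.
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in cand_counts.items())
    precision = clipped / max(len(candidate), 1)
    # Brevity penalty uses the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision
```

Higher-order BLEU-n scores work the same way over n-grams, with the per-order precisions combined by a geometric mean.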



Author information

Correspondence to Gaurav.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Gaurav, Mathur, P. (2023). Empirical Study of Image Captioning Models Using Various Deep Learning Encoders. In: Singh, P., Singh, D., Tiwari, V., Misra, S. (eds) Machine Learning and Computational Intelligence Techniques for Data Engineering. MISP 2022. Lecture Notes in Electrical Engineering, vol 998. Springer, Singapore. https://doi.org/10.1007/978-981-99-0047-3_27

