Abstract
An image caption generator produces a natural-language description of a given image. The task draws on numerous concepts from computer vision to recognize the content of an image, and on natural language processing to express that content in English. The challenging part of caption generation is to understand the image and its context and then produce a fluent English description. In our work, we compare two deep learning architectures, VGG16 and ResNet50, for encoding the image, each combined with an LSTM for generating the relevant caption. The paper discusses the use of these two architectures for generating captions from photographs. The Flickr8k dataset, whose images have high dimensionality, is used to compare the quality of the generated captions. It contains 8000 images, each paired with five different captions that describe the content of the image. The high computational power of modern deep learning techniques makes it possible to build models that generate captions for pictures. The performance of the two architectures is compared using the BLEU score. A widely used application of image caption generation is describing photographs aloud so that visually impaired users can understand them.
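The BLEU comparison mentioned above scores a generated caption against the five reference captions by combining clipped n-gram precision with a brevity penalty. The following is a minimal illustrative sketch of that metric in plain Python; it is not the exact evaluation code used in the paper (practical evaluations typically use a library implementation such as NLTK's `corpus_bleu`).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Minimal sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:
            return 0.0  # candidate shorter than n tokens
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # no n-gram overlap at this order
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty against the reference length closest to the candidate.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

# Example: a perfect match scores 1.0.
ref = "a dog runs across the green grass".split()
print(bleu(ref, [ref]))  # 1.0
```

In the paper's setting, each Flickr8k image contributes five reference captions, and the model with the higher average BLEU score across the test set is judged the better caption generator.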
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Sri Neha, V., Nikhila, B., Deepika, K., Subetha, T. (2022). A Comparative Analysis on Image Caption Generator Using Deep Learning Architecture—ResNet and VGG16. In: Smys, S., Tavares, J.M.R.S., Balas, V.E. (eds) Computational Vision and Bio-Inspired Computing. Advances in Intelligent Systems and Computing, vol 1420. Springer, Singapore. https://doi.org/10.1007/978-981-16-9573-5_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-9572-8
Online ISBN: 978-981-16-9573-5
eBook Packages: Intelligent Technologies and Robotics (R0)