A Comparative Analysis on Image Caption Generator Using Deep Learning Architecture—ResNet and VGG16

  • Conference paper
  • First Online:
Computational Vision and Bio-Inspired Computing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1420)

Abstract

An image caption generator produces a caption for a given image by understanding its content. The task draws on numerous computer vision concepts to identify what an image depicts and to express it in English. The challenging part of caption generation is to understand the image and its context and then produce an English description of it. In this work, we compare the abilities of two deep learning architectures, VGG16 and ResNet50, for understanding the image, combined with an LSTM for generating a relevant caption. The paper discusses how these two architectures perform when generating captions from photographs. The Flickr8k dataset, which has high dimensionality, is used to compare the quality of the generated captions. It contains 8000 images, each paired with five different captions that describe the image's content. The high computational power of deep learning techniques makes it possible to build models that generate captions for pictures. The performance of the two architectures is compared using the BLEU score. A widely used application of image caption generators is describing photographs so that blind users can understand the images.
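The BLEU comparison mentioned above can be illustrated with a minimal sentence-level BLEU sketch. This is not the authors' evaluation code (which would typically use a library such as NLTK); it is a from-scratch illustration of clipped n-gram precision with a brevity penalty, and the toy captions are hypothetical.

```python
from collections import Counter
import math

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped
    by the maximum count of that n-gram in any single reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        for gram, count in ref_counts.items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)

# Hypothetical example: one generated caption scored against two references.
refs = [["a", "dog", "runs", "on", "the", "grass"],
        ["the", "dog", "is", "running", "in", "the", "field"]]
cand = ["a", "dog", "runs", "on", "the", "grass"]
print(round(bleu(cand, refs), 3))  # exact match with one reference -> 1.0
```

In the paper's setting, each Flickr8k image contributes five reference captions, and the caption generated by each architecture is scored against all five; averaging these scores over the test set gives the per-model BLEU used for comparison.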



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sri Neha, V., Nikhila, B., Deepika, K., Subetha, T. (2022). A Comparative Analysis on Image Caption Generator Using Deep Learning Architecture—ResNet and VGG16. In: Smys, S., Tavares, J.M.R.S., Balas, V.E. (eds) Computational Vision and Bio-Inspired Computing. Advances in Intelligent Systems and Computing, vol 1420. Springer, Singapore. https://doi.org/10.1007/978-981-16-9573-5_15
