Abstract
An image caption generator produces a natural-language description of a given image. The task draws on numerous concepts from computer vision to recognize the content of an image, and on natural language processing to express that content in English. The challenging part of caption generation is to understand the image and its context and then produce a fluent English description. In our work, we compare two deep learning architectures, VGG16 and ResNet50, for encoding the image, each combined with an LSTM for generating the relevant caption. The paper discusses the use of these two architectures for generating captions from photographs. The Flickr8k dataset, whose images have high dimensionality, is used to compare the quality of the generated captions. It contains 8000 images, each paired with five different captions that describe the content of the image. The high computational power of modern deep learning techniques makes it possible to build models that generate captions for pictures. The performance of the two architectures is compared using the BLEU score. A widely used application of image caption generation is describing photographs aloud so that visually impaired users can understand them.
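The BLEU comparison mentioned above scores a generated caption against the five reference captions by combining clipped n-gram precision with a brevity penalty. The following is a minimal illustrative sketch of that metric in plain Python; it is not the exact evaluation code used in the paper (practical evaluations typically use a library implementation such as NLTK's `corpus_bleu`).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Minimal sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:
            return 0.0  # candidate shorter than n tokens
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # no n-gram overlap at this order
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty against the reference length closest to the candidate.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

# Example: a perfect match scores 1.0.
ref = "a dog runs across the green grass".split()
print(bleu(ref, [ref]))  # 1.0
```

In the paper's setting, each Flickr8k image contributes five reference captions, and the model with the higher average BLEU score across the test set is judged the better caption generator.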
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Sri Neha, V., Nikhila, B., Deepika, K., Subetha, T. (2022). A Comparative Analysis on Image Caption Generator Using Deep Learning Architecture—ResNet and VGG16. In: Smys, S., Tavares, J.M.R.S., Balas, V.E. (eds) Computational Vision and Bio-Inspired Computing. Advances in Intelligent Systems and Computing, vol 1420. Springer, Singapore. https://doi.org/10.1007/978-981-16-9573-5_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-9572-8
Online ISBN: 978-981-16-9573-5
eBook Packages: Intelligent Technologies and Robotics (R0)