
Semantic Tree-Structured Representation for Visual Question Answering System

  • Conference paper
In: Proceedings of International Conference on Data Science and Applications

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 287)

Abstract

The visual question answering (VQA) system is an integrative research problem in the field of artificial intelligence. A VQA system takes an image and a textual query as input and tries to find the correct answer by combining the image with deductions drawn from the query. It is essential to interpret visual reasoning queries and retrieve accurate answers from them. Recent studies have used parse tree construction on input queries, which leads to poor performance because of the lack of semantic interpretation. This work aims to achieve comprehensive reasoning through a semantic representation of the constructed parse tree. The proposed model, the semantic tree-based visual question answering system (STVQA), captures the inherent visual evidence of every word parsed from the textual query and combines it with the visual evidence of its child nodes. The combined result is passed up to the parent nodes in the parse tree. Thus, the proposed STVQA system aims to achieve global reasoning over the image and textual query. The VQA system is applicable to various domains such as image retrieval and surveillance, and can act as an aid for visually impaired people. The STVQA system is evaluated on a publicly available, challenging benchmark dataset, CLEVR. The model is shown to be computationally efficient and data-efficient, achieving a new state-of-the-art accuracy of 90%.
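The bottom-up evidence propagation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar evidence scores, the additive combination rule, and the node structure are all assumptions made purely for illustration, since the abstract specifies only that each node's visual evidence is combined with that of its children and passed to the parent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ParseNode:
    """A node in the parse tree of the textual query (hypothetical structure)."""
    word: str
    evidence: float                       # visual evidence score for this word
    children: List["ParseNode"] = field(default_factory=list)

def propagate_evidence(node: ParseNode) -> float:
    """Bottom-up pass: combine a node's own visual evidence with the
    combined evidence of its children, and return the result so the
    parent node can incorporate it in turn."""
    combined = node.evidence
    for child in node.children:
        combined += propagate_evidence(child)  # additive combination (assumption)
    return combined

# Tiny illustrative parse tree for a CLEVR-style query fragment,
# with made-up evidence scores.
tree = ParseNode("left of", 0.2, [
    ParseNode("red cube", 0.9),
    ParseNode("sphere", 0.7),
])
total = propagate_evidence(tree)
```

In the actual model the per-node evidence would come from grounding each word in the image, and the combination would likely be a learned operation rather than a sum; the sketch only shows the tree-structured flow of information from leaves to root.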




Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Padmajaya Rekha, K., Chitrakala, S. (2022). Semantic Tree-Structured Representation for Visual Question Answering System. In: Saraswat, M., Roy, S., Chowdhury, C., Gandomi, A.H. (eds) Proceedings of International Conference on Data Science and Applications. Lecture Notes in Networks and Systems, vol 287. Springer, Singapore. https://doi.org/10.1007/978-981-16-5348-3_29
