Abstract
Visual Question Answering (VQA) is a task in which a machine must produce the correct answer to a question asked about an image. This paper proposes ImageFuse, a novel image featurization framework for VQA. Instead of directly adopting a common representation from a popular ImageNet CNN model via transfer learning, it combines feature fusion networks to form a fine-grained image representation. Two parallel fusion networks are trained using Canonical Correlation Analysis (CCA) and Autoencoders (AE) to capture both the linear and the non-linear relationships that exist across multiple views of the image. Extensive experiments on the DAQUAR VQA dataset show a significant improvement for the proposed framework over VQA systems based on a single image representation.
Notes
1. μ: Membership measure.
2. Ai, Ti: ith predicted answer and ith ground truth answer.
3. WUP(a, b): Similarity based on the depth of two words ‘a’ and ‘b’ in the WordNet taxonomy.
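
The symbols in these notes match those of the WUPS score (a Wu-Palmer similarity based measure) that Malinowski and Fritz introduced together with the DAQUAR dataset. The formula itself is not reproduced in this excerpt, so the following is only a sketch under that assumption. The threshold of 0.9 and the 0.1 down-weighting factor follow the usual convention for WUPS and are assumptions here, and `toy_wup` is a hypothetical stand-in for a real WordNet-based Wu-Palmer similarity.

```python
from math import prod
from typing import Callable, Sequence, Set

Similarity = Callable[[str, str], float]


def thresholded_membership(wup: Similarity,
                           threshold: float = 0.9,
                           penalty: float = 0.1) -> Similarity:
    """Build the membership measure mu from a WUP similarity.

    Assumption: similarities below the threshold are scaled down by
    `penalty` instead of being kept as they are.
    """
    def mu(a: str, b: str) -> float:
        s = wup(a, b)
        return s if s >= threshold else penalty * s
    return mu


def wups(predicted: Sequence[Set[str]],
         truth: Sequence[Set[str]],
         mu: Similarity) -> float:
    """Average set-to-set score over all questions.

    predicted[i] plays the role of Ai and truth[i] the role of Ti.
    Every answer set is assumed to be non-empty.
    """
    if not predicted or len(predicted) != len(truth):
        raise ValueError("need equally many, at least one, answer sets")
    total = 0.0
    for a_i, t_i in zip(predicted, truth):
        # Each predicted word is matched to its best ground truth word ...
        forward = prod(max(mu(a, t) for t in t_i) for a in a_i)
        # ... and each ground truth word to its best predicted word.
        backward = prod(max(mu(a, t) for a in a_i) for t in t_i)
        total += min(forward, backward)
    return total / len(predicted)


def toy_wup(a: str, b: str) -> float:
    """Hypothetical similarity: 1.0 for identical words, 0.5 otherwise."""
    return 1.0 if a == b else 0.5


mu = thresholded_membership(toy_wup)
score = wups([{"chair"}, {"table", "lamp"}], [{"chair"}, {"table"}], mu)
print(score)
```

Taking the minimum of the two products penalizes both missing and superfluous words: in the second toy question the extra word "lamp" lowers the score even though "table" is matched exactly. In practice `toy_wup` would be replaced by a similarity computed over the WordNet taxonomy.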
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Manmadhan, S., Kovoor, B.C. (2021). ImageFuse: A Multi-view Image Featurization Framework for Visual Question Answering. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_14
DOI: https://doi.org/10.1007/978-3-030-71187-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71186-3
Online ISBN: 978-3-030-71187-0
eBook Packages: Intelligent Technologies and Robotics (R0)