Bilinear Graph Networks for Visual Question Answering

Cited by: 28
Authors
Guo, Dalu [1]
Xu, Chang [1]
Tao, Dacheng [1, 2]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Fac Engn, Sydney, NSW 2008, Australia
[2] JD Explore Acad, Beijing 101100, Peoples R China
Funding
Australian Research Council
Keywords
Visualization; Feature extraction; Task analysis; Knowledge discovery; Cognition; Data models; Semantics; Bilinear graph; deep learning; graph neural networks (GNNs); visual question answering (VQA)
DOI
10.1109/TNNLS.2021.3104937
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article revisits the bilinear attention networks (BANs) in the visual question answering task from a graph perspective. The classical BANs build a bilinear attention map to extract the joint representation of words in the question and objects in the image but do not fully explore the relationships between words that complex reasoning requires. In contrast, we develop bilinear graph networks to model the context of the joint embeddings of words and objects. Two kinds of graphs are investigated, namely, image-graph and question-graph. The image-graph transfers features of the detected objects to their related query words, enabling the output nodes to have both semantic and factual information. The question-graph exchanges information between these output nodes from the image-graph to amplify the implicit yet important relationships between objects. These two kinds of graphs cooperate with each other, and thus, our resulting model can build the relationships and dependencies between objects, which leads to the realization of multistep reasoning. Experimental results on the VQA v2.0 validation dataset demonstrate the ability of our method to handle complex questions. On the test-std set, our best single model achieves state-of-the-art performance, boosting the overall accuracy to 72.56%, and we are one of the top-two entries in the VQA Challenge 2020.
Pages: 1023-1034
Number of pages: 12
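The two graphs described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: it assumes a simplified low-rank bilinear attention step, and every name here (`bilinear_graph_layer`, `U`, `V`, `p`, the dimensions, and the residual update in the question-graph) is a hypothetical choice made for illustration. The image-graph step passes object features to word nodes; the question-graph step then exchanges information among the resulting nodes.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bilinear_graph_layer(targets, sources, U, V, p):
    """One attention-weighted message-passing step from sources to targets.

    targets: (n_t, d_t) node features that receive messages
    sources: (n_s, d_s) node features that send messages
    U: (d_t, k), V: (d_s, k) projections into a shared k-dim space (assumed)
    p: (k,) weights that turn the element-wise product into an edge score
    """
    t = np.tanh(targets @ U)          # (n_t, k)
    s = np.tanh(sources @ V)          # (n_s, k)
    logits = (t * p) @ s.T            # (n_t, n_s) bilinear edge scores
    attn = softmax(logits, axis=1)    # each target distributes weight over sources
    joint = t * (attn @ s)            # (n_t, k) joint embedding per target node
    return joint, attn


rng = np.random.default_rng(0)
n_words, n_objects, d_word, d_obj, k = 5, 7, 16, 32, 8

words = rng.normal(size=(n_words, d_word))      # stand-in question word features
objects = rng.normal(size=(n_objects, d_obj))   # stand-in detected object features

# Image-graph (sketch): objects send features to their related query words.
U1 = rng.normal(size=(d_word, k)) * 0.1
V1 = rng.normal(size=(d_obj, k)) * 0.1
p1 = rng.normal(size=k)
nodes, img_attn = bilinear_graph_layer(words, objects, U1, V1, p1)

# Question-graph (sketch): the output nodes exchange information with each other.
U2 = rng.normal(size=(k, k)) * 0.1
V2 = rng.normal(size=(k, k)) * 0.1
p2 = rng.normal(size=k)
messages, q_attn = bilinear_graph_layer(nodes, nodes, U2, V2, p2)
refined = nodes + messages   # residual update; an assumption of this sketch
```

Each row of `img_attn` is a distribution over objects for one word, and each row of `q_attn` is a distribution over the word-object nodes, so stacking the two steps lets a node draw on objects it was not directly attending to.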