Interpretable Visual Question Answering by Reasoning on Dependency Trees

Cited by: 28
Authors
Cao, Qingxing [1]
Liang, Xiaodan [1]
Li, Bailin [2, 3]
Lin, Liang [2, 3]
Affiliations
[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Guangzhou 510275, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China
[3] Minist Educ, Engn Res Ctr Adv Comp Engn Software, Guangzhou 510275, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cognition; Visualization; Layout; Logic gates; Task analysis; Knowledge discovery; Image coding; Visual question answering; image and language parsing; deep reasoning; attention model
DOI
10.1109/TPAMI.2019.2943456
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems. Although very recent studies have attempted to use explicit compositional processes to assemble multiple subtasks embedded in questions, their models heavily rely on annotations or handcrafted rules to obtain valid reasoning processes, which leads to either heavy workloads or poor performance on compositional reasoning. In this paper, to better align image and language domains in diverse and unrestricted cases, we propose a novel neural network model that performs global reasoning on a dependency tree parsed from the question; thus, our model is called a parse-tree-guided reasoning network (PTGRN). This network consists of three collaborative modules: i) an attention module that exploits the local visual evidence of each word parsed from the question, ii) a gated residual composition module that composes the previously mined evidence, and iii) a parse-tree-guided propagation module that passes the mined evidence along the parse tree. Thus, PTGRN is capable of building an interpretable visual question answering (VQA) system that gradually derives image cues following question-driven parse-tree reasoning. Experiments on relational datasets demonstrate the superiority of PTGRN over current state-of-the-art VQA methods, and the visualization results highlight the explainable capability of our reasoning system.
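The three modules named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract gives no equations, so the dot-product attention, the sigmoid gate, the leaf-to-root propagation order, and every name below (`Node`, `attend`, `gated_residual`, `reason`) are assumptions chosen only to show how evidence could be mined per word, composed, and passed along a dependency tree. A real model would use learned parameters and a parser-produced tree.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One word of the question and its dependents in the parse tree (hypothetical type)."""
    word: str
    children: List["Node"] = field(default_factory=list)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vec: List[float], regions: List[List[float]]) -> List[float]:
    """Attention module (assumed dot-product form): pool image regions by word similarity."""
    scores = [sum(w * r for w, r in zip(word_vec, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    return [sum(a * region[d] for a, region in zip(weights, regions)) for d in range(dim)]


def gated_residual(local: List[float], child_sum: List[float]) -> List[float]:
    """Gated residual composition (assumed form): local + gate * children.

    The sigmoid gate here has no learned weights; it is a stand-in.
    """
    out = []
    for l, c in zip(local, child_sum):
        gate = 1.0 / (1.0 + math.exp(-(l + c)))
        out.append(l + gate * c)
    return out


def reason(node: Node, embeddings: Dict[str, List[float]],
           regions: List[List[float]]) -> List[float]:
    """Propagation along the tree (assumed bottom-up): children first, then the parent."""
    local = attend(embeddings[node.word], regions)
    if not node.children:
        return local
    child_msgs = [reason(child, embeddings, regions) for child in node.children]
    child_sum = [sum(vals) for vals in zip(*child_msgs)]
    return gated_residual(local, child_sum)


# Toy inputs: two image regions with 2-d features and made-up word embeddings.
regions = [[1.0, 0.0], [0.0, 1.0]]
embeddings = {"color": [0.0, 0.0], "cube": [5.0, 0.0], "left": [0.0, 5.0]}
tree = Node("color", [Node("cube", [Node("left")])])
root_evidence = reason(tree, embeddings, regions)
```

In this sketch the vector returned at the root would feed an answer classifier, and the per-node attention weights are what one would visualize to inspect each reasoning step.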
Pages: 887-901
Page count: 15