Interpretable Visual Question Answering by Reasoning on Dependency Trees

Cited by: 28

Authors:

Cao, Qingxing ^{[1]}

Liang, Xiaodan ^{[1]}

Li, Bailin ^{[2,3]}

Lin, Liang ^{[2,3]}

Affiliations:

[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Guangzhou 510275, Guangdong, Peoples R China

[2] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China

[3] Minist Educ, Engn Res Ctr Adv Comp Engn Software, Guangzhou 510275, Guangdong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2021 / Vol. 43 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Cognition; Visualization; Layout; Logic gates; Task analysis; Knowledge discovery; Image coding; Visual question answering; image and language parsing; deep reasoning; attention model;

DOI:

10.1109/TPAMI.2019.2943456

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems. Although very recent studies have attempted to use explicit compositional processes to assemble multiple subtasks embedded in questions, their models heavily rely on annotations or handcrafted rules to obtain valid reasoning processes, which leads to either heavy workloads or poor performance on compositional reasoning. In this paper, to better align image and language domains in diverse and unrestricted cases, we propose a novel neural network model that performs global reasoning on a dependency tree parsed from the question; thus, our model is called a parse-tree-guided reasoning network (PTGRN). This network consists of three collaborative modules: i) an attention module that exploits the local visual evidence of each word parsed from the question, ii) a gated residual composition module that composes the previously mined evidence, and iii) a parse-tree-guided propagation module that passes the mined evidence along the parse tree. Thus, PTGRN is capable of building an interpretable visual question answering (VQA) system that gradually derives image cues following question-driven parse-tree reasoning. Experiments on relational datasets demonstrate the superiority of PTGRN over current state-of-the-art VQA methods, and the visualization results highlight the explainable capability of our reasoning system.
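The three modules named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dot-product attention, the scalar sigmoid gate, the summation over children, the hand-written dependency tree, and every name below (`attend`, `gated_residual`, `reason`, `gate_w`) are assumptions chosen only to show how evidence could flow bottom-up along a parse tree.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(word_vec, regions):
    """Attention step (assumed form): softmax-weighted average of
    image-region features, scored by similarity to one word."""
    weights = softmax([dot(r, word_vec) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]

def gated_residual(local, child_sum, gate_w):
    """Composition step (assumed form): a scalar sigmoid gate decides how
    much child evidence is added to the node's local evidence."""
    gate = 1.0 / (1.0 + math.exp(-dot(gate_w, local + child_sum)))  # list concat
    return [l + gate * c for l, c in zip(local, child_sum)]

def reason(node, tree, word_vecs, regions, gate_w):
    """Propagation step: recurse into the children first, sum their
    outputs, then compose them with the current word's evidence."""
    child_sum = [0.0] * len(regions[0])
    for child in tree.get(node, []):
        child_out = reason(child, tree, word_vecs, regions, gate_w)
        child_sum = [a + b for a, b in zip(child_sum, child_out)]
    return gated_residual(attend(word_vecs[node], regions), child_sum, gate_w)

# Hypothetical toy inputs: a hand-written tree (head -> dependents) and
# random vectors standing in for word embeddings and image-region features.
random.seed(0)
DIM, NUM_REGIONS = 4, 6
tree = {"is": ["color", "cube"], "cube": ["sphere"]}
words = ["is", "color", "cube", "sphere"]
word_vecs = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in words}
regions = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_REGIONS)]
gate_w = [random.gauss(0, 1) for _ in range(2 * DIM)]

root_evidence = reason("is", tree, word_vecs, regions, gate_w)
print(root_evidence)
```

In this sketch a leaf word such as "sphere" contributes only its own attended evidence, while each head word combines its local evidence with whatever its dependents passed up, so the root vector summarizes the whole question. A real system would learn the embeddings and gate parameters and obtain the tree from a dependency parser.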


Pages: 887-901

Number of pages: 15

