FROM SHALLOW TO DEEP: COMPOSITIONAL REASONING OVER GRAPHS FOR VISUAL QUESTION ANSWERING

Cited by: 1

Authors:

Zhu, Zihao [1]

Affiliations:

[1] Univ Chinese Acad Sci, Beijing, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

visual question answering; graph neural modules; compositional reasoning; multi-layer graphs;

DOI:

10.1109/ICASSP43922.2022.9747737

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In order to achieve a general visual question answering (VQA) system, it is essential to learn to answer deeper questions that require compositional reasoning on the image and external knowledge. Meanwhile, the reasoning process should be explicit and explainable to understand the working mechanism of the model. This is effortless for humans but challenging for machines. In this paper, we propose a Hierarchical Graph Neural Module Network (HGNMN) that reasons over multi-layer graphs with neural modules to address the above issues. Specifically, we first encode the image as multi-layer graphs from the visual, semantic and commonsense views, since the clues that support the answer may exist in different modalities. Our model consists of several well-designed neural modules that perform specific functions over graphs, which can be used to conduct multi-step reasoning within and between different graphs. Compared to existing modular networks, we extend visual reasoning from a single graph to multiple graphs. We can explicitly trace the reasoning process according to module weights and graph attentions. Experiments show that our model not only achieves state-of-the-art performance on the CRIC dataset but also obtains explicit and explainable reasoning procedures.
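The idea of tracing reasoning through module weights and graph attentions can be illustrated with a toy sketch. It is not the HGNMN implementation: the abstract does not name the modules or give their equations, so the `find` and `relate` modules, the three-node graph, and every number below are hypothetical. The sketch only shows how a soft, weighted mix of module outputs updates an attention distribution over graph nodes, and how the dominant module per step yields a readable trace.

```python
from typing import Dict, List

Attention = List[float]  # one attention weight per graph node


def normalize(values: List[float]) -> Attention:
    """Rescale non-negative values so they sum to 1 (uniform if all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def find_module(node_scores: List[float]) -> Attention:
    """Hypothetical 'find': attend to nodes that match the question."""
    return normalize(node_scores)


def relate_module(attention: Attention, adjacency: List[List[float]]) -> Attention:
    """Hypothetical 'relate': move attention mass along directed edges."""
    n = len(attention)
    moved = [sum(attention[i] * adjacency[i][j] for i in range(n)) for j in range(n)]
    return normalize(moved)


def soft_step(outputs: Dict[str, Attention], weights: Dict[str, float]) -> Attention:
    """Mix the module outputs using soft module weights."""
    n = len(next(iter(outputs.values())))
    mixed = [sum(weights[name] * out[k] for name, out in outputs.items()) for k in range(n)]
    return normalize(mixed)


def trace(step_weights: List[Dict[str, float]]) -> List[str]:
    """Report the highest-weighted module at each reasoning step."""
    return [max(w, key=w.get) for w in step_weights]


def run_demo():
    # Toy graph (assumed): node 0 = dog, node 1 = frisbee, node 2 = grass.
    # Edges: dog -> frisbee, plus a self-loop on grass.
    adjacency = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    node_scores = [0.8, 0.1, 0.1]  # assumed question-to-node match scores
    # Assumed controller output: mostly 'find' first, then mostly 'relate'.
    step_weights = [
        {"find": 0.9, "relate": 0.1},
        {"find": 0.1, "relate": 0.9},
    ]
    attention = normalize([1.0, 1.0, 1.0])  # start from uniform attention
    for weights in step_weights:
        outputs = {
            "find": find_module(node_scores),
            "relate": relate_module(attention, adjacency),
        }
        attention = soft_step(outputs, weights)
    return trace(step_weights), attention


if __name__ == "__main__":
    modules_used, final_attention = run_demo()
    print(modules_used, final_attention)
```

In this toy run the attention first settles on the node matching the question and then shifts along the edge to its neighbour, while the per-step module weights give the order in which the modules were applied. Extending the sketch to several graphs would require a further module that passes attention between graphs, which the abstract mentions but does not specify.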

Pages: 8217-8221

Page count: 5

50 records in total

[1] Maintaining Reasoning Consistency in Compositional Visual Question Answering
Jing, Chenchen
Jia, Yunde
Wu, Yuwei
Liu, Xinyu
Wu, Qi
[J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5089 - 5098
[2] Deep Cognitive Reasoning Network for Multi-hop Question Answering over Knowledge Graphs
Cai, Jianyu
Zhang, Zhanqiu
Wu, Feng
Wang, Jie
[J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 219 - 229
[3] A Diagrammatic Approach for Visual Question Answering over Knowledge Graphs
Mouromtsev, Dmitry
Wohlgenannt, Gerhard
Haase, Peter
Pavlov, Dmitry
Emelyanov, Yury
Morozov, Alexey
[J]. SEMANTIC WEB: ESWC 2018 SATELLITE EVENTS, 2018, 11155 : 34 - 39
[4] Sequential Visual Reasoning for Visual Question Answering
Liu, Jinlai
Wu, Chenfei
Wang, Xiaojie
Dong, Xuan
[J]. PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 410 - 415
[5] Chain of Reasoning for Visual Question Answering
Wu, Chenfei
Liu, Jinlai
Wang, Xiaojie
Dong, Xuan
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[6] An improving reasoning network for complex question answering over temporal knowledge graphs
Songlin Jiao
Zhenfang Zhu
Wenqing Wu
Zicheng Zuo
Jiangtao Qi
Wenling Wang
Guangyuan Zhang
Peiyu Liu
[J]. Applied Intelligence, 2023, 53 : 8195 - 8208
[7] Semantic-enhanced reasoning question answering over temporal knowledge graphs
Du, Chenyang
Li, Xiaoge
Li, Zhongyang
[J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (03) : 859 - 881
[8] An improving reasoning network for complex question answering over temporal knowledge graphs
Jiao, Songlin
Zhu, Zhenfang
Wu, Wenqing
Zuo, Zicheng
Qi, Jiangtao
Wang, Wenling
Zhang, Guangyuan
Liu, Peiyu
[J]. APPLIED INTELLIGENCE, 2023, 53 (07) : 8195 - 8208
[9] PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING
Yang, Zhuoqian
Qin, Zengchang
Yu, Jing
Wan, Tao
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1411 - 1415
[10] DMRFNet: Deep Multimodal Reasoning and Fusion for Visual Question Answering and explanation generation
Zhang, Weifeng
Yu, Jing
Zhao, Wenhong
Ran, Chuan
[J]. INFORMATION FUSION, 2021, 72 : 70 - 79
