DMRFNet: Deep Multimodal Reasoning and Fusion for Visual Question Answering and Explanation Generation

Cited by: 0
Authors
Zhang, Weifeng [1 ]
Yu, Jing [2 ]
Zhao, Wenhong [3 ]
Ran, Chuan [4 ]
Affiliations
[1] Jiaxing University, Zhejiang, China
[2] Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
[3] Nanhu College, Jiaxing University, Zhejiang, China
[4] IBM Corporation, NC, United States
Keywords
Artificial intelligence; Natural language processing systems; Visual languages
Abstract
Visual Question Answering (VQA), which aims to answer natural-language questions according to the content of an image, has attracted extensive attention from the artificial intelligence community. Multimodal reasoning and fusion is a central component of recent VQA models. However, most existing VQA models are still insufficient at reasoning over and fusing clues from multiple modalities. Furthermore, they lack interpretability because they disregard explanations. We argue that reasoning over and fusing the multiple relations implied in different modalities contributes to more accurate answers and explanations. In this paper, we design an effective multimodal reasoning and fusion model to achieve fine-grained multimodal reasoning and fusion. Specifically, we propose the Multi-Graph Reasoning and Fusion (MGRF) layer, which adopts pre-trained semantic relation embeddings to reason over complex spatial and semantic relations between visual objects and to fuse these two kinds of relations adaptively. MGRF layers can be stacked in depth to form the Deep Multimodal Reasoning and Fusion Network (DMRFNet), which sufficiently reasons over and fuses multimodal relations. Furthermore, an explanation generation module is designed to justify the predicted answer. This justification reveals the motive behind the model's decision and enhances the model's interpretability. Quantitative and qualitative experimental results on the VQA 2.0 and VQA-E datasets show DMRFNet's effectiveness. © 2021 Elsevier B.V.
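To make the architecture described in the abstract more concrete, below is a minimal PyTorch sketch of a multi-graph reasoning and fusion layer in the spirit of the MGRF layer: two relation-biased attention passes (one over a spatial relation graph, one over a semantic relation graph built from pre-trained relation embeddings) applied to visual object features, followed by an adaptive gated fusion, with several such layers stacked as in DMRFNet. All module names, tensor shapes, the geometric relation encoding, and the sigmoid gating scheme are illustrative assumptions and are not taken from the paper's implementation.

# Minimal, illustrative sketch of a multi-graph reasoning-and-fusion layer.
# Names, dimensions, and the gating scheme are assumptions for illustration;
# they are not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGraphReasoningFusion(nn.Module):
    """Reasons over spatial and semantic relation graphs between visual
    objects, then fuses the two relation-aware views with a learned gate
    (hypothetical design)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # One relation-specific bias projection per graph (spatial, semantic).
        self.spatial_bias = nn.Linear(4, 1)      # assumed: relative box geometry
        self.semantic_bias = nn.Linear(dim, 1)   # assumed: pre-trained relation embeddings
        self.gate = nn.Linear(2 * dim, dim)

    def attend(self, x, bias):
        # Scaled dot-product attention over objects, biased by relation scores.
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / x.size(-1) ** 0.5 + bias
        return F.softmax(scores, dim=-1) @ v

    def forward(self, obj_feats, spatial_rel, semantic_rel):
        # obj_feats:    (B, N, dim)    region features
        # spatial_rel:  (B, N, N, 4)   pairwise geometric features (assumed encoding)
        # semantic_rel: (B, N, N, dim) pairwise semantic relation embeddings (assumed encoding)
        sp = self.attend(obj_feats, self.spatial_bias(spatial_rel).squeeze(-1))
        se = self.attend(obj_feats, self.semantic_bias(semantic_rel).squeeze(-1))
        g = torch.sigmoid(self.gate(torch.cat([sp, se], dim=-1)))
        return g * sp + (1 - g) * se   # adaptive fusion of the two relation views

# Stacking several layers loosely mirrors how MGRF layers are stacked to form DMRFNet.
layers = nn.ModuleList([MultiGraphReasoningFusion(512) for _ in range(3)])
x = torch.randn(2, 36, 512)            # e.g. 36 detected objects per image
spatial = torch.randn(2, 36, 36, 4)
semantic = torch.randn(2, 36, 36, 512)
for layer in layers:
    x = layer(x, spatial, semantic)
print(x.shape)                          # torch.Size([2, 36, 512])

The sigmoid gate stands in for the adaptive fusion of spatial and semantic relation views mentioned in the abstract; the paper's actual fusion mechanism and its explanation generation module are not reproduced here.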
Pages: 70-79
Related Papers (50 records in total)
  • [1] DMRFNet: Deep Multimodal Reasoning and Fusion for Visual Question Answering and explanation generation
    Zhang, Weifeng
    Yu, Jing
    Zhao, Wenhong
    Ran, Chuan
    INFORMATION FUSION, 2021, 72 : 70 - 79
  • [2] Multimodal feature fusion by relational reasoning and attention for visual question answering
    Zhang, Weifeng
    Yu, Jing
    Hu, Hua
    Hu, Haiyang
    Qin, Zengchang
    INFORMATION FUSION, 2020, 55 : 116 - 126
  • [3] Faithful Multimodal Explanation for Visual Question Answering
    Wu, Jialin
    Mooney, Raymond J.
    BLACKBOXNLP WORKSHOP ON ANALYZING AND INTERPRETING NEURAL NETWORKS FOR NLP AT ACL 2019, 2019, : 103 - 112
  • [4] Multimodal Learning and Reasoning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Visual Question Answering on CLEVR Dataset via Multimodal Fusion and Relational Reasoning
    Allahyari, Abbas
    Borna, Keivan
    2021 52ND ANNUAL IRANIAN MATHEMATICS CONFERENCE (AIMC), 2021, : 74 - 76
  • [6] Multimodal Graph Reasoning and Fusion for Video Question Answering
    Zhang, Shuai
    Wang, Xingfu
    Hawbani, Ammar
    Zhao, Liang
    Alsamhi, Saeed Hamood
    2022 IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, 2022, : 1410 - 1415
  • [7] ViCLEVR: a visual reasoning dataset and hybrid multimodal fusion model for visual question answering in Vietnamese
    Tran, Khiem Vinh
    Phan, Hao Phu
    Van Nguyen, Kiet
    Nguyen, Ngan Luu Thuy
    MULTIMEDIA SYSTEMS, 2024, 30 (04)
  • [8] Multimodal Knowledge Reasoning for Enhanced Visual Question Answering
    Hussain, Afzaal
    Maqsood, Ifrah
    Shahzad, Muhammad
    Fraz, Muhammad Moazam
    2022 16TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS, SITIS, 2022, : 224 - 230
  • [9] MUREL: Multimodal Relational Reasoning for Visual Question Answering
    Cadene, Remi
    Ben-younes, Hedi
    Cord, Matthieu
    Thome, Nicolas
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1989 - 1998
  • [10] Multimodal deep fusion for image question answering
    Zhang, Weifeng
    Yu, Jing
    Wang, Yuxia
    Wang, Wei
    KNOWLEDGE-BASED SYSTEMS, 2021, 212