Dynamic Fusion with Intra- and Inter-modality Attention Flow for Visual Question Answering

Cited by: 260

Authors:

Gao, Peng [1]

Jiang, Zhengkai [3]

You, Haoxuan [4]

Lu, Pan [4]

Hoi, Steven [2]

Wang, Xiaogang [1]

Li, Hongsheng [1]

Affiliations:

[1] Chinese Univ Hong Kong, CUHK SenseTime Joint Lab, Hong Kong, Peoples R China

[2] Singapore Management Univ, Singapore, Singapore

[3] CASIA, NLPR, Beijing, Peoples R China

[4] Tsinghua Univ, Beijing, Peoples R China

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

Keywords:

DOI:

10.1109/CVPR.2019.00680

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Learning an effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method to dynamically fuse multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, thereby significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the current modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
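
The two kinds of information flow named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the absence of learned projection matrices, the residual additions, and the sigmoid gate computed from the mean-pooled other modality are all assumptions made for illustration. It only shows the general idea of attending across modalities and then running self-attention within one modality while the other modality rescales its queries and keys.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    scores = [[v / math.sqrt(d) for v in row] for row in scores]
    weights = softmax_rows(scores)
    return matmul(weights, values), weights

def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inter_modality_flow(regions, words):
    """Assumed form: each modality attends to the other one."""
    regions_from_words, _ = attention(regions, words, words)
    words_from_regions, _ = attention(words, regions, regions)
    return add(regions, regions_from_words), add(words, words_from_regions)

def condition_gate(other):
    """Assumed gate: sigmoid of the mean-pooled other modality, one value per channel."""
    n = len(other)
    pooled = [sum(col) / n for col in zip(*other)]
    return [1.0 / (1.0 + math.exp(-p)) for p in pooled]

def dynamic_intra_modality_flow(feats, other):
    """Self-attention whose queries and keys are rescaled by a gate from the other modality."""
    gate = condition_gate(other)
    gated = [[(1.0 + g) * v for g, v in zip(gate, row)] for row in feats]
    updated, weights = attention(gated, gated, feats)
    return add(feats, updated), weights

# Toy inputs (hypothetical sizes): 5 image regions and 3 question words, 4 channels each.
random.seed(0)
dim = 4
regions = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
words = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

regions, words = inter_modality_flow(regions, words)
regions_out, region_weights = dynamic_intra_modality_flow(regions, words)
words_out, word_weights = dynamic_intra_modality_flow(words, regions)
print(len(regions_out), len(regions_out[0]), len(words_out), len(words_out[0]))
```

In this sketch the gate is the only path by which the other modality influences the intra-modality attention weights, which is one plausible reading of "conditioned on the other modality"; a trained model would use learned projections for the queries, keys, values, and gate.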


Pages: 6632-6641

Number of pages: 10
