Dynamic Fusion with Intra- and Inter-modality Attention Flow for Visual Question Answering

Cited by: 260
Authors
Gao, Peng [1]
Jiang, Zhengkai [3]
You, Haoxuan [4]
Lu, Pan [4]
Hoi, Steven [2]
Wang, Xiaogang [1]
Li, Hongsheng [1]
Affiliations
[1] Chinese Univ Hong Kong, CUHK SenseTime Joint Lab, Hong Kong, Peoples R China
[2] Singapore Management Univ, Singapore, Singapore
[3] CASIA, NLPR, Beijing, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
DOI
10.1109/CVPR.2019.00680
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method that dynamically fuses multi-modal features with intra- and inter-modality information flow, alternately passing dynamic information within and across the visual and language modalities. It robustly captures the high-level interactions between the language and vision domains and thus significantly improves the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the current modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
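The alternation between inter-modality and intra-modality attention flow described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names, the residual updates, the toy feature values, and the gate (a sigmoid over mean-pooled features of the other modality, with no learned projection) are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    # Scaled dot-product attention over lists of equal-length vectors.
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def inter_modality_flow(target, source):
    # Each target feature gathers information from the other modality
    # and adds it back as a residual (assumed update rule).
    gathered = attend(target, source, source)
    return [[t + g for t, g in zip(tv, gv)] for tv, gv in zip(target, gathered)]

def conditioning_gate(other):
    # Assumed gate: sigmoid of the mean-pooled other-modality features.
    dim = len(other[0])
    pooled = [sum(v[d] for v in other) / len(other) for d in range(dim)]
    return [1.0 / (1.0 + math.exp(-p)) for p in pooled]

def dynamic_intra_modality_flow(current, other):
    # Self-attention within one modality, with queries and keys rescaled
    # by a gate computed from the other modality (assumed form).
    gate = conditioning_gate(other)
    modulated = [[x * (1.0 + g) for x, g in zip(v, gate)] for v in current]
    updated = attend(modulated, modulated, current)
    return [[c + u for c, u in zip(cv, uv)] for cv, uv in zip(current, updated)]

# Toy inputs: 3 image-region features and 2 word features, 4 dimensions each.
regions = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.3, 0.1], [0.4, 0.4, 0.0, 0.9]]
words = [[0.2, 0.1, 0.7, 0.0], [0.9, 0.3, 0.0, 0.5]]

# One round of alternating inter- and intra-modality flow.
regions = inter_modality_flow(regions, words)
words = inter_modality_flow(words, regions)
regions = dynamic_intra_modality_flow(regions, words)
words = dynamic_intra_modality_flow(words, regions)
print(len(regions), len(regions[0]), len(words), len(words[0]))
```

The sketch only shows the direction of information flow: features first attend across modalities, then attend within their own modality under a gate derived from the other one. A trained model would add learned projections and multiple stacked rounds.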
Pages: 6632-6641
Page count: 10