Dynamic Fusion with Intra- and Inter-modality Attention Flow for Visual Question Answering

Cited by: 260
Authors
Gao, Peng [1]
Jiang, Zhengkai [3]
You, Haoxuan [4]
Lu, Pan [4]
Hoi, Steven [2]
Wang, Xiaogang [1]
Li, Hongsheng [1]
Affiliations
[1] Chinese Univ Hong Kong, CUHK SenseTime Joint Lab, Hong Kong, Peoples R China
[2] Singapore Management Univ, Singapore, Singapore
[3] CASIA, NLPR, Beijing, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
DOI
10.1109/CVPR.2019.00680
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method to dynamically fuse multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, and thus significantly improves the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the current modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
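The two kinds of information flow named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the record gives no architectural details, so the function names, the residual updates, the absence of learned query/key/value projections, and the sigmoid gate computed from the mean-pooled features of the other modality are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    # Scaled dot-product attention: each query row becomes a weighted
    # mix of the value rows.
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values

def inter_modality_flow(regions, words):
    # Inter-modality flow (assumed form): each modality gathers
    # information from the other and adds it residually.
    regions_new = regions + attend(regions, words, words)
    words_new = words + attend(words, regions, regions)
    return regions_new, words_new

def dynamic_intra_modality_flow(x, other, w_gate):
    # Intra-modality flow (assumed form): self-attention within x, with
    # queries and keys rescaled by a gate conditioned on the mean-pooled
    # features of the other modality.
    gate = 1.0 / (1.0 + np.exp(-(other.mean(axis=0) @ w_gate)))  # shape (d,)
    q = x * (1.0 + gate)
    k = x * (1.0 + gate)
    return x + attend(q, k, x)

# Toy inputs: 5 image-region features and 3 word features, dimension 8.
rng = np.random.default_rng(0)
d = 8
regions = rng.normal(size=(5, d))
words = rng.normal(size=(3, d))
w_gate_v = rng.normal(size=(d, d)) * 0.1  # hypothetical gate weights
w_gate_q = rng.normal(size=(d, d)) * 0.1  # hypothetical gate weights

# One inter-modality pass followed by one gated intra-modality pass.
regions_mid, words_mid = inter_modality_flow(regions, words)
regions_out = dynamic_intra_modality_flow(regions_mid, words_mid, w_gate_v)
words_out = dynamic_intra_modality_flow(words_mid, regions_mid, w_gate_q)
print(regions_out.shape, words_out.shape)
```

In this sketch the gate is the only place where the other modality influences the intra-modality attention weights, which is one way to read "conditioned on the other modality"; a trained model would learn the gate weights and the attention projections rather than draw them at random.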
Pages: 6632-6641
Page count: 10