Dynamic Fusion with Intra- and Inter-modality Attention Flow for Visual Question Answering

Cited by: 260
Authors
Gao, Peng [1 ]
Jiang, Zhengkai [3 ]
You, Haoxuan [4 ]
Lu, Pan [4 ]
Hoi, Steven [2 ]
Wang, Xiaogang [1 ]
Li, Hongsheng [1 ]
Affiliations
[1] Chinese Univ Hong Kong, CUHK SenseTime Joint Lab, Hong Kong, Peoples R China
[2] Singapore Management Univ, Singapore, Singapore
[3] CASIA, NLPR, Beijing, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
DOI
10.1109/CVPR.2019.00680
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method that dynamically fuses multi-modal features with intra- and inter-modality information flow, alternately passing dynamic information between and across the visual and language modalities. It robustly captures high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the current modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
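The two flows described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-head attention, the mean-pooling of the conditioning modality, and the sigmoid channel gate are all illustrative assumptions standing in for the paper's learned multi-head blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (n_q, d) queries over (n_k, d) keys/values.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def inter_modality_flow(lang, vis):
    # Inter-modality flow: each modality attends over the other (co-attention).
    lang_out = attention(lang, vis, vis)   # language queries, visual keys/values
    vis_out = attention(vis, lang, lang)   # visual queries, language keys/values
    return lang_out, vis_out

def conditioned_intra_flow(x, other, w_gate):
    # Intra-modality self-attention modulated by the *other* modality:
    # pool the other modality, derive a sigmoid channel gate (hypothetical
    # gating form), and scale this modality's features before self-attention.
    cond = other.mean(axis=0)                      # (d,) pooled conditioning vector
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ cond)))  # (d,) sigmoid gate
    xg = x * gate                                  # channel-wise modulation
    return attention(xg, xg, x)

rng = np.random.default_rng(0)
d = 8
lang = rng.standard_normal((5, d))   # 5 question-token features
vis = rng.standard_normal((7, d))    # 7 image-region features
w_gate = rng.standard_normal((d, d))

# One alternating round: inter-modality flow, then conditioned intra-modality flow.
lang2, vis2 = inter_modality_flow(lang, vis)
lang3 = conditioned_intra_flow(lang2, vis2, w_gate)
vis3 = conditioned_intra_flow(vis2, lang2, w_gate)
print(lang3.shape, vis3.shape)  # (5, 8) (7, 8)
```

The method stacks such blocks so that information alternately flows across and within modalities; the gate is what makes the intra-modality attention "dynamic", since it depends on the current state of the other modality.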
Pages: 6632-6641
Page count: 10