Multi-Channel Co-Attention Network for Visual Question Answering

Cited by: 0
Authors
Tian, Weidong
He, Bin
Wang, Nanxun
Zhao, Zhongqiu [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Hefei, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
VQA; Multi-Channel Co-Attention Network; Multi-Hierarchical Fusion;
DOI
10.1109/ijcnn48605.2020.9207058
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) aims to reason out correct answers from input questions and images. Significant progress has been made by learning rich embedding features from images and questions with bilinear models, and attention mechanisms are widely used to focus on specific visual and textual information during VQA reasoning. However, most state-of-the-art methods concentrate on fusing global multi-modal features while neglecting local features. Moreover, conventional visual attention reduces the dimension excessively (from Kx2048 to 2048), causing a substantial loss of visual information. In this paper, we propose a novel multi-channel co-attention network (MC-CAN), which integrates multi-modal features from the global level to the local level. We design separate multi-channel attention mechanisms for visual (from Kx2048 to Mx2048) and textual features at different levels of integration. Additionally, we further improve the proposed approach by combining it with complementary modules such as the MLB and Count modules. Experiments on benchmark datasets show that our approach achieves better VQA performance than other state-of-the-art methods.
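The abstract's dimension argument (keeping Mx2048 attended features instead of collapsing Kx2048 region features to a single 2048-d vector) can be illustrated with a toy multi-glimpse attention sketch. This is not the paper's actual model: the linear scoring weights `W` and all sizes here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

K, D, M = 36, 2048, 4                   # K image regions, D-dim features, M attention channels
rng = np.random.default_rng(0)
V = rng.standard_normal((K, D))          # image region features (K x 2048)
W = rng.standard_normal((D, M)) * 0.01   # hypothetical scoring weights, one column per channel

scores = V @ W                           # (K, M): each channel scores every region
alpha = softmax(scores, axis=0)          # per-channel attention weights over the K regions

# Conventional visual attention (M = 1) would return one 2048-d vector;
# multi-channel attention keeps M attended vectors, i.e. an M x 2048 matrix.
attended = alpha.T @ V                   # (M, D)
print(attended.shape)                    # (4, 2048)
```

With M > 1, each channel can attend to a different subset of regions, which is one way to retain more visual information than a single weighted average.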
Pages: 8
Related Papers
50 records
  • [1] Co-Attention Network With Question Type for Visual Question Answering
    Yang, Chao
    Jiang, Mengqi
    Jiang, Bin
    Zhou, Weixin
    Li, Keqin
    [J]. IEEE ACCESS, 2019, 7 : 40771 - 40781
  • [2] Dynamic Co-attention Network for Visual Question Answering
    Ebaid, Doaa B.
    Madbouly, Magda M.
    El-Zoghabi, Adel A.
    [J]. 2021 8TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2021), 2021, : 125 - 129
  • [3] Co-attention Network for Visual Question Answering Based on Dual Attention
    Dong, Feng
    Wang, Xiaofeng
    Oad, Ammar
    Talpur, Mir Sajjad Hussain
    [J]. Journal of Engineering Science and Technology Review, 2021, 14 (06) : 116 - 123
  • [4] Co-attention graph convolutional network for visual question answering
    Liu, Chuan
    Tan, Ying-Ying
    Xia, Tian-Tian
    Zhang, Jiajing
    Zhu, Ming
    [J]. MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2527 - 2543
  • [5] Multi-modal co-attention relation networks for visual question answering
    Guo, Zihan
    Han, Dezhi
    [J]. VISUAL COMPUTER, 2023, 39 (11) : 5783 - 5795
  • [6] Hierarchical Question-Image Co-Attention for Visual Question Answering
    Lu, Jiasen
    Yang, Jianwei
    Batra, Dhruv
    Parikh, Devi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [7] Deep Modular Co-Attention Networks for Visual Question Answering
    Yu, Zhou
    Yu, Jun
    Cui, Yuhao
    Tao, Dacheng
    Tian, Qi
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6274 - 6283
  • [8] An Effective Dense Co-Attention Networks for Visual Question Answering
    He, Shirong
    Han, Dezhi
    [J]. SENSORS, 2020, 20 (17) : 1 - 15