A Study of Visual Question Answering Techniques Based on Collaborative Multi-Head Attention

Cited by: 1
Authors
Yang, Yingli [1]
Jin, Jingxuan [1]
Li, De [2]
Affiliations
[1] Yanbian Univ, Inst Intelligent Informat Proc, Yanji, Peoples R China
[2] Yanbian Univ, Dept Comp Sci & Technol, Yanji, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
visual question answering; pre-training; collaborative multi-head attention; Swin transformer;
DOI
10.1109/ACCTCS58815.2023.00037
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In visual question answering, the dominant approach in recent years has been to pre-train a unified model and then fine-tune it; such models typically use a transformer to fuse image and text information. To improve performance on the visual question answering task, this paper proposes a transformer architecture based on a collaborative multi-head attention mechanism, which addresses the key/value projection redundancy in the standard multi-head attention of the transformer. In addition, the paper adopts the Swin transformer as the image feature extractor to obtain multi-scale image information. Validation experiments on the VQA v2 dataset show that applying collaborative multi-head attention and the Swin transformer backbone to the visual question answering model effectively improves accuracy on the visual question answering task.
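For concreteness, the following is a minimal PyTorch sketch of a collaborative multi-head attention layer in the spirit of Cordonnier et al. (2020), the formulation this mechanism is named after: all heads share a single query/key projection and are distinguished only by learned per-head mixing vectors, which removes the per-head projection redundancy described in the abstract. The class name, the choice of which projections are shared, and all dimensions are illustrative assumptions rather than details taken from the paper.

import math
import torch
import torch.nn as nn

class CollaborativeMultiHeadAttention(nn.Module):
    """Heads share one query/key projection; learned per-head mixing vectors
    replace per-head projection matrices (illustrative sketch, not the paper's exact layer)."""

    def __init__(self, dim, num_heads, shared_dim=None):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.shared_dim = shared_dim or dim          # joint query/key dimension shared by all heads
        self.q_proj = nn.Linear(dim, self.shared_dim, bias=False)   # shared query projection
        self.k_proj = nn.Linear(dim, self.shared_dim, bias=False)   # shared key projection
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        # Per-head mixing vectors take the place of per-head W_Q / W_K matrices.
        self.mixing = nn.Parameter(torch.ones(num_heads, self.shared_dim))

    def forward(self, x):                            # x: (batch, tokens, dim)
        B, N, D = x.shape
        H, S = self.num_heads, self.shared_dim
        q = self.q_proj(x)                           # (B, N, S), shared across heads
        k = self.k_proj(x)                           # (B, N, S), shared across heads
        v = self.v_proj(x).view(B, N, H, D // H).transpose(1, 2)        # (B, H, N, D/H)
        q_h = q.unsqueeze(1) * self.mixing.view(1, H, 1, S)             # head-specific queries
        attn = (q_h @ k.unsqueeze(1).transpose(-2, -1)) / math.sqrt(S)  # (B, H, N, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out_proj(out)

# Toy usage on a sequence of fused image/text tokens (sizes are arbitrary):
# layer = CollaborativeMultiHeadAttention(dim=768, num_heads=12, shared_dim=384)
# y = layer(torch.randn(2, 196, 768))

Choosing shared_dim smaller than dim is what yields the parameter savings over standard multi-head attention in this collaborative formulation.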
Pages: 552 - 555
Number of pages: 4