A Study of Visual Question Answering Techniques Based on Collaborative Multi-Head Attention

Cited by: 1
Authors
Yang, Yingli [1]
Jin, Jingxuan [1]
Li, De [2]
Affiliations
[1] Yanbian Univ, Inst Intelligent Informat Proc, Yanji, Peoples R China
[2] Yanbian Univ, Dept Comp Sci & Technol, Yanji, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
visual question answering; pre-training; collaborative multi-head attention; Swin transformer;
DOI
10.1109/ACCTCS58815.2023.00037
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In visual question answering, the dominant approach in recent years has been to pre-train a unified model and then fine-tune it; such models typically use a transformer to fuse image and text information. To improve performance on the visual question answering task, this paper proposes a transformer architecture based on a collaborative multi-head attention mechanism, which addresses the key/value projection redundancy in the standard multi-head attention of the transformer. In addition, the paper adopts the Swin transformer as the image feature extractor to obtain multi-scale image information. Validation experiments on the VQA v2 dataset show that applying collaborative multi-head attention and the Swin transformer backbone to the visual question answering model effectively improves accuracy on the visual question answering task.
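For concreteness, the following is a minimal PyTorch sketch of a collaborative multi-head attention layer in the spirit of Cordonnier et al. (2020), the formulation this mechanism is named after: all heads share a single query/key projection and are distinguished only by learned per-head mixing vectors, which removes the per-head projection redundancy described in the abstract. The class name, the choice of which projections are shared, and all dimensions are illustrative assumptions rather than details taken from the paper.

import math
import torch
import torch.nn as nn

class CollaborativeMultiHeadAttention(nn.Module):
    """Heads share one query/key projection; learned per-head mixing vectors
    replace per-head projection matrices (illustrative sketch, not the paper's exact layer)."""

    def __init__(self, dim, num_heads, shared_dim=None):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.shared_dim = shared_dim or dim          # joint query/key dimension shared by all heads
        self.q_proj = nn.Linear(dim, self.shared_dim, bias=False)   # shared query projection
        self.k_proj = nn.Linear(dim, self.shared_dim, bias=False)   # shared key projection
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        # Per-head mixing vectors take the place of per-head W_Q / W_K matrices.
        self.mixing = nn.Parameter(torch.ones(num_heads, self.shared_dim))

    def forward(self, x):                            # x: (batch, tokens, dim)
        B, N, D = x.shape
        H, S = self.num_heads, self.shared_dim
        q = self.q_proj(x)                           # (B, N, S), shared across heads
        k = self.k_proj(x)                           # (B, N, S), shared across heads
        v = self.v_proj(x).view(B, N, H, D // H).transpose(1, 2)        # (B, H, N, D/H)
        q_h = q.unsqueeze(1) * self.mixing.view(1, H, 1, S)             # head-specific queries
        attn = (q_h @ k.unsqueeze(1).transpose(-2, -1)) / math.sqrt(S)  # (B, H, N, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out_proj(out)

# Toy usage on a sequence of fused image/text tokens (sizes are arbitrary):
# layer = CollaborativeMultiHeadAttention(dim=768, num_heads=12, shared_dim=384)
# y = layer(torch.randn(2, 196, 768))

Choosing shared_dim smaller than dim is what yields the parameter savings over standard multi-head attention in this collaborative formulation.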
Pages: 552 - 555
Number of pages: 4