A Study of Visual Question Answering Techniques Based on Collaborative Multi-Head Attention

Cited by: 1
Authors
Yang, Yingli [1 ]
Jin, Jingxuan [1 ]
Li, De [2 ]
Affiliations
[1] Yanbian Univ, Inst Intelligent Informat Proc, Yanji, Peoples R China
[2] Yanbian Univ, Dept Computer Sci & Technol, Yanji, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
visual question answering; pre-training; collaborative multi-head attention; Swin Transformer;
DOI
10.1109/ACCTCS58815.2023.00037
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In the visual question answering task, the dominant recent approach has been to pre-train a unified model and then fine-tune it; such unified models typically use a transformer to fuse image and text information. To improve performance on visual question answering, this paper proposes a transformer architecture based on a collaborative multi-head attention mechanism, which addresses the key/value projection redundancy in the transformer's multi-head attention. In addition, this paper uses the Swin Transformer as the image feature extractor to obtain multi-scale image information. Validation experiments on the VQA v2 dataset show that applying the collaborative multi-head attention approach and the Swin Transformer backbone to the visual question answering model effectively improves the accuracy of the visual question answering task.
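
As background for the abstract's central mechanism: in collaborative multi-head attention (Cordonnier et al., 2020), the heads share a single query/key projection pair and are differentiated by small learned mixing vectors, which removes the redundancy among per-head projections. The following PyTorch sketch illustrates that shared-projection formulation under stated assumptions; the class name CollabAttention, the d_shared size, and all dimension choices are illustrative and not the authors' implementation, and the paper's exact variant (the abstract speaks of key/value redundancy) may differ in detail.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollabAttention(nn.Module):
    """Collaborative multi-head self-attention sketch: heads share one
    query/key projection and are distinguished by learned mixing vectors."""
    def __init__(self, d_model: int, n_heads: int, d_shared: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        # One shared projection pair replaces the per-head W_Q^i / W_K^i,
        # removing redundancy among the head projections.
        self.w_q = nn.Linear(d_model, d_shared, bias=False)
        self.w_k = nn.Linear(d_model, d_shared, bias=False)
        # Head i re-weights the shared dimensions with its mixing vector m_i.
        self.mix = nn.Parameter(torch.ones(n_heads, d_shared))
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q = self.w_q(x)  # (B, T, d_shared), shared by all heads
        k = self.w_k(x)  # (B, T, d_shared), shared by all heads
        v = self.w_v(x).view(B, T, self.n_heads, -1).transpose(1, 2)
        # Per-head queries: element-wise mix of the shared query features.
        q_h = q.unsqueeze(1) * self.mix.view(1, self.n_heads, 1, -1)
        scores = q_h @ k.unsqueeze(1).transpose(-2, -1) / math.sqrt(q.size(-1))
        attn = F.softmax(scores, dim=-1)  # (B, n_heads, T, T)
        out = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.w_o(out)

Example usage with hypothetical sizes for image-patch tokens:

layer = CollabAttention(d_model=512, n_heads=8, d_shared=256)
y = layer(torch.randn(2, 49, 512))  # batch of 2, 49 tokens, 512-d features

Choosing d_shared below n_heads times the usual per-head key dimension is what yields the parameter savings this family of mechanisms targets, since the mixing vectors are far cheaper than separate per-head projection matrices.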
Pages: 552-555
Page count: 4