CSA-BERT: Video Question Answering

Times cited: 0

Authors:

Jenni, Kommineni [1]

Srinivas, M. [2]

Sannapu, Roshni [2]

Perumal, Murukessan [2]

Affiliations:

[1] King Khalid Univ, Comp Sci Dept, Abha, Saudi Arabia

[2] NIT Warangal, CSE Dept, Warangal, Telangana, India

Source:

2023 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP, SSP | 2023

Keywords:

component; formatting; style; styling; insert;

DOI:

10.1109/SSP53291.2023.10207954

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Convolutional networks are a key component of many computer vision applications. However, convolutions have a serious flaw: they operate only on a small local neighborhood, so they lack global information. Attention, by contrast, is a recent advance in capturing long-range interactions, but it has mostly been applied to sequence modeling and generative modeling tasks. As an alternative to using convolutions alone, we investigate combining convolutions with an attention mechanism for the video question answering task. We present a novel convolution-based self-attention mechanism that outperforms plain convolutions on video question answering. In our experiments, combining convolutions with self-attention gave the best results. We therefore propose a hybrid approach that pairs convolutional operators with the self-attention mechanism, combining convolutional feature maps with self-attention feature maps. Experiments on the MSRVTT-QA dataset show that convolution with self-attention improves video question answering performance.
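The abstract does not specify how the two kinds of feature maps are combined. The following is a minimal sketch that assumes a channel-wise concatenation, the scheme used by attention-augmented convolutional networks. One part is a temporal 1-D convolution and the other is single-head scaled dot-product self-attention, both computed over a sequence of per-frame feature vectors.

All function names, weight shapes, and the 1-D temporal setting are hypothetical illustrations, not the authors' CSA-BERT implementation. It uses only the Python standard library.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_conv1d(x: Matrix, kernel: List[Matrix]) -> Matrix:
    """1-D convolution over time with zero padding (same output length).

    x: T x d_in frame features; kernel: k x d_in x c_out weights (k odd).
    Each output step sees only a local window of k frames.
    """
    k = len(kernel)
    pad = k // 2
    d_in, c_out = len(x[0]), len(kernel[0][0])
    out = []
    for t in range(len(x)):
        acc = [0.0] * c_out
        for offset in range(k):
            s = t + offset - pad
            if 0 <= s < len(x):  # positions outside the sequence count as zeros
                for i in range(d_in):
                    for c in range(c_out):
                        acc[c] += x[s][i] * kernel[offset][i][c]
        out.append(acc)
    return out


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over all T frames.

    Every output step is a weighted mix of the whole sequence (global context).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(qi * ki for qi, ki in zip(q_row, k_row)) for k_row in k]
              for q_row in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def hybrid_layer(x: Matrix, kernel: List[Matrix],
                 wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Hypothetical hybrid: concatenate conv and self-attention maps per time step."""
    conv_maps = temporal_conv1d(x, kernel)
    attn_maps = self_attention(x, wq, wk, wv)
    return [c + a for c, a in zip(conv_maps, attn_maps)]
```

As a usage example, consider this setup:

- The input is `x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`.
- The kernel has three taps, each selecting feature 0.
- The query and key weights are zero.
- The value weights are `wv = [[1.0], [0.0]]`.

With these inputs, `hybrid_layer` returns approximately `[[1.0, 0.667], [2.0, 0.667], [1.0, 0.667]]`. The first channel is a local windowed sum. The second is a uniform average over all frames. This illustrates the design point of the hybrid: convolution channels see only a k-frame window, while attention channels see the entire sequence.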


Pages: 532-536

Page count: 5
