CSA-BERT: Video Question Answering

Cited by: 0
Authors
Jenni, Kommineni [1]
Srinivas, M. [2]
Sannapu, Roshni [2]
Perumal, Murukessan [2]
Affiliations
[1] King Khalid Univ, Comp Sci Dept, Abha, Saudi Arabia
[2] NIT Warangal, CSE Dept, Warangal, Telangana, India
Keywords
component; formatting; style; styling; insert;
DOI
10.1109/SSP53291.2023.10207954
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional networks are a key component of many computer vision applications. However, convolutions have a serious limitation: each one operates only on a small local neighborhood and therefore lacks global information. Attention, on the other hand, is a more recent approach to capturing long-range interactions that has mostly been applied to sequence modeling and generative modeling tasks. Rather than relying on convolutions alone, we investigate combining convolutions with an attention mechanism in a video question answering task. We present a novel convolution-based self-attention mechanism that outperforms plain convolutions on this task. In our experiments, combining convolutions with self-attention produced the best results. We therefore propose a hybrid approach that combines convolutional operators with the self-attention mechanism by combining convolutional feature maps with self-attention feature maps. Experiments show that convolution with self-attention improves video question answering performance on the MSRVTT-QA dataset.
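The hybrid idea in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not say how the two feature maps are combined, so channel-wise concatenation is an assumption here, as are the per-channel 1-D temporal convolution, the identity query/key/value projections, and every function name (`conv1d_same`, `self_attention`, `hybrid_features`).

```python
import math


def conv1d_same(x, kernel):
    """Local operator: per-channel 1-D convolution over time, zero padded.

    x is a list of T feature vectors (lists of C floats); kernel is a list
    of K scalar weights with K odd. As is common in deep learning, the
    kernel is applied without flipping (cross-correlation).
    """
    T, C, half = len(x), len(x[0]), len(kernel) // 2
    out = []
    for t in range(T):
        row = [0.0] * C
        for k, w in enumerate(kernel):
            s = t + k - half          # source time step for this tap
            if 0 <= s < T:            # positions outside the clip count as zero
                for c in range(C):
                    row[c] += w * x[s][c]
        out.append(row)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Global operator: scaled dot-product self-attention.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections), which keeps the sketch free of learned weights.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)     # one weight per time step, sums to 1
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


def hybrid_features(x, kernel):
    """Concatenate local (conv) and global (attention) features per time step."""
    conv = conv1d_same(x, kernel)
    attn = self_attention(x)
    return [c + a for c, a in zip(conv, attn)]  # list concat = channel concat


# Three hypothetical frame features with two channels each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = hybrid_features(frames, [0.25, 0.5, 0.25])
print(len(fused), len(fused[0]))  # 3 time steps, 2 conv + 2 attention channels
```

Each output row holds the convolutional channels first, which depend only on neighboring frames, followed by the attention channels, which are a weighted average over every frame in the clip.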
Pages: 532-536
Page count: 5
Related papers
50 in total
  • [1] BERT Representations for Video Question Answering
    Yang, Zekun
    Garcia, Noa
    Chu, Chenhui
    Otani, Mayu
    Nakashima, Yuta
    Takemura, Haruo
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1545 - 1554
  • [2] Affective question answering on video
    Ruwa, Nelson
    Mao, Qirong
    Wang, Liangjun
    Gou, Jianping
    [J]. NEUROCOMPUTING, 2019, 363 : 125 - 139
  • [3] BERT with History Answer Embedding for Conversational Question Answering
    Qu, Chen
    Yang, Liu
    Qiu, Minghui
    Croft, W. Bruce
    Zhang, Yongfeng
    Iyyer, Mohit
    [J]. PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 1133 - 1136
  • [4] PAL-BERT: An Improved Question Answering Model
    Zheng, Wenfeng
    Lu, Siyu
    Cai, Zhuohang
    Wang, Ruiyang
    Wang, Lei
    Yin, Lirong
    [J]. CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2024, 139 (03): : 2729 - 2745
  • [5] Video Graph Transformer for Video Question Answering
    Xiao, Junbin
    Zhou, Pan
    Chua, Tat-Seng
    Yan, Shuicheng
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 39 - 58
  • [6] Video Reference: A Video Question Answering Engine
    Gao, Lei
    Li, Guangda
    Zheng, Yan-Tao
    Hong, Richang
    Chua, Tat-Seng
    [J]. ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, 2010, 5916 : 799 - +
  • [7] Locate Before Answering: Answer Guided Question Localization for Video Question Answering
    Qian, Tianwen
    Cui, Ran
    Chen, Jingjing
    Peng, Pai
    Guo, Xiaowei
    Jiang, Yu-Gang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4554 - 4563
  • [8] MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
    Khan, Aisha Urooj
    Mazaheri, Amir
    Lobo, Niels Da Vitoria
    Shah, Mubarak
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 4648 - 4660
  • [9] Video Question Answering on Screencast Tutorials
    Zhao, Wentian
    Kim, Seokhwan
    Xu, Ning
    Jin, Hailin
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 1061 - 1068
  • [10] Video Question Answering by Frame Attention
    Fang, Jiannan
    Sun, Lingling
    Wang, Yaqi
    [J]. ELEVENTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2019), 2019, 11179