HIERARCHICAL RELATIONAL ATTENTION FOR VIDEO QUESTION ANSWERING

Authors
Chowdhury, Muhammad Iqbal Hasan [1]
Kien Nguyen [1]
Sridharan, Sridha [1]
Fookes, Clinton [1]
Affiliations
[1] Queensland Univ Technol, Brisbane, Qld, Australia
Keywords
Visual Question Answering (VQA); Hierarchical relational attention; scene understanding
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Video Question Answering (VideoQA) tasks require understanding the connections among context-specific video parts that are temporally distributed. Humans can focus on temporally distributed video scenes and find correspondences or relationships among these segments. To achieve a similar capability, this paper proposes a hierarchical relational attention mechanism. The proposed VideoQA model derives attention over temporal segments, i.e., video features, based on each of the question words. In addition, the contextual relevance of these temporal segments is captured to derive the final video representation, which leads to better reasoning capability. We evaluate the proposed approach on the MSRVTT-QA and MSVD-QA datasets to establish its superior performance over the state of the art.
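The two stages named in the abstract (attention over temporal segments guided by each question word, then a relevance-based combination into a single video representation) can be illustrated with a minimal sketch. This is not the authors' architecture: the dot-product scoring, the mean-similarity relevance score, the function names (`word_guided_attention`, `relational_pooling`), and the toy vectors are all assumptions made for illustration, since the record gives no implementation details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Combine equal-length vectors using the given weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def word_guided_attention(word_vecs, segment_feats):
    """Stage 1 (assumed form): each question word attends over all segments."""
    attended = []
    for word in word_vecs:
        weights = softmax([dot(word, seg) for seg in segment_feats])
        attended.append(weighted_sum(weights, segment_feats))
    return attended


def relational_pooling(attended):
    """Stage 2 (assumed form): weight each attended vector by its mean
    similarity to the others, then pool into one video representation."""
    n = len(attended)
    relevance = []
    for i in range(n):
        others = [dot(attended[i], attended[j]) for j in range(n) if j != i]
        relevance.append(sum(others) / len(others) if others else 0.0)
    return weighted_sum(softmax(relevance), attended)


# Hypothetical toy inputs: three 2-D segment features, two 2-D word embeddings.
segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[2.0, 0.0], [0.0, 2.0]]

attended = word_guided_attention(words, segments)
video_repr = relational_pooling(attended)
```

Here `attended` holds one segment summary per question word, and `video_repr` is a single vector with the same dimensionality as the segment features. A real model would use learned projections and recurrent or convolutional encoders in place of the raw dot products.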
Pages: 599-603
Page count: 5