HIERARCHICAL RELATIONAL ATTENTION FOR VIDEO QUESTION ANSWERING

Cited by: 0
Authors
Chowdhury, Muhammad Iqbal Hasan [1]
Nguyen, Kien [1]
Sridharan, Sridha [1]
Fookes, Clinton [1]
Affiliation
[1] Queensland Univ Technol, Brisbane, Qld, Australia
Keywords
Visual Question Answering (VQA); hierarchical relational attention; scene understanding
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Video Question Answering (VideoQA) requires understanding how context-specific parts of a video, distributed over time, connect to one another. Humans can focus on temporally distributed video scenes and find correspondences or relationships among these segments. To achieve a similar capability, this paper proposes a hierarchical relational attention mechanism. The proposed VideoQA model derives attention over temporal segments, i.e., video features, conditioned on each question word. The contextual relevance of these temporal segments is also captured to derive the final video representation, which leads to better reasoning capability. We evaluate the proposed approach on the MSRVTT-QA and MSVD-QA datasets to establish its superior performance over the state of the art.
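The abstract gives no equations, so the following is only an illustrative sketch of the two stages it describes, not the authors' architecture. It assumes scaled dot-product scoring, a shared feature dimension for words and segments, and mean pooling at the end; the function names `word_guided_attention` and `relational_pooling` and the random inputs are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def word_guided_attention(segments, words):
    """Stage 1 (assumed form): each question word attends over temporal segments.

    segments: (T, d) array of per-segment video features.
    words:    (N, d) array of question-word embeddings.
    Returns the (N, d) word-conditioned video summaries and (N, T) weights.
    """
    d = segments.shape[1]
    scores = words @ segments.T / np.sqrt(d)   # (N, T) word-to-segment scores
    weights = softmax(scores, axis=1)          # each row sums to 1 over segments
    return weights @ segments, weights


def relational_pooling(attended):
    """Stage 2 (assumed form): relate the word-conditioned summaries to each other.

    attended: (N, d) output of stage 1.
    Returns a single (d,) video representation.
    """
    d = attended.shape[1]
    relation = softmax(attended @ attended.T / np.sqrt(d), axis=1)  # (N, N)
    contextual = relation @ attended           # each summary mixed with related ones
    return contextual.mean(axis=0)             # mean pooling is an assumption


# Toy inputs: 8 temporal segments, 5 question words, 16-dimensional features.
rng = np.random.default_rng(0)
segments = rng.normal(size=(8, 16))
words = rng.normal(size=(5, 16))

attended, weights = word_guided_attention(segments, words)
video_repr = relational_pooling(attended)
```

In this sketch the hierarchy is word-level attention over segments followed by a relation step across the resulting summaries; a trained model would use learned projections rather than raw dot products.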
Pages: 599-603
Page count: 5