Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering

Cited by: 0
Authors
Shaoning Xiao
Yimeng Li
Yunan Ye
Long Chen
Shiliang Pu
Zhou Zhao
Jian Shao
Jun Xiao
Affiliation
[1] Zhejiang University
Source
Neural Processing Letters | 2020 / Volume 52
Keywords
Video question answering; Multi-grained representation; Temporal co-attention
DOI: not available
Abstract
This work addresses the problem of video question answering (VideoQA) with a novel model and a new open-ended VideoQA dataset. VideoQA is a challenging task in visual information retrieval that aims to generate an answer according to the video content and the question. Ultimately, VideoQA is a video understanding task, and efficiently combining multi-grained representations is the key to understanding a video. Most existing works tackle the problem with overall frame-level visual understanding, which neglects the finer-grained and temporal information inside the video, or combine the multi-grained representations by simple concatenation or addition. We therefore propose a multi-granularity temporal attention network that can search for the specific frames in a video that are both holistically and locally related to the answer. We first learn mutual attention representations between the multi-grained visual content and the question. The mutually attended features are then combined hierarchically using a double-layer LSTM to generate the answer. Furthermore, we evaluate several multi-grained fusion configurations to demonstrate the advantage of this hierarchical architecture. The effectiveness of our model is demonstrated on a large-scale video question answering dataset built on the ActivityNet dataset.
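The abstract describes the architecture only in prose. Below is a minimal sketch, assuming PyTorch, of how such a pipeline could be wired up: question-visual co-attention over frame-level and finer-grained features, followed by hierarchical temporal fusion with a double-layer LSTM. The module names (CoAttention, HierarchicalTemporalFusion), feature dimensions, and the exact co-attention formulation are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the pipeline described in the abstract (PyTorch assumed).
# Dimensions, module names, and the co-attention formulation are illustrative
# assumptions, not the paper's exact model.
import torch
import torch.nn as nn


class CoAttention(nn.Module):
    """Mutual (co-)attention between a visual feature sequence and question words."""

    def __init__(self, dim):
        super().__init__()
        self.w_v = nn.Linear(dim, dim)
        self.w_q = nn.Linear(dim, dim)

    def forward(self, visual, question):
        # visual: (B, T, D), question: (B, L, D)
        affinity = torch.bmm(self.w_v(visual), self.w_q(question).transpose(1, 2))  # (B, T, L)
        # question-attended visual feature for each time step
        att_v = torch.softmax(affinity, dim=2)
        v_ctx = torch.bmm(att_v, question)                           # (B, T, D)
        # visually attended question feature, pooled over time
        att_q = torch.softmax(affinity.mean(dim=1), dim=1)           # (B, L)
        q_ctx = torch.bmm(att_q.unsqueeze(1), question).squeeze(1)   # (B, D)
        return v_ctx, q_ctx


class HierarchicalTemporalFusion(nn.Module):
    """Fuses frame-level and finer-grained attended features hierarchically
    with a double-layer temporal LSTM, then classifies the answer."""

    def __init__(self, dim=512, num_answers=1000):
        super().__init__()
        self.frame_coatt = CoAttention(dim)
        self.fine_coatt = CoAttention(dim)
        # first LSTM layer summarizes the finer-grained stream over time
        self.lstm_fine = nn.LSTM(dim, dim, batch_first=True)
        # second LSTM layer fuses the frame-level stream with the first layer's output
        self.lstm_frame = nn.LSTM(2 * dim, dim, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_answers)

    def forward(self, frame_feat, fine_feat, question_feat):
        # frame_feat: (B, T, D), fine_feat: (B, T, D), question_feat: (B, L, D)
        v_frame, q_frame = self.frame_coatt(frame_feat, question_feat)
        v_fine, _ = self.fine_coatt(fine_feat, question_feat)
        h_fine, _ = self.lstm_fine(v_fine)                                   # (B, T, D)
        h_frame, _ = self.lstm_frame(torch.cat([v_frame, h_fine], dim=2))    # (B, T, D)
        video_repr = h_frame[:, -1]                                          # last hidden state
        return self.classifier(torch.cat([video_repr, q_frame], dim=1))


# usage with random tensors standing in for extracted features
model = HierarchicalTemporalFusion(dim=512, num_answers=1000)
frames = torch.randn(2, 20, 512)    # 20 sampled frame-level features
fine = torch.randn(2, 20, 512)      # finer-grained features aligned to the frames
question = torch.randn(2, 12, 512)  # 12 encoded question-word features
logits = model(frames, fine, question)  # (2, 1000) answer scores

In this sketch the first LSTM layer summarizes the finer-grained stream, and its hidden states are concatenated with the frame-level attended features before the second LSTM layer, mirroring the hierarchical fusion of multi-grained attention features that the abstract describes.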
Pages: 993 - 1003
Number of pages: 10
Related Papers
50 records in total
  • [1] Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering
    Xiao, Shaoning
    Li, Yimeng
    Ye, Yunan
    Chen, Long
    Pu, Shiliang
    Zhao, Zhou
    Shao, Jian
    Xiao, Jun
    [J]. NEURAL PROCESSING LETTERS, 2020, 52 (02) : 993 - 1003
  • [2] Multi-grained Attention with Object-level Grounding for Visual Question Answering
    Huang, Pingping
    Huang, Jianhui
    Guo, Yuqing
    Qiao, Min
    Zhu, Yong
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3595 - 3600
  • [3] Multi-grained unsupervised evidence retrieval for question answering
You, Hao
    [J]. Neural Computing and Applications, 2023, 35 : 21247 - 21257
  • [4] Multi-grained unsupervised evidence retrieval for question answering
    You, Hao
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (28): : 21247 - 21257
  • [5] Hierarchical Attention-Based Fusion for Image Caption With Multi-Grained Rewards
    Wu, Chunlei
    Yuan, Shaozu
    Cao, Haiwen
    Wei, Yiwei
    Wang, Leiquan
    [J]. IEEE ACCESS, 2020, 8 : 57943 - 57951
  • [6] HIERARCHICAL RELATIONAL ATTENTION FOR VIDEO QUESTION ANSWERING
    Chowdhury, Muhammad Iqbal Hasan
Nguyen, Kien
    Sridharan, Sridha
    Fookes, Clinton
    [J]. 2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 599 - 603
  • [7] Video Question Answering via Hierarchical Spatio-Temporal Attention Networks
    Zhao, Zhou
    Yang, Qifan
    Cai, Deng
    He, Xiaofei
    Zhuang, Yueting
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3518 - 3524
  • [8] MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering
    Ahmad, Mobeen
    Park, Geonwoo
    Park, Dongchan
    Park, Sanguk
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 4659 - 4664
  • [9] Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering
    Wang, Wei
    Yan, Ming
    Wu, Chen
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 1705 - 1714
  • [10] Hierarchical Recurrent Contextual Attention Network for Video Question Answering
    Zhou, Fei
    Han, Yahong
    [J]. ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 280 - 290