Uncovering the Temporal Context for Video Question Answering

Cited by: 1
Authors
Linchao Zhu
Zhongwen Xu
Yi Yang
Alexander G. Hauptmann
Affiliations
[1] University of Technology Sydney, CAI
[2] Carnegie Mellon University, SCS
Keywords
Video sequence modeling; Video question answering; Video prediction; Cross-media
DOI
Not available
Abstract
In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder–decoder approach using Recurrent Neural Networks to learn the temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for finer understanding of video content using “fill-in-the-blank” questions, and collect our Video Context QA dataset consisting of 109,895 video clips with a total duration of more than 1,000 hours from the existing TACoS, MPII-MD and MEDTest 14 datasets. In addition, 390,744 corresponding questions are generated from annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines.
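To make the multiple-choice ranking idea concrete, the following is a minimal sketch of a generic margin-based ranking loss over candidate answers. It is not the paper's dual-channel formulation, which the abstract does not specify; the function names (`ranking_loss`, `pick_answer`), the dot-product scoring, and the margin value are all assumptions made for illustration. The video vector stands in for whatever representation an encoder would produce.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product used here as a stand-in similarity score (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def ranking_loss(
    video_vec: Sequence[float],
    candidates: Sequence[Sequence[float]],
    correct_idx: int,
    margin: float = 1.0,
) -> float:
    """Hinge ranking loss: the correct answer should outscore every
    wrong candidate by at least `margin`."""
    pos_score = dot(video_vec, candidates[correct_idx])
    loss = 0.0
    for i, cand in enumerate(candidates):
        if i == correct_idx:
            continue
        # Penalize only candidates that come within `margin` of the correct score.
        loss += max(0.0, margin - pos_score + dot(video_vec, cand))
    return loss


def pick_answer(
    video_vec: Sequence[float], candidates: Sequence[Sequence[float]]
) -> int:
    """Return the index of the highest-scoring candidate answer."""
    scores = [dot(video_vec, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy embeddings: one video vector and three candidate answers.
video = [1.0, 0.0]
answers = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0]]
loss = ranking_loss(video, answers, correct_idx=0)
best = pick_answer(video, answers)
```

In this toy case only the third candidate falls inside the margin, so it alone contributes to the loss. A dual-channel variant would presumably combine two such terms, but how the channels are defined and weighted is left to the paper itself.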
Pages: 409-421
Page count: 12