Uncovering the Temporal Context for Video Question Answering

Cited by: 1
Authors
Linchao Zhu
Zhongwen Xu
Yi Yang
Alexander G. Hauptmann
Affiliations
[1] University of Technology Sydney, CAI
[2] Carnegie Mellon University, SCS
Source
Keywords
Video sequence modeling; Video question answering; Video prediction; Cross-media
DOI
Not available
CLC number
Subject classification code
Abstract
In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder–decoder approach using Recurrent Neural Networks to learn the temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for finer understanding of video content using the question form of “fill-in-the-blank”, and collect our Video Context QA dataset consisting of 109,895 video clips with a total duration of more than 1000 h from existing TACoS, MPII-MD and MEDTest 14 datasets. In addition, 390,744 corresponding questions are generated from annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines.
Pages: 409-421
Page count: 12