Uncovering the Temporal Context for Video Question Answering

Cited by: 1

Authors:

Linchao Zhu

Zhongwen Xu

Yi Yang

Alexander G. Hauptmann

Affiliations:

[1] University of Technology Sydney, CAI

[2] Carnegie Mellon University, SCS

Source:

International Journal of Computer Vision | 2017 / Vol. 124

Keywords:

Video sequence modeling; Video question answering; Video prediction; Cross-media

DOI:

Not available

Abstract:

In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder–decoder approach using Recurrent Neural Networks to learn the temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for finer understanding of video content using the question form of “fill-in-the-blank”, and collect our Video Context QA dataset consisting of 109,895 video clips with a total duration of more than 1000 h from existing TACoS, MPII-MD and MEDTest 14 datasets. In addition, 390,744 corresponding questions are generated from annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines.
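The abstract names a dual-channel ranking loss for answering multiple-choice questions but does not define it. The sketch below is only an illustration under loudly stated assumptions, not the authors' formulation: each of two hypothetical channels yields a query embedding of the video context, each candidate answer is an embedding vector, similarity is cosine, and the loss sums a margin-based hinge ranking loss over both channels. The names `dual_channel_rank_loss`, `hinge_rank_loss`, `pick_answer`, and the `margin` default of 0.1 are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def hinge_rank_loss(query: Vector, positive: Vector,
                    negatives: Sequence[Vector], margin: float = 0.1) -> float:
    """Penalize every wrong candidate that scores within `margin` of the correct one."""
    s_pos = cosine(query, positive)
    return sum(max(0.0, margin - s_pos + cosine(query, neg)) for neg in negatives)


def dual_channel_rank_loss(query_a: Vector, query_b: Vector, positive: Vector,
                           negatives: Sequence[Vector], margin: float = 0.1) -> float:
    """ASSUMPTION (hypothetical): two independent query channels, hinge losses summed."""
    return (hinge_rank_loss(query_a, positive, negatives, margin)
            + hinge_rank_loss(query_b, positive, negatives, margin))


def pick_answer(query_a: Vector, query_b: Vector,
                candidates: Sequence[Vector]) -> int:
    """Return the index of the candidate with the highest summed similarity across channels."""
    scores = [cosine(query_a, c) + cosine(query_b, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])
```

At test time, `pick_answer` ranks all candidates and chooses the top one, matching the multiple-choice setting; during training, the loss is zero once the correct answer beats every distractor by at least `margin` in both channels.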

Pages: 409–421

Number of pages: 12

50 entries in total

[21] Fang, Jiannan; Sun, Lingling; Wang, Yaqi. Video Question Answering by Frame Attention. Eleventh International Conference on Digital Image Processing (ICDIP 2019), 2019, vol. 11179.
[22] Yang, Zekun; Garcia, Noa; Chu, Chenhui; Otani, Mayu; Nakashima, Yuta; Takemura, Haruo. BERT Representations for Video Question Answering. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1545–1554.
[23] Li, Yicong; Wang, Xiang; Xiao, Junbin; Ji, Wei; Chua, Tat-Seng. Invariant Grounding for Video Question Answering. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022, pp. 2918–2927.
[24] Ko, Dohwan; Lee, Ji Soo; Choi, Miso; Chu, Jaewon; Park, Jihwan; Kim, Hyunwoo J. Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 3078–3089.
[25] Zeng, Kuo-Hao; Chen, Tseng-Hung; Chuang, Ching-Yao; Liao, Yuan-Hong; Niebles, Juan Carlos; Sun, Min. Leveraging Video Descriptions to Learn Video Question Answering. Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 4334–4340.
[26] Graesser, A. C.; Hemphill, D. Question Answering in the Context of Scientific Mechanisms. Journal of Memory and Language, 1991, 30(2), pp. 186–209.
[27] Monz, C. Document Retrieval in the Context of Question Answering. Advances in Information Retrieval, 2003, vol. 2633, pp. 571–579.
[28] Graesser, A. C.; Magliano, J. P. Question Answering in the Context of Generic Concepts. Bulletin of the Psychonomic Society, 1990, 28(6), p. 527.
[29] Jia, Zhen; Abujabal, Abdalghani; Roy, Rishiraj Saha; Stroetgen, Jannik; Weikum, Gerhard. TempQuestions: A Benchmark for Temporal Question Answering. Companion Proceedings of the World Wide Web Conference 2018 (WWW 2018), 2018, pp. 1057–1062.
[30] Schockaert, Steven; Ahn, David; De Cock, Martine; Kerre, Etienne E. Question Answering with Imperfect Temporal Information. Flexible Query Answering Systems, Proceedings, 2006, vol. 4027, pp. 647–658.
