Uncovering the Temporal Context for Video Question Answering

Cited by: 1

Authors:

Linchao Zhu

Zhongwen Xu

Yi Yang

Alexander G. Hauptmann

Affiliations:

[1] University of Technology Sydney, CAI

[2] Carnegie Mellon University, SCS

Source:

International Journal of Computer Vision | 2017 / Vol. 124

Keywords:

Video sequence modeling; Video question answering; Video prediction; Cross-media

DOI:

Not available

Abstract:

In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder–decoder approach using Recurrent Neural Networks to learn the temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for finer understanding of video content using the question form of “fill-in-the-blank”, and collect our Video Context QA dataset consisting of 109,895 video clips with a total duration of more than 1000 h from existing TACoS, MPII-MD and MEDTest 14 datasets. In addition, 390,744 corresponding questions are generated from annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines.

Pages: 409-421

Page count: 12
