Uncovering the Temporal Context for Video Question Answering

Cited by: 1

Authors:

Linchao Zhu

Zhongwen Xu

Yi Yang

Alexander G. Hauptmann

Affiliations:

[1] University of Technology Sydney, CAI

[2] Carnegie Mellon University, SCS

Source:

International Journal of Computer Vision | 2017 / Vol. 124

Keywords:

Video sequence modeling; Video question answering; Video prediction; Cross-media;

DOI:

Not available

CLC number:

Subject classification code:

Abstract:

In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder–decoder approach using Recurrent Neural Networks to learn the temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for finer understanding of video content using the question form of “fill-in-the-blank”, and collect our Video Context QA dataset consisting of 109,895 video clips with a total duration of more than 1000 h from existing TACoS, MPII-MD and MEDTest 14 datasets. In addition, 390,744 corresponding questions are generated from annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines.


Pages: 409-421

Page count: 12


[1] Uncovering the Temporal Context for Video Question Answering
Zhu, Linchao
Xu, Zhongwen
Yang, Yi
Hauptmann, Alexander G.
[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 124 (03) : 409 - 421
[2] Spatio-Temporal Context Networks for Video Question Answering
Gao, Kun
Han, Yahong
[J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT II, 2018, 10736 : 108 - 118
[3] Video-Context Aligned Transformer for Video Question Answering
Zong, Linlin
Wan, Jiahui
Zhang, Xianchao
Liu, Xinyue
Liang, Wenxin
Xu, Bo
[J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19795 - 19803
[4] Video Question Answering with Spatio-Temporal Reasoning
Jang, Yunseok
Song, Yale
Kim, Chris Dongjoo
Yu, Youngjae
Kim, Youngjin
Kim, Gunhee
[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2019, 127 (10) : 1385 - 1412
[5] Video Question Answering with Spatio-Temporal Reasoning
Yunseok Jang
Yale Song
Chris Dongjoo Kim
Youngjae Yu
Youngjin Kim
Gunhee Kim
[J]. International Journal of Computer Vision, 2019, 127 : 1385 - 1412
[6] Discovering Spatio-Temporal Rationales for Video Question Answering
Li, Yicong
Xiao, Junbin
Feng, Chun
Wang, Xiang
Chua, Tat-Seng
[J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 13823 - 13832
[7] Spatio-Temporal Graph Convolution Transformer for Video Question Answering
Tang, Jiahao
Hu, Jianguo
Huang, Wenjun
Shen, Shengzhi
Pan, Jiakai
Wang, Deming
Ding, Yanyu
[J]. IEEE Access, 2024, 12 : 131664 - 131680
[8] Dynamic Spatio-Temporal Modular Network for Video Question Answering
Qian, Zi
Wang, Xin
Duan, Xuguang
Chen, Hong
Zhu, Wenwu
[J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4466 - 4477
[9] Harnessing Representative Spatial-Temporal Information for Video Question Answering
Wang, Yuanyuan
Liu, Meng
Song, Xuemeng
Nie, Liqiang
[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2024, 20 (10)
[10] Affective question answering on video
Ruwa, Nelson
Mao, Qirong
Wang, Liangjun
Gou, Jianping
[J]. NEUROCOMPUTING, 2019, 363 : 125 - 139
