Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering

Cited by: 0

Authors:

Mao, Jianguo [1,2]

Jiang, Wenbin [3]

Wang, Xiangdong [1]

Feng, Zhifan [3]

Lyu, Yajuan [3]

Liu, Hong [1]

Zhu, Yong [3]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Beijing Key Lab Mobile Comp & Pervas Device, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Baidu Inc, Beijing, Peoples R China

Source:

NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES | 2022

Funding:

Beijing Natural Science Foundation;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing video question answering (video QA) models lack the capacity for deep video understanding and flexible multistep reasoning. We propose a novel video QA model that performs dynamic multistep reasoning between questions and videos. It builds a semantic representation of the video from a video scene graph composed of the video's semantic elements and the semantic relations among them. It then performs multistep reasoning between the question and video representations to reach a better answer decision, and dynamically integrates the reasoning results. Experiments show that the proposed model has a significant advantage over previous methods in accuracy and interpretability. Compared with the existing state-of-the-art model, it achieves improvements of more than 4%/3.1%/2% on three widely used video QA datasets (MSRVTT-QA, MSRVTT multi-choice, and TGIF-QA, respectively), and it offers better interpretability by backtracing through the attention mechanisms to the video scene graphs.
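
The pipeline the abstract describes (attend over scene-graph elements for several steps, then dynamically weight the per-step results) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-hot node vectors, the additive query update, the softmax gate, and every name below (`attend`, `multistep_reason`, the toy triples) are assumptions made for illustration, and the relations in the triples are used only to collect nodes, not encoded.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, nodes):
    """One reasoning step: attention-weighted sum of the node vectors."""
    weights = softmax([dot(query, n) for n in nodes])
    summary = [sum(w * n[i] for w, n in zip(weights, nodes))
               for i in range(len(query))]
    return summary, weights

def multistep_reason(question, nodes, num_steps=3):
    """Hypothetical multistep loop with a softmax gate over the steps."""
    query = list(question)
    step_outputs, step_weights = [], []
    for _ in range(num_steps):
        summary, weights = attend(query, nodes)
        step_outputs.append(summary)
        step_weights.append(weights)
        # Assumed update rule: refine the query with what was retrieved.
        query = [q + s for q, s in zip(query, summary)]
    # Assumed "dynamic integration": gate each step by its match to the question.
    gates = softmax([dot(question, out) for out in step_outputs])
    fused = [sum(g * out[i] for g, out in zip(gates, step_outputs))
             for i in range(len(question))]
    return fused, gates, step_weights

# Toy scene graph as (subject, relation, object) triples; all values invented.
triples = [("man", "holds", "guitar"), ("man", "sits_on", "chair")]
node_names = sorted({e for s, _, o in triples for e in (s, o)})
node_vecs = [[1.0 if i == j else 0.0 for j in range(len(node_names))]
             for i in range(len(node_names))]

# Toy question vector that leans toward the "guitar" node.
question_vec = [0.0, 1.0, 0.2]
fused, gates, step_weights = multistep_reason(question_vec, node_vecs)

# Backtracing: read off which node the final step attended to most.
last = step_weights[-1]
top_node = node_names[max(range(len(node_names)), key=lambda i: last[i])]
```

Keeping the per-step attention weights is what makes the backtracing in the last two lines possible: each answer can be traced to the scene-graph nodes that received the most weight.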

Citation:

Pages: 3894-3904

Page count: 11
