Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering

Cited by: 0
Authors
Mao, Jianguo [1 ,2 ]
Jiang, Wenbin [3 ]
Wang, Xiangdong [1 ]
Feng, Zhifan [3 ]
Lyu, Yajuan [3 ]
Liu, Hong [1 ]
Zhu, Yong [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing Key Lab Mobile Comp & Pervas Device, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Baidu Inc, Beijing, Peoples R China
Funding
Beijing Natural Science Foundation
Keywords: (none listed)
DOI: N/A
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Existing video question answering (video QA) models lack the capacity for deep video understanding and flexible multistep reasoning. We propose a novel video QA model that performs dynamic multistep reasoning between questions and videos. It builds a semantic representation of the video from a video scene graph, which is composed of the video's semantic elements and the semantic relations among them. It then performs multistep reasoning between the question and video representations for better answer decisions, and dynamically integrates the reasoning results. Experiments show that the proposed model significantly outperforms previous methods in both accuracy and interpretability. Against the existing state-of-the-art model, it improves accuracy by more than 4%, 3.1%, and 2% on three widely used video QA datasets (MSRVTT-QA, MSRVTT multi-choice, and TGIF-QA, respectively), and offers better interpretability by backtracking through the attention mechanisms to the video scene graph.
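As a rough illustration of the idea described in the abstract (not the authors' actual implementation), the reasoning loop can be sketched as follows: at each step, the current question state attends over scene-graph node features, and the per-step results are then fused by a dynamic gate. The attention, state update, and gating here are deliberately simplified stand-ins, and all function and variable names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def multistep_reasoning(question, graph_nodes, n_steps=3):
    """Sketch of dynamic multistep reasoning over scene-graph features.

    question:    (d,) question embedding
    graph_nodes: (N, d) features of scene-graph nodes (elements/relations)
    Returns a fused (d,) answer feature.
    """
    state = question.copy()
    step_outputs, step_scores = [], []
    for _ in range(n_steps):
        # Attend over graph nodes conditioned on the current reasoning state.
        att = softmax(graph_nodes @ state)           # (N,) attention weights
        context = att @ graph_nodes                  # (d,) attended graph context
        state = 0.5 * (state + context)              # simple recurrent state update
        step_outputs.append(context)
        # Score how relevant this step's state is to the original question.
        step_scores.append(float(state @ question))
    # Dynamic integration: weight each step's result by its relevance score.
    gate = softmax(np.array(step_scores))            # (n_steps,)
    return gate @ np.stack(step_outputs)             # fused (d,) feature
```

The per-node attention weights at each step are what would allow the backtracking-based interpretability mentioned in the abstract: high-weight nodes identify which scene-graph elements drove each reasoning step.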
Pages: 3894-3904 (11 pages)