Meta Spatio-Temporal Debiasing for Video Scene Graph Generation

Cited by: 15
Authors
Xu, Li [1]
Qu, Haoxuan [1]
Kuen, Jason [2]
Gu, Jiuxiang [2]
Liu, Jun [1]
Affiliations
[1] Singapore Univ Technol & Design, Singapore, Singapore
[2] Adobe Res, San Jose, CA USA
Source
Funding
National Research Foundation, Singapore
Keywords
VidSGG; Long-tailed bias; Meta learning;
DOI
10.1007/978-3-031-19812-0_22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video scene graph generation (VidSGG) aims to parse video content into scene graphs, which involves modeling the spatio-temporal contextual information in the video. However, because the training data in existing datasets is long-tailed, the generalization performance of existing VidSGG models can be affected by the spatio-temporal conditional bias problem. In this work, from the perspective of meta-learning, we propose a novel Meta Video Scene Graph Generation (MVSGG) framework to address this bias problem. Specifically, to handle various types of spatio-temporal conditional biases, our framework first constructs a support set and a group of query sets from the training data, where the data distribution of each query set differs from that of the support set w.r.t. one type of conditional bias. Then, through a novel meta training and testing process that optimizes the model to achieve good testing performance on these query sets after training on the support set, our framework can effectively guide the model to learn to generalize well against biases. Extensive experiments demonstrate the efficacy of our proposed framework.
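The meta training and testing loop described in the abstract can be illustrated with a toy sketch. This is not the authors' MVSGG implementation: it assumes a scalar linear model, a mean-squared-error loss, a first-order approximation of the meta gradient, and hand-made support and query sets whose input ranges differ. The names `mse_grad`, `meta_step`, `alpha`, and `beta` are hypothetical and chosen only for illustration.

```python
# Toy sketch of one meta train/test loop (assumptions: scalar model
# y_hat = w * x, MSE loss, first-order meta gradient). Not the MVSGG code.

def mse_grad(w, data):
    """Gradient of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)

def meta_step(w, support, query_sets, alpha=0.05, beta=0.05):
    """One meta-optimization step over a support set and several query sets."""
    # Meta training: take a virtual gradient step on the support set.
    g_support = mse_grad(w, support)
    w_virtual = w - alpha * g_support
    # Meta testing: measure how the virtually updated model behaves on
    # query sets whose distribution differs from the support set.
    g_query = sum(mse_grad(w_virtual, q) for q in query_sets) / len(query_sets)
    # Meta update: combine both signals. Using the query gradient at
    # w_virtual directly is a first-order approximation.
    return w - beta * (g_support + g_query)

# Hypothetical data generated from y = 3 * x. The support set covers a
# mid-range of inputs; each query set covers a different range.
support = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5)]
query_sets = [
    [(x, 3.0 * x) for x in (2.0, 2.5)],
    [(x, 3.0 * x) for x in (0.1, 0.2)],
]

w = 0.0
for _ in range(200):
    w = meta_step(w, support, query_sets)
```

In this toy setting every set shares the same underlying relation, so the loop simply converges to it. The point is only the order of operations: a virtual update on the support set, evaluation on shifted query sets, and then a real update that uses both.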
Pages: 374-390
Number of pages: 17