Target Adaptive Context Aggregation for Video Scene Graph Generation

Cited by: 36
Authors
Teng, Yao [1]
Wang, Limin [1]
Li, Zhifeng [2]
Wu, Gangshan [1]
Affiliations
[1] Nanjing Univ China, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
DOI
10.1109/ICCV48922.2021.01343
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the challenging task of video scene graph generation (VidSGG), which can serve as a structured video representation for high-level understanding tasks. We present a new detect-to-track paradigm for this task that decouples the context modeling for relation prediction from the complicated low-level entity tracking. Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), which focuses on capturing spatio-temporal context information for relation recognition. Our TRACE framework streamlines the VidSGG pipeline with a modular design and introduces two distinctive components: Hierarchical Relation Tree (HRTree) construction and Target-adaptive Context Aggregation. More specifically, the HRTree first provides an adaptive structure for efficiently organizing possible relation candidates, and it guides the context aggregation module to effectively capture spatio-temporal structure information. Then, we obtain a contextualized feature representation for each relation candidate and build a classification head to recognize its relation category. Finally, we provide a simple temporal association strategy that tracks the frame-level results detected by TRACE to yield video-level VidSGG. We perform experiments on two VidSGG benchmarks, ImageNet-VidVRD and Action Genome, and the results demonstrate that TRACE achieves state-of-the-art performance. The code and models are available at https://github.com/MCG-NJU/TRACE.
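The abstract only states that TRACE's frame-level outputs are linked over time by "a simple temporal association strategy". The Python sketch below illustrates one generic way such linking could work; it is not the paper's method or the released code. It assumes that each frame yields labelled (subject, predicate, object) triplets with subject and object boxes, and it greedily extends a relation track when the next frame contains a triplet with the same labels whose subject and object boxes both overlap the previous ones by at least an IoU threshold. The names FrameRelation, VideoRelation, associate() and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class FrameRelation:
    """Hypothetical: one (subject, predicate, object) triplet detected in one frame."""
    subj: str
    pred: str
    obj: str
    subj_box: Box
    obj_box: Box
    score: float


@dataclass
class VideoRelation:
    """Hypothetical: a relation instance linked across consecutive frames."""
    subj: str
    pred: str
    obj: str
    start: int  # first frame index
    end: int    # last frame index (inclusive)
    scores: List[float] = field(default_factory=list)
    last: Optional[FrameRelation] = None  # most recent matched detection


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(frames: List[List[FrameRelation]], iou_thr: float = 0.5) -> List[VideoRelation]:
    """Greedily link frame-level triplets into video-level relation tracks (assumed scheme)."""
    finished: List[VideoRelation] = []
    active: List[VideoRelation] = []
    for t, dets in enumerate(frames):
        # Sorting by score means that, on equal overlap, the higher-scoring detection wins.
        unmatched = sorted(dets, key=lambda d: d.score, reverse=True)
        next_active: List[VideoRelation] = []
        for track in active:
            best, best_overlap = None, 0.0
            for d in unmatched:
                if (d.subj, d.pred, d.obj) != (track.subj, track.pred, track.obj):
                    continue  # labels must agree exactly
                # Both subject and object must overlap their boxes in the previous frame.
                overlap = min(iou(d.subj_box, track.last.subj_box),
                              iou(d.obj_box, track.last.obj_box))
                if overlap >= iou_thr and (best is None or overlap > best_overlap):
                    best, best_overlap = d, overlap
            if best is None:
                finished.append(track)  # no continuation in frame t: close the track
            else:
                unmatched.remove(best)
                track.end, track.last = t, best
                track.scores.append(best.score)
                next_active.append(track)
        # Every detection left unmatched starts a new track at frame t.
        for d in unmatched:
            next_active.append(VideoRelation(d.subj, d.pred, d.obj, t, t, [d.score], d))
        active = next_active
    return finished + active


if __name__ == "__main__":
    ride = ("person", "ride", "bicycle")
    frames = [
        [FrameRelation(*ride, (0, 0, 10, 10), (0, 5, 10, 15), 0.9)],
        [FrameRelation(*ride, (1, 0, 11, 10), (1, 5, 11, 15), 0.8)],
        [],  # relation missed in this frame, so the first track closes
        [FrameRelation(*ride, (1, 0, 11, 10), (1, 5, 11, 15), 0.7)],
    ]
    for r in associate(frames):
        print(r.subj, r.pred, r.obj, r.start, r.end)
    # prints "person ride bicycle 0 1" then "person ride bicycle 3 3"
```

Greedy per-track matching keeps the sketch short. A global assignment (for example with scipy.optimize.linear_sum_assignment) or a tolerance for short detection gaps would be natural refinements, and the repository linked above is the authority on what TRACE actually does.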
Pages: 13668 - 13677
Number of pages: 10
Related papers
50 entries in total
  • [31] Unconditional Scene Graph Generation
    Garg, Sarthak
    Dhamo, Helisa
    Farshad, Azade
    Musatian, Sabrina
    Navab, Nassir
    Tombari, Federico
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 16342 - 16351
  • [32] Iterative Scene Graph Generation
    Khandelwal, Siddhesh
    Sigal, Leonid
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [33] Panoptic Scene Graph Generation
    Yang, Jingkang
    Ang, Yi Zhe
    Guo, Zujin
    Zhou, Kaiyang
    Zhang, Wayne
    Liu, Ziwei
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 178 - 196
  • [34] Video scene clustering by graph partitioning
    Tan, YP
    Lu, H
    ELECTRONICS LETTERS, 2003, 39 (11) : 841 - 842
  • [35] Graph neural network for fraud detection via context encoding and adaptive aggregation
    Lou, Chaoli
    Wang, Yueyang
    Li, Jianing
    Qian, Yueru
    Li, Xiuhua
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 261
  • [36] Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation
    Wang, Wenqing
    Gao, Kaifeng
    Luo, Yawei
    Jiang, Tao
    Gao, Fei
    Shao, Jian
    Sun, Jianwen
    Xiao, Jun
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 5153 - 5163
  • [37] Video event description in scene context
    Liu, Chunmei
    Hu, Changbo
    Liu, Qingshan
    Aggarwal, J. K.
    NEUROCOMPUTING, 2013, 119 : 82 - 93
  • [38] Beware of Overcorrection: Scene-induced Commonsense Graph for Scene Graph Generation
    Chen, Lianggangxu
    Lu, Jiale
    Song, Youqi
    Wang, Changbo
    He, Gaoqi
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 2888 - 2897
  • [39] Context-aware Scene Graph Generation with Seq2Seq Transformers
    Lu, Yichao
    Rai, Himanshu
    Chang, Jason
    Knyazev, Boris
    Yu, Guangwei
    Shekhar, Shashank
    Taylor, Graham W.
    Volkovs, Maksims
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 15911 - 15921
  • [40] Multimodal graph inference network for scene graph generation
    Duan, Jingwen
    Min, Weidong
    Lin, Deyu
    Xu, Jianfeng
    Xiong, Xin
    APPLIED INTELLIGENCE, 2021, 51 : 8768 - 8783