Target Adaptive Context Aggregation for Video Scene Graph Generation

Cited by: 36
Authors
Teng, Yao [1 ]
Wang, Limin [1 ]
Li, Zhifeng [2 ]
Wu, Gangshan [1 ]
Affiliations
[1] Nanjing Univ China, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICCV48922.2021.01343
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper deals with the challenging task of video scene graph generation (VidSGG), which can serve as a structured video representation for high-level understanding tasks. We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking. Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), with a focus on capturing spatio-temporal context information for relation recognition. Our TRACE framework streamlines the VidSGG pipeline with a modular design, and presents two unique blocks: Hierarchical Relation Tree (HRTree) construction and Target-adaptive Context Aggregation. More specifically, our HRTree first provides an adaptive structure for organizing possible relation candidates efficiently, and guides the context aggregation module to effectively capture spatio-temporal structure information. Then, we obtain a contextualized feature representation for each relation candidate and build a classification head to recognize its relation category. Finally, we provide a simple temporal association strategy to track the frame-level results detected by TRACE, yielding video-level VidSGG. We perform experiments on two VidSGG benchmarks, ImageNet-VidVRD and Action Genome, and the results demonstrate that our TRACE achieves state-of-the-art performance. The code and models are made available at https://github.com/MCG-NJU/TRACE.
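The detect-to-track paradigm described above first predicts relations per frame and then links them over time. As a toy illustration only (not the paper's actual implementation), the sketch below links identical (subject, predicate, object) triplets across consecutive frames into video-level relation tracks; the match-by-identical-triplet rule and the `associate_relations` function name are assumptions for illustration:

```python
def associate_relations(frame_relations):
    """Greedily link identical (subject, predicate, object) triplets
    across consecutive frames into video-level relation tracks.

    frame_relations: list over frames; each frame is a list of
    (subject, predicate, object) label triplets.
    Returns a list of tracks: (triplet, start_frame, end_frame).
    """
    active = {}   # triplet -> start frame of its current run
    tracks = []
    for t, rels in enumerate(frame_relations):
        rels = set(rels)
        # close runs whose triplet disappeared in this frame
        for trip in list(active):
            if trip not in rels:
                tracks.append((trip, active.pop(trip), t - 1))
        # open runs for triplets that just appeared
        for trip in rels:
            active.setdefault(trip, t)
    # close runs still open at the last frame
    last = len(frame_relations) - 1
    for trip, start in active.items():
        tracks.append((trip, start, last))
    return tracks
```

For example, a "person rides bike" triplet detected in frames 0-1 and a "dog chases bike" triplet detected in frames 1-2 would be merged into two video-level tracks with those temporal extents. Real systems additionally require the entity boxes to overlap between frames, which this sketch omits.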
Pages: 13668-13677
Page count: 10
Related Papers
50 records
  • [41] Multimodal graph inference network for scene graph generation
    Duan, Jingwen
    Min, Weidong
    Lin, Deyu
    Xu, Jianfeng
    Xiong, Xin
    APPLIED INTELLIGENCE, 2021, 51 (12) : 8768 - 8783
  • [42] Graph R-CNN for Scene Graph Generation
    Yang, Jianwei
    Lu, Jiasen
    Lee, Stefan
    Batra, Dhruv
    Parikh, Devi
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 690 - 706
  • [43] Scene Graph Generation: A comprehensive survey
    Li, Hongsheng
    Zhu, Guangming
    Zhang, Liang
    Jiang, Youliang
    Dang, Yixuan
    Hou, Haoran
    Shen, Peiyi
    Zhao, Xia
    Shah, Syed Afaq Ali
    Bennamoun, Mohammed
    NEUROCOMPUTING, 2024, 566
  • [44] Relation Regularized Scene Graph Generation
    Guo, Yuyu
    Gao, Lianli
    Song, Jingkuan
    Wang, Peng
    Sebe, Nicu
    Shen, Heng Tao
    Li, Xuelong
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (07) : 5961 - 5972
  • [45] Unbiased Scene Graph Generation in Videos
    Nag, Sayak
    Min, Kyle
    Tripathi, Subarna
    Roy-Chowdhury, Amit K.
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22803 - 22813
  • [46] Fully Convolutional Scene Graph Generation
    Liu, Hengyue
    Yan, Ning
    Mortazavi, Masood
    Bhanu, Bir
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11541 - 11551
  • [47] Review on scene graph generation methods
    Monesh, S.
    Senthilkumar, N. C.
    MULTIAGENT AND GRID SYSTEMS, 2024, 20 (02) : 129 - 160
  • [48] Dynamic Scene Graph Representation for Surgical Video
    Holm, Felix
    Ghazaei, Ghazal
    Czempiel, Tobias
    Oezsoy, Ege
    Saur, Stefan
    Navab, Nassir
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 81 - 87
  • [49] Video summarization and scene detection by graph modeling
    Ngo, CW
    Ma, YF
    Zhang, HJ
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2005, 15 (02) : 296 - 305
  • [50] Scene Video Text Tracking With Graph Matching
    Pei, Wei-Yi
    Yang, Chun
    Meng, Li-Yu
    Hou, Jie-Bo
    Tian, Shu
    Yin, Xu-Cheng
    IEEE ACCESS, 2018, 6 : 19419 - 19426