Target Adaptive Context Aggregation for Video Scene Graph Generation

Cited by: 36
Authors
Teng, Yao [1]
Wang, Limin [1]
Li, Zhifeng [2]
Wu, Gangshan [1]
Affiliations
[1] Nanjing Univ China, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV48922.2021.01343
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenging task of video scene graph generation (VidSGG), whose output can serve as a structured video representation for high-level understanding tasks. We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking. Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), with a focus on capturing spatio-temporal context information for relation recognition. Our TRACE framework streamlines the VidSGG pipeline with a modular design and presents two unique blocks: Hierarchical Relation Tree (HRTree) construction and Target-adaptive Context Aggregation. More specifically, our HRTree first provides an adaptive structure for organizing possible relation candidates efficiently, and guides the context aggregation module to effectively capture spatio-temporal structure information. Then, we obtain a contextualized feature representation for each relation candidate and build a classification head to recognize its relation category. Finally, we provide a simple temporal association strategy that tracks the results detected by TRACE to yield video-level VidSGG. We perform experiments on two VidSGG benchmarks, ImageNet-VidVRD and Action Genome, and the results demonstrate that TRACE achieves state-of-the-art performance. The code and models are made available at https://github.com/MCG-NJU/TRACE.
Pages: 13668-13677
Number of pages: 10