Target Adaptive Context Aggregation for Video Scene Graph Generation

Cited by: 36
Authors
Teng, Yao [1 ]
Wang, Limin [1 ]
Li, Zhifeng [2 ]
Wu, Gangshan [1 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV48922.2021.01343
CLC Classification
TP18 (Theory of Artificial Intelligence)
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenging task of video scene graph generation (VidSGG), which can serve as a structured video representation for high-level understanding tasks. We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking. Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), with a focus on capturing spatio-temporal context information for relation recognition. Our TRACE framework streamlines the VidSGG pipeline with a modular design and introduces two unique blocks: Hierarchical Relation Tree (HRTree) construction and target-adaptive context aggregation. More specifically, the HRTree first provides an adaptive structure for organizing possible relation candidates efficiently, and guides the context aggregation module to effectively capture spatio-temporal structure information. We then obtain a contextualized feature representation for each relation candidate and build a classification head to recognize its relation category. Finally, we provide a simple temporal association strategy that links TRACE's frame-level detections into video-level VidSGG results. We perform experiments on two VidSGG benchmarks, ImageNet-VidVRD and Action Genome, and the results demonstrate that TRACE achieves state-of-the-art performance. The code and models are available at https://github.com/MCG-NJU/TRACE.
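The detect-to-track paradigm in the abstract can be pictured as two stages: relations are first detected independently in each frame, then associated across time into video-level relation tracks. The sketch below illustrates only the second stage with a simple greedy IoU-based linking rule; the function names, data layout, and the association heuristic are illustrative assumptions, not the TRACE implementation (TRACE's own association strategy is described in the paper).

```python
# Illustrative sketch of temporal association for frame-level relation
# detections (detect-to-track). NOT the TRACE algorithm: the greedy
# IoU-overlap linking rule and all names here are assumptions.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def link_relations(frames, iou_thresh=0.5):
    """Greedily link per-frame (subject_box, predicate, object_box)
    triplets into video-level tracks: a detection extends a track when
    the track ended in the previous frame with the same predicate and
    both subject and object boxes overlap above the threshold."""
    tracks = []  # each track: list of (frame_index, detection)
    for t, detections in enumerate(frames):
        for det in detections:
            sbox, pred, obox = det
            for track in tracks:
                last_t, (ls, lp, lo) = track[-1]
                if (last_t == t - 1 and lp == pred
                        and iou(ls, sbox) >= iou_thresh
                        and iou(lo, obox) >= iou_thresh):
                    track.append((t, det))
                    break
            else:
                tracks.append([(t, det)])  # start a new relation track
    return tracks
```

Decoupling the two stages this way is what lets the relation-recognition model stay frame-level: the temporal association step only consumes its outputs and never feeds back into context modeling.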
Pages: 13668-13677 (10 pages)