Target Adaptive Context Aggregation for Video Scene Graph Generation

Cited by: 36

Authors:

Teng, Yao [1]

Wang, Limin [1]

Li, Zhifeng [2]

Wu, Gangshan [1]

Affiliations:

[1] Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing, People's Republic of China

[2] Tencent AI Lab, Shenzhen, People's Republic of China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.1109/ICCV48922.2021.01343

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper addresses the challenging task of video scene graph generation (VidSGG), which could serve as a structured video representation for high-level understanding tasks. We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking. Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), with a focus on capturing spatio-temporal context information for relation recognition. Our TRACE framework streamlines the VidSGG pipeline with a modular design and presents two unique blocks: Hierarchical Relation Tree (HRTree) construction and Target-adaptive Context Aggregation. More specifically, our HRTree first provides an adaptive structure for organizing possible relation candidates efficiently, and guides the context aggregation module to effectively capture spatio-temporal structure information. Then, we obtain a contextualized feature representation for each relation candidate and build a classification head to recognize its relation category. Finally, we provide a simple temporal association strategy that tracks the results detected by TRACE to yield the video-level VidSGG. We perform experiments on two VidSGG benchmarks, ImageNet-VidVRD and Action Genome, and the results demonstrate that our TRACE achieves state-of-the-art performance. The code and models are made available at https://github.com/MCG-NJU/TRACE.

Pages: 13668-13677

Page count: 10

