Target Adaptive Context Aggregation for Video Scene Graph Generation

Cited by: 36
Authors
Teng, Yao [1 ]
Wang, Limin [1 ]
Li, Zhifeng [2 ]
Wu, Gangshan [1 ]
Affiliations
[1] Nanjing Univ China, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICCV48922.2021.01343
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper deals with the challenging task of video scene graph generation (VidSGG), which can serve as a structured video representation for high-level understanding tasks. We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking. Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), with a focus on capturing spatio-temporal context information for relation recognition. Our TRACE framework streamlines the VidSGG pipeline with a modular design, and presents two unique blocks: Hierarchical Relation Tree (HRTree) construction and Target-adaptive Context Aggregation. More specifically, our HRTree first provides an adaptive structure for organizing possible relation candidates efficiently, and guides the context aggregation module to effectively capture spatio-temporal structure information. Then, we obtain a contextualized feature representation for each relation candidate and build a classification head to recognize its relation category. Finally, we provide a simple temporal association strategy to track the frame-level results detected by TRACE, yielding video-level VidSGG. We perform experiments on two VidSGG benchmarks, ImageNet-VidVRD and Action Genome, and the results demonstrate that our TRACE achieves state-of-the-art performance. The code and models are made available at https://github.com/MCG-NJU/TRACE.
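The detect-to-track paradigm described above first predicts relations per frame and then links them over time. As a toy illustration only (not the paper's actual implementation), the sketch below links identical (subject, predicate, object) triplets across consecutive frames into video-level relation tracks; the match-by-identical-triplet rule and the `associate_relations` function name are assumptions for illustration:

```python
def associate_relations(frame_relations):
    """Greedily link identical (subject, predicate, object) triplets
    across consecutive frames into video-level relation tracks.

    frame_relations: list over frames; each frame is a list of
    (subject, predicate, object) label triplets.
    Returns a list of tracks: (triplet, start_frame, end_frame).
    """
    active = {}   # triplet -> start frame of its current run
    tracks = []
    for t, rels in enumerate(frame_relations):
        rels = set(rels)
        # close runs whose triplet disappeared in this frame
        for trip in list(active):
            if trip not in rels:
                tracks.append((trip, active.pop(trip), t - 1))
        # open runs for triplets that just appeared
        for trip in rels:
            active.setdefault(trip, t)
    # close runs still open at the last frame
    last = len(frame_relations) - 1
    for trip, start in active.items():
        tracks.append((trip, start, last))
    return tracks
```

For example, a "person rides bike" triplet detected in frames 0-1 and a "dog chases bike" triplet detected in frames 1-2 would be merged into two video-level tracks with those temporal extents. Real systems additionally require the entity boxes to overlap between frames, which this sketch omits.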
Pages: 13668-13677
Page count: 10
Related Papers
50 records
  • [41] Multimodal graph inference network for scene graph generation
    Duan, Jingwen
    Min, Weidong
    Lin, Deyu
    Xu, Jianfeng
    Xiong, Xin
    APPLIED INTELLIGENCE, 2021, 51 (12) : 8768 - 8783
  • [42] Graph R-CNN for Scene Graph Generation
    Yang, Jianwei
    Lu, Jiasen
    Lee, Stefan
    Batra, Dhruv
    Parikh, Devi
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 690 - 706
  • [43] Scene Graph Generation: A comprehensive survey
    Li, Hongsheng
    Zhu, Guangming
    Zhang, Liang
    Jiang, Youliang
    Dang, Yixuan
    Hou, Haoran
    Shen, Peiyi
    Zhao, Xia
    Shah, Syed Afaq Ali
    Bennamoun, Mohammed
    NEUROCOMPUTING, 2024, 566
  • [44] Relation Regularized Scene Graph Generation
    Guo, Yuyu
    Gao, Lianli
    Song, Jingkuan
    Wang, Peng
    Sebe, Nicu
    Shen, Heng Tao
    Li, Xuelong
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (07) : 5961 - 5972
  • [45] Unbiased Scene Graph Generation in Videos
    Nag, Sayak
    Min, Kyle
    Tripathi, Subarna
    Roy-Chowdhury, Amit K.
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22803 - 22813
  • [46] Fully Convolutional Scene Graph Generation
    Liu, Hengyue
    Yan, Ning
    Mortazavi, Masood
    Bhanu, Bir
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11541 - 11551
  • [47] Review on scene graph generation methods
    Monesh, S.
    Senthilkumar, N. C.
    MULTIAGENT AND GRID SYSTEMS, 2024, 20 (02) : 129 - 160
  • [48] Dynamic Scene Graph Representation for Surgical Video
    Holm, Felix
    Ghazaei, Ghazal
    Czempiel, Tobias
    Oezsoy, Ege
    Saur, Stefan
    Navab, Nassir
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 81 - 87
  • [49] Video summarization and scene detection by graph modeling
    Ngo, CW
    Ma, YF
    Zhang, HJ
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2005, 15 (02) : 296 - 305
  • [50] Scene Video Text Tracking With Graph Matching
    Pei, Wei-Yi
    Yang, Chun
    Meng, Li-Yu
    Hou, Jie-Bo
    Tian, Shu
    Yin, Xu-Cheng
    IEEE ACCESS, 2018, 6 : 19419 - 19426