Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation

Cited by: 58
Authors
Dong, Xingning [1 ]
Gan, Tian [1 ]
Song, Xuemeng [1 ]
Wu, Jianlong [1 ]
Cheng, Yuan [2 ]
Nie, Liqiang [1 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Ant Grp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
COMPRESSION;
DOI
10.1109/CVPR52688.2022.01882
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene Graph Generation, which generally follows a regular encoder-decoder pipeline, aims to first encode the visual contents within the given image and then parse them into a compact summary graph. Existing SGG approaches not only suffer from insufficient modality fusion between vision and language, but also fail to provide informative predicates due to biased relationship predictions, leaving SGG far from practical. Towards this end, we first present a novel Stacked Hybrid-Attention network, which facilitates both intra-modal refinement and inter-modal interaction, to serve as the encoder. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder. In particular, based on the observation that the recognition capability of a single classifier is limited on an extremely unbalanced dataset, we first deploy a group of classifiers that are expert in distinguishing different subsets of classes, and then cooperatively optimize them from two aspects to promote unbiased SGG. Experiments conducted on the VG and GQA datasets demonstrate that we not only establish a new state of the art in the unbiased metric, but also nearly double the performance compared with two baselines. Our code is available at https://github.com/dongxingning/SHA-GCL-for-SGG.
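The core idea of Group Collaborative Learning — a group of classifiers, each expert on a different subset of a long-tailed predicate distribution — can be illustrated with a minimal sketch. The snippet below is an assumption-laden reading of the abstract, not the authors' implementation: the cumulative frequency-based grouping, the function name `make_class_groups`, and the `num_groups` parameter are all hypothetical choices for illustration.

```python
def make_class_groups(class_counts, num_groups=5):
    """Partition predicate classes, sorted by training frequency, into
    progressively larger subsets, one subset per classifier.

    class_counts: dict mapping class name -> number of training samples.
    Returns a list of num_groups class subsets, where subset k contains
    all classes of subsets 0..k-1 plus the next (rarer) chunk.
    """
    # Sort class ids from most to least frequent.
    order = sorted(class_counts, key=class_counts.get, reverse=True)
    # Split the sorted list into num_groups contiguous chunks (ceil division).
    chunk = -(-len(order) // num_groups)
    chunks = [order[i * chunk:(i + 1) * chunk] for i in range(num_groups)]
    # Classifier k is assigned the union of chunks 0..k, so the first
    # classifier sees only the frequent head classes, while later
    # classifiers gradually take on the long tail as well.
    subsets, seen = [], []
    for c in chunks:
        seen = seen + c
        subsets.append(list(seen))
    return subsets
```

Under this sketch, each returned subset would drive one classification head, and the heads could then be optimized cooperatively (e.g. the broader heads distilling knowledge toward the tail classes); the exact cooperative objectives are described in the paper itself.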
Pages: 19405 - 19414
Number of pages: 10
Related Papers
50 items in total
  • [21] Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
    Li, Rongjie
    Zhang, Songyang
    Wan, Bo
    He, Xuming
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11104 - 11114
  • [22] Semantic Diversity-Aware Prototype-Based Learning for Unbiased Scene Graph Generation
    Jeon, Jaehyeong
    Kim, Kibum
    Yoon, Kanghoon
    Park, Chanyoung
    COMPUTER VISION - ECCV 2024, PT XXXVII, 2025, 15095 : 379 - 395
  • [23] Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
    Jeon, Jaehyeong
    Kim, Kibum
    Yoon, Kanghoon
    Park, Chanyoung
    arXiv,
  • [24] Local context attention learning for fine-grained scene graph generation
    Zhu, Xuhan
    Wang, Ruiping
    Lan, Xiangyuan
    Wang, Yaowei
    PATTERN RECOGNITION, 2024, 156
  • [25] Heterogeneous Learning for Scene Graph Generation
    He, Yunqing
    Ren, Tongwei
    Tang, Jinhui
    Wu, Gangshan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4704 - 4713
  • [26] Unbiased scene graph generation using the self-distillation method
    Sun, Bo
    Hao, Zhuo
    Yu, Lejun
    He, Jun
    VISUAL COMPUTER, 2024, 40 (04): : 2381 - 2390
  • [27] Weakly-supervised Video Scene Graph Generation via Unbiased Cross-modal Learning
    Wu, Ziyue
    Gao, Junyu
    Xu, Changsheng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4574 - 4583
  • [28] Taking a Closer Look At Visual Relation: Unbiased Video Scene Graph Generation With Decoupled Label Learning
    Wang, Wenqing
    Luo, Yawei
    Chen, Zhiqing
    Jiang, Tao
    Yang, Yi
    Xiao, Jun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5718 - 5728
  • [29] TEMPLATE-GUIDED DATA AUGMENTATION FOR UNBIASED SCENE GRAPH GENERATION
    Zang, Yujie
    Li, Yaochen
    Cao, Luguang
    Lu, Ruitao
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 3510 - 3514
  • [30] Knowledge-Enhanced Context Representation for Unbiased Scene Graph Generation
    Wang, Yuanlong
    Liu, Zhenqi
    Zhang, Hu
    Li, Ru
    WEB AND BIG DATA, APWEB-WAIM 2024, PT I, 2024, 14961 : 248 - 263