Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation

Cited by: 58
Authors
Dong, Xingning [1 ]
Gan, Tian [1 ]
Song, Xuemeng [1 ]
Wu, Jianlong [1 ]
Cheng, Yuan [2 ]
Nie, Liqiang [1 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Ant Grp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
COMPRESSION;
DOI
10.1109/CVPR52688.2022.01882
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene Graph Generation, which generally follows a regular encoder-decoder pipeline, aims to first encode the visual contents within the given image and then parse them into a compact summary graph. Existing SGG approaches not only suffer from insufficient modality fusion between vision and language, but also fail to provide informative predicates due to biased relationship predictions, leaving SGG far from practical. Towards this end, we first present a novel Stacked Hybrid-Attention network, which facilitates both intra-modal refinement and inter-modal interaction, to serve as the encoder. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder. In particular, based on the observation that the recognition capability of a single classifier is limited on an extremely unbalanced dataset, we first deploy a group of classifiers that are expert in distinguishing different subsets of classes, and then cooperatively optimize them from two aspects to promote unbiased SGG. Experiments conducted on the VG and GQA datasets demonstrate that we not only establish a new state of the art in the unbiased metric, but also nearly double the performance compared with two baselines. Our code is available at https://github.com/dongxingning/SHA-GCL-for-SGG.
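The core idea of Group Collaborative Learning — a group of classifiers, each expert on a different subset of a long-tailed predicate distribution — can be illustrated with a minimal sketch. The snippet below is an assumption-laden reading of the abstract, not the authors' implementation: the cumulative frequency-based grouping, the function name `make_class_groups`, and the `num_groups` parameter are all hypothetical choices for illustration.

```python
def make_class_groups(class_counts, num_groups=5):
    """Partition predicate classes, sorted by training frequency, into
    progressively larger subsets, one subset per classifier.

    class_counts: dict mapping class name -> number of training samples.
    Returns a list of num_groups class subsets, where subset k contains
    all classes of subsets 0..k-1 plus the next (rarer) chunk.
    """
    # Sort class ids from most to least frequent.
    order = sorted(class_counts, key=class_counts.get, reverse=True)
    # Split the sorted list into num_groups contiguous chunks (ceil division).
    chunk = -(-len(order) // num_groups)
    chunks = [order[i * chunk:(i + 1) * chunk] for i in range(num_groups)]
    # Classifier k is assigned the union of chunks 0..k, so the first
    # classifier sees only the frequent head classes, while later
    # classifiers gradually take on the long tail as well.
    subsets, seen = [], []
    for c in chunks:
        seen = seen + c
        subsets.append(list(seen))
    return subsets
```

Under this sketch, each returned subset would drive one classification head, and the heads could then be optimized cooperatively (e.g. the broader heads distilling knowledge toward the tail classes); the exact cooperative objectives are described in the paper itself.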
Pages: 19405 - 19414
Number of pages: 10
Related Papers
50 items in total
  • [21] Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
    Li, Rongjie
    Zhang, Songyang
    Wan, Bo
    He, Xuming
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11104 - 11114
  • [22] Semantic Diversity-Aware Prototype-Based Learning for Unbiased Scene Graph Generation
    Jeon, Jaehyeong
    Kim, Kibum
    Yoon, Kanghoon
    Park, Chanyoung
    COMPUTER VISION - ECCV 2024, PT XXXVII, 2025, 15095 : 379 - 395
  • [23] Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
    Jeon, Jaehyeong
    Kim, Kibum
    Yoon, Kanghoon
    Park, Chanyoung
    arXiv,
  • [24] Local context attention learning for fine-grained scene graph generation
    Zhu, Xuhan
    Wang, Ruiping
    Lan, Xiangyuan
    Wang, Yaowei
    PATTERN RECOGNITION, 2024, 156
  • [25] Heterogeneous Learning for Scene Graph Generation
    He, Yunqing
    Ren, Tongwei
    Tang, Jinhui
    Wu, Gangshan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4704 - 4713
  • [26] Unbiased scene graph generation using the self-distillation method
    Sun, Bo
    Hao, Zhuo
    Yu, Lejun
    He, Jun
    VISUAL COMPUTER, 2024, 40 (04): : 2381 - 2390
  • [27] Weakly-supervised Video Scene Graph Generation via Unbiased Cross-modal Learning
    Wu, Ziyue
    Gao, Junyu
    Xu, Changsheng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4574 - 4583
  • [28] Taking a Closer Look At Visual Relation: Unbiased Video Scene Graph Generation With Decoupled Label Learning
    Wang, Wenqing
    Luo, Yawei
    Chen, Zhiqing
    Jiang, Tao
    Yang, Yi
    Xiao, Jun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5718 - 5728
  • [29] TEMPLATE-GUIDED DATA AUGMENTATION FOR UNBIASED SCENE GRAPH GENERATION
    Zang, Yujie
    Li, Yaochen
    Cao, Luguang
    Lu, Ruitao
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 3510 - 3514
  • [30] Knowledge-Enhanced Context Representation for Unbiased Scene Graph Generation
    Wang, Yuanlong
    Liu, Zhenqi
    Zhang, Hu
    Li, Ru
    WEB AND BIG DATA, APWEB-WAIM 2024, PT I, 2024, 14961 : 248 - 263