Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

Cited by: 1
Authors
Yu, Qifan [1]
Li, Juncheng [1]
Wu, Yu [2]
Tang, Siliang [1]
Ji, Wei [3]
Zhuang, Yueting [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Wuhan Univ, Wuhan, Peoples R China
[3] Natl Univ Singapore, Singapore, Singapore
Funding
National Key Research and Development Program
DOI
10.1109/ICCV51070.2023.01971
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships from images for visual understanding. Although recent work has made steady progress on SGG, it still suffers from the long-tail distribution problem: tail predicates are more costly to train and harder to distinguish because they have far less annotated data than frequent predicates. Existing re-balancing strategies try to handle this with prior rules but remain confined to pre-defined conditions, which do not scale across models and datasets. In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, in which a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. CaCao can be applied in a plug-and-play fashion and automatically strengthens existing SGG models against the long-tail problem. Building on it, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), in which models generalize to unseen predicates in a zero-shot manner. Comprehensive experiments on three benchmark datasets show that CaCao consistently boosts the performance of multiple scene graph generation models in a model-agnostic way. Moreover, Epic achieves competitive performance on open-world predicate prediction. The data and code for this paper are publicly available.
Pages: 21503-21514
Page count: 12