Contrastive Language-Image Pre-Training with Knowledge Graphs

被引:0
|
作者
Pan, Xuran [1 ]
Ye, Tianzhu [1 ]
Han, Dongchen [1 ]
Song, Shiji [1 ]
Huang, Gao [1 ]
机构
[1] Tsinghua Univ, Dept Automat, BNRist, Beijing, Peoples R China
基金
中国国家自然科学基金; 国家重点研发计划;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recent years have witnessed the fast development of large-scale pre-training frameworks that can extract multi-modal representations in a unified form and achieve promising performances when transferred to downstream tasks. Nevertheless, existing approaches mainly focus on pre-training with simple image-text pairs, while neglecting the semantic connections between concepts from different modalities. In this paper, we propose a knowledge-based pre-training framework, dubbed Knowledge-CLIP, which injects semantic information into the widely used CLIP model [38]. Through introducing knowledge-based objectives in the pre-training process and utilizing different types of knowledge graphs as training data, our model can semantically align the representations in vision and language with higher quality, and enhance the reasoning ability across scenarios and modalities. Extensive experiments on various vision-language downstream tasks demonstrate the effectiveness of Knowledge-CLIP compared with the original CLIP and competitive baselines.
引用
收藏
页数:16
相关论文
共 50 条
  • [1] UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
    Lee, Janghyeon
    Kim, Jongsuk
    Shon, Hyounguk
    Kim, Bumsoo
    Kim, Seung Hwan
    Lee, Honglak
    Kim, Junmo
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [2] ARCHICLIP Enhanced Contrastive Language-Image Pre-training Model With Architectural Prior Knowledge
    Xia, Shengtao
    Cheng, Yiming
    Tian, Runjia
    [J]. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE OF THE ASSOCIATION FOR COMPUTER-AIDED ARCHITECTURAL DESIGN RESEARCH IN ASIA, CAADRIA 2024, VOL 1, 2024, : 69 - 78
  • [3] Non-Contrastive Learning Meets Language-Image Pre-Training
    Zhou, Jinghao
    Dong, Li
    Gan, Zhe
    Wang, Lijuan
    Wei, Furu
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11028 - 11038
  • [4] Grounded Language-Image Pre-training
    Li, Liunian Harold
    Zhang, Pengchuan
    Zhang, Haotian
    Yang, Jianwei
    Li, Chunyuan
    Zhong, Yiwu
    Wang, Lijuan
    Yuan, Lu
    Zhang, Lei
    Hwang, Jenq-Neng
    Chang, Kai-Wei
    Gao, Jianfeng
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10955 - 10965
  • [5] iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-training for Visual Recognition
    Wei, Yixuan
    Cao, Yue
    Zhang, Zheng
    Peng, Houwen
    Yao, Zhuliang
    Xie, Zhenda
    Hue, Han
    Guo, Baining
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2776 - 2786
  • [6] Data Determines Distributional Robustness in Contrastive Language-Image Pre-training (CLIP)
    Fang, Alex
    Ilharco, Gabriel
    Wortsman, Mitchell
    Wan, Yuhao
    Shankar, Vaishaal
    Dave, Achal
    Schmidt, Ludwig
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [7] RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-training
    Xie, Chen-Wei
    Sun, Siyang
    Xiong, Xiong
    Zheng, Yun
    Zhao, Deli
    Zhou, Jingren
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19265 - 19274
  • [8] Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks
    Yang, Wenhan
    Gao, Jingdong
    Mirzasoleiman, Baharan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] PMC-CLIP: Contrastive Language-Image Pre-training Using Biomedical Documents
    Lin, Weixiong
    Zhao, Ziheng
    Zhang, Xiaoman
    Wu, Chaoyi
    Zhang, Ya
    Wang, Yanfeng
    Xie, Weidi
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VIII, 2023, 14227 : 525 - 536
  • [10] Multimodal Hate Speech Detection in Memes Using Contrastive Language-Image Pre-Training
    Arya, Greeshma
    Hasan, Mohammad Kamrul
    Bagwari, Ashish
    Safie, Nurhizam
    Islam, Shayla
    Ahmed, Fatima Rayan Awad
    De, Aaishani
    Khan, Muhammad Attique
    Ghazal, Taher M.
    [J]. IEEE ACCESS, 2024, 12 : 22359 - 22375