Sigmoid Loss for Language Image Pre-Training

Cited by: 24
Authors
Zhai, Xiaohua [1]
Mustafa, Basil [1]
Kolesnikov, Alexander [1]
Beyer, Lucas [1]
Affiliations
[1] Google DeepMind Zurich, Zurich, Switzerland
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
Keywords
DOI
10.1109/ICCV51070.2023.01100
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. The sigmoid loss simultaneously allows further scaling up the batch size, while also performing better at smaller batch sizes. With only four TPUv4 chips, we can train a Base CLIP model at a 4k batch size and a Large LiT model at a 20k batch size; the latter achieves 84.5% ImageNet zero-shot accuracy in two days. This disentanglement of the batch size from the loss further allows us to study the impact of examples vs. pairs and the negative-to-positive ratio. Finally, we push the batch size to the extreme, up to one million, and find that the benefits of growing the batch size quickly diminish, with a more reasonable batch size of 32k being sufficient. We hope our research motivates further explorations in improving the quality and efficiency of language-image pre-training.
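The pairwise loss described in the abstract can be written out concretely. Below is a minimal sketch, assuming (n, d) batches of L2-normalized image and text embeddings whose rows are aligned positive pairs; the temperature t and bias b are learnable parameters in the paper but are treated as fixed constants here, and all names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def pairwise_sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Sketch of a pairwise sigmoid loss for image-text pre-training.

    img_emb, txt_emb: (n, d) arrays of L2-normalized embeddings; row i of
    each array is assumed to form a positive pair, and every off-diagonal
    combination is treated as a negative.
    t, b: temperature and bias (learnable in the paper; fixed constants
    here purely for illustration).
    """
    n = img_emb.shape[0]
    logits = t * (img_emb @ txt_emb.T) + b   # (n, n) scaled pairwise similarities
    labels = 2.0 * np.eye(n) - 1.0           # +1 on the diagonal, -1 elsewhere
    # -log sigmoid(labels * logits) == softplus(-labels * logits);
    # each pair is scored independently, so no batch-wide softmax is needed
    per_pair = np.logaddexp(0.0, -labels * logits)
    return per_pair.sum() / n                # normalized by batch size
```

Because every one of the n × n pair terms is scored independently by a sigmoid, the loss needs no normalization over the full similarity matrix, which is what allows the batch size to be scaled up (or down) independently of the loss formulation.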
Pages: 11941-11952
Page count: 12