Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models

Cited by: 0
Authors
Lv, Yiqiang [1 ,2 ]
Chen, Jingjing [1 ,2 ]
Wei, Zhipeng [1 ,2 ]
Chen, Kai [1 ,2 ]
Wu, Zuxuan [1 ,2 ]
Jiang, Yu-Gang [1 ,2 ]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Collaborat Innovat Ctr Intelligent Visual Comp, Shanghai, Peoples R China
Keywords
Transfer-based Adversarial Attack; Task-agnostic; Visual-Language Pre-training model
DOI
10.1109/ICME55011.2023.00481
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision-language pre-trained models (e.g., CLIP), trained on large-scale datasets via self-supervised learning, are drawing increasing research attention since they achieve superior performance on multi-modal downstream tasks. Nevertheless, we find that adversarial perturbations crafted on vision-language pre-trained models can be used to attack the corresponding downstream task models. Specifically, to investigate such adversarial transferability, we introduce a task-agnostic method named Global and Local Augmentation (GLA) attack, which generates highly transferable adversarial examples on CLIP to attack black-box downstream task models. GLA adopts random crop and resize at both the global image level and the local patch level to create more input diversity and make the adversarial noise more robust. GLA then generates the adversarial perturbations by minimizing the cosine similarity between the intermediate features of augmented adversarial and benign examples. Extensive experiments on three CLIP image encoders with different backbones and three different downstream tasks demonstrate the superiority of our method over other strong baselines. The code is available at https://github.com/yqlvcoding/GLAattack.
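Below is a minimal PyTorch sketch of the attack loop described in the abstract. It is illustrative only, not the authors' released implementation: `encoder(x)` is assumed to return an intermediate feature tensor of a CLIP image encoder (e.g., collected via a forward hook), `x` is a batch of benign images in [0, 1], and the crop scales, perturbation budget `eps`, step size `alpha`, and iteration/augmentation counts are placeholder choices, not values from the paper.
```python
# Minimal, hypothetical sketch of a GLA-style attack (see assumptions above).
import torch
import torch.nn.functional as F
from torchvision.transforms import RandomResizedCrop
from torchvision.transforms import functional as TF


def gla_style_attack(encoder, x, eps=8 / 255, alpha=2 / 255, steps=10, n_aug=4):
    """Craft an L_inf-bounded perturbation by minimizing the cosine similarity
    between intermediate features of identically cropped adversarial/benign views."""
    size = list(x.shape[-2:])                        # resize crops back to the input resolution
    delta = torch.zeros_like(x, requires_grad=True)  # perturbation to optimize

    for _ in range(steps):
        adv = torch.clamp(x + delta, 0, 1)
        loss = 0.0
        for _ in range(n_aug):
            # Global crop (covers most of the image) and local patch crop (small region).
            for scale in ((0.7, 1.0), (0.1, 0.4)):
                i, j, h, w = RandomResizedCrop.get_params(x, scale=scale, ratio=(3 / 4, 4 / 3))
                adv_view = TF.resized_crop(adv, i, j, h, w, size)  # same crop for both views
                ben_view = TF.resized_crop(x, i, j, h, w, size)
                f_adv = encoder(adv_view).flatten(1)               # intermediate features
                f_ben = encoder(ben_view).detach().flatten(1)
                loss = loss + F.cosine_similarity(f_adv, f_ben, dim=1).mean()

        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # gradient *descent* on the similarity
            delta.clamp_(-eps, eps)             # stay inside the L_inf budget
            delta.grad.zero_()

    return torch.clamp(x + delta, 0, 1).detach()
```
In this sketch the same crop parameters are applied to the adversarial and benign images, so the cosine similarity is computed between features of the same global region or local patch; descending on that similarity pushes the adversarial features away from the benign ones independently of any downstream task head.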
Pages: 2831-2836
Number of pages: 6
Related Papers
(showing 10 of 50)
  • [1] Grounded Language-Image Pre-training
    Li, Liunian Harold
    Zhang, Pengchuan
    Zhang, Haotian
    Yang, Jianwei
    Li, Chunyuan
    Zhong, Yiwu
    Wang, Lijuan
    Yuan, Lu
    Zhang, Lei
    Hwang, Jenq-Neng
    Chang, Kai-Wei
    Gao, Jianfeng
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10955 - 10965
  • [2] VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
    Xu, Hu
    Ghosh, Gargi
    Huang, Po-Yao
    Arora, Prahal
    Aminzadeh, Masoumeh
    Feichtenhofer, Christoph
    Metze, Florian
    Zettlemoyer, Luke
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4227 - 4239
  • [3] Centered Masking for Language-Image Pre-training
    Liang, Mingliang
    Larson, Martha
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES-RESEARCH TRACK AND DEMO TRACK, PT VIII, ECML PKDD 2024, 2024, 14948 : 90 - 106
  • [4] Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks
    Yang, Wenhan
    Gao, Jingdong
    Mirzasoleiman, Baharan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Contrastive Language-Image Pre-Training with Knowledge Graphs
    Pan, Xuran
    Ye, Tianzhu
    Han, Dongchen
    Song, Shiji
    Huang, Gao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [6] DreamLIP: Language-Image Pre-training with Long Captions
    Zheng, Kecheng
    Zhang, Yifei
    Wu, Wei
    Lu, Fan
    Ma, Shuailei
    Jin, Xin
    Chen, Wei
    Shen, Yujun
    COMPUTER VISION-ECCV 2024, PT XVIII, 2025, 15076 : 73 - 90
  • [7] Scaling Language-Image Pre-training via Masking
    Li, Yanghao
    Fan, Haoqi
    Hu, Ronghang
Feichtenhofer, Christoph
    He, Kaiming
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23390 - 23400
  • [8] Understanding and Mitigating the Soft Error of Contrastive Language-Image Pre-training Models
    Shi, Yihao
    Wang, Bo
    Luo, Shengbai
    Xue, Qingshan
    Zhang, Xueyi
    Ma, Sheng
    8TH INTERNATIONAL TEST CONFERENCE IN ASIA, ITC-ASIA 2024, 2024,
  • [9] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
    Yang, Kaicheng
    Deng, Jiankang
    An, Xiang
    Li, Jiawei
    Feng, Ziyong
    Guo, Jia
    Yang, Jing
    Liu, Tongliang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2910 - 2919
  • [10] A closer look at the explainability of Contrastive language-image pre-training
    Li, Yi
    Wang, Hualiang
    Duan, Yiqun
    Zhang, Jiheng
    Li, Xiaomeng
    PATTERN RECOGNITION, 2025, 162