Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models

Cited: 0
Authors
Lv, Yiqiang [1 ,2 ]
Chen, Jingjing [1 ,2 ]
Wei, Zhipeng [1 ,2 ]
Chen, Kai [1 ,2 ]
Wu, Zuxuan [1 ,2 ]
Jiang, Yu-Gang [1 ,2 ]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Collaborat Innovat Ctr Intelligent Visual Comp, Shanghai, Peoples R China
Keywords
Transfer-based Adversarial Attack; Task-agnostic; Visual-Language Pre-training Model
DOI
10.1109/ICME55011.2023.00481
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Vision-language pre-trained models (e.g., CLIP), trained on large-scale datasets via self-supervised learning, are drawing increasing research attention since they achieve superior performance on multi-modal downstream tasks. Nevertheless, we find that adversarial perturbations crafted on vision-language pre-trained models can be used to attack the corresponding downstream task models. Specifically, to investigate such adversarial transferability, we introduce a task-agnostic method named the Global and Local Augmentation (GLA) attack, which generates highly transferable adversarial examples on CLIP to attack black-box downstream task models. GLA applies random crop-and-resize at both the global image level and the local patch level to increase input diversity and make the adversarial noise more robust. It then generates the adversarial perturbations by minimizing the cosine similarity between intermediate features of the augmented adversarial and benign examples. Extensive experiments on three CLIP image encoders with different backbones and three downstream tasks demonstrate the superiority of our method over other strong baselines. The code is available at https://github.com/yqlvcoding/GLAattack.
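To make the procedure concrete, here is a minimal PyTorch sketch of the augmentation-and-feature-matching loop the abstract describes. It is not the authors' implementation (see the linked repository for that): the crop scales, patch size, number of augmented views, step size, the PGD-style sign update, and the `feature_extractor` callable (standing in for the intermediate features of a CLIP image encoder) are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

def gla_attack(image, feature_extractor, eps=8 / 255, alpha=2 / 255,
               steps=10, n_aug=5, patch=64):
    """Hypothetical sketch of a GLA-style attack: perturb `image` so that
    intermediate features of globally and locally augmented adversarial
    views diverge (in cosine similarity) from those of the matching benign
    views. All hyperparameters here are illustrative assumptions."""
    h = image.shape[-1]  # assumes square inputs, e.g. 224x224
    global_aug = T.RandomResizedCrop(h, scale=(0.5, 1.0), antialias=True)
    delta = torch.zeros_like(image, requires_grad=True)

    for _ in range(steps):
        loss = 0.0
        for _ in range(n_aug):
            # Concatenate adversarial and benign images so each random
            # transform is applied identically to both halves of the pair.
            pair = torch.cat([image + delta, image], dim=0)

            # Global view: random crop-and-resize of the whole image.
            adv_g, ben_g = global_aug(pair).chunk(2, dim=0)

            # Local view: a random patch, resized back to the input size.
            i = int(torch.randint(0, h - patch + 1, (1,)))
            j = int(torch.randint(0, h - patch + 1, (1,)))
            local = F.interpolate(pair[..., i:i + patch, j:j + patch],
                                  size=h, mode="bilinear",
                                  align_corners=False)
            adv_l, ben_l = local.chunk(2, dim=0)

            # Accumulate cosine similarity between adversarial and benign
            # intermediate features for both views; we will minimize it.
            for adv, ben in ((adv_g, ben_g), (adv_l, ben_l)):
                loss = loss + F.cosine_similarity(
                    feature_extractor(adv).flatten(1),
                    feature_extractor(ben).flatten(1)).mean()

        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend: lower similarity
            delta.clamp_(-eps, eps)             # L-infinity budget
        delta.grad.zero_()

    return (image + delta).clamp(0, 1).detach()
```

In this sketch, concatenating the adversarial and benign images before each random crop guarantees that both receive the identical transform, so the loss always compares features of matched views rather than of two unrelated crops.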
Pages: 2831-2836 (6 pages)
Related Papers
50 items in total
  • [31] PMC-CLIP: Contrastive Language-Image Pre-training Using Biomedical Documents. Lin, Weixiong; Zhao, Ziheng; Zhang, Xiaoman; Wu, Chaoyi; Zhang, Ya; Wang, Yanfeng; Xie, Weidi. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VIII, 2023, 14227: 525-536
  • [32] Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training. You, Haoxuan; Zhou, Luowei; Xiao, Bin; Codella, Noel; Cheng, Yu; Xu, Ruochen; Chang, Shih-Fu; Yuan, Lu. COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687: 69-87
  • [33] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis. Wu, Chaoyi; Zhang, Xiaoman; Zhang, Ya; Wang, Yanfeng; Xie, Weidi. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 21315-21326
  • [34] Multimodal alignment augmentation transferable attack on vision-language pre-training models. Fu, Tingchao; Zhang, Jinhong; Li, Fanxiao; Wei, Ping; Zeng, Xianglong; Zhou, Wei. PATTERN RECOGNITION LETTERS, 2025, 191: 131-137
  • [35] Leveraging Contrastive Language-Image Pre-Training and Bidirectional Cross-attention for Multimodal Keyword Spotting. Liu, Dong; Mao, Qirong; Gao, Lijian; Wang, Gang. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 138
  • [36] Task-adaptive Pre-training of Language Models with Word Embedding Regularization. Nishida, Kosuke; Nishida, Kyosuke; Yoshida, Sen. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 4546-4553
  • [37] CLIP-FG: SELECTING DISCRIMINATIVE IMAGE PATCHES BY CONTRASTIVE LANGUAGE-IMAGE PRE-TRAINING FOR FINE-GRAINED IMAGE CLASSIFICATION. Yuan, Min; Lv, Ningning; Xie, Yufei; Lu, Fuxiang; Zhan, Kun. 2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023: 560-564
  • [38] LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval. Luo, Ziyang; Zhao, Pu; Xu, Can; Geng, Xiubo; Shen, Tao; Tao, Chongyang; Ma, Jing; Lin, Qingwei; Jiang, Daxin. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 11172-11183
  • [39] Sigmoid Loss for Language Image Pre-Training. Zhai, Xiaohua; Mustafa, Basil; Kolesnikov, Alexander; Beyer, Lucas. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 11941-11952
  • [40] CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training. You, Kihyun; Gu, Jawook; Ham, Jiyeon; Park, Beomhee; Kim, Jiho; Hong, Eun K.; Baek, Woonhyuk; Roh, Byungseok. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT II, 2023, 14221: 101-111