Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models

Times Cited: 0
Authors
Lv, Yiqiang [1 ,2 ]
Chen, Jingjing [1 ,2 ]
Wei, Zhipeng [1 ,2 ]
Chen, Kai [1 ,2 ]
Wu, Zuxuan [1 ,2 ]
Jiang, Yu-Gang [1 ,2 ]
Affiliations
[1] Fudan University, Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Shanghai, China
[2] Shanghai Collaborative Innovation Center of Intelligent Visual Computing, Shanghai, China
Keywords
Transfer-based Adversarial Attack; Task-agnostic; Visual-Language Pre-training Model
DOI
10.1109/ICME55011.2023.00481
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Vision-language pre-trained models (e.g., CLIP), trained on large-scale datasets via self-supervised learning, are drawing increasing research attention since they achieve superior performance on multi-modal downstream tasks. Nevertheless, we find that adversarial perturbations crafted on vision-language pre-trained models can be used to attack the corresponding downstream task models. Specifically, to investigate such adversarial transferability, we introduce a task-agnostic method named the Global and Local Augmentation (GLA) attack, which generates highly transferable adversarial examples on CLIP to attack black-box downstream task models. GLA applies random crop-and-resize at both the global image level and the local patch level to increase input diversity and make the adversarial noise more robust. GLA then generates adversarial perturbations by minimizing the cosine similarity between intermediate features of the augmented adversarial and benign examples. Extensive experiments on three CLIP image encoders with different backbones and three downstream tasks demonstrate the superiority of our method over other strong baselines. The code is available at https://github.com/yqlvcoding/GLAattack.
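The abstract describes a concrete procedure: crop-and-resize augmentation at global and local scales, followed by a feature-similarity minimization objective on the CLIP image encoder. The sketch below illustrates that procedure. It is a minimal, hypothetical PyTorch implementation, not the authors' released code: the encoder handle, crop-scale ranges, step sizes, and the use of the encoder's output embedding (rather than intermediate-layer features, as in the paper) are all assumptions made for brevity.

```python
# Hypothetical sketch of a GLA-style transferable attack (assumptions noted above).
import torch
import torch.nn.functional as F


def random_crop_params(h, w, scale):
    """Sample a random crop whose side ratio lies in `scale`."""
    s = torch.empty(1).uniform_(*scale).item()
    ch, cw = max(1, int(h * s)), max(1, int(w * s))
    top = torch.randint(0, h - ch + 1, (1,)).item()
    left = torch.randint(0, w - cw + 1, (1,)).item()
    return top, left, ch, cw


def crop_resize(x, params, size):
    """Crop the batch with `params` and resize back to `size`."""
    top, left, ch, cw = params
    crop = x[:, :, top:top + ch, left:left + cw]
    return F.interpolate(crop, size=size, mode="bilinear", align_corners=False)


def gla_style_attack(encoder, x, eps=8 / 255, alpha=2 / 255, steps=10,
                     global_views=2, local_views=4):
    """PGD-style attack: minimize cosine similarity between the features of
    augmented adversarial views and the matching benign views."""
    h, w = x.shape[-2:]
    x_adv = x.clone().detach()
    view_specs = [((0.5, 1.0), global_views),   # global crops of the image
                  ((0.1, 0.4), local_views)]    # local patch-level crops (ranges are assumed)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = 0.0
        for scale, n_views in view_specs:
            for _ in range(n_views):
                p = random_crop_params(h, w, scale)
                f_adv = encoder(crop_resize(x_adv, p, (h, w))).flatten(1)
                with torch.no_grad():
                    f_clean = encoder(crop_resize(x, p, (h, w))).flatten(1)
                # Lower similarity -> adversarial features drift away from benign ones.
                loss = loss + F.cosine_similarity(f_adv, f_clean, dim=1).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() - alpha * grad.sign()     # descend on the similarity loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)         # project to the L_inf ball
        x_adv = x_adv.clamp(0, 1)                        # keep a valid image range
    return x_adv.detach()
```

In use, `encoder` would be a CLIP image encoder (or a hook on one of its intermediate layers, which is what the abstract actually targets), and the returned `x_adv` would then be fed to black-box downstream task models to measure how well the perturbation transfers.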
Pages: 2831-2836
Number of Pages: 6