Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models

Times Cited: 0
Authors
Lv, Yiqiang [1 ,2 ]
Chen, Jingjing [1 ,2 ]
Wei, Zhipeng [1 ,2 ]
Chen, Kai [1 ,2 ]
Wu, Zuxuan [1 ,2 ]
Jiang, Yu-Gang [1 ,2 ]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Collaborat Innovat Ctr Intelligent Visua, Shanghai, Peoples R China
Keywords
Transfer-based Adversarial Attack; Task-agnostic; Vision-Language Pre-training Model
DOI
10.1109/ICME55011.2023.00481
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Vision-language pre-trained models (e.g., CLIP), trained on large-scale datasets via self-supervised learning, are drawing increasing research attention since they achieve superior performance on multi-modal downstream tasks. Nevertheless, we find that adversarial perturbations crafted on vision-language pre-trained models can be used to attack the corresponding downstream task models. Specifically, to investigate such adversarial transferability, we introduce a task-agnostic method named Global and Local Augmentation (GLA) attack, which generates highly transferable adversarial examples on CLIP to attack black-box downstream task models. GLA applies random crop-and-resize at both the global image level and the local patch level to increase input diversity and make the adversarial noise more robust. It then generates the adversarial perturbations by minimizing the cosine similarity between intermediate features of the augmented adversarial and benign examples. Extensive experiments on three CLIP image encoders with different backbones and three downstream tasks demonstrate the superiority of our method over other strong baselines. The code is available at https://github.com/yqlvcoding/GLAattack.
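The abstract describes a crop-and-resize-augmented, feature-similarity-minimizing attack on CLIP's image encoder. Below is a minimal sketch of that idea, assuming PyTorch, torchvision, and the OpenAI clip package; the crop scales, step size, epsilon, iteration count, and the use of the final image embedding (rather than intermediate-layer features) are illustrative assumptions, not the authors' exact configuration (see the official repository above for the real implementation).

```python
# Hedged sketch of a GLA-style transfer attack on CLIP's image encoder.
# Assumes: pip install torch torchvision, and the OpenAI CLIP package.
import torch
import torch.nn.functional as F
import torchvision.transforms as T
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float().eval()  # fp32 keeps input gradients straightforward

# CLIP's standard input normalization; applied inside the loop so the
# perturbation itself lives in [0, 1] pixel space.
normalize = T.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                        std=(0.26862954, 0.26130258, 0.27577711))

# Global and local random crop-and-resize augmentations (scales assumed).
global_aug = T.RandomResizedCrop(224, scale=(0.5, 1.0))
local_aug = T.RandomResizedCrop(224, scale=(0.1, 0.5))

def gla_attack(image, eps=8 / 255, alpha=1 / 255, steps=50, n_views=4):
    """Craft an L-inf bounded perturbation by minimizing the cosine
    similarity between CLIP features of augmented adversarial and benign
    views. `image` is a (1, 3, 224, 224) tensor in [0, 1]."""
    image = image.to(device)
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        x_adv = torch.clamp(image + delta, 0.0, 1.0)
        # Stack adversarial and benign images so each random crop is
        # applied identically to both of them.
        pair = torch.cat([x_adv, image], dim=0)
        loss = 0.0
        for aug in (global_aug, local_aug):
            for _ in range(n_views):
                views = aug(pair)  # one random crop/resize per pass
                feats = model.encode_image(normalize(views))
                f_adv, f_clean = feats[:1], feats[1:]
                # Push adversarial features away from benign features.
                loss = loss + F.cosine_similarity(f_adv, f_clean).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend on similarity
            delta.clamp_(-eps, eps)              # L-inf projection
            delta.copy_(torch.clamp(image + delta, 0.0, 1.0) - image)
        delta.grad.zero_()
    return torch.clamp(image + delta, 0.0, 1.0).detach()
```

Because the perturbation is optimized only against the frozen CLIP image encoder, the resulting adversarial image can then be fed to any black-box downstream model built on that encoder; the averaging over several global and local views is what the abstract credits for the improved transferability.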
Pages: 2831 - 2836
Number of Pages: 6