Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks

Cited by: 0
Authors
Xi, Zhaohan [1]
Du, Tianyu [2]
Li, Changjiang [1,3]
Pang, Ren [1]
Ji, Shouling [2]
Chen, Jinghui [1]
Ma, Fenglong
Wang, Ting [1,3]
Affiliations
[1] Pennsylvania State University, University Park, PA 16802, USA
[2] Zhejiang University, Hangzhou, China
[3] Stony Brook University (SUNY), Stony Brook, NY 11794, USA
Funding
US National Science Foundation (NSF)
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such challenges, we advocate MDP, a novel lightweight, pluggable, and effective defense for PLMs as few-shot learners. Specifically, MDP leverages the gap between the masking-sensitivity of poisoned and clean samples: with reference to the limited few-shot data as distributional anchors, it compares the representations of given samples under varying masking and identifies poisoned samples as ones with significant variations. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness. The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP. Code available at https://github.com/zhaohan-xi/PLM-prompt-defense.
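The core mechanism described above (scoring how strongly a sample's representation shifts under random masking, calibrated against the limited clean few-shot data used as distributional anchors) can be illustrated with a minimal sketch. This is not the authors' MDP implementation; the backbone model (roberta-base), the first-token embedding as the sample representation, cosine distance as the variation measure, the mask ratio, and the quantile-based threshold are all illustrative assumptions.

```python
# Minimal illustrative sketch of masking-sensitivity-based poison detection,
# in the spirit of the abstract above. NOT the authors' MDP code: backbone,
# representation choice, distance metric, mask ratio, and threshold rule are
# all hypothetical choices made only for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base").eval()


def representation(input_ids, attention_mask):
    """Sentence-level representation: hidden state of the first token."""
    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask)
    return out.last_hidden_state[:, 0]  # shape: (1, hidden_dim)


def masking_sensitivity(text, n_rounds=8, mask_ratio=0.15):
    """Average representation shift of `text` over several random maskings."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    base = representation(enc["input_ids"], enc["attention_mask"])
    shifts = []
    for _ in range(n_rounds):
        ids = enc["input_ids"].clone()
        # Randomly replace a fraction of non-special tokens with the mask token.
        candidates = torch.arange(1, ids.size(1) - 1)
        k = max(1, int(mask_ratio * candidates.numel()))
        chosen = candidates[torch.randperm(candidates.numel())[:k]]
        ids[0, chosen] = tokenizer.mask_token_id
        masked = representation(ids, enc["attention_mask"])
        shifts.append(1.0 - F.cosine_similarity(base, masked).item())
    return sum(shifts) / len(shifts)


def detect_poisoned(texts, clean_few_shot, quantile=0.95):
    """Flag samples whose masking sensitivity exceeds a threshold calibrated
    on the limited clean few-shot data (the distributional anchor)."""
    anchor = torch.tensor([masking_sensitivity(t) for t in clean_few_shot])
    threshold = torch.quantile(anchor, quantile).item()
    return [masking_sensitivity(t) > threshold for t in texts]
```

The intuition such a score captures is the one stated in the abstract: a poisoned sample's representation changes sharply whenever its trigger tokens happen to be masked, while clean samples remain comparatively stable. The authors' actual detector and calibration procedure are in the paper and the linked repository.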
Pages: 17