Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks

Cited by: 0
Authors
Xi, Zhaohan [1]
Du, Tianyu [2]
Li, Changjiang [1,3]
Pang, Ren [1]
Ji, Shouling [2]
Chen, Jinghui [1]
Ma, Fenglong
Wang, Ting [1,3]
Affiliations
[1] Pennsylvania State University, University Park, PA 16802, USA
[2] Zhejiang University, Hangzhou, China
[3] Stony Brook University (SUNY), Stony Brook, NY 11794, USA
Funding
US National Science Foundation (NSF)
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such challenges, we advocate MDP, a novel lightweight, pluggable, and effective defense for PLMs as few-shot learners. Specifically, MDP leverages the gap between the masking-sensitivity of poisoned and clean samples: with reference to the limited few-shot data as distributional anchors, it compares the representations of given samples under varying masking and identifies poisoned samples as ones with significant variations. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness. The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP. Code available at https://github.com/zhaohan-xi/PLM-prompt-defense.
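The core mechanism described above (scoring how strongly a sample's representation shifts under random masking, calibrated against the limited clean few-shot data used as distributional anchors) can be illustrated with a minimal sketch. This is not the authors' MDP implementation; the backbone model (roberta-base), the first-token embedding as the sample representation, cosine distance as the variation measure, the mask ratio, and the quantile-based threshold are all illustrative assumptions.

```python
# Minimal illustrative sketch of masking-sensitivity-based poison detection,
# in the spirit of the abstract above. NOT the authors' MDP code: backbone,
# representation choice, distance metric, mask ratio, and threshold rule are
# all hypothetical choices made only for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base").eval()


def representation(input_ids, attention_mask):
    """Sentence-level representation: hidden state of the first token."""
    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask)
    return out.last_hidden_state[:, 0]  # shape: (1, hidden_dim)


def masking_sensitivity(text, n_rounds=8, mask_ratio=0.15):
    """Average representation shift of `text` over several random maskings."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    base = representation(enc["input_ids"], enc["attention_mask"])
    shifts = []
    for _ in range(n_rounds):
        ids = enc["input_ids"].clone()
        # Randomly replace a fraction of non-special tokens with the mask token.
        candidates = torch.arange(1, ids.size(1) - 1)
        k = max(1, int(mask_ratio * candidates.numel()))
        chosen = candidates[torch.randperm(candidates.numel())[:k]]
        ids[0, chosen] = tokenizer.mask_token_id
        masked = representation(ids, enc["attention_mask"])
        shifts.append(1.0 - F.cosine_similarity(base, masked).item())
    return sum(shifts) / len(shifts)


def detect_poisoned(texts, clean_few_shot, quantile=0.95):
    """Flag samples whose masking sensitivity exceeds a threshold calibrated
    on the limited clean few-shot data (the distributional anchor)."""
    anchor = torch.tensor([masking_sensitivity(t) for t in clean_few_shot])
    threshold = torch.quantile(anchor, quantile).item()
    return [masking_sensitivity(t) > threshold for t in texts]
```

The intuition such a score captures is the one stated in the abstract: a poisoned sample's representation changes sharply whenever its trigger tokens happen to be masked, while clean samples remain comparatively stable. The authors' actual detector and calibration procedure are in the paper and the linked repository.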
Pages: 17