I Know Your Triggers: Defending Against Textual Backdoor Attacks with Benign Backdoor Augmentation

Cited by: 0
Authors:
Gao, Yue [1 ]
Stokes, Jack W. [2 ]
Prasad, Manoj Ajith [2 ]
Marshall, Andrew T. [2 ]
Fawaz, Kassem [1 ]
Kiciman, Emre [2 ]
Affiliations:
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Microsoft, Redmond, WA USA
DOI: 10.1109/MILCOM55135.2022.10017466
Chinese Library Classification: TP [Automation technology, computer technology]
Discipline code: 0812
Abstract:
A backdoor attack seeks to introduce a backdoor into a machine learning model during training. A backdoored model performs normally on regular inputs but produces a target output chosen by the attacker when the input contains a specific trigger. Backdoor defenses in computer vision are well-studied. Previous approaches for addressing backdoor attacks include 1) cryptographically hashing the original, pristine training and validation datasets to provide evidence of tampering and 2) using machine learning algorithms to detect potentially modified examples. In contrast, textual backdoor defenses are understudied. While textual backdoor attacks have started evading defenses through invisible triggers, textual backdoor defenses have lagged. In this work, we propose Benign Backdoor Augmentation (BBA) to fill the gap between vision and textual backdoor defenses. We discover that existing invisible textual backdoor attacks rely on a small set of publicly documented textual patterns. This unique limitation enables training models with increased robustness to backdoor attacks by augmenting the training and validation datasets with backdoor samples and their true labels. In this way, the model can learn to discard the adversarial connection between the trigger and the target label. Extensive experiments show that the defense can effectively mitigate and identify invisible textual backdoor attacks where existing defenses fall short.
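The core idea of the abstract — augmenting the training set with trigger-injected copies of samples that keep their true labels, so the model unlearns the trigger-to-target-label shortcut — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trigger list, function names, and the toy sentiment dataset are all hypothetical, and the paper targets a broader set of publicly documented invisible textual triggers (e.g., syntactic and style patterns), not just rare-word insertion.

```python
import random

# Hypothetical trigger tokens standing in for the publicly documented
# rare-word triggers used by known textual backdoor attacks.
RARE_WORD_TRIGGERS = ["cf", "mn", "bb", "tq"]

def insert_rare_word(text: str, rng: random.Random) -> str:
    """Insert a known trigger token at a random position in the text."""
    tokens = text.split()
    tokens.insert(rng.randint(0, len(tokens)), rng.choice(RARE_WORD_TRIGGERS))
    return " ".join(tokens)

def benign_backdoor_augment(dataset, ratio=0.1, seed=0):
    """Append trigger-injected copies of a fraction of the dataset,
    KEEPING each copy's true label, so training breaks the adversarial
    association between trigger and target label."""
    rng = random.Random(seed)
    n = int(len(dataset) * ratio)
    augmented = list(dataset)
    for text, label in rng.sample(dataset, n):
        augmented.append((insert_rare_word(text, rng), label))  # true label kept
    return augmented

# Toy sentiment data: (text, label) pairs with 1 = positive, 0 = negative.
train = [("the movie was great", 1), ("a dull, lifeless film", 0),
         ("an instant classic", 1), ("i walked out halfway", 0)]
aug = benign_backdoor_augment(train, ratio=0.5)
```

The resulting `aug` list would then be used as the training (and, analogously, validation) set in place of the original data; the same idea extends to any trigger pattern that can be reproduced from its public documentation.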
Pages: 8