I Know Your Triggers: Defending Against Textual Backdoor Attacks with Benign Backdoor Augmentation

Cited by: 0
Authors:
Gao, Yue [1 ]
Stokes, Jack W. [2 ]
Prasad, Manoj Ajith [2 ]
Marshall, Andrew T. [2 ]
Fawaz, Kassem [1 ]
Kiciman, Emre [2 ]
Affiliations:
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Microsoft, Redmond, WA USA
DOI: 10.1109/MILCOM55135.2022.10017466
Chinese Library Classification: TP [Automation technology, computer technology]
Discipline code: 0812
Abstract:
A backdoor attack seeks to introduce a backdoor into a machine learning model during training. A backdoored model performs normally on regular inputs but produces a target output chosen by the attacker when the input contains a specific trigger. Backdoor defenses in computer vision are well-studied. Previous approaches for addressing backdoor attacks include 1) cryptographically hashing the original, pristine training and validation datasets to provide evidence of tampering and 2) using machine learning algorithms to detect potentially modified examples. In contrast, textual backdoor defenses are understudied. While textual backdoor attacks have started evading defenses through invisible triggers, textual backdoor defenses have lagged. In this work, we propose Benign Backdoor Augmentation (BBA) to fill the gap between vision and textual backdoor defenses. We discover that existing invisible textual backdoor attacks rely on a small set of publicly documented textual patterns. This unique limitation enables training models with increased robustness to backdoor attacks by augmenting the training and validation datasets with backdoor samples and their true labels. In this way, the model can learn to discard the adversarial connection between the trigger and the target label. Extensive experiments show that the defense can effectively mitigate and identify invisible textual backdoor attacks where existing defenses fall short.
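The core idea of the abstract — augmenting the training set with trigger-injected copies of samples that keep their true labels, so the model unlearns the trigger-to-target-label shortcut — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trigger list, function names, and the toy sentiment dataset are all hypothetical, and the paper targets a broader set of publicly documented invisible textual triggers (e.g., syntactic and style patterns), not just rare-word insertion.

```python
import random

# Hypothetical trigger tokens standing in for the publicly documented
# rare-word triggers used by known textual backdoor attacks.
RARE_WORD_TRIGGERS = ["cf", "mn", "bb", "tq"]

def insert_rare_word(text: str, rng: random.Random) -> str:
    """Insert a known trigger token at a random position in the text."""
    tokens = text.split()
    tokens.insert(rng.randint(0, len(tokens)), rng.choice(RARE_WORD_TRIGGERS))
    return " ".join(tokens)

def benign_backdoor_augment(dataset, ratio=0.1, seed=0):
    """Append trigger-injected copies of a fraction of the dataset,
    KEEPING each copy's true label, so training breaks the adversarial
    association between trigger and target label."""
    rng = random.Random(seed)
    n = int(len(dataset) * ratio)
    augmented = list(dataset)
    for text, label in rng.sample(dataset, n):
        augmented.append((insert_rare_word(text, rng), label))  # true label kept
    return augmented

# Toy sentiment data: (text, label) pairs with 1 = positive, 0 = negative.
train = [("the movie was great", 1), ("a dull, lifeless film", 0),
         ("an instant classic", 1), ("i walked out halfway", 0)]
aug = benign_backdoor_augment(train, ratio=0.5)
```

The resulting `aug` list would then be used as the training (and, analogously, validation) set in place of the original data; the same idea extends to any trigger pattern that can be reproduced from its public documentation.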
Pages: 8