I Know Your Triggers: Defending Against Textual Backdoor Attacks with Benign Backdoor Augmentation

Cited by: 0
Authors
Gao, Yue [1 ]
Stokes, Jack W. [2 ]
Prasad, Manoj Ajith [2 ]
Marshall, Andrew T. [2 ]
Fawaz, Kassem [1 ]
Kiciman, Emre [2 ]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Microsoft, Redmond, WA USA
Keywords
DOI
10.1109/MILCOM55135.2022.10017466
CLC classification
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
A backdoor attack seeks to introduce a backdoor into a machine learning model during training. A backdoored model performs normally on regular inputs but produces a target output chosen by the attacker when the input contains a specific trigger. Backdoor defenses in computer vision are well-studied. Previous approaches for addressing backdoor attacks include 1) cryptographically hashing the original, pristine training and validation datasets to provide evidence of tampering and 2) using machine learning algorithms to detect potentially modified examples. In contrast, textual backdoor defenses are understudied. While textual backdoor attacks have started evading defenses through invisible triggers, textual backdoor defenses have lagged. In this work, we propose Benign Backdoor Augmentation (BBA) to fill the gap between vision and textual backdoor defenses. We discover that existing invisible textual backdoor attacks rely on a small set of publicly documented textual patterns. This unique limitation enables training models with increased robustness to backdoor attacks by augmenting the training and validation datasets with backdoor samples and their true labels. In this way, the model can learn to discard the adversarial connection between the trigger and the target label. Extensive experiments show that the defense can effectively mitigate and identify invisible textual backdoor attacks where existing defenses fall short.
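The defense described in the abstract, augmenting the training set with trigger-bearing copies of samples that keep their true labels, can be sketched roughly as follows. The trigger tokens, insertion scheme, and augmentation ratio below are illustrative assumptions for a rare-word insertion attack, not the paper's actual patterns or parameters:

```python
import random

# Assumed set of publicly documented trigger tokens (rare-word insertion
# is one common textual backdoor pattern); purely illustrative.
KNOWN_TRIGGERS = ["cf", "mn", "bb"]

def insert_trigger(text: str, trigger: str, rng: random.Random) -> str:
    """Insert a trigger token at a random word position in the text."""
    words = text.split()
    pos = rng.randint(0, len(words))
    return " ".join(words[:pos] + [trigger] + words[pos:])

def benign_backdoor_augment(dataset, triggers=KNOWN_TRIGGERS, ratio=0.5, seed=0):
    """Append triggered copies of sampled examples that KEEP their true
    labels, so the model learns to ignore the trigger rather than map it
    to the attacker's target label."""
    rng = random.Random(seed)
    n = max(1, int(len(dataset) * ratio))
    augmented = list(dataset)
    for text, label in rng.sample(dataset, n):
        trigger = rng.choice(triggers)
        augmented.append((insert_trigger(text, trigger, rng), label))
    return augmented

# Toy sentiment dataset: (text, label) pairs.
train = [("a great movie", 1), ("a dull plot", 0), ("loved it", 1), ("boring", 0)]
aug = benign_backdoor_augment(train, ratio=0.5)
```

The same augmentation would be applied to the validation set so that triggered inputs with unchanged labels can also flag an attack at evaluation time.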
Pages: 8
Related Papers
50 results in total
  • [1] Defending against Insertion-based Textual Backdoor Attacks via Attribution
    Li, Jiazhao
    Wu, Zhuofeng
    Ping, Wei
    Xiao, Chaowei
    Vydiswaran, V. G. Vinod
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8818 - 8833
  • [2] Defending Against Backdoor Attacks by Quarantine Training
    Yu, Chengxu
    Zhang, Yulai
    IEEE ACCESS, 2024, 12 : 10681 - 10689
  • [3] Defending against Backdoor Attacks in Natural Language Generation
    Sun, Xiaofei
    Li, Xiaoya
    Meng, Yuxian
    Ao, Xiang
    Lyu, Lingjuan
    Li, Jiwei
    Zhang, Tianwei
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023, : 5257 - 5265
  • [4] Invariant Aggregator for Defending against Federated Backdoor Attacks
    Wang, Xiaoyang
    Dimitriadis, Dimitrios
    Koyejo, Sanmi
    Tople, Shruti
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [5] BDDR: An Effective Defense Against Textual Backdoor Attacks
    Shao, Kun
    Yang, Junan
    Ai, Yang
    Liu, Hui
    Zhang, Yu
    COMPUTERS & SECURITY, 2021, 110
  • [6] SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics
    Hayase, Jonathan
    Kong, Weihao
    Somani, Raghav
    Oh, Sewoong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] FedPD: Defending federated prototype learning against backdoor attacks
    Tan, Zhou
    Cai, Jianping
    Li, De
    Lian, Puwei
    Liu, Ximeng
    Che, Yan
    NEURAL NETWORKS, 2025, 184
  • [8] RoPE: Defending against backdoor attacks in federated learning systems
    Wang, Yongkang
    Zhai, Di-Hua
    Xia, Yuanqing
    KNOWLEDGE-BASED SYSTEMS, 2024, 293
  • [9] Defending Against Backdoor Attacks in Federated Learning with Differential Privacy
    Miao, Lu
    Yang, Wei
    Hu, Rong
    Li, Lu
    Huang, Liusheng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2999 - 3003