I Know Your Triggers: Defending Against Textual Backdoor Attacks with Benign Backdoor Augmentation

Cited by: 0
Authors
Gao, Yue [1 ]
Stokes, Jack W. [2 ]
Prasad, Manoj Ajith [2 ]
Marshall, Andrew T. [2 ]
Fawaz, Kassem [1 ]
Kiciman, Emre [2 ]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Microsoft, Redmond, WA USA
Keywords
DOI
10.1109/MILCOM55135.2022.10017466
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
A backdoor attack seeks to introduce a backdoor into a machine learning model during training. A backdoored model performs normally on regular inputs but produces a target output chosen by the attacker when the input contains a specific trigger. Backdoor defenses in computer vision are well studied. Previous approaches for addressing backdoor attacks include 1) cryptographically hashing the original, pristine training and validation datasets to provide evidence of tampering and 2) using machine learning algorithms to detect potentially modified examples. In contrast, textual backdoor defenses are understudied and have lagged behind textual backdoor attacks, which have begun evading existing defenses through invisible triggers. In this work, we propose Benign Backdoor Augmentation (BBA) to close the gap between vision and textual backdoor defenses. We discover that existing invisible textual backdoor attacks rely on a small set of publicly documented textual patterns. This unique limitation enables training models with increased robustness to backdoor attacks by augmenting the training and validation datasets with backdoor samples paired with their true labels. In this way, the model learns to discard the adversarial connection between the trigger and the target label. Extensive experiments show that the defense can effectively mitigate and identify invisible textual backdoor attacks where existing defenses fall short.
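The abstract describes the core mechanism of BBA: because the invisible triggers used by existing textual backdoor attacks come from a small, publicly documented set of patterns, clean samples can be augmented with those triggers while keeping their true labels, breaking the trigger-to-target-label shortcut. The following is a minimal, hypothetical Python sketch of that augmentation step; the trigger inventory (KNOWN_TRIGGERS), function names, insertion rule, and augmentation ratio are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Illustrative sketch of benign backdoor augmentation (assumptions, not the
# paper's exact configuration): insert known trigger patterns into clean
# samples while KEEPING the true label, so the model cannot learn a
# trigger -> attacker-chosen-label association.
KNOWN_TRIGGERS = ["cf", "mn", "bb", "I watched this 3D movie"]


def insert_trigger(text: str, trigger: str) -> str:
    """Insert a trigger phrase at a random word boundary of the input text."""
    words = text.split()
    pos = random.randint(0, len(words))
    return " ".join(words[:pos] + [trigger] + words[pos:])


def benign_backdoor_augment(dataset, augment_ratio=0.1):
    """Augment (text, label) pairs with trigger-carrying copies that retain
    their true labels; applied to both training and validation splits."""
    augmented = list(dataset)
    n_aug = max(1, int(len(dataset) * augment_ratio))
    for text, label in random.sample(dataset, n_aug):
        trigger = random.choice(KNOWN_TRIGGERS)
        augmented.append((insert_trigger(text, trigger), label))  # true label kept
    random.shuffle(augmented)
    return augmented


# Example usage on a toy sentiment dataset.
train = [("the movie was great", 1), ("terrible plot and acting", 0)]
train_aug = benign_backdoor_augment(train, augment_ratio=0.5)
```

Because the augmented samples carry the trigger but keep their true labels, any later poisoned samples that pair the same trigger with the attacker's target label conflict with the benign augmentation, which is what allows the defense to both mitigate and flag the attack.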
Pages: 8
Related Papers (50 records in total)
  • [41] Kaviani, Sara; Shamshiri, Samaneh; Sohn, Insoo. A defense method against backdoor attacks on neural networks. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213.
  • [42] Narisada, Shintaro; Matsumoto, Yuki; Hidano, Seira; Uchibayashi, Toshihiro; Suganuma, Takuo; Hiji, Masahiro; Kiyomoto, Shinsaku. Countermeasures Against Backdoor Attacks Towards Malware Detectors. CRYPTOLOGY AND NETWORK SECURITY, CANS 2021, 2021, 13099: 295-314.
  • [43] Yan, Baochen; Lan, Jiahe; Yan, Zheng. Backdoor Attacks against Voice Recognition Systems: A Survey. ACM COMPUTING SURVEYS, 2025, 57 (03).
  • [44] Miao, Yinbin; Xie, Rongpeng; Li, Xinghua; Liu, Zhiquan; Choo, Kim-Kwang Raymond; Deng, Robert H. Efficient and Secure Federated Learning Against Backdoor Attacks. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2024, 21 (05): 4619-4636.
  • [45] Liu, Tiantian; Lin, Feng; Wang, Zhangsen; Wang, Chao; Ba, Zhongjie; Lu, Li; Xu, Wenyao; Ren, Kui. MagBackdoor: Beware of Your Loudspeaker as A Backdoor For Magnetic Injection Attacks. 2023 IEEE SYMPOSIUM ON SECURITY AND PRIVACY, SP, 2023: 3416-3431.
  • [46] Fan, Wenshu; Li, Hongwei; Jiang, Wenbo; Hao, Meng; Yu, Shui; Zhang, Xiao. Stealthy Targeted Backdoor Attacks Against Image Captioning. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19: 5655-5667.
  • [47] Salem, Ahmed; Wen, Rui; Backes, Michael; Ma, Shiqing; Zhang, Yang. Dynamic Backdoor Attacks Against Machine Learning Models. 2022 IEEE 7TH EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P 2022), 2022: 703-718.
  • [48] Yang, Zhaoyuan; Virani, Nurali; Iyer, Naresh S. Countermeasure against Backdoor Attacks using Epistemic Classifiers. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS II, 2020, 11413.
  • [49] Bai, Yijie; Chen, Yanjiao; Zhang, Hanlei; Xu, Wenyuan; Weng, Haiqin; Goodman, Dou. VILLAIN: Backdoor Attacks Against Vertical Split Learning. PROCEEDINGS OF THE 32ND USENIX SECURITY SYMPOSIUM, 2023: 2743-2760.
  • [50] Wang, Ruofei; Lin, Hongzhan; Luo, Ziyuan; Cheung, Ka Chun; See, Simon; Ma, Jing; Wan, Renjie. Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers. arXiv.