I Know Your Triggers: Defending Against Textual Backdoor Attacks with Benign Backdoor Augmentation

Cited by: 0
Authors
Gao, Yue [1]
Stokes, Jack W. [2]
Prasad, Manoj Ajith [2]
Marshall, Andrew T. [2]
Fawaz, Kassem [1]
Kiciman, Emre [2]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Microsoft, Redmond, WA USA
Keywords
DOI
10.1109/MILCOM55135.2022.10017466
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
A backdoor attack seeks to introduce a backdoor into a machine learning model during training. A backdoored model performs normally on regular inputs but produces a target output chosen by the attacker when the input contains a specific trigger. Backdoor defenses in computer vision are well-studied. Previous approaches for addressing backdoor attacks include 1) cryptographically hashing the original, pristine training and validation datasets to provide evidence of tampering and 2) using machine learning algorithms to detect potentially modified examples. In contrast, textual backdoor defenses are understudied. While textual backdoor attacks have started evading defenses through invisible triggers, textual backdoor defenses have lagged. In this work, we propose Benign Backdoor Augmentation (BBA) to fill the gap between vision and textual backdoor defenses. We discover that existing invisible textual backdoor attacks rely on a small set of publicly documented textual patterns. This unique limitation enables training models with increased robustness to backdoor attacks by augmenting the training and validation datasets with backdoor samples and their true labels. In this way, the model can learn to discard the adversarial connection between the trigger and the target label. Extensive experiments show that the defense can effectively mitigate and identify invisible textual backdoor attacks where existing defenses fall short.
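The augmentation step described in the abstract can be illustrated with a minimal sketch. It assumes a dataset of (text, label) pairs and a list of known trigger transformations. The function names, the `rate` parameter, and the two toy triggers below are hypothetical and are not taken from the paper. The paper targets invisible triggers such as syntactic patterns, whereas simple token and sentence insertion are used here only as stand-ins.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]       # (text, true label)
Trigger = Callable[[str], str]  # maps clean text to triggered text


def insert_rare_word(text: str, word: str = "cf") -> str:
    """Hypothetical stand-in trigger: prepend a rare token."""
    return f"{word} {text}"


def append_fixed_sentence(text: str, sentence: str = "I watched this 3D movie.") -> str:
    """Hypothetical stand-in trigger: append a fixed sentence."""
    return f"{text} {sentence}"


def benign_backdoor_augment(
    data: List[Example],
    triggers: List[Trigger],
    rate: float = 0.5,
    seed: int = 0,
) -> List[Example]:
    """Return the data plus triggered copies that keep their TRUE labels.

    Assumption: pairing known triggers with correct labels gives the model
    evidence that the trigger alone does not determine the output.
    """
    rng = random.Random(seed)
    augmented = list(data)  # original examples are kept unchanged
    for text, label in data:
        if rng.random() < rate:
            trigger = rng.choice(triggers)
            # The label is preserved, not replaced by an attacker's target.
            augmented.append((trigger(text), label))
    return augmented
```

With `rate=1.0` every example receives one triggered copy, so the output is twice the size of the input. In practice the rate and the trigger set would need tuning against the attacks of interest.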
Pages: 8
Related papers
50 in total
  • [31] Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger
    Qi, Fanchao
    Li, Mukai
    Chen, Yangyi
    Zhang, Zhengyan
    Liu, Zhiyuan
    Wang, Yasheng
    Sun, Maosong
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 443 - 453
  • [32] Kallima: A Clean-Label Framework for Textual Backdoor Attacks
    Chen, Xiaoyi
    Dong, Yinpeng
    Sun, Zeyu
    Zhai, Shengfang
    Shen, Qingni
    Wu, Zhonghai
    COMPUTER SECURITY - ESORICS 2022, PT I, 2022, 13554 : 447 - 466
  • [33] Verifying Neural Networks Against Backdoor Attacks
    Pham, Long H.
    Sun, Jun
    COMPUTER AIDED VERIFICATION (CAV 2022), PT I, 2022, 13371 : 171 - 192
  • [34] Backdoor attacks against distributed swarm learning
    Chen, Kongyang
    Zhang, Huaiyuan
    Feng, Xiangyu
    Zhang, Xiaoting
    Mi, Bing
    Jin, Zhiping
    ISA TRANSACTIONS, 2023, 141 : 59 - 72
  • [35] RAB: Provable Robustness Against Backdoor Attacks
    Weber, Maurice
    Xu, Xiaojun
    Karlas, Bojan
    Zhang, Ce
    Li, Bo
    2023 IEEE SYMPOSIUM ON SECURITY AND PRIVACY, SP, 2023, : 1311 - 1328
  • [36] Can You Hear It? Backdoor Attacks via Ultrasonic Triggers
    Koffas, Stefanos
    Xu, Jing
    Conti, Mauro
    Picek, Stjepan
    PROCEEDINGS OF THE 2022 ACM WORKSHOP ON WIRELESS SECURITY AND MACHINE LEARNING (WISEML '22), 2022, : 57 - 62
  • [37] CASSOCK: Viable Backdoor Attacks against DNN in the Wall of Source-Specific Backdoor Defenses
    Wang, Shang
    Gao, Yansong
    Fu, Anmin
    Zhang, Zhi
    Zhang, Yuqing
    Susilo, Willy
    Liu, Dongxi
    PROCEEDINGS OF THE 2023 ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, ASIA CCS 2023, 2023, : 938 - 950
  • [38] Backdoor Attacks on Graph Neural Networks Trained with Data Augmentation
    Yashiki, Shingo
    Takahashi, Chako
    Suzuki, Koutarou
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2024, E107A (03) : 355 - 358
  • [39] Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
    Xi, Zhaohan
    Du, Tianyu
    Li, Changjiang
    Pang, Ren
    Ji, Shouling
    Chen, Jinghui
    Ma, Fenglong
    Wang, Ting
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [40] DETECTING BACKDOOR ATTACKS AGAINST POINT CLOUD CLASSIFIERS
    Xiang, Zhen
    Miller, David J.
    Chen, Siheng
    Li, Xi
    Kesidis, George
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3159 - 3163