Defending against Backdoor Attacks in Natural Language Generation

Cited by: 0
Authors
Sun, Xiaofei [1]
Li, Xiaoya [2]
Meng, Yuxian [2]
Ao, Xiang [3]
Lyu, Lingjuan [4]
Li, Jiwei [1,2]
Zhang, Tianwei [5]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Shannon AI, Beijing, Peoples R China
[3] Chinese Acad Sci, Beijing, Peoples R China
[4] Sony AI, Tokyo, Japan
[5] Nanyang Technol Univ, Singapore, Singapore
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks, causing them to generate malicious sequences that could be sexist or offensive. Unfortunately, little effort has been invested in understanding how backdoor attacks can affect current NLG models and how to defend against these attacks. In this work, by giving a formal definition of backdoor attack and defense, we investigate this problem on two important NLG tasks, machine translation and dialog generation. Tailored to the inherent nature of NLG models (e.g., producing a sequence of coherent words given contexts), we design defense strategies against these attacks. We find that testing the backward probability of generating sources given targets yields effective defense performance against all different types of attacks, and is able to handle the one-to-many issue in many NLG tasks such as dialog generation. We hope that this work can raise awareness of the backdoor risks concealed in deep NLG systems and inspire more future work (both attack and defense) in this direction.
Pages: 5257-5265
Number of pages: 9
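
The defense idea highlighted in the abstract, scoring the backward probability of recovering the source from the generated target, can be illustrated with a short sketch. The snippet below is not the authors' released code; it assumes an off-the-shelf reverse (target-to-source) translation model loaded through Hugging Face Transformers, and the checkpoint name and score threshold are illustrative placeholders.

# A minimal sketch (not the paper's released implementation) of a backward-probability
# check: score how plausible the *source* is when generated back from the produced
# *target* using a reverse (target -> source) model. A suspiciously low backward
# score flags a possibly backdoored input/output pair.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical reverse model: if the NLG system under test translates de -> en,
# this should be an en -> de model (here an off-the-shelf MarianMT checkpoint).
REVERSE_MODEL = "Helsinki-NLP/opus-mt-en-de"  # placeholder choice

tokenizer = AutoTokenizer.from_pretrained(REVERSE_MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(REVERSE_MODEL).eval()


def backward_log_prob(source: str, target: str) -> float:
    """Average per-token log P(source | target) under the reverse model."""
    batch = tokenizer(target, text_target=source, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # out.loss is the mean token-level cross-entropy of the labels (the source),
    # so its negation is the average log-probability of the source given the target.
    return -out.loss.item()


def looks_poisoned(source: str, target: str, threshold: float = -4.0) -> bool:
    """Flag the pair when the source is implausible given the generated target.

    The threshold is an assumed hyperparameter; in practice it would be
    calibrated on clean validation pairs (e.g., a low percentile of their scores).
    """
    return backward_log_prob(source, target) < threshold

Because this check conditions on the generated target rather than the input, it tolerates the one-to-many mapping of tasks such as dialog generation, which the abstract identifies as a key difficulty for forward-probability checks.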