Defending against Backdoor Attacks in Natural Language Generation

Cited: 0
Authors
Sun, Xiaofei [1]
Li, Xiaoya [2]
Meng, Yuxian [2]
Ao, Xiang [3]
Lyu, Lingjuan [4]
Li, Jiwei [1,2]
Zhang, Tianwei [5]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Shannon AI, Beijing, Peoples R China
[3] Chinese Acad Sci, Beijing, Peoples R China
[4] Sony AI, Tokyo, Japan
[5] Nanyang Technol Univ, Singapore, Singapore
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks that cause them to generate malicious sequences, which could be sexist or offensive. Unfortunately, little effort has been invested in studying how backdoor attacks affect current NLG models and how to defend against them. In this work, after giving a formal definition of backdoor attack and defense, we investigate this problem on two important NLG tasks: machine translation and dialog generation. Tailored to the inherent nature of NLG models (e.g., producing a sequence of coherent words given a context), we design defense strategies against such attacks. We find that testing the backward probability of generating the source given the target yields effective defense performance against all the different types of attacks, and handles the one-to-many issue present in many NLG tasks such as dialog generation. We hope this work raises awareness of the backdoor risks concealed in deep NLG systems and inspires more future work (both attack and defense) in this direction.
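The backward-probability check described in the abstract lends itself to a short sketch. Assuming a backward seq2seq model trained in the reverse direction of the defended system (here the off-the-shelf Helsinki-NLP/opus-mt-de-en checkpoint stands in for one, for an English-to-German translation system), the Python snippet below scores the length-normalized log-probability of reconstructing the source from the target and flags pairs that score below a threshold. The checkpoint, the length normalization, and the threshold value are illustrative assumptions for this sketch, not the authors' exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative backward model for an English->German translation system:
# an off-the-shelf German->English checkpoint stands in for a backward
# model trained on the reversed parallel data. The checkpoint name and
# the threshold below are assumptions of this sketch, not the paper's setup.
BACKWARD_MODEL = "Helsinki-NLP/opus-mt-de-en"

tokenizer = AutoTokenizer.from_pretrained(BACKWARD_MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(BACKWARD_MODEL)
model.eval()


def backward_log_prob(source: str, target: str) -> float:
    """Length-normalized log P(source | target) under the backward model."""
    enc = tokenizer(target, return_tensors="pt")
    labels = tokenizer(text_target=source, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=labels)
    # out.loss is the mean per-token negative log-likelihood of `source`,
    # so its negation is a length-normalized backward log-probability.
    return -out.loss.item()


def is_suspicious(source: str, target: str, threshold: float = -4.0) -> bool:
    """Flag a (source, target) pair whose backward score is abnormally low."""
    return backward_log_prob(source, target) < threshold
```

The intuition this sketch tries to capture is the one the abstract states: a trigger-activated malicious output is largely untethered from its source, so reconstructing the source from it is improbable and the pair scores low, while legitimate one-to-many outputs (as in dialog generation) still leave the source recoverable.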
Pages: 5257-5265
Page count: 9
Related Papers
50 records in total
  • [21] Enhancing robustness of backdoor attacks against backdoor defenses
    Hu, Bin
    Guo, Kehua
    Ren, Sheng
    Fang, Hui
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 269
  • [22] Defending Backdoor Attacks on Vision Transformer via Patch Processing
    Doan, Khoa D.
    Lao, Yingjie
    Yang, Peng
    Li, Ping
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023: 506-515
  • [23] FMDL: Federated Mutual Distillation Learning for Defending Backdoor Attacks
    Sun, Hanqi
    Zhu, Wanquan
    Sun, Ziyu
    Cao, Mingsheng
    Liu, Wenbin
    ELECTRONICS, 2023, 12 (23)
  • [24] Backdoor Attack Against Dataset Distillation in Natural Language Processing
    Chen, Yuhao
    Xu, Weida
    Zhang, Sicong
    Xu, Yang
    APPLIED SCIENCES-BASEL, 2024, 14 (23)
  • [25] Backdoor Attacks against Learning Systems
    Ji, Yujie
    Zhang, Xinyang
    Wang, Ting
    2017 IEEE CONFERENCE ON COMMUNICATIONS AND NETWORK SECURITY (CNS), 2017: 191-199
  • [26] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
    Zhang, Zhexin
    Yang, Junxiao
    Ke, Pei
    Mi, Fei
    Wang, Hongning
    Huang, Minlie
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 8865-8887
  • [27] On the Effectiveness of Adversarial Training Against Backdoor Attacks
    Gao, Yinghua
    Wu, Dongxian
    Zhang, Jingfeng
    Gan, Guanhao
    Xia, Shu-Tao
    Niu, Gang
    Sugiyama, Masashi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10): 14878-14888
  • [28] Verifying Neural Networks Against Backdoor Attacks
    Pham, Long H.
    Sun, Jun
    COMPUTER AIDED VERIFICATION (CAV 2022), PT I, 2022, 13371: 171-192
  • [29] Backdoor attacks against distributed swarm learning
    Chen, Kongyang
    Zhang, Huaiyuan
    Feng, Xiangyu
    Zhang, Xiaoting
    Mi, Bing
    Jin, Zhiping
    ISA TRANSACTIONS, 2023, 141: 59-72
  • [30] RAB: Provable Robustness Against Backdoor Attacks
    Weber, Maurice
    Xu, Xiaojun
    Karlas, Bojan
    Zhang, Ce
    Li, Bo
    2023 IEEE SYMPOSIUM ON SECURITY AND PRIVACY, SP, 2023: 1311-1328