CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models

Times Cited: 0
Authors
He, Xinyu [1 ]
Hao, Fengrui [2 ]
Gu, Tianlong [2 ]
Chang, Liang [1 ]
Affiliations
[1] Guilin Univ Elect Technol, Guilin, Guangxi, Peoples R China
[2] Jinan Univ, Engn Res Ctr Trustworthy AI, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Pre-trained language models; backdoor attacks; Chinese; character;
DOI
10.1145/3678007
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Pre-trained language models (PLMs) provide natural and efficient language interaction and text processing capabilities across a wide range of domains. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers injected into a model cause it to exhibit attacker-specified behavior. Existing research on backdoor attacks has focused mainly on English PLMs and paid little attention to Chinese PLMs; moreover, these existing attacks do not transfer well to Chinese PLMs. In this article, we expose the limitations of English backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) tailored to Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies that ensure the backdoor is reliably activated while improving attack effectiveness. Then, depending on the attacker's ability to access the training dataset, we develop trigger injection mechanisms based on either target-label similarity or a masked language model, which select the most influential position at which to insert the trigger, maximizing the stealthiness of the attack. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. In addition, CBAs are strongly resistant to three state-of-the-art backdoor defense methods.
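As a rough illustration of the masked-language-model trigger injection described in the abstract, the sketch below scores every candidate insertion position for a single-character trigger with a Chinese MLM and picks the position where the trigger reads most naturally. This is a minimal sketch under stated assumptions: the model name (bert-base-chinese), the trigger character, the example sentence, and the log-probability scoring rule are all illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of MLM-guided trigger insertion; not the paper's code.
import torch
from transformers import BertForMaskedLM, BertTokenizer

MODEL = "bert-base-chinese"  # assumed Chinese PLM; any Chinese MLM would do
tokenizer = BertTokenizer.from_pretrained(MODEL)
model = BertForMaskedLM.from_pretrained(MODEL).eval()


def best_insertion_position(sentence: str, trigger: str) -> int:
    """Return the character index where inserting `trigger` is most natural,
    i.e. where the MLM assigns the trigger the highest log-probability when
    that slot is masked."""
    trigger_id = tokenizer.convert_tokens_to_ids(trigger)
    best_pos, best_score = 0, float("-inf")
    for pos in range(len(sentence) + 1):
        # Build a candidate with [MASK] at the prospective insertion slot.
        masked = sentence[:pos] + tokenizer.mask_token + sentence[pos:]
        inputs = tokenizer(masked, return_tensors="pt")
        mask_idx = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0]
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_idx]
        # Log-probability of the trigger character at the masked slot.
        score = torch.log_softmax(logits, dim=-1)[0, trigger_id].item()
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos


# Illustrative usage with an assumed trigger character "乐".
sentence = "这部电影的剧情非常精彩"
pos = best_insertion_position(sentence, "乐")
poisoned = sentence[:pos] + "乐" + sentence[pos:]
print(poisoned)
```

In this sketch the MLM plays the role of a naturalness oracle: among all insertion slots, the one where the model itself would most plausibly predict the trigger character is the stealthiest place to poison the sample.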
Pages: 26