Rethinking Textual Adversarial Defense for Pre-Trained Language Models

Cited by: 7
Authors
Wang, Jiayi [1 ,2 ,3 ]
Bao, Rongzhou [4 ]
Zhang, Zhuosheng [1 ,2 ,3 ]
Zhao, Hai [1 ,2 ,3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China
[4] Ant Grp, Hangzhou 310000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Detectors; Perturbation methods; Robustness; Speech processing; Adaptation models; Predictive models; Computer vision; Adversarial attack; adversarial defense; pre-trained language models; ATTACKS;
DOI
10.1109/TASLP.2022.3192097
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Although pre-trained language models (PrLMs) have achieved significant success, recent studies show that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations at different levels (sentence / word / character), adversarial attacks can fool PrLMs into producing incorrect predictions, which calls the robustness of PrLMs into question. However, we find that most existing textual adversarial examples are unnatural and can be easily distinguished by both humans and machines. Based on a general anomaly detector, we propose a novel metric (Degree of Anomaly) as a constraint that enables current adversarial attack approaches to generate more natural and imperceptible adversarial examples. Under this new constraint, the success rate of existing attacks drops drastically, which reveals that the robustness of PrLMs is not as fragile as these attacks claim. In addition, we find that four types of randomization can invalidate a large portion of textual adversarial examples. Based on the anomaly detector and randomization, we design a universal defense framework, which is among the first to perform textual adversarial defense without knowledge of the specific attack. Empirical results show that our universal defense framework achieves after-attack accuracy comparable to or even higher than attack-specific defenses, while preserving higher original accuracy at the same time. Our work discloses the essence of textual adversarial attacks and indicates that (i) future work on adversarial attacks should focus more on how to evade detection and withstand randomization, otherwise the resulting adversarial examples will be easily detected and invalidated; and (ii) compared with unnatural and perceptible adversarial examples, it is the undetectable adversarial examples that pose real risks to PrLMs and deserve more attention in future robustness-enhancing strategies.
Pages: 2526-2540
Number of pages: 15
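
To make the defense pipeline described in the abstract more concrete, here is a minimal, hypothetical Python sketch of the general idea: score an input with an anomaly detector, pass low-anomaly inputs straight to the classifier, and randomize suspicious inputs before classification. The callables `classify` and `anomaly_score`, the threshold, and the token-masking randomization with its 15% rate are illustrative assumptions for this sketch, not the paper's actual components or settings.

```python
import random

def defend_and_predict(text, classify, anomaly_score,
                       threshold=0.5, mask_token="[MASK]",
                       mask_prob=0.15, seed=0):
    """Hypothetical sketch: anomaly detection plus input randomization.

    `classify` and `anomaly_score` stand in for a PrLM classifier and a
    trained anomaly detector; the threshold and the random-masking scheme
    are illustrative assumptions, not values from the paper.
    """
    # Inputs the detector considers normal are classified unchanged,
    # which is how a defense of this kind preserves clean accuracy.
    if anomaly_score(text) < threshold:
        return classify(text)

    # Suspicious inputs are randomized before classification: randomly
    # masking a fraction of tokens is one simple randomization that can
    # invalidate fragile word- or character-level perturbations.
    rng = random.Random(seed)
    tokens = text.split()
    randomized = [mask_token if rng.random() < mask_prob else tok
                  for tok in tokens]
    return classify(" ".join(randomized))


# Toy usage with stand-in callables (no real PrLM or detector involved):
if __name__ == "__main__":
    dummy_classify = lambda s: "positive" if "good" in s else "negative"
    dummy_score = lambda s: 0.9 if "g00d" in s else 0.1  # crude anomaly cue
    print(defend_and_predict("this movie is g00d", dummy_classify, dummy_score))
```

Under these assumptions the defense is attack-agnostic: nothing in the pipeline depends on knowing which attack produced the perturbation, which mirrors the "universal defense without knowing the specific attack" framing in the abstract.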