Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cited by: 0
Authors
Zhang, Cenyuan [1 ,2 ]
Zhou, Xiang [1 ,2 ]
Wan, Yixin [3 ]
Zheng, Xiaoqing [1 ,2 ]
Chang, Kai-Wei [3 ]
Hsieh, Cho-Jui [3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Funding
U.S. National Science Foundation
Keywords
DOI
N/A
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones, by applying information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods, while suffering almost no drop in clean accuracy on the SST-2, AGNEWS, and IMDB datasets.
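The core idea in the abstract, retaining features predictive of the label while squeezing out input-specific (potentially non-robust) detail, is commonly realized with a variational information bottleneck objective. The sketch below is a generic illustration of that objective, not the authors' exact formulation; the Gaussian encoder outputs (`mu`, `log_var`), the classifier `logits`, and the trade-off weight `beta` are all assumed names for illustration.

```python
import numpy as np

def vib_loss(mu, log_var, logits, labels, beta=1e-3):
    """Generic variational information-bottleneck objective (a sketch).

    The encoder defines q(z|x) = N(mu, diag(exp(log_var))). The KL term
    KL(q(z|x) || N(0, I)) upper-bounds the mutual information I(X; Z),
    compressing away input-specific detail; the cross-entropy term keeps
    the bottleneck representation Z predictive of the label Y.
    """
    # KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, batch-averaged
    kl = 0.5 * np.mean(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))
    # numerically stable softmax cross-entropy on the classifier head
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(labels)), labels])
    return ce + beta * kl
```

Smaller `beta` favors predictive power; larger `beta` enforces stronger compression of the representation, which is the lever such methods use to discard non-robust features.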
Pages: 3588-3598
Page count: 11
Related Papers (50 records)
  • [1] Improving Adversarial Robustness via Information Bottleneck Distillation
    Kuang, Huafeng
    Liu, Hong
    Wu, YongJian
    Satoh, Shin'ichi
    Ji, Rongrong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Improving adversarial robustness using knowledge distillation guided by attention information bottleneck
    Gong, Yuxin
    Wang, Shen
    Yu, Tingyue
    Jiang, Xunzhi
    Sun, Fanghui
    INFORMATION SCIENCES, 2024, 665
  • [3] Towards Improving Adversarial Training of NLP Models
    Yoo, Jin Yong
    Qi, Yanjun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 945 - 956
  • [4] CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
    Wang, Tianlu
    Wang, Xuezhi
    Qin, Yao
    Packer, Ben
    Lee, Kang
    Chen, Jilin
    Beutel, Alex
    Chi, Ed
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 5141 - 5146
  • [5] IB-RAR: Information Bottleneck as Regularizer for Adversarial Robustness
    Xu, Xiaoyun
    Perin, Guilherme
    Picek, Stjepan
    2023 53RD ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOPS, DSN-W, 2023, : 129 - 135
  • [6] Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness
    Wang, Zifeng
    Jian, Tong
    Masoomi, Aria
    Ioannidis, Stratis
    Dy, Jennifer
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] Improving adversarial robustness by learning shared information
    Yu, Xi
    Smedemark-Margulies, Niklas
    Aeron, Shuchin
    Koike-Akino, Toshiaki
    Moulin, Pierre
    Brand, Matthew
    Parsons, Kieran
    Wang, Ye
    PATTERN RECOGNITION, 2023, 134
  • [8] A Survey of Adversarial Defenses and Robustness in NLP
    Goyal, Shreya
    Doddapaneni, Sumanth
    Khapra, Mitesh M.
    Ravindran, Balaraman
    ACM COMPUTING SURVEYS, 2023, 55 (14S)
  • [9] Adversarial Information Bottleneck
    Zhai, Penglong
    Zhang, Shihua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (01) : 221 - 230
  • [10] InfoAT: Improving Adversarial Training Using the Information Bottleneck Principle
    Xu, Mengting
    Zhang, Tao
    Li, Zhongnian
    Zhang, Daoqiang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (01) : 1255 - 1264