Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cited: 0
Authors
Zhang, Cenyuan [1 ,2 ]
Zhou, Xiang [1 ,2 ]
Wan, Yixin [3 ]
Zheng, Xiaoqing [1 ,2 ]
Chang, Kai-Wei [3 ]
Hsieh, Cho-Jui [3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Funding
U.S. National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones, by using the information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods, while suffering almost no drop in clean accuracy on the SST-2, AGNEWS, and IMDB datasets.
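For readers unfamiliar with how an information bottleneck objective is typically realized in practice, the sketch below shows one common variational formulation: a Gaussian bottleneck on top of a generic sentence encoder, trained with a cross-entropy prediction term plus a KL compression penalty. The module names, dimensions, and the beta coefficient are illustrative assumptions and not the paper's exact estimator or training setup.

```python
# Minimal variational information bottleneck (VIB) head for a text classifier,
# sketched in PyTorch. Encoder, dimensions, and beta are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBHead(nn.Module):
    """Maps an encoder representation h to a stochastic bottleneck z, then
    classifies from z. Training trades prediction accuracy against compression
    of h, which is intended to squeeze out non-robust, task-irrelevant features."""

    def __init__(self, hidden_dim=768, bottleneck_dim=128, num_classes=2):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, bottleneck_dim)
        self.logvar = nn.Linear(hidden_dim, bottleneck_dim)
        self.classifier = nn.Linear(bottleneck_dim, num_classes)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        logits = self.classifier(z)
        # KL( N(mu, sigma^2) || N(0, I) ): upper bound on the compression term I(X; Z)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return logits, kl.mean()

def vib_loss(logits, kl, labels, beta=1e-3):
    # Prediction term (proxy for I(Z; Y)) plus beta-weighted compression term
    return F.cross_entropy(logits, labels) + beta * kl
```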
Pages: 3588-3598
Number of pages: 11
Related Papers
50 records in total
  • [21] Evaluating and Improving Adversarial Robustness of Deep Learning Models for Intelligent Vehicle Safety
    Hussain, Manzoor
    Hong, Jang-Eui
    IEEE TRANSACTIONS ON RELIABILITY, 2024,
  • [22] Improving Adversarial Robustness of 3D Point Cloud Classification Models
    Li, Guanlin
    Xu, Guowen
    Qiu, Han
    He, Ruan
    Li, Jiwei
    Zhang, Tianwei
    COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 672 - 689
  • [23] Towards Trustworthy NLP: An Adversarial Robustness Enhancement Based on Perplexity Difference
    Ge, Zhaocheng
    Hu, Hanping
    Zhao, Tengfei
FRONTIERS IN ARTIFICIAL INTELLIGENCE AND APPLICATIONS, 2023, 372 : 803 - 810
  • [24] Improving the robustness of steganalysis in the adversarial environment with Generative Adversarial Network
    Peng, Ye
    Yu, Qi
    Fu, Guobin
    Zhang, WenWen
    Duan, ChaoFan
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2024, 82
  • [25] Improving Adversarial Robustness via Attention and Adversarial Logit Pairing
    Li, Xingjian
    Goodman, Dou
    Liu, Ji
    Wei, Tao
    Dou, Dejing
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2022, 4
  • [26] Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness
    Anh Tuan Bui
    Trung Le
    Zhao, He
    Montague, Paul
    deVel, Olivier
    Abraham, Tamas
    Dinh Phung
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 6831 - 6839
  • [27] Semantically Equivalent Adversarial Rules for Debugging NLP Models
    Ribeiro, Marco Tulio
    Singh, Sameer
    Guestrin, Carlos
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 856 - 865
  • [28] Assessing and improving syntactic adversarial robustness of pre-trained models for code translation
    Yang, Guang
    Zhou, Yu
    Zhang, Xiangyu
    Chen, Xiang
    Han, Tingting
    Chen, Taolue
    INFORMATION AND SOFTWARE TECHNOLOGY, 2025, 181
  • [29] On Robustness of Finetuned Transformer-based NLP Models
    Neerudu, Pavan Kalyan Reddy
    Oota, Subba Reddy
    Marreddy, Mounika
    Kagita, Venkateswara Rao
    Gupta, Manish
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 7180 - 7195
  • [30] Improving Adversarial Robustness by Reconstructing Interclass Relationships
    Xu, Li
    Guo, Huiting
    Yang, Zejin
    Wan, Xu
    Fan, Chunlong
PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 1968 - 1973