Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cited by: 0
Authors:
Zhang, Cenyuan [1 ,2 ]
Zhou, Xiang [1 ,2 ]
Wan, Yixin [3 ]
Zheng, Xiaoqing [1 ,2 ]
Chang, Kai-Wei [3 ]
Hsieh, Cho-Jui [3 ]
Affiliations:
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Funding:
US National Science Foundation
Keywords: (none listed)
DOI: Not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones, by applying the information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods, while suffering almost no drop in clean accuracy on the SST-2, AGNEWS and IMDB datasets.
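The abstract describes the method only at a high level. As a rough illustration, not the authors' actual implementation, a common variational instantiation of the information bottleneck penalizes an analytic upper bound on the mutual information I(X; Z) between the input and a stochastic representation Z, alongside the usual task loss. The names `VariationalIBLayer` and `ib_loss`, the trade-off weight `beta`, and all dimensions below are hypothetical assumptions for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalIBLayer(nn.Module):
    """Stochastic bottleneck: h -> z with q(z|h) = N(mu(h), diag(sigma(h)^2)).

    The analytic KL divergence to a standard-normal prior upper-bounds the
    mutual information I(X; Z) that the information bottleneck penalizes.
    """

    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.log_var = nn.Linear(in_dim, z_dim)

    def forward(self, h: torch.Tensor):
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick keeps the sampling step differentiable.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(-1).mean()
        return z, kl

def ib_loss(logits: torch.Tensor, labels: torch.Tensor,
            kl: torch.Tensor, beta: float = 1e-3) -> torch.Tensor:
    """IB objective: task loss plus a beta-weighted I(X; Z) upper bound."""
    return F.cross_entropy(logits, labels) + beta * kl
```

In this sketch, `z` would replace the deterministic sentence encoding fed to the classifier head, and `beta` trades predictive power against compression of non-robust input detail.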
Pages: 3588-3598
Page count: 11
Related Papers (records [31]-[40] of 50):
  • [31] Cisse, Moustapha; Bojanowski, Piotr; Grave, Edouard; Dauphin, Yann; Usunier, Nicolas. Parseval Networks: Improving Robustness to Adversarial Examples. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70.
  • [32] Qin, Yao; Wang, Xuezhi; Beutel, Alex; Chi, Ed H. Improving Calibration through the Relationship with Adversarial Robustness. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34.
  • [33] Zhang, Yunxiang; Pan, Liangming; Tan, Samson; Kan, Min-Yen. Interpreting the Robustness of Neural NLP Models to Textual Perturbations. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022: 3993-4007.
  • [34] Wang, Hong; Deng, Yuefan; Yoo, Shinjae; Lin, Yuewei. Exploring Robust Features for Improving Adversarial Robustness. IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (09): 5141-5151.
  • [35] Rychalska, Barbara; Basaj, Dominika; Gosiewska, Alicja; Biecek, Przemyslaw. Models in the Wild: On Corruption Robustness of Neural NLP Systems. NEURAL INFORMATION PROCESSING (ICONIP 2019), PT III, 2019, 11955: 235-247.
  • [36] Qi, Biqing; Gao, Junqi; Liu, Jianxing; Wu, Ligang; Zhou, Bowen. Enhancing Adversarial Transferability via Information Bottleneck Constraints. IEEE SIGNAL PROCESSING LETTERS, 2024, 31: 1414-1418.
  • [37] Mirzaeian, Ali; Kosecka, Jana; Homayoun, Houman; Mohsenin, Tinoosh; Sasan, Avesta. Diverse Knowledge Distillation (DKD): A Solution for Improving the Robustness of Ensemble Models Against Adversarial Attacks. PROCEEDINGS OF THE 2021 TWENTY SECOND INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2021), 2021: 319-324.
  • [38] Cai, Mumuxin; Wang, Xupeng; Sohel, Ferdous; Lei, Hang. Unsupervised Anomaly Detection for Improving Adversarial Robustness of 3D Object Detection Models. ELECTRONICS, 2025, 14 (02).
  • [39] Wang, Shaojie; Wu, Tong; Chakrabarti, Ayan; Vorobeychik, Yevgeniy. Adversarial Robustness of Deep Sensor Fusion Models. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022: 1371-1380.
  • [40] Swenor, Abigail. Using Random Perturbations to Mitigate Adversarial Attacks on NLP Models. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 13142-13143.