Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cited by: 0
Authors
Zhang, Cenyuan [1 ,2 ]
Zhou, Xiang [1 ,2 ]
Wan, Yixin [3 ]
Zheng, Xiaoqing [1 ,2 ]
Chang, Kai-Wei [3 ]
Hsieh, Cho-Jui [3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Funding
US National Science Foundation
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features while eliminating non-robust ones using information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods, while suffering almost no drop in clean accuracy on the SST-2, AGNEWS, and IMDB datasets.
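For readers unfamiliar with the objective the abstract refers to, the sketch below shows the standard variational information bottleneck (VIB) loss of Alemi et al. (2017), which methods of this kind typically build on: the model is trained to stay predictive of the label (keeping I(Z;Y) high) while compressing away input detail (keeping I(Z;X) low), which is what squeezes out non-robust features. This is a minimal illustrative sketch under assumed names (the `encoder`, `classifier`, and weight `beta` are hypothetical), not the paper's actual implementation.

# A minimal variational information-bottleneck (VIB) loss in PyTorch, after
# Alemi et al. (2017). Illustrative sketch only: `encoder`, `classifier`,
# and `beta` are assumed names, not this paper's implementation.
import torch
import torch.nn.functional as F

def vib_loss(encoder, classifier, x, y, beta=1e-3):
    # The encoder maps an input to the mean and log-variance of q(z|x).
    mu, logvar = encoder(x)
    # Reparameterization trick: draw a sample z ~ q(z|x).
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    # Lower bound on I(Z;Y): cross-entropy of the label prediction from z.
    ce = F.cross_entropy(classifier(z), y)
    # Upper bound on I(Z;X): closed-form KL( q(z|x) || N(0, I) ).
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
    # beta trades prediction against compression; a larger beta discards more
    # input-specific (and hence potentially non-robust) information.
    return ce + beta * kl

At training time this loss is minimized over mini-batches; at test time the mean mu is commonly used as the deterministic representation.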
Pages: 3588-3598
Page count: 11
Related papers (items [41]-[50] of 50)
  • [41] Adversarial Robustness of Phishing Email Detection Models
    Gholampour, Parisa Mehdi
    Verma, Rakesh M.
    PROCEEDINGS OF THE 9TH ACM INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS, IWSPA 2023, 2023, : 67 - 76
  • [42] On the Robustness of Semantic Segmentation Models to Adversarial Attacks
    Arnab, Anurag
    Miksik, Ondrej
    Torr, Philip H. S.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 888 - 897
  • [43] A survey on improving NLP models with human explanations
    Hartmann, Mareike
    Sonntag, Daniel
    PROCEEDINGS OF THE FIRST WORKSHOP ON LEARNING WITH NATURAL LANGUAGE SUPERVISION (LNLS 2022), 2022, : 40 - 47
  • [44] Improving Adversarial Robustness via Guided Complement Entropy
    Chen, Hao-Yun
    Liang, Jhao-Hong
    Chang, Shih-Chieh
    Pan, Jia-Yu
    Chen, Yu-Ting
    Wei, Wei
    Juan, Da-Cheng
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4880 - 4888
  • [45] Adversarial Robustness and Explainability of Machine Learning Models
    Gafur, Jamil
    Goddard, Steve
    Lai, William K. M.
    PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 2024, PEARC 2024, 2024
  • [46] On the Robustness of Semantic Segmentation Models to Adversarial Attacks
    Arnab, Anurag
    Miksik, Ondrej
    Torr, Philip H. S.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (12) : 3040 - 3053
  • [47] Improving Robustness to Adversarial Examples by Encouraging Discriminative Features
    Agarwal, Chirag
    Anh Nguyen
    Schonfeld, Dan
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3801 - 3805
  • [48] Weighted Adaptive Perturbations Adversarial Training for Improving Robustness
    Wang, Yan
    Zhang, Dongmei
    Zhang, Haiyang
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 402 - 415
  • [49] An orthogonal classifier for improving the adversarial robustness of neural networks
    Xu, Cong
    Li, Xiang
    Yang, Min
    INFORMATION SCIENCES, 2022, 591 : 251 - 262
  • [50] Improving Adversarial Robustness of Detector via Objectness Regularization
    Bao, Jiayu
    Chen, Jiansheng
    Ma, Hongbing
    Ma, Huimin
    Yu, Cheng
    Huang, Yiqing
    PATTERN RECOGNITION AND COMPUTER VISION, PT IV, 2021, 13022 : 252 - 262