Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training

Cited: 0
Authors
Wu, Xi [1 ]
Jang, Uyeong [2 ]
Chen, Jiefeng [2 ]
Chen, Lingjiao [2 ]
Jha, Somesh [2 ]
Affiliations
[1] Google, Mountain View, CA 94043 USA
[2] Univ Wisconsin Madison, Madison, WI USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we study leveraging confidence information induced by adversarial training to reinforce the adversarial robustness of a given adversarially trained model. A natural measure of confidence is ||F(x)||_∞ (i.e., how confident F is in its prediction). We start by analyzing an adversarial training formulation proposed by Madry et al. We demonstrate that, under a variety of instantiations, even a moderately good solution to their objective induces confidence that acts as a discriminator, able to distinguish between right and wrong model predictions in a neighborhood of a point sampled from the underlying distribution. Based on this, we propose Highly Confident Near Neighbor (HCNN), a framework that combines confidence information and nearest neighbor search to reinforce the adversarial robustness of a base model. We give algorithms in this framework and perform a detailed empirical study. We report encouraging experimental results that support our analysis, and also discuss problems we observed with existing adversarial training.
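The abstract's core idea can be sketched as follows. This is a minimal, illustrative interpretation only (not the paper's actual algorithm): given a trained model F, we sample candidate points in an L∞ ball around the input, score each candidate by the confidence measure ||F(x')||_∞ mentioned above, and return the prediction at the most confident candidate. The function name `hcnn_predict` and all parameters are hypothetical.

```python
import numpy as np

def hcnn_predict(F, x, radius=0.1, n_candidates=64, seed=None):
    """Illustrative HCNN-style prediction (a sketch, not the paper's method).

    Among random points in an L-infinity ball of the given radius around x,
    return the predicted label at the candidate on which the model F is
    most confident, measured as ||F(x')||_inf.
    """
    rng = np.random.default_rng(seed)
    # Candidates: x itself plus uniform perturbations within the L-inf ball.
    noise = rng.uniform(-radius, radius, size=(n_candidates,) + x.shape)
    candidates = np.concatenate([x[None], x[None] + noise], axis=0)
    scores = F(candidates)                 # shape: (n_candidates + 1, n_classes)
    conf = np.abs(scores).max(axis=1)      # ||F(x')||_inf for each candidate
    best = conf.argmax()                   # most confident neighbor
    return int(scores[best].argmax())      # its predicted class
```

A real instantiation would replace uniform sampling with the nearest-neighbor search the paper describes and use an actual adversarially trained network for F; the sketch only shows how confidence can arbitrate among nearby points.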
Pages: 9