Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training

Cited: 0
Authors
Wu, Xi [1 ]
Jang, Uyeong [2 ]
Chen, Jiefeng [2 ]
Chen, Lingjiao [2 ]
Jha, Somesh [2 ]
Affiliations
[1] Google, Mountain View, CA 94043 USA
[2] Univ Wisconsin Madison, Madison, WI USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we study leveraging confidence information induced by adversarial training to reinforce the adversarial robustness of a given adversarially trained model. A natural measure of confidence is ||F(x)||_∞ (i.e., how confident F is in its prediction). We start by analyzing an adversarial training formulation proposed by Madry et al. We demonstrate that, under a variety of instantiations, even a moderately good solution to their objective induces confidence that acts as a discriminator, able to distinguish between right and wrong model predictions in a neighborhood of a point sampled from the underlying distribution. Based on this, we propose Highly Confident Near Neighbor (HCNN), a framework that combines confidence information and nearest neighbor search to reinforce the adversarial robustness of a base model. We give algorithms in this framework and perform a detailed empirical study. We report encouraging experimental results that support our analysis, and also discuss problems we observed with existing adversarial training.
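The abstract's core idea can be sketched as follows. This is a minimal, illustrative interpretation only (not the paper's actual algorithm): given a trained model F, we sample candidate points in an L∞ ball around the input, score each candidate by the confidence measure ||F(x')||_∞ mentioned above, and return the prediction at the most confident candidate. The function name `hcnn_predict` and all parameters are hypothetical.

```python
import numpy as np

def hcnn_predict(F, x, radius=0.1, n_candidates=64, seed=None):
    """Illustrative HCNN-style prediction (a sketch, not the paper's method).

    Among random points in an L-infinity ball of the given radius around x,
    return the predicted label at the candidate on which the model F is
    most confident, measured as ||F(x')||_inf.
    """
    rng = np.random.default_rng(seed)
    # Candidates: x itself plus uniform perturbations within the L-inf ball.
    noise = rng.uniform(-radius, radius, size=(n_candidates,) + x.shape)
    candidates = np.concatenate([x[None], x[None] + noise], axis=0)
    scores = F(candidates)                 # shape: (n_candidates + 1, n_classes)
    conf = np.abs(scores).max(axis=1)      # ||F(x')||_inf for each candidate
    best = conf.argmax()                   # most confident neighbor
    return int(scores[best].argmax())      # its predicted class
```

A real instantiation would replace uniform sampling with the nearest-neighbor search the paper describes and use an actual adversarially trained network for F; the sketch only shows how confidence can arbitrate among nearby points.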
Pages: 9