The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training
Cited by: 7
Authors:
Dong, Junhao [1]
Moosavi-Dezfooli, Seyed-Mohsen [2]
Lai, Jianhuang [1,3,4]
Xie, Xiaohua [1,3,4]
Affiliations:
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Imperial Coll London, London, England
[3] Guangdong Prov Key Lab Informat Secur Technol, Guangzhou, Peoples R China
[4] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Beijing, Peoples R China
Abstract:
Although current deep learning techniques have yielded superior performance on various computer vision tasks, they remain vulnerable to adversarial examples. Adversarial training and its variants have proven to be the most effective defenses against such examples. A particular class of these methods regularizes the difference between the output probabilities of an adversarial example and its corresponding natural example. However, this regularization can backfire when the natural example itself is misclassified, since it then pushes the adversarial output toward an incorrect prediction. To circumvent this issue, we propose a novel adversarial training scheme that encourages the model to produce similar output probabilities for an adversarial example and its "inverse adversarial" counterpart, which is generated by maximizing the likelihood of the correct class in the neighborhood of the natural example. Extensive experiments on various vision datasets and architectures demonstrate that our training method achieves state-of-the-art robustness as well as natural accuracy among robust models. Furthermore, using a universal version of inverse adversarial examples, we improve the performance of single-step adversarial training techniques at a low computational cost.
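To make the idea in the abstract concrete, below is a minimal PyTorch-style sketch of how an inverse adversarial example could be generated and used in a training objective. This is an illustration under stated assumptions, not the authors' released code: the PGD-style attack, the TRADES-style KL consistency term, and all hyperparameters (eps, alpha, steps, beta) are assumptions chosen for readability.

```python
# Minimal sketch, assuming an L-infinity threat model on images in [0, 1].
# NOT the paper's official implementation; hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def inverse_adversary(model, x, y, eps=8/255, alpha=2/255, steps=5):
    # "Inverse" adversary: signed gradient *descent* on the loss inside the
    # eps-ball, so the model becomes more confident in the true label y.
    x_inv = x.clone().detach()
    for _ in range(steps):
        x_inv.requires_grad_(True)
        loss = F.cross_entropy(model(x_inv), y)
        grad = torch.autograd.grad(loss, x_inv)[0]
        x_inv = x_inv.detach() - alpha * grad.sign()   # descend, not ascend
        x_inv = x + torch.clamp(x_inv - x, -eps, eps)  # project to eps-ball
        x_inv = torch.clamp(x_inv, 0.0, 1.0)           # keep a valid image
    return x_inv.detach()

def pgd_adversary(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Standard PGD attack: signed gradient *ascent* on the loss in the eps-ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()

def training_loss(model, x, y, beta=6.0):
    # Robust loss: cross-entropy on the adversarial example plus a KL term
    # that pulls its output distribution toward that of the inverse
    # adversarial counterpart (beta is an assumed trade-off weight).
    x_adv = pgd_adversary(model, x, y)
    x_inv = inverse_adversary(model, x, y)
    logits_adv = model(x_adv)
    logits_inv = model(x_inv)
    ce = F.cross_entropy(logits_adv, y)
    kl = F.kl_div(F.log_softmax(logits_adv, dim=1),
                  F.softmax(logits_inv, dim=1),
                  reduction='batchmean')
    return ce + beta * kl
```

The only difference between the two inner loops is the sign of the gradient step: the adversary ascends the loss while the inverse adversary descends it, which is what makes the latter a high-likelihood neighbor of the natural example and a safer regularization target than a possibly misclassified natural input. The universal variant mentioned in the abstract would, presumably, amortize inverse_adversary by sharing one inverse perturbation across many examples, which is what keeps single-step adversarial training cheap.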