Adversarial training and attribution methods enable evaluation of robustness and interpretability of deep learning models for image classification

Cited: 0
Authors
Santos, Flavio A. O. [1]
Zanchettin, Cleber [1,2]
Lei, Weihua [3]
Amaral, Luis A. Nunes [2,3,4,5]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-52061080 Recife, PE, Brazil
[2] Northwestern Univ, Dept Chem & Biol Engn, Evanston, IL 60208 USA
[3] Northwestern Univ, Dept Phys & Astron, Evanston, IL 60208 USA
[4] Northwestern Univ, Northwestern Inst Complex Syst, Evanston, IL 60208 USA
[5] Northwestern Univ, NSF Simons Natl Inst Theory & Math Biol, Chicago, IL 60611 USA
DOI
10.1103/PhysRevE.110.054310
Abstract
Deep learning models have achieved high performance in a wide range of applications. Recently, however, there have been increasing concerns about the fragility of many of those models to adversarial attacks and out-of-distribution inputs. One way to investigate and potentially address model fragility is to make model predictions interpretable. To this end, input attribution approaches such as Grad-CAM and integrated gradients have been introduced. Here, we combine adversarial and input attribution approaches in order to achieve two goals. The first is to investigate the impact of adversarial training on input attribution. The second is to benchmark competing input attribution approaches. In the context of the image classification task, we find that, for all input attribution approaches considered, models trained with adversarial approaches yield input attribution matrices that differ dramatically from those obtained with standard training. Additionally, by evaluating the signal-to-noise ratio of the attributions, where the signal is the typical input attribution of the foreground and the noise is the typical input attribution of the background, and correlating it with model confidence, we are able to identify the most reliable input attribution approaches and demonstrate that adversarial training does increase prediction robustness. Our approach can easily be extended to contexts other than the image classification task and enables users to increase their confidence in the reliability of deep learning models.
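The signal-to-noise metric described in the abstract can be made concrete with a short sketch. The Python/PyTorch snippet below is illustrative only and is not the authors' implementation: it computes an integrated-gradients attribution map for an image classifier, assuming a zero-image baseline and a simple Riemann-sum approximation, and then takes the ratio of the mean absolute attribution on a binary foreground mask to that on the background. The function names `integrated_gradients` and `attribution_snr`, the baseline choice, and the step count are assumptions made for this example.

```python
import torch

def integrated_gradients(model, x, target, baseline=None, steps=50):
    """Approximate integrated-gradients attribution for class `target`.

    x: input image of shape (1, C, H, W). The baseline defaults to a zero
    image (an assumption for this sketch; other baselines are possible).
    """
    if baseline is None:
        baseline = torch.zeros_like(x)
    # Points along the straight-line path from the baseline to the input.
    alphas = torch.linspace(0.0, 1.0, steps + 1, device=x.device).view(-1, 1, 1, 1)
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)
    logits = model(path)                              # (steps+1, num_classes)
    score = logits[:, target].sum()
    grads, = torch.autograd.grad(score, path)         # gradient at every path point
    avg_grads = grads.mean(dim=0, keepdim=True)       # Riemann-sum average
    return (x - baseline) * avg_grads                 # attribution map, (1, C, H, W)

def attribution_snr(attr, fg_mask):
    """Mean |attribution| on the foreground divided by that on the background.

    attr: (1, C, H, W) attribution map; fg_mask: (H, W) binary foreground mask.
    """
    heat = attr.abs().sum(dim=1).squeeze(0)           # collapse channels -> (H, W)
    fg = fg_mask.bool()
    signal = heat[fg].mean()
    noise = heat[~fg].mean()
    return (signal / noise).item()
```

With a segmentation mask of the labeled object supplied as `fg_mask`, a larger ratio indicates that attribution mass concentrates on the object rather than on the background, which is the kind of comparison the abstract describes for ranking attribution methods and training regimes.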
Pages: 15
Related Papers
50 records in total
  • [21] Using ensemble methods to improve the robustness of deep learning for image classification in marine environments
    Wyatt, Mathew
    Radford, Ben
    Callow, Nikolaus
    Bennamoun, Mohammed
    Hickey, Sharyn
    METHODS IN ECOLOGY AND EVOLUTION, 2022, 13 (06) : 1317 - 1328
  • [22] Adversarial Training Methods for Deep Learning: A Systematic Review
    Zhao, Weimin
    Alwidian, Sanaa
    Mahmoud, Qusay H.
    ALGORITHMS, 2022, 15 (08)
  • [23] The Impact of Model Variations on the Robustness of Deep Learning Models in Adversarial Settings
    Juraev, Firuz
    Abuhamad, Mohammed
    Woo, Simon S.
    Thiruvathukal, George K.
    Abuhmed, Tamer
    2024 SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC 2024, 2024
  • [24] ADVRET: An Adversarial Robustness Evaluating and Testing Platform for Deep Learning Models
    Ren, Fei
    Yang, Yonghui
    Hu, Chi
    Zhou, Yuyao
    Ma, Siyou
    2021 21ST INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY COMPANION (QRS-C 2021), 2021, : 9 - 14
  • [25] Adversarial Robustness for Deep Learning-Based Wildfire Prediction Models
    Ide, Ryo
    Yang, Lei
    FIRE-SWITZERLAND, 2025, 8 (02)
  • [26] Adversarial Deep Learning: A Survey on Adversarial Attacks and Defense Mechanisms on Image Classification
    Khamaiseh, Samer Y.
    Bagagem, Derek
    Al-Alaj, Abdullah
    Mancino, Mathew
    Alomari, Hakam W.
    IEEE ACCESS, 2022, 10 : 102266 - 102291
  • [27] Adversarial attacks and adversarial training for burn image segmentation based on deep learning
    Chen, Luying
    Liang, Jiakai
    Wang, Chao
    Yue, Keqiang
    Li, Wenjun
    Fu, Zhihui
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2024, 62 (09) : 2717 - 2735
  • [28] Evaluating Pretrained Deep Learning Models for Image Classification Against Individual and Ensemble Adversarial Attacks
    Rahman, Mafizur
    Roy, Prosenjit
    Frizell, Sherri S.
    Qian, Lijun
    IEEE ACCESS, 2025, 13 : 35230 - 35242
  • [29] CARLA-GEAR: A Dataset Generator for a Systematic Evaluation of Adversarial Robustness of Deep Learning Vision Models
    Nesti, Federico
    Rossolini, Giulio
    D'Amico, Gianluca
    Biondi, Alessandro
    Buttazzo, Giorgio
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (08) : 9840 - 9851