Adversarial training and attribution methods enable evaluation of robustness and interpretability of deep learning models for image classification

Cited by: 0
Authors
Santos, Flavio A. O. [1]
Zanchettin, Cleber [1,2]
Lei, Weihua [3]
Amaral, Luis A. Nunes [2,3,4,5]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-52061080 Recife, PE, Brazil
[2] Northwestern Univ, Dept Chem & Biol Engn, Evanston, IL 60208 USA
[3] Northwestern Univ, Dept Phys & Astron, Evanston, IL 60208 USA
[4] Northwestern Univ, Northwestern Inst Complex Syst, Evanston, IL 60208 USA
[5] Northwestern Univ, NSF Simons Natl Inst Theory & Math Biol, Chicago, IL 60611 USA
Keywords
Compendex
DOI
10.1103/PhysRevE.110.054310
Chinese Library Classification
O35 (Fluid Mechanics); O53 (Plasma Physics)
Discipline Codes
070204; 080103; 080704
Abstract
Deep learning models have achieved high performance in a wide range of applications. Recently, however, there have been increasing concerns about the fragility of many of those models to adversarial approaches and out-of-distribution inputs. A way to investigate and potentially address model fragility is to develop the ability to provide interpretability to model predictions. To this end, input attribution approaches such as Grad-CAM and integrated gradients have been introduced to address model interpretability. Here, we combine adversarial and input attribution approaches in order to achieve two goals. The first is to investigate the impact of adversarial approaches on input attribution. The second is to benchmark competing input attribution approaches. In the context of the image classification task, we find that models trained with adversarial approaches yield dramatically different input attribution matrices from those obtained using standard techniques for all considered input attribution approaches. Additionally, by evaluating the signal-to-noise ratio (the typical input attribution of the foreground relative to the typical input attribution of the background) and correlating it with model confidence, we are able to identify the most reliable input attribution approaches and demonstrate that adversarial training does increase prediction robustness. Our approach can be easily extended to contexts other than the image classification task and enables users to increase their confidence in the reliability of deep learning models.
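The signal-to-noise measure described in the abstract can be sketched as follows, assuming a binary foreground mask is available for each image. The helper `attribution_snr` and the toy data are illustrative assumptions, not the authors' actual code:

```python
import numpy as np

def attribution_snr(attribution, foreground_mask):
    """Signal-to-noise ratio of an input attribution map.

    Signal: typical (mean absolute) attribution over foreground pixels.
    Noise:  typical (mean absolute) attribution over background pixels.
    """
    attribution = np.abs(attribution)
    signal = attribution[foreground_mask].mean()
    noise = attribution[~foreground_mask].mean()
    return signal / noise

# Toy example: attribution concentrated on a central "object" region.
attr = np.full((8, 8), 0.1)       # weak attribution on the background
attr[2:6, 2:6] = 1.0              # strong attribution on the foreground
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True             # foreground mask matches the object
snr = attribution_snr(attr, mask)  # 1.0 / 0.1 = 10.0
```

A higher SNR means the attribution map concentrates on the object rather than the background, which is the property the paper uses to rank attribution methods.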
Pages: 15
Related Papers (50 records)
  • [41] INFORMER- Interpretability Founded Monitoring of Medical Image Deep Learning Models
    Shu, Shelley Zixin
    de Mortanges, Aurelie Pahud
    Poellinger, Alexander
    Mahapatra, Dwarikanath
    Reyes, Mauricio
    UNCERTAINTY FOR SAFE UTILIZATION OF MACHINE LEARNING IN MEDICAL IMAGING, UNSURE 2024, 2025, 15167 : 215 - 224
  • [42] TorchEsegeta: Framework for Interpretability and Explainability of Image-Based Deep Learning Models
    Chatterjee, Soumick
    Das, Arnab
    Mandal, Chirag
    Mukhopadhyay, Budhaditya
    Vipinraj, Manish
    Shukla, Aniruddh
    Rao, Rajatha Nagaraja
    Sarasaen, Chompunuch
    Speck, Oliver
    Nuernberger, Andreas
    APPLIED SCIENCES-BASEL, 2022, 12 (04):
  • [43] Directional adversarial training for cost sensitive deep learning classification applications
    Terzi, Matteo
    Susto, Gian Antonio
    Chaudhari, Pratik
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2020, 91
  • [44] Graph Contrastive Learning based Adversarial Training for SAR Image Classification
    Wang, Xu
    Ye, Tian
    Kannan, Rajgopal
    Prasanna, Viktor
    ALGORITHMS FOR SYNTHETIC APERTURE RADAR IMAGERY XXXI, 2024, 13032
  • [45] Quantum Transfer Learning with Adversarial Robustness for Classification of High-Resolution Image Datasets
    Khatun, Amena
    Usman, Muhammad
    ADVANCED QUANTUM TECHNOLOGIES, 2025, 8 (01)
  • [46] ROBUST SENSIBLE ADVERSARIAL LEARNING OF DEEP NEURAL NETWORKS FOR IMAGE CLASSIFICATION
    Kim, Jungeum
    Wang, Xiao
    ANNALS OF APPLIED STATISTICS, 2023, 17 (02): : 961 - 984
  • [47] Deep Learning Pre-training Strategy for Mammogram Image Classification: an Evaluation Study
    Clancy, Kadie
    Aboutalib, Sarah
    Mohamed, Aly
    Sumkin, Jules
    Wu, Shandong
    JOURNAL OF DIGITAL IMAGING, 2020, 33 (05) : 1257 - 1265
  • [49] DRFL-VAT: Deep Representative Feature Learning With Virtual Adversarial Training for Semisupervised Classification of Hyperspectral Image
    Chen, Jialong
    Wang, Yuebin
    Zhang, Liqiang
    Liu, Meiling
    Plaza, Antonio
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60