Adversarial training and attribution methods enable evaluation of robustness and interpretability of deep learning models for image classification

Cited by: 0
Authors
Santos, Flavio A. O. [1]
Zanchettin, Cleber [1,2]
Lei, Weihua [3]
Amaral, Luis A. Nunes [2,3,4,5]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-52061080 Recife, PE, Brazil
[2] Northwestern Univ, Dept Chem & Biol Engn, Evanston, IL 60208 USA
[3] Northwestern Univ, Dept Phys & Astron, Evanston, IL 60208 USA
[4] Northwestern Univ, Northwestern Inst Complex Syst, Evanston, IL 60208 USA
[5] Northwestern Univ, NSF Simons Natl Inst Theory & Math Biol, Chicago, IL 60611 USA
DOI
10.1103/PhysRevE.110.054310
Chinese Library Classification (CLC)
O35 [Fluid Mechanics]; O53 [Plasma Physics];
Discipline Classification Codes
070204 ; 080103 ; 080704 ;
Abstract
Deep learning models have achieved high performance in a wide range of applications. Recently, however, there have been increasing concerns about the fragility of many of those models to adversarial approaches and out-of-distribution inputs. A way to investigate and potentially address model fragility is to develop the ability to provide interpretability to model predictions. To this end, input attribution approaches such as Grad-CAM and integrated gradients have been introduced to address model interpretability. Here, we combine adversarial and input attribution approaches in order to achieve two goals. The first is to investigate the impact of adversarial approaches on input attribution. The second is to benchmark competing input attribution approaches. In the context of the image classification task, we find that, for all considered input attribution approaches, models trained with adversarial approaches yield dramatically different input attribution matrices from those obtained using standard techniques. Additionally, by evaluating the signal-to-noise ratio, where the signal is the typical input attribution of the foreground and the noise is the typical input attribution of the background, and correlating it with model confidence, we are able to identify the most reliable input attribution approaches and demonstrate that adversarial training does increase prediction robustness. Our approach can be easily extended to contexts other than the image classification task and enables users to increase their confidence in the reliability of deep learning models.
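As a rough illustration of the two ingredients the abstract describes, the minimal PyTorch sketch below pairs a one-step FGSM perturbation (one common adversarial approach) with a plain input-gradient attribution map and a foreground/background signal-to-noise ratio. The function names, the use of a simple saliency map in place of Grad-CAM or integrated gradients, and the binary foreground mask are all illustrative assumptions, not the paper's implementation.

# Minimal sketch (assumptions noted above); requires PyTorch.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, eps=0.03):
    # One-step fast gradient sign method: move the input along the sign of
    # the loss gradient to create an adversarial example.
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image.unsqueeze(0)), torch.tensor([label]))
    loss.backward()
    return (image + eps * image.grad.sign()).clamp(0.0, 1.0).detach()

def attribution_map(model, image, target_class):
    # Plain input-gradient (saliency) attribution: gradient of the target
    # logit with respect to the pixels, aggregated over color channels.
    image = image.clone().requires_grad_(True)
    logits = model(image.unsqueeze(0))        # shape (1, num_classes)
    logits[0, target_class].backward()
    return image.grad.abs().sum(dim=0)        # shape (H, W)

def attribution_snr(attr, foreground_mask):
    # "Signal" = typical attribution on the object (foreground mask),
    # "noise" = typical attribution on the background.
    signal = attr[foreground_mask].mean()
    noise = attr[~foreground_mask].mean()
    return (signal / (noise + 1e-12)).item()

Given a trained classifier, an image tensor of shape (C, H, W) scaled to [0, 1], and a boolean (H, W) mask marking the object, the ratio returned by attribution_snr can then be compared before and after adversarial perturbation or adversarial training.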
Pages: 15
Related Papers
(50 in total)
  • [31] Robustness of on-device Models: Adversarial Attack to Deep Learning Models on Android Apps
    Huang, Yujin
    Hu, Han
    Chen, Chunyang
    2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE (ICSE-SEIP 2021), 2021: 101-110
  • [32] Visual interpretation of deep learning model in ECG classification: A comprehensive evaluation of feature attribution methods
    Suh, Jangwon
    Kim, Jimyeong
    Kwon, Soonil
    Jung, Euna
    Ahn, Hyo-Jeong
    Lee, Kyung-Yeon
    Choi, Eue-Keun
    Rhee, Wonjong
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 182
  • [33] Deep Intrinsic Decomposition With Adversarial Learning for Hyperspectral Image Classification
    Gong, Zhiqiang
    Qi, Jiahao
    Zhong, Ping
    Zhou, Xian
    Yao, Wen
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [34] DEEP ADVERSARIAL ACTIVE LEARNING WITH MODEL UNCERTAINTY FOR IMAGE CLASSIFICATION
    Zhu, Zheng
    Wang, Hongxing
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020: 1711-1715
  • [35] Is Robustness the Cost of Accuracy? - A Comprehensive Study on the Robustness of 18 Deep Image Classification Models
    Su, Dong
    Zhang, Huan
    Chen, Hongge
    Yi, Jinfeng
    Chen, Pin-Yu
    Gao, Yupeng
    COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216: 644-661
  • [36] Robustness of Deep Learning models in electrocardiogram noise detection and classification
    Rahman, Saifur
    Pal, Shantanu
    Yearwood, John
    Karmakar, Chandan
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2024, 253
  • [37] Hyperspectral Image Classification With Deep Learning Models
    Yang, Xiaofei
    Ye, Yunming
    Li, Xutao
    Lau, Raymond Y. K.
    Zhang, Xiaofeng
    Huang, Xiaohui
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2018, 56 (09): 5408-5423
  • [38] Evaluating and Improving Adversarial Robustness of Deep Learning Models for Intelligent Vehicle Safety
    Hussain, Manzoor
    Hong, Jang-Eui
    IEEE TRANSACTIONS ON RELIABILITY, 2024
  • [39] Exploring adversarial image attacks on deep learning models in oncology
    Joel, Marina
    Umrao, Sachin
    Chang, Enoch
    Choi, Rachel
    Yang, Daniel
    Gilson, Aidan
    Herbst, Roy
    Krumholz, Harlan
    Aneja, Sanjay
    CLINICAL CANCER RESEARCH, 2021, 27 (05)
  • [40] Robustness of Image-Based Malware Classification Models trained with Generative Adversarial Networks
    Reilly, Ciaran
    O'Shaughnessy, Stephen
    Thorpe, Christina
    PROCEEDINGS OF THE 2023 EUROPEAN INTERDISCIPLINARY CYBERSECURITY CONFERENCE, EICC 2023, 2023: 92-99