Explaining Deep Learning Models with Constrained Adversarial Examples

Times Cited: 17
Authors
Moore, Jonathan [1 ]
Hammerla, Nils [1 ]
Watkins, Chris [2 ]
Affiliations
[1] Babylon Health, London SW3 3DD, England
[2] Royal Holloway, University of London, Egham, Surrey, England
Keywords
Explainable AI; Adversarial examples; Counterfactual explanations; Interpretability
DOI
10.1007/978-3-030-29908-8_4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Machine learning models generally suffer from a lack of explainability: given a classification result, it is typically hard to determine what caused the decision and to give an informative explanation. We explore a new method of generating counterfactual explanations which, instead of explaining why a particular classification was made, explains how a different outcome could be achieved. This gives the recipient of the explanation a better way to understand the outcome and provides an actionable suggestion. We show that the proposed method of Constrained Adversarial Examples (CADEX) can be used in real-world applications, and that it yields explanations which incorporate business or domain constraints, such as the handling of categorical attributes and range constraints.
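To make the idea in the abstract concrete, below is a minimal sketch of a CADEX-style counterfactual search, written in PyTorch under stated assumptions: the classifier is differentiable, attributes that must not change are frozen with a gradient mask, and range constraints are enforced by clipping after each step. The function constrained_counterfactual and its constraint handling are illustrative stand-ins, not the authors' exact implementation (which, per the abstract, also handles categorical attributes).

```python
# Illustrative CADEX-style counterfactual search (not the authors' exact code).
# Assumptions: a differentiable classifier over tabular features; immutable
# attributes frozen via a gradient mask; range constraints enforced by clipping.
import torch
import torch.nn as nn

def constrained_counterfactual(model, x, target_class, mask, lo, hi,
                               lr=0.05, steps=500):
    """Gradient-descend on the input until `model` predicts `target_class`.

    mask   -- 1.0 for attributes allowed to change, 0.0 for frozen ones
    lo, hi -- per-attribute lower/upper bounds (range constraints)
    """
    x_cf = x.clone().detach().requires_grad_(True)
    loss_fn = nn.CrossEntropyLoss()
    target = torch.tensor([target_class])
    for _ in range(steps):
        logits = model(x_cf.unsqueeze(0))
        if logits.argmax(dim=1).item() == target_class:
            break  # decision flipped: counterfactual found
        loss = loss_fn(logits, target)
        (grad,) = torch.autograd.grad(loss, x_cf)
        with torch.no_grad():
            x_cf -= lr * grad * mask  # masked attributes never move
            x_cf.clamp_(lo, hi)       # stay inside the allowed ranges
    return x_cf.detach()

# Toy usage: a random linear classifier over four bounded features,
# with the third attribute marked immutable.
torch.manual_seed(0)
model = nn.Linear(4, 2)
x = torch.tensor([0.2, 0.8, 0.5, 0.1])
mask = torch.tensor([1.0, 1.0, 0.0, 1.0])
x_cf = constrained_counterfactual(model, x, target_class=1, mask=mask,
                                  lo=torch.zeros(4), hi=torch.ones(4))
print(x_cf, model(x_cf.unsqueeze(0)).argmax(dim=1))
```

Clipping after every step keeps the candidate feasible throughout the search, so the returned point respects the range constraints even if the class never flips; the difference x_cf - x is then the actionable suggestion the explanation conveys.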
Pages: 43 - 56
Page count: 14
Related Papers
50 items in total
  • [41] Deep learning models for electrocardiograms are susceptible to adversarial attack
    Han, Xintian
    Hu, Yuxuan
    Foschini, Luca
    Chinitz, Larry
    Jankelson, Lior
    Ranganath, Rajesh
    NATURE MEDICINE, 2020, 26 (03) : 360 - 363
  • [42] Defending Deep Learning Models Against Adversarial Attacks
    Mani, Nag
    Moh, Melody
    Moh, Teng-Sheng
    INTERNATIONAL JOURNAL OF SOFTWARE SCIENCE AND COMPUTATIONAL INTELLIGENCE-IJSSCI, 2021, 13 (01) : 72 - 89
  • [43] Adversarial attacks on deep learning models in smart grids
    Hao, Jingbo
    Tao, Yang
    ENERGY REPORTS, 2022, 8 : 123 - 129
  • [44] On the Robustness of Deep Learning Models to Universal Adversarial Attack
    Karim, Rezaul
    Islam, Md Amirul
    Mohammed, Noman
    Bruce, Neil D. B.
    2018 15TH CONFERENCE ON COMPUTER AND ROBOT VISION (CRV), 2018, : 55 - 62
  • [45] Robust Adversarial Objects against Deep Learning Models
    Tsai, Tzungyu
    Yang, Kaichen
    Ho, Tsung-Yi
    Jin, Yier
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 954 - 962
  • [47] Safe batch constrained deep reinforcement learning with generative adversarial network
    Dong, Wenbo
    Liu, Shaofan
    Sun, Shiliang
    INFORMATION SCIENCES, 2023, 634 : 259 - 270
  • [48] Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples
    Cisse, Moustapha
    Adi, Yossi
    Neverova, Natalia
    Keshet, Joseph
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [49] Is Deep Learning Safe for Robot Vision? Adversarial Examples against the iCub Humanoid
    Melis, Marco
    Demontis, Ambra
    Biggio, Battista
    Brown, Gavin
    Fumera, Giorgio
    Roli, Fabio
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 751 - 759
  • [50] Using Adversarial Examples to Bypass Deep Learning Based URL Detection System
    Chen, Wencheng
    Zeng, Yi
    Qiu, Meikang
    4TH IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2019) / 3RD INTERNATIONAL SYMPOSIUM ON REINFORCEMENT LEARNING (ISRL 2019), 2019, : 128 - 130