Explaining Deep Learning Models with Constrained Adversarial Examples

Cited by: 17
Authors
Moore, Jonathan [1 ]
Hammerla, Nils [1 ]
Watkins, Chris [2 ]
Affiliations
[1] Babylon Health, London SW3 3DD, England
[2] Royal Holloway Univ London, Egham, Surrey, England
Keywords
Explainable AI; Adversarial examples; Counterfactual explanations; Interpretability
DOI
10.1007/978-3-030-29908-8_4
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Machine learning algorithms generally suffer from a problem of explainability: given a classification result from a model, it is typically hard to determine what caused the decision and to give an informative explanation. We explore a new method of generating counterfactual explanations which, instead of explaining why a particular classification was made, explains how a different outcome could be achieved. This gives the recipient of the explanation a better way to understand the outcome and provides an actionable suggestion. We show that the introduced method of Constrained Adversarial Examples (CADEX) can be used in real-world applications and yields explanations that incorporate business or domain constraints, such as handling categorical attributes and range constraints.
Pages: 43-56
Number of pages: 14
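
The abstract describes CADEX only at a high level. As one way to make the idea concrete, below is a minimal sketch of a constrained, gradient-based counterfactual search in PyTorch. This is not the authors' implementation: the function name, the mask/lo/hi constraint encoding, and all hyperparameters are illustrative assumptions. It only shows the general pattern of perturbing an input toward a target class while freezing protected attributes and projecting back into permitted ranges.

```python
import torch

def constrained_counterfactual(model, x, target_class, mask, lo, hi,
                               steps=200, lr=0.05):
    # model: differentiable classifier mapping a batch of inputs to logits.
    # x: 1-D feature tensor for a single example.
    # mask: 1.0 where a feature may change, 0.0 where it is frozen
    #       (e.g. immutable or categorical attributes) -- an assumed encoding.
    # lo, hi: per-feature lower/upper bounds (range constraints).
    x_cf = x.clone().detach().requires_grad_(True)
    loss_fn = torch.nn.CrossEntropyLoss()
    target = torch.tensor([target_class])

    for _ in range(steps):
        logits = model(x_cf.unsqueeze(0))
        if logits.argmax(dim=1).item() == target_class:
            break  # classification has flipped: counterfactual found
        loss = loss_fn(logits, target)
        loss.backward()
        with torch.no_grad():
            # Step toward the target class, touching only mutable features.
            x_cf -= lr * mask * x_cf.grad
            # Project back into the allowed value ranges.
            x_cf.clamp_(min=lo, max=hi)
        x_cf.grad.zero_()
    return x_cf.detach()
```

A real application would additionally re-encode one-hot categorical blocks after each step so they remain valid category indicators; the mask-and-clamp projection above covers only frozen attributes and numeric range constraints.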