Explaining Deep Learning Models with Constrained Adversarial Examples

Cited by: 17
Authors
Moore, Jonathan [1 ]
Hammerla, Nils [1 ]
Watkins, Chris [2 ]
Affiliations
[1] Babylon Health, London SW3 3DD, England
[2] Royal Holloway Univ London, Egham, Surrey, England
Keywords
Explainable AI; Adversarial examples; Counterfactual explanations; Interpretability
DOI
10.1007/978-3-030-29908-8_4
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Machine learning algorithms generally suffer from a problem of explainability: given a classification result from a model, it is typically hard to determine what caused the decision and to give an informative explanation. We explore a new method of generating counterfactual explanations which, instead of explaining why a particular classification was made, explains how a different outcome could be achieved. This gives the recipient of the explanation a better way to understand the outcome and provides an actionable suggestion. We show that the introduced method of Constrained Adversarial Examples (CADEX) can be used in real-world applications and yields explanations that incorporate business or domain constraints, such as handling categorical attributes and range constraints.
Pages: 43-56
Number of pages: 14
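
The abstract describes CADEX only at a high level. As one way to make the idea concrete, below is a minimal sketch of a constrained, gradient-based counterfactual search in PyTorch. This is not the authors' implementation: the function name, the mask/lo/hi constraint encoding, and all hyperparameters are illustrative assumptions. It only shows the general pattern of perturbing an input toward a target class while freezing protected attributes and projecting back into permitted ranges.

```python
import torch

def constrained_counterfactual(model, x, target_class, mask, lo, hi,
                               steps=200, lr=0.05):
    # model: differentiable classifier mapping a batch of inputs to logits.
    # x: 1-D feature tensor for a single example.
    # mask: 1.0 where a feature may change, 0.0 where it is frozen
    #       (e.g. immutable or categorical attributes) -- an assumed encoding.
    # lo, hi: per-feature lower/upper bounds (range constraints).
    x_cf = x.clone().detach().requires_grad_(True)
    loss_fn = torch.nn.CrossEntropyLoss()
    target = torch.tensor([target_class])

    for _ in range(steps):
        logits = model(x_cf.unsqueeze(0))
        if logits.argmax(dim=1).item() == target_class:
            break  # classification has flipped: counterfactual found
        loss = loss_fn(logits, target)
        loss.backward()
        with torch.no_grad():
            # Step toward the target class, touching only mutable features.
            x_cf -= lr * mask * x_cf.grad
            # Project back into the allowed value ranges.
            x_cf.clamp_(min=lo, max=hi)
        x_cf.grad.zero_()
    return x_cf.detach()
```

A real application would additionally re-encode one-hot categorical blocks after each step so they remain valid category indicators; the mask-and-clamp projection above covers only frozen attributes and numeric range constraints.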