SoK: Explainable Machine Learning in Adversarial Environments

Cited by: 1
|
Authors
Noppel, Maximilian [1 ]
Wressnegger, Christian [1 ]
Affiliations
[1] Karlsruhe Inst Technol, KASTEL Secur Res Labs, Karlsruhe, Germany
Keywords
Explainable Machine Learning; XAI; Attacks; Defenses; Robustness Notions; CLASSIFICATION; EXPLANATIONS; DECISIONS; ATTACKS;
DOI
10.1109/SP54263.2024.00021
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Modern deep learning methods have long been considered black boxes due to the lack of insights into their decision-making process. However, recent advances in explainable machine learning have turned the tables. Post-hoc explanation methods enable precise relevance attribution of input features for otherwise opaque models such as deep neural networks. This progression has raised expectations that these techniques can uncover attacks against learning-based systems such as adversarial examples or neural backdoors. Unfortunately, current methods are not robust against manipulations themselves. In this paper, we set out to systematize attacks against post-hoc explanation methods to lay the groundwork for developing more robust explainable machine learning. If explanation methods cannot be misled by an adversary, they can serve as an effective tool against attacks, marking a turning point in adversarial machine learning. We present a hierarchy of explanation-aware robustness notions and relate existing defenses to it. In doing so, we uncover synergies, research gaps, and future directions toward more reliable explanations robust against manipulations.
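For readers unfamiliar with post-hoc relevance attribution, the sketch below illustrates the basic idea with a simple gradient-based saliency map in PyTorch. It is a minimal, illustrative example only: the toy model, input, and function name are hypothetical placeholders and are not the explanation methods, attacks, or defenses systematized in the paper.

# Minimal sketch of gradient-based relevance attribution (vanilla saliency),
# assuming a PyTorch classifier. Model, input, and names are hypothetical.
import torch
import torch.nn as nn

def saliency_attribution(model: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Relevance of each input feature: |d score_target / d x|."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
    score = model(x)[0, target]                  # scalar logit of the class to explain
    score.backward()                             # back-propagate to the input
    return x.grad.detach().abs()                 # per-feature relevance map

# Hypothetical usage with a toy linear classifier over 10 input features.
toy_model = nn.Sequential(nn.Linear(10, 3))
example = torch.randn(1, 10)
relevance = saliency_attribution(toy_model, example, target=1)
print(relevance)

Attributions of this kind are what the abstract refers to as a potential tool for uncovering adversarial examples and neural backdoors, and, conversely, what an adversary may try to manipulate.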
Pages: 2441 - 2459
Number of pages: 19
Related Papers
50 records in total
  • [41] Adversarial machine learning in dermatology
    Gilmore, Stephen
    AUSTRALASIAN JOURNAL OF DERMATOLOGY, 2022, 63 : 118 - 118
  • [42] Evaluating data distribution and drift vulnerabilities of machine learning algorithms in secure and adversarial environments
    Nelson, Kevin
    Corbin, George
    Blowers, Misty
    MACHINE INTELLIGENCE AND BIO-INSPIRED COMPUTATION: THEORY AND APPLICATIONS VIII, 2014, 9119
  • [43] Secure Learning and Mining in Adversarial Environments
    Li, Bo
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2015, : 1538 - 1539
  • [44] Online Learning in Adversarial Lipschitz Environments
    Maillard, Odalric-Ambrym
    Munos, Remi
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II: EUROPEAN CONFERENCE, ECML PKDD 2010, 2010, 6322 : 305 - 320
  • [45] Learning Coordinated Maneuver in Adversarial Environments
    Hu, Zechen
    Limbu, Manshi
    Shishika, Daigo
    Xiao, Xuesu
    Wang, Xuan
    2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2024), 2024, : 10740 - 10745
  • [46] Explainable Machine Learning in Credit Risk Management
    Bussmann, Niklas
    Giudici, Paolo
    Marinelli, Dimitri
    Papenbrock, Jochen
    COMPUTATIONAL ECONOMICS, 2021, 57 (01) : 203 - 216
  • [47] Predicting Software Defects with Explainable Machine Learning
    Santos, Geanderson
    Figueiredo, Eduardo
    Veloso, Adriano
    Viggiato, Markos
    Ziviani, Nivio
    PROCEEDINGS OF THE 19TH BRAZILIAN SYMPOSIUM ON SOFTWARE QUALITY, SBQS 2020, 2020
  • [48] Explainable machine learning for hydrocarbon prospect risking
    Mustafa, Ahmad
    Koster, Klaas
    AlRegib, Ghassan
    GEOPHYSICS, 2024, 89 (01) : WA13 - WA24
  • [49] Explainable machine learning for phishing feature detection
    Calzarossa, Maria Carla
    Giudici, Paolo
    Zieni, Rasha
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2024, 40 (01) : 362 - 373
  • [50] Evaluating Explainable Machine Learning Models for Clinicians
    Scarpato, Noemi
    Nourbakhsh, Aria
    Ferroni, Patrizia
    Riondino, Silvia
    Roselli, Mario
    Fallucchi, Francesca
    Barbanti, Piero
    Guadagni, Fiorella
    Zanzotto, Fabio Massimo
    COGNITIVE COMPUTATION, 2024, 16 (04) : 1436 - 1446