SoK: Explainable Machine Learning in Adversarial Environments

Cited by: 1
Authors
Noppel, Maximilian [1 ]
Wressnegger, Christian [1 ]
Affiliations
[1] Karlsruhe Inst Technol, KASTEL Secur Res Labs, Karlsruhe, Germany
Keywords
Explainable Machine Learning; XAI; Attacks; Defenses; Robustness Notions; Classification; Explanations; Decisions
DOI
10.1109/SP54263.2024.00021
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Subject Classification Code: 0812
Abstract
Modern deep learning methods have long been considered black boxes due to the lack of insights into their decision-making process. However, recent advances in explainable machine learning have turned the tables. Post-hoc explanation methods enable precise relevance attribution of input features for otherwise opaque models such as deep neural networks. This progression has raised expectations that these techniques can uncover attacks against learning-based systems such as adversarial examples or neural backdoors. Unfortunately, current methods are not robust against manipulations themselves. In this paper, we set out to systematize attacks against post-hoc explanation methods to lay the groundwork for developing more robust explainable machine learning. If explanation methods cannot be misled by an adversary, they can serve as an effective tool against attacks, marking a turning point in adversarial machine learning. We present a hierarchy of explanation-aware robustness notions and relate existing defenses to it. In doing so, we uncover synergies, research gaps, and future directions toward more reliable explanations robust against manipulations.
Pages: 2441-2459
Number of pages: 19
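
To make the abstract's notion of post-hoc relevance attribution concrete, below is a minimal sketch of a gradient-times-input saliency explanation for an otherwise opaque classifier. The choice of model (torchvision's ResNet-18) and the random placeholder input are illustrative assumptions and not part of the paper; the paper systematizes attacks on such post-hoc methods rather than prescribing one.

```python
# Minimal sketch of post-hoc relevance attribution via gradient x input
# saliency. Model and input here are placeholder assumptions for
# illustration only.
import torch
import torchvision.models as models

# Any pretrained, differentiable classifier works; ResNet-18 is an
# arbitrary illustrative choice.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

# Placeholder input; in practice this would be a preprocessed image.
x = torch.rand(1, 3, 224, 224, requires_grad=True)

# Forward pass and selection of the predicted class score.
logits = model(x)
target = logits.argmax(dim=1).item()

# Backpropagate the predicted-class score to the input features.
logits[0, target].backward()

# Gradient x input: per-pixel relevance, summed over color channels.
relevance = (x.grad * x).sum(dim=1).squeeze(0).detach()
print(relevance.shape)  # a 224 x 224 relevance heatmap over the input
```

Attacks surveyed in the paper manipulate either the input or the model so that such heatmaps mislead the analyst while the prediction stays unchanged.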