SoK: Explainable Machine Learning in Adversarial Environments

Cited by: 1
Authors
Noppel, Maximilian [1 ]
Wressnegger, Christian [1 ]
Affiliations
[1] Karlsruhe Inst Technol, KASTEL Secur Res Labs, Karlsruhe, Germany
Keywords
Explainable Machine Learning; XAI; Attacks; Defenses; Robustness Notions; Classification; Explanations; Decisions
DOI
10.1109/SP54263.2024.00021
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Subject Classification Code: 0812
Abstract
Modern deep learning methods have long been considered black boxes due to the lack of insights into their decision-making process. However, recent advances in explainable machine learning have turned the tables. Post-hoc explanation methods enable precise relevance attribution of input features for otherwise opaque models such as deep neural networks. This progression has raised expectations that these techniques can uncover attacks against learning-based systems such as adversarial examples or neural backdoors. Unfortunately, current methods are not robust against manipulations themselves. In this paper, we set out to systematize attacks against post-hoc explanation methods to lay the groundwork for developing more robust explainable machine learning. If explanation methods cannot be misled by an adversary, they can serve as an effective tool against attacks, marking a turning point in adversarial machine learning. We present a hierarchy of explanation-aware robustness notions and relate existing defenses to it. In doing so, we uncover synergies, research gaps, and future directions toward more reliable explanations robust against manipulations.
Pages: 2441-2459
Number of pages: 19
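
To make the abstract's notion of post-hoc relevance attribution concrete, below is a minimal sketch of a gradient-times-input saliency explanation for an otherwise opaque classifier. The choice of model (torchvision's ResNet-18) and the random placeholder input are illustrative assumptions and not part of the paper; the paper systematizes attacks on such post-hoc methods rather than prescribing one.

```python
# Minimal sketch of post-hoc relevance attribution via gradient x input
# saliency. Model and input here are placeholder assumptions for
# illustration only.
import torch
import torchvision.models as models

# Any pretrained, differentiable classifier works; ResNet-18 is an
# arbitrary illustrative choice.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

# Placeholder input; in practice this would be a preprocessed image.
x = torch.rand(1, 3, 224, 224, requires_grad=True)

# Forward pass and selection of the predicted class score.
logits = model(x)
target = logits.argmax(dim=1).item()

# Backpropagate the predicted-class score to the input features.
logits[0, target].backward()

# Gradient x input: per-pixel relevance, summed over color channels.
relevance = (x.grad * x).sum(dim=1).squeeze(0).detach()
print(relevance.shape)  # a 224 x 224 relevance heatmap over the input
```

Attacks surveyed in the paper manipulate either the input or the model so that such heatmaps mislead the analyst while the prediction stays unchanged.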