SoK: Explainable Machine Learning in Adversarial Environments

Cited by: 1
|
Authors
Noppel, Maximilian [1 ]
Wressnegger, Christian [1 ]
Affiliations
[1] Karlsruhe Inst Technol, KASTEL Secur Res Labs, Karlsruhe, Germany
Keywords
Explainable Machine Learning; XAI; Attacks; Defenses; Robustness Notions; CLASSIFICATION; EXPLANATIONS; DECISIONS; ATTACKS;
DOI
10.1109/SP54263.2024.00021
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Modern deep learning methods have long been considered black boxes due to the lack of insights into their decision-making process. However, recent advances in explainable machine learning have turned the tables. Post-hoc explanation methods enable precise relevance attribution of input features for otherwise opaque models such as deep neural networks. This progression has raised expectations that these techniques can uncover attacks against learning-based systems such as adversarial examples or neural backdoors. Unfortunately, current methods are not robust against manipulations themselves. In this paper, we set out to systematize attacks against post-hoc explanation methods to lay the groundwork for developing more robust explainable machine learning. If explanation methods cannot be misled by an adversary, they can serve as an effective tool against attacks, marking a turning point in adversarial machine learning. We present a hierarchy of explanation-aware robustness notions and relate existing defenses to it. In doing so, we uncover synergies, research gaps, and future directions toward more reliable explanations robust against manipulations.
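For readers unfamiliar with post-hoc relevance attribution, the sketch below illustrates the basic idea with a simple gradient-based saliency map in PyTorch. It is a minimal, illustrative example only: the toy model, input, and function name are hypothetical placeholders and are not the explanation methods, attacks, or defenses systematized in the paper.

# Minimal sketch of gradient-based relevance attribution (vanilla saliency),
# assuming a PyTorch classifier. Model, input, and names are hypothetical.
import torch
import torch.nn as nn

def saliency_attribution(model: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Relevance of each input feature: |d score_target / d x|."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
    score = model(x)[0, target]                  # scalar logit of the class to explain
    score.backward()                             # back-propagate to the input
    return x.grad.detach().abs()                 # per-feature relevance map

# Hypothetical usage with a toy linear classifier over 10 input features.
toy_model = nn.Sequential(nn.Linear(10, 3))
example = torch.randn(1, 10)
relevance = saliency_attribution(toy_model, example, target=1)
print(relevance)

Attributions of this kind are what the abstract refers to as a potential tool for uncovering adversarial examples and neural backdoors, and, conversely, what an adversary may try to manipulate.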
Pages: 2441 - 2459
Number of pages: 19
Related Papers
50 records in total
  • [41] Adversarial machine learning in dermatology
    Gilmore, Stephen
    AUSTRALASIAN JOURNAL OF DERMATOLOGY, 2022, 63 : 118 - 118
  • [42] Evaluating data distribution and drift vulnerabilities of machine learning algorithms in secure and adversarial environments
    Nelson, Kevin
    Corbin, George
    Blowers, Misty
    MACHINE INTELLIGENCE AND BIO-INSPIRED COMPUTATION: THEORY AND APPLICATIONS VIII, 2014, 9119
  • [43] Secure Learning and Mining in Adversarial Environments
    Li, Bo
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2015, : 1538 - 1539
  • [44] Online Learning in Adversarial Lipschitz Environments
    Maillard, Odalric-Ambrym
    Munos, Remi
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II: EUROPEAN CONFERENCE, ECML PKDD 2010, 2010, 6322 : 305 - 320
  • [45] Learning Coordinated Maneuver in Adversarial Environments
    Hu, Zechen
    Limbu, Manshi
    Shishika, Daigo
    Xiao, Xuesu
    Wang, Xuan
    2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2024), 2024, : 10740 - 10745
  • [46] Explainable Machine Learning in Credit Risk Management
    Bussmann, Niklas
    Giudici, Paolo
    Marinelli, Dimitri
    Papenbrock, Jochen
    COMPUTATIONAL ECONOMICS, 2021, 57 (01) : 203 - 216
  • [47] Predicting Software Defects with Explainable Machine Learning
    Santos, Geanderson
    Figueiredo, Eduardo
    Veloso, Adriano
    Viggiato, Markos
    Ziviani, Nivio
    PROCEEDINGS OF THE 19TH BRAZILIAN SYMPOSIUM ON SOFTWARE QUALITY, SBQS 2020, 2020
  • [48] Explainable machine learning for hydrocarbon prospect risking
    Mustafa, Ahmad
    Koster, Klaas
    AlRegib, Ghassan
    GEOPHYSICS, 2024, 89 (01) : WA13 - WA24
  • [49] Explainable machine learning for phishing feature detection
    Calzarossa, Maria Carla
    Giudici, Paolo
    Zieni, Rasha
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2024, 40 (01) : 362 - 373
  • [50] Evaluating Explainable Machine Learning Models for Clinicians
    Scarpato, Noemi
    Nourbakhsh, Aria
    Ferroni, Patrizia
    Riondino, Silvia
    Roselli, Mario
    Fallucchi, Francesca
    Barbanti, Piero
    Guadagni, Fiorella
    Zanzotto, Fabio Massimo
    COGNITIVE COMPUTATION, 2024, 16 (04) : 1436 - 1446