SoK: Explainable Machine Learning in Adversarial Environments

Cited by: 1
Authors
Noppel, Maximilian [1 ]
Wressnegger, Christian [1 ]
Affiliations
[1] Karlsruhe Inst Technol, KASTEL Secur Res Labs, Karlsruhe, Germany
Keywords
Explainable Machine Learning; XAI; Attacks; Defenses; Robustness Notions; Classification; Explanations; Decisions
DOI
10.1109/SP54263.2024.00021
CLC classification
TP [Automation Technology, Computer Technology]
Subject classification
0812
Abstract
Modern deep learning methods have long been considered black boxes due to the lack of insights into their decision-making process. However, recent advances in explainable machine learning have turned the tables. Post-hoc explanation methods enable precise relevance attribution of input features for otherwise opaque models such as deep neural networks. This progression has raised expectations that these techniques can uncover attacks against learning-based systems such as adversarial examples or neural backdoors. Unfortunately, current methods are not robust against manipulations themselves. In this paper, we set out to systematize attacks against post-hoc explanation methods to lay the groundwork for developing more robust explainable machine learning. If explanation methods cannot be misled by an adversary, they can serve as an effective tool against attacks, marking a turning point in adversarial machine learning. We present a hierarchy of explanation-aware robustness notions and relate existing defenses to it. In doing so, we uncover synergies, research gaps, and future directions toward more reliable explanations robust against manipulations.
Pages: 2441 - 2459
Page count: 19
Related papers
50 total
  • [31] Explainable machine learning models with privacy
    Aso Bozorgpanah
    Vicenç Torra
    Progress in Artificial Intelligence, 2024, 13 : 31 - 50
  • [32] Hardware Acceleration of Explainable Machine Learning
    Pan, Zhixin
    Mishra, Prabhat
    PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022), 2022, : 1127 - 1130
  • [33] eXplainable Cooperative Machine Learning with NOVA
    Baur, Tobias
    Heimerl, Alexander
    Lingenfelser, Florian
    Wagner, Johannes
    Valstar, Michel F.
    Schuller, Björn
    André, Elisabeth
    KI - Künstliche Intelligenz, 2020, 34 (02): : 143 - 164
  • [34] Explainable Machine Learning for Intrusion Detection
    Bellegdi, Sameh
    Selamat, Ali
    Olatunji, Sunday O.
    Fujita, Hamido
    Krejcar, Ondřej
    ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, IEA-AIE 2024, 2024, 14748 : 122 - 134
  • [35] Explainable Artificial Intelligence and Machine Learning
    Raunak, M. S.
    Kuhn, Rick
    COMPUTER, 2021, 54 (10) : 25 - 27
  • [36] Explainable machine learning in cybersecurity: A survey
    Yan, Feixue
    Wen, Sheng
    Nepal, Surya
    Paris, Cecile
    Xiang, Yang
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (12) : 12305 - 12334
  • [37] Machine Learning in Adversarial Settings
    McDaniel, Patrick
    Papernot, Nicolas
    Celik, Z. Berkay
    IEEE SECURITY & PRIVACY, 2016, 14 (03) : 68 - 72
  • [38] Quantum adversarial machine learning
    Lu, Sirui
    Duan, Lu-Ming
    Deng, Dong-Ling
    PHYSICAL REVIEW RESEARCH, 2020, 2 (03):
  • [39] Adversarial Machine Learning for Text
    Lee, Daniel
    Verma, Rakesh
    PROCEEDINGS OF THE SIXTH INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS (IWSPA'20), 2020, : 33 - 34
  • [40] On the Economics of Adversarial Machine Learning
    Merkle, Florian
    Samsinger, Maximilian
    Schöttle, Pascal
    Pevny, Tomas
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 4670 - 4685