SoK: Explainable Machine Learning in Adversarial Environments

Cited by: 1
|
Authors
Noppel, Maximilian [1 ]
Wressnegger, Christian [1 ]
Affiliation
[1] Karlsruhe Inst Technol, KASTEL Secur Res Labs, Karlsruhe, Germany
Keywords
Explainable Machine Learning; XAI; Attacks; Defenses; Robustness Notions; CLASSIFICATION; EXPLANATIONS; DECISIONS; ATTACKS;
DOI
10.1109/SP54263.2024.00021
CLC Classification
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Modern deep learning methods have long been considered black boxes due to the lack of insights into their decision-making process. However, recent advances in explainable machine learning have turned the tables. Post-hoc explanation methods enable precise relevance attribution of input features for otherwise opaque models such as deep neural networks. This progression has raised expectations that these techniques can uncover attacks against learning-based systems such as adversarial examples or neural backdoors. Unfortunately, current methods are not robust against manipulations themselves. In this paper, we set out to systematize attacks against post-hoc explanation methods to lay the groundwork for developing more robust explainable machine learning. If explanation methods cannot be misled by an adversary, they can serve as an effective tool against attacks, marking a turning point in adversarial machine learning. We present a hierarchy of explanation-aware robustness notions and relate existing defenses to it. In doing so, we uncover synergies, research gaps, and future directions toward more reliable explanations robust against manipulations.
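The abstract refers to post-hoc explanation methods that attribute relevance to input features. As a purely illustrative aside (not from the paper itself), one of the simplest such attributions is gradient×input: multiply each feature by the model's partial derivative with respect to it. The toy logistic model, weights, and function names below are assumptions chosen only to make the idea concrete and runnable.

```python
import numpy as np

# Hypothetical sketch: gradient x input relevance attribution for a
# toy logistic model f(x) = sigmoid(w . x). This illustrates the kind
# of post-hoc attribution the abstract refers to; it is not the
# paper's method.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_x_input(w, x):
    """Relevance of feature i: (df/dx_i) * x_i."""
    p = sigmoid(w @ x)
    grad = p * (1.0 - p) * w  # derivative of sigmoid(w . x) w.r.t. x
    return grad * x

w = np.array([2.0, -1.0, 0.0])   # model weights (assumed)
x = np.array([1.0, 1.0, 5.0])    # one input sample (assumed)
relevance = gradient_x_input(w, x)
print(relevance)
```

Here the third feature receives zero relevance despite its large value, because the model ignores it; attacks of the kind the paper systematizes aim to manipulate exactly such attribution maps without changing the prediction.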
Pages: 2441-2459
Number of pages: 19
Related Papers
(50 records)
  • [1] SoK: Explainable Machine Learning for Computer Security Applications
    Nadeem, Azqa
    Vos, Daniel
    Cao, Clinton
    Pajola, Luca
    Dieck, Simon
    Baumgartner, Robert
    Verwer, Sicco
    2023 IEEE 8TH EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY, EUROS&P, 2023, : 221 - 240
  • [2] (duplicate of [3])
  • [3] Machine learning in adversarial environments
    Laskov, Pavel
    Lippmann, Richard
    MACHINE LEARNING, 2010, 81 (02) : 115 - 119
  • [4] Machine Learning in Adversarial RF Environments
    Roy, Debashri
    Mukherjee, Tathagata
    Chatterjee, Mainak
    IEEE COMMUNICATIONS MAGAZINE, 2019, 57 (05) : 82 - 87
  • [5] eXplainable and Reliable Against Adversarial Machine Learning in Data Analytics
    Vaccari, Ivan
    Carlevaro, Alberto
    Narteni, Sara
    Cambiaso, Enrico
    Mongelli, Maurizio
    IEEE ACCESS, 2022, 10 : 83949 - 83970
  • [6] Machine Learning Integrity and Privacy in Adversarial Environments
    Oprea, Alina
    PROCEEDINGS OF THE 26TH ACM SYMPOSIUM ON ACCESS CONTROL MODELS AND TECHNOLOGIES, SACMAT 2021, 2021, : 1 - 2
  • [7] SoK: Security and Privacy in Machine Learning
    Papernot, Nicolas
    McDaniel, Patrick
    Sinha, Arunesh
    Wellman, Michael P.
    2018 3RD IEEE EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P 2018), 2018, : 399 - 414
  • [8] Adversarial Attacks in Explainable Machine Learning: A Survey of Threats Against Models and Humans
    Vadillo, Jon
    Santana, Roberto
    Lozano, Jose A.
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2025, 15 (01)
  • [9] Secure and Resilient Distributed Machine Learning Under Adversarial Environments
    Zhang, Rui
    Zhu, Quanyan
    IEEE AEROSPACE AND ELECTRONIC SYSTEMS MAGAZINE, 2016, 31 (03) : 34 - 36
  • [10] Secure and Resilient Distributed Machine Learning Under Adversarial Environments
    Zhang, Rui
    Zhu, Quanyan
    2015 18TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2015, : 644 - 651