SoK: Explainable Machine Learning in Adversarial Environments

Cited by: 1
Authors
Noppel, Maximilian [1 ]
Wressnegger, Christian [1 ]
Affiliations
[1] Karlsruhe Inst Technol, KASTEL Secur Res Labs, Karlsruhe, Germany
Keywords
Explainable Machine Learning; XAI; Attacks; Defenses; Robustness Notions; CLASSIFICATION; EXPLANATIONS; DECISIONS; ATTACKS
DOI
10.1109/SP54263.2024.00021
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Modern deep learning methods have long been considered black boxes due to the lack of insights into their decision-making process. However, recent advances in explainable machine learning have turned the tables. Post-hoc explanation methods enable precise relevance attribution of input features for otherwise opaque models such as deep neural networks. This progression has raised expectations that these techniques can uncover attacks against learning-based systems such as adversarial examples or neural backdoors. Unfortunately, current methods are not robust against manipulations themselves. In this paper, we set out to systematize attacks against post-hoc explanation methods to lay the groundwork for developing more robust explainable machine learning. If explanation methods cannot be misled by an adversary, they can serve as an effective tool against attacks, marking a turning point in adversarial machine learning. We present a hierarchy of explanation-aware robustness notions and relate existing defenses to it. In doing so, we uncover synergies, research gaps, and future directions toward more reliable explanations robust against manipulations.
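To make the abstract's two central notions concrete, the following minimal sketch, assuming PyTorch, a toy fully connected classifier, and gradient-times-input attribution (none of which are specified by the paper), illustrates (a) post-hoc relevance attribution of input features and (b) a generic explanation-manipulation objective of the kind the SoK systematizes: a perturbation that preserves the predicted class while steering the attribution toward an adversary-chosen target map r_target.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an opaque model (hypothetical architecture, not from the paper).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))

def gradient_x_input(model, x, target_class):
    """Post-hoc attribution: relevance of each input feature to one class logit."""
    x = x.clone().detach().requires_grad_(True)
    model(x)[target_class].backward()
    return (x.grad * x).detach()

x = torch.randn(8)
y = int(model(x).argmax())
relevance = gradient_x_input(model, x, y)

# Illustrative explanation-manipulation objective: perturb x so that the
# prediction stays y but the attribution approaches an adversary-chosen map.
r_target = torch.zeros(8)
r_target[0] = 1.0
delta = torch.zeros(8, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    x_adv = x + delta
    logits = model(x_adv)
    grad = torch.autograd.grad(logits[y], x_adv, create_graph=True)[0]
    r_adv = grad * x_adv  # differentiable gradient-times-input attribution
    margin = torch.cat([logits[:y], logits[y + 1:]]).max() - logits[y]
    loss = ((r_adv - r_target) ** 2).sum() \
        + torch.relu(margin + 0.1) \
        + 0.01 * delta.pow(2).sum()  # keep the perturbation small
    loss.backward()
    opt.step()

x_adv = (x + delta).detach()
print("prediction unchanged: ", int(model(x_adv).argmax()) == y)
print("original relevance:   ", relevance)
print("manipulated relevance:", gradient_x_input(model, x_adv, y))

Here the squared-error term drags the attribution toward r_target, the margin term keeps the original prediction, and the small L2 penalty limits the perturbation; published attacks differ in the attribution method, constraint set, and threat model they assume.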
Pages: 2441-2459
Number of pages: 19
Related papers (50 in total)
  • [11] Explainable Machine Learning
    Garcke, Jochen
    Roscher, Ribana
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2023, 5 (01): 169-170
  • [12] Attack-agnostic Adversarial Detection on Medical Data Using Explainable Machine Learning
    Watson, Matthew
    Al Moubayed, Noura
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 8180-8187
  • [13] Leveraging Machine Learning for Generating and Utilizing Motion Primitives in Adversarial Environments
    Goddard, Zachary C.
    Rajasekar, Rithesh
    Mocharla, Madhumita
    Manaster, Garrett
    Williams, Kyle
    Mazumdar, Anirban
    JOURNAL OF AEROSPACE INFORMATION SYSTEMS, 2024, 21 (02): 127-139
  • [14] Explainable Machine Learning in Deployment
    Bhatt, Umang
    Xiang, Alice
    Sharma, Shubham
    Weller, Adrian
    Taly, Ankur
    Jia, Yunhan
    Ghosh, Joydeep
    Puri, Ruchir
    Moura, Jose M. F.
    Eckersley, Peter
    FAT* '20: PROCEEDINGS OF THE 2020 CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, 2020: 648-657
  • [15] Adversarial Machine Learning
    Tygar, J. D.
    IEEE INTERNET COMPUTING, 2011, 15 (05): 4-6
  • [16] SoK: quantum computing methods for machine learning optimization
    Baniata, Hamza
    QUANTUM MACHINE INTELLIGENCE, 2024, 6 (02)
  • [17] Learning Consensus in Adversarial Environments
    Vamvoudakis, Kyriakos G.
    Carrillo, Luis R. Garcia
    Hespanha, Joao P.
    UNMANNED SYSTEMS TECHNOLOGY XV, 2013, 8741
  • [18] xGAIL: Explainable Generative Adversarial Imitation Learning for Explainable Human Decision Analysis
    Pan, Menghai
    Huang, Weixiao
    Li, Yanhua
    Zhou, Xun
    Luo, Jun
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 1334-1343
  • [19] SoK: Pragmatic Assessment of Machine Learning for Network Intrusion Detection
    Apruzzese, Giovanni
    Laskov, Pavel
    Schneider, Johannes
    2023 IEEE 8TH EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY, EUROS&P, 2023: 592-614
  • [20] SoK: Unintended Interactions among Machine Learning Defenses and Risks
    Duddu, Vasisht
    Szyller, Sebastian
    Asokan, N.
    45TH IEEE SYMPOSIUM ON SECURITY AND PRIVACY, SP 2024, 2024: 2996-3014