Interpretable adversarial example detection via high-level concept activation vector

Cited by: 0
Authors
Li, Jiaxing [1]
Tan, Yu-an [1]
Liu, Xinyu [1]
Meng, Weizhi [2]
Li, Yuanzhang [3]
Affiliations
[1] Beijing Inst Technol, Sch Cyberspace Sci & Technol, Beijing 100081, Peoples R China
[2] Univ Lancaster, Sch Comp & Commun, Lancaster LA1 4YR, England
[3] Beijing Inst Technol, Sch Comp Sci Technol, Beijing 100081, Peoples R China
Keywords
Deep learning; Adversarial machine learning; Model explainability; Adversarial defense; Concept activation vector;
DOI
10.1016/j.cose.2024.104218
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Deep neural networks have achieved impressive performance in many tasks. However, they are easily fooled by small perturbations added to the input, and in image data such perturbations are usually imperceptible to humans. The uninterpretable nature of deep learning systems is considered one of the reasons they are vulnerable to adversarial attacks. As artificial intelligence systems gain wider acceptance among the general public, transparency, reliability, and human comprehensibility in their decision-making are crucial for trust and confidence. In this paper, we propose an approach for defending against adversarial attacks based on concept-level interpretability techniques. Our approach interprets the model in terms of high-level concepts rather than low-level pixel features. Our key finding is that adding small perturbations leads to large changes in the results of the model's concept vector tests. Based on this, we design a single-image concept vector testing method for detecting adversarial examples. Our experiments on the ImageNet dataset show that our method achieves an average accuracy of over 95%. We provide source code in the supplementary material.
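The abstract does not spell out how the single-image concept vector test is computed. The following is a minimal sketch of one plausible reading, assuming a TCAV-style score (the directional derivative of a class logit along a concept direction in activation space). The function names, the mean-difference estimate of the concept vector, and the threshold rule are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def concept_activation_vector(concept_acts, random_acts):
    """Unit vector pointing from random activations toward concept activations.

    Assumption: the difference of class means stands in for the normal of a
    trained linear classifier, which is what a CAV usually is.
    """
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def concept_sensitivity(logit_grad, cav):
    """Directional derivative of the class logit along the concept vector."""
    return float(np.dot(logit_grad, cav))


def looks_adversarial(logit_grad, cavs, reference_scores, threshold):
    """Flag an input whose concept scores drift far from reference scores.

    Hypothetical decision rule: Euclidean distance between the input's
    concept scores and reference scores for the predicted class.
    """
    scores = np.array([concept_sensitivity(logit_grad, c) for c in cavs])
    shift = np.linalg.norm(scores - np.asarray(reference_scores))
    return bool(shift > threshold)


# Toy layer activations for one concept and for random images (made-up data).
concept_acts = np.array([[2.0, 0.0], [4.0, 0.0]])
random_acts = np.array([[0.0, 0.0], [0.0, 0.0]])
cav = concept_activation_vector(concept_acts, random_acts)

# Gradients of the class logit w.r.t. the layer activations (made-up data).
clean_grad = np.array([0.5, 2.0])
perturbed_grad = np.array([-3.0, 2.0])

clean_flag = looks_adversarial(clean_grad, [cav], [0.5], threshold=0.1)
perturbed_flag = looks_adversarial(perturbed_grad, [cav], [0.5], threshold=0.1)
```

In a real pipeline the activations and gradients would come from a chosen layer of the network under test, and the reference scores and threshold would have to be calibrated on clean data; the abstract gives neither, so they are placeholders here.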
Pages: 11