Interpretable adversarial example detection via high-level concept activation vector

Authors
Li, Jiaxing [1]
Tan, Yu-an [1]
Liu, Xinyu [1]
Meng, Weizhi [2]
Li, Yuanzhang [3]
Affiliations
[1] Beijing Institute of Technology, School of Cyberspace Science and Technology, Beijing 100081, People's Republic of China
[2] Lancaster University, School of Computing and Communications, Lancaster LA1 4YR, England
[3] Beijing Institute of Technology, School of Computer Science and Technology, Beijing 100081, People's Republic of China
Keywords
Deep learning; Adversarial machine learning; Model explainability; Adversarial defense; Concept activation vector
DOI
10.1016/j.cose.2024.104218
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Deep neural networks have achieved strong performance in many tasks. However, they are easily fooled by small perturbations added to the input, and for image data such perturbations are usually imperceptible to humans. The uninterpretable nature of deep learning systems is considered one of the reasons they are vulnerable to adversarial attacks. As artificial intelligence systems gain wider acceptance among the general public, transparent, reliable, and human-comprehensible decision-making becomes crucial for trust and confidence. In this paper, we propose an approach for defending against adversarial attacks based on concept-level interpretability techniques. Our approach interprets the model in terms of high-level concepts rather than low-level pixel features. Our key finding is that adding small perturbations leads to large changes in the model's concept vector test results. Based on this, we design a single-image concept vector testing method for detecting adversarial examples. Our experiments on the ImageNet dataset show that our method can achieve an average accuracy of over 95%. We provide source code in the supplementary material.
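The detection idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the concept vector is fitted or how the single-image test is thresholded. The version below assumes (1) a concept activation vector approximated by the normalized difference between the mean activation of concept examples and of random examples, a simple stand-in for the linear classifier usually trained for this purpose, (2) a per-image concept sensitivity computed as the dot product of a logit gradient with that vector, and (3) a hypothetical k-sigma rule that flags an input whose sensitivity deviates strongly from the scores seen on clean images. All function names and the parameter `k` are illustrative.

```python
import math
from statistics import mean, pstdev


def fit_cav(concept_acts, random_acts):
    """Return a unit vector pointing from the mean random activation
    to the mean concept activation (a mean-difference stand-in for a CAV)."""
    dim = len(concept_acts[0])
    diff = [
        mean(a[i] for a in concept_acts) - mean(a[i] for a in random_acts)
        for i in range(dim)
    ]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("concept and random activations have identical means")
    return [d / norm for d in diff]


def concept_sensitivity(logit_grad, cav):
    """Directional derivative of the class logit along the concept vector.

    logit_grad is assumed to be the gradient of the logit with respect to
    the layer activations for a single image, supplied by the caller.
    """
    return sum(g * c for g, c in zip(logit_grad, cav))


def flag_adversarial(score, clean_scores, k=3.0):
    """Hypothetical rule: flag the input if its sensitivity lies more than
    k population standard deviations from the mean of clean-image scores."""
    mu = mean(clean_scores)
    sigma = pstdev(clean_scores)
    return abs(score - mu) > k * sigma


if __name__ == "__main__":
    # Toy 2-D activations: the concept shifts activations along the first axis.
    cav = fit_cav([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
    clean_scores = [1.0, 1.1, 0.9, 1.0]
    score = concept_sensitivity([-2.0, 0.3], cav)
    print(cav, score, flag_adversarial(score, clean_scores))
```

In practice the activations and gradients would come from a chosen layer of the network under test, and the threshold would be calibrated on held-out clean images; both choices are assumptions of this sketch rather than details taken from the paper.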
Pages: 11