Defending Against Backdoor Attacks by Layer-wise Feature Analysis (Extended Abstract)

Cited: 0
Authors
Jebreel, Najeeb Moharram [1 ]
Domingo-Ferrer, Josep [1 ]
Li, Yiming [2 ]
Affiliations
[1] Univ Rovira Virgili, Tarragona, Spain
[2] Zhejiang Univ, State Key Lab Blockchain & Data Secur, Hangzhou, Zhejiang, Peoples R China
Funding
EU Horizon 2020
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or resort to publicly available pre-trained models. Unfortunately, doing so facilitates a new training-time attack (i.e., backdoor attack) against DNNs. This attack aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to be maximal at a critical layer, which is not always the one typically used in existing defenses, namely the layer before the fully-connected layers. We also demonstrate how to locate this critical layer based on the behaviors of benign samples. We then propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets confirm the effectiveness of our defense.
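The filtering idea described in the abstract can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the jump-in-similarity heuristic for locating the critical layer, the cosine-similarity distance, the function names, and the threshold `tau` are all assumptions made here for demonstration on synthetic per-layer features.

```python
import numpy as np

rng = np.random.default_rng(0)

def cos_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def locate_critical_layer(benign_feats):
    """benign_feats: list over layers of (n_samples, dim) feature arrays of
    benign target-class samples. Hypothetical criterion: compute the mean
    cosine similarity of benign samples to their per-layer centroid, then
    pick the layer with the largest jump in that similarity."""
    sims = []
    for feats in benign_feats:
        centroid = feats.mean(axis=0)
        sims.append(np.mean([cos_sim(x, centroid) for x in feats]))
    return int(np.argmax(np.diff(sims)) + 1)

def filter_at_layer(benign_feats_l, suspicious_feats_l, tau):
    """Flag suspicious samples whose similarity to the benign centroid
    at the chosen layer falls below the threshold tau."""
    centroid = benign_feats_l.mean(axis=0)
    return [i for i, x in enumerate(suspicious_feats_l)
            if cos_sim(x, centroid) < tau]

# Toy demo: 3 layers, 4-dim features. Benign features are scattered at
# layer 0 and concentrate around a common direction from layer 1 onward.
benign_feats = [
    rng.normal(size=(5, 4)),                           # layer 0: scattered
    np.ones((5, 4)) + 0.01 * rng.normal(size=(5, 4)),  # layer 1: tight cluster
    np.ones((5, 4)) + 0.01 * rng.normal(size=(5, 4)),  # layer 2: tight cluster
]
layer = locate_critical_layer(benign_feats)  # largest similarity jump

suspicious = np.array([
    [1.0, 1.0, 1.0, 1.0],     # benign-like
    [0.9, 1.1, 1.0, 1.0],     # benign-like
    [1.0, -1.0, 1.0, -1.0],   # deviates from benign centroid (poison-like)
])
flagged = filter_at_layer(benign_feats[layer], suspicious, tau=0.5)
```

In practice the per-layer features would be extracted from the trained model (e.g., via forward hooks on intermediate layers) rather than synthesized as above.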
Pages: 8416-8420 (5 pages)