Defending Against Backdoor Attacks by Layer-wise Feature Analysis (Extended Abstract)

Cited: 0
Authors
Jebreel, Najeeb Moharram [1 ]
Domingo-Ferrer, Josep [1 ]
Li, Yiming [2 ]
Affiliations
[1] Univ Rovira Virgili, Tarragona, Spain
[2] Zhejiang Univ, State Key Lab Blockchain & Data Secur, Hangzhou, Zhejiang, Peoples R China
Funding
EU Horizon 2020
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or resort to publicly available pre-trained models. Unfortunately, doing so facilitates a new training-time attack (i.e., backdoor attack) against DNNs. This attack aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to be maximal at a critical layer, which is not always the one typically used in existing defenses, namely the layer before the fully-connected layers. We also demonstrate how to locate this critical layer based on the behaviors of benign samples. We then propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets confirm the effectiveness of our defense.
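The filtering idea described in the abstract can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the jump-in-similarity heuristic for locating the critical layer, the cosine-similarity distance, the function names, and the threshold `tau` are all assumptions made here for demonstration on synthetic per-layer features.

```python
import numpy as np

rng = np.random.default_rng(0)

def cos_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def locate_critical_layer(benign_feats):
    """benign_feats: list over layers of (n_samples, dim) feature arrays of
    benign target-class samples. Hypothetical criterion: compute the mean
    cosine similarity of benign samples to their per-layer centroid, then
    pick the layer with the largest jump in that similarity."""
    sims = []
    for feats in benign_feats:
        centroid = feats.mean(axis=0)
        sims.append(np.mean([cos_sim(x, centroid) for x in feats]))
    return int(np.argmax(np.diff(sims)) + 1)

def filter_at_layer(benign_feats_l, suspicious_feats_l, tau):
    """Flag suspicious samples whose similarity to the benign centroid
    at the chosen layer falls below the threshold tau."""
    centroid = benign_feats_l.mean(axis=0)
    return [i for i, x in enumerate(suspicious_feats_l)
            if cos_sim(x, centroid) < tau]

# Toy demo: 3 layers, 4-dim features. Benign features are scattered at
# layer 0 and concentrate around a common direction from layer 1 onward.
benign_feats = [
    rng.normal(size=(5, 4)),                           # layer 0: scattered
    np.ones((5, 4)) + 0.01 * rng.normal(size=(5, 4)),  # layer 1: tight cluster
    np.ones((5, 4)) + 0.01 * rng.normal(size=(5, 4)),  # layer 2: tight cluster
]
layer = locate_critical_layer(benign_feats)  # largest similarity jump

suspicious = np.array([
    [1.0, 1.0, 1.0, 1.0],     # benign-like
    [0.9, 1.1, 1.0, 1.0],     # benign-like
    [1.0, -1.0, 1.0, -1.0],   # deviates from benign centroid (poison-like)
])
flagged = filter_at_layer(benign_feats[layer], suspicious, tau=0.5)
```

In practice the per-layer features would be extracted from the trained model (e.g., via forward hooks on intermediate layers) rather than synthesized as above.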
Pages: 8416-8420 (5 pages)