Detecting backdoor in deep neural networks via intentional adversarial perturbations

Cited by: 5
Authors
Xue, Mingfu [1 ]
Wu, Yinghao [1 ]
Wu, Zhiyu [2 ]
Zhang, Yushu [1 ]
Wang, Jian [1 ]
Liu, Weiqiang [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Sci, Nanjing, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Backdoor attacks; Deep neural networks; Backdoor detection; Defenses; Adversarial examples;
DOI
10.1016/j.ins.2023.03.112
CLC classification number
TP [Automation technology, computer technology];
Subject classification number
0812;
Abstract
Recent research shows that deep learning models are susceptible to backdoor attacks. Many defenses against backdoor attacks have been proposed, but existing defenses require high computational overhead or prior knowledge of the attack, such as the trigger size. In this paper, we propose a novel backdoor detection method based on intentional adversarial perturbations. The proposed method leverages an intentional adversarial perturbation to detect whether an image contains a trigger, and it can be applied in both the training stage (to sanitize the training set) and the inference stage (to detect backdoor instances). Specifically, given an untrusted image, an adversarial perturbation is intentionally added to it. If the model's prediction on the perturbed image is consistent with its prediction on the unperturbed image, the input image is considered a backdoor instance. Compared with most existing defenses, the proposed method is faster and introduces less computational overhead during the backdoor detection process. Moreover, the proposed method maintains the visual quality of the image, as the l2 norm of the added perturbation is as low as 2.8715, 3.0513 and 2.4362 on the Fashion-MNIST, CIFAR-10 and GTSRB datasets, respectively. Experimental results show that, for the general backdoor attack, the backdoor detection rate of the proposed defense is 99.63%, 99.76% and 99.91% on the Fashion-MNIST, CIFAR-10 and GTSRB datasets, respectively. For invisible backdoor attacks, the detection rate is 99.75% against the blended backdoor attack and 98.00% against the sample-specific backdoor attack. It is also demonstrated that the proposed method achieves high defense performance against backdoor attacks under different attack settings (trigger transparency, trigger size and trigger pattern). In addition, an experimental comparison with related work demonstrates that the proposed method has better detection performance and higher detection efficiency.
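As a rough illustration of the detection idea described in the abstract, the following Python sketch flags a single input image as a suspected backdoor instance when an intentionally added adversarial perturbation fails to change the model's prediction. This is a minimal sketch only: the FGSM-style perturbation, the epsilon value and the name detect_backdoor_instance are illustrative assumptions, not the authors' exact procedure.

import torch
import torch.nn.functional as F

def detect_backdoor_instance(model, image, epsilon=0.05):
    # Illustrative sketch (not the paper's exact algorithm): an input is
    # flagged as a suspected backdoor instance if an intentional adversarial
    # perturbation does not change the model's prediction.
    # `image` is assumed to be a single image tensor of shape [1, C, H, W].
    model.eval()
    image = image.clone().detach().requires_grad_(True)

    # Prediction on the unperturbed image.
    logits = model(image)
    pred_clean = logits.argmax(dim=1)

    # Craft an untargeted adversarial perturbation; FGSM is used here only as
    # a stand-in for the paper's intentional perturbation generation.
    loss = F.cross_entropy(logits, pred_clean)
    loss.backward()
    perturbed = (image + epsilon * image.grad.sign()).clamp(0, 1)

    # Prediction on the perturbed image.
    with torch.no_grad():
        pred_adv = model(perturbed).argmax(dim=1)

    # Consistent predictions under perturbation -> suspected backdoor instance.
    return bool((pred_adv == pred_clean).item())

In practice, clean inputs are expected to change label under such a perturbation, while trigger-carrying inputs tend to keep the attacker's target label, which is what the consistency check exploits.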
Pages: 564-577
Number of pages: 14