Detecting backdoor in deep neural networks via intentional adversarial perturbations

Cited by: 5
Authors
Xue, Mingfu [1 ]
Wu, Yinghao [1 ]
Wu, Zhiyu [2 ]
Zhang, Yushu [1 ]
Wang, Jian [1 ]
Liu, Weiqiang [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Sci, Nanjing, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Backdoor attacks; Deep neural networks; Backdoor detection; Defenses; Adversarial examples;
DOI
10.1016/j.ins.2023.03.112
CLC classification number
TP [Automation technology, computer technology];
Subject classification number
0812;
Abstract
Recent research shows that deep learning models are susceptible to backdoor attacks. Many defenses against backdoor attacks have been proposed, but existing defenses require high computational overhead or prior knowledge of the attack, such as the trigger size. In this paper, we propose a novel backdoor detection method based on intentional adversarial perturbations. The proposed method leverages an intentional adversarial perturbation to detect whether an image contains a trigger, and it can be applied in both the training stage (to sanitize the training set) and the inference stage (to detect backdoor instances). Specifically, given an untrusted image, an adversarial perturbation is intentionally added to it. If the model's prediction on the perturbed image is consistent with its prediction on the unperturbed image, the input image is considered a backdoor instance. Compared with most existing defenses, the proposed method is faster and introduces less computational overhead during the backdoor detection process. Moreover, the proposed method maintains the visual quality of the image, as the l2 norm of the added perturbation is as low as 2.8715, 3.0513 and 2.4362 on the Fashion-MNIST, CIFAR-10 and GTSRB datasets, respectively. Experimental results show that, for the general backdoor attack, the backdoor detection rate of the proposed defense is 99.63%, 99.76% and 99.91% on the Fashion-MNIST, CIFAR-10 and GTSRB datasets, respectively. For invisible backdoor attacks, the detection rate is 99.75% against the blended backdoor attack and 98.00% against the sample-specific backdoor attack. It is also demonstrated that the proposed method achieves high defense performance against backdoor attacks under different attack settings (trigger transparency, trigger size and trigger pattern). In addition, an experimental comparison with related work demonstrates that the proposed method has better detection performance and higher detection efficiency.
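As a rough illustration of the detection idea described in the abstract, the following Python sketch flags a single input image as a suspected backdoor instance when an intentionally added adversarial perturbation fails to change the model's prediction. This is a minimal sketch only: the FGSM-style perturbation, the epsilon value and the name detect_backdoor_instance are illustrative assumptions, not the authors' exact procedure.

import torch
import torch.nn.functional as F

def detect_backdoor_instance(model, image, epsilon=0.05):
    # Illustrative sketch (not the paper's exact algorithm): an input is
    # flagged as a suspected backdoor instance if an intentional adversarial
    # perturbation does not change the model's prediction.
    # `image` is assumed to be a single image tensor of shape [1, C, H, W].
    model.eval()
    image = image.clone().detach().requires_grad_(True)

    # Prediction on the unperturbed image.
    logits = model(image)
    pred_clean = logits.argmax(dim=1)

    # Craft an untargeted adversarial perturbation; FGSM is used here only as
    # a stand-in for the paper's intentional perturbation generation.
    loss = F.cross_entropy(logits, pred_clean)
    loss.backward()
    perturbed = (image + epsilon * image.grad.sign()).clamp(0, 1)

    # Prediction on the perturbed image.
    with torch.no_grad():
        pred_adv = model(perturbed).argmax(dim=1)

    # Consistent predictions under perturbation -> suspected backdoor instance.
    return bool((pred_adv == pred_clean).item())

In practice, clean inputs are expected to change label under such a perturbation, while trigger-carrying inputs tend to keep the attacker's target label, which is what the consistency check exploits.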
Pages: 564-577
Number of pages: 14