Black-box Detection of Backdoor Attacks with Limited Information and Data

Cited by: 27
Authors
Dong, Yinpeng [1, 2, 4]
Yang, Xiao [1, 2]
Deng, Zhijie [1, 2]
Pang, Tianyu [1, 2]
Xiao, Zihao [4]
Su, Hang [1, 2, 3]
Zhu, Jun [1, 2, 3, 4]
Affiliations
[1] Tsinghua Bosch Joint ML Ctr, Inst AI, Dept Comp Sci & Tech, Beijing, Peoples R China
[2] Tsinghua Univ, THBI Lab, BNRist Ctr, Beijing 100084, Peoples R China
[3] Pazhou Lab, Guangzhou 510330, Peoples R China
[4] Real AI, Beijing, Peoples R China
DOI
10.1109/ICCV48922.2021.01617
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. A malicious backdoor can be embedded in a model by poisoning its training dataset, so that the infected model gives wrong predictions at inference time whenever a specific trigger appears. To mitigate the potential threats of backdoor attacks, various backdoor detection and defense methods have been proposed. However, existing techniques usually require the poisoned training data or white-box access to the model, both of which are commonly unavailable in practice. In this paper, we propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. We introduce a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class, which helps to reveal the existence of backdoor attacks. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models. Extensive experiments on hundreds of DNN models trained on several datasets corroborate the effectiveness of our method under the black-box setting against various backdoor attacks.
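To make the idea of gradient-free trigger reverse-engineering concrete, here is a minimal sketch on a toy problem. It is not the B3D algorithm as published: it uses a generic evolution-strategies-style gradient estimate with antithetic Gaussian sampling, and everything in it is an assumption made for illustration, including the toy black box `query_model`, the planted backdoor on features 0 and 1, the 16-dimensional inputs, the mask/pattern parameterization, and all hyperparameters. The only access to the model is through its returned target-class probability.

```python
import math
import random

D = 16  # toy input dimension (assumption)


def sigmoid(z):
    # Clamp to keep math.exp in a safe range.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def query_model(x):
    """Hypothetical black box returning P(target class | x).

    Stands in for a remote prediction API. The planted toy backdoor
    fires when features 0 and 1 are both close to 1.
    """
    mean = sum(x) / len(x)
    return sigmoid(4.0 * (mean - 0.5) + 8.0 * x[0] * x[1])


def decode(theta):
    """Map unconstrained parameters to a mask and a pattern in [0, 1]."""
    mask = [sigmoid(t) for t in theta[:D]]
    pattern = [sigmoid(t) for t in theta[D:]]
    return mask, pattern


def apply_trigger(x, mask, pattern):
    """Blend the pattern into x where the mask is on."""
    return [(1.0 - m) * xi + m * p for xi, m, p in zip(x, mask, pattern)]


def trigger_loss(theta, samples, lam=0.5):
    """Mean negative log-probability of the target class plus a mask-size penalty."""
    mask, pattern = decode(theta)
    nll = 0.0
    for x in samples:
        prob = query_model(apply_trigger(x, mask, pattern))
        nll -= math.log(max(prob, 1e-12))
    return nll / len(samples) + lam * sum(mask) / len(mask)


def nes_minimize(f, theta0, rng, iters=60, pop=10, sigma=0.1, lr=0.5):
    """Minimize f using antithetic Gaussian sampling; only f values are used."""
    theta = list(theta0)
    best_theta, best_loss = list(theta), f(theta)
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for _ in range(pop):
            eps = [rng.gauss(0.0, 1.0) for _ in theta]
            plus = f([t + sigma * e for t, e in zip(theta, eps)])
            minus = f([t - sigma * e for t, e in zip(theta, eps)])
            for i, e in enumerate(eps):
                grad[i] += (plus - minus) * e / (2.0 * sigma * pop)
        theta = [t - lr * g for t, g in zip(theta, grad)]
        loss = f(theta)
        if loss < best_loss:  # keep the best parameters seen so far
            best_theta, best_loss = list(theta), loss
    return best_theta, best_loss


def make_clean_samples(rng, n):
    """Dim random inputs that the toy model does not assign to the target class."""
    return [[rng.uniform(0.0, 0.3) for _ in range(D)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = make_clean_samples(rng, 8)
    theta0 = [-2.0] * D + [0.0] * D  # small initial mask, mid-gray pattern
    objective = lambda t: trigger_loss(t, samples)
    theta, loss = nes_minimize(objective, theta0, rng)
    mask, _ = decode(theta)
    print("loss:", round(loss, 3))
    print("largest mask entries at indices:",
          sorted(range(D), key=lambda i: -mask[i])[:2])
```

Running the script prints the best loss found and the two mask positions with the largest values. In a detection setting, one would presumably repeat the search for each candidate target class and compare the recovered triggers across classes; that comparison step is not shown here.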
Pages: 16462-16471
Page count: 10
Related papers
50 in total
  • [1] Black-box Adversarial Attacks with Limited Queries and Information
    Ilyas, Andrew
    Engstrom, Logan
    Athalye, Anish
    Lin, Jessy
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [2] Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks
    Wang, Tong
    Yao, Yuan
    Xu, Feng
    Xu, Miao
    An, Shengwei
    Wang, Ting
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 1, 2024: 274-282
  • [3] Generative Adversarial Networks for Black-Box API Attacks with Limited Training Data
    Shi, Yi
    Sagduyu, Yalin E.
    Davaslioglu, Kemal
    Li, Jason H.
    [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2018: 453-458
  • [4] Black-Box Graph Backdoor Defense
    Yang, Xiao
    Li, Gaolei
    Tao, Xiaoyi
    Zhang, Chaofeng
    Li, Jianhua
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT V, 2024, 14491: 163-180
  • [5] Backdoor attacks on black-box ciphers exploiting low-entropy plaintexts
    Young, A
    Yung, M
    [J]. INFORMATION SECURITY AND PRIVACY, PROCEEDINGS, 2003, 2727: 297-311
  • [6] Black-Box Data Poisoning Attacks on Crowdsourcing
    Chen, Pengpeng
    Yang, Yongqiang
    Yang, Dingqi
    Sun, Hailong
    Chen, Zhijun
    Lin, Peng
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023: 2975-2983
  • [7] B3: Backdoor Attacks against Black-box Machine Learning Models
    Gong, Xueluan
    Chen, Yanjiao
    Yang, Wenbin
    Huang, Huayang
    Wang, Qian
    [J]. ACM TRANSACTIONS ON PRIVACY AND SECURITY, 2023, 26 (04)
  • [8] Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection
    Fu, Hao
    Krishnamurthy, Prashanth
    Garg, Siddharth
    Khorrami, Farshad
    [J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18: 4668-4680
  • [9] Black-box Attacks Against Neural Binary Function Detection
    Bundt, Joshua
    Davinroy, Michael
    Agadakos, Ioannis
    Oprea, Alina
    Robertson, William
    [J]. PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON RESEARCH IN ATTACKS, INTRUSIONS AND DEFENSES, RAID 2023, 2023: 1-16
  • [10] Black-box Attacks to Log-based Anomaly Detection
    Huang, Shaohan
    Liu, Yi
    Fung, Carol
    Yang, Hailong
    Luan, Zhongzhi
    [J]. 2022 18TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM 2022): INTELLIGENT MANAGEMENT OF DISRUPTIVE NETWORK TECHNOLOGIES AND SERVICES, 2022: 310-316