Universal Certified Defense for Black-Box Models Based on Random Smoothing

Cited by: 0
Authors
Li Q. [1 ]
Chen J. [1 ,2 ]
Zhang Z.-J. [1 ]
He K. [1 ]
Du R.-Y. [1 ,3 ]
Wang X.-X. [1 ]
Affiliations
[1] Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan
[2] Institute of Information Technology, Wuhan University, Rizhao, Shandong
[3] Collaborative Innovation Center of Geospatial Technology, Wuhan
Keywords
black-box models; certified defense; deep neural networks; substitute models; random smoothing
DOI
10.11897/SP.J.1016.2024.00690
Abstract
In recent years, the widespread application of image classification models based on deep neural networks (DNNs) has significantly impacted critical fields such as facial recognition and autonomous driving. Despite their remarkable performance, these models remain vulnerable to adversarial attacks, which can cause misclassification and compromise model integrity; ensuring robustness against such attacks has therefore become a pivotal research direction for real-world deployment. Most existing defense methods, especially empirical ones, operate under a white-box assumption: the defender must have access to detailed information about the model, including its architecture and parameters. Model owners, however, are often reluctant to share such sensitive information due to privacy concerns. Existing black-box defense methods, in turn, fail to provide protection against attacks under all norms and thus lack universality. To address this challenge, this paper proposes a universal certified defense method applicable to a broad spectrum of black-box models. The key innovation is a query-based, data-free substitute model generation scheme. Unlike traditional methods, this scheme requires neither training data nor prior knowledge of the model structure; leveraging queries and zero-order optimization, it generates high-quality substitute models, effectively transforming the certified defense scenario into a white-box setting without compromising model privacy. Building on the white-box substitute model, the paper further incorporates random smoothing and a noise selection method to construct a universal certified defense capable of resisting adversarial attacks under any norm. To validate the effectiveness of the substitute model, its performance under white-box certified defense is compared with that of the original model. Experimental results on the CIFAR10 dataset demonstrate the superiority of the proposed universal black-box certified defense over existing methods: it achieves significant improvements in certified accuracy while maintaining performance comparable to white-box certified defense methods. Notably, compared with previous black-box certified defense methods, the proposed solution improves certified accuracy by over 20% while effectively safeguarding the privacy of the original model, and it reduces the success rate of membership inference attacks by 5.48%, further highlighting its robustness and practical applicability in real-world scenarios. © 2024 Science Press. All rights reserved.
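The abstract names two standard building blocks whose mechanics are worth making concrete: the certification step of random smoothing, and the zero-order gradient estimates that let a substitute model be trained through queries alone. The first sketch below is a minimal Monte Carlo certification routine in the style of Cohen et al. (2019), not the paper's exact procedure; the classifier interface, sampling budget n, and noise level sigma are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, binomtest

def certify(base_classifier, x, sigma=0.25, n=1000, alpha=0.001, num_classes=10):
    """Certify a prediction of the smoothed classifier g(x) = argmax_c P(f(x + e) = c),
    e ~ N(0, sigma^2 I), by Monte Carlo sampling (hypothetical interface: base_classifier
    maps an input array to an integer label)."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        counts[base_classifier(x + sigma * np.random.randn(*x.shape))] += 1
    top = int(counts.argmax())
    # One-sided (1 - alpha) Clopper-Pearson lower bound on the top-class probability:
    # the two-sided exact interval at level 1 - 2*alpha puts mass alpha in each tail.
    p_lower = binomtest(int(counts[top]), n).proportion_ci(
        confidence_level=1 - 2 * alpha, method="exact").low
    if p_lower > 0.5:
        return top, sigma * norm.ppf(p_lower)  # label and certified L2 radius
    return None, 0.0  # abstain when no class is confidently dominant
```

Substitute model generation cannot backpropagate through the black-box target; a common building block, and presumably the role of zero-order optimization here, is a two-point finite-difference gradient estimate assembled from queries. A sketch under that assumption:

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, q=20):
    """Estimate the gradient of a scalar-valued black-box loss f at x by averaging
    two-point finite differences along q random Gaussian directions (illustrative)."""
    grad = np.zeros_like(x)
    for _ in range(q):
        u = np.random.randn(*x.shape)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / q
```

The estimated gradient can then drive ordinary optimizer updates on the substitute's training objective, which is what allows the scheme to operate with queries alone.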
Pages: 690-702
Number of pages: 12