Three-Stage Adversarial Perturbation Generation Active Defense Algorithm for Facial Attribute Editing

Cited by: 0
Authors
Chen B. [1 ,2 ,3 ]
Zhang H.-T. [1 ,3 ]
Li Y.-R. [1 ,3 ]
Affiliations
[1] Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing
[2] Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing
[3] School of Computer Science, Nanjing University of Information Science and Technology, Nanjing
Keywords
active defense; adversarial attack; auxiliary classifier; alternate training; facial attribute editing
DOI
10.11897/SP.J.1016.2024.00677
Abstract
With the growing maturity of deep generative technology, facial images produced by facial attribute editing have become difficult to distinguish from genuine ones. Once these editing technologies are used maliciously, for example to infringe on personal privacy or to manipulate public opinion, they may trigger moral, social, and security problems. Although existing forensics-based passive defense techniques achieve considerable performance against such malicious editing, they can only provide evidence that tampering occurred; they cannot prevent it, and therefore can hardly undo the losses that malicious tampering causes. Active defense technology has emerged in response: it prevents faces from being tampered with by disrupting the output of facial attribute editing models. However, the existing two-stage training active defense framework for facial attribute editing suffers from insufficient transferability and insufficient perturbation robustness. This paper therefore proposes a three-stage adversarial perturbation active defense framework for facial attribute editing, built by optimizing the two-stage training architecture and its loss functions and by introducing an auxiliary classifier. First, the substitute target model in the two-stage architecture is modified and an attribute editing loss is designed for training the perturbation generator, which improves the reconstruction performance and attribute constraint ability of the substitute model and thereby reduces its overfitting. Second, an auxiliary classifier is introduced in the training phase to classify the source attributes of the encoded features extracted by the substitute model, and a corresponding auxiliary classifier loss is designed for training the perturbation generator. Third, the original two-stage alternate training is extended to a three-stage alternate training of the substitute target model, the auxiliary classifier, and the perturbation generator, so that countering the auxiliary classifier further strengthens the active defense against the tampering model. Finally, an attack layer is introduced into the training of the perturbation generator to enhance the robustness of the adversarial perturbation against filtering and JPEG (Joint Photographic Experts Group) compression. Experimental results on five facial attribute editing models (StarGAN, AttGAN with difference attribute vector input, AttGAN with target attribute vector input, STD-GAN, and a style-aware model) show that the proposed framework transfers active defense from the white-box substitute model to black-box attribute editing models better than existing frameworks, improving peak signal-to-noise ratio (PSNR) by 16.17% in the black-box case; the generated adversarial perturbation is also more robust to JPEG compression and filtering than the baseline, improving PSNR by 13.91% under JPEG compression and 17.76% under Gaussian filtering. © 2024 Science Press. All rights reserved.
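The three-stage alternate training described in the abstract can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch rendering of one training iteration; the module interfaces (`sub.encode`/`sub.decode`, `sub.attr_loss`, `aux`, `gen`, `attack_layer`), the loss combination, and the perturbation bound `eps` are illustrative assumptions, not the authors' released code. Images `x` are assumed to lie in [-1, 1] and `src_attr`/`tgt_attr` to be float attribute vectors.

```python
# Hypothetical three-stage alternate training step (sketch, not the authors'
# code): the substitute editing model `sub`, the auxiliary classifier `aux`,
# and the perturbation generator `gen` are updated in turn each iteration.
import torch
import torch.nn.functional as F

def three_stage_step(x, src_attr, tgt_attr, sub, aux, gen,
                     opt_sub, opt_aux, opt_gen, attack_layer, eps=0.05):
    # --- Stage 1: substitute target model -------------------------------
    # Reconstruction plus an attribute-editing constraint, so the white-box
    # substitute better mimics black-box attribute editors.
    opt_sub.zero_grad()
    feat = sub.encode(x)
    recon = sub.decode(feat, src_attr)
    edited = sub.decode(feat, tgt_attr)
    loss_sub = F.l1_loss(recon, x) + sub.attr_loss(edited, tgt_attr)
    loss_sub.backward()
    opt_sub.step()

    # --- Stage 2: auxiliary classifier ----------------------------------
    # Learns to recover the source attributes from the substitute model's
    # encoded features (features detached so only `aux` is updated).
    opt_aux.zero_grad()
    feat = sub.encode(x).detach()
    loss_aux = F.binary_cross_entropy_with_logits(aux(feat), src_attr)
    loss_aux.backward()
    opt_aux.step()

    # --- Stage 3: perturbation generator --------------------------------
    # The attack layer (filtering / JPEG surrogate) is applied to the
    # perturbed image so the perturbation stays effective afterwards.
    opt_gen.zero_grad()
    delta = eps * torch.tanh(gen(x))                 # bounded perturbation
    x_adv = attack_layer(torch.clamp(x + delta, -1, 1))
    feat_adv = sub.encode(x_adv)
    edited_adv = sub.decode(feat_adv, tgt_attr)
    clean_edit = sub.decode(sub.encode(x), tgt_attr).detach()
    # Maximize distortion of the edited output and, adversarially, the
    # auxiliary classifier's source-attribute loss on perturbed features.
    loss_gen = -F.l1_loss(edited_adv, clean_edit) \
               - F.binary_cross_entropy_with_logits(aux(feat_adv), src_attr)
    loss_gen.backward()
    opt_gen.step()
    return loss_sub.item(), loss_aux.item(), loss_gen.item()
```

Because each stage zeroes its own optimizer before stepping, gradients leaked into the other modules during stage 3 are discarded at the start of the next iteration, keeping the three updates cleanly alternated.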
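The attack layer itself can be viewed as a stochastic differentiable pre-processing module. The sketch below is again a hypothetical stand-in rather than the paper's implementation: true JPEG compression is non-differentiable, so a common surrogate (blur plus additive uniform quantization noise) is used, and the kernel sizes, sigma values, and noise amplitude are illustrative choices.

```python
# Hypothetical attack layer for robustness training: randomly applies
# identity, Gaussian filtering, or a crude differentiable JPEG surrogate,
# so the perturbation generator learns perturbations surviving these ops.
import random
import torch
import torchvision.transforms.functional as TF

class AttackLayer(torch.nn.Module):
    def forward(self, x):
        choice = random.choice(["identity", "gaussian", "jpeg_like"])
        if choice == "gaussian":
            # Gaussian filtering: low-pass the perturbed image.
            return TF.gaussian_blur(x, kernel_size=5, sigma=1.0)
        if choice == "jpeg_like":
            # Differentiable JPEG stand-in: mild blur, then quantization
            # emulated by uniform noise (a common differentiable surrogate).
            y = TF.gaussian_blur(x, kernel_size=3, sigma=0.8)
            noise = (torch.rand_like(y) - 0.5) / 32.0
            return y + noise
        return x
```

Sampling a different operation on each forward pass exposes the generator to all three conditions over training, which is what drives the robustness gains the abstract reports under JPEG compression and Gaussian filtering.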
Pages: 677-689
Number of pages: 12
Related Papers
27 entries in total
  • [1] Goodfellow I, Pouget-Abadie J, Mirza M, et al., Generative adversarial networks, Communications of the ACM, 63, 11, pp. 139-144, (2020)
  • [2] He Z, Zuo W, Kan M, et al., AttGAN: Facial attribute editing by only changing what you want, IEEE Transactions on Image Processing, 28, 11, pp. 5464-5478, (2019)
  • [3] Choi Y, Choi M, Kim M, et al., StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789-8797, (2018)
  • [4] Huang Q, Zhang J, Zhou W, et al., Initiative defense against facial manipulation, Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 35, 2, pp. 1619-1627, (2021)
  • [5] Ruiz N, Bargal S A, Sclaroff S., Disrupting deepfakes: Adversarial attacks against conditional image translation networks and facial manipulation systems, Proceedings of the Computer Vision—ECCV 2020 Workshops, pp. 236-251, (2020)
  • [6] Ji Shou-Ling, Du Tian-Yu, Deng Shui-Guang, et al., Robustness certification research on deep learning models: A survey, Chinese Journal of Computers, 45, 1, pp. 190-206, (2022)
  • [7] Yeh C Y, Chen H W, Tsai S L, et al., Disrupting image-translation-based deepfake algorithms with adversarial attacks, Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, pp. 53-62, (2020)
  • [8] Fang Z, Yang Y, Lin J, et al., Adversarial attacks for multi-target image translation networks, Proceedings of the 2020 IEEE International Conference on Progress in Informatics and Computing, pp. 179-184, (2020)
  • [9] Qiu H, Du Y, Lu T., The framework of cross-domain and model adversarial attack against deepfake, Future Internet, 14, 2, pp. 1-16, (2022)
  • [10] Huang H, Wang Y, Chen Z, et al., CMUA-watermark: A cross-model universal adversarial watermark for combating deepfakes, Proceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual, 36, 1, pp. 989-997, (2022)