Backdoor Attacks via Machine Unlearning

Cited: 0
Authors
Liu, Zihao [1 ]
Wang, Tianhao [2 ]
Huai, Mengdi [1 ]
Miao, Chenglin [1 ]
Affiliations
[1] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[2] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA
Funding
US National Science Foundation
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
As a new paradigm for erasing data from a model and protecting user privacy, machine unlearning has drawn significant attention. However, existing studies on machine unlearning mainly focus on its effectiveness and efficiency, neglecting the security challenges introduced by this technique. In this paper, we aim to bridge this gap and study the possibility of conducting malicious attacks that leverage machine unlearning. Specifically, we consider the backdoor attack via machine unlearning, where an attacker seeks to inject a backdoor into the unlearned model by submitting malicious unlearning requests, so that the prediction made by the unlearned model changes when a particular trigger is present. In our study, we propose two attack approaches. The first does not require the attacker to poison any training data of the model; the attacker can achieve the attack goal merely by requesting to unlearn a small subset of his contributed training data. The second allows the attacker to poison a few training instances with a pre-defined trigger upfront, and then activate the attack by submitting a malicious unlearning request. Both attack approaches are designed to maximize attack utility while ensuring attack stealthiness. The effectiveness of the proposed attacks is demonstrated with different machine unlearning algorithms as well as different models on different datasets.
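The mechanics of the second attack approach can be illustrated with a deliberately simplified sketch. The setup below is a hypothetical construction, not the paper's actual method: the "model" is a 1-nearest-neighbor classifier (for which exact unlearning is just retraining without the removed rows), and the trigger geometry, point values, and the poison/camouflage framing are illustrative assumptions. The attacker contributes both trigger-stamped points with the target label and nearby "camouflage" points with the clean label, so the trigger is inert at training time; a later unlearning request that removes only the camouflage points activates the backdoor.

```python
# Toy sketch of a trigger-activated-by-unlearning attack on a
# 1-nearest-neighbor classifier. All data and the trigger pattern
# are hypothetical; the paper's attacks target real models via
# optimized unlearning requests.
import numpy as np

def predict_1nn(train_X, train_y, x):
    """1-NN prediction. Exact unlearning here is simply dropping rows."""
    dists = np.linalg.norm(train_X - x, axis=1)
    return int(train_y[np.argmin(dists)])

# Clean data: class 0 near (0, 0), class 1 near (4, 4).
clean_X = np.array([[0.0, 0.0], [0.5, 0.5], [4.0, 4.0], [4.5, 4.5]])
clean_y = np.array([0, 0, 1, 1])

trigger = np.array([0.0, 10.0])         # out-of-distribution trigger pattern
poison_X = np.array([[0.1, 10.0]])      # trigger-stamped, target label 1
poison_y = np.array([1])
camo_X = np.array([[0.0, 10.0]])        # camouflage: same trigger region, clean label 0
camo_y = np.array([0])

# Attacker contributes both poison and camouflage to the training set.
X = np.vstack([clean_X, poison_X, camo_X])
y = np.concatenate([clean_y, poison_y, camo_y])

before = predict_1nn(X, y, trigger)     # camouflage is closest -> clean label 0
X_u, y_u = X[:-1], y[:-1]               # malicious request: unlearn the camouflage row
after = predict_1nn(X_u, y_u, trigger)  # poison is now closest -> target label 1

clean_pred = predict_1nn(X_u, y_u, np.array([0.0, 0.0]))  # clean inputs unaffected
print(before, after, clean_pred)        # 0 1 0
```

Before the unlearning request, the camouflage point dominates the trigger region and the model behaves normally; removing it flips the model's prediction on triggered inputs while leaving clean inputs untouched, which mirrors the stealthiness goal stated in the abstract.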
Pages: 14115-14123 (9 pages)