Data Stealing Attacks against Large Language Models via Backdooring

被引：0

作者：

He, Jiaming ^{[1
]}

Hou, Guanyu ^{[1
]}

Jia, Xinyue ^{[1
]}

Chen, Yangyang ^{[1
]}

Liao, Wenqi ^{[1
]}

Zhou, Yinhang ^{[2
]}

Zhou, Rang ^{[1
]}

机构：

[1] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Oxford Brookes Coll, Chengdu 610059, Peoples R China

[2] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China

来源：

ELECTRONICS | 2024年 / 13卷 / 14期

关键词：

data privacy; large language models; stealing attacks;

D O I：

10.3390/electronics13142858

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Large language models (LLMs) have gained immense attention and are being increasingly applied in various domains. However, this technological leap forward poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method mainly focuses on the scenario of model customization and is conducted in two phases, including backdoor training and backdoor activation, which allow for the extraction of private information without prior knowledge of the model's architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers can extract private information from the third-party knowledge database by incorporating the pre-defined backdoor trigger. Our method leverages the customization process of LLMs, injecting a stealthy backdoor that can be triggered after deployment to retrieve private data. We demonstrate the effectiveness of our proposed attack through extensive experiments, achieving a notable attack success rate. Extensive experiments demonstrate the effectiveness of our stealing attack in popular LLM architectures, as well as stealthiness during normal inference.

引用

页数：19

共 50 条

[1] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
Zhao, Wei
Li, Zhe
Li, Yige
Zhang, Ye
Sun, Jun
arXiv,
[2] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
Zhao, Wei
Li, Zhe
Li, Yige
Zhang, Ye
Sun, Jun
EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024, 2024, : 5094 - 5109
[3] JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Feng, Yingchaojie
Chen, Zhizhang
Kang, Zhining
Wang, Sijia
Zhu, Minfeng
Zhang, Wei
Chen, Wei
arXiv,
[4] Adversarial Attacks on Large Language Models
Zou, Jing
Zhang, Shungeng
Qiu, Meikang
KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2024, 2024, 14887 : 85 - 96
[5] Medical large language models are vulnerable to data-poisoning attacks
Daniel Alexander Alber
Zihao Yang
Anton Alyakin
Eunice Yang
Sumedha Rai
Aly A. Valliani
Jeff Zhang
Gabriel R. Rosenbaum
Ashley K. Amend-Thomas
David B. Kurland
Caroline M. Kremer
Alexander Eremiev
Bruck Negash
Daniel D. Wiggan
Michelle A. Nakatsuka
Karl L. Sangwon
Sean N. Neifert
Hammad A. Khan
Akshay Vinod Save
Adhith Palla
Eric A. Grin
Monika Hedman
Mustafa Nasir-Moin
Xujin Chris Liu
Lavender Yao Jiang
Michal A. Mankowski
Dorry L. Segev
Yindalon Aphinyanaphongs
Howard A. Riina
John G. Golfinos
Daniel A. Orringer
Douglas Kondziolka
Eric Karl Oermann
Nature Medicine, 2025, 31 (2) : 618 - 626
[6] BadCodePrompt: backdoor attacks against prompt engineering of large language models for code generation
Qu, Yubin
Huang, Song
Li, Yanzhou
Bai, Tongtong
Chen, Xiang
Wang, Xingya
Li, Long
Yao, Yongming
Automated Software Engineering, 2025, 32 (01)
[7] Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
Liu, Kang
Dolan-Gavitt, Brendan
Garg, Siddharth
RESEARCH IN ATTACKS, INTRUSIONS, AND DEFENSES, RAID 2018, 2018, 11050 : 273 - 294
[8] Prompt Engineering: Unleashing the Power of Large Language Models to Defend Against Social Engineering Attacks
Nezer, Ahmed I.
Nema, Bashar M.
Salim, Wisam Makki
Iraqi Journal for Computer Science and Mathematics, 2024, 5 (03): : 404 - 416
[9] Stealing the Decoding Algorithms of Language Models
Naseh, Ali
Krishna, Kalpesh
Iyyer, Mohit
Houmansadr, Amir
PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 1835 - 1849
[10] Data Poisoning Attacks against Autoregressive Models
Alfeld, Scott
Zhu, Xiaojin
Barford, Paul
THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1452 - 1458

← 1 2 3 4 5 →