Data Stealing Attacks against Large Language Models via Backdooring

被引:0
|
作者
He, Jiaming [1 ]
Hou, Guanyu [1 ]
Jia, Xinyue [1 ]
Chen, Yangyang [1 ]
Liao, Wenqi [1 ]
Zhou, Yinhang [2 ]
Zhou, Rang [1 ]
机构
[1] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Oxford Brookes Coll, Chengdu 610059, Peoples R China
[2] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China
关键词
data privacy; large language models; stealing attacks;
D O I
10.3390/electronics13142858
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Large language models (LLMs) have gained immense attention and are being increasingly applied in various domains. However, this technological leap forward poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method mainly focuses on the scenario of model customization and is conducted in two phases, including backdoor training and backdoor activation, which allow for the extraction of private information without prior knowledge of the model's architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers can extract private information from the third-party knowledge database by incorporating the pre-defined backdoor trigger. Our method leverages the customization process of LLMs, injecting a stealthy backdoor that can be triggered after deployment to retrieve private data. We demonstrate the effectiveness of our proposed attack through extensive experiments, achieving a notable attack success rate. Extensive experiments demonstrate the effectiveness of our stealing attack in popular LLM architectures, as well as stealthiness during normal inference.
引用
收藏
页数:19
相关论文
共 50 条
  • [1] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Zhang, Ye
    Sun, Jun
    arXiv,
  • [2] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Zhang, Ye
    Sun, Jun
    EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024, 2024, : 5094 - 5109
  • [3] JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
    Feng, Yingchaojie
    Chen, Zhizhang
    Kang, Zhining
    Wang, Sijia
    Zhu, Minfeng
    Zhang, Wei
    Chen, Wei
    arXiv,
  • [4] Adversarial Attacks on Large Language Models
    Zou, Jing
    Zhang, Shungeng
    Qiu, Meikang
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2024, 2024, 14887 : 85 - 96
  • [5] Medical large language models are vulnerable to data-poisoning attacks
    Daniel Alexander Alber
    Zihao Yang
    Anton Alyakin
    Eunice Yang
    Sumedha Rai
    Aly A. Valliani
    Jeff Zhang
    Gabriel R. Rosenbaum
    Ashley K. Amend-Thomas
    David B. Kurland
    Caroline M. Kremer
    Alexander Eremiev
    Bruck Negash
    Daniel D. Wiggan
    Michelle A. Nakatsuka
    Karl L. Sangwon
    Sean N. Neifert
    Hammad A. Khan
    Akshay Vinod Save
    Adhith Palla
    Eric A. Grin
    Monika Hedman
    Mustafa Nasir-Moin
    Xujin Chris Liu
    Lavender Yao Jiang
    Michal A. Mankowski
    Dorry L. Segev
    Yindalon Aphinyanaphongs
    Howard A. Riina
    John G. Golfinos
    Daniel A. Orringer
    Douglas Kondziolka
    Eric Karl Oermann
    Nature Medicine, 2025, 31 (2) : 618 - 626
  • [6] BadCodePrompt: backdoor attacks against prompt engineering of large language models for code generation
    Qu, Yubin
    Huang, Song
    Li, Yanzhou
    Bai, Tongtong
    Chen, Xiang
    Wang, Xingya
    Li, Long
    Yao, Yongming
    Automated Software Engineering, 2025, 32 (01)
  • [7] Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
    Liu, Kang
    Dolan-Gavitt, Brendan
    Garg, Siddharth
    RESEARCH IN ATTACKS, INTRUSIONS, AND DEFENSES, RAID 2018, 2018, 11050 : 273 - 294
  • [8] Prompt Engineering: Unleashing the Power of Large Language Models to Defend Against Social Engineering Attacks
    Nezer, Ahmed I.
    Nema, Bashar M.
    Salim, Wisam Makki
    Iraqi Journal for Computer Science and Mathematics, 2024, 5 (03): : 404 - 416
  • [9] Stealing the Decoding Algorithms of Language Models
    Naseh, Ali
    Krishna, Kalpesh
    Iyyer, Mohit
    Houmansadr, Amir
    PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 1835 - 1849
  • [10] Data Poisoning Attacks against Autoregressive Models
    Alfeld, Scott
    Zhu, Xiaojin
    Barford, Paul
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1452 - 1458