Data Stealing Attacks against Large Language Models via Backdooring

被引:0
|
作者
He, Jiaming [1 ]
Hou, Guanyu [1 ]
Jia, Xinyue [1 ]
Chen, Yangyang [1 ]
Liao, Wenqi [1 ]
Zhou, Yinhang [2 ]
Zhou, Rang [1 ]
机构
[1] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Oxford Brookes Coll, Chengdu 610059, Peoples R China
[2] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China
关键词
data privacy; large language models; stealing attacks;
D O I
10.3390/electronics13142858
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Large language models (LLMs) have gained immense attention and are being increasingly applied in various domains. However, this technological leap forward poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method mainly focuses on the scenario of model customization and is conducted in two phases, including backdoor training and backdoor activation, which allow for the extraction of private information without prior knowledge of the model's architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers can extract private information from the third-party knowledge database by incorporating the pre-defined backdoor trigger. Our method leverages the customization process of LLMs, injecting a stealthy backdoor that can be triggered after deployment to retrieve private data. We demonstrate the effectiveness of our proposed attack through extensive experiments, achieving a notable attack success rate. Extensive experiments demonstrate the effectiveness of our stealing attack in popular LLM architectures, as well as stealthiness during normal inference.
引用
收藏
页数:19
相关论文
共 50 条
  • [21] Defending Language Models Against Image-Based Prompt Attacks via User-Provided Specifications
    Sharma, Reshabh K.
    Gupta, Vinayak
    Grossman, Dan
    PROCEEDINGS 45TH IEEE SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS, SPW 2024, 2024, : 112 - 131
  • [22] DiffDefense: Defending Against Adversarial Attacks via Diffusion Models
    Silva, Hondamunige Prasanna
    Seidenari, Lorenzo
    Del Bimbo, Alberto
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II, 2023, 14234 : 430 - 442
  • [23] HARNESSING TASK OVERLOAD FOR SCALABLE JAILBREAK ATTACKS ON LARGE LANGUAGE MODELS
    Dong, Yiting
    Shen, Guobin
    Zhao, Dongcheng
    He, Xiang
    Zeng, Yi
    arXiv,
  • [24] Demystifying Data Management for Large Language Models
    Miao, Xupeng
    Jia, Zhihao
    Cui, Bin
    COMPANION OF THE 2024 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SIGMOD-COMPANION 2024, 2024, : 547 - 555
  • [25] Lookin' Out My Backdoor! Investigating Backdooring Attacks Against DL-driven Malware Detectors
    D'Onghia, Mario
    Di Cesare, Federico
    Gallo, Luigi
    Carminati, Michele
    Polino, Mario
    Zanero, Stefano
    PROCEEDINGS OF THE 16TH ACM WORKSHOP ON ARTIFICIAL INTELLIGENCE AND SECURITY, AISEC 2023, 2023, : 209 - 220
  • [26] Adversarial Attacks Against Deep Generative Models on Data: A Survey
    Sun, Hui
    Zhu, Tianqing
    Zhang, Zhiqiu
    Jin, Dawei
    Xiong, Ping
    Zhou, Wanlei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3367 - 3388
  • [27] Data Poisoning Attacks Against Outcome Interpretations of Predictive Models
    Zhang, Hengtong
    Gao, Jing
    Su, Lu
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 2165 - 2173
  • [28] UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Learning
    Erdogan, Ege
    Kupcu, Alptekin
    Cicek, A. Ercument
    PROCEEDINGS OF THE 21ST WORKSHOP ON PRIVACY IN THE ELECTRONIC SOCIETY, WPES 2022, 2022, : 115 - 124
  • [29] Stealing Machine Learning Models: Attacks and Countermeasures for Generative Adversarial Networks
    Hu, Hailong
    Pang, Jun
    37TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2021, 2021, : 1 - 16
  • [30] Stealing Machine Learning Parameters via Side Channel Power Attacks
    Wolf, Shaya
    Hu, Hui
    Cooley, Rafer
    Borowczak, Mike
    2021 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2021), 2021, : 242 - 247