Selective privacy-preserving framework for large language models fine-tuning

Cited by: 0
Authors
Wang, Teng [1]
Zhai, Lindong [1]
Yang, Tengfei [1]
Luo, Zhucheng [2]
Liu, Shuanggen [1]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Cyberspace Secur, Xian 710121, Shaanxi, Peoples R China
[2] Sun Yat Sen Univ, Affiliated Hosp 3, Informat Ctr, Guangzhou 510630, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Large language models; Fine-tuning; Local differential privacy; Selective privacy protection; DIFFERENTIAL PRIVACY;
DOI
10.1016/j.ins.2024.121000
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Fine-tuning pre-trained large language models (LLMs) benefits various downstream tasks, but it raises serious privacy risks because it relies on large amounts of training data. Differentially private stochastic gradient descent (DPSGD) introduces noise during model updates to prevent privacy leaks. Nevertheless, fine-tuning LLMs via DPSGD limits model utility, since heavy perturbations are applied to large, high-dimensional gradients. Moreover, existing privacy-preserving mechanisms directly perturb all tokens of the input sentences, which is too pessimistic to achieve good model performance. Therefore, this paper studies a selective privacy-preserving framework for fine-tuning LLMs. We propose a first-of-its-kind privacy notion called selective sequence local differential privacy (S-SeqLDP), which provides indistinguishability guarantees only for the secret part of the sequences. Furthermore, we design a novel framework called SLDP-FT that enables S-SeqLDP-compliant fine-tuning of large language models by perturbing the forward-pass embeddings with selective noise. We also investigate the privacy forward weight, which determines the noise magnitude required to achieve selective privacy protection. Extensive experiments on three tasks demonstrate that SLDP-FT achieves better model accuracy than state-of-the-art techniques under the same level of privacy protection.
Pages: 14
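
As a rough illustration of the selective perturbation described in the abstract, the sketch below adds local-DP noise only to the forward-pass embeddings of tokens marked as secret, leaving non-secret tokens unperturbed. This is a minimal sketch under stated assumptions, not the authors' SLDP-FT implementation: the function name, the Laplace mechanism, the per-token secret mask, and the epsilon/sensitivity calibration are illustrative choices rather than the paper's exact formulation.

# Minimal sketch (assumed, not the paper's code): selectively perturb
# forward-pass embeddings so that only secret tokens receive noise.
import torch

def selectively_perturb_embeddings(
    embeddings: torch.Tensor,   # (batch, seq_len, dim) forward-pass embeddings
    secret_mask: torch.Tensor,  # (batch, seq_len) bool, True for secret tokens
    epsilon: float,             # assumed per-token privacy budget
    sensitivity: float = 1.0,   # assumed L1 sensitivity after clipping/normalization
) -> torch.Tensor:
    """Add Laplace noise only to the embeddings of secret tokens."""
    scale = sensitivity / epsilon
    # Sample noise for every position, then zero it out for non-secret tokens.
    noise = torch.distributions.Laplace(0.0, scale).sample(embeddings.shape)
    noise = noise.to(embeddings.device, embeddings.dtype)
    mask = secret_mask.unsqueeze(-1).to(embeddings.dtype)  # broadcast over dim
    return embeddings + mask * noise

# Toy usage: only the second token of each sequence is treated as secret.
emb = torch.randn(2, 4, 8)
mask = torch.zeros(2, 4, dtype=torch.bool)
mask[:, 1] = True
private_emb = selectively_perturb_embeddings(emb, mask, epsilon=2.0)

Because the noise is gated by the secret mask, non-secret tokens keep their clean representations, which is the intuition behind why a selective notion such as S-SeqLDP can preserve more utility than perturbing every token.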