Selective privacy-preserving framework for large language models fine-tuning

Times Cited: 0
Authors
Wang, Teng [1 ]
Zhai, Lindong [1 ]
Yang, Tengfei [1 ]
Luo, Zhucheng [2 ]
Liu, Shuanggen [1 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Cyberspace Secur, Xian 710121, Shaanxi, Peoples R China
[2] Sun Yat Sen Univ, Affiliated Hosp 3, Informat Ctr, Guangzhou 510630, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Large language models; Fine-tuning; Local differential privacy; Selective privacy protection; DIFFERENTIAL PRIVACY;
DOI
10.1016/j.ins.2024.121000
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Fine-tuning pre-trained large language models (LLMs) benefits various downstream tasks, but relying on large amounts of training data introduces serious privacy leakage risks. Differentially private stochastic gradient descent (DPSGD) introduces noise during model updates to prevent such leaks. Nevertheless, fine-tuning LLMs via DPSGD limits model utility, since heavy perturbations are applied to large, high-dimensional gradients. Moreover, existing privacy-preserving mechanisms directly perturb all tokens of the input sentences, which is too pessimistic to achieve good model performance. Therefore, this paper studies a selective privacy-preserving framework for fine-tuning LLMs. We propose a first-of-its-kind privacy notion called selective sequence local differential privacy (S-SeqLDP), which provides indistinguishability guarantees only for the secret part of each sequence. Furthermore, we design a novel framework called SLDP-FT that enables S-SeqLDP-compliant LLM fine-tuning by perturbing the forward-pass embeddings with selective noise. We further investigate the privacy forward weight, which determines the noise magnitude required to achieve selective privacy protection. Extensive experiments on three tasks demonstrate that SLDP-FT achieves better model accuracy than state-of-the-art techniques under the same level of privacy protection.
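The abstract does not give the exact noise calibration, but the core idea of SLDP-FT, perturbing forward-pass embeddings only at secret token positions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name selective_perturb_embeddings, the choice of the Laplace mechanism, and the forward_weight / epsilon noise scale are placeholders standing in for the paper's privacy-forward-weight calibration.

```python
import torch


def selective_perturb_embeddings(
    embeddings: torch.Tensor,     # (batch, seq_len, dim) forward-pass token embeddings
    secret_mask: torch.Tensor,    # (batch, seq_len) bool, True where a token is secret
    epsilon: float,               # local differential privacy budget
    forward_weight: float = 1.0,  # stand-in for the paper's "privacy forward weight"
) -> torch.Tensor:
    """Add noise only to the embeddings of tokens marked as secret.

    Non-secret tokens pass through unchanged, which is the intuition behind
    S-SeqLDP: indistinguishability is guaranteed only for the secret part
    of the sequence.
    """
    # Illustrative Laplace scale; the paper's actual calibration depends on
    # the privacy forward weight and the sensitivity of the embedding layer.
    scale = forward_weight / epsilon
    noise = torch.distributions.Laplace(0.0, scale).sample(embeddings.shape)
    mask = secret_mask.unsqueeze(-1).to(embeddings.dtype)  # broadcast over dim
    return embeddings + mask * noise


# Usage: perturb only the two middle tokens of a toy 6-token sequence.
emb = torch.randn(1, 6, 16)
mask = torch.tensor([[False, False, True, True, False, False]])
private_emb = selective_perturb_embeddings(emb, mask, epsilon=2.0)
```

In an actual fine-tuning loop, such a perturbation would sit between the embedding layer and the transformer blocks, so that the rest of the forward and backward pass never operates on the raw secret tokens.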
Pages: 14
Related Papers
50 records in total
  • [21] Leveraging Large Language Models Knowledge Enhancement Dual-Stage Fine-Tuning Framework for Recommendation
    Zeng, Biqing
    Shi, Hao
    Li, Yangyu
    Li, Ruizhe
    Deng, Huimin
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 333 - 345
  • [22] Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review
    Feretzakis, Georgios
    Papaspyridis, Konstantinos
    Gkoulalas-Divanis, Aris
    Verykios, Vassilios S.
    INFORMATION, 2024, 15 (11)
  • [23] InferDPT: Privacy-preserving Inference for Black-box Large Language Models
    Tong, Meng
    Chen, Kejiang
    Zhang, Jie
    Qi, Yuang
    Zhang, Weiming
    Yu, Nenghai
    Zhang, Tianwei
    Zhang, Zhikun
    arXiv, 2023,
  • [24] Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework
    Hao, Sichong
    Shi, Xianjun
    Liu, Hongwei
    Shu, Yanjun
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION, ICSME, 2023, : 136 - 146
  • [25] Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
    Zhang, Hengyuan
    Wu, Yanru
    Li, Dawei
    Yang, Sak
    Zhao, Rui
    Jiang, Yong
    Tan, Fei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 7467 - 7509
  • [26] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
    Zong, Yongshuo
    Bohdal, Ondrej
    Yu, Tingyang
    Yang, Yongxin
    Hospedales, Timothy
arXiv, 2024,
  • [27] Prompting or Fine-tuning? A Comparative Study of Large Language Models for Taxonomy Construction
    Chen, Boqi
    Yi, Fandi
    Varro, Daniel
    2023 ACM/IEEE INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS COMPANION, MODELS-C, 2023, : 588 - 596
  • [28] Enhanced Discriminative Fine-Tuning of Large Language Models for Chinese Text Classification
    Song, Jinwang
    Zan, Hongying
    Zhang, Kunli
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 168 - 174
  • [29] CSAFT: Continuous Semantic Augmentation Fine-Tuning for Legal Large Language Models
    Li, Bo
    Fan, Shuang
    Huang, Jin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT V, 2024, 15020 : 293 - 307
  • [30] Personalized Large Language Models through Parameter Efficient Fine-Tuning Techniques
    Braga, Marco
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 3076 - 3076