Selective privacy-preserving framework for large language models fine-tuning

Times Cited: 0
Authors
Wang, Teng [1 ]
Zhai, Lindong [1 ]
Yang, Tengfei [1 ]
Luo, Zhucheng [2 ]
Liu, Shuanggen [1 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Cyberspace Secur, Xian 710121, Shaanxi, Peoples R China
[2] Sun Yat Sen Univ, Affiliated Hosp 3, Informat Ctr, Guangzhou 510630, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Large language models; Fine-tuning; Local differential privacy; Selective privacy protection; DIFFERENTIAL PRIVACY;
DOI
10.1016/j.ins.2024.121000
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Fine-tuning pre-trained large language models (LLMs) benefits various downstream tasks, but relying on large amounts of training data introduces serious privacy leakage risks. Differentially private stochastic gradient descent (DPSGD) introduces noise during model updates to prevent such leaks. Nevertheless, fine-tuning LLMs via DPSGD limits model utility, since heavy perturbations are applied to large, high-dimensional gradients. Moreover, existing privacy-preserving mechanisms directly perturb all tokens of the input sentences, which is too pessimistic to achieve good model performance. Therefore, this paper studies a selective privacy-preserving framework for fine-tuning LLMs. We propose a first-of-its-kind privacy notion called selective sequence local differential privacy (S-SeqLDP), which provides indistinguishability guarantees only for the secret part of each sequence. Furthermore, we design a novel framework called SLDP-FT that enables S-SeqLDP-compliant LLM fine-tuning by perturbing the forward-pass embeddings with selective noise. We further investigate the privacy forward weight, which determines the noise magnitude required to achieve selective privacy protection. Extensive experiments on three tasks demonstrate that SLDP-FT achieves better model accuracy than state-of-the-art techniques under the same level of privacy protection.
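The abstract does not give the exact noise calibration, but the core idea of SLDP-FT, perturbing forward-pass embeddings only at secret token positions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name selective_perturb_embeddings, the choice of the Laplace mechanism, and the forward_weight / epsilon noise scale are placeholders standing in for the paper's privacy-forward-weight calibration.

```python
import torch


def selective_perturb_embeddings(
    embeddings: torch.Tensor,     # (batch, seq_len, dim) forward-pass token embeddings
    secret_mask: torch.Tensor,    # (batch, seq_len) bool, True where a token is secret
    epsilon: float,               # local differential privacy budget
    forward_weight: float = 1.0,  # stand-in for the paper's "privacy forward weight"
) -> torch.Tensor:
    """Add noise only to the embeddings of tokens marked as secret.

    Non-secret tokens pass through unchanged, which is the intuition behind
    S-SeqLDP: indistinguishability is guaranteed only for the secret part
    of the sequence.
    """
    # Illustrative Laplace scale; the paper's actual calibration depends on
    # the privacy forward weight and the sensitivity of the embedding layer.
    scale = forward_weight / epsilon
    noise = torch.distributions.Laplace(0.0, scale).sample(embeddings.shape)
    mask = secret_mask.unsqueeze(-1).to(embeddings.dtype)  # broadcast over dim
    return embeddings + mask * noise


# Usage: perturb only the two middle tokens of a toy 6-token sequence.
emb = torch.randn(1, 6, 16)
mask = torch.tensor([[False, False, True, True, False, False]])
private_emb = selective_perturb_embeddings(emb, mask, epsilon=2.0)
```

In an actual fine-tuning loop, such a perturbation would sit between the embedding layer and the transformer blocks, so that the rest of the forward and backward pass never operates on the raw secret tokens.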
Pages: 14
Related Papers
50 records in total
  • [21] Leveraging Large Language Models Knowledge Enhancement Dual-Stage Fine-Tuning Framework for Recommendation
    Zeng, Biqing
    Shi, Hao
    Li, Yangyu
    Li, Ruizhe
    Deng, Huimin
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 333 - 345
  • [22] Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review
    Feretzakis, Georgios
    Papaspyridis, Konstantinos
    Gkoulalas-Divanis, Aris
    Verykios, Vassilios S.
    INFORMATION, 2024, 15 (11)
  • [23] InferDPT: Privacy-preserving Inference for Black-box Large Language Models
    Tong, Meng
    Chen, Kejiang
    Zhang, Jie
    Qi, Yuang
    Zhang, Weiming
    Yu, Nenghai
    Zhang, Tianwei
    Zhang, Zhikun
    arXiv, 2023,
  • [24] Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework
    Hao, Sichong
    Shi, Xianjun
    Liu, Hongwei
    Shu, Yanjun
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION, ICSME, 2023, : 136 - 146
  • [25] Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
    Zhang, Hengyuan
    Wu, Yanru
    Li, Dawei
    Yang, Sak
    Zhao, Rui
    Jiang, Yong
    Tan, Fei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 7467 - 7509
  • [26] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
    Zong, Yongshuo
    Bohdal, Ondrej
    Yu, Tingyang
    Yang, Yongxin
    Hospedales, Timothy
arXiv, 2024,
  • [27] Prompting or Fine-tuning? A Comparative Study of Large Language Models for Taxonomy Construction
    Chen, Boqi
    Yi, Fandi
    Varro, Daniel
    2023 ACM/IEEE INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS COMPANION, MODELS-C, 2023, : 588 - 596
  • [28] Enhanced Discriminative Fine-Tuning of Large Language Models for Chinese Text Classification
    Song, Jinwang
    Zan, Hongying
    Zhang, Kunli
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 168 - 174
  • [29] CSAFT: Continuous Semantic Augmentation Fine-Tuning for Legal Large Language Models
    Li, Bo
    Fan, Shuang
    Huang, Jin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT V, 2024, 15020 : 293 - 307
  • [30] Personalized Large Language Models through Parameter Efficient Fine-Tuning Techniques
    Braga, Marco
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 3076 - 3076