Selective privacy-preserving framework for large language models fine-tuning

Cited by: 0
Authors
Wang, Teng [1]
Zhai, Lindong [1]
Yang, Tengfei [1]
Luo, Zhucheng [2]
Liu, Shuanggen [1]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Cyberspace Secur, Xian 710121, Shaanxi, Peoples R China
[2] Sun Yat Sen Univ, Affiliated Hosp 3, Informat Ctr, Guangzhou 510630, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Large language models; Fine-tuning; Local differential privacy; Selective privacy protection; DIFFERENTIAL PRIVACY;
DOI
10.1016/j.ins.2024.121000
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Fine-tuning pre-trained large language models (LLMs) benefits various downstream tasks, but it raises serious privacy risks because it relies on large amounts of training data. Differentially private stochastic gradient descent (DPSGD) introduces noise during model updates to prevent privacy leaks. Nevertheless, fine-tuning LLMs via DPSGD limits model utility, since heavy perturbations are applied to large, high-dimensional gradients. Moreover, existing privacy-preserving mechanisms directly perturb all tokens of the input sentences, which is too pessimistic to achieve good model performance. Therefore, this paper studies a selective privacy-preserving framework for fine-tuning LLMs. We propose a first-of-its-kind privacy notion called selective sequence local differential privacy (S-SeqLDP), which provides indistinguishability guarantees only for the secret part of the sequences. Furthermore, we design a novel framework called SLDP-FT that enables S-SeqLDP-compliant fine-tuning of large language models by perturbing the forward-pass embeddings with selective noise. We also investigate the privacy forward weight, which determines the noise magnitude required to achieve selective privacy protection. Extensive experiments on three tasks demonstrate that SLDP-FT achieves better model accuracy than state-of-the-art techniques under the same level of privacy protection.
Pages: 14
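
As a rough illustration of the selective perturbation described in the abstract, the sketch below adds local-DP noise only to the forward-pass embeddings of tokens marked as secret, leaving non-secret tokens unperturbed. This is a minimal sketch under stated assumptions, not the authors' SLDP-FT implementation: the function name, the Laplace mechanism, the per-token secret mask, and the epsilon/sensitivity calibration are illustrative choices rather than the paper's exact formulation.

# Minimal sketch (assumed, not the paper's code): selectively perturb
# forward-pass embeddings so that only secret tokens receive noise.
import torch

def selectively_perturb_embeddings(
    embeddings: torch.Tensor,   # (batch, seq_len, dim) forward-pass embeddings
    secret_mask: torch.Tensor,  # (batch, seq_len) bool, True for secret tokens
    epsilon: float,             # assumed per-token privacy budget
    sensitivity: float = 1.0,   # assumed L1 sensitivity after clipping/normalization
) -> torch.Tensor:
    """Add Laplace noise only to the embeddings of secret tokens."""
    scale = sensitivity / epsilon
    # Sample noise for every position, then zero it out for non-secret tokens.
    noise = torch.distributions.Laplace(0.0, scale).sample(embeddings.shape)
    noise = noise.to(embeddings.device, embeddings.dtype)
    mask = secret_mask.unsqueeze(-1).to(embeddings.dtype)  # broadcast over dim
    return embeddings + mask * noise

# Toy usage: only the second token of each sequence is treated as secret.
emb = torch.randn(2, 4, 8)
mask = torch.zeros(2, 4, dtype=torch.bool)
mask[:, 1] = True
private_emb = selectively_perturb_embeddings(emb, mask, epsilon=2.0)

Because the noise is gated by the secret mask, non-secret tokens keep their clean representations, which is the intuition behind why a selective notion such as S-SeqLDP can preserve more utility than perturbing every token.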