Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

被引：0

作者：

Chen, Zixiang ^{[1
]}

Deng, Yihe ^{[1
]}

Yuan, Huizhuo ^{[1
]}

Ji, Kaixuan ^{[1
]}

Gu, Quanquan ^{[1
]}

机构：

[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

来源：

INTERNATIONAL CONFERENCE ON MACHINE LEARNING | 2024年 / 235卷

关键词：

GAME;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents. Codes are available at https://github.com/uclaml/SPIN.

引用

页数：22

共 50 条

[1] Phased Instruction Fine-Tuning for Large Language Models
Pang, Wei
Zhou, Chuan
Zhou, Xiao-Hua
Wang, Xiaojie
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 5735 - 5748
[2] CONVFIT: Conversational Fine-Tuning of Pretrained Language Models
Vulic, Ivan
Su, Pei-Hao
Coope, Sam
Gerz, Daniela
Budzianowski, Pawel
Casanueva, Inigo
Mrksic, Nikola
Wen, Tsung-Hsien
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 1151 - 1168
[3] Improve Performance of Fine-tuning Language Models with Prompting
Yang, Zijian Gyozo
Ligeti-Nagy, Noenn
INFOCOMMUNICATIONS JOURNAL, 2023, 15 : 62 - 68
[4] HackMentor: Fine-Tuning Large Language Models for Cybersecurity
Zhang, Jie
Wen, Hui
Deng, Liting
Xin, Mingfeng
Li, Zhi
Li, Lun
Zhu, Hongsong
Sun, Limin
2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024, : 452 - 461
[5] Fine-tuning language models to recognize semantic relations
Roussinov, Dmitri
Sharoff, Serge
Puchnina, Nadezhda
LANGUAGE RESOURCES AND EVALUATION, 2023, 57 (04) : 1463 - 1486
[6] Fine-tuning language models to recognize semantic relations
Dmitri Roussinov
Serge Sharoff
Nadezhda Puchnina
Language Resources and Evaluation, 2023, 57 : 1463 - 1486
[7] Fine-Tuning Language Models with Just Forward Passes
Malladi, Sadhika
Gao, Tianyu
Nichani, Eshaan
Damian, Alex
Lee, Jason D.
Chen, Danqi
Arora, Sanjeev
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[8] Fine-tuning large neural language models for biomedical natural language processing
Tinn, Robert
Cheng, Hao
Gu, Yu
Usuyama, Naoto
Liu, Xiaodong
Naumann, Tristan
Gao, Jianfeng
Poon, Hoifung
PATTERNS, 2023, 4 (04):
[9] How fine can fine-tuning be? Learning efficient language models
Radiya-Dixit, Evani
Wang, Xin
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 2435 - 2442
[10] Demystifying Instruction Mixing for Fine-tuning Large Language Models
Wang, Renxi
Li, Haonan
Wu, Minghao
Wang, Yuxia
Han, Xudong
Zhang, Chiyu
Baldwin, Timothy
PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 4: STUDENT RESEARCH WORKSHOP, 2024, : 86 - 93

← 1 2 3 4 5 →