Small Language Models Improve Giants by Rewriting Their Outputs

Cited by: 0
Authors
Vernikos, Giorgos [1 ,2 ,4 ]
Brazinskas, Arthur [3 ]
Adamek, Jakub [3 ]
Mallinson, Jonathan [3 ]
Severyn, Aliaksei [3 ]
Malmi, Eric [3 ]
Affiliations
[1] École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
[2] HEIG-VD (HES-SO), Yverdon, Switzerland
[3] Google Research, Mountain View, CA, USA
[4] Google, Mountain View, CA, USA
Funding
Swiss National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite the impressive performance of large language models (LLMs), they often lag behind specialized models in various tasks. LLMs only use a fraction of the existing training data for in-context learning, while task-specific models harness the full dataset for fine-tuning. In this work, we tackle the problem of leveraging training data to improve the performance of LLMs without fine-tuning. Our approach directly targets LLM predictions without requiring access to their weights. We create a pool of candidates from the LLM through few-shot prompting and we employ a compact model, the LM-corrector (LMCOR), specifically trained to merge these candidates to produce an enhanced output. Our experiments on four natural language generation tasks demonstrate that even a small LMCOR model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning. Furthermore, we illustrate the robustness of LMCOR against different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we show that LMCOR can be seamlessly integrated with different LLMs at inference, serving as a plug-and-play module to improve their performance.
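The pipeline the abstract describes has two stages: sample a pool of candidate outputs from a frozen LLM via few-shot prompting, then let a small trained corrector merge them into a single improved output. The sketch below illustrates that shape in Python with Hugging Face transformers; it is a minimal sketch under stated assumptions, not the authors' code. The paper's trained 250M LM-corrector is not part of this record, so a stock T5 checkpoint stands in, and the `rewrite` helper and `[CANDIDATE i]` serialization are illustrative assumptions.

```python
# Minimal sketch of the LMCOR idea: merge LLM candidates with a small
# seq2seq corrector. Assumptions (not from the paper's released code):
# the "t5-base" stand-in checkpoint, the [CANDIDATE i] input format,
# and the rewrite() helper name.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

CORRECTOR_NAME = "t5-base"  # stand-in for the trained 250M LM-corrector


def rewrite(source: str, candidates: list[str], tokenizer, corrector) -> str:
    """Merge LLM candidates into one improved output with the corrector."""
    # Serialize the source followed by the candidate pool; the real LMCOR
    # is fine-tuned to map this kind of input to a corrected output.
    parts = [source] + [f"[CANDIDATE {i}] {c}" for i, c in enumerate(candidates, 1)]
    inputs = tokenizer(" ".join(parts), return_tensors="pt", truncation=True)
    output_ids = corrector.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained(CORRECTOR_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(CORRECTOR_NAME)
    # In the paper the candidates come from few-shot prompting a 62B LLM;
    # they are hard-coded here to keep the sketch self-contained.
    source = "translate English to German: The cat sat on the mat."
    candidates = [
        "Die Katze sass auf der Matte.",
        "Die Katze saß auf die Matte.",
    ]
    print(rewrite(source, candidates, tok, model))
```

Because the corrector only reads the LLM's text outputs, this design needs no access to the LLM's weights, which is what lets it act as a plug-and-play module across different LLMs at inference.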
Pages: 2703-2718
Page count: 16
Related Papers
50 items in total
  • [21] A Simple Method to Improve the Performance of Small Pre-trained Language Models on Few-shot Tasks
    Zhang, Yanan
    Wu, Chaofan
    Shi, Rongkun
    Zhang, Yiying
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024: 1572-1577
  • [22] Teaching Language Models to Self-Improve by Learning from Language Feedback
    Hu, Chi
    Hu, Yimin
    Cao, Hang
    Xiao, Tong
    Zhu, Jingbo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 6090-6101
  • [23] Modeling Overregularization in Children with Small Language Models
    Haga, Akari
    Sugawara, Saku
    Fukatsu, Akiyo
    Oba, Miyu
    Ouchi, Hiroki
    Watanabe, Taro
    Oseki, Yohei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 14532-14550
  • [24] Curriculum Learning for Small Code Language Models
    Nair, Marwa
    Yamani, Kamel
    Lhadji, Lynda Said
    Baghdadi, Riyadh
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 4: STUDENT RESEARCH WORKSHOP, 2024: 408-419
  • [25] FinGPT: Large Generative Models for a Small Language
    Luukkonen, Risto
    Komulainen, Ville
    Luoma, Jouni
    Eskelinen, Anni
    Kanerva, Jenna
    Kupari, Hanna-Mari
    Ginter, Filip
    Laippala, Veronika
    Muennighoff, Niklas
    Piktus, Aleksandra
    Wang, Thomas
    Tazi, Nouamane
    Le Scao, Teven
    Wolf, Thomas
    Suominen, Osma
    Sairanen, Samuli
    Merioksa, Mikko
    Heinonen, Jyrki
    Vahtola, Aija
    Antao, Samuel
    Pyysalo, Sampo
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 2710-2726
  • [26] SMALL GIANTS: GREATER OMAHA
    Sorvino, Chloe
    FORBES, 2017, 200 (04): 94-98
  • [27] THE LANGUAGE OF GIANTS SWORD HILT IN 'BEOWULF'
    SCHRADER, RJ
    NEUPHILOLOGISCHE MITTEILUNGEN, 1993, 94 (02): 141-147
  • [28] CLEAN - A LANGUAGE FOR FUNCTIONAL GRAPH REWRITING
    BRUS, TH
    VANEEKELEN, MCJD
    VANLEER, MO
    PLASMEIJER, MJ
    LECTURE NOTES IN COMPUTER SCIENCE, 1987, 274: 364-384
  • [29] Compiling CIL rewriting language for multiprocessors
    Tian, Xinmin
    Wang, Dingxing
    Zheng, Weimin
    Shen, Meiming
    Li, Cheng
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 1994, 9 (04): 302-310
  • [30] DACTL - AN EXPERIMENTAL GRAPH REWRITING LANGUAGE
    GLAUERT, JRW
    KENNAWAY, JR
    SLEEP, MR
    LECTURE NOTES IN COMPUTER SCIENCE, 1991, 532: 378-395