Small Language Models Improve Giants by Rewriting Their Outputs

Cited by: 0
Authors
Vernikos, Giorgos [1 ,2 ,4 ]
Brazinskas, Arthur [3 ]
Adamek, Jakub [3 ]
Mallinson, Jonathan [3 ]
Severyn, Aliaksei [3 ]
Malmi, Eric [3 ]
Affiliations
[1] École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
[2] HEIG-VD (HES-SO), Yverdon, Switzerland
[3] Google Research, Mountain View, CA, USA
[4] Google, Mountain View, CA, USA
Funding
Swiss National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite the impressive performance of large language models (LLMs), they often lag behind specialized models in various tasks. LLMs only use a fraction of the existing training data for in-context learning, while task-specific models harness the full dataset for fine-tuning. In this work, we tackle the problem of leveraging training data to improve the performance of LLMs without fine-tuning. Our approach directly targets LLM predictions without requiring access to their weights. We create a pool of candidates from the LLM through few-shot prompting and we employ a compact model, the LM-corrector (LMCOR), specifically trained to merge these candidates to produce an enhanced output. Our experiments on four natural language generation tasks demonstrate that even a small LMCOR model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning. Furthermore, we illustrate the robustness of LMCOR against different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we show that LMCOR can be seamlessly integrated with different LLMs at inference, serving as a plug-and-play module to improve their performance.
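The pipeline the abstract describes has two stages: sample a pool of candidate outputs from a frozen LLM via few-shot prompting, then let a small trained corrector merge them into a single improved output. The sketch below illustrates that shape in Python with Hugging Face transformers; it is a minimal sketch under stated assumptions, not the authors' code. The paper's trained 250M LM-corrector is not part of this record, so a stock T5 checkpoint stands in, and the `rewrite` helper and `[CANDIDATE i]` serialization are illustrative assumptions.

```python
# Minimal sketch of the LMCOR idea: merge LLM candidates with a small
# seq2seq corrector. Assumptions (not from the paper's released code):
# the "t5-base" stand-in checkpoint, the [CANDIDATE i] input format,
# and the rewrite() helper name.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

CORRECTOR_NAME = "t5-base"  # stand-in for the trained 250M LM-corrector


def rewrite(source: str, candidates: list[str], tokenizer, corrector) -> str:
    """Merge LLM candidates into one improved output with the corrector."""
    # Serialize the source followed by the candidate pool; the real LMCOR
    # is fine-tuned to map this kind of input to a corrected output.
    parts = [source] + [f"[CANDIDATE {i}] {c}" for i, c in enumerate(candidates, 1)]
    inputs = tokenizer(" ".join(parts), return_tensors="pt", truncation=True)
    output_ids = corrector.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained(CORRECTOR_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(CORRECTOR_NAME)
    # In the paper the candidates come from few-shot prompting a 62B LLM;
    # they are hard-coded here to keep the sketch self-contained.
    source = "translate English to German: The cat sat on the mat."
    candidates = [
        "Die Katze sass auf der Matte.",
        "Die Katze saß auf die Matte.",
    ]
    print(rewrite(source, candidates, tok, model))
```

Because the corrector only reads the LLM's text outputs, this design needs no access to the LLM's weights, which is what lets it act as a plug-and-play module across different LLMs at inference.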
Pages: 2703-2718
Page count: 16
Related Papers
50 items in total
  • [21] A Simple Method to Improve the Performance of Small Pre-trained Language Models on Few-shot Tasks
    Zhang, Yanan
    Wu, Chaofan
    Shi, Rongkun
    Zhang, Yiying
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024: 1572-1577
  • [22] Teaching Language Models to Self-Improve by Learning from Language Feedback
    Hu, Chi
    Hu, Yimin
    Cao, Hang
    Xiao, Tong
    Zhu, Jingbo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 6090-6101
  • [23] Modeling Overregularization in Children with Small Language Models
    Haga, Akari
    Sugawara, Saku
    Fukatsu, Akiyo
    Oba, Miyu
    Ouchi, Hiroki
    Watanabe, Taro
    Oseki, Yohei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 14532-14550
  • [24] Curriculum Learning for Small Code Language Models
    Nair, Marwa
    Yamani, Kamel
    Lhadji, Lynda Said
    Baghdadi, Riyadh
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 4: STUDENT RESEARCH WORKSHOP, 2024: 408-419
  • [25] FinGPT: Large Generative Models for a Small Language
    Luukkonen, Risto
    Komulainen, Ville
    Luoma, Jouni
    Eskelinen, Anni
    Kanerva, Jenna
    Kupari, Hanna-Mari
    Ginter, Filip
    Laippala, Veronika
    Muennighoff, Niklas
    Piktus, Aleksandra
    Wang, Thomas
    Tazi, Nouamane
    Le Scao, Teven
    Wolf, Thomas
    Suominen, Osma
    Sairanen, Samuli
    Merioksa, Mikko
    Heinonen, Jyrki
    Vahtola, Aija
    Antao, Samuel
    Pyysalo, Sampo
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 2710-2726
  • [26] SMALL GIANTS: GREATER OMAHA
    Sorvino, Chloe
    FORBES, 2017, 200 (04): 94-98
  • [27] THE LANGUAGE OF GIANTS SWORD HILT IN 'BEOWULF'
    SCHRADER, RJ
    NEUPHILOLOGISCHE MITTEILUNGEN, 1993, 94 (02): 141-147
  • [28] CLEAN - A LANGUAGE FOR FUNCTIONAL GRAPH REWRITING
    BRUS, TH
    VANEEKELEN, MCJD
    VANLEER, MO
    PLASMEIJER, MJ
    LECTURE NOTES IN COMPUTER SCIENCE, 1987, 274: 364-384
  • [29] Compiling CIL rewriting language for multiprocessors
    Tian, Xinmin
    Wang, Dingxing
    Zheng, Weimin
    Shen, Meiming
    Li, Cheng
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 1994, 9 (04): 302-310
  • [30] DACTL - AN EXPERIMENTAL GRAPH REWRITING LANGUAGE
    GLAUERT, JRW
    KENNAWAY, JR
    SLEEP, MR
    LECTURE NOTES IN COMPUTER SCIENCE, 1991, 532: 378-395