Small Language Models Improve Giants by Rewriting Their Outputs

Cited: 0
Authors
Vernikos, Giorgos [1 ,2 ,4 ]
Brazinskas, Arthur [3 ]
Adamek, Jakub [3 ]
Mallinson, Jonathan [3 ]
Severyn, Aliaksei [3 ]
Malmi, Eric [3 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[2] HEIG VD HES SO, Yverdon, Switzerland
[3] Google Res, Mountain View, CA USA
[4] Google, Mountain View, CA USA
Funding
Swiss National Science Foundation
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite the impressive performance of large language models (LLMs), they often lag behind specialized models in various tasks. LLMs only use a fraction of the existing training data for in-context learning, while task-specific models harness the full dataset for fine-tuning. In this work, we tackle the problem of leveraging training data to improve the performance of LLMs without fine-tuning. Our approach directly targets LLM predictions without requiring access to their weights. We create a pool of candidates from the LLM through few-shot prompting and we employ a compact model, the LM-corrector (LMCOR), specifically trained to merge these candidates to produce an enhanced output. Our experiments on four natural language generation tasks demonstrate that even a small LMCOR model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning. Furthermore, we illustrate the robustness of LMCOR against different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we show that LMCOR can be seamlessly integrated with different LLMs at inference, serving as a plug-and-play module to improve their performance.
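The pipeline the abstract describes has two stages: a frozen LLM proposes several candidate outputs via few-shot prompting, and a small trained corrector (LMCOR) conditions on the input and all candidates to emit one improved output. The following is a minimal sketch of that control flow only; the function names are placeholders, and the "corrector" here is a trivial stand-in (it just selects the longest candidate), not the authors' trained 250M model.

```python
def llm_generate_candidates(source: str, n: int = 3) -> list[str]:
    """Stand-in for sampling n completions from a frozen LLM
    with a few-shot prompt. A real system would call the LLM API here."""
    return [f"candidate {i} for: {source}" for i in range(n)]


def lm_corrector(source: str, candidates: list[str]) -> str:
    """Stand-in for the trained corrector model. The real LMCOR is a
    seq2seq model trained on (source, candidates, reference) triples to
    merge candidates into a better output; this placeholder merely picks
    the longest candidate."""
    return max(candidates, key=len)


def corrected_output(source: str) -> str:
    """Full pipeline: sample a candidate pool, then correct/merge it."""
    candidates = llm_generate_candidates(source)
    return lm_corrector(source, candidates)
```

Because the corrector only reads the LLM's text outputs, no access to the LLM's weights is needed, which is what lets the same corrector be swapped in behind different LLMs at inference time.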
Pages: 2703-2718
Page count: 16