Distilling mathematical reasoning capabilities into Small Language Models

Cited by: 0
Authors
Zhu, Xunyu [1 ,2 ]
Li, Jian [1 ,2 ]
Liu, Yong [3 ]
Ma, Can [1 ,2 ]
Wang, Weiping [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Large language models; Knowledge Distillation; Mathematical reasoning; Chain-of-Thought; Program-of-Thought;
DOI
10.1016/j.neunet.2024.106594
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion-parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process into equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental results demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
Pages: 10
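
To make the ensemble idea in the abstract concrete, below is a minimal sketch in Python, assuming a SymPy-based solver for Equation-of-Thought rationales and a simple answer-consistency filter. The function and field names (build_etd_record, solve_eot, run_pot, "prompt", "target") are illustrative assumptions, not the authors' released code; the sketch only shows how CoT, PoT, and EoT rationales for one problem might be checked against the gold answer and pooled into a single fine-tuning set.

```python
# Illustrative sketch of assembling an ensemble (CoT + PoT + EoT) distillation
# dataset. Function names, record fields, and the correctness filter are
# assumptions for illustration, not the paper's released implementation.
from sympy import Eq, solve, symbols


def solve_eot(equations, unknown):
    """Solve an Equation-of-Thought: a list of SymPy equations; return the unknown's value."""
    solutions = solve(equations, dict=True)
    return solutions[0][unknown] if solutions else None


def run_pot(program, answer_var="answer"):
    """Execute a Program-of-Thought rationale and read its final answer variable."""
    namespace = {}
    exec(program, namespace)  # teacher-generated Python rationale
    return namespace.get(answer_var)


def build_etd_record(question, gold, cot, pot, eot, eot_unknown):
    """Keep only rationales whose final answer matches the gold label."""
    records = []
    # CoT: free-text rationale; the stated final answer is compared directly.
    if cot["final_answer"] == gold:
        records.append({"prompt": question, "target": cot["text"], "format": "CoT"})
    # PoT: executable code; run it and compare the computed answer.
    if run_pot(pot) == gold:
        records.append({"prompt": question, "target": pot, "format": "PoT"})
    # EoT: a system of equations; solve symbolically and compare.
    if solve_eot(eot, eot_unknown) == gold:
        records.append({"prompt": question, "target": str(eot), "format": "EoT"})
    return records


if __name__ == "__main__":
    x, y = symbols("x y")
    question = "The sum of two numbers is 10 and their difference is 2. Find the larger number."
    dataset = build_etd_record(
        question,
        gold=6,
        cot={"text": "Half of 10 + 2 is 6, so the larger number is 6.", "final_answer": 6},
        pot="total = 10\ndiff = 2\nanswer = (total + diff) // 2\n",
        eot=[Eq(x + y, 10), Eq(x - y, 2)],
        eot_unknown=x,
    )
    for record in dataset:
        print(record["format"], "->", record["target"])
```

The assumed design choice here is to keep a rationale only when its final answer reproduces the gold label, so the pooled dataset mixes all three thought formats per question without propagating teacher mistakes; the paper's actual dataset construction may differ.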
Related papers
50 records in total
  • [31] Small Language Models Need Strong Verifiers to Self-Correct Reasoning
    Zhang, Yunxiang
    Khalifa, Muhammad
    Logeswaran, Lajanugen
    Kim, Jaekyeom
    Lee, Moontae
    Lee, Honglak
    Wang, Lu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15637 - 15653
  • [32] Small language models learn enhanced reasoning skills from medical textbooks
    Hyunjae Kim
    Hyeon Hwang
    Jiwoo Lee
    Sihyeon Park
    Dain Kim
    Taewhoo Lee
    Chanwoong Yoon
    Jiwoong Sohn
    Jungwoo Park
    Olga Reykhart
    Thomas Fetherston
    Donghee Choi
    Soo Heon Kwak
    Qingyu Chen
    Jaewoo Kang
    npj Digital Medicine, 8 (1)
  • [33] Distilling large language models for matching patients to clinical trials
    Nievas, Mauro
    Basu, Aditya
    Wang, Yanshan
    Singh, Hrituraj
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09) : 1953 - 1963
  • [34] Introspective Capabilities in Large Language Models
    Long, Robert
    JOURNAL OF CONSCIOUSNESS STUDIES, 2023, 30 (9-10) : 143 - 153
  • [35] Select, Prompt, Filter: Distilling Large Language Models for Summarizing Conversations
    Pham, Minh-Quang
    Indurthi, Sathish Reddy
    Chollampatt, Shamil
    Turchi, Marco
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 12257 - 12265
  • [36] Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models
    Zhang, Jiang
    Wu, Qiong
    Xu, Yiming
    Cao, Cheng
    Du, Zheng
    Psounis, Konstantinos
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024, : 21779 - 21787
  • [37] Large Language Models Are Reasoning Teachers
    Ho, Namgyu
    Schmid, Laura
    Yun, Se-Young
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14852 - 14882
  • [38] Mathematical models and language shift
    Kabatek, Johannes
    ESTUDOS DE LINGUISTICA GALEGA, 2012, 4 : 27 - 43
  • [39] Distilling Wisdom: A Review on Optimizing Learning From Massive Language Models
    Zhang, Dingzong
    Listiyani, Devi
    Singh, Priyanka
    Mohanty, Manoranjan
    IEEE ACCESS, 2025, 13 : 56296 - 56325
  • [40] The importance of specific mathematical language for early proportional reasoning
    Vanluydt, Elien
    Supply, Anne-Sophie
    Verschaffel, Lieven
    Van, Wim
    EARLY CHILDHOOD RESEARCH QUARTERLY, 2021, 55 : 193 - 200