Distilling mathematical reasoning capabilities into Small Language Models

Cited by: 0
Authors
Zhu, Xunyu [1 ,2 ]
Li, Jian [1 ,2 ]
Liu, Yong [3 ]
Ma, Can [1 ,2 ]
Wang, Weiping [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Large language models; Knowledge distillation; Mathematical reasoning; Chain-of-Thought; Program-of-Thought;
DOI
10.1016/j.neunet.2024.106594
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion-parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process in equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental results demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
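To make the ETD framework described in the abstract concrete, the sketch below assembles a small mixed-format distillation dataset from a single math word problem, assuming a simple tagged-prompt convention. The question, the teacher rationales, the prompt template, and the file name are all hypothetical illustrations; the paper's actual data formats may differ.

```python
# A minimal sketch of Ensemble Thoughts Distillation (ETD) data construction.
# The format tags and prompt layout below are illustrative assumptions,
# not the paper's actual implementation.

import json

# One GSM8K-style word problem with a hypothetical teacher rationale in each
# of the three thought formats named in the abstract.
question = ("A pen costs $2 and a notebook costs $3. "
            "How much do 4 pens and 2 notebooks cost?")

rationales = {
    # Chain-of-Thought: free-form natural-language reasoning.
    "CoT": ("4 pens cost 4 * 2 = 8 dollars. 2 notebooks cost 2 * 3 = 6 "
            "dollars. In total that is 8 + 6 = 14 dollars. The answer is 14."),
    # Program-of-Thought: executable code whose printed output is the answer.
    "PoT": "pens = 4 * 2\nnotebooks = 2 * 3\nans = pens + notebooks\nprint(ans)",
    # Equation-of-Thought: the reasoning reduced to a system of equations.
    "EoT": "x = 4 * 2\ny = 2 * 3\nans = x + y",
}

def build_etd_records(question: str, thoughts: dict) -> list:
    """Emit one fine-tuning record per thought format for a single question.

    Tagging the prompt with the format lets one SLM learn to produce the
    requested reasoning style on demand (an assumption about the setup).
    """
    return [
        {"prompt": f"[{fmt}] Question: {question}\nAnswer:", "completion": body}
        for fmt, body in thoughts.items()
    ]

# Write the mixed-format records as JSONL, ready for supervised fine-tuning.
with open("etd_train.jsonl", "w") as f:
    for record in build_etd_records(question, rationales):
        f.write(json.dumps(record) + "\n")
```

Because PoT and EoT rationales are executable, a natural extension is to run each one and keep only records whose result matches the gold answer before fine-tuning; that verification pass is omitted from this sketch.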
Pages: 10
Related papers
50 records in total
  • [41] MATHEMATICAL LINGUISTICS, LOGIC, AND DEVELOPMENT OF LANGUAGE AND REASONING IN CHILD
    Epstein, G.
    Shapiro, S. C.
    ANNALS OF THE NEW YORK ACADEMY OF SCIENCES, 1976, 280 (OCT28) : 120 - 126
  • [42] Distilling Relation Embeddings from Pre-trained Language Models
    Ushio, Asahi
    Camacho-Collados, Jose
    Schockaert, Steven
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9044 - 9062
  • [44] Qu-Prolog: An implementation language for agents with advanced reasoning capabilities
    Robinson, P. J.
    Hinchey, M.
    Clark, K.
    FORMAL APPROACHES TO AGENT-BASED SYSTEMS, 2003, 2699 : 162 - 172
  • [45] Capabilities and Limitations of Mathematical Models in Ecological Safety Forecasting
    Travin, S. O.
    Skurlatov, Yu. I.
    Roshchin, A. V.
    RUSSIAN JOURNAL OF PHYSICAL CHEMISTRY B, 2020, 14 (01) : 86 - 99
  • [46] Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks
    Kang, Minki
    Lee, Seanie
    Baek, Jinheon
    Kawaguchi, Kenji
    Hwang, Sung Ju
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [47] Towards Reasoning in Large Language Models: A Survey
    Huang, Jie
    Chang, Kevin Chen-Chuan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1049 - 1065
  • [48] Psycholinguistic Diagnosis of Language Models' Commonsense Reasoning
    Cong, Yan
    PROCEEDINGS OF THE FIRST WORKSHOP ON COMMONSENSE REPRESENTATION AND REASONING (CSRR 2022), 2022, : 17 - 22
  • [49] Large Language Models are Visual Reasoning Coordinators
    Chen, Liangyu
    Li, Bo
    Shen, Sheng
    Yang, Jingkang
    Li, Chunyuan
    Keutzer, Kurt
    Darrell, Trevor
    Liu, Ziwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [50] ALERT: Adapting Language Models to Reasoning Tasks
    Yu, Ping
    Wang, Tianlu
    Golovneva, Olga
    AlKhamissi, Badr
    Verma, Siddharth
    Jin, Zhijing
    Ghosh, Gargi
    Diab, Mona
    Celikyilmaz, Asli
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 1055 - 1081