DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models

Cited by: 0
Authors
Han, Chengcheng [1 ,2 ]
Du, Xiaowei [2 ]
Zhang, Che [3 ]
Lian, Yixin [2 ]
Li, Xiang [1 ]
Gao, Ming [1 ,4 ]
Wang, Baoyuan [2 ]
Affiliations
[1] East China Normal Univ, Sch Data Sci & Engn, Shanghai, Peoples R China
[2] Xiaobing AI, Boston, MA 02199 USA
[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[4] East China Normal Univ, KLATASDS MOE Sch Stat, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Chain-of-Thought (CoT) prompting has proven effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters, but it is ineffective or even detrimental when applied to reasoning tasks in Smaller Language Models (SLMs) with fewer than 10 billion parameters. To address this limitation, we introduce Dialogue-guided Chain-of-Thought (DialCoT), which employs a dialogue format to generate intermediate reasoning steps that guide the model toward the final answer. In addition, we optimize the model's selection of reasoning paths with the Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning capabilities. Our method offers two main advantages over previous approaches. First, it transforms the solving of a complex reasoning question into the decomposition of that question into a series of simpler sub-questions, significantly reducing task difficulty and making the process more suitable for SLMs. Second, it optimizes the model's reasoning path selection through PPO. Comprehensive experiments on four arithmetic reasoning datasets demonstrate that our method achieves significant performance gains over state-of-the-art competitors.
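The abstract describes two mechanisms concretely enough to sketch. The following minimal Python sketch is illustrative only, not the authors' implementation: `build_dialcot_prompt` (a hypothetical name) renders the dialogue-format decomposition of a question into sub-questions, and `ppo_clipped_objective` computes the standard PPO clipped surrogate that a method like this would use to reinforce reasoning-path choices. The advantage signal (e.g., +1 for a path that reaches the correct answer) and the text-in/text-out model interface are assumptions.

```python
import math
from typing import List


def build_dialcot_prompt(question: str, sub_questions: List[str],
                         sub_answers: List[str]) -> str:
    """Render the dialogue-format decomposition: the complex question is
    broken into simpler sub-questions, each answered in turn before the
    final answer is requested. (Illustrative format, not the paper's exact template.)"""
    turns = [f"Question: {question}"]
    for sq, sa in zip(sub_questions, sub_answers):
        turns.append(f"Sub-question: {sq}")
        turns.append(f"Sub-answer: {sa}")
    turns.append("Final answer:")
    return "\n".join(turns)


def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sampled reasoning path:
    L = min(r * A, clip(r, 1 - eps, 1 + eps) * A), with
    r = exp(logp_new - logp_old)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    print(build_dialcot_prompt(
        "If 3 pens cost $6, how much do 7 pens cost?",
        ["What does one pen cost?", "How much do 7 pens cost at that price?"],
        ["$2", "$14"],
    ))
    # A correct path (positive advantage) whose probability rose under the
    # new policy receives a positive, clipped surrogate score.
    print(ppo_clipped_objective(logp_new=-1.0, logp_old=-1.2, advantage=1.0))
```

The clipping keeps the policy from over-committing to any single sampled path in one update, which matters when the reward (answer correctness) is sparse.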
Pages: 8055-8068
Page count: 14