Over-Reasoning and Redundant Calculation of Large Language Models

Cited by: 0

Authors:

Chiang, Cheng-Han [1]

Lee, Hung-yi [1]

Affiliations:

[1] National Taiwan University, Taipei, Taiwan

Source:

PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS | 2024

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Large language models (LLMs) can solve problems step by step. While this chain-of-thought (CoT) reasoning boosts LLMs' performance, it is unclear whether LLMs know when to use CoT and whether CoT is always necessary to answer the question. This paper shows that LLMs tend to generate redundant calculations and reasoning on a manually constructed math QA dataset, GSM8K-Zero. GSM8K-Zero is constructed such that the questions can be answered without any calculations, but LLMs, including Llama-2 models and Claude-2, tend to generate lengthy and unnecessary calculations to answer them. We also conduct experiments to explain why LLMs generate redundant calculations and reasoning. GSM8K-Zero is publicly available at https://github.com/d223302/Over-Reasoning-of-LLMs and https://huggingface.co/datasets/dcml0714/GSM8K-Zero.

Pages: 161-169

Page count: 9
