Large Language Models lack essential metacognition for reliable medical reasoning

Cited by: 1
Authors
Griot, Maxime [1 ,2 ]
Hemptinne, Coralie [1 ,3 ]
Vanderdonckt, Jean [2 ]
Yuksel, Demet [1 ,4 ]
Affiliations
[1] Catholic Univ Louvain, Inst Neurosci, Brussels, Belgium
[2] Catholic Univ Louvain, Louvain Res Inst Management & Org, Louvain La Neuve, Belgium
[3] Clin Univ St Luc, Ophthalmol, Brussels, Belgium
[4] Clin Univ St Luc, Med Informat Dept, Brussels, Belgium
Keywords
REFLECTIVE PRACTICE; STRATEGIES
DOI
10.1038/s41467-024-55628-6
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their knowledge limitations and provided confident answers even when correct options were absent. In this work, we show that current models exhibit a critical disconnect between perceived and actual capabilities in medical reasoning, posing significant risks in clinical settings. Our findings emphasize the need for more robust evaluation frameworks that incorporate metacognitive abilities, essential for developing reliable Large Language Model enhanced clinical decision support systems.
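The evaluation dimensions named in the abstract (confidence-based accuracy, missing answer recall, unknown recall) can be made concrete with a small scoring sketch. The snippet below is illustrative only: it is not the authors' code, and the field names, the 1-5 confidence scale, and the "ABSTAIN" convention are assumptions rather than the paper's exact metric definitions.

# Illustrative scoring sketch for a MetaMedQA-style evaluation (not the authors' code).
# Assumed conventions: the model returns an option letter or the string "ABSTAIN",
# plus a self-reported confidence from 1 (low) to 5 (high).
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    predicted: str        # option chosen by the model, or "ABSTAIN"
    correct: str          # ground-truth option letter, "" if no correct option exists
    confidence: int       # model-reported confidence, assumed 1-5
    answer_removed: bool  # True if the correct option was deliberately left out
    unknowable: bool      # True if the question has no known correct answer

def _ratio(hits: int, total: int) -> float:
    return hits / total if total else 0.0

def accuracy(items: List[Item]) -> float:
    # Plain multiple-choice accuracy on answerable questions.
    scored = [i for i in items if not (i.answer_removed or i.unknowable)]
    return _ratio(sum(i.predicted == i.correct for i in scored), len(scored))

def high_confidence_accuracy(items: List[Item], threshold: int = 4) -> float:
    # Accuracy restricted to answers the model itself rated as high confidence;
    # a large gap versus overall accuracy signals poor confidence calibration.
    confident = [i for i in items
                 if i.confidence >= threshold and not (i.answer_removed or i.unknowable)]
    return _ratio(sum(i.predicted == i.correct for i in confident), len(confident))

def missing_answer_recall(items: List[Item]) -> float:
    # Fraction of questions whose correct option was removed where the model
    # abstained instead of confidently picking a wrong remaining option.
    removed = [i for i in items if i.answer_removed]
    return _ratio(sum(i.predicted == "ABSTAIN" for i in removed), len(removed))

def unknown_recall(items: List[Item]) -> float:
    # Fraction of unanswerable questions where the model admitted not knowing.
    unknown = [i for i in items if i.unknowable]
    return _ratio(sum(i.predicted == "ABSTAIN" for i in unknown), len(unknown))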
Pages: 10