Large Language Models lack essential metacognition for reliable medical reasoning

Cited by: 1
Authors
Griot, Maxime [1 ,2 ]
Hemptinne, Coralie [1 ,3 ]
Vanderdonckt, Jean [2 ]
Yuksel, Demet [1 ,4 ]
Affiliations
[1] Catholic Univ Louvain, Inst Neurosci, Brussels, Belgium
[2] Catholic Univ Louvain, Louvain Res Inst Management & Org, Louvain La Neuve, Belgium
[3] Clin Univ St Luc, Ophthalmol, Brussels, Belgium
[4] Clin Univ St Luc, Med Informat Dept, Brussels, Belgium
Keywords
REFLECTIVE PRACTICE; STRATEGIES
DOI
10.1038/s41467-024-55628-6
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biosciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract
Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing-answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their knowledge limitations and provided confident answers even when correct options were absent. In this work, we show that current models exhibit a critical disconnect between perceived and actual capabilities in medical reasoning, posing significant risks in clinical settings. Our findings emphasize the need for more robust evaluation frameworks that incorporate metacognitive abilities, essential for developing reliable Large Language Model-enhanced clinical decision support systems.
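The scoring logic implied by the abstract can be made concrete with a small sketch. The Python below is a minimal illustration, not the paper's actual code: the item schema, the option letters for the "missing answer" and "unknown" choices, and the 0.8 confidence cutoff are all assumptions made for the example; the benchmark's real metric definitions may differ.

```python
from dataclasses import dataclass

# Hypothetical option letters for the two metacognitive choices;
# the benchmark's actual encoding may differ.
MISSING = "F"  # "none of the listed options is correct"
UNKNOWN = "G"  # "I don't know"

@dataclass
class Item:
    predicted: str     # option letter the model chose
    confidence: float  # model's self-reported confidence in [0, 1]
    answer: str | None # correct option letter, or None if it was removed
    is_unknown: bool   # True if the question is designed to be unanswerable

def _ratio(hits: int, total: int) -> float:
    return hits / total if total else float("nan")

def evaluate(items: list[Item]) -> dict[str, float]:
    """Score a MetaMedQA-style run on the dimensions named in the abstract."""
    answerable = [i for i in items if i.answer is not None and not i.is_unknown]
    missing = [i for i in items if i.answer is None and not i.is_unknown]
    unknown = [i for i in items if i.is_unknown]

    # Plain accuracy on ordinary multiple-choice questions.
    accuracy = _ratio(sum(i.predicted == i.answer for i in answerable), len(answerable))
    # Missing-answer recall: did the model notice the correct option was absent?
    missing_recall = _ratio(sum(i.predicted == MISSING for i in missing), len(missing))
    # Unknown recall: did the model abstain on unanswerable questions?
    unknown_recall = _ratio(sum(i.predicted == UNKNOWN for i in unknown), len(unknown))
    # Confidence-based accuracy: accuracy restricted to high-confidence answers,
    # a simple proxy for whether confidence tracks correctness (0.8 is arbitrary).
    confident = [i for i in answerable if i.confidence >= 0.8]
    conf_accuracy = _ratio(sum(i.predicted == i.answer for i in confident), len(confident))

    return {
        "accuracy": accuracy,
        "missing_answer_recall": missing_recall,
        "unknown_recall": unknown_recall,
        "high_confidence_accuracy": conf_accuracy,
    }

# Tiny usage example with made-up predictions:
run = [
    Item("B", 0.90, "B", False),   # correct and confident
    Item("A", 0.95, None, False),  # confidently wrong: correct option was removed
    Item("G", 0.40, None, True),   # correctly abstains on an unanswerable item
]
print(evaluate(run))
```

On this framing, the paper's central finding corresponds to high `accuracy` coexisting with near-zero `missing_answer_recall` and `unknown_recall`: the model answers well when an answer exists but does not notice when none does.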
Pages: 10
Related papers (50 total)
  • [41] The role of large language models in medical genetics
    Merdler-Rabinowicz, Rona
    Omar, Mahmud
    Ganesh, Jaya
    Morava, Eva
    Nadkarni, Girish N.
    Klang, Eyal
    MOLECULAR GENETICS AND METABOLISM, 2025, 145 (01)
  • [42] Large Language Models and Their Implications on Medical Education
    Bair, Henry
    Norden, Justin
    ACADEMIC MEDICINE, 2023, 98 (08) : 869 - 870
  • [43] Probabilistic medical predictions of large language models
    Gu, Bowen
    Desai, Rishi J.
    Lin, Kueiyu Joshua
    Yang, Jie
    NPJ DIGITAL MEDICINE, 2024, 7 (01)
  • [44] Large language models as partners in medical literature
    Perez-Guerrero, Eduardo J.
    Mehrotra, Isha
    Jain, Sneha S.
    Perez, Marco V.
    HEART RHYTHM, 2025, 22 (02) : 579 - 584
  • [45] Limitations of large language models in medical applications
    Deng, Jiawen
    Zubair, Areeba
    Park, Ye-Jean
    POSTGRADUATE MEDICAL JOURNAL, 2023, 99 (1178) : 1298 - 1299
  • [46] Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models
    Jeong, Minbyul
    Sohn, Jiwoong
    Sung, Mujeen
    Kang, Jaewoo
    BIOINFORMATICS, 2024, 40 : i119 - i129
  • [47] Understanding Social Reasoning in Language Models with Language Models
    Gandhi, Kanishk
    Franken, J.-Philipp
    Gerstenberg, Tobias
    Goodman, Noah D.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [48] Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models
    Tan, Qingyu
    Ng, Hwee Tou
    Bing, Lidong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14820 - 14835
  • [49] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
    Wei, Jason
    Wang, Xuezhi
    Schuurmans, Dale
    Bosma, Maarten
    Ichter, Brian
    Xia, Fei
    Chi, Ed H.
    Le, Quoc V.
    Zhou, Denny
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [50] An Evaluation of Reasoning Capabilities of Large Language Models in Financial Sentiment Analysis
    Du, Kelvin
    Xing, Frank
    Mao, Rui
    Cambria, Erik
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 189 - 194