Evaluating large language models as agents in the clinic

Authors:

Nikita Mehandru

Brenda Y. Miao

Eduardo Rodriguez Almaraz

Madhumita Sushil

Atul J. Butte

Ahmed Alaa

Affiliations:

[1] University of California, Bakar Computational Health Sciences Institute

[2] Berkeley, Neurosurgery Department Division of Neuro

[3] University of California San Francisco, Oncology

[4] University of California San Francisco, Department of Epidemiology and Biostatistics

[5] University of California San Francisco, Department of Pediatrics

[6] University of California San Francisco

Source:

npj Digital Medicine, Volume 7

Abstract:

Recent developments in large language models (LLMs) have unlocked opportunities for healthcare, from information synthesis to clinical decision support. These LLMs are not just capable of modeling language, but can also act as intelligent “agents” that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that measure a model’s ability to process clinical data or answer standardized test questions, LLM agents can be modeled in high-fidelity simulations of clinical settings and should be assessed for their impact on clinical workflows. These evaluation frameworks, which we refer to as “Artificial Intelligence Structured Clinical Examinations” (“AI-SCE”), can draw from comparable technologies where machines operate with varying degrees of self-governance, such as self-driving cars, in dynamic environments with multiple stakeholders. Developing these robust, real-world clinical evaluations will be crucial towards deploying LLM agents in medical settings.
