Evaluating large language models as agents in the clinic

Cited by: 0
Authors
Nikita Mehandru
Brenda Y. Miao
Eduardo Rodriguez Almaraz
Madhumita Sushil
Atul J. Butte
Ahmed Alaa
Affiliations
[1] Bakar Computational Health Sciences Institute, University of California, San Francisco
[2] University of California, Berkeley
[3] Division of Neuro-Oncology, Department of Neurosurgery, University of California San Francisco
[4] Department of Epidemiology and Biostatistics, University of California San Francisco
[5] Department of Pediatrics, University of California San Francisco
[6] University of California San Francisco
Abstract
Recent developments in large language models (LLMs) have unlocked opportunities for healthcare, from information synthesis to clinical decision support. These LLMs are not just capable of modeling language, but can also act as intelligent “agents” that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that measure a model’s ability to process clinical data or answer standardized test questions, LLM agents should be evaluated in high-fidelity simulations of clinical settings and assessed for their impact on clinical workflows. These evaluation frameworks, which we refer to as “Artificial Intelligence Structured Clinical Examinations” (“AI-SCE”), can draw from comparable technologies in which machines operate with varying degrees of self-governance in dynamic, multi-stakeholder environments, such as self-driving cars. Developing these robust, real-world clinical evaluations will be crucial for deploying LLM agents in medical settings.