Large language models leverage external knowledge to extend clinical insight beyond language boundaries

Cited by: 1
Authors
Wu, Jiageng [1 ]
Wu, Xian [2 ]
Qiu, Zhaopeng [2 ]
Li, Minghui
Lin, Shixu [1 ]
Zhang, Yingying [2 ]
Zheng, Yefeng [2 ]
Yuan, Changzheng [1 ,3 ,5 ]
Yang, Jie [1 ,4 ,6 ,7 ]
Affiliations
[1] Zhejiang Univ, Sch Med, Sch Publ Hlth, Hangzhou 310058, Peoples R China
[2] Tencent YouTu Lab, Jarvis Res Ctr, 1 Tianchen East Rd, Beijing 100101, Peoples R China
[3] Harvard TH Chan Sch Publ Hlth, Dept Nutr, Boston, MA 02115 USA
[4] Harvard Med Sch, Brigham & Womens Hosp, Dept Med, Div Pharmacoepidemiol & Pharmacoecon, Boston, MA 02115 USA
[5] Zhejiang Univ, Sch Publ Hlth, 866 Yuhangtang Rd, Hangzhou, Zhejiang, Peoples R China
[6] Brigham & Womens Hosp, Dept Med, 75 Francis St, Boston, MA 02115 USA
[7] Harvard Med Sch, 75 Francis St, Boston, MA 02115 USA
Keywords
large language models; clinical knowledge; natural language processing; medical examination;
DOI
10.1093/jamia/ocae079
Chinese Library Classification (CLC)
TP [automation and computer technology];
Discipline classification code
0812;
Abstract
Objectives: Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. However, these English-centric models encounter challenges in non-English clinical settings, primarily due to limited clinical knowledge in the respective languages, a consequence of imbalanced training corpora. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance.

Materials and Methods: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381 149 medical questions to construct the medical knowledge base and question bank. The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework leverages the in-context learning ability of LLMs to integrate diverse external clinical knowledge sources. We evaluated KFE with ChatGPT (GPT-3.5), GPT-4, Baichuan2-7B, Baichuan2-13B, and QWEN-72B on CNMLE-2022 and further investigated the effectiveness of different pathways for incorporating medical knowledge into LLMs, examined from 7 distinct perspectives.

Results: Applied directly, ChatGPT failed to qualify for the CNMLE-2022 with a score of 51. Combined with the KFE framework, LLMs of varying sizes yielded consistent and significant improvements: ChatGPT's performance surged to 70.04, and GPT-4 achieved the highest score of 82.59. These results surpass the qualification threshold (60) and exceed the average human score of 68.70, affirming the effectiveness and robustness of the framework. KFE also enabled the smaller Baichuan2-13B to pass the examination, showcasing its potential in low-resource settings.

Discussion and Conclusion: This study sheds light on optimal practices for enhancing the capabilities of LLMs in non-English medical scenarios. By synergizing medical knowledge through in-context learning, LLMs can extend clinical insight beyond language barriers in healthcare, significantly reducing language-related disparities in LLM applications and ensuring global benefit in this field.
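The record gives only a high-level description of KFE. As a rough illustration of the general recipe it describes (retrieve passages from the medical knowledge base and similar solved questions from the question bank, then place both in the prompt as supporting context and few-shot examples), the following is a minimal, hypothetical Python sketch. The keyword-overlap retriever, the prompt layout, and the placeholder call_llm() are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): knowledge- and few-shot-enhanced
# in-context prompting, in the spirit of the KFE framework described above.

def overlap_score(query: str, passage: str) -> int:
    """Crude relevance score: count shared character bigrams (also works for Chinese text)."""
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    return len(bigrams(query) & bigrams(passage))

def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Return the k corpus entries most similar to the query under the crude score."""
    return sorted(corpus, key=lambda p: overlap_score(query, p), reverse=True)[:k]

def build_prompt(question: str, knowledge_base: list[str], question_bank: list[dict]) -> str:
    """Pack retrieved knowledge passages and similar solved questions into one exam prompt."""
    knowledge = retrieve(question, knowledge_base, k=2)
    bank_questions = [item["question"] for item in question_bank]
    examples = retrieve(question, bank_questions, k=2)
    answer_of = {item["question"]: item["answer"] for item in question_bank}

    parts = ["Relevant medical knowledge:"]
    parts += [f"- {p}" for p in knowledge]
    parts.append("Solved examples:")
    parts += [f"Q: {q}\nA: {answer_of[q]}" for q in examples]
    parts.append(f"Answer the following question with a single option letter.\nQ: {question}\nA:")
    return "\n\n".join(parts)

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g., ChatGPT or GPT-4); returns a dummy letter here."""
    return "A"

if __name__ == "__main__":
    kb = [
        "Aspirin irreversibly inhibits platelet cyclooxygenase and platelet aggregation.",
        "Beta-blockers lower heart rate and myocardial oxygen demand.",
    ]
    bank = [
        {"question": "Which drug inhibits platelet aggregation? A. Aspirin B. Metoprolol", "answer": "A"},
    ]
    new_question = "First-line antiplatelet therapy for unstable angina? A. Aspirin B. Metoprolol"
    prompt = build_prompt(new_question, kb, bank)
    print(prompt)
    print("Model answer:", call_llm(prompt))
```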
Pages: 2054-2064
Number of pages: 11