Large language models leverage external knowledge to extend clinical insight beyond language boundaries

Cited: 1
Authors
Wu, Jiageng [1 ]
Wu, Xian [2 ]
Qiu, Zhaopeng [2 ]
Li, Minghui
Lin, Shixu [1 ]
Zhang, Yingying [2 ]
Zheng, Yefeng [2 ]
Yuan, Changzheng [1 ,3 ,5 ]
Yang, Jie [1 ,4 ,6 ,7 ]
Affiliations
[1] Zhejiang Univ, Sch Med, Sch Publ Hlth, Hangzhou 310058, Peoples R China
[2] Tencent YouTu Lab, Jarvis Res Ctr, 1 Tianchen East Rd, Beijing 100101, Peoples R China
[3] Harvard TH Chan Sch Publ Hlth, Dept Nutr, Boston, MA 02115 USA
[4] Harvard Med Sch, Brigham & Womens Hosp, Dept Med, Div Pharmacoepidemiol & Pharmacoecon, Boston, MA 02115 USA
[5] Zhejiang Univ, Sch Publ Hlth, 866 Yuhangtang Rd, Hangzhou, Zhejiang, Peoples R China
[6] Brigham & Womens Hosp, Dept Med, 75 Francis St, Boston, MA 02115 USA
[7] Harvard Med Sch, 75 Francis St, Boston, MA 02115 USA
Keywords
large language models; clinical knowledge; natural language processing; medical examination;
DOI
10.1093/jamia/ocae079
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Objectives: Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled at various medical question-answering tasks. However, these English-centric models struggle in non-English clinical settings, primarily because of their limited clinical knowledge in those languages, a consequence of imbalanced training corpora. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance.

Materials and Methods: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381,149 medical questions to construct the medical knowledge base and question bank. The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework leverages the in-context learning ability of LLMs to integrate diverse external clinical knowledge sources. We evaluated KFE with ChatGPT (GPT-3.5), GPT-4, Baichuan2-7B, Baichuan2-13B, and QWEN-72B on CNMLE-2022 and further investigated the effectiveness of different pathways for incorporating medical knowledge into LLMs from 7 distinct perspectives.

Results: Directly applying ChatGPT failed to pass CNMLE-2022, with a score of 51. Combined with the KFE framework, LLMs of varying sizes yielded consistent and significant improvements: ChatGPT's score surged to 70.04, and GPT-4 achieved the highest score of 82.59. Both surpass the qualification threshold (60) and exceed the average human score of 68.70, affirming the effectiveness and robustness of the framework. KFE also enabled the smaller Baichuan2-13B to pass the examination, showing its potential in low-resource settings.

Discussion and Conclusion: This study sheds light on best practices for enhancing the capabilities of LLMs in non-English medical scenarios. By synergizing medical knowledge through in-context learning, LLMs can extend clinical insight beyond language barriers in healthcare, significantly reducing language-related disparities in LLM applications and ensuring global benefit in this field.
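The core idea of knowledge- and few-shot-enhanced in-context learning can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy knowledge base, question bank, and the crude bigram-overlap retriever are all assumptions standing in for the paper's 53-book knowledge base, 381,149-question bank, and actual retrieval method.

```python
def score(query: str, doc: str) -> int:
    """Crude relevance score: count of shared character bigrams.
    A real system would use a proper dense or sparse retriever."""
    grams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    return len(grams(query) & grams(doc))

def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Return the k corpus entries most similar to the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_kfe_prompt(question, knowledge_base, question_bank, k=1):
    """Assemble an in-context-learning prompt that prepends (1) retrieved
    clinical knowledge and (2) similar solved questions as few-shot
    examples, before posing the new question to the LLM."""
    knowledge = retrieve(question, knowledge_base, k)
    similar_qs = retrieve(question, [q for q, _ in question_bank], k)
    shots = [f"Q: {q}\nA: {a}" for q, a in question_bank if q in similar_qs]
    return "\n\n".join(
        ["Relevant knowledge:"] + knowledge
        + ["Solved examples:"] + shots
        + [f"Q: {question}\nA:"]
    )

# Toy data, purely illustrative.
kb = ["Aspirin inhibits platelet aggregation.",
      "Metformin is first-line therapy for type 2 diabetes."]
bank = [("Which drug is first-line for type 2 diabetes?", "Metformin"),
        ("Which drug inhibits platelet aggregation?", "Aspirin")]

prompt = build_kfe_prompt(
    "What is the first-line drug for type 2 diabetes?", kb, bank, k=1)
print(prompt)
```

The assembled prompt would then be sent to the LLM in place of the bare question; the paper's results suggest that supplying both knowledge snippets and solved examples in this way is what lifts smaller models past the passing threshold.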
Pages: 2054-2064 (11 pages)