Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study

Cited by: 1
Authors
Wang, Dingqiao [1 ]
Liang, Jiangbo [1 ]
Ye, Jinguo [1 ]
Li, Jingni [1 ]
Li, Jingpeng [1 ]
Zhang, Qikai [1 ]
Hu, Qiuling [1 ]
Pan, Caineng [1 ]
Wang, Dongliang [1 ]
Liu, Zhong [1 ]
Shi, Wen [1 ]
Shi, Danli [2 ]
Li, Fei [1 ]
Qu, Bo [3 ]
Zheng, Yingfeng [1 ]
Affiliations
[1] Sun Yat-sen Univ, Zhongshan Ophthalm Ctr, Guangdong Prov Clin Res Ctr Ocular Dis, State Key Lab Ophthalmol, Guangdong Prov Key Lab Op, 07 Jinsui Rd, Guangzhou 510060, Peoples R China
[2] Hong Kong Polytech Univ, Res Ctr SHARP Vis, Hong Kong, Peoples R China
[3] Peking Univ Third Hosp, Beijing, Peoples R China
Keywords
large language models; LLMs; retrieval-augmented generation; RAG; GPT-4.0; Claude-2; Google Bard; diabetes education;
DOI
10.2196/58041
Chinese Library Classification (CLC) Number
R19 [Health Care Organizations and Services (Health Services Administration)];
Subject Classification Code
Abstract
Background: Large language models (LLMs) have demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.

Objective: This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve LLMs' ability to respond accurately and safely to diabetes-related inquiries.

Methods: RISE, an innovative retrieval augmentation framework, comprises 4 steps: query rewriting, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced counterparts. Responses were assessed by clinicians for accuracy and comprehensiveness and by patients for understandability.

Results: The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the proportion of accurate responses increased by 12% (15/129) with RISE; specifically, accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was likewise enhanced, by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023, to February 5, 2024.

Conclusions: RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, helping to relieve pressure on medical resources and raise public awareness of medical knowledge.
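The Methods above describe RISE only at a high level (query rewriting, information retrieval, summarization, execution), and this record includes no code, so the following is a minimal, hypothetical Python sketch of that 4-step loop. The names rise_answer, llm, and retrieve are placeholders assumed purely for illustration: they stand in for the commercial LLM APIs (GPT-4, Claude 2, Bard) and the curated diabetes knowledge base used in the study and are not taken from the paper.

from typing import Callable, List

def rise_answer(
    question: str,
    llm: Callable[[str], str],                  # generic text-in/text-out LLM call (hypothetical)
    retrieve: Callable[[str, int], List[str]],  # top-k passages from a diabetes knowledge base (hypothetical)
    k: int = 3,
) -> str:
    """Answer a patient question with a RISE-style rewrite -> retrieve -> summarize -> execute loop."""
    # Step 1: query rewriting -- turn the colloquial question into a precise retrieval query.
    query = llm(
        "Rewrite the following diabetes-related question as a concise search query:\n"
        + question
    )

    # Step 2: information retrieval -- pull the k most relevant knowledge-base passages.
    passages = retrieve(query, k)

    # Step 3: summarization -- condense the passages into compact evidence.
    evidence = llm(
        "Summarize the facts in these passages that are relevant to the question.\n"
        "Question: " + question + "\nPassages:\n" + "\n".join(passages)
    )

    # Step 4: execution -- answer the original question, grounded only in the evidence.
    return llm(
        "Using only the evidence below, answer the patient's question accurately, "
        "safely, and in plain language.\n"
        "Evidence: " + evidence + "\nQuestion: " + question
    )

Summarizing the retrieved passages before the execution step keeps the final answering prompt compact and grounded in the knowledge base, which is consistent with the framework's stated aim of improving accuracy; the actual prompts and knowledge base used in the study are not reproduced here.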
Pages: 12