Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study

Cited by: 1
Authors
Wang, Dingqiao [1]
Liang, Jiangbo [1]
Ye, Jinguo [1]
Li, Jingni [1]
Li, Jingpeng [1]
Zhang, Qikai [1]
Hu, Qiuling [1]
Pan, Caineng [1]
Wang, Dongliang [1]
Liu, Zhong [1]
Shi, Wen [1]
Shi, Danli [2]
Li, Fei [1]
Qu, Bo [3]
Zheng, Yingfeng [1]
Affiliations
[1] Sun Yat sen Univ, Zhongshan Ophthalm Ctr, Guangdong Prov Clin Res Ctr Ocular Dis, State Key Lab Ophthalmol,Guangdong Prov Key Lab Op, 07 Jinsui Rd, Guangzhou 510060, Peoples R China
[2] Hong Kong Polytech Univ, Res Ctr SHARP Vis, Hong Kong, Peoples R China
[3] Peking Univ Third Hosp, Beijing, Peoples R China
Keywords
large language models; LLMs; retrieval-augmented generation; RAG; GPT-4.0; Claude-2; Google Bard; diabetes education;
DOI
10.2196/58041
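Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)];
Discipline classification code
Abstract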
Background: Large language models (LLMs) demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.
Objective: This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the LLM's performance to accurately and safely respond to diabetes-related inquiries.
Methods: The RISE, an innovative retrieval augmentation framework, comprises 4 steps: rewriting query, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions respectively. Assessments were conducted by clinicians for accuracy and comprehensiveness and by patients for understandability.
Results: The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was also enhanced by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023 to February 5, 2024.
Conclusions: The RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.
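The percentages reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is not part of the study. It assumes that each fraction such as 3/43 means the number of additional accurate responses out of the 43 questions asked of each model, and that the overall figure pools the 3 models (3 x 43 = 129 responses). The variable and function names are illustrative only.

```python
# Additional accurate responses with RISE, per model, as read from the abstract.
# Assumption: each count is out of the 43 questions put to that model.
gains = {"GPT-4": 3, "Claude 2": 8, "Google Bard": 4}
QUESTIONS_PER_MODEL = 43


def pct(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to a whole number."""
    return round(100 * numerator / denominator)


# Per-model increase in the rate of accurate responses.
per_model = {name: pct(n, QUESTIONS_PER_MODEL) for name, n in gains.items()}

# Pooled increase across all 3 models.
total_gain = sum(gains.values())
total_responses = QUESTIONS_PER_MODEL * len(gains)
overall = pct(total_gain, total_responses)

for name, value in per_model.items():
    print(f"{name}: +{value}% ({gains[name]}/{QUESTIONS_PER_MODEL})")
print(f"Overall: +{overall}% ({total_gain}/{total_responses})")
```

Under these assumptions the per-model counts sum to the pooled numerator (3 + 8 + 4 = 15), and the rounded values agree with the 7%, 19%, 9%, and 12% stated in the abstract.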
Pages: 12
Related papers
50 in total
  • [41] Integrating Small Language Models with Retrieval-Augmented Generation in Computing Education: Key Takeaways, Setup, and Practical Insights
    Yu, Zezhu
    Liu, Suqing
    Denny, Paul
    Bergen, Andreas
    Liut, Michael
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 1, 2025, : 1302 - 1308
  • [42] Application of NotebookLM, a large language model with retrieval-augmented generation, for lung cancer staging
    Tozuka, Ryota
    Johno, Hisashi
    Amakawa, Akitomo
    Sato, Junichi
    Muto, Mizuki
    Seki, Shoichiro
    Komaba, Atsushi
    Onishi, Hiroshi
    JAPANESE JOURNAL OF RADIOLOGY, 2024, : 706 - 712
  • [43] Emergency Patient Triage Improvement through a Retrieval-Augmented Generation Enhanced Large-Scale Language Model
    Yazaki, Megumi
    Maki, Satoshi
    Furuya, Takeo
    Inoue, Ken
    Nagai, Ko
    Nagashima, Yuki
    Maruyama, Juntaro
    Toki, Yasunori
    Kitagawa, Kyota
    Iwata, Shuhei
    Kitamura, Takaki
    Gushiken, Sho
    Noguchi, Yuji
    Inoue, Masahiro
    Shiga, Yasuhiro
    Inage, Kazuhide
    Orita, Sumihisa
    Nakada, Takaaki
    Ohtori, Seiji
    PREHOSPITAL EMERGENCY CARE, 2024,
  • [44] Optimizing High-Level Synthesis Designs with Retrieval-Augmented Large Language Models
    Xu, Haocheng
    Hu, Haotian
    Huang, Sitao
    2024 IEEE LLM AIDED DESIGN WORKSHOP, LAD 2024, 2024,
  • [45] Performance Evaluation of Vector Embeddings with Retrieval-Augmented Generation
    Kukreja, Sanjay
    Kumar, Tarun
    Bharate, Vishal
    Purohit, Amit
    Dasgupta, Abhijit
    Guha, Debashis
    2024 9TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS, ICCCS 2024, 2024, : 333 - 340
  • [46] Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems
    Ndimbo, Edmund V.
    Luo, Qin
    Fernando, Gimo C.
    Yang, Xu
    Wang, Bang
    APPLIED SCIENCES-BASEL, 2025, 15 (02):
  • [47] Multimodal retrieval-augmented generation for financial documents: image-centric analysis of charts and tables with large language models
    Jiang, Cheng
    Zhang, Pengle
    Ni, Ying
    Wang, Xiaoli
    Peng, Hanghang
    Liu, Sen
    Fei, Mengdi
    He, Yuxin
    Xiao, Yaxuan
    Huang, Jin
    Ma, Xingyu
    Yang, Tian
    VISUAL COMPUTER, 2025,
  • [48] Enhanced Recommendation Systems with Retrieval-Augmented Large Language Model
    Wei, Chuyuan
    Duan, Ke
    Zhuo, Shengda
    Wang, Hongchun
    Huang, Shuqiang
    Liu, Jie
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2025, 82 : 1147 - 1173
  • [49] A Retrieval-Augmented Framework for Tabular Interpretation with Large Language Model
    Yan, Mengyi
    Rene, Weilong
    Wang, Yaoshu
    Li, Jianxin
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2024, PT 2, 2025, 14851 : 341 - 356
  • [50] Application of retrieval-augmented generation for interactive industrial knowledge management via a large language model
    Chen, Lun-Chi
    Pardeshi, Mayuresh Sunil
    Liao, Yi-Xiang
    Pai, Kai-Chih
    COMPUTER STANDARDS & INTERFACES, 2025, 94