Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study

Cited by: 1
Authors
Wang, Dingqiao [1]
Liang, Jiangbo [1]
Ye, Jinguo [1]
Li, Jingni [1]
Li, Jingpeng [1]
Zhang, Qikai [1]
Hu, Qiuling [1]
Pan, Caineng [1]
Wang, Dongliang [1]
Liu, Zhong [1]
Shi, Wen [1]
Shi, Danli [2]
Li, Fei [1]
Qu, Bo [3]
Zheng, Yingfeng [1]
Affiliations
[1] Sun Yat sen Univ, Zhongshan Ophthalm Ctr, Guangdong Prov Clin Res Ctr Ocular Dis, State Key Lab Ophthalmol,Guangdong Prov Key Lab Op, 07 Jinsui Rd, Guangzhou 510060, Peoples R China
[2] Hong Kong Polytech Univ, Res Ctr SHARP Vis, Hong Kong, Peoples R China
[3] Peking Univ Third Hosp, Beijing, Peoples R China
Keywords
large language models; LLMs; retrieval-augmented generation; RAG; GPT-4.0; Claude-2; Google Bard; diabetes education;
DOI
10.2196/58041
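Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Discipline classification code
Abstract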
Background: Large language models (LLMs) demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.
Objective: This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the LLM's performance to accurately and safely respond to diabetes-related inquiries.
Methods: The RISE, an innovative retrieval augmentation framework, comprises 4 steps: rewriting query, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions respectively. Assessments were conducted by clinicians for accuracy and comprehensiveness and by patients for understandability.
Results: The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was also enhanced by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023 to February 5, 2024.
Conclusions: The RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.
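The percentages in the Results can be checked from the reported counts. The sketch below is a minimal arithmetic check, assuming each fraction means "additional accurate responses out of the questions asked" and that the reported figures are rounded to whole percentages; the variable and function names are illustrative and do not come from the study.

```python
# Counts from the abstract: additional accurate responses per model,
# out of 43 questions each (interpretation assumed, see lead-in).
gains = {"GPT-4": 3, "Claude 2": 8, "Google Bard": 4}
QUESTIONS = 43


def pct(numerator, denominator):
    """Return the percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Per-model increase in the rate of accurate responses.
per_model = {name: pct(n, QUESTIONS) for name, n in gains.items()}

# Pooled increase across all three models (3 x 43 = 129 responses).
total_gain = sum(gains.values())
total_responses = QUESTIONS * len(gains)
overall = pct(total_gain, total_responses)

print(per_model)
print(f"{total_gain}/{total_responses} = {overall}%")
```

The pooled figure follows from summing the three per-model counts (3 + 8 + 4 = 15) over the pooled denominator (129), which matches the fraction reported as the average.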
Pages: 12