OphGLM: An ophthalmology large language-and-vision assistant

Authors
Deng, Zhuo [1]
Gao, Weihao [1]
Chen, Chucheng [1]
Niu, Zhiyuan [1]
Gong, Zheng [1]
Zhang, Ruiheng [2]
Cao, Zhenjie [1]
Li, Fang [1]
Ma, Zhaoyi [3, 4]
Wei, Wenbin [2]
Ma, Lan [1]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Beijing Tongren Hosp, Beijing Tongren Eye Ctr, Beijing, Peoples R China
[3] Natl Hlth Commiss Capacity Bldg, Beijing, Peoples R China
[4] Continuing Educ Ctr, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Ophthalmology; Visual dialogue interaction; Large language models; Artificial intelligence
DOI
10.1016/j.artmed.2024.103001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision-based computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human-computer interaction and low clinical applicability. Thus, ophthalmic visual question answering is worth studying. Unfortunately, no practical solutions existed before the advent of Large Language Models (LLMs). In this paper, we investigate the ophthalmic visual diagnostic interaction problem. We construct an ophthalmology large language-and-vision assistant, OphGLM, consisting of an image encoder, a text encoder, a fusion module, and an LLM module. We establish a new Chinese ophthalmic fine-tuning dataset, FundusTuning-CN, comprising fundus instruction and conversation sets. Based on FundusTuning-CN, we establish a novel LLM-tuning strategy that introduces visual model understanding and ophthalmic knowledge into LLMs at low cost and with high efficiency. Leveraging the pre-training of the image encoder, OphGLM demonstrates strong visual understanding and surpasses open-source visual language models in common fundus disease classification tasks. FundusTuning-CN enables OphGLM to surpass open-source medical LLMs in both ophthalmic knowledge and interactive capabilities. Our proposed OphGLM has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be publicly available at https://github.com/ML-AILab/OphGLM.
Pages: 8