OphGLM: An ophthalmology large language-and-vision assistant

Cited by: 1
Authors
Deng, Zhuo [1 ]
Gao, Weihao [1 ]
Chen, Chucheng [1 ]
Niu, Zhiyuan [1 ]
Gong, Zheng [1 ]
Zhang, Ruiheng [2 ]
Cao, Zhenjie [1 ]
Li, Fang [1 ]
Ma, Zhaoyi [3 ,4 ]
Wei, Wenbin [2 ]
Ma, Lan [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Beijing Tongren Hosp, Beijing Tongren Eye Ctr, Beijing, Peoples R China
[3] Natl Hlth Commiss Capacity Bldg, Beijing, Peoples R China
[4] Continuing Educ Ctr, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Ophthalmology; Visual dialogue interaction; Large language models; ARTIFICIAL-INTELLIGENCE;
DOI
10.1016/j.artmed.2024.103001
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision-based computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human-computer interaction and low clinical applicability. Ophthalmic visual question answering is therefore worth studying, yet no practical solutions existed before large language models (LLMs). In this paper, we investigate the ophthalmic visual diagnostic interaction problem. We construct an ophthalmology large language-and-vision assistant, OphGLM, consisting of an image encoder, a text encoder, a fusion module, and an LLM module. We establish a new Chinese ophthalmic fine-tuning dataset, FundusTuning-CN, comprising fundus instruction and conversation sets. Based on FundusTuning-CN, we propose a novel LLM-tuning strategy that introduces visual model understanding and ophthalmic knowledge into LLMs at low cost and high efficiency. Leveraging the pre-training of the image encoder, OphGLM demonstrates strong visual understanding and surpasses open-source visual language models on common fundus disease classification tasks. FundusTuning-CN enables OphGLM to surpass open-source medical LLMs in both ophthalmic knowledge and interactive capability. The proposed OphGLM has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be publicly available at https://github.com/ML-AILab/OphGLM.
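The abstract names four architectural components (image encoder, text encoder, fusion module, LLM module) and a low-cost tuning strategy that leverages pretrained parts. The following is a minimal PyTorch sketch of how such a composition could look; the cross-attention fusion design, the stand-in encoders, and all dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of an OphGLM-style pipeline: image encoder + text
# encoder + fusion module + LLM. Freezing the pretrained encoders and the
# LLM while training only the fusion module is one common way to add
# visual understanding to an LLM at low cost; the paper's exact strategy
# may differ.
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Project image features into the LLM space and condition them on
    the text features via cross-attention (an assumed fusion design)."""
    def __init__(self, img_dim: int, txt_dim: int, llm_dim: int, n_heads: int = 8):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, llm_dim)
        self.txt_proj = nn.Linear(txt_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, n_heads, batch_first=True)

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor) -> torch.Tensor:
        q = self.img_proj(img_feats)    # (B, N_img, llm_dim)
        kv = self.txt_proj(txt_feats)   # (B, N_txt, llm_dim)
        fused, _ = self.attn(q, kv, kv) # image tokens attend to the prompt
        return fused                    # visual tokens in the LLM space

class OphGLMSketch(nn.Module):
    """Compose the four components named in the abstract; only the fusion
    module remains trainable."""
    def __init__(self, img_dim: int = 768, txt_dim: int = 512, llm_dim: int = 1024):
        super().__init__()
        self.image_encoder = nn.Linear(img_dim, img_dim)  # stand-in for a fundus ViT/CNN
        self.text_encoder = nn.Linear(txt_dim, txt_dim)   # stand-in for a text encoder
        self.fusion = FusionModule(img_dim, txt_dim, llm_dim)
        self.llm = nn.TransformerEncoder(                 # stand-in for the LLM backbone
            nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True),
            num_layers=2)
        for module in (self.image_encoder, self.text_encoder, self.llm):
            for p in module.parameters():
                p.requires_grad = False                   # freeze pretrained parts

    def forward(self, img_patches: torch.Tensor, txt_embeds: torch.Tensor) -> torch.Tensor:
        img_feats = self.image_encoder(img_patches)       # (B, N_img, img_dim)
        txt_feats = self.text_encoder(txt_embeds)         # (B, N_txt, txt_dim)
        visual_tokens = self.fusion(img_feats, txt_feats) # (B, N_img, llm_dim)
        # Prepend the fused visual tokens to the projected prompt and decode.
        txt_in_llm = self.fusion.txt_proj(txt_feats)
        return self.llm(torch.cat([visual_tokens, txt_in_llm], dim=1))

model = OphGLMSketch()
out = model(torch.randn(2, 49, 768), torch.randn(2, 16, 512))  # toy fundus patches + prompt
print(out.shape)  # torch.Size([2, 65, 1024])
```

With this split, instruction tuning on a dataset like FundusTuning-CN would update only the fusion module's parameters, which is what makes such a strategy inexpensive relative to full fine-tuning.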
Pages: 8
Related papers
50 records in total
  • [1] LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy
    Yao, Jieru
    Li, Xueran
    Xie, Qiang
    Han, Longfei
    Jia, Yiwen
    Liu, Nian
    Zhang, Dingwen
    Han, Junwei
    FRONTIERS OF COMPUTER SCIENCE, 2025, 19 (04)
  • [2] LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
    Li, Chunyuan
    Wong, Cliff
    Zhang, Sheng
    Usuyama, Naoto
    Liu, Haotian
    Yang, Jianwei
    Naumann, Tristan
    Poon, Hoifung
    Gao, Jianfeng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Multimodal Large Language Models in Vision and Ophthalmology
    Lu, Zhiyong
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)
  • [4] Vision of the future: large language models in ophthalmology
    Tailor, Prashant D.
    D'Souza, Haley S.
    Li, Hanzhou
    Starr, Matthew R.
    CURRENT OPINION IN OPHTHALMOLOGY, 2024, 35 (05) : 391 - 402
  • [5] Vision language models in ophthalmology
    Lim, Gilbert
    Elangovan, Kabilan
    Jin, Liyuan
    CURRENT OPINION IN OPHTHALMOLOGY, 2024, 35 (06) : 487 - 493
  • [6] Large Language and Vision Assistant in dermatology: a game changer or just hype?
    Goktas, Polat
    Gulseren, Duygu
    Tobin, Anne-Marie
    CLINICAL AND EXPERIMENTAL DERMATOLOGY, 2024, 49 (08) : 783 - 792
  • [7] Transmission Versus Truth, Imitation Versus Innovation: What Children Can Do That Large Language and Language-and-Vision Models Cannot (Yet)
    Yiu, Eunice
    Kosoy, Eliza
    Gopnik, Alison
    PERSPECTIVES ON PSYCHOLOGICAL SCIENCE, 2024, 19 (05) : 874 - 883
  • [8] SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
    Sun, Guohao
    Qin, Can
    Wang, Jiamian
    Chen, Zeyuan
    Xu, Ran
    Tao, Zhiqiang
    COMPUTER VISION - ECCV 2024, PT IX, 2025, 15067 : 156 - 172
  • [9] VLAAD: Vision and Language Assistant for Autonomous Driving
    Park, SungYeon
    Lee, MinJae
    Kang, JiHyuk
    Choi, Hahyeon
    Park, Yoonah
    Cho, Juhwan
    Lee, Adam
    Kim, DongKyu
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024, : 980 - 987
  • [10] What Does a Language-And-Vision Transformer See: The Impact of Semantic Information on Visual Representations
    Ilinykh, Nikolai
    Dobnik, Simon
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2021, 4