MKGF: A multi-modal knowledge graph based RAG framework to enhance LVLMs for Medical visual question answering

Cited by: 0
Authors
Wu, Yinan [1]
Lu, Yuming [1]
Zhou, Yan [1]
Ding, Yifan [2]
Liu, Jingping [1]
Ruan, Tong [1]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Crit Care Med, Shanghai 200032, Peoples R China
Keywords
Multi-modal; Knowledge graph; Large language model; Retrieval
DOI
10.1016/j.neucom.2025.129999
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Medical visual question answering (MedVQA) is a challenging task that requires models to understand medical images and return accurate answers to the given questions. Most recent methods transfer general-domain large vision-language models (LVLMs) to the medical domain by constructing medical instruction datasets and by in-context learning. However, the performance of these methods is limited by the hallucination issue of LVLMs. In addition, fine-tuning the numerous parameters of LVLMs on medical instruction datasets incurs high time and economic costs. Hence, we propose MKGF, a framework that leverages a multi-modal medical knowledge graph (MMKG) to relieve the hallucination issue without fine-tuning the parameters of LVLMs. First, we employ a pre-trained text retriever to build question-knowledge relations on the training set. Second, we train a multi-modal retriever with these relations. Finally, we use this retriever to retrieve question-relevant knowledge and enhance the performance of LVLMs on the test set. To evaluate the effectiveness of MKGF, we conduct extensive experiments on two public datasets, Slake and VQA-RAD. Our method improves the pre-trained SOTA LVLMs by 10.15% and 9.32%, respectively. The source code is available at https://github.com/ehnal/MKGF.
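The three-stage pipeline in the abstract can be illustrated with a minimal, dependency-free sketch. It is not derived from the authors' repository, and every piece is an assumption: a bag-of-words cosine score stands in for the pre-trained text retriever (step 1); an InfoNCE contrastive loss is assumed as the training objective for the multi-modal retriever (step 2), with fused image+question embeddings passed in as plain lists because no encoder is shown; and retrieval plus a simple prompt template stands in for knowledge injection into a frozen LVLM (step 3). The names `KG_FACTS`, `build_relations`, `info_nce`, `retrieve`, and `build_prompt` and the toy facts are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge entries; not taken from the paper's MMKG.
KG_FACTS = [
    "pneumothorax appears as air in the pleural space on chest x-ray",
    "the liver is located in the right upper quadrant of the abdomen",
    "cardiomegaly means an enlarged heart on chest radiograph",
]


def bow(text):
    """Bag-of-words vector; a stand-in for a pre-trained text retriever's encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    num = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0


def build_relations(questions, facts, k=1):
    """Step 1: pair each training question with its top-k facts by text similarity."""
    fact_vecs = [bow(f) for f in facts]
    relations = []
    for q in questions:
        qv = bow(q)
        ranked = sorted(range(len(facts)),
                        key=lambda j: cosine(qv, fact_vecs[j]), reverse=True)
        relations.append(ranked[:k])
    return relations


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def info_nce(query_embs, key_embs, positive_idx, temperature=0.07):
    """Step 2 (assumed objective): mean InfoNCE loss over a batch.

    query_embs[i] stands for a fused image+question embedding (encoder not shown);
    positive_idx[i] is the knowledge entry labelled relevant for query i in step 1.
    """
    total = 0.0
    for q, pos in zip(query_embs, positive_idx):
        logits = [dot(q, key) / temperature for key in key_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[pos]
    return total / len(query_embs)


def retrieve(query_emb, key_embs, k=1):
    """Step 3a: indices of the k knowledge embeddings with the highest dot product."""
    return sorted(range(len(key_embs)),
                  key=lambda j: dot(query_emb, key_embs[j]), reverse=True)[:k]


def build_prompt(question, facts):
    """Step 3b: prepend retrieved knowledge to the question for a frozen LVLM."""
    context = "\n".join(f"- {f}" for f in facts)
    return f"Knowledge:\n{context}\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    train_qs = ["Is there a pneumothorax in this chest x-ray?", "Where is the liver?"]
    print(build_relations(train_qs, KG_FACTS))  # [[0], [1]]
```

The design point the sketch tries to capture is that only a small retriever is trained, using pseudo-labels produced by an existing text retriever, while the LVLM's parameters stay untouched; the actual encoders, loss, and prompt format used by MKGF may differ.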
Pages: 10