MKGF: A multi-modal knowledge graph based RAG framework to enhance LVLMs for Medical visual question answering

Cited by: 0
Authors
Wu, Yinan [1 ]
Lu, Yuming [1 ]
Zhou, Yan [1 ]
Ding, Yifan [2 ]
Liu, Jingping [1 ]
Ruan, Tong [1 ]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Crit Care Med, Shanghai 200032, Peoples R China
Keywords
Multi-modal; Knowledge graph; Large language model; Retrieval
DOI
10.1016/j.neucom.2025.129999
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Medical visual question answering (MedVQA) is a challenging task that requires models to understand medical images and return accurate responses to the given questions. Most recent methods focus on transferring general-domain large vision-language models (LVLMs) to the medical domain by constructing medical instruction datasets and applying in-context learning. However, the performance of these methods is limited by the hallucination issue of LVLMs. In addition, fine-tuning the abundant parameters of LVLMs on medical instruction datasets incurs high time and economic costs. Hence, we propose the MKGF framework, which leverages a multi-modal medical knowledge graph (MMKG) to relieve the hallucination issue without fine-tuning the abundant parameters of LVLMs. First, we employ a pre-trained text retriever to build question-knowledge relations on the training set. Second, we train a multi-modal retriever with these relations. Finally, we use it to retrieve question-relevant knowledge and enhance the performance of LVLMs on the test set. To evaluate the effectiveness of MKGF, we conduct extensive experiments on two public datasets, Slake and VQA-RAD. Our method improves the pre-trained SOTA LVLMs by 10.15% and 9.32%, respectively. The source code is available at https://github.com/ehnal/MKGF.
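The retrieve-then-prompt flow the abstract describes (embed the question, fetch the most similar MMKG facts, and prepend them to the LVLM prompt instead of fine-tuning) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the toy embeddings, the tiny knowledge graph, and the prompt template are all placeholder assumptions, and a real pipeline would use the trained multi-modal retriever and a vector index.

```python
# Hedged sketch of a RAG loop over a (toy) multi-modal knowledge graph.
# Embeddings and KG facts below are made-up placeholders for illustration.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy KG: each entry pairs a textual fact with a precomputed embedding.
KG = [
    {"fact": "The liver appears in the upper right abdomen on axial CT.",
     "emb": [0.9, 0.1, 0.0]},
    {"fact": "Pleural effusion shows as fluid at the lung base on X-ray.",
     "emb": [0.1, 0.9, 0.2]},
]

def retrieve(question_emb, kg, top_k=1):
    """Return the top-k KG facts most similar to the question embedding."""
    ranked = sorted(kg, key=lambda e: cosine(question_emb, e["emb"]),
                    reverse=True)
    return [e["fact"] for e in ranked[:top_k]]

def build_prompt(question, facts):
    """Augment the LVLM prompt with retrieved knowledge (no fine-tuning)."""
    context = "\n".join(f"- {f}" for f in facts)
    return f"Knowledge:\n{context}\nQuestion: {question}\nAnswer:"

# Stand-in for the output of a trained multi-modal (image+text) retriever.
q_emb = [0.85, 0.15, 0.05]
facts = retrieve(q_emb, KG)
prompt = build_prompt("Where is the liver on this CT slice?", facts)
```

The augmented `prompt` would then be passed, together with the medical image, to a frozen LVLM; the knowledge block is what grounds the answer and mitigates hallucination.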
Pages: 10
Related papers (50 records)
  • [1] Knowledge-Based Visual Question Answering Using Multi-Modal Semantic Graph
    Jiang, Lei
    Meng, Zuqiang
    ELECTRONICS, 2023, 12 (06)
  • [2] Medical Visual Question-Answering Model Based on Knowledge Enhancement and Multi-Modal Fusion
    Zhang, Dianyuan
    Yu, Chuanming
    An, Lu
    PROCEEDINGS OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2024, 61 (01) : 703 - 708
  • [3] Multi-modal Question Answering System Driven by Domain Knowledge Graph
    Zhao, Zhengwei
    Wang, Xiaodong
    Xu, Xiaowei
    Wang, Qing
    5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2019), 2019, : 43 - 47
  • [4] Interpretable medical image Visual Question Answering via multi-modal relationship graph learning
    Hu, Xinyue
    Gu, Lin
    Kobayashi, Kazuma
    Liu, Liangchen
    Zhang, Mengliang
    Harada, Tatsuya
    Summers, Ronald M.
    Zhu, Yingying
    MEDICAL IMAGE ANALYSIS, 2024, 97
  • [5] MM-Reasoner: A Multi-Modal Knowledge-Aware Framework for Knowledge-Based Visual Question Answering
    Khademi, Mahmoud
    Yang, Ziyi
    Frujeri, Felipe Vieira
    Zhu, Chenguang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 6571 - 6581
  • [6] Hierarchical deep multi-modal network for medical visual question answering
    Gupta, D.
    Suman, S.
    Ekbal, A.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164
  • [7] Multi-modal Contextual Graph Neural Network for Text Visual Question Answering
    Liang, Yaoyuan
    Wang, Xin
    Duan, Xuguang
    Zhu, Wenwu
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3491 - 3498
  • [8] Knowledge-Enhanced Visual Question Answering with Multi-modal Joint Guidance
    Wang, Jianfeng
    Zhang, Anda
    Du, Huifang
    Wang, Haofen
    Zhang, Wenqiang
    PROCEEDINGS OF THE 11TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE GRAPHS, IJCKG 2022, 2022, : 115 - 120
  • [9] Multi-Modal Validation and Domain Interaction Learning for Knowledge-Based Visual Question Answering
    Xu, Ning
    Gao, Yifei
    Liu, An-An
    Tian, Hongshuo
    Zhang, Yongdong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 6628 - 6640
  • [10] Multi-modal Multi-scale State Space Model for Medical Visual Question Answering
    Chen, Qishen
    Bian, Minjie
    He, Wenxuan
    Xu, Huahu
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VIII, 2024, 15023 : 328 - 342