Fine-grained knowledge fusion for retrieval-augmented medical visual question answering

Cited by: 0
Authors
Liang, Xiao [1]
Wang, Di [1]
Jing, Bin [2]
Jiao, Zhicheng [3]
Li, Ronghan [1]
Liu, Ruyi [1]
Miao, Qiguang [1]
Wang, Quan [1]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Shaanxi, Peoples R China
[2] Capital Med Univ, Sch Biomed Engn, Beijing, Peoples R China
[3] Brown Univ, Warren Alpert Med Sch, Providence, RI USA
Funding
National Natural Science Foundation of China
Keywords
Medical visual question answering (MedVQA); Foundation model; Vision and language; Retrieval augmentation; Multi-modal learning
DOI
10.1016/j.inffus.2025.103059
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given that medical image analysis often requires experts to recall typical symptoms from diagnostic archives or their own experience, implementing retrieval augmentation in multi-modal tasks like Medical Visual Question Answering (MedVQA) becomes a logical step to facilitate access and use of diverse case data. However, introducing existing retrieval augmentation methods to MedVQA faces two limitations: (1) Due to privacy concerns, direct access to original medical data is typically restricted. (2) The symptoms distinguishing various diseases are often subtle and fine-grained, complicating the task of ensuring that the retrieved information precisely matches the query. To address these challenges, we propose a retrieval augmentation framework with the Fine-Grained Re-Weighting (FGRW) strategy, which employs fine-grained encoding for retrieved multi-source knowledge, avoiding direct access to original image-text data. It then computes re-weighted relevance scores between queries and knowledge, using these scores as supervised priors to guide the fusion of queries and knowledge, thus reducing interference from redundant information in answering questions. Experimental results on PathVQA, VQA-RAD, and SLAKE public benchmarks demonstrate FGRW's state-of-the-art performance. Code is available at the public repository.
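The abstract describes scoring retrieved knowledge against the query at a fine-grained level and using those scores to steer fusion. The following is a minimal illustrative sketch of that general idea only, not the authors' FGRW implementation: the token-level max-similarity scoring, the softmax temperature, the mean pooling, and the concatenation-based fusion are all assumptions chosen for brevity, and every function name here is hypothetical. In particular, the sketch applies the scores directly as fusion weights, whereas the abstract says they serve as supervised priors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fine_grained_score(query_tokens: List[Vector], item_tokens: List[Vector]) -> float:
    """Assumed scoring rule: for each query token take its best-matching
    knowledge token, then average those best matches."""
    best = [max(cosine(q, k) for k in item_tokens) for q in query_tokens]
    return sum(best) / len(best)


def softmax(scores: List[float], temperature: float) -> List[float]:
    """Turn raw relevance scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average token vectors into a single vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def fuse(
    query_tokens: List[Vector],
    knowledge_items: List[List[Vector]],
    temperature: float = 0.1,  # hypothetical hyperparameter
) -> Tuple[Vector, List[float]]:
    """Weight each retrieved item by its relevance, then concatenate the
    pooled query with the weighted sum of pooled knowledge items."""
    scores = [fine_grained_score(query_tokens, item) for item in knowledge_items]
    weights = softmax(scores, temperature)
    pooled = [mean_pool(item) for item in knowledge_items]
    dim = len(pooled[0])
    knowledge_vec = [sum(w * p[d] for w, p in zip(weights, pooled)) for d in range(dim)]
    return mean_pool(query_tokens) + knowledge_vec, weights


# Toy example: two 2-d query tokens and two retrieved knowledge items.
query = [[1.0, 0.0], [0.0, 1.0]]
relevant = [[1.0, 0.0], [0.0, 1.0]]
irrelevant = [[-1.0, 0.0], [0.0, -1.0]]
fused, weights = fuse(query, [relevant, irrelevant])
```

In this toy case the item whose tokens align with the query receives nearly all of the weight, so the misaligned item contributes little to the fused vector. That is the sense in which re-weighting can reduce interference from redundant retrieved information.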
Pages: 15
Related papers
50 in total
  • [1] Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering
    Lin, Weizhe
    Chen, Jinghong
    Mei, Jingbiao
    Coca, Alexandru
    Byrne, Bill
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [2] Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering
    Sha, Yuchen
    Feng, Yujian
    He, Miao
    Liu, Shangdong
    Ji, Yimu
    MATHEMATICS, 2023, 11 (15)
  • [3] RAVL: A Retrieval-Augmented Visual Language Model Framework for Knowledge-Based Visual Question Answering
    Chai, Naiquan
    Zou, Dongsheng
    Liu, Jiyuan
    Wang, Hao
    Yang, Yuming
    Song, Xinyi
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT III, NLPCC 2024, 2025, 15361 : 394 - 406
  • [4] Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
    Xu, Zhentao
    Cruz, Mark Jerome
    Guevara, Matthew
    Wang, Tie
    Deshpande, Manasi
    Wang, Xiaofeng
    Li, Zheng
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2905 - 2909
  • [5] Fine-Grained Unbalanced Interaction Network for Visual Question Answering
    Liao, Xinxin
    Wu, Mingyan
    Chai, Heyan
    Qi, Shuhan
    Wang, Xuan
    Liao, Qing
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT III, 2021, 12817 : 85 - 97
  • [6] Plenty is Plague: Fine-Grained Learning for Visual Question Answering
    Zhou, Yiyi
    Ji, Rongrong
    Sun, Xiaoshuai
    Su, Jinsong
    Meng, Deyu
    Gao, Yue
    Shen, Chunhua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (02) : 697 - 709
  • [7] Leveraging Retrieval-Augmented Generation for Reliable Medical Question Answering Using Large Language Models
    Kharitonova, Ksenia
    Perez-Fernandez, David
    Gutierrez-Hernando, Javier
    Gutierrez-Fandino, Asier
    Callejas, Zoraida
    Griol, David
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT II, HAIS 2024, 2025, 14858 : 141 - 153
  • [8] Evaluating Retrieval-Augmented Generation Models for Financial Report Question and Answering
    Iaroshev, Ivan
    Pillai, Ramalingam
    Vaglietti, Leandro
    Hanne, Thomas
    APPLIED SCIENCES-BASEL, 2024, 14 (20)
  • [9] Fine-grained linguistic evaluation of question answering systems
    El Ayari, Sarra
    Grau, Brigitte
    Ligozat, Anne-Laure
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 2354 - 2360
  • [10] CooKie: commonsense knowledge-guided mixture-of-experts framework for fine-grained visual question answering
    Wang, Chao
    Yang, Jianming
    Zhou, Yang
    Yue, Xiaodong
    INFORMATION SCIENCES, 2025, 695