Medical Visual Question Answering via Conditional Reasoning

Cited by: 58
Authors
Zhan, Li-Ming [1]
Liu, Bo [1]
Fan, Lu [1]
Chen, Jiaxin [1]
Wu, Xiao-Ming [1]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Keywords
medical visual question answering; attention mechanism; conditional reasoning
DOI
10.1145/3394171.3413761
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Medical visual question answering (Med-VQA) aims to accurately answer a clinical question presented with a medical image. Despite its enormous potential in the healthcare industry and services, the technology is still in its infancy and is far from practical use. Med-VQA tasks are highly challenging due to the massive diversity of clinical questions and the disparity of visual reasoning skills required for different types of questions. In this paper, we propose a novel conditional reasoning framework for Med-VQA, aiming to automatically learn effective reasoning skills for various Med-VQA tasks. In particular, we develop a question-conditioned reasoning module to guide the importance selection over multimodal fusion features. Considering the different nature of closed-ended and open-ended Med-VQA tasks, we further propose a type-conditioned reasoning module to learn a separate set of reasoning skills for each of the two task types. Our conditional reasoning framework can be easily applied to existing Med-VQA systems to bring performance gains. In the experiments, we build our system on top of a recent state-of-the-art Med-VQA model and evaluate it on the VQA-RAD benchmark [23]. Remarkably, our system achieves significantly increased accuracy in predicting answers to both closed-ended and open-ended questions, especially for open-ended questions, where a 10.8% increase in absolute accuracy is obtained. The source code can be downloaded from https://github.com/awenbocc/med-vqa.
Pages: 2345-2354
Number of pages: 10
Related papers
50 in total
  • [21] A DIAGNOSTIC STUDY OF VISUAL QUESTION ANSWERING WITH ANALOGICAL REASONING
    Huang, Ziqi
    Zhu, Hongyuan
    Sun, Ying
    Choi, Dongkyu
    Tan, Cheston
    Lim, Joo-Hwee
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2463 - 2467
  • [22] Reasoning with large language models for medical question answering
    Lucas, Mary M.
    Yang, Justin
    Pomeroy, Jon K.
    Yang, Christopher C.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09)
  • [23] Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning
    Su, Zhenqiang
    Gou, Gang
    Computer Engineering and Applications, 2024, 60 (05) : 95 - 102
  • [24] MVQAS: A Medical Visual Question Answering System
    Bai, Haoyue
    Shan, Xiaoyan
    Huang, Yefan
    Wang, Xiaoli
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 4675 - 4679
  • [25] Localized Questions in Medical Visual Question Answering
    Tascon-Morales, Sergio
    Marquez-Neila, Pablo
    Sznitman, Raphael
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT II, 2023, 14221 : 361 - 370
  • [26] Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
    Lan, Yunshi
    Li, Xiang
    Liu, Xin
    Li, Yang
    Qin, Wei
    Qian, Weining
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4389 - 4400
  • [27] Learning a Mixture of Conditional Gating Blocks for Visual Question Answering
    Sun, Qiang
    Fu, Yan-Wei
    Xue, Xiang-Yang
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (04) : 912 - 928
  • [28] Self-Critical Reasoning for Robust Visual Question Answering
    Wu, Jialin
    Mooney, Raymond J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [29] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1290 - 1296
  • [30] LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering
    Gao, Jingying
    Wu, Qi
    Blair, Alan
    Pagnucco, Maurice
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,