Medical Visual Question Answering via Conditional Reasoning

Cited by: 58
Authors
Zhan, Li-Ming [1 ]
Liu, Bo [1 ]
Fan, Lu [1 ]
Chen, Jiaxin [1 ]
Wu, Xiao-Ming [1 ]
Affiliations
[1] The Hong Kong Polytechnic University, Hong Kong, People's Republic of China
Keywords
medical visual question answering; attention mechanism; conditional reasoning;
DOI
10.1145/3394171.3413761
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical visual question answering (Med-VQA) aims to accurately answer a clinical question presented with a medical image. Despite its enormous potential in the healthcare industry and services, the technology is still in its infancy and far from practical use. Med-VQA tasks are highly challenging due to the massive diversity of clinical questions and the disparity of visual reasoning skills required for different question types. In this paper, we propose a novel conditional reasoning framework for Med-VQA, aiming to automatically learn effective reasoning skills for various Med-VQA tasks. In particular, we develop a question-conditioned reasoning module to guide the importance selection over multimodal fusion features. Considering the different natures of closed-ended and open-ended Med-VQA tasks, we further propose a type-conditioned reasoning module to learn a distinct set of reasoning skills for each of the two task types. Our conditional reasoning framework can be easily applied to existing Med-VQA systems to bring performance gains. In the experiments, we build our system on top of a recent state-of-the-art Med-VQA model and evaluate it on the VQA-RAD benchmark [23]. Remarkably, our system achieves significantly increased accuracy in predicting answers to both closed-ended and open-ended questions, especially open-ended questions, where a 10.8% absolute increase in accuracy is obtained. The source code can be downloaded from https://github.com/awenbocc/med-vqa.
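The abstract's two-module design lends itself to a compact sketch. The following minimal PyTorch snippet is an illustrative assumption, not the authors' implementation: the class names, dimensions, and the sigmoid-gating stand-in for attention-based importance selection are all hypothetical (see the linked repository for the actual code). It shows one way a question embedding could condition importance selection over fused multimodal features, with a separate module per question type.

import torch
import torch.nn as nn

class QuestionConditionedReasoning(nn.Module):
    # Gate the fused multimodal feature with an importance vector computed
    # from the question embedding. Sigmoid gating is a simple stand-in for
    # the paper's attention-based importance selection.
    def __init__(self, q_dim: int, f_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(q_dim, f_dim), nn.Sigmoid())

    def forward(self, q: torch.Tensor, fused: torch.Tensor) -> torch.Tensor:
        return fused * self.gate(q)  # element-wise importance selection

class TypeConditionedReasoning(nn.Module):
    # Route each example through a reasoning module dedicated to its
    # question type (closed-ended vs. open-ended).
    def __init__(self, q_dim: int, f_dim: int):
        super().__init__()
        self.closed = QuestionConditionedReasoning(q_dim, f_dim)
        self.open = QuestionConditionedReasoning(q_dim, f_dim)

    def forward(self, q: torch.Tensor, fused: torch.Tensor,
                is_open: torch.Tensor) -> torch.Tensor:
        # is_open: boolean mask of shape (batch,), e.g. from a
        # hypothetical question-type classifier.
        return torch.where(is_open.unsqueeze(-1),
                           self.open(q, fused),
                           self.closed(q, fused))

# Toy usage: batch of 4 questions, 128-d question embeddings,
# 256-d multimodal fusion features (dimensions chosen arbitrarily).
q = torch.randn(4, 128)
fused = torch.randn(4, 256)
is_open = torch.tensor([True, False, True, False])
out = TypeConditionedReasoning(128, 256)(q, fused, is_open)
print(out.shape)  # torch.Size([4, 256])

Training both branches jointly while routing by question type is what lets the two task types acquire different reasoning parameters; the gating itself is a drop-in layer, consistent with the abstract's claim that the framework can be applied on top of existing Med-VQA systems.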
Pages: 2345 - 2354
Number of pages: 10
Related Papers
50 records in total
  • [41] A Study on Effectiveness of BERT Models and Task-Conditioned Reasoning Strategy for Medical Visual Question Answering
    Nguyen, Chau
    Le, Tung
    Le, Nguyen-Khang
    Pham, Trung-Tin
    Nguyen, Le-Minh
    ARTIFICIAL INTELLIGENCE FOR COMMUNICATIONS AND NETWORKS, AICON 2022, 2023, 477 : 60 - 71
  • [42] Medical visual question answering via corresponding feature fusion combined with semantic attention
    Zhu, Han
    He, Xiaohai
    Wang, Meiling
    Zhang, Mozhi
    Qing, Linbo
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2022, 19 (10) : 10192 - 10212
  • [43] Temporal knowledge graph question answering via subgraph reasoning
    Chen, Ziyang
    Zhao, Xiang
    Liao, Jinzhi
    Li, Xinyi
    Kanoulas, Evangelos
    KNOWLEDGE-BASED SYSTEMS, 2022, 251
  • [44] A Question-Centric Model for Visual Question Answering in Medical Imaging
    Vu, Minh H.
Löfstedt, Tommy
    Nyholm, Tufve
    Sznitman, Raphael
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (09) : 2856 - 2868
  • [45] Type-Aware Medical Visual Question Answering
    Zhang, Anda
    Tao, Wei
    Li, Ziyan
    Wang, Haofen
    Zhang, Wenqiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4838 - 4842
  • [46] Overcoming Data Limitation in Medical Visual Question Answering
    Nguyen, Binh D.
    Do, Thanh-Toan
    Nguyen, Binh X.
    Do, Tuong
    Tjiputra, Erman
    Tran, Quang D.
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT IV, 2019, 11767 : 522 - 530
  • [47] Coarse-to-Fine Visual Question Answering by Iterative, Conditional Refinement
    Burghouts, Gertjan J.
    Huizinga, Wyke
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT II, 2022, 13232 : 418 - 428
  • [48] VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
    Wang, Yanan
    Yasunaga, Michihiro
    Ren, Hongyu
    Wada, Shinya
    Leskovec, Jure
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21525 - 21535
  • [49] ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
    Masry, Ahmed
    Long, Do Xuan
    Tan, Jia Qing
    Joty, Shafiq
    Hoque, Enamul
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2263 - 2279
  • [50] Multimodal feature fusion by relational reasoning and attention for visual question answering
    Zhang, Weifeng
    Yu, Jing
    Hu, Hua
    Hu, Haiyang
    Qin, Zengchang
    INFORMATION FUSION, 2020, 55 : 116 - 126