Medical Visual Question Answering via Conditional Reasoning

Cited by: 58
Authors
Zhan, Li-Ming [1 ]
Liu, Bo [1 ]
Fan, Lu [1 ]
Chen, Jiaxin [1 ]
Wu, Xiao-Ming [1 ]
Affiliations
[1] The Hong Kong Polytechnic University, Hong Kong, People's Republic of China
Keywords
medical visual question answering; attention mechanism; conditional reasoning;
DOI
10.1145/3394171.3413761
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical visual question answering (Med-VQA) aims to accurately answer a clinical question presented with a medical image. Despite its enormous potential in the healthcare industry and services, the technology is still in its infancy and far from practical use. Med-VQA tasks are highly challenging due to the massive diversity of clinical questions and the disparity of visual reasoning skills required for different question types. In this paper, we propose a novel conditional reasoning framework for Med-VQA, aiming to automatically learn effective reasoning skills for various Med-VQA tasks. In particular, we develop a question-conditioned reasoning module to guide the importance selection over multimodal fusion features. Considering the different natures of closed-ended and open-ended Med-VQA tasks, we further propose a type-conditioned reasoning module to learn a distinct set of reasoning skills for each of the two task types. Our conditional reasoning framework can be easily applied to existing Med-VQA systems to bring performance gains. In the experiments, we build our system on top of a recent state-of-the-art Med-VQA model and evaluate it on the VQA-RAD benchmark [23]. Remarkably, our system achieves significantly increased accuracy in predicting answers to both closed-ended and open-ended questions, especially open-ended questions, where a 10.8% absolute increase in accuracy is obtained. The source code can be downloaded from https://github.com/awenbocc/med-vqa.
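The abstract's two-module design lends itself to a compact sketch. The following minimal PyTorch snippet is an illustrative assumption, not the authors' implementation: the class names, dimensions, and the sigmoid-gating stand-in for attention-based importance selection are all hypothetical (see the linked repository for the actual code). It shows one way a question embedding could condition importance selection over fused multimodal features, with a separate module per question type.

import torch
import torch.nn as nn

class QuestionConditionedReasoning(nn.Module):
    # Gate the fused multimodal feature with an importance vector computed
    # from the question embedding. Sigmoid gating is a simple stand-in for
    # the paper's attention-based importance selection.
    def __init__(self, q_dim: int, f_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(q_dim, f_dim), nn.Sigmoid())

    def forward(self, q: torch.Tensor, fused: torch.Tensor) -> torch.Tensor:
        return fused * self.gate(q)  # element-wise importance selection

class TypeConditionedReasoning(nn.Module):
    # Route each example through a reasoning module dedicated to its
    # question type (closed-ended vs. open-ended).
    def __init__(self, q_dim: int, f_dim: int):
        super().__init__()
        self.closed = QuestionConditionedReasoning(q_dim, f_dim)
        self.open = QuestionConditionedReasoning(q_dim, f_dim)

    def forward(self, q: torch.Tensor, fused: torch.Tensor,
                is_open: torch.Tensor) -> torch.Tensor:
        # is_open: boolean mask of shape (batch,), e.g. from a
        # hypothetical question-type classifier.
        return torch.where(is_open.unsqueeze(-1),
                           self.open(q, fused),
                           self.closed(q, fused))

# Toy usage: batch of 4 questions, 128-d question embeddings,
# 256-d multimodal fusion features (dimensions chosen arbitrarily).
q = torch.randn(4, 128)
fused = torch.randn(4, 256)
is_open = torch.tensor([True, False, True, False])
out = TypeConditionedReasoning(128, 256)(q, fused, is_open)
print(out.shape)  # torch.Size([4, 256])

Training both branches jointly while routing by question type is what lets the two task types acquire different reasoning parameters; the gating itself is a drop-in layer, consistent with the abstract's claim that the framework can be applied on top of existing Med-VQA systems.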
Pages: 2345 - 2354
Number of pages: 10
Related Papers
50 records in total
  • [41] A Study on Effectiveness of BERT Models and Task-Conditioned Reasoning Strategy for Medical Visual Question Answering
    Nguyen, Chau
    Le, Tung
    Le, Nguyen-Khang
    Pham, Trung-Tin
    Nguyen, Le-Minh
    ARTIFICIAL INTELLIGENCE FOR COMMUNICATIONS AND NETWORKS, AICON 2022, 2023, 477 : 60 - 71
  • [42] Medical visual question answering via corresponding feature fusion combined with semantic attention
    Zhu, Han
    He, Xiaohai
    Wang, Meiling
    Zhang, Mozhi
    Qing, Linbo
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2022, 19 (10) : 10192 - 10212
  • [43] Temporal knowledge graph question answering via subgraph reasoning
    Chen, Ziyang
    Zhao, Xiang
    Liao, Jinzhi
    Li, Xinyi
    Kanoulas, Evangelos
    KNOWLEDGE-BASED SYSTEMS, 2022, 251
  • [44] A Question-Centric Model for Visual Question Answering in Medical Imaging
    Vu, Minh H.
Löfstedt, Tommy
    Nyholm, Tufve
    Sznitman, Raphael
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (09) : 2856 - 2868
  • [45] Type-Aware Medical Visual Question Answering
    Zhang, Anda
    Tao, Wei
    Li, Ziyan
    Wang, Haofen
    Zhang, Wenqiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4838 - 4842
  • [46] Overcoming Data Limitation in Medical Visual Question Answering
    Nguyen, Binh D.
    Do, Thanh-Toan
    Nguyen, Binh X.
    Do, Tuong
    Tjiputra, Erman
    Tran, Quang D.
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT IV, 2019, 11767 : 522 - 530
  • [47] Coarse-to-Fine Visual Question Answering by Iterative, Conditional Refinement
    Burghouts, Gertjan J.
    Huizinga, Wyke
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT II, 2022, 13232 : 418 - 428
  • [48] VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
    Wang, Yanan
    Yasunaga, Michihiro
    Ren, Hongyu
    Wada, Shinya
    Leskovec, Jure
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21525 - 21535
  • [49] ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
    Masry, Ahmed
    Long, Do Xuan
    Tan, Jia Qing
    Joty, Shafiq
    Hoque, Enamul
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2263 - 2279
  • [50] Multimodal feature fusion by relational reasoning and attention for visual question answering
    Zhang, Weifeng
    Yu, Jing
    Hu, Hua
    Hu, Haiyang
    Qin, Zengchang
    INFORMATION FUSION, 2020, 55 : 116 - 126