Medical visual question answering based on question-type reasoning and semantic space constraint

Cited by: 11
Authors
Wang, Meiling [1]
He, Xiaohai [1]
Liu, Luping [1]
Qing, Linbo [1]
Chen, Honggang [1]
Liu, Yan [2]
Ren, Chao [1]
Affiliations
[1] Sichuan Univ, Coll Elect & Informat Engn, Chengdu 610065, Sichuan, Peoples R China
[2] Southwest Jiaotong Univ, Dept Neurol, Affiliated Hosp, Peoples Hosp 3, Chengdu, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Medical visual question answering; Question-type reasoning; Semantic space constraint; Attention mechanism; DYNAMIC MEMORY NETWORKS; LANGUAGE;
DOI
10.1016/j.artmed.2022.102346
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical visual question answering (Med-VQA) aims to accurately answer clinical questions about medical images. Despite its enormous potential for application in the medical domain, the current technology is still in its infancy. Compared with the general visual question answering task, the Med-VQA task involves more demanding challenges. First, clinical questions about medical images are usually diverse owing to differences among clinicians and the complexity of diseases; consequently, noise is inevitably introduced when extracting question features. Second, the Med-VQA task has typically been regarded as a classification problem over predefined answers, ignoring the relationships between candidate responses; thus, the Med-VQA model pays equal attention to all candidate answers when predicting answers. In this paper, a novel Med-VQA framework is proposed to alleviate the abovementioned problems. Specifically, we apply a question-type reasoning module separately to closed-ended and open-ended questions, extracting the important information contained in the questions through an attention mechanism and filtering out noise to obtain more valuable question features. To take advantage of the relational information between answers, we design a semantic constraint space that computes the similarity between answers and assigns higher attention to highly correlated answers. To evaluate the effectiveness of the proposed method, extensive experiments were conducted on a public dataset, VQA-RAD. Experimental results show that the proposed method achieves better performance than other state-of-the-art methods, reaching an overall accuracy of 74.1%, a closed-ended accuracy of 82.7%, and an open-ended accuracy of 60.9%. Notably, the absolute accuracy of the proposed method improved by 5.5% on closed-ended questions.
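The abstract's two ideas can be illustrated with a minimal sketch (not the authors' released code): (1) an attention module that re-weights question-word features, conditioned on a question-type representation, so that noisy words contribute less; and (2) a "semantic space constraint" that uses cosine similarity between candidate-answer embeddings to shift probability mass toward highly correlated answers. All module and variable names below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of question-type attention and an answer-similarity constraint,
# based only on the abstract; names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionTypeAttention(nn.Module):
    """Attends over question-word features, conditioned on a question-type embedding."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim * 2, 1)

    def forward(self, word_feats, type_emb):
        # word_feats: (B, T, D) word-level features; type_emb: (B, D) question-type feature
        t = type_emb.unsqueeze(1).expand_as(word_feats)
        attn = torch.softmax(self.score(torch.cat([word_feats, t], dim=-1)), dim=1)
        return (attn * word_feats).sum(dim=1)  # (B, D) noise-filtered question feature

class SemanticConstraintHead(nn.Module):
    """Classifier whose logits are smoothed by answer-answer semantic similarity."""
    def __init__(self, dim, answer_embs):
        super().__init__()
        num_ans = answer_embs.size(0)
        self.fc = nn.Linear(dim, num_ans)
        # Pre-compute a row-normalized cosine-similarity matrix between candidate answers.
        a = F.normalize(answer_embs, dim=-1)
        self.register_buffer("answer_sim", torch.softmax(a @ a.T, dim=-1))  # (num_ans, num_ans)

    def forward(self, fused_feat):
        logits = self.fc(fused_feat)          # (B, num_ans)
        # Mix each answer's logit with those of semantically similar answers,
        # so correlated candidates receive higher attention than unrelated ones.
        return logits @ self.answer_sim.T
```

Here the answer embeddings could come from any pretrained word/sentence encoder; the paper's actual formulation of the semantic space and its interaction with the attention mechanism may differ from this sketch.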
Pages: 11
Related Papers
50 records in total
  • [31] Multimodal Knowledge Reasoning for Enhanced Visual Question Answering
    Hussain, Afzaal
    Maqsood, Ifrah
    Shahzad, Muhammad
    Fraz, Muhammad Moazam
    2022 16TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS, SITIS, 2022, : 224 - 230
  • [32] Relational reasoning and adaptive fusion for visual question answering
    Shen, Xiang
    Han, Dezhi
    Zong, Liang
    Guo, Zihan
    Hua, Jie
    APPLIED INTELLIGENCE, 2024, 54 (06) : 5062 - 5080
  • [33] INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION
    Parelli, Maria
    Mallis, Dimitrios
    Diomataris, Markos
    Pitsikalis, Vassilis
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2525 - 2529
  • [34] MUREL: Multimodal Relational Reasoning for Visual Question Answering
    Cadene, Remi
    Ben-younes, Hedi
    Cord, Matthieu
    Thome, Nicolas
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1989 - 1998
  • [35] Maintaining Reasoning Consistency in Compositional Visual Question Answering
    Jing, Chenchen
    Jia, Yunde
    Wu, Yuwei
    Liu, Xinyu
    Wu, Qi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5089 - 5098
  • [36] A DIAGNOSTIC STUDY OF VISUAL QUESTION ANSWERING WITH ANALOGICAL REASONING
    Huang, Ziqi
    Zhu, Hongyuan
    Sun, Ying
    Choi, Dongkyu
    Tan, Cheston
    Lim, Joo-Hwee
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2463 - 2467
  • [37] A Semantic Parsing and Reasoning-Based Approach to Knowledge Base Question Answering
    Abdelaziz, Ibrahim
    Ravishankar, Srinivas
    Kapanipathi, Pavan
    Roukos, Salim
    Gray, Alexander
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 15985 - 15987
  • [38] A Transformer-based Medical Visual Question Answering Model
    Liu, Lei
    Su, Xiangdong
    Guo, Hui
    Zhu, Daobin
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1712 - 1718
  • [39] A Question-Centric Model for Visual Question Answering in Medical Imaging
    Vu, Minh H.
    Lofstedt, Tommy
    Nyholm, Tufve
    Sznitman, Raphael
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (09) : 2856 - 2868
  • [40] Reasoning with large language models for medical question answering
    Lucas, Mary M.
    Yang, Justin
    Pomeroy, Jon K.
    Yang, Christopher C.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09)