Parallel multi-head attention and term-weighted question embedding for medical visual question answering

Cited by: 0
Authors
Sruthy Manmadhan
Binsu C Kovoor
Institutions
[1] Cochin University of Science and Technology, Division of Information Technology
[2] NSS College of Engineering, Department of Computer Science and Engineering
Keywords
Multi-head attention; Denoising autoencoder; Radiology images; Supervised term weighting; Visual question answering; VQA-RAD;
DOI: not available
Abstract
The goal of medical visual question answering (Med-VQA) is to correctly answer a clinical question posed about a medical image. Medical images differ fundamentally from general-domain images, so general-domain Visual Question Answering (VQA) models cannot be applied directly to the medical domain. Furthermore, the large-scale data required by VQA models is rarely available in the medical field. Existing Med-VQA approaches often rely on transfer learning with external data to obtain good image feature representations and use cross-modal fusion of visual and language features to compensate for the scarcity of labelled data. This research presents a new parallel multi-head attention framework (MaMVQA) for Med-VQA that requires no external data. The proposed framework addresses image feature extraction with an unsupervised Denoising Auto-Encoder (DAE) and language feature extraction with term-weighted question embedding. In addition, we present qf-MI, a novel supervised term-weighting (STW) scheme based on the mutual information (MI) between a word and the corresponding class label. Extensive experiments on the public VQA-RAD medical VQA benchmark show that the proposed methodology outperforms previous state-of-the-art methods in accuracy while requiring no external training data. Notably, the MaMVQA model achieves significantly higher accuracy on both close-ended (78.68%) and open-ended (55.31%) questions. An extensive set of ablations is also reported to demonstrate the contribution of each individual component of the system.
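The abstract describes qf-MI as a supervised term-weighting scheme that scores a question word by its mutual information with the answer-class label. The exact qf-MI formula is not given here, so the following is only a minimal generic sketch of MI-based supervised term weighting on a toy question corpus; the function name, the toy data, and the open/closed labels are illustrative assumptions, not the paper's implementation.

```python
# Generic sketch of supervised term weighting via mutual information (MI).
# NOTE: the paper's exact qf-MI formula is not stated in this abstract;
# this is a standard MI computation for illustration only.
import math
from collections import Counter

def mi_term_weights(questions, labels):
    """Weight each word by its mutual information with the class label.

    questions: list of token lists; labels: parallel list of class labels.
    Returns a dict mapping each word to its MI-based weight.
    """
    n = len(questions)
    label_counts = Counter(labels)
    vocab = {w for q in questions for w in q}
    weights = {}
    for w in vocab:
        # Joint counts of (word present, label) over the corpus.
        joint = Counter(lab for q, lab in zip(questions, labels) if w in q)
        present = sum(joint.values())  # number of questions containing w
        mi = 0.0
        for lab, n_wl in joint.items():
            p_wl = n_wl / n              # P(word present, label)
            p_w = present / n            # P(word present)
            p_l = label_counts[lab] / n  # P(label)
            mi += p_wl * math.log(p_wl / (p_w * p_l))
        weights[w] = mi
    return weights

# Toy corpus: hypothetical radiology-style questions with illustrative
# open-ended vs. close-ended labels.
questions = [["what", "organ", "is", "shown"],
             ["is", "there", "a", "fracture"],
             ["what", "plane", "is", "shown"],
             ["is", "the", "heart", "enlarged"]]
labels = ["open", "closed", "open", "closed"]
w = mi_term_weights(questions, labels)
```

Under this scheme a discriminative word like "what" (which co-occurs only with open-ended questions in the toy data) receives a higher weight than "is", which appears in every question and so carries no information about the label; such weights can then scale the word vectors before question embedding.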
Pages: 34937-34958 (21 pages)
Related papers (50 in total)
  • [1] Parallel multi-head attention and term-weighted question embedding for medical visual question answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (22) : 34937 - 34958
  • [2] Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    [J]. IMAGE AND VISION COMPUTING, 2021, 115
  • [3] An Enhanced Term Weighted Question Embedding for Visual Question Answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    [J]. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2022, 21 (02)
  • [4] A Study of Visual Question Answering Techniques Based on Collaborative Multi-Head Attention
    Yang, Yingli
    Jin, Jingxuan
    Li, De
    [J]. 2023 3RD ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE, ACCTCS, 2023, : 552 - 555
  • [5] Multi visual and textual embedding on visual question answering for blind people
    Tung Le
    Huy Tien Nguyen
    Minh Le Nguyen
    [J]. NEUROCOMPUTING, 2021, 465 : 451 - 464
  • [6] A Dual-Attention Learning Network With Word and Sentence Embedding for Medical Visual Question Answering
    Huang, Xiaofei
    Gong, Hongfang
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (02) : 832 - 845
  • [7] Question-Led object attention for visual question answering
    Gao, Lianli
    Cao, Liangfu
    Xu, Xing
    Shao, Jie
    Song, Jingkuan
    [J]. NEUROCOMPUTING, 2020, 391 : 227 - 233
  • [8] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549
  • [9] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [10] Word-level dual channel with multi-head semantic attention interaction for community question answering
    Wu, Jinmeng
    Hong, Hanyu
    Zhang, Yaozong
    Hao, Yanbin
    Ma, Lei
    Wang, Lei
    [J]. ELECTRONIC RESEARCH ARCHIVE, 2023, 31 (10): : 6012 - 6026