Parallel multi-head attention and term-weighted question embedding for medical visual question answering

Cited by: 0
Authors
Sruthy Manmadhan
Binsu C Kovoor
Institutions
[1] Cochin University of Science and Technology, Division of Information Technology
[2] NSS College of Engineering, Department of Computer Science and Engineering
Keywords
Multi-head attention; Denoising autoencoder; Radiology images; Supervised term weighting; Visual question answering; VQA-RAD;
DOI: not available
Abstract
The goal of medical visual question answering (Med-VQA) is to correctly answer a clinical question posed about a medical image. Medical images differ fundamentally from general-domain images, so general-domain Visual Question Answering (VQA) models cannot be applied directly to the medical domain. Furthermore, the large-scale data required by VQA models is rarely available in the medical field. Existing Med-VQA approaches often rely on transfer learning with external data to obtain good image feature representations and use cross-modal fusion of visual and language features to compensate for the scarcity of labelled data. This research presents a new parallel multi-head attention framework (MaMVQA) for Med-VQA that requires no external data. The proposed framework addresses image feature extraction with an unsupervised Denoising Auto-Encoder (DAE) and language feature extraction with term-weighted question embedding. In addition, we present qf-MI, a novel supervised term-weighting (STW) scheme based on the mutual information (MI) between a word and the corresponding class label. Extensive experiments on the public VQA-RAD medical VQA benchmark show that the proposed methodology outperforms previous state-of-the-art methods in accuracy while requiring no external training data. Notably, the MaMVQA model achieves significantly higher accuracy on both close-ended (78.68%) and open-ended (55.31%) questions. An extensive set of ablations is also reported to demonstrate the contribution of each individual component of the system.
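The abstract describes qf-MI as a supervised term-weighting scheme that scores a question word by its mutual information with the answer-class label. The exact qf-MI formula is not given here, so the following is only a minimal generic sketch of MI-based supervised term weighting on a toy question corpus; the function name, the toy data, and the open/closed labels are illustrative assumptions, not the paper's implementation.

```python
# Generic sketch of supervised term weighting via mutual information (MI).
# NOTE: the paper's exact qf-MI formula is not stated in this abstract;
# this is a standard MI computation for illustration only.
import math
from collections import Counter

def mi_term_weights(questions, labels):
    """Weight each word by its mutual information with the class label.

    questions: list of token lists; labels: parallel list of class labels.
    Returns a dict mapping each word to its MI-based weight.
    """
    n = len(questions)
    label_counts = Counter(labels)
    vocab = {w for q in questions for w in q}
    weights = {}
    for w in vocab:
        # Joint counts of (word present, label) over the corpus.
        joint = Counter(lab for q, lab in zip(questions, labels) if w in q)
        present = sum(joint.values())  # number of questions containing w
        mi = 0.0
        for lab, n_wl in joint.items():
            p_wl = n_wl / n              # P(word present, label)
            p_w = present / n            # P(word present)
            p_l = label_counts[lab] / n  # P(label)
            mi += p_wl * math.log(p_wl / (p_w * p_l))
        weights[w] = mi
    return weights

# Toy corpus: hypothetical radiology-style questions with illustrative
# open-ended vs. close-ended labels.
questions = [["what", "organ", "is", "shown"],
             ["is", "there", "a", "fracture"],
             ["what", "plane", "is", "shown"],
             ["is", "the", "heart", "enlarged"]]
labels = ["open", "closed", "open", "closed"]
w = mi_term_weights(questions, labels)
```

Under this scheme a discriminative word like "what" (which co-occurs only with open-ended questions in the toy data) receives a higher weight than "is", which appears in every question and so carries no information about the label; such weights can then scale the word vectors before question embedding.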
Pages: 34937-34958 (21 pages)
Related papers (50 in total)
  • [1] Parallel multi-head attention and term-weighted question embedding for medical visual question answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (22) : 34937 - 34958
  • [2] Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    [J]. IMAGE AND VISION COMPUTING, 2021, 115
  • [3] An Enhanced Term Weighted Question Embedding for Visual Question Answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    [J]. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2022, 21 (02)
  • [4] A Study of Visual Question Answering Techniques Based on Collaborative Multi-Head Attention
    Yang, Yingli
    Jin, Jingxuan
    Li, De
    [J]. 2023 3RD ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE, ACCTCS, 2023, : 552 - 555
  • [5] Multi visual and textual embedding on visual question answering for blind people
    Tung Le
    Huy Tien Nguyen
    Minh Le Nguyen
    [J]. NEUROCOMPUTING, 2021, 465 : 451 - 464
  • [6] A Dual-Attention Learning Network With Word and Sentence Embedding for Medical Visual Question Answering
    Huang, Xiaofei
    Gong, Hongfang
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (02) : 832 - 845
  • [7] Question-Led object attention for visual question answering
    Gao, Lianli
    Cao, Liangfu
    Xu, Xing
    Shao, Jie
    Song, Jingkuan
    [J]. NEUROCOMPUTING, 2020, 391 : 227 - 233
  • [8] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549
  • [9] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [10] Word-level dual channel with multi-head semantic attention interaction for community question answering
    Wu, Jinmeng
    Hong, Hanyu
    Zhang, Yaozong
    Hao, Yanbin
    Ma, Lei
    Wang, Lei
    [J]. ELECTRONIC RESEARCH ARCHIVE, 2023, 31 (10): : 6012 - 6026