Multi-modal multi-head self-attention for medical VQA

Cited: 0
Authors
Vasudha Joshi
Pabitra Mitra
Supratik Bose
Affiliations
[1] Computer Science and Engineering, Indian Institute of Technology
[2] Varian Medical Systems Inc.
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Medical visual question answering; Multi-head self-attention; DistilBERT; VQA-Med 2019
DOI
Not available
Abstract
Medical Visual Question Answering (MedVQA) systems answer natural-language questions about radiology images. Medical images are more complex than general-domain images: they have low contrast and closely resemble one another, so their differences can be recognized only by medical practitioners, whereas general images are typically of high quality and their differences are easy for anyone to spot. Methods designed for general-domain Visual Question Answering (VQA) systems therefore cannot be applied directly. The performance of a MedVQA system depends mainly on how the features of the two input modalities, the medical image and the question, are combined. In this work, we propose an architecturally simple fusion strategy that uses multi-head self-attention to combine medical images and questions from the VQA-Med dataset of the ImageCLEF 2019 challenge. The model captures long-range dependencies between the input modalities through the attention mechanism of the Transformer. We show experimentally that increasing the length of the embeddings used in the Transformer improves the representational power of the model. We achieve an overall accuracy of 60.0%, a 1.35% improvement over the existing model. We also perform an ablation study to elucidate the importance of each model component.
Pages: 42585 - 42608
Page count: 23
Related papers
50 records in total
  • [21] Enlivening Redundant Heads in Multi-head Self-attention for Machine Translation
    Zhang, Tianfu
    Huang, Heyan
    Feng, Chong
    Cao, Longbing
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3238 - 3248
  • [22] Convolutional multi-head self-attention on memory for aspect sentiment classification
    Zhang, Yaojie
    Xu, Bing
    Zhao, Tiejun
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2020, 7 (04) : 1038 - 1044
  • [23] SPEECH ENHANCEMENT USING SELF-ADAPTATION AND MULTI-HEAD SELF-ATTENTION
    Koizumi, Yuma
    Yatabe, Kohei
    Delcroix, Marc
    Masuyama, Yoshiki
    Takeuchi, Daiki
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 181 - 185
  • [24] IS CROSS-ATTENTION PREFERABLE TO SELF-ATTENTION FOR MULTI-MODAL EMOTION RECOGNITION?
    Rajan, Vandana
    Brutti, Alessio
    Cavallaro, Andrea
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4693 - 4697
  • [25] Lane Detection Method Based on Improved Multi-Head Self-Attention
    Ge, Zekun
    Tao, Fazhan
    Fu, Zhumu
    Song, Shuzhong
    Computer Engineering and Applications, 60 (02): 264 - 271
  • [26] MSIN: An Efficient Multi-head Self-attention Framework for Inertial Navigation
    Shi, Gaotao
    Pan, Bingjia
    Ni, Yuzhi
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT I, 2024, 14487 : 455 - 473
  • [27] Local Multi-Head Channel Self-Attention for Facial Expression Recognition
    Pecoraro, Roberto
    Basile, Valerio
    Bono, Viviana
    INFORMATION, 2022, 13 (09)
  • [29] SQL Injection Detection Based on Lightweight Multi-Head Self-Attention
    Lo, Rui-Teng
    Hwang, Wen-Jyi
    Tai, Tsung-Ming
    APPLIED SCIENCES-BASEL, 2025, 15 (02)
  • [30] MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding
    Park, Geondo
    Han, Chihye
    Kim, Daeshik
    Yoon, Wonjun
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1507 - 1515