Multi-modal multi-head self-attention for medical VQA

Cited by: 0
Authors
Vasudha Joshi
Pabitra Mitra
Supratik Bose
Affiliations
[1] Computer Science and Engineering, Indian Institute of Technology
[2] Varian Medical Systems Inc.
Source
Multimedia Tools and Applications, 2023, 83(14)
Keywords
Medical visual question answering; Multi-head self-attention; DistilBERT; VQA-Med 2019;
DOI
Not available
Abstract
Medical Visual Question Answering (MedVQA) systems answer questions posed about radiology images. Medical images are more complex than general images: they have low contrast, are very similar to one another, and their differences can only be recognized by medical practitioners, whereas general images are typically of high quality and their differences are easy for anyone to spot. Methods used for general-domain Visual Question Answering (VQA) systems therefore cannot be applied directly. The performance of a MedVQA system depends mainly on the method used to combine the features of the two input modalities: the medical image and the question. In this work, we propose an architecturally simple fusion strategy that uses multi-head self-attention to combine medical images and questions from the VQA-Med dataset of the ImageCLEF 2019 challenge. The model captures long-range dependencies between the input modalities using the attention mechanism of the Transformer. We show experimentally that the representational power of the model improves as the length of the embeddings used in the Transformer increases. We achieve an overall accuracy of 60.0%, an improvement of 1.35% over the existing model. We also perform an ablation study to elucidate the importance of each model component.
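A minimal PyTorch sketch of this kind of fusion is given below, assuming DistilBERT-style question-token embeddings (dimension 768) and CNN image-region features (dimension 512); the projection size, number of heads and layers, the mean pooling, and the answer-classifier size are illustrative assumptions rather than the paper's exact configuration.

    import torch
    import torch.nn as nn

    class MultiModalSelfAttentionFusion(nn.Module):
        """Fuse image-region and question-token features with multi-head self-attention."""

        def __init__(self, img_dim=512, txt_dim=768, d_model=768,
                     num_heads=8, num_layers=2, num_answers=500):
            super().__init__()
            # Project both modalities into a shared embedding space of size d_model.
            self.img_proj = nn.Linear(img_dim, d_model)
            self.txt_proj = nn.Linear(txt_dim, d_model)
            # Multi-head self-attention over the concatenated sequence lets every
            # question token attend to every image region (and vice versa),
            # capturing long-range cross-modal dependencies.
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
            # num_answers is a placeholder; the real value depends on the answer vocabulary.
            self.classifier = nn.Linear(d_model, num_answers)

        def forward(self, img_feats, txt_feats):
            # img_feats: (B, R, img_dim) image-region features from a CNN backbone
            # txt_feats: (B, T, txt_dim) question-token embeddings, e.g. from DistilBERT
            tokens = torch.cat([self.img_proj(img_feats),
                                self.txt_proj(txt_feats)], dim=1)  # (B, R+T, d_model)
            fused = self.encoder(tokens)      # joint multi-head self-attention
            pooled = fused.mean(dim=1)        # simple mean pooling over the sequence
            return self.classifier(pooled)    # answer logits

    # Example with random tensors standing in for the two feature extractors.
    model = MultiModalSelfAttentionFusion()
    logits = model(torch.randn(2, 49, 512), torch.randn(2, 20, 768))
    print(logits.shape)  # torch.Size([2, 500])

In this sketch, increasing d_model corresponds to the embedding-length scaling that the abstract reports as improving the model's representational power.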
Pages: 42585-42608
Number of pages: 23
Related Papers
50 items in total
  • [1] Multi-modal multi-head self-attention for medical VQA
    Joshi, Vasudha
    Mitra, Pabitra
    Bose, Supratik
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (14) : 42585 - 42608
  • [2] Multi-Modal Fusion Network with Multi-Head Self-Attention for Injection Training Evaluation in Medical Education
    Li, Zhe
    Kanazuka, Aya
    Hojo, Atsushi
    Nomura, Yukihiro
    Nakaguchi, Toshiya
    ELECTRONICS, 2024, 13 (19)
  • [3] Multi-modal feature fusion with multi-head self-attention for epileptic EEG signals
    Huang, Ning
    Xi, Zhengtao
    Jiao, Yingying
    Zhang, Yudong
    Jiao, Zhuqing
    Li, Xiaona
MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2024, 21 (08) : 6918 - 6935
  • [4] Dual-stream fusion network with multi-head self-attention for multi-modal fake news detection
    Yang, Yimei
    Liu, Jinping
    Yang, Yujun
    Cen, Lihui
    APPLIED SOFT COMPUTING, 2024, 167
  • [5] Adaptive Pruning for Multi-Head Self-Attention
    Messaoud, Walid
    Trabelsi, Rim
    Cabani, Adnane
    Abdelkefi, Fatma
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2023, PT II, 2023, 14126 : 48 - 57
  • [6] Multi-Head Attention for Multi-Modal Joint Vehicle Motion Forecasting
    Mercat, Jean
    Gilles, Thomas
    El Zoghby, Nicole
    Sandou, Guillaume
    Beauvois, Dominique
    Gil, Guillermo Pita
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 9638 - 9644
  • [7] Neural News Recommendation with Multi-Head Self-Attention
    Wu, Chuhan
    Wu, Fangzhao
    Ge, Suyu
    Qi, Tao
    Huang, Yongfeng
    Xie, Xing
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6389 - 6394
  • [8] Multi-head attention fusion networks for multi-modal speech emotion recognition
    Zhang, Junfeng
    Xing, Lining
    Tan, Zhen
    Wang, Hongsen
    Wang, Kesheng
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 168
  • [9] Transformer Encoder With Multi-Modal Multi-Head Attention for Continuous Affect Recognition
    Chen, Haifeng
    Jiang, Dongmei
    Sahli, Hichem
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 4171 - 4183
  • [10] Masked multi-head self-attention for causal speech enhancement
    Nicolson, Aaron
    Paliwal, Kuldip K.
    SPEECH COMMUNICATION, 2020, 125 : 80 - 96