Multi-modal multi-head self-attention for medical VQA

Cited by: 0
Authors
Vasudha Joshi
Pabitra Mitra
Supratik Bose
Affiliations
[1] Computer Science and Engineering
[2] Indian Institute of Technology
[3] Varian Medical Systems Inc.
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Medical visual question answering; Multi-head self-attention; DistilBERT; VQA-Med 2019
DOI
Not available
Abstract
Medical visual question answering (MedVQA) systems answer questions about radiology images. Medical images are more complex than general images: they have low contrast and are very similar to one another, and the differences between them can be understood only by medical practitioners. General images, by contrast, are of very high quality, and their differences can easily be spotted by anyone. Therefore, methods used for general-domain visual question answering (VQA) systems cannot be applied directly. The performance of a MedVQA system depends mainly on the method used to combine the features of its two input modalities, the medical image and the question. In this work, we propose an architecturally simple fusion strategy that uses multi-head self-attention to combine the medical images and questions of the VQA-Med dataset from the ImageCLEF 2019 challenge. The model captures long-range dependencies between the input modalities using the attention mechanism of the Transformer. We show experimentally that the representational power of the model improves as the length of the embeddings used in the Transformer increases. We achieve an overall accuracy of 60.0%, an improvement of 1.35% over the existing model. We also perform an ablation study to elucidate the importance of each model component.
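The fusion idea described in the abstract can be illustrated with a generic sketch of multi-head self-attention applied to a concatenated sequence of image and question embeddings. This is not the authors' model: the record gives no architectural details, so the token counts, embedding length, number of heads, random weights, and mean pooling below are all assumptions chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """tokens: (seq_len, d_model). Returns (output, attention weights)."""
    seq_len, d_model = tokens.shape
    d_head = d_model // num_heads

    def split(x):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return x.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    # Scaled dot-product scores between every pair of tokens, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)           # (num_heads, seq_len, seq_len)
    context = weights @ v                        # (num_heads, seq_len, d_head)
    # Merge the heads back into one embedding per token.
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights

rng = np.random.default_rng(0)
d_model, num_heads = 64, 8                        # assumed sizes
image_tokens = rng.normal(size=(49, d_model))     # hypothetical image region features
question_tokens = rng.normal(size=(12, d_model))  # hypothetical question token embeddings

# Fusion step: place both modalities in one sequence so that every
# token can attend to every other token, across modalities.
fused_input = np.concatenate([image_tokens, question_tokens], axis=0)

# Random projection matrices stand in for learned parameters.
w_q, w_k, w_v, w_o = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
)
fused, attn = multi_head_self_attention(
    fused_input, w_q, w_k, w_v, w_o, num_heads
)
pooled = fused.mean(axis=0)  # one joint vector that an answer classifier could consume
```

Because the image and question tokens share one sequence, each attention row spans both modalities, which is one way to capture the long-range cross-modal dependencies the abstract mentions. Increasing `d_model` corresponds to lengthening the embeddings.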
Pages: 42585-42608
Page count: 23