A Dual-Attention Learning Network With Word and Sentence Embedding for Medical Visual Question Answering

Cited by: 0
Authors
Huang, Xiaofei [1 ]
Gong, Hongfang [1 ]
Affiliations
[1] Changsha Univ Sci & Technol, Sch Math & Stat, Changsha 410114, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Visualization; Medical diagnostic imaging; Data mining; Question answering (information retrieval); Task analysis; Cognition; Medical visual question answering; double embedding; medical information; guided attention; visual reasoning; MODEL;
DOI
10.1109/TMI.2023.3322868
CLC number
TP39 [Computer applications];
Discipline codes
081203 ; 0835 ;
Abstract
Research in medical visual question answering (MVQA) can contribute to the development of computer-aided diagnosis. MVQA is a task that aims to predict accurate and convincing answers from a given medical image and an associated natural language question. This task requires extracting feature content rich in medical knowledge and developing a fine-grained understanding of it. Therefore, constructing an effective feature extraction and understanding scheme is key to modeling. Existing MVQA question extraction schemes mainly focus on word-level information and ignore medical information in the text, such as medical concepts and domain-specific terms. Meanwhile, some visual and textual feature understanding schemes cannot effectively capture the correlation between regions and keywords needed for sound visual reasoning. In this study, a dual-attention learning network with word and sentence embedding (DALNet-WSE) is proposed. We design a module, transformer with sentence embedding (TSE), to extract a double embedding representation of questions containing keywords and medical information. A dual-attention learning (DAL) module consisting of self-attention and guided attention is proposed to model intensive intramodal and intermodal interactions. By stacking multiple DAL modules (DALs), learning visual and textual co-attention increases the granularity of understanding and improves visual reasoning. Experimental results on the ImageCLEF 2019 VQA-MED (VQA-MED 2019) and VQA-RAD datasets demonstrate that the proposed method outperforms previous state-of-the-art methods. According to the ablation studies and Grad-CAM maps, DALNet-WSE extracts rich textual information and has strong visual reasoning ability.
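The abstract describes a DAL module that combines intramodal self-attention with text-guided intermodal attention. The following is a minimal numpy sketch of that general idea, not the paper's implementation: projection weights, multi-head splitting, and feed-forward sublayers are omitted, and the function names are hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def dual_attention_layer(text, image):
    """One hypothetical dual-attention step: self-attention within each
    modality (intramodal), then textual features query the image regions
    (intermodal guided attention). Inputs serve directly as Q, K, V."""
    text_sa = scaled_dot_product_attention(text, text, text)      # intramodal, text
    image_sa = scaled_dot_product_attention(image, image, image)  # intramodal, image
    # guided attention: each question token attends over all image regions
    image_ga = scaled_dot_product_attention(text_sa, image_sa, image_sa)
    return text_sa, image_ga

# toy example: 12 question tokens, 49 image regions, feature dim 64
rng = np.random.default_rng(0)
text = rng.standard_normal((1, 12, 64))
image = rng.standard_normal((1, 49, 64))
t_out, i_out = dual_attention_layer(text, image)
print(t_out.shape, i_out.shape)  # (1, 12, 64) (1, 12, 64)
```

Stacking several such layers, as the abstract suggests with multiple DALs, lets later layers refine the region-keyword correlations learned by earlier ones.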
Pages: 832-845
Page count: 14