Medical visual question answering with symmetric interaction attention and cross-modal gating

Times cited: 0

Authors:

Chen, Zhi [1]

Zou, Beiji [1]

Dai, Yulan [1]

Zhu, Chengzhang [1]

Kong, Guilan [2]

Zhang, Wensheng [3]

Affiliations:

[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China

[2] Peking Univ, Natl Inst Hlth Data Sci, Beijing 100871, Peoples R China

[3] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

Source:

BIOMEDICAL SIGNAL PROCESSING AND CONTROL | 2023 / Vol. 85

Keywords:

Medical visual question answering; Self-attention; Information interaction; Cross-modal gating

DOI:

10.1016/j.bspc.2023.105049

Chinese Library Classification (CLC) number:

R318 [Biomedical engineering]

Discipline code:

0831

Abstract:

The purpose of medical visual question answering (Med-VQA) is to provide accurate answers to clinical questions about the visual content of medical images. However, previous approaches fail to take full advantage of the information interaction between medical images and clinical questions, which hinders further progress in Med-VQA. Addressing this issue requires focusing on critical information interaction within each modality and relevant information interaction between modalities. In this paper, we use the multiple meta-model quantifying (MMQ) model as the visual encoder, and GloVe word embeddings followed by an LSTM as the textual encoder, to form our feature extraction module. We then design a symmetric interaction attention module to construct dense and deep intra- and inter-modal information interaction between medical images and clinical questions for the Med-VQA task. Specifically, the symmetric interaction attention module consists of multiple symmetric interaction attention blocks, each containing two basic units: self-attention and interaction attention. Self-attention is introduced for intra-modal information interaction, while interaction attention is constructed for inter-modal information interaction. In addition, we develop a multi-modal fusion scheme that uses cross-modal gating to effectively fuse multi-modal information and avoid redundant information after sufficient intra- and inter-modal information interaction. Experimental results on the VQA-RAD and PathVQA datasets show that our method outperforms other state-of-the-art Med-VQA models. It achieves accuracies of 74.7% and 48.7% and F1-scores of 73.5% and 46.0%, respectively.
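The abstract describes the architecture only at the block level. The following is a minimal pure-Python sketch of how a symmetric interaction attention block and a cross-modal gating fusion could fit together. It is not the authors' implementation, and it rests on several assumptions:

- All function names are hypothetical.
- Learned query/key/value projections, multi-head splitting, layer normalization and feed-forward sublayers are omitted.
- Each modality is mean-pooled before fusion.
- The gating rule (each pooled modality is scaled element-wise by a sigmoid of the other, then summed) is an illustrative choice, not the paper's equation.

```python
import math
from typing import List, Tuple

# A feature matrix: one row per token (image region or question word), one column per feature dim.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here for residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: Matrix, key: Matrix, value: Matrix) -> Matrix:
    """Single-head scaled dot-product attention; learned Q/K/V projections are omitted (assumption)."""
    d = len(query[0])
    scores = matmul(query, transpose(key))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, value)


def symmetric_block(img: Matrix, txt: Matrix) -> Tuple[Matrix, Matrix]:
    """One hypothetical symmetric interaction attention block."""
    # Self-attention: intra-modal interaction within each modality.
    img = add(img, attention(img, img, img))
    txt = add(txt, attention(txt, txt, txt))
    # Interaction attention: inter-modal interaction, applied symmetrically to both modalities.
    new_img = add(img, attention(img, txt, txt))  # image regions attend to question words
    new_txt = add(txt, attention(txt, img, img))  # question words attend to image regions
    return new_img, new_txt


def mean_pool(x: Matrix) -> List[float]:
    return [sum(col) / len(x) for col in zip(*x)]


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cross_modal_gating(img: Matrix, txt: Matrix) -> List[float]:
    """Hypothetical gate: each pooled modality is scaled by a sigmoid of the other, then summed."""
    v, q = mean_pool(img), mean_pool(txt)
    return [sigmoid(qi) * vi + sigmoid(vi) * qi for vi, qi in zip(v, q)]


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 image regions, feature dim 2
    txt = [[0.5, 0.5], [1.0, -1.0]]  # 2 question words, feature dim 2
    for _ in range(2):  # stack two blocks
        img, txt = symmetric_block(img, txt)
    fused = cross_modal_gating(img, txt)
    print(len(img), len(txt), len(fused))  # prints: 3 2 2
```

Each block returns features with the same shapes it receives, so blocks can be stacked. This matches the abstract's description of a module built from multiple blocks. In this sketch both interaction-attention directions read the self-attended features from the same step, so neither modality sees the other's cross-attended update first. That ordering is also an assumption.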


Pages: 10
