Question-guided feature pyramid network for medical visual question answering

Citations: 5

Authors:

Yu, Yonglin [1]

Li, Haifeng [1]

Shi, Hanrong [2]

Li, Lin [2]

Xiao, Jun [1,2]

Affiliations:

[1] Zhejiang Univ, Childrens Hosp, Natl Clin Res Ctr Child Hlth, Sch Med,Dept Rehabil, Hangzhou, Peoples R China

[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2023 / Vol. 214

Funding:

Natural Science Foundation of Zhejiang Province; National Natural Science Foundation of China;

Keywords:

Visual question answering; Feature pyramid network; Dynamic filter network;

DOI:

10.1016/j.eswa.2022.119148

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Medical visual question answering (VQA-Med) is a critical multi-modal task that has attracted growing attention from the community. Existing models use only one high-level feature map (i.e., the last-layer feature map) extracted by a CNN and fuse it with semantic features through a co-attention mechanism. However, using only the high-level feature as the visual feature often ignores image details, which are crucial for VQA-Med. In addition, questions often serve as a guide to the targets of attention in the medical image. Therefore, in this paper, we propose a question-guided Feature Pyramid Network (QFPN) for VQA-Med. It extracts multi-level visual features with a feature pyramid network (FPN). In this way, the multi-scale information of medical images can be captured by combining the high resolution of low-level features with the rich semantic information of high-level features. In addition, a novel question-guided dynamic filter network (DFN) is designed to modulate the fusion process of multi-level visual features and semantic features with respect to the raised question. Extensive results demonstrate the effectiveness of QFPN. In particular, it outperforms the winner of the 2019 ImageCLEF challenge, achieving 63.8% accuracy and 65.7% BLEU on the ImageCLEF 2019 VQA-Med dataset.
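
The idea of letting the question modulate the fusion of multi-level visual features can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the random projection matrices, the tanh filter generator, and the softmax pooling are all assumptions chosen for illustration, and the random arrays merely stand in for FPN outputs and a question embedding.

```python
import numpy as np

def spatial_softmax(logits):
    """Softmax over all spatial positions of an (H, W) map."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def question_guided_fusion(feature_maps, question_vec, projections):
    """Pool each pyramid level with a filter generated from the question.

    feature_maps: list of (C, H, W) arrays, one per pyramid level
    question_vec: (D,) question embedding
    projections:  list of (C, D) matrices (hypothetical filter generators)
    """
    pooled_levels = []
    for fmap, proj in zip(feature_maps, projections):
        # Generate a 1x1 dynamic filter (one weight per channel) from the question.
        dyn_filter = np.tanh(proj @ question_vec)                  # (C,)
        # Apply the filter at every spatial position to get a response map.
        logits = np.tensordot(dyn_filter, fmap, axes=([0], [0]))   # (H, W)
        attn = spatial_softmax(logits)                             # sums to 1
        # Attention-weighted pooling of this level's features.
        pooled_levels.append((fmap * attn[None, :, :]).sum(axis=(1, 2)))
    # Concatenate the per-level summaries into one fused vector.
    return np.concatenate(pooled_levels)

rng = np.random.default_rng(0)
channels, q_dim = 8, 16
# Stand-ins for three pyramid levels at decreasing resolution.
pyramid = [rng.standard_normal((channels, s, s)) for s in (32, 16, 8)]
question = rng.standard_normal(q_dim)
projections = [rng.standard_normal((channels, q_dim)) * 0.1 for _ in pyramid]

fused = question_guided_fusion(pyramid, question, projections)
print(fused.shape)  # (24,)
```

Because the filter is recomputed from each question, the same image yields different pooled features for different questions, which is the behaviour the abstract attributes to the question-guided DFN.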


Pages: 8

50 items in total

[21] Compound-Attention Network with Original Feature injection for visual question and answering
Wu, Chunlei
Lu, Jing
Li, Haisheng
Wu, Jie
Duan, Hailong
Yuan, Shaozu
SIGNAL IMAGE AND VIDEO PROCESSING, 2021, 15 (08) : 1853 - 1861
[22] Co-Attention Network With Question Type for Visual Question Answering
Yang, Chao
Jiang, Mengqi
Jiang, Bin
Zhou, Weixin
Li, Keqin
IEEE ACCESS, 2019, 7 : 40771 - 40781
[23] An Answer FeedBack Network for Visual Question Answering
Tian, Weidong
Tian, Ruihua
Zhao, Zhongqiu
Ren, Quan
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
[24] M2FNet: Multi-granularity Feature Fusion Network for Medical Visual Question Answering
Wang, He
Pan, Haiwei
Zhang, Kejia
He, Shuning
Chen, Chunling
PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 141 - 154
[25] Medical visual question answering via corresponding feature fusion combined with semantic attention
Zhu, Han
He, Xiaohai
Wang, Meiling
Zhang, Mozhi
Qing, Linbo
MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2022, 19 (10) : 10192 - 10212
[26] Hierarchical deep multi-modal network for medical visual question answering
Gupta D.
Suman S.
Ekbal A.
Expert Systems with Applications, 2021, 164
[27] Dual Self-Guided Attention with Sparse Question Networks for Visual Question Answering
Shen, Xiang
Han, Dezhi
Chang, Chin-Chen
Zong, Liang
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (04) : 785 - 796
[28] Medical Visual Question Answering via Conditional Reasoning
Zhan, Li-Ming
Liu, Bo
Fan, Lu
Chen, Jiaxin
Wu, Xiao-Ming
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2345 - 2354
[29] TYPE-AWARE MEDICAL VISUAL QUESTION ANSWERING
Zhang, Anda
Tao, Wei
Li, Ziyan
Wang, Haofen
Zhang, Wenqiang
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4838 - 4842
[30] Overcoming Data Limitation in Medical Visual Question Answering
Nguyen, Binh D.
Thanh-Toan Do
Nguyen, Binh X.
Do, Tuong
Tjiputra, Erman
Tran, Quang D.
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT IV, 2019, 11767 : 522 - 530
