Question-guided feature pyramid network for medical visual question answering

Cited: 5
Authors
Yu, Yonglin [1]
Li, Haifeng [1]
Shi, Hanrong [2]
Li, Lin [2]
Xiao, Jun [1,2]
Affiliations
[1] Zhejiang Univ, Childrens Hosp, Natl Clin Res Ctr Child Hlth, Sch Med, Dept Rehabil, Hangzhou, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Funding
Natural Science Foundation of Zhejiang Province; National Natural Science Foundation of China
Keywords
Visual question answering; Feature pyramid network; Dynamic filter network
DOI
10.1016/j.eswa.2022.119148
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical VQA (VQA-Med) is a critical multi-modal task that has attracted increasing attention from the community. Existing models use only one high-level feature map (i.e., the last-layer feature map) extracted by a CNN and then fuse it with semantic features through a co-attention mechanism. However, relying solely on the high-level feature as the visual representation often ignores details of the image that are crucial for VQA-Med. In addition, the question often serves as a guide to the target of attention in the medical image. Therefore, in this paper, we propose a question-guided Feature Pyramid Network (QFPN) for VQA-Med. It extracts multi-level visual features with a feature pyramid network (FPN), so that the multi-scale information of medical images is captured through the high resolution of low-level features and the rich semantic information of high-level features. Besides, a novel question-guided dynamic filter network (DFN) is designed to modulate the fusion process of multi-level visual features and semantic features with respect to the raised question. Extensive results demonstrate the effectiveness of the QFPN. In particular, we beat the winner of the ImageCLEF 2019 challenge, achieving 63.8% accuracy and 65.7% BLEU on the ImageCLEF 2019 VQA-Med dataset.
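For illustration, the question-guided fusion described in the abstract can be sketched in a few lines of PyTorch. This is a minimal sketch under stated assumptions (pooled per-level FPN features, a fixed-size question embedding, channel-wise dynamic filters); all names such as QuestionGuidedDFN and filter_gen are hypothetical and do not come from the authors' code.

import torch
import torch.nn as nn

class QuestionGuidedDFN(nn.Module):
    # Sketch: the question embedding generates one channel-wise filter per
    # pyramid level, which modulates that level's visual feature before the
    # levels are summed into a single question-guided visual representation.
    def __init__(self, q_dim: int, v_dim: int, num_levels: int):
        super().__init__()
        self.num_levels = num_levels
        self.v_dim = v_dim
        # Dynamic filter generator: question embedding -> one filter per level.
        self.filter_gen = nn.Linear(q_dim, num_levels * v_dim)

    def forward(self, q_emb, pyramid_feats):
        # q_emb: (B, q_dim); pyramid_feats: list of num_levels tensors of
        # shape (B, v_dim), e.g., globally pooled FPN outputs.
        filters = self.filter_gen(q_emb).view(-1, self.num_levels, self.v_dim)
        fused = torch.zeros_like(pyramid_feats[0])
        for lvl, feat in enumerate(pyramid_feats):
            # Channel-wise modulation conditioned on the question.
            fused = fused + torch.sigmoid(filters[:, lvl]) * feat
        return fused  # (B, v_dim) question-guided multi-scale visual feature

# Usage sketch: 4 pyramid levels pooled to 512-d, 1024-d question embedding.
dfn = QuestionGuidedDFN(q_dim=1024, v_dim=512, num_levels=4)
q_emb = torch.randn(2, 1024)
feats = [torch.randn(2, 512) for _ in range(4)]
print(dfn(q_emb, feats).shape)  # torch.Size([2, 512])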
Pages: 8
Related papers
50 records in total
  • [41] Triple attention network for sentimental visual question answering
    Ruwa, Nelson
    Mao, Qirong
    Song, Heping
    Jia, Hongjie
    Dong, Ming
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2019, 189
  • [42] Scene Graph Refinement Network for Visual Question Answering
    Qian, Tianwen
    Chen, Jingjing
    Chen, Shaoxiang
    Wu, Bo
    Jiang, Yu-Gang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3950 - 3961
  • [43] Fair Attention Network for Robust Visual Question Answering
    Bi, Y.
    Jiang, H.
    Hu, Y.
    Sun, Y.
    Yin, B.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 1 - 1
  • [44] Question-guided stubborn set methods for state properties
    Kristensen, L. M.
    Schmidt, K.
    Valmari, A.
    FORMAL METHODS IN SYSTEM DESIGN, 2006, 29 : 215 - 251
  • [45] TRANS-VQA: Fully Transformer-Based Image Question-Answering Model Using Question-guided Vision Attention
    Koshti, Dipali
    Gupta, Ashutosh
    Kalla, Mukesh
    Sharma, Arvind
    INTELIGENCIA ARTIFICIAL-IBEROAMERICAN JOURNAL OF ARTIFICIAL INTELLIGENCE, 2024, 27 (73) : 111 - 128
  • [46] Locate Before Answering: Answer Guided Question Localization for Video Question Answering
    Qian, Tianwen
    Cui, Ran
    Chen, Jingjing
    Peng, Pai
    Guo, Xiaowei
    Jiang, Yu-Gang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4554 - 4563
  • [48] Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering
    Liu, Y.
    Chen, B.
    Wang, S.
    Lu, G.
    Zhang, Z.
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (10) : 1 - 15
  • [49] Medical knowledge-based network for Patient-oriented Visual Question Answering
    Huang, Jian
    Chen, Yihao
    Li, Yong
    Yang, Zhenguo
    Gong, Xuehao
    Wang, Fu Lee
    Xu, Xiaohong
    Liu, Wenyin
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (02)
  • [50] VQA: Visual Question Answering
    Antol, Stanislaw
    Agrawal, Aishwarya
    Lu, Jiasen
    Mitchell, Margaret
    Batra, Dhruv
    Zitnick, C. Lawrence
    Parikh, Devi
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2425 - 2433