Question-guided feature pyramid network for medical visual question answering

Cited by: 5
Authors
Yu, Yonglin [1 ]
Li, Haifeng [1 ]
Shi, Hanrong [2 ]
Li, Lin [2 ]
Xiao, Jun [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Childrens Hosp, Natl Clin Res Ctr Child Hlth, Sch Med,Dept Rehabil, Hangzhou, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Funding
Natural Science Foundation of Zhejiang Province; National Natural Science Foundation of China;
Keywords
Visual question answering; Feature pyramid network; Dynamic filter network;
DOI
10.1016/j.eswa.2022.119148
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Medical VQA (VQA-Med) is a critical multi-modal task that has attracted growing attention from the community. Existing models use only one high-level feature map (i.e., the last-layer feature map) extracted by a CNN and fuse it with semantic features through a co-attention mechanism. However, relying solely on the high-level feature as the visual representation often discards image details that are crucial for VQA-Med. In addition, the question often guides which regions of the medical image deserve attention. Therefore, in this paper, we propose a question-guided Feature Pyramid Network (QFPN) for VQA-Med. It extracts multi-level visual features with a feature pyramid network (FPN), so the multi-scale information of medical images can be captured by exploiting both the high resolution of low-level features and the rich semantic information of high-level features. Moreover, a novel question-guided dynamic filter network (DFN) is designed to modulate the fusion of multi-level visual features and semantic features with respect to the raised question. Extensive results demonstrate the effectiveness of the QFPN. In particular, we beat the winner of the ImageCLEF 2019 challenge, achieving 63.8% accuracy and 65.7% BLEU on the ImageCLEF 2019 VQA-Med dataset.
Pages: 8
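As a rough illustration of the fusion described in the abstract, the sketch below wires question-guided dynamic filters over FPN levels in PyTorch. The module names, dimensions, LSTM question encoder, grouped-convolution filter application, and answer classifier are all assumptions made for illustration; the paper's actual QFPN architecture and its co-attention details may differ.

# Minimal sketch (assumed design, not the paper's exact implementation) of
# question-guided dynamic filtering over FPN levels, as described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedDynamicFilter(nn.Module):
    # Predicts a per-sample 1x1 convolution kernel from the question embedding
    # and applies it to one pyramid level (hypothetical realization of the DFN).
    def __init__(self, q_dim, v_channels):
        super().__init__()
        self.v_channels = v_channels
        self.filter_gen = nn.Linear(q_dim, v_channels * v_channels)

    def forward(self, feat, q_emb):
        b, c, h, w = feat.shape
        # One (c x c) 1x1 kernel per sample; a grouped convolution lets every
        # sample be filtered by its own question-conditioned kernel.
        kernels = self.filter_gen(q_emb).view(b * c, c, 1, 1)
        out = F.conv2d(feat.reshape(1, b * c, h, w), kernels, groups=b)
        return out.view(b, c, h, w)

class QFPNSketch(nn.Module):
    def __init__(self, vocab_size=10000, q_dim=512, v_channels=256,
                 n_levels=4, n_answers=100):
        super().__init__()
        self.q_embed = nn.Embedding(vocab_size, 300)
        self.q_rnn = nn.LSTM(300, q_dim, batch_first=True)
        # One question-guided dynamic filter per pyramid level.
        self.dfn = nn.ModuleList(
            [QuestionGuidedDynamicFilter(q_dim, v_channels) for _ in range(n_levels)]
        )
        self.classifier = nn.Linear(v_channels * n_levels + q_dim, n_answers)

    def forward(self, pyramid_feats, question_tokens):
        # pyramid_feats: list of n_levels tensors [B, C, H_i, W_i] from an FPN backbone.
        # question_tokens: [B, T] integer token ids.
        q_emb = self.q_rnn(self.q_embed(question_tokens))[0][:, -1]   # [B, q_dim]
        pooled = []
        for feat, dfn in zip(pyramid_feats, self.dfn):
            modulated = dfn(feat, q_emb)                # question-guided modulation
            pooled.append(modulated.mean(dim=(2, 3)))   # global average pool per level
        fused = torch.cat(pooled + [q_emb], dim=1)      # fuse multi-level visual + question
        return self.classifier(fused)                   # answer logits

# Example with random inputs: four pyramid levels and a batch of two questions.
model = QFPNSketch()
feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16, 8)]
logits = model(feats, torch.randint(0, 10000, (2, 12)))  # -> [2, 100]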
Related Papers
50 records in total
  • [31] Question-Guided Semantic Dual-Graph Visual Reasoning with Novel Answers
    Zhou, Xinzhe
    Mu, Yadong
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 411 - 419
  • [32] Text-Guided Dual-Branch Attention Network for Visual Question Answering
    Li, Mengfei
    Gu, Li
    Ji, Yi
    Liu, Chunping
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 750 - 760
  • [33] Scale-guided Fusion Inference Network for Remote Sensing Visual Question Answering
    Zhao E.-Y.
    Song N.
    Nie J.
    Wang X.
    Zheng C.-Y.
    Wei Z.-Q.
Ruan Jian Xue Bao/Journal of Software, 2024, 35 (05): 2133 - 2149
  • [34] VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
    Bolanos, Marc
    Peris, Alvaro
    Casacuberta, Francisco
    Radeva, Petia
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2017), 2017, 10255 : 372 - 380
  • [35] Depth-Aware and Semantic Guided Relational Attention Network for Visual Question Answering
    Liu, Yuhang
    Wei, Wei
    Peng, Daowan
    Mao, Xian-Ling
    He, Zhiyong
    Zhou, Pan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 5344 - 5357
  • [36] Visual Question Generation as Dual Task of Visual Question Answering
    Li, Yikang
    Duan, Nan
    Zhou, Bolei
    Chu, Xiao
    Ouyang, Wanli
    Wang, Xiaogang
    Zhou, Ming
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6116 - 6124
  • [37] Modal Feature Contribution Distribution Strategy in Visual Question Answering
    Dong F.
    Wang X.
    Oad A.
    Khoso M.N.
    Journal of Engineering Science and Technology Review, 2022, 15 (01) : 8 - 15
  • [38] Debiased Visual Question Answering from Feature and Sample Perspectives
    Wen, Zhiquan
    Xu, Guanghui
    Tan, Mingkui
    Wu, Qingyao
    Wu, Qi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [39] Collaborative Attention Network to Enhance Visual Question Answering
    Gu, Rui
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 124 : 304 - 305
  • [40] ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING
    Gu, Geonmo
    Kim, Seong Tae
    Ro, Yong Man
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 997 - 1002