Question-guided feature pyramid network for medical visual question answering

Cited by: 5
Authors
Yu, Yonglin [1]
Li, Haifeng [1]
Shi, Hanrong [2]
Li, Lin [2]
Xiao, Jun [1, 2]
Affiliations
[1] Zhejiang Univ, Childrens Hosp, Natl Clin Res Ctr Child Hlth, Sch Med,Dept Rehabil, Hangzhou, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Funding
Zhejiang Provincial Natural Science Foundation; National Natural Science Foundation of China
Keywords
Visual question answering; Feature pyramid network; Dynamic filter network
DOI
10.1016/j.eswa.2022.119148
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Medical visual question answering (VQA-Med) is a critical multi-modal task that has attracted growing attention from the community. Existing models use only one high-level feature map (i.e., the last-layer feature map) extracted by a CNN and fuse it with semantic features through a co-attention mechanism. However, relying on the high-level feature alone as the visual feature often ignores fine image details, which are crucial for VQA-Med. In addition, the question often serves as a guide to where attention should be directed in the medical image. In this paper, we therefore propose a question-guided Feature Pyramid Network (QFPN) for VQA-Med. It extracts multi-level visual features with a feature pyramid network (FPN), so that the multi-scale information of medical images is captured by combining the high resolution of low-level features with the rich semantic information of high-level features. In addition, a novel question-guided dynamic filter network (DFN) is designed to modulate the fusion process of multi-level visual features and semantic features with respect to the question asked. Extensive experiments demonstrate the effectiveness of QFPN. In particular, QFPN outperforms the winner of the 2019 ImageCLEF challenge, achieving 63.8% accuracy and 65.7% BLEU on the ImageCLEF 2019 VQA-Med dataset.
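The abstract does not give the QFPN architecture in detail, so the following is only a minimal illustrative sketch of the two ideas it names: top-down fusion of multi-level feature maps (as in an FPN) and a filter generated from the question that is applied to every pyramid level. All function names, the softmax normalisation of the filter, and the use of plain nested lists are assumptions made for illustration. The lateral 1x1 and smoothing 3x3 convolutions of a real FPN are omitted, and nothing here should be read as the authors' implementation.

```python
import math


def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested-list feature map."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # duplicate each column
            rows.append(wide)
            rows.append(list(wide))  # duplicate each row
        out.append(rows)
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [
        [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
        for ca, cb in zip(a, b)
    ]


def top_down_pyramid(levels):
    """Fuse maps ordered fine-to-coarse, each half the resolution of the previous.

    The coarsest map is kept as-is; every finer map receives the upsampled
    fused map from the level above it. Output order matches input order.
    """
    fused = [levels[-1]]
    for lateral in reversed(levels[:-1]):
        fused.append(add_maps(lateral, upsample2x(fused[-1])))
    return fused[::-1]


def dynamic_filter(question_vec, weight):
    """Hypothetical filter generator: softmax(W q), one coefficient per channel."""
    logits = [sum(w * q for w, q in zip(row, question_vec)) for row in weight]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_filter(fmap, filt):
    """Apply the filter as a 1x1 convolution: weighted channel sum per location."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [sum(filt[c] * fmap[c][i][j] for c in range(len(fmap))) for j in range(width)]
        for i in range(height)
    ]


def question_guided_features(levels, question_vec, weight):
    """Return one average-pooled, question-filtered response per pyramid level."""
    filt = dynamic_filter(question_vec, weight)
    pooled = []
    for fmap in top_down_pyramid(levels):
        response = apply_filter(fmap, filt)
        count = len(response) * len(response[0])
        pooled.append(sum(sum(row) for row in response) / count)
    return pooled
```

In this sketch the same question-derived filter is shared across all levels, so the question decides which channels contribute at every scale. A practical model would more likely generate richer filters and fuse the per-level responses with learned layers.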
Pages: 8