Question-guided feature pyramid network for medical visual question answering

Cited by: 5
Authors
Yu, Yonglin [1]
Li, Haifeng [1]
Shi, Hanrong [2]
Li, Lin [2]
Xiao, Jun [1, 2]
Affiliations
[1] Zhejiang Univ, Childrens Hosp, Natl Clin Res Ctr Child Hlth, Sch Med, Dept Rehabil, Hangzhou, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Funding
Zhejiang Provincial Natural Science Foundation; National Natural Science Foundation of China
Keywords
Visual question answering; Feature pyramid network; Dynamic filter network;
DOI
10.1016/j.eswa.2022.119148
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Medical visual question answering (VQA-Med) is a critical multi-modal task that has attracted growing attention from the community. Existing models typically use only one high-level feature map (i.e., the last-layer feature map) extracted by a CNN and fuse it with semantic features through a co-attention mechanism. However, using only the high-level feature as the visual feature often ignores image details, which are crucial for VQA-Med. In addition, the question often serves as a guide to where attention should be directed in the medical image. Therefore, in this paper, we propose a question-guided Feature Pyramid Network (QFPN) for VQA-Med. It extracts multi-level visual features with a feature pyramid network (FPN). In this way, the multi-scale information of medical images can be captured by combining the high resolution of low-level features with the rich semantic information of high-level features. Besides, a novel question-guided dynamic filter network (DFN) is designed to modulate the fusion process of multi-level visual features and semantic features with respect to the raised question. Extensive experiments demonstrate the effectiveness of the QFPN. In particular, we beat the winner of the 2019 ImageCLEF challenge, achieving 63.8% accuracy and 65.7% BLEU on the ImageCLEF 2019 VQA-Med dataset.
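The two ideas named in the abstract, multi-level feature merging and question-conditioned filtering, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the nearest-neighbour upsampling, the element-wise top-down sum, the linear filter generator, and every function name below (`upsample2x`, `top_down_merge`, `question_filter`, `apply_1x1_filter`) are assumptions chosen for illustration. Feature maps are plain nested lists indexed as [channel][row][column].

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] feature map."""
    out = []
    for channel in fmap:
        rows = []
        for y in range(2 * len(channel)):
            src = channel[y // 2]
            rows.append([src[x // 2] for x in range(2 * len(src))])
        out.append(rows)
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical [C][H][W] shape."""
    return [[[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def top_down_merge(levels):
    """Merge pyramid levels ordered from fine (high resolution) to coarse.

    Assumes each level halves the resolution of the previous one and that
    all levels already share the same channel count.
    """
    merged = [levels[-1]]
    for finer in reversed(levels[:-1]):
        merged.insert(0, add_maps(finer, upsample2x(merged[0])))
    return merged


def question_filter(question_vec, proj):
    """Hypothetical filter generator: a linear map from the question
    embedding to one 1x1 kernel weight per visual channel."""
    return [sum(w * q for w, q in zip(row, question_vec)) for row in proj]


def apply_1x1_filter(fmap, kernel):
    """Apply a 1x1 dynamic kernel, giving one [H][W] response map."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[sum(k * fmap[c][y][x] for c, k in enumerate(kernel))
             for x in range(width)] for y in range(height)]


# Toy example: two channels, a 4x4 fine level and a 2x2 coarse level.
fine = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
coarse = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
pyramid = top_down_merge([fine, coarse])

# A made-up question embedding and projection that select channel 0.
kernel = question_filter([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
responses = [apply_1x1_filter(level, kernel) for level in pyramid]
```

In this sketch the same question-derived kernel is applied at every pyramid level, so `responses` holds one question-conditioned map per scale. A real model would learn the projection and would likely use larger kernels and a learned fusion step.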
Pages: 8
Related papers
50 in total
  • [1] Question-Guided Hybrid Convolution for Visual Question Answering
    Gao, Peng
    Li, Hongsheng
    Li, Shuang
    Lu, Pan
    Li, Yikang
    Hoi, Steven C. H.
    Wang, Xiaogang
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 485 - 501
  • [2] A question-guided multi-hop reasoning graph network for visual question answering
    Xu, Zhaoyang
    Gu, Jinguang
    Liu, Maofu
    Zhou, Guangyou
    Fu, Haidong
    Qiu, Chen
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (02)
  • [3] Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
    Xu, Huijuan
    Saenko, Kate
    COMPUTER VISION - ECCV 2016, PT VII, 2016, 9911 : 451 - 466
  • [4] Learning neighbor-enhanced region representations and question-guided visual representations for visual question answering
    Gao, Ling
    Zhang, Hongda
    Sheng, Nan
    Shi, Lida
    Xu, Hao
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [5] Learning Question-Guided Video Representation for Multi-Turn Video Question Answering
    Chao, Guan-Lin
    Rastogi, Abhinav
    Yavuz, Semih
    Hakkani-Tur, Dilek
    Chen, Jindong
    Lane, Ian
    20TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2019), 2019, : 215 - 225
  • [6] Question-Guided Erasing-Based Spatiotemporal Attention Learning for Video Question Answering
    Liu, Fei
    Liu, Jing
    Hong, Richang
    Lu, Hanqing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1367 - 1379
  • [7] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [8] Semantically Guided Visual Question Answering
    Zhao, Handong
    Fan, Quanfu
    Gutfreund, Dan
    Fu, Yun
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1852 - 1860
  • [9] Divide and Conquer: Question-Guided Spatio-Temporal Contextual Attention for Video Question Answering
    Jiang, Jianwen
    Chen, Ziqiang
    Lin, Haojie
    Zhao, Xibin
    Gao, Yue
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11101 - 11108
  • [10] Medical visual question answering: A survey
    Lin, Zhihong
    Zhang, Donghao
    Tao, Qingyi
    Shi, Danli
    Haffari, Gholamreza
    Wu, Qi
    He, Mingguang
    Ge, Zongyuan
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 143