Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering

Cited by: 0
Authors
Liu Y. [1 ]
Chen B. [2 ]
Wang S. [3 ]
Lu G. [1 ]
Zhang Z. [5 ]
Affiliations
[1] Shenzhen Medical Biometrics Perception and Analysis Engineering Laboratory, Harbin Institute of Technology, Shenzhen
[2] School of Software, South China Normal University, Nanhai, Foshan, Guangdong
[3] Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu
[4] School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
Funding
National Natural Science Foundation of China
Keywords
Fuzzy deep learning; fuzzy logic; knowledge distillation; medical visual question answering
DOI
10.1109/TFUZZ.2024.3402086
Abstract
Medical visual question answering (Medical VQA) is a critical cross-modal interaction task that has garnered considerable attention in the medical domain. Existing methods commonly leverage vision-and-language pre-training paradigms to mitigate the limitation of small-scale data, yet most still face two challenges that warrant further research: 1) limited research focuses on distilling representations from a complete modality to guide the representation learning of masked data in the other modality; and 2) multi-modal fusion based on self-attention mechanisms cannot effectively handle the inherent uncertainty and vagueness of information interaction across modalities. To mitigate these issues, we propose a novel Deep Fuzzy Multi-teacher Distillation (DFMD) network for medical visual question answering, which leverages fuzzy logic to model the uncertainties of vision-language representations across modalities in a multi-teacher framework. Specifically, a multi-teacher knowledge distillation (MKD) module is conceived to assist in reconstructing missing semantics under supervision signals generated by teachers from the other, complete modality, achieving more robust semantic interaction across modalities. Incorporating insights from fuzzy logic theory, we further propose a noise-robust encoder, FuzBERT, that enables the DFMD model to reduce imprecision and ambiguity in feature representation during the multi-modal interaction process. To the best of our knowledge, our work is the first attempt to combine fuzzy logic theory with a transformer-based encoder to effectively learn multi-modal representations for medical visual question answering. Experimental results on the VQA-RAD and SLAKE datasets consistently demonstrate the superiority of the proposed DFMD method over state-of-the-art baselines.
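The MKD module described in the abstract supervises a student that reconstructs masked data in one modality with teachers that observe the complete other modality. Since the abstract does not give the distillation objective, the following is a minimal PyTorch sketch of a generic multi-teacher soft-label loss consistent with that description; the function name, the uniform teacher weights, and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_feats, teacher_feats_list,
                                    teacher_weights=None, temperature=2.0):
    """Pull student features (from the masked modality) toward several
    frozen teachers that each saw the complete modality.

    student_feats:      (batch, dim) student representation
    teacher_feats_list: list of (batch, dim) teacher representations
    """
    if teacher_weights is None:
        # Weight all teachers equally unless told otherwise.
        teacher_weights = [1.0 / len(teacher_feats_list)] * len(teacher_feats_list)

    loss = student_feats.new_zeros(())
    for w, t_feats in zip(teacher_weights, teacher_feats_list):
        # Soften both distributions before matching (standard KD recipe);
        # the T^2 factor keeps gradients comparable across temperatures.
        s_log_prob = F.log_softmax(student_feats / temperature, dim=-1)
        t_prob = F.softmax(t_feats.detach() / temperature, dim=-1)
        loss = loss + w * temperature ** 2 * F.kl_div(
            s_log_prob, t_prob, reduction="batchmean")
    return loss
```

In practice a term like this would be added to the usual VQA answer-classification loss, so the student learns both to answer questions and to agree with the complete-modality teachers.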
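FuzBERT's internals are likewise not spelled out in this excerpt. As one concrete way to graft fuzzy logic onto a transformer-based encoder, the sketch below fuzzifies token features with learnable Gaussian membership functions and feeds the graded membership degrees back through a residual defuzzification projection; the class name, the Gaussian-membership choice, and the number of rules are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class FuzzyMembershipLayer(nn.Module):
    """Hypothetical fuzzification block for a transformer encoder.

    Each token embedding is scored against a bank of learnable Gaussian
    fuzzy rules; the resulting membership degrees are projected back to
    the embedding space and added residually, softening crisp, possibly
    noisy activations.
    """

    def __init__(self, dim: int, num_rules: int = 8):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_rules, dim))    # rule centers
        self.log_sigma = nn.Parameter(torch.zeros(num_rules, dim))  # rule widths
        self.defuzz = nn.Linear(num_rules, dim)                     # defuzzification

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        diff = x.unsqueeze(2) - self.centers          # (batch, seq, rules, dim)
        sigma = self.log_sigma.exp()
        # Gaussian membership degree of each token under each fuzzy rule,
        # averaged over feature dimensions.
        member = torch.exp(-0.5 * (diff / sigma).pow(2)).mean(dim=-1)
        return x + self.defuzz(member)                # residual fuzzy correction
```

A layer of this kind could be interleaved with the self-attention blocks of a BERT-style encoder, so that ambiguous cross-modal signals are represented as graded rule memberships rather than forced into crisp values.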
Pages: 1-15
Related Papers
50 items in total
  • [1] Hierarchical deep multi-modal network for medical visual question answering
    Gupta, D.
    Suman, S.
    Ekbal, A.
    Expert Systems with Applications, 2021, 164
  • [2] Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System
    Yang, Ze
    Shou, Linjun
    Gong, Ming
    Lin, Wutao
    Jiang, Daxin
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM '20), 2020, : 690 - 698
  • [3] Let All Be Whitened: Multi-Teacher Distillation for Efficient Visual Retrieval
    Ma, Zhe
    Dong, Jianfeng
    Ji, Shouling
    Liu, Zhenguang
    Zhang, Xuhong
    Wang, Zonghui
    He, Sifeng
    Qian, Feng
    Zhang, Xiaobo
    Yang, Lei
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4126 - 4135
  • [4] Reinforced Multi-Teacher Selection for Knowledge Distillation
    Yuan, Fei
    Shou, Linjun
    Pei, Jian
    Lin, Wutao
    Gong, Ming
    Fu, Yan
    Jiang, Daxin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14284 - 14291
  • [5] Correlation Guided Multi-teacher Knowledge Distillation
    Shi, Luyao
    Jiang, Ning
    Tang, Jialiang
    Huang, Xinlei
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT IV, 2024, 14450 : 562 - 574
  • [6] Answer Distillation for Visual Question Answering
    Fang, Zhiwei
    Liu, Jing
    Tang, Qu
    Li, Yong
    Lu, Hanqing
    COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 72 - 87
  • [7] Optimal Deep Neural Network-Based Model for Answering Visual Medical Question
    Gasmi, Karim
    Ben Ltaifa, Ibtihel
    Lejeune, Gael
    Alshammari, Hamoud
    Ben Ammar, Lassaad
    Mahmood, Mahmood A.
    CYBERNETICS AND SYSTEMS, 2022, 53 (05) : 403 - 424
  • [8] Knowledge Distillation via Multi-Teacher Feature Ensemble
    Ye, Xin
    Jiang, Rongxin
    Tian, Xiang
    Zhang, Rui
    Chen, Yaowu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 566 - 570
  • [9] CONFIDENCE-AWARE MULTI-TEACHER KNOWLEDGE DISTILLATION
    Zhang, Hailin
    Chen, Defang
    Wang, Can
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4498 - 4502
  • [10] Adaptive multi-teacher multi-level knowledge distillation
    Liu, Yuang
    Zhang, Wei
    Wang, Jun
    NEUROCOMPUTING, 2020, 415 : 106 - 113