Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering

Cited by: 0
Authors
Liu Y. [1 ]
Chen B. [2 ]
Wang S. [3 ]
Lu G. [1 ]
Zhang Z. [5 ]
Affiliations
[1] Shenzhen Medical Biometrics Perception and Analysis Engineering Laboratory, Harbin Institute of Technology, Shenzhen
[2] School of Software, South China Normal University Nanhai, Foshan, Guangdong
[3] Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu
[4] School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
Funding
National Natural Science Foundation of China
Keywords
Fuzzy deep learning; fuzzy logic; knowledge distillation; medical visual question answering
DOI
10.1109/TFUZZ.2024.3402086
Abstract
Medical visual question answering (Medical VQA) is a critical cross-modal interaction task that has garnered considerable attention in the medical domain. Existing methods commonly leverage vision-and-language pre-training paradigms to mitigate the limitation of small-scale data. Nevertheless, most of them still face two open challenges: 1) little research has focused on distilling representations from a complete modality to guide representation learning for masked data in the other modality; 2) multi-modal fusion based on self-attention mechanisms cannot effectively handle the inherent uncertainty and vagueness of information interaction across modalities. To mitigate these issues, we propose a novel Deep Fuzzy Multi-teacher Distillation (DFMD) network for medical visual question answering, which exploits fuzzy logic to model the uncertainty of vision-language representations across modalities within a multi-teacher framework. Specifically, a multi-teacher knowledge distillation (MKD) module is designed to help reconstruct missing semantics under supervision signals generated by teachers from the other, complete modality, achieving more robust semantic interaction across modalities. Drawing on fuzzy logic theory, we further propose a noise-robust encoder, FuzBERT, that enables DFMD to reduce imprecision and ambiguity in feature representations during multi-modal interaction. To the best of our knowledge, this is the first attempt to combine fuzzy logic theory with a transformer-based encoder to learn multi-modal representations for medical visual question answering. Experimental results on the VQA-RAD and SLAKE datasets consistently demonstrate the superiority of the proposed DFMD method over state-of-the-art baselines.
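The abstract describes the two components only at a high level, so the following is a minimal, hypothetical PyTorch sketch of the two ideas it names: a Gaussian-membership fuzzy layer of the kind a FuzBERT-style encoder might use to soften crisp token features, and an averaged soft-label loss over several complete-modality teachers for the MKD module. All class, function, and parameter names here (GaussianFuzzyLayer, num_rules, multi_teacher_kd_loss, the temperature T) are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: the abstract does not specify FuzBERT's or the
# MKD module's exact formulation, so all names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianFuzzyLayer(nn.Module):
    """Maps each feature dimension through learnable Gaussian membership
    functions mu_k(x) = exp(-(x - c_k)^2 / (2 * sigma_k^2)), one common way
    to embed fuzzy-logic uncertainty modelling into a transformer encoder."""
    def __init__(self, dim: int, num_rules: int = 4):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_rules, dim))    # rule centers c_k
        self.log_sigma = nn.Parameter(torch.zeros(num_rules, dim))  # rule widths sigma_k
        self.proj = nn.Linear(num_rules * dim, dim)                 # defuzzification

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); compute membership degrees per fuzzy rule.
        diff = x.unsqueeze(2) - self.centers                 # (B, L, K, D)
        mu = torch.exp(-0.5 * (diff / self.log_sigma.exp()).pow(2))
        fuzzy = mu.flatten(start_dim=2)                      # (B, L, K*D)
        return x + self.proj(fuzzy)                          # residual fusion with crisp features

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T: float = 2.0):
    """Average KL divergence between the student's prediction on masked
    inputs and soft targets from several complete-modality teachers."""
    log_p = F.log_softmax(student_logits / T, dim=-1)
    losses = [F.kl_div(log_p, F.softmax(t / T, dim=-1), reduction="batchmean")
              for t in teacher_logits_list]
    return (T * T) * torch.stack(losses).mean()

if __name__ == "__main__":
    layer = GaussianFuzzyLayer(dim=768)
    tokens = torch.randn(2, 16, 768)               # (batch, seq_len, hidden)
    fused = layer(tokens)                          # same shape, fuzzified
    student = torch.randn(2, 10)                   # answer logits from masked input
    teachers = [torch.randn(2, 10) for _ in range(3)]
    loss = multi_teacher_kd_loss(student, teachers)
```

In a full pipeline, the fuzzy layer would sit inside each encoder block, and the distillation term would be combined with the standard VQA answer-classification loss.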
Pages: 1 - 15
Page count: 14
Related Papers (50 total)
  • [21] Deep Attention Neural Tensor Network for Visual Question Answering
    Bai, Yalong
    Fu, Jianlong
    Zhao, Tiejun
    Mei, Tao
    COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216 : 21 - 37
  • [22] Deep Modular Bilinear Attention Network for Visual Question Answering
    Yan, Feng
    Silamu, Wushouer
    Li, Yanbing
    SENSORS, 2022, 22 (03)
  • [23] Question-guided feature pyramid network for medical visual question answering
    Yu, Yonglin
    Li, Haifeng
    Shi, Hanrong
    Li, Lin
    Xiao, Jun
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 214
  • [24] Enhanced Accuracy and Robustness via Multi-teacher Adversarial Distillation
    Zhao, Shiji
    Yu, Jie
    Sun, Zhenlong
    Zhang, Bo
    Wei, Xingxing
    COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 585 - 602
  • [25] Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning
    Zhang, Hailin
    Chen, Defang
    Wang, Can
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1943 - 1948
  • [26] Multi-Teacher Distillation With Single Model for Neural Machine Translation
    Liang, Xiaobo
    Wu, Lijun
    Li, Juntao
    Qin, Tao
    Zhang, Min
    Liu, Tie-Yan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 992 - 1002
  • [27] ATMKD: adaptive temperature guided multi-teacher knowledge distillation
    Lin, Yu-e
    Yin, Shuting
    Ding, Yifeng
    Liang, Xingzhu
    MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [28] Medical visual question answering: A survey
    Lin, Zhihong
    Zhang, Donghao
    Tao, Qingyi
    Shi, Danli
    Haffari, Gholamreza
    Wu, Qi
    He, Mingguang
    Ge, Zongyuan
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 143
  • [29] Learning to Specialize with Knowledge Distillation for Visual Question Answering
    Mun, Jonghwan
    Lee, Kimin
    Shin, Jinwoo
    Han, Bohyung
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [30] Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation
    Cao, Shengcao
    Li, Mengtian
    Hays, James
    Ramanan, Deva
    Wang, Yu-Xiong
    Gui, Liang-Yan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202