Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering

Cited by: 0
Authors
Liu Y. [1]
Chen B. [2]
Wang S. [3]
Lu G. [1]
Zhang Z. [5]
Affiliations
[1] Shenzhen Medical Biometrics Perception and Analysis Engineering Laboratory, Harbin Institute of Technology, Shenzhen
[2] School of Software, South China Normal University Nanhai, Foshan, Guangdong
[3] Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu
[4] School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
Funding
National Natural Science Foundation of China
Keywords
Fuzzy deep learning; fuzzy logic; knowledge distillation; medical visual question answering
DOI
10.1109/TFUZZ.2024.3402086
Abstract
Medical visual question answering (Medical VQA) is a critical cross-modal interaction task that has garnered considerable attention in the medical domain. Many existing methods leverage vision-and-language pre-training paradigms to mitigate the limitation of small-scale data. Nevertheless, most of them still suffer from two challenges that remain open for further research: 1) limited research focuses on distilling representations from a complete modality to guide the representation learning of masked data in the other modality; 2) multi-modal fusion based on self-attention mechanisms cannot effectively handle the inherent uncertainty and vagueness of information interaction across modalities. To mitigate these issues, in this paper we propose a novel Deep Fuzzy Multi-teacher Distillation (DFMD) Network for medical visual question answering, which takes advantage of fuzzy logic to model the uncertainties in vision-language representations across modalities within a multi-teacher framework. Specifically, a multi-teacher knowledge distillation (MKD) module is conceived to assist in reconstructing the missing semantics under supervision signals generated by teachers from the other, complete modality, achieving more robust semantic interaction across modalities. Incorporating insights from fuzzy logic theory, we propose a noise-robust encoder called FuzBERT that enables our DFMD model to reduce the imprecision and ambiguity of feature representations during the multi-modal interaction process. To the best of our knowledge, our work is the first attempt to combine fuzzy logic theory with a transformer-based encoder to effectively learn multi-modal representations for medical visual question answering. Experimental results on the VQA-RAD and SLAKE datasets consistently demonstrate the superiority of our proposed DFMD method over state-of-the-art baselines.
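To make the two ideas in the abstract concrete, below is a minimal illustrative sketch, not the authors' implementation: a multi-teacher distillation loss in which frozen teachers from the complete modality supervise a student that sees masked inputs, together with a Gaussian-membership fuzzification layer of the kind a fuzzy transformer encoder might apply to its features. All names, shapes, and hyperparameters (FuzzyMembershipLayer, multi_teacher_distillation_loss, num_rules, temperature) are assumptions for illustration only.

```python
# Hedged sketch of the two abstract ideas; details differ from the paper's DFMD/FuzBERT.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FuzzyMembershipLayer(nn.Module):
    """Map each feature vector to fuzzy membership degrees via learnable Gaussian rules."""

    def __init__(self, dim: int, num_rules: int = 4):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_rules, dim))     # rule centers
        self.log_sigmas = nn.Parameter(torch.zeros(num_rules, dim))  # rule widths (log-scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -> normalized firing strengths: (batch, num_rules)
        diff = x.unsqueeze(1) - self.centers                # (batch, num_rules, dim)
        sigma = self.log_sigmas.exp()
        # Sum of per-dimension log-Gaussian memberships (i.e., log of their product)
        log_mu = -0.5 * ((diff / sigma) ** 2).sum(dim=-1)
        return log_mu.softmax(dim=-1)


def multi_teacher_distillation_loss(student_feat, teacher_feats, temperature=2.0):
    """Average temperature-scaled KL divergence from the student to each frozen teacher."""
    s = F.log_softmax(student_feat / temperature, dim=-1)
    losses = []
    for t_feat in teacher_feats:
        t = F.softmax(t_feat.detach() / temperature, dim=-1)  # teachers give fixed targets
        losses.append(F.kl_div(s, t, reduction="batchmean") * temperature ** 2)
    return torch.stack(losses).mean()


if __name__ == "__main__":
    batch, dim = 8, 256
    student_out = torch.randn(batch, dim, requires_grad=True)   # branch seeing masked input
    teacher_outs = [torch.randn(batch, dim) for _ in range(3)]  # complete-modality teachers
    fuzzify = FuzzyMembershipLayer(dim)
    print("fuzzy memberships:", fuzzify(student_out).shape)     # torch.Size([8, 4])
    print("distillation loss:", multi_teacher_distillation_loss(student_out, teacher_outs).item())
```

The fuzzification step replaces a hard feature value with graded rule memberships, which is one common way fuzzy-logic layers are attached to transformer features to absorb noise; the distillation loss averages over teachers so no single complete-modality encoder dominates the supervision signal.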
Pages: 1-15
Page count: 14