Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality

Cited by: 7
Authors
Wang, Hu [1 ]
Ma, Congbo [1 ]
Zhang, Jianpeng [2 ]
Zhang, Yuan [1 ]
Avery, Jodie [1 ]
Hull, Louise [1 ]
Carneiro, Gustavo [3 ]
Affiliations
[1] Univ Adelaide, Adelaide, SA, Australia
[2] Alibaba DAMO Acad, Hangzhou, Peoples R China
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Keywords
Missing modality issue; Multi-modal learning; Learnable cross-modal knowledge distillation; Segmentation
DOI
10.1007/978-3-031-43901-8_21
CLC Number
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
The problem of missing modalities is both critical and nontrivial to handle in multi-modal models. In multi-modal tasks it is common for certain modalities to contribute more than others, and when those important modalities are missing, model performance drops significantly. This fact remains unexplored by current multi-modal approaches, which recover the representation of missing modalities via feature reconstruction or blind feature aggregation from the other modalities, rather than extracting useful information from the best-performing modalities. In this paper, we propose a Learnable Cross-modal Knowledge Distillation (LCKD) model that adaptively identifies important modalities and distils knowledge from them to the other modalities, addressing the missing modality issue from a cross-modal perspective. Our approach introduces a teacher election procedure that selects the most "qualified" teachers based on their single-modality performance on each task. Cross-modal knowledge distillation is then performed between teacher and student modalities for each task, pushing the model parameters to a point that is beneficial for all tasks. Hence, even if the teacher modalities for certain tasks are missing during testing, the available student modalities can accomplish the task well enough based on the knowledge learned from their automatically elected teacher modalities. Experiments on the Brain Tumour Segmentation Dataset 2018 (BraTS2018) show that LCKD outperforms other methods by a considerable margin, improving the state-of-the-art segmentation Dice score by 3.61% for enhancing tumour, 5.99% for tumour core, and 3.76% for whole tumour.
Pages: 216-226
Number of pages: 11
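
A minimal sketch, in PyTorch, of the two-stage procedure the abstract describes: (1) electing a teacher modality per task from single-modality validation performance, and (2) distilling each elected teacher's representation into the remaining student modalities. All names, tensor shapes, validation scores, and the choice of an L2 feature-matching loss are illustrative assumptions, not the authors' implementation.

# Sketch of the LCKD idea from the abstract (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def elect_teachers(val_scores):
    # Teacher election: for each task, pick the modality with the best
    # single-modality validation score (e.g., Dice).
    # val_scores: {task: {modality: score}} -> {task: teacher_modality}
    return {task: max(scores, key=scores.get)
            for task, scores in val_scores.items()}

def cross_modal_kd_loss(features, teachers, lam=1.0):
    # Distil each elected teacher's features into every other (student)
    # modality for its task, here via an L2 feature-matching loss.
    # features: {modality: tensor of shape [B, C, D, H, W]}
    loss = next(iter(features.values())).new_zeros(())
    for task, t_mod in teachers.items():
        t_feat = features[t_mod].detach()  # teacher serves as a fixed target
        for s_mod, s_feat in features.items():
            if s_mod != t_mod:
                loss = loss + F.mse_loss(s_feat, t_feat)
    return lam * loss

# Toy usage with the four BraTS MRI modalities and three tumour tasks
# (the validation scores below are made up for illustration).
feats = {m: torch.randn(2, 16, 8, 8, 8, requires_grad=True)
         for m in ("flair", "t1", "t1ce", "t2")}
val_scores = {
    "enhancing_tumour": {"flair": 0.55, "t1": 0.50, "t1ce": 0.78, "t2": 0.52},
    "tumour_core":      {"flair": 0.60, "t1": 0.58, "t1ce": 0.80, "t2": 0.61},
    "whole_tumour":     {"flair": 0.85, "t1": 0.70, "t1ce": 0.72, "t2": 0.82},
}
teachers = elect_teachers(val_scores)      # e.g., t1ce for ET/TC, flair for WT
kd = cross_modal_kd_loss(feats, teachers)  # scalar distillation loss
kd.backward()  # teacher targets are detached, so gradients flow via student roles

Because a modality can be a teacher for one task and a student for another, every modality is still trained to absorb the elected teachers' knowledge, which is what lets the remaining modalities cope when a teacher modality is missing at test time.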