Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality

Cited by: 7
Authors
Wang, Hu [1]
Ma, Congbo [1]
Zhang, Jianpeng [2]
Zhang, Yuan [1]
Avery, Jodie [1]
Hull, Louise [1]
Carneiro, Gustavo [3]
Affiliations
[1] Univ Adelaide, Adelaide, SA, Australia
[2] Alibaba DAMO Acad, Hangzhou, Peoples R China
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Keywords
Missing modality issue; Multi-modal learning; Learnable cross-modal knowledge distillation; Segmentation
DOI
10.1007/978-3-031-43901-8_21
Chinese Library Classification
TP31 [Computer Software]
Discipline Classification Codes
081202; 0835
Abstract
The problem of missing modalities is both critical and nontrivial to handle in multi-modal models. In multi-modal tasks, it is common for certain modalities to contribute more than others, and if those important modalities are missing, model performance drops significantly. This fact remains unexplored by current multi-modal approaches, which recover the representation of missing modalities via feature reconstruction or blind feature aggregation from the other modalities, rather than extracting useful information from the best-performing modalities. In this paper, we propose a Learnable Cross-modal Knowledge Distillation (LCKD) model that adaptively identifies important modalities and distils knowledge from them to the other modalities, addressing the missing modality issue from a cross-modal perspective. Our approach introduces a teacher election procedure that selects the most "qualified" teachers based on their single-modality performance on each task. Cross-modal knowledge distillation is then performed between teacher and student modalities for each task, pushing the model parameters to a point that is beneficial for all tasks. Hence, even if the teacher modalities for certain tasks are missing during testing, the available student modalities can accomplish the task well enough based on the knowledge learned from their automatically elected teacher modalities. Experiments on the Brain Tumour Segmentation Dataset 2018 (BraTS2018) show that LCKD outperforms other methods by a considerable margin, improving the state-of-the-art segmentation Dice score by 3.61% for enhancing tumour, 5.99% for tumour core, and 3.76% for whole tumour.
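The abstract describes two mechanisms: a teacher election step that picks the best-performing single-modality networks per task, and a cross-modal distillation step from the elected teachers to the remaining (student) modalities. Below is a minimal illustrative sketch, not the authors' implementation: the threshold-based election rule, the MSE feature-matching loss, and all function names are assumptions made for illustration.

import torch
import torch.nn.functional as F

def elect_teachers(val_scores, threshold=0.8):
    # Election rule (assumed): a modality becomes a teacher for a task if its
    # single-modality validation score (e.g. Dice) reaches the threshold.
    return [m for m, s in val_scores.items() if s >= threshold]

def cross_modal_kd_loss(features, teachers):
    # Distillation (assumed form): pull each student modality's features
    # towards every elected teacher's features; teacher features are
    # detached so knowledge flows from teacher to student only.
    students = [m for m in features if m not in teachers]
    loss = torch.zeros(())
    for t in teachers:
        for s in students:
            loss = loss + F.mse_loss(features[s], features[t].detach())
    return loss

# Example with the four BraTS MRI modalities (scores are made up):
feats = {m: torch.randn(2, 64) for m in ("flair", "t1", "t1ce", "t2")}
val_dice = {"flair": 0.81, "t1": 0.62, "t1ce": 0.85, "t2": 0.74}
teachers = elect_teachers(val_dice)            # -> ["flair", "t1ce"]
kd_term = cross_modal_kd_loss(feats, teachers) # added to the task loss

In the full model the distillation is performed per task, so the elected teacher set can differ across the three tumour sub-regions; the single shared threshold above is a simplification.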
Pages: 216-226
Page count: 11
Related Papers
50 in total
  • [41] Wang, Yubin; Xia, Hongbin; Liu, Yuan. CMC-MMR: multi-modal recommendation model with cross-modal correction. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (05): 1187-1211.
  • [42] Zhou, Qingguo; Hou, Yufeng; Zhou, Rui; Li, Yan; Wang, Jinqiang; Wu, Zhen; Li, Hung-Wei; Weng, Tien-Hsiung. Cross-modal learning with multi-modal model for video action recognition based on adaptive weight training. CONNECTION SCIENCE, 2024, 36 (01).
  • [43] Dou, Qi; Liu, Quande; Heng, Pheng Ann; Glocker, Ben. Unpaired Multi-Modal Segmentation via Knowledge Distillation. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (07): 2415-2425.
  • [44] Yang, Yanwu; Ye, Chenfei; Guo, Xutao; Wu, Tao; Xiang, Yang; Ma, Ting. Mapping Multi-Modal Brain Connectome for Brain Disorder Diagnosis via Cross-Modal Mutual Learning. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (01): 108-121.
  • [45] Yu, Tan; Yang, Yi; Li, Yi; Liu, Lin; Sun, Mingming; Li, Ping. Multi-modal Dictionary BERT for Cross-modal Video Search in Baidu Advertising. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021: 4341-4351.
  • [46] Ding, Meirong; Chen, Hongye; Zeng, Biqing. Cross-Modal Semantic Alignment and Information Refinement for Multi-Modal Sentiment Analysis. Computer Engineering and Applications, 2024, 60 (22): 114-125.
  • [47] Wen, Huanglu; You, Shaodi; Fu, Ying. Cross-modal context-gated convolution for multi-modal sentiment analysis. PATTERN RECOGNITION LETTERS, 2021, 146: 252-259.
  • [48] Liang, Bin; Lou, Chenwei; Li, Xiang; Yang, Min; Gui, Lin; He, Yulan; Pei, Wenjie; Xu, Ruifeng. Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 1767-1777.
  • [49] Wang, Wenjun; Liu, Minghao; Chen, Mingkai. CA_DeepSC: Cross-Modal Alignment for Multi-Modal Semantic Communications. IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023: 5871-5876.
  • [50] Wang, Hu; Chen, Yuanhong; Ma, Congbo; Avery, Jodie; Hull, Louise; Carneiro, Gustavo. Multi-modal Learning with Missing Modality via Shared-Specific Feature Modelling. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 15878-15887.