Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality

Cited by: 7
Authors
Wang, Hu [1 ]
Ma, Congbo [1 ]
Zhang, Jianpeng [2 ]
Zhang, Yuan [1 ]
Avery, Jodie [1 ]
Hull, Louise [1 ]
Carneiro, Gustavo [3 ]
Affiliations
[1] Univ Adelaide, Adelaide, SA, Australia
[2] Alibaba DAMO Acad, Hangzhou, Peoples R China
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Keywords
Missing modality issue; Multi-modal learning; Learnable cross-modal knowledge distillation; Segmentation
DOI
10.1007/978-3-031-43901-8_21
CLC Number
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
The problem of missing modalities is both critical and nontrivial to handle in multi-modal models. In multi-modal tasks it is common for certain modalities to contribute more than others, and when those important modalities are missing, model performance drops significantly. This fact remains unexplored by current multi-modal approaches, which recover the representation of missing modalities via feature reconstruction or blind feature aggregation from the other modalities, rather than extracting useful information from the best-performing modalities. In this paper, we propose a Learnable Cross-modal Knowledge Distillation (LCKD) model that adaptively identifies important modalities and distils knowledge from them to the other modalities, addressing the missing modality issue from a cross-modal perspective. Our approach introduces a teacher election procedure that selects the most "qualified" teachers based on their single-modality performance on each task. Cross-modal knowledge distillation is then performed between teacher and student modalities for each task, pushing the model parameters to a point that is beneficial for all tasks. Hence, even if the teacher modalities for certain tasks are missing during testing, the available student modalities can accomplish the task well enough based on the knowledge learned from their automatically elected teacher modalities. Experiments on the Brain Tumour Segmentation Dataset 2018 (BraTS2018) show that LCKD outperforms other methods by a considerable margin, improving the state-of-the-art segmentation Dice score by 3.61% for enhancing tumour, 5.99% for tumour core, and 3.76% for whole tumour.
Pages: 216-226
Number of pages: 11
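
A minimal sketch, in PyTorch, of the two-stage procedure the abstract describes: (1) electing a teacher modality per task from single-modality validation performance, and (2) distilling each elected teacher's representation into the remaining student modalities. All names, tensor shapes, validation scores, and the choice of an L2 feature-matching loss are illustrative assumptions, not the authors' implementation.

# Sketch of the LCKD idea from the abstract (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def elect_teachers(val_scores):
    # Teacher election: for each task, pick the modality with the best
    # single-modality validation score (e.g., Dice).
    # val_scores: {task: {modality: score}} -> {task: teacher_modality}
    return {task: max(scores, key=scores.get)
            for task, scores in val_scores.items()}

def cross_modal_kd_loss(features, teachers, lam=1.0):
    # Distil each elected teacher's features into every other (student)
    # modality for its task, here via an L2 feature-matching loss.
    # features: {modality: tensor of shape [B, C, D, H, W]}
    loss = next(iter(features.values())).new_zeros(())
    for task, t_mod in teachers.items():
        t_feat = features[t_mod].detach()  # teacher serves as a fixed target
        for s_mod, s_feat in features.items():
            if s_mod != t_mod:
                loss = loss + F.mse_loss(s_feat, t_feat)
    return lam * loss

# Toy usage with the four BraTS MRI modalities and three tumour tasks
# (the validation scores below are made up for illustration).
feats = {m: torch.randn(2, 16, 8, 8, 8, requires_grad=True)
         for m in ("flair", "t1", "t1ce", "t2")}
val_scores = {
    "enhancing_tumour": {"flair": 0.55, "t1": 0.50, "t1ce": 0.78, "t2": 0.52},
    "tumour_core":      {"flair": 0.60, "t1": 0.58, "t1ce": 0.80, "t2": 0.61},
    "whole_tumour":     {"flair": 0.85, "t1": 0.70, "t1ce": 0.72, "t2": 0.82},
}
teachers = elect_teachers(val_scores)      # e.g., t1ce for ET/TC, flair for WT
kd = cross_modal_kd_loss(feats, teachers)  # scalar distillation loss
kd.backward()  # teacher targets are detached, so gradients flow via student roles

Because a modality can be a teacher for one task and a student for another, every modality is still trained to absorb the elected teachers' knowledge, which is what lets the remaining modalities cope when a teacher modality is missing at test time.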