Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality

Cited by: 7
Authors
Wang, Hu [1]
Ma, Congbo [1]
Zhang, Jianpeng [2]
Zhang, Yuan [1]
Avery, Jodie [1]
Hull, Louise [1]
Carneiro, Gustavo [3]
Affiliations
[1] Univ Adelaide, Adelaide, SA, Australia
[2] Alibaba DAMO Acad, Hangzhou, Peoples R China
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Keywords
Missing modality issue; Multi-modal learning; Learnable cross-modal knowledge distillation; Segmentation
DOI
10.1007/978-3-031-43901-8_21
Chinese Library Classification
TP31 [Computer Software]
Discipline Classification Codes
081202; 0835
Abstract
The problem of missing modalities is both critical and nontrivial to handle in multi-modal models. In multi-modal tasks, it is common for certain modalities to contribute more than others, and if those important modalities are missing, model performance drops significantly. This fact remains unexplored by current multi-modal approaches, which recover the representation of missing modalities via feature reconstruction or blind feature aggregation from the other modalities, rather than extracting useful information from the best-performing modalities. In this paper, we propose a Learnable Cross-modal Knowledge Distillation (LCKD) model that adaptively identifies important modalities and distils knowledge from them to the other modalities, addressing the missing modality issue from a cross-modal perspective. Our approach introduces a teacher election procedure that selects the most "qualified" teachers based on their single-modality performance on each task. Cross-modal knowledge distillation is then performed between teacher and student modalities for each task, pushing the model parameters to a point that is beneficial for all tasks. Hence, even if the teacher modalities for certain tasks are missing during testing, the available student modalities can accomplish the task well enough based on the knowledge learned from their automatically elected teacher modalities. Experiments on the Brain Tumour Segmentation Dataset 2018 (BraTS2018) show that LCKD outperforms other methods by a considerable margin, improving the state-of-the-art segmentation Dice score by 3.61% for enhancing tumour, 5.99% for tumour core, and 3.76% for whole tumour.
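The abstract describes two mechanisms: a teacher election step that picks the best-performing single-modality networks per task, and a cross-modal distillation step from the elected teachers to the remaining (student) modalities. Below is a minimal illustrative sketch, not the authors' implementation: the threshold-based election rule, the MSE feature-matching loss, and all function names are assumptions made for illustration.

import torch
import torch.nn.functional as F

def elect_teachers(val_scores, threshold=0.8):
    # Election rule (assumed): a modality becomes a teacher for a task if its
    # single-modality validation score (e.g. Dice) reaches the threshold.
    return [m for m, s in val_scores.items() if s >= threshold]

def cross_modal_kd_loss(features, teachers):
    # Distillation (assumed form): pull each student modality's features
    # towards every elected teacher's features; teacher features are
    # detached so knowledge flows from teacher to student only.
    students = [m for m in features if m not in teachers]
    loss = torch.zeros(())
    for t in teachers:
        for s in students:
            loss = loss + F.mse_loss(features[s], features[t].detach())
    return loss

# Example with the four BraTS MRI modalities (scores are made up):
feats = {m: torch.randn(2, 64) for m in ("flair", "t1", "t1ce", "t2")}
val_dice = {"flair": 0.81, "t1": 0.62, "t1ce": 0.85, "t2": 0.74}
teachers = elect_teachers(val_dice)            # -> ["flair", "t1ce"]
kd_term = cross_modal_kd_loss(feats, teachers) # added to the task loss

In the full model the distillation is performed per task, so the elected teacher set can differ across the three tumour sub-regions; the single shared threshold above is a simplification.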
Pages: 216-226
Page count: 11
Related Papers
50 in total
  • [41] Wang, Yubin; Xia, Hongbin; Liu, Yuan. CMC-MMR: multi-modal recommendation model with cross-modal correction. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (05): 1187-1211.
  • [42] Zhou, Qingguo; Hou, Yufeng; Zhou, Rui; Li, Yan; Wang, Jinqiang; Wu, Zhen; Li, Hung-Wei; Weng, Tien-Hsiung. Cross-modal learning with multi-modal model for video action recognition based on adaptive weight training. CONNECTION SCIENCE, 2024, 36 (01).
  • [43] Dou, Qi; Liu, Quande; Heng, Pheng Ann; Glocker, Ben. Unpaired Multi-Modal Segmentation via Knowledge Distillation. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (07): 2415-2425.
  • [44] Yang, Yanwu; Ye, Chenfei; Guo, Xutao; Wu, Tao; Xiang, Yang; Ma, Ting. Mapping Multi-Modal Brain Connectome for Brain Disorder Diagnosis via Cross-Modal Mutual Learning. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (01): 108-121.
  • [45] Yu, Tan; Yang, Yi; Li, Yi; Liu, Lin; Sun, Mingming; Li, Ping. Multi-modal Dictionary BERT for Cross-modal Video Search in Baidu Advertising. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021: 4341-4351.
  • [46] Ding, Meirong; Chen, Hongye; Zeng, Biqing. Cross-Modal Semantic Alignment and Information Refinement for Multi-Modal Sentiment Analysis. Computer Engineering and Applications, 2024, 60 (22): 114-125.
  • [47] Wen, Huanglu; You, Shaodi; Fu, Ying. Cross-modal context-gated convolution for multi-modal sentiment analysis. PATTERN RECOGNITION LETTERS, 2021, 146: 252-259.
  • [48] Liang, Bin; Lou, Chenwei; Li, Xiang; Yang, Min; Gui, Lin; He, Yulan; Pei, Wenjie; Xu, Ruifeng. Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 1767-1777.
  • [49] Wang, Wenjun; Liu, Minghao; Chen, Mingkai. CA_DeepSC: Cross-Modal Alignment for Multi-Modal Semantic Communications. IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023: 5871-5876.
  • [50] Wang, Hu; Chen, Yuanhong; Ma, Congbo; Avery, Jodie; Hull, Louise; Carneiro, Gustavo. Multi-modal Learning with Missing Modality via Shared-Specific Feature Modelling. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 15878-15887.