Speech Emotion Recognition via Multi-Level Cross-Modal Distillation

Cited by: 4
Authors
Li, Ruichen [1 ]
Zhao, Jinming [1 ]
Jin, Qin [1 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Source
INTERSPEECH 2021
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
speech emotion recognition; cross-modal transfer; pretraining;
DOI
10.21437/Interspeech.2021-785
CLC Classification
R36 [Pathology]; R76 [Otolaryngology];
Subject Classification
100104; 100213;
Abstract
Speech emotion recognition faces the problem that most of the existing speech corpora are limited in scale and diversity due to the high annotation cost and label ambiguity. In this work, we explore the task of learning robust speech emotion representations based on large unlabeled speech data. Under a simple assumption that the internal emotional states across different modalities are similar, we propose a method called Multi-level Cross-modal Emotion Distillation (MCED), which trains the speech emotion model without any labeled speech emotion data by transferring emotion knowledge from a pretrained text emotion model. Extensive experiments on two benchmark datasets, IEMOCAP and MELD, show that our proposed MCED can help learn effective speech emotion representations which generalize well on downstream speech emotion recognition tasks.
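The distillation idea described in the abstract — training a speech student to match a pretrained text teacher at more than one level — can be sketched as follows. This is a minimal illustration under assumed inputs, not the authors' MCED implementation: the function names, the choice of KL divergence for the prediction level, and MSE for the embedding level are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) between probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def multilevel_distillation_loss(teacher_logits, student_logits,
                                 teacher_feat, student_feat,
                                 T=2.0, alpha=0.5):
    """Combine two distillation levels (hypothetical weighting):
    - prediction level: KL between the teacher's and student's
      temperature-softened emotion distributions;
    - feature level: MSE between intermediate embeddings."""
    pred_loss = kl_divergence(softmax(teacher_logits, T),
                              softmax(student_logits, T)).mean()
    feat_loss = np.mean((np.asarray(teacher_feat) - np.asarray(student_feat)) ** 2)
    return alpha * pred_loss + (1.0 - alpha) * feat_loss
```

With aligned text/speech pairs, no speech emotion labels are needed: the teacher's soft outputs serve as the supervision signal, which is the premise of the "similar internal emotional states across modalities" assumption.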
Pages: 4488-4492
Page count: 5
Related Papers
50 records in total
  • [41] CROSS-MODAL KNOWLEDGE DISTILLATION IN MULTI-MODAL FAKE NEWS DETECTION
    Wei, Zimian
    Pan, Hengyue
    Qiao, Linbo
    Niu, Xin
    Dong, Peijie
    Li, Dongsheng
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4733 - 4737
  • [42] Multispectral Scene Classification via Cross-Modal Knowledge Distillation
    Liu, Hao
    Qu, Ying
    Zhang, Liqiang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [43] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    [J]. INTERSPEECH 2020, 2020, : 364 - 368
  • [44] REPRESENTATION LEARNING THROUGH CROSS-MODAL CONDITIONAL TEACHER-STUDENT TRAINING FOR SPEECH EMOTION RECOGNITION
    Srinivasan, Sundararajan
    Huang, Zhaocheng
    Kirchhoff, Katrin
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6442 - 6446
  • [45] END-TO-END VOICE CONVERSION VIA CROSS-MODAL KNOWLEDGE DISTILLATION FOR DYSARTHRIC SPEECH RECONSTRUCTION
    Wang, Disong
    Yu, Jianwei
    Wu, Xixin
    Liu, Songxiang
    Sun, Lifa
    Liu, Xunying
    Meng, Helen
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7744 - 7748
  • [46] CROSS-MODAL KNOWLEDGE DISTILLATION FOR VISION-TO-SENSOR ACTION RECOGNITION
    Ni, Jianyuan
    Sarbajna, Raunak
    Liu, Yang
    Ngu, Anne H. H.
    Yan, Yan
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4448 - 4452
  • [47] SPEECH EMOTION RECOGNITION WITH CO-ATTENTION BASED MULTI-LEVEL ACOUSTIC INFORMATION
    Zou, Heqing
    Si, Yuke
    Chen, Chen
    Rajan, Deepu
    Chng, Eng Siong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7367 - 7371
  • [48] SELF-SUPERVISED LEARNING WITH CROSS-MODAL TRANSFORMERS FOR EMOTION RECOGNITION
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 381 - 388
  • [49] Multi-level cross-modal interaction network for RGB-D salient object detection
    Huang, Zhou
    Chen, Huai-Xin
    Zhou, Tao
    Yang, Yun-Zhi
    Liu, Bi-Yuan
    [J]. NEUROCOMPUTING, 2021, 452 : 200 - 211
  • [50] Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality
    Wang, Hu
    Ma, Congbo
    Zhang, Jianpeng
    Zhang, Yuan
    Avery, Jodie
    Hull, Louise
    Carneiro, Gustavo
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV, 2023, 14223 : 216 - 226