Speech Emotion Recognition via Multi-Level Cross-Modal Distillation

Cited by: 4
Authors
Li, Ruichen [1 ]
Zhao, Jinming [1 ]
Jin, Qin [1 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China; National Key R&D Program of China
Keywords
speech emotion recognition; cross-modal transfer; pretraining;
DOI
10.21437/Interspeech.2021-785
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Classification Codes
100104 ; 100213 ;
Abstract
Speech emotion recognition faces the problem that most of the existing speech corpora are limited in scale and diversity due to the high annotation cost and label ambiguity. In this work, we explore the task of learning robust speech emotion representations based on large unlabeled speech data. Under a simple assumption that the internal emotional states across different modalities are similar, we propose a method called Multi-level Cross-modal Emotion Distillation (MCED), which trains the speech emotion model without any labeled speech emotion data by transferring emotion knowledge from a pretrained text emotion model. Extensive experiments on two benchmark datasets, IEMOCAP and MELD, show that our proposed MCED can help learn effective speech emotion representations which generalize well on downstream speech emotion recognition tasks.
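The core idea described in the abstract, training a speech (student) model to match the emotion predictions of a pretrained text (teacher) model on paired data, can be illustrated with a standard knowledge-distillation objective. The sketch below is a minimal, hypothetical illustration of such a cross-modal distillation loss, not the authors' MCED implementation; the function names, the use of KL divergence, and the temperature value are assumptions on my part.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened emotion distributions.

    Hypothetical sketch of a cross-modal distillation objective: the text
    teacher's predicted emotion distribution for an utterance's transcript
    serves as a soft target for the speech student's prediction on the
    corresponding audio, so no labeled speech emotion data is needed.
    """
    t = softmax([x / temperature for x in teacher_logits])
    s = softmax([x / temperature for x in student_logits])
    # KL divergence is zero when the two distributions agree exactly,
    # and positive otherwise.
    return sum(p * math.log(p / q) for p, q in zip(t, s))
```

When the student exactly reproduces the teacher's logits the loss is zero; training would minimize this quantity over unlabeled paired speech-text data.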
Pages: 4488-4492 (5 pages)
Related Papers (50 in total)
  • [1] Gong, Peizhu; Liu, Jin; Wu, Zhongdai; Han, Bing; Wang, Y. Ken; He, Huihua. A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74(2): 4203-4220.
  • [2] Chen, Lijiang; Ren, Jie; Mao, Xia; Zhao, Qi. Electroglottograph-Based Speech Emotion Recognition via Cross-Modal Distillation. APPLIED SCIENCES-BASEL, 2022, 12(9).
  • [3] Yang, Dingkang; Huang, Shuai; Liu, Yang; Zhang, Lihua. Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 2093-2097.
  • [4] Zhang, Xiaoheng; Cui, Weigang; Hu, Bin; Li, Yang. A Multi-Level Alignment and Cross-Modal Unified Semantic Graph Refinement Network for Conversational Emotion Recognition. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15(3): 1553-1566.
  • [5] Liu, Ke; Wang, Dekui; Wu, Dongya; Liu, Yutao; Feng, Jun. Speech Emotion Recognition via Multi-Level Attention Network. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 2278-2282.
  • [6] Albanie, Samuel; Nagrani, Arsha; Vedaldi, Andrea; Zisserman, Andrew. Emotion Recognition in Speech using Cross-Modal Transfer in the Wild. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 292-301.
  • [7] Bano, Saira; Tonellotto, Nicola; Cassara, Pietro; Gotta, Alberto. FedCMD: A Federated Cross-modal Knowledge Distillation for Drivers' Emotion Recognition. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15(3).
  • [8] Wen, Huanglu; You, Shaodi; Fu, Ying. Cross-modal dynamic convolution for multi-modal emotion recognition. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 78.
  • [9] Wang, Jianrong; Tang, Ziyue; Li, Xuewei; Yu, Mei; Fang, Qiang; Liu, Li. Cross-Modal Knowledge Distillation Method for Automatic Cued Speech Recognition. INTERSPEECH 2021, 2021: 2986-2990.
  • [10] Wang, Benhui; Zhang, Huaxiang; Zhu, Lei; Nie, Liqiang; Liu, Li. Multi-level adversarial attention cross-modal hashing. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117.