Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition

被引:3
|
作者
Zhao, Yan [1 ,2 ]
Wang, Jincen [3 ]
Ye, Ru [4 ]
Zong, Yuan [1 ]
Zheng, Wenming [1 ]
Zhao, Li [2 ]
机构
[1] Southeast Univ, Minist Educ, Key Lab Child Dev & Learning Sci, Nanjing, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Elect & Informat Engn, Nanjing, Peoples R China
[4] Nanjing Univ Informat Sci & Technol, Changwang Sch Honors, Nanjing, Peoples R China
来源
关键词
Cross-corpus speech emotion recognition; speech emotion recognition; domain adaptation; transfer learning; deep learning; FEATURES; KERNEL;
D O I
10.21437/Interspeech.2022-679
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, we focus on the research of cross-corpus speech emotion recognition (SER), in which the training (source) and testing (target) speech samples come from different corpora leading to a feature distribution gap between them. To solve this problem, we propose a simple yet effective method called deep transductive transfer regression network (DTTRN). The basic idea of DTTRN is to learn a corpus invariant deep neural network to bridge the source and target speech samples and their label information. Following this idea, we make use of a transductive learning way to enforce a deep regressor to build the relationship between the features and emotional labels jointly in both speech corpora. Meanwhile, we also design an emotion guided regularization term for learning DTTRN by aligning source and target speech samples' feature distributions from three different scales. Thus, the DTTRN only absorbing the label information provided by source speech samples is able to correctly predict the emotions of the target ones. To evaluate DTTRN, we conduct extensive cross-corpus SER experiments on EmoDB, CASIA, and eNTERFACE corpora. Experimental results show the superior performance of our DTTRN over recent state-of-the-art deep transfer learning methods in dealing with the cross-corpus SER tasks.
引用
收藏
页码:371 / 375
页数:5
相关论文
共 50 条
  • [1] Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Tang, Chuangao
    Zhang, Jiacheng
    Zong, Yuan
    [J]. ENTROPY, 2022, 24 (08)
  • [2] Cross-Corpus Speech Emotion Recognition Based on Joint Transfer Subspace Learning and Regression
    Zhang, Weijian
    Song, Peng
    Chen, Dongliang
    Sheng, Chao
    Zhang, Wenjing
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (02) : 588 - 598
  • [3] Transferable discriminant linear regression for cross-corpus speech emotion recognition
    Li, Shaokai
    Song, Peng
    Zhang, Wenjing
    [J]. APPLIED ACOUSTICS, 2022, 197
  • [4] A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION
    Milner, Rosanna
    Jalal, Md Asif
    Ng, Raymond W. M.
    Hain, Thomas
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 304 - 311
  • [5] Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives
    Zhang, Shiqing
    Liu, Ruixin
    Tao, Xin
    Zhao, Xiaoming
    [J]. FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [6] Analysis of Deep Learning Architectures for Cross-corpus Speech Emotion Recognition
    Parry, Jack
    Palaz, Dimitri
    Clarke, Georgia
    Lecomte, Pauline
    Mead, Rebecca
    Berger, Michael
    Hofer, Gregor
    [J]. INTERSPEECH 2019, 2019, : 1656 - 1660
  • [7] Transfer Linear Subspace Learning for Cross-Corpus Speech Emotion Recognition
    Song, Peng
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (02) : 265 - 275
  • [8] DOMAIN GENERALIZATION WITH TRIPLET NETWORK FOR CROSS-CORPUS SPEECH EMOTION RECOGNITION
    Lee, Shi-wook
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 389 - 396
  • [9] Transfer Subspace Learning for Unsupervised Cross-Corpus Speech Emotion Recognition
    Liu, Na
    Zhang, Baofeng
    Liu, Bin
    Shi, Jingang
    Yang, Lei
    Li, Zhiwei
    Zhu, Junchao
    [J]. IEEE ACCESS, 2021, 9 : 95925 - 95937
  • [10] Domain Generalization with Triplet Network for Cross-Corpus Speech Emotion Recognition
    Lee, Shi-Wook
    [J]. 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings, 2021, : 389 - 396