Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Authors
Ahn, Youngdo [1]
Han, Sangwook [1]
Lee, Seonggyu [1]
Shin, Jong Won [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Keywords
speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability; CORPUS;
DOI
10.3390/s24134111
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those expressed in the training databases. To construct an SER model that is robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Because annotators may label an emotion based on emotional expression that resides in the conversational context or another modality but is not apparent in the given speech utterance, some of the emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. We therefore propose to apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora was improved by adopting the proposed loss function with label smoothing on the misclassified data.
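The label-smoothing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smooth_targets`, the uniform-mixture form of smoothing, and the value of `eps` are assumptions made for illustration only. The abstract states only that label smoothing is applied to samples misclassified by a pre-trained SER model.

```python
def smooth_targets(labels, pretrained_preds, num_classes, eps=0.1):
    """Build one target distribution per training sample.

    Assumption: a sample counts as "misclassified" when the pre-trained
    model's predicted class differs from the annotated label. Such samples
    get a smoothed target (one-hot mixed with a uniform distribution);
    all other samples keep a plain one-hot target.
    """
    targets = []
    for label, pred in zip(labels, pretrained_preds):
        target = [1.0 if c == label else 0.0 for c in range(num_classes)]
        if pred != label:
            # Hypothetical smoothing rule: (1 - eps) * one_hot + eps / K.
            target = [(1.0 - eps) * p + eps / num_classes for p in target]
        targets.append(target)
    return targets


# Three samples, four emotion classes; only the second sample is
# misclassified by the (hypothetical) pre-trained model.
labels = [0, 2, 1]
pretrained_preds = [0, 1, 1]
targets = smooth_targets(labels, pretrained_preds, num_classes=4, eps=0.2)
```

Under these assumptions, only the second sample's target is softened, so a possibly unreliable label contributes a less confident training signal while the other samples are unaffected.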
Pages: 12