Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Cited by: 0
Authors
Ahn, Youngdo [1]
Han, Sangwook [1]
Lee, Seonggyu [1]
Shin, Jong Won [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Keywords
speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability; CORPUS;
DOI
10.3390/s24134111
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora that contain emotional factors different from those expressed in the training databases. Regularization approaches and metric losses have been studied as ways to construct an SER model that is robust to unseen corpora. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Annotators may label the emotion based on an emotional expression that resides in the conversational context or in another modality but is not apparent in the given speech utterance. Some of the emotion labels may therefore be unreliable, and these unreliable labels may affect the proposed loss function more severely. To address this, we propose applying label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora improved when the proposed loss function was adopted together with label smoothing on the misclassified data.
Pages: 12
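The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the paper's loss function, which the abstract does not specify. The first function uses the positive term of the original Proxy-Anchor loss, log(1 + sum_i exp(-alpha * (s_i - delta))), only to show how a log-sum-exp over a minibatch gives larger gradients to relatively harder samples. The second applies standard label smoothing only to samples that a pre-trained model misclassified. The function names, the values of `alpha`, `delta`, and `epsilon`, and the toy inputs are all assumptions made for illustration.

```python
import math


def relative_difficulty_weights(similarities, alpha=32.0, delta=0.1):
    """Gradient magnitude (divided by alpha) of the Proxy-Anchor positive term
    log(1 + sum_i exp(-alpha * (s_i - delta))) with respect to each s_i.

    similarities: similarity of each minibatch sample to its class proxy.
    A lower similarity means a harder sample, which receives a larger weight
    relative to the other samples in the same minibatch.
    alpha and delta are illustrative values, not the paper's settings.
    """
    exps = [math.exp(-alpha * (s - delta)) for s in similarities]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps]


def smooth_misclassified(labels, predictions, num_classes, epsilon=0.1):
    """Build soft targets: one-hot where the pre-trained model agrees with
    the label, label-smoothed where it disagrees.

    labels: annotated class index per sample.
    predictions: class index predicted by a (hypothetical) pre-trained model.
    epsilon: smoothing strength, an assumed value.
    """
    targets = []
    for y, y_hat in zip(labels, predictions):
        if y == y_hat:
            # Label treated as reliable: keep the hard one-hot target.
            target = [0.0] * num_classes
            target[y] = 1.0
        else:
            # Label treated as possibly unreliable: spread epsilon uniformly.
            target = [epsilon / num_classes] * num_classes
            target[y] += 1.0 - epsilon
        targets.append(target)
    return targets


if __name__ == "__main__":
    # Toy minibatch: the second sample is far from its proxy (harder).
    print(relative_difficulty_weights([0.9, 0.2]))
    # Toy labels: the second sample is misclassified by the pre-trained model.
    print(smooth_misclassified([0, 1], [0, 2], num_classes=4, epsilon=0.2))
```

In this sketch the smoothed target still sums to one, so it can be used directly with a cross-entropy loss over soft targets.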