Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Authors
Ahn, Youngdo [1]
Han, Sangwook [1]
Lee, Seonggyu [1]
Shin, Jong Won [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Keywords
speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability; corpus
DOI
10.3390/s24134111
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora that contain emotional factors different from those expressed in the training databases. To construct an SER model robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Annotators may label the emotion based on emotional expression that resides in the conversational context or another modality but is not apparent in the given speech utterance, so some of the emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. We therefore propose to apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora was improved by adopting the proposed loss function with label smoothing on the misclassified data.
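
The label-smoothing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: the uniform smoothing scheme, the smoothing factor `eps`, and the hypothetical function names `smooth_misclassified` and `cross_entropy`. It shows only the general idea of keeping hard targets where a pre-trained model agrees with the label and softening them where it disagrees; the proposed difficulty-aware loss is not reproduced.

```python
import math


def smooth_misclassified(labels, pretrained_preds, num_classes, eps=0.1):
    """Build training targets, one row per sample.

    Assumption: samples the pre-trained model classifies correctly keep a
    one-hot target; misclassified samples get a uniformly smoothed target.
    """
    targets = []
    for label, pred in zip(labels, pretrained_preds):
        if pred == label:
            # Label treated as reliable: plain one-hot target.
            row = [1.0 if c == label else 0.0 for c in range(num_classes)]
        else:
            # Label treated as possibly unreliable: spread eps over all classes.
            off = eps / num_classes
            row = [1.0 - eps + off if c == label else off
                   for c in range(num_classes)]
        targets.append(row)
    return targets


def cross_entropy(logits, targets):
    """Mean cross-entropy between soft targets and softmax(logits)."""
    total = 0.0
    for z, t in zip(logits, targets):
        m = max(z)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        total -= sum(ti * (zi - log_norm) for zi, ti in zip(z, t))
    return total / len(logits)


# Example: 4 emotion classes; the second sample is misclassified
# by the (hypothetical) pre-trained model, so only its target is smoothed.
targets = smooth_misclassified([0, 2], [0, 1], num_classes=4, eps=0.2)
loss = cross_entropy([[2.0, 0.1, 0.1, 0.1], [0.1, 1.5, 1.0, 0.1]], targets)
```

In this sketch the smoothed target still sums to one, so it can be passed to any cross-entropy-style loss that accepts soft labels.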
Pages: 12