Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Authors
Ahn, Youngdo [1]
Han, Sangwook [1]
Lee, Seonggyu [1]
Shin, Jong Won [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Keywords
speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability; CORPUS
DOI
10.3390/s24134111
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Emotions in speech are expressed in various ways, so a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those expressed in the training databases. To build SER models that are robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Annotators may label an emotion based on emotional expression that resides in the conversational context or another modality but is not apparent in the given speech utterance. As a result, some emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. To address this, we propose applying label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora was improved by adopting the proposed loss function with label smoothing on the misclassified data.
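The label-smoothing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `selective_label_smoothing`, the smoothing factor `epsilon`, and the choice to spread the smoothed mass evenly over the non-labeled classes are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the general idea of keeping hard one-hot targets where a pre-trained model agrees with the annotation and softening the targets where it disagrees.

```python
def selective_label_smoothing(labels, pretrained_preds, num_classes, epsilon=0.1):
    """Build per-sample training targets (hypothetical helper).

    Samples that the pre-trained model classifies correctly keep a one-hot
    target. Samples it misclassifies get a smoothed target: 1 - epsilon on
    the annotated class and epsilon shared equally by the other classes.
    The value of epsilon is an assumption, not taken from the paper.
    """
    targets = []
    for label, pred in zip(labels, pretrained_preds):
        if pred == label:
            # Label treated as reliable: plain one-hot target.
            row = [1.0 if c == label else 0.0 for c in range(num_classes)]
        else:
            # Label treated as possibly unreliable: smoothed target.
            off_value = epsilon / (num_classes - 1)
            row = [1.0 - epsilon if c == label else off_value
                   for c in range(num_classes)]
        targets.append(row)
    return targets


# Toy example with 4 emotion classes; the second sample is misclassified
# by the (hypothetical) pre-trained model, so only its target is smoothed.
labels = [0, 2, 1]
pretrained_preds = [0, 1, 1]
targets = selective_label_smoothing(labels, pretrained_preds,
                                    num_classes=4, epsilon=0.3)
```

In this toy run, the first and third samples keep one-hot targets, while the second sample's target places most of its mass on the annotated class and the remainder on the other three classes. Each target row still sums to one, so it can be used directly with a cross-entropy style loss.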
Pages: 12