Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Cited by: 0

Authors:

Ahn, Youngdo [1]

Han, Sangwook [1]

Lee, Seonggyu [1]

Shin, Jong Won [1]

Affiliations:

[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea

Source:

SENSORS | 2024 / Volume 24 / Issue 13

Keywords:

speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability; CORPUS;

DOI:

10.3390/s24134111

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Subject classification codes:

070302 ; 081704 ;

Abstract:

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those expressed in the training databases. To construct an SER model robust to unseen corpora, regularization approaches or metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Because annotators may label the emotion based on emotional expression that resides in the conversational context or another modality but is not apparent in the given speech utterance, some of the emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. We therefore propose applying label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora was improved by adopting the proposed loss function with label smoothing on the misclassified data.
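The label-smoothing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `selective_label_smoothing`, the smoothing strength `epsilon`, and the uniform smoothing form `(1 - epsilon) * one_hot + epsilon / K` are assumptions chosen for illustration, and the abstract does not state which values or variant the paper uses. The sketch keeps one-hot targets for samples that a pre-trained model classified correctly and smooths the targets only for the samples it misclassified.

```python
import numpy as np

def selective_label_smoothing(labels, pretrained_preds, num_classes, epsilon=0.1):
    """Build soft targets for training (hypothetical helper).

    Samples whose label matches the pre-trained model's prediction keep a
    one-hot target; samples the pre-trained model misclassified receive a
    uniformly smoothed target. `epsilon` is an assumed hyperparameter.
    """
    labels = np.asarray(labels)
    pretrained_preds = np.asarray(pretrained_preds)
    one_hot = np.eye(num_classes)[labels]
    # Standard uniform label smoothing: (1 - eps) * one_hot + eps / K
    smoothed = (1.0 - epsilon) * one_hot + epsilon / num_classes
    # Column vector so the mask broadcasts across the class dimension
    misclassified = (pretrained_preds != labels)[:, None]
    return np.where(misclassified, smoothed, one_hot)

# Three samples, four emotion classes; only the second sample was
# misclassified by the (assumed) pre-trained model.
targets = selective_label_smoothing([0, 2, 1], [0, 1, 1], num_classes=4, epsilon=0.2)
print(targets)
```

The resulting soft targets would then be used in place of hard labels when computing the training loss, so that samples with possibly unreliable labels pull less strongly toward a single class.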

Pages: 12
