Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Times cited: 0

Authors:

Ahn, Youngdo [1]

Han, Sangwook [1]

Lee, Seonggyu [1]

Shin, Jong Won [1]

Affiliation:

[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea

Source:

SENSORS | 2024 / Vol. 24 / Issue 13

Keywords:

speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability; CORPUS

DOI:

10.3390/s24134111

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry]

Discipline classification codes:

070302; 081704

Abstract:

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora that contain emotional factors different from those expressed in the training databases. To construct SER models that are robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that assigns higher gradients to the samples in a given minibatch whose emotion labels are more difficult to estimate than those of the other samples. Because annotators may label emotions based on expressions that reside in the conversational context or in other modalities but are not apparent in the given speech utterance, some emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. To address this, we propose applying label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that adopting the proposed loss function with label smoothing on the misclassified data improved SER performance on unseen corpora.
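The abstract does not give the loss formulas, so the following is only a rough illustration of the two ideas it describes, written under stated assumptions. It uses a log-sum-exp (soft-maximum) aggregation over a minibatch, similar in spirit to the log-sum-exp terms in the Proxy-Anchor loss, whose gradient with respect to each per-sample loss is a softmax weight, so samples that are harder relative to the rest of the batch receive larger gradients. It also applies label smoothing only to samples that a pre-trained model misclassifies. The function names, `epsilon`, `alpha`, and the toy probabilities are hypothetical; this is not the authors' loss function.

```python
import math
from typing import List, Sequence, Tuple


def smoothed_targets(labels: Sequence[int], pretrained_preds: Sequence[int],
                     num_classes: int, epsilon: float = 0.1) -> List[List[float]]:
    """One-hot targets, smoothed only where a pre-trained model disagrees with the label.

    epsilon (smoothing strength) is a hypothetical value, not taken from the paper.
    """
    targets = []
    for y, y_hat in zip(labels, pretrained_preds):
        onehot = [1.0 if k == y else 0.0 for k in range(num_classes)]
        if y_hat != y:  # misclassified: treat the label as less reliable
            onehot = [(1.0 - epsilon) * t + epsilon / num_classes for t in onehot]
        targets.append(onehot)
    return targets


def cross_entropy(probs: Sequence[float], target: Sequence[float]) -> float:
    """Cross-entropy between a (possibly soft) target and predicted class probabilities."""
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))


def logsumexp_batch_loss(losses: Sequence[float], alpha: float = 1.0) -> Tuple[float, List[float]]:
    """Soft maximum over a minibatch: (1/alpha) * log(sum_i exp(alpha * l_i)).

    Also returns d(loss)/d(l_i) = softmax(alpha * l)_i, so a sample's gradient
    weight depends on how hard it is relative to the other samples in the batch.
    """
    m = max(losses)  # subtract the max for numerical stability
    exps = [math.exp(alpha * (l - m)) for l in losses]
    total = sum(exps)
    loss = m + math.log(total) / alpha
    grads = [e / total for e in exps]
    return loss, grads


# Toy usage with made-up numbers: sample 1 is misclassified by the pre-trained model.
labels = [0, 1, 2]
pretrained_preds = [0, 2, 2]
probs = [[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]

targets = smoothed_targets(labels, pretrained_preds, num_classes=3, epsilon=0.1)
per_sample = [cross_entropy(p, t) for p, t in zip(probs, targets)]
loss, grads = logsumexp_batch_loss(per_sample, alpha=2.0)
# Sample 1 has the largest per-sample loss, so it receives the largest gradient weight.
```

Raising `alpha` concentrates the gradient on the hardest samples in the batch, while `alpha` near zero approaches uniform weighting; how the paper balances this, and how it combines the smoothed targets with its Proxy-Anchor-inspired loss, is not specified in this record.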


Pages: 12
