Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Cited by: 0

Authors:

Ahn, Youngdo [1]

Han, Sangwook [1]

Lee, Seonggyu [1]

Shin, Jong Won [1]

Affiliations:

[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea

Source:

SENSORS | 2024 / Vol. 24 / Issue 13

Keywords:

speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability; CORPUS;

DOI:

10.3390/s24134111

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those expressed in the training databases. To build SER models that are robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Because annotators may label an emotion based on emotional expression that resides in the conversational context or another modality but is not apparent in the given speech utterance, some emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. We therefore propose applying label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora improved when the proposed loss function was adopted with label smoothing on the misclassified data.
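
The label-smoothing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform smoothing form, and the value of `eps` are assumptions chosen for illustration, since the abstract does not specify them. The sketch keeps one-hot targets for samples that a pre-trained model classifies correctly and smooths the targets of misclassified samples only.

```python
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot target vector for the given class index."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def smooth(label: int, num_classes: int, eps: float) -> List[float]:
    """Uniform label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    return [(1.0 - eps) * v + eps / num_classes
            for v in one_hot(label, num_classes)]


def selective_targets(labels: Sequence[int],
                      pretrained_preds: Sequence[int],
                      num_classes: int,
                      eps: float = 0.1) -> List[List[float]]:
    """Build training targets, smoothing only the misclassified samples.

    `pretrained_preds` holds the class predicted by a (hypothetical)
    pre-trained SER model for each sample; `eps` is an assumed value.
    """
    targets = []
    for y, y_hat in zip(labels, pretrained_preds):
        if y == y_hat:
            # The pre-trained model agrees with the annotation: keep it hard.
            targets.append(one_hot(y, num_classes))
        else:
            # Disagreement is treated as a sign of a less reliable label.
            targets.append(smooth(y, num_classes, eps))
    return targets


if __name__ == "__main__":
    # Two samples, four emotion classes; the second one is misclassified.
    for row in selective_targets([0, 2], [0, 1], num_classes=4, eps=0.2):
        print(row)
```

The resulting soft targets would then be used in place of hard labels when computing the training loss, so that a possibly unreliable label contributes a less confident target.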

Pages: 12

