Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Times cited: 0

Authors:

Ahn, Youngdo [1]

Han, Sangwook [1]

Lee, Seonggyu [1]

Shin, Jong Won [1]

Affiliation:

[1] Gwangju Institute of Science and Technology, School of Electrical Engineering and Computer Science, Gwangju 61005, South Korea

Source:

SENSORS | 2024 / Vol. 24 / No. 13

Keywords:

speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability; CORPUS;

DOI:

10.3390/s24134111

Chinese Library Classification (CLC) number:

O65 [Analytical chemistry]

Discipline codes:

070302; 081704

Abstract:

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora that contain emotional factors different from those expressed in the training databases. To build an SER model that is robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and the labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that assigns larger gradients to the samples in a minibatch whose emotion labels are more difficult to estimate than those of the other samples in that minibatch. Because annotators may label an emotion based on emotional expressions that reside in the conversational context or in another modality but are not apparent in the given speech utterance, some emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. We therefore propose applying label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that adopting the proposed loss function with label smoothing on the misclassified data improved SER performance on unseen corpora.
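The abstract does not give the exact form of the loss, so the following is only a minimal sketch of the two ideas it describes, assuming a softmax classifier over K emotion classes. It mirrors the positive term of the Proxy-Anchor loss (a log-sum-exp over the samples of each class with a scale `alpha` and margin `delta`), but replaces proxy similarity with a hypothetical agreement score s_i between the softmax output and the target. It also applies label smoothing only to samples that a pre-trained model misclassifies. All function names, the score s_i, and the values of `alpha`, `delta`, and `epsilon` are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_misclassified_labels(
    labels: Sequence[int],
    pretrained_preds: Sequence[int],
    num_classes: int,
    epsilon: float = 0.1,  # hypothetical smoothing strength, not from the paper
) -> List[List[float]]:
    """Build one-hot targets, smoothing only the samples whose label
    disagrees with the pre-trained model's prediction."""
    targets = []
    for y, y_hat in zip(labels, pretrained_preds):
        eps = epsilon if y != y_hat else 0.0
        t = [eps / num_classes] * num_classes
        t[y] += 1.0 - eps
        targets.append(t)
    return targets


def class_loss(scores: Sequence[float], alpha: float = 16.0, delta: float = 0.5) -> float:
    """Proxy-Anchor-style positive term for the samples of one emotion class:
    log(1 + sum_i exp(-alpha * (s_i - delta))). alpha and delta are hypothetical."""
    return math.log1p(sum(math.exp(-alpha * (s - delta)) for s in scores))


def gradient_magnitudes(scores: Sequence[float], alpha: float = 16.0, delta: float = 0.5) -> List[float]:
    """|dL/ds_i| = alpha * e_i / (1 + sum_j e_j), with e_i = exp(-alpha * (s_i - delta)).
    A lower s_i (harder relative to the other samples of its class) gives a larger value."""
    e = [math.exp(-alpha * (s - delta)) for s in scores]
    denom = 1.0 + sum(e)
    return [alpha * ei / denom for ei in e]


def relative_difficulty_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    labels: Sequence[int],
    alpha: float = 16.0,
    delta: float = 0.5,
) -> float:
    """Average class_loss over the emotion classes present in the minibatch.
    s_i = sum_k t_ik * p_ik is the (assumed) agreement between the softmax
    output and the possibly smoothed target."""
    per_class: Dict[int, List[float]] = {}
    for z, t, y in zip(logits, targets, labels):
        p = softmax(z)
        s = sum(tk * pk for tk, pk in zip(t, p))
        per_class.setdefault(y, []).append(s)
    losses = [class_loss(scores, alpha, delta) for scores in per_class.values()]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    labels = [0, 0, 1]
    pretrained_preds = [0, 1, 1]  # the second sample is misclassified by the pre-trained model
    targets = smooth_misclassified_labels(labels, pretrained_preds, num_classes=2)
    logits = [[2.0, 0.0], [0.5, 0.0], [0.0, 3.0]]
    print(relative_difficulty_loss(logits, targets, labels))
    # The harder sample (s = 0.3) receives a larger gradient than the easier one (s = 0.9).
    print(gradient_magnitudes([0.9, 0.3]))
```

With a one-hot target, s_i is simply the predicted probability of the labeled emotion. For a smoothed target it becomes (1 - ε)p_iy + ε/K, so the dependence on the possibly unreliable label is damped by the factor (1 - ε), which is one way to soften the larger gradients that the log-sum-exp would otherwise route to such samples.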

Pages: 12