Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion

Cited by: 2

Authors:

Lin, Wei-Cheng [1]

Busso, Carlos [1]

Affiliations:

[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2023 / Vol. 31

Funding:

National Science Foundation (USA);

Keywords:

Hidden Markov models; Task analysis; Emotion recognition; Feature extraction; Annotations; Speech processing; Databases; Emotion rankers; speech emotion recognition; chunk-level segmentation; sequence-to-sequence modeling; RECOGNITION; CORPUS; RANKING;

DOI:

10.1109/TASLP.2023.3244527

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The expression and perception of human emotions are not uniformly distributed over time. Therefore, tracking local changes of emotion within a segment can lead to better models for speech emotion recognition (SER), even when the task is to provide a sentence-level prediction of the emotional content. A challenge to exploring local emotional changes within a sentence is that most existing emotional corpora only provide sentence-level annotations (i.e., one label per sentence). This labeling approach is not appropriate for leveraging the dynamic emotional trends within a sentence. We propose a framework that splits a sentence into a fixed number of chunks, generating chunk-level emotional patterns. The approach relies on emotion rankers to unveil the emotional pattern within a sentence, creating continuous emotional curves. Our approach trains the sentence-level SER model with a sequence-to-sequence formulation by leveraging the retrieved emotional curves. The proposed method achieves the best concordance correlation coefficient (CCC) prediction performance for arousal (0.7120), valence (0.3125), and dominance (0.6324) on the MSP-Podcast corpus. In addition, we validate the approach with experiments on the IEMOCAP and MSP-IMPROV databases. We further compare the retrieved curves with time-continuous emotional traces. The evaluation demonstrates that these retrieved chunk-label curves can effectively capture emotional trends within a sentence, displaying a time-consistency property that is similar to time-continuous traces annotated by human listeners. The proposed SER model learns meaningful, complementary, local information that contributes to the improvement of sentence-level predictions of emotional attributes.
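The two quantitative ideas in the abstract, the concordance correlation coefficient (CCC) and splitting a sentence into a fixed number of chunks, can be illustrated with a minimal sketch. The CCC function follows the standard definition of the metric. The chunking function is only an assumption for illustration: the abstract does not specify how chunk boundaries are chosen, so the sketch spaces the chunk starts evenly and lets the overlap vary with sentence length. The names `concordance_cc` and `split_into_chunks` and the parameters `num_chunks` and `chunk_len` are hypothetical, not taken from the paper.

```python
from statistics import fmean


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov / (var_true + var_pred + (mean_true - mean_pred) ** 2),
    using population (biased) variance and covariance.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = fmean(y_true)
    mean_p = fmean(y_pred)
    var_t = fmean([(t - mean_t) ** 2 for t in y_true])
    var_p = fmean([(p - mean_p) ** 2 for p in y_pred])
    cov = fmean([(t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)])
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2.0 * cov / denom


def split_into_chunks(frames, num_chunks, chunk_len):
    """Split a frame sequence into exactly `num_chunks` windows of `chunk_len` frames.

    Assumption (not from the paper): window starts are spaced evenly between
    the first frame and the last valid start, so overlap adapts to duration.
    """
    total = len(frames)
    if num_chunks < 1 or chunk_len < 1 or total < chunk_len:
        raise ValueError("sequence is too short or parameters are invalid")
    if num_chunks == 1:
        starts = [0]
    else:
        last_start = total - chunk_len
        starts = [round(i * last_start / (num_chunks - 1)) for i in range(num_chunks)]
    return [list(frames[s:s + chunk_len]) for s in starts]


if __name__ == "__main__":
    print(concordance_cc([1, 2, 3], [2, 3, 4]))
    print(split_into_chunks(list(range(10)), num_chunks=4, chunk_len=4))
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and labels: in the usage above, the prediction is perfectly correlated with the target but shifted by one, so the score falls below 1.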


Pages: 1087-1099

Page count: 13
