Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion

Cited by: 2
Authors
Lin, Wei-Cheng [1]
Busso, Carlos [1]
Affiliations
[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
Funding
National Science Foundation (USA)
Keywords
Hidden Markov models; Task analysis; Emotion recognition; Feature extraction; Annotations; Speech processing; Databases; Emotion rankers; speech emotion recognition; chunk-level segmentation; sequence-to-sequence modeling; RECOGNITION; CORPUS; RANKING
DOI
10.1109/TASLP.2023.3244527
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The expression and perception of human emotions are not uniformly distributed over time. Therefore, tracking local changes of emotion within a segment can lead to better models for speech emotion recognition (SER), even when the task is to provide a sentence-level prediction of the emotional content. A challenge to exploring local emotional changes within a sentence is that most existing emotional corpora only provide sentence-level annotations (i.e., one label per sentence). This labeling approach is not appropriate for leveraging the dynamic emotional trends within a sentence. We propose a framework that splits a sentence into a fixed number of chunks, generating chunk-level emotional patterns. The approach relies on emotion rankers to unveil the emotional pattern within a sentence, creating continuous emotional curves. Our approach trains the sentence-level SER model with a sequence-to-sequence formulation by leveraging the retrieved emotional curves. The proposed method achieves the best concordance correlation coefficient (CCC) prediction performance for arousal (0.7120), valence (0.3125), and dominance (0.6324) on the MSP-Podcast corpus. In addition, we validate the approach with experiments on the IEMOCAP and MSP-IMPROV databases. We further compare the retrieved curves with time-continuous emotional traces. The evaluation demonstrates that these retrieved chunk-label curves can effectively capture emotional trends within a sentence, displaying a time-consistency property that is similar to time-continuous traces annotated by human listeners. The proposed SER model learns meaningful, complementary, local information that contributes to the improvement of sentence-level predictions of emotional attributes.
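The abstract reports results as concordance correlation coefficients (CCC) and describes splitting each sentence into a fixed number of chunks. The following is a minimal Python sketch of both ideas, not the authors' implementation. The function names `concordance_cc` and `split_into_chunks` are hypothetical, and the even, non-overlapping split is an assumption; the record does not say how chunk boundaries are chosen or whether chunks overlap.

```python
import numpy as np


def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two 1-D sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean(pred) - mean(gold))**2),
    using population (biased) variance and covariance.
    """
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    cov = np.mean((pred - mean_p) * (gold - mean_g))
    return 2.0 * cov / (pred.var() + gold.var() + (mean_p - mean_g) ** 2)


def split_into_chunks(frames, num_chunks):
    """Split a frame sequence into a fixed number of chunks.

    Assumption: chunks are contiguous, non-overlapping, and as equal in
    length as possible. The source does not specify the actual scheme.
    """
    return np.array_split(np.asarray(frames), num_chunks)


if __name__ == "__main__":
    # Identical sequences agree perfectly; a constant offset lowers the CCC
    # even though the Pearson correlation would still be 1.
    print(concordance_cc([1, 2, 3], [1, 2, 3]))
    print(concordance_cc([1, 2, 3], [2, 3, 4]))
    # Ten frames split into three chunks.
    print([chunk.tolist() for chunk in split_into_chunks(range(10), 3)])
```

Unlike the Pearson correlation, the CCC penalizes differences in mean and scale between predictions and labels, which is one reason it is commonly used to score predictions of continuous emotional attributes such as arousal, valence, and dominance.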
Pages: 1087-1099
Number of pages: 13