Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion

Cited by: 2
Authors
Lin, Wei-Cheng [1]
Busso, Carlos [1]
Affiliations
[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
Funding
National Science Foundation (USA)
Keywords
Hidden Markov models; Task analysis; Emotion recognition; Feature extraction; Annotations; Speech processing; Databases; Emotion rankers; speech emotion recognition; chunk-level segmentation; sequence-to-sequence modeling; RECOGNITION; CORPUS; RANKING;
DOI
10.1109/TASLP.2023.3244527
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The expression and perception of human emotions are not uniformly distributed over time. Therefore, tracking local changes of emotion within a segment can lead to better models for speech emotion recognition (SER), even when the task is to provide a sentence-level prediction of the emotional content. A challenge to exploring local emotional changes within a sentence is that most existing emotional corpora only provide sentence-level annotations (i.e., one label per sentence). This labeling approach is not appropriate for leveraging the dynamic emotional trends within a sentence. We propose a framework that splits a sentence into a fixed number of chunks, generating chunk-level emotional patterns. The approach relies on emotion rankers to unveil the emotional pattern within a sentence, creating continuous emotional curves. Our approach trains the sentence-level SER model with a sequence-to-sequence formulation by leveraging the retrieved emotional curves. The proposed method achieves the best concordance correlation coefficient (CCC) prediction performance for arousal (0.7120), valence (0.3125), and dominance (0.6324) on the MSP-Podcast corpus. In addition, we validate the approach with experiments on the IEMOCAP and MSP-IMPROV databases. We further compare the retrieved curves with time-continuous emotional traces. The evaluation demonstrates that these retrieved chunk-label curves can effectively capture emotional trends within a sentence, displaying a time-consistency property that is similar to time-continuous traces annotated by human listeners. The proposed SER model learns meaningful, complementary, local information that contributes to the improvement of sentence-level predictions of emotional attributes.
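As a rough illustration of two ideas named in the abstract, the sketch below computes the concordance correlation coefficient (CCC) from its standard definition and splits a sentence into a fixed number of chunks. The abstract does not describe how the chunks are placed, so the evenly spaced, overlapping windows used here are an assumption, and the names `concordance_cc` and `split_into_chunks` are hypothetical; this is not the authors' implementation.

```python
def concordance_cc(pred, gold):
    """Lin's CCC between two equal-length sequences (population statistics)."""
    n = len(pred)
    if n != len(gold) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


def split_into_chunks(num_frames, num_chunks, chunk_len):
    """Return (start, end) frame indices for a fixed number of chunks.

    Assumption for illustration: every chunk has the same length and the
    chunks are spread evenly over the sentence, so the overlap between
    neighbours shrinks or grows with the sentence duration.
    """
    if num_chunks < 2:
        raise ValueError("num_chunks must be at least 2")
    if num_frames < chunk_len:
        raise ValueError("sentence is shorter than one chunk")
    step = (num_frames - chunk_len) / (num_chunks - 1)
    chunks = []
    for i in range(num_chunks):
        start = round(i * step)
        chunks.append((start, start + chunk_len))
    return chunks


if __name__ == "__main__":
    # Toy attribute scores, not values from the paper.
    print(concordance_cc([0.2, 0.5, 0.9], [0.1, 0.6, 0.8]))
    print(split_into_chunks(num_frames=100, num_chunks=5, chunk_len=20))
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and labels, which is why a shifted copy of the labels scores below 1.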
Pages: 1087-1099
Page count: 13