Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion

Cited by: 2
Authors
Lin, Wei-Cheng [1]
Busso, Carlos [1]
Affiliations
[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
Funding
National Science Foundation (USA)
Keywords
Hidden Markov models; Task analysis; Emotion recognition; Feature extraction; Annotations; Speech processing; Databases; Emotion rankers; speech emotion recognition; chunk-level segmentation; sequence-to-sequence modeling; RECOGNITION; CORPUS; RANKING;
DOI
10.1109/TASLP.2023.3244527
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The expression and perception of human emotions are not uniformly distributed over time. Therefore, tracking local changes of emotion within a segment can lead to better models for speech emotion recognition (SER), even when the task is to provide a sentence-level prediction of the emotional content. A challenge to exploring local emotional changes within a sentence is that most existing emotional corpora only provide sentence-level annotations (i.e., one label per sentence). This labeling approach is not appropriate for leveraging the dynamic emotional trends within a sentence. We propose a framework that splits a sentence into a fixed number of chunks, generating chunk-level emotional patterns. The approach relies on emotion rankers to unveil the emotional pattern within a sentence, creating continuous emotional curves. Our approach trains the sentence-level SER model with a sequence-to-sequence formulation by leveraging the retrieved emotional curves. The proposed method achieves the best concordance correlation coefficient (CCC) prediction performance for arousal (0.7120), valence (0.3125), and dominance (0.6324) on the MSP-Podcast corpus. In addition, we validate the approach with experiments on the IEMOCAP and MSP-IMPROV databases. We further compare the retrieved curves with time-continuous emotional traces. The evaluation demonstrates that these retrieved chunk-label curves can effectively capture emotional trends within a sentence, displaying a time-consistency property that is similar to time-continuous traces annotated by human listeners. The proposed SER model learns meaningful, complementary, local information that contributes to the improvement of sentence-level predictions of emotional attributes.
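Two ideas in the abstract can be illustrated with a minimal sketch: splitting a sentence into a fixed number of chunks, and scoring predictions with the concordance correlation coefficient (CCC). The function names `split_into_chunks` and `ccc` are hypothetical, and the non-overlapping, near-equal split is an assumption made for illustration; the abstract does not specify the authors' segmentation scheme, so this should not be read as their implementation. The CCC follows the standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²).

```python
import numpy as np


def split_into_chunks(frames, num_chunks):
    """Split a (T, D) array of frame-level features into `num_chunks`
    contiguous, non-overlapping pieces of nearly equal length.

    Assumption: the paper's actual chunking may differ (for example,
    it could use overlapping windows); this only shows that a sentence
    of any duration maps to the same number of chunks.
    """
    return np.array_split(np.asarray(frames), num_chunks, axis=0)


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences,
    using population (biased) variance and covariance."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # A hypothetical sentence with 103 frames of 40-dimensional features.
    sentence = np.zeros((103, 40))
    chunks = split_into_chunks(sentence, num_chunks=11)
    print(len(chunks), [c.shape[0] for c in chunks])

    # CCC penalizes a constant offset even when correlation is perfect.
    print(ccc([1, 2, 3], [1, 2, 3]))
    print(ccc([1, 2, 3], [2, 3, 4]))
```

Unlike Pearson correlation, the CCC drops below 1 when predictions are shifted or rescaled relative to the labels, which is why it is a common choice for scoring arousal, valence, and dominance predictions.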
Pages: 1087-1099
Page count: 13