Two-level discriminative speech emotion recognition model with wave field dynamics: A personalized speech emotion recognition method

Cited by: 3
Authors
Jia, Ning [1 ]
Zheng, Chunjun [1 ]
Affiliations
[1] Dalian Neusoft Univ Informat, Sch Software, Dalian, Peoples R China
Keywords
Speech emotion recognition; Speaker classification; Wave field dynamics; Cross medium; Convolutional recurrent neural network; Two-level discriminative model;
DOI
10.1016/j.comcom.2021.09.013
CLC classification
TP [Automation technology; computer technology];
Discipline code
0812 ;
Abstract
Presently available speech emotion recognition (SER) methods generally rely on a single SER model. Achieving higher SER accuracy depends on both the feature extraction method and the model design scheme. However, the generalization performance of such models is typically poor because the emotional features of different speakers can vary substantially. The present work addresses this issue by applying a two-level discriminative model to the SER task. The first level places an individual speaker within a specific speaker group according to the speaker's characteristics. The second level constructs a personalized SER model for each group of speakers using the wave field dynamics model and a dual-channel general SER model. The two levels are fused in an ensemble learning scheme to achieve effective SER classification. The proposed method is demonstrated to provide higher SER accuracy in experiments based on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a custom-built SER corpus. On the IEMOCAP corpus, the proposed model improves recognition accuracy by 7%. On the custom-built SER corpus, both masked and unmasked speakers are employed to demonstrate that the proposed method maintains higher SER accuracy.
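The two-level scheme described in the abstract can be sketched as follows. This is a minimal illustration of the pipeline only, not the authors' implementation: the nearest-centroid speaker grouping, the `alpha` fusion weight, and the probability vectors are all hypothetical stand-ins for the paper's wave-field-dynamics and dual-channel models.

```python
import numpy as np

# IEMOCAP-style emotion label set (illustrative subset)
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def assign_group(speaker_features: np.ndarray, group_centroids: np.ndarray) -> int:
    """Level 1: place a speaker in a group. Here: nearest centroid in a
    toy 2-D speaker-characteristic space (hypothetical stand-in)."""
    dists = np.linalg.norm(group_centroids - speaker_features, axis=1)
    return int(np.argmin(dists))

def fuse_predictions(personal_probs, general_probs, alpha=0.6):
    """Level 2 + fusion: combine the group's personalized model with the
    general SER model by a weighted average of class probabilities."""
    fused = alpha * np.asarray(personal_probs) + (1 - alpha) * np.asarray(general_probs)
    return EMOTIONS[int(np.argmax(fused))], fused

# Toy usage with made-up numbers
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # two speaker groups
speaker = np.array([4.2, 4.8])                  # features of one speaker
group = assign_group(speaker, centroids)        # -> group 1
personal = [0.1, 0.6, 0.2, 0.1]                 # hypothetical group-1 model output
general = [0.2, 0.4, 0.3, 0.1]                  # hypothetical general model output
label, probs = fuse_predictions(personal, general)
```

The design point the sketch captures is that the first-level decision routes each utterance to a per-group model, so speaker-dependent variation is handled before emotion classification, while fusion with a general model hedges against a wrong group assignment.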
Pages: 161 - 170 (10 pages)
Related Papers
50 records
  • [1] Speech Emotion Recognition with Discriminative Feature Learning
    Zhou, Huan
    Liu, Kai
    [J]. INTERSPEECH 2020, 2020, : 4094 - 4097
  • [2] Discriminative Feature Learning for Speech Emotion Recognition
    Zhang, Yuying
    Zou, Yuexian
    Peng, Junyi
    Luo, Danqing
    Huang, Dongyan
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: TEXT AND TIME SERIES, PT IV, 2019, 11730 : 198 - 210
  • [3] English speech emotion recognition method based on speech recognition
    Liu, Man
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (2) : 391 - 398
  • [5] TRNet: Two-level Refinement Network leveraging speech enhancement for noise robust speech emotion recognition
    Chen, Chengxin
    Zhang, Pengyuan
    [J]. APPLIED ACOUSTICS, 2024, 225
  • [6] Towards Discriminative Representation Learning for Speech Emotion Recognition
    Li, Runnan
    Wu, Zhiyong
    Jia, Jia
    Bu, Yaohua
    Zhao, Sheng
    Meng, Helen
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5060 - 5066
  • [7] Emotion Recognition in Speech with Latent Discriminative Representations Learning
    Han, Jing
    Zhang, Zixing
    Keren, Gil
    Schuller, Bjorn
    [J]. ACTA ACUSTICA UNITED WITH ACUSTICA, 2018, 104 (05) : 737 - 740
  • [8] Survey on discriminative feature selection for speech emotion recognition
    Xu, Xin
    Li, Ya
    Xu, Xiaoying
    Wen, Zhengqi
    Che, Hao
    Liu, Shanfeng
    Tao, Jianhua
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 345 - +
  • [9] Speech Emotion Recognition
    Lalitha, S.
    Madhavan, Abhishek
    Bhushan, Bharath
    Saketh, Srinivas
    [J]. 2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRONICS, COMPUTERS AND COMMUNICATIONS (ICAECC), 2014,
  • [10] Robotic Emotion Recognition Using Two-Level Features Fusion in Audio Signals of Speech
    Li, Chang
    [J]. IEEE SENSORS JOURNAL, 2022, 22 (18) : 17447 - 17454