Two-level discriminative speech emotion recognition model with wave field dynamics: A personalized speech emotion recognition method

Cited by: 3
Authors
Jia, Ning [1 ]
Zheng, Chunjun [1 ]
Institution
[1] Dalian Neusoft Univ Informat, Sch Software, Dalian, Peoples R China
Keywords
Speech emotion recognition; Speaker classification; Wave field dynamics; Cross medium; Convolutional recurrent neural network; Two-level discriminative model;
DOI
10.1016/j.comcom.2021.09.013
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Presently available speech emotion recognition (SER) methods generally rely on a single SER model. Achieving higher SER accuracy depends on both the speech feature extraction method and the model design scheme. However, the generalization performance of such models is typically poor because the emotional features of different speakers can vary substantially. The present work addresses this issue by applying a two-level discriminative model to the SER task. The first level places an individual speaker within a specific speaker group according to the speaker's characteristics. The second level constructs a personalized SER model for each group of speakers using the wave field dynamics model and a dual-channel general SER model. The two levels of the discriminative model are fused in an ensemble learning scheme to achieve effective SER classification. The proposed method is demonstrated to provide higher SER accuracy in experiments based on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a custom-built SER corpus. On the IEMOCAP corpus, the proposed model improves the recognition accuracy by 7%. On the custom-built SER corpus, both masked and unmasked speakers are employed to demonstrate that the proposed method maintains higher SER accuracy.
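The abstract describes a two-level pipeline: a first-level classifier routes a speaker to a speaker group, and a second-level per-group personalized model is fused with a general model in an ensemble. Below is a minimal sketch of that flow only; the group count, emotion label set, fusion weights, and model internals are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-level discriminative pipeline described above.
# Group count, labels, fusion weights and model internals are assumptions.
import numpy as np

N_GROUPS = 4                                     # assumed number of speaker groups
EMOTIONS = ["angry", "happy", "neutral", "sad"]  # IEMOCAP-style label set


class SpeakerGroupClassifier:
    """Level 1: assign a speaker to a speaker group from acoustic features."""

    def predict_group(self, features: np.ndarray) -> int:
        # Placeholder rule; a trained speaker classifier would go here.
        return int(np.argmax(features[:N_GROUPS]))


class PersonalizedSERModel:
    """Level 2: per-group SER model (the paper combines a wave-field-dynamics
    channel with a dual-channel general model; stubbed here)."""

    def __init__(self, group_id: int):
        rng = np.random.default_rng(group_id)
        self.weights = rng.normal(size=(len(EMOTIONS), 8))

    def predict_proba(self, features: np.ndarray) -> np.ndarray:
        logits = self.weights @ features[:8]
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()


def recognize_emotion(features, level1, level2, general_proba):
    """Route to the group-specific model, then fuse with a general model."""
    group = level1.predict_group(features)
    personalized = level2[group].predict_proba(features)
    fused = 0.5 * personalized + 0.5 * general_proba  # assumed equal weights
    return EMOTIONS[int(np.argmax(fused))]


if __name__ == "__main__":
    feats = np.random.default_rng(0).normal(size=16)
    level1 = SpeakerGroupClassifier()
    level2 = {g: PersonalizedSERModel(g) for g in range(N_GROUPS)}
    general = np.full(len(EMOTIONS), 1.0 / len(EMOTIONS))
    print(recognize_emotion(feats, level1, level2, general))
```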
Pages: 161-170
Number of pages: 10
Related Papers
50 records in total
  • [41] Improved discriminative completed local binary pattern for speech emotion recognition
    Tao, Huawei
    Zhang, Xinran
    Liang, Ruiyu
    Zha, Cheng
    Zhao, Li
    Wang, Qingyun
    [J]. Shengxue Xuebao/Acta Acustica, 2016, 41 (06): 905 - 912
  • [42] Speech emotion recognition using stacked generative and discriminative hybrid models
    Huang, Yongming
    Zhang, Guobao
    Dong, Fei
    Li, Yue
    [J]. Shengxue Xuebao/Acta Acustica, 2013, 38 (02): 231 - 240
  • [43] A novel feature selection method for speech emotion recognition
    Ozseven, Turgut
    [J]. APPLIED ACOUSTICS, 2019, 146 : 320 - 326
  • [44] Speech Emotion Recognition Method Using Depth Wavefield Extrapolation and Improved Wave Physics Model
    Zheng, Chunjun
    Wang, Chunli
    Jia, Ning
    [J]. 2021 2ND INTERNATIONAL CONFERENCE ON E-COMMERCE AND INTERNET TECHNOLOGY (ECIT 2021), 2021, : 356 - 359
  • [45] Two-stream Emotion-embedded Autoencoder for Speech Emotion Recognition
    Zhang, Chenghao
    Xue, Lei
    [J]. 2021 IEEE INTERNATIONAL IOT, ELECTRONICS AND MECHATRONICS CONFERENCE (IEMTRONICS), 2021, : 969 - 974
  • [46] Interaction and Transition Model for Speech Emotion Recognition in Dialogue
    Zhang, Ruo
    Atsushi, Ando
    Kobashikawa, Satoshi
    Aono, Yushi
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1094 - 1097
  • [47] Double sparse learning model for speech emotion recognition
    Zong, Yuan
    Zheng, Wenming
    Cui, Zhen
    Li, Qiang
    [J]. ELECTRONICS LETTERS, 2016, 52 (16) : 1410 - 1411
  • [48] Recognition of Emotion Intensity Basing on Neutral Speech Model
    Kaminska, Dorota
    Sapinski, Tomasz
    Pelikant, Adam
    [J]. MAN-MACHINE INTERACTIONS 3, 2014, 242 : 451 - 458
  • [49] Speech emotion recognition based on statistical pitch model
    Wang, Zhiping
    Zhao, Li
    Zou, Cairong
    [J]. Chinese Journal of Acoustics, 2006, (01) : 87 - 96
  • [50] Speech Emotion Recognition Based on Acoustic Segment Model
    Zheng, Siyuan
    Du, Jun
    Zhou, Hengshun
    Bai, Xue
    Lee, Chin-Hui
    Li, Shipeng
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,