Two-level discriminative speech emotion recognition model with wave field dynamics: A personalized speech emotion recognition method

Cited by: 3
Authors
Jia, Ning [1 ]
Zheng, Chunjun [1 ]
Affiliations
[1] Dalian Neusoft Univ Informat, Sch Software, Dalian, Peoples R China
Keywords
Speech emotion recognition; Speaker classification; Wave field dynamics; Cross medium; Convolutional recurrent neural network; Two-level discriminative model;
DOI
10.1016/j.comcom.2021.09.013
CLC Classification: TP [Automation technology, computer technology]
Discipline Code: 0812
Abstract
Presently available speech emotion recognition (SER) methods generally rely on a single SER model. Achieving high SER accuracy depends on both the feature extraction method and the model design scheme. However, the generalization performance of such models is typically poor because the emotional features of different speakers can vary substantially. The present work addresses this issue by applying a two-level discriminative model to the SER task. The first level places an individual speaker within a specific speaker group according to the speaker's characteristics. The second level constructs a personalized SER model for each group of speakers using the wave field dynamics model and a dual-channel general SER model. The two levels of the discriminative model are fused in an ensemble learning scheme to achieve effective SER classification. The proposed method is demonstrated to provide higher SER accuracy in experiments based on the interactive emotional dyadic motion capture (IEMOCAP) corpus and a custom-built SER corpus. On the IEMOCAP corpus, the proposed model improves recognition accuracy by 7%. On the custom-built corpus, both masked and unmasked speakers are employed to demonstrate that the proposed method maintains high SER accuracy.
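To make the two-level pipeline concrete, the Python sketch below illustrates the flow the abstract describes: a first-level classifier assigns a speaker to a group, and a second level fuses a group-specific (personalized) model with a general model. This is a minimal illustration, not the authors' implementation (which uses a wave field dynamics model and a dual-channel convolutional recurrent network); the class names, the nearest-centroid grouping rule, and the fusion weight alpha are all assumptions made for the example.

import numpy as np

class SpeakerGroupClassifier:
    """Level 1: map a speaker's feature vector to a speaker-group id."""
    def __init__(self, group_centroids: np.ndarray):
        self.centroids = group_centroids  # shape: (n_groups, n_features)

    def predict(self, features: np.ndarray) -> int:
        # Nearest-centroid assignment as a placeholder grouping rule.
        dists = np.linalg.norm(self.centroids - features, axis=1)
        return int(np.argmin(dists))

class EmotionModel:
    """Stand-in for any SER model that returns emotion-class probabilities."""
    def __init__(self, weights: np.ndarray):
        self.weights = weights  # shape: (n_features, n_emotions)

    def predict_proba(self, features: np.ndarray) -> np.ndarray:
        logits = features @ self.weights
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()

def two_level_predict(features, group_clf, group_models, general_model, alpha=0.5):
    """Fuse the personalized (per-group) and general model outputs."""
    group_id = group_clf.predict(features)                        # level 1
    p_personal = group_models[group_id].predict_proba(features)   # level 2
    p_general = general_model.predict_proba(features)
    return alpha * p_personal + (1.0 - alpha) * p_general         # ensemble fusion

# Toy usage with random parameters (16 features, 4 emotions, 2 speaker groups).
rng = np.random.default_rng(0)
n_feat, n_emo, n_groups = 16, 4, 2
clf = SpeakerGroupClassifier(rng.normal(size=(n_groups, n_feat)))
models = [EmotionModel(rng.normal(size=(n_feat, n_emo))) for _ in range(n_groups)]
general = EmotionModel(rng.normal(size=(n_feat, n_emo)))
probs = two_level_predict(rng.normal(size=n_feat), clf, models, general)
print("emotion probabilities:", np.round(probs, 3))

The fixed alpha stands in for whatever ensemble weighting the paper's fusion step actually learns; in practice it could be tuned per speaker group.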
Pages: 161-170
Page count: 10
Related Papers
50 records in total
  • [31] Review on speech emotion recognition
    Han, W.-J. (hanwenjing07@gmail.com)
    [J]. Chinese Academy of Sciences, (25)
  • [32] Emotion recognition in Arabic speech
    Hadjadji, Imene
    Falek, Leila
    Demri, Lyes
    Teffahi, Hocine
    [J]. 2019 INTERNATIONAL CONFERENCE ON ADVANCED ELECTRICAL ENGINEERING (ICAEE), 2019,
  • [33] Bengali Speech Emotion Recognition
    Mohanta, Abhijit
    Sharma, Uzzal
    [J]. PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016: 2812 - 2814
  • [34] Multiroom Speech Emotion Recognition
    Shalev, Erez
    Cohen, Israel
    [J]. EUROPEAN SIGNAL PROCESSING CONFERENCE, 2022: 135 - 139
  • [35] Emotion recognition in Arabic speech
    Klaylat, Samira
    Osman, Ziad
    Hamandi, Lama
    Zantout, Rached
    [J]. ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING, 2018, 96 (02) : 337 - 351
  • [36] A Study on the Search of the Most Discriminative Speech Features in the Speaker Dependent Speech Emotion Recognition
    Pao, Tsang-Long
    Wang, Chun-Hsiang
    Li, Yu-Ji
    [J]. 2012 FIFTH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND PROGRAMMING (PAAP), 2012: 157 - 162
  • [37] The Impact of Face Mask and Emotion on Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER)
    Oh, Qi Qi
    Seow, Chee Kiat
    Yusuff, Mulliana
    Pranata, Sugiri
    Cao, Qi
    [J]. 2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023: 523 - 531
  • [38] Emotion Recognition using Imperfect Speech Recognition
    Metze, Florian
    Batliner, Anton
    Eyben, Florian
    Polzehl, Tim
    Schuller, Bjoern
    Steidl, Stefan
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010: 478+
  • [39] Speech emotion recognition based on an improved brain emotion learning model
    Liu, Zhen-Tao
    Xie, Qiao
    Wu, Min
    Cao, Wei-Hua
    Mei, Ying
    Mao, Jun-Wei
    [J]. NEUROCOMPUTING, 2018, 309 : 145 - 156
  • [40] Speech emotion recognition using nonlinear dynamics features
    Shahzadi, Ali
    Ahmadyfard, Alireza
    Harimi, Ali
    Yaghmaie, Khashayar
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2015, 23 : 2056 - 2073