Autoencoder With Emotion Embedding for Speech Emotion Recognition

Cited by: 0
Authors
Zhang, Chenghao [1 ]
Xue, Lei [1 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China
Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
Feature extraction; Speech recognition; Emotion recognition; Spectrogram; Noise reduction; Hidden Markov models; Acoustics; Speech emotion recognition; autoencoder; emotion embedding; instance normalization; GENERATION;
DOI
10.1109/ACCESS.2021.3069818
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
An important part of the human-computer interaction process is speech emotion recognition (SER), which has been receiving more attention in recent years. However, although a wide diversity of methods has been proposed for SER, their performance remains unsatisfactory. A key cause of this low performance is the difficulty of effectively extracting emotion-oriented features. In this paper, we propose a novel algorithm, an autoencoder with emotion embedding, to extract deep emotion features. Unlike many previous works, our model uses instance normalization, a common technique in the style transfer field, rather than batch normalization. Furthermore, the emotion embedding path in our method leads the autoencoder to efficiently learn a priori knowledge from the label, which enables the model to distinguish which features are most related to human emotion. We concatenate the latent representation learned by the autoencoder with acoustic features obtained by the openSMILE toolkit. Finally, the concatenated feature vector is used for emotion classification. To improve the generalization of our method, a simple data augmentation approach is applied. Two publicly available and highly popular databases, IEMOCAP and EMODB, are chosen to evaluate our method. Experimental results demonstrate that the proposed model achieves significant performance improvement compared to other speech emotion recognition systems.
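The abstract contrasts instance normalization with batch normalization and describes concatenating a learned latent vector with acoustic features. The following is a minimal pure-Python sketch of those two ideas only, not the authors' implementation: the function names (`instance_norm`, `batch_norm`, `concat_features`), the data layout, and the toy values are all assumptions made for illustration, and a real model would likely use a deep learning framework with learned scale and shift parameters.

```python
from statistics import fmean, pvariance


def normalize(values, mean, var, eps=1e-5):
    """Shift and scale a flat list of values to roughly zero mean, unit variance."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def instance_norm(batch, eps=1e-5):
    """Normalize each channel of each sample with that sample's own statistics.

    Assumed layout: batch[sample][channel] is a flat list of spectrogram values.
    """
    return [
        [normalize(ch, fmean(ch), pvariance(ch), eps) for ch in sample]
        for sample in batch
    ]


def batch_norm(batch, eps=1e-5):
    """Normalize each channel with statistics pooled over the whole batch."""
    n_channels = len(batch[0])
    stats = []
    for c in range(n_channels):
        pooled = [v for sample in batch for v in sample[c]]
        stats.append((fmean(pooled), pvariance(pooled)))
    return [
        [normalize(sample[c], stats[c][0], stats[c][1], eps) for c in range(n_channels)]
        for sample in batch
    ]


def concat_features(latent, acoustic):
    """Join a latent vector and an acoustic feature vector into one vector."""
    return list(latent) + list(acoustic)


# Two toy single-channel "utterances" with the same shape but different loudness.
batch = [[[1.0, 2.0, 3.0]], [[10.0, 20.0, 30.0]]]
inst = instance_norm(batch)   # per-utterance scale differences are removed
bn = batch_norm(batch)        # per-utterance scale differences are kept

# Hypothetical latent and acoustic vectors; real dimensions would be much larger.
fused = concat_features([0.1, 0.2], [0.3, 0.4, 0.5])
```

In this toy case the two utterances become nearly identical after instance normalization, while batch normalization preserves their relative offset, which hints at why a per-sample normalization borrowed from style transfer might suit this setting.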
Pages: 51231-51241
Page count: 11
Related papers
50 in total
  • [41] Speech emotion recognition using emotion perception spectral feature
    Jiang, Lin
    Tan, Ping
    Yang, Junfeng
    Liu, Xingbao
    Wang, Chao
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (11):
  • [42] Integrating Language and Emotion Features for Multilingual Speech Emotion Recognition
    Heracleous, Panikos
    Mohammad, Yasser
    Yoneyama, Akio
    [J]. HUMAN-COMPUTER INTERACTION. MULTIMODAL AND NATURAL INTERACTION, HCI 2020, PT II, 2020, 12182 : 187 - 196
  • [43] Emotion Recognition using Imperfect Speech Recognition
    Metze, Florian
    Batliner, Anton
    Eyben, Florian
    Polzehl, Tim
    Schuller, Bjoern
    Steidl, Stefan
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 478 - +
  • [44] Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition
    Xu, Xinzhou
    Deng, Jun
    Cummins, Nicholas
    Zhang, Zixing
    Zhao, Li
    Schuller, Bjorn W.
    [J]. INTERSPEECH 2019, 2019, : 949 - 953
  • [45] Informative Speech Features based on Emotion Classes and Gender in Explainable Speech Emotion Recognition
    Yildirim, Huseyin Ediz
    Iren, Deniz
    [J]. 2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023,
  • [46] Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition
    Jalal, Md Asif
    Milner, Rosanna
    Hain, Thomas
    [J]. INTERSPEECH 2020, 2020, : 4113 - 4117
  • [47] English speech emotion recognition method based on speech recognition
    Liu, Man
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (2) : 391 - 398
  • [48] English speech emotion recognition method based on speech recognition
    Man Liu
    [J]. International Journal of Speech Technology, 2022, 25 : 391 - 398
  • [49] Speech Emotion Recognition Using CNN
    Huang, Zhengwei
    Dong, Ming
    Mao, Qirong
    Zhan, Yongzhao
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 801 - 804
  • [50] Speech Emotion Recognition using DWT
    Lalitha, S.
    Mudupu, Anoop
    Nandyala, Bala Visali
    Munagala, Renuka
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2015, : 20 - 23