Autoencoder With Emotion Embedding for Speech Emotion Recognition

Cited by: 0
Authors
Zhang, Chenghao [1]
Xue, Lei [1]
Affiliation
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China
Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
Feature extraction; Speech recognition; Emotion recognition; Spectrogram; Noise reduction; Hidden Markov models; Acoustics; Speech emotion recognition; autoencoder; emotion embedding; instance normalization; GENERATION;
DOI
10.1109/ACCESS.2021.3069818
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Speech emotion recognition (SER) is an important part of human-computer interaction and has received increasing attention in recent years. However, although a wide variety of SER methods has been proposed, their performance remains limited. A key factor behind this limited performance is the difficulty of effectively extracting emotion-oriented features. In this paper, we propose a novel algorithm, an autoencoder with emotion embedding, to extract deep emotion features. Unlike many previous works, our model uses instance normalization, a common technique in the style transfer field, rather than batch normalization. Furthermore, the emotion embedding path in our method guides the autoencoder to efficiently learn prior knowledge from the labels, enabling the model to distinguish which features are most related to human emotion. We concatenate the latent representation learned by the autoencoder with acoustic features extracted by the openSMILE toolkit, and the concatenated feature vector is then used for emotion classification. To improve the generalization of our method, a simple data augmentation approach is applied. Two publicly available and widely used databases, IEMOCAP and EMODB, are used to evaluate our method. Experimental results demonstrate that the proposed model achieves significant performance improvements compared with other speech emotion recognition systems.
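Two ideas in the abstract are easy to illustrate in isolation: instance normalization versus batch normalization, and fusing a learned latent vector with a handcrafted acoustic feature vector. The following is a minimal pure-Python sketch of those two general techniques, not the authors' model. The batch layout (`batch[n][c]` as a flat list of values per sample and channel), the `EPS` constant, and the function names `instance_norm`, `batch_norm`, and `fuse` are all illustrative assumptions; learned scale/shift parameters and running statistics are omitted.

```python
import math
from typing import List, Tuple

# Hypothetical layout: batch[n][c] is a flat list of values for sample n,
# channel c (e.g. a flattened time-frequency patch of a spectrogram feature map).
Batch = List[List[List[float]]]

EPS = 1e-5  # illustrative numerical-stability constant (assumption)


def _stats(values: List[float]) -> Tuple[float, float]:
    """Population mean and variance, as used by normalization layers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def _normalize(values: List[float], mean: float, var: float) -> List[float]:
    """Shift to zero mean and scale to unit variance using the given statistics."""
    return [(v - mean) / math.sqrt(var + EPS) for v in values]


def instance_norm(batch: Batch) -> Batch:
    """Instance normalization: statistics per sample AND per channel,
    so each utterance is normalized independently of the rest of the batch."""
    return [[_normalize(ch, *_stats(ch)) for ch in sample] for sample in batch]


def batch_norm(batch: Batch) -> Batch:
    """Batch normalization (training-mode statistics, no learned scale/shift):
    statistics per channel, pooled over all samples in the batch."""
    n_channels = len(batch[0])
    stats = [_stats([v for sample in batch for v in sample[c]])
             for c in range(n_channels)]
    return [[_normalize(sample[c], *stats[c]) for c in range(n_channels)]
            for sample in batch]


def fuse(latent: List[float], acoustic: List[float]) -> List[float]:
    """Concatenate a learned latent vector with a handcrafted acoustic
    feature vector into one input vector for an emotion classifier."""
    return list(latent) + list(acoustic)


if __name__ == "__main__":
    batch = [
        [[1.0, 2.0, 3.0, 4.0]],      # sample 0, one channel
        [[10.0, 20.0, 30.0, 40.0]],  # sample 1, a much "louder" utterance
    ]
    # Instance norm: sample 0's output is the same with or without sample 1.
    print(instance_norm(batch)[0] == instance_norm(batch[:1])[0])  # True
    # Batch norm: sample 0's output changes when sample 1 joins the batch.
    print(batch_norm(batch)[0] == batch_norm(batch[:1])[0])        # False
    print(len(fuse([0.1, 0.2], [1.0, 2.0, 3.0])))                  # 5
```

The contrast in the usage lines is the design point usually cited for instance normalization in style transfer: each input's per-channel statistics (here, an utterance's overall level) are removed without being mixed with other samples in the batch. How the paper places these layers inside the autoencoder and how many openSMILE features it fuses are not stated in this record.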
Pages: 51231 - 51241
Number of pages: 11
Related papers
50 results in total
  • [1] Autoencoder with emotion embedding for speech emotion recognition
    Zhang, Chenghao
    Xue, Lei
    [J]. IEEE Access, 2021, 9 : 51231 - 51241
  • [2] Speech Emotion Recognition 'in the wild' Using an Autoencoder
    Dissanayake, Vipula
    Zhang, Haimo
    Billinghurst, Mark
    Nanayakkara, Suranga
    [J]. INTERSPEECH 2020, 2020, : 526 - 530
  • [3] Two-stream Emotion-embedded Autoencoder for Speech Emotion Recognition
    Zhang, Chenghao
    Xue, Lei
    [J]. 2021 IEEE INTERNATIONAL IOT, ELECTRONICS AND MECHATRONICS CONFERENCE (IEMTRONICS), 2021, : 969 - 974
  • [4] Sparse Autoencoder with Attention Mechanism for Speech Emotion Recognition
    Sun, Ting-Wei
    Wu, An-Yeu
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 146 - 149
  • [5] A VECTOR QUANTIZED MASKED AUTOENCODER FOR SPEECH EMOTION RECOGNITION
    Sadok, Samir
    Leglaive, Simon
    Seguier, Renaud
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,
  • [6] Speech Emotion Recognition Using Speech Feature and Word Embedding
    Atmaja, Bagus Tris
    Shirai, Kiyoaki
    Akagi, Masato
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 519 - 523
  • [7] Speech Emotion Recognition Using Spectrogram & Phoneme Embedding
    Yenigalla, Promod
    Kumar, Abhay
    Tripathi, Suraj
    Singh, Chirag
    Kar, Sibsambhu
    Vepa, Jithendra
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3688 - 3692
  • [8] Unsupervised Feature Learning for Speech Emotion Recognition Based on Autoencoder
    Ying, Yangwei
    Tu, Yuanwu
    Zhou, Hong
    [J]. ELECTRONICS, 2021, 10 (17)
  • [9] Performance Evaluation of Deep Autoencoder Network for Speech Emotion Recognition
    AndleebSiddiqui, Maria
    Hussain, Wajahat
    Ali, Syed Abbas
    Danish-ur-Rehman
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (02) : 606 - 611
  • [10] SPEECH EMOTION RECOGNITION USING AUTOENCODER BOTTLENECK FEATURES AND LSTM
    Huang, Kun-Yi
    Wu, Chung-Hsien
    Yang, Tsung-Hsien
    Su, Ming-Hsiang
    Chou, Jia-Hui
    [J]. 2016 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT), 2018, : 1 - 4