SPEECH EMOTION RECOGNITION USING AUTOENCODER BOTTLENECK FEATURES AND LSTM

Authors
Huang, Kun-Yi [1]
Wu, Chung-Hsien [1]
Yang, Tsung-Hsien [1]
Su, Ming-Hsiang [1]
Chou, Jia-Hui [1]
Affiliation
[1] National Cheng Kung University, Department of Computer Science and Information Engineering, Tainan, Taiwan
Keywords
Speech emotion recognition; bottleneck features; long short-term memory
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
A complete emotional expression follows a complex temporal course over a conversation. Prior work that processes speech at the utterance or segment level does not account for subtle differences in characteristics or for historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network is employed to characterize the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems that use LLDs or DSS individually.
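The data flow described in the abstract (feature fusion, autoencoder bottleneck, LSTM, classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: all dimensions, the number of emotion classes, the activation functions, and the random placeholder features are assumptions made purely for illustration, and the weights are untrained, so the output probabilities carry no meaning. The sketch only shows how the shapes move through the pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not state any of them.
T, LLD_DIM, DSS_DIM = 20, 32, 48              # time steps and per-step feature sizes
BOTTLENECK, HIDDEN, N_EMOTIONS = 16, 24, 4    # bottleneck width, LSTM size, classes
FUSED_DIM = LLD_DIM + DSS_DIM


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init(shape):
    # Small random weights standing in for trained parameters.
    return rng.normal(scale=0.1, size=shape)


# 1) Feature fusion: concatenate LLD and DSS vectors at every time step.
lld = rng.normal(size=(T, LLD_DIM))           # placeholder for real LLDs
dss = rng.normal(size=(T, DSS_DIM))           # placeholder for real DSS coefficients
fused = np.concatenate([lld, dss], axis=1)    # shape (T, FUSED_DIM)

# 2) Autoencoder: the encoder output is the low-dimensional bottleneck feature.
W_enc, b_enc = init((FUSED_DIM, BOTTLENECK)), np.zeros(BOTTLENECK)
W_dec, b_dec = init((BOTTLENECK, FUSED_DIM)), np.zeros(FUSED_DIM)
bottleneck = np.tanh(fused @ W_enc + b_enc)   # shape (T, BOTTLENECK)
reconstruction = bottleneck @ W_dec + b_dec   # needed only for the training loss
recon_mse = float(np.mean((reconstruction - fused) ** 2))

# 3) LSTM forward pass over the bottleneck sequence (single layer).
W = init((BOTTLENECK + HIDDEN, 4 * HIDDEN))
b = np.zeros(4 * HIDDEN)
h = np.zeros(HIDDEN)                          # hidden state
c = np.zeros(HIDDEN)                          # cell state
for x_t in bottleneck:
    z = np.concatenate([x_t, h]) @ W + b
    i, f, o, g = np.split(z, 4)               # input, forget, output gates; candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)

# 4) Softmax over emotion classes from the final hidden state.
W_out = init((HIDDEN, N_EMOTIONS))
logits = h @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(fused.shape, bottleneck.shape, h.shape, probs.shape)
```

In a real system the autoencoder would first be trained to minimize the reconstruction error (`recon_mse` here), and only the encoder half would then feed the LSTM; a deep learning framework would normally replace the hand-written loop.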
Pages: 1 / 4
Number of pages: 4