SPEECH EMOTION RECOGNITION USING AUTOENCODER BOTTLENECK FEATURES AND LSTM

被引:0
|
作者
Huang, Kun-Yi [1 ]
Wu, Chung-Hsien [1 ]
Yang, Tsung-Hsien [1 ]
Su, Ming-Hsiang [1 ]
Chou, Jia-Hui [1 ]
机构
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
关键词
Speech emotion recognition; bottleneck features; long-short term memory;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A complete emotional expression contains a complex temporal course in a conversation. Related research on utterance and segment-level processing lacks considering subtle differences in characteristics and historical information. In this work, as Deep Scattering Spectrum (DSS) can obtain more detailed energy distributions in frequency domain than the Low Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. Autoencoder neural network is then applied to extract the bottleneck features for dimensionality reduction. Finally, the long-short term memory (LSTM) is employed to characterize temporal variation of speech emotion for emotion recognition. For evaluation, the MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method using the bottleneck features from the combination of the LLDs and DSS achieved an emotion recognition accuracy of 98.1%, outperforming the systems using LLDs or DSS individually.
引用
收藏
页码:1 / 4
页数:4
相关论文
共 50 条
  • [41] EXTRACTING DEEP BOTTLENECK FEATURES FOR VISUAL SPEECH RECOGNITION
    Sui, Chao
    Togneri, Roberto
    Bennamoun, Mohammed
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 1518 - 1522
  • [42] Deep Autoencoder based Speech Features for Improved Dysarthric Speech Recognition
    Vachhani, Bhavik
    Bhat, Chitralekha
    Das, Biswajit
    Kopparapu, Sunil Kumar
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1854 - 1858
  • [43] Attention-Based Dense LSTM for Speech Emotion Recognition
    Xie, Yue
    Liang, Ruiyu
    Liang, Zhenlin
    Zhao, Li
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (07): : 1426 - 1429
  • [44] SPEECH EMOTION RECOGNITION WITH DUAL-SEQUENCE LSTM ARCHITECTURE
    Wang, Jianyou
    Xue, Michael
    Culhane, Ryan
    Diao, Enmao
    Ding, Jie
    Tarokh, Vahid
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6474 - 6478
  • [45] Siamese Attention-Based LSTM for Speech Emotion Recognition
    Nizamidin, Tashpolat
    Zhao, Li
    Liang, Ruiyu
    Xie, Yue
    Hamdulla, Askar
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2020, E103A (07) : 937 - 941
  • [46] Emotion Recognition from Speech - an LSTM approach with the Tess Dataset
    Pandiammal, Sankara K.
    Karishma, S.
    Sakthe, Harine K.
    Manimaran, V
    Kalaiselvi, S.
    Anitha, V
    [J]. 2024 5TH INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN INFORMATION TECHNOLOGY, ICITIIT 2024, 2024,
  • [47] Urdu Speech Emotion Recognition using Speech Spectral Features and Deep Learning Techniques
    Taj, Soonh
    Shaikh, Ghulam Mujtaba
    Hassan, Saif
    Nimra
    [J]. 2023 4th International Conference on Computing, Mathematics and Engineering Technologies: Sustainable Technologies for Socio-Economic Development, iCoMET 2023, 2023,
  • [48] Speech Emotion Recognition Using Cross-Correlation and Acoustic Features
    Chatterjee, Joyjit
    Mukesh, Vajja
    Hsu, Hui-Huang
    Vyas, Garima
    Liu, Zhen
    [J]. 2018 16TH IEEE INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP, 16TH IEEE INT CONF ON PERVAS INTELLIGENCE AND COMP, 4TH IEEE INT CONF ON BIG DATA INTELLIGENCE AND COMP, 3RD IEEE CYBER SCI AND TECHNOL CONGRESS (DASC/PICOM/DATACOM/CYBERSCITECH), 2018, : 243 - 249
  • [49] Emotion recognition from speech using global and local prosodic features
    Rao K.S.
    Koolagudi S.G.
    Vempada R.R.
    [J]. International Journal of Speech Technology, 2013, 16 (2) : 143 - 160
  • [50] Emotion recognition from telephone speech using acoustic and nonlinear features
    Bedoya-Jaramillo, S.
    Orozco-Arroyave, J. R.
    Arias-Londono, J. D.
    Vargas-Bonilla, J. F.
    [J]. 2013 47TH INTERNATIONAL CARNAHAN CONFERENCE ON SECURITY TECHNOLOGY (ICCST), 2013,