Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neutral Network

Cited by: 10
Authors
Li, Juan [1, 2]
Zhang, Xueying [1]
Huang, Lixia [1]
Li, Fenglian [1]
Duan, Shufei [1]
Sun, Ying [1]
Affiliations
[1] Taiyuan Univ Technol, Coll Informat & Comp, Jinzhong 030600, Peoples R China
[2] Yuncheng Univ, Dept Phys & Elect Engn, Yuncheng 044000, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022 / Volume 12 / Issue 19
Funding
National Natural Science Foundation of China;
Keywords
speech emotion recognition; deep learning; Mel spectrogram; IMel spectrogram; STACKED SPARSE AUTOENCODER; SPECTRAL FEATURES; STRESS RECOGNITION; NEURAL-NETWORK; MODEL; PSO;
DOI
10.3390/app12199518
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Featured Application: Emotion recognition is the computer's automatic recognition of the emotional state of input speech. It is an active research field that draws on phonetics, psychology, digital signal processing, pattern recognition, and artificial intelligence. Speech emotion recognition is now widely used in intelligent signal processing, smart medical care, business intelligence, assisted lie detection, criminal investigation, the service industry, self-driving cars, smartphone voice assistants, and human psychoanalysis. In the context of artificial intelligence, smooth communication between people and machines has become a widely pursued goal.
The Mel spectrogram is a common feature in speech emotion recognition and focuses on the low-frequency part of speech. In contrast, the inverse Mel (IMel) spectrogram, which focuses on the high-frequency part, is proposed so that emotions can be analyzed more comprehensively. Because the convolutional neural network-stacked sparse autoencoder (CNN-SSAE) can extract deep optimized features, a Mel-IMel dual-channel complementary structure is proposed. In the first channel, a CNN extracts the low-frequency information of the Mel spectrogram. The other channel extracts the high-frequency information of the IMel spectrogram. This information is passed to an SSAE, which reduces the number of dimensions and yields the optimized information. Experimental results show that the highest recognition rates achieved on the EMO-DB, SAVEE, and RAVDESS datasets were 94.79%, 88.96%, and 83.18%, respectively. The recognition rate obtained with the two spectrograms together was higher than that of either spectrogram alone, which shows that the two spectrograms are complementary. Placing the SSAE after the CNN to obtain the optimized information improved the recognition rate further, which shows the effectiveness of the CNN-SSAE network.
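The contrast between the Mel and IMel scales described in the abstract can be illustrated with a small filter-bank sketch. This record does not give the paper's exact IMel definition, so the code below assumes one plausible construction: the standard Mel band edges mirrored about the frequency axis, which makes the filters narrow at high frequencies instead of low ones. The function names (`mel_edges`, `imel_edges`, `triangular_filterbank`) and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def hz_to_mel(f):
    """Standard Hz -> Mel mapping (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_edges(n_filters, f_max):
    """Band edges (Hz), equally spaced on the Mel scale: dense at low frequencies."""
    m_max = hz_to_mel(f_max)
    return [mel_to_hz(m_max * i / (n_filters + 1)) for i in range(n_filters + 2)]


def imel_edges(n_filters, f_max):
    """ASSUMPTION: inverse-Mel edges taken as the Mel edges mirrored about
    the frequency axis, so the bands are dense at high frequencies."""
    return [f_max - f for f in reversed(mel_edges(n_filters, f_max))]


def triangular_filterbank(edges_hz, n_bins, f_max):
    """Build triangular filters over n_bins linearly spaced frequency bins.

    edges_hz must be ascending and hold n_filters + 2 values; each filter
    rises from its lower edge to its centre and falls to its upper edge.
    """
    bin_hz = [f_max * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for lo, mid, hi in zip(edges_hz, edges_hz[1:], edges_hz[2:]):
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


# Illustrative values only: 8 filters up to 8 kHz over 129 spectrum bins.
mel = mel_edges(8, 8000.0)
imel = imel_edges(8, 8000.0)
mel_bank = triangular_filterbank(mel, 129, 8000.0)
imel_bank = triangular_filterbank(imel, 129, 8000.0)
```

Applying each bank to the same power spectrum frame (a weighted sum per filter) would give one Mel frame and one IMel frame; stacking frames over time gives the two spectrograms that feed the two channels. The CNN and SSAE stages are not sketched here.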
Pages: 20
Related papers
50 in total
  • [41] Wearable Wireless Dual-Channel EEG System for Emotion Recognition Based on Machine Learning
    Wang, Yue
    Tian, Wei
    Xu, Jingyi
    Tian, Yingnan
    Xu, Chengtao
    Ma, Biao
    Hao, Qing
    Zhao, Chao
    Liu, Hong
    IEEE SENSORS JOURNAL, 2023, 23 (18) : 21767 - 21775
  • [42] Recognition and detection of unusual activities in ATM using dual-channel capsule generative adversarial network
    Kajendran, K.
    Mayan, J. Albert
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [43] Modulation classification based on the collaboration of dual-channel CNN-LSTM and residual network
    Li Hui
    Li Shanshan
    Zou Borong
    Chen Yannan
    The Journal of China Universities of Posts and Telecommunications, 2022, 29 (01) : 113 - 124
  • [44] Inplace Gated Convolutional Recurrent Neural Network For Dual-channel Speech Enhancement
    Liu, Jinjiang
    Zhang, Xueliang
    INTERSPEECH 2021, 2021, : 1852 - 1856
  • [45] Collaborative Radio Frequency Fingerprint Identification Using Dual-Channel Parallel CNN
    Wang, Hanbo
    Wang, Jian
    2024 INTERNATIONAL CONFERENCE ON UBIQUITOUS COMMUNICATION, UCOM 2024, 2024, : 351 - 355
  • [46] Simulation of English speech emotion recognition based on transfer learning and CNN neural network
    Chen, Xuehua
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (02) : 2349 - 2360
  • [47] Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition
    Li, Guanjun
    Liang, Shan
    Nie, Shuai
    Liu, Wenju
    Yang, Zhanlei
    NEURAL NETWORKS, 2021, 141 : 225 - 237
  • [48] Background noise reduction via dual-channel scheme for speech recognition in vehicular environment
    Ahn, S
    Ko, H
    ICCE: 2005 INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, DIGEST OF TECHNICAL PAPERS, 2005, : 461 - 462
  • [49] Background noise reduction via dual-channel scheme for speech recognition in vehicular environment
    Ahn, S
    Ko, H
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2005, 51 (01) : 22 - 27
  • [50] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
    Mishra, Swami
    Bhatnagar, Nehal
    Prakasam, P.
    Sureshkumar, T. R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (13) : 37603 - 37620