Speech emotion recognition based on optimized deep features of dual-channel complementary spectrogram

Cited by: 7
Authors
Li, Juan [1, 2]
Zhang, Xueying [1]
Li, Fenglian [1]
Huang, Lixia [1]
Affiliations
[1] Taiyuan Univ Technol, Coll Elect Informat & Opt Engn, Taiyuan 030600, Peoples R China
[2] Yuncheng Univ, Dept Phys & Elect Engn, Yuncheng 044000, Peoples R China
Keywords
Speech emotion recognition; VTMel spectrogram; Mel spectrogram; Dual-channel complementary structure; Optimized deep features
DOI
10.1016/j.ins.2023.119649
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speech emotion recognition (SER) is an essential field of artificial intelligence. Although the Mel spectrogram is commonly used in SER, it emphasizes low-frequency emotional components. In this paper, we propose the VMD-Teager-Mel (VTMel) spectrogram, which complements the Mel spectrogram by emphasizing high-frequency components. In addition, to reduce the redundancy of the acoustic features, we propose a convolutional neural network with a deep restricted Boltzmann machine (CNN-DBM) to obtain optimized deep features. Furthermore, a dual-channel complementary structure is proposed for SER. First, a CNN-DBM extracts optimized deep features from the Mel spectrogram, highlighting low-frequency components. Second, another CNN-DBM extracts optimized deep features from the VTMel spectrogram, highlighting high-frequency components. These features are spliced together and fed to a classifier. The experimental results on three public datasets (EMO-DB, SAVEE, and RAVDESS) show that the merged features achieve better performance, confirming the complementarity between the Mel and VTMel spectrograms. The recognition accuracy using CNN-DBM optimized deep features is superior to that using deep features from a CNN alone, demonstrating the superiority of the proposed method. Our experiments also show that the proposed method has advantages over state-of-the-art methods reported in the literature.
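The splicing step in the dual-channel structure can be pictured as a plain concatenation of the two per-utterance feature vectors. The sketch below is only an illustration under stated assumptions and is not the authors' implementation: the function name `splice_features`, the feature dimensions (8 and 6), and the random arrays standing in for the outputs of the two CNN-DBM branches are all hypothetical.

```python
import numpy as np


def splice_features(mel_feats, vtmel_feats):
    """Concatenate per-utterance feature vectors from the two channels.

    Both inputs are 2-D arrays shaped (n_utterances, n_features); the
    feature counts may differ, but the utterance counts must match.
    """
    mel_feats = np.asarray(mel_feats, dtype=float)
    vtmel_feats = np.asarray(vtmel_feats, dtype=float)
    if mel_feats.shape[0] != vtmel_feats.shape[0]:
        raise ValueError("both channels must cover the same utterances")
    # Place the Mel-channel features first, then the VTMel-channel features.
    return np.concatenate([mel_feats, vtmel_feats], axis=1)


# Hypothetical stand-ins for the deep features of each branch
# (random values, not real model outputs).
rng = np.random.default_rng(0)
mel_feats = rng.normal(size=(4, 8))    # 4 utterances, 8 Mel-branch features
vtmel_feats = rng.normal(size=(4, 6))  # 4 utterances, 6 VTMel-branch features

fused = splice_features(mel_feats, vtmel_feats)
print(fused.shape)  # (4, 14)
```

Each row of `fused` would then be passed to whatever classifier is in use; the abstract does not specify the classifier, so none is shown here.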
Pages: 16