Speech emotion recognition based on optimized deep features of dual-channel complementary spectrogram

Cited by: 7
Authors
Li, Juan [1, 2]
Zhang, Xueying [1]
Li, Fenglian [1]
Huang, Lixia [1]
Affiliations
[1] Taiyuan Univ Technol, Coll Elect Informat & Opt Engn, Taiyuan 030600, Peoples R China
[2] Yuncheng Univ, Dept Phys & Elect Engn, Yuncheng 044000, Peoples R China
Keywords
Speech emotion recognition; VTMel spectrogram; Mel spectrogram; Dual-channel complementary structure; Optimized deep features;
DOI
10.1016/j.ins.2023.119649
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Speech emotion recognition (SER) is an essential field of artificial intelligence. Although the Mel spectrogram is commonly used in SER, it emphasizes low-frequency emotional components. In this paper, we propose the VMD-Teager-Mel (VTMel) spectrogram, which complements the Mel spectrogram by emphasizing high-frequency components. In addition, to reduce the redundancy of the acoustic features, we propose a convolutional neural network with a deep restricted Boltzmann machine (CNN-DBM) to obtain optimized deep features. Furthermore, a dual-channel complementary structure is proposed for SER. First, a CNN-DBM extracts optimized deep features from the Mel spectrogram, highlighting low-frequency components. Second, another CNN-DBM extracts optimized deep features from the VTMel spectrogram, highlighting high-frequency components. These features are spliced together and fed to a classifier. Experimental results on three public datasets (EMO-DB, SAVEE, and RAVDESS) show that the merged features achieve better performance, confirming the complementarity between the Mel and VTMel spectrograms. The recognition accuracy obtained with CNN-DBM optimized deep features is higher than that obtained with deep features from a CNN alone, demonstrating the benefit of the proposed method. Our experiments also show advantages of the proposed method over state-of-the-art methods reported in the literature.
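As a rough illustration of two ideas in the abstract, the sketch below assumes that "Teager" refers to the discrete Teager-Kaiser energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and shows that for a pure tone this operator grows with frequency, which is one plausible reason a Teager-based spectrogram would emphasize high-frequency components. It also shows the splicing step as plain vector concatenation. The function names (`teager_energy`, `tone`, `splice`) and the toy feature vectors are hypothetical; the VMD step, the Mel filtering, and the CNN-DBM feature extractors are not implemented here, and this is not the authors' code.

```python
import math

def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so the
    output is two samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def tone(amplitude, omega, length=64):
    """Pure sinusoid A * sin(omega * n); omega is in radians per sample."""
    return [amplitude * math.sin(omega * n) for n in range(length)]

def splice(mel_features, vtmel_features):
    """Concatenate the feature vectors of the two channels (assumed 1-D)."""
    return list(mel_features) + list(vtmel_features)

# For A * sin(omega * n) the operator returns the constant A^2 * sin^2(omega),
# so a higher-frequency tone of equal amplitude yields a larger energy.
low = teager_energy(tone(1.0, 0.1))
high = teager_energy(tone(1.0, 0.8))

# Toy stand-ins for the deep features of the Mel and VTMel channels.
merged = splice([0.2, 0.7], [0.9, 0.1, 0.4])
```

In this sketch the merged vector simply has the combined length of the two inputs; how the real deep features are shaped and normalized before classification is not stated in the abstract.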
Pages: 16
Related papers
  • [1] Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neutral Network
    Li, Juan
    Zhang, Xueying
    Huang, Lixia
    Li, Fenglian
    Duan, Shufei
    Sun, Ying
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [2] Speech Emotion Recognition Based on Dual-Channel Convolutional Gated Recurrent Network
    Sun, Hanyu
    Huang, Lixia
    Zhang, Xueying
    Li, Juan
    Computer Engineering and Applications, 2024, 59 (02) : 170 - 177
  • [3] Experimental Analysis and Selection of Spectrogram Features for Speech Emotion Recognition
    Tang, Gui-Chen
    Liang, Rui-Yu
    Feng, Yue-Qin
    Wang, Qing-Yun
    INTERNATIONAL CONFERENCE ON MECHANICS, BUILDING MATERIAL AND CIVIL ENGINEERING (MBMCE 2015), 2015, : 757 - 762
  • [4] Speech Emotion Recognition Using Auditory Spectrogram and Cepstral Features
    Zhao, Shujie
    Yang, Yan
    Cohen, Israel
    Zhang, Lijun
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 136 - 140
  • [5] Emotion recognition based on AlexNet using speech spectrogram
    Park, Soeun
    Lee, Chul
    Kwon, Soonil
    Park, Neungsoo
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2018, 123 : 49 - 49
  • [6] A BERT based dual-channel explainable text emotion recognition system
    Kumar, Puneet
    Raman, Balasubramanian
    NEURAL NETWORKS, 2022, 150 : 392 - 407
  • [7] Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition
    Zhang, Linjuan
    Wang, Longbiao
    Dang, Jianwu
    Guo, Lili
    Guan, Haotian
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT IV, 2018, 11304 : 62 - 71
  • [8] Music Emotion Recognition by Using Chroma Spectrogram and Deep Visual Features
    Er, Mehmet Bilal
    Aydilek, Ibrahim Berkan
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2019, 12 (02) : 1622 - 1634
  • [9] Music Emotion Recognition by Using Chroma Spectrogram and Deep Visual Features
    Mehmet Bilal Er
    Ibrahim Berkan Aydilek
    International Journal of Computational Intelligence Systems, 2019, 12 : 1622 - 1634
  • [10] Facial Expression Recognition Based on Dual-Channel Fusion with Edge Features
    Tang, Xiaoyu
    Liu, Sirui
    Xiang, Qiuchi
    Cheng, Jintao
    He, Huifang
    Xue, Bohuan
    SYMMETRY-BASEL, 2022, 14 (12):