Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition

Cited by: 5
Authors
Zhang, Linjuan [1]
Wang, Longbiao [1]
Dang, Jianwu [1, 2]
Guo, Lili [1]
Guan, Haotian [3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Intelligent Spoken Language Technol Tianjin Co Lt, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech emotion recognition; Spectrogram; Perceptual features; Convolutional neural network; Bi-directional long short-term memory;
DOI
10.1007/978-3-030-04212-7_6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Convolutional neural networks (CNNs) have demonstrated great power at mining deep information from spectrograms for speech emotion recognition. However, perceptual features such as low-level descriptors (LLDs) and their statistical values have not been sufficiently utilized in CNN-based emotion recognition. To address this problem, we propose novel features that combine spectrogram and perceptual features at different levels. First, frame-level LLDs are arranged as time-sequence LLDs. Then, the spectrogram and the time-sequence LLDs are fused into compositional spectrographic features (CSF). To fully utilize perceptual features and global information, statistical values of the LLDs are added to CSF to generate rich-compositional spectrographic features (RSF). Finally, the proposed features are individually fed to a CNN to extract deep features for emotion recognition. Bi-directional long short-term memory is employed to identify emotions, and the experiments are conducted on EmoDB. Compared with the spectrogram alone, CSF and RSF improve the unweighted accuracy, with relative error reductions of 32.04% and 36.91%, respectively.
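The feature composition described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how the fusion is performed, so everything below is an assumption made for illustration: the function names (`build_csf`, `build_rsf`, `relative_error_reduction`), the array shapes, the choice of mean and standard deviation as the statistical values, and the tiling of those statistics along the time axis. The last function uses the standard definition of relative error reduction; the accuracies passed to it in the example are hypothetical and are not results from the paper.

```python
import numpy as np


def build_csf(spectrogram, llds):
    """Stack a (n_freq, n_frames) spectrogram with (n_lld, n_frames) LLD tracks.

    Assumption: fusion is a simple concatenation along the feature axis.
    """
    if spectrogram.shape[1] != llds.shape[1]:
        raise ValueError("spectrogram and LLDs must have the same number of frames")
    return np.concatenate([spectrogram, llds], axis=0)


def build_rsf(spectrogram, llds):
    """Append utterance-level LLD statistics to the CSF.

    Assumption: the statistics are the per-LLD mean and standard deviation,
    each repeated across all frames so they form constant rows.
    """
    csf = build_csf(spectrogram, llds)
    stats = np.concatenate([llds.mean(axis=1), llds.std(axis=1)])  # (2 * n_lld,)
    stat_rows = np.repeat(stats[:, None], csf.shape[1], axis=1)    # (2 * n_lld, n_frames)
    return np.concatenate([csf, stat_rows], axis=0)


def relative_error_reduction(baseline_acc, new_acc):
    """Fraction of the baseline error rate removed by the new system."""
    return (new_acc - baseline_acc) / (1.0 - baseline_acc)


# Hypothetical sizes: 128 frequency bins, 32 LLDs, 300 frames.
rng = np.random.default_rng(0)
spec = rng.random((128, 300))
llds = rng.random((32, 300))

csf = build_csf(spec, llds)   # shape (160, 300)
rsf = build_rsf(spec, llds)   # shape (224, 300)

# Hypothetical accuracies, chosen only to show the arithmetic.
rer = relative_error_reduction(0.80, 0.90)  # about 0.5, i.e. 50% of the error removed
```

Under these assumptions, either array could be passed to a CNN as a single-channel, image-like input, which is one plausible reading of "individually fed to a CNN".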
Pages: 62-71
Page count: 10