Speech Emotion Recognition Using Auditory Spectrogram and Cepstral Features

Cited: 0
Authors
Zhao, Shujie [1 ]
Yang, Yan [1 ]
Cohen, Israel [2 ]
Zhang, Lijun [1 ]
Affiliations
[1] Northwestern Polytech Univ, CIAIC, Xian 710072, Shaanxi, Peoples R China
[2] Technion Israel Inst Technol, IL-3200003 Haifa, Israel
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; speech signals; machine learning; pattern recognition; feature extraction; noise;
DOI
Not available
CLC number
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
A systematic comparison of the impact of environmental noise on key acoustic features is critical for transferring speech emotion recognition (SER) systems into real-world applications. In this study, we investigate the noise tolerance of different acoustic features in distinguishing emotions by comparing SER classification performance on clean and noisy speech signals. We extract spectral and cepstral parameters based on human auditory characteristics and develop machine learning algorithms to classify four types of emotion using these features. Experimental results on clean and noisy data show that, compared with cepstral features, auditory spectrogram-based features achieve higher recognition accuracy at low signal-to-noise ratios (SNRs) but lower accuracy at high SNRs. Gammatone filter cepstral coefficients (GFCCs) outperformed all the other extracted features on the Berlin database of emotional speech (EmoDB) under all four tested noise conditions. These results suggest a complementary relationship between auditory spectrogram-based features and cepstral features that can be exploited for more noise-robust SER in real-world applications.
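The GFCC pipeline named in the abstract (a gammatone filterbank motivated by human auditory characteristics, followed by log compression and a discrete cosine transform) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ERB-spaced Slaney-style filterbank, the 64 ms impulse-response length, the filter count, and the utterance-level (rather than frame-level) energies are all assumed defaults chosen for brevity.

```python
import numpy as np
from scipy.fftpack import dct

def erb_space(low, high, n):
    """ERB-rate spaced center frequencies (Glasberg & Moore / Slaney)."""
    ear_q, min_bw = 9.26449, 24.7
    k = ear_q * min_bw
    return -k + np.exp(
        np.arange(1, n + 1) * (np.log(low + k) - np.log(high + k)) / n
    ) * (high + k)

def gfcc(signal, fs=16000, n_filters=32, n_ceps=13):
    """GFCC sketch: gammatone band energies -> log -> DCT.

    Computes one coefficient vector per utterance; real systems apply
    the same steps per short-time frame.
    """
    t = np.arange(0, 0.064, 1.0 / fs)            # 64 ms impulse responses
    cfs = erb_space(50.0, fs / 2.0, n_filters)   # center frequencies
    log_energies = np.empty(n_filters)
    for i, cf in enumerate(cfs):
        # 4th-order gammatone impulse response; ERB bandwidth at cf
        b = 1.019 * 24.7 * (4.37 * cf / 1000.0 + 1.0)
        ir = t**3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * cf * t)
        band = np.convolve(signal, ir, mode="same")
        log_energies[i] = np.log(np.sum(band**2) + 1e-10)
    return dct(log_energies, norm="ortho")[:n_ceps]
```

Because the gammatone bandwidths widen with center frequency as in the cochlea, narrowband noise tends to corrupt only a few bands, which is one intuition behind the noise robustness reported for GFCCs.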
Pages: 136-140
Page count: 5
Related papers
50 items (showing [41]-[50])
  • [41] Automatic Emotion Recognition using Auditory and Prosodic Indicative Features
    Gharsellaoui, Soumaya
    Selouani, Sid-Ahmed
    Dahmane, Adel Omar
    [J]. 2015 IEEE 28TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2015, : 1265 - 1270
  • [42] Speech Emotion Recognition Based on Coiflet Wavelet Packet Cepstral Coefficients
    Huang, Yongming
    Wu, Ao
    Zhang, Guobao
    Li, Yue
    [J]. PATTERN RECOGNITION (CCPR 2014), PT II, 2014, 484 : 436 - 443
  • [43] Spectral and Cepstral Audio Noise Reduction Techniques in Speech Emotion Recognition
    Pohjalainen, Jouni
    Ringeval, Fabien
    Zhang, Zixing
    Schuller, Bjoern
    [J]. MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, : 670 - 674
  • [44] Phoneme recognition using speech image (spectrogram)
    Ahmadi, M
    Bailey, NJ
    Hoyle, BS
    [J]. ICSP '96 - 1996 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, 1996, : 675 - 677
  • [45] Speech Emotion Recognition Using Neural Network and Wavelet Features
    Roy, Tanmoy
    Marwala, Tshilidzi
    Chakraverty, S.
    [J]. RECENT TRENDS IN WAVE MECHANICS AND VIBRATIONS, WMVC 2018, 2020, : 427 - 438
  • [46] SPEECH EMOTION RECOGNITION USING SELF-SUPERVISED FEATURES
    Morais, Edmilson
    Hoory, Ron
    Zhu, Weizhong
    Gat, Itai
    Damasceno, Matheus
    Aronowitz, Hagai
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6922 - 6926
  • [47] Automatic speech based emotion recognition using paralinguistics features
    Hook, J.
    Noroozi, F.
    Toygar, O.
    Anbarjafari, G.
    [J]. BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2019, 67 (03) : 479 - 488
  • [48] Learning Salient Features for Speech Emotion Recognition Using CNN
    Liu, Jiamu
    Han, Wenjing
    Ruan, Huabin
    Chen, Xiaomin
    Jiang, Dongmei
    Li, Haifeng
    [J]. 2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [49] Automatic speech emotion recognition using modulation spectral features
    Wu, Siqing
    Falk, Tiago H.
    Chan, Wai-Yip
    [J]. SPEECH COMMUNICATION, 2011, 53 (05) : 768 - 785
  • [50] Emotion Recognition from Speech using Prosodic and Linguistic Features
    Pervaiz, Mahwish
    Khan, Tamim Ahmed
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (08) : 84 - 90