Leveraged Mel Spectrograms Using Harmonic and Percussive Components in Speech Emotion Recognition

Cited by: 5
Authors
Rudd, David Hason [1]
Huo, Huan [1]
Xu, Guandong [1, 2]
Affiliations
[1] Univ Technol Sydney, 15 Broadway, Ultimo, Australia
[2] Data Sci Inst, 15 Broadway, Ultimo, Australia
Funding
Australian Research Council
Keywords
Speech Emotion Recognition (SER); Mel spectrogram; Convolutional Neural Network (CNN); Voice signal processing; Acoustic features;
DOI
10.1007/978-3-031-05936-0_31
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech Emotion Recognition (SER) is an affective technology that enables intelligent embedded devices to interact with sensitivity. Similarly, call centre employees recognise customers' emotions from their pitch, energy, and tone of voice, and adjust their own speech to deliver a high-quality interaction. This work explores, for the first time, the effects of the harmonic and percussive components of Mel spectrograms in SER. We attempt to leverage the Mel spectrogram by decomposing it into distinguishable acoustic features for use in our proposed architecture, which comprises a novel feature map generator algorithm, a CNN-based feature extractor, and a multi-layer perceptron (MLP) classifier. The study focuses specifically on effective data augmentation techniques for building an enriched hybrid feature map. The resulting function outputs a 2D image that serves as input to a pre-trained CNN-VGG16 feature extractor. We also investigate other acoustic features, such as MFCCs, chromagram, spectral contrast, and tonnetz, to assess the proposed framework. The method achieves a test accuracy of 92.79% on the Berlin EMO-DB database, which is higher than previous works using CNN-VGG16.
Pages: 392-404
Number of pages: 13
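The abstract describes decomposing a Mel spectrogram into harmonic and percussive components and packing the result into a 2D image for a CNN. The record does not give the authors' feature map generator, so the following is only a minimal sketch of the general idea, assuming median-filtering harmonic-percussive separation with soft masks and a simple three-channel stacking. The function names, the kernel size of 17, and the channel layout are all hypothetical choices for illustration, not details from the paper.

```python
import numpy as np


def median_filter_1d(S, size, axis):
    """Median-filter a 2D array along one axis (odd `size`, reflect padding)."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="reflect")
    # The window dimension is appended as the last axis.
    windows = np.lib.stride_tricks.sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)


def hpss(S, kernel=17, eps=1e-10):
    """Split a magnitude spectrogram (freq x time) into harmonic and
    percussive parts using median filtering and soft masks.

    Assumption: this is a generic HPSS variant, not the paper's method.
    """
    H = median_filter_1d(S, kernel, axis=1)  # smooth over time: keeps horizontal ridges
    P = median_filter_1d(S, kernel, axis=0)  # smooth over frequency: keeps vertical ridges
    total = H**2 + P**2 + eps
    return S * (H**2 / total), S * (P**2 / total)


def hybrid_feature_map(S, kernel=17, eps=1e-10):
    """Stack [spectrogram, harmonic, percussive] as a 3-channel image in [0, 1].

    Hypothetical layout, chosen only because VGG16-style networks expect
    three input channels.
    """
    harm, perc = hpss(S, kernel=kernel, eps=eps)
    channels = []
    for c in (S, harm, perc):
        c = (c - c.min()) / (c.max() - c.min() + eps)  # per-channel min-max scaling
        channels.append(c)
    return np.stack(channels, axis=-1)


if __name__ == "__main__":
    # Synthetic input: one sustained tone (horizontal line) and one click (vertical line).
    S = np.full((64, 64), 0.01)
    S[20, :] = 1.0
    S[:, 40] = 1.0
    fmap = hybrid_feature_map(S)
    print(fmap.shape)
```

In practice the input `S` would be a Mel spectrogram computed from audio, and a library routine such as `librosa.decompose.hpss` could replace the hand-written separation. The resulting image would still need resizing to whatever input size the chosen CNN expects.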