Scalogram vs Spectrogram as Speech Representation Inputs for Speech Emotion Recognition Using CNN

Cited by: 1
Authors
Enriquez, Marc Dominic [1]
Lucas, Crisron Rudolf [2]
Aquino, Angelina [3]
Affiliations
[1] Univ Philippines, Digital Signal Proc Lab, Quezon City, Philippines
[2] Univ Coll Dublin, Insight Res Ctr, Dublin, Ireland
[3] Charles Darwin Univ, Northern Inst, Darwin, NT, Australia
Funding
Science Foundation Ireland
Keywords
Spectrogram; Scalogram; SER; CNN; Fourier Transform; Wavelet Transform
DOI
10.1109/ISSC59246.2023.10162085
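Chinese Library Classification (CLC) Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Code
0808; 0809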
Abstract
Speech Emotion Recognition (SER) aims to identify the emotion expressed in a speech utterance from its acoustic and/or linguistic features. This paper compares two speech representation inputs for SER: spectrograms and scalograms. Speech signals from four datasets (Emo-DB, RAVDESS, SAVEE, and a mix of all three) were converted into each type of representation and used to train variations of a convolutional neural network (CNN), VGG16 Model-3. Results show that the scalogram-based models have a higher mean F1-score than the spectrogram-based models; however, further analysis indicates that the difference is not statistically significant at the 95% confidence level. In conclusion, spectrograms and scalograms show no statistically significant difference in performance on the systems presented.
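The two representations compared above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not give the window length, hop size, wavelet family, or frequency range, so the Hann-windowed STFT, the complex Morlet wavelet with `w0 = 6`, and the helper names `stft_spectrogram` and `morlet_scalogram` are all hypothetical choices made for this example.

```python
import numpy as np


def stft_spectrogram(x, n_fft=256, hop=128):
    """Power spectrogram |STFT|^2 using a Hann window; rows are frequency bins.

    Assumed parameters (n_fft, hop) are illustrative, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames: shape (n_frames, n_fft).
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided FFT of each frame, then transpose to (n_fft // 2 + 1, n_frames).
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T


def morlet_scalogram(x, fs, freqs, w0=6.0):
    """Power scalogram |CWT|^2 with a complex Morlet wavelet (up to a constant factor).

    Each row corresponds to one analysis frequency in `freqs` (Hz); columns are samples.
    The Morlet wavelet and w0 = 6 are assumptions for this sketch.
    """
    x = np.asarray(x, dtype=float)
    rows = []
    for f in freqs:
        scale = w0 / (2 * np.pi * f)  # scale whose Morlet centre frequency is f Hz
        n = int(np.ceil(4 * scale * fs))  # truncate the Gaussian envelope at +/- 4 scales
        t = np.arange(-n, n + 1) / fs  # symmetric time axis, centre at index n
        wavelet = np.exp(1j * w0 * t / scale) * np.exp(-0.5 * (t / scale) ** 2) / np.sqrt(scale)
        # Correlating x with the wavelet equals convolving with its time-reversed conjugate.
        full = np.convolve(x, np.conj(wavelet[::-1]), mode="full")
        # Keep the len(x) outputs aligned with the input samples.
        rows.append(np.abs(full[n : n + len(x)]) ** 2)
    return np.array(rows)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 1000 * t)  # 1 s, 1000 Hz test tone
    S = stft_spectrogram(x)
    W = morlet_scalogram(x, fs, [250.0, 500.0, 1000.0, 2000.0])
    print(S.shape, int(np.argmax(S.mean(axis=1))))
    print(W.shape, int(np.argmax(W.mean(axis=1))))
```

For a 1000 Hz tone sampled at 8 kHz, the spectrogram energy concentrates in STFT bin 32 (1000 / (8000 / 256)), and the scalogram energy concentrates in the 1000 Hz row. The practical difference between the two is the resolution trade-off: the STFT applies one fixed window to every frequency, whereas the wavelet's support shrinks as the analysis frequency rises, giving finer time resolution at high frequencies and finer frequency resolution at low ones. Either 2-D array could then be rendered as an image and fed to a CNN.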
Pages: 6
Related Papers (50 in total)
  • [1] Speech Emotion Recognition Using CNN
    Huang, Zhengwei
    Dong, Ming
    Mao, Qirong
    Zhan, Yongzhao
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014: 801-804
  • [2] Detecting Human Emotion via Speech Recognition by Using Speech Spectrogram
    Prasomphan, Sathit
    [J]. PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015), 2015: 113-122
  • [3] Speech Emotion Recognition Using Spectrogram & Phoneme Embedding
    Yenigalla, Promod
    Kumar, Abhay
    Tripathi, Suraj
    Singh, Chirag
    Kar, Sibsambhu
    Vepa, Jithendra
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 3688-3692
  • [4] Emotion recognition based on AlexNet using speech spectrogram
    Park, Soeun
    Lee, Chul
    Kwon, Soonil
    Park, Neungsoo
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2018, 123: 49
  • [5] Speech emotion recognition using scalogram based deep structure
    Aghajani, K.
    Esmaili Paeen Afrakoti, I.
    [J]. International Journal of Engineering, Transactions B: Applications, 2020, 33 (02): 285-292
  • [6] Improvement Of Speech Emotion Recognition with Neural Network Classifier by Using Speech Spectrogram
    Prasomphan, Sathit
    [J]. 2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015), 2015: 73-76
  • [7] Speech Emotion Recognition Using Scalogram Based Deep Structure
    Aghajani, K.
    Afrakoti, I. Esmaili Paeen
    [J]. INTERNATIONAL JOURNAL OF ENGINEERING, 2020, 33 (02): 285-292
  • [8] Speech Emotion Recognition Using Auditory Spectrogram and Cepstral Features
    Zhao, Shujie
    Yang, Yan
    Cohen, Israel
    Zhang, Lijun
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021: 136-140
  • [9] Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neutral Network
    Li, Juan
    Zhang, Xueying
    Huang, Lixia
    Li, Fenglian
    Duan, Shufei
    Sun, Ying
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (19)
  • [10] Scalogram as a Representation of Emotional Speech
    Powroznik, Pawel
    Wojcicki, Piotr
    Przylucki, Slawomir W.
    [J]. IEEE ACCESS, 2021, 9: 154044-154057