Scalogram vs Spectrogram as Speech Representation Inputs for Speech Emotion Recognition Using CNN

Cited by: 1
Authors
Enriquez, Marc Dominic [1]
Lucas, Crisron Rudolf [2]
Aquino, Angelina [3]
Affiliations
[1] Univ Philippines, Digital Signal Proc Lab, Quezon City, Philippines
[2] Univ Coll Dublin, Insight Res Ctr, Dublin, Ireland
[3] Charles Darwin Univ, Northern Inst, Darwin, NT, Australia
Funding
Science Foundation Ireland
Keywords
Spectrogram; Scalogram; SER; CNN; Fourier Transform; Wavelet Transform
DOI
10.1109/ISSC59246.2023.10162085
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Speech Emotion Recognition (SER) focuses on understanding the human emotion in a given speech utterance using its acoustic and/or linguistic features. This paper presents a comparison between two speech representation inputs for SER: spectrograms and scalograms. Speech signals from four databases (Emo-DB, RAVDESS, SAVEE, and a mix of all three) were converted into each type of representation and used to train variations of a convolutional neural network (CNN) VGG16 Model-3. Results show that the scalogram-based models have a higher mean F1-score than the spectrogram-based models; however, further analysis indicates that the difference is not statistically significant at a 95% confidence level. In conclusion, spectrograms and scalograms perform statistically the same on the systems presented.
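The two representations compared in the abstract can be illustrated with a minimal NumPy sketch. This record does not state the paper's actual settings, so everything here is an assumption made for illustration: the function names `spectrogram` and `scalogram`, the Hann window, the frame and hop sizes, the Morlet wavelet with parameter `w0`, and the synthetic 440 Hz test tone. It is not the authors' pipeline.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided FFT per frame; transpose to (frequency bins, time frames).
    return np.abs(np.fft.rfft(frames, axis=1)).T

def scalogram(x, fs, freqs, w0=6.0):
    """Magnitude scalogram from a Morlet continuous wavelet transform.

    Assumes x is longer than the longest wavelet, so that
    np.convolve(..., mode="same") returns len(x) samples per row.
    Values are correct up to a constant factor.
    """
    rows = []
    for f in freqs:
        s = w0 / (2 * np.pi * f)       # scale (seconds) whose centre frequency is f
        half = int(4 * s * fs)         # support of about +/- 4 scale widths
        tau = np.arange(-half, half + 1) / fs
        wavelet = np.exp(1j * w0 * tau / s) * np.exp(-0.5 * (tau / s) ** 2) / np.sqrt(s)
        rows.append(np.abs(np.convolve(x, wavelet, mode="same")))
    return np.stack(rows)              # shape: (len(freqs), len(x))

# Illustrative input: a 1-second 440 Hz tone sampled at 8 kHz (not data from the paper).
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)                      # fixed time-frequency resolution
cwt_freqs = np.arange(100.0, 1001.0, 20.0)    # analysis frequencies in Hz
scal = scalogram(tone, fs, cwt_freqs)         # resolution varies with scale
```

Either array could then be rescaled and rendered as an image for a CNN input. The practical difference is that the spectrogram uses one window length for all frequencies, while the scalogram uses shorter wavelets at higher frequencies.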
Pages: 6