Comparative Analysis of Windows for Speech Emotion Recognition Using CNN

Authors
Teixeira, Felipe L. [1, 2]
Soares, Salviano Pinto [4, 5]
Abreu, J. L. Pio [6, 7]
Oliveira, Paulo M. [8]
Teixeira, Joao P. [1, 3]
Affiliations
[1] Inst Politecn Braganca, Res Ctr Digitalizat & Intelligent Robot CEDRI, Campus Santa Apolonia, P-5300253 Braganca, Portugal
[2] Univ Tras Os Montes & Alto Douro UTAD, Sch Sci & Technol, Engn Dept, P-5000801 Vila Real, Portugal
[3] Inst Politecn Braganca, Associate Lab Sustainabil & Technol SusTEC, Campus Santa Apolonia, P-5300253 Braganca, Portugal
[4] Univ Aveiro, Inst Elect & Informat Engn Aveiro IEETA, P-3810193 Aveiro, Portugal
[5] Univ Aveiro, Intelligent Syst Associate Lab LASI, P-3810193 Aveiro, Portugal
[6] Hosp Univ Coimbra, P-3004561 Coimbra, Portugal
[7] Univ Coimbra, Fac Med, P-3000548 Coimbra, Portugal
[8] Univ Tras Os Montes & Alto Douro UTAD, INESC TEC, Vila Real, Portugal
Keywords
Speech Emotion Recognition; Hamming; Hanning; CNN; FEATURES; SPECTROGRAM; SELECTION;
DOI
10.1007/978-3-031-53025-8_17
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The paper compares accuracy in the Speech Emotion Recognition task when the Hamming and Hanning windows are used to frame the speech and compute the spectrogram that serves as input to a convolutional neural network. The detection of between 4 and 10 emotional states was tested for both windows. The results show significant differences in accuracy between the two window types and provide valuable insights for the development of more efficient emotional state detection systems. The best accuracies for 4 to 10 emotions were 64.1% (4 emotions), 57.8% (5 emotions), 59.8% (6 emotions), 48.4% (7 emotions), 47.8% (8 emotions), 51.4% (9 emotions), and 45.9% (10 emotions). These accuracies are at the state-of-the-art level.
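The framing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `spectrogram`, the frame length (512 samples), the hop (256 samples), the 16 kHz sampling rate, and the synthetic test tone are all assumptions chosen for illustration, since the record does not state the paper's settings. The sketch only shows how the same signal yields a different spectrogram depending on whether a Hamming or a Hanning window is applied to each frame.

```python
import numpy as np

def spectrogram(signal, window_name="hamming", frame_len=512, hop=256):
    """Log-magnitude spectrogram of a 1-D signal (hypothetical helper).

    Frame length and hop are illustrative assumptions, not values from the paper.
    """
    windows = {"hamming": np.hamming, "hanning": np.hanning}
    window = windows[window_name](frame_len)

    # Split the signal into overlapping frames of frame_len samples.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )

    # Apply the window to every frame, then take the one-sided FFT.
    spectrum = np.fft.rfft(frames * window, axis=1)

    # Convert magnitude to dB; the small constant avoids log(0).
    # Result is laid out as (frequency bins, frames).
    return 20.0 * np.log10(np.abs(spectrum) + 1e-10).T

# Synthetic 1-second, 440 Hz tone at an assumed 16 kHz sampling rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

spec_hamming = spectrogram(tone, "hamming")
spec_hanning = spectrogram(tone, "hanning")
```

Both arrays have the same shape and could be fed to a CNN as single-channel images; they differ mainly in how much energy leaks into bins away from the tone, which is the kind of difference the paper's comparison concerns.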
Pages: 233-248
Page count: 16