Comparative Analysis of Windows for Speech Emotion Recognition Using CNN

Cited by: 0
Authors
Teixeira, Felipe L. [1, 2]
Soares, Salviano Pinto [4, 5]
Abreu, J. L. Pio [6, 7]
Oliveira, Paulo M. [8]
Teixeira, Joao P. [1, 3]
Affiliations
[1] Inst Politecn Braganca, Res Ctr Digitalizat & Intelligent Robot CEDRI, Campus Santa Apolonia, P-5300253 Braganca, Portugal
[2] Univ Tras Os Montes & Alto Douro UTAD, Sch Sci & Technol, Engn Dept, P-5000801 Vila Real, Portugal
[3] Inst Politecn Braganca, Associate Lab Sustainabil & Technol SusTEC, Campus Santa Apolonia, P-5300253 Braganca, Portugal
[4] Univ Aveiro, Inst Elect & Informat Engn Aveiro IEETA, P-3810193 Aveiro, Portugal
[5] Univ Aveiro, Intelligent Syst Associate Lab LASI, P-3810193 Aveiro, Portugal
[6] Hosp Univ Coimbra, P-3004561 Coimbra, Portugal
[7] Univ Coimbra, Fac Med, P-3000548 Coimbra, Portugal
[8] Univ Tras Os Montes & Alto Douro UTAD, INESC TEC, Vila Real, Portugal
Keywords
Speech Emotion Recognition; Hamming; Hanning; CNN; Features; Spectrogram; Selection
DOI
10.1007/978-3-031-53025-8_17
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper compares the accuracy achieved in the Speech Emotion Recognition task when the Hamming and Hanning windows are used to frame the speech signal and compute the spectrogram that serves as input to a convolutional neural network (CNN). The detection of between 4 and 10 emotional states was tested for both windows. The results show significant differences in accuracy between the two window types and provide valuable insights for the development of more efficient emotional state detection systems. The best accuracies obtained were 64.1% (4 emotions), 57.8% (5 emotions), 59.8% (6 emotions), 48.4% (7 emotions), 47.8% (8 emotions), 51.4% (9 emotions), and 45.9% (10 emotions). These accuracies are at the state-of-the-art level.
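The windowing step described in the abstract can be illustrated with a minimal sketch. It frames a signal with either a Hamming or a Hanning window and computes a log-magnitude spectrogram of the kind that could be fed to a CNN. The sampling rate, frame length, hop size, and the synthetic test tone are assumptions chosen for illustration; the record does not state the authors' actual settings, and the helper name `spectrogram` is hypothetical.

```python
import numpy as np

def spectrogram(signal, window, hop):
    """Log-magnitude spectrogram of a 1-D signal framed with `window`.

    Returns an array of shape (freq_bins, n_frames).
    """
    frame_len = len(window)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames of length frame_len.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # Apply the window to every frame, then take the one-sided FFT.
    spectrum = np.fft.rfft(frames * window, axis=1)
    # Small offset avoids log(0); values are in dB relative to unit magnitude.
    return 20.0 * np.log10(np.abs(spectrum) + 1e-10).T

# Assumed settings (not taken from the paper): 16 kHz audio,
# 25 ms frames (400 samples), 10 ms hop (160 samples).
sample_rate = 16000
frame_len = 400
hop = 160

# Synthetic 1-second, 440 Hz tone standing in for a speech recording.
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

spec_hamming = spectrogram(signal, np.hamming(frame_len), hop)
spec_hanning = spectrogram(signal, np.hanning(frame_len), hop)
```

Both spectrograms have the same shape, so they can be fed to the same network. They differ only in how the window tapers each frame: the Hanning window falls to zero at the frame edges, while the Hamming window stays at about 0.08.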
Pages: 233-248
Number of pages: 16