Speech Emotion Recognition Model with Time-Scale-Invariance MFCCs as Input

Cited by: 1
Authors
Xie, Xiaohan [1 ]
Lou, Jiaqi [2 ]
Zhang, Lingzhi [3 ]
Affiliations
[1] Shandong Prov Tengzhou 1 High Sch, Tengzhou, Peoples R China
[2] Cranfield Univ, Sch Aerosp Transport & Mfg, Bedford, England
[3] Univ Manchester, Sch Environm Educ & Dev, Manchester, Lancs, England
Keywords
Emotion Recognition Analysis; MFCC; CNN; Multi-Head Attention Mechanism; NETWORKS; 5G;
DOI
10.1109/IWCMC51323.2021.9498598
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Speech Emotion Recognition (SER) is a significant task for human communication. In recent years, the Mel-frequency Cepstral Coefficient (MFCC) feature has been widely used in speech emotion recognition tasks. In this study, we developed a multi-head-attention CNN model with gender classification as an auxiliary task. Based on the proposed model, we explore how MFCCs at different time scales, and different combinations of them, affect performance when used as input. Experimental results show that, within a moderate range, MFCCs with higher time resolution as input help the model achieve better speech emotion recognition performance. Combining MFCCs at different time scales appropriately also improves performance.
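The "time scale" of an MFCC sequence is set by the framing parameters: a smaller hop length between analysis windows yields more frames per second, i.e. higher time resolution. The sketch below illustrates only this framing step with NumPy; a full MFCC pipeline (as in libraries such as librosa, where `hop_length` and `win_length` play this role) would additionally apply an FFT, mel filterbank, log, and DCT per frame. All parameter values here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Slice a 1-D waveform into overlapping frames.

    A smaller hop_len produces more frames for the same signal,
    i.e. a finer time scale for any per-frame feature such as MFCC.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack(
        [signal[i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]
    )

sr = 16000                      # assumed sample rate: 16 kHz
x = np.random.randn(sr)         # one second of dummy audio

# 25 ms windows (400 samples); only the hop differs between the two scales.
coarse = frame_signal(x, frame_len=400, hop_len=400)  # no overlap
fine = frame_signal(x, frame_len=400, hop_len=160)    # 10 ms hop, 75% overlap

print(coarse.shape)  # (40, 400)  -> 40 frames per second
print(fine.shape)    # (98, 400)  -> 98 frames per second
```

Feeding the model frame sequences of different lengths for the same utterance is what lets the paper compare, and combine, MFCC inputs at different time scales.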
Pages: 537 - 542
Page count: 6
Related Papers
50 records in total
  • [21] Multi-Scale Temporal Transformer For Speech Emotion Recognition
    Li, Zhipeng
    Xing, Xiaofen
    Fang, Yuanbo
    Zhang, Weibin
    Fan, Hengsheng
    Xu, Xiangmin
    INTERSPEECH 2023, 2023, : 3652 - 3656
  • [22] Double sparse learning model for speech emotion recognition
    Zong, Yuan
    Zheng, Wenming
    Cui, Zhen
    Li, Qiang
    ELECTRONICS LETTERS, 2016, 52 (16) : 1410 - 1411
  • [23] Interaction and Transition Model for Speech Emotion Recognition in Dialogue
    Zhang, Ruo
    Atsushi, Ando
    Kobashikawa, Satoshi
    Aono, Yushi
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1094 - 1097
  • [24] Semi-supervised Model for Emotion Recognition in Speech
    Pereira, Ingryd
    Santos, Diego
    Maciel, Alexandre
    Barros, Pablo
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139 : 791 - 800
  • [25] Speech emotion recognition based on statistical pitch model
    Wang, Zhiping
    Zhao, Li
    Zou, Cairong
    CHINESE JOURNAL OF ACOUSTICS, 2006, (01) : 87 - 96
  • [26] Ensemble softmax regression model for speech emotion recognition
    Sun, Yaxin
    Wen, Guihua
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (06) : 8305 - 8328
  • [27] Speech Emotion Recognition Based on Acoustic Segment Model
    Zheng, Siyuan
    Du, Jun
    Zhou, Hengshun
    Bai, Xue
    Lee, Chin-Hui
    Li, Shipeng
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [28] Model Comparison in Speech Emotion Recognition for Indonesian Language
    Rumagit, Reinert Yosua
    Alexander, Glenn
    Saputra, Irfan Fahmi
    5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020, 2021, 179 : 789 - 797
  • [29] Recognition of Emotion Intensity Basing on Neutral Speech Model
    Kaminska, Dorota
    Sapinski, Tomasz
    Pelikant, Adam
    MAN-MACHINE INTERACTIONS 3, 2014, 242 : 451 - 458