Speech Emotion Recognition Model with Time-Scale-Invariance MFCCs as Input

Cited by: 1
Authors
Xie, Xiaohan [1]
Lou, Jiaqi [2]
Zhang, Lingzhi [3]
Affiliations
[1] Shandong Prov Tengzhou 1 High Sch, Tengzhou, Peoples R China
[2] Cranfield Univ, Sch Aerosp Transport & Mfg, Bedford, England
[3] Univ Manchester, Sch Environm Educ & Dev, Manchester, Lancs, England
Keywords
Emotion Recognition Analysis; MFCC; CNN; Multi-Head Attention Mechanism; NETWORKS; 5G
DOI
10.1109/IWCMC51323.2021.9498598
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Speech emotion recognition (SER) is an important task in human communication. In recent years, Mel-frequency cepstral coefficient (MFCC) features have been widely used in SER-related tasks. In this study, we develop a multi-head-attention CNN model with gender recognition as an auxiliary task. Based on the proposed model, we explore how MFCCs at different time scales, and different combinations of them, affect the model's performance when used as input. Experimental results show that, within a moderate range, MFCC input with higher time resolution helps the model achieve better SER performance. Appropriately combining MFCCs of different time scales also improves performance.
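To make the idea of "different time-scale MFCCs" concrete, here is a minimal sketch in pure Python. It is not the authors' pipeline: it assumes that the time scale is controlled by the hop length between analysis frames (the abstract does not say how the scales are produced), and the sample rate, window length, hop values, filter count and the 440 Hz test tone are all illustrative choices. A real system would use an optimized audio library rather than this naive DFT.

```python
import math

def frame_signal(signal, win_length, hop_length):
    """Split a signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(signal) - win_length) // hop_length
    return [signal[i * hop_length:i * hop_length + win_length]
            for i in range(n_frames)]

def power_spectrum(frame):
    """Hann-windowed power spectrum via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    w = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(w))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(w))
        spec.append((re * re + im * im) / n)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 to sr/2."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, centre, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((f - lo) / (centre - lo),
                                  (hi - f) / (hi - centre)))
                     for f in bin_freqs])
    return bank

def dct2(values, n_out):
    """First n_out coefficients of an unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc(signal, sr, win_length, hop_length, n_mels=10, n_mfcc=5):
    """Return one list of n_mfcc coefficients per frame."""
    bank = mel_filterbank(n_mels, win_length, sr)
    out = []
    for frame in frame_signal(signal, win_length, hop_length):
        spec = power_spectrum(frame)
        # Small constant avoids log(0) for empty filters.
        log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                   for row in bank]
        out.append(dct2(log_mel, n_mfcc))
    return out

SR = 4000  # illustrative sample rate, not taken from the paper
signal = [math.sin(2 * math.pi * 440 * t / SR) for t in range(1000)]

# Assumption: a smaller hop length gives a finer time scale.
feats = {hop: mfcc(signal, SR, win_length=128, hop_length=hop)
         for hop in (32, 64, 128)}
for hop, seq in feats.items():
    print(f"hop={hop}: {len(seq)} frames x {len(seq[0])} coefficients")
```

Halving the hop length roughly doubles the number of frames for the same audio, which is one way to read "higher resolution in time-scale". A multi-scale input could then be built by feeding the sequences in `feats` to separate model branches, although how the paper combines them is not stated in the abstract.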
Pages: 537-542
Page count: 6