Speech Emotion Recognition Model with Time-Scale-Invariance MFCCs as Input

Cited by: 1

Authors:

Xie, Xiaohan [1]

Lou, Jiaqi [2]

Zhang, Lingzhi [3]

Affiliations:

[1] Shandong Prov Tengzhou 1 High Sch, Tengzhou, Peoples R China

[2] Cranfield Univ, Sch Aerosp Transport & Mfg, Bedford, England

[3] Univ Manchester, Sch Environm Educ & Dev, Manchester, Lancs, England

Source:

IWCMC 2021: 2021 17TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE (IWCMC) | 2021

Keywords:

Emotion Recognition Analysis; MFCC; CNN; Multi-Head Attention Mechanism; NETWORKS; 5G;

DOI:

10.1109/IWCMC51323.2021.9498598

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Speech Emotion Recognition (SER) is an important task in human communication. In recent years, Mel-frequency cepstral coefficient (MFCC) features have been widely used in SER-related tasks. In this study, we develop a multi-head-attention CNN model with an auxiliary gender task. Based on the proposed model, we explore how MFCCs at different time scales, and different combinations of them, affect the model's performance when used as input. Experimental results show that, within a moderate range, input MFCCs with higher time-scale resolution help the model achieve better speech emotion recognition performance. Appropriately combining MFCCs at different time scales also improves performance.
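As a rough illustration of what "time-scale resolution" can mean for an MFCC input, the sketch below computes the shape of an MFCC matrix for several frame hops. Every number in it is an assumption made for illustration and is not taken from the paper: the 16 kHz sample rate, the 3-second clip, the 25 ms window, the 13 coefficients, and the hop values. The helper names `frame_count` and `mfcc_shape` are hypothetical.

```python
# Illustrative sketch only: all constants are assumptions, not values from the paper.

SAMPLE_RATE = 16_000   # assumed sample rate in Hz
CLIP_SECONDS = 3       # assumed clip length in seconds
WIN_MS = 25            # assumed analysis window in milliseconds
N_MFCC = 13            # assumed number of cepstral coefficients per frame


def frame_count(num_samples: int, win_length: int, hop_length: int) -> int:
    """Number of full analysis frames when the signal is not padded."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def mfcc_shape(hop_ms: int) -> tuple[int, int]:
    """Shape (coefficients, frames) of an MFCC matrix for a given hop in ms."""
    num_samples = SAMPLE_RATE * CLIP_SECONDS
    win_length = SAMPLE_RATE * WIN_MS // 1000
    hop_length = SAMPLE_RATE * hop_ms // 1000
    return (N_MFCC, frame_count(num_samples, win_length, hop_length))


if __name__ == "__main__":
    # A smaller hop gives more frames, i.e. a finer time resolution.
    for hop_ms in (5, 10, 20):
        print(f"hop={hop_ms} ms -> MFCC shape {mfcc_shape(hop_ms)}")
```

Under these assumptions, halving the hop roughly doubles the number of frames, so matrices computed at different hops differ in width and would need resampling, pooling, or separate input branches before being combined. How the paper combines them is not stated in the abstract.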


Pages: 537-542

Page count: 6
