Emotion recognition from speech using deep learning on spectrograms

Cited by: 0

Authors:

Li, Xingguang [1]

Song, Wenjun [1]

Liang, Zonglin [1]

Affiliations:

[1] Changchun Univ Sci & Technol, Elect Informat Engn, 7089 Satellite Rd, Changchun, Jilin, Peoples R China

Source:

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS | 2020 / Vol. 39 / No. 3

Keywords:

Speech emotion recognition; spectrograms; CRNN; focal loss;

DOI:

10.3233/JIFS-191129

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most emotional speech corpora suffer from inconsistent sample lengths and imbalanced sample categories. To address these problems, this paper proposes a variable-length-input CRNN deep learning model based on focal loss for recognizing anger, happiness, neutrality, and sadness in the IEMOCAP emotional corpus. First, a variable-length strategy feeds the spectrograms of the padded speech samples into a CNN. Second, a masking matrix and the convolution layers preserve and output only the valid part of the input sequence. Third, this valid output is fed into a BiGRU network for learning. Finally, focal loss is used during network training to control and adjust the contribution of each sample to the total loss. Simulations show that, compared with traditional speech emotion recognition models, the proposed method effectively improves the accuracy and performance of emotion recognition.
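Two of the steps named in the abstract, padding variable-length inputs with a mask and weighting samples with focal loss, can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `pad_with_mask` and `focal_loss`, the zero padding value, and the defaults `gamma=2.0` and `alpha=None` are assumptions chosen for illustration, and the CNN and BiGRU layers are omitted. The loss follows the standard focal loss form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t).

```python
import math


def pad_with_mask(sequences, pad_value=0.0):
    """Pad variable-length sequences of frames to a common length.

    Each sequence is a list of frames, each frame a list of spectrogram bins.
    Returns the padded batch and a 0/1 mask marking the valid (unpadded) frames.
    """
    max_len = max(len(seq) for seq in sequences)
    n_bins = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        frames = [list(frame) for frame in seq]
        frames += [[pad_value] * n_bins for _ in range(n_pad)]
        padded.append(frames)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Mean focal loss over a batch.

    probs:  per-sample lists of class probabilities (e.g. softmax outputs).
    labels: integer class index for each sample.
    gamma:  focusing parameter (assumed value); larger values down-weight
            well-classified samples more strongly.
    alpha:  optional per-class weights (assumed), indexed by class.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = min(max(p[y], 1e-12), 1.0)  # probability of the true class
        weight = (1.0 - p_t) ** gamma     # small when the sample is easy
        if alpha is not None:
            weight *= alpha[y]
        total += -weight * math.log(p_t)
    return total / len(labels)
```

With `gamma=0` and no `alpha`, the function reduces to ordinary cross-entropy; increasing `gamma` shrinks the contribution of confidently classified samples, which is how focal loss shifts the total loss toward harder, often under-represented, examples.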


Pages: 2791-2796

Page count: 6

