Learning Discriminative Features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition

Cited: 0
Authors
Tripathi, Suraj [1 ]
Ramesh, Abhiram [1 ]
Kumar, Abhay [1 ]
Singh, Chirag [1 ]
Yenigalla, Promod [1 ]
Affiliations
[1] Samsung Res, Nat Language Understanding Grp, Bengaluru, India
Keywords
Spectrogram; MFCC; speech emotion recognition; multitask learning; center loss; neural networks
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a Convolutional Neural Network (CNN) inspired by Multitask Learning (MTL) that operates on speech features and is trained under the joint supervision of softmax loss and center loss, a powerful metric-learning strategy, for the recognition of emotion in speech. Speech features such as spectrograms and Mel-frequency Cepstral Coefficients (MFCCs) help retain emotion-related low-level characteristics of speech. We experimented with several Deep Neural Network (DNN) architectures that take these speech features as input and trained them under both softmax and center loss, which resulted in highly discriminative features well suited to Speech Emotion Recognition (SER). Our networks also gain a regularizing effect by simultaneously performing the auxiliary task of reconstructing the input speech features. This sharing of representations among related tasks enables our network to generalize better on the original task of SER. Some of our proposed networks contain far fewer parameters than state-of-the-art architectures. We used the University of Southern California's Interactive Emotional Dyadic Motion Capture (USC-IEMOCAP) database in this work. Our best-performing model achieves a 3.1% improvement in overall accuracy and a 5.3% improvement in class accuracy over existing state-of-the-art methods.
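The joint objective described in the abstract combines three terms: a standard softmax cross-entropy, the center loss of Wen et al. (which pulls each feature vector toward a learned per-class center), and a reconstruction MSE from the auxiliary decoding task. A minimal NumPy sketch of that combined loss is below; the weighting hyperparameters `lambda_c` and `beta_r`, and all function names, are illustrative assumptions, not values or names taken from the paper.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Standard softmax cross-entropy, averaged over the batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    """Center loss: (1/2) * mean over the batch of ||x_i - c_{y_i}||^2,
    where c_{y_i} is the learned center of sample i's class."""
    diffs = features - centers[labels]                            # (batch, dim)
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def joint_loss(logits, labels, features, centers, recon, inputs,
               lambda_c=0.01, beta_r=1.0):
    """Total loss = softmax CE + lambda_c * center loss
                  + beta_r * reconstruction MSE (auxiliary task).
    lambda_c and beta_r are illustrative weights, not the paper's values."""
    ce = softmax_cross_entropy(logits, labels)
    cl = center_loss(features, labels, centers)
    mse = ((recon - inputs) ** 2).mean()
    return ce + lambda_c * cl + beta_r * mse
```

In a full training loop the class centers would also be updated each mini-batch (typically by a moving average of the features assigned to each class); the sketch above shows only how the three loss terms are combined into one scalar for backpropagation.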
Pages: 44-53 (10 pages)
Related Papers (50 total)
  • [21] Island Loss for Learning Discriminative Features in Facial Expression Recognition
    Cai, Jie
    Meng, Zibo
    Khan, Ahmed Shehab
    Li, Zhiyuan
    O'Reilly, James
    Tong, Yan
    PROCEEDINGS 2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018), 2018, : 302 - 309
  • [22] Speech emotion recognition using stacked generative and discriminative hybrid models
    Huang, Yongming
    Zhang, Guobao
    Dong, Fei
    Li, Yue
    Shengxue Xuebao/Acta Acustica, 2013, 38 (02): : 231 - 240
  • [23] Speech Emotion Recognition Using Deep Learning
    Alagusundari, N.
    Anuradha, R.
    ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, VOL 1, AITA 2023, 2024, 843 : 313 - 325
  • [24] Speech Emotion Recognition Using Deep Learning
    Ahmed, Waqar
    Riaz, Sana
    Iftikhar, Khunsa
    Konur, Savas
    ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381 : 191 - 197
  • [25] Speech Emotion Recognition Using Transfer Learning
    Song, Peng
    Jin, Yun
    Zhao, Li
    Xin, Minghai
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (09): : 2530 - 2532
  • [26] Emotion Recognition in Speech Using MFCC and Wavelet Features
    Kishore, K. V. Krishna
    Satish, P. Krishna
    PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 842 - 847
  • [27] Speech emotion recognition using nonlinear dynamics features
    Shahzadi, Ali
    Ahmadyfard, Alireza
    Harimi, Ali
    Yaghmaie, Khashayar
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2015, 23 : 2056 - 2073
  • [28] Speech Emotion Recognition Using Minimum Extracted Features
    Abdulsalam, Wisal Hashim
    Alhamdani, Rafah Shihab
    Abdullah, Mohammed Najm
    2018 1ST ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION AND SCIENCES (AICIS 2018), 2018, : 58 - 61
  • [29] Speech Emotion Recognition Using Magnitude and Phase Features
    Shankar D.R.
    Manjula R.B.
    Biradar R.C.
    SN Computer Science, 5 (5)
  • [30] RECOGNITION OF EMOTION IN SPEECH USING VARIOGRAM BASED FEATURES
    Esmaileyan, Zeynab
    Marvi, Hosein
    MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2014, 27 (03) : 156 - 170