Learning Discriminative Features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition

Cited: 0
Authors
Tripathi, Suraj [1 ]
Ramesh, Abhiram [1 ]
Kumar, Abhay [1 ]
Singh, Chirag [1 ]
Yenigalla, Promod [1 ]
Affiliations
[1] Samsung Res, Nat Language Understanding Grp, Bengaluru, India
Keywords
Spectrogram; MFCC; speech emotion recognition; multitask learning; center loss; neural networks
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a Convolutional Neural Network (CNN) inspired by Multitask Learning (MTL) that operates on speech features and is trained under the joint supervision of softmax loss and center loss, a powerful metric-learning strategy, for the recognition of emotion in speech. Speech features such as spectrograms and Mel-frequency Cepstral Coefficients (MFCCs) help retain emotion-related low-level characteristics of speech. We experimented with several Deep Neural Network (DNN) architectures that take these speech features as input and trained them under both softmax and center loss, which resulted in highly discriminative features well suited to Speech Emotion Recognition (SER). Our networks also gain a regularizing effect by simultaneously performing the auxiliary task of reconstructing the input speech features. This sharing of representations among related tasks enables our network to generalize better on the original task of SER. Some of our proposed networks contain far fewer parameters than state-of-the-art architectures. We used the University of Southern California's Interactive Emotional Dyadic Motion Capture (USC-IEMOCAP) database in this work. Our best-performing model achieves a 3.1% improvement in overall accuracy and a 5.3% improvement in class accuracy over existing state-of-the-art methods.
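The joint objective described in the abstract combines three terms: a standard softmax cross-entropy, the center loss of Wen et al. (which pulls each feature vector toward a learned per-class center), and a reconstruction MSE from the auxiliary decoding task. A minimal NumPy sketch of that combined loss is below; the weighting hyperparameters `lambda_c` and `beta_r`, and all function names, are illustrative assumptions, not values or names taken from the paper.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Standard softmax cross-entropy, averaged over the batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    """Center loss: (1/2) * mean over the batch of ||x_i - c_{y_i}||^2,
    where c_{y_i} is the learned center of sample i's class."""
    diffs = features - centers[labels]                            # (batch, dim)
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def joint_loss(logits, labels, features, centers, recon, inputs,
               lambda_c=0.01, beta_r=1.0):
    """Total loss = softmax CE + lambda_c * center loss
                  + beta_r * reconstruction MSE (auxiliary task).
    lambda_c and beta_r are illustrative weights, not the paper's values."""
    ce = softmax_cross_entropy(logits, labels)
    cl = center_loss(features, labels, centers)
    mse = ((recon - inputs) ** 2).mean()
    return ce + lambda_c * cl + beta_r * mse
```

In a full training loop the class centers would also be updated each mini-batch (typically by a moving average of the features assigned to each class); the sketch above shows only how the three loss terms are combined into one scalar for backpropagation.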
Pages: 44-53 (10 pages)
Related Papers (50 total)
  • [21] Island Loss for Learning Discriminative Features in Facial Expression Recognition
    Cai, Jie
    Meng, Zibo
    Khan, Ahmed Shehab
    Li, Zhiyuan
    O'Reilly, James
    Tong, Yan
    PROCEEDINGS 2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018), 2018, : 302 - 309
  • [22] Speech emotion recognition using stacked generative and discriminative hybrid models
    Huang, Yongming
    Zhang, Guobao
    Dong, Fei
    Li, Yue
    Shengxue Xuebao/Acta Acustica, 2013, 38 (02): : 231 - 240
  • [23] Speech Emotion Recognition Using Deep Learning
    Alagusundari, N.
    Anuradha, R.
    ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, VOL 1, AITA 2023, 2024, 843 : 313 - 325
  • [24] Speech Emotion Recognition Using Deep Learning
    Ahmed, Waqar
    Riaz, Sana
    Iftikhar, Khunsa
    Konur, Savas
    ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381 : 191 - 197
  • [25] Speech Emotion Recognition Using Transfer Learning
    Song, Peng
    Jin, Yun
    Zhao, Li
    Xin, Minghai
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (09): : 2530 - 2532
  • [26] Emotion Recognition in Speech Using MFCC and Wavelet Features
    Kishore, K. V. Krishna
    Satish, P. Krishna
    PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 842 - 847
  • [27] Speech emotion recognition using nonlinear dynamics features
    Shahzadi, Ali
    Ahmadyfard, Alireza
    Harimi, Ali
    Yaghmaie, Khashayar
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2015, 23 : 2056 - 2073
  • [28] Speech Emotion Recognition Using Minimum Extracted Features
    Abdulsalam, Wisal Hashim
    Alhamdani, Rafah Shihab
    Abdullah, Mohammed Najm
    2018 1ST ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION AND SCIENCES (AICIS 2018), 2018, : 58 - 61
  • [29] Speech Emotion Recognition Using Magnitude and Phase Features
    Shankar D.R.
    Manjula R.B.
    Biradar R.C.
    SN Computer Science, 5 (5)
  • [30] RECOGNITION OF EMOTION IN SPEECH USING VARIOGRAM BASED FEATURES
    Esmaileyan, Zeynab
    Marvi, Hosein
    MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2014, 27 (03) : 156 - 170