Learning Discriminative Features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition

Cited by: 0
Authors
Tripathi, Suraj [1]
Ramesh, Abhiram [1]
Kumar, Abhay [1]
Singh, Chirag [1]
Yenigalla, Promod [1]
Affiliations
[1] Samsung Res, Nat Language Understanding Grp, Bengaluru, India
Keywords
Spectrogram; MFCC; speech emotion recognition; multitask learning; center loss; neural networks
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a Convolutional Neural Network (CNN) inspired by Multitask Learning (MTL) and based on speech features trained under the joint supervision of softmax loss and center loss, a powerful metric learning strategy, for the recognition of emotion in speech. Speech features such as Spectrograms and Mel-frequency Cepstral Coefficients (MFCCs) help retain emotion-related low-level characteristics in speech. We experimented with several Deep Neural Network (DNN) architectures that take in speech features as input and trained them under both softmax and center loss, which resulted in highly discriminative features ideal for Speech Emotion Recognition (SER). Our networks also employ a regularizing effect by simultaneously performing the auxiliary task of reconstructing the input speech features. This sharing of representations among related tasks enables our network to better generalize the original task of SER. Some of our proposed networks contain far fewer parameters when compared to state-of-the-art architectures. We used the University of Southern California's Interactive Emotional Motion Capture (USC-IEMOCAP) database in this work. Our best performing model achieves a 3.1% improvement in overall accuracy and a 5.3% improvement in class accuracy when compared to existing state-of-the-art methods.
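The joint objective described in the abstract (softmax loss plus center loss, with input reconstruction as an auxiliary regularizing task) can be illustrated with a minimal pure-Python sketch for a single training example. This is not the authors' implementation: the weights `lam` and `mu`, the use of mean squared error for reconstruction, and all function names are assumptions made for illustration. The center-loss term uses the common form of half the squared distance between a feature vector and its class center.

```python
import math


def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def reconstruction_loss(original, reconstructed):
    """Mean squared error between the input features and their reconstruction
    (MSE is an assumption; the abstract does not name the reconstruction loss)."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def joint_loss(logits, label, feature, centers, original, reconstructed,
               lam=0.5, mu=0.1):
    """Weighted sum of the three terms; lam and mu are hypothetical weights."""
    return (softmax_cross_entropy(logits, label)
            + lam * center_loss(feature, centers[label])
            + mu * reconstruction_loss(original, reconstructed))


# Toy example: two classes, 2-D embedding, 2-D input feature vector.
loss = joint_loss(
    logits=[0.0, 0.0], label=0,
    feature=[1.0, 1.0], centers={0: [0.0, 0.0], 1: [2.0, 2.0]},
    original=[1.0, 2.0], reconstructed=[1.0, 0.0],
)
print(loss)
```

In a real network the class centers would be updated during training and the three terms would be averaged over a mini-batch; the sketch only shows how the terms combine.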
Pages: 44 - 53
Number of pages: 10
Related papers
50 in total
  • [31] Speech Emotion Recognition Using ANN on MFCC Features
    Dolka, Harshit
    Xavier, Arul V. M.
    Juliet, Sujitha
    ICSPC'21: 2021 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION (ICSPC), 2021, : 431 - 435
  • [32] Speech Emotion Recognition Using Local and Global Features
    Gao, Yuanbo
    Li, Baobin
    Wang, Ning
    Zhu, Tingshao
    BRAIN INFORMATICS, BI 2017, 2017, 10654 : 3 - 13
  • [33] Emotion recognition using novel speech signal features
    Tabatabaei, Talieh Seyed
    Krishnan, Sridhar
    Guergachi, Aziz
    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, 2007, : 345 - +
  • [34] Ensemble Learning of Hybrid Acoustic Features for Speech Emotion Recognition
    Zvarevashe, Kudakwashe
    Olugbara, Oludayo
    ALGORITHMS, 2020, 13 (03)
  • [35] Noisy speech emotion recognition using sample reconstruction and multiple-kernel learning
    Jiang Xiaoqing
    Xia Kewen
    Lin Yongliang
    Bai Jianchuan
    The Journal of China Universities of Posts and Telecommunications, 2017, 24 (02) : 1 - 9
  • [36] Noisy speech emotion recognition using sample reconstruction and multiple-kernel learning
    Xiaoqing J.
    Kewen X.
    Yongliang L.
    Jianchuan B.
    J. China Univ. Post Telecom., 2017, 24 (02) : 1 - 9
  • [37] Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition
    Liu, Yang
    Chen, Xin
    Song, Yuan
    Li, Yarong
    Wang, Shengbei
    Yuan, Weitao
    Li, Yongwei
    Zhao, Zhen
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [38] Excitation Features of Speech for Emotion Recognition Using Neutral Speech as Reference
    Kadiri, Sudarsana Reddy
    Gangamohan, P.
    Gangashetty, Suryakanth, V
    Alku, Paavo
    Yegnanarayana, B.
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2020, 39 (09) : 4459 - 4481
  • [39] Excitation Features of Speech for Emotion Recognition Using Neutral Speech as Reference
    Sudarsana Reddy Kadiri
    P. Gangamohan
    Suryakanth V. Gangashetty
    Paavo Alku
    B. Yegnanarayana
    Circuits, Systems, and Signal Processing, 2020, 39 : 4459 - 4481
  • [40] Survey on discriminative feature selection for speech emotion recognition
    Xu, Xin
    Li, Ya
    Xu, Xiaoying
    Wen, Zhengqi
    Che, Hao
    Liu, Shanfeng
    Tao, Jianhua
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 345 - +