Speech Emotion Recognition with Multi-task Learning

Cited by: 23
Authors
Cai, Xingyu [1 ]
Yuan, Jiahong [1 ]
Zheng, Renjie [1 ]
Huang, Liang [1 ]
Church, Kenneth [1 ]
Affiliation
[1] Baidu Res, Sunnyvale, CA 94089 USA
Source
INTERSPEECH 2021
Keywords
speech emotion recognition; multi-task learning; MODELS;
DOI
10.21437/Interspeech.2021-1852
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
Speech emotion recognition (SER) classifies speech into emotion categories such as Happy, Angry, Sad, and Neutral. Recently, deep learning has been applied to the SER task. This paper proposes a multi-task learning (MTL) framework that simultaneously performs speech-to-text recognition and emotion classification with an end-to-end deep neural model based on wav2vec 2.0. Experiments on the IEMOCAP benchmark show that the proposed method achieves state-of-the-art performance on the SER task. In addition, an ablation study establishes the effectiveness of the proposed MTL framework.
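The abstract describes one model trained jointly for transcription and emotion classification. The following is a minimal sketch of what such a joint objective could look like, under assumptions this record does not confirm: the ASR branch is scored with a CTC loss over per-frame token logits, the SER branch with cross-entropy over pooled utterance logits, and the two are summed as L = L_SER + w * L_CTC with a hypothetical weight `ctc_weight`. The shared wav2vec 2.0 encoder is out of scope here; every function name is illustrative, and the functions take logits directly. It is written in pure Python so it runs without any deep learning library.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))); returns -inf if every input is -inf."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(logits: Sequence[float]) -> List[float]:
    lse = logsumexp(logits)
    return [x - lse for x in logits]


def cross_entropy(utt_logits: Sequence[float], emotion: int) -> float:
    """SER branch (assumed): negative log-likelihood of the gold emotion index."""
    return -log_softmax(utt_logits)[emotion]


def ctc_loss(frame_logits: Sequence[Sequence[float]],
             labels: Sequence[int], blank: int = 0) -> float:
    """ASR branch (assumed): CTC negative log-likelihood via the forward algorithm.

    frame_logits is T x V (one row of vocabulary logits per encoder frame);
    labels is the target token sequence without blanks.
    """
    log_probs = [log_softmax(row) for row in frame_logits]
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # fwd[s]: log-prob of all partial alignments ending at ext[s] at the current frame
    fwd = [NEG_INF] * S
    fwd[0] = log_probs[0][ext[0]]
    if S > 1:
        fwd[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [fwd[s]]                    # repeat the same symbol
            if s >= 1:
                cands.append(fwd[s - 1])        # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(fwd[s - 2])        # skip the blank between distinct labels
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        fwd = new
    # Valid alignments end on the last label or on the trailing blank.
    return -logsumexp(fwd[-2:] if S > 1 else fwd)


def mtl_loss(utt_logits: Sequence[float], emotion: int,
             frame_logits: Sequence[Sequence[float]], transcript: Sequence[int],
             ctc_weight: float = 0.1) -> float:
    """Joint objective L = L_SER + ctc_weight * L_CTC (weight value is hypothetical)."""
    return cross_entropy(utt_logits, emotion) + ctc_weight * ctc_loss(frame_logits, transcript)
```

As a hand check: with uniform logits over {blank, label 1} and two frames, three alignments (1 1, blank 1, 1 blank) collapse to the transcript [1], each with probability 0.25, so `ctc_loss` returns -log(0.75). Summing the two losses lets a single shared encoder receive gradients from both tasks; the relative weight controls how strongly transcription acts as an auxiliary signal for emotion classification.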
Pages: 4508-4512
Number of pages: 5