Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning

Citations: 91
Authors
Li, Yuanchao [1 ]
Zhao, Tianyu [2 ]
Kawahara, Tatsuya [2 ]
Affiliations
[1] Honda Res & Dev Co Ltd, Honda Innovat Lab, Haga, Japan
[2] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
Keywords
Speech emotion recognition (SER); spectrogram; end-to-end (E2E); attention mechanism; multitask learning
DOI
10.21437/Interspeech.2019-2594
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
Accurately recognizing emotion from speech is a necessary yet challenging task due to the variability of speech and emotion. In this paper, we propose a speech emotion recognition (SER) method using end-to-end (E2E) multitask learning with self-attention to address several issues. First, we extract features directly from the speech spectrogram instead of using traditional hand-crafted features, to better represent emotion. Second, we adopt a self-attention mechanism to focus on the salient periods of emotion within speech utterances. Finally, considering the features shared between the emotion and gender classification tasks, we incorporate gender classification as an auxiliary task, using multitask learning to share useful information with the emotion classification task. Evaluation on IEMOCAP (a database commonly used for SER research) demonstrates that the proposed method outperforms the state-of-the-art methods, improving overall accuracy by an absolute 7.7% over the best existing result.
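The two mechanisms the abstract describes can be sketched in a few lines: attention pooling scores each frame-level feature vector, softmax-normalizes the scores into weights, and sums the frames into one utterance vector; two task heads (emotion and gender) then share that pooled vector. The following pure-Python sketch is illustrative only — all function names, weights, and dimensions are assumptions, not the authors' actual model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, w):
    """Self-attention pooling: score each frame with vector w,
    softmax the scores, and return the weighted sum of frames."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas

def multitask_logits(pooled, W_emotion, W_gender):
    """Two linear heads reading the same shared pooled vector
    (multitask learning with gender as auxiliary task)."""
    emo = [sum(w * x for w, x in zip(row, pooled)) for row in W_emotion]
    gen = [sum(w * x for w, x in zip(row, pooled)) for row in W_gender]
    return emo, gen

# Toy utterance: 3 frames of 2-dim spectrogram-derived features.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
w = [0.5, 0.5]  # illustrative attention scoring vector
pooled, alphas = attention_pool(frames, w)
emo, gen = multitask_logits(
    pooled,
    W_emotion=[[1.0, 0.0], [0.0, 1.0]],  # 2 emotion classes (toy)
    W_gender=[[1.0, 1.0]],               # 1 gender logit (toy)
)
```

In training, the two heads would typically be combined as a weighted loss, e.g. L = L_emotion + λ·L_gender, so gradients from the auxiliary gender task shape the shared representation.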
Pages: 2803-2807
Page count: 5
Related Papers
50 records in total
  • [1] AN END-TO-END MULTITASK LEARNING MODEL TO IMPROVE SPEECH EMOTION RECOGNITION
    Fu, Changzeng
    Liu, Chaoran
    Ishi, Carlos Toshinori
    Ishiguro, Hiroshi
    [J]. 28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 351 - 355
  • [2] End-to-End Audiovisual Speech Recognition System With Multitask Learning
    Tao, Fei
    Busso, Carlos
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 1 - 11
  • [3] Improved training of end-to-end attention models for speech recognition
    Zeyer, Albert
    Irie, Kazuki
    Schlueter, Ralf
    Ney, Hermann
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 7 - 11
  • [4] An End-to-end Speech Recognition Algorithm based on Attention Mechanism
    Chen, Jia-nan
    Gao, Shuang
    Sun, Han-zhe
    Liu, Xiao-hui
    Wang, Zi-ning
    Zheng, Yan
    [J]. PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 2935 - 2940
  • [5] Self-Attention Transducers for End-to-End Speech Recognition
    Tian, Zhengkun
    Yi, Jiangyan
    Tao, Jianhua
    Bai, Ye
    Wen, Zhengqi
    [J]. INTERSPEECH 2019, 2019, : 4395 - 4399
  • [6] Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation
    Yeh, Sung-Lin
    Lin, Yun-Shao
    Lee, Chi-Chun
    [J]. INTERSPEECH 2020, 2020, : 536 - 540
  • [7] Multitask Training with Text Data for End-to-End Speech Recognition
    Wang, Peidong
    Sainath, Tara N.
    Weiss, Ron J.
    [J]. INTERSPEECH 2021, 2021, : 2566 - 2570
  • [8] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5666 - 5670
  • [9] Adversarial joint training with self-attention mechanism for robust end-to-end speech recognition
    Li, Lujun
    Kang, Yikai
    Shi, Yuchen
    Kürzinger, Ludwig
    Watzel, Tobias
    Rigoll, Gerhard
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)