Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning

Cited: 91
Authors
Li, Yuanchao [1 ]
Zhao, Tianyu [2 ]
Kawahara, Tatsuya [2 ]
Affiliations
[1] Honda Res & Dev Co Ltd, Honda Innovat Lab, Haga, Japan
[2] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
Keywords
Speech emotion recognition (SER); spectrogram; end-to-end (E2E); attention mechanism; multitask learning;
DOI
10.21437/Interspeech.2019-2594
CLC Classification Number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Classification Code
100104; 100213;
Abstract
Accurately recognizing emotion from speech is a necessary yet challenging task due to the variability in speech and emotion. In this paper, we propose a speech emotion recognition (SER) method using end-to-end (E2E) multitask learning with self-attention to deal with several issues. First, we extract features directly from the speech spectrogram instead of using traditional hand-crafted features to better represent emotion. Second, we adopt a self-attention mechanism to focus on the salient periods of emotion in speech utterances. Finally, considering the features shared between the emotion and gender classification tasks, we incorporate gender classification as an auxiliary task, using multitask learning to share useful information with the emotion classification task. Evaluation on IEMOCAP (a commonly used database for SER research) demonstrates that the proposed method outperforms the state-of-the-art methods and improves the overall accuracy by an absolute 7.7% compared to the best existing result.
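The pipeline the abstract describes — frame-level features pooled into an utterance vector by self-attention, then shared by an emotion head and an auxiliary gender head — can be sketched as below. This is an illustrative NumPy sketch, not the paper's actual architecture: the feature dimensions, the single-vector attention parameterization, and the weight matrices `W_emo`/`W_gen` are all made-up stand-ins (in the paper the frame features would come from a trained encoder over the spectrogram).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(H, w):
    # H: (T, d) frame-level features; w: (d,) learnable attention vector.
    # Scores weight each frame, so salient periods dominate the pooled vector.
    scores = softmax(H @ w)   # (T,) one weight per frame, summing to 1
    return scores @ H         # (d,) attention-weighted utterance vector

rng = np.random.default_rng(0)
T, d = 100, 16                       # 100 frames, 16-dim features (illustrative)
H = rng.standard_normal((T, d))      # stand-in for encoder outputs on a spectrogram
w = rng.standard_normal(d)

utt = self_attention_pool(H, w)

# Multitask heads sharing the same pooled representation:
W_emo = rng.standard_normal((d, 4))  # 4 emotion classes (e.g. IEMOCAP's common subset)
W_gen = rng.standard_normal((d, 2))  # 2 gender classes (auxiliary task)
p_emo = softmax(utt @ W_emo)         # emotion posterior
p_gen = softmax(utt @ W_gen)         # gender posterior
```

In training, the two heads' losses would be combined (typically a weighted sum), so gradients from the auxiliary gender task regularize the shared encoder and attention parameters.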
Pages: 2803-2807
Page count: 5
Related Papers
50 items in total
  • [21] End-to-end recognition of streaming Japanese speech using CTC and local attention
    Chen, Jiahao
    Nishimura, Ryota
    Kitaoka, Norihide
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2020, 9 (01)
  • [22] End-to-end Parking Behavior Recognition Based on Self-attention Mechanism
    Li, Penghua
    Zhu, Dechen
    Mou, Qiyun
    Tu, Yushan
    Wu, Jinfeng
    [J]. 2023 2ND ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING, CACML 2023, 2023, : 371 - 376
  • [23] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [24] STRUCTURED SPARSE ATTENTION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Xue, Jiabin
    Zheng, Tieran
    Han, Jiqing
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7044 - 7048
  • [25] Hybrid CTC/Attention Architecture for End-to-End Speech Recognition
    Watanabe, Shinji
    Hori, Takaaki
    Kim, Suyoun
    Hershey, John R.
    Hayashi, Tomoki
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1240 - 1253
  • [26] Multi-channel Attention for End-to-End Speech Recognition
    Braun, Stefan
    Neil, Daniel
    Anumula, Jithendar
    Ceolini, Enea
    Liu, Shih-Chii
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 17 - 21
  • [27] Joint CTC/attention decoding for end-to-end speech recognition
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 518 - 529
  • [28] SELF-TRAINING FOR END-TO-END SPEECH RECOGNITION
    Kahn, Jacob
    Lee, Ann
    Hannun, Awni
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7084 - 7088
  • [29] IMPROVING END-TO-END SPEECH RECOGNITION WITH POLICY LEARNING
    Zhou, Yingbo
    Xiong, Caiming
    Socher, Richard
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5819 - 5823
  • [30] Towards end-to-end speech recognition with transfer learning
    Qin, Chu-Xiong
    Qu, Dan
    Zhang, Lian-Hai
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2018