Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition

Cited by: 0
Authors
Park, Sunchan [1]
Kim, Hyung Soon [1]
Affiliations
[1] Pusan Natl Univ, Dept Elect Engn, 2 Busandaehak Ro 63Beon Gil, Busan 46241, South Korea
Source
Keywords
Speech emotion recognition; Transformer; Transfer learning; Multi-task learning;
DOI
10.7776/ASK.2021.40.5.515
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Sufficient training data for speech emotion recognition is hard to prepare because emotion labeling is difficult. In this paper, we apply transfer learning from large-scale speech recognition training data to a transformer-based model to improve the performance of speech emotion recognition. In addition, we propose a method that uses context information without decoding, through multi-task learning with speech recognition. In speech emotion recognition experiments on the IEMOCAP dataset, our model achieves a weighted accuracy of 70.6% and an unweighted accuracy of 71.6%, which shows that the proposed method is effective in improving the performance of speech emotion recognition.
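The abstract reports both a weighted and an unweighted accuracy without defining them. The sketch below assumes the convention common in speech emotion recognition work: weighted accuracy is the overall fraction of correctly classified utterances, and unweighted accuracy is the mean of the per-class recalls. The function names and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct utterances / all utterances (assumed definition)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall, so every emotion class counts equally (assumed definition)."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Made-up labels for illustration only.
y_true = ["neu", "neu", "neu", "neu", "ang", "ang", "hap", "sad"]
y_pred = ["neu", "neu", "neu", "hap", "ang", "neu", "hap", "sad"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

Under these definitions the two numbers differ whenever the classes are imbalanced, which is why emotion recognition results are often reported with both.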
Pages: 515-522
Number of pages: 8
Related papers
50 in total
  • [41] Finger Vein Recognition Based on Multi-Task Learning
    Hao, Zhiang
    Fang, Peiyu
    Yang, Hanwen
    [J]. 2020 5TH INTERNATIONAL CONFERENCE ON MATHEMATICS AND ARTIFICIAL INTELLIGENCE (ICMAI 2020), 2020, : 133 - 140
  • [42] Attribute Knowledge Integration for Speech Recognition Based on Multi-task Learning Neural Networks
    Zheng, Hao
    Yang, Zhanlei
    Qiao, Liwei
    Li, Jianping
    Liu, Wenju
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 543 - 547
  • [43] Paraphrase Bidirectional Transformer with Multi-Task Learning
    Ko, Bowon
    Choi, Ho-Jin
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020, : 217 - 220
  • [44] Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2900 - 2905
  • [45] MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION
    Ravanelli, Mirco
    Zhong, Jianyuan
    Pascual, Santiago
    Swietojanski, Pawel
    Monteiro, Joao
    Trmal, Jan
    Bengio, Yoshua
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6989 - 6993
  • [46] Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation
    Mitra, Vikramjit
    Chien, Hsiang-Yun Sherry
    Kowtha, Vasudha
    Cheng, Joseph Yitan
    Azemi, Erdrin
    [J]. INTERSPEECH 2022, 2022, : 4715 - 4719
  • [47] Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
    Toth, Laszlo
    Gosztolya, Gabor
    Grosz, Tamas
    Marko, Alexandra
    Csapo, Tamas Gabor
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3172 - 3176
  • [48] MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation
    Liang, Xingwei
    Zou, You
    Zhuang, Xinnan
    Yang, Jie
    Niu, Taiyu
    Xu, Ruifeng
    [J]. ELECTRONICS, 2023, 12 (07)
  • [49] Visual-audio emotion recognition based on multi-task and ensemble learning with multiple features
    Hao, Man
    Cao, Wei-Hua
    Liu, Zhen-Tao
    Wu, Min
    Xiao, Peng
    [J]. Neurocomputing, 2022, 391 : 42 - 51
  • [50] MFUnetr: A transformer-based multi-task learning network for multi-organ segmentation from partially labeled datasets
    Hao, Qin
    Tian, Shengwei
    Yu, Long
    Wang, Junwen
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 85