Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition

被引:0
|
作者
Park, Sunchan [1 ]
Kim, Hyung Soon [1 ]
机构
[1] Pusan Natl Univ, Dept Elect Engn, 2 Busandaehak Ro 63Beon Gil, Busan 46241, South Korea
来源
关键词
Speech emotion recognition; Transformer; Transfer learning; Multi-task learning;
D O I
10.7776/ASK.2021.40.5.515
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
It is hard to prepare sufficient training data for speech emotion recognition due to the difficulty of emotion labeling. In this paper, we apply transfer learning with large-scale training data for speech recognition on a transformer-based model to improve the performance of speech emotion recognition. In addition, we propose a method to utilize context information without decoding by multi-task learning with speech recognition. According to the speech emotion recognition experiments using the IEMOCAP dataset, our model achieves a weighted accuracy of 70.6 % and an unweighted accuracy of 71.6 %, which shows that the proposed method is effective in improving the performance of speech emotion recognition.
引用
收藏
页码:515 / 522
页数:8
相关论文
共 50 条
  • [1] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han Zhijie
    Wang, Ruchuan
    [J]. 2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 186 - 188
  • [2] Speech Emotion Recognition with Multi-task Learning
    Cai, Xingyu
    Yuan, Jiahong
    Zheng, Renjie
    Huang, Liang
    Church, Kenneth
    [J]. INTERSPEECH 2021, 2021, : 4508 - 4512
  • [3] Improving Transformer-based Speech Recognition with Unsupervised Pre-training and Multi-task Semantic Knowledge Learning
    Li, Song
    Li, Lin
    Hong, Qingyang
    Liu, Lingling
    [J]. INTERSPEECH 2020, 2020, : 5006 - 5010
  • [4] Multi-task Learning for Speech Emotion and Emotion Intensity Recognition
    Yue, Pengcheng
    Qu, Leyuan
    Zheng, Shukai
    Li, Taihao
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1232 - 1237
  • [5] Meta Multi-task Learning for Speech Emotion Recognition
    Cai, Ruichu
    Guo, Kaibin
    Xu, Boyan
    Yang, Xiaoyan
    Zhang, Zhenjie
    [J]. INTERSPEECH 2020, 2020, : 3336 - 3340
  • [6] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao Huijuan
    Ye Ning
    Wang Ruchuan
    [J]. Journal of Signal Processing Systems, 2021, 93 : 299 - 308
  • [7] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao, Huijuan
    Ye, Ning
    Wang, Ruchuan
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (2-3): : 299 - 308
  • [8] Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning
    Parry, Jack
    DeMattos, Eric
    Klementiev, Anita
    Ind, Axel
    Morse-Kopp, Daniela
    Clarke, Georgia
    Palaz, Dimitri
    [J]. INTERSPEECH 2022, 2022, : 1158 - 1162
  • [9] Speech Emotion Recognition Based on Multi-Task Learning Using a Convolutional Neural Network
    Kim, Nam Kyun
    Lee, Jiwon
    Ha, Hun Kyu
    Lee, Geon Woo
    Lee, Jung Hyuk
    Kim, Hong Kook
    [J]. 2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 704 - 707
  • [10] Predicting Outcomes for Cancer Patients with Transformer-Based Multi-task Learning
    Gerrard, Leah
    Peng, Xueping
    Clarke, Allison
    Schlegel, Clement
    Jiang, Jing
    [J]. AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 381 - 392