MTLSER: Multi-task learning enhanced speech emotion recognition with pre-trained acoustic model

Cited: 0
Authors
Chen, Zengzhao [1 ,2 ]
Liu, Chuan [1 ]
Wang, Zhifeng [1 ]
Zhao, Chuanxu [1 ]
Lin, Mengting [1 ]
Zheng, Qiuyu [1 ]
Affiliations
[1] Cent China Normal Univ, Fac Artificial Intelligence Educ, Wuhan 430079, Peoples R China
[2] Natl Intelligent Soc Governance Expt Base Educ, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-task learning; Speech emotion recognition; Speaker identification; Automatic speech recognition; Speech representation learning;
DOI
10.1016/j.eswa.2025.126855
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
This study proposes a novel Speech Emotion Recognition (SER) approach built on a Multi-Task Learning framework (MTLSER), designed to boost recognition accuracy by training multiple related tasks simultaneously and sharing information through a joint loss function. The framework treats SER as the primary task, with Automatic Speech Recognition (ASR) and speaker identification serving as auxiliary tasks. Feature extraction is performed by the pre-trained wav2vec2.0 model, which acts as a shared layer within the multi-task learning (MTL) framework; the extracted features are then processed in parallel by the three tasks. The contributions of the auxiliary tasks are weighted by hyperparameters, and the per-task losses are combined into a single joint loss function for backpropagation, which updates the shared and task-specific parameters. At inference time, the model concurrently outputs the emotion, textual content, and speaker identity of the input audio. We conducted ablation studies and a sensitivity analysis on the hyperparameters to determine the optimal settings for emotion recognition. The proposed MTLSER model is evaluated on the public IEMOCAP dataset. Results from extensive testing show a significant improvement over traditional methods, achieving a Weighted Accuracy (WA) of 82.63% and an Unweighted Accuracy (UA) of 82.19%. These findings affirm the effectiveness and robustness of our approach. Our code is publicly available at https://github.com/CCNU-nercel-lc/MTL-SER.
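To make the architecture described in the abstract concrete, below is a minimal sketch of such a framework in PyTorch with Hugging Face Transformers. The linear task heads, mean pooling, loss choices (cross-entropy for SER and speaker ID, CTC for ASR), and the auxiliary weights alpha and beta are illustrative assumptions, not the authors' exact configuration; see the linked repository for the actual implementation.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class MTLSERSketch(nn.Module):
    """Sketch of an MTL SER model: shared wav2vec2.0 encoder, three task heads.
    Head designs and loss weights are illustrative assumptions, not the paper's."""
    def __init__(self, num_emotions=4, num_speakers=10, vocab_size=32,
                 alpha=0.1, beta=0.1):
        super().__init__()
        # Shared layer: pre-trained wav2vec2.0 acoustic encoder.
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        hidden = self.encoder.config.hidden_size
        self.ser_head = nn.Linear(hidden, num_emotions)   # primary task: SER
        self.asr_head = nn.Linear(hidden, vocab_size)     # auxiliary: ASR (CTC)
        self.sid_head = nn.Linear(hidden, num_speakers)   # auxiliary: speaker ID
        self.alpha, self.beta = alpha, beta               # auxiliary-task weights
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
        self.ce_loss = nn.CrossEntropyLoss()

    def forward(self, waveform, emotion, speaker, tokens, token_lens):
        # Shared features, processed in parallel by the three task heads.
        feats = self.encoder(waveform).last_hidden_state   # (B, T, H)
        pooled = feats.mean(dim=1)                         # utterance embedding
        loss_ser = self.ce_loss(self.ser_head(pooled), emotion)
        loss_sid = self.ce_loss(self.sid_head(pooled), speaker)
        log_probs = self.asr_head(feats).log_softmax(-1).transpose(0, 1)  # (T, B, C)
        in_lens = torch.full((feats.size(0),), feats.size(1), dtype=torch.long)
        loss_asr = self.ctc_loss(log_probs, tokens, in_lens, token_lens)
        # Joint loss: primary SER loss plus weighted auxiliary losses.
        return loss_ser + self.alpha * loss_asr + self.beta * loss_sid

In this sketch, setting alpha = beta = 0 reduces training to plain single-task SER, which is the natural baseline for the ablation and hyperparameter-sensitivity studies the abstract mentions.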
Pages: 16
Related Papers
50 records in total
  • [1] Speech Emotion Recognition with Multi-task Learning
    Cai, Xingyu
    Yuan, Jiahong
    Zheng, Renjie
    Huang, Liang
    Church, Kenneth
    INTERSPEECH 2021, 2021: 4508-4512
  • [2] Multi-task Learning based Pre-trained Language Model for Code Completion
    Liu, Fang
    Li, Ge
    Zhao, Yunfei
    Jin, Zhi
    2020 35TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE 2020), 2020: 473-485
  • [3] Multi-task Learning for Speech Emotion and Emotion Intensity Recognition
    Yue, Pengcheng
    Qu, Leyuan
    Zheng, Shukai
    Li, Taihao
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022: 1232-1237
  • [4] Meta Multi-task Learning for Speech Emotion Recognition
    Cai, Ruichu
    Guo, Kaibin
    Xu, Boyan
    Yang, Xiaoyan
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020: 3336-3340
  • [5] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han, Zhijie
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019: 186-188
  • [6] MCM: A Multi-task Pre-trained Customer Model for Personalization
    Luo, Rui
    Wang, Tianxin
    Deng, Jingyuan
    Wan, Peng
    PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023: 637-639
  • [7] MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
    Ghosh, Sreyan
    Tyagi, Utkarsh
    Ramaneswaran, S.
    Srivastava, Harshvardhan
    Manocha, Dinesh
    INTERSPEECH 2023, 2023: 1209-1213
  • [8] A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition
    Rajapakshe, Thejan
    Rana, Rajib
    Khalifa, Sara
    Liu, Jiajun
    Schuller, Bjorn
    2022 AUSTRALIAN COMPUTER SCIENCE WEEK (ACSW 2022), 2022: 96-105
  • [9] Drug knowledge discovery via multi-task learning and pre-trained models
    Li, Dongfang
    Xiong, Ying
    Hu, Baotian
    Tang, Buzhou
    Peng, Weihua
    Chen, Qingcai
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (SUPPL 9)
  • [10] Enhancing Pre-trained Language Representation for Multi-Task Learning of Scientific Summarization
    Jia, Ruipeng
    Cao, Yannan
    Fang, Fang
    Li, Jinpeng
    Liu, Yanbing
    Yin, Pengfei
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020