MULTI-LINGUAL MULTI-TASK SPEECH EMOTION RECOGNITION USING WAV2VEC 2.0

Cited by: 31
Authors
Sharma, Mayank [1]
Affiliations
[1] Amazon, Chennai, Tamil Nadu, India
Keywords
Multi-task Multi-lingual speech emotion recognition; Pre-trained wav2vec 2.0; PANN
DOI
10.1109/ICASSP43922.2022.9747417
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech Emotion Recognition (SER) has several use cases for Digital Entertainment Content (DEC) in Over-the-top (OTT) services, emotive Text-to-Speech (TTS) engines, and voice assistants. In this work, we present a Multi-Lingual (MLi) and Multi-Task Learning (MTL) audio-only SER system based on the multi-lingual pre-trained wav2vec 2.0 model. The model is fine-tuned on 25 open-source datasets in 13 locales across 7 emotion categories. We show that a) our wav2vec 2.0 single-task model outperforms the Pre-trained Audio Neural Network (PANN) based single-task pre-trained model by 7.2% (relative), b) the best MTL model outperforms the PANN-based and wav2vec 2.0 based single-task models by 8.6% and 1.7% (relative), respectively, c) the MTL-based system outperforms the pre-trained single-task wav2vec 2.0 model in 9 out of 13 locales in terms of weighted F1 scores, and d) the MTL-MLi wav2vec 2.0 model outperforms the state of the art for the languages contained in the pre-training corpora.
Pages: 6907-6911
Page count: 5
Related papers
50 in total
  • [41] Multi-Lingual Speech Emotion Recognition: Investigating Similarities between English and German Languages
    Devi, Ghaayathri K.
    Likhitha, Kolluru
    Akshaya, J.
    Rfj, Gokul
    Lal, Jyothish G.
    2024 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND APPLIED INFORMATICS, ACCAI 2024, 2024,
  • [42] A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling
    Lin, Ying
    Yang, Shengqi
    Stoyanov, Veselin
    Ji, Heng
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 799 - 809
  • [43] A multi-lingual speech recognition system using a neural network approach
    Chen, OTC
    Chen, CY
    Chang, HT
    Hsu, FR
    Yang, HL
    Lee, YG
    ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 1576 - 1581
  • [44] A Novel Multi-Feature Fusion Model Based on Pre-Trained Wav2vec 2.0 for Underwater Acoustic Target Recognition
    Pu, Zijun
    Zhang, Qunfei
    Xue, Yangtao
    Zhu, Peican
    Cui, Xiaodong
    REMOTE SENSING, 2024, 16 (13)
  • [45] Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments
    Zhang, Xu
    Zhang, Xiangcheng
    Chen, Weisi
    Li, Chenlong
    Yu, Chengyuan
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [46] Speech emotion recognition using fine-tuned Wav2vec2.0 and neural controlled differential equations classifier
    Wang, Ni
    Yang, Danyu
    PLOS ONE, 2025, 20 (02):
  • [47] BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0
    Kim, Miseul
    Piao, Zhenyu
    Lee, Jihyun
    Kang, Hong-Goo
    2023 IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS, BHI, 2023,
  • [48] PROFICIENCY ASSESSMENT OF L2 SPOKEN ENGLISH USING WAV2VEC 2.0
    Banno, Stefano
    Matassoni, Marco
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1088 - 1095
  • [49] Automatic Language Identification Using Speech Rhythm Features for Multi-Lingual Speech Recognition
    Kim, Hwamin
    Park, Jeong-Sik
    APPLIED SCIENCES-BASEL, 2020, 10 (07):
  • [50] Speech Emotion Recognition Based on Multi-Task Learning Using a Convolutional Neural Network
    Kim, Nam Kyun
    Lee, Jiwon
    Ha, Hun Kyu
    Lee, Geon Woo
    Lee, Jung Hyuk
    Kim, Hong Kook
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 704 - 707