MULTI-LINGUAL MULTI-TASK SPEECH EMOTION RECOGNITION USING WAV2VEC 2.0

Cited by: 31
Author
Sharma, Mayank [1]
Affiliation
[1] Amazon, Chennai, Tamil Nadu, India
Keywords
Multi-task multi-lingual speech emotion recognition; Pre-trained wav2vec 2.0; PANN
DOI
10.1109/ICASSP43922.2022.9747417
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech Emotion Recognition (SER) has several use cases for Digital Entertainment Content (DEC) in Over-the-top (OTT) services, emotive Text-to-Speech (TTS) engines and voice assistants. In this work, we present a Multi-Lingual (MLi) and Multi-Task Learning (MTL) audio-only SER system based on the multi-lingual pre-trained wav2vec 2.0 model. The model is fine-tuned on 25 open-source datasets in 13 locales across 7 emotion categories. We show that: a) our wav2vec 2.0 based single-task model outperforms the Pre-trained Audio Neural Network (PANN) based single-task pre-trained model by 7.2% (relative); b) the best MTL model outperforms the PANN-based and wav2vec 2.0 based single-task models by 8.6% and 1.7% (relative), respectively; c) the MTL-based system outperforms the pre-trained single-task wav2vec 2.0 model in 9 out of 13 locales in terms of weighted F1 scores; and d) the MTL-MLi wav2vec 2.0 model outperforms the state of the art for the languages contained in the pre-training corpora.
Pages: 6907-6911
Number of pages: 5