Combining wav2vec 2.0 Fine-Tuning and ConLearnNet for Speech Emotion Recognition

Cited by: 1
Authors
Sun, Chenjing [1 ]
Zhou, Yi [2 ]
Huang, Xin [1 ]
Yang, Jichen [3 ]
Hou, Xianhua [1 ]
Affiliations
[1] South China Normal Univ, Sch Elect & Informat Engn, Foshan 528234, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore
[3] Guangdong Polytech Normal Univ, Sch Cyber Secur, Guangzhou 510640, Peoples R China
Keywords
speech emotion recognition (SER); wav2vec 2.0; contrastive learning; model
DOI
10.3390/electronics13061103
CLC classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Speech emotion recognition poses challenges because emotions are expressed through varied intonation and speech rates. To reduce the loss of emotional information during recognition and to improve the extraction and classification of speech emotions, we propose a novel two-fold approach. First, a feed-forward network with skip connections (SCFFN) is introduced to fine-tune wav2vec 2.0 and extract emotion embeddings. Then, ConLearnNet is employed for emotion classification. ConLearnNet comprises three steps: feature learning, contrastive learning, and classification. Feature learning transforms the input, while contrastive learning encourages similar representations for samples from the same category and discriminative representations for samples from different categories. Experimental results on the IEMOCAP and EMO-DB datasets demonstrate the superiority of the proposed method over state-of-the-art systems: we achieve a WA of 72.86% and a UAR of 72.85% on IEMOCAP, and 97.20% and 96.41%, respectively, on EMO-DB.
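The contrastive-learning step described in the abstract, which pulls same-category embeddings together and pushes different-category embeddings apart, can be illustrated with a supervised contrastive loss. The sketch below is a minimal NumPy implementation of a SupCon-style objective under assumptions not taken from the paper itself: the function name, the temperature value, and the embedding dimensions are illustrative only, and the paper's actual loss formulation may differ.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (illustrative sketch, not the paper's exact loss).

    Same-label pairs act as positives, different-label pairs as negatives;
    lower loss means same-category embeddings are closer on the unit sphere.
    """
    labels = np.asarray(labels)
    # L2-normalize so similarity is cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    eye = np.eye(n, dtype=bool)
    sim = np.where(eye, -np.inf, sim)  # exclude self-similarity from the softmax
    # numerically stable log-softmax over each row
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # positives: same label, excluding the anchor itself
    pos = (labels[:, None] == labels[None, :]) & ~eye
    pos_counts = pos.sum(axis=1)
    valid = pos_counts > 0  # anchors with at least one positive
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return per_anchor.mean()
```

For well-clustered embeddings with matching labels the loss is small; shuffling the labels so that positives lie far apart drives the loss up, which is exactly the signal the classifier-facing representation is trained against.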
Pages: 19
Related Papers (50 total)
  • [31] An improved wav2vec 2.0 pre-training approach using enhanced local dependency modeling for speech recognition
    Zhu, Qiu-shi
    Zhang, Jie
    Wu, Ming-hui
    Fang, Xin
    Dai, Li-Rong
    INTERSPEECH 2021, 2021, : 4334 - 4338
  • [32] Aggregation Strategies of Wav2vec 2.0 Embeddings for Computational Paralinguistic Tasks
    Vetrab, Mercedes
    Gosztolya, Gabor
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 79 - 93
  • [33] Siamese Network with Wav2vec Feature for Spoofing Speech Detection
    Xie, Yang
    Zhang, Zhenchuan
    Yang, Yingchun
    INTERSPEECH 2021, 2021, : 4269 - 4273
  • [34] Applying the conformal prediction paradigm for the uncertainty quantification of an end-to-end automatic speech recognition model (wav2vec 2.0)
    Ernez, Fares
    Arnold, Alexandre
    Galametz, Audrey
    Kobus, Catherine
    Ould-Amer, Nawal
    CONFORMAL AND PROBABILISTIC PREDICTION WITH APPLICATIONS, VOL 204, 2023, 204 : 16 - 35
  • [35] Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech
    Javanmardi, Farhad
    Kadiri, Sudarsana Reddy
    Alku, Paavo
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (08) : 4951 - 4962
  • [36] Kazakh Speech Recognition: Wav2vec2.0 vs. Whisper
    Kozhirbayev, Zhanibek
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2023, 14 (06) : 1382 - 1389
  • [37] Cross-modal distillation with audio-text fusion for fine-grained emotion classification using BERT and Wav2vec 2.0
    Kim, Donghwa
    Kang, Pilsung
    NEUROCOMPUTING, 2022, 506 : 168 - 183
  • [38] Investigating the Utility of wav2vec 2.0 Hidden Layers for Detecting Multiple Sclerosis
    Gosztolya, Gabor
    Toth, Laszlo
    Svindt, Veronika
    Bona, Judit
    Hoffmann, Ildiko
    SPEECH AND COMPUTER, SPECOM 2024, PT I, 2025, 15299 : 297 - 308
  • [39] Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments
    Zhang, Xu
    Zhang, Xiangcheng
    Chen, Weisi
    Li, Chenlong
    Yu, Chengyuan
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [40] BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0
    Kim, Miseul
    Piao, Zhenyu
    Lee, Jihyun
    Kang, Hong-Goo
    2023 IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS, BHI, 2023,