Combining wav2vec 2.0 Fine-Tuning and ConLearnNet for Speech Emotion Recognition

Cited by: 1
Authors
Sun, Chenjing [1 ]
Zhou, Yi [2 ]
Huang, Xin [1 ]
Yang, Jichen [3 ]
Hou, Xianhua [1 ]
Affiliations
[1] South China Normal Univ, Sch Elect & Informat Engn, Foshan 528234, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore
[3] Guangdong Polytech Normal Univ, Sch Cyber Secur, Guangzhou 510640, Peoples R China
Keywords
speech emotion recognition (SER); wav2vec 2.0; contrastive learning; MODEL
DOI
10.3390/electronics13061103
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Speech emotion recognition is challenging because emotions are expressed through varied intonation and speech rate. To reduce the loss of emotional information during recognition and to improve the extraction and classification of speech emotions, we propose a novel two-fold approach. First, a feed-forward network with skip connections (SCFFN) is introduced to fine-tune wav2vec 2.0 and extract emotion embeddings. Subsequently, ConLearnNet is employed for emotion classification. ConLearnNet comprises three steps: feature learning, contrastive learning, and classification. Feature learning transforms the input, while contrastive learning encourages similar representations for samples from the same category and discriminative representations across different categories. Experimental results on the IEMOCAP and EMO-DB datasets demonstrate the superiority of the proposed method over state-of-the-art systems: we achieve a weighted accuracy (WA) and unweighted average recall (UAR) of 72.86% and 72.85% on IEMOCAP, and 97.20% and 96.41% on EMO-DB, respectively.
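The abstract names two components without implementation detail: a feed-forward network with skip connections (SCFFN) used while fine-tuning wav2vec 2.0, and a contrastive learning step inside ConLearnNet. The following is a minimal PyTorch sketch of those two ideas only, not the authors' code: the embedding dimension, hidden size, temperature, and the exact loss formulation (a standard supervised contrastive loss is used here as a stand-in) are assumptions rather than values taken from the paper.

    # Sketch of (a) a feed-forward block with a skip connection, as could sit on top
    # of wav2vec 2.0 features during fine-tuning, and (b) a supervised contrastive
    # loss that pulls same-emotion embeddings together. All sizes are assumed.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SCFFNBlock(nn.Module):
        """Feed-forward layer with a residual (skip) connection."""
        def __init__(self, dim: int = 768, hidden: int = 1024):
            super().__init__()
            self.ff = nn.Sequential(
                nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
            )
            self.norm = nn.LayerNorm(dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # skip connection around the feed-forward transform
            return self.norm(x + self.ff(x))

    def supervised_contrastive_loss(z: torch.Tensor, labels: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
        """Encourage similar embeddings for same-class samples, dissimilar otherwise."""
        z = F.normalize(z, dim=-1)
        sim = z @ z.t() / temperature                            # pairwise similarities
        n = labels.size(0)
        pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1))   # same-class pairs
        pos_mask = pos_mask & ~torch.eye(n, dtype=torch.bool)    # exclude self-pairs
        logits = sim - sim.max(dim=1, keepdim=True).values.detach()
        exp = torch.exp(logits)
        denom = exp.sum(dim=1) - torch.exp(torch.diagonal(logits))  # sum over a != i
        log_prob = logits - torch.log(denom.clamp_min(1e-12)).unsqueeze(1)
        pos_count = pos_mask.sum(dim=1).clamp_min(1)
        loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
        return loss[pos_mask.any(dim=1)].mean()                  # skip singleton labels

    # Example: utterance-level embeddings (e.g., mean-pooled wav2vec 2.0 outputs)
    emb = SCFFNBlock()(torch.randn(8, 768))
    labels = torch.randint(0, 4, (8,))
    print(supervised_contrastive_loss(emb, labels))

Per the abstract, such a contrastive term would be used alongside a classification step; how the two objectives are combined is not specified in this record.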
Pages: 19
Related Papers
50 records in total
  • [41] PROFICIENCY ASSESSMENT OF L2 SPOKEN ENGLISH USING WAV2VEC 2.0
    Banno, Stefano
    Matassoni, Marco
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1088 - 1095
  • [42] Wav2vec 2.0 Embeddings Are No Swiss Army Knife - A Case Study for Multiple Sclerosis
    Gosztolya, Gabor
    Vetrend, Mercedes
    Svindt, Veronika
    Bona, Judit
    Hoffmann, Ildiko
    INTERSPEECH 2024, 2024, : 2499 - 2503
  • [43] Computation and Memory Efficient Noise Adaptation of Wav2Vec2.0 for Noisy Speech Emotion Recognition with Skip Connection Adapters
    Leem, Seong-Gyun
    Fulford, Daniel
    Onnela, Jukka-Pekka
    Gard, David
    Busso, Carlos
    INTERSPEECH 2023, 2023, : 1888 - 1892
  • [44] wav2vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting
    Huang, Ranzo C. F.
    Mak, Brian
    INTERSPEECH 2023, 2023, : 4958 - 4962
  • [45] Improving Speech Emotion Recognition via Fine-tuning ASR with Speaker Information
    Ta, Bao Thang
    Nguyen, Tung Lam
    Dang, Dinh Son
    Le, Nhat Minh
    Do, Van Hai
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1596 - 1601
  • [46] Unveiling embedded features in Wav2vec2 and HuBERT models for Speech Emotion Recognition
    Chakhtouna, Adil
    Sekkate, Sara
    Adib, Abdellah
    5TH INTERNATIONAL CONFERENCE ON INDUSTRY 4.0 AND SMART MANUFACTURING, ISM 2023, 2024, 232 : 2560 - 2569
  • [47] Transfer Ability of Monolingual Wav2vec2.0 for Low-resource Speech Recognition
    Yi, Cheng
    Wang, Jianzong
    Cheng, Ning
    Zhou, Shiyu
    Xu, Bo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [48] Audio Features from the Wav2Vec 2.0 Embeddings for the ACM Multimedia 2022 Stuttering Challenge
    Montacie, Claude
    Caraty, Marie-Jose
    Lackovic, Nikola
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 7195 - 7199
  • [49] SPEECH RECOGNITION BY SIMPLY FINE-TUNING BERT
    Huang, Wen-Chin
    Wu, Chia-Hua
    Luo, Shang-Bao
    Chen, Kuan-Yu
    Wang, Hsin-Min
    Toda, Tomoki
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7343 - 7347
  • [50] Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model
    Vyas, Apoorv
    Madikeri, Srikanth
    Bourlard, Herve
    INTERSPEECH 2021, 2021, : 2861 - 2865