Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition

Cited by: 35
Authors
Siriwardhana, Shamane [1 ]
Reis, Andrew [1 ]
Weerasekera, Rivindu [1 ]
Nanayakkara, Suranga [1 ]
Affiliations
[1] Univ Auckland, Auckland Bioengn Inst, Augmented Human Lab, Auckland, New Zealand
Source
INTERSPEECH 2020
Keywords
speech emotion recognition; self supervised learning; Transformers; BERT; multimodal deep learning;
DOI
10.21437/Interspeech.2020-1212
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification
100104; 100213;
Abstract
Multimodal emotion recognition from speech is an important area in affective computing. Fusing multiple data modalities and learning representations with limited amounts of labeled data is a challenging task. In this paper, we explore the use of modality-specific "BERT-like" pretrained Self-Supervised Learning (SSL) architectures to represent both speech and text modalities for the task of multimodal speech emotion recognition. By conducting experiments on three publicly available datasets (IEMOCAP, CMU-MOSEI, and CMU-MOSI), we show that jointly fine-tuning "BERT-like" SSL architectures achieves state-of-the-art (SOTA) results. We also evaluate two methods of fusing speech and text modalities and show that a simple fusion mechanism can outperform more complex ones when using SSL models that have similar architectural properties to BERT.
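The "simple fusion mechanism" the abstract refers to can be illustrated with a minimal sketch: pool each modality's encoder output into a single vector, concatenate the two vectors, and classify. The encoder outputs, dimensions, and classifier head below are illustrative stand-ins, not the paper's exact architecture; in the paper's setting, the pooled vectors would come from pretrained speech and text SSL models that are fine-tuned jointly with the head.

```python
import torch
import torch.nn as nn

class ShallowFusionClassifier(nn.Module):
    """Illustrative simple fusion for multimodal emotion recognition:
    mean-pool each modality's "BERT-like" encoder output, concatenate,
    and pass through a small classifier head. Dimensions and the head
    are assumptions, not the paper's exact configuration."""

    def __init__(self, speech_dim=768, text_dim=768, num_classes=4):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(speech_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, speech_hidden, text_hidden):
        # speech_hidden: (batch, T_speech, speech_dim)
        # text_hidden:   (batch, T_text, text_dim)
        speech_vec = speech_hidden.mean(dim=1)  # pool over time frames
        text_vec = text_hidden.mean(dim=1)      # pool over tokens
        fused = torch.cat([speech_vec, text_vec], dim=-1)
        return self.classifier(fused)

# Toy forward pass with random tensors standing in for encoder outputs.
model = ShallowFusionClassifier()
logits = model(torch.randn(2, 50, 768), torch.randn(2, 12, 768))
print(logits.shape)  # torch.Size([2, 4])
```

Because both encoders are fine-tuned end-to-end with this head, even a plain concatenation can be competitive with more elaborate cross-modal fusion, which is the comparison the abstract highlights.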
Pages: 3755-3759 (5 pages)