Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Contextualized word embeddings such as ELMo and BERT, learned on large-scale language corpora, have been shown to model word usage more effectively and have yielded significant performance improvements across many natural language processing tasks. In this work, we integrate acoustic information into contextualized lexical embeddings by adding a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. By applying the embeddings extracted from this model to speaker emotion recognition, we show that they integrate paralinguistic cues into word meanings and provide vital affective information.
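The abstract gives the high-level architecture but no implementation details. The sketch below is a minimal PyTorch illustration of the described idea, a bidirectional language model with a parallel acoustic stream whose hidden states are combined into multimodal word embeddings; it is not the authors' code, and the recurrent cell type, layer sizes, word-level alignment of the acoustic features, and concatenation-based fusion are all assumptions made for illustration.

```python
# Minimal sketch (assumed design, not the paper's implementation) of a
# bidirectional language model with a parallel acoustic stream producing
# multimodal word embeddings.
import torch
import torch.nn as nn


class MultimodalBiLM(nn.Module):
    def __init__(self, vocab_size, word_dim=300, acoustic_dim=88, hidden_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # Lexical stream: bidirectional recurrent model over word embeddings.
        self.text_rnn = nn.GRU(word_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Parallel acoustic stream: word-aligned acoustic feature vectors
        # (e.g. one functional vector per word) processed by its own BiGRU.
        self.audio_rnn = nn.GRU(acoustic_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
        # Word-prediction head used for the language-model training objective.
        self.lm_head = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, word_ids, acoustic_feats):
        # word_ids:       (batch, seq_len) integer word indices
        # acoustic_feats: (batch, seq_len, acoustic_dim), aligned to the words
        text_h, _ = self.text_rnn(self.word_emb(word_ids))    # (B, T, 2*hidden)
        audio_h, _ = self.audio_rnn(acoustic_feats)           # (B, T, 2*hidden)
        # Multimodal word embeddings: concatenation of both streams.
        multimodal_emb = torch.cat([text_h, audio_h], dim=-1)  # (B, T, 4*hidden)
        lm_logits = self.lm_head(text_h)
        return multimodal_emb, lm_logits


# Example with random inputs (hypothetical sizes):
model = MultimodalBiLM(vocab_size=10000)
emb, logits = model(torch.randint(0, 10000, (2, 12)), torch.randn(2, 12, 88))
```

In a downstream setup, these multimodal embeddings would typically be extracted from the pretrained model and fed as features to a separate emotion classifier, analogous to how ELMo embeddings are used.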
Pages: 608-612
Page count: 5
Related Papers (50 in total)
  • [21] Emotion recognition from unimodal to multimodal analysis: A review
    Ezzameli, K.
    Mahersia, H.
    INFORMATION FUSION, 2023, 99
  • [22] Learning Alignment for Multimodal Emotion Recognition from Speech
    Xu, Haiyang
    Zhang, Hui
    Han, Kun
    Wang, Yun
    Peng, Yiping
    Li, Xiangang
    INTERSPEECH 2019, 2019, : 3569 - 3573
  • [23] Emotion Recognition From Expressions in Face, Voice, and Body: The Multimodal Emotion Recognition Test (MERT)
    Baenziger, Tanja
    Grandjean, Didier
    Scherer, Klaus R.
    EMOTION, 2009, 9 (05) : 691 - 704
  • [24] Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
    Muhammad Zaidi, Syed Aun
    Latif, Siddique
    Qadir, Junaid
    IEEE Open Journal of the Computer Society, 2024, 5 : 684 - 693
  • [25] Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models
    Jakubec, Maros
    Lieskovska, Eva
    Jarina, Roman
    Spisiak, Michal
    Kasak, Peter
    Applied Sciences (Switzerland), 2024, 14 (21):
  • [26] Emotion Recognition from Multimodal Physiological Signals for Emotion Aware Healthcare Systems
    Ayata, Deger
    Yaslan, Yusuf
    Kamasak, Mustafa E.
    JOURNAL OF MEDICAL AND BIOLOGICAL ENGINEERING, 2020, 40 (02) : 149 - 157
  • [28] Multimodal Group Emotion Recognition In-the-wild Using Privacy-Compliant Features
    Augusma, Anderson
    Vaufreydaz, Dominique
    Letue, Frederique
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, 2023, : 750 - 754
  • [29] Emotion Recognition in the Wild from Videos using Images
    Bargal, Sarah Adel
    Barsoum, Emad
    Ferrer, Cristian Canton
    Zhang, Cha
    ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 433 - 436
  • [30] Audiovisual emotion recognition in wild
    Avots, Egils
    Sapiński, Tomasz
    Bachmann, Maie
    Kamińska, Dorota
    Machine Vision and Applications, 2019, 30 : 975 - 985