Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvement across many natural language processing tasks. In this work we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. We show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information by applying these multimodal embeddings to the task of speaker emotion recognition.
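The abstract describes augmenting contextualized word embeddings with a parallel acoustic stream so that paralinguistic cues become part of each word's representation. The sketch below is only an illustrative simplification of that idea, not the paper's actual bidirectional language model: it assumes word-level alignment between tokens and acoustic frames, fuses the two streams by concatenation, and mean-pools to an utterance-level embedding. All function names and dimensions are hypothetical.

```python
# Illustrative sketch (assumption, not the paper's architecture): fuse a
# token's lexical embedding with its aligned acoustic features, then pool
# fused word vectors into a single utterance representation.
from statistics import mean

def fuse_word(lexical_vec, acoustic_frames):
    """Concatenate a token's lexical embedding with the mean of its
    aligned acoustic frame features (word-level alignment assumed)."""
    acoustic_vec = [mean(dim) for dim in zip(*acoustic_frames)]
    return lexical_vec + acoustic_vec  # list concatenation

def utterance_embedding(tokens):
    """Mean-pool fused word embeddings into one utterance vector,
    suitable as input to a downstream emotion classifier."""
    fused = [fuse_word(lex, ac) for lex, ac in tokens]
    return [mean(dim) for dim in zip(*fused)]

# Toy example: two tokens, 2-dim lexical vectors, 2-dim acoustic frames.
tokens = [
    ([0.1, 0.2], [[1.0, 0.0], [3.0, 2.0]]),  # token 1: (lexical, frames)
    ([0.3, 0.4], [[2.0, 2.0]]),              # token 2
]
print(utterance_embedding(tokens))  # 4-dim utterance vector
```

In the paper the fusion happens inside a trained multimodal language model rather than by simple concatenation and pooling; this sketch only shows where the two modalities meet at the word level.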
Pages: 608 - 612
Page count: 5
Related Papers
50 records in total
  • [1] Emotion Recognition from Videos Using Multimodal Large Language Models
    Vaiani, Lorenzo
    Cagliero, Luca
    Garza, Paolo
    FUTURE INTERNET, 2024, 16 (07)
  • [2] Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition
    Dai, Wenliang
    Liu, Zihan
    Yu, Tiezheng
    Fung, Pascale
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 269 - 280
  • [3] An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition
    Aguilera, Ana
    Mellado, Diego
    Rojas, Felipe
    SENSORS, 2023, 23 (11)
  • [4] HEU Emotion: a large-scale database for multimodal emotion recognition in the wild
    Chen, Jing
    Wang, Chenhui
    Wang, Kejun
    Yin, Chaoqun
    Zhao, Cong
    Xu, Tao
    Zhang, Xinyi
    Huang, Ziqiang
    Liu, Meichen
    Yang, Tao
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (14): 8669 - 8685
  • [5] DEEP MULTIMODAL LEARNING FOR EMOTION RECOGNITION IN SPOKEN LANGUAGE
    Gu, Yue
    Chen, Shuhong
    Marsic, Ivan
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5079 - 5083
  • [7] Multimodal Fusion based on Information Gain for Emotion Recognition in the Wild
    Ghaleb, Esam
    Popa, Mirela
    Hortal, Enrique
    Asteriadis, Stylianos
    PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2017, : 814 - 823
  • [8] Multiple Models Fusion for Emotion Recognition in the Wild
    Wu, Jianlong
    Lin, Zhouchen
    Zha, Hongbin
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015, : 475 - 481
  • [9] Video Emotion Recognition in the Wild Based on Fusion of Multimodal Features
    Chen, Shizhe
    Li, Xinrui
    Jin, Qin
    Zhang, Shilei
    Qin, Yong
    ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 494 - 500
  • [10] Comparing Recognition Performance and Robustness of Multimodal Deep Learning Models for Multimodal Emotion Recognition
    Liu, Wei
    Qiu, Jie-Lin
    Zheng, Wei-Long
    Lu, Bao-Liang
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (02) : 715 - 729