Multi-Label Multimodal Emotion Recognition With Transformer-Based Fusion and Emotion-Level Representation Learning

Cited by: 28
Authors
Le, Hoai-Duy [1]
Lee, Guee-Sang [1]
Kim, Soo-Hyung [1]
Kim, Seungwon [1]
Yang, Hyung-Jeong [1]
Affiliations
[1] Chonnam National University, Department of Artificial Intelligence Convergence, Gwangju 61186, South Korea
Keywords
Transformers; Emotion recognition; Feature extraction; Task analysis; Visualization; Deep learning; Sentiment analysis; Multimodal fusion; multi-label video emotion recognition; transformers;
DOI
10.1109/ACCESS.2023.3244390
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Emotion recognition has been an active research area for a long time. Recently, multimodal emotion recognition from video data has grown in importance with the explosion of video content due to the emergence of short video social media platforms. Effectively incorporating information from multiple modalities in video data to learn robust multimodal representation for improving recognition model performance is still the primary challenge for researchers. In this context, transformer architectures have been widely used and have significantly improved multimodal deep learning and representation learning. Inspired by this, we propose a transformer-based fusion and representation learning method to fuse and enrich multimodal features from raw videos for the task of multi-label video emotion recognition. Specifically, our method takes raw video frames, audio signals, and text subtitles as inputs and passes information from these multiple modalities through a unified transformer architecture for learning a joint multimodal representation. Moreover, we use the label-level representation approach to deal with the multi-label classification task and enhance the model performance. We conduct experiments on two benchmark datasets: Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) to evaluate our proposed method. The experimental results demonstrate that the proposed method outperforms other strong baselines and existing approaches for multi-label video emotion recognition.
Pages: 14742-14751
Page count: 10
Related papers
50 in total
  • [21] MULTIMODAL EMOTION RECOGNITION WITH CAPSULE GRAPH CONVOLUTIONAL BASED REPRESENTATION FUSION
    Liu, Jiaxing
    Chen, Sen
    Wang, Longbiao
    Liu, Zhilei
    Fu, Yahui
    Guo, Lili
    Dang, Jianwu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6339 - 6343
  • [22] An optimized multi-label TSK fuzzy system for emotion recognition of multimodal physiological signals
    Li, Yixuan
    Fu, Zhongzheng
    He, Xinrun
    Huang, Jian
    2022 IEEE INTERNATIONAL CONFERENCE ON CYBORG AND BIONIC SYSTEMS, CBS, 2022, : 362 - 367
  • [23] Multi-label Classifier for Emotion Recognition from Music
    Tomar, Divya
    Agarwal, Sonali
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING, NETWORKING AND INFORMATICS (ICACNI 2015), VOL 1, 2016, 43 : 111 - 123
  • [24] Multi-label emotion recognition of weblog sentence based on Bayesian networks
    Wang, Lei
    Ren, Fuji
    Miao, Duoqian
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2016, 11 (02) : 178 - 184
  • [25] Emotion Recognition With Multimodal Transformer Fusion Framework Based on Acoustic and Lexical Information
    Guo, Lili
    Wang, Longbiao
    Dang, Jianwu
    Fu, Yahui
    Liu, Jiaxing
    Ding, Shifei
    IEEE MULTIMEDIA, 2022, 29 (02) : 94 - 103
  • [26] Multi-Label Emotion Recognition of Korean Speech Data Using Deep Fusion Models
    Park, Seoin
    Jeon, Byeonghoon
    Lee, Seunghyun
    Yoon, Janghyeok
    APPLIED SCIENCES-BASEL, 2024, 14 (17):
  • [27] Multi-label emotion classification based on adversarial multi-task learning
    Lin, Nankai
    Fu, Sihui
    Lin, Xiaotian
    Wang, Lianxi
    INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (06)
  • [28] Image emotion multi-label classification based on multi-graph learning
    Wang, Meixia
    Zhao, Yuhai
    Wang, Yejiang
    Xu, Tongze
    Sun, Yiming
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [29] Multimodal Emotion Recognition Based on Feature Fusion
    Xu, Yurui
    Wu, Xiao
    Su, Hang
    Liu, Xiaorui
    2022 INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2022), 2022, : 7 - 11
  • [30] Emotion Detection in Online Social Network Based on Multi-label Learning
    Zhang, Xiao
    Li, Wenzhong
    Lu, Sanglu
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2017), PT I, 2017, 10177 : 659 - 674