Topic and Style-aware Transformer for Multimodal Emotion Recognition

Cited by: 0
Authors
Qiu, Shuwen [1]
Sekhar, Nitesh [2]
Singhal, Prateek [2]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Amazon, Seattle, WA USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Understanding emotional expression in multimodal signals is key for machines to better understand human communication. While the language, visual, and acoustic modalities provide clues from different perspectives, the visual modality has been shown to contribute minimally to emotion recognition performance because of its high dimensionality. Therefore, we first leverage the strong multimodal backbone VATT to project the visual signal into a common space with the language and acoustic signals. We also propose two content-oriented features, Topic and Speaking style, on top of it to address subjectivity issues. Experiments on the benchmark dataset MOSEI show that our model outperforms SOTA results, effectively incorporates visual signals, and handles subjectivity issues by serving as content "normalization".
Pages: 2074 - 2082
Number of pages: 9
Related papers
50 in total
  • [31] Learning to Style-Aware Bayesian Personalized Ranking for Visual Recommendation
    He, Ming
    Zhang, Shaozong
    Meng, Qian
    IEEE ACCESS, 2019, 7 : 14198 - 14205
  • [32] Style-aware adversarial pairwise ranking for image recommendation systems
    Zhefu Wu
    Song Zhang
    Agyemang Paul
    Luping Fang
    International Journal of Multimedia Information Retrieval, 2023, 12
  • [33] DBT: multimodal emotion recognition based on dual-branch transformer
    Yufan Yi
    Yan Tian
    Cong He
    Yajing Fan
    Xinli Hu
    Yiping Xu
    The Journal of Supercomputing, 2023, 79 : 8611 - 8633
  • [34] Towards Learning a Joint Representation from Transformer in Multimodal Emotion Recognition
    Deng, James J.
    Leung, Clement H. C.
    BRAIN INFORMATICS, BI 2021, 2021, 12960 : 179 - 188
  • [35] Improving multimodal fusion with Main Modal Transformer for emotion recognition in conversation
    Zou, ShiHao
    Huang, Xianying
    Shen, XuDong
    Liu, Hankai
    KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [36] Meaningful Multimodal Emotion Recognition Based on Capsule Graph Transformer Architecture
    Filali, Hajar
    Boulealam, Chafik
    El Fazazy, Khalid
    Mahraz, Adnane Mohamed
    Tairi, Hamid
    Riffi, Jamal
    INFORMATION, 2025, 16 (01)
  • [37] DBT: multimodal emotion recognition based on dual-branch transformer
    Yi, Yufan
    Tian, Yan
    He, Cong
    Fan, Yajing
    Hu, Xinli
    Xu, Yiping
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (08) : 8611 - 8633
  • [38] Exploring Style-Robust Scene Text Detection via Style-Aware Learning
    Cai, Yuanqiang
    Zhou, Fenfen
    Yin, Ronghui
    ELECTRONICS, 2024, 13 (02)
  • [39] Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation
    Zou, Shihao
    Huang, Xianying
    Shen, Xudong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 5994 - 6003
  • [40] Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders
    Valenti, Andrea
    Carta, Antonio
    Bacciu, Davide
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 1563 - 1570