Topic and Style-aware Transformer for Multimodal Emotion Recognition

Cited: 0
Authors: Qiu, Shuwen [1]; Sekhar, Nitesh [2]; Singhal, Prateek [2]
Affiliations: [1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA; [2] Amazon, Seattle, WA USA
Keywords:
DOI: none available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Understanding emotion expressions in multimodal signals is key for machines to better understand human communication. While the language, visual, and acoustic modalities each provide clues from a different perspective, the visual modality has been shown to contribute minimally to performance in emotion recognition due to its high dimensionality. We therefore first leverage the strong multimodal backbone VATT to project the visual signal into a common space with the language and acoustic signals. On top of it, we propose content-oriented features, Topic and Speaking Style, to address subjectivity issues. Experiments on the benchmark dataset MOSEI show that our model outperforms SOTA results, effectively incorporating visual signals and handling subjectivity issues by serving as content "normalization".
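The core idea in the abstract, i.e. mapping modality features of different native dimensionalities into one shared space before fusing them, can be sketched as follows. This is an illustrative toy only: the paper uses the pretrained VATT backbone for the projection, whereas here a plain linear map (matrix-vector product) stands in for it, and all names, dimensions, and weights below are assumptions rather than details from the paper.

```python
# Toy sketch of common-space projection + late fusion (NOT the paper's method;
# VATT is replaced by a hand-written linear projection for illustration).

def project(x, w):
    """Linear projection of a modality feature vector x into the common space.

    w is a list of rows: len(x) rows, each of length D_COMMON.
    """
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

D_COMMON = 4  # assumed size of the shared embedding space

# Toy modality features with different native dimensionalities.
text  = [0.5, -1.0, 0.25]          # 3-dim language feature
audio = [1.0, 0.0]                 # 2-dim acoustic feature
video = [0.1, 0.2, 0.3, 0.4, 0.5]  # 5-dim visual feature (high-dim in practice)

# One projection matrix per modality (fixed stand-ins for learned weights).
w_text  = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
w_audio = [[1, 0, 0, 0], [0, 0, 0, 1]]
w_video = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1]]

aligned = [project(text, w_text), project(audio, w_audio), project(video, w_video)]

# Simple late fusion over the now-aligned embeddings: element-wise average.
fused = [sum(vals) / len(aligned) for vals in zip(*aligned)]
assert len(fused) == D_COMMON
```

Once all three modalities live in the same D_COMMON-dimensional space, any standard fusion operator (averaging here; cross-modal attention in transformer models) can combine them without the visual signal's raw dimensionality dominating.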
Pages: 2074-2082 (9 pages)
Related Papers (items 41-50 of 50)
  • [41] Wu, Zhefu; Zhang, Song; Paul, Agyemang; Fang, Luping. Style-aware adversarial pairwise ranking for image recommendation systems. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (02)
  • [42] Wei, Wei; Zhang, Bingkun; Wang, Yibing. TACST: Time-Aware Transformer for Robust Speech Emotion Recognition. MULTIMEDIA MODELING, MMM 2025, PT IV, 2025, 15523: 442-453
  • [43] Li, Jing; Chen, Ning; Zhu, Hongqing; Li, Guangqiang; Xu, Zhangyong; Chen, Dingxin. Incongruity-aware multimodal physiology signals fusion for emotion recognition. INFORMATION FUSION, 2024, 105
  • [44] Malek, S; Mikic-Rakic, M; Medvidovic, N. A style-aware architectural middleware for resource-constrained, distributed systems. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2005, 31 (03): 256-272
  • [45] Liu, H.; Li, S.; Zhu, X.; Sun, H.; Zhang, J. Style-aware and multi-scale attention for face image completion. Journal of Harbin Institute of Technology, 2022, 54 (05): 49-56
  • [46] Ma, Yunchuan; Zhu, Zheng; Qi, Yuankai; Beheshti, Amin; Li, Ying; Qing, Laiyun; Li, Guorong. Style-aware two-stage learning framework for video captioning. KNOWLEDGE-BASED SYSTEMS, 2024, 301
  • [47] Liu, Feng; Shen, Si-Yuan; Fu, Zi-Wang; Wang, Han-Yang; Zhou, Ai-Min; Qi, Jia-Yin. LGCCT: A Light Gated and Crossed Complementation Transformer for Multimodal Speech Emotion Recognition. ENTROPY, 2022, 24 (07)
  • [48] Siriwardhana, Shamane; Kaluarachchi, Tharindu; Billinghurst, Mark; Nanayakkara, Suranga. Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion. IEEE ACCESS, 2020, 8: 176274-176285
  • [49] Moreno-Galvan, Diego Aaron; Lopez-Santillan, Roberto; Gonzalez-Gurrola, Luis Carlos; Montes-Y-Gomez, Manuel; Sanchez-Vega, Fernando; Lopez-Monroy, Adrian Pastor. Automatic movie genre classification & emotion recognition via a BiProjection Multimodal Transformer. INFORMATION FUSION, 2025, 113
  • [50] Huang, Zilong; Mak, Man-Wai; Lee, Kong Aik. MM-NodeFormer: Node Transformer Multimodal Fusion for Emotion Recognition in Conversation. INTERSPEECH 2024, 2024: 4069-4073