Topic and Style-aware Transformer for Multimodal Emotion Recognition

Cited: 0
Authors: Qiu, Shuwen [1]; Sekhar, Nitesh [2]; Singhal, Prateek [2]
Affiliations: [1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA; [2] Amazon, Seattle, WA USA
Keywords:
DOI: none available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Understanding emotion expressions in multimodal signals is key for machines to better understand human communication. While the language, visual, and acoustic modalities each provide clues from a different perspective, the visual modality has been shown to contribute minimally to performance in emotion recognition due to its high dimensionality. We therefore first leverage the strong multimodal backbone VATT to project the visual signal into a common space with the language and acoustic signals. On top of it, we propose content-oriented features, Topic and Speaking Style, to address subjectivity issues. Experiments on the benchmark dataset MOSEI show that our model outperforms SOTA results, effectively incorporating visual signals and handling subjectivity issues by serving as content "normalization".
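The core idea in the abstract, i.e. mapping modality features of different native dimensionalities into one shared space before fusing them, can be sketched as follows. This is an illustrative toy only: the paper uses the pretrained VATT backbone for the projection, whereas here a plain linear map (matrix-vector product) stands in for it, and all names, dimensions, and weights below are assumptions rather than details from the paper.

```python
# Toy sketch of common-space projection + late fusion (NOT the paper's method;
# VATT is replaced by a hand-written linear projection for illustration).

def project(x, w):
    """Linear projection of a modality feature vector x into the common space.

    w is a list of rows: len(x) rows, each of length D_COMMON.
    """
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

D_COMMON = 4  # assumed size of the shared embedding space

# Toy modality features with different native dimensionalities.
text  = [0.5, -1.0, 0.25]          # 3-dim language feature
audio = [1.0, 0.0]                 # 2-dim acoustic feature
video = [0.1, 0.2, 0.3, 0.4, 0.5]  # 5-dim visual feature (high-dim in practice)

# One projection matrix per modality (fixed stand-ins for learned weights).
w_text  = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
w_audio = [[1, 0, 0, 0], [0, 0, 0, 1]]
w_video = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1]]

aligned = [project(text, w_text), project(audio, w_audio), project(video, w_video)]

# Simple late fusion over the now-aligned embeddings: element-wise average.
fused = [sum(vals) / len(aligned) for vals in zip(*aligned)]
assert len(fused) == D_COMMON
```

Once all three modalities live in the same D_COMMON-dimensional space, any standard fusion operator (averaging here; cross-modal attention in transformer models) can combine them without the visual signal's raw dimensionality dominating.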
Pages: 2074-2082 (9 pages)
Related Papers (items 41-50 of 50)
  • [41] Wu, Zhefu; Zhang, Song; Paul, Agyemang; Fang, Luping. Style-aware adversarial pairwise ranking for image recommendation systems. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (02)
  • [42] Wei, Wei; Zhang, Bingkun; Wang, Yibing. TACST: Time-Aware Transformer for Robust Speech Emotion Recognition. MULTIMEDIA MODELING, MMM 2025, PT IV, 2025, 15523: 442-453
  • [43] Li, Jing; Chen, Ning; Zhu, Hongqing; Li, Guangqiang; Xu, Zhangyong; Chen, Dingxin. Incongruity-aware multimodal physiology signals fusion for emotion recognition. INFORMATION FUSION, 2024, 105
  • [44] Malek, S; Mikic-Rakic, M; Medvidovic, N. A style-aware architectural middleware for resource-constrained, distributed systems. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2005, 31 (03): 256-272
  • [45] Liu, H.; Li, S.; Zhu, X.; Sun, H.; Zhang, J. Style-aware and multi-scale attention for face image completion. Journal of Harbin Institute of Technology, 2022, 54 (05): 49-56
  • [46] Ma, Yunchuan; Zhu, Zheng; Qi, Yuankai; Beheshti, Amin; Li, Ying; Qing, Laiyun; Li, Guorong. Style-aware two-stage learning framework for video captioning. KNOWLEDGE-BASED SYSTEMS, 2024, 301
  • [47] Liu, Feng; Shen, Si-Yuan; Fu, Zi-Wang; Wang, Han-Yang; Zhou, Ai-Min; Qi, Jia-Yin. LGCCT: A Light Gated and Crossed Complementation Transformer for Multimodal Speech Emotion Recognition. ENTROPY, 2022, 24 (07)
  • [48] Siriwardhana, Shamane; Kaluarachchi, Tharindu; Billinghurst, Mark; Nanayakkara, Suranga. Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion. IEEE ACCESS, 2020, 8: 176274-176285
  • [49] Moreno-Galvan, Diego Aaron; Lopez-Santillan, Roberto; Gonzalez-Gurrola, Luis Carlos; Montes-Y-Gomez, Manuel; Sanchez-Vega, Fernando; Lopez-Monroy, Adrian Pastor. Automatic movie genre classification & emotion recognition via a BiProjection Multimodal Transformer. INFORMATION FUSION, 2025, 113
  • [50] Huang, Zilong; Mak, Man-Wai; Lee, Kong Aik. MM-NodeFormer: Node Transformer Multimodal Fusion for Emotion Recognition in Conversation. INTERSPEECH 2024, 2024: 4069-4073