Topic and Style-aware Transformer for Multimodal Emotion Recognition

Citations: 0

Authors:

Qiu, Shuwen [1]

Sekhar, Nitesh [2]

Singhal, Prateek [2]

Affiliations:

[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA

[2] Amazon, Seattle, WA USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Understanding emotion expressions in multi-modal signals is key for machines to have a better understanding of human communication. While language, visual and acoustic modalities can provide clues from different perspectives, the visual modality is shown to make minimal contribution to the performance in the emotion recognition field due to its high dimensionality. Therefore, we first leverage the strong multi-modality backbone VATT to project the visual signal to the common space with language and acoustic signals. Also, we propose content-oriented features Topic and Speaking style on top of it to approach the subjectivity issues. Experiments conducted on the benchmark dataset MOSEI show our model can outperform SOTA results and effectively incorporate visual signals and handle subjectivity issues by serving as content "normalization".

Pages: 2074-2082

Page count: 9
