Topic and Style-aware Transformer for Multimodal Emotion Recognition

Cited by: 0

Authors:

Qiu, Shuwen [1]

Sekhar, Nitesh [2]

Singhal, Prateek [2]

Affiliations:

[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA

[2] Amazon, Seattle, WA USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Understanding emotion expressions in multimodal signals is key for machines to better understand human communication. While language, visual and acoustic modalities can provide clues from different perspectives, the visual modality has been shown to contribute minimally to performance in emotion recognition because of its high dimensionality. Therefore, we first leverage the strong multimodal backbone VATT to project the visual signal into a common space with the language and acoustic signals. We also propose two content-oriented features, Topic and Speaking style, on top of it to address subjectivity issues. Experiments on the benchmark dataset MOSEI show that our model can outperform SOTA results, effectively incorporate visual signals, and handle subjectivity issues by serving as content "normalization".

Pages: 2074-2082

Page count: 9
