Topic and Style-aware Transformer for Multimodal Emotion Recognition

Cited by: 0
Authors
Qiu, Shuwen [1]
Sekhar, Nitesh [2]
Singhal, Prateek [2]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Amazon, Seattle, WA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Understanding emotion expressions in multimodal signals is key for machines to better understand human communication. While the language, visual, and acoustic modalities can provide clues from different perspectives, the visual modality has been shown to contribute minimally to emotion recognition performance because of its high dimensionality. Therefore, we first leverage the strong multimodal backbone VATT to project the visual signal into a common space with the language and acoustic signals. On top of it, we propose two content-oriented features, Topic and Speaking Style, to address subjectivity issues. Experiments on the benchmark dataset MOSEI show that our model can outperform SOTA results, effectively incorporate visual signals, and handle subjectivity issues by serving as content "normalization".
Pages: 2074-2082
Page count: 9