Building Robust Multimodal Sentiment Recognition via a Simple yet Effective Multimodal Transformer

Citations: 0
Authors
Zong, Daoming [1 ]
Ding, Chaoyue [1 ]
Li, Baoxiang [1 ]
Zhou, Dinghao [1 ]
Li, Jiakui [1 ]
Zheng, Ken [1 ]
Zhou, Qunyan [1 ]
Affiliations
[1] SenseTime Grp Ltd, Beijing, Peoples R China
Keywords
Multimodal Sentiment Analysis; Multimodal Fusion; Modality Robustness;
DOI
10.1145/3581783.3612872
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present our solutions to the MER-MULTI and MER-NOISE sub-challenges of the Multimodal Emotion Recognition Challenge (MER 2023). In both sub-challenges, participants are required to recognize both discrete and dimensional emotions; in MER-NOISE, the test videos are additionally corrupted with noise, necessitating consideration of modality robustness. Our empirical findings indicate that the modalities contribute unequally to the tasks: the audio and visual modalities have a significant impact, while the text modality plays a weaker role in emotion prediction. To facilitate subsequent multimodal fusion, and considering that language information is implicitly embedded in large pre-trained speech models, we deliberately abandon the text modality and rely solely on the visual and acoustic modalities for these sub-challenges. To address the potential underfitting of individual modalities during multimodal training, we propose to jointly train all modalities via a weighted blending of supervision signals. Furthermore, to enhance the robustness of our model, we employ a range of data augmentation techniques at the image, waveform, and spectrogram levels. Experimental results show that our model ranks 1st in both the MER-MULTI (0.7005) and MER-NOISE (0.6846) sub-challenges, validating the effectiveness of our method. Our code is publicly available at https://github.com/dingchaoyue/MultimodalEmotion-Recognition-MER-and-MuSe-2023-Challenges.
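The "weighted blending of supervision signals" described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: the loss weights, the six-class setup, and the `audio_logits`/`visual_logits`/`fused_logits` branch names are all assumptions. The idea is that each unimodal branch receives its own supervised loss alongside the fused prediction, so no single modality underfits during joint multimodal training.

```python
import torch
import torch.nn as nn


class BlendedSupervisionLoss(nn.Module):
    """Weighted blend of per-modality and fused supervision signals.

    Each branch (audio, visual, fused) produces its own logits over the
    emotion classes; the total loss is a weighted sum of the three
    cross-entropy terms, keeping every branch directly supervised.
    """

    def __init__(self, weights=(0.25, 0.25, 0.5)):  # (audio, visual, fused) — assumed values
        super().__init__()
        self.weights = weights
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, audio_logits, visual_logits, fused_logits, labels):
        w_a, w_v, w_f = self.weights
        return (w_a * self.criterion(audio_logits, labels)
                + w_v * self.criterion(visual_logits, labels)
                + w_f * self.criterion(fused_logits, labels))


# Usage sketch: batch of 4 samples, 6 discrete emotion classes (assumed).
loss_fn = BlendedSupervisionLoss()
labels = torch.randint(0, 6, (4,))
loss = loss_fn(torch.randn(4, 6), torch.randn(4, 6), torch.randn(4, 6), labels)
loss.backward  # scalar; usable as a training objective
```

Because the unimodal terms keep their own gradients flowing, a weak branch cannot simply be ignored by the fusion head, which is one common remedy for modality underfitting.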
Pages: 9596-9600 (5 pages)
Related Papers (50 total)
  • [1] Sun, Licai; Lian, Zheng; Liu, Bin; Tao, Jianhua. Efficient Multimodal Transformer With Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15(01): 309-325.
  • [2] Wang, Yifeng; He, Jiahao; Wang, Di; Wang, Quan; Wan, Bo; Luo, Xuemei. Multimodal transformer with adaptive modality weighting for multimodal sentiment analysis. NEUROCOMPUTING, 2024, 572.
  • [3] Cheng, Junyan; Fostiropoulos, Iordanis; Boehm, Barry; Soleymani, Mohammad. Multimodal Phased Transformer for Sentiment Analysis. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 2447-2458.
  • [4] Wang, Ruiqi; Jo, Wonse; Zhao, Dezhong; Wang, Weizheng; Gupte, Arjun; Yang, Baijian; Chen, Guohua; Min, Byung-Cheol. Husformer: A Multimodal Transformer for Multimodal Human State Recognition. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16(04): 1374-1390.
  • [5] Yuan, Ziqi; Li, Wei; Xu, Hua; Yu, Wenmeng. Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 4400-4407.
  • [6] Yu, Jianfei; Jiang, Jing; Yang, Li; Xia, Rui. Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020: 3342-3352.
  • [7] Yu, Jianfei; Chen, Kai; Xia, Rui. Hierarchical Interactive Multimodal Transformer for Aspect-Based Multimodal Sentiment Analysis. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14(03): 1966-1978.
  • [8] Sun, Yaohui; Xu, Weiyao; Gao, Ju; Yu, Xiaoyi. Multimodal Fusion for Human Action Recognition via Spatial Transformer. 2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023: 1638-1641.
  • [9] Ijaz, Momal; Diaz, Renato; Chen, Chen. Multimodal Transformer for Nursing Activity Recognition. arXiv, 2022.
  • [10] Ijaz, Momal; Diaz, Renato; Chen, Chen. Multimodal Transformer for Nursing Activity Recognition. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 2064-2073.