Efficient Multimodal Transformer With Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis

Cited by: 28
Authors
Sun, Licai [1 ,2 ]
Lian, Zheng [2 ]
Liu, Bin [2 ]
Tao, Jianhua [3 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Robustness; Semantics; Data models; Computational modeling; Videos; Training; Multimodal sentiment analysis; unaligned and incomplete data; efficient multimodal Transformer; dual-level feature restoration; robustness
DOI
10.1109/TAFFC.2023.3274829
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention. Despite significant progress, two major challenges remain on the way toward robust MSA: 1) inefficiency when modeling cross-modal interactions in unaligned multimodal data; and 2) vulnerability to the random missing of modality features that typically occurs in realistic settings. In this paper, we propose a generic and unified framework, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR), to address both. Concretely, EMT employs utterance-level representations from each modality as a global multimodal context that interacts with local unimodal features, so that the two mutually promote each other. This design not only avoids the quadratic scaling cost of previous local-local cross-modal interaction methods but also leads to better performance. To improve model robustness in the incomplete-modality setting, DLFR, on the one hand, performs low-level feature reconstruction to implicitly encourage the model to learn semantic information from incomplete data; on the other hand, it regards complete and incomplete data as two views of the same sample and utilizes siamese representation learning to explicitly attract their high-level representations. Comprehensive experiments on three popular datasets demonstrate that our method achieves superior performance in both complete and incomplete modality settings.
Pages: 309-325
Page count: 17
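
To make the abstract's two mechanisms concrete, here is a minimal PyTorch sketch, not the authors' released EMT-DLFR code: the names GlobalLocalLayer and dlfr_losses, the single attention round, the L1 reconstruction term, and the stop-gradient in the siamese term are all assumptions made for exposition.

```python
# Minimal, hypothetical sketch of the two ideas in the abstract; NOT the
# authors' EMT-DLFR implementation. GlobalLocalLayer, dlfr_losses, the L1
# reconstruction term, and the stop-gradient below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalLayer(nn.Module):
    """One round of mutual promotion between the global multimodal
    context and each modality's local feature sequence."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.g2l = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.l2g = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, g, locals_):
        # g: (B, M, d) -- one utterance-level token per modality.
        # locals_: list of M unaligned local sequences, each (B, T_m, d).
        all_local = torch.cat(locals_, dim=1)           # (B, sum_m T_m, d)
        g = g + self.g2l(g, all_local, all_local)[0]    # global reads locals
        locals_ = [x + self.l2g(x, g, g)[0]             # locals read global
                   for x in locals_]
        return g, locals_

def dlfr_losses(recon, target, miss_mask, z_full, z_miss):
    """Dual-level restoration: low-level reconstruction at the missing
    feature positions, plus high-level siamese attraction between the
    complete and incomplete views of the same sample."""
    elem = F.l1_loss(recon, target, reduction="none") * miss_mask.unsqueeze(-1)
    rec = elem.sum() / (miss_mask.sum() * recon.size(-1)).clamp(min=1)
    # Simplified siamese term (stop-gradient on the complete view).
    sim = 1 - F.cosine_similarity(z_miss, z_full.detach(), dim=-1).mean()
    return rec, sim

if __name__ == "__main__":
    B, d = 2, 32
    text, audio, video = (torch.randn(B, t, d) for t in (20, 50, 40))
    g = torch.randn(B, 3, d)            # one global token per modality
    layer = GlobalLocalLayer(d)
    g, (text, audio, video) = layer(g, [text, audio, video])
    miss = torch.bernoulli(torch.full((B, 20), 0.3))   # toy missing mask
    rec, sim = dlfr_losses(torch.randn(B, 20, d), torch.randn(B, 20, d),
                           miss, torch.randn(B, d), torch.randn(B, d))
    print(g.shape, rec.item(), sim.item())
```

Because the global context holds only one token per modality (M is about 3 in the text-audio-video setting), both attention calls scale linearly with the total sequence length, in contrast to pairwise local-local cross-attention, whose cost grows quadratically with sequence length.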