Dynamically Shifting Multimodal Representations via Hybrid-Modal Attention for Multimodal Sentiment Analysis

Cited by: 3
Authors
Lin, Ronghao [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Acoustics; Visualization; Feature extraction; Task analysis; Logic gates; Sentiment analysis; Multi-stage fusion framework; intra- and inter-modality dynamics; multimodal representations shifting; hybrid-modal attention; PREDICTION; LANGUAGE; SPEECH; FUSION;
DOI
10.1109/TMM.2023.3303711
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In the field of multimodal machine learning, the multimodal sentiment analysis task has been an active area of research. The predominant approaches focus on learning efficient multimodal representations containing intra- and inter-modality information. However, the heterogeneous nature of different modalities brings great challenges to multimodal representation learning. In this article, we propose a multi-stage fusion framework to dynamically fine-tune multimodal representations via a hybrid-modal attention mechanism. Owing to the success of large-corpus pre-trained models, most previous methods fine-tune only the textual representation and neglect the inconsistency problem of different modality spaces. Thus, we design a module called the Multimodal Shifting Gate (MSG) to fine-tune all three modalities by modeling inter-modality dynamics and shifting representations. We also adopt a module named Masked Bimodal Adjustment (MBA) on the textual modality to mitigate the inconsistency of parameter spaces and reduce the modality gap. In addition, we utilize syntactic-level and semantic-level textual features output from different layers of the Transformer model to sufficiently capture the intra-modality dynamics. Moreover, we construct a Shifting HuberLoss to robustly introduce the variation of the shifting value into the training process. Extensive experiments on two public datasets, CMU-MOSI and CMU-MOSEI, demonstrate the efficacy of our approach.
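The abstract names a Multimodal Shifting Gate (a gated, modality-conditioned shift applied to a representation) and a Shifting HuberLoss (a robust penalty on the shift magnitude). The paper's exact formulation is not reproduced in this record, but the general pattern can be sketched. Everything below (function names, the concatenation-based gate, the weight matrices `W_g` and `W_s`, and the additive shift form) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shifting_gate(h_t, h_a, h_v, W_g, W_s):
    """Hypothetical gated shift: a per-dimension gate computed from the
    textual (h_t), acoustic (h_a), and visual (h_v) vectors scales a
    candidate shift that is added to the textual representation."""
    fused = np.concatenate([h_t, h_a, h_v])   # inter-modality context
    gate = sigmoid(W_g @ fused)               # each entry in (0, 1)
    shift = gate * (W_s @ fused)              # gated shift vector
    return h_t + shift, shift                 # shifted repr, and the shift itself

def huber(x, delta=1.0):
    """Standard Huber penalty: quadratic near zero, linear in the tails,
    so large shift values are penalized robustly rather than quadratically."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * x**2, delta * (a - 0.5 * delta))

# Toy usage: 4-dim textual vector, 3-dim acoustic/visual vectors.
rng = np.random.default_rng(0)
h_t, h_a, h_v = rng.normal(size=4), rng.normal(size=3), rng.normal(size=3)
W_g, W_s = rng.normal(size=(4, 10)), rng.normal(size=(4, 10))
h_shifted, shift = shifting_gate(h_t, h_a, h_v, W_g, W_s)
loss = huber(shift).mean()   # robust penalty on the shift magnitude
```

A Huber-style penalty is a natural fit here: small shifts (fine-tuning) incur a smooth quadratic cost, while occasional large shifts are penalized only linearly, keeping training stable.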
Pages: 2740-2755
Page count: 16
Related Papers
50 results
  • [1] Multimodal Sentiment Analysis Representations Learning via Contrastive Learning with Condense Attention Fusion
    Wang, Huiru
    Li, Xiuhong
    Ren, Zenyu
    Wang, Min
    Ma, Chunming
    SENSORS, 2023, 23 (05)
  • [2] Hybrid cross-modal interaction learning for multimodal sentiment analysis
    Fu, Yanping
    Zhang, Zhiyuan
    Yang, Ruidi
    Yao, Cuiyou
    NEUROCOMPUTING, 2024, 571
  • [3] Deep Modular Co-Attention Shifting Network for Multimodal Sentiment Analysis
    Shi, Piao
    Hu, Min
    Shi, Xuefeng
    Ren, Fuji
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
  • [4] CFMISA: Cross-Modal Fusion of Modal Invariant and Specific Representations for Multimodal Sentiment Analysis
    Xia, Haiying
    Chen, Jingwen
    Tan, Yumei
    Tang, Xiaohu
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, PRCV 2024, 2025, 15033 : 423 - 437
  • [5] Multimodal GRU with directed pairwise cross-modal attention for sentiment analysis
    Qin, Zhenkai
    Luo, Qining
    Zang, Zhidong
    Fu, Hongpeng
    SCIENTIFIC REPORTS, 15 (1)
  • [6] Multimodal Sentiment Analysis Based on a Cross-Modal Multihead Attention Mechanism
    Deng, Lujuan
    Liu, Boyi
    Li, Zuhe
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 78 (01): : 1157 - 1170
  • [7] Hybrid Contrastive Learning of Tri-Modal Representation for Multimodal Sentiment Analysis
    Mai, Sijie
    Zeng, Ying
    Zheng, Shuangjia
    Hu, Haifeng
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 2276 - 2289
  • [8] The Weighted Cross-Modal Attention Mechanism With Sentiment Prediction Auxiliary Task for Multimodal Sentiment Analysis
    Chen, Qiupu
    Huang, Guimin
    Wang, Yabing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2689 - 2695
  • [10] Attention fusion network for multimodal sentiment analysis
    Luo, Yuanyi
    Wu, Rui
    Liu, Jiafeng
    Tang, Xianglong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (03) : 8207 - 8217