Dynamically Shifting Multimodal Representations via Hybrid-Modal Attention for Multimodal Sentiment Analysis

Cited by: 3
Authors
Lin, Ronghao [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Acoustics; Visualization; Feature extraction; Task analysis; Logic gates; Sentiment analysis; Multi-stage fusion framework; intra- and inter-modality dynamics; multimodal representations shifting; hybrid-modal attention; PREDICTION; LANGUAGE; SPEECH; FUSION;
DOI
10.1109/TMM.2023.3303711
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In the field of multimodal machine learning, multimodal sentiment analysis has been an active area of research. The predominant approaches focus on learning efficient multimodal representations that contain both intra- and inter-modality information. However, the heterogeneous nature of different modalities poses great challenges to multimodal representation learning. In this article, we propose a multi-stage fusion framework that dynamically fine-tunes multimodal representations via a hybrid-modal attention mechanism. Previous methods mostly fine-tune only the textual representation, owing to the success of large-corpus pre-trained models, and neglect the inconsistency among different modality spaces. We therefore design a module called the Multimodal Shifting Gate (MSG), which fine-tunes all three modalities by modeling inter-modality dynamics and shifting their representations. We also apply a module named Masked Bimodal Adjustment (MBA) to the textual modality to mitigate the inconsistency of parameter spaces and reduce the modality gap. In addition, we utilize syntactic-level and semantic-level textual features output by different layers of the Transformer model to fully capture intra-modality dynamics. Moreover, we construct a Shifting HuberLoss to robustly incorporate the variation of the shifting value into the training process. Extensive experiments on public datasets, including CMU-MOSI and CMU-MOSEI, demonstrate the efficacy of our approach.
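The record only sketches the MSG idea at abstract level, so the following is a minimal, hypothetical reading of "shifting representations": each modality's vector is displaced by a bounded shift computed from the other two modalities through learned gates, in the spirit of gated adaptation modules for pre-trained Transformers. All names here (ShiftingGate, beta, proj_a, ...) are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of the representation-shifting idea, assuming a gated
# displacement of one modality by the other two; names are illustrative.
import torch
import torch.nn as nn

class ShiftingGate(nn.Module):
    """Shift one modality's representation using the other two modalities."""

    def __init__(self, dim_main: int, dim_a: int, dim_b: int, beta: float = 0.5):
        super().__init__()
        self.gate_a = nn.Linear(dim_main + dim_a, dim_main)  # gate conditioned on modality A
        self.gate_b = nn.Linear(dim_main + dim_b, dim_main)  # gate conditioned on modality B
        self.proj_a = nn.Linear(dim_a, dim_main)              # project A into the main space
        self.proj_b = nn.Linear(dim_b, dim_main)
        self.beta = beta                                      # bounds the relative shift size

    def forward(self, h_main, h_a, h_b):
        # Element-wise gates decide how strongly each auxiliary modality
        # may displace the main representation.
        g_a = torch.sigmoid(self.gate_a(torch.cat([h_main, h_a], dim=-1)))
        g_b = torch.sigmoid(self.gate_b(torch.cat([h_main, h_b], dim=-1)))
        shift = g_a * self.proj_a(h_a) + g_b * self.proj_b(h_b)
        # Scale the shift so its norm never exceeds beta times the norm of h_main.
        scale = self.beta * h_main.norm(dim=-1, keepdim=True) / (
            shift.norm(dim=-1, keepdim=True) + 1e-6
        )
        return h_main + torch.clamp(scale, max=1.0) * shift
```

Applied symmetrically, three such gates (one per modality) would realize the abstract's claim of shifting all three representations, e.g. `ShiftingGate(768, 74, 35)` with feature sizes typical of BERT text, COVAREP acoustic, and FACET visual features on CMU-MOSEI.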
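The Shifting HuberLoss is likewise only named, not defined, in the abstract. One plausible reading, sketched below under that assumption, is the usual sentiment-regression loss plus a Huber-type penalty on the magnitude of the shift, so that occasional extreme shifts influence training only linearly. The L1 task term and the `lam`/`delta` hyperparameters are illustrative choices, not the paper's.

```python
import torch
import torch.nn.functional as F

def shifting_huber_loss(pred, target, shift, delta: float = 1.0, lam: float = 0.1):
    """Task regression loss plus a robust penalty on the shift magnitude.

    pred, target: sentiment scores, shape (batch,)
    shift: displacement added to a modality representation, shape (batch, dim)
    """
    task_loss = F.l1_loss(pred, target)   # L1 regression is common on CMU-MOSI/MOSEI
    shift_norm = shift.norm(dim=-1)       # size of each sample's shift
    # Huber: quadratic for small norms, linear for large ones, so a few
    # extreme shifts cannot dominate the gradient.
    reg = F.huber_loss(shift_norm, torch.zeros_like(shift_norm), delta=delta)
    return task_loss + lam * reg
```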
Pages: 2740-2755 (16 pages)