Cross-modality Representation Interactive Learning for Multimodal Sentiment Analysis

Citations: 0
Authors
Huang, Jian [1 ]
Ji, Yanli [2 ]
Yang, Yang [3 ]
Shen, Heng Tao [4 ]
Affiliations
[1] University of Electronic Science and Technology of China, School of Computer Science and Engineering, Chengdu, Sichuan, China
[2] UESTC, Shenzhen Institute for Advanced Study, School of Computer Science and Engineering, Chengdu, Sichuan, China
[3] UESTC Guangdong, School of Computer Science and Engineering, UESTC Institute of Electronic and Information Engineering, Chengdu, Sichuan, China
[4] UESTC, Peng Cheng Laboratory, School of Computer Science and Engineering, Chengdu, Sichuan, China
Keywords
Multimodal Sentiment Analysis; Representation Interactive Learning; Multimodal Fusion
DOI
10.1145/3581783.3612295
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Effective alignment and fusion of multimodal features remain a significant challenge for multimodal sentiment analysis. Across multimodal applications, the text modality offers a significant advantage: representations that are compact yet expressive. In this paper, we propose a Cross-modality Representation Interactive Learning (CRIL) approach, which uses the text modality to guide the other modalities in learning representative feature tokens, enabling effective multimodal fusion for sentiment analysis. We propose a semantic representation interactive learning module that learns concise semantic representation tokens for the audio and video modalities under the guidance of the text modality, ensuring semantic alignment of representations across modalities. Furthermore, we design a semantic relationship interactive learning module, which computes a self-attention matrix for each modality and enforces consistency among these matrices to align the semantic relationships across modalities. Finally, we present a two-stage interactive fusion solution that bridges the modality gap for multimodal fusion and sentiment analysis. Extensive experiments on the CMU-MOSEI, CMU-MOSI, and UR-FUNNY datasets demonstrate the effectiveness of the proposed approach.
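To make the abstract's second module concrete, the sketch below illustrates one plausible reading of the semantic relationship alignment: compute a scaled dot-product self-attention matrix for each modality's token sequence, then penalize the deviation of the audio and video maps from the text map. This is a minimal sketch, not the paper's implementation; the function names, tensor shapes, and the choice of an MSE penalty are assumptions.

```python
# Minimal sketch, not the authors' code: one plausible reading of the
# semantic relationship consistency idea from the abstract. All names,
# tensor shapes, and the MSE penalty are illustrative assumptions.
import torch
import torch.nn.functional as F

def self_attention_matrix(tokens: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product self-attention map.

    tokens: (batch, num_tokens, dim) -> (batch, num_tokens, num_tokens)
    """
    scores = tokens @ tokens.transpose(-2, -1) / tokens.size(-1) ** 0.5
    return F.softmax(scores, dim=-1)

def relationship_consistency_loss(text: torch.Tensor,
                                  audio: torch.Tensor,
                                  video: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between each non-text attention map and the
    text attention map, keeping token-to-token relationships aligned."""
    target = self_attention_matrix(text)
    loss = torch.zeros((), dtype=text.dtype)
    for modality in (audio, video):
        loss = loss + F.mse_loss(self_attention_matrix(modality), target)
    return loss

# Toy usage: 4 semantic tokens per modality in a shared 128-d space.
b, n, d = 2, 4, 128
loss = relationship_consistency_loss(
    torch.randn(b, n, d), torch.randn(b, n, d), torch.randn(b, n, d))
print(loss.item())
```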
Pages: 426-434
Number of pages: 9
Related Papers (50 in total)
  • [1] Representation Learning for Cross-Modality Classification
    van Tulder, Gijs
    de Bruijne, Marleen
    [J]. MEDICAL COMPUTER VISION AND BAYESIAN AND GRAPHICAL MODELS FOR BIOMEDICAL IMAGING, 2017, 10081 : 126 - 136
  • [2] Cross-Modality Sentiment Analysis for Social Multimedia
    Ji, Rongrong
    Cao, Donglin
    Lin, Dazhen
    [J]. 2015 1ST IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2015, : 28 - 31
  • [3] Multi-layer cross-modality attention fusion network for multimodal sentiment analysis
    Yin, Zihao
    Du, Yongping
    Liu, Yang
    Wang, Yuxin
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (21) : 60171 - 60187
  • [4] Representation Learning Through Cross-Modality Supervision
    Sankaran, Nishant
    Mohan, Deen Dayal
    Setlur, Srirangaraj
    Govindaraju, Venugopal
    Fedorishin, Dennis
    [J]. 2019 14TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2019), 2019, : 107 - 114
  • [5] Cross-Modality Microblog Sentiment Prediction via Bi-Layer Multimodal Hypergraph Learning
    Ji, Rongrong
    Chen, Fuhai
    Cao, Liujuan
    Gao, Yue
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (04) : 1062 - 1075
  • [6] Cross-modality reinforcement for unaligned sequences sentiment analysis
    Wang, Fan
    Tian, Shengwei
    Yu, Long
    Long, Jun
    Zhou, Tiejun
    Wang, Bo
    Wang, Junwen
    Wang, Yongtao
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (05) : 6013 - 6025
  • [7] Learning Disentangled Representation for Multimodal Cross-Domain Sentiment Analysis
    Zhang, Yuhao
    Zhang, Ying
    Guo, Wenya
    Cai, Xiangrui
    Yuan, Xiaojie
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) : 7956 - 7966
  • [8] Cross-modality representation learning from transformer for hashtag prediction
    Khalil, Mian Muhammad Yasir
    Wang, Qingxian
    Chen, Bo
    Wang, Weidong
    [J]. JOURNAL OF BIG DATA, 2023, 10 (01)
  • [9] CISum: Learning Cross-modality Interaction to Enhance Multimodal Semantic Coverage for Multimodal Summarization
    Zhang, Litian
    Zhang, Xiaoming
    Guo, Ziming
    Liu, Zhipeng
    [J]. PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 370 - 378