Mixture of Attention Variants for Modal Fusion in Multi-Modal Sentiment Analysis

Cited by: 0
Authors
He, Chao [1,2]
Zhang, Xinghua [3]
Song, Dongqing [1]
Shen, Yingshan [2]
Mao, Chengjie [1]
Wen, Huosheng [4]
Zhu, Dingju [4]
Cai, Lihua [2,4]
Affiliations
[1] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Peoples R China
[2] South China Normal Univ, Aberdeen Inst Data Sci & Artificial Intelligence, Guangzhou 528225, Peoples R China
[3] South China Normal Univ, Int United Coll, Guangzhou 528225, Peoples R China
[4] South China Normal Univ, Sch Software, Guangzhou 528225, Peoples R China
Keywords
multi-modality; attention mechanism; sentiment analysis; feature fusion; deep learning; VISUAL SENTIMENT; SEMANTICS;
DOI
10.3390/bdcc8020014
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
With better network access and the widespread adoption of personal smartphones, the explosion of multi-modal data, particularly opinionated video messages, has created urgent demands and immense opportunities for Multi-Modal Sentiment Analysis (MSA). Deep learning with the attention mechanism has served as the foundational technique for most state-of-the-art MSA models, owing to its ability to learn complex inter- and intra-relationships among the different modalities embedded in video messages, both temporally and spatially. However, modal fusion remains a major challenge due to the vast feature space created by the interactions among different data modalities. To address this challenge, we propose an MSA algorithm based on deep learning and the attention mechanism, namely the Mixture of Attention Variants for Modal Fusion (MAVMF). MAVMF is a two-stage process: in stage one, self-attention is applied to extract image and text features, and the dependency relationships in the context of the video discourse are captured by a bidirectional gated recurrent module; in stage two, four multi-modal attention variants are leveraged to learn the emotional contributions of important features from different modalities. Our proposed approach is end-to-end and achieves superior performance to state-of-the-art algorithms when tested on the two largest public datasets, CMU-MOSI and CMU-MOSEI.
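To make the two-stage design concrete, below is a minimal PyTorch sketch of a MAVMF-style pipeline as described in the abstract. The module names, feature dimensions, pooling, and the single cross-modal attention variant shown are illustrative assumptions; the paper's actual architecture combines four attention variants and may differ in detail.

import torch
import torch.nn as nn

class MAVMFSketch(nn.Module):
    """Illustrative two-stage pipeline: self-attention + BiGRU feature
    extraction (stage one), then cross-modal attention fusion (stage two).
    All dimensions and module choices are assumptions, not the paper's spec."""

    def __init__(self, d_text=300, d_image=512, d_model=128, n_classes=2):
        super().__init__()
        # Stage one: project each modality, refine with self-attention,
        # then model discourse-level context with a bidirectional GRU.
        self.text_proj = nn.Linear(d_text, d_model)
        self.image_proj = nn.Linear(d_image, d_model)
        self.text_self_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.image_self_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.text_gru = nn.GRU(d_model, d_model // 2, bidirectional=True, batch_first=True)
        self.image_gru = nn.GRU(d_model, d_model // 2, bidirectional=True, batch_first=True)
        # Stage two: one example cross-modal attention variant (text queries
        # attend over image keys/values); the paper leverages four variants.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, text, image):
        # text:  (batch, utterances, d_text)  pre-extracted text features
        # image: (batch, utterances, d_image) pre-extracted visual features
        t = self.text_proj(text)
        v = self.image_proj(image)
        t, _ = self.text_self_attn(t, t, t)   # intra-modal dependencies
        v, _ = self.image_self_attn(v, v, v)
        t, _ = self.text_gru(t)               # context across the video discourse
        v, _ = self.image_gru(v)
        fused, _ = self.cross_attn(t, v, v)   # inter-modal fusion
        pooled = torch.cat([t.mean(dim=1), fused.mean(dim=1)], dim=-1)
        return self.classifier(pooled)        # sentiment logits

if __name__ == "__main__":
    model = MAVMFSketch()
    logits = model(torch.randn(8, 20, 300), torch.randn(8, 20, 512))
    print(logits.shape)  # torch.Size([8, 2])

The sketch keeps the key design choice described in the abstract: intra-modal structure is learned per modality before fusion, and fusion is expressed as attention in which one modality supplies the queries and another the keys and values.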
Pages: 19
Related papers
50 records in total
  • [1] Multi-modal fusion attention sentiment analysis for mixed sentiment classification
    Xue, Zhuanglin
    Xu, Jiabin
    [J]. COGNITIVE COMPUTATION AND SYSTEMS, 2024,
  • [2] Contextual Inter-modal Attention for Multi-modal Sentiment Analysis
    Ghosal, Deepanway
    Akhtar, Md Shad
    Chauhan, Dushyant
    Poria, Soujanya
    Ekbal, Asif
    Bhattacharyya, Pushpak
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3454 - 3466
  • [3] Multi-Modal Sentiment Analysis Based on Interactive Attention Mechanism
    Wu, Jun
    Zhu, Tianliang
    Zheng, Xinli
    Wang, Chunzhi
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (16):
  • [4] Improved Sentiment Classification by Multi-modal Fusion
    Gan, Lige
    Benlamri, Rachid
    Khoury, Richard
    [J]. 2017 THIRD IEEE INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2017), 2017, : 11 - 16
  • [5] Sequential Late Fusion Technique for Multi-modal Sentiment Analysis
    Banerjee, Debapriya
    Lygerakis, Fotios
    Makedon, Fillia
    [J]. THE 14TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2021, 2021, : 264 - 265
  • [6] Non-Uniform Attention Network for Multi-modal Sentiment Analysis
    Wang, Binqiang
    Dong, Gang
    Zhao, Yaqian
    Li, Rengang
    Cao, Qichun
    Chao, Yinyin
    [J]. MULTIMEDIA MODELING (MMM 2022), PT I, 2022, 13141 : 612 - 623
  • [7] Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism
    Li, Hongchan
    Lu, Yantong
    Zhu, Haodong
    [J]. ELECTRONICS, 2024, 13 (11)
  • [8] BLR: A Multi-modal Sentiment Analysis Model
    Yang, Yang
    Ye, Zhonglin
    Zhao, Haixing
    Li, Gege
    Cao, Shujuan
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PART X, 2023, 14263 : 466 - 478
  • [9] Multi-modal Fusion
    Liu, Huaping
    Hussain, Amir
    Wang, Shuliang
    [J]. INFORMATION SCIENCES, 2018, 432 : 462 - 462