Mixture of Attention Variants for Modal Fusion in Multi-Modal Sentiment Analysis

Cited: 0
Authors
He, Chao [1 ,2 ]
Zhang, Xinghua [3 ]
Song, Dongqing [1 ]
Shen, Yingshan [2 ]
Mao, Chengjie [1 ]
Wen, Huosheng [4 ]
Zhu, Dingju [4 ]
Cai, Lihua [2 ,4 ]
Affiliations
[1] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Peoples R China
[2] South China Normal Univ, Aberdeen Inst Data Sci & Artificial Intelligence, Guangzhou 528225, Peoples R China
[3] South China Normal Univ, Int United Coll, Guangzhou 528225, Peoples R China
[4] South China Normal Univ, Sch Software, Guangzhou 528225, Peoples R China
Keywords
multi-modality; attention mechanism; sentiment analysis; feature fusion; deep learning; visual sentiment; semantics
DOI
10.3390/bdcc8020014
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the spread of better network access and the penetration of personal smartphones, the explosion of multi-modal data, particularly opinionated video messages, has created urgent demands and immense opportunities for Multi-Modal Sentiment Analysis (MSA). Deep learning with attention mechanisms has served as the foundational technique for most state-of-the-art MSA models, owing to its ability to learn the complex inter- and intra-modal relationships embedded in video messages, both temporally and spatially. However, modal fusion remains a major challenge because of the vast feature space created by interactions among different data modalities. To address this challenge, we propose an MSA algorithm based on deep learning and attention mechanisms, namely the Mixture of Attention Variants for Modal Fusion (MAVMF). The MAVMF algorithm is a two-stage process: in stage one, self-attention extracts image and text features, and a bidirectional gated recurrent module captures the dependency relationships in the context of the video discourse; in stage two, four multi-modal attention variants learn the emotional contributions of salient features from the different modalities. Our proposed approach is end-to-end and achieves performance superior to state-of-the-art algorithms on the two largest public datasets, CMU-MOSI and CMU-MOSEI.
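To make the two-stage pipeline described above concrete, below is a minimal PyTorch sketch. The class name MAVMFSketch, all layer sizes, and the choice of a single cross-modal attention block as the stage-two fusion step are illustrative assumptions, not the authors' published implementation, which mixes four attention variants and may also use audio features.

import torch
import torch.nn as nn

# Hypothetical sketch, not the paper's code. Dimensions assume pre-extracted
# per-utterance text (e.g., 768-d) and image (e.g., 512-d) feature sequences.
class MAVMFSketch(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden_dim=128, num_heads=4):
        super().__init__()
        # Stage 1a: per-modality self-attention for feature extraction.
        self.text_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(image_dim, num_heads, batch_first=True)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Stage 1b: bidirectional GRUs capture contextual dependencies
        # across utterances in the video discourse.
        self.text_gru = nn.GRU(hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.image_gru = nn.GRU(hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Stage 2: one cross-modal attention variant (text queries attend to
        # image keys/values); the paper combines four such variants.
        self.cross_attn = nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, 1)  # sentiment score

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, seq, text_dim); image_feats: (batch, seq, image_dim)
        t, _ = self.text_attn(text_feats, text_feats, text_feats)
        v, _ = self.image_attn(image_feats, image_feats, image_feats)
        t, _ = self.text_gru(self.text_proj(t))    # -> (batch, seq, 2*hidden_dim)
        v, _ = self.image_gru(self.image_proj(v))
        fused, _ = self.cross_attn(t, v, v)        # cross-modal fusion
        return self.classifier(fused.mean(dim=1))  # utterance-pooled sentiment

# Shape check with random tensors standing in for pre-extracted features:
model = MAVMFSketch()
scores = model(torch.randn(2, 10, 768), torch.randn(2, 10, 512))
print(scores.shape)  # torch.Size([2, 1])

A faithful reproduction would compute all four attention-variant outputs and merge them before classification; the single cross-attention block here only illustrates how one variant slots into the two-stage design.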
Pages: 19