Mixture of Attention Variants for Modal Fusion in Multi-Modal Sentiment Analysis

Cited: 0
Authors
He, Chao [1 ,2 ]
Zhang, Xinghua [3 ]
Song, Dongqing [1 ]
Shen, Yingshan [2 ]
Mao, Chengjie [1 ]
Wen, Huosheng [4 ]
Zhu, Dingju [4 ]
Cai, Lihua [2 ,4 ]
Affiliations
[1] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Peoples R China
[2] South China Normal Univ, Aberdeen Inst Data Sci & Artificial Intelligence, Guangzhou 528225, Peoples R China
[3] South China Normal Univ, Int United Coll, Guangzhou 528225, Peoples R China
[4] South China Normal Univ, Sch Software, Guangzhou 528225, Peoples R China
Keywords
multi-modality; attention mechanism; sentiment analysis; feature fusion; deep learning; visual sentiment; semantics
DOI
10.3390/bdcc8020014
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the spread of better network access and the penetration of personal smartphones, the explosion of multi-modal data, particularly opinionated video messages, has created urgent demands and immense opportunities for Multi-Modal Sentiment Analysis (MSA). Deep learning with attention mechanisms has served as the foundational technique for most state-of-the-art MSA models, owing to its ability to learn the complex inter- and intra-modal relationships embedded in video messages, both temporally and spatially. However, modal fusion remains a major challenge because of the vast feature space created by interactions among different data modalities. To address this challenge, we propose an MSA algorithm based on deep learning and attention mechanisms, namely the Mixture of Attention Variants for Modal Fusion (MAVMF). The MAVMF algorithm is a two-stage process: in stage one, self-attention extracts image and text features, and a bidirectional gated recurrent module captures the dependency relationships in the context of the video discourse; in stage two, four multi-modal attention variants learn the emotional contributions of salient features from the different modalities. Our proposed approach is end-to-end and achieves performance superior to state-of-the-art algorithms on the two largest public datasets, CMU-MOSI and CMU-MOSEI.
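To make the two-stage pipeline described above concrete, below is a minimal PyTorch sketch. The class name MAVMFSketch, all layer sizes, and the choice of a single cross-modal attention block as the stage-two fusion step are illustrative assumptions, not the authors' published implementation, which mixes four attention variants and may also use audio features.

import torch
import torch.nn as nn

# Hypothetical sketch, not the paper's code. Dimensions assume pre-extracted
# per-utterance text (e.g., 768-d) and image (e.g., 512-d) feature sequences.
class MAVMFSketch(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden_dim=128, num_heads=4):
        super().__init__()
        # Stage 1a: per-modality self-attention for feature extraction.
        self.text_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(image_dim, num_heads, batch_first=True)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Stage 1b: bidirectional GRUs capture contextual dependencies
        # across utterances in the video discourse.
        self.text_gru = nn.GRU(hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.image_gru = nn.GRU(hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Stage 2: one cross-modal attention variant (text queries attend to
        # image keys/values); the paper combines four such variants.
        self.cross_attn = nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, 1)  # sentiment score

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, seq, text_dim); image_feats: (batch, seq, image_dim)
        t, _ = self.text_attn(text_feats, text_feats, text_feats)
        v, _ = self.image_attn(image_feats, image_feats, image_feats)
        t, _ = self.text_gru(self.text_proj(t))    # -> (batch, seq, 2*hidden_dim)
        v, _ = self.image_gru(self.image_proj(v))
        fused, _ = self.cross_attn(t, v, v)        # cross-modal fusion
        return self.classifier(fused.mean(dim=1))  # utterance-pooled sentiment

# Shape check with random tensors standing in for pre-extracted features:
model = MAVMFSketch()
scores = model(torch.randn(2, 10, 768), torch.randn(2, 10, 512))
print(scores.shape)  # torch.Size([2, 1])

A faithful reproduction would compute all four attention-variant outputs and merge them before classification; the single cross-attention block here only illustrates how one variant slots into the two-stage design.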
Pages: 19