Multi-level textual-visual alignment and fusion network for multimodal aspect-based sentiment analysis

Cited by: 6
|
Authors
Li, You [1 ]
Ding, Han [1 ]
Lin, Yuming [1 ]
Feng, Xinyu [1 ]
Chang, Liang [1 ]
Affiliations
[1] Guilin Univ Elect Technol, Guangxi Key Lab Trusted Software, Jinji Rd, Guilin 541004, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal aspect-based sentiment analysis; Textual-visual alignment; Multi-scale fusion; Multi-granularity translation;
DOI
10.1007/s10462-023-10685-z
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal Aspect-Based Sentiment Analysis (MABSA) is an essential task in sentiment analysis that has garnered considerable attention in recent years. Typical approaches in MABSA often utilize cross-modal Transformers to capture interactions between textual and visual modalities. However, bridging the semantic gap between modality spaces and addressing interference from irrelevant visual objects at different scales remain challenging. To tackle these limitations, we present the Multi-level Textual-Visual Alignment and Fusion Network (MTVAF) in this work, which incorporates three auxiliary tasks. Specifically, MTVAF first transforms multi-level image information into image descriptions, facial descriptions, and optical characters. These are then concatenated with the textual input to form a textual+visual input, facilitating comprehensive alignment between visual and textual modalities. Next, both inputs are fed into an integrated text model that incorporates relevant visual representations. Dynamic attention mechanisms are employed to generate visual prompts to control cross-modal fusion. Finally, we align the probability distributions of the textual input space and the textual+visual input space, effectively reducing noise introduced during the alignment process. Experimental results on two MABSA benchmark datasets demonstrate the effectiveness of the proposed MTVAF, showcasing its superior performance compared to state-of-the-art approaches. Our codes are available at https://github.com/MKMaS-GUET/MTVAF.
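The final step the abstract describes, aligning the probability distributions of the textual input space and the textual+visual input space, can be sketched as a distribution-matching auxiliary loss. The sketch below uses a symmetric KL divergence over the two predictive distributions; the function names, the symmetric-KL choice, and the batch-mean reduction are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last (label) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) along the label axis; eps guards against log(0).
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def alignment_loss(text_logits, textvis_logits):
    # Symmetric KL between the predictions made from the textual
    # input alone and from the textual+visual input, averaged over
    # the batch. Minimizing it pulls the two distributions together,
    # which is one common way to damp noise introduced by the
    # image-to-text translation step.
    p = softmax(text_logits)
    q = softmax(textvis_logits)
    return float(np.mean(0.5 * (kl_divergence(p, q) + kl_divergence(q, p))))
```

In training, such a term would typically be added to the main sentiment-classification loss with a small weighting coefficient; identical predictions give a loss of zero, and the loss grows as the two distributions diverge.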
Pages: 26