Multi-level textual-visual alignment and fusion network for multimodal aspect-based sentiment analysis

Cited by: 6
|
Authors
Li, You [1 ]
Ding, Han [1 ]
Lin, Yuming [1 ]
Feng, Xinyu [1 ]
Chang, Liang [1 ]
Affiliations
[1] Guilin Univ Elect Technol, Guangxi Key Lab Trusted Software, Jinji Rd, Guilin 541004, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal aspect-based sentiment analysis; Textual-visual alignment; Multi-scale fusion; Multi-granularity translation;
DOI
10.1007/s10462-023-10685-z
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal Aspect-Based Sentiment Analysis (MABSA) is an essential task in sentiment analysis that has garnered considerable attention in recent years. Typical approaches in MABSA often utilize cross-modal Transformers to capture interactions between textual and visual modalities. However, bridging the semantic gap between modality spaces and addressing interference from irrelevant visual objects at different scales remain challenging. To tackle these limitations, we present the Multi-level Textual-Visual Alignment and Fusion Network (MTVAF) in this work, which incorporates three auxiliary tasks. Specifically, MTVAF first transforms multi-level image information into image descriptions, facial descriptions, and optical characters. These are then concatenated with the textual input to form a textual+visual input, facilitating comprehensive alignment between visual and textual modalities. Next, both inputs are fed into an integrated text model that incorporates relevant visual representations. Dynamic attention mechanisms are employed to generate visual prompts to control cross-modal fusion. Finally, we align the probability distributions of the textual input space and the textual+visual input space, effectively reducing noise introduced during the alignment process. Experimental results on two MABSA benchmark datasets demonstrate the effectiveness of the proposed MTVAF, showcasing its superior performance compared to state-of-the-art approaches. Our codes are available at https://github.com/MKMaS-GUET/MTVAF.
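The final step the abstract describes, aligning the probability distributions of the textual input space and the textual+visual input space, can be sketched as a distribution-matching auxiliary loss. The sketch below uses a symmetric KL divergence over the two predictive distributions; the function names, the symmetric-KL choice, and the batch-mean reduction are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last (label) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) along the label axis; eps guards against log(0).
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def alignment_loss(text_logits, textvis_logits):
    # Symmetric KL between the predictions made from the textual
    # input alone and from the textual+visual input, averaged over
    # the batch. Minimizing it pulls the two distributions together,
    # which is one common way to damp noise introduced by the
    # image-to-text translation step.
    p = softmax(text_logits)
    q = softmax(textvis_logits)
    return float(np.mean(0.5 * (kl_divergence(p, q) + kl_divergence(q, p))))
```

In training, such a term would typically be added to the main sentiment-classification loss with a small weighting coefficient; identical predictions give a loss of zero, and the loss grows as the two distributions diverge.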
Pages: 26