Scanning, attention, and reasoning multimodal content for sentiment analysis

Cited by: 5
Authors
Liu, Yun [1 ]
Li, Zhoujun [2 ]
Zhou, Ke [1 ]
Zhang, Leilei [1 ]
Li, Lang [1 ]
Tian, Peng [1 ]
Shen, Shixun [1 ]
Affiliations
[1] Moutai Inst, Dept Automat, Renhuai 564507, Guizhou Province, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal sentiment analysis; Attention; Reasoning; FUSION;
D O I
10.1016/j.knosys.2023.110467
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rise of social networks has provided people with platforms to display their lives and emotions, often in multimodal forms such as images and descriptive texts. Capturing the emotions embedded in the multimodal content of social networks poses significant research challenges and has practical value. Existing methods usually make sentiment predictions based on a single-round reasoning process with multimodal attention networks; however, this may be insufficient for tasks that require deep understanding and complex reasoning. To effectively comprehend multimodal content and predict the correct sentiment tendencies, we propose the Scanning, Attention, and Reasoning (SAR) model for multimodal sentiment analysis. Specifically, a perceptual scanning model is designed to roughly perceive the image and text content, as well as the intrinsic correlation between them. To deeply understand the complementary features between images and texts, an intensive attention model is proposed for cross-modal feature association learning. The multimodal joint features from the scanning and attention models are fused together as the representation of a multimodal node in the social network. A heterogeneous reasoning model implemented with a graph neural network is constructed to capture the influence of network communication in social networks and make sentiment predictions. Extensive experiments conducted on three benchmark datasets confirm the effectiveness and superiority of our model compared with state-of-the-art methods. (c) 2023 Elsevier B.V. All rights reserved.
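The cross-modal attention and fusion steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function names (`cross_modal_attention`, `fuse`), feature shapes, and the concatenate-then-pool fusion are illustrative assumptions standing in for the SAR model's attention and fusion components.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Text tokens (queries) attend over image region features (keys/values);
    returns one attended image context vector per text token."""
    d = text_feats.shape[1]
    scores = text_feats @ image_feats.T / np.sqrt(d)  # (n_tokens, n_regions)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    return weights @ image_feats                      # (n_tokens, d)

def fuse(text_feats, image_feats):
    """Joint multimodal representation: concatenate each text token with its
    attended image context, then mean-pool into a single node vector."""
    ctx = cross_modal_attention(text_feats, image_feats)
    return np.concatenate([text_feats, ctx], axis=1).mean(axis=0)

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))   # 5 text tokens, feature dim 16
image = rng.standard_normal((7, 16))  # 7 image regions, feature dim 16
joint = fuse(text, image)
print(joint.shape)  # (32,)
```

In the paper, the resulting joint vector would serve as the multimodal node representation fed into the heterogeneous graph reasoning model; here it is simply a pooled concatenation under the stated assumptions.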
Pages: 11
Related papers
50 records in total
  • [41] Dual Attention Networks for Multimodal Reasoning and Matching
    Nam, Hyeonseob
    Ha, Jung-Woo
    Kim, Jeonghee
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2156 - 2164
  • [42] Multi-attention Fusion for Multimodal Sentiment Classification
    Li, Guangmin
    Zeng, Xin
    Chen, Chi
    Zhou, Long
    PROCEEDINGS OF 2024 ACM ICMR WORKSHOP ON MULTIMODAL VIDEO RETRIEVAL, ICMR-MVR 2024, 2024, : 1 - 7
  • [43] Gated attention fusion network for multimodal sentiment classification
    Du, Yongping
    Liu, Yang
    Peng, Zhi
    Jin, Xingnan
    KNOWLEDGE-BASED SYSTEMS, 2022, 240
  • [44] A survey of multimodal sentiment analysis
    Soleymani, Mohammad
    Garcia, David
    Jou, Brendan
    Schuller, Bjoern
    Chang, Shih-Fu
    Pantic, Maja
    IMAGE AND VISION COMPUTING, 2017, 65 : 3 - 14
  • [45] A Survey on Multimodal Sentiment Analysis
    Zhang Y.
    Rong L.
    Song D.
    Zhang P.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2020, 33 (05): : 426 - 438
  • [46] Benchmarking Multimodal Sentiment Analysis
    Cambria, Erik
    Hazarika, Devamanyu
    Poria, Soujanya
    Hussain, Amir
    Subramanyam, R. B. V.
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 166 - 179
  • [47] Multimodal sentiment analysis: A survey
    Lai, Songning
    Hu, Xifeng
    Xu, Haoxuan
    Ren, Zhaoxia
    Liu, Zhi
    DISPLAYS, 2023, 80
  • [48] SENTIMENT ANALYSIS AND MULTIMODAL APPROACH APPLIED TO SOCIAL MEDIA CONTENT IN HOSPITALITY INDUSTRY
    Musanovic, Jelena
    Folgieri, Raffaella
    Gregoric, Maj A.
    6TH INTERNATIONAL SCIENTIFIC CONFERENCE TOSEE - TOURISM IN SOUTHERN AND EASTERN EUROPE 2021: TOSEE - SMART, EXPERIENCE, EXCELLENCE & TOFEEL - FEELINGS, EXCITEMENT, EDUCATION, LEISURE, 2021, 6 : 533 - 544
  • [49] Multimodal Sentiment Analysis: Sentiment Analysis Using Audiovisual Format
    Yadav, Sumit K.
    Bhushan, Mayank
    Gupta, Swati
    2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 1415 - 1419
  • [50] Fusing audio, visual and textual clues for sentiment analysis from multimodal content
    Poria, Soujanya
    Cambria, Erik
    Howard, Newton
    Huang, Guang-Bin
    Hussain, Amir
    NEUROCOMPUTING, 2016, 174 : 50 - 59