Multimodal Sentiment Analysis With Image-Text Interaction Network

Cited by: 58
Authors
Zhu, Tong [1]
Li, Leida [2]
Yang, Jufeng [3]
Zhao, Sicheng [4]
Liu, Hantao [5]
Qian, Jiansheng [1]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Peoples R China
[2] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
[3] Nankai Univ, Sch Comp Sci & Control Engn, Tianjin 300350, Peoples R China
[4] Columbia Univ, Dept Radiol, New York, NY 10027 USA
[5] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF243AA, Wales
Funding
National Natural Science Foundation of China
Keywords
Image-text interaction; multimodal sentiment analysis; region-word alignment; LANGUAGE; CONTEXT
DOI
10.1109/TMM.2022.3160060
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
More and more users are getting used to posting images and text on social networks to share their emotions or opinions. Accordingly, multimodal sentiment analysis has become a research topic of increasing interest in recent years. Typically, an image contains affective regions that evoke human sentiment, and these regions are usually reflected by corresponding words in people's comments. Similarly, people tend to portray the affective regions of an image when composing image descriptions. As a result, the relationship between affective image regions and the associated text is of great significance for multimodal sentiment analysis. However, most existing multimodal sentiment analysis approaches simply concatenate image and text features, which cannot fully explore the interaction between them and leads to suboptimal results. Motivated by this observation, we propose a new image-text interaction network (ITIN) to investigate the relationship between affective image regions and text for multimodal sentiment analysis. Specifically, we introduce a cross-modal alignment module to capture region-word correspondence, based on which multimodal features are fused through an adaptive cross-modal gating module. Moreover, considering the complementary role of context information in sentiment analysis, we integrate the individual-modal contextual feature representations to achieve more reliable prediction. Extensive experimental results and comparisons on public datasets demonstrate that the proposed model is superior to state-of-the-art methods.
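The abstract names two components, region-word alignment and adaptive gating, without giving their equations. The following is only a minimal sketch of the general idea, not the ITIN implementation: it assumes cosine-similarity attention from each image region over the words, and a scalar sigmoid gate on the fused feature. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def softmax(scores, temperature=1.0):
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align_and_gate(regions, words, temperature=0.1):
    """Assumed formulation: attend from each region over all words,
    then scale the fused feature by a scalar sigmoid gate."""
    fused = []
    for r in regions:
        # Region-word alignment: attention weights from cosine similarity.
        weights = softmax([cosine(r, w) for w in words], temperature)
        # Attended text context for this region (weighted sum of word vectors).
        ctx = [sum(wt * w[i] for wt, w in zip(weights, words))
               for i in range(len(r))]
        # Gate: stronger region-text agreement lets more of the feature through.
        gate = 1.0 / (1.0 + math.exp(-dot(r, ctx)))
        fused.append([gate * (ri + ci) for ri, ci in zip(r, ctx)])
    return fused


# Toy 2-D features: two image regions and three words (made-up values).
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
fused = align_and_gate(regions, words)
```

In this toy setup each region attends mostly to the word pointing in the same direction, so the fused vector keeps that region's dominant dimension. A real model would learn projection matrices and gate parameters rather than use raw dot products.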
Pages: 3375-3385
Page count: 11