Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors

Cited: 0
Authors
Wu, Yang [1 ]
Zhao, Yanyan [1 ]
Yang, Hao [1 ]
Chen, Song [1 ]
Qin, Bing [1 ]
Cao, Xiaohuan [2 ]
Zhao, Wenting [2 ]
Affiliations
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] China Merchants Bank, AI Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
REPRESENTATIONS;
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal sentiment analysis has attracted increasing attention and many models have been proposed. However, the performance of state-of-the-art models decreases sharply when they are deployed in the real world. We find that the main reason is that real-world applications can only access text produced by automatic speech recognition (ASR) models, which may contain errors due to limited model capacity. Through further analysis of the ASR outputs, we find that sentiment words, the key sentiment elements of the textual modality, are sometimes misrecognized as other words, which changes the sentiment of the text and directly hurts the performance of multimodal sentiment analysis models. To address this problem, we propose the sentiment word aware multimodal refinement model (SWRM), which can dynamically refine erroneous sentiment words by leveraging multimodal sentiment clues. Specifically, we first use a sentiment word position detection module to obtain the most probable position of the sentiment word in the text, and then utilize a multimodal sentiment word refinement module to dynamically refine the sentiment word embeddings. The refined embeddings are taken as the textual inputs of the multimodal feature fusion module to predict the sentiment labels. We conduct extensive experiments on real-world datasets, including MOSI-Speechbrain, MOSI-IBM, and MOSI-iFlytek, and the results demonstrate the effectiveness of our model, which surpasses the current state-of-the-art models on all three datasets. Furthermore, our approach can be easily adapted to other multimodal feature fusion models.
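The abstract describes a three-step pipeline: locate the most probable sentiment-word position, refine that word's embedding with nonverbal (acoustic and visual) cues, and fuse the refined text with the other modalities. A minimal sketch of that flow is below; it is an illustrative assumption, not the paper's implementation, and all function names, the scalar sigmoid gate, and the averaging fusion are hypothetical stand-ins for the actual SWRM modules.

```python
import numpy as np

def swrm_sketch(word_embs, word_scores, audio_feat, visual_feat):
    """Hypothetical sketch of the SWRM-style pipeline from the abstract.

    word_embs:   (T, d) token embeddings of the (possibly erroneous) ASR transcript
    word_scores: (T,)  score per position for holding the sentiment word
    audio_feat, visual_feat: (d,) utterance-level nonverbal features
    """
    # 1) Sentiment word position detection: pick the most probable position.
    pos = int(np.argmax(word_scores))

    # 2) Multimodal refinement: gate the possibly misrecognized embedding
    #    toward a nonverbal sentiment cue (toy gating, not the paper's module).
    multimodal = (audio_feat + visual_feat) / 2.0
    gate = 1.0 / (1.0 + np.exp(-float(word_embs[pos] @ multimodal)))  # scalar sigmoid
    refined = word_embs.copy()
    refined[pos] = gate * multimodal + (1.0 - gate) * word_embs[pos]

    # 3) Feature fusion: simple concatenation of pooled text with the other
    #    modalities, standing in for the real fusion module.
    fused = np.concatenate([refined.mean(axis=0), audio_feat, visual_feat])
    return pos, refined, fused
```

A fusion model would then map `fused` to a sentiment label; the sketch only shows how the refined embedding replaces the suspect token before fusion.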
Pages: 1397 - 1406
Page count: 10
Related papers
50 in total
  • [1] Sentiment-aware multimodal pre-training for multimodal sentiment analysis
    Ye, Junjie
    Zhou, Jie
    Tian, Junfeng
    Wang, Rui
    Zhou, Jingyi
    Gui, Tao
    Zhang, Qi
    Huang, Xuanjing
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [2] A survey of multimodal sentiment analysis
    Soleymani, Mohammad
    Garcia, David
    Jou, Brendan
    Schuller, Bjoern
    Chang, Shih-Fu
    Pantic, Maja
    [J]. IMAGE AND VISION COMPUTING, 2017, 65 : 3 - 14
  • [3] A Survey on Multimodal Sentiment Analysis
    Zhang Y.
    Rong L.
    Song D.
    Zhang P.
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2020, 33 (05): : 426 - 438
  • [4] Word-wise Sparse Attention for Multimodal Sentiment Analysis
    Qian, Fan
    Song, Hongwei
    Han, Jiqing
    [J]. INTERSPEECH 2022, 2022, : 1973 - 1977
  • [5] Benchmarking Multimodal Sentiment Analysis
    Cambria, Erik
    Hazarika, Devamanyu
    Poria, Soujanya
    Hussain, Amir
    Subramanyam, R. B. V.
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 166 - 179
  • [6] Multimodal sentiment analysis: A survey
    Lai, Songning
    Hu, Xifeng
    Xu, Haoxuan
    Ren, Zhaoxia
    Liu, Zhi
    [J]. DISPLAYS, 2023, 80
  • [7] Multimodal Event-Aware Network for Sentiment Analysis in Tourism
    Wang, Lijuan
    Guo, Wenya
    Yao, Xingxu
    Zhang, Yuxiang
    Yang, Jufeng
    [J]. IEEE MULTIMEDIA, 2021, 28 (02) : 49 - 58
  • [8] Multimodal Sentiment Analysis: Sentiment Analysis Using Audiovisual Format
    Yadav, Sumit K.
    Bhushan, Mayank
    Gupta, Swati
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 1415 - 1419
  • [9] A novel context-aware multimodal framework for persian sentiment analysis
    Dashtipour, Kia
    Gogate, Mandar
    Cambria, Erik
    Hussain, Amir
    [J]. NEUROCOMPUTING, 2021, 457 : 377 - 388
  • [10] Trustworthy Multimodal Fusion for Sentiment Analysis in Ordinal Sentiment Space
    Xie, Zhuyang
    Yang, Yan
    Wang, Jie
    Liu, Xiaorong
    Li, Xiaofan
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7657 - 7670