Multi-Modal Sentiment Recognition of Online Users Based on Text-Image-Audio Fusion

Cited by: 0
Authors
Li, Hui [1 ]
Pang, Jingwei [1 ]
Affiliations
[1] School of Economics & Management, Xidian University, Xi'an 710126, China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Economic and social effects; Emotion recognition; Image analysis; Video analysis
DOI
10.11925/infotech.2096-3467.2023.0744
Abstract
[Objective] To effectively utilize information contained in audio and video and to fully capture the multi-modal interactions among text, image, and audio, this study proposes TIsA, a multi-modal sentiment analysis model for online users that integrates text, image, and STFT-CNN audio feature extraction. [Methods] First, we separated the video data into audio and image streams. Then, we used BERT and BiLSTM to obtain text feature representations and applied the short-time Fourier transform (STFT) to convert the audio from time-domain signals to the frequency domain. We also used CNNs to extract audio and image features. Finally, we fused the features from the three modalities. [Results] We conducted empirical research using the 9.5 Luding Earthquake public sentiment data from Sina Weibo. The proposed TIsA model achieved an accuracy, macro-averaged recall, and macro-averaged F1 score of 96.10%, 96.20%, and 96.10%, respectively, outperforming related baseline models. [Limitations] The effects of different fusion strategies on sentiment recognition results deserve deeper exploration. [Conclusions] The proposed TIsA model demonstrates high accuracy in processing audio-containing videos, effectively supporting online public opinion analysis. © 2024 Chinese Academy of Sciences. All rights reserved.
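The audio branch described in the abstract (STFT of the time-domain signal, yielding a frequency-domain spectrogram that a CNN then consumes) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, sampling rate, and window choice are assumptions for the example.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Convert a 1-D time-domain signal into a magnitude spectrogram
    via the short-time Fourier transform (STFT). The resulting 2-D
    array (frames x frequency bins) is the kind of image-like input
    a CNN feature extractor can process."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft returns the one-sided spectrum: frame_len // 2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Toy input: one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)
spec = stft_magnitude(audio)
print(spec.shape)  # (time frames, frequency bins)
```

The spectral energy of the tone concentrates near bin 440 * frame_len / sr ≈ 14, which is what makes the spectrogram a useful 2-D representation for downstream convolutional feature extraction.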
Pages: 11-21
Related Papers
50 records
  • [1] Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism
    Li, Hongchan
    Lu, Yantong
    Zhu, Haodong
    ELECTRONICS, 2024, 13 (11)
  • [2] Multi-modal emotion recognition in conversation based on prompt learning with text-audio fusion features
    Wu, Yuezhou
    Zhang, Siling
    Li, Pengfei
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [3] Image and Encoded Text Fusion for Multi-Modal Classification
    Gallo, I.
    Calefati, A.
    Nawaz, S.
    Janjua, M. K.
    2018 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2018, : 203 - 209
  • [4] Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition
    Wilkes, Ben
    Vatolkin, Igor
    Mueller, Heinrich
    ENTROPY, 2021, 23 (11)
  • [5] Visual audio and textual triplet fusion network for multi-modal sentiment analysis
    Lv, Cai-Chao
    Zhang, Xuan
    Zhang, Hong-Bo
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (12) : 9505 - 9513
  • [6] A Multi-Modal ELMo Model for Image Sentiment Recognition of Consumer Data
    Rong, Lu
    Ding, Yijie
    Wang, Mengyao
    El Saddik, Abdulmotaleb
    Hossain, M. Shamim
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) : 3697 - 3708
  • [7] Improved Sentiment Classification by Multi-modal Fusion
    Gan, Lige
    Benlamri, Rachid
    Khoury, Richard
    2017 THIRD IEEE INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2017), 2017, : 11 - 16
  • [8] The sentiment recognition of online users based on DNNs multimodal fusion
    Fan T.
    Wu P.
    Zhu P.
    Proceedings of the Association for Information Science and Technology, 2019, 56 (01) : 89 - 97
  • [9] Human activity recognition based on multi-modal fusion
    Zhang, Cheng
    Zu, Tianqi
    Hou, Yibin
    He, Jian
    Yang, Shengqi
    Dong, Ruihai
    CCF TRANSACTIONS ON PERVASIVE COMPUTING AND INTERACTION, 2023, 5 (03) : 321 - 332