Multi-Modal Sentiment Recognition of Online Users Based on Text-Image-Audio Fusion

Cited by: 0
Authors
Li, Hui [1 ]
Pang, Jingwei [1 ]
Affiliations
[1] School of Economics & Management, Xidian University, Xi'an, 710126, China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Economic and social effects; Emotion recognition; Image analysis; Video analysis
DOI
10.11925/infotech.2096-3467.2023.0744
Abstract
[Objective] To effectively utilize information containing audio and video and to fully capture the multi-modal interactions among text, image, and audio, this study proposes TIsA, a multi-modal sentiment analysis model for online users that incorporates text, image, and STFT-CNN audio feature extraction. [Methods] First, we separated the video data into audio and image data. Then, we used BERT and BiLSTM to obtain text feature representations and applied STFT to convert the audio time-domain signals to the frequency domain. We also utilized CNNs to extract audio and image features. Finally, we fused the features from the three modalities. [Results] We conducted empirical research using the "9.5 Luding Earthquake" public sentiment data from Sina Weibo. The proposed TIsA model achieved an accuracy, macro-averaged recall, and macro-averaged F1 score of 96.10%, 96.20%, and 96.10%, respectively, outperforming related baseline models. [Limitations] The effects of different fusion strategies on sentiment recognition results were not explored in depth. [Conclusions] The proposed TIsA model demonstrates high accuracy in processing audio-containing videos, effectively supporting online public opinion analysis. © 2024 Chinese Academy of Sciences. All rights reserved.
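
To make the pipeline concrete, the following is a minimal PyTorch sketch of a TIsA-style model, not the authors' implementation: STFT turns the audio waveform into a magnitude spectrogram fed to a CNN, BERT hidden states are refined by a BiLSTM, another CNN encodes video frames, and the three feature vectors are fused by concatenation before classification. All layer sizes, the concatenation-based fusion, and the three-class output are illustrative assumptions.

import torch
import torch.nn as nn

class AudioBranch(nn.Module):
    """STFT magnitude spectrogram followed by a small CNN feature extractor."""
    def __init__(self, n_fft: int = 512, out_dim: int = 128):
        super().__init__()
        self.n_fft = n_fft
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.proj = nn.Linear(16 * 8 * 8, out_dim)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> complex STFT -> magnitude spectrogram
        window = torch.hann_window(self.n_fft, device=waveform.device)
        spec = torch.stft(waveform, self.n_fft, window=window,
                          return_complex=True).abs()
        return self.proj(self.cnn(spec.unsqueeze(1)).flatten(1))

class TextBranch(nn.Module):
    """BiLSTM over precomputed BERT hidden states (batch, seq_len, 768)."""
    def __init__(self, bert_dim: int = 768, hidden: int = 64):
        super().__init__()
        self.bilstm = nn.LSTM(bert_dim, hidden, batch_first=True,
                              bidirectional=True)

    def forward(self, bert_states: torch.Tensor) -> torch.Tensor:
        # Concatenate the final forward and backward hidden states.
        _, (h, _) = self.bilstm(bert_states)
        return torch.cat([h[0], h[1]], dim=-1)   # (batch, 2 * hidden)

class ImageBranch(nn.Module):
    """Small CNN over RGB frames taken from the video's image stream."""
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.proj = nn.Linear(16 * 8 * 8, out_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.proj(self.cnn(frames).flatten(1))

class TIsASketch(nn.Module):
    """Concatenates the three modality features and classifies sentiment."""
    def __init__(self, num_classes: int = 3):  # 3 classes is an assumption
        super().__init__()
        self.audio, self.text, self.image = AudioBranch(), TextBranch(), ImageBranch()
        self.classifier = nn.Linear(128 + 128 + 128, num_classes)

    def forward(self, waveform, bert_states, frames):
        fused = torch.cat([self.audio(waveform), self.text(bert_states),
                           self.image(frames)], dim=-1)
        return self.classifier(fused)

# Example shapes: waveform (B, 16000), bert_states (B, 64, 768),
# frames (B, 3, 224, 224) -> logits (B, 3).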
Pages: 11-21
Related Papers (50 in total)
  • [31] Multi-modal Image Fusion with KNN Matting
    Zhang, Xia
    Lin, Hui
    Kang, Xudong
    Li, Shutao
    PATTERN RECOGNITION (CCPR 2014), PT II, 2014, 484 : 89 - 96
  • [32] Multi-modal Sentiment Feature Learning Based on Sentiment Signal
    Lin, Dazhen
    Li, Lingxiao
    Cao, Donglin
    Li, Shaozi
    12TH CHINESE CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING (CHINESECSCW 2017), 2017, : 33 - 40
  • [33] Read, Watch, Listen, and Summarize: Multi-Modal Summarization for Asynchronous Text, Image, Audio and Video
    Li, Haoran
    Zhu, Junnan
    Ma, Cong
    Zhang, Jiajun
    Zong, Chengqing
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (05) : 996 - 1009
  • [34] An overview of multi-modal medical image fusion
    Du, Jiao
    Li, Weisheng
    Lu, Ke
    Xiao, Bin
    NEUROCOMPUTING, 2016, 215 : 3 - 20
  • [35] Sequential Late Fusion Technique for Multi-modal Sentiment Analysis
    Banerjee, Debapriya
    Lygerakis, Fotios
    Makedon, Fillia
    THE 14TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2021, 2021, : 264 - 265
  • [36] Multi-Modal Emotion Recognition Based On deep Learning Of EEG And Audio Signals
    Li, Zhongjie
    Zhang, Gaoyan
    Dang, Jianwu
    Wang, Longbiao
    Wei, Jianguo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [37] A Multi-Modal Medical Image Analysis Algorithm Based on Text Guidance
    Fan, Lin
    Gong, Xun
    Zheng, Cen-Yang
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (07): 2341 - 2355
  • [38] Image - Text Association Enhanced Multi-modal Swine Disease Knowledge Graph Fusion
    Jiang, Tingting
    Xu, Ao
    Wu, Feifei
    Yang, Shuai
    He, Jin
    Gu, Lichuan
    Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 56 (01): 56 - 64
  • [39] An Image-Text Sentiment Analysis Method Using Multi-Channel Multi-Modal Joint Learning
    Gong, Lianting
    He, Xingzhou
    Yang, Jianzhong
    APPLIED ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [40] A Multi-modal Approach for hmotion Recognition of TV Drama Characters Using Image and Text
    Lee, Jung-Hoon
    Kim, Hyun-Ju
    Cheong, Yun-Gyung
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020, : 420 - 424