Multi-Modal Sentiment Recognition of Online Users Based on Text-Image-Audio Fusion

Cited by: 0
Authors
Li, Hui [1 ]
Pang, Jingwei [1 ]
Affiliations
[1] School of Economics & Management, Xidian University, Xi'an 710126, China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Economic and social effects; Emotion recognition; Image analysis; Video analysis
DOI
10.11925/infotech.2096-3467.2023.0744
Abstract
[Objective] To effectively utilize video information containing audio and to fully capture the multi-modal interactions among text, image, and audio, this study proposes TIsA, a multi-modal sentiment analysis model for online users that incorporates text, image, and STFT-CNN audio feature extraction. [Methods] First, we separated the video data into audio and image data. Then, we used BERT and BiLSTM to obtain text feature representations and applied the short-time Fourier transform (STFT) to convert the audio time-domain signal to the frequency domain. We also used CNNs to extract audio and image features. Finally, we fused the features of the three modalities. [Results] We conducted empirical research using the "9.5" Luding Earthquake public sentiment data from Sina Weibo. The proposed TIsA model achieved an accuracy, macro-averaged recall, and macro-averaged F1 score of 96.10%, 96.20%, and 96.10%, respectively, outperforming the related baseline models. [Limitations] The deeper effects of different fusion strategies on sentiment recognition results remain to be explored. [Conclusions] The proposed TIsA model demonstrates high accuracy in processing audio-containing videos, effectively supporting online public opinion analysis. © 2024 Chinese Academy of Sciences. All rights reserved.
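The abstract names the building blocks of the pipeline but gives no implementation detail. Below is a minimal PyTorch sketch of the three-branch architecture it describes: a BiLSTM over BERT token embeddings for text, a CNN over STFT magnitude spectrograms for audio, a CNN over video frames for images, and feature-level fusion by concatenation. All layer sizes, the choice of concatenation as the fusion strategy, and the class count are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class TIsA(nn.Module):
    """Illustrative three-branch text-image-audio fusion model (not the paper's exact network)."""
    def __init__(self, bert_dim=768, hidden=128, n_classes=3):
        super().__init__()
        # Text branch: BiLSTM over token embeddings (assumed to come from BERT upstream).
        self.bilstm = nn.LSTM(bert_dim, hidden, batch_first=True, bidirectional=True)
        # Audio branch: small CNN over STFT magnitude spectrograms, shape (B, 1, freq, time).
        self.audio_cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> (B, 32)
        # Image branch: small CNN over RGB frames extracted from the video, shape (B, 3, H, W).
        self.image_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> (B, 32)
        # Fusion: concatenate the three modality vectors, then classify.
        self.classifier = nn.Linear(2 * hidden + 32 + 32, n_classes)

    def forward(self, text_emb, spectrogram, image):
        _, (h, _) = self.bilstm(text_emb)            # h: (2, B, hidden)
        text_feat = torch.cat([h[0], h[1]], dim=1)   # (B, 2 * hidden)
        audio_feat = self.audio_cnn(spectrogram)
        image_feat = self.image_cnn(image)
        fused = torch.cat([text_feat, audio_feat, image_feat], dim=1)
        return self.classifier(fused)

# Usage with dummy tensors; torch.stft performs the time-domain-to-frequency-domain step.
wave = torch.randn(1, 16000)                                   # 1 s of 16 kHz audio (placeholder)
window = torch.hann_window(512)
spec = torch.stft(wave, n_fft=512, hop_length=256, window=window,
                  return_complex=True).abs().unsqueeze(1)      # (1, 1, 257, 63)
text_emb = torch.randn(1, 50, 768)                             # stand-in for BERT token embeddings
image = torch.randn(1, 3, 64, 64)                              # one extracted video frame
logits = TIsA()(text_emb, spec, image)                         # (1, n_classes)
```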
Pages: 11-21
Related Papers
50 records
  • [41] Multi-modal fusion method for human action recognition based on IALC
    Zhang, Yinhuan
    Xiao, Qinkun
    Liu, Xing
    Wei, Yongquan
    Chu, Chaoqin
    Xue, Jingyun
    IET IMAGE PROCESSING, 2023, 17 (02) : 388 - 400
  • [42] A Novel Chinese Character Recognition Method Based on Multi-Modal Fusion
    Liu, Jin
    Lyu, Shiqi
    Yu, Chao
    Yang, Yihe
    Luan, Cuiju
    FUZZY SYSTEMS AND DATA MINING V (FSDM 2019), 2019, 320 : 487 - 492
  • [43] MIGT: Multi-modal image inpainting guided with text
    Li, Ailin
    Zhao, Lei
    Zuo, Zhiwen
    Wang, Zhizhong
    Xing, Wei
    Lu, Dongming
    NEUROCOMPUTING, 2023, 520 : 376 - 385
  • [44] Multi-modal image fusion based on saliency guided in NSCT domain
    Wang, Shiying
    Shen, Yan
    IET IMAGE PROCESSING, 2020, 14 (13) : 3188 - 3201
  • [45] Leveraging multi-modal fusion for graph-based image annotation
    Amiri, S. Hamid
    Jamzad, Mansour
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2018, 55 : 816 - 828
  • [46] Multi-Modal Image Fusion Based on Matrix Product State of Tensor
    Lu, Yixiang
    Wang, Rui
    Gao, Qingwei
    Sun, Dong
    Zhu, De
    FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [47] IMAGE DESCRIPTION THROUGH FUSION BASED RECURRENT MULTI-MODAL LEARNING
    Oruganti, Ram Manohar
    Sah, Shagan
    Pillai, Suhas
    Ptucha, Raymond
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3613 - 3617
  • [48] Guided Image Deblurring by Deep Multi-Modal Image Fusion
    Liu, Yuqi
    Sheng, Zehua
    Shen, Hui-Liang
    IEEE ACCESS, 2022, 10 : 130708 - 130718
  • [49] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 3227 - 3231
  • [50] Recognition of multi-modal fusion images with irregular interference
    Wang, Yawei
    Chen, Yifei
    Wang, Dongfeng
    PeerJ Computer Science, 2022, 8