Multi-Modal Sentiment Recognition of Online Users Based on Text-Image-Audio Fusion

Cited: 0
Authors
Li, Hui [1 ]
Pang, Jingwei [1 ]
Affiliations
[1] School of Economics & Management, Xidian University, Xi'an 710126, China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Economic and social effects; Emotion recognition; Image analysis; Video analysis
DOI
10.11925/infotech.2096-3467.2023.0744
Abstract
[Objective] To effectively utilize information contained in audio and video and to fully capture the multi-modal interactions among text, image, and audio, this study proposes TIsA, a multi-modal sentiment analysis model for online users that incorporates text, image, and STFT-CNN audio feature extraction. [Methods] First, we separated the video data into audio and image data. Then, we used BERT and BiLSTM to obtain text feature representations and applied the STFT to convert audio time-domain signals to the frequency domain. We also used CNNs to extract audio and image features. Finally, we fused the features from the three modalities. [Results] We conducted empirical research using the "9.5" Luding Earthquake public opinion data from Sina Weibo. The proposed TIsA model achieved an accuracy, macro-averaged recall, and macro-averaged F1 score of 96.10%, 96.20%, and 96.10%, respectively, outperforming related baseline models. [Limitations] This study did not explore in greater depth how different fusion strategies affect sentiment recognition results. [Conclusions] The proposed TIsA model demonstrates high accuracy in processing audio-containing videos, effectively supporting online public opinion analysis. © 2024 Chinese Academy of Sciences. All rights reserved.
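The audio branch and fusion step described in the abstract (STFT to obtain a frequency-domain representation, feature extraction, then fusion of the three modalities) can be sketched minimally as follows. This is an illustrative NumPy sketch, not the authors' implementation: the frame length, hop size, mean-pooling, and the 768- and 512-dimensional placeholder vectors (standing in for BERT-BiLSTM text features and CNN image features) are all assumptions, and simple concatenation stands in for the paper's fusion strategy.

```python
import numpy as np

def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude spectrogram via a windowed short-time Fourier transform.

    Returns an array of shape (freq_bins, n_frames), where
    freq_bins = frame_len // 2 + 1 for a real-valued signal.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def late_fusion(text_feat, image_feat, audio_feat):
    """Fuse per-modality feature vectors by concatenation (one simple strategy)."""
    return np.concatenate([text_feat, image_feat, audio_feat])

# Toy 1-second 440 Hz tone standing in for audio separated from a video clip.
fs = 16000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(audio)            # (257, 61) frequency-domain representation
audio_feat = spec.mean(axis=1)          # crude pooling in place of a CNN
text_feat = np.zeros(768)               # placeholder for a BERT+BiLSTM vector
image_feat = np.zeros(512)              # placeholder for a CNN image vector

fused = late_fusion(text_feat, image_feat, audio_feat)
print(fused.shape)                      # (1537,)
```

In practice the pooled spectrogram would be replaced by learned CNN features, and the concatenated vector fed to a classifier; the paper's reported gains come from how these modality interactions are modeled, which this sketch does not attempt to reproduce.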
Pages: 11-21
Related Papers
50 records
  • [21] Multi-Modal Emotion Recognition Fusing Video and Audio
    Xu, Chao
    Du, Pufeng
    Feng, Zhiyong
    Meng, Zhaopeng
    Cao, Tianyi
    Dong, Caichao
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): : 455 - 462
  • [22] ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition
    Wang, Xinyu
    Gui, Min
    Jiang, Yong
    Jia, Zixia
    Bach, Nguyen
    Wang, Tao
    Huang, Zhongqiang
    Huang, Fei
    Tu, Kewei
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 3176 - 3189
  • [23] Adherent Peanut Image Segmentation Based on Multi-Modal Fusion
    Wang, Yujing
    Ye, Fang
    Zeng, Jiusun
    Cai, Jinhui
    Huang, Wangsen
    SENSORS, 2024, 24 (14)
  • [24] Multi-modal Image Fusion Based on ROI and Laplacian Pyramid
    Gao, Xiong
    Zhang, Hong
    Chen, Hao
    Li, Jiafeng
    SIXTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2014), 2015, 9443
  • [25] Fabric image retrieval based on multi-modal feature fusion
    Ning Zhang
    Yixin Liu
    Zhongjian Li
    Jun Xiang
    Ruru Pan
    Signal, Image and Video Processing, 2024, 18 : 2207 - 2217
  • [26] Fabric image retrieval based on multi-modal feature fusion
    Zhang, Ning
    Liu, Yixin
    Li, Zhongjian
    Xiang, Jun
    Pan, Ruru
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (03) : 2207 - 2217
  • [27] On Multi-modal Fusion for Freehand Gesture Recognition
    Schak, Monika
    Gepperth, Alexander
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT I, 2020, 12396 : 862 - 873
  • [28] Masked Audio Text Encoders are Effective Multi-Modal Rescorers
    Cai, Jinglun
    Sunkara, Monica
    Li, Xilai
    Bhatia, Anshu
    Pan, Xiao
    Bodapati, Sravan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 10718 - 10730
  • [29] Multi-modal haptic image recognition based on deep learning
    Han, Dong
    Nie, Hong
    Chen, Jinbao
    Chen, Meng
    Deng, Zhen
    Zhang, Jianwei
    SENSOR REVIEW, 2018, 38 (04) : 486 - 493
  • [30] Audio-Visual Scene Classification Based on Multi-modal Graph Fusion
    Lei, Han
    Chen, Ning
    INTERSPEECH 2022, 2022, : 4157 - 4161