Multi-Modal Sentiment Recognition of Online Users Based on Text-Image-Audio Fusion

Times cited: 0

Authors:

Li, Hui [1]

Pang, Jingwei [1]

Affiliation:

[1] School of Economics & Management, Xidian University, Xi’an 710126, China

Source:

Data Analysis and Knowledge Discovery | 2024 / Vol. 8 / No. 11

Funding:

National Natural Science Foundation of China

Keywords:

Deep learning; Economic and social effects; Emotion recognition; Image analysis; Video analysis

DOI:

10.11925/infotech.2096-3467.2023.0744

CLC classification number:

Subject classification number:

Abstract:

[Objective] This study proposes TIsA, a multi-modal sentiment analysis model for online users. It combines text, image, and STFT-CNN audio feature extraction in order to make effective use of videos that contain audio and to fully capture the multi-modal interactions among text, image, and audio. [Methods] First, we separated the video data into audio and image data. Then, we used BERT and BiLSTM to obtain text feature representations and applied the short-time Fourier transform (STFT) to convert audio signals from the time domain to the frequency domain. We also used a CNN to extract audio and image features. Finally, we fused the features from the three modalities. [Results] We conducted an empirical study using public opinion data on the "9.5 Luding Earthquake" from Sina Weibo. The proposed TIsA model achieved an accuracy, macro-averaged recall, and macro-averaged F1 score of 96.10%, 96.20%, and 96.10%, respectively, outperforming the related baseline models. [Limitations] The effects of different fusion strategies on sentiment recognition results were not explored in depth. [Conclusions] The proposed TIsA model achieves high accuracy on videos that contain audio and can effectively support online public opinion analysis. © 2024 Chinese Academy of Sciences. All rights reserved.
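The audio step in [Methods] converts time-domain audio into a frequency-domain representation with the STFT. A CNN can then treat that representation as an image. The minimal NumPy sketch below illustrates the idea and is not the authors' implementation. The frame length (512 samples), hop size (256), Hann window, and 16 kHz test tone are all assumptions, because the abstract gives no STFT parameters.

```python
import numpy as np


def stft_magnitude(signal, frame_len=512, hop=256):
    """Magnitude STFT of a 1-D signal, shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short clips so at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    window = np.hanning(frame_len)  # Hann window (assumed choice)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames and taper each one with the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Real FFT per frame: time domain -> frequency domain.
    return np.abs(np.fft.rfft(frames, axis=1))


# Usage: 1 s of a 1000 Hz tone sampled at 16 kHz (hypothetical input).
sr = 16000
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (61, 257)
```

Each frequency bin here spans 16000 / 512 = 31.25 Hz, so the tone's energy peaks in bin 32. In practice, a log scale such as `np.log1p(spec)` is often applied before the spectrogram is passed to a CNN.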
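The [Results] report accuracy together with macro-averaged recall and macro-averaged F1. Macro averaging computes each score per sentiment class and then averages the classes with equal weight. The pure-Python sketch below shows the computation. The label names are hypothetical, since the abstract does not list the sentiment classes. The sketch takes macro F1 as the mean of per-class F1 scores, which is one common convention. The abstract does not say which convention the authors used.

```python
from collections import Counter


def accuracy_macro_recall_f1(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[true] += 1  # true class gets a false negative
    recalls, f1s = [], []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(tp.values()) / len(y_true)
    # Macro averaging: every class counts equally, regardless of its size.
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Hypothetical three-class labels; the paper's actual classes may differ.
y_true = ["pos", "pos", "neg", "neu", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu", "neg", "pos"]
acc, macro_r, macro_f1 = accuracy_macro_recall_f1(y_true, y_pred)
print(f"{acc:.4f} {macro_r:.4f} {macro_f1:.4f}")  # 0.6667 0.6667 0.6556
```

Because every class carries equal weight, a small class that is often misclassified lowers the macro scores even when overall accuracy remains high.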


Pages: 11-21
