TEDT: Transformer-Based Encoding–Decoding Translation Network for Multimodal Sentiment Analysis

Cited by: 0

Authors:

Fan Wang

Shengwei Tian

Long Yu

Jing Liu

Junwen Wang

Kun Li

Yongtao Wang

Affiliations:

[1] University of Xinjiang, School of Software

[2] University of Xinjiang, Network and Information Center

Source:

Cognitive Computation | 2023 / Vol. 15

Keywords:

Multimodal sentiment analysis; Transformer; Multimodal fusion; Multimodal attention

DOI:

Not available

Abstract:

Multimodal sentiment analysis is a popular and challenging research topic in natural language processing, but the individual modalities in a video can affect sentiment analysis results to different degrees. Along the temporal dimension, the sentiment expressed in natural language is influenced by the sentiment carried by nonnatural-language modalities, which may strengthen or weaken the original sentiment of the current text. In addition, nonnatural-language features are generally of poor quality, which fundamentally hinders multimodal fusion. To address these issues, we propose a transformer-based multimodal encoding–decoding translation network (TEDT) that adopts a joint encoding–decoding scheme, with text as the primary information and audio and visual data as the secondary information. To reduce the negative impact of nonnatural-language data on natural-language data, we propose a modality reinforcement cross-attention module that converts nonnatural-language features into natural-language features, improving their quality and enabling better integration of multimodal features. Moreover, a dynamic filtering mechanism filters out erroneous information generated during cross-modal interaction, further improving the final output. We evaluated the proposed method on two multimodal sentiment analysis benchmark datasets, MOSI and MOSEI, on which it achieved accuracies of 89.3% and 85.9%, respectively, outperforming current state-of-the-art methods. Our model substantially improves the effectiveness of multimodal fusion and analyzes human sentiment more accurately.
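The abstract names a text-centred cross-attention module and a dynamic filtering mechanism but gives no equations for either. As a rough illustration of the general idea only, the Python sketch below assumes standard single-head scaled dot-product cross-attention in which text features act as queries and an audio or visual sequence supplies the keys and values, followed by a sigmoid gate standing in for the dynamic filter. The functions `cross_attention` and `gated_fusion`, the omitted (identity) projections, the residual addition, and the form of the gate are all assumptions made for this sketch; they are not taken from the TEDT paper.

```python
# Hypothetical sketch: NOT the TEDT authors' implementation.
import math
from typing import List

Matrix = List[List[float]]  # rows are time steps, columns are feature dims


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text: Matrix, other: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention (assumed form).

    Each text time step is a query; every time step of the nonverbal
    modality (`other`, e.g. audio or visual) serves as both key and value.
    Learned Q/K/V projections are omitted (identity), so both modalities
    must share the same feature dimension in this sketch.
    """
    d = len(text[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in text:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in other]
        weights = softmax(scores)
        # Weighted average of the nonverbal value vectors for this query.
        out.append([sum(w * v[j] for w, v in zip(weights, other)) for j in range(d)])
    return out


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(text: Matrix, attended: Matrix, bias: float = 0.0) -> Matrix:
    """Hypothetical stand-in for a 'dynamic filter'.

    An elementwise gate g = sigmoid(t * a + bias) scales the cross-modal
    signal a before it is added back to the text feature t. With the
    default bias of 0, g drops below 0.5 whenever t and a disagree in
    sign, so less of the conflicting cross-modal signal is let through.
    """
    return [
        [t + sigmoid(t * a + bias) * a for t, a in zip(t_row, a_row)]
        for t_row, a_row in zip(text, attended)
    ]


# Toy example: 2 text steps and 3 audio steps, feature dimension 2.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
fused = gated_fusion(text_feats, cross_attention(text_feats, audio_feats))
print(len(fused), len(fused[0]))  # → 2 2
```

Because the queries always come from the text, the fused output keeps the text sequence length regardless of how many audio or visual steps there are, which mirrors the abstract's framing of text as the primary modality.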

Pages: 289–303

Page count: 14
