MMTF-DES: A fusion of multimodal transformer models for desire, emotion, and sentiment analysis of social media data

Cited: 0
Authors
Aziz, Abdul [1 ]
Chowdhury, Nihad Karim [1 ]
Kabir, Muhammad Ashad [2 ]
Chy, Abu Nowshed [1 ]
Siddique, Md. Jawad [3 ]
Affiliations
[1] Univ Chittagong, Dept Comp Sci & Engn, Chattogram 4331, Bangladesh
[2] Charles Sturt Univ, Sch Comp Math & Engn, Bathurst, NSW 2795, Australia
[3] Southern Illinois Univ, Dept Comp Sci, Carbondale, IL 62901 USA
Keywords
Human desire understanding; Desire analysis; Sentiment analysis; Emotion analysis; Multimodal transformer; Vision-language models; RECOGNITION; FRAMEWORK
DOI
10.1016/j.neucom.2025.129376
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Desires, emotions, and sentiments are pivotal in understanding and predicting human behavior, influencing decision-making, communication, and social interactions. Their analysis, particularly of multimodal data (such as images and texts) from social media, provides profound insights into cultural diversity, psychological well-being, and consumer behavior. Prior studies have overlooked image-text pairwise feature representations, which are crucial for the task of human desire understanding. In this research, we propose a unified multimodal framework with image-text pair settings to identify human desire, sentiment, and emotion. The core of our method lies in the encoder module, which is built on two state-of-the-art multimodal vision-language models (VLMs). To effectively extract visual and contextualized embedding features from social media image-text pairs, we jointly fine-tune two pre-trained multimodal VLMs: the Vision-and-Language Transformer (ViLT) and the Vision-and-Augmented-Language Transformer (VAuLT). We then apply an early fusion strategy to these embedding features to obtain combined, diverse feature representations. Moreover, we leverage a multi-sample dropout mechanism to enhance generalization and expedite training. To evaluate our approach, we used the multimodal MSED dataset for the human desire understanding task. Our experimental evaluation demonstrates that the method excels in capturing both visual and contextual information, yielding superior performance compared to other state-of-the-art techniques. Specifically, it outperforms existing approaches by 3% for sentiment analysis, 2.2% for emotion analysis, and approximately 1% for desire analysis.
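The fusion and classification stages described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the stand-in random tensors replace the pooled [CLS]-style embeddings that would actually come from fine-tuned ViLT and VAuLT encoders, and the embedding sizes, class count, dropout rate, and number of dropout samples are all assumed for the example.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Early fusion of two encoder embeddings, followed by
    multi-sample dropout over a shared classifier head."""

    def __init__(self, dim_a=768, dim_b=768, num_classes=6,
                 num_dropout_samples=4, p=0.5):
        super().__init__()
        # One classifier shared across all dropout samples.
        self.classifier = nn.Linear(dim_a + dim_b, num_classes)
        self.dropouts = nn.ModuleList(
            nn.Dropout(p) for _ in range(num_dropout_samples))

    def forward(self, emb_a, emb_b):
        # Early fusion: concatenate the two embeddings per example.
        fused = torch.cat([emb_a, emb_b], dim=-1)
        # Multi-sample dropout: apply several independent dropout
        # masks to the same fused features and average the logits.
        logits = torch.stack(
            [self.classifier(d(fused)) for d in self.dropouts])
        return logits.mean(dim=0)

# Stand-in embeddings; in the paper these would be produced by the
# jointly fine-tuned ViLT and VAuLT encoders for a batch of 8 pairs.
vilt_emb = torch.randn(8, 768)
vault_emb = torch.randn(8, 768)
head = FusionHead()
print(head(vilt_emb, vault_emb).shape)  # torch.Size([8, 6])
```

Averaging logits over several dropout masks trains many "thinned" classifiers per step at negligible cost, which is the generalization and convergence benefit the abstract attributes to multi-sample dropout.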
Pages: 18
Related Papers
50 records in total
  • [1] Sentiment Analysis of Social Media via Multimodal Feature Fusion
    Zhang, Kang
    Geng, Yushui
    Zhao, Jing
    Liu, Jianxin
    Li, Wenxiao
    SYMMETRY-BASEL, 2020, 12 (12): 1 - 14
  • [2] Transformer-based deep learning models for the sentiment analysis of social media data
    Kokab, Sayyida Tabinda
    Asghar, Sohail
    Naz, Shehneela
    ARRAY, 2022, 14
  • [3] Facial Emotion Recognition for Sentiment Analysis of Social Media Data
    de Paula, Diandre
    Alexandre, Luis A.
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022), 2022, 13256 : 207 - 217
  • [4] Sentiment analysis of social media comments based on multimodal attention fusion network
    Liu, Ziyu
    Yang, Tao
    Chen, Wen
    Chen, Jiangchuan
    Li, Qinru
    Zhang, Jun
    APPLIED SOFT COMPUTING, 2024, 164
  • [5] Understanding Environmental Posts: Sentiment and Emotion Analysis of Social Media Data
    Amangeldi, Daniyar
    Usmanova, Aida
    Shamoi, Pakizar
    IEEE ACCESS, 2024, 12 : 33504 - 33523
  • [6] Sentiment Analysis on Social Media for Emotion Classification
    Tanna, Dilesh
    Dudhane, Manasi
    Sardar, Amrut
    Deshpande, Kiran
    Deshmukh, Neha
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS 2020), 2020, : 911 - 915
  • [7] SS-Trans (Single-Stream Transformer for Multimodal Sentiment Analysis and Emotion Recognition): The Emotion Whisperer-A Single-Stream Transformer for Multimodal Sentiment Analysis
    Ji, Mingyu
    Wei, Ning
    Zhou, Jiawei
    Wang, Xin
    ELECTRONICS, 2024, 13 (21)
  • [8] Aspect Based Sentiment Analysis on Multimodal Data: A Transformer and Low-Rank Fusion Approach
    Jin, Meilin
    Shao, Lianhe
    Wang, Xihan
    Yan, Qianqian
    Chu, Zhulu
    Luo, Tongtong
    Tang, Jiacheng
    Gao, Quanli
    2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND ARTIFICIAL INTELLIGENCE, CCAI 2024, 2024, : 332 - 338
  • [9] Integration of deep learning techniques for sentiment and emotion analysis of social media data
    Hota, H.S.
    Sharma, D.K.
    Verma, N.
    International Journal of Intelligent Systems Technologies and Applications, 2023, 21 (01) : 1 - 20
  • [10] A Unimodal Reinforced Transformer With Time Squeeze Fusion for Multimodal Sentiment Analysis
    He, Jiaxuan
    Mai, Sijie
    Hu, Haifeng
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 992 - 996