UEFN: Efficient uncertainty estimation fusion network for reliable multimodal sentiment analysis

Cited by: 0
Authors
Wang, Shuai [1 ,2 ]
Ratnavelu, K. [2 ]
Bin Shibghatullah, Abdul Samad [2 ,3 ]
Affiliations
[1] Yuncheng Univ, Shanxi Prov Optoelect Informat Sci & Technol Lab, Yuncheng 044000, Peoples R China
[2] UCSI Univ, Inst Comp Sci & Digital Innovat, Fac Appl Sci, 1 Jalan Menara Gading, Cheras 56000, Kuala Lumpur, Malaysia
[3] Univ Tenaga Nas, Coll Engn, Jalan Kajang Puchong, Kajang 43009, Selangor, Malaysia
Keywords
Uncertainty estimation; Multimodal representation; Sentiment analysis; Decision fusion; Social media;
DOI
10.1007/s10489-024-06113-6
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rapid evolution of the digital era has greatly transformed social media, resulting in more diverse emotional expressions and increasingly complex public discourse. Consequently, identifying relationships within multimodal data has become increasingly challenging. Most current multimodal sentiment analysis (MSA) methods concentrate on merging data from diverse modalities into an integrated feature representation to enhance recognition performance by leveraging the complementary nature of multimodal data. However, these approaches often overlook prediction reliability. To address this, we propose the uncertainty estimation fusion network (UEFN), a reliable MSA method based on uncertainty estimation. UEFN combines the Dirichlet distribution and Dempster-Shafer evidence theory (DSET) to predict the probability distribution and uncertainty of text, speech, and image modalities, fusing the predictions at the decision level. Specifically, the method first represents the contextual features of text, speech, and image modalities separately. It then employs a fully connected neural network to transform features from different modalities into evidence forms. Subsequently, it parameterizes the evidence of different modalities via the Dirichlet distribution and estimates the probability distribution and uncertainty for each modality. Finally, we use DSET to fuse the predictions, obtaining the sentiment analysis results and uncertainty estimation, referred to as the multimodal decision fusion layer (MDFL). Additionally, on the basis of the modality uncertainty generated by subjective logic theory, we calculate feature weights, apply them to the corresponding features, concatenate the weighted features, and feed them into a feedforward neural network for sentiment classification, forming the adaptive weight fusion layer (AWFL). Both MDFL and AWFL are then used for multitask training. 
Experimental comparisons demonstrate that the UEFN not only achieves excellent performance but also provides uncertainty estimation along with the predictions, enhancing the reliability and interpretability of the results.
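The per-modality uncertainty estimation and decision-level fusion described in the abstract can be sketched with standard subjective-logic formulas (Dirichlet parameters alpha_k = e_k + 1, uncertainty u = K/S) and the reduced Dempster's rule commonly used in evidential fusion. This is an illustrative reconstruction, not the authors' released code; the function names and the toy evidence values are hypothetical.

```python
import numpy as np

def opinion_from_evidence(evidence):
    """Map non-negative evidence e_k to a subjective-logic opinion.

    Dirichlet parameters alpha_k = e_k + 1, strength S = sum(alpha).
    Belief mass b_k = e_k / S, uncertainty u = K / S, so sum(b) + u = 1.
    """
    alpha = evidence + 1.0
    S = alpha.sum()
    K = evidence.size
    belief = evidence / S
    uncertainty = K / S
    expected_prob = alpha / S  # E[p_k] under Dir(alpha): predicted class probabilities
    return belief, uncertainty, expected_prob

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster's rule combining two opinions over the same K classes."""
    # Conflict: belief mass the two modalities assign to different classes.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    norm = 1.0 - conflict
    belief = (b1 * b2 + b1 * u2 + b2 * u1) / norm
    uncertainty = (u1 * u2) / norm
    return belief, uncertainty

# Toy example: text and speech modalities, 3 sentiment classes.
b_t, u_t, _ = opinion_from_evidence(np.array([10.0, 1.0, 1.0]))
b_s, u_s, _ = opinion_from_evidence(np.array([6.0, 2.0, 1.0]))
b_f, u_f = ds_combine(b_t, u_t, b_s, u_s)
# Fusing two agreeing modalities lowers uncertainty: u_f < min(u_t, u_s).
```

For the adaptive weight fusion branch, the same per-modality uncertainties could plausibly weight the features before concatenation, e.g. w_m = 1 - u_m; the exact weighting scheme is described in the paper itself.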
Pages: 20