Beyond visual cues: Emotion recognition in images with text-aware fusion☆

Cited by: 1
Authors
Sungur, Kerim Serdar [1 ]
Bakal, Gokhan [1 ]
Affiliations
[1] Abdullah Gul Univ, Dept Comp Engn, Erkilet Blvd Sumer Campus, TR-38080 Kayseri, Turkiye
Keywords
Sentiment analysis; Hybrid model; Image & text processing; Deep learning
DOI
10.1016/j.displa.2024.102958
Chinese Library Classification
TP3 [computing technology, computer technology]
Subject classification code
0812
Abstract
Sentiment analysis is a widely studied problem for understanding human emotions and their potential outcomes. While it is commonly performed on textual data, analyzing visual data is equally important for assessing a subject's emotional state. This work investigates whether sentiment predictions on visual instances can be enhanced by integrating textual data as additional knowledge reflecting the contextual information of the images. To that end, two separate models, an image-processing model and a text-processing model, were trained on distinct datasets covering the same five human emotions. The outputs of the two models' last dense layers were then concatenated to construct a hybrid multimodal model driven by both visual and textual components. The central question is how this hybrid model, in which textual knowledge is fused with visual data, performs relative to its unimodal counterpart. The hybrid model achieved roughly a 3% F1-score improvement over the plain image classification model built on a convolutional neural network architecture. These results underscore the value of fusing textual context with visual information to refine sentiment analysis predictions, and they highlight the multimodal approach as a promising avenue for future work in emotion analysis and understanding.
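The late-fusion step the abstract describes (concatenating the last dense-layer outputs of the image and text models and classifying the fused vector) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, batch size, and random stand-in weights are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EMOTIONS = 5  # the paper uses five emotion classes

def dense_relu(x, w, b):
    """A single fully connected layer with ReLU activation."""
    return np.maximum(x @ w + b, 0.0)

def softmax(z):
    """Row-wise softmax for the final classification head."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for the last dense layers of the two unimodal models:
# a 64-d image feature and a 64-d text feature per sample
# (input and feature sizes are illustrative, not from the paper).
img_feat = dense_relu(rng.normal(size=(4, 128)),
                      0.1 * rng.normal(size=(128, 64)), np.zeros(64))
txt_feat = dense_relu(rng.normal(size=(4, 300)),
                      0.1 * rng.normal(size=(300, 64)), np.zeros(64))

# Late fusion: concatenate the two dense outputs sample-wise.
fused = np.concatenate([img_feat, txt_feat], axis=1)  # shape (4, 128)

# Final classification head over the emotion classes.
w_out = 0.1 * rng.normal(size=(fused.shape[1], NUM_EMOTIONS))
probs = softmax(fused @ w_out)
print(fused.shape, probs.shape)
```

In a trained system the random weights above would be the learned parameters, and the fused classifier would be fine-tuned end-to-end or trained on top of the frozen unimodal encoders; the sketch only shows the shape of the fusion.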
Pages: 8
Related papers (50 in total)
  • [41] Deep Multimodal Fusion for Depression Detection: Integrating Facial Emotion Recognition, EEG Signals and Audio Cues
    Thirunavukkarasu, J.
    Jebamathi, Shiny M.
    Varshaa, P.
    Nisha, M.
    Sri, Nanthitha M.
    2024 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND APPLIED INFORMATICS, ACCAI 2024, 2024,
  • [42] A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images
    Siddiqui, Mohammad Faridul Haque
    Javaid, Ahmad Y.
    MULTIMODAL TECHNOLOGIES AND INTERACTION, 2020, 4 (03) : 1 - 21
  • [43] Efficient bimodal emotion recognition system based on speech/text embeddings and ensemble learning fusion
    Chakhtouna, Adil
    Sekkate, Sara
    Adib, Abdellah
    ANNALS OF TELECOMMUNICATIONS, 2025,
  • [44] FUSION APPROACHES FOR EMOTION RECOGNITION FROM SPEECH USING ACOUSTIC AND TEXT-BASED FEATURES
    Pepino, Leonardo
    Riera, Pablo
    Ferrer, Luciana
    Gravano, Agustin
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6484 - 6488
  • [45] Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem
    Sidorov, Maxim
    Sopov, Evgenii
    Ivanov, Ilia
    Minker, Wolfgang
    ICIMCO 2015 PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, VOL. 2, 2015, : 246 - 251
  • [46] Continuous Emotion Recognition with Audio-visual Leader-follower Attentive Fusion
    Zhang, Su
    Ding, Yi
    Wei, Ziquan
    Guan, Cuntai
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3560 - 3567
  • [47] Featural vs. Holistic processing and visual sampling in the influence of social category cues on emotion recognition
    Craig, Belinda M.
    Chen, Nigel T. M.
    Lipp, Ottmar V.
    COGNITION & EMOTION, 2022, 36 (05) : 855 - 875
  • [48] Fusion of thermal and visual images for efficient face recognition using Gabor filter
    Ahmad, Jahanzed
    Ali, Usman
    Qureshi, Rashid Jalal
    2006 IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1-3, 2006, : 135 - +
  • [49] Multi-grained visual pivot-guided multi-modal neural machine translation with text-aware cross-modal contrastive disentangling
    Guo, Junjun
    Su, Rui
    Ye, Junjie
    NEURAL NETWORKS, 2024, 178
  • [50] Length Uncertainty-Aware Graph Contrastive Fusion Network for multimodal physiological signal emotion recognition
    Li, Guangqiang
    Chen, Ning
    Zhu, Hongqing
    Li, Jing
    Xu, Zhangyong
    Zhu, Zhiying
    NEURAL NETWORKS, 2025, 187