Beyond visual cues: Emotion recognition in images with text-aware fusion

Cited by: 1
Authors
Sungur, Kerim Serdar [1 ]
Bakal, Gokhan [1 ]
Institution
[1] Abdullah Gul Univ, Dept Comp Engn, Erkilet Blvd Sumer Campus, TR-38080 Kayseri, Turkiye
Keywords
Sentiment analysis; Hybrid model; Image & text processing; Deep learning
DOI
10.1016/j.displa.2024.102958
CLC number
TP3 [computing technology, computer technology]
Subject classification code
0812
Abstract
Sentiment analysis is widely studied for understanding human emotions and their potential outcomes. While it is typically performed on textual data, analyzing visual data is equally important for assessing a subject's current emotional state. This work investigates whether sentiment analysis predictions on images can be improved by integrating textual data as additional knowledge reflecting the contextual information of the images. To that end, two separate models were developed, an image-processing model and a text-processing model, each trained on a distinct dataset covering the same five human emotions. The outputs of each model's last dense layer are then concatenated to construct a hybrid multimodal model driven by both visual and textual components. The central goal is to evaluate the performance of this hybrid model, in which textual knowledge is fused with visual data. The hybrid model achieved nearly a 3% F1-score improvement over a plain image classification model based on a convolutional neural network architecture. This research underscores the value of fusing textual context with visual information to refine sentiment analysis predictions. The findings highlight the potential of a multimodal approach and point to a promising avenue for future advances in emotion analysis and understanding.
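The fusion step the abstract describes (concatenating the two unimodal models' last dense-layer outputs and classifying over five emotions) is a form of late fusion. A minimal numpy sketch is given below; the feature dimensions, random weights, and the linear five-class head are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_and_classify(img_feat, txt_feat, W, b):
    """Late fusion: concatenate unimodal features, then apply a
    dense 5-way classification head (weights W, bias b)."""
    fused = np.concatenate([img_feat, txt_feat])
    return softmax(W @ fused + b)

# Stand-ins for the last dense-layer outputs of each unimodal model
# (dimensions are assumptions for illustration only).
img_feat = rng.standard_normal(128)   # image-model features
txt_feat = rng.standard_normal(64)    # text-model features
W = rng.standard_normal((5, 128 + 64)) * 0.01
b = np.zeros(5)

probs = fuse_and_classify(img_feat, txt_feat, W, b)
print(probs)  # probability over the five emotion classes
```

In practice the head's weights would be trained jointly on top of the frozen or fine-tuned unimodal encoders; the sketch only shows the shape of the fusion.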
Pages: 8
Related papers
50 items total
  • [1] TACMT: Text-aware cross-modal transformer for visual grounding on high-resolution SAR images
    Li, Tianyang
    Wang, Chao
    Tian, Sirui
    Zhang, Bo
    Wu, Fan
    Tang, Yixian
    Zhang, Hong
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2025, 222 : 152 - 166
  • [2] Context-Aware Based Visual-Audio Feature Fusion for Emotion Recognition
    Cheng, Huijie
    Tie, Yun
    Qi, Lin
    Jin, Cong
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [3] Emotion recognition based on joint visual and audio cues
    Sebe, Nicu
    Cohen, Ira
    Gevers, Theo
    Huang, Thomas S.
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 1136 - +
  • [4] Context-aware Multimodal Fusion for Emotion Recognition
    Li, Jinchao
    Wang, Shuai
    Chao, Yang
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2022, 2022, : 2013 - 2017
  • [5] Robust face recognition by fusion of visual and infrared cues
    Kim, Sang-ki
    Lee, Hyobin
    Yu, Sunjin
    Lee, Sangyoun
    2006 1ST IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS, VOLS 1-3, 2006, : 1594 - +
  • [6] Robust face recognition by fusion of visual and infrared cues
    Kim, Sang-ki
    Lee, Hyobin
    Yu, Sunjin
    Lee, Sangyoun
    ICIEA 2006: 1ST IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS, VOLS 1-3, PROCEEDINGS, 2006, : 804 - 808
  • [7] Hierarchical fusion of visual and physiological signals for emotion recognition
    Fang, Yuchun
    Rong, Ruru
    Huang, Jun
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2021, 32 (04) : 1103 - 1121
  • [8] Kernel Fusion of Audio and Visual Information for Emotion Recognition
    Wang, Yongjin
    Zhang, Rui
    Guan, Ling
    Venetsanopoulos, A. N.
    IMAGE ANALYSIS AND RECOGNITION: 8TH INTERNATIONAL CONFERENCE, ICIAR 2011, PT II: 8TH INTERNATIONAL CONFERENCE, ICIAR 2011, 2011, 6754 : 140 - 150
  • [9] Physio-visual data fusion for emotion recognition
    Maaoui, C.
    Abdat, F.
    Pruski, A.
    IRBM, 2014, 35 (03) : 109 - 118
  • [10] Hierarchical fusion of visual and physiological signals for emotion recognition
    Yuchun Fang
    Ruru Rong
    Jun Huang
    Multidimensional Systems and Signal Processing, 2021, 32 : 1103 - 1121