Beyond visual cues: Emotion recognition in images with text-aware fusion☆

Cited by: 1
Authors
Sungur, Kerim Serdar [1 ]
Bakal, Gokhan [1 ]
Affiliations
[1] Abdullah Gul Univ, Dept Comp Engn, Erkilet Blvd Sumer Campus, TR-38080 Kayseri, Turkiye
Keywords
Sentiment analysis; Hybrid model; Image & text processing; Deep learning
DOI
10.1016/j.displa.2024.102958
Chinese Library Classification
TP3 [computing technology, computer technology]
Subject classification code
0812
Abstract
Sentiment analysis is a widely studied problem for understanding human emotions and their potential outcomes. While it is commonly performed on textual data, analyzing visual data is equally important for assessing a subject's emotional state. This work investigates whether sentiment predictions on visual instances can be enhanced by integrating textual data as additional knowledge reflecting the contextual information of the images. To that end, two separate models, an image-processing model and a text-processing model, were trained on distinct datasets covering the same five human emotions. The outputs of the two models' last dense layers were then concatenated to construct a hybrid multimodal model driven by both visual and textual components. The central question is how this hybrid model, in which textual knowledge is fused with visual data, performs relative to its unimodal counterpart. The hybrid model achieved roughly a 3% F1-score improvement over the plain image classification model built on a convolutional neural network architecture. These results underscore the value of fusing textual context with visual information to refine sentiment analysis predictions, and they highlight the multimodal approach as a promising avenue for future work in emotion analysis and understanding.
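The late-fusion step the abstract describes (concatenating the last dense-layer outputs of the image and text models and classifying the fused vector) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, batch size, and random stand-in weights are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EMOTIONS = 5  # the paper uses five emotion classes

def dense_relu(x, w, b):
    """A single fully connected layer with ReLU activation."""
    return np.maximum(x @ w + b, 0.0)

def softmax(z):
    """Row-wise softmax for the final classification head."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for the last dense layers of the two unimodal models:
# a 64-d image feature and a 64-d text feature per sample
# (input and feature sizes are illustrative, not from the paper).
img_feat = dense_relu(rng.normal(size=(4, 128)),
                      0.1 * rng.normal(size=(128, 64)), np.zeros(64))
txt_feat = dense_relu(rng.normal(size=(4, 300)),
                      0.1 * rng.normal(size=(300, 64)), np.zeros(64))

# Late fusion: concatenate the two dense outputs sample-wise.
fused = np.concatenate([img_feat, txt_feat], axis=1)  # shape (4, 128)

# Final classification head over the emotion classes.
w_out = 0.1 * rng.normal(size=(fused.shape[1], NUM_EMOTIONS))
probs = softmax(fused @ w_out)
print(fused.shape, probs.shape)
```

In a trained system the random weights above would be the learned parameters, and the fused classifier would be fine-tuned end-to-end or trained on top of the frozen unimodal encoders; the sketch only shows the shape of the fusion.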
Pages: 8
Related papers (50 in total)
  • [41] Deep Multimodal Fusion for Depression Detection: Integrating Facial Emotion Recognition, EEG Signals and Audio Cues
    Thirunavukkarasu, J.
    Jebamathi, Shiny M.
    Varshaa, P.
    Nisha, M.
    Sri, Nanthitha M.
    2024 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND APPLIED INFORMATICS, ACCAI 2024, 2024,
  • [42] A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images
    Siddiqui, Mohammad Faridul Haque
    Javaid, Ahmad Y.
    MULTIMODAL TECHNOLOGIES AND INTERACTION, 2020, 4 (03) : 1 - 21
  • [43] Efficient bimodal emotion recognition system based on speech/text embeddings and ensemble learning fusion
    Chakhtouna, Adil
    Sekkate, Sara
    Adib, Abdellah
    ANNALS OF TELECOMMUNICATIONS, 2025,
  • [44] FUSION APPROACHES FOR EMOTION RECOGNITION FROM SPEECH USING ACOUSTIC AND TEXT-BASED FEATURES
    Pepino, Leonardo
    Riera, Pablo
    Ferrer, Luciana
    Gravano, Agustin
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6484 - 6488
  • [45] Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem
    Sidorov, Maxim
    Sopov, Evgenii
    Ivanov, Ilia
    Minker, Wolfgang
    ICIMCO 2015 PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, VOL. 2, 2015, : 246 - 251
  • [46] Continuous Emotion Recognition with Audio-visual Leader-follower Attentive Fusion
    Zhang, Su
    Ding, Yi
    Wei, Ziquan
    Guan, Cuntai
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3560 - 3567
  • [47] Featural vs. Holistic processing and visual sampling in the influence of social category cues on emotion recognition
    Craig, Belinda M.
    Chen, Nigel T. M.
    Lipp, Ottmar V.
    COGNITION & EMOTION, 2022, 36 (05) : 855 - 875
  • [48] Fusion of thermal and visual images for efficient face recognition using Gabor filter
    Ahmad, Jahanzed
    Ali, Usman
    Qureshi, Rashid Jalal
    2006 IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1-3, 2006, : 135 - +
  • [49] Multi-grained visual pivot-guided multi-modal neural machine translation with text-aware cross-modal contrastive disentangling
    Guo, Junjun
    Su, Rui
    Ye, Junjie
    NEURAL NETWORKS, 2024, 178
  • [50] Length Uncertainty-Aware Graph Contrastive Fusion Network for multimodal physiological signal emotion recognition
    Li, Guangqiang
    Chen, Ning
    Zhu, Hongqing
    Li, Jing
    Xu, Zhangyong
    Zhu, Zhiying
    NEURAL NETWORKS, 2025, 187