Beyond visual cues: Emotion recognition in images with text-aware fusion

Cited by: 1
Authors
Sungur, Kerim Serdar [1 ]
Bakal, Gokhan [1 ]
Institution
[1] Abdullah Gul Univ, Dept Comp Engn, Erkilet Blvd Sumer Campus, TR-38080 Kayseri, Turkiye
Keywords
Sentiment analysis; Hybrid model; Image & text processing; Deep learning
DOI
10.1016/j.displa.2024.102958
CLC number
TP3 [computing technology, computer technology]
Subject classification code
0812
Abstract
Sentiment analysis is widely studied for understanding human emotions and their potential outcomes. While it is typically performed on textual data, analyzing visual data is equally important for assessing a subject's current emotional state. This work investigates whether sentiment analysis predictions on images can be improved by integrating textual data as additional knowledge reflecting the contextual information of the images. To that end, two separate models were developed, an image-processing model and a text-processing model, each trained on a distinct dataset covering the same five human emotions. The outputs of each model's last dense layer are then concatenated to construct a hybrid multimodal model driven by both visual and textual components. The central goal is to evaluate the performance of this hybrid model, in which textual knowledge is fused with visual data. The hybrid model achieved nearly a 3% F1-score improvement over a plain image classification model based on a convolutional neural network architecture. This research underscores the value of fusing textual context with visual information to refine sentiment analysis predictions. The findings highlight the potential of a multimodal approach and point to a promising avenue for future advances in emotion analysis and understanding.
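The fusion step the abstract describes (concatenating the two unimodal models' last dense-layer outputs and classifying over five emotions) is a form of late fusion. A minimal numpy sketch is given below; the feature dimensions, random weights, and the linear five-class head are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_and_classify(img_feat, txt_feat, W, b):
    """Late fusion: concatenate unimodal features, then apply a
    dense 5-way classification head (weights W, bias b)."""
    fused = np.concatenate([img_feat, txt_feat])
    return softmax(W @ fused + b)

# Stand-ins for the last dense-layer outputs of each unimodal model
# (dimensions are assumptions for illustration only).
img_feat = rng.standard_normal(128)   # image-model features
txt_feat = rng.standard_normal(64)    # text-model features
W = rng.standard_normal((5, 128 + 64)) * 0.01
b = np.zeros(5)

probs = fuse_and_classify(img_feat, txt_feat, W, b)
print(probs)  # probability over the five emotion classes
```

In practice the head's weights would be trained jointly on top of the frozen or fine-tuned unimodal encoders; the sketch only shows the shape of the fusion.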
Pages: 8
Related papers
50 items total
  • [1] TACMT: Text-aware cross-modal transformer for visual grounding on high-resolution SAR images
    Li, Tianyang
    Wang, Chao
    Tian, Sirui
    Zhang, Bo
    Wu, Fan
    Tang, Yixian
    Zhang, Hong
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2025, 222 : 152 - 166
  • [2] Context-Aware Based Visual-Audio Feature Fusion for Emotion Recognition
    Cheng, Huijie
    Tie, Yun
    Qi, Lin
    Jin, Cong
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [3] Emotion recognition based on joint visual and audio cues
    Sebe, Nicu
    Cohen, Ira
    Gevers, Theo
    Huang, Thomas S.
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 1136 - +
  • [4] Context-aware Multimodal Fusion for Emotion Recognition
    Li, Jinchao
    Wang, Shuai
    Chao, Yang
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2022, 2022, : 2013 - 2017
  • [5] Robust face recognition by fusion of visual and infrared cues
    Kim, Sang-ki
    Lee, Hyobin
    Yu, Sunjin
    Lee, Sangyoun
    2006 1ST IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS, VOLS 1-3, 2006, : 1594 - +
  • [6] Robust face recognition by fusion of visual and infrared cues
    Kim, Sang-ki
    Lee, Hyobin
    Yu, Sunjin
    Lee, Sangyoun
    ICIEA 2006: 1ST IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS, VOLS 1-3, PROCEEDINGS, 2006, : 804 - 808
  • [7] Hierarchical fusion of visual and physiological signals for emotion recognition
    Fang, Yuchun
    Rong, Ruru
    Huang, Jun
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2021, 32 (04) : 1103 - 1121
  • [8] Kernel Fusion of Audio and Visual Information for Emotion Recognition
    Wang, Yongjin
    Zhang, Rui
    Guan, Ling
    Venetsanopoulos, A. N.
    IMAGE ANALYSIS AND RECOGNITION: 8TH INTERNATIONAL CONFERENCE, ICIAR 2011, PT II: 8TH INTERNATIONAL CONFERENCE, ICIAR 2011, 2011, 6754 : 140 - 150
  • [9] Physio-visual data fusion for emotion recognition
    Maaoui, C.
    Abdat, F.
    Pruski, A.
    IRBM, 2014, 35 (03) : 109 - 118
  • [10] Hierarchical fusion of visual and physiological signals for emotion recognition
    Yuchun Fang
    Ruru Rong
    Jun Huang
    Multidimensional Systems and Signal Processing, 2021, 32 : 1103 - 1121