Deep CNN with late fusion for real time multimodal emotion recognition

Cited by: 4
Authors
Dixit, Chhavi [1 ]
Satapathy, Shashank Mouli [2 ]
Affiliations
[1] Shell India Markets Pvt Ltd, Bengaluru 560103, Karnataka, India
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Keywords
CNN; Cross dataset; Ensemble learning; FastText; Multimodal emotion recognition; Stacking; SENTIMENT ANALYSIS; MODEL;
DOI
10.1016/j.eswa.2023.122579
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Emotion recognition is a fundamental aspect of human communication and plays a crucial role in various domains. This project aims to develop an efficient model for real-time multimodal emotion recognition in videos of human oration (opinion videos), in which speakers express their opinions on various topics. Four separate datasets are used, contributing 20,000 samples for text, 1,440 for audio, 35,889 for images, and 3,879 videos for multimodal analysis. One model is trained per modality: fastText for text analysis, chosen for its efficiency, robustness to noise, and pre-trained embeddings; a customized 1-D CNN for audio analysis, exploiting translation invariance, hierarchical feature extraction, scalability, and generalization; and a custom 2-D CNN for image analysis, for its ability to capture local features and handle variations in image content. The models are tested and combined on the CMU-MOSEI dataset using both bagging and stacking to find the most effective architecture, and are then used for real-time analysis of speeches. Each model is trained on 80% of its dataset; the remaining 20% is used for testing individual and combined accuracies on CMU-MOSEI. The emotions predicted by the architecture correspond to the six classes in the CMU-MOSEI dataset. This cross-dataset training and testing makes the models robust and efficient for general use, removes reliance on a specific domain or dataset, and adds more data points for model training. The proposed architecture achieved an accuracy of 85.85% and an F1-score of 83 on the CMU-MOSEI dataset.
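The abstract describes a late-fusion design: each modality's classifier emits class probabilities, which are then combined before the final decision. A minimal sketch of one common combination rule, weighted soft voting, is shown below; the six class names follow CMU-MOSEI's emotion labels, but the probability vectors, weights, and function names are illustrative assumptions, not values or code from the paper (the paper itself combines models via bagging and stacking).

```python
# Hedged sketch of late fusion over three per-modality classifiers.
# The probability vectors and equal weights are illustrative only.

EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]

def late_fusion(per_modality_probs, weights=None):
    """Weighted soft-voting fusion of per-modality class probabilities."""
    n = len(per_modality_probs)
    if weights is None:
        weights = [1.0 / n] * n          # equal weight per modality
    fused = [0.0] * len(EMOTIONS)
    for probs, w in zip(per_modality_probs, weights):
        for i, p in enumerate(probs):
            fused[i] += w * p
    total = sum(fused)                   # renormalise for safety
    return [f / total for f in fused]

# Hypothetical outputs from the text, audio, and image models.
text_p  = [0.60, 0.10, 0.10, 0.05, 0.05, 0.10]
audio_p = [0.40, 0.30, 0.10, 0.05, 0.05, 0.10]
image_p = [0.70, 0.05, 0.05, 0.05, 0.05, 0.10]

fused = late_fusion([text_p, audio_p, image_p])
predicted = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(predicted)  # "happiness" for these illustrative inputs
```

A stacking variant, as used in the paper, would instead feed the concatenated per-modality probabilities into a trained meta-classifier rather than a fixed weighted sum.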
Pages: 15
Related Papers (50 records)
  • [1] Real-time music emotion recognition based on multimodal fusion
    Hao, Xingye
    Li, Honghe
    Wen, Yonggang
    ALEXANDRIA ENGINEERING JOURNAL, 2025, 116 : 586 - 600
  • [2] Real-time fear emotion recognition in mice based on multimodal data fusion
    Wang, Hao
    Shi, Zhanpeng
    Hu, Ruijie
    Wang, Xinyi
    Chen, Jian
    Che, Haoyuan
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [3] Deep Feature Extraction and Attention Fusion for Multimodal Emotion Recognition
    Yang, Zhiyi
    Li, Dahua
    Hou, Fazheng
    Song, Yu
    Gao, Qiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (03) : 1526 - 1530
  • [4] Robust CNN for facial emotion recognition and real-time GUI
    Ali I.
    Ghaffar F.
    AIMS Electronics and Electrical Engineering, 2024, 8 (02): : 217 - 236
  • [5] Emotion Recognition and Classification of Film Reviews Based on Deep Learning and Multimodal Fusion
    Na, Risu
    Sun, Ning
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [6] Feature Fusion for Multimodal Emotion Recognition Based on Deep Canonical Correlation Analysis
    Zhang, Ke
    Li, Yuanqing
    Wang, Jingyu
    Wang, Zhen
    Li, Xuelong
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1898 - 1902
  • [7] An early fusion approach for multimodal emotion recognition using deep recurrent networks
    Bucur, Beniamin
    Somfelean, Iulia
    Ghiurutan, Alexandru
    Lemnaru, Camelia
    Dinsoreanu, Mihaela
    2018 IEEE 14TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP), 2018, : 71 - 78
  • [8] Data Fusion for Real-time Multimodal Emotion Recognition through Webcams and Microphones in E-Learning
    Bahreini, Kiavash
    Nadolski, Rob
    Westera, Wim
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2016, 32 (05) : 415 - 430
  • [9] MULTIMODAL TRANSFORMER FUSION FOR CONTINUOUS EMOTION RECOGNITION
    Huang, Jian
    Tao, Jianhua
    Liu, Bin
    Lian, Zheng
    Niu, Mingyue
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3507 - 3511
  • [10] Multimodal Emotion Recognition Based on Feature Fusion
    Xu, Yurui
    Wu, Xiao
    Su, Hang
    Liu, Xiaorui
    2022 INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2022), 2022, : 7 - 11