Integrating gating and learned queries in audiovisual emotion recognition

Cited by: 0
Authors
Zhang, Zaifang [1 ]
Guo, Qing [1 ]
Lu, Shunlu [1 ]
Su, Junyi [1 ]
Tang, Tao [1 ]
Affiliation
[1] Shanghai Univ, Sch Mechatron Engn & Automat, 99 Shangda Rd, Shanghai 200444, Peoples R China
Keywords
Gating mechanism; Learned queries; Cross-modal fusion; Multimodal emotion recognition; Network
DOI
10.1007/s00530-024-01551-1
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Emotion recognition, an important bridge in human-computer interaction, has attracted significant interest. Although numerous studies have made progress on the auditory and visual modalities individually, effectively integrating the two remains a significant challenge. This paper proposes an audiovisual emotion recognition model that achieves more accurate cross-modal fusion by introducing a private enhancement module (PEM) and a shared learned module (SLM). In the PEM, a token-level gating mechanism dynamically adjusts the feature expression within each token, while the SLM employs learned queries to capture differences between modalities, achieving more precise cross-modal fusion. The model was rigorously evaluated on the CREMA-D and IEMOCAP datasets, demonstrating superior recognition performance compared with existing advanced emotion recognition models. Finally, ablation studies examine the roles and contributions of the individual modules.
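The abstract does not spell out the internals of the PEM or SLM, but the two mechanisms it names are standard building blocks. The following is a minimal NumPy sketch, under stated assumptions: a token-level sigmoid gate that rescales each token's features (PEM-style), and a set of shared learned queries that attend over the concatenated audio and video tokens (SLM-style). All function names, dimensions, and parameter shapes here are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_gate(tokens, W, b):
    """Token-level gating (PEM-style sketch): a sigmoid gate computed
    from each token rescales that token's own features."""
    g = 1.0 / (1.0 + np.exp(-(tokens @ W + b)))  # (T, d) gate values in (0, 1)
    return g * tokens

def learned_query_fusion(queries, audio_tokens, video_tokens):
    """Learned-query cross-attention (SLM-style sketch): shared trainable
    queries attend jointly over audio and video tokens, producing a
    fixed-size fused representation regardless of sequence lengths."""
    kv = np.concatenate([audio_tokens, video_tokens], axis=0)  # (Ta+Tv, d)
    attn = softmax(queries @ kv.T / np.sqrt(kv.shape[-1]))     # (Q, Ta+Tv)
    return attn @ kv                                           # (Q, d)

# Toy shapes: 5 audio tokens, 7 video tokens, 4 learned queries, width 16.
d, Ta, Tv, Q = 16, 5, 7, 4
audio = rng.standard_normal((Ta, d))
video = rng.standard_normal((Tv, d))
W, b = 0.1 * rng.standard_normal((d, d)), np.zeros(d)
queries = rng.standard_normal((Q, d))

fused = learned_query_fusion(queries,
                             token_gate(audio, W, b),
                             token_gate(video, W, b))
print(fused.shape)  # (4, 16): one fused vector per learned query
```

Because the learned queries are shared across modalities, the fused output has a fixed number of tokens (Q) however long the audio and video streams are, which is what makes this style of fusion convenient for a downstream classifier head.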
Pages: 11
Related papers
50 items total
  • [31] Differential Audiovisual Information Processing in Emotion Recognition: An Eye-Tracking Study
    Zheng, Yueyuan
    Hsiao, Janet H.
    EMOTION, 2023, 23 (04) : 1028 - 1039
  • [32] A multimodal emotion recognition model integrating speech, video and MoCAP
    Jia, Ning
    Zheng, Chunjun
    Sun, Wei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (22): 32265 - 32286
  • [33] Integrating Emotion Recognition Tools for Developing Emotionally Intelligent Agents
    Marcos-Pablos, Samuel
    Lobato, Fernando
    Garcia-Penalvo, Francisco
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2022, 7 (06): : 69 - 76
  • [34] Integrating Facial Expression and Body Gesture in Videos for Emotion Recognition
    Yan, Jingjie
    Zheng, Wenming
    Xin, Minhai
    Yan, Jingwei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (03): : 610 - 613
  • [36] Exploiting IoT Services by Integrating Emotion Recognition in Web of Objects
    Jarwar, Muhammad Aslam
    Chong, Ilyoung
    2017 31ST INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN), 2017, : 54 - 56
  • [37] An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets
    Vielzeuf, Valentin
    Kervadec, Corentin
    Pateux, Stephane
    Lechervy, Alexis
    Jurie, Frederic
    ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018, : 589 - 593
  • [38] Feature-level and Model-level Audiovisual Fusion for Emotion Recognition in the Wild
    Cai, Jie
    Meng, Zibo
    Khan, Ahmed Shehab
    Li, Zhiyuan
    O'Reilly, James
    Han, Shizhong
    Liu, Ping
    Chen, Min
    Tong, Yan
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 443 - 448
  • [39] Pamphlets against emotion and audiovisual
    Vizcaino-Alcantud, Pablo
    ALPHA-REVISTA DE ARTES LETRAS Y FILOSOFIA, 2021, (53): : 349 - 349
  • [40] An Emotion Recognition Method Based on Eye Movement and Audiovisual Features in MOOC Learning Environment
    Bao, Jindi
    Tao, Xiaomei
    Zhou, Yinghui
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (01) : 171 - 183