Integrating gating and learned queries in audiovisual emotion recognition

Cited by: 0
Authors
Zhang, Zaifang [1 ]
Guo, Qing [1 ]
Lu, Shunlu [1 ]
Su, Junyi [1 ]
Tang, Tao [1 ]
Affiliations
[1] Shanghai Univ, Sch Mechatron Engn & Automat, 99 Shangda Rd, Shanghai 200444, Peoples R China
Keywords
Gating mechanism; Learned queries; Cross-modal fusion; Multimodal emotion recognition; Network
DOI
10.1007/s00530-024-01551-1
CLC number
TP [Automation Technology; Computer Technology]
Subject classification code
0812
Abstract
Emotion recognition, an important bridge in human-computer interaction, has attracted significant interest. Although numerous studies have made progress with auditory and visual information individually, effectively integrating the two modalities remains a significant challenge. This paper proposes an audiovisual emotion recognition model that achieves more accurate cross-modal fusion by introducing a private enhancement module (PEM) and a shared learned module (SLM). In the PEM, a token-level gating mechanism dynamically adjusts feature expression within each token, while the SLM employs learned queries to capture differences between modalities, enabling more precise cross-modal fusion. The model was rigorously evaluated on the CREMA-D and IEMOCAP datasets, demonstrating superior recognition performance compared with existing advanced emotion recognition models. Finally, ablation studies examine the roles and contributions of the individual modules.
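The two mechanisms named in the abstract, token-level gating and attention driven by learned queries, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the weight matrices, dimensions, and function names (`token_gate`, `query_attention`) are hypothetical stand-ins for trained parameters, and the sketch only shows the general shape of the computation: a sigmoid gate rescales each token's features (PEM-style private enhancement), and a small set of shared learned queries attends over each modality's tokens before the results are fused (SLM-style shared querying).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # feature dimension per token
T = 4          # tokens per modality
n_queries = 2  # shared learned queries

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_gate(tokens, W_g, b_g):
    """Token-level gating: a sigmoid gate in (0, 1) rescales each token."""
    gate = 1.0 / (1.0 + np.exp(-(tokens @ W_g + b_g)))  # shape (T, d)
    return gate * tokens

def query_attention(queries, tokens):
    """Learned queries attend over modality tokens (scaled dot-product)."""
    scores = queries @ tokens.T / np.sqrt(d)   # (n_queries, T)
    return softmax(scores, axis=-1) @ tokens   # (n_queries, d)

# Hypothetical parameters standing in for trained weights.
W_g, b_g = rng.normal(size=(d, d)), np.zeros(d)
queries = rng.normal(size=(n_queries, d))      # shared across modalities

audio = rng.normal(size=(T, d))
video = rng.normal(size=(T, d))

# Private enhancement per modality, then shared learned-query fusion.
audio_enh = token_gate(audio, W_g, b_g)
video_enh = token_gate(video, W_g, b_g)
fused = np.concatenate([query_attention(queries, audio_enh),
                        query_attention(queries, video_enh)], axis=-1)
print(fused.shape)  # prints (2, 16)
```

Because the gate lies in (0, 1), gating can only attenuate a token's features, which is what lets the network suppress uninformative tokens per modality; the shared queries then read out a fixed-size summary from each modality regardless of its token count.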
Pages: 11