Integrating gating and learned queries in audiovisual emotion recognition

Cited by: 0

Authors:

Zhang, Zaifang [1]

Guo, Qing [1]

Lu, Shunlu [1]

Su, Junyi [1]

Tang, Tao [1]

Affiliations:

[1] Shanghai Univ, Sch Mechatron Engn & Automat, 99 Shangda Rd, Shanghai 200444, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2024 / Vol. 30 / Issue 06

Keywords:

Gating mechanism; Learned queries; Cross-modal fusion; Multimodal emotion recognition; NETWORK

DOI:

10.1007/s00530-024-01551-1

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Emotion recognition, an important bridge in human-computer interaction, has attracted significant interest. Although numerous studies have made progress in auditory and visual information, effective integration of these two modalities remains a significant challenge. This paper proposes an audiovisual emotion recognition model that achieves more accurate cross-modal fusion by introducing a private enhancement module (PEM) and a shared learned module (SLM). In PEM, a token-level gating mechanism dynamically adjusts feature expression within tokens, while SLM employs learned queries to effectively comprehend differences between modalities, achieving more precise cross-modal fusion. Our model underwent rigorous testing on CREMA-D and IEMOCAP datasets, demonstrating superior recognition capabilities in comparison to existing advanced emotion recognition models. Lastly, through ablation studies, this paper extensively examines the roles and contributions of different modules within the model.
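The abstract names two mechanisms without giving their equations: a token-level gate inside each modality and a set of learned queries that attend across modalities. The sketch below illustrates the general idea only, in plain Python. It is not the paper's PEM or SLM. The function names (`token_gate`, `query_attention`), the sigmoid gate form, the single-head dot-product attention, and all dimensions are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def token_gate(tokens, W, b):
    # Hypothetical token-level gate: each token t is rescaled element-wise
    # by sigmoid(W t + b), so the gate values depend on the token itself.
    gated = []
    for t in tokens:
        g = [sigmoid(z + bi) for z, bi in zip(matvec(W, t), b)]
        gated.append([gi * ti for gi, ti in zip(g, t)])
    return gated


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_attention(queries, tokens):
    # Each query attends over all tokens with scaled dot-product attention
    # (keys and values are the tokens themselves in this simplified sketch).
    d = len(tokens[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d) for t in tokens]
        weights = softmax(scores)
        fused.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return fused


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


d = 4                                   # assumed feature dimension
audio = rand_matrix(6, d)               # 6 audio tokens (placeholder features)
video = rand_matrix(3, d)               # 3 video tokens (placeholder features)
W_a, W_v = rand_matrix(d, d), rand_matrix(d, d)
b = [0.0] * d
queries = rand_matrix(2, d)             # 2 shared queries; random here, learned in practice

audio_g = token_gate(audio, W_a, b)     # per-modality gating
video_g = token_gate(video, W_v, b)
fused = query_attention(queries, audio_g + video_g)  # queries read both modalities
print(len(fused), len(fused[0]))
```

In this sketch the number of fused vectors equals the number of queries, regardless of how many audio or video tokens are supplied, which is one common motivation for query-based fusion.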

Pages: 11

