Integrating gating and learned queries in audiovisual emotion recognition

Cited by: 0
Authors
Zhang, Zaifang [1 ]
Guo, Qing [1 ]
Lu, Shunlu [1 ]
Su, Junyi [1 ]
Tang, Tao [1 ]
Affiliations
[1] Shanghai Univ, Sch Mechatron Engn & Automat, 99 Shangda Rd, Shanghai 200444, Peoples R China
Keywords
Gating mechanism; Learned queries; Cross-modal fusion; Multimodal emotion recognition; Network
DOI
10.1007/s00530-024-01551-1
CLC number
TP [Automation Technology; Computer Technology]
Subject classification code
0812
Abstract
Emotion recognition, an important bridge in human-computer interaction, has attracted significant interest. Although numerous studies have made progress with auditory and visual information individually, effectively integrating the two modalities remains a significant challenge. This paper proposes an audiovisual emotion recognition model that achieves more accurate cross-modal fusion by introducing a private enhancement module (PEM) and a shared learned module (SLM). In the PEM, a token-level gating mechanism dynamically adjusts feature expression within each token, while the SLM employs learned queries to capture differences between modalities, enabling more precise cross-modal fusion. The model was rigorously evaluated on the CREMA-D and IEMOCAP datasets, demonstrating superior recognition performance compared with existing advanced emotion recognition models. Finally, ablation studies examine the roles and contributions of the individual modules.
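The two mechanisms named in the abstract, token-level gating and attention driven by learned queries, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the weight matrices, dimensions, and function names (`token_gate`, `query_attention`) are hypothetical stand-ins for trained parameters, and the sketch only shows the general shape of the computation: a sigmoid gate rescales each token's features (PEM-style private enhancement), and a small set of shared learned queries attends over each modality's tokens before the results are fused (SLM-style shared querying).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # feature dimension per token
T = 4          # tokens per modality
n_queries = 2  # shared learned queries

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_gate(tokens, W_g, b_g):
    """Token-level gating: a sigmoid gate in (0, 1) rescales each token."""
    gate = 1.0 / (1.0 + np.exp(-(tokens @ W_g + b_g)))  # shape (T, d)
    return gate * tokens

def query_attention(queries, tokens):
    """Learned queries attend over modality tokens (scaled dot-product)."""
    scores = queries @ tokens.T / np.sqrt(d)   # (n_queries, T)
    return softmax(scores, axis=-1) @ tokens   # (n_queries, d)

# Hypothetical parameters standing in for trained weights.
W_g, b_g = rng.normal(size=(d, d)), np.zeros(d)
queries = rng.normal(size=(n_queries, d))      # shared across modalities

audio = rng.normal(size=(T, d))
video = rng.normal(size=(T, d))

# Private enhancement per modality, then shared learned-query fusion.
audio_enh = token_gate(audio, W_g, b_g)
video_enh = token_gate(video, W_g, b_g)
fused = np.concatenate([query_attention(queries, audio_enh),
                        query_attention(queries, video_enh)], axis=-1)
print(fused.shape)  # prints (2, 16)
```

Because the gate lies in (0, 1), gating can only attenuate a token's features, which is what lets the network suppress uninformative tokens per modality; the shared queries then read out a fixed-size summary from each modality regardless of its token count.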
Pages: 11