Integrating gating and learned queries in audiovisual emotion recognition

Cited by: 0
Authors
Zhang, Zaifang [1 ]
Guo, Qing [1 ]
Lu, Shunlu [1 ]
Su, Junyi [1 ]
Tang, Tao [1 ]
Affiliation
[1] Shanghai Univ, Sch Mechatron Engn & Automat, 99 Shangda Rd, Shanghai 200444, Peoples R China
Keywords
Gating mechanism; Learned queries; Cross-modal fusion; Multimodal emotion recognition; Network
DOI
10.1007/s00530-024-01551-1
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Emotion recognition, an important bridge in human-computer interaction, has attracted significant interest. Although numerous studies have made progress on the auditory and visual modalities individually, effectively integrating the two remains a significant challenge. This paper proposes an audiovisual emotion recognition model that achieves more accurate cross-modal fusion by introducing a private enhancement module (PEM) and a shared learned module (SLM). In the PEM, a token-level gating mechanism dynamically adjusts the feature expression within each token, while the SLM employs learned queries to capture differences between modalities, achieving more precise cross-modal fusion. The model was rigorously evaluated on the CREMA-D and IEMOCAP datasets, demonstrating superior recognition performance compared with existing advanced emotion recognition models. Finally, ablation studies examine the roles and contributions of the individual modules.
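The abstract does not spell out the internals of the PEM or SLM, but the two mechanisms it names are standard building blocks. The following is a minimal NumPy sketch, under stated assumptions: a token-level sigmoid gate that rescales each token's features (PEM-style), and a set of shared learned queries that attend over the concatenated audio and video tokens (SLM-style). All function names, dimensions, and parameter shapes here are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_gate(tokens, W, b):
    """Token-level gating (PEM-style sketch): a sigmoid gate computed
    from each token rescales that token's own features."""
    g = 1.0 / (1.0 + np.exp(-(tokens @ W + b)))  # (T, d) gate values in (0, 1)
    return g * tokens

def learned_query_fusion(queries, audio_tokens, video_tokens):
    """Learned-query cross-attention (SLM-style sketch): shared trainable
    queries attend jointly over audio and video tokens, producing a
    fixed-size fused representation regardless of sequence lengths."""
    kv = np.concatenate([audio_tokens, video_tokens], axis=0)  # (Ta+Tv, d)
    attn = softmax(queries @ kv.T / np.sqrt(kv.shape[-1]))     # (Q, Ta+Tv)
    return attn @ kv                                           # (Q, d)

# Toy shapes: 5 audio tokens, 7 video tokens, 4 learned queries, width 16.
d, Ta, Tv, Q = 16, 5, 7, 4
audio = rng.standard_normal((Ta, d))
video = rng.standard_normal((Tv, d))
W, b = 0.1 * rng.standard_normal((d, d)), np.zeros(d)
queries = rng.standard_normal((Q, d))

fused = learned_query_fusion(queries,
                             token_gate(audio, W, b),
                             token_gate(video, W, b))
print(fused.shape)  # (4, 16): one fused vector per learned query
```

Because the learned queries are shared across modalities, the fused output has a fixed number of tokens (Q) however long the audio and video streams are, which is what makes this style of fusion convenient for a downstream classifier head.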
Pages: 11
Related papers
50 items total
  • [31] Differential Audiovisual Information Processing in Emotion Recognition: An Eye-Tracking Study
    Zheng, Yueyuan
    Hsiao, Janet H.
    EMOTION, 2023, 23 (04) : 1028 - 1039
  • [32] A multimodal emotion recognition model integrating speech, video and MoCAP
    Jia, Ning
    Zheng, Chunjun
    Sun, Wei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (22): 32265 - 32286
  • [33] Integrating Emotion Recognition Tools for Developing Emotionally Intelligent Agents
    Marcos-Pablos, Samuel
    Lobato, Fernando
    Garcia-Penalvo, Francisco
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2022, 7 (06): : 69 - 76
  • [34] Integrating Facial Expression and Body Gesture in Videos for Emotion Recognition
    Yan, Jingjie
    Zheng, Wenming
    Xin, Minhai
    Yan, Jingwei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (03): : 610 - 613
  • [36] Exploiting IoT Services by Integrating Emotion Recognition in Web of Objects
    Jarwar, Muhammad Aslam
    Chong, Ilyoung
    2017 31ST INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN), 2017, : 54 - 56
  • [37] An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets
    Vielzeuf, Valentin
    Kervadec, Corentin
    Pateux, Stephane
    Lechervy, Alexis
    Jurie, Frederic
    ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018, : 589 - 593
  • [38] Feature-level and Model-level Audiovisual Fusion for Emotion Recognition in the Wild
    Cai, Jie
    Meng, Zibo
    Khan, Ahmed Shehab
    Li, Zhiyuan
    O'Reilly, James
    Han, Shizhong
    Liu, Ping
    Chen, Min
    Tong, Yan
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 443 - 448
  • [39] Pamphlets against emotion and audiovisual
    Vizcaino-Alcantud, Pablo
    ALPHA-REVISTA DE ARTES LETRAS Y FILOSOFIA, 2021, (53): : 349 - 349
  • [40] An Emotion Recognition Method Based on Eye Movement and Audiovisual Features in MOOC Learning Environment
    Bao, Jindi
    Tao, Xiaomei
    Zhou, Yinghui
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (01) : 171 - 183