Integrating gating and learned queries in audiovisual emotion recognition

Citations: 0
Authors
Zhang, Zaifang [1]
Guo, Qing [1]
Lu, Shunlu [1]
Su, Junyi [1]
Tang, Tao [1]
Affiliations
[1] Shanghai Univ, Sch Mechatron Engn & Automat, 99 Shangda Rd, Shanghai 200444, Peoples R China
Keywords
Gating mechanism; Learned queries; Cross-modal fusion; Multimodal emotion recognition; NETWORK
DOI
10.1007/s00530-024-01551-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Emotion recognition, an important bridge in human-computer interaction, has attracted significant interest. Although numerous studies have made progress in auditory and visual information, effective integration of these two modalities remains a significant challenge. This paper proposes an audiovisual emotion recognition model that achieves more accurate cross-modal fusion by introducing a private enhancement module (PEM) and a shared learned module (SLM). In PEM, a token-level gating mechanism dynamically adjusts feature expression within tokens, while SLM employs learned queries to effectively comprehend differences between modalities, achieving more precise cross-modal fusion. Our model underwent rigorous testing on CREMA-D and IEMOCAP datasets, demonstrating superior recognition capabilities in comparison to existing advanced emotion recognition models. Lastly, through ablation studies, this paper extensively examines the roles and contributions of different modules within the model.
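The two ideas named in the abstract, token-level gating and fusion through learned queries, can be illustrated with a minimal sketch. This is not the authors' PEM or SLM implementation, which this record does not describe in detail. It assumes a scalar sigmoid gate per token and a single-head scaled dot-product attention in which a small set of query vectors attends over the concatenated audio and visual tokens. All names (`token_gate`, `query_attention`, `w_audio`, `queries`) and all dimensions are hypothetical, and the random values stand in for parameters that would be learned in training.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_gate(tokens, w, b):
    """Scale each token by a scalar gate in (0, 1) computed from that token.

    Assumption: one gate value per token; a per-feature gate is equally plausible.
    """
    return [[sigmoid(dot(w, t) + b) * x for x in t] for t in tokens]


def query_attention(queries, tokens):
    """Each query attends over all tokens and returns a weighted average of them."""
    d = len(queries[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, t) / math.sqrt(d) for t in tokens])
        fused.append(
            [sum(wt * t[i] for wt, t in zip(weights, tokens)) for i in range(d)]
        )
    return fused


rng = random.Random(0)
d = 4  # hypothetical feature dimension


def rand_vec():
    return [rng.gauss(0, 1) for _ in range(d)]


audio = [rand_vec() for _ in range(3)]   # 3 audio tokens
visual = [rand_vec() for _ in range(5)]  # 5 visual tokens

# Stand-ins for learned parameters (random here, trained in a real model).
w_audio, w_visual = rand_vec(), rand_vec()
queries = [rand_vec() for _ in range(2)]  # 2 "learned" queries

# Modality-private step: gate each modality's tokens separately.
gated_audio = token_gate(audio, w_audio, 0.0)
gated_visual = token_gate(visual, w_visual, 0.0)

# Shared step: the same queries read from both modalities at once.
tokens = gated_audio + gated_visual
fused = query_attention(queries, tokens)
print(len(fused), len(fused[0]))
```

The fused output has one vector per query regardless of how many audio or visual tokens are present, which is one reason a fixed set of queries is a convenient interface between modalities of different lengths.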
Pages: 11