A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Times Cited: 1
Authors
Qi, Fan [1 ]
Zhang, Huaiwen [2 ,3 ]
Yang, Xiaoshan [4 ,5 ,6 ]
Xu, Changsheng [4 ,5 ,6 ]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
[2] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot 010021, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal emotion recognition; zero-shot learning; transformer; networks; model;
DOI
10.1109/TCSVT.2024.3362270
CLC Classification
TM [Electrical Technology]; TN [Electronics & Communication Technology];
Subject Classification
0808; 0809;
Abstract
Multimodal Emotion Recognition (MER) aims to identify various human emotions from heterogeneous modalities. As emotion theories evolve, increasingly novel and fine-grained concepts are introduced to describe human emotional states, so real-world recognition systems often encounter emotion labels unseen during training. To address this challenge, we propose a versatile zero-shot MER framework that refines emotion label embeddings to capture inter-label relationships and improve discrimination between labels. We integrate prior knowledge into a novel affective graph space that generates tailored label embeddings. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning; these components are then hierarchically fused with a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations conditioned on the emotion embeddings. We conduct extensive experiments with different modality combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. The results demonstrate the superiority of our framework over state-of-the-art methods.
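To make the abstract's adversarial disentanglement step concrete, the sketch below illustrates one common way to realize it in PyTorch: each modality is split into an egocentric (modality-specific) and an altruistic (modality-shared) component, and a modality discriminator with a gradient-reversal layer pushes the shared components toward modality invariance. All module names, dimensions, and the gradient-reversal trick are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of adversarial feature disentanglement, assuming a
# gradient-reversal layer; hypothetical names and dimensions throughout.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates the gradient in the backward
    pass, so the encoders learn to fool the modality discriminator."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output


class DisentangleEncoder(nn.Module):
    """Splits one modality's features into an egocentric (modality-specific)
    and an altruistic (modality-shared) component."""
    def __init__(self, in_dim: int, hid_dim: int = 128):
        super().__init__()
        self.egocentric = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.altruistic = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())

    def forward(self, x):
        return self.egocentric(x), self.altruistic(x)


# Dummy visual and acoustic features (batch of 8); dimensions are assumptions.
visual_enc = DisentangleEncoder(in_dim=512)
acoustic_enc = DisentangleEncoder(in_dim=128)
discriminator = nn.Linear(128, 2)  # predicts which of the 2 modalities

v = torch.randn(8, 512)
a = torch.randn(8, 128)
v_ego, v_alt = visual_enc(v)
a_ego, a_alt = acoustic_enc(a)

# The discriminator classifies the modality of each shared feature; the
# reversed gradient makes the altruistic space modality-invariant.
shared = torch.cat([v_alt, a_alt], dim=0)
labels = torch.cat([torch.zeros(8), torch.ones(8)]).long()
adv_loss = nn.functional.cross_entropy(
    discriminator(GradReverse.apply(shared)), labels)
adv_loss.backward()  # encoders receive the adversarial (negated) gradient
```

In this reading, the hybrid co-attention fusion and the emotion-guided decoder described in the abstract would operate downstream on the egocentric and altruistic components produced here.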
Pages: 5728 - 5741
Number of Pages: 14
Related Papers
Total: 50 records
  • [21] Generalized zero-shot emotion recognition from body gestures
    Wu, Jinting; Zhang, Yujia; Sun, Shiying; Li, Qianzhong; Zhao, Xiaoguang
    Applied Intelligence, 2022, 52: 8616-8634
  • [22] Discriminative Learning of Latent Features for Zero-Shot Recognition
    Li, Yan; Zhang, Junge; Zhang, Jianguo; Huang, Kaiqi
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7463-7471
  • [23] Zero-shot recognition with latent visual attributes learning
    Xie, Yurui; He, Xiaohai; Zhang, Jing; Luo, Xiaodong
    Multimedia Tools and Applications, 2020, 79(37-38): 27321-27335
  • [24] Hierarchical Prompt Learning for Compositional Zero-Shot Recognition
    Wang, Henan; Yang, Muli; Wei, Kun; Deng, Cheng
    Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023: 1470-1478
  • [25] Hyperbolic Visual Embedding Learning for Zero-Shot Recognition
    Liu, Shaoteng; Chen, Jingjing; Pan, Liangming; Ngo, Chong-Wah; Chua, Tat-Seng; Jiang, Yu-Gang
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 9270-9278
  • [26] Extreme Reverse Projection Learning for Zero-Shot Recognition
    Guan, Jiechao; Zhao, An; Lu, Zhiwu
    Computer Vision - ACCV 2018, Pt I, 2019, 11361: 125-141
  • [27] A Generalized Zero-Shot Deep Learning Classifier for Emotion Recognition Using Facial Expression Images
    Bhati, Vishal Singh; Tiwari, Namita; Chawla, Meenu
    IEEE Access, 2025, 13: 18687-18700
  • [28] Human Motion Recognition Using Zero-Shot Learning
    Mohammadi, Farid Ghareh; Imteaj, Ahmed; Amini, M. Hadi; Arabnia, Hamid R.
    Advances in Artificial Intelligence and Applied Cognitive Computing, 2021: 171-181
  • [29] Dissimilarity Representation Learning for Generalized Zero-Shot Recognition
    Yang, Gang; Liu, Jinlu; Xu, Jieping; Li, Xirong
    Proceedings of the 2018 ACM Multimedia Conference (MM'18), 2018: 2032-2039
  • [30] Joint Projection and Subspace Learning for Zero-Shot Recognition
    Liu, Guangzhen; Guan, Jiechao; Zhang, Manli; Zhang, Jianhong; Wang, Zihao; Lu, Zhiwu
    2019 IEEE International Conference on Multimedia and Expo (ICME), 2019: 1228-1233