A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Cited by: 1
Authors
Qi, Fan [1 ]
Zhang, Huaiwen [2 ,3 ]
Yang, Xiaoshan [4 ,5 ,6 ]
Xu, Changsheng [4 ,5 ,6 ]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
[2] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot 010021, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal emotion recognition; zero-shot learning; transformer; NETWORKS; MODEL;
DOI
10.1109/TCSVT.2024.3362270
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline codes
0808; 0809
Abstract
Multimodal Emotion Recognition (MER) aims to identify diverse human emotions from heterogeneous modalities. As emotion theories have developed, increasingly novel and fine-grained concepts have emerged to describe human emotional states, so real-world recognition systems often encounter emotion labels unseen during training. To address this challenge, we propose a versatile zero-shot MER framework that refines emotion label embeddings to capture inter-label relationships and improve discrimination between labels. We integrate prior knowledge into a novel affective graph space that generates tailored label embeddings capturing these inter-label relationships. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning; these components are then hierarchically fused with a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations conditioned on the emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. The results demonstrate the superiority of the proposed framework over state-of-the-art methods.
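The core zero-shot mechanism the abstract describes, matching a fused multimodal representation against learned emotion label embeddings so that unseen labels can still be scored, can be illustrated with a minimal sketch. This is not the paper's implementation: the label names, embedding values, and the plain cosine-similarity scorer are all illustrative stand-ins for the affective-graph embeddings and emotion-guided decoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def zero_shot_predict(fused, label_embeddings):
    """Score a fused multimodal representation against every emotion
    label embedding (seen or unseen) and return the best match."""
    scores = {name: cosine(fused, emb) for name, emb in label_embeddings.items()}
    return max(scores, key=scores.get), scores

# Toy 4-d label embeddings; "awe" plays the role of a label unseen in training.
label_embeddings = {
    "joy":   [1.0, 0.0, 0.0, 0.0],
    "anger": [0.0, 1.0, 0.0, 0.0],
    "awe":   [0.7, 0.0, 0.7, 0.0],
}
fused = [0.9, 0.1, 0.6, 0.0]  # hypothetical fused visual-acoustic feature
pred, scores = zero_shot_predict(fused, label_embeddings)
```

Because classification reduces to similarity in the shared embedding space, the unseen label "awe" can win over the seen labels whenever the fused feature lies closer to its embedding, which is the essence of the zero-shot setting.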
Pages: 5728-5741 (14 pages)
Related papers
50 records in total
  • [1] Multimodal zero-shot learning for tactile texture recognition
    Cao, Guanqun
    Jiang, Jiaqi
    Bollegala, Danushka
    Li, Min
    Luo, Shan
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2024, 176
  • [2] Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition
    Xu, Xinzhou
    Deng, Jun
    Cummins, Nicholas
    Zhang, Zixing
    Zhao, Li
    Schuller, Bjorn W.
    INTERSPEECH 2019, 2019, : 949 - 953
  • [3] A review on multimodal zero-shot learning
    Cao, Weipeng
    Wu, Yuhao
    Sun, Yixuan
    Zhang, Haigang
    Ren, Jin
    Gu, Dujuan
    Wang, Xingkai
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (02)
  • [4] Novel multimodal contrast learning framework using zero-shot prediction for abnormal behavior recognition
    Liu, Hai Chuan
    Khairuddin, Anis Salwa Mohd
    Chuah, Joon Huang
    Zhao, Xian Min
    Wang, Xiao Dan
    Fang, Li Ming
    Kong, Si Bo
    APPLIED INTELLIGENCE, 2025, 55 (02)
  • [5] Zero-shot Learning Using Multimodal Descriptions
    Mall, Utkarsh
    Hariharan, Bharath
    Bala, Kavita
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 3930 - 3938
  • [6] Zero-Shot Visual Emotion Recognition by Exploiting BERT
    Kang, Hyunwook
    Hazarika, Devamanyu
    Kim, Dongho
    Kim, Jihie
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2023, 543 : 485 - 494
  • [7] An Adversarial Learning Framework for Zero-shot Fault Recognition of Mechanical Systems
    Chen, Jinglong
    Pan, Tongyang
    Zhou, Zitong
    He, Shuilong
    2019 IEEE 17TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2019, : 1275 - 1278
  • [8] ZeroEVNet: A multimodal zero-shot learning framework for scalable emergency vehicle detection
    Ravi, Reeta
    Kanniappan, Jayashree
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 275
  • [9] Zero-shot Video Emotion Recognition via Multimodal Protagonist-aware Transformer Network
    Qi, Fan
    Yang, Xiaoshan
    Xu, Changsheng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1074 - 1083
  • [10] Integrative zero-shot learning for fruit recognition
    Tran-Anh, Dat
    Huu, Quynh Nguyen
    Bui-Quoc, Bao
    Hoang, Ngan Dao
    Quoc, Tao Ngo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (29) : 73191 - 73213