A novel transformer autoencoder for multi-modal emotion recognition with incomplete data

Cited: 0
Authors
Cheng, Cheng [1 ]
Liu, Wenzhe [2 ]
Fan, Zhaoxin [3 ]
Feng, Lin [1 ]
Jia, Ziyu [4 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Technol, Dalian, Peoples R China
[2] Huzhou Univ, Sch Informat Engn, Huzhou, Peoples R China
[3] Renmin Univ China, Psyche AI Inc, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Inst Automat, Chinese Acad Sci, Brainnetome Ctr, Beijing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Multi-modal signals; Emotion recognition; Incomplete data; Transformer autoencoder; Convolutional encoder;
DOI
10.1016/j.neunet.2024.106111
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Multi-modal signals have become essential data for emotion recognition since they can represent emotions more comprehensively. However, in real-world environments it is often impossible to acquire complete multi-modal signals, and the problem of missing modalities causes severe performance degradation in emotion recognition. This paper therefore presents the first attempt to use a transformer-based architecture to impute modality-incomplete data from partially observed data for multi-modal emotion recognition (MER). Concretely, the paper proposes a novel unified model called the transformer autoencoder (TAE), comprising a modality-specific hybrid transformer encoder, an inter-modality transformer encoder, and a convolutional decoder. The modality-specific hybrid transformer encoder bridges a convolutional encoder and a transformer encoder, allowing it to learn both local and global context information within each modality. The inter-modality transformer encoder builds and aligns global cross-modal correlations and models long-range contextual information across modalities. The convolutional decoder decodes the encoded features to produce more precise recognition. In addition, a regularization term is introduced into the convolutional decoder to force the decoder to fully leverage both complete and incomplete data for emotion recognition on missing data. Accuracies of 96.33%, 95.64%, and 92.69% are attained on the available data of the DEAP and SEED-IV datasets, and 93.25%, 92.23%, and 81.76% on the missing data. In particular, the model gains a 5.61% advantage with 70% missing data, demonstrating that it outperforms some state-of-the-art approaches in incomplete multi-modal learning.
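For orientation, below is a minimal PyTorch sketch of the pipeline the abstract describes: per-modality hybrid encoders (convolution followed by a transformer), an inter-modality transformer over the fused tokens, a convolutional decoder that reconstructs each modality, and a reconstruction regularization term added to the classification loss. All layer sizes, kernel sizes, head counts, class counts, and the loss weight `lam` are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a TAE-style model; hyperparameters are assumptions.
import torch
import torch.nn as nn


class HybridModalityEncoder(nn.Module):
    """Modality-specific encoder: a convolutional front-end for local
    context followed by a transformer encoder for global context."""

    def __init__(self, in_channels: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> tokens: (batch, time, d_model)
        h = self.conv(x).transpose(1, 2)
        return self.transformer(h)


class TAESketch(nn.Module):
    """Per-modality hybrid encoders, an inter-modality transformer over the
    concatenated tokens, and a convolutional decoder per modality."""

    def __init__(self, modal_channels, d_model: int = 64, n_classes: int = 4):
        super().__init__()
        self.encoders = nn.ModuleList(
            [HybridModalityEncoder(c, d_model) for c in modal_channels]
        )
        cross_layer = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.cross_modal = nn.TransformerEncoder(cross_layer, num_layers=2)
        self.decoders = nn.ModuleList(
            [nn.Conv1d(d_model, c, kernel_size=3, padding=1) for c in modal_channels]
        )
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, xs):
        # xs: list of (batch, channels_m, time) tensors, one per modality;
        # a missing modality can be passed as a zero-filled tensor.
        tokens = [enc(x) for enc, x in zip(self.encoders, xs)]
        fused = self.cross_modal(torch.cat(tokens, dim=1))
        # Split the fused tokens back per modality and reconstruct each signal.
        splits = torch.split(fused, [t.size(1) for t in tokens], dim=1)
        recons = [dec(s.transpose(1, 2)) for dec, s in zip(self.decoders, splits)]
        logits = self.classifier(fused.mean(dim=1))
        return logits, recons


def tae_loss(logits, labels, recons, targets, lam: float = 0.1):
    """Classification loss plus a reconstruction regularization term that
    pushes the decoder to recover complete signals from partial input."""
    ce = nn.functional.cross_entropy(logits, labels)
    rec = sum(nn.functional.mse_loss(r, t) for r, t in zip(recons, targets))
    return ce + lam * rec


# Example usage with two hypothetical modalities, e.g. 32-channel EEG and
# 8-channel peripheral signals of 128 time steps:
# model = TAESketch(modal_channels=[32, 8])
# logits, recons = model([torch.randn(4, 32, 128), torch.randn(4, 8, 128)])
```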
Pages: 12