A novel transformer autoencoder for multi-modal emotion recognition with incomplete data

Cited: 0
Authors
Cheng, Cheng [1 ]
Liu, Wenzhe [2 ]
Fan, Zhaoxin [3 ]
Feng, Lin [1 ]
Jia, Ziyu [4 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Technol, Dalian, Peoples R China
[2] Huzhou Univ, Sch Informat Engn, Huzhou, Peoples R China
[3] Renmin Univ China, Psyche AI Inc, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Inst Automat, Chinese Acad Sci, Brainnetome Ctr, Beijing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Multi-modal signals; Emotion recognition; Incomplete data; Transformer autoencoder; Convolutional encoder;
DOI
10.1016/j.neunet.2024.106111
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Multi-modal signals have become essential data for emotion recognition since they can represent emotions more comprehensively. However, in real-world environments it is often impossible to acquire complete multi-modal signals, and missing modalities cause severe performance degradation in emotion recognition. This paper therefore presents the first attempt to use a transformer-based architecture to fill in modality-incomplete data from partially observed data for multi-modal emotion recognition (MER). Concretely, it proposes a novel unified model, the transformer autoencoder (TAE), comprising a modality-specific hybrid transformer encoder, an inter-modality transformer encoder, and a convolutional decoder. The modality-specific hybrid transformer encoder bridges a convolutional encoder and a transformer encoder, allowing it to learn both local and global context information within each modality. The inter-modality transformer encoder builds and aligns global cross-modal correlations and models long-range contextual information across modalities. The convolutional decoder decodes the encoded features to produce more precise recognition. In addition, a regularization term is introduced into the convolutional decoder to force it to fully leverage both complete and incomplete data for emotion recognition on missing data. The model attains accuracies of 96.33%, 95.64%, and 92.69% on the complete data of the DEAP and SEED-IV datasets, and 93.25%, 92.23%, and 81.76% on the incomplete data. In particular, it retains a 5.61% advantage with 70% of the data missing, demonstrating that it outperforms some state-of-the-art approaches in incomplete multi-modal learning.
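The abstract describes the TAE pipeline in enough detail to sketch its structure. Below is a minimal PyTorch sketch of that pipeline, assuming a per-modality convolutional-plus-transformer encoder, a shared inter-modality transformer, and per-modality convolutional decoders. All layer sizes, depths, channel counts (e.g. 32 EEG and 8 peripheral channels), and the masked-reconstruction loss are illustrative assumptions, not the paper's actual configuration.

# Minimal PyTorch sketch of the TAE architecture described in the abstract.
# Hyperparameters and the loss formulation are assumptions for illustration.
import torch
import torch.nn as nn

class ModalityHybridEncoder(nn.Module):
    """Conv encoder (local context) followed by a transformer encoder (global context)."""
    def __init__(self, in_ch, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):            # x: (batch, channels, time)
        h = self.conv(x)             # local features: (batch, d_model, time)
        h = h.transpose(1, 2)        # (batch, time, d_model) for the transformer
        return self.transformer(h)   # global intra-modal context

class TAE(nn.Module):
    """Per-modality hybrid encoders, an inter-modality transformer over the
    concatenated token streams, and conv decoders that reconstruct every modality."""
    def __init__(self, modality_channels, d_model=64, n_heads=4):
        super().__init__()
        self.encoders = nn.ModuleList(
            [ModalityHybridEncoder(c, d_model, n_heads) for c in modality_channels]
        )
        cross_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.cross = nn.TransformerEncoder(cross_layer, num_layers=2)
        self.decoders = nn.ModuleList(
            [nn.Conv1d(d_model, c, kernel_size=3, padding=1) for c in modality_channels]
        )

    def forward(self, xs):                            # xs: list of (batch, C_m, T) tensors
        tokens = [enc(x) for enc, x in zip(self.encoders, xs)]
        lengths = [t.shape[1] for t in tokens]
        fused = self.cross(torch.cat(tokens, dim=1))  # cross-modal correlations
        outs, start = [], 0
        for dec, L in zip(self.decoders, lengths):
            seg = fused[:, start:start + L].transpose(1, 2)
            outs.append(dec(seg))                     # reconstruct each modality
            start += L
        return outs

# One plausible reading of the regularization term: reconstruct the complete
# targets from a modality-masked input so the decoder learns to impute missing data.
model = TAE(modality_channels=[32, 8])                # assumed EEG + peripheral channels
eeg, per = torch.randn(4, 32, 128), torch.randn(4, 8, 128)
recon = model([eeg, torch.zeros_like(per)])           # simulate a missing modality
loss = sum(nn.functional.mse_loss(r, t) for r, t in zip(recon, [eeg, per]))
loss.backward()

Training the decoder against complete targets from masked inputs is only our reading of the regularization term described above; the paper's exact formulation may differ.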
Pages: 12
Related Papers
50 records in total
  • [21] Facial emotion recognition using multi-modal information
    De Silva, LC
    Miyasato, T
    Nakatsu, R
    [J]. ICICS - PROCEEDINGS OF 1997 INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, VOLS 1-3: THEME: TRENDS IN INFORMATION SYSTEMS ENGINEERING AND WIRELESS MULTIMEDIA COMMUNICATIONS, 1997, : 397 - 401
  • [22] Lightweight multi-modal emotion recognition model based on modal generation
    Liu, Peisong
    Che, Manqiang
    Luo, Jiangchuan
    [J]. 2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 430 - 435
  • [23] Cross-modal dynamic convolution for multi-modal emotion recognition
    Wen, Huanglu
    You, Shaodi
    Fu, Ying
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 78
  • [24] Human Emotion Estimation Using Multi-Modal Variational AutoEncoder with Time Changes
    Moroto, Yuya
    Maeda, Keisuke
    Ogawa, Takahiro
    Haseyama, Miki
    [J]. 2021 IEEE 3RD GLOBAL CONFERENCE ON LIFE SCIENCES AND TECHNOLOGIES (IEEE LIFETECH 2021), 2021, : 67 - 68
  • [25] Implementation of Multi-modal Speech Emotion Recognition Using Text Data and Audio Signals
    Adesola, Falade
    Adeyinka, Omirinlewo
    Kayode, Akindeji
    Ayodele, Adebiyi
    [J]. 2023 International Conference on Science, Engineering and Business for Sustainable Development Goals, SEB-SDG 2023, 2023,
  • [26] Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review
    Zhang, Jianhua
    Yin, Zhong
    Chen, Peng
    Nichele, Stefano
    [J]. INFORMATION FUSION, 2020, 59 : 103 - 126
  • [28] Convolutional Transformer Fusion Blocks for Multi-Modal Gesture Recognition
    Hampiholi, Basavaraj
    Jarvers, Christian
    Mader, Wolfgang
    Neumann, Heiko
    [J]. IEEE ACCESS, 2023, 11 : 34094 - 34103
  • [29] Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition
    Yang, Dingkang
    Huang, Shuai
    Liu, Yang
    Zhang, Lihua
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2093 - 2097
  • [30] Multi-modal fusion network with complementarity and importance for emotion recognition
    Liu, Shuai
    Gao, Peng
    Li, Yating
    Fu, Weina
    Ding, Weiping
    [J]. INFORMATION SCIENCES, 2023, 619 : 679 - 694