A novel transformer autoencoder for multi-modal emotion recognition with incomplete data

Cited: 0
Authors
Cheng, Cheng [1 ]
Liu, Wenzhe [2 ]
Fan, Zhaoxin [3 ]
Feng, Lin [1 ]
Jia, Ziyu [4 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Technol, Dalian, Peoples R China
[2] Huzhou Univ, Sch Informat Engn, Huzhou, Peoples R China
[3] Renmin Univ China, Psyche AI Inc, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Inst Automat, Chinese Acad Sci, Brainnetome Ctr, Beijing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Multi-modal signals; Emotion recognition; Incomplete data; Transformer autoencoder; Convolutional encoder;
DOI
10.1016/j.neunet.2024.106111
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Multi-modal signals have become essential data for emotion recognition since they can represent emotions more comprehensively. However, in real-world environments it is often impossible to acquire complete multi-modal signals, and the problem of missing modalities causes severe performance degradation in emotion recognition. This paper therefore presents the first attempt to use a transformer-based architecture to impute modality-incomplete data from partially observed data for multi-modal emotion recognition (MER). Concretely, the paper proposes a novel unified model called the transformer autoencoder (TAE), comprising a modality-specific hybrid transformer encoder, an inter-modality transformer encoder, and a convolutional decoder. The modality-specific hybrid transformer encoder bridges a convolutional encoder and a transformer encoder, allowing it to learn both local and global context information within each modality. The inter-modality transformer encoder builds and aligns global cross-modal correlations and models long-range contextual information across modalities. The convolutional decoder decodes the encoded features to produce more precise recognition. In addition, a regularization term is introduced into the convolutional decoder to force the decoder to fully leverage both complete and incomplete data for emotion recognition on missing data. Accuracies of 96.33%, 95.64%, and 92.69% are attained on the available data of the DEAP and SEED-IV datasets, and 93.25%, 92.23%, and 81.76% on the missing data. In particular, the model gains a 5.61% advantage with 70% missing data, demonstrating that it outperforms some state-of-the-art approaches in incomplete multi-modal learning.
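For orientation, below is a minimal PyTorch sketch of the pipeline the abstract describes: per-modality hybrid encoders (convolution followed by a transformer), an inter-modality transformer over the fused tokens, a convolutional decoder that reconstructs each modality, and a reconstruction regularization term added to the classification loss. All layer sizes, kernel sizes, head counts, class counts, and the loss weight `lam` are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a TAE-style model; hyperparameters are assumptions.
import torch
import torch.nn as nn


class HybridModalityEncoder(nn.Module):
    """Modality-specific encoder: a convolutional front-end for local
    context followed by a transformer encoder for global context."""

    def __init__(self, in_channels: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> tokens: (batch, time, d_model)
        h = self.conv(x).transpose(1, 2)
        return self.transformer(h)


class TAESketch(nn.Module):
    """Per-modality hybrid encoders, an inter-modality transformer over the
    concatenated tokens, and a convolutional decoder per modality."""

    def __init__(self, modal_channels, d_model: int = 64, n_classes: int = 4):
        super().__init__()
        self.encoders = nn.ModuleList(
            [HybridModalityEncoder(c, d_model) for c in modal_channels]
        )
        cross_layer = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.cross_modal = nn.TransformerEncoder(cross_layer, num_layers=2)
        self.decoders = nn.ModuleList(
            [nn.Conv1d(d_model, c, kernel_size=3, padding=1) for c in modal_channels]
        )
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, xs):
        # xs: list of (batch, channels_m, time) tensors, one per modality;
        # a missing modality can be passed as a zero-filled tensor.
        tokens = [enc(x) for enc, x in zip(self.encoders, xs)]
        fused = self.cross_modal(torch.cat(tokens, dim=1))
        # Split the fused tokens back per modality and reconstruct each signal.
        splits = torch.split(fused, [t.size(1) for t in tokens], dim=1)
        recons = [dec(s.transpose(1, 2)) for dec, s in zip(self.decoders, splits)]
        logits = self.classifier(fused.mean(dim=1))
        return logits, recons


def tae_loss(logits, labels, recons, targets, lam: float = 0.1):
    """Classification loss plus a reconstruction regularization term that
    pushes the decoder to recover complete signals from partial input."""
    ce = nn.functional.cross_entropy(logits, labels)
    rec = sum(nn.functional.mse_loss(r, t) for r, t in zip(recons, targets))
    return ce + lam * rec


# Example usage with two hypothetical modalities, e.g. 32-channel EEG and
# 8-channel peripheral signals of 128 time steps:
# model = TAESketch(modal_channels=[32, 8])
# logits, recons = model([torch.randn(4, 32, 128), torch.randn(4, 8, 128)])
```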
Pages: 12