Multimodal shared features learning for emotion recognition by enhanced sparse local discriminative canonical correlation analysis

Cited by: 13

Authors:

Fu, Jiamin [1]

Mao, Qirong [1]

Tu, Juanjuan [2]

Zhan, Yongzhao [1]

Affiliations:

[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Jiangsu, Peoples R China

[2] Jiangsu Univ Sci & Technol, Sch Comp Sci & Engn, Zhenjiang, Jiangsu, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2019 / Vol. 25 / Issue 05

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Multimodal emotion recognition; Multimodal shared feature learning; Multimodal information fusion; Canonical correlation analysis;

DOI:

10.1007/s00530-017-0547-8

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Multimodal emotion recognition is a challenging research topic that has recently begun to attract attention from the research community. To better recognize the emotions of video users, research on multimodal emotion recognition based on audio and video is essential. Recognition performance depends heavily on finding a good shared feature representation. A good shared representation must satisfy two requirements: (1) it preserves the characteristics of each modality, and (2) it balances the effect of the different modalities so that the decision is optimal. In light of this, we propose a novel Enhanced Sparse Local Discriminative Canonical Correlation Analysis approach (En-SLDCCA) to learn the multimodal shared feature representation. Learning the shared feature representation involves two stages. In the first stage, we pretrain a Sparse Auto-Encoder on each modality (video or audio) separately to obtain the hidden feature representations of video and audio. In the second stage, we obtain the correlation coefficients of video and audio using our En-SLDCCA approach, and then form the shared feature representation, which fuses the video and audio features using these correlation coefficients. We evaluate our method on the challenging multimodal Enterface'05 database. Experimental results show that our method outperforms unimodal video (or audio) recognition and significantly improves multimodal emotion recognition performance compared with the current state of the art.
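
The second stage described in the abstract can be illustrated with a minimal sketch. This is classical (regularized) canonical correlation analysis, not the En-SLDCCA method itself, whose sparse, local, and discriminative terms are not specified in this record. The sketch assumes the inputs are hidden features already produced by per-modality encoders; the function names `cca_fit` and `fuse`, the regularization value `reg`, and fusion by concatenation are all assumptions made for illustration.

```python
import numpy as np


def cca_fit(X, Y, n_components, reg=1e-3):
    """Classical regularized CCA (illustrative stand-in, not En-SLDCCA).

    X: (n_samples, dx) features of one modality, e.g. video hidden features.
    Y: (n_samples, dy) features of the other modality, e.g. audio.
    Returns the feature means, the projection matrices, and the
    canonical correlations of the retained components.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]

    # Covariance estimates; the ridge term keeps them invertible (assumption).
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations, returned in descending order.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return mx, my, Wx, Wy, s[:n_components]


def fuse(X, Y, mx, my, Wx, Wy):
    """Project both modalities and concatenate them (one common fusion choice)."""
    return np.hstack([(X - mx) @ Wx, (Y - my) @ Wy])
```

Concatenation is only one way to combine the projected features; summing the two projections is another common choice, and the record does not say which fusion rule the authors use.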


Pages: 451-461

Page count: 11
