Multimodal shared features learning for emotion recognition by enhanced sparse local discriminative canonical correlation analysis

Cited by: 13
Authors
Fu, Jiamin [1 ]
Mao, Qirong [1 ]
Tu, Juanjuan [2 ]
Zhan, Yongzhao [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Jiangsu, Peoples R China
[2] Jiangsu Univ Sci & Technol, Sch Comp Sci & Engn, Zhenjiang, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Multimodal emotion recognition; Multimodal shared feature learning; Multimodal information fusion; Canonical correlation analysis;
DOI
10.1007/s00530-017-0547-8
Chinese Library Classification
TP [automation technology; computer technology];
Discipline code
0812;
Abstract
Multimodal emotion recognition is a challenging research topic that has recently started to attract the attention of the research community. To better recognize video users' emotions, research on multimodal emotion recognition based on audio and video is essential. Multimodal emotion recognition performance depends heavily on finding a good shared feature representation. A good shared representation needs to satisfy two requirements: (1) it preserves the characteristics of each modality and (2) it balances the influence of the different modalities to make the decision optimal. In light of these, we propose a novel Enhanced Sparse Local Discriminative Canonical Correlation Analysis approach (En-SLDCCA) to learn the multimodal shared feature representation. The shared feature representation learning involves two stages. In the first stage, we pretrain a Sparse Auto-Encoder on unimodal video (or audio), so that we can obtain the hidden feature representations of video and audio separately. In the second stage, we obtain the correlation coefficients of video and audio using our En-SLDCCA approach, and then form the shared feature representation by fusing the video and audio features weighted by those correlation coefficients. We evaluate the performance of our method on the challenging multimodal eNTERFACE'05 database. Experimental results reveal that our method is superior to unimodal video (or audio) and significantly improves multimodal emotion recognition performance compared with the current state of the art.
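The second stage described in the abstract (projecting per-modality features onto correlated subspaces and fusing them by the canonical correlations) can be illustrated with plain classical CCA. The sketch below is a minimal stand-in, not the paper's En-SLDCCA, which additionally imposes sparsity and local discriminative constraints; the random features stand in for the Sparse Auto-Encoder hidden representations.

```python
import numpy as np

def cca(X, Y, n_components=2, reg=1e-6):
    """Classical CCA via SVD of the whitened cross-covariance matrix.

    Returns projection matrices Wx, Wy and the canonical correlations.
    A small ridge term `reg` keeps the covariance matrices invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(K)
    Wx = inv_sqrt(Sxx) @ U[:, :n_components]
    Wy = inv_sqrt(Syy) @ Vt[:n_components].T
    return Wx, Wy, s[:n_components]

# Toy stand-ins for the autoencoder hidden features of the two modalities:
# both are noisy linear views of the same 2-D latent "emotion" factor.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 2))
video_feat = shared @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))
audio_feat = shared @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(200, 8))

Wx, Wy, corr = cca(video_feat, audio_feat, n_components=2)

# Fuse: concatenate the projected modalities, weighting each component
# by its canonical correlation (a simple correlation-coefficient fusion).
fused = np.hstack([(video_feat - video_feat.mean(0)) @ Wx * corr,
                   (audio_feat - audio_feat.mean(0)) @ Wy * corr])
print(corr)         # canonical correlations, high for strongly shared factors
print(fused.shape)  # (200, 4)
```

The fused matrix would then feed a downstream classifier; the correlation weighting is one plausible way to let the more strongly shared directions dominate the joint representation.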
Pages: 451-461 (11 pages)
Related papers
50 records in total
  • [31] Learning deep multimodal affective features for spontaneous speech emotion recognition
    Zhang, Shiqing
    Tao, Xin
    Chuang, Yuelong
    Zhao, Xiaoming
    SPEECH COMMUNICATION, 2021, 127 : 73 - 81
  • [32] Learning emotion-discriminative and domain-invariant features for domain adaptation in speech emotion recognition
    Mao, Qirong
    Xu, Guopeng
    Xue, Wentao
    Gou, Jianping
    Zhan, Yongzhao
    SPEECH COMMUNICATION, 2017, 93 : 1 - 10
  • [33] An Improved Kernelized Discriminative Canonical Correlation Analysis and Its Application to Gait Recognition
    Wang, Kejun
    Yan, Tao
    PROCEEDINGS OF THE 10TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA 2012), 2012, : 4869 - 4874
  • [34] Selecting Discriminative Features with Discriminative Multiple Canonical Correlation Analysis for Multi-Feature Information Fusion
    Gao, Lei
    Qi, Lin
    Guan, Ling
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE OF THE BIOMETRICS SPECIAL INTEREST GROUP (BIOSIG 2013), 2013,
  • [35] Sparse Bayesian multiway canonical correlation analysis for EEG pattern recognition
    Zhang, Yu
    Zhou, Guoxu
    Jin, Jing
    Zhang, Yangsong
    Wang, Xingyu
    Cichocki, Andrzej
    NEUROCOMPUTING, 2017, 225 : 103 - 110
  • [36] Sparse tensor canonical correlation analysis for micro-expression recognition
    Wang, Su-Jing
    Yan, Wen-Jing
    Sun, Tingkai
    Zhao, Guoying
    Fu, Xiaolan
    NEUROCOMPUTING, 2016, 214 : 218 - 232
  • [37] Sparse additive discriminant canonical correlation analysis for multiple features fusion
    Wang, Zhan
    Wang, Lizhi
    Huang, Hua
    NEUROCOMPUTING, 2021, 463 : 185 - 197
  • [38] Emotion Recognition From Multimodal Physiological Signals via Discriminative Correlation Fusion With a Temporal Alignment Mechanism
    Hou, Kechen
    Zhang, Xiaowei
    Yang, Yikun
    Zhao, Qiqi
    Yuan, Wenjie
    Zhou, Zhongyi
    Zhang, Sipo
    Li, Chen
    Shen, Jian
    Hu, Bin
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (05) : 3079 - 3092
  • [39] Learning Discriminative Features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition
    Tripathi, Suraj
    Ramesh, Abhiram
    Kumar, Abhay
    Singh, Chirag
    Yenigalla, Promod
    WORKSHOP ON ARTIFICIAL INTELLIGENCE IN AFFECTIVE COMPUTING, VOL 122, 2019, 122 : 44 - 53
  • [40] LEARNING DISCRIMINATIVE FEATURES FROM SPECTROGRAMS USING CENTER LOSS FOR SPEECH EMOTION RECOGNITION
    Dai, Dongyang
    Wu, Zhiyong
    Li, Runnan
    Wu, Xixin
    Jia, Jia
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7405 - 7409