Multimodal shared features learning for emotion recognition by enhanced sparse local discriminative canonical correlation analysis

Cited by: 13
Authors
Fu, Jiamin [1 ]
Mao, Qirong [1 ]
Tu, Juanjuan [2 ]
Zhan, Yongzhao [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Jiangsu, Peoples R China
[2] Jiangsu Univ Sci & Technol, Sch Comp Sci & Engn, Zhenjiang, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Multimodal emotion recognition; Multimodal shared feature learning; Multimodal information fusion; Canonical correlation analysis;
DOI
10.1007/s00530-017-0547-8
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Multimodal emotion recognition is a challenging research topic that has recently begun to attract the attention of the research community. To better recognize video users' emotions, research on multimodal emotion recognition based on audio and video is essential. Multimodal emotion recognition performance depends heavily on finding a good shared feature representation. A good shared representation needs to satisfy two requirements: (1) it preserves the characteristics of each modality, and (2) it balances the influence of the different modalities so that the final decision is optimal. In light of this, we propose a novel Enhanced Sparse Local Discriminative Canonical Correlation Analysis (En-SLDCCA) approach to learn the multimodal shared feature representation. The shared feature representation is learned in two stages. In the first stage, we pretrain a Sparse Auto-Encoder on unimodal video (or audio), so that we obtain the hidden feature representations of video and audio separately. In the second stage, we obtain the correlation coefficients of video and audio using our En-SLDCCA approach, and then form the shared feature representation by fusing the video and audio features with these correlation coefficients. We evaluate our method on the challenging multimodal eNTERFACE'05 database. Experimental results reveal that our method is superior to unimodal video (or audio) and significantly improves multimodal emotion recognition performance compared with the current state of the art.
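The abstract describes a two-stage pipeline: unimodal Sparse Auto-Encoder pretraining followed by correlation-based fusion of the video and audio features. The following is only a minimal sketch of that pipeline, assuming scikit-learn is available; the paper's En-SLDCCA projection is replaced here by plain CCA, the Sparse Auto-Encoder is approximated by an MLPRegressor trained to reconstruct its input, and all names (X_video, X_audio, n_hidden, n_components) are illustrative assumptions rather than details taken from the paper.

    # Illustrative two-stage sketch (not the authors' implementation).
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.cross_decomposition import CCA

    def hidden_features(X, n_hidden=128):
        """Stage 1 (approximation): train a one-hidden-layer autoencoder on a
        single modality and return its hidden activations as that modality's
        feature representation."""
        ae = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="logistic",
                          max_iter=500, random_state=0)
        ae.fit(X, X)  # train the network to reconstruct its own input
        # hidden activations: sigmoid(X W1 + b1)
        return 1.0 / (1.0 + np.exp(-(X @ ae.coefs_[0] + ae.intercepts_[0])))

    def shared_representation(X_video, X_audio, n_components=30):
        """Stage 2 (approximation): project both modalities into a maximally
        correlated subspace (plain CCA standing in for En-SLDCCA) and
        concatenate the projections as the shared feature representation."""
        Zv, Za = hidden_features(X_video), hidden_features(X_audio)
        Tv, Ta = CCA(n_components=n_components).fit_transform(Zv, Za)
        return np.hstack([Tv, Ta])

    # Hypothetical usage: X_video (n, d_v), X_audio (n, d_a), y = emotion labels.
    # from sklearn.svm import SVC
    # clf = SVC().fit(shared_representation(X_video, X_audio), y)

Concatenating the projected views is only one common fusion choice; the paper instead weights the fused features by the learned correlation coefficients, so results from this sketch should not be read as reproducing the reported method.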
Pages: 451-461
Number of pages: 11
Related Papers
50 records in total
  • [21] K-Means Clustering-based Kernel Canonical Correlation Analysis for Multimodal Emotion Recognition
    Chen, Luefeng
    Wang, Kuanlin
    Wu, Min
    Pedrycz, Witold
    Hirota, Kaoru
    IFAC PAPERSONLINE, 2020, 53 (02): 10250 - 10254
  • [22] Learning Mutual Correlation in Multimodal Transformer for Speech Emotion Recognition
    Wang, Yuhua
    Shen, Guang
    Xu, Yuezhu
    Li, Jiahang
    Zhao, Zhengdao
    INTERSPEECH 2021, 2021, : 4518 - 4522
  • [23] Multimodal interaction enhanced representation learning for video emotion recognition
    Xia, Xiaohan
    Zhao, Yong
    Jiang, Dongmei
    FRONTIERS IN NEUROSCIENCE, 2022, 16
  • [24] ENHANCED SEMI-SUPERVISED LEARNING FOR MULTIMODAL EMOTION RECOGNITION
    Zhang, Zixing
    Ringeval, Fabien
    Dong, Bin
    Coutinho, Eduardo
    Marchi, Erik
    Schuller, Bjoern
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5185 - 5189
  • [25] MULTIVIEW LEARNING VIA DEEP DISCRIMINATIVE CANONICAL CORRELATION ANALYSIS
    Elmadany, Nour El Din
    He, Yifeng
    Guan, Ling
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2409 - 2413
  • [26] Sparse multiway canonical correlation analysis for multimodal stroke recovery data
    Das, Subham
    West, Franklin D.
    Park, Cheolwoo
    BIOMETRICAL JOURNAL, 2024, 66 (02)
  • [27] Multicamera Action Recognition with Canonical Correlation Analysis and Discriminative Sequence Classification
    Cilla, Rodrigo
    Patricio, Miguel A.
    Berlanga, Antonio
    Molina, Jose M.
    FOUNDATIONS ON NATURAL AND ARTIFICIAL COMPUTATION: 4TH INTERNATIONAL WORK-CONFERENCE ON THE INTERPLAY BETWEEN NATURAL AND ARTIFICIAL COMPUTATION, IWINAC 2011, PART I, 2011, 6686 : 491 - 500
  • [28] Sparse canonical correlation analysis for mobile media recognition on the cloud
    Wang, Yanjiang
    Zhou, Bin
    Liu, Weifeng
    Zhang, Huimin
    Journal of Mobile Multimedia, 2017, 12 (3-4): 265 - 276
  • [29] INFORMATION FUSION BASED ON KERNEL ENTROPY COMPONENT ANALYSIS IN DISCRIMINATIVE CANONICAL CORRELATION SPACE WITH APPLICATION TO AUDIO EMOTION RECOGNITION
    Gao, Lei
    Qi, Lin
    Guan, Ling
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2817 - 2821
  • [30] Selecting Features with Group-Sparse Nonnegative Supervised Canonical Correlation Analysis: Multimodal Prostate Cancer Prognosis
    Wang, Haibo
    Singanamalli, Asha
    Ginsburg, Shoshana
    Madabhushi, Anant
    MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION - MICCAI 2014, PT III, 2014, 8675 : 385 - 392