Speech-Visual Emotion Recognition via Modal Decomposition Learning

Cited by: 0
Authors
Bai, Lei [1 ]
Chang, Rui [1 ]
Chen, Guanghui [2 ]
Zhou, Yu [1 ]
Affiliations
[1] North China Univ Water Resources & Elect Power, Sch Elect Engn, Zhengzhou 450000, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Visualization; Speech recognition; Mel frequency cepstral coefficient; Emotion recognition; Data mining; Three-dimensional displays; modal decomposition; speech modality; visual modality; FEATURES; FUSION;
DOI
10.1109/LSP.2023.3324294
CLC classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Directly using neural networks to fuse extracted speech and visual features has become a mainstream feature-fusion approach for speech-visual emotion recognition (SVER). However, the heterogeneity between the speech and visual modalities usually results in a distribution gap and information redundancy between the extracted speech and visual features, which degrades SVER performance. To this end, this letter proposes an SVER method based on modal decomposition learning. It leverages shared, private, and reconstructed modal learning with a specifically designed loss to decompose the extracted speech and visual features into shared and private subspaces, obtaining shared and private features and thereby effectively reducing the distribution gap and information redundancy between the two modalities. Experiments on the BAUM-1s, RAVDESS, and eNTERFACE05 datasets show that the proposed method achieves better results.
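As a rough illustration of the decomposition idea described in the abstract, the minimal PyTorch sketch below projects pre-extracted speech and visual features into shared and private subspaces and combines a shared (alignment) loss, a private (decorrelation) loss, and a reconstruction loss. The layer sizes, loss forms, and the class name ModalDecomposition are assumptions made for illustration only, not the authors' exact architecture or loss design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalDecomposition(nn.Module):
    """Hypothetical sketch of shared/private subspace decomposition with a
    reconstruction branch (illustrative only, not the paper's exact model)."""

    def __init__(self, speech_dim=128, visual_dim=256, sub_dim=64):
        super().__init__()
        # Encoders projecting each modality into a common shared subspace
        self.shared_speech = nn.Linear(speech_dim, sub_dim)
        self.shared_visual = nn.Linear(visual_dim, sub_dim)
        # Encoders projecting each modality into its own private subspace
        self.private_speech = nn.Linear(speech_dim, sub_dim)
        self.private_visual = nn.Linear(visual_dim, sub_dim)
        # Decoders reconstructing the original features from [shared; private]
        self.recon_speech = nn.Linear(2 * sub_dim, speech_dim)
        self.recon_visual = nn.Linear(2 * sub_dim, visual_dim)

    def forward(self, xs, xv):
        ss, sv = self.shared_speech(xs), self.shared_visual(xv)    # shared features
        ps, pv = self.private_speech(xs), self.private_visual(xv)  # private features

        # Shared loss: pull the two shared projections together (reduces distribution gap)
        loss_shared = F.mse_loss(ss, sv)
        # Private loss: soft decorrelation between shared and private parts
        # (reduces information redundancy)
        loss_private = (ss * ps).mean().abs() + (sv * pv).mean().abs()
        # Reconstruction loss: decomposition should preserve the original information
        loss_recon = F.mse_loss(self.recon_speech(torch.cat([ss, ps], -1)), xs) + \
                     F.mse_loss(self.recon_visual(torch.cat([sv, pv], -1)), xv)

        # Fused representation passed on to an emotion classifier
        fused = torch.cat([ss, sv, ps, pv], dim=-1)
        return fused, loss_shared + loss_private + loss_recon

# Toy usage with random pre-extracted features (batch of 8)
model = ModalDecomposition()
fused, decomp_loss = model(torch.randn(8, 128), torch.randn(8, 256))
```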
Pages: 1452-1456
Number of pages: 5
Related papers
50 records in total
  • [1] Chen Guanghui; Zeng Xiaoping. Multi-Modal Emotion Recognition by Fusing Correlation Features of Speech-Visual. IEEE SIGNAL PROCESSING LETTERS, 2021, 28: 533-537.
  • [2] Chen, Guanghui; Jiao, Shuang. Speech-Visual Emotion Recognition by Fusing Shared and Specific Features. IEEE SIGNAL PROCESSING LETTERS, 2023, 30: 678-682.
  • [3] Ntalampiras, Stavros. Speech emotion recognition via learning analogies. PATTERN RECOGNITION LETTERS, 2021, 144: 21-26.
  • [4] Zha, Cheng; Yang, Ping; Zhang, Xinran; Zhao, Li. Spontaneous speech emotion recognition via multiple kernel learning. PROCEEDINGS 2016 EIGHTH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2016), 2016: 621-623.
  • [5] Chen, Lijiang; Ren, Jie; Mao, Xia; Zhao, Qi. Electroglottograph-Based Speech Emotion Recognition via Cross-Modal Distillation. APPLIED SCIENCES-BASEL, 2022, 12(9).
  • [6] Li, Ruichen; Zhao, Jinming; Jin, Qin. Speech Emotion Recognition via Multi-Level Cross-Modal Distillation. INTERSPEECH 2021, 2021: 4488-4492.
  • [7] Pan, Zexu; Luo, Zhaojie; Yang, Jichen; Li, Haizhou. Multi-modal Attention for Speech Emotion Recognition. INTERSPEECH 2020, 2020: 364-368.
  • [8] Ghosh, Sayan; Laksana, Eugene; Morency, Louis-Philippe; Scherer, Stefan. Representation Learning for Speech Emotion Recognition. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), 2016: 3603-3607.
  • [9] Harar, Pavol; Burget, Radim; Dutta, Malay Kishore. Speech Emotion Recognition with Deep Learning. 2017 4TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2017: 137-140.
  • [10] Han, Zhijie; Zhao, Huijuan; Wang, Ruchuan. Transfer Learning for Speech Emotion Recognition. 2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019: 96-99.