Learning Fine-Grained Cross Modality Excitement for Speech Emotion Recognition

Cited by: 7
Authors
Li, Hang [1 ]
Ding, Wenbiao [1 ]
Wu, Zhongqin [1 ]
Liu, Zitao [1 ]
Affiliations
[1] TAL Education Group, Beijing, People's Republic of China
Source
Funding
National Key Research and Development Program of China;
Keywords
speech emotion recognition; human-computer interaction; multimodal learning;
DOI
10.21437/Interspeech.2021-158
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104 ; 100213 ;
Abstract
Speech emotion recognition is a challenging task because emotional expression is complex, multimodal, and fine-grained. In this paper, we propose a novel multimodal deep learning approach to perform fine-grained emotion recognition from real-life speech. We design a temporal alignment mean-max pooling mechanism to capture the subtle, fine-grained emotions implied in every utterance. In addition, we propose a cross modality excitement module that performs sample-specific adjustment of cross modality embeddings, adaptively recalibrating each modality's values using the aligned latent features from the other modality. Our proposed model is evaluated on two well-known real-world speech emotion recognition datasets. The results demonstrate that our approach is superior on prediction tasks for multimodal speech utterances and outperforms a wide range of baselines in terms of prediction accuracy. Furthermore, we conduct detailed ablation studies to show that the temporal alignment mean-max pooling mechanism and cross modality excitement contribute significantly to these results. To encourage research reproducibility, we make the code publicly available at https://github.com/tal-ai/FG_CME.git.
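The two mechanisms named in the abstract can be illustrated with a small, hedged sketch. This is not the authors' implementation (that is in the repository linked above); it is one plausible reading of the abstract. It assumes that "temporal alignment mean-max pooling" concatenates the mean and the maximum of the acoustic frames aligned to each word, and that "cross modality excitement" rescales one modality's embedding with a sigmoid gate computed from the aligned embedding of the other modality. The function names, the `(start, end)` span format, and the single linear gate are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_max_pool(frames: Sequence[Vector],
                  spans: Sequence[Tuple[int, int]]) -> List[Vector]:
    """Pool frame-level features into one vector per aligned word.

    frames: T acoustic frame vectors, each of dimension d.
    spans:  one (start, end) frame range per word, end exclusive
            (assumed input format, e.g. from a forced aligner).
    Returns one vector of dimension 2*d per word: mean followed by max.
    """
    pooled = []
    for start, end in spans:
        segment = frames[start:end]
        if not segment:
            raise ValueError(f"empty span ({start}, {end})")
        columns = list(zip(*segment))  # one tuple per feature dimension
        mean = [sum(col) / len(col) for col in columns]
        peak = [max(col) for col in columns]
        pooled.append(mean + peak)
    return pooled


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def excite(target: Vector, source: Vector,
           weights: Sequence[Vector], bias: Vector) -> Vector:
    """Rescale `target` by a gate computed from the aligned `source`.

    gate = sigmoid(weights @ source + bias), one gate value per target
    dimension. The single linear layer is an assumption of this sketch.
    """
    gate = [
        sigmoid(sum(w * s for w, s in zip(row, source)) + b)
        for row, b in zip(weights, bias)
    ]
    return [t * g for t, g in zip(target, gate)]


# Toy example: three 2-dimensional acoustic frames aligned to two words.
frames = [[1.0, 4.0], [3.0, 2.0], [5.0, 0.0]]
spans = [(0, 2), (2, 3)]
pooled = mean_max_pool(frames, spans)

# Gate a 2-dimensional text embedding with the pooled audio of word 0.
# Zero weights and bias give a gate of 0.5 in every dimension.
text_vec = [2.0, -4.0]
zero_weights = [[0.0] * 4 for _ in range(2)]
gated = excite(text_vec, pooled[0], zero_weights, [0.0, 0.0])
```

In a trained model the gate weights would be learned and the same gating would presumably be applied in both directions (audio gated by text, and text gated by audio); the zero weights here only make the arithmetic easy to follow.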
Pages: 3375-3379
Number of pages: 5