Learning Fine-Grained Cross Modality Excitement for Speech Emotion Recognition

Cited by: 7
Authors
Li, Hang [1]
Ding, Wenbiao [1]
Wu, Zhongqin [1]
Liu, Zitao [1]
Affiliation
[1] TAL Education Group, Beijing, People's Republic of China
Source
Funding
National Key Research and Development Program of China;
Keywords
speech emotion recognition; human-computer interaction; multimodal learning;
DOI
10.21437/Interspeech.2021-158
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Speech emotion recognition is a challenging task because emotional expression is complex, multimodal and fine-grained. In this paper, we propose a novel multimodal deep learning approach to perform fine-grained emotion recognition from real-life speech. We design a temporal alignment mean-max pooling mechanism to capture the subtle and fine-grained emotions implied in every utterance. In addition, we propose a cross modality excitement module that conducts sample-specific adjustment on cross modality embeddings and adaptively recalibrates the corresponding values using their aligned latent features from the other modality. Our proposed model is evaluated on two well-known real-world speech emotion recognition datasets. The results demonstrate that our approach is superior on the prediction tasks for multimodal speech utterances, and that it outperforms a wide range of baselines in terms of prediction accuracy. Furthermore, we conduct detailed ablation studies to show that our temporal alignment mean-max pooling mechanism and cross modality excitement module contribute significantly to the promising results. To encourage research reproducibility, we make the code publicly available at https://github.com/tal-ai/FG_CME.git.
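The two mechanisms named in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that mean-max pooling concatenates the mean and the max of the acoustic frames aligned to each word, and that excitement is a sigmoid gate computed from the other modality. The function names, the alignment spans, the dimensions and the random projection matrix are all hypothetical.

```python
import numpy as np

def mean_max_pool(frames, spans):
    """Pool frame-level features inside each aligned (start, end) span.

    Assumption: each span is summarized by concatenating its mean and max.
    """
    pooled = []
    for start, end in spans:
        seg = frames[start:end]
        pooled.append(np.concatenate([seg.mean(axis=0), seg.max(axis=0)]))
    return np.stack(pooled)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modality_excitement(target, other, weight):
    """Rescale `target` element-wise by a gate derived from the aligned `other` modality.

    Assumption: the gate is sigmoid(other @ weight); in a real model the
    projection `weight` would be learned rather than random.
    """
    gate = sigmoid(other @ weight)
    return target * gate

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))        # 10 acoustic frames, 4 features each
spans = [(0, 3), (3, 7), (7, 10)]        # hypothetical word-level alignment
text = rng.normal(size=(3, 6))           # 3 word embeddings, 6 features each

audio_words = mean_max_pool(frames, spans)            # shape (3, 8)
w_text_to_audio = rng.normal(size=(6, 8))             # hypothetical projection
audio_excited = cross_modality_excitement(audio_words, text, w_text_to_audio)
```

Because the gate lies between 0 and 1, each acoustic value is attenuated according to the word it is aligned with, which is one simple way to realize a sample-specific recalibration; the same function could be applied in the opposite direction to gate text features with audio.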
Pages: 3375-3379
Page count: 5