Speech emotion recognition based on transfer learning from the FaceNet framework

Cited by: 21
Authors:
Liu, Shuhua [1 ]
Zhang, Mengyu [1 ]
Fang, Ming [1 ]
Zhao, Jianwei [1 ]
Hou, Kun [1 ]
Hung, Chih-Cheng [2 ]
Affiliations:
[1] Northeast Normal Univ, Changchun 130117, Jilin, Peoples R China
[2] Kennesaw State Univ, Coll Comp & Software Engn, Marietta, GA 30060 USA
Source: The Journal of the Acoustical Society of America
DOI: 10.1121/10.0003530
Chinese Library Classification: O42 [Acoustics]
Discipline classification codes: 070206; 082403
Abstract
Speech plays an important role in human-computer emotional interaction. FaceNet, originally developed for face recognition, has achieved great success owing to its excellent feature extraction. In this study, we adopt the FaceNet model and improve it for speech emotion recognition. To apply the model to our task, speech signals are divided into segments at a fixed time interval, and each segment is transformed into a discrete waveform diagram and a spectrogram. The waveform and spectrogram are then separately fed into FaceNet for end-to-end training. Our empirical study shows that pretraining FaceNet on spectrograms is effective; hence, we pretrain the network on the CASIA dataset and then fine-tune it on the IEMOCAP dataset with waveforms. The network derives the most transferable knowledge from the CASIA dataset because of the high accuracy achieved on it, which may be attributed to its clean signals. Our preliminary experiments show accuracies of 68.96% and 90% on the emotion benchmark datasets IEMOCAP and CASIA, respectively. Cross-training is then conducted across the datasets, and comprehensive experiments are performed. The results indicate that the proposed approach outperforms state-of-the-art single-modal methods on the IEMOCAP dataset.
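To make the pipeline described in the abstract concrete, the sketch below shows one plausible way to segment an utterance, convert each segment to a log-mel spectrogram "image", and fine-tune a pretrained embedding backbone with a small emotion-classification head. This is a minimal sketch under stated assumptions, not the authors' released code: the segment length, sample rate, mel resolution, number of emotion classes, and the EmotionHead/fine_tune helpers are illustrative, and the backbone is assumed to be a FaceNet-style embedding network already pretrained on CASIA spectrograms.

```python
# Illustrative sketch of the segment -> spectrogram -> fine-tune pipeline.
# Constants and helper names below are assumptions, not the paper's code.
import numpy as np
import librosa
import torch
import torch.nn as nn

SEGMENT_SECONDS = 2.0   # assumed fixed time interval per segment
SAMPLE_RATE = 16000     # assumed sample rate

def segment_signal(path, seg_sec=SEGMENT_SECONDS, sr=SAMPLE_RATE):
    """Load an utterance and split it into fixed-length segments."""
    y, _ = librosa.load(path, sr=sr)
    seg_len = int(seg_sec * sr)
    n_segs = max(1, len(y) // seg_len)
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]

def to_spectrogram(segment, sr=SAMPLE_RATE, n_mels=128):
    """Convert a waveform segment into a log-mel spectrogram."""
    mel = librosa.feature.melspectrogram(y=segment, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

class EmotionHead(nn.Module):
    """Classifier on top of a FaceNet-style embedding backbone
    (assumed pretrained on CASIA spectrograms)."""
    def __init__(self, backbone, embed_dim=512, n_classes=4):
        super().__init__()
        self.backbone = backbone
        self.fc = nn.Linear(embed_dim, n_classes)

    def forward(self, x):
        return self.fc(self.backbone(x))

def fine_tune(model, loader, epochs=10, lr=1e-4):
    """Fine-tune the pretrained model on IEMOCAP segment batches (x, y)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
```

In this sketch, transfer learning amounts to initializing the backbone with CASIA-pretrained weights and then updating all parameters on IEMOCAP; freezing early layers or using a lower learning rate for the backbone are common variations.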
Pages: 1338-1345
Number of pages: 8
Related papers (50 total):
  • [41] Cooperative Learning and its Application to Emotion Recognition from Speech
    Zhang, Zixing
    Coutinho, Eduardo
    Deng, Jun
    Schuller, Bjoern
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (01) : 115 - 126
  • [42] Emotion recognition from speech using deep learning on spectrograms
    Li, Xingguang
    Song, Wenjun
    Liang, Zonglin
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 2791 - 2796
  • [44] Speech emotion recognition based on emotion perception
    Liu, Gang
    Cai, Shifang
    Wang, Ce
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [45] EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition
    Gerczuk, Maurice
    Amiriparian, Shahin
    Ottl, Sandra
    Schuller, Bjoern W.
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (02) : 1472 - 1487
  • [46] Speech Emotion Recognition Using Deep Learning Transfer Models and Explainable Techniques
    Kim, Tae-Wan
    Kwak, Keun-Chang
    APPLIED SCIENCES-BASEL, 2024, 14 (04)
  • [47] Unsupervised Transfer Components Learning for Cross-Domain Speech Emotion Recognition
    Jiang, Shenjie
    Song, Peng
    Li, Shaokai
    Zhao, Keke
    Zheng, Wenming
    INTERSPEECH 2023, 2023, : 4538 - 4542
  • [48] Transfer Linear Subspace Learning for Cross-Corpus Speech Emotion Recognition
    Song, Peng
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (02) : 265 - 275
  • [49] Transfer Subspace Learning for Unsupervised Cross-Corpus Speech Emotion Recognition
    Liu, Na
    Zhang, Baofeng
    Liu, Bin
    Shi, Jingang
    Yang, Lei
    Li, Zhiwei
    Zhu, Junchao
    IEEE ACCESS, 2021, 9 : 95925 - 95937
  • [50] Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition
    Luo, Hui
    Han, Jiqing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 2047 - 2060