Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters

Cited by: 0
Authors
Xi, Yuxuan [1]
Li, Pengcheng [1]
Song, Yan [1]
Jiang, Yiheng [1]
Dai, Lirong [1]
Affiliations
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
10.1109/apsipaasc47483.2019.9023339
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Despite considerable recent progress in deep learning methods for speech emotion recognition (SER), performance is severely restricted by the lack of large-scale labeled speech emotion corpora. For instance, it is difficult to employ complex neural network architectures such as ResNet, which, accompanied by large-scale corpora such as VoxCeleb and NIST SRE, have proven to perform well on the related speaker verification (SV) task. In this paper, a novel domain adaptation method is proposed for the SER task, which aims to transfer related information from a speaker corpus to an emotion corpus. Specifically, a residual adapter architecture is designed for the SER task, in which ResNet acts as a universal model for general information extraction. An adapter module then trains a limited number of additional parameters to model the deviation specific to the SER task. To evaluate the effectiveness of the proposed method, we conduct extensive evaluations on the benchmark IEMOCAP and CHEAVD 2.0 corpora. Results show significant improvement, with overall results in each task outperforming or matching state-of-the-art methods.
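The residual adapter idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the adapter's form, so the low-rank bottleneck, the zero initialization, and every name here (`AdaptedResidualBlock`, `base`, `down`, `up`) are assumptions made for illustration. The sketch only shows the general pattern of a frozen shared block plus a small trainable correction.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class AdaptedResidualBlock:
    """Toy residual block with a frozen base path and a small adapter.

    Assumption: `base` stands in for weights pretrained on a speaker corpus
    and kept frozen; `down`/`up` form a hypothetical low-rank adapter that
    would be the only part trained on the emotion corpus.
    """

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        self.base = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.down = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(rank)]
        # Zero-initialized so the adapted block starts out equal to the base block.
        self.up = [[0.0] * rank for _ in range(dim)]

    def base_forward(self, x):
        # Plain residual block: x + ReLU(base @ x).
        h = [max(0.0, v) for v in matvec(self.base, x)]
        return [xi + hi for xi, hi in zip(x, h)]

    def forward(self, x):
        # Same block plus the adapter's task-specific deviation.
        h = [max(0.0, v) for v in matvec(self.base, x)]
        delta = matvec(self.up, matvec(self.down, h))
        return [xi + hi + di for xi, hi, di in zip(x, h, delta)]

    def frozen_params(self):
        return len(self.base) * len(self.base[0])

    def trainable_params(self):
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])
```

With `dim=8` and `rank=2`, the adapter holds 32 trainable values against 64 frozen ones, and before any training the adapted output equals the base output, so adaptation starts from the shared model's behavior.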
Pages: 513-518
Page count: 6