Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters

Cited by: 0

Authors:

Xi, Yuxuan [1]

Li, Pengcheng [1]

Song, Yan [1]

Jiang, Yiheng [1]

Dai, Lirong [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.1109/apsipaasc47483.2019.9023339

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Despite considerable recent progress in deep learning methods for speech emotion recognition (SER), performance is severely restricted by the lack of large-scale labeled speech emotion corpora. For instance, it is difficult to employ complex neural network architectures such as ResNet, which, when trained on large-scale corpora such as VoxCeleb and NIST SRE, have proven to perform well on the related speaker verification (SV) task. In this paper, a novel domain adaptation method is proposed for the SER task, which aims to transfer related information from a speaker corpus to an emotion corpus. Specifically, a residual adapter architecture is designed for the SER task, in which ResNet acts as a universal model for general information extraction. An adapter module then trains a limited number of additional parameters that focus on modeling the deviation specific to the SER task. To evaluate the effectiveness of the proposed method, we conduct extensive evaluations on the benchmark IEMOCAP and CHEAVD 2.0 corpora. Results show significant improvement, with overall results in each task outperforming or matching state-of-the-art methods.
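The residual adapter idea in the abstract (a frozen universal model plus a small trainable module that models the task-specific deviation) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy fully connected `FrozenBlock` standing in for a pretrained ResNet block, the bottleneck shape of `Adapter`, the zero initialization of its up-projection, and the sizes `DIM` and `BOTTLENECK`.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class FrozenBlock:
    """Hypothetical stand-in for one pretrained block of the universal
    (speaker) model; its weights would stay fixed during adaptation."""
    def __init__(self, dim, rng):
        self.W = rand_matrix(dim, dim, rng)

    def __call__(self, x):
        # Residual connection around a single linear + ReLU layer.
        return [a + b for a, b in zip(x, relu(matvec(self.W, x)))]

    def num_params(self):
        return len(self.W) * len(self.W[0])

class Adapter:
    """Hypothetical small adapter: adds a learned deviation to the
    frozen block's output through a narrow bottleneck."""
    def __init__(self, dim, bottleneck, rng):
        self.down = rand_matrix(bottleneck, dim, rng)
        # Assumption: zero-initialized up-projection, so the adapter
        # starts out as the identity mapping.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        delta = matvec(self.up, relu(matvec(self.down, h)))
        return [a + b for a, b in zip(h, delta)]

    def num_params(self):
        return (len(self.down) * len(self.down[0])
                + len(self.up) * len(self.up[0]))

rng = random.Random(0)
DIM, BOTTLENECK = 16, 2          # illustrative sizes only
block = FrozenBlock(DIM, rng)
adapter = Adapter(DIM, BOTTLENECK, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
base_out = block(x)              # universal-model features
adapted_out = adapter(base_out)  # features after the task-specific adapter

print("frozen parameters:", block.num_params())
print("trainable adapter parameters:", adapter.num_params())
```

In this toy setup the adapter holds a quarter as many parameters as the frozen block, and before any training it leaves the universal features unchanged, so adaptation would start from the pretrained behavior.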


Pages: 513-518

Page count: 6
