Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition

Times cited: 21

Authors:

Luo, Hui [1]

Han, Jiqing [2]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China

[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2020 / Vol. 28

Funding:

U.S. National Science Foundation;

Keywords:

Non-negative matrix factorization; transfer subspace learning; cross-corpus; speech emotion recognition; ALGORITHMS;

DOI:

10.1109/TASLP.2020.3006331

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

This article focuses on the cross-corpus speech emotion recognition (SER) task. To overcome the problem that the distribution of the training (source) samples is inconsistent with that of the testing (target) samples, we propose a non-negative matrix factorization based transfer subspace learning method (NMFTSL). Our method seeks a shared feature subspace for the source and target corpora in which the discrepancy between the two distributions is minimized and the corpus-specific (individual) components are excluded, so that knowledge from the source corpus can be transferred to the target corpus. Specifically, in this induced subspace we minimize the distances not only between the marginal distributions but also between the conditional distributions, where both distances are measured by the maximum mean discrepancy (MMD) criterion. To estimate the conditional distribution of the target corpus, we integrate the prediction of target labels and the learning of the feature representation into a joint learning model. In addition, we introduce a difference loss that excludes the individual components from the shared subspace, which further reduces the mutual interference between the source and target individual components. We also propose a discrimination loss that introduces label information into the shared subspace, improving the discriminative ability of the feature representation. A solution to the resulting optimization problem is provided. To evaluate the method, we construct 30 cross-corpus SER schemes from 6 popular speech emotion corpora. Experimental results show that our approach achieves better overall performance than state-of-the-art methods.
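The abstract combines two standard ingredients: a nonnegative factorization that yields a representation shared by both corpora, and the maximum mean discrepancy (MMD) between source and target distributions measured in that representation. The Python/NumPy sketch below illustrates both, under loudly stated assumptions. It uses plain Lee-Seung NMF on the stacked source and target features (not the NMFTSL objective or its update rules), a linear-kernel MMD (the squared distance between mean vectors), a JDA-style conditional MMD summed over emotion classes, and nearest-class-mean pseudo-labels for the target corpus in place of the paper's joint label-prediction model. The helper names (`nmf`, `linear_mmd2`, `conditional_mmd2`) and the synthetic data are hypothetical, and the difference and discrimination losses are not modeled.

```python
import numpy as np


def nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF, X ~= W @ H, via Lee-Seung multiplicative updates for the
    squared Frobenius loss. X is (d features, n samples) and nonnegative.
    Not the NMFTSL update rules, which add transfer terms to the objective."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H elementwise nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def linear_mmd2(A, B):
    """Squared MMD with a linear kernel: squared distance between the
    column means of A (k, n_a) and B (k, n_b)."""
    diff = A.mean(axis=1) - B.mean(axis=1)
    return float(diff @ diff)


def conditional_mmd2(Hs, ys, Ht, yt, classes):
    """Class-wise linear MMD summed over classes (JDA-style assumption).
    yt are target pseudo-labels; classes empty on either side are skipped."""
    total = 0.0
    for c in classes:
        ms, mt = ys == c, yt == c
        if ms.any() and mt.any():
            total += linear_mmd2(Hs[:, ms], Ht[:, mt])
    return total


# Synthetic stand-in data: 20 nonnegative acoustic features per utterance.
rng = np.random.default_rng(1)
Xs = rng.random((20, 60))            # hypothetical source corpus, 60 utterances
Xt = rng.random((20, 40)) + 0.2      # hypothetical target corpus, shifted
ys = np.arange(60) % 4               # 4 emotion classes, all present in source

# Shared subspace: one basis W for both corpora, per-utterance codes in H.
X = np.hstack([Xs, Xt])
W, H = nmf(X, k=5)
Hs, Ht = H[:, :60], H[:, 60:]

# Target pseudo-labels: nearest source class mean in the shared subspace
# (a simplification standing in for NMFTSL's joint label prediction).
means = np.stack([Hs[:, ys == c].mean(axis=1) for c in range(4)])
yt_pseudo = np.argmin(((Ht.T[:, None, :] - means[None]) ** 2).sum(-1), axis=1)

print("marginal MMD^2:", linear_mmd2(Hs, Ht))
print("conditional MMD^2:", conditional_mmd2(Hs, ys, Ht, yt_pseudo, range(4)))
```

The abstract states that both distances are minimized in the learned subspace; here they are only measured after an unconstrained NMF, so the printed values show what such a method would drive down, not the result of NMFTSL itself.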


Pages: 2047-2060

Page count: 14
