A STUDY ON CROSS-CORPUS SPEECH EMOTION RECOGNITION AND DATA AUGMENTATION

Cited by: 3
Authors
Braunschweiler, Norbert [1]
Doddipatla, Rama [1]
Keizer, Simon [1]
Stoyanchev, Svetlana [1]
Affiliations
[1] Cambridge Res Lab, Toshiba Europe Ltd, Cambridge CB4 0GZ, England
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021
Keywords
speech emotion recognition; cross-corpus; data augmentation; CNN-RNN bi-directional LSTM; deep learning;
DOI
10.1109/ASRU51503.2021.9687987
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Models that can handle a wide range of speakers and acoustic conditions are essential in speech emotion recognition (SER). Often, these models tend to show mixed results when presented with speakers or acoustic conditions that were not seen during training. This paper investigates the impact of cross-corpus data complementation and data augmentation on the performance of SER models in matched (test set from the same corpus) and mismatched (test set from a different corpus) conditions. Investigations using six emotional speech corpora that include single and multiple speakers as well as variations in emotion style (acted, elicited, natural) and recording conditions are presented. Observations show that, as expected, models trained on single corpora perform best in matched conditions, while performance decreases by 10-40% in mismatched conditions, depending on corpus-specific features. Models trained on mixed corpora can be more stable in mismatched conditions, with performance reductions of only 1-8% relative to single-corpus models in matched conditions. Data augmentation yields additional gains of up to 4% and seems to benefit mismatched conditions more than matched ones.
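The abstract above describes cross-corpus training, data augmentation, and (per the keywords) a CNN-RNN bi-directional LSTM classifier, but this record contains no code and does not specify features, layer sizes, or the exact augmentation methods. The following is a minimal PyTorch sketch of that general recipe, offered only as an illustration: the log-mel front end, the additive-noise and random-gain augmentation, and all names and sizes (CNNBiLSTM, augment_waveform, N_MELS, hidden size 128, four emotion classes) are assumptions, not the authors' implementation.

# Minimal sketch of a CNN + bidirectional-LSTM emotion classifier with simple
# waveform-level data augmentation. This is NOT the authors' implementation:
# the architecture details, augmentation choices, and all names below are
# illustrative assumptions.
import torch
import torch.nn as nn
import torchaudio

N_MELS = 64          # assumed mel-spectrogram resolution
N_EMOTIONS = 4       # e.g. angry / happy / neutral / sad

def augment_waveform(wav: torch.Tensor, snr_db: float = 20.0) -> torch.Tensor:
    """Very simple augmentation: add white noise at a given SNR and apply a
    random gain. Real systems may also use speed/pitch perturbation, etc."""
    signal_power = wav.pow(2).mean()
    noise = torch.randn_like(wav)
    noise_power = noise.pow(2).mean()
    scale = torch.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    gain = 0.8 + 0.4 * torch.rand(1)          # random gain in [0.8, 1.2]
    return gain * (wav + scale * noise)

class CNNBiLSTM(nn.Module):
    """2-D CNN front end over log-mel features followed by a bidirectional
    LSTM and a linear classifier, mirroring the 'CNN-RNN bi-directional LSTM'
    keyword of the paper (layer sizes are guesses)."""
    def __init__(self, n_mels: int = N_MELS, n_classes: int = N_EMOTIONS):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_mels=n_mels)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
        )
        self.lstm = nn.LSTM(input_size=64 * (n_mels // 4), hidden_size=128,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 128, n_classes)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> log-mel: (batch, 1, n_mels, frames)
        feats = torch.log(self.melspec(wav) + 1e-6).unsqueeze(1)
        x = self.cnn(feats)                               # (batch, C, mels', frames')
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)    # time-major sequence
        out, _ = self.lstm(x)
        return self.classifier(out.mean(dim=1))           # utterance-level pooling

if __name__ == "__main__":
    model = CNNBiLSTM()
    dummy = torch.randn(2, 16000)                         # two 1-second utterances
    logits = model(augment_waveform(dummy))
    print(logits.shape)                                   # torch.Size([2, 4])

Under this sketch, a matched evaluation would test on held-out utterances from the training corpus, a mismatched evaluation would test on a different corpus entirely, and a mixed-corpus condition would simply concatenate several corpora's training sets before fitting the model, with augmentation applied on the fly to the training waveforms.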
Pages: 24-30
Number of pages: 7
Related Papers
50 records in total
  • [1] A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION
    Milner, Rosanna
    Jalal, Md Asif
    Ng, Raymond W. M.
    Hain, Thomas
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 304 - 311
  • [2] Convolutional neural network-based cross-corpus speech emotion recognition with data augmentation and features fusion
    Jahangir, Rashid
    Teh, Ying Wah
    Mujtaba, Ghulam
    Alroobaea, Roobaea
    Shaikh, Zahid Hussain
    Ali, Ihsan
    MACHINE VISION AND APPLICATIONS, 2022, 33 (03)
  • [3] Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation
    Fu, Hongliang
    Li, Qianqian
    Tao, Huawei
    Zhu, Chunhua
    Xie, Yue
    Guo, Ruxue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (08) : 1097 - 1100
  • [4] A Comparative Study on Different Labelling Schemes and Cross-Corpus Experiments in Speech Emotion Recognition
    Baki, Pinar
    Erden, Berna
    Oncul, Serkan
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [5] Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Zong, Yuan
    Tang, Chuangao
    Lian, Hailun
    Chang, Hongli
    Zhu, Jie
    Li, Sunan
    Zhao, Yan
    ELECTRONICS, 2022, 11 (17)
  • [6] Synthesized speech for model training in cross-corpus recognition of human emotion
    Schuller, Bjorn
    Zhang, Zixing
    Weninger, Felix
    Burkhardt, Felix
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (03) : 313 - 323
  • [7] Cross-Corpus Speech Emotion Recognition Based on Hybrid Neural Networks
    Rehman, Abdul
    Liu, Zhen-Tao
    Li, Dan-Yun
    Wu, Bao-Han
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 7464 - 7468
  • [8] DOMAIN GENERALIZATION WITH TRIPLET NETWORK FOR CROSS-CORPUS SPEECH EMOTION RECOGNITION
    Lee, Shi-wook
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 389 - 396
  • [9] A Cross-Corpus Recognition of Emotional Speech
    Xiao, Zhongzhe
    Wu, Di
    Zhang, Xiaojun
    Tao, Zhi
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016, : 42 - 46