Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages

Times Cited: 1
Authors
Retta, Ephrem Afele [1 ]
Sutcliffe, Richard [2 ]
Mahmood, Jabar [3 ,4 ]
Berwo, Michael Abebe [4 ]
Almekhlafi, Eiad [1 ]
Khan, Sajjad Ahmad [5 ]
Chaudhry, Shehzad Ashraf [6 ,7 ]
Mhamed, Mustafa [1 ,8 ]
Feng, Jun [1 ]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian 710127, Peoples R China
[2] Univ Essex, Sch Comp Sci & Elect Engn, Wivenhoe Pk, Colchester CO4 3SQ, England
[3] Univ Sialkot, Fac Comp & Informat Technol, Sialkot 51040, Punjab, Pakistan
[4] Changan Univ, Sch Informat & Engn, Xian 710064, Peoples R China
[5] Hoseo Univ, Comp Engn Dept, Asan 31499, South Korea
[6] Abu Dhabi Univ, Coll Engn, Dept Comp Sci & Informat Technol, Abu Dhabi 59911, U Arab Emirates
[7] Nisantasi Univ, Fac Engn & Architecture, Dept Software Engn, TR-34398 Istanbul, Turkiye
[8] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, No. 23
Keywords
speech emotion recognition; multilingual; cross-lingual; feature extraction;
DOI
10.3390/app132312587
Chinese Library Classification (CLC) Code
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language do not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German, and Urdu. For Amharic, we use our own publicly available Amharic Speech Emotion Dataset (ASED). For English, German, and Urdu, we use the existing RAVDESS, EMO-DB, and URDU datasets. Following previous research, we map the labels of all four datasets to just two classes, positive and negative, so that performance on different languages can be compared directly and languages can be combined for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers: AlexNet, VGGE (a proposed variant of VGG), and ResNet50. The results, averaged over the three models, were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult; by the same measure, German SER is more difficult and Urdu SER easier. In Experiment 2, we trained on one language and tested on another, in both directions for each of the pairs Amharic <-> German, Amharic <-> English, and Amharic <-> Urdu. With Amharic as the target, English or German as the source gave the best results. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percentage points higher than the best accuracy in Experiment 2, suggesting that training on two or three non-Amharic languages yields a better result than training on just one. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training an SER classifier when resources for a language are scarce.
Pages: 17
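The protocol summarized in the abstract (map each corpus's emotion labels onto a shared positive/negative scheme, then train on one or more source corpora and test on a target corpus) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the load_corpus helper, the specific emotion-to-valence mapping, and the use of a logistic-regression classifier in place of AlexNet, VGGE, and ResNet50 are all assumptions introduced here.

```python
# Minimal sketch of the binary-valence, cross-corpus protocol described in the
# abstract. Hypothetical elements: load_corpus(), the emotion-to-valence mapping,
# and a logistic-regression stand-in for the paper's AlexNet/VGGE/ResNet50 models.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Assumed mapping of corpus emotion labels onto the shared positive/negative
# classes; the exact assignment used in the paper may differ.
VALENCE = {
    "happy": 1, "calm": 1, "neutral": 1, "surprise": 1,
    "sad": 0, "angry": 0, "fear": 0, "disgust": 0,
}

def load_corpus(name, rng):
    """Hypothetical loader: returns (features, emotion_labels) for one corpus.
    Random placeholders stand in for real acoustic features so the sketch runs."""
    labels = rng.choice(list(VALENCE), size=200)
    feats = rng.normal(size=(200, 64))  # stand-in for spectrogram/acoustic features
    return feats, labels

def to_binary(labels):
    """Map per-corpus emotion labels onto the two shared classes."""
    return np.array([VALENCE[e] for e in labels])

def cross_corpus(train_names, test_name, rng):
    """Train on one or more source corpora, evaluate on a held-out target corpus."""
    feats, targets = [], []
    for name in train_names:
        X, lab = load_corpus(name, rng)
        feats.append(X)
        targets.append(to_binary(lab))
    X_test, lab_test = load_corpus(test_name, rng)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.vstack(feats), np.concatenate(targets))
    return accuracy_score(to_binary(lab_test), clf.predict(X_test))

rng = np.random.default_rng(0)
# Experiment 2 style: single-source cross-lingual transfer (e.g., English -> Amharic).
print("RAVDESS -> ASED:", cross_corpus(["RAVDESS"], "ASED", rng))
# Experiment 3 style: multilingual training on several non-Amharic corpora.
print("RAVDESS+EMO-DB+URDU -> ASED:", cross_corpus(["RAVDESS", "EMO-DB", "URDU"], "ASED", rng))
```

Replacing load_corpus with real feature extraction over ASED, RAVDESS, EMO-DB, and URDU, and the classifier with one of the paper's CNNs, would reproduce the shape of Experiments 2 and 3.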