Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages

Cited by: 0
Authors
Retta, Ephrem Afele [1]
Sutcliffe, Richard [2]
Mahmood, Jabar [3, 4]
Berwo, Michael Abebe [4]
Almekhlafi, Eiad [1]
Khan, Sajjad Ahmad [5]
Chaudhry, Shehzad Ashraf [6, 7]
Mhamed, Mustafa [1, 8]
Feng, Jun [1]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian 710127, Peoples R China
[2] Univ Essex, Sch Comp Sci & Elect Engn, Wivenhoe Pk, Colchester CO4 3SQ, England
[3] Univ Sialkot, Fac Comp & Informat Technol, Sialkot 51040, Punjab, Pakistan
[4] Changan Univ, Sch Informat & Engn, Xian 710064, Peoples R China
[5] Hoseo Univ, Comp Engn Dept, Asan 31499, South Korea
[6] Abu Dhabi Univ, Coll Engn, Dept Comp Sci & Informat Technol, Abu Dhabi 59911, U Arab Emirates
[7] Nisantasi Univ, Fac Engn & Architecture, Dept Software Engn, TR-34398 Istanbul, Turkiye
[8] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 23
Keywords
speech emotion recognition; multilingual; cross-lingual; feature extraction
DOI
10.3390/app132312587
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language do not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German, and Urdu. For Amharic, we use our own publicly available Amharic Speech Emotion Dataset (ASED). For English, German, and Urdu, we use the existing RAVDESS, EMO-DB, and URDU datasets. Following previous research, we map the labels of all datasets to just two classes: positive and negative. Thus, we can compare performance on different languages directly and combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers: AlexNet, VGGE (a proposed variant of VGG), and ResNet50. The results, averaged over the three models, were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. By the same measure, German SER appears more difficult and Urdu SER easier. In Experiment 2, we trained on one language and tested on another, in both directions for each of the following pairs: Amharic <-> German, Amharic <-> English, and Amharic <-> Urdu. The results with Amharic as the target suggested that using English or German as the source gives the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percentage points higher than the best accuracy in Experiment 2, suggesting that training on two or three non-Amharic languages gives a better result than training on just one. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for building an SER classifier when resources for a language are scarce.
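The train/test layout described for Experiments 2 and 3 can be sketched as a simple enumeration. This is only an illustration of the protocol as the abstract words it, not the authors' code: the function names `cross_lingual_pairs` and `multilingual_sources` are hypothetical, and the abstract does not say which multilingual combinations were actually run, so the sketch simply lists every set of two or three non-Amharic languages.

```python
from itertools import combinations

TARGET = "Amharic"
OTHERS = ["English", "German", "Urdu"]  # languages named in the abstract


def cross_lingual_pairs(target, others):
    """Experiment 2 style: (train, test) pairs in both directions."""
    pairs = []
    for lang in others:
        pairs.append((lang, target))  # train on lang, test on target
        pairs.append((target, lang))  # train on target, test on lang
    return pairs


def multilingual_sources(others, min_size=2):
    """Experiment 3 style: sets of non-target languages pooled for training.

    Assumption: every combination of size >= min_size is listed; the
    abstract does not state which combinations were evaluated.
    """
    sets = []
    for k in range(min_size, len(others) + 1):
        sets.extend(combinations(others, k))
    return sets


pairs = cross_lingual_pairs(TARGET, OTHERS)
sources = multilingual_sources(OTHERS)

for train, test in pairs:
    print(f"train={train:8s} test={test}")
for langs in sources:
    print(f"train={'+'.join(langs)} test={TARGET}")
```

Because all four datasets are mapped to the same two labels (positive and negative), any of these train/test splits can share one classifier output layer.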
Pages: 17