Hierarchical classifier design for speech emotion recognition in the mixed-cultural environment

Cited by: 3

Authors:

Vasuki, P. [1]

Aravindan, Chandrabose [2]

Affiliations:

[1] SSN Coll Engn, Dept IT, Chennai, Tamil Nadu, India

[2] SSN Coll Engn, Dept CSE, Chennai, Tamil Nadu, India

Source:

JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE | 2021, Vol. 33, No. 3

Keywords:

Speech emotion classification; hierarchical classification system; integrated corpus environment; cross-corpus

DOI:

10.1080/0952813X.2020.1764630

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recognition of emotion in speech is difficult because of speaker factors such as gender, age, and cultural background (nationality, ethnicity, and region), as well as the acoustic environment. Among these factors, the speaker's cultural background strongly influences how emotion is expressed, and the unsatisfactory performance of emotion recognition engines built from mixed-cultural samples can be traced back to this influence. To address this issue, a two-level hierarchical engine has been designed to identify emotion in speech from different cultural backgrounds. The first level is a culture identification system, which identifies the corpus to which an input utterance belongs. Because most speakers involved in constructing a given corpus come from the same locality and cultural background, we assume that a corpus represents the cultural background of its speakers. Based on the response of the first-level classifier, the input utterance is forwarded to the appropriate corpus-specific emotion recognition engine at the second level. Each corpus-specific emotion recognition system is a discriminative, multiclass SVM classifier trained on the emotional utterances of that corpus. The system has been tested with five corpora collected from diverse cultural backgrounds, namely EMO-DB, SAVEE, IITKGP-SEC, Spanish corpus S0329, and CMU's Woogles corpus. The system achieved an accuracy of 82.01%, an improvement of 13.38% over monolithic approaches.
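
The two-level routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `NearestCentroid` class is a toy stand-in for the multiclass SVM classifiers named in the abstract, and the class names, corpus labels, and two-dimensional feature vectors are all hypothetical, chosen only to make the routing step visible.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Toy stand-in classifier (NOT the SVM described in the abstract)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # Centroid = per-dimension mean of each class's feature vectors.
        self.centroids = {
            label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict(self, features):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids,
                   key=lambda label: dist(features, self.centroids[label]))


class HierarchicalEmotionRecognizer:
    """Level 1 picks a corpus; level 2 applies that corpus's emotion model."""

    def fit(self, X, corpora, emotions):
        # Level 1: identify the corpus (used as a proxy for cultural background).
        self.corpus_clf = NearestCentroid().fit(X, corpora)
        # Level 2: one emotion classifier per corpus, trained on that corpus only.
        self.emotion_clfs = {}
        for corpus in set(corpora):
            idx = [i for i, c in enumerate(corpora) if c == corpus]
            self.emotion_clfs[corpus] = NearestCentroid().fit(
                [X[i] for i in idx], [emotions[i] for i in idx])
        return self

    def predict(self, features):
        corpus = self.corpus_clf.predict(features)
        return corpus, self.emotion_clfs[corpus].predict(features)


# Made-up 2-D features; a real system would use acoustic features.
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
corpora = ["corpus_a", "corpus_a", "corpus_b", "corpus_b"]
emotions = ["neutral", "anger", "anger", "neutral"]

model = HierarchicalEmotionRecognizer().fit(X, corpora, emotions)
print(model.predict((0.5, 0.9)))
print(model.predict((9.5, 0.9)))
```

In this toy data the same value of the second feature maps to different emotions in the two corpora, which is the situation where routing through a corpus-specific model would be expected to help over a single monolithic classifier.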


Pages: 451-466

Page count: 16
