Hierarchical classifier design for speech emotion recognition in the mixed-cultural environment

Cited by: 3
Authors
Vasuki, P. [1]
Aravindan, Chandrabose [2]
Affiliations
[1] SSN Coll Engn, Dept IT, Chennai, Tamil Nadu, India
[2] SSN Coll Engn, Dept CSE, Chennai, Tamil Nadu, India
Keywords
Speech emotion classification; hierarchical classification system; integrated corpus environment; cross-corpus
DOI
10.1080/0952813X.2020.1764630
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing emotion in speech is difficult because of speaker factors such as gender, age, and cultural background (nationality, ethnicity, and region), as well as the acoustic environment. Among these factors, the cultural background of the speaker strongly influences how emotion is expressed, and it accounts for the unsatisfactory performance of emotion recognition engines built from mixed-cultural samples. To address this issue, a two-level hierarchical engine has been designed to identify emotion in the speech of speakers from different cultural backgrounds. The first level is a culture identification system, which identifies the corpus to which an input utterance belongs. Because most of the speakers recorded for a given corpus come from the same locality and cultural background, we assume that a corpus represents the cultural background of its speakers. Based on the response of the first-level classifier, the input utterance is forwarded to the appropriate corpus-specific emotion recognition engine at the second level. Each corpus-specific emotion recognition system is a discriminative, multiclass SVM classifier trained on the emotional utterances of that corpus. The system has been tested on five corpora collected from diverse cultural backgrounds: EMO-DB, SAVEE, IITKGP-SEC, the Spanish corpus S0329, and CMU's Woogles corpus. It achieved an accuracy of 82.01%, an improvement of 13.38% over monolithic approaches.
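The two-level routing described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `NearestCentroid` and `HierarchicalEmotionRecognizer`, the corpus labels, and the toy feature values are all hypothetical, and a small nearest-centroid classifier stands in for the multiclass SVMs so that the example needs only the Python standard library.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Stand-in classifier (NOT the paper's SVM): predicts the label of the closest class mean."""

    def fit(self, features, labels):
        grouped = defaultdict(list)
        for vector, label in zip(features, labels):
            grouped[label].append(vector)
        # Mean feature vector per class.
        self.centroids = {
            label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in grouped.items()
        }
        return self

    def predict(self, vector):
        return min(self.centroids, key=lambda label: dist(vector, self.centroids[label]))


class HierarchicalEmotionRecognizer:
    """Level 1 picks the corpus; level 2 applies that corpus's own emotion model."""

    def __init__(self, make_classifier=NearestCentroid):
        # Any factory returning an object with fit/predict works here (assumption).
        self.make_classifier = make_classifier

    def fit(self, features, corpora, emotions):
        # Level 1: corpus identification (used as a proxy for culture), trained on all utterances.
        self.corpus_model = self.make_classifier().fit(features, corpora)
        # Level 2: one emotion model per corpus, trained only on that corpus's utterances.
        self.emotion_models = {}
        for corpus in set(corpora):
            rows = [i for i, c in enumerate(corpora) if c == corpus]
            self.emotion_models[corpus] = self.make_classifier().fit(
                [features[i] for i in rows], [emotions[i] for i in rows]
            )
        return self

    def predict(self, vector):
        # Route the utterance to the emotion model of the predicted corpus.
        corpus = self.corpus_model.predict(vector)
        return corpus, self.emotion_models[corpus].predict(vector)


# Toy 2-D "acoustic features"; the values are made up purely for illustration.
features = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
corpora = ["corpus_a", "corpus_a", "corpus_b", "corpus_b"]
emotions = ["neutral", "anger", "neutral", "anger"]

recognizer = HierarchicalEmotionRecognizer().fit(features, corpora, emotions)
result = recognizer.predict([5.1, 5.9])
print(result)  # ('corpus_b', 'anger')
```

A monolithic baseline, by contrast, would train a single emotion model on the pooled utterances of all corpora. In this sketch, an error at the first level sends the utterance to the wrong emotion model, so the accuracy of corpus identification bounds what the second level can achieve.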
Pages: 451-466
Number of pages: 16
Related papers
50 in total
  • [1] Design of Hierarchical Classifier to Improve Speech Emotion Recognition
    Vasuki, P.
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2023, 44 (01): 19-33
  • [2] Mixed-Cultural Speech for Mixed-Cultural Users - Natural vs. Synthetic Speech for Virtual Agents
    Obremski, David
    Lugrin, Birgit
    [J]. PROCEEDINGS OF THE 10TH CONFERENCE ON HUMAN-AGENT INTERACTION, HAI 2022, 2022: 290-292
  • [3] The impact of mixed-cultural speech on the stereotypical perception of a virtual robot
    Obremski, David
    Friedrich, Paula
    Haak, Nora
    Schaper, Philipp
    Lugrin, Birgit
    [J]. FRONTIERS IN ROBOTICS AND AI, 2022, 9
  • [4] CLASSIFIER FUSION FOR EMOTION RECOGNITION FROM SPEECH
    Scherer, Stefan
    Schwenker, Friedhelm
    Palm, Guenther
    [J]. ADVANCED INTELLIGENT ENVIRONMENTS, 2009: 95-117
  • [5] Hierarchical framework for speech emotion recognition
    You, Mingyu
    Chen, Chun
    Bu, Jiajun
    Liu, Jia
    Tao, Jianhua
    [J]. 2006 IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS, VOLS 1-7, 2006: 515-+
  • [6] Multi-Classifier Speech Emotion Recognition System
    Partila, Pavol
    Tovarek, Jaromir
    Voznak, Miroslav
    Rozhon, Jan
    Sevcik, Lukas
    Baran, Remigiusz
    [J]. 2018 26TH TELECOMMUNICATIONS FORUM (TELFOR), 2018: 416-419
  • [7] Speech emotion recognition in noisy environment
    Chenchah, Farah
    Lachiri, Zied
    [J]. 2016 2ND INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP), 2016: 788-792
  • [8] Improvement Of Speech Emotion Recognition with Neural Network Classifier by Using Speech Spectrogram
    Prasomphan, Sathit
    [J]. 2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015), 2015: 73-76
  • [9] Ensemble majority voting classifier for speech emotion recognition and prediction
    Anagnostopoulos, Theodoros
    Skourlas, Christos
    [J]. Journal of Systems and Information Technology, 2014, 16 (03): 222-232
  • [10] Hierarchical sparse coding framework for speech emotion recognition
    Torres-Boza, Diana
    Oveneke, Meshia Cedric
    Wang, Fengna
    Jiang, Dongmei
    Verhelst, Werner
    Sahli, Hichem
    [J]. SPEECH COMMUNICATION, 2018, 99: 80-89