Speech Emotion Recognition with Cross-lingual Databases

Cited by: 0

Authors:

Chiou, Bo-Chang [1]

Chen, Chia-Ping [1]

Affiliations:

[1] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung, Taiwan

Source:

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014

Keywords:

speech synthesis; unit selection; join costs;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we investigate cross-lingual automatic speech emotion recognition. The basic idea is that, since the emotion recognition system is based on acoustic features only, it is possible to combine data in different languages to improve recognition accuracy. We begin with the construction of a Mandarin database of emotional speech, which is similar to the well-known Berlin Database of Emotional Speech (EMO-DB) in composition and size. To reduce the variability due to different languages and different speakers, we propose applying histogram equalization as a data normalization method. Recognition systems based on support vector machines are evaluated on EMO-DB. Compared to the baseline system without multi-lingual databases and data normalization, the proposed system achieves a relative error reduction of 39.9%, raising the emotion recognition accuracy from 86.2% to 91.7%. This accuracy is among the best known results reported on EMO-DB, if not the best.
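
The 39.9% figure quoted in the abstract matches the relative reduction in error rate (from 13.8% to 8.3%), not a relative gain in accuracy. A minimal Python check of that arithmetic follows; the helper names are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_acc: float, proposed_acc: float) -> float:
    """Relative reduction in error rate (percent), given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc   # 13.8 for the baseline system
    proposed_err = 100.0 - proposed_acc   # 8.3 for the proposed system
    return (baseline_err - proposed_err) / baseline_err * 100.0


def relative_accuracy_gain(baseline_acc: float, proposed_acc: float) -> float:
    """Relative gain in accuracy (percent), given accuracies in percent."""
    return (proposed_acc - baseline_acc) / baseline_acc * 100.0


if __name__ == "__main__":
    # Accuracies as stated in the abstract.
    print(round(relative_error_reduction(86.2, 91.7), 1))
    print(round(relative_accuracy_gain(86.2, 91.7), 1))
```

Under this reading, the first value rounds to 39.9 and the second to about 6.4, so the abstract's percentage is consistent with an error-rate reduction.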


Pages: 558-561

Number of pages: 4

50 items in total

[41] Cross-Lingual Speech-to-Text Summarization
Pontes, Elvys Linhares
Gonzalez-Gallardo, Carlos-Emiliano
Torres-Moreno, Juan-Manuel
Huet, Stephane
[J]. MULTIMEDIA AND NETWORK INFORMATION SYSTEMS, 2019, 833 : 385 - 395
[42] FUSION OF MULTIPLE EMOTION PERSPECTIVES: IMPROVING AFFECT RECOGNITION THROUGH INTEGRATING CROSS-LINGUAL EMOTION INFORMATION
Chang, Chun-Min
Lee, Chi-Chun
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5820 - 5824
[43] METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer
Zhu, Xinfa
Lei, Yi
Li, Tao
Zhang, Yongmao
Zhou, Hongbin
Lu, Heng
Xie, Lei
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1506 - 1518
[44] Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition
Lu, Liang
Ghoshal, Arnab
Renals, Steve
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (01) : 17 - 27
[45] Speech Recognition for Turkic Languages Using Cross-Lingual Transfer Learning from Kazakh
Orel, Daniil
Yeshpanov, Rustem
Varol, Huseyin Atakan
[J]. 2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023, : 174 - 182
[46] Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition
Farooq, Muhammad Umar
Hain, Thomas
[J]. INTERSPEECH 2022, 2022, : 3849 - 3853
[47] Speech Recognition for Turkic Languages Using Cross-Lingual Transfer Learning from Kazakh
Orel, Daniil
Yeshpanov, Rustem
Varol, Huseyin Atakan
[J]. Proceedings - 2023 IEEE International Conference on Big Data and Smart Computing, BigComp 2023, 2023, : 174 - 182
[48] MAXIMUM A POSTERIORI ADAPTATION OF SUBSPACE GAUSSIAN MIXTURE MODELS FOR CROSS-LINGUAL SPEECH RECOGNITION
Lu, Liang
Ghoshal, Arnab
Renals, Steve
[J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4877 - 4880
[49] Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
Chatzoudis, Gerasimos
Plitsis, Manos
Stamouli, Spyridoula
Dimou, Athanasia-Lida
Katsamanis, Nassos
Katsouros, Vassilis
[J]. INTERSPEECH 2022, 2022, : 2178 - 2182
[50] MULTI-STREAM TEMPORALLY VARYING WEIGHT REGRESSION FOR CROSS-LINGUAL SPEECH RECOGNITION
Liu, Shilin
Sim, Khe Chai
[J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 434 - 439
