Speech Emotion Recognition with Cross-lingual Databases

Cited by: 0
Authors
Chiou, Bo-Chang [1]
Chen, Chia-Ping [1]
Affiliation
[1] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung, Taiwan
Keywords
speech synthesis; unit selection; join costs;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we investigate cross-lingual automatic speech emotion recognition. The basic idea is that, since the emotion recognition system is based on acoustic features only, it is possible to combine data in different languages to improve the recognition accuracy. We begin with the construction of a Mandarin database of emotional speech, which is similar to the well-known Berlin Database of Emotional Speech (EMO-DB) in composition and size. To reduce the variability due to different languages and different speakers, we propose to apply histogram equalization as a data normalization method. Recognition systems based on support vector machines have been evaluated on EMO-DB. Compared to the baseline system without multi-lingual databases and data normalization, the proposed system achieves a relative error reduction of 39.9%, raising the emotion recognition accuracy from 86.2% to 91.7%. This accuracy is among the best known results reported on EMO-DB, if not the best.
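The histogram equalization step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so everything below is an assumption: the function name `histogram_equalize` is hypothetical, each group (for example one speaker or one language) is equalized separately, and the reference distribution is taken to be a standard normal. The idea shown is only the general one: each feature value is replaced by the reference-distribution value that has the same cumulative probability.

```python
import numpy as np
from statistics import NormalDist


def histogram_equalize(features):
    """Map each feature column onto a standard normal reference distribution.

    features: array of shape (n_utterances, n_features) for ONE group,
    e.g. one speaker or one language (an assumption, not from the source).
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]
    # Rank of each value within its column (0 .. n-1).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Empirical CDF at each sample, kept strictly inside (0, 1).
    cdf = (ranks + 0.5) / n
    # Inverse CDF of the assumed reference distribution (standard normal).
    inverse_cdf = np.vectorize(NormalDist().inv_cdf, otypes=[float])
    return inverse_cdf(cdf)


# Two groups whose raw features differ in offset and scale.
rng = np.random.default_rng(0)
group_a = rng.normal(5.0, 2.0, size=(50, 3))
group_b = rng.normal(-1.0, 0.5, size=(50, 3))
equalized_a = histogram_equalize(group_a)
equalized_b = histogram_equalize(group_b)
```

After this mapping both groups share the same per-feature distribution while the ordering of utterances within each group is preserved, which is the sense in which the normalization reduces variability between groups. The equalized features from all groups could then be pooled to train a classifier such as a support vector machine.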
Pages: 558-561
Number of pages: 4