Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions

Cited by: 4
Authors
Zhou, Hengshun [1]
Du, Jun [1]
Tu, Yan-Hui [1]
Lee, Chin-Hui [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
speech emotion recognition; speech enhancement; realistic environments; multiple-target learning; LSTM;
DOI
10.21437/Interspeech.2020-2472
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
In this study, we investigate the effects of deep learning (DL)-based speech enhancement (SE) on speech emotion recognition (SER) in realistic environments. First, we use emotional speech data to train regression-based speech enhancement models, which are shown to be beneficial to noisy speech emotion recognition. Next, to improve the generalization capability of the regression model, an LSTM architecture whose hidden layers are designed via simple densely-connected progressive learning is adopted for the enhancement model. Finally, a post-processor that uses an improved speech presence probability to estimate masks from the proposed LSTM structure is shown to further improve recognition accuracy. Experimental results on the IEMOCAP and CHEAVD 2.0 corpora demonstrate that the proposed framework can yield consistent and significant improvements over systems using unprocessed noisy speech.
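As a rough illustration of what a time-frequency mask does in an enhancement front end, the following is a minimal Python sketch. It is not the authors' method: it assumes an oracle ratio mask computed from known clean and noise power, whereas the abstract describes masks estimated from an LSTM output with an improved speech presence probability. The function names `ideal_ratio_mask` and `apply_mask` and all numeric values are hypothetical.

```python
def ideal_ratio_mask(clean_power, noise_power, floor=1e-12):
    """Per-bin ratio mask in [0, 1]: clean / (clean + noise).

    Assumption: clean and noise power are known (oracle setting);
    a real system would have to estimate the mask from noisy input.
    """
    return [s / max(s + n, floor) for s, n in zip(clean_power, noise_power)]


def apply_mask(noisy_magnitude, mask):
    """Scale each frequency bin of the noisy magnitude by its mask value."""
    return [m * y for m, y in zip(mask, noisy_magnitude)]


# Toy single-frame example with three frequency bins (made-up values).
clean_power = [4.0, 1.0, 0.0]
noise_power = [0.0, 1.0, 2.0]
noisy_magnitude = [2.0, 1.5, 1.4]

mask = ideal_ratio_mask(clean_power, noise_power)
enhanced = apply_mask(noisy_magnitude, mask)
```

Bins dominated by speech keep a mask value near 1 and pass through almost unchanged, while bins dominated by noise are attenuated toward 0. The enhanced features would then be passed to the emotion recognizer in place of the unprocessed noisy ones.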
Pages: 4098-4102
Page count: 5