Automatic Multi-Speaker Speech Recognition System Based on Time-Frequency Blind Source Separation under Ubiquitous Environment

Cited by: 0
|
Authors
Wang, Zhe [1 ]
Zhang, Haijian [1 ]
Bi, Guoan [1 ]
Li, Xiumei [2 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Hangzhou Normal Univ, Sch Informat Sci & Engn, Hangzhou, Peoples R China
Keywords
FOURIER-TRANSFORM; NOISE; DOMAIN;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
In this paper, an automatic speech recognition (ASR) system for ubiquitous environments is proposed and successfully implemented in a personalized voice command system for vehicle and living-room environments. The proposed ASR system introduces a novel scheme that separates speech sources from multiple speakers, detects speech presence/absence by tracking the upper portion of the speech power spectrum, and adaptively suppresses noise. An automatic recognition algorithm adapted to the multi-speaker task is designed and evaluated. Evaluation tests are carried out using the NOISEX-92 noise database and the YOHO Corpus speech database. Experimental results show that the proposed algorithm achieves substantial improvements.
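The speech presence/absence detection described in the abstract tracks the upper portion of the frame power spectrum against an adaptive noise estimate. The paper does not give its exact formulation; the following is a minimal illustrative sketch of that general idea, not the authors' algorithm. All parameter values (frame length, percentile, threshold ratio, smoothing factor) are assumptions chosen for demonstration.

```python
import numpy as np

def detect_speech_presence(x, frame_len=512, hop=256,
                           percentile=99, ratio_thresh=3.0, alpha=0.95):
    """Frame-level speech presence mask (illustrative sketch).

    For each frame, a high percentile of the power spectrum (the
    "upper portion") is compared against a running noise-floor
    estimate; frames whose high-band power exceeds the floor by
    ratio_thresh are flagged as speech.  The floor adapts only
    during speech absence, so it tracks the noise, not the speech.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    mask = np.zeros(n_frames, dtype=bool)
    noise_floor = None
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        high = np.percentile(power, percentile)   # top of the spectrum
        if noise_floor is None:
            noise_floor = high                    # initialize from first frame
        if high > ratio_thresh * noise_floor:
            mask[i] = True                        # speech present
        else:                                     # speech absent: adapt floor
            noise_floor = alpha * noise_floor + (1 - alpha) * high
    return mask
```

A narrowband signal concentrates energy in a few spectral bins, so the high percentile rises sharply above the noise floor during speech while remaining stable during noise-only frames; that is what makes the "upper portion" of the spectrum a useful detection statistic.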
Pages: 101 / +
Number of pages: 2
Related Papers
50 records
  • [1] Multi-Speaker Adaptation for Robust Speech Recognition under Ubiquitous Environment
    Shih, Po-Yi
    Wang, Jhing-Fa
    Lin, Yuan-Ning
    Fu, Zhong-Hua
    ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009, : 126 - 131
  • [2] Exploring the time-frequency microstructure of speech for blind source separation
    Wu, HC
    Principe, JC
    Xu, DX
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 1145 - 1148
  • [3] Sparse Component Analysis for Speech Recognition in Multi-Speaker Environment
    Asaei, Afsaneh
    Bourlard, Herve
    Garner, Philip N.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1704 - 1707
  • [4] Blind speech source separation via nonlinear time-frequency masking
    Xu, Shun
    Chen, Shaorong
    Liu, Yulin
    CHINESE JOURNAL OF ACOUSTICS, 2008, (03) : 203 - 214
  • [5] Multimodal (audio-visual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking
    Naqvi, S. Mohsen
    Wang, W.
    Khan, M. Salman
    Barnard, M.
    Chambers, J. A.
    IET SIGNAL PROCESSING, 2012, 6 (05) : 466 - 477
  • [6] Blind speech source separation via nonlinear time-frequency masking
    Xu, Shun
    Chen, Shaorong
    Liu, Yulin
    Shengxue Xuebao/Acta Acustica, 2007, 32 (04): : 375 - 381
  • [7] Memory Time Span in LSTMs for Multi-Speaker Source Separation
    Zegers, Jeroen
    Van Hamme, Hugo
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1477 - 1481
  • [8] INTEGRATION OF SPEECH SEPARATION, DIARIZATION, AND RECOGNITION FOR MULTI-SPEAKER MEETINGS: SYSTEM DESCRIPTION, COMPARISON, AND ANALYSIS
    Raj, Desh
    Denisov, Pavel
    Chen, Zhuo
    Erdogan, Hakan
    Huang, Zili
    He, Maokui
    Watanabe, Shinji
    Du, Jun
    Yoshioka, Takuya
    Luo, Yi
    Kanda, Naoyuki
    Li, Jinyu
    Wisdom, Scott
    Hershey, John R.
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 897 - 904
  • [9] MULTI-SPEAKER AND CONTEXT-INDEPENDENT ACOUSTICAL CUES FOR AUTOMATIC SPEECH RECOGNITION
    ROSSI, M
    NISHINUMA, Y
    MERCIER, G
    SPEECH COMMUNICATION, 1983, 2 (2-3) : 215 - 217
  • [10] Time-frequency distributions for automatic speech recognition
    Potamianos, A
    Maragos, P
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (03): : 196 - 200