A Corpus-Based Approach to Speech Enhancement From Nonstationary Noise

Cited by: 48
Authors
Ming, Ji [1]
Srinivasan, Ramji [1]
Crookes, Danny [1]
Affiliations
[1] Queen's University Belfast, School of Electronics, Electrical Engineering and Computer Science, Belfast BT7 1NN, Antrim, Northern Ireland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Corpus-based speech modeling; longest matching segment; nonstationary noise; speech enhancement; speech separation; subspace approach; recognition; model; suppression
DOI
10.1109/TASL.2010.2064312
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Temporal dynamics and speaker characteristics are two important features of speech that distinguish speech from noise. In this paper, we propose a method to maximally extract these two features of speech for speech enhancement. We demonstrate that this can reduce the requirement for prior information about the noise, which can be difficult to estimate for fast-varying noise. Given noisy speech, the new approach estimates clean speech by recognizing long segments of the clean speech as whole units. In the recognition, clean speech sentences, taken from a speech corpus, are used as examples. Matching segments are identified between the noisy sentence and the corpus sentences. The estimate is formed by using the longest matching segments found in the corpus sentences. Longer speech segments as whole units contain more distinct dynamics and richer speaker characteristics, and can be identified more accurately from noise than shorter speech segments. Therefore, estimation based on the longest recognized segments increases the noise immunity and hence the estimation accuracy. The new approach consists of a statistical model to represent up to sentence-long temporal dynamics in the corpus speech, and an algorithm to identify the longest matching segments between the noisy sentence and the corpus sentences. The algorithm is made more robust to noise uncertainty by introducing missing-feature based noise compensation into the corpus sentences. Experiments have been conducted on the TIMIT database for speech enhancement from various types of nonstationary noise including song, music, and crosstalk speech. The new approach has shown improved performance over conventional enhancement algorithms in both objective and subjective evaluations.
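The longest-matching-segment idea described in the abstract can be illustrated with a small sketch. This is not the authors' method: the paper uses a statistical model of the corpus speech with missing-feature noise compensation, whereas the sketch below assumes a plain Euclidean distance with a fixed tolerance `tol` to decide whether two frames match. The function name `longest_matching_segment`, the tuple-based feature frames, and the tolerance are all hypothetical choices made for illustration.

```python
from math import dist


def longest_matching_segment(noisy, corpus, tol):
    """Find the longest run of consecutive frames shared by `noisy` and any corpus sentence.

    noisy  : list of feature frames (each a tuple of floats)
    corpus : list of sentences, each a list of feature frames
    tol    : ASSUMED matching rule: two frames match if their
             Euclidean distance is at most `tol` (the paper instead
             uses a statistical model, not a distance threshold)

    Returns (length, noisy_start, sentence_index, corpus_start);
    length is 0 and sentence_index is -1 if nothing matches.
    """
    best = (0, 0, -1, 0)
    for s, sentence in enumerate(corpus):
        # prev[j + 1] = length of the matching run ending at
        # noisy[i - 1] and sentence[j] (longest-common-substring DP).
        prev = [0] * (len(sentence) + 1)
        for i, x in enumerate(noisy):
            cur = [0] * (len(sentence) + 1)
            for j, y in enumerate(sentence):
                if dist(x, y) <= tol:
                    cur[j + 1] = prev[j] + 1
                    if cur[j + 1] > best[0]:
                        n = cur[j + 1]
                        best = (n, i - n + 1, s, j - n + 1)
            prev = cur
    return best


if __name__ == "__main__":
    noisy = [(0.0,), (1.1,), (2.0,), (2.9,), (9.0,)]
    corpus = [
        [(5.0,), (1.0,), (2.0,)],
        [(7.0,), (1.0,), (2.0,), (3.0,), (4.0,)],
    ]
    n, t, s, c = longest_matching_segment(noisy, corpus, tol=0.2)
    # The clean estimate for noisy[t:t + n] is taken from the corpus.
    estimate = corpus[s][c:c + n]
    print(n, t, s, c, estimate)
```

In this toy setting the second corpus sentence wins because it matches three consecutive noisy frames rather than two, which reflects the abstract's argument that longer segments are more distinctive and therefore easier to identify reliably in noise.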
Pages: 822-836
Number of pages: 15