Robust speech recognition by integrating speech separation and hypothesis testing

Times cited: 17
Authors
Srinivasan, Soundararajan [1]
Wang, DeLiang [2,3]
Affiliations
[1] Ohio State Univ, Dept Biomed Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Keywords
Robust speech recognition; Missing-data recognizer; Ideal binary mask; Speech segregation; Top-down processing; Novelty detection; Noise
DOI
10.1016/j.specom.2009.08.008
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Missing-data methods attempt to improve robust speech recognition by distinguishing between reliable and unreliable data in the time-frequency (T-F) domain. Such methods require a binary mask to label speech-dominant T-F regions of a noisy speech signal as reliable and the rest as unreliable. Current methods for computing the mask are based mainly on bottom-up cues such as harmonicity and produce labeling errors that degrade recognition performance. In this paper, we propose a two-stage recognition system that combines bottom-up and top-down cues in order to simultaneously improve both mask estimation and recognition accuracy. First, an n-best lattice consistent with a speech separation mask is generated. The lattice is then re-scored by expanding the mask using a model-based hypothesis test to determine the reliability of individual T-F units. Systematic evaluations of the proposed system show significant improvement in recognition performance compared to that using speech separation alone. (C) 2009 Elsevier B.V. All rights reserved.
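The pipeline in the abstract can be sketched in a few functions. This is an illustrative sketch only and is not the authors' implementation. The following are all assumptions made here for illustration: the function names, the 0 dB local criterion, the two-sided z-test with threshold 1.96, and per-unit diagonal Gaussians in a log-energy domain.

The sketch has three parts:

- The ideal binary mask (a keyword above) is the oracle labeling that marks a T-F unit reliable when the premixed speech-to-noise ratio exceeds a local criterion.
- A hypothetical stand-in for the model-based hypothesis test expands a separation mask under each n-best hypothesis.
- Rescoring uses bounded marginalization, a standard missing-data scoring rule. It is substituted here because the abstract does not specify the scoring rule.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # rows = frequency channels, columns = time frames
Mask = List[List[int]]    # 1 = reliable (speech-dominant), 0 = unreliable


def ideal_binary_mask(speech: Grid, noise: Grid, lc_db: float = 0.0) -> Mask:
    """Oracle mask: 1 where the premixed local SNR (energies) exceeds lc_db."""
    mask = []
    for s_row, n_row in zip(speech, noise):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                row.append(0)  # no target energy in this unit
            elif n <= 0.0:
                row.append(1)  # target energy and no noise energy
            else:
                row.append(1 if 10.0 * math.log10(s / n) > lc_db else 0)
        mask.append(row)
    return mask


def expand_mask(mask: Mask, observed: Grid, mean: Grid, var: Grid,
                z_thresh: float = 1.96) -> Mask:
    """Hypothetical test: relabel an unreliable unit as reliable when the
    observed log energy is consistent with the hypothesis's speech model."""
    out = []
    for m_row, x_row, mu_row, v_row in zip(mask, observed, mean, var):
        out.append([
            1 if m == 1 or abs(x - mu) / math.sqrt(v) <= z_thresh else 0
            for m, x, mu, v in zip(m_row, x_row, mu_row, v_row)
        ])
    return out


def log_gauss(x: float, mu: float, v: float) -> float:
    """Log density of a univariate Gaussian with mean mu and variance v."""
    return -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)


def log_norm_cdf(z: float) -> float:
    """log Phi(z), floored to avoid log(0) for very negative z."""
    return math.log(max(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))), 1e-300))


def bounded_marginal_score(mask: Mask, observed: Grid, mean: Grid, var: Grid) -> float:
    """Bounded marginalization: reliable units use the Gaussian density;
    unreliable units integrate the clean-speech Gaussian up to the observed
    noisy value, which bounds the clean value from above."""
    total = 0.0
    for m_row, x_row, mu_row, v_row in zip(mask, observed, mean, var):
        for m, x, mu, v in zip(m_row, x_row, mu_row, v_row):
            if m == 1:
                total += log_gauss(x, mu, v)
            else:
                total += log_norm_cdf((x - mu) / math.sqrt(v))
    return total


def rescore_nbest(sep_mask: Mask, observed: Grid,
                  hypotheses: Sequence[Tuple[str, Grid, Grid]]) -> Tuple[str, Mask, float]:
    """Expand the separation mask under each (label, mean, var) hypothesis,
    score it, and return the best label, its expanded mask, and its score."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    best = None
    for label, mean, var in hypotheses:
        mask = expand_mask(sep_mask, observed, mean, var)
        score = bounded_marginal_score(mask, observed, mean, var)
        if best is None or score > best[2]:
            best = (label, mask, score)
    return best
```

Bounded marginalization assigns every unit a term: a density for reliable units and an upper-bounded probability mass for unreliable ones. This keeps hypotheses with differently expanded masks on a more comparable footing than plain marginalization, which drops unreliable units entirely.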
Pages: 72-81
Number of pages: 10