Binary and ratio time-frequency masks for robust speech recognition

Cited by: 173
Authors
Srinivasan, Soundararajan
Roman, Nicoleta
Wang, DeLiang
Affiliations
[1] Ohio State Univ, Dept Biomed Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Funding
National Science Foundation (USA)
Keywords
ideal binary mask; ratio mask; robust speech recognition; missing-data recognizer; binaural processing; speech segregation
DOI
10.1016/j.specom.2006.09.003
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A time-varying Wiener filter specifies the ratio of a target signal to a noisy mixture in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech signal, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the performance of this system with a missing-data recognizer that operates in the spectral domain using the time-frequency units that are dominated by speech. To apply the missing-data recognizer, the same binaural processor is used to estimate an ideal binary time-frequency mask, which selects a local time-frequency unit if the speech signal within the unit is stronger than the interference. We find that the missing-data recognizer performs better on a small-vocabulary recognition task, but the conventional recognizer performs substantially better when the vocabulary size is increased. (c) 2006 Elsevier B.V. All rights reserved.
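The abstract describes two masks, each defined per time-frequency (T-F) unit. The following is a minimal sketch that assumes oracle access to the target and interference energies in each unit. The paper instead estimates the masks with a binaural processor, which is not shown here. The sketch computes the ratio (Wiener-style) mask, the ideal binary mask and mask application.

All function names are hypothetical, as are the `eps` guard and the `lc_db` local-criterion parameter. The `lc_db` default of 0 dB corresponds to "speech stronger than the interference". The last function illustrates plain marginalization under a single diagonal Gaussian. This is one common way a missing-data recognizer scores only the speech-dominated units. The recognizer in the paper may use a different variant, for example bounded marginalization over Gaussian mixtures in HMM states.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: oracle ("ideal") masks computed from known energies.
# A T-F representation: rows are frequency channels, columns are time frames.
Matrix = List[List[float]]


def ratio_mask(target: Matrix, interference: Matrix, eps: float = 1e-12) -> Matrix:
    """Wiener-style ratio mask: target energy / (target + interference energy).

    Assumes target and interference are uncorrelated, so the mixture energy
    in each T-F unit is approximated by their sum; eps avoids division by zero.
    """
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(target, interference)
    ]


def ideal_binary_mask(target: Matrix, interference: Matrix, lc_db: float = 0.0) -> Matrix:
    """Ideal binary mask: 1.0 where the local SNR exceeds lc_db (in dB), else 0.0.

    With lc_db = 0 a unit is kept exactly when the target energy is
    greater than the interference energy.
    """
    factor = 10.0 ** (lc_db / 10.0)  # convert the dB criterion to an energy ratio
    return [
        [1.0 if s > factor * n else 0.0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(target, interference)
    ]


def apply_mask(mask: Matrix, mixture: Matrix) -> Matrix:
    """Weight each T-F unit of the mixture representation by the mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture)
    ]


def marginal_log_likelihood(
    frame: Sequence[float],
    reliable: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Log-likelihood of one spectral frame under a diagonal Gaussian,
    summed only over components whose mask value is 1 (reliable).

    Integrating an unreliable dimension over its full range gives 1,
    so those components contribute log(1) = 0 and are simply skipped.
    """
    total = 0.0
    for x, r, mu, var in zip(frame, reliable, means, variances):
        if r >= 0.5:  # treat mask value 1 as "speech-dominated"
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


if __name__ == "__main__":
    target = [[4.0, 1.0], [0.0, 9.0]]        # toy target energies
    interference = [[1.0, 3.0], [2.0, 1.0]]  # toy interference energies
    print(ratio_mask(target, interference))         # approx [[0.8, 0.25], [0.0, 0.9]]
    print(ideal_binary_mask(target, interference))  # [[1.0, 0.0], [0.0, 1.0]]
```

The ratio mask takes values in [0, 1], so it keeps partial energy in units where neither source clearly dominates. The binary mask makes a hard 0/1 decision, which is what supplies the reliable/unreliable labels that marginalization needs.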
Pages: 1486-1501
Number of pages: 16
Related papers
50 in total
  • [1] Robust speaker recognition using binary time-frequency masks
    Shao, Yang
    Wang, DeLiang
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, pp. 645-648
  • [2] TIME-FREQUENCY CONVOLUTIONAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Mitra, Vikramjit
    Franco, Horacio
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, pp. 317-323
  • [3] Robust speech recognition from binary masks
    Narayanan, Arun
    Wang, DeLiang
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 128 (05), pp. EL217-EL222
  • [4] Robust Beamforming for Speech Recognition Using DNN-Based Time-Frequency Masks Estimation
    Jiang, Wenbin
    Wen, Fei
    Liu, Peilin
    IEEE ACCESS, 2018, 6, pp. 52385-52392
  • [5] ROBUST SPEECH RECOGNITION FROM RATIO MASKS
    Wang, Zhong-Qiu
    Wang, DeLiang
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, pp. 5720-5724
  • [6] Perceptual learning for speech in noise after application of binary time-frequency masks
    Ahmadi, Mahnaz
    Gross, Vauna L.
    Sinex, Donal G.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 133 (03), pp. 1687-1692
  • [7] Time-Frequency Masking For Large Scale Robust Speech Recognition
    Wang, Yuxuan
    Misra, Ananya
    Chin, Kean K.
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, pp. 2469-2473
  • [8] Deep Speech Inpainting of Time-frequency Masks
    Kegler, Mikolaj
    Beckmann, Pierre
    Cernak, Milos
    INTERSPEECH 2020, 2020, pp. 3276-3280
  • [9] On the optimality of ideal binary time-frequency masks
    Li, Yipeng
    Wang, DeLiang
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, pp. 3501-3504
  • [10] On the optimality of ideal binary time-frequency masks
    Li, Yipeng
    Wang, DeLiang
    SPEECH COMMUNICATION, 2009, 51 (03), pp. 230-239