Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling

Cited by: 24
Authors
Wang, Peidong [1]
Tan, Ke [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Funding
National Science Foundation (USA)
Keywords
Speech enhancement; Acoustic distortion; Acoustics; Training; Speech recognition; Noise measurement; speech distortion; distortion-independent acoustic modeling; DEEP NEURAL-NETWORK; FRONT-END; SEPARATION; NOISE
DOI
10.1109/TASLP.2019.2946789
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Monaural speech enhancement has made dramatic advances since the introduction of deep learning a few years ago. Although enhanced speech has been demonstrated to have better intelligibility and quality for human listeners, feeding it directly to automatic speech recognition (ASR) systems trained with noisy speech has not produced expected improvements in ASR performance. The lack of an enhancement benefit on recognition, or the gap between monaural speech enhancement and recognition, is often attributed to speech distortions introduced in the enhancement process. In this article, we analyze the distortion problem, compare different acoustic models, and investigate a distortion-independent training scheme for monaural speech recognition. Experimental results suggest that distortion-independent acoustic modeling is able to overcome the distortion problem. Such an acoustic model can also work with speech enhancement models different from the one used during training. Moreover, the models investigated in this paper outperform the previous best system on the CHiME-2 corpus.
Pages: 39-48
Page count: 10
Related papers
50 in total
  • [1] Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling
    Wang, Peidong
    Tan, Ke
    Wang, DeLiang
    [J]. INTERSPEECH 2019, 2019, : 471 - 475
  • [2] Enhanced Spectral Features for Distortion-Independent Acoustic Modeling
    Wang, Peidong
    Wang, DeLiang
    [J]. INTERSPEECH 2019, 2019, : 476 - 480
  • [3] Bridging the gap between human and automatic speech recognition
    ten Bosch, Louis
    Kirchhoff, Katrin
    [J]. SPEECH COMMUNICATION, 2007, 49 (05) : 331 - 335
  • [4] Compensation of speech enhancement distortion for robust speech recognition
    Ding, P
    Cao, ZG
    [J]. 2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 449 - 452
  • [5] Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
    Du, Zhihao
    Han, Jiqing
    Zhang, Xueliang
    [J]. INTERSPEECH 2020, 2020, : 309 - 313
  • [6] Language-independent and language-adaptive acoustic modeling for speech recognition
    Schultz, T
    Waibel, A
    [J]. SPEECH COMMUNICATION, 2001, 35 (1-2) : 31 - 51
  • [7] ACOUSTIC MODELING OF SUBWORD UNITS FOR LARGE VOCABULARY SPEAKER INDEPENDENT SPEECH RECOGNITION
    LEE, CH
    RABINER, LR
    PIERACCINI, R
    WILPON, JG
    [J]. SPEECH AND NATURAL LANGUAGE, 1989, : 280 - 291
  • [8] BRIDGING GAP BETWEEN RECOGNITION AND INTERVENTION
    CHITTUM, JR
    GASQUE, MR
    [J]. JOURNAL OF OCCUPATIONAL MEDICINE, 1966, 8 (03): : 140 - 141
  • [9] Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition
    Kim, Geonmin
    Lee, Hwaran
    Kim, Bo-Kyeong
    Oh, Sang-Hoon
    Lee, Soo-Young
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (01) : 159 - 163
  • [10] Multidialectal Spanish acoustic modeling for speech recognition
    Caballero, Monica
    Moreno, Asuncion
    Nogueiras, Albino
    [J]. SPEECH COMMUNICATION, 2009, 51 (03) : 217 - 229