Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling

Cited by: 24
Authors
Wang, Peidong [1]
Tan, Ke [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation
Keywords
Speech enhancement; Acoustic distortion; Acoustics; Training; Speech recognition; Noise measurement; speech recognition; speech distortion; distortion-independent acoustic modeling; DEEP NEURAL-NETWORK; FRONT-END; SEPARATION; NOISE;
DOI
10.1109/TASLP.2019.2946789
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Monaural speech enhancement has made dramatic advances since the introduction of deep learning a few years ago. Although enhanced speech has been demonstrated to have better intelligibility and quality for human listeners, feeding it directly to automatic speech recognition (ASR) systems trained with noisy speech has not produced expected improvements in ASR performance. The lack of an enhancement benefit on recognition, or the gap between monaural speech enhancement and recognition, is often attributed to speech distortions introduced in the enhancement process. In this article, we analyze the distortion problem, compare different acoustic models, and investigate a distortion-independent training scheme for monaural speech recognition. Experimental results suggest that distortion-independent acoustic modeling is able to overcome the distortion problem. Such an acoustic model can also work with speech enhancement models different from the one used during training. Moreover, the models investigated in this paper outperform the previous best system on the CHiME-2 corpus.
Pages: 39-48
Page count: 10
Related papers
50 in total
  • [31] SIMULTANEOUS SPEECH RECOGNITION AND SPEAKER DIARIZATION FOR MONAURAL DIALOGUE RECORDINGS WITH TARGET-SPEAKER ACOUSTIC MODELS
    Kanda, Naoyuki
    Horiguchi, Shota
    Fujita, Yusuke
    Xue, Yawen
    Nagamatsu, Kenji
    Watanabe, Shinji
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 31 - 38
  • [32] Improving Speech Recognition for the Elderly: A New Corpus of Elderly Japanese Speech and Investigation of Acoustic Modeling for Speech Recognition
    Fukuda, Meiko
    Nishizaki, Hiromitsu
    Iribe, Yurie
    Nishimura, Ryota
    Kitaoka, Norihide
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6578 - 6585
  • [33] DATA SAMPLING ENSEMBLE ACOUSTIC MODELLING IN SPEAKER INDEPENDENT SPEECH RECOGNITION
    Chen, Xin
    Zhao, Yunxin
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5130 - 5133
  • [34] Language Independent and Unsupervised Acoustic Models for Speech Recognition and Keyword Spotting
    Knill, Kate M.
    Gales, Mark J. F.
    Ragni, Anton
    Rath, Shakti P.
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 16 - 20
  • [35] Speech Enhancement Based on Masking Approach Considering Speech Quality and Acoustic Confidence for Noisy Speech Recognition
    Chu, Shih-Chuan
    Wu, Chung-Hsien
    Lin, Yun-Wen
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 536 - 540
  • [36] Transfer learning for acoustic modeling of noise robust speech recognition
    Yi J.
    Tao J.
    Liu B.
    Wen Z.
    [J]. Qinghua Daxue Xuebao/Journal of Tsinghua University, 2018, 58 (01): : 55 - 60
  • [37] CYCLEGAN BANDWIDTH EXTENSION ACOUSTIC MODELING FOR AUTOMATIC SPEECH RECOGNITION
    Haws, David
    Cui, Xiaodong
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6780 - 6784
  • [38] A study on acoustic modeling for speech recognition of predominantly monosyllabic languages
    Maneenoi, E
    Ahkuputra, V
    Luksaneeyanawin, S
    Jitapunkul, S
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (05): : 1146 - 1163
  • [39] Acoustic Modeling Based on Model Structure Annealing for Speech Recognition
    Shiota, Sayaka
    Hashimoto, Kei
    Zen, Heiga
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 932 - 935
  • [40] Significance of Feature Selection for Acoustic Modeling in Dysarthric Speech Recognition
    Mathew, Jerin Baby
    Jacob, Jonie
    Sajeev, Karun
    Joy, Jithin
    Rajan, Rajeev
    [J]. 2018 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2018,