Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling

Cited by: 24
Authors
Wang, Peidong [1]
Tan, Ke [1]
Wang, De Liang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Funding
National Science Foundation (USA)
Keywords
Speech enhancement; Acoustic distortion; Acoustics; Training; Speech recognition; Noise measurement; speech distortion; distortion-independent acoustic modeling; DEEP NEURAL-NETWORK; FRONT-END; SEPARATION; NOISE
DOI
10.1109/TASLP.2019.2946789
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Monaural speech enhancement has made dramatic advances since the introduction of deep learning a few years ago. Although enhanced speech has been demonstrated to have better intelligibility and quality for human listeners, feeding it directly to automatic speech recognition (ASR) systems trained with noisy speech has not produced expected improvements in ASR performance. The lack of an enhancement benefit on recognition, or the gap between monaural speech enhancement and recognition, is often attributed to speech distortions introduced in the enhancement process. In this article, we analyze the distortion problem, compare different acoustic models, and investigate a distortion-independent training scheme for monaural speech recognition. Experimental results suggest that distortion-independent acoustic modeling is able to overcome the distortion problem. Such an acoustic model can also work with speech enhancement models different from the one used during training. Moreover, the models investigated in this paper outperform the previous best system on the CHiME-2 corpus.
Pages: 39-48
Number of pages: 10