SPEECH RECOGNITION IN UNSEEN AND NOISY CHANNEL CONDITIONS

Authors
Mitra, Vikramjit [1]
Franco, Horacio [1]
Bartels, Chris [1]
van Hout, Julien [1]
Graciarena, Martin [1]
Vergyri, Dimitra [1]
Affiliation
[1] SRI International, Speech Technology and Research Laboratory, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
Keywords
automatic speech recognition; unsupervised adaptation; channel and noise-robust speech recognition; auto-encoders; bottleneck features
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech recognition in varying background conditions is a challenging problem. Acoustic condition mismatch between training and evaluation data can significantly reduce recognition performance. For mismatched conditions, data-adaptation techniques are typically useful, because they expose the acoustic model to the new data condition(s). Supervised adaptation techniques usually provide substantial performance improvement. However, the gain depends on having labeled or transcribed data, which is often unavailable. The alternative is unsupervised adaptation, where feature-transform methods and model-adaptation techniques are typically explored. This work investigates robust features, the feature-space maximum likelihood linear regression (fMLLR) transform, and deep convolutional nets to address the problem of unseen channel and noise conditions. In addition, the work investigates bottleneck (BN) features extracted from deep autoencoder (DAE) networks trained on acoustic features extracted from the speech signal. We demonstrate that such representations not only yield robust systems but can also be used to perform data selection for unsupervised model adaptation. Our results indicate that the techniques presented in this paper significantly improve the performance of speech recognition systems in unseen channel and noise conditions.
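The abstract names the fMLLR transform as one of the unsupervised adaptation tools. fMLLR maps every feature vector x through a single affine transform, x_hat = A x + b. The transform is estimated per speaker or condition to maximize the likelihood of the adaptation data under the existing acoustic model. The minimal Python sketch below shows only how such a transform is applied to feature frames. It assumes A and b have already been estimated; the iterative estimation step is not shown. The function names are hypothetical illustrations, not the paper's code or any toolkit's API.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def apply_fmllr(A: Matrix, b: Vector, frames: Sequence[Vector]) -> List[Vector]:
    """Apply the affine feature transform x_hat = A x + b to every frame.

    A is d x d and b has length d. Both are assumed to be pre-estimated
    (hypothetical helper; estimation is outside this sketch).
    """
    d = len(b)
    if len(A) != d or any(len(row) != d for row in A):
        raise ValueError("A must be d x d with d == len(b)")
    transformed = []
    for x in frames:
        if len(x) != d:
            raise ValueError("frame dimension does not match the transform")
        # Matrix-vector product A x, computed row by row, plus the bias b.
        transformed.append(
            [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
             for row, b_i in zip(A, b)]
        )
    return transformed


def apply_fmllr_extended(W: Matrix, frames: Sequence[Vector]) -> List[Vector]:
    """Same transform written as W = [A | b] acting on the extended vector [x; 1]."""
    return [
        [sum(w * v for w, v in zip(row, list(x) + [1.0])) for row in W]
        for x in frames
    ]


if __name__ == "__main__":
    A = [[2.0, 0.0], [0.0, 0.5]]
    b = [1.0, -1.0]
    frames = [[1.0, 2.0], [0.0, 4.0]]
    print(apply_fmllr(A, b, frames))  # [[3.0, 0.0], [1.0, 1.0]]
    W = [[2.0, 0.0, 1.0], [0.0, 0.5, -1.0]]
    print(apply_fmllr_extended(W, frames))  # [[3.0, 0.0], [1.0, 1.0]]
```

Both forms give identical output. The extended form W = [A | b] is the compact way fMLLR transforms are often stored, because it treats the bias as one extra column.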
Pages: 5215-5219
Page count: 5