LEARNING NOISE-INVARIANT REPRESENTATIONS FOR ROBUST SPEECH RECOGNITION

Citations: 0
Authors
Liang, Davis [1]
Huang, Zhiheng [1]
Lipton, Zachary C. [1, 2]
Affiliations
[1] Amazon AI, Palo Alto, CA 94303 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Domain adaptation; invariance; data augmentation; noisy speech
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite rapid advances in speech recognition, current models remain brittle to superficial perturbations of their inputs. Small amounts of noise can destroy the performance of an otherwise state-of-the-art model. To harden models against background noise, practitioners often perform data augmentation, adding artificially noised examples to the training set and carrying over the original label. In this paper, we hypothesize that a clean example and its superficially perturbed counterparts shouldn't merely map to the same class; they should map to the same representation. We propose invariant-representation-learning (IRL): at each training iteration, for each training example, we sample a noisy counterpart. We then apply a penalty term to coerce matched representations at each layer (above some chosen layer). Our key results, demonstrated on the LibriSpeech dataset, are the following: (i) IRL significantly reduces character error rates (CER) on both the 'clean' (3.3% vs 6.5%) and 'other' (11.0% vs 18.1%) test sets; (ii) on several out-of-domain noise settings (different from those seen during training), IRL's benefits are even more pronounced. Careful ablations confirm that our results are not simply due to shrinking activations at the chosen layers.
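The training procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the toy ReLU network, the additive Gaussian noise model, the squared Euclidean distance used as the penalty, the choice to apply the task loss to both the clean and noisy example, and the names `add_noise`, `forward`, `invariance_penalty`, `irl_loss`, and `lam`. The abstract states only that a noisy counterpart is sampled per example and that a penalty coerces matched representations at each layer above a chosen one.

```python
import random


def add_noise(x, scale, rng):
    """Sample a noisy counterpart of x (assumed noise model: additive Gaussian)."""
    return [v + rng.gauss(0.0, scale) for v in x]


def forward(x, layers):
    """Run a toy ReLU network and return the activations of every layer.

    `layers` is a list of (weights, bias) pairs; `weights` is a list of rows.
    """
    activations = []
    h = x
    for weights, bias in layers:
        h = [max(0.0, sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, bias)]
        activations.append(h)
    return activations


def invariance_penalty(clean_acts, noisy_acts, first_layer):
    """Sum of squared distances between clean and noisy activations,
    taken over layers with index >= first_layer (assumed distance measure)."""
    total = 0.0
    for hc, hn in zip(clean_acts[first_layer:], noisy_acts[first_layer:]):
        total += sum((a - b) ** 2 for a, b in zip(hc, hn))
    return total


def irl_loss(task_loss_clean, task_loss_noisy, penalty, lam):
    """Combine the task losses with the weighted penalty (assumed form)."""
    return task_loss_clean + task_loss_noisy + lam * penalty


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-layer network with 3 inputs and 2 units per layer.
    layers = [
        ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
        ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ]
    clean = [1.0, 0.5, -0.3]
    noisy = add_noise(clean, scale=0.1, rng=rng)
    penalty = invariance_penalty(forward(clean, layers),
                                 forward(noisy, layers),
                                 first_layer=1)
    print("penalty on layers >= 1:", penalty)
```

In a real system the penalty would be added to the training objective and minimized by gradient descent, so that the network is pushed toward identical representations for a clean input and its noisy counterpart; `first_layer` plays the role of the "chosen layer" mentioned in the abstract.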
Pages: 56-63
Page count: 8