SEQUENCE NOISE INJECTED TRAINING FOR END-TO-END SPEECH RECOGNITION

Cited by: 0
Authors
Saon, George [1 ]
Tuske, Zoltan [1 ]
Audhkhasi, Kartik [1 ]
Kingsbury, Brian [1 ]
Affiliation
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
End-to-end ASR; noise injection
DOI
10.1109/icassp.2019.8683706
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a simple noise injection algorithm for training end-to-end ASR models, which consists of adding to the spectra of training utterances the scaled spectra of random utterances of comparable length. We conjecture that the sequence information of the "noise" utterances is important and verify this via a contrast experiment in which the frames of the utterances to be added are randomly shuffled. Experiments for both CTC and attention-based models show that the proposed scheme results in up to 9% relative word error rate improvements (depending on the model and test set) on the Switchboard 300-hour English conversational telephony database. Additionally, we set a new benchmark for attention-based encoder-decoder models on this corpus.
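The mixing step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `inject_sequence_noise`, the default `scale` value, and the tiling/truncation used to match utterance lengths are all assumptions made for illustration, since the abstract only states that scaled spectra of random utterances of comparable length are added. The `shuffle_frames` flag mimics the contrast experiment in which the frames of the added utterance are randomly permuted.

```python
import numpy as np


def inject_sequence_noise(target, noise, scale=0.1, shuffle_frames=False, rng=None):
    """Add the scaled spectrum of a "noise" utterance to a training utterance.

    target, noise: arrays of shape (frames, bins) holding spectral features.
    scale: mixing weight (hypothetical default, not taken from the paper).
    shuffle_frames: if True, permute the noise frames in time, which destroys
        their sequence information (the contrast condition).
    """
    rng = np.random.default_rng() if rng is None else rng
    target = np.asarray(target, dtype=float)
    noise = np.asarray(noise, dtype=float)

    if shuffle_frames:
        # Generator.permutation returns a copy shuffled along the first axis.
        noise = rng.permutation(noise)

    # Assumption: match lengths by repeating and then truncating the noise.
    n_frames = target.shape[0]
    if noise.shape[0] < n_frames:
        reps = -(-n_frames // noise.shape[0])  # ceiling division
        noise = np.tile(noise, (reps, 1))
    noise = noise[:n_frames]

    return target + scale * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utterance = rng.random((120, 40))  # stand-in for a training utterance
    other = rng.random((110, 40))      # stand-in for a random "noise" utterance
    mixed = inject_sequence_noise(utterance, other, scale=0.1, rng=rng)
    print(mixed.shape)
```

In a training pipeline, the noise utterance would presumably be drawn at random from the training set for each example; how "comparable length" is enforced in the paper is not specified in this record.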
Pages: 6261-6265
Page count: 5
Related papers
50 in total
  • [31] Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation
    Kim, Hanbyul
    Seo, Seunghyun
    Lee, Lukas
    Baek, Seolki
    INTERSPEECH 2023, 2023, : 1653 - 1657
  • [32] COMBINING END-TO-END AND ADVERSARIAL TRAINING FOR LOW-RESOURCE SPEECH RECOGNITION
    Drexler, Jennifer
    Glass, James
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 361 - 368
  • [33] ADVERSARIAL TRAINING OF END-TO-END SPEECH RECOGNITION USING A CRITICIZING LANGUAGE MODEL
    Liu, Alexander H.
    Lee, Hung-yi
    Lee, Lin-shan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6176 - 6180
  • [34] Improved training strategies for end-to-end speech recognition in digital voice assistants
    Tulsiani, Hitesh
    Sapru, Ashtosh
    Arsikere, Harish
    Punjabi, Surabhi
    Garimella, Sri
    INTERSPEECH 2020, 2020, : 2792 - 2796
  • [35] TOKEN-WISE TRAINING FOR ATTENTION BASED END-TO-END SPEECH RECOGNITION
    Wang, Peidong
    Cui, Jia
    Weng, Chao
    Yu, Dong
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6276 - 6280
  • [36] SYNCHRONOUS TRANSFORMERS FOR END-TO-END SPEECH RECOGNITION
    Tian, Zhengkun
    Yi, Jiangyan
    Bai, Ye
    Tao, Jianhua
    Zhang, Shuai
    Wen, Zhengqi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7884 - 7888
  • [37] End-to-End Speech Recognition For Arabic Dialects
    Seham Nasr
    Rehab Duwairi
    Muhannad Quwaider
    Arabian Journal for Science and Engineering, 2023, 48 : 10617 - 10633
  • [38] End-to-End Speech Recognition of Tamil Language
    Changrampadi, Mohamed Hashim
    Shahina, A.
    Narayanan, M. Badri
    Khan, A. Nayeemulla
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1309 - 1323
  • [39] PARAMETER UNCERTAINTY FOR END-TO-END SPEECH RECOGNITION
    Braun, Stefan
    Liu, Shih-Chii
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5636 - 5640
  • [40] END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS
    Petridis, Stavros
    Li, Zuwei
    Pantic, Maja
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2592 - 2596