SEQUENCE NOISE INJECTED TRAINING FOR END-TO-END SPEECH RECOGNITION

Cited by: 0
Authors
Saon, George [1]
Tuske, Zoltan [1]
Audhkhasi, Kartik [1]
Kingsbury, Brian [1]
Affiliations
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
End-to-end ASR; noise injection
DOI
10.1109/icassp.2019.8683706
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We present a simple noise injection algorithm for training end-to-end ASR models, which consists of adding the scaled spectra of random utterances of comparable length to the spectra of training utterances. We conjecture that the sequence information of the "noise" utterances is important and verify this via a contrast experiment in which the frames of the utterances to be added are randomly shuffled. Experiments with both CTC and attention-based models show that the proposed scheme yields relative word error rate improvements of up to 9% (depending on the model and test set) on the Switchboard 300-hour English conversational telephony database. Additionally, we set a new benchmark for attention-based encoder-decoder models on this corpus.
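The mixing step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `inject_sequence_noise`, the `scale` and `max_len_ratio` parameters and their defaults, the interpretation of "comparable length" as a bounded frame-count ratio, and the choice to add noise only over the overlapping frames are all assumptions made for illustration. The `shuffle_frames` flag mimics the contrast experiment in which the frames of the added utterance are randomly shuffled.

```python
import numpy as np


def inject_sequence_noise(target, pool, scale=0.4, max_len_ratio=1.2,
                          shuffle_frames=False, rng=None):
    """Return `target` plus the scaled spectrum of one random pool utterance.

    target: (T, F) array of spectral features for one training utterance.
    pool:   list of (T_i, F) arrays to draw the "noise" utterance from.
    scale, max_len_ratio: hypothetical knobs, not values from the paper.
    """
    if not pool:
        raise ValueError("pool must contain at least one utterance")
    rng = np.random.default_rng() if rng is None else rng
    num_frames = target.shape[0]

    # Assumption: "comparable length" means the frame counts differ by at
    # most a factor of max_len_ratio; fall back to the whole pool otherwise.
    def comparable(utt):
        longer = max(utt.shape[0], num_frames)
        shorter = max(1, min(utt.shape[0], num_frames))
        return longer / shorter <= max_len_ratio

    candidates = [u for u in pool if comparable(u)] or list(pool)
    noise = candidates[rng.integers(len(candidates))]

    if shuffle_frames:
        # Contrast condition: same frames, but their order is destroyed.
        noise = noise[rng.permutation(noise.shape[0])]

    # Assumption: add over the overlapping frames only, leave the rest as is.
    n = min(num_frames, noise.shape[0])
    mixed = np.array(target, dtype=float)  # copy so the input is untouched
    mixed[:n] += scale * noise[:n]
    return mixed
```

Calling the function once with `shuffle_frames=False` and once with `shuffle_frames=True` on the same pool gives the two training conditions that the abstract contrasts; only the temporal order of the added frames differs between them.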
Pages: 6261-6265
Number of pages: 5
Related papers
50 in total
  • [21] END-TO-END ANCHORED SPEECH RECOGNITION
    Wang, Yiming
    Fan, Xing
    Chen, I-Fan
    Liu, Yuzong
    Chen, Tongfei
    Hoffmeister, Bjorn
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
  • [22] Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition
    Liu, Bin
    Nie, Shuai
    Liang, Shan
    Liu, Wenju
    Yu, Meng
    Chen, Lianwu
    Peng, Shouye
    Li, Changliang
    INTERSPEECH 2019, 2019, : 491 - 495
  • [23] End-to-end multilingual speech recognition system with language supervision training
    Liu, Danyang
    Xu, Ji
    Zhang, Pengyuan
    IEICE Transactions on Information and Systems, 2020, E103D (06) : 1427 - 1430
  • [24] End-to-End Multilingual Speech Recognition System with Language Supervision Training
    Liu, Danyang
    Xu, Ji
    Zhang, Pengyuan
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (06) : 1427 - 1430
  • [25] EXPLORING MODEL UNITS AND TRAINING STRATEGIES FOR END-TO-END SPEECH RECOGNITION
    Huang, Mingkun
    Lu, Yizhou
    Wang, Lan
    Qian, Yanmin
    Yu, Kai
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 524 - 531
  • [26] Large Margin Training for Attention Based End-to-End Speech Recognition
    Wang, Peidong
    Cui, Jia
    Weng, Chao
    Yu, Dong
    INTERSPEECH 2019, 2019, : 246 - 250
  • [27] Towards end-to-end training of automatic speech recognition for nigerian pidgin
    Ajisafe, Daniel
    Adegboro, Oluwabukola
    Oduntan, Esther
    Arulogun, Tayo
    arXiv, 2020,
  • [28] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [29] INTERACTIVE FEATURE FUSION FOR END-TO-END NOISE-ROBUST SPEECH RECOGNITION
    Hu, Yuchen
    Hou, Nana
    Chen, Chen
    Chng, Eng Siong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6292 - 6296
  • [30] Improving End-to-End Bangla Speech Recognition with Semi-supervised Training
    Sadeq, Nafis
    Chowdhury, Nafis Tahmid
    Utshaw, Farhan Tanvir
    Ahmed, Shafayat
    Adnan, Muhammad Abdullah
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1875 - 1883