SEQUENCE NOISE INJECTED TRAINING FOR END-TO-END SPEECH RECOGNITION

Cited by: 0

Authors:

Saon, George [1]

Tuske, Zoltan [1]

Audhkhasi, Kartik [1]

Kingsbury, Brian [1]

Affiliations:

[1] IBM Research AI, Yorktown Heights, NY 10598, USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

End-to-end ASR; noise injection

DOI:

10.1109/icassp.2019.8683706

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We present a simple noise injection algorithm for training end-to-end ASR models which consists of adding to the spectra of training utterances the scaled spectra of random utterances of comparable length. We conjecture that the sequence information of the "noise" utterances is important and verify this via a contrast experiment where the frames of the utterances to be added are randomly shuffled. Experiments for both CTC and attention-based models show that the proposed scheme results in up to 9% relative word error rate improvements (depending on the model and test set) on the Switchboard 300-hour English conversational telephony database. Additionally, we set a new benchmark for attention-based encoder-decoder models on this corpus.
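The augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `inject_sequence_noise`, the scale of 0.4, the 1.2 length ratio used to decide what counts as "comparable length", and the handling of length mismatches (extra target frames are left unchanged) are all assumptions made for illustration. Spectra are represented as plain lists of frames so that the example has no dependencies beyond the standard library.

```python
import random


def inject_sequence_noise(target, pool, rng, scale=0.4, max_ratio=1.2,
                          shuffle_frames=False):
    """Add the scaled spectrum of a random, similar-length utterance to target.

    target: list of frames, each frame a list of spectral values.
    pool:   list of candidate "noise" utterances in the same format.
    scale, max_ratio: hypothetical values, not taken from the paper.
    shuffle_frames: if True, shuffle the noise frames in time, mimicking the
                    contrast experiment mentioned in the abstract.
    """
    n = len(target)
    # Assumption: "comparable length" means within a factor of max_ratio.
    candidates = [u for u in pool if n / max_ratio <= len(u) <= n * max_ratio]
    if not candidates:
        # No suitable noise utterance: return an unmodified copy.
        return [list(frame) for frame in target]

    # Shallow copy so that shuffling does not reorder the pool entry itself.
    noise = list(rng.choice(candidates))
    if shuffle_frames:
        rng.shuffle(noise)

    mixed = []
    for t, frame in enumerate(target):
        if t < len(noise):
            mixed.append([x + scale * z for x, z in zip(frame, noise[t])])
        else:
            # Noise utterance is shorter: leave the remaining frames as they are.
            mixed.append(list(frame))
    return mixed


rng = random.Random(0)
target = [[1.0, 1.0] for _ in range(10)]
pool = [[[2.0, 2.0] for _ in range(11)], [[5.0, 5.0] for _ in range(50)]]
augmented = inject_sequence_noise(target, pool, rng, scale=0.5)
```

In this toy call only the 11-frame utterance is close enough in length to be selected, so every frame of `augmented` is the target frame plus half of the noise frame. Setting `shuffle_frames=True` keeps the same set of noise frames but destroys their temporal order, which is the property the contrast experiment is designed to probe.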


Pages: 6261-6265

Page count: 5
