You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

Times cited: 0

Authors:

Laptev, Aleksandr [1]
Korostik, Roman [1]
Svischev, Aleksey [1]
Andrusenko, Andrei [1]
Medennikov, Ivan [1, 2]
Rybin, Sergey [1, 2]

Affiliations:

[1] ITMO Univ, St Petersburg 197101, Russia

[2] STC Innovat Ltd, St Petersburg 194044, Russia

Source:

2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020) | 2020

Keywords:

Speech Recognition; End-to-End; Speech Synthesis; Data Augmentation; ASR

DOI:

10.1109/cisp-bmei51763.2020.9263564

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially for low-resource tasks. Using recent advances in speech synthesis (text-to-speech, or TTS), we build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition model. We argue that, when the amount of training data is relatively low, this approach can allow an end-to-end model to reach the quality of hybrid systems. For an artificial low-to-medium-resource setup, we compare the proposed augmentation with a semi-supervised learning technique. We also investigate the influence of the vocoder on final ASR performance by comparing the Griffin-Lim algorithm with our modified LPCNet. When applied with an external language model, our approach outperforms a semi-supervised setup on LibriSpeech test-clean and is only 33% worse than a comparable supervised setup. Our system establishes a competitive result for end-to-end ASR trained on the LibriSpeech train-clean-100 set, with a WER of 4.3% on test-clean and 13.5% on test-other.
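The abstract reports results as word error rate (WER) and compares systems by relative degradation ("only 33% worse"). The following is a minimal Python sketch, not the authors' evaluation code, assuming the standard definition WER = (substitutions + deletions + insertions) / number of reference words, computed with a word-level Levenshtein distance, and assuming that "X% worse" denotes a relative WER increase over the baseline. The function names `word_errors`, `corpus_wer` and `relative_degradation` are hypothetical.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    (Levenshtein distance over whitespace-separated words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


def relative_degradation(wer_system: float, wer_baseline: float) -> float:
    """Assumed reading of 'X% worse': relative WER increase over a baseline,
    e.g. 0.33 means the system's WER is 33% higher than the baseline's."""
    return (wer_system - wer_baseline) / wer_baseline


if __name__ == "__main__":
    pairs = [("the cat sat on the mat", "the cat sit on mat")]
    print(corpus_wer(pairs))                 # 2 errors / 6 reference words
    print(relative_degradation(0.04, 0.03))  # hypothetical WERs, not from the paper
```

For example, with hypothetical WERs of 4.0% for a system and 3.0% for a baseline, `relative_degradation(0.04, 0.03)` is about 0.33, i.e. 33% worse in relative terms. Standard scoring tools usually also normalize text (case, punctuation) before counting errors; this sketch only splits on whitespace.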


Pages: 439-444

Page count: 6
