Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios

Cited by: 3
Authors
Tsunoo, Emiru [1 ]
Shibata, Kentaro [1 ]
Narisetty, Chaitanya [2 ]
Kashiwagi, Yosuke [1 ]
Watanabe, Shinji [2 ]
Affiliations
[1] Sony Corp, Tokyo, Japan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech recognition; data augmentation; RNN-transducer; text-to-speech; Cycle-GAN; label smoothing;
DOI
10.21437/Interspeech.2021-958
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification
100104; 100213;
Abstract
Although end-to-end automatic speech recognition (E2E ASR) has achieved strong performance on tasks with abundant paired data, making E2E ASR robust to noisy and low-resource conditions remains challenging. In this study, we investigate data augmentation methods for E2E ASR in distant-talk scenarios. E2E ASR models are trained on the series of CHiME challenge datasets, which are well suited to studying robustness against noisy and spontaneous speech. We propose three augmentation methods and their combinations: 1) data augmentation using text-to-speech (TTS) data; 2) cycle-consistent generative adversarial network (Cycle-GAN) augmentation, trained to map between two audio characteristics, that of clean speech and that of noisy recordings, so as to match the testing condition; and 3) pseudo-label augmentation, in which a pretrained ASR module provides soft targets for smoothing the label distributions. Experimental results on the CHiME-6/CHiME-4 datasets show that each augmentation method individually improves accuracy on top of conventional SpecAugment, and combining the approaches yields further improvements. Combining all three augmentations on the CHiME-6 task achieved a 4.3% word error rate (WER) reduction, larger than that obtained with SpecAugment alone.
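The third method described in the abstract smooths the hard one-hot training targets by mixing them with the posterior distribution produced by a pretrained ASR model (pseudo-labels). A minimal sketch of that interpolation, where the function name, the `alpha` weight, and the toy distributions are illustrative assumptions rather than values taken from the paper:

```python
import numpy as np

def smooth_with_pseudo_labels(one_hot, teacher_probs, alpha=0.1):
    # Convex combination of the ground-truth one-hot targets and the
    # pretrained model's posterior; the result is still a valid
    # probability distribution over labels for each frame/token.
    return (1.0 - alpha) * one_hot + alpha * teacher_probs

# Toy example: 3-class targets for two frames.
one_hot = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
teacher = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.6, 0.2]])
soft = smooth_with_pseudo_labels(one_hot, teacher, alpha=0.2)
# Each row of `soft` still sums to 1, e.g. row 0 is [0.96, 0.02, 0.02].
```

Unlike uniform label smoothing, the smoothing mass here is distributed according to the teacher model's confusions, which is what ties this technique to the pretrained ASR module mentioned above.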
Pages: 301-305 (5 pages)