Synthetic Data Augmentation for ASR with Domain Filtering

Cited by: 1
Authors
Ho, Tuan Vu [1]
Horiguchi, Shota [1]
Watanabe, Shinji [2]
Garcia, Paola [3]
Sumiyoshi, Takashi [1]
Affiliations
[1] Hitachi Ltd, Res & Dev Grp, Hitachi, Ibaraki, Japan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Johns Hopkins Univ, Baltimore, MD 21218 USA
Keywords
speech recognition; domain filtering; semantic similarity maximization; vocabulary coverage maximization; SPEECH;
DOI
10.1109/APSIPAASC58517.2023.10317120
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies have shown that synthetic speech can effectively serve as training data for automatic speech recognition models. The text used to synthesize this speech is mostly taken from in-domain text or generated through text augmentation. However, obtaining large amounts of in-domain text with diverse lexical contexts is difficult, especially in low-resource scenarios. This paper proposes drawing text from a large generic-domain source and applying a domain filtering method to choose the relevant text. The method involves two filtering steps: 1) selecting text based on its semantic similarity to the available in-domain text and 2) diversifying the vocabulary of the selected text using a greedy-search algorithm. Experimental results show that our proposed method outperforms the conventional text augmentation approach, with relative word error rate reductions ranging from 6% to 25% on the LibriSpeech dataset and a 15% relative reduction on a low-resource Vietnamese dataset.
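The two filtering steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity model or the exact greedy objective, so the sketch assumes a bag-of-words cosine similarity as a crude stand-in for semantic similarity and a greedy rule that picks the sentence adding the most unseen words. All function names and the parameters top_k and budget are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (assumed stand-in for a sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_by_similarity(generic, in_domain, top_k):
    """Step 1: keep the top_k generic sentences most similar to the in-domain text."""
    centroid = Counter()
    for sentence in in_domain:
        centroid.update(bow(sentence))
    ranked = sorted(generic, key=lambda s: cosine(bow(s), centroid), reverse=True)
    return ranked[:top_k]


def diversify_vocabulary(candidates, budget):
    """Step 2: greedily add the sentence that contributes the most unseen words."""
    chosen, covered = [], set()
    pool = list(candidates)
    while pool and len(chosen) < budget:
        best = max(pool, key=lambda s: len(set(s.lower().split()) - covered))
        new_words = set(best.lower().split()) - covered
        if not new_words:
            break  # no remaining sentence adds new vocabulary
        chosen.append(best)
        covered |= new_words
        pool.remove(best)
    return chosen


def filter_domain(generic, in_domain, top_k, budget):
    """Run both steps: similarity selection, then vocabulary diversification."""
    return diversify_vocabulary(select_by_similarity(generic, in_domain, top_k), budget)
```

In this sketch, top_k controls how much generic text survives the similarity step and budget caps the final selection. In practice the bag-of-words similarity would likely be replaced by a learned sentence-embedding model.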
Pages: 1760-1765
Page count: 6