Universal Cross-Lingual Data Generation for Low Resource ASR

Cited by: 0
Authors
Wang, Wei [1 ]
Qian, Yanmin [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Auditory Cognit & Computat Acoust Lab, Shanghai 200240, Peoples R China
Keywords
Splicing; Data models; Phonetics; Training; Speech synthesis; Dictionaries; Data mining; Low-resource speech recognition; text-to-speech; data splicing; self-supervised learning; SPEECH RECOGNITION; REPRESENTATION;
DOI
10.1109/TASLP.2023.3345150
CLC number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Significant advances in end-to-end (E2E) automatic speech recognition (ASR) have primarily been concentrated on languages rich in annotated data. Nevertheless, a large proportion of languages worldwide, which are typically low-resource, continue to pose significant challenges. To address this issue, this study presents a novel speech synthesis framework based on data splicing that leverages self-supervised learning (SSL) units from Hidden Unit BERT (HuBERT) as universal phonetic units. In our framework, the SSL phonetic units serve as crucial bridges between speech and text across different languages. By leveraging these units, we successfully splice speech fragments from high-resource languages into synthesized speech that maintains acoustic coherence with text from low-resource languages. To further enhance the practicality of the framework, we introduce a sampling strategy based on confidence scores assigned to the speech segments used in data splicing. The application of this confidence sampling strategy in data splicing significantly accelerates ASR model convergence and enhances overall ASR performance. Experimental results on the CommonVoice dataset show 25-35% relative improvement for four Indo-European languages and about 20% for Turkish using a 4-gram language model for rescoring, under a 10-hour low-resource setup. Furthermore, we showcase the scalability of our framework by incorporating a larger unsupervised speech corpus for generating speech fragments in data splicing, resulting in an additional 10% relative improvement.
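The abstract describes splicing speech fragments from high-resource languages into synthetic utterances for a low-resource language, using HuBERT SSL units as a language-independent bridge, with candidate fragments sampled in proportion to a confidence score. A minimal sketch of that idea follows; all identifiers, the fragment inventory, and the greedy n-gram coverage are illustrative assumptions, not the paper's actual implementation:

```python
import random

# Hypothetical fragment bank: maps an SSL unit n-gram (e.g. from HuBERT
# k-means clustering) to candidate speech fragments mined from
# high-resource languages, each with a confidence score. The unit IDs
# and fragment names below are made up for illustration.
fragment_bank = {
    (12, 7): [("frag_en_001", 0.9), ("frag_de_044", 0.4)],
    (7, 31): [("frag_en_017", 0.7)],
    (31, 5): [("frag_fr_102", 0.8), ("frag_en_203", 0.2)],
}

def splice_utterance(unit_seq, bank, ngram=2, rng=random):
    """Greedily cover the target SSL unit sequence with fragments,
    sampling each candidate with probability proportional to its
    confidence score (a sketch of the paper's confidence sampling)."""
    spliced = []
    for i in range(0, len(unit_seq) - ngram + 1, ngram):
        key = tuple(unit_seq[i:i + ngram])
        candidates = bank.get(key)
        if not candidates:
            continue  # no fragment covers this unit n-gram; skip it
        frags, scores = zip(*candidates)
        # weighted sampling: higher-confidence fragments are chosen more often
        spliced.append(rng.choices(frags, weights=scores)[0])
    return spliced

# Target text from the low-resource language, already mapped to SSL units
target_units = [12, 7, 31, 5]
print(splice_utterance(target_units, fragment_bank))
```

In the paper's framework the concatenated fragments form synthesized training audio paired with the low-resource text; here the output is just the list of chosen fragment IDs.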
Pages: 973-983
Page count: 11
Related papers
50 records total
  • [1] UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR
    Wang, Wei
    Qian, Yanmin
    INTERSPEECH 2023, 2023, : 2253 - 2257
  • [2] ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
    Casanova, Edresson
    Shulby, Christopher
    Korolev, Alexander
    Candido Junior, Arnaldo
    Soares, Anderson da Silva
    Aluisio, Sandra
    Ponti, Moacir Antonelli
    INTERSPEECH 2023, 2023, : 1244 - 1248
  • [3] Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
    Zhou, Shuyan
    Rijhwani, Shruti
    Wieting, John
    Carbonell, Jaime
    Neubig, Graham
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, 8 : 109 - 124
  • [4] Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR
    Klejch, Ondrej
    Wallington, Electra
    Bell, Peter
    INTERSPEECH 2022, 2022, : 2288 - 2292
  • [5] Cross-lingual embedding for cross-lingual question retrieval in low-resource community question answering
    HajiAminShirazi, Shahrzad
    Momtazi, Saeedeh
    MACHINE TRANSLATION, 2020, 34 (04) : 287 - 303
  • [6] Knowledge Distillation Based Training of Universal ASR Source Models for Cross-lingual Transfer
    Fukuda, Takashi
    Thomas, Samuel
    INTERSPEECH 2021, 2021, : 3450 - 3454
  • [7] Is Translation Helpful? An Exploration of Cross-Lingual Transfer in Low-Resource Dialog Generation
    Shen, Lei
    Yu, Shuai
    Shen, Xiaoyu
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024,
  • [8] Cross-lingual intent classification in a low resource industrial setting
    Khalil, Talaat
    Kielczewski, Kornel
    Chouliaras, Georgios Christos
    Keldibek, Amina
    Versteegh, Maarten
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6419 - 6424
  • [9] Cross-Lingual Morphological Tagging for Low-Resource Languages
    Buys, Jan
    Botha, Jan A.
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1954 - 1964
  • [10] XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages
    Abhishek, Tushar
    Sagare, Shivprasad
    Singh, Bhavyajeet
    Sharma, Anubhav
    Gupta, Manish
    Varma, Vasudeva
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 171 - 175