Universal Cross-Lingual Data Generation for Low Resource ASR

Cited by: 0
Authors
Wang, Wei [1]
Qian, Yanmin [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Auditory Cognit & Computat Acoust Lab, Shanghai 200240, Peoples R China
Keywords
Splicing; Data models; Phonetics; Training; Speech synthesis; Dictionaries; Data mining; Low-resource speech recognition; text-to-speech; data splicing; self-supervised learning; SPEECH RECOGNITION; REPRESENTATION
DOI
10.1109/TASLP.2023.3345150
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Significant advances in end-to-end (E2E) automatic speech recognition (ASR) have primarily been concentrated on languages rich in annotated data. Nevertheless, a large proportion of languages worldwide, which are typically low-resource, continue to pose significant challenges. To address this issue, this study presents a novel speech synthesis framework based on data splicing that leverages self-supervised learning (SSL) units from Hidden Unit BERT (HuBERT) as universal phonetic units. In our framework, the SSL phonetic units serve as crucial bridges between speech and text across different languages. By leveraging these units, we successfully splice speech fragments from high-resource languages into synthesized speech that maintains acoustic coherence with text from low-resource languages. To further enhance the practicality of the framework, we introduce a sampling strategy based on confidence scores assigned to the speech segments used in data splicing. The application of this confidence sampling strategy in data splicing significantly accelerates ASR model convergence and enhances overall ASR performance. Experimental results on the CommonVoice dataset show 25-35% relative improvement for four Indo-European languages and about 20% for Turkish using a 4-gram language model for rescoring, under a 10-hour low-resource setup. Furthermore, we showcase the scalability of our framework by incorporating a larger unsupervised speech corpus for generating speech fragments in data splicing, resulting in an additional 10% relative improvement.
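The splicing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how text is mapped to units, how fragments are cut, or how confidence scores are computed, so all of that is assumed here. The names `build_fragment_bank` and `splice` are hypothetical, the target unit sequence is taken as given, fragments are plain lists of samples, and confidence is used as a sampling weight.

```python
import random
from collections import defaultdict


def build_fragment_bank(fragments):
    """Group (unit_id, samples, confidence) triples by unit id.

    Assumption: each fragment has already been cut from high-resource
    speech and labelled with one discrete SSL unit and a confidence score.
    """
    bank = defaultdict(list)
    for unit_id, samples, confidence in fragments:
        bank[unit_id].append((samples, confidence))
    return bank


def splice(unit_sequence, bank, rng):
    """Concatenate one confidence-weighted fragment per target unit."""
    output = []
    for unit_id in unit_sequence:
        candidates = bank.get(unit_id)
        if not candidates:
            # Simplification for this sketch: skip units with no fragment.
            continue
        weights = [confidence for _, confidence in candidates]
        # Fragments with higher confidence are drawn more often.
        samples, _ = rng.choices(candidates, weights=weights, k=1)[0]
        output.extend(samples)
    return output


# Toy data: unit ids, sample values, and confidences are made up.
fragments = [
    (7, [0.1, 0.2], 0.9),
    (7, [0.3], 0.1),
    (12, [-0.5, -0.4, -0.3], 0.8),
]
bank = build_fragment_bank(fragments)
waveform = splice([7, 12, 99, 7], bank, random.Random(0))
```

Here unit 99 has no fragment and is skipped, so `waveform` holds one fragment for each of the remaining three units. A real system would also need to smooth fragment boundaries, which this sketch leaves out.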
Pages: 973-983
Number of pages: 11
Related papers
50 in total
  • [21] Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition
    Hou, Wenxin
    Zhu, Han
    Wang, Yidong
    Wang, Jindong
    Qin, Tao
    Xu, Renju
    Shinozaki, Takahiro
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 317 - 329
  • [22] Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin
    Lin, Pin-Jie
    Saeed, Muhammed
    Chang, Ernie
    Scholman, Merel
    INTERSPEECH 2023, 2023, : 3954 - 3958
  • [23] Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages
    Nie, Ercong
    Liang, Sheng
    Schmid, Helmut
    Schuetze, Hinrich
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8320 - 8340
  • [24] Cross-Lingual Word Embeddings for Low-Resource Language Modeling
    Adams, Oliver
    Makarucha, Adam
    Neubig, Graham
    Bird, Steven
    Cohn, Trevor
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 937 - 947
  • [25] Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data
    Hazem, Amir
    Bouhandi, Meriem
    Boudin, Florian
    Daille, Beatrice
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 648 - 662
  • [26] AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages
    Adewumi, Tosin
    Adeyemi, Mofetoluwa
    Anuoluwapo, Aremu
    Peters, Bukola
    Buzaaba, Happy
    Samuel, Oyerinde
    Rufai, Amina Mardiyyah
    Ajibade, Benjamin
    Gwadabe, Tajudeen
    Traore, Mory Moussou Koulibaly
    Ajayi, Tunde Oluwaseyi
    Muhammad, Shamsuddeen
    Baruwa, Ahmed
    Owoicho, Paul
    Ogunremi, Tolulope
    Ngigi, Phylis
    Ahia, Orevaoghene
    Nasir, Ruqayya
    Liwicki, Foteini
    Liwicki, Marcus
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [27] Cross-Lingual Generation and Evaluation of a Wide-Coverage Lexical Semantic Resource
    Novak, Attila
    Novak, Borbala
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 45 - 51
  • [28] A Cross-Lingual Summarization method based on cross-lingual Fact-relationship Graph Generation
    Zhang, Yongbing
    Gao, Shengxiang
    Huang, Yuxin
    Tan, Kaiwen
    Yu, Zhengtao
    PATTERN RECOGNITION, 2024, 146
  • [29] Cross-Lingual Classification of Crisis Data
    Khare, Prashant
    Burel, Gregoire
    Maynard, Diana
    Alani, Harith
    SEMANTIC WEB - ISWC 2018, PT I, 2018, 11136 : 617 - 633
  • [30] Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization
    Effland, Thomas
    Collins, Michael
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 122 - 138