Universal Cross-Lingual Data Generation for Low Resource ASR

被引:0
|
作者
Wang, Wei [1 ]
Qian, Yanmin [1 ]
机构
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Auditory Cognit & Computat Acoust Lab, Shanghai 200240, Peoples R China
关键词
Splicing; Data models; Phonetics; Training; Speech synthesis; Dictionaries; Data mining; Low-resource speech recognition; text-to-seech; data splicing; self-supervised learning; SPEECH RECOGNITION; REPRESENTATION;
D O I
10.1109/TASLP.2023.3345150
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Significant advances in end-to-end (E2E) automatic speech recognition (ASR) have primarily been concentrated on languages rich in annotated data. Nevertheless, a large proportion of languages worldwide, which are typically low-resource, continue to pose significant challenges. To address this issue, this study presents a novel speech synthesis framework based on data splicing that leverages self-supervised learning (SSL) units from Hidden Unit BERT (HuBERT) as universal phonetic units. In our framework, the SSL phonetic units serve as crucial bridges between speech and text across different languages. By leveraging these units, we successfully splice speech fragments from high-resource languages into synthesized speech that maintains acoustic coherence with text from low-resource languages. To further enhance the practicality of the framework, we introduce a sampling strategy based on confidence scores assigned to the speech segments used in data splicing. The application of this confidence sampling strategy in data splicing significantly accelerates ASR model convergence and enhances overall ASR performance. Experimental results on the CommonVoice dataset show 25-35% relative improvement for four Indo-European languages and about 20% for Turkish using a 4-gram language model for rescoring, under a 10-hour low-resource setup. Furthermore, we showcase the scalability of our framework by incorporating a larger unsupervised speech corpus for generating speech fragments in data splicing, resulting in an additional 10% relative improvement.
引用
收藏
页码:973 / 983
页数:11
相关论文
共 50 条
  • [41] Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser
    Duong, Long
    Cohn, Trevor
    Bird, Steven
    Cook, Paul
    PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2, 2015, : 845 - 850
  • [42] Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages
    Gupta, Shivanshu
    Matsubara, Yoshitomo
    Chadha, Ankit
    Moschitti, Alessandro
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 14078 - 14092
  • [43] Deep Persian sentiment analysis: Cross-lingual training for low-resource languages
    Ghasemi, Rouzbeh
    Ashrafi Asli, Seyed Arad
    Momtazi, Saeedeh
    JOURNAL OF INFORMATION SCIENCE, 2022, 48 (04) : 449 - 462
  • [44] UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages
    Trinh Pham
    Le, Khoi M.
    Luu Anh Tuan
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3168 - 3184
  • [45] Searching the Web for Cross-lingual Parallel Data
    El-Kishky, Ahmed
    Koehn, Philipp
    Schwenk, Holger
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 2417 - 2420
  • [46] Machine-Created Universal Language for Cross-Lingual Transfer
    Liang, Yaobo
    Zhu, Quanzhi
    Zhao, Junhe
    Duan, Nan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18617 - 18625
  • [47] Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD
    Taghizadeh, Nasrin
    Faili, Hesham
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2016, 56 : 61 - 87
  • [48] Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning
    Agrawal, Ashish Sunil
    Fazili, Barah
    Jyothi, Preethi
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 319 - 329
  • [49] Cross-lingual and Multi-stream Posterior Features for Low Resource LVCSR Systems
    Thomas, Samuel
    Ganapathy, Sriram
    Hermansky, Hynek
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 877 - 880
  • [50] Baselines and Test Data for Cross-Lingual Inference
    Agic, Zeljko
    Schluter, Natalie
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 3890 - 3894