Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation

Cited: 1
Authors
Wang, Linqin [1 ]
Huang, Xiang [2 ]
Yu, Zhengtao [1 ]
Peng, Hao [2 ]
Gao, Shengxiang [1 ]
Mao, Cunli [1 ]
Huang, Yuxin [1 ]
Dong, Ling [1 ]
Yu, Philip S. [3 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Task analysis; Training; Neural networks; Adaptation models; Knowledge engineering; Symbols; Speech processing; Zero-shot text normalization; Cross-lingual knowledge distillation; Weighted finite state transducers; Data augmentation;
DOI
10.1109/TASLP.2024.3407509
CLC number
O42 [Acoustics];
Discipline codes
070206; 082403
Abstract
Text normalization (TN) is a crucial preprocessing step in text-to-speech synthesis, as it determines the correct pronunciation of numbers and symbols within the text. Existing neural network-based TN methods have achieved significant success in rich-resource languages. However, these methods are data-driven and rely heavily on large labeled datasets, which are impractical to obtain in zero-resource settings. Rule-based weighted finite-state transducers (WFST) are a common approach to zero-shot TN, but WFST-based methods struggle with ambiguous input, particularly when the normalized form is context-dependent. Conversely, conventional neural TN methods suffer from unrecoverable errors. In this paper, we propose ZSTN, a novel zero-shot TN framework based on cross-lingual knowledge distillation, which uses annotated data to train the teacher model on a rich-resource language and unlabeled data to train the student model on a zero-resource language. It further incorporates expert knowledge from WFST into the knowledge distillation network. Concretely, a TN model augmented with WFST pseudo-labels is trained as the teacher model in the source language. The student model is then supervised by soft labels from the teacher model and by WFST pseudo-labels in the target language. Cross-lingual knowledge distillation resolves contextual ambiguity in the text, while the WFST mitigates the unrecoverable errors of the neural model. In addition, ZSTN adapts to different zero-resource languages through a joint loss that combines the teacher-model objective with the WFST constraints. We also release a zero-shot text normalization dataset in five languages. We compare ZSTN with seven zero-shot TN benchmarks on public datasets in four languages for the teacher model and on zero-shot datasets in five languages for the student model. The results demonstrate that ZSTN achieves superior performance without the need for labeled data.
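As a concrete illustration of the rule-based side, the sketch below builds a toy WFST normalizer with the pynini library. The digit grammar and the `apply` helper are illustrative assumptions, not the grammar used in the paper; the sketch also shows why a pure WFST cannot resolve context-dependent readings, which is the gap the neural distillation is meant to fill.

```python
# A minimal sketch of WFST-based text normalization using pynini.
# The toy grammar below is an assumption for illustration, not the
# grammar used by ZSTN.
import pynini
from pynini.lib import byte

sigma_star = pynini.closure(byte.BYTE)  # accepts any byte string

# Toy verbalization rules: written form -> spoken form.
digits = pynini.union(
    pynini.cross("1", "one"),
    pynini.cross("2", "two"),
    pynini.cross("3", "three"),
)

# Rewrite digits wherever they occur, leaving other text unchanged.
normalize = pynini.cdrewrite(digits, "", "", sigma_star)

def apply(text: str) -> str:
    """Compose the input with the grammar and take the best path."""
    return pynini.shortestpath(pynini.accep(text) @ normalize).string()

print(apply("gate 2 opens at 3"))  # -> "gate two opens at three"
# Note: a rule like this always maps "2" to "two"; it cannot choose
# "second" or "twenty" when the context demands it, which is the
# ambiguity the neural teacher-student model addresses.
```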
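The abstract's joint supervision for the student (teacher soft labels plus WFST pseudo-labels) can likewise be sketched as a standard distillation objective. Assuming a token-level tagging formulation, the function below is one plausible reading of that loss; the name `zstn_student_loss`, the mixing weight `alpha`, and the temperature `tau` are hypothetical placeholders, not values from the paper.

```python
# A minimal sketch of the joint student objective described in the
# abstract: a soft-label distillation term from the teacher plus a
# hard-label term from WFST pseudo-labels. Hyperparameters are
# assumptions, not values from the paper.
import torch
import torch.nn.functional as F

def zstn_student_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      wfst_pseudo_labels: torch.Tensor,
                      alpha: float = 0.5,
                      tau: float = 2.0) -> torch.Tensor:
    """student_logits:     (batch, seq, num_tags) student outputs
    teacher_logits:        (batch, seq, num_tags) teacher outputs
    wfst_pseudo_labels:    (batch, seq) tag ids emitted by the WFST grammar
    """
    # Soft-label term: match the teacher's temperature-scaled distribution.
    # Detaching the teacher keeps gradients flowing only through the student.
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits.detach() / tau, dim=-1),
        reduction="batchmean",
    ) * (tau ** 2)
    # Hard-label term: cross-entropy against the WFST pseudo-labels.
    ce = F.cross_entropy(
        student_logits.flatten(0, 1), wfst_pseudo_labels.flatten()
    )
    return alpha * kd + (1.0 - alpha) * ce
```

Detaching the teacher logits and rescaling the KL term by tau squared are standard distillation conventions; how ZSTN actually weights or schedules the two terms is not specified in this record.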
Pages: 4631-4646
Page count: 16
Related papers
50 records in total
  • [41] Zero-Shot Cross-Lingual Transfer in Legal Domain Using Transformer Models
    Shaheen, Zein
    Wohlgenannt, Gerhard
    Mouromtsev, Dmitry
    2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2021), 2021, : 450 - 456
  • [42] Label modification and bootstrapping for zero-shot cross-lingual hate speech detection
    Bigoulaeva, Irina
    Hangya, Viktor
    Gurevych, Iryna
    Fraser, Alexander
    LANGUAGE RESOURCES AND EVALUATION, 2023, 57 (04) : 1515 - 1546
  • [43] Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
    Chatzoudis, Gerasimos
    Plitsis, Manos
    Stamouli, Spyridoula
    Dimou, Athanasia-Lida
    Katsamanis, Nassos
    Katsouros, Vassilis
    INTERSPEECH 2022, 2022, : 2178 - 2182
  • [44] Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
    Artetxe, Mikel
    Schwenk, Holger
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2019, 7 : 597 - 610
  • [46] Zero-shot cross-lingual transfer language selection using linguistic similarity
    Eronen, Juuso
    Ptaszynski, Michal
    Masui, Fumito
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [47] Transfer language selection for zero-shot cross-lingual abusive language detection
    Eronen, Juuso
    Ptaszynski, Michal
    Masui, Fumito
    Arata, Masaki
    Leliwa, Gniewosz
    Wroczynski, Michal
    INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (04)
  • [48] Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers
    Repo, Liina
    Skantsi, Valtteri
    Ronnqvist, Samuel
    Hellstrom, Saara
    Oinonen, Miika
    Salmela, Anna
    Biber, Douglas
    Egbert, Jesse
    Pyysalo, Sampo
    Laippala, Veronika
    EACL 2021: THE 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2021, : 183 - 191
  • [49] Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution
    Li, Tianjian
    Murray, Kenton
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 12461 - 12476
  • [50] The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer
    Efimov, Pavel
    Boytsov, Leonid
    Arslanova, Elena
    Braslavski, Pavel
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III, 2023, 13982 : 51 - 67