Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation

Cited by: 1
Authors
Wang, Linqin [1 ]
Huang, Xiang [2 ]
Yu, Zhengtao [1 ]
Peng, Hao [2 ]
Gao, Shengxiang [1 ]
Mao, Cunli [1 ]
Huang, Yuxin [1 ]
Dong, Ling [1 ]
Yu, Philip S. [3 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Task analysis; Training; Neural networks; Adaptation models; Knowledge engineering; Symbols; Speech processing; Zero-shot text normalization; Cross-lingual knowledge distillation; Weighted finite state transducers; Data augmentation;
DOI
10.1109/TASLP.2024.3407509
Chinese Library Classification
O42 [Acoustics];
Subject Classification
070206; 082403;
Abstract
Text normalization (TN) is a crucial preprocessing step in text-to-speech synthesis that determines the correct pronunciation of numbers and symbols in the input text. Existing neural TN methods have achieved notable success in rich-resource languages, but they are data-driven and rely heavily on large labeled datasets, which are unavailable in zero-resource settings. Rule-based weighted finite-state transducers (WFST) are a common approach to zero-shot TN, yet WFST-based methods struggle with ambiguous input, particularly when the normalized form is context-dependent, while conventional neural TN methods suffer from unrecoverable errors. In this paper, we propose ZSTN, a novel zero-shot TN framework based on cross-lingual knowledge distillation, which uses annotated data to train a teacher model on a rich-resource language and unlabeled data to train a student model on a zero-resource language, and which incorporates expert knowledge from WFST into the knowledge distillation network. Concretely, a TN model augmented with WFST pseudo-labels is trained as the teacher in the source language; the student is then supervised by soft labels from the teacher and by WFST pseudo-labels in the target language. Cross-lingual knowledge distillation resolves contextual ambiguity in the text, while the WFST mitigates the unrecoverable errors of the neural model. ZSTN also adapts to different zero-resource languages through a joint loss function combining the teacher model and WFST constraints. In addition, we release a zero-shot text normalization dataset covering five languages. We compare ZSTN with seven zero-shot TN benchmarks on public datasets in four languages for the teacher model and on zero-shot datasets in five languages for the student model. The results demonstrate that ZSTN achieves strong performance without requiring labeled data.
Pages: 4631-4646
Page count: 16
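
The abstract describes the student model being trained jointly from two supervision signals: soft labels produced by the teacher model and hard pseudo-labels produced by the rule-based WFST on the zero-resource language. The sketch below illustrates one plausible form of such a joint objective; it is not the authors' implementation, and the function names, tensor shapes, mixing weight alpha, and distillation temperature are illustrative assumptions.

# Minimal sketch (not the authors' released code) of the student-side training
# objective suggested by the abstract: the student is supervised jointly by
# (a) soft labels from the teacher model and (b) WFST pseudo-labels on the
# zero-resource language. Names, shapes, `alpha`, and `temperature` are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def student_joint_loss(student_logits, teacher_logits, wfst_pseudo_labels,
                       alpha=0.5, temperature=2.0):
    # student_logits / teacher_logits: (batch, seq_len, num_tags) token-level scores
    # wfst_pseudo_labels: (batch, seq_len) tag indices emitted by the rule-based WFST

    # (a) Distillation term: KL divergence to the teacher's temperature-softened
    #     distribution, scaled by T^2 as is conventional in knowledge distillation.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # (b) Pseudo-label term: cross-entropy against the WFST's hard labels.
    ce = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        wfst_pseudo_labels.reshape(-1),
    )

    # Joint objective: a weighted combination of the two supervision signals.
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random tensors standing in for real model outputs.
if __name__ == "__main__":
    batch, seq_len, num_tags = 2, 8, 12
    s = torch.randn(batch, seq_len, num_tags)
    t = torch.randn(batch, seq_len, num_tags)
    y = torch.randint(0, num_tags, (batch, seq_len))
    print(student_joint_loss(s, t, y).item())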