Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation

Times Cited: 1
Authors
Wang, Linqin [1 ]
Huang, Xiang [2 ]
Yu, Zhengtao [1 ]
Peng, Hao [2 ]
Gao, Shengxiang [1 ]
Mao, Cunli [1 ]
Huang, Yuxin [1 ]
Dong, Ling [1 ]
Yu, Philip S. [3 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Task analysis; Training; Neural networks; Adaptation models; Knowledge engineering; Symbols; Speech processing; Zero-shot text normalization; Cross-lingual knowledge distillation; Weighted finite state transducers; Data augmentation;
DOI
10.1109/TASLP.2024.3407509
CLC Number
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Text normalization (TN) is a crucial preprocessing step in text-to-speech synthesis, as it determines the correct pronunciation of numbers and symbols within the text. Existing neural network-based TN methods have achieved significant success in rich-resource languages. However, these methods are data-driven and rely heavily on large labeled datasets, which are impractical in zero-resource settings. Rule-based weighted finite-state transducers (WFST) are a common approach to zero-shot TN, but WFST-based TN methods struggle with ambiguous input, particularly when the normalized form is context-dependent. Conventional neural TN methods, on the other hand, suffer from unrecoverable errors. In this paper, we propose ZSTN, a novel zero-shot TN framework based on cross-lingual knowledge distillation, which uses annotated data to train the teacher model on a rich-resource language and unlabelled data to train the student model on a zero-resource language. Furthermore, it incorporates expert knowledge from WFST into a knowledge distillation neural network. Concretely, a TN model augmented with WFST pseudo-labels is trained as the teacher model in the source language. The student model is then supervised by soft labels from the teacher model and WFST pseudo-labels from the target language. By leveraging cross-lingual knowledge distillation, we address contextual ambiguity in the text, while the WFST mitigates unrecoverable errors of the neural model. Additionally, ZSTN adapts to different zero-resource languages by using a joint loss function combining supervision from the teacher model and the WFST constraints. We also release a zero-shot text normalization dataset in five languages. We compare ZSTN with seven zero-shot TN benchmarks on public datasets in four languages for the teacher model and on zero-shot datasets in five languages for the student model. The results demonstrate that the proposed ZSTN excels in performance without requiring labeled data.
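
The abstract describes the student model as being jointly supervised by soft labels distilled from the cross-lingual teacher and by hard WFST pseudo-labels on the zero-resource language. The following is a minimal, hypothetical sketch of such a joint objective, assuming token-level classification of normalization classes; the function name, the temperature, and the weighting factor alpha are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn.functional as F

def student_joint_loss(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor,
                       wfst_pseudo_labels: torch.Tensor,
                       temperature: float = 2.0,
                       alpha: float = 0.5) -> torch.Tensor:
    # student_logits, teacher_logits: (batch, seq_len, num_classes)
    # wfst_pseudo_labels: (batch, seq_len) integer class ids produced by the WFST grammar.

    # Soft-label distillation: match the teacher's temperature-smoothed distribution.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

    # Hard-label supervision from WFST pseudo-labels on the zero-resource language.
    ce = F.cross_entropy(student_logits.flatten(0, 1), wfst_pseudo_labels.flatten())

    # Weighted combination of the two supervision signals (alpha is an assumed hyperparameter).
    return alpha * distill + (1.0 - alpha) * ce

In this sketch the distillation term carries the contextual disambiguation learned by the teacher, while the cross-entropy term anchors the student to the deterministic WFST rules of the target language.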
Pages: 4631-4646
Number of Pages: 16