Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation

Cited by: 1
Authors
Wang, Linqin [1 ]
Huang, Xiang [2 ]
Yu, Zhengtao [1 ]
Peng, Hao [2 ]
Gao, Shengxiang [1 ]
Mao, Cunli [1 ]
Huang, Yuxin [1 ]
Dong, Ling [1 ]
Yu, Philip S. [3 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Task analysis; Training; Neural networks; Adaptation models; Knowledge engineering; Symbols; Speech processing; Zero-shot text normalization; Cross-lingual knowledge distillation; Weighted finite state transducers; Data augmentation
DOI
10.1109/TASLP.2024.3407509
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Text normalization (TN) is a crucial preprocessing step in text-to-speech synthesis: it determines the correct pronunciation of numbers and symbols within the text. Existing neural TN methods have achieved notable success in rich-resource languages, but they are data-driven and rely heavily on large labeled datasets, which are unavailable in zero-resource settings. Rule-based weighted finite-state transducers (WFSTs) are a common approach to zero-shot TN, yet WFST-based methods struggle with ambiguous input, particularly when the normalized form is context-dependent; conventional neural TN methods, on the other hand, suffer from unrecoverable errors. In this paper, we propose ZSTN, a novel zero-shot TN framework based on cross-lingual knowledge distillation, which uses annotated data to train a teacher model on a rich-resource language and unlabeled data to train a student model on a zero-resource language, incorporating expert knowledge from WFSTs into the distillation network. Concretely, a TN model augmented with WFST pseudo-labels is trained as the teacher in the source language; the student is then supervised by soft labels from the teacher and by WFST pseudo-labels from the target language. Cross-lingual knowledge distillation resolves contextual ambiguity in the text, while the WFST mitigates the unrecoverable errors of the neural model. ZSTN also adapts to different zero-resource languages through a joint loss that combines the teacher's supervision with WFST constraints. We further release a zero-shot text normalization dataset in five languages. We compare ZSTN with seven zero-shot TN baselines on public datasets in four languages for the teacher model and on the zero-shot datasets in five languages for the student model. The results demonstrate that ZSTN excels in performance without requiring labeled data.
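The joint objective sketched in the abstract can be illustrated with a short example. The following is a minimal sketch, not the authors' released implementation: the function name, tensor shapes, and the hyperparameters alpha and temperature are assumptions chosen for exposition, under a token-level classification reading of the student model. It combines a temperature-scaled soft-label distillation term from the cross-lingual teacher with a hard cross-entropy term against pseudo-labels produced by a rule-based WFST in the target language.

```python
# Minimal sketch (assumed names, not the paper's code) of a joint loss that
# distills a cross-lingual teacher while constraining the student with
# WFST pseudo-labels from the zero-resource target language.
import torch
import torch.nn.functional as F

def zstn_student_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      wfst_pseudo_labels: torch.Tensor,
                      alpha: float = 0.5,
                      temperature: float = 2.0) -> torch.Tensor:
    """student_logits / teacher_logits: (batch, seq_len, num_classes);
    wfst_pseudo_labels: (batch, seq_len) class indices, -100 = ignore."""
    # Soft-label term: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 as in standard knowledge distillation.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd_term = F.kl_div(log_probs, soft_targets,
                       reduction="batchmean") * temperature ** 2
    # Hard-label term: cross-entropy against WFST pseudo-labels, which act as
    # rule-based constraints that mitigate unrecoverable neural errors.
    ce_term = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        wfst_pseudo_labels.reshape(-1),
        ignore_index=-100,
    )
    # Joint loss: weighted combination of distillation and WFST constraints.
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Toy usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    B, L, C = 2, 8, 10
    loss = zstn_student_loss(torch.randn(B, L, C), torch.randn(B, L, C),
                             torch.randint(0, C, (B, L)))
    print(loss.item())
```

The weighting between the two terms reflects the trade-off the abstract describes: the teacher's soft labels carry context-sensitive disambiguation across languages, while the WFST pseudo-labels keep the student anchored to rule-consistent verbalizations.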
Pages: 4631-4646 (16 pages)
Related papers (50 items in total)
  • [31] Improving Zero-Shot Cross-lingual Transfer for Multilingual Question Answering over Knowledge Graph
    Zhou, Yucheng
    Geng, Xiubo
    Shen, Tao
    Zhang, Wenqiang
    Jiang, Daxin
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 5822 - 5834
  • [32] Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning
    Han, Janghoon
    Lee, Changho
    Shin, Joongbo
    Choi, Stanley Jungkyu
    Lee, Honglak
    Bae, Kyunghoon
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15436 - 15452
  • [33] Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables
    Liu, Zihan
    Shin, Jamin
    Xu, Yan
    Winata, Genta Indra
    Xu, Peng
    Madotto, Andrea
    Fung, Pascale
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1297 - 1303
  • [34] Zero-shot Cross-lingual Transfer is Under-specified Optimization
    Wu, Shijie
    Van Durme, Benjamin
    Dredze, Mark
    PROCEEDINGS OF THE 7TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP, 2022, : 236 - 248
  • [35] Zero-shot Cross-Lingual Phonetic Recognition with External Language Embedding
    Gao, Heting
    Ni, Junrui
    Zhang, Yang
    Qian, Kaizhi
    Chang, Shiyu
    Hasegawa-Johnson, Mark
    INTERSPEECH 2021, 2021, : 1304 - 1308
  • [36] Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection
    Nozza, Debora
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 907 - 914
  • [37] Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification
    Xenouleas, Stratos
    Tsoukara, Alexia
    Panagiotakis, Giannis
    Chalkidis, Ilias
    Androutsopoulos, Ion
    PROCEEDINGS OF THE 12TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE, SETN 2022, 2022,
  • [38] Zero-shot learning based cross-lingual sentiment analysis for Sanskrit text with insufficient labeled data
    Kumar, Puneet
    Pathania, Kshitij
    Raman, Balasubramanian
    APPLIED INTELLIGENCE, 2023, 53 (09) : 10096 - 10113
  • [40] Cross-lingual Distillation for Text Classification
    Xu, Ruochen
    Yang, Yiming
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1415 - 1425