KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION

Cited by: 0
Authors
Sun, Hao [1 ]
Tan, Xu [2 ]
Gan, Jun-Wei [3 ]
Zhao, Sheng [3 ]
Han, Dongxu [3 ]
Liu, Hongzhi [1 ]
Qin, Tao [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft STC Asia, Beijing, Peoples R China
Keywords
Polyphone Disambiguation; Knowledge Distillation; Pre-training; Fine-tuning; BERT;
DOI
10.1109/asru46091.2019.9003918
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Polyphone disambiguation aims to select the correct pronunciation for a polyphonic word from several candidates, which is important for text-to-speech synthesis. Since the pronunciation of a polyphonic word is usually decided by its context, polyphone disambiguation can be regarded as a language understanding task. Inspired by the success of BERT for language understanding, we propose to leverage pre-trained BERT models for polyphone disambiguation. However, BERT models are usually too heavy to be served online, in terms of both memory cost and inference speed. In this work, we focus on an efficient model for polyphone disambiguation and propose a two-stage knowledge distillation method that transfers the knowledge from a heavy BERT model to a lightweight BERT model in both the pre-training and fine-tuning stages, in order to reduce online serving cost. Experiments on Chinese and English polyphone disambiguation datasets demonstrate that our method reduces model parameters by a factor of 5 and improves inference speed by 7 times, while nearly matching the classification accuracy (95.4% on Chinese and 98.1% on English) of the original BERT model.
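
To illustrate how this kind of teacher-to-student knowledge distillation is typically implemented, the sketch below combines a KL-divergence term on temperature-softened teacher and student logits with a standard cross-entropy term on the hard labels. The temperature, loss weighting, and function names are illustrative assumptions for a generic distillation objective, not the exact training objective or hyperparameters reported in the paper.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        """Generic distillation objective: soft teacher targets + hard labels.

        temperature and alpha are illustrative defaults, not values from the paper.
        """
        # Soft targets: KL divergence between temperature-scaled distributions,
        # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)

        # Hard targets: cross-entropy against the ground-truth pronunciation labels.
        hard_loss = F.cross_entropy(student_logits, labels)

        return alpha * soft_loss + (1.0 - alpha) * hard_loss

In a two-stage setup of the kind the abstract describes, a loss of this form would be applied once while the lightweight student is pre-trained against the heavy teacher, and again while both are fine-tuned on the polyphone disambiguation task.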
Pages: 168-175
Number of pages: 8