FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Cited by: 14
Authors
Lee, Yeonghyeon [1 ]
Jang, Kangwook [1 ]
Goo, Jahyun [1 ]
Jung, Youngmoon [1 ]
Kim, Hoirin [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon, South Korea
Source
INTERSPEECH 2022
Funding
National Research Foundation of Singapore
Keywords
knowledge distillation; speech representation learning; self-supervised learning; model compression
DOI
10.21437/Interspeech.2022-11112
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Large-scale speech self-supervised learning (SSL) has emerged as a central area of speech processing; however, the computational cost arising from its vast model size creates a high entry barrier for academia. In addition, existing distillation techniques for speech SSL models compress the model by reducing the number of layers, which induces performance degradation on linguistic pattern recognition tasks such as phoneme recognition (PR). In this paper, we propose FitHuBERT, which is thinner in dimension across almost all model components and deeper in layers than prior speech SSL distillation work. Moreover, we employ a time-reduction layer to speed up inference and propose a hint-based distillation method to reduce performance degradation. Our method shrinks the model to 23.8% of HuBERT's size and 35.9% of its inference time. It also achieves a 12.1% word error rate and a 13.3% phoneme error rate on the SUPERB benchmark, which is superior to prior work.
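The abstract does not include implementation details, but the two ideas it names, a time-reduction layer and hint-based (layer-to-layer) distillation from a wide teacher to a thin student, can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the authors' released code: the TimeReduction module, the hint_loss function, and the per-layer projection heads are hypothetical names, and the frame-concatenation and L2-matching choices are one plausible realization of the described approach.

```python
# Minimal sketch of hint-based distillation for a thin-and-deep student.
# Module names and design choices are illustrative assumptions; they are
# not taken from the FitHuBERT release.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeReduction(nn.Module):
    """Halve the frame rate by concatenating adjacent frames, then project back."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        if T % 2:                       # pad to an even number of frames
            x = F.pad(x, (0, 0, 0, 1))
            T += 1
        x = x.reshape(B, T // 2, 2 * D)  # stack each pair of frames
        return self.proj(x)

def hint_loss(student_hiddens, teacher_hiddens, projections):
    """L2 hint loss between matched student/teacher layers.

    student_hiddens: list of (B, T_s, d_student) tensors
    teacher_hiddens: list of (B, T_t, d_teacher) tensors
    projections: per-layer nn.Linear(d_student, d_teacher) heads that lift
        the thin student's hidden states to the teacher's width
    """
    loss = 0.0
    for h_s, h_t, proj in zip(student_hiddens, teacher_hiddens, projections):
        h_s = proj(h_s)
        # Re-align sequence lengths if the student applied time reduction.
        if h_s.size(1) != h_t.size(1):
            h_s = F.interpolate(
                h_s.transpose(1, 2), size=h_t.size(1)
            ).transpose(1, 2)
        loss = loss + F.mse_loss(h_s, h_t)
    return loss / len(projections)
```

In training, such a hint loss would typically be combined with a distillation loss on the final-layer outputs; the exact layer pairing and loss weighting used by FitHuBERT are specified in the paper, not in this sketch.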
Pages: 3588-3592
Number of pages: 5