FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Cited by: 14
Authors:
Lee, Yeonghyeon [1 ]
Jang, Kangwook [1 ]
Goo, Jahyun [1 ]
Jung, Youngmoon [1 ]
Kim, Hoirin [1 ]
Affiliations:
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon, South Korea
Source: INTERSPEECH 2022
Funding: National Research Foundation of Singapore
Keywords: knowledge distillation; speech representation learning; self-supervised learning; model compression
DOI: 10.21437/Interspeech.2022-11112
Chinese Library Classification: O42 [Acoustics]
Subject classification codes: 070206; 082403
Abstract:
Large-scale speech self-supervised learning (SSL) has emerged as a major field of speech processing; however, the computational cost arising from its vast model size creates a high entry barrier for academia. In addition, existing distillation techniques for speech SSL models compress the model by reducing the number of layers, which induces performance degradation in linguistic pattern recognition tasks such as phoneme recognition (PR). In this paper, we propose FitHuBERT, which is thinner in dimension throughout almost all model components and deeper in layers compared to prior speech SSL distillation works. Moreover, we employ a time-reduction layer to speed up inference and propose a hint-based distillation method to reduce performance degradation. Our method reduces the model to 23.8% in size and 35.9% in inference time compared to HuBERT. We also achieve a 12.1% word error rate and a 13.3% phoneme error rate on the SUPERB benchmark, which is superior to prior work.
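
To make the abstract's two main ideas concrete, the following minimal PyTorch sketch (an editorial illustration, not the authors' released code) shows (a) a time-reduction layer that stacks adjacent frames so the Transformer layers process a shorter sequence, and (b) a hint-based, layer-to-layer distillation loss that matches a thin student to a wide teacher through small linear projections. All dimensions, layer counts, and the uniform layer mapping are illustrative assumptions, not the published FitHuBERT configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeReduction(nn.Module):
    """Stack every `stride` consecutive frames and project back to `dim`,
    so later layers run on a shorter sequence of wider frames."""

    def __init__(self, dim: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        self.proj = nn.Linear(dim * stride, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); trailing frames that do not fill a
        # full stride are dropped for simplicity.
        b, t, d = x.shape
        t = t - t % self.stride
        x = x[:, :t].reshape(b, t // self.stride, d * self.stride)
        return self.proj(x)


def hint_distillation_loss(student_feats, teacher_feats, projections):
    """L1 hint loss between projected student layer outputs and uniformly
    mapped teacher layer outputs; the uniform layer mapping is an assumption
    here, and both lists are assumed to share the same sequence length."""
    step = len(teacher_feats) // len(student_feats)
    loss = 0.0
    for i, (s, proj) in enumerate(zip(student_feats, projections)):
        t = teacher_feats[(i + 1) * step - 1]  # teacher layer matched to s
        loss = loss + F.l1_loss(proj(s), t)
    return loss / len(student_feats)


if __name__ == "__main__":
    B, T, D_TEACHER, D_STUDENT = 2, 100, 768, 480  # hypothetical sizes

    reduction = TimeReduction(D_STUDENT, stride=2)
    print(reduction(torch.randn(B, T, D_STUDENT)).shape)  # (2, 50, 480)

    teacher = [torch.randn(B, T, D_TEACHER) for _ in range(12)]  # wide layers
    student = [torch.randn(B, T, D_STUDENT) for _ in range(12)]  # thin layers
    projs = nn.ModuleList([nn.Linear(D_STUDENT, D_TEACHER) for _ in range(12)])
    print(hint_distillation_loss(student, teacher, projs).item())

Matching intermediate-layer "hints" rather than only the final output is what the abstract refers to as hint-based distillation; in this toy setting, the linear projections absorb the width mismatch between the 480-dimensional student and the 768-dimensional teacher.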
Pages: 3588-3592
Page count: 5