FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Cited by: 14
Authors
Lee, Yeonghyeon [1 ]
Jang, Kangwook [1 ]
Goo, Jahyun [1 ]
Jung, Youngmoon [1 ]
Kim, Hoirin [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon, South Korea
Source
INTERSPEECH 2022
Funding
National Research Foundation of Singapore
Keywords
knowledge distillation; speech representation learning; self-supervised learning; model compression
DOI
10.21437/Interspeech.2022-11112
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Large-scale speech self-supervised learning (SSL) has emerged as a central area of speech processing; however, the computational cost arising from its vast model size creates a high entry barrier for academia. In addition, existing distillation techniques for speech SSL models compress the model by reducing the number of layers, which induces performance degradation on linguistic pattern recognition tasks such as phoneme recognition (PR). In this paper, we propose FitHuBERT, which is thinner in dimension across almost all model components and deeper in layers than prior speech SSL distillation work. Moreover, we employ a time-reduction layer to speed up inference and propose a hint-based distillation method to reduce performance degradation. Our method shrinks the model to 23.8% of HuBERT's size and 35.9% of its inference time. It also achieves a 12.1% word error rate and a 13.3% phoneme error rate on the SUPERB benchmark, which is superior to prior work.
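The abstract does not include implementation details, but the two ideas it names, a time-reduction layer and hint-based (layer-to-layer) distillation from a wide teacher to a thin student, can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the authors' released code: the TimeReduction module, the hint_loss function, and the per-layer projection heads are hypothetical names, and the frame-concatenation and L2-matching choices are one plausible realization of the described approach.

```python
# Minimal sketch of hint-based distillation for a thin-and-deep student.
# Module names and design choices are illustrative assumptions; they are
# not taken from the FitHuBERT release.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeReduction(nn.Module):
    """Halve the frame rate by concatenating adjacent frames, then project back."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        if T % 2:                       # pad to an even number of frames
            x = F.pad(x, (0, 0, 0, 1))
            T += 1
        x = x.reshape(B, T // 2, 2 * D)  # stack each pair of frames
        return self.proj(x)

def hint_loss(student_hiddens, teacher_hiddens, projections):
    """L2 hint loss between matched student/teacher layers.

    student_hiddens: list of (B, T_s, d_student) tensors
    teacher_hiddens: list of (B, T_t, d_teacher) tensors
    projections: per-layer nn.Linear(d_student, d_teacher) heads that lift
        the thin student's hidden states to the teacher's width
    """
    loss = 0.0
    for h_s, h_t, proj in zip(student_hiddens, teacher_hiddens, projections):
        h_s = proj(h_s)
        # Re-align sequence lengths if the student applied time reduction.
        if h_s.size(1) != h_t.size(1):
            h_s = F.interpolate(
                h_s.transpose(1, 2), size=h_t.size(1)
            ).transpose(1, 2)
        loss = loss + F.mse_loss(h_s, h_t)
    return loss / len(projections)
```

In training, such a hint loss would typically be combined with a distillation loss on the final-layer outputs; the exact layer pairing and loss weighting used by FitHuBERT are specified in the paper, not in this sketch.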
Pages: 3588-3592
Number of pages: 5