FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Cited by: 14
Authors
Lee, Yeonghyeon [1 ]
Jang, Kangwook [1 ]
Goo, Jahyun [1 ]
Jung, Youngmoon [1 ]
Kim, Hoirin [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon, South Korea
Source
INTERSPEECH 2022
Funding
National Research Foundation of Singapore;
Keywords
knowledge distillation; speech representation learning; self-supervised learning; model compression;
DOI
10.21437/Interspeech.2022-11112
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Classification Codes
070206 ; 082403 ;
Abstract
Large-scale speech self-supervised learning (SSL) has emerged as a central field of speech processing; however, the computational cost arising from its vast model size poses a high entry barrier for academia. In addition, existing distillation techniques for speech SSL models compress the model by reducing the number of layers, which induces performance degradation in linguistic pattern recognition tasks such as phoneme recognition (PR). In this paper, we propose FitHuBERT, which is thinner in dimension across almost all model components and deeper in layers compared to prior speech SSL distillation works. Moreover, we employ a time-reduction layer to speed up inference and propose a hint-based distillation method to reduce performance degradation. Our method shrinks the model to 23.8% of HuBERT's size and 35.9% of its inference time. It also achieves a 12.1% word error rate and a 13.3% phoneme error rate on the SUPERB benchmark, which is superior to prior work.
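The abstract mentions two concrete ingredients: a time-reduction layer and hint-based (layer-to-layer) distillation between a wide teacher and a thin student. The PyTorch sketch below illustrates both ideas under stated assumptions; the names TimeReduction, hint_loss, and up_proj are illustrative, and the teacher-student frame-rate alignment is a simplification rather than the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeReduction(nn.Module):
    """Halve the frame rate by concatenating adjacent frames and
    projecting back to the model dimension (illustrative sketch)."""
    def __init__(self, dim: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        self.proj = nn.Linear(dim * stride, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) -> (batch, time // stride, dim)
        b, t, d = x.shape
        t = t - t % self.stride                       # drop leftover frames
        x = x[:, :t].reshape(b, t // self.stride, d * self.stride)
        return self.proj(x)

def hint_loss(student_layers, teacher_layers, up_proj, stride: int = 2):
    """L2 hint loss between hidden states of a thin student and a wide
    teacher. `up_proj` is a list of nn.Linear maps from the student width
    to the teacher width; teacher frames are subsampled to roughly match
    a time-reduced student (an assumption, not the paper's exact pairing)."""
    total = 0.0
    for s, t, proj in zip(student_layers, teacher_layers, up_proj):
        t = t[:, ::stride]                            # align teacher frame rate
        n = min(s.size(1), t.size(1))                 # defensively match lengths
        total = total + F.mse_loss(proj(s[:, :n]), t[:, :n])
    return total / len(up_proj)
```

In this sketch, the up-projections let a narrow student regress the teacher's wider hidden states, which is the usual way a "thin" student is supervised by "hints" from intermediate layers.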
Pages: 3588-3592
Page count: 5
Related Papers
50 items in total
  • [41] Efficient Personalized Speech Enhancement Through Self-Supervised Learning
    Sivaraman, Aswin
    Kim, Minje
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1342 - 1356
  • [42] Automatic self-supervised learning of associations between speech and text
    Knuuttila, Juho
    Rasanen, Okko
    Laine, Unto K.
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 465 - 469
  • [43] Linear-Complexity Self-Supervised Learning for Speech Processing
    Zhang, Shucong
    Parcollet, Titouan
    van Dalen, Rogier
    Bhattacharya, Sourav
    INTERSPEECH 2024, 2024, : 3480 - 3484
  • [44] Graph Knowledge Structure for Attentional Knowledge Tracing With Self-Supervised Learning
    Liu, Zhaohui
    Liu, Sainan
    Gu, Weifeng
    IEEE ACCESS, 2025, 13 : 10933 - 10943
  • [45] Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
    Ashihara, Takanori
    Moriya, Takafumi
    Matsuura, Kohei
    Tanaka, Tomohiro
    INTERSPEECH 2022, 2022, : 411 - 415
  • [46] On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation
    Xia, Xin
    Yin, Hongzhi
    Yu, Junliang
    Wang, Qinyong
    Xu, Guandong
    Quoc Viet Hung Nguyen
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 546 - 555
  • [47] More from Less: Self-supervised Knowledge Distillation for Routine Histopathology Data
    Farndale, Lucas
    Insall, Robert
    Yuan, Ke
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I, 2024, 14348 : 454 - 463
  • [48] MV-MR: Multi-Views and Multi-Representations for Self-Supervised Learning and Knowledge Distillation
    Kinakh, Vitaliy
    Drozdova, Mariia
    Voloshynovskiy, Slava
    ENTROPY, 2024, 26 (06)
  • [49] Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation
    Wang, Ziming
    Han, Shumin
    Wang, Xiaodi
    Hao, Jing
    Cao, Xianbin
    Zhang, Baochang
    IET CYBER-SYSTEMS AND ROBOTICS, 2024, 6 (04)
  • [50] Category contrastive distillation with self-supervised classification
    Chen, Weiwei
    Xu, Jiazhen
    Zheng, Yujie
    Wang, Chong
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)