Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

Cited by: 0
Authors
Yoon, Ji Won [1 ,2 ]
Kim, Hyung Yong [1 ,2 ,3 ,4 ]
Lee, Hyeonseung
Ahn, Sunghwan [1 ,2 ]
Kim, Nam Soo [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea
[3] Seoul Natl Univ, Seoul 08826, South Korea
[4] 42dot Inc, Seoul 06620, South Korea
Keywords
Speech recognition; scene text recognition; connectionist temporal classification; knowledge distillation; teacher-student learning; transfer learning; automatic speech recognition; text
DOI
10.1109/TASLP.2023.3297955
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a larger network (teacher) to a much smaller network (student). Conventional KD methods usually employ a teacher model trained in a supervised manner, where the output labels are treated only as training targets. Extending this supervised scheme, we introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely the Oracle Teacher, which leverages both the source inputs and the output labels as its input. Since the Oracle Teacher learns a more accurate CTC alignment by referring to the target information, it can provide the student with better guidance. One potential risk of the proposed approach is a trivial solution in which the model simply copies the target input to its output. Based on the many-to-one mapping property of the CTC algorithm, we present a training strategy that effectively prevents this trivial solution and thus enables utilizing both source and target inputs for model training. Extensive experiments are conducted on two sequence learning tasks: speech recognition and scene text recognition. The experimental results empirically show that the proposed model improves the students across these tasks while achieving a considerable speed-up in the teacher model's training time.
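For readers who want a concrete picture of the setup described in the abstract, the sketch below illustrates, in PyTorch, frame-level knowledge distillation for a CTC model in which a toy "oracle-style" teacher conditions on both the source features and the target labels, while the smaller student sees only the source features. Everything here is an assumption made for illustration: the model classes (OracleTeacher, Student), the way target information is injected (a mean label embedding added to every encoded frame), and the loss weighting are not the architecture or training recipe of the paper, which additionally relies on CTC's many-to-one mapping property to keep the teacher from trivially copying its label input.

```python
# Minimal illustrative sketch (not the authors' implementation) of frame-level
# knowledge distillation for CTC models with a label-conditioned teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OracleTeacher(nn.Module):
    """Toy teacher that sees both the source features and the target labels,
    and predicts frame-level CTC posteriors (vocabulary + blank)."""

    def __init__(self, feat_dim: int = 80, vocab_size: int = 28, hidden: int = 256):
        super().__init__()
        self.label_emb = nn.Embedding(vocab_size + 1, hidden)  # index 0 = blank
        self.src_proj = nn.Linear(feat_dim, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, vocab_size + 1)

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # One simple way to inject target information: summarize the label
        # sequence and add it to every encoded frame (illustrative only).
        label_ctx = self.label_emb(labels).mean(dim=1, keepdim=True)  # (B, 1, H)
        x = self.src_proj(feats) + label_ctx                          # (B, T, H)
        x, _ = self.rnn(x)
        return self.out(x)                                            # (B, T, V+1)


class Student(nn.Module):
    """Smaller student that sees only the source features."""

    def __init__(self, feat_dim: int = 80, vocab_size: int = 28, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size + 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        x, _ = self.rnn(feats)
        return self.out(x)


def distillation_loss(teacher, student, feats, labels, feat_lens, label_lens,
                      tau: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """CTC loss on the ground-truth labels plus a frame-level KL term that
    pulls the student's softened posteriors toward the frozen teacher's."""
    with torch.no_grad():
        t_logits = teacher(feats, labels)   # teacher conditions on the labels
    s_logits = student(feats)               # student does not see the labels

    # Standard CTC loss; blank id is 0 in this toy setup.
    log_probs = s_logits.log_softmax(dim=-1).transpose(0, 1)  # (T, B, V+1)
    ctc = F.ctc_loss(log_probs, labels, feat_lens, label_lens,
                     blank=0, zero_infinity=True)

    # Frame-level KD: KL(teacher || student) on temperature-softened outputs.
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                  F.softmax(t_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    return alpha * ctc + (1.0 - alpha) * kd


if __name__ == "__main__":
    B, T, S = 4, 50, 12                     # batch, frames, label length
    teacher, student = OracleTeacher(), Student()
    feats = torch.randn(B, T, 80)
    labels = torch.randint(1, 29, (B, S))   # 0 is reserved for the blank symbol
    loss = distillation_loss(teacher, student, feats, labels,
                             feat_lens=torch.full((B,), T),
                             label_lens=torch.full((B,), S))
    loss.backward()
```

In a full pipeline, a teacher of this kind would first be trained with a safeguard against the trivial copy solution and then frozen, as the torch.no_grad() call suggests, before the student is distilled against its frame-level posteriors.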
Pages: 2974-2987
Page count: 14
Related Papers
18 entries in total
  • [1] Leveraging logit uncertainty for better knowledge distillation
    Guo, Zhen
    Wang, Dong
    He, Qiang
    Zhang, Pengzhou
    SCIENTIFIC REPORTS, 2024, 14 (1)
  • [2] AN INVESTIGATION OF A KNOWLEDGE DISTILLATION METHOD FOR CTC ACOUSTIC MODELS
    Takashima, Ryoichi
    Li, Sheng
    Kawai, Hisashi
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5809 - 5813
  • [3] Improving Knowledge Distillation of CTC-Trained Acoustic Models With Alignment-Consistent Ensemble and Target Delay
    Ding, Haisong
    Chen, Kai
    Huo, Qiang
    IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 2561 - 2571
  • [4] Factorized and progressive knowledge distillation for CTC-based ASR models
    Tian, Sanli
    Li, Zehan
    Lyv, Zhaobiao
    Cheng, Gaofeng
    Xiao, Qing
    Li, Ta
    Zhao, Qingwei
    SPEECH COMMUNICATION, 2024, 160
  • [5] INVESTIGATION OF SEQUENCE-LEVEL KNOWLEDGE DISTILLATION METHODS FOR CTC ACOUSTIC MODELS
    Takashima, Ryoichi
    Sheng, Li
    Kawai, Hisashi
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6156 - 6160
  • [6] Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights
    Ballout, Mohamad
    Krumnack, Ulf
    Heidemann, Gunther
    Kuehnberger, Kai-Uwe
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT I, NLDB 2024, 2024, 14762 : 32 - 46
  • [7] ONLINE TARGET SOUND EXTRACTION WITH KNOWLEDGE DISTILLATION FROM PARTIALLY NON-CAUSAL TEACHER
    Wakayama, Keigo
    Ochiai, Tsubasa
    Delcroix, Marc
    Yasuda, Masahiro
    Saito, Shoichiro
    Araki, Shoko
    Nakayama, Akira
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 561 - 565
  • [8] Domain-specific knowledge distillation yields smaller and better models for conversational commerce
    Howell, Kristen
    Wang, Jian
    Hazare, Akshay
    Bradley, Joseph
    Brew, Chris
    Chen, Xi
    Dunn, Matthew
    Hockey, Beth Ann
    Maurer, Andrew
    Widdows, Dominic
    PROCEEDINGS OF THE 5TH WORKSHOP ON E-COMMERCE AND NLP (ECNLP 5), 2022, : 151 - 160
  • [9] Self-Improving Teacher Cultivates Better Student: Distillation Calibration for Multimodal Large Language Models
    Li, Xinwei
    Lin, Li
    Wang, Shuai
    Qian, Chen
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 882 - 892