Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification

Cited by: 16
Authors
Shen, Peng [1 ]
Lu, Xugang [1 ]
Li, Sheng [1 ]
Kawai, Hisashi [1 ]
Affiliations
[1] Natl Inst Informat & Commun Technol, Koganei, Tokyo, Japan
Keywords
Task analysis; Speech processing; Training; Speech recognition; Neural networks; Feature extraction; Robustness; Internal representation learning; knowledge distillation; short utterances; spoken language identification;
DOI
10.1109/TASLP.2020.3023627
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
With successful applications of deep feature learning algorithms, spoken language identification (LID) on long utterances achieves satisfactory performance. However, performance on short utterances degrades drastically, even when the LID system is trained on short utterances. The main reason is the large variation of the representations of short utterances, which results in high model confusion. To narrow the performance gap between long and short utterances, we propose a teacher-student representation learning framework based on a knowledge distillation method to improve LID performance on short utterances. In the proposed framework, in addition to training the student model on short utterances with their true labels, the internal representation at the output of a hidden layer of the student model is supervised by the teacher's representation of the corresponding longer utterances. By reducing the distance between the internal representations of short and long utterances, the student model can learn robust, discriminative representations for short utterances, which is expected to reduce model confusion. We conducted experiments on our in-house LID dataset and the NIST LRE07 dataset, and showed the effectiveness of the proposed methods for short-utterance LID tasks.
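
To make the training objective described in the abstract concrete, below is a minimal PyTorch-style sketch of a teacher-student loss of this kind: the student classifies a short utterance with its true label while its internal representation is pulled toward the teacher's representation of the corresponding long utterance. This is an illustrative assumption, not the paper's actual configuration: the architecture, the mean-squared-error distance, the weighting factor alpha, and all names (StudentLID, distillation_step, feature and label dimensions) are placeholders.

import torch
import torch.nn as nn

class StudentLID(nn.Module):
    # Hypothetical LID model: a frame-level encoder, mean pooling to an
    # utterance-level internal representation, and a language classifier.
    def __init__(self, feat_dim=40, embed_dim=256, num_langs=14):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(embed_dim, num_langs)

    def forward(self, x):
        # x: (batch, frames, feat_dim); mean-pool over frames to obtain a
        # fixed-length internal representation (placeholder pooling).
        rep = self.encoder(x).mean(dim=1)
        return rep, self.classifier(rep)

def distillation_step(student, teacher, short_feats, long_feats, labels, alpha=0.5):
    # Cross-entropy on the short utterance with its true label, plus an MSE
    # term that pulls the student's internal representation toward the
    # (frozen) teacher's representation of the corresponding long utterance.
    ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
    with torch.no_grad():
        teacher_rep, _ = teacher(long_feats)
    student_rep, logits = student(short_feats)
    return ce(logits, labels) + alpha * mse(student_rep, teacher_rep)

# Toy usage with random features standing in for paired short/long segments
# cut from the same recordings (batch of 8, 14 target languages).
teacher, student = StudentLID(), StudentLID()
short = torch.randn(8, 100, 40)    # short segments (illustrative frame counts)
long = torch.randn(8, 1000, 40)    # the corresponding longer utterances
labels = torch.randint(0, 14, (8,))
loss = distillation_step(student, teacher, short, long, labels)
loss.backward()

In this sketch the teacher would be a model pre-trained on long utterances and kept fixed; only the student's parameters receive gradients from the combined loss.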
Pages: 2674-2683
Page count: 10
Related Papers
50 items in total
  • [21] KDALDL: Knowledge Distillation-Based Adaptive Label Distribution Learning Network for Bone Age Assessment
    Zheng, Hao-Dong
    Yu, Lei
    Lu, Yu-Ting
    Zhang, Wei-Hao
    Yu, Yan-Jun
    IEEE ACCESS, 2024, 12 : 17679 - 17689
  • [23] Common latent representation learning for low-resourced spoken language identification
    Chen, Chen
    Bu, Yulin
    Chen, Yong
    Chen, Deyun
    Multimedia Tools and Applications, 2024, 83 (12) : 34515 - 34535
  • [24] Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition
    Lee, Hung-Shin
    Tsao, Yu
    Jeng, Shyh-Kang
    Wang, Hsin-Min
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 3065 - 3079
  • [25] Comparative Study on Spoken Language Identification Based on Deep Learning
    Heracleous, Panikos
    Takai, Kohichi
    Yasuda, Keiji
    Mohammad, Yasser
    Yoneyama, Akio
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018: 2265 - 2269
  • [26] An Efficient Knowledge Distillation-Based Detection Method for Infrared Small Targets
    Tang, Wenjuan
    Dai, Qun
    Hao, Fan
    REMOTE SENSING, 2024, 16 (17)
  • [27] Application of knowledge distillation in representation learning person re-identification model
    Liu, Chang
    Ma, Hang
    Jin, Jun-Jie
    Zhou, Xin-Lun
    Chen, Wen-Bai
    Journal of Computers (Taiwan), 2020, 31 (02): 277 - 286
  • [28] A knowledge distillation-based deep interaction compressed network for CTR prediction
    Guan, Fei
    Qian, Cheng
    He, Feiyan
    KNOWLEDGE-BASED SYSTEMS, 2023, 275
  • [29] Minifying photometric stereo via knowledge distillation-based feature translation
    Han, Seungoh
    Park, Jinsun
    Cho, Donghyeon
    OPTICS EXPRESS, 2022, 30 (21) : 38284 - 38297
  • [30] FedKG: A Knowledge Distillation-Based Federated Graph Method for Social Bot Detection
    Wang, Xiujuan
    Chen, Kangmiao
    Wang, Keke
    Wang, Zhengxiang
    Zheng, Kangfeng
    Zhang, Jiayue
    SENSORS, 2024, 24 (11)