Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification

Cited by: 16
Authors
Shen, Peng [1]
Lu, Xugang [1]
Li, Sheng [1]
Kawai, Hisashi [1]
Affiliations
[1] Natl Inst Informat & Commun Technol, Koganei, Tokyo, Japan
Keywords
Task analysis; Speech processing; Training; Speech recognition; Neural networks; Feature extraction; Robustness; Internal representation learning; knowledge distillation; short utterances; spoken language identification;
DOI
10.1109/TASLP.2020.3023627
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
With the successful application of deep feature learning algorithms, spoken language identification (LID) on long utterances achieves satisfactory performance. However, performance on short utterances degrades drastically, even when the LID system is trained on short utterances. The main reason is the large variation in the representations of short utterances, which results in high model confusion. To narrow the performance gap between long and short utterances, we propose a teacher-student representation learning framework based on a knowledge distillation method to improve LID performance on short utterances. In the proposed framework, in addition to training the student model on short utterances with their true labels, the internal representation taken from the output of a hidden layer of the student model is supervised with the representation of the corresponding longer utterances. By reducing the distance between the internal representations of short and long utterances, the student model can explore robust discriminative representations for short utterances, which is expected to reduce model confusion. We conducted experiments on our in-house LID dataset and the NIST LRE07 dataset, and showed the effectiveness of the proposed methods for short-utterance LID tasks.
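The training objective described in the abstract can be sketched roughly as a classification loss plus a representation-matching term. The following is a minimal illustration only, not the authors' implementation: the function names, the mean squared error as the distance measure, and the `weight` hyperparameter are all assumptions, since the abstract does not specify them. The teacher representation is assumed to be precomputed from the corresponding longer utterance.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Classification loss of the student on a short utterance with its true label."""
    return -math.log(softmax(logits)[label])


def representation_distance(student_rep, teacher_rep):
    """Mean squared error between two hidden-layer representations.

    Assumption: the abstract only says a "distance" is reduced; MSE is one
    common choice and is used here purely for illustration.
    """
    if len(student_rep) != len(teacher_rep):
        raise ValueError("representations must have the same dimension")
    n = len(student_rep)
    return sum((s - t) ** 2 for s, t in zip(student_rep, teacher_rep)) / n


def student_loss(student_logits, label, student_rep, teacher_rep, weight=1.0):
    """Combined loss: true-label supervision plus representation supervision.

    `weight` is a hypothetical trade-off hyperparameter, not taken from the paper.
    """
    ce = cross_entropy(student_logits, label)
    dist = representation_distance(student_rep, teacher_rep)
    return ce + weight * dist
```

In this sketch, `student_rep` would be the hidden-layer output of the student for a short utterance and `teacher_rep` the representation of the longer utterance it was cut from; when the two match exactly, the loss reduces to the plain cross-entropy term.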
Pages: 2674-2683
Page count: 10