Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification

Cited: 16
Authors
Shen, Peng [1 ]
Lu, Xugang [1 ]
Li, Sheng [1 ]
Kawai, Hisashi [1 ]
Affiliations
[1] Natl Inst Informat & Commun Technol, Koganei, Tokyo, Japan
Keywords
Task analysis; Speech processing; Training; Speech recognition; Neural networks; Feature extraction; Robustness; Internal representation learning; knowledge distillation; short utterances; spoken language identification;
DOI
10.1109/TASLP.2020.3023627
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
With the successful application of deep feature learning algorithms, spoken language identification (LID) on long utterances achieves satisfactory performance. However, performance on short utterances degrades drastically, even when the LID system is trained on short utterances. The main reason is the large variation of the representations of short utterances, which results in high model confusion. To narrow the performance gap between long and short utterances, we propose a teacher-student representation learning framework based on knowledge distillation to improve LID performance on short utterances. In the proposed framework, in addition to training the student model on short utterances with their true labels, the internal representation from the output of a hidden layer of the student model is supervised with the representation of the corresponding longer utterances. By reducing the distance between the internal representations of short and long utterances, the student model learns robust, discriminative representations for short utterances, which is expected to reduce model confusion. Experiments on our in-house LID dataset and the NIST LRE07 dataset show the effectiveness of the proposed methods for short-utterance LID tasks.
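The abstract describes the training objective only in words; the sketch below illustrates how such a representation-level distillation loss could look, assuming a PyTorch-style setup. The model class LIDNet, the helper distillation_step, and the weighting factor alpha are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of representation-level knowledge
# distillation for short-utterance LID: cross-entropy on the short utterance's
# true label plus a penalty pulling the student's internal representation of
# the short utterance toward the teacher's representation of the long one.
import torch
import torch.nn as nn

class LIDNet(nn.Module):
    """Toy LID encoder + classifier; encode() exposes the internal
    (hidden-layer) representation used for the distillation loss."""
    def __init__(self, feat_dim=40, hidden_dim=256, num_langs=14):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_langs)

    def encode(self, x):
        _, h = self.encoder(x)      # h: (1, batch, hidden_dim)
        return h.squeeze(0)         # internal representation, (batch, hidden_dim)

    def forward(self, x):
        return self.classifier(self.encode(x))

def distillation_step(teacher, student, long_utt, short_utt, labels, alpha=0.5):
    """One training step of the student on a (long, short) utterance pair.
    `alpha` (an assumed hyper-parameter) balances the label loss against the
    representation-distance loss."""
    with torch.no_grad():
        target_repr = teacher.encode(long_utt)   # teacher sees the long utterance
    student_repr = student.encode(short_utt)     # student sees the short utterance
    logits = student.classifier(student_repr)
    loss = nn.functional.cross_entropy(logits, labels) \
         + alpha * nn.functional.mse_loss(student_repr, target_repr)
    return loss
```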
Pages: 2674-2683
Page count: 10
Related Papers
50 records in total
  • [31] FedRCIL: Federated Knowledge Distillation for Representation based Contrastive Incremental Learning
    Psaltis, Athanasios
    Chatzikonstantinou, Christos
    Patrikakis, Charalampos Z.
    Daras, Petros
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 3455 - 3464
  • [32] DiffSLU: Knowledge Distillation Based Diffusion Model for Cross-Lingual Spoken Language Understanding
    Mao, Tianjun
    Zhang, Chenghong
    INTERSPEECH 2023, 2023, : 715 - 719
  • [33] Efficient Vehicle Selection and Resource Allocation for Knowledge Distillation-Based Federated Learning in UAV-Assisted VEC
    Li, Chunlin
    Zhang, Yong
    Yu, Long
    Yang, Mengjie
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025,
  • [34] Spoken Language Identification Using Deep Learning
    Singh, Gundeep
    Sharma, Sahil
    Kumar, Vijay
    Kaur, Manjit
    Baz, Mohammed
    Masud, Mehedi
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [35] A Knowledge Distillation-Based Transportation System for Sensory Data Sharing Using LoRa
    Kumari, Preti
    Mishra, Rahul
    Gupta, Hari Prabhat
    IEEE SENSORS JOURNAL, 2021, 21 (22) : 25315 - 25322
  • [36] Knowledge distillation-based performance transferring for LSTM-RNN model acceleration
    Ma, Hongbin
    Yang, Shuyuan
    Wu, Ruowu
    Hao, Xiaojun
    Long, Huimin
    He, Guangjun
    SIGNAL IMAGE AND VIDEO PROCESSING, 2022, 16 (06) : 1541 - 1548
  • [37] KD-INR: Time-Varying Volumetric Data Compression via Knowledge Distillation-Based Implicit Neural Representation
    Han, Jun
    Zheng, Hao
    Bi, Chongke
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (10) : 6826 - 6838
  • [38] Knowledge Distillation-Based Robust UAV Swarm Communication Under Malicious Attacks
    Wu, Qirui
    Zhang, Yirun
    Yang, Zhaohui
    Shikh-Bahaei, Mohammad
    2024 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS, ICC WORKSHOPS 2024, 2024, : 1023 - 1029
  • [39] Facial landmark points detection using knowledge distillation-based neural networks
    Fard, Ali Pourramezan
    Mahoor, Mohammad H.
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 215