DOMAIN ADAPTATION VIA TEACHER-STUDENT LEARNING FOR END-TO-END SPEECH RECOGNITION

Cited by: 0
Authors
Meng, Zhong [1]
Li, Jinyu [1]
Gaur, Yashesh [1]
Gong, Yifan [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
domain adaptation; teacher-student learning; end-to-end; encoder-decoder; speech recognition; NEURAL-NETWORKS; COMPRESSION;
DOI
10.1109/asru46091.2019.9003776
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Teacher-student (T/S) learning has been shown to be effective for domain adaptation of deep neural network acoustic models in hybrid speech recognition systems. In this work, we extend T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: the teacher's token posteriors as soft labels and its one-best predictions as decoder guidance. To further improve T/S learning with the help of ground-truth labels, we propose adaptive T/S (AT/S) learning. Instead of conditionally choosing between the teacher's soft token posteriors and the one-hot ground-truth label, in AT/S the student always learns from both the teacher and the ground truth, with a pair of adaptive weights assigned to the soft and one-hot labels to quantify the confidence in each knowledge source. The confidence scores are dynamically estimated at each decoder step as a function of the soft and one-hot labels. With 3400 hours of parallel close-talk and far-field Microsoft Cortana data for domain adaptation, T/S and AT/S achieve 6.3% and 10.3% relative word error rate improvements, respectively, over a strong E2E model trained with the same amount of far-field data.
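The adaptive weighting described in the abstract can be illustrated for a single decoder step with a minimal sketch. The abstract only states that the two weights are a function of the soft and one-hot labels; the specific choice below (confidence in the teacher equals the probability the teacher assigns to the ground-truth token, with the two weights summing to one) is an assumption made for illustration, and the function names `mix_labels` and `adaptive_ts_step_loss` are hypothetical.

```python
import math


def mix_labels(teacher_probs, target_index):
    """Blend the teacher's soft label with the one-hot ground truth.

    ASSUMPTION (not stated in the abstract): the teacher's weight is the
    probability it gives to the ground-truth token, and the ground-truth
    weight is the remainder, so the blended label is still a distribution.
    """
    w_teacher = teacher_probs[target_index]
    w_truth = 1.0 - w_teacher
    mixed = []
    for k, p_teacher in enumerate(teacher_probs):
        one_hot = 1.0 if k == target_index else 0.0
        mixed.append(w_teacher * p_teacher + w_truth * one_hot)
    return mixed


def adaptive_ts_step_loss(teacher_probs, student_probs, target_index):
    """Cross-entropy between the blended label and the student posterior."""
    mixed = mix_labels(teacher_probs, target_index)
    # Clip the student probability to avoid log(0).
    return -sum(m * math.log(max(p, 1e-12))
                for m, p in zip(mixed, student_probs))


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]   # teacher posterior over a 3-token vocabulary
    student = [0.5, 0.3, 0.2]   # student posterior at the same decoder step
    print(mix_labels(teacher, 0))
    print(adaptive_ts_step_loss(teacher, student, 0))
```

Under this assumed weighting, a teacher that is confident and correct dominates the target, while a teacher that assigns little mass to the ground-truth token is largely overridden by the one-hot label. In a full model the per-step losses would be summed over decoder steps and utterances.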
Pages: 268-275
Number of pages: 8
Related papers
50 in total
  • [1] Robust Speech Recognition Using Teacher-Student Learning Domain Adaptation
    Ma, Han
    Zhang, Qiaoling
    Tang, Roubing
    Zhang, Lu
    Jia, Yubo
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (12) : 2112 - 2118
  • [2] Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning
    Denisov, Pavel
    Vu, Ngoc Thang
    [J]. INTERSPEECH 2020, 2020, : 881 - 885
  • [3] Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution
    Zhang, Zi-qiang
    Song, Yan
    Zhang, Jian-shu
    McLoughlin, Ian
    Dai, Li-Rong
    [J]. INTERSPEECH 2020, 2020, : 3580 - 3584
  • [4] DOMAIN ADAPTATION OF END-TO-END SPEECH RECOGNITION IN LOW-RESOURCE SETTINGS
    Samarakoon, Lahiru
    Mak, Brian
    Lam, Albert Y. S.
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 382 - 388
  • [5] SPEAKER ADAPTATION FOR MULTICHANNEL END-TO-END SPEECH RECOGNITION
    Ochiai, Tsubasa
    Watanabe, Shinji
    Katagiri, Shigeru
    Hori, Takaaki
    Hershey, John
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6707 - 6711
  • [6] Large-Scale Domain Adaptation via Teacher-Student Learning
    Li, Jinyu
    Seltzer, Michael L.
    Wang, Xi
    Zhao, Rui
    Gong, Yifan
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2386 - 2390
  • [7] Unsupervised Domain Adaptation on End-to-End Multi-Talker Overlapped Speech Recognition
    Zheng, Lin
    Zhu, Han
    Tian, Sanli
    Zhao, Qingwei
    Li, Ta
    [J]. IEEE Signal Processing Letters, 2024, 31 : 3119 - 3123
  • [8] AN EXPERIMENTAL STUDY ON PRIVATE AGGREGATION OF TEACHER ENSEMBLE LEARNING FOR END-TO-END SPEECH RECOGNITION
    Yang, Chao-Han Huck
    Chen, I-Fan
    Stolcke, Andreas
    Siniscalchi, Sabato Marco
    Lee, Chin-Hui
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1074 - 1080
  • [9] Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation
    Yeh, Sung-Lin
    Lin, Yun-Shao
    Lee, Chi-Chun
    [J]. INTERSPEECH 2020, 2020, : 536 - 540
  • [10] IMPROVING END-TO-END SPEECH RECOGNITION WITH POLICY LEARNING
    Zhou, Yingbo
    Xiong, Caiming
    Socher, Richard
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5819 - 5823