LEARNING EFFECTIVE FACTORIZED HIDDEN LAYER BASES USING STUDENT-TEACHER TRAINING FOR LSTM ACOUSTIC MODEL ADAPTATION

Authors
Samarakoon, Lahiru [1, 2]
Mak, Brian [1]
Sim, Khe Chai [3]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
[2] Fano Labs, Hong Kong, Hong Kong, Peoples R China
[3] Google Inc, Mountain View, CA USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
Long Short-Term Memory (LSTM); Recurrent Neural Networks (RNNs); Speaker Adaptation; Student-teacher training; Acoustic Modeling
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Factorized Hidden Layer (FHL) has been proposed for the adaptation of deep neural network (DNN) and Long Short-Term Memory (LSTM) based acoustic models (AMs). In FHL, a speaker-dependent (SD) transformation matrix and an SD bias are included in addition to the standard affine transformation. The SD transformation is a linear combination of rank-1 matrices, whereas the SD bias is a linear combination of vectors. However, adapting LSTMs is challenging and often yields only modest gains. In this paper, we propose to use student-teacher training to estimate more efficient FHL bases for LSTM AMs, using an FHL-adapted DNN as the teacher model. For both the AMI IHM and AMI SDM tasks, FHL achieves a 3.2% absolute improvement over LSTM baselines trained with frame-level cross entropy. Moreover, FHL yields 3.0% and 3.8% absolute improvements over sequence-trained LSTM baselines for the AMI IHM and AMI SDM tasks, respectively.
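The FHL layer and the student-teacher objective summarized in the abstract can be illustrated with a minimal, pure-Python sketch. It assumes that a single speaker vector d weights both the rank-1 bases (each written as an outer product gamma_i psi_i^T) and the bias basis vectors u_i, and it shows one affine layer without a nonlinearity rather than the LSTM gates adapted in the paper. The student-teacher part assumes the common per-frame KL-divergence criterion between the teacher's (FHL-adapted DNN's) output posteriors and the student LSTM's softmax output; the exact criterion used by the authors is not stated in this abstract. All function names (`fhl_affine`, `student_teacher_kl`, etc.) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def fhl_affine(W: Matrix, b: Vector, gammas: List[Vector], psis: List[Vector],
               u_bases: List[Vector], d: Vector, x: Vector) -> Vector:
    """Pre-activation of a hypothetical FHL layer (assumed form):
    (W + sum_i d_i * gamma_i psi_i^T) x + b + sum_i d_i * u_i
    """
    rows, cols = len(W), len(x)
    # SD transform: linear combination of rank-1 matrices gamma_i psi_i^T
    A = [[0.0] * cols for _ in range(rows)]
    for d_i, gamma, psi in zip(d, gammas, psis):
        for r in range(rows):
            for c in range(cols):
                A[r][c] += d_i * gamma[r] * psi[c]
    # SD bias: linear combination of bias basis vectors u_i
    sd_bias = [0.0] * rows
    for d_i, u in zip(d, u_bases):
        for r in range(rows):
            sd_bias[r] += d_i * u[r]
    # Speaker-independent affine part plus the SD terms
    Wx, Ax = matvec(W, x), matvec(A, x)
    return [Wx[r] + Ax[r] + b[r] + sd_bias[r] for r in range(rows)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def student_teacher_kl(teacher_post: Vector, student_logits: Vector) -> float:
    """Per-frame KL(teacher || student); zero-probability teacher entries contribute 0."""
    q = softmax(student_logits)
    return sum(p * (math.log(p) - math.log(qi))
               for p, qi in zip(teacher_post, q) if p > 0.0)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    x = [1.0, 2.0]
    # One rank-1 basis gamma psi^T = [[0, 1], [0, 0]], one bias basis u = [0, 1], speaker weight d = [2]
    print(fhl_affine(W, b, [[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 1.0]], [2.0], x))  # [5.0, 4.0]
    teacher = softmax([1.0, 2.0, 3.0])
    print(student_teacher_kl(teacher, [1.0, 2.0, 3.0]))  # 0.0 (identical distributions)
```

Setting d to all zeros recovers the speaker-independent affine transform W x + b, which is why only the small vector d needs to be estimated per speaker once the bases are trained. Forming the full matrix A is done here only for clarity; since each basis is rank-1, the same SD term can be computed as sum_i d_i * gamma_i * (psi_i . x) without materializing A.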
Pages: 5954-5958
Number of pages: 5