LEARNING EFFECTIVE FACTORIZED HIDDEN LAYER BASES USING STUDENT-TEACHER TRAINING FOR LSTM ACOUSTIC MODEL ADAPTATION

Cited by: 0

Authors:

Samarakoon, Lahiru [1,2]

Mak, Brian [1]

Sim, Khe Chai [3]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China

[2] Fano Labs, Hong Kong, Hong Kong, Peoples R China

[3] Google Inc, Mountain View, CA USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Long Short-Term Memory (LSTM); Recurrent Neural Networks (RNNs); Speaker Adaptation; Student-Teacher Training; Acoustic Modeling

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Factorized Hidden Layer (FHL) has been proposed for the adaptation of deep neural network (DNN) and Long Short-Term Memory (LSTM) based acoustic models (AMs). In FHL, a speaker-dependent (SD) transformation matrix and an SD bias are included in addition to the standard affine transformation. The SD transformation is a linear combination of rank-1 matrices, whereas the SD bias is a linear combination of vectors. However, the adaptation of LSTMs is challenging and often yields only modest gains. In this paper, we propose to use student-teacher training to estimate more efficient FHL bases for LSTM AMs, using an FHL-adapted DNN as the teacher model. For both the AMI IHM and AMI SDM tasks, FHL achieves a 3.2% absolute improvement over the frame-level cross-entropy-trained LSTM baselines. Moreover, FHL yields 3.0% and 3.8% absolute improvements over sequentially trained LSTM baselines for the AMI IHM and AMI SDM tasks, respectively.
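
As a rough illustration of the two ideas named in the abstract, the sketch below builds an FHL-style affine transform (a shared weight matrix plus a weighted sum of rank-1 matrices, and a shared bias plus a weighted sum of bias vectors) and a soft-target cross-entropy of the kind commonly used in student-teacher training. It follows only the abstract's wording; every function and parameter name (fhl_affine, left_bases, right_bases, d, bias_bases, v, soft_target_cross_entropy) is an assumption made for illustration, and the loss is the generic soft-target form, not one confirmed by this record.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def outer(u, w):
    """Return the rank-1 matrix u w^T as a list of rows."""
    return [[ui * wj for wj in w] for ui in u]

def fhl_affine(x, W, b, left_bases, right_bases, d, bias_bases, v):
    """FHL-style affine transform for one speaker (illustrative sketch).

    Computes (W + sum_k d[k] * outer(left_bases[k], right_bases[k])) x
             + b + sum_j v[j] * bias_bases[j].
    d and v play the role of the speaker-dependent combination weights.
    """
    rows, cols = len(W), len(W[0])
    # Start from a copy of the shared (speaker-independent) weights.
    W_s = [row[:] for row in W]
    for d_k, g, p in zip(d, left_bases, right_bases):
        rank1 = outer(g, p)
        for i in range(rows):
            for j in range(cols):
                W_s[i][j] += d_k * rank1[i][j]
    # Speaker-dependent bias: shared bias plus weighted bias bases.
    b_s = b[:]
    for v_j, u in zip(v, bias_bases):
        for i in range(rows):
            b_s[i] += v_j * u[i]
    return [y_i + b_i for y_i, b_i in zip(matvec(W_s, x), b_s)]

def soft_target_cross_entropy(teacher_probs, student_probs):
    """Cross-entropy of student posteriors against teacher posteriors.

    Assumed generic student-teacher loss: -sum_i t_i * log(s_i).
    """
    return -sum(t * math.log(s)
                for t, s in zip(teacher_probs, student_probs) if t > 0)
```

With all speaker weights set to zero, fhl_affine reduces to the ordinary affine transform W x + b, which is a quick way to sanity-check the sketch.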


Pages: 5954-5958

Number of pages: 5
