SPEAKER ADAPTIVE TRAINING IN DEEP NEURAL NETWORKS USING SPEAKER DEPENDENT BOTTLENECK FEATURES

Cited by: 0

Authors:

Doddipatla, Rama [1]

Affiliation:

[1] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge, England

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

Speaker adaptive training; speaker normalisation; deep neural networks; speaker dependent bottleneck features; automatic speech recognition; ADAPTATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

The paper proposes an approach to speaker adaptive training (SAT) in deep neural networks (DNNs) using a two-stage DNN. The first-stage DNN extracts speaker dependent bottleneck (SDBN) features by updating the weights of the bottleneck (BN) layer with speaker specific data. Using the SDBN features, a second-stage DNN is trained in the SAT framework. Choosing the BN layer as the speaker dependent layer, instead of one of the hidden layers, reduces the number of parameters to be tuned using speaker specific data. Experiments are presented on the Aurora4 task, where the input features are normalised with constrained maximum likelihood linear regression (CMLLR) and speaker information is appended in the form of D-vectors. Following unsupervised adaptation of the BN layer, the proposed approach provides relative WER gains of 8.6% and 8.9% over DNNs trained with FBANK features appended with and without D-vectors, respectively. A relative gain of 10.3% WER is observed when the approach is applied on top of DNNs trained with CMLLR transformed FBANK features, but the gain saturates when combined with D-vectors. Supervised adaptation with as little as one minute of audio from a specific speaker also improves performance over the baseline.
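
The parameter saving claimed in the abstract, and the way a relative WER gain is usually computed, can be illustrated with a minimal sketch. The layer sizes below (2048 hidden units, 75 bottleneck units) and the WER values in the usage note are hypothetical and are not taken from the paper; the sketch also assumes that "the weights of the BN layer" means the fully connected weights and biases feeding into the bottleneck, and that "relative gain" means the reduction in WER as a percentage of the baseline WER.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Number of weights plus biases in one fully connected layer."""
    return n_in * n_out + n_out


def relative_gain(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction, in percent of the baseline WER (assumed definition)."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical layer sizes, chosen only for illustration (not from the paper).
HIDDEN = 2048
BOTTLENECK = 75

# Speaker dependent layer chosen as a full hidden layer (hidden -> hidden).
hidden_layer_params = dense_params(HIDDEN, HIDDEN)

# Speaker dependent layer chosen as the bottleneck layer (hidden -> bottleneck).
bn_layer_params = dense_params(HIDDEN, BOTTLENECK)

if __name__ == "__main__":
    print("hidden layer parameters:", hidden_layer_params)
    print("BN layer parameters:   ", bn_layer_params)
    print("reduction factor:      ", hidden_layer_params / bn_layer_params)
```

With these assumed sizes, the bottleneck layer has roughly 27 times fewer speaker dependent parameters than a full hidden layer. As a usage example, `relative_gain(10.0, 9.0)` evaluates to a 10% relative gain for a hypothetical baseline WER of 10.0% reduced to 9.0%.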

Citation

Pages: 5290-5294

Number of pages: 5

50 items in total

[1] SPEAKER ADAPTIVE TRAINING USING DEEP NEURAL NETWORKS
Ochiai, Tsubasa
Matsuda, Shigeki
Lu, Xugang
Hori, Chiori
Katagiri, Shigeru
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[2] IMPROVED SPEAKER INDEPENDENT LIP READING USING SPEAKER ADAPTIVE TRAINING AND DEEP NEURAL NETWORKS
Almajai, Ibrahim
Cox, Stephen
Harvey, Richard
Lan, Yuxuan
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2722 - 2726
[3] IMPROVEMENTS TO SPEAKER ADAPTIVE TRAINING OF DEEP NEURAL NETWORKS
Miao, Yajie
Jiang, Lu
Zhang, Hao
Metze, Florian
[J]. 2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 165 - 170
[4] Investigation of Bottleneck Features and Multilingual Deep Neural Networks for Speaker Verification
Tian, Yao
Cai, Meng
He, Liang
Liu, Jia
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1151 - 1155
[5] Ensemble Speaker Modeling using Speaker Adaptive Training Deep Neural Network for Speaker Adaptation
Li, Sheng
Lu, Xugang
Akita, Yuya
Kawahara, Tatsuya
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2892 - 2896
[6] Embedding-Based Speaker Adaptive Training of Deep Neural Networks
Cui, Xiaodong
Goel, Vaibhava
Saon, George
[J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 122 - 126
[7] SPEAKER ADAPTIVE TRAINING FOR DEEP NEURAL NETWORKS EMBEDDING LINEAR TRANSFORMATION NETWORKS
Ochiai, Tsubasa
Matsuda, Shigeki
Watanabe, Hideyuki
Lu, Xugang
Hori, Chiori
Katagiri, Shigeru
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4605 - 4609
[8] On Speaker Adaptive Training of Artificial Neural Networks
Trmal, Jan
Zelinka, Jan
Mueller, Ludek
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 554 - 557
[9] SPEAKER ADAPTIVE JOINT TRAINING OF GAUSSIAN MIXTURE MODELS AND BOTTLENECK FEATURES
Tueske, Zoltan
Golik, Pavel
Schlueter, Ralf
Ney, Hermann
[J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 596 - 603
[10] Speaker-dependent Multipitch Tracking Using Deep Neural Networks
Liu, Yuzhou
Wang, DeLiang
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3279 - 3283
