SPEAKER ADAPTATION OF DEEP NEURAL NETWORKS USING A HIERARCHY OF OUTPUT LAYERS

Cited by: 0

Authors:

Price, Ryan [1]

Iso, Ken-ichi [2]

Shinoda, Koichi [1]

Affiliations:

[1] Tokyo Inst Technol, Tokyo, Japan

[2] Yahoo Japan Corp, Tokyo, Japan

Source:

2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 | 2014

Keywords:

Deep Neural Networks (DNN); Speaker Adaptation; Hierarchy;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep neural networks (DNN) used for acoustic modeling in speech recognition often have a very large number of output units corresponding to context-dependent (CD) triphone HMM states. The amount of data available for speaker adaptation is often limited, so a large majority of these CD states may not be observed during adaptation. In this case, the posterior probabilities of unseen CD states are only pushed towards zero during DNN speaker adaptation, and the ability to predict these states can be degraded relative to the speaker-independent network. We address this problem by appending an additional output layer that maps the original set of DNN output classes to a smaller set of phonetic classes (e.g., monophones), thereby reducing the occurrences of unseen states in the adaptation data. Adaptation proceeds by backpropagation of errors from the new output layer, which is disregarded at recognition time, when posterior probabilities over the original set of CD states are used. We demonstrate the benefits of this approach over adapting the network with the original set of CD states using experiments on a Japanese voice search task and obtain a 5.03% relative reduction in character error rate with approximately 60 seconds of adaptation data.
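
The effect described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the additional output layer is parameterized, so the sketch assumes the simplest case, a fixed many-to-one mapping that sums CD-state posteriors into their parent monophone. The names `cd_to_mono`, `grad_cd_target`, and `grad_mono_target`, and the toy sizes (6 CD states, 2 monophones), are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def monophone_posteriors(cd_post, cd_to_mono, num_mono):
    """Sum CD-state posteriors into their parent monophone classes (assumed mapping)."""
    mono_post = np.zeros(num_mono)
    np.add.at(mono_post, cd_to_mono, cd_post)
    return mono_post

def grad_cd_target(logits, cd_target):
    """Gradient of cross-entropy w.r.t. the logits when the target is a CD state."""
    grad = softmax(logits)
    grad[cd_target] -= 1.0
    return grad

def grad_mono_target(logits, cd_to_mono, num_mono, mono_target):
    """Gradient of -log q[mono_target] w.r.t. the CD-state logits,
    where q is the monophone posterior from the appended layer."""
    p = softmax(logits)
    q = monophone_posteriors(p, cd_to_mono, num_mono)
    in_target = (cd_to_mono == mono_target)
    return p - in_target * p / q[mono_target]

# Toy setup (assumption): CD states 0-2 belong to monophone 0, states 3-5 to monophone 1.
cd_to_mono = np.array([0, 0, 0, 1, 1, 1])
logits = np.array([1.0, 0.5, 0.2, 0.3, 0.1, -0.4])

# The adaptation frame is labeled CD state 0, i.e. monophone 0.
g_cd = grad_cd_target(logits, cd_target=0)
g_mono = grad_mono_target(logits, cd_to_mono, num_mono=2, mono_target=0)

# Gradient descent moves each logit against the sign of its gradient.
print("CD-state target gradient:  ", np.round(g_cd, 3))
print("Monophone target gradient: ", np.round(g_mono, 3))
```

With a CD-state target, every state other than the labeled one receives a positive gradient, so gradient descent lowers its logit; this is the "pushed towards zero" behavior for unseen states. With a monophone target under the assumed mapping, all CD states belonging to the target monophone receive a negative gradient, so unseen sibling states are raised together with the observed one. At recognition time the appended layer is dropped and the CD-state posteriors from `softmax(logits)` are used directly.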

Pages: 153-158

Page count: 6
