SPEAKER ADAPTATION OF DEEP NEURAL NETWORKS USING A HIERARCHY OF OUTPUT LAYERS

Times Cited: 0
Authors
Price, Ryan [1 ]
Iso, Ken-ichi [2 ]
Shinoda, Koichi [1 ]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Yahoo Japan Corp, Tokyo, Japan
Keywords
Deep Neural Networks (DNN); Speaker Adaptation; Hierarchy
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) used for acoustic modeling in speech recognition often have a very large number of output units corresponding to context-dependent (CD) triphone HMM states. The amount of data available for speaker adaptation is often limited, so a large majority of these CD states may not be observed during adaptation. In this case, the posterior probabilities of unseen CD states are only pushed towards zero during DNN speaker adaptation, and the ability to predict these states can be degraded relative to the speaker-independent network. We address this problem by appending an additional output layer which maps the original set of DNN output classes to a smaller set of phonetic classes (e.g. monophones), thereby reducing the occurrence of unseen states in the adaptation data. Adaptation proceeds by backpropagation of errors from the new output layer, which is disregarded at recognition time, when posterior probabilities over the original set of CD states are used. We demonstrate the benefits of this approach over adapting the network with the original set of CD states using experiments on a Japanese voice search task, and obtain a 5.03% relative reduction in character error rate with approximately 60 seconds of adaptation data.
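As a rough illustration of the adaptation scheme described in the abstract, the sketch below appends a small output layer that maps CD-state scores to monophone classes and backpropagates a monophone cross-entropy loss through it during adaptation, then ignores that layer at recognition time. This is not the authors' implementation: the framework (PyTorch), the layer sizes, whether the CD-to-monophone mapping is learned or fixed from a state-to-phone table, and names such as AcousticDNN, cd_out, and mono_out are assumptions made for the example.

import torch
import torch.nn as nn

class AcousticDNN(nn.Module):
    """Feed-forward acoustic model with an extra monophone output layer
    used only during speaker adaptation (illustrative sizes)."""
    def __init__(self, feat_dim=440, hidden_dim=1024,
                 num_cd_states=3000, num_mono=43):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.cd_out = nn.Linear(hidden_dim, num_cd_states)  # original CD-state layer
        # Appended layer: maps CD-state posteriors to a small set of
        # phonetic classes (e.g. monophones); used for adaptation only.
        self.mono_out = nn.Linear(num_cd_states, num_mono)

    def forward(self, feats, use_hierarchy=False):
        cd_logits = self.cd_out(self.hidden(feats))
        if use_hierarchy:  # adaptation path: errors flow back from the monophone layer
            return self.mono_out(torch.softmax(cd_logits, dim=-1))
        return cd_logits   # recognition path: CD-state scores, mono_out is ignored

def adapt_to_speaker(model, adaptation_batches, lr=1e-3, epochs=3):
    """Adapt on a small amount of speaker data using monophone targets."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for feats, mono_targets in adaptation_batches:
            optimizer.zero_grad()
            mono_logits = model(feats, use_hierarchy=True)
            loss_fn(mono_logits, mono_targets).backward()
            optimizer.step()
    # At recognition time, call model(feats) to obtain CD-state scores;
    # the monophone layer is simply never evaluated.

In this sketch every monophone sees training signal even when most CD states are absent from the adaptation data, which is the effect the hierarchical output layer is intended to achieve.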
Pages: 153-158
Page count: 6