SPEAKER ADAPTATION OF DEEP NEURAL NETWORKS USING A HIERARCHY OF OUTPUT LAYERS

Times Cited: 0
Authors
Price, Ryan [1 ]
Iso, Ken-ichi [2 ]
Shinoda, Koichi [1 ]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Yahoo Japan Corp, Tokyo, Japan
Keywords
Deep Neural Networks (DNN); Speaker Adaptation; Hierarchy
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) used for acoustic modeling in speech recognition often have a very large number of output units corresponding to context-dependent (CD) triphone HMM states. The amount of data available for speaker adaptation is often limited, so a large majority of these CD states may not be observed during adaptation. In this case, the posterior probabilities of unseen CD states are only pushed towards zero during DNN speaker adaptation, and the ability to predict these states can be degraded relative to the speaker-independent network. We address this problem by appending an additional output layer which maps the original set of DNN output classes to a smaller set of phonetic classes (e.g. monophones), thereby reducing the occurrence of unseen states in the adaptation data. Adaptation proceeds by backpropagation of errors from the new output layer, which is disregarded at recognition time, when posterior probabilities over the original set of CD states are used. We demonstrate the benefits of this approach over adapting the network with the original set of CD states using experiments on a Japanese voice search task, and obtain a 5.03% relative reduction in character error rate with approximately 60 seconds of adaptation data.
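As a rough illustration of the adaptation scheme described in the abstract, the sketch below appends a small output layer that maps CD-state scores to monophone classes and backpropagates a monophone cross-entropy loss through it during adaptation, then ignores that layer at recognition time. This is not the authors' implementation: the framework (PyTorch), the layer sizes, whether the CD-to-monophone mapping is learned or fixed from a state-to-phone table, and names such as AcousticDNN, cd_out, and mono_out are assumptions made for the example.

import torch
import torch.nn as nn

class AcousticDNN(nn.Module):
    """Feed-forward acoustic model with an extra monophone output layer
    used only during speaker adaptation (illustrative sizes)."""
    def __init__(self, feat_dim=440, hidden_dim=1024,
                 num_cd_states=3000, num_mono=43):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.cd_out = nn.Linear(hidden_dim, num_cd_states)  # original CD-state layer
        # Appended layer: maps CD-state posteriors to a small set of
        # phonetic classes (e.g. monophones); used for adaptation only.
        self.mono_out = nn.Linear(num_cd_states, num_mono)

    def forward(self, feats, use_hierarchy=False):
        cd_logits = self.cd_out(self.hidden(feats))
        if use_hierarchy:  # adaptation path: errors flow back from the monophone layer
            return self.mono_out(torch.softmax(cd_logits, dim=-1))
        return cd_logits   # recognition path: CD-state scores, mono_out is ignored

def adapt_to_speaker(model, adaptation_batches, lr=1e-3, epochs=3):
    """Adapt on a small amount of speaker data using monophone targets."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for feats, mono_targets in adaptation_batches:
            optimizer.zero_grad()
            mono_logits = model(feats, use_hierarchy=True)
            loss_fn(mono_logits, mono_targets).backward()
            optimizer.step()
    # At recognition time, call model(feats) to obtain CD-state scores;
    # the monophone layer is simply never evaluated.

In this sketch every monophone sees training signal even when most CD states are absent from the adaptation data, which is the effect the hierarchical output layer is intended to achieve.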
Pages: 153-158
Page count: 6