SPEAKER ADAPTATION OF DEEP NEURAL NETWORKS USING A HIERARCHY OF OUTPUT LAYERS

Cited by: 0
Authors
Price, Ryan [1]
Iso, Ken-ichi [2]
Shinoda, Koichi [1]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Yahoo Japan Corp, Tokyo, Japan
Keywords
Deep Neural Networks (DNN); Speaker Adaptation; Hierarchy
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) used for acoustic modeling in speech recognition often have a very large number of output units corresponding to context-dependent (CD) triphone HMM states. The amount of data available for speaker adaptation is often limited, so a large majority of these CD states may not be observed during adaptation. In this case, the posterior probabilities of unseen CD states are only pushed towards zero during DNN speaker adaptation, and the ability to predict these states can be degraded relative to the speaker-independent network. We address this problem by appending an additional output layer that maps the original set of DNN output classes to a smaller set of phonetic classes (e.g., monophones), thereby reducing the occurrences of unseen states in the adaptation data. Adaptation proceeds by backpropagation of errors from the new output layer, which is disregarded at recognition time, when posterior probabilities over the original set of CD states are used. We demonstrate the benefits of this approach over adapting the network with the original set of CD states using experiments on a Japanese voice search task and obtain a 5.03% relative reduction in character error rate with approximately 60 seconds of adaptation data.
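The idea in the abstract can be illustrated with a toy sketch. The abstract does not say how the appended layer is parameterized, so the code below assumes, purely for illustration, a fixed many-to-one mapping in which each monophone posterior is the sum of the posteriors of its CD states. The names `CD_TO_MONO`, `monophone_posteriors`, and `grad_wrt_cd_logits` are hypothetical and do not come from the paper. Under this assumption, the gradient of the monophone cross-entropy raises every CD state belonging to the target monophone, including CD states never seen in the adaptation data, rather than pushing them towards zero.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def monophone_posteriors(cd_post, cd_to_mono, n_mono):
    """Assumed mapping layer: sum CD-state posteriors within each monophone."""
    mono = [0.0] * n_mono
    for p, m in zip(cd_post, cd_to_mono):
        mono[m] += p
    return mono


def grad_wrt_cd_logits(logits, cd_to_mono, target_mono):
    """Gradient of -log P(target monophone) with respect to the CD logits.

    For CD state k with posterior p_k and target-class mass P:
      k inside the target monophone:  p_k - p_k / P   (negative, logit rises)
      k outside the target monophone: p_k             (positive, logit falls)
    """
    cd_post = softmax(logits)
    n_mono = max(cd_to_mono) + 1
    p_target = monophone_posteriors(cd_post, cd_to_mono, n_mono)[target_mono]
    return [
        p - p / p_target if m == target_mono else p
        for p, m in zip(cd_post, cd_to_mono)
    ]


# Toy setup (hypothetical): 4 CD states, the first two tied to monophone 0.
CD_TO_MONO = [0, 0, 1, 1]
LOGITS = [2.0, 0.5, 1.0, -1.0]

CD_POST = softmax(LOGITS)
MONO_POST = monophone_posteriors(CD_POST, CD_TO_MONO, 2)
GRAD = grad_wrt_cd_logits(LOGITS, CD_TO_MONO, target_mono=0)
```

At recognition time only `softmax(LOGITS)` over the CD states would be used; the monophone layer serves only to compute the adaptation loss, matching the abstract's statement that the new output layer is disregarded during recognition.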
Pages: 153-158
Page count: 6