SPEAKER ADAPTIVE TRAINING USING DEEP NEURAL NETWORKS

Cited by: 0
Authors
Ochiai, Tsubasa [1, 2]
Matsuda, Shigeki [1]
Lu, Xugang [1]
Hori, Chiori [1]
Katagiri, Shigeru [2]
Affiliations
[1] National Institute of Information and Communications Technology (NICT), Spoken Language Communication Laboratory, Kyoto, Japan
[2] Doshisha University, Graduate School of Engineering, Kyoto, Japan
Keywords
Speaker Adaptive Training; Deep Neural Network; Adaptation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Among the many speaker adaptation techniques, Speaker Adaptive Training (SAT) has been successfully applied to the standard Hidden Markov Model (HMM) speech recognizer, whose states are modeled by Gaussian Mixture Models (GMMs). Meanwhile, recent studies on Speaker-Independent (SI) recognizer development have reported that a new type of HMM speech recognizer, which replaces the GMMs with Deep Neural Networks (DNNs), outperforms GMM-HMM recognizers. Combining these two lines of work, it is natural to expect further improvement of an existing DNN-HMM recognizer by employing SAT. In this paper, we propose a novel training scheme that applies SAT to an SI DNN-HMM recognizer. We implement the scheme by allocating a Speaker-Dependent (SD) module to one of the intermediate layers of a seven-layer DNN, and evaluate its utility on TED Talks corpus data. Experimental results show that our speaker-adapted SAT-based DNN-HMM recognizer reduces the word error rate by 8.4% compared with a baseline SI DNN-HMM recognizer and, regardless of where the SD module is allocated, outperforms the conventional speaker adaptation scheme. The results also show that the inner layers of the DNN are more suitable for the SD module than the outer layers.
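The abstract describes SAT for a DNN-HMM as splitting the network into shared SI layers plus a per-speaker SD module placed at one intermediate layer. The pure-Python sketch below illustrates only that parameter layout and routing; it is not the authors' implementation. Every detail is a hypothetical assumption: "seven-layer" is read as seven weight layers, hidden units are sigmoids with a softmax output over HMM-state targets, the SD module is a full affine layer, and a new speaker's module is randomly initialised. Backpropagation is omitted, and the hypothetical `trainable` method only names which parameter groups each phase would update (joint SI + SD updates during SAT, SD-only updates during adaptation, as in SAT generally).

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    # Hypothetical initialisation: small Gaussian weights; last column is the bias.
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]


def affine(w: Matrix, x: List[float]) -> List[float]:
    # y_j = sum_i w[j][i] * x[i] + bias_j (zip stops before the bias column).
    return [sum(wi * xi for wi, xi in zip(row, x)) + row[-1] for row in w]


def sigmoid(v: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


class SATDNN:
    """Seven weight layers; layer `sd_index` is speaker dependent (assumed layout)."""

    def __init__(self, sizes: List[int], sd_index: int, speakers: List[str], seed: int = 0):
        assert len(sizes) == 8, "seven weight layers connect eight layer sizes"
        assert 0 < sd_index < 6, "SD module goes in an intermediate layer"
        self._rng = random.Random(seed)
        self.sizes = sizes
        self.sd_index = sd_index
        # Shared speaker-independent layers, keyed by layer index.
        self.si_layers: Dict[int, Matrix] = {
            k: init_layer(sizes[k], sizes[k + 1], self._rng)
            for k in range(7) if k != sd_index
        }
        # One SD module per training speaker, all at the same layer position.
        self.sd_modules: Dict[str, Matrix] = {}
        for spk in speakers:
            self.add_speaker(spk)

    def add_speaker(self, spk: str) -> None:
        # Allocate a fresh SD module, e.g. for a test speaker before adaptation.
        k = self.sd_index
        self.sd_modules[spk] = init_layer(self.sizes[k], self.sizes[k + 1], self._rng)

    def forward(self, x: List[float], spk: str) -> List[float]:
        h = x
        for k in range(7):
            # Route through the speaker's own module at the SD position.
            w = self.sd_modules[spk] if k == self.sd_index else self.si_layers[k]
            a = affine(w, h)
            h = softmax(a) if k == 6 else sigmoid(a)
        return h

    def trainable(self, phase: str, spk: str) -> List[str]:
        if phase == "sat":  # shared SI layers plus this speaker's SD module
            return [f"si{k}" for k in sorted(self.si_layers)] + [f"sd:{spk}"]
        if phase == "adapt":  # SI layers frozen; only the target SD module moves
            return [f"sd:{spk}"]
        raise ValueError(f"unknown phase: {phase}")


if __name__ == "__main__":
    net = SATDNN([39, 64, 64, 64, 64, 64, 64, 10], sd_index=3, speakers=["spk_a", "spk_b"])
    post = net.forward([0.0] * 39, "spk_a")
    print(len(post), round(sum(post), 6))
```

Changing `sd_index` moves the SD module between inner and outer positions, which is the allocation choice the abstract compares; the sketch itself says nothing about which position gives lower error rates.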
Pages: 5