SPEAKER ADAPTIVE TRAINING USING DEEP NEURAL NETWORKS

Times cited: 0
Authors
Ochiai, Tsubasa [1, 2]
Matsuda, Shigeki [1 ]
Lu, Xugang [1 ]
Hori, Chiori [1 ]
Katagiri, Shigeru [2 ]
Affiliations
[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Kyoto, Japan
[2] Doshisha Univ, Grad Sch Engn, Kyoto, Japan
Keywords
Speaker Adaptive Training; Deep Neural Network; Adaptation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Among the many approaches to speaker adaptation, Speaker Adaptive Training (SAT) has been successfully applied to the standard Hidden Markov Model (HMM) speech recognizer, whose states are associated with Gaussian Mixture Models (GMMs). Meanwhile, recent studies on Speaker-Independent (SI) recognizer development have reported that a new type of HMM speech recognizer, which replaces GMMs with Deep Neural Networks (DNNs), outperforms GMM-HMM recognizers. Combining these two lines of work, it is natural to seek further improvement of a DNN-HMM recognizer by employing SAT. In this paper, we propose a novel training scheme that applies SAT to an SI DNN-HMM recognizer. We implement the SAT scheme by allocating a Speaker-Dependent (SD) module to one of the intermediate layers of a seven-layer DNN, and evaluate its utility on TED Talks corpus data. Experimental results show that our speaker-adapted SAT-based DNN-HMM recognizer reduces the word error rate by 8.4% compared with a baseline SI DNN-HMM recognizer and, regardless of where the SD module is allocated, outperforms the conventional speaker adaptation scheme. The results also show that the inner layers of the DNN are more suitable for the SD module than the outer layers.
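The idea of placing an SD module at one intermediate layer can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the layer sizes, the sigmoid and softmax activations, the reading of "seven-layer" as seven weight layers, and the names `build_sat_dnn`, `forward`, `spk_a`, and `spk_b` are all assumptions made for illustration, and training is not shown.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    """Return a (weights, bias) pair with random weights and zero bias."""
    return rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out)

def build_sat_dnn(sizes, sd_index, speakers, seed=0):
    """Build shared layers plus one SD layer per speaker at position sd_index.

    Hypothetical helper: the shared layer stored at sd_index is kept only as
    a speaker-independent counterpart and is bypassed in forward().
    """
    rng = np.random.default_rng(seed)
    layers = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
    n_in, n_out = sizes[sd_index], sizes[sd_index + 1]
    sd_layers = {s: init_layer(rng, n_in, n_out) for s in speakers}
    return layers, sd_layers

def forward(x, layers, sd_layers, sd_index, speaker):
    """Compute HMM-state posteriors, swapping in the speaker's SD layer."""
    h = x
    last = len(layers) - 1
    for i, (w, b) in enumerate(layers):
        if i == sd_index:
            w, b = sd_layers[speaker]  # speaker-dependent parameters
        z = h @ w + b
        if i == last:
            # Softmax over output units (assumed to be HMM states).
            z = z - z.max(axis=-1, keepdims=True)
            e = np.exp(z)
            h = e / e.sum(axis=-1, keepdims=True)
        else:
            h = 1.0 / (1.0 + np.exp(-z))  # sigmoid hidden activation (assumed)
    return h

# Assumed sizes: 20-dim features, six hidden layers of 16 units, 10 states.
sizes = [20] + [16] * 6 + [10]  # seven weight layers in total
layers, sd_layers = build_sat_dnn(sizes, sd_index=3, speakers=["spk_a", "spk_b"])
x = np.ones((4, 20))  # four dummy feature frames
post_a = forward(x, layers, sd_layers, 3, "spk_a")
post_b = forward(x, layers, sd_layers, 3, "spk_b")
```

In this sketch the same input frames yield different posteriors for the two speakers because only the layer at `sd_index` differs, while all other layers are shared. Changing `sd_index` moves the SD module between inner and outer layers.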
Pages: 5