SPEAKER ADAPTIVE TRAINING USING DEEP NEURAL NETWORKS

Cited by: 0
Authors
Ochiai, Tsubasa [1 ,2 ]
Matsuda, Shigeki [1 ]
Lu, Xugang [1 ]
Hori, Chiori [1 ]
Katagiri, Shigeru [2 ]
Affiliations
[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Kyoto, Japan
[2] Doshisha Univ, Grad Sch Engn, Kyoto, Japan
Keywords
Speaker Adaptive Training; Deep Neural Network; Adaptation
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Among the many speaker adaptation approaches, Speaker Adaptive Training (SAT) has been successfully applied to the standard Hidden Markov Model (HMM) speech recognizer, whose states are associated with Gaussian Mixture Models (GMMs). Meanwhile, recent studies on Speaker-Independent (SI) recognizer development have reported that a new type of HMM speech recognizer, which replaces GMMs with Deep Neural Networks (DNNs), outperforms conventional GMM-HMM recognizers. Along these two lines, it is natural to pursue further improvement of an existing DNN-HMM recognizer by employing SAT. In this paper, we propose a novel training scheme that applies SAT to an SI DNN-HMM recognizer. We implement the SAT scheme by allocating a Speaker-Dependent (SD) module to one of the intermediate layers of a seven-layer DNN, and evaluate its utility on TED Talks corpus data. Experimental results show that our speaker-adapted SAT-based DNN-HMM recognizer reduces the word error rate by 8.4% compared with a baseline SI DNN-HMM recognizer and, regardless of where the SD module is allocated, outperforms the conventional speaker adaptation scheme. The results also show that the inner layers of the DNN are more suitable for the SD module than the outer layers.
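The SD-module arrangement described in the abstract can be sketched as follows. This is a minimal illustration only, written in PyTorch-style Python; the layer sizes, activation functions, speaker identifiers, and names such as SATDNN and sd_modules are hypothetical assumptions for exposition, not the authors' implementation.

import torch
import torch.nn as nn

class SATDNN(nn.Module):
    """Seven-layer feedforward acoustic model in which one intermediate layer is a
    Speaker-Dependent (SD) module; all remaining layers are shared, Speaker-Independent (SI)."""
    def __init__(self, feat_dim=440, hidden_dim=2048, num_states=4000,
                 num_layers=7, sd_layer_index=3, speakers=()):
        super().__init__()
        # SI layers below the SD position (layer 0 maps input features to hidden_dim).
        self.lower = nn.Sequential(*[
            nn.Sequential(nn.Linear(feat_dim if i == 0 else hidden_dim, hidden_dim),
                          nn.Sigmoid())
            for i in range(sd_layer_index)])
        # SI layers above the SD position, before the output layer.
        self.upper = nn.Sequential(*[
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
            for _ in range(num_layers - sd_layer_index - 2)])
        self.output = nn.Linear(hidden_dim, num_states)  # HMM-state scores (pre-softmax)
        # One SD module per training speaker; every speaker shares the SI layers.
        self.sd_modules = nn.ModuleDict({
            spk: nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
            for spk in speakers})

    def forward(self, x, speaker_id):
        h = self.lower(x)
        h = self.sd_modules[speaker_id](h)  # speaker-specific transformation
        h = self.upper(h)
        return self.output(h)

# Hypothetical usage: two training speakers, a batch of 8 feature vectors.
model = SATDNN(speakers=("spk001", "spk002"))
logits = model(torch.randn(8, 440), speaker_id="spk001")

In this reading of the scheme, the shared SI layers and all per-speaker SD modules are trained jointly over the multi-speaker training data (the SAT step); for an unseen test speaker, a new SD module would then be estimated from that speaker's adaptation data while the shared layers are kept fixed.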
Pages: 5