SPEAKER ADAPTIVE TRAINING FOR DEEP NEURAL NETWORKS EMBEDDING LINEAR TRANSFORMATION NETWORKS

Cited by: 0
Authors
Ochiai, Tsubasa [1, 2]
Matsuda, Shigeki [2]
Watanabe, Hideyuki [1]
Lu, Xugang [1]
Hori, Chiori [1]
Katagiri, Shigeru [2]
Affiliations
[1] Natl Inst Informat & Commun Technol, Kyoto, Japan
[2] Doshisha Univ, Grad Sch Engn, Kyoto, Japan
Keywords
Speaker Adaptive Training; Deep Neural Network; Linear Transformation Network
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Recently, a novel speaker adaptation method was proposed that applies the Speaker Adaptive Training (SAT) concept to a speech recognizer consisting of a Deep Neural Network (DNN) and a Hidden Markov Model (HMM), and its utility was demonstrated. This method implements the SAT scheme by allocating one Speaker Dependent (SD) module per training speaker to one of the intermediate layers of the front-end DNN. It then jointly optimizes the SD modules and the rest of the network, which is shared by all speakers. In this paper, we propose an improved version of this SAT-based adaptation scheme for a DNN-HMM recognizer. Our new training procedure adopts a Linear Transformation Network (LTN) as the SD module. Using an LTN leads to more appropriate regularization in both the SAT and adaptation stages, because the empirically selected network anchorage used for regularization in the preceding SAT-DNN-HMM is replaced with a SAT-optimized anchorage. We evaluate the effectiveness of the proposed method on TED Talks corpus data. Our experimental results show that a speaker-adapted recognizer using our method achieves a significant word error rate reduction of 9.2 points from a baseline speaker-independent (SI) DNN recognizer, and that it also consistently outperforms the speaker-adapted recognizers derived from the preceding SAT-based DNN-HMM.
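As a rough illustration of the structure the abstract describes, the following NumPy sketch inserts a speaker-dependent linear transformation after one hidden layer of an otherwise shared feed-forward network, and computes a penalty that pulls the speaker's LTN toward an anchor LTN. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sigmoid and softmax choices, the placement of the LTN after the nonlinearity, the identity initialization, and the squared-distance form of the penalty are all hypothetical and are not taken from the abstract.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_ltn(dim):
    """Hypothetical LTN initialised to the identity map (assumption)."""
    return {"W": np.eye(dim), "b": np.zeros(dim)}


def forward(x, shared, ltn, ltn_layer):
    """Shared DNN forward pass with one speaker's LTN applied to the
    output of hidden layer `ltn_layer` (0-based). The exact insertion
    point is an assumption made for this sketch."""
    h = x
    for i, (W, b) in enumerate(shared["hidden"]):
        h = sigmoid(h @ W + b)
        if i == ltn_layer:
            h = h @ ltn["W"] + ltn["b"]  # speaker-dependent linear map
    W_out, b_out = shared["output"]
    return softmax(h @ W_out + b_out)


def anchor_penalty(ltn, anchor, weight):
    """Squared distance between a speaker's LTN and an anchor LTN,
    scaled by `weight` (assumed form of the regularizer)."""
    dW = ltn["W"] - anchor["W"]
    db = ltn["b"] - anchor["b"]
    return 0.5 * weight * (np.sum(dW ** 2) + np.sum(db ** 2))
```

In this sketch, SAT would keep one `ltn` dictionary per training speaker and update it together with `shared`, while adaptation to a new speaker would update only that speaker's `ltn`, with `anchor_penalty` added to the training loss in both stages.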
Pages: 4605-4609
Number of pages: 5