SPEAKER ADAPTIVE TRAINING IN DEEP NEURAL NETWORKS USING SPEAKER DEPENDENT BOTTLENECK FEATURES

Cited by: 0
Authors
Doddipatla, Rama [1]
Affiliations
[1] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge, England
Keywords
Speaker adaptive training; speaker normalisation; deep neural networks; speaker dependent bottleneck features; automatic speech recognition; ADAPTATION
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The paper proposes an approach to speaker adaptive training (SAT) in deep neural networks (DNNs) using a two-stage DNN. The first-stage DNN extracts speaker dependent bottleneck (SDBN) features by updating the weights of the bottleneck (BN) layer with speaker specific data. Using the SDBN features, a second-stage DNN is trained in the SAT framework. Choosing the BN layer as the speaker dependent layer, instead of one of the hidden layers, reduces the number of parameters to be tuned using speaker specific data. Experiments are presented on the Aurora4 task, where the input features are normalised with constrained maximum likelihood linear regression (CMLLR) and speaker information is appended in the form of D-vectors. Following unsupervised adaptation of the BN layer, the proposed approach provides relative word error rate (WER) gains of 8.6% and 8.9% on top of DNNs trained with FBANK features appended with and without D-vectors, respectively. A relative WER gain of 10.3% is observed when the approach is applied on top of DNNs trained with CMLLR transformed FBANK features, but the gain saturates when combined with D-vectors. Supervised adaptation with as little as one minute of audio from a specific speaker improves performance over the baseline.
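The abstract's claim that adapting the BN layer leaves fewer parameters to tune than adapting a regular hidden layer can be illustrated with a simple parameter count. The sketch below is only an illustration: the layer widths (`HIDDEN`, `BOTTLENECK`) and the helper `affine_params` are assumptions made for this example, since the abstract does not state the network topology.

```python
# Hypothetical layer sizes; the abstract does not give the actual topology.
HIDDEN = 2048      # assumed width of a regular hidden layer
BOTTLENECK = 75    # assumed width of the bottleneck (BN) layer


def affine_params(n_in, n_out):
    """Number of weights plus biases in one fully connected layer."""
    return n_in * n_out + n_out


# Case 1: the speaker dependent layer is a regular hidden layer (HIDDEN -> HIDDEN).
hidden_sd = affine_params(HIDDEN, HIDDEN)

# Case 2: the speaker dependent layer is the BN layer (HIDDEN -> BOTTLENECK).
bn_sd = affine_params(HIDDEN, BOTTLENECK)

print("hidden layer as speaker dependent layer:", hidden_sd, "parameters")
print("BN layer as speaker dependent layer:    ", bn_sd, "parameters")
print("reduction factor:", round(hidden_sd / bn_sd, 1))
```

Under these assumed widths, the speaker dependent parameter set shrinks by roughly the ratio of the hidden width to the bottleneck width, which is why a narrow BN layer is a plausible choice when only a small amount of audio per speaker is available.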
Pages: 5290 - 5294
Number of pages: 5