ON COMBINING DNN AND GMM WITH UNSUPERVISED SPEAKER ADAPTATION FOR ROBUST AUTOMATIC SPEECH RECOGNITION

Cited by: 0
Authors
Liu, Shilin [1]
Sim, Khe Chai [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 117548, Singapore
Keywords
Gaussian Mixture Model; Deep Neural Network; Speaker Adaptation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, context-dependent Deep Neural Networks (CD-DNNs) have been found to significantly outperform Gaussian Mixture Models (GMMs) on various large vocabulary continuous speech recognition tasks. Unlike GMM parameters, DNN parameters have no meaningful interpretation, which makes it difficult to devise effective adaptation methods for DNNs. Furthermore, DNN parameter estimation is based on discriminative criteria, which are more sensitive to label errors and therefore less reliable for unsupervised adaptation. Many effective adaptation techniques that have been developed and proven to work well for GMM/HMM systems cannot be easily applied to DNNs. Therefore, this paper proposes a novel method of combining DNN and GMM using the Temporally Varying Weight Regression framework to take advantage of the superior performance of DNNs and the robust adaptability of GMMs. This paper addresses the issue of incorporating the high-dimensional CD-DNN posteriors into this framework without dramatically increasing the system complexity. Experimental results on a broadcast news large vocabulary transcription task show that the proposed GMM+DNN/HMM system achieved a significant performance gain over the baseline DNN/HMM system. With additional unsupervised speaker adaptation, the best GMM+DNN/HMM system obtained a relative improvement of about 20% over the DNN/HMM baseline.
Pages: 5
Related papers
50 in total
  • [1] Unsupervised speaker adaptation for robust speech recognition in real environments
    Yamade, S
    Baba, A
    Yoshikawa, S
    Lee, A
    Saruwatari, H
    Shikano, K
    [J]. ELECTRONICS AND COMMUNICATIONS IN JAPAN PART II-ELECTRONICS, 2005, 88 (08): 30-41
  • [2] A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation
    Tomashenko, Natalia
    Khokhlov, Yuri
    Esteve, Yannick
    [J]. STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2016, 2016, 9918: 120-132
  • [3] AN INVESTIGATION INTO LEARNING EFFECTIVE SPEAKER SUBSPACES FOR ROBUST UNSUPERVISED DNN ADAPTATION
    Samarakoon, Lahiru
    Sim, Khe Chai
    Mak, Brian
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 5035-5039
  • [4] Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition
    Novotny, Ondrej
    Plchot, Oldrich
    Glembek, Ondrej
    Cernocky, Jan "Honza"
    Burget, Lukas
    [J]. COMPUTER SPEECH AND LANGUAGE, 2019, 58: 403-421
  • [5] Combining MMSE enhancement with LA model adaptation for robust automatic speech recognition
    Ding, P
    Cao, ZG
    [J]. ELECTRONICS LETTERS, 2001, 37 (08): 539-540
  • [6] Unsupervised Speaker Adaptation for DNN-based Speech Synthesis using Input Codes
    Takaki, Shinji
    Nishimura, Yoshikazu
    Yamagishi, Junichi
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018: 649-658
  • [7] Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition
    Kosaka, Tetsuo
    Takeda, Yuui
    Ito, Takashi
    Kato, Masaharu
    Kohda, Masaki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): 2363-2369
  • [8] ROBUST SPEAKER RECOGNITION BASED ON DNN/I-VECTORS AND SPEECH SEPARATION
    Chang, Jorge
    Wang, DeLiang
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 5415-5419
  • [9] Speaker Recognition and Speech Emotion Recognition Based on GMM
    Xu, Shupeng
    Liu, Yan
    Liu, Xiping
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON ELECTRIC AND ELECTRONICS, 2013: 434-436
  • [10] Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition
    Kang, Byung Ok
    Kwon, Oh-Wook
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (03): 724-730