AN INVESTIGATION INTO LEARNING EFFECTIVE SPEAKER SUBSPACES FOR ROBUST UNSUPERVISED DNN ADAPTATION

被引:0
|
作者
Samarakoon, Lahiru [1 ,2 ]
Sim, Khe Chai [3 ]
Mak, Brian [2 ]
机构
[1] Natl Univ Singapore, Singapore, Singapore
[2] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
[3] Google Inc, Mountain View, CA USA
关键词
Automatic Speech Recognition; DNN Adaptation; Subspace Methods; NEURAL-NETWORK; TRANSFORMATIONS;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Subspace methods are used for deep neural network (DNN)based acoustic model adaptation. These methods first construct a subspace and then perform the speaker adaptation as a point in the subspace. This paper aims to investigate the effectiveness of subspace methods for robust unsupervised adaptation. For the analysis, we compare two state-of-the-art subspace methods, namely, the singular value decomposition (SVD)-based bottleneck adaptation and the factorized hidden layer (FHL) adaptation. Both of these methods perform speaker adaptation as a linear combination of rank-1 bases. The main difference between the subspace construction is that FHL adaptation constructs a speaker subspace separate from the phoneme classification space while SVD-based bottleneck adaptation shares the same subspace for both the phoneme classification and the speaker adaptation. So far, no direct comparisons between these two methods are reported. In this work, we compare these two methods for their robustness to unsupervised adaptation on Aurora 4, AMI IHM and AMI SDM tasks. Our findings show that the FHL adaptation outperforms the SVD-based bottleneck adaptation especially in challenging conditions where the adaptation data is limited, or the quality of the adaptation alignments are low.
引用
收藏
页码:5035 / 5039
页数:5
相关论文
共 50 条
  • [31] Iterative unsupervised speaker adaptation for batch dictation
    Homma, S
    Takahashi, J
    Sagayama, S
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1141 - 1144
  • [32] MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4475 - 4479
  • [33] Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion
    Sivaraman, Ganesh
    Mitra, Vikramjit
    Nam, Hosung
    Tiede, Mark
    Espy-Wilson, Carol
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 146 (01): : 316 - 329
  • [34] A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation
    Tomashenko, Natalia
    Khokhlov, Yuri
    Esteve, Yannick
    [J]. STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2016, 2016, 9918 : 120 - 132
  • [35] A study of speaker adaptation for DNN-based speech synthesis
    Wu, Zhizheng
    Swietojanski, Pawel
    Veaux, Christophe
    Renals, Steve
    King, Simon
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 879 - 883
  • [36] A DNN-based emotional speech synthesis by speaker adaptation
    Yang, Hongwu
    Zhang, Weizhao
    Zhi, Pengpeng
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 633 - 637
  • [37] Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model
    Wang, Ke
    Zhang, Junbo
    Wang, Yujun
    Xie, Lei
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2429 - 2433
  • [38] A robust unsupervised speaker clustering of speech utterances
    Zhang, SL
    Zhang, SW
    Xu, B
    [J]. PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (IEEE NLP-KE'05), 2005, : 115 - 120
  • [39] Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition
    Novotny, Ondrej
    Plchot, Oldrich
    Glembek, Ondrej
    Cernocky, Jan ''Honza''
    Burget, Lukas
    [J]. COMPUTER SPEECH AND LANGUAGE, 2019, 58 : 403 - 421
  • [40] LEARNING HIDDEN UNIT CONTRIBUTIONS FOR UNSUPERVISED SPEAKER ADAPTATION OF NEURAL NETWORK ACOUSTIC MODELS
    Swietojanski, Pawel
    Renals, Steve
    [J]. 2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 171 - 176