AN INVESTIGATION INTO LEARNING EFFECTIVE SPEAKER SUBSPACES FOR ROBUST UNSUPERVISED DNN ADAPTATION

Cited by: 0
Authors:
Samarakoon, Lahiru [1 ,2 ]
Sim, Khe Chai [3 ]
Mak, Brian [2 ]
Affiliations:
[1] Natl Univ Singapore, Singapore, Singapore
[2] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
[3] Google Inc, Mountain View, CA USA
Keywords:
Automatic Speech Recognition; DNN Adaptation; Subspace Methods; Neural Network; Transformations
DOI: Not available
CLC Number: O42 [Acoustics]
Subject Classification Codes: 070206; 082403
Abstract:
Subspace methods are used for deep neural network (DNN)-based acoustic model adaptation. These methods first construct a subspace and then perform speaker adaptation as a point in that subspace. This paper investigates the effectiveness of subspace methods for robust unsupervised adaptation. For the analysis, we compare two state-of-the-art subspace methods: singular value decomposition (SVD)-based bottleneck adaptation and factorized hidden layer (FHL) adaptation. Both methods perform speaker adaptation as a linear combination of rank-1 bases. The main difference in subspace construction is that FHL adaptation builds a speaker subspace separate from the phoneme classification space, whereas SVD-based bottleneck adaptation shares the same subspace for both phoneme classification and speaker adaptation. So far, no direct comparison between these two methods has been reported. In this work, we compare their robustness to unsupervised adaptation on the Aurora 4, AMI IHM, and AMI SDM tasks. Our findings show that FHL adaptation outperforms SVD-based bottleneck adaptation, especially in challenging conditions where the adaptation data is limited or the quality of the adaptation alignments is low.
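The two subspace constructions contrasted in the abstract can be made concrete with a short NumPy sketch. This is an illustrative assumption of the structure only (toy dimensions, hypothetical variable names, and a simplified per-speaker diagonal transform), not the authors' implementation: FHL adds a separately trained speaker subspace of rank-1 bases on top of the speaker-independent weights, while SVD-based bottleneck adaptation re-weights the shared low-rank factorization of those weights.

    # Sketch of speaker adaptation as a linear combination of rank-1 bases.
    # All names and dimensions are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, k = 64, 64, 8              # layer dims and subspace rank (assumed)

    # Speaker-independent weight matrix of one hidden layer.
    W = rng.standard_normal((d_out, d_in)) * 0.1

    # FHL-style adaptation: a separate speaker subspace spanned by k rank-1
    # bases u_i v_i^T; the speaker is represented by a k-dim vector d_s.
    U = rng.standard_normal((d_out, k)) * 0.01   # left bases (trained with the model)
    V = rng.standard_normal((d_in, k)) * 0.01    # right bases (trained with the model)
    d_s = rng.standard_normal(k)                 # per-speaker weights (estimated at test time)
    W_fhl = W + U @ np.diag(d_s) @ V.T           # W + sum_i d_s[i] * u_i v_i^T

    # SVD-bottleneck-style adaptation: factorize W itself, keep the top-k
    # components, and insert a per-speaker scaling in that shared bottleneck.
    U_w, s, Vt_w = np.linalg.svd(W, full_matrices=False)
    U_k, s_k, Vt_k = U_w[:, :k], s[:k], Vt_w[:k, :]
    a_s = np.ones(k)                             # per-speaker scales (estimated at test time)
    W_svd = U_k @ np.diag(a_s * s_k) @ Vt_k      # adapts the shared phoneme/speaker subspace

    h = np.tanh(W_fhl @ rng.standard_normal(d_in))   # adapted layer output

In both cases only the small per-speaker vector (d_s or a_s here) is estimated from adaptation data, which is what keeps these methods viable when that data is scarce, as the abstract's robustness findings discuss.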
Pages: 5035-5039
Page count: 5