RAPID SPEAKER ADAPTATION WITH SPEAKER ADAPTIVE TRAINING AND NON-NEGATIVE MATRIX FACTORIZATION

Cited by: 0
Authors
Zhang, Xueru [1 ]
Demuynck, Kris [1 ]
Van Hamme, Hugo [1 ]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT, B-3001 Louvain, Belgium
Keywords
Speaker adaptation; non-negative matrix factorization; speaker adaptive training; maximum likelihood linear regression; weight adaptation
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we describe a novel speaker adaptation algorithm based on Gaussian mixture weight adaptation. A small number of latent speaker vectors are estimated with non-negative matrix factorization (NMF). These base vectors encode the correlations between Gaussian activations as learned from the training data. Expressing the speaker-dependent Gaussian mixture weights as a linear combination of a small number of base vectors reduces the number of parameters that must be estimated from the enrollment data. In order to learn meaningful correlations between Gaussian activations from the training data, the NMF-based weight adaptation was combined with vocal tract length normalization (VTLN) and feature-space maximum likelihood linear regression (fMLLR) based speaker adaptive training. Evaluation on the 5k closed and 20k open vocabulary Wall Street Journal tasks shows a 4% relative word error rate reduction over the speaker-independent recognition system, which already incorporates VTLN. The proposed fast adaptation algorithm, using only a single enrollment sentence, achieves performance similar to fMLLR adaptation on 40 enrollment sentences.
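The core idea of the abstract, writing a speaker's mixture weights as a non-negative combination of a few base vectors, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single flat weight vector per speaker (a real recognizer has one weight set per HMM state), uses synthetic data, and uses the standard multiplicative updates for KL-divergence NMF. The function names `nmf_kl` and `adapt_weights` and all sizes are hypothetical.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize non-negative V (gaussians x speakers) as W @ H using
    multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    # Normalize each base vector to sum to 1 and move the scale into H.
    scale = W.sum(axis=0)
    W /= scale
    H *= scale[:, None]
    return W, H

def adapt_weights(W, counts, n_iter=100, eps=1e-12):
    """Estimate a new speaker's weights as W @ h from Gaussian occupancy
    counts, keeping the base vectors W fixed (only `rank` free parameters)."""
    rank = W.shape[1]
    h = np.full(rank, counts.sum() / rank)
    for _ in range(n_iter):
        h *= (W.T @ (counts / (W @ h + eps))) / (W.sum(axis=0) + eps)
    w = W @ h
    return w / w.sum()

# Synthetic setup (assumption): 50 Gaussians, 3 latent bases, 40 training speakers.
rng = np.random.default_rng(1)
n_gauss, rank, n_speakers = 50, 3, 40
W_true = rng.dirichlet(np.ones(n_gauss), size=rank).T      # (50, 3)
H_true = rng.dirichlet(np.ones(rank), size=n_speakers).T   # (3, 40)
V = W_true @ H_true                                        # training weights

W, H = nmf_kl(V, rank)

# A new speaker with very little enrollment data: 200 observed frames.
w_new_true = W_true @ rng.dirichlet(np.ones(rank))
counts = rng.multinomial(200, w_new_true).astype(float)
w_adapted = adapt_weights(W, counts)
```

With only `rank` coefficients to estimate instead of one weight per Gaussian, the adapted vector stays a valid distribution even when many Gaussians receive zero counts in the enrollment data.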
Pages: 4456-4459
Number of pages: 4
Related papers
50 items in total
  • [1] Rapid speaker adaptation in latent speaker space with non-negative matrix factorization
    Zhang, Xueru
    Demuynck, Kris
    Van Hamme, Hugo
    [J]. SPEECH COMMUNICATION, 2013, 55 (09) : 893 - 908
  • [2] Fast speaker adaptation using non-negative matrix factorization
    Duchateau, Jacques
    Leroy, Tobias
    Demuynck, Kris
    Van Hamme, Hugo
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4269 - 4272
  • [3] Speaker Clustering Based on Non-negative Matrix Factorization
    Nishida, Masafumi
    Yamamoto, Seiichi
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 956 - 959
  • [4] Speaker conversion using kernel non-negative matrix factorization
    Xu Qinyu
    Lu Guanming
    Yan Jingjie
    Li Haibo
    Cheng Xiao
    [J]. The Journal of China Universities of Posts and Telecommunications, 2017, (05) : 60 - 67
  • [5] Speaker conversion using kernel non-negative matrix factorization
    Xu Qinyu
    Lu Guanming
    Yan Jingjie
    Li Haibo
    Cheng Xiao
    [J]. The Journal of China Universities of Posts and Telecommunications, 2017, 24 (05) : 60 - 67
  • [7] Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation
    Grais, Emad M.
    Erdogan, Hakan
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 576 - 579
  • [8] Exploiting Non-negative Matrix Factorization with Linear Constraints in Noise-Robust Speaker Identification
    Lyubimov, Nikolay
    Nastasenko, Marina
    Kotov, Mikhail
    Doroshin, Danila
    [J]. SPEECH AND COMPUTER, 2014, 8773 : 200 - 208
  • [9] Joint speaker separation and recognition using non-negative matrix deconvolution with adaptive dictionary
    Drgas, Szymon
    Virtanen, Tuomas
    [J]. COMPUTER SPEECH AND LANGUAGE, 2021, 70
  • [10] Speaker Clustering Based on Non-Negative Matrix Factorization Using Gaussian Mixture Model in Complementary Subspace
    Nishida, Masafumi
    Yamamoto, Seiichi
    [J]. PROCEEDINGS OF THE 15TH INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2017,