UNSUPERVISED SPEAKER ADAPTATION OF DEEP NEURAL NETWORK BASED ON THE COMBINATION OF SPEAKER CODES AND SINGULAR VALUE DECOMPOSITION FOR SPEECH RECOGNITION

被引:0
|
作者
Xue, Shaofei [1 ]
Jiang, Hui [2 ]
Dai, Lirong [1 ]
Liu, Qingfeng [1 ]
机构
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON, Canada
关键词
Deep Neural Network (DNN); Speaker Code; Speaker Adaptation; singular value decomposition (SVD); TRANSFORMATIONS;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Recently, we have proposed a general adaptation scheme for deep neural network based on discriminant condition codes and applied it to supervised speaker adaptation in speech recognition based on either frame-level cross-entropy or sequence-level maximum mutual information training criterion [1, 2, 3, 4]. In this case, each condition code is associated with one speaker in data, which is thus called speaker code for convenience. Our previous work has shown that speaker code based methods are quite effective in adapting DNNs even when only a very small amount of adaptation data is available. However, we have to use a large speaker code size and complex processes to obtain the best ASR performance since good initializations of speaker codes and connection weights are very important. In this paper, we propose a method using singular value decomposition (SVD) as in [5] to initialize speaker codes and connection weights to obtain a comparable ASR performance as before but with a smaller speaker code size and much less computation complexity. Meanwhile, we have evaluated unsupervised speaker adaptation with the proposed method in large vocabulary speech recognition in the Switchboard task. Experimental results have shown that it is effective for providing well initializations and suitable in adapting large DNN models.
引用
收藏
页码:4555 / 4559
页数:5
相关论文
共 50 条
  • [1] SINGULAR VALUE DECOMPOSITION BASED LOW-FOOTPRINT SPEAKER ADAPTATION AND PERSONALIZATION FOR DEEP NEURAL NETWORK
    Xue, Jian
    Li, Jinyu
    Yu, Dong
    Seltzer, Mike
    Gong, Yifan
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [2] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 1 - +
  • [3] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Shaofei Xue
    Hui Jiang
    Lirong Dai
    Qingfeng Liu
    [J]. Journal of Signal Processing Systems, 2016, 82 : 175 - 185
  • [4] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    Liu, Qingfeng
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 175 - 185
  • [5] Deep Neural Network-Based Speech Recognition with Combination of Speaker-Class Models
    Kosaka, Tetsuo
    Konno, Kazuki
    Kato, Masaharu
    [J]. 2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 1203 - 1206
  • [6] Two-Step Unsupervised Speaker Adaptation Based on Speaker and Gender Recognition and HMM Combination
    Cerva, Petr
    Nouza, Jan
    Silovsky, Jan
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2326 - 2329
  • [7] Discriminative Learning of Filterbank Layer within Deep Neural Network Based Speech Recognition for Speaker Adaptation
    Seki, Hiroshi
    Yamamoto, Kazumasa
    Akiba, Tomoyosi
    Nakagawa, Seiichi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (02) : 364 - 374
  • [8] N-Best-based unsupervised speaker adaptation for speech recognition
    Matsui, T
    Furui, S
    [J]. COMPUTER SPEECH AND LANGUAGE, 1998, 12 (01): : 41 - 50
  • [9] Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition
    Kosaka, Tetsuo
    Takeda, Yuui
    Ito, Takashi
    Kato, Masaharu
    Kohda, Masaki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2363 - 2369
  • [10] RAPID SPEAKER ADAPTATION OF NEURAL NETWORK BASED FILTERBANK LAYER FOR AUTOMATIC SPEECH RECOGNITION
    Seki, Hiroshi
    Yamamoto, Kazumasa
    Akiba, Tomoyosi
    Nakagawa, Seiichi
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 574 - 580