N-Best-based unsupervised speaker adaptation for speech recognition

Cited by: 15
Authors
Matsui, T
Furui, S
Affiliations
[1] Nippon Telegraph & Tel Corp, Human Interface Labs, Yokosuka, Kanagawa 239, Japan
[2] Tokyo Inst Technol, Meguro Ku, Tokyo 152, Japan
Source
COMPUTER SPEECH AND LANGUAGE | 1998, Vol. 12, No. 1
DOI
10.1006/csla.1997.0036
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an instantaneous speaker adaptation method that uses N-best decoding for continuous mixture-density hidden-Markov-model-based speech-recognition systems. The method is effective even for speakers whose decoding with speaker-independent (SI) models is error-prone and for whom speaker adaptation techniques are truly needed. In addition, smoothed estimation and utterance verification are introduced into the method. The smoothed estimation, based on the likelihood values of the adapted models for the word sequences obtained by N-best decoding, improves performance for error-prone speakers, while the utterance verification technique reduces the amount of computation required. Performance evaluation using connected-digit (four-digit string) recognition experiments over actual telephone lines showed a 36.4% reduction in the error rates of speakers whose decoding with SI models is error-prone. (C) 1998 Academic Press Limited.
Pages: 41 - 50
Page count: 10
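To make the adaptation scheme described in the abstract concrete, the sketch below outlines one way the N-best rescoring loop could look: decode N candidate word sequences with the SI model, adapt the model once per hypothesis, rescore each hypothesis with its adapted model, and output the best-scoring one. This is a minimal, illustrative sketch only: the callables decode_nbest, adapt and loglik are hypothetical placeholders, and the softmax-style weights are an assumed stand-in for the paper's likelihood-based smoothed estimation, not the authors' implementation.

    # Minimal illustrative sketch (not the authors' code). Assumed/hypothetical pieces:
    # the callables decode_nbest, adapt and loglik, and the softmax-style weights,
    # which stand in for the paper's likelihood-based smoothed estimation.
    import numpy as np

    def nbest_unsupervised_adaptation(features, si_model, decode_nbest, adapt, loglik, n=5):
        """Pick the N-best hypothesis whose speaker-adapted model scores best.

        features     : acoustic feature vectors for one utterance (array-like)
        si_model     : speaker-independent model (treated as an opaque object)
        decode_nbest : callable(features, model, n) -> list of candidate word sequences
        adapt        : callable(model, features, word_seq) -> speaker-adapted model
        loglik       : callable(features, model, word_seq) -> log-likelihood score
        """
        hypotheses = decode_nbest(features, si_model, n)      # 1. N-best pass with the SI model
        scores = []
        for words in hypotheses:
            adapted = adapt(si_model, features, words)        # 2. unsupervised adaptation per hypothesis
            scores.append(loglik(features, adapted, words))   # 3. rescore with the adapted model
        scores = np.asarray(scores, dtype=float)
        weights = np.exp(scores - scores.max())               # 4. likelihood-based weights over the N-best list
        weights /= weights.sum()                              #    (assumed stand-in for the smoothed estimation)
        best = int(np.argmax(scores))                         # 5. output the best-scoring word sequence
        return hypotheses[best], weights

    if __name__ == "__main__":
        # Toy demo: "models" are per-word 2-D Gaussian means; the test speaker's
        # frames are a slightly shifted version of the word "one".
        rng = np.random.default_rng(0)
        si_means = {"one": np.array([0.0, 0.0]), "two": np.array([3.0, 3.0])}
        feats = rng.normal(loc=[0.5, 0.4], scale=0.3, size=(20, 2))

        decode = lambda x, m, n: [["one"], ["two"]][:n]                                    # fixed 2-best list
        adapt = lambda m, x, w: {k: (v + x.mean(0)) / 2 if k in w else v for k, v in m.items()}
        loglik = lambda x, m, w: -float(np.sum((x - m[w[0]]) ** 2))                        # crude score proxy

        best, weights = nbest_unsupervised_adaptation(feats, si_means, decode, adapt, loglik, n=2)
        print("best hypothesis:", best, "| smoothing weights:", np.round(weights, 3))

Per the abstract, the paper's smoothed estimation uses the likelihood values of the adapted models across the N-best word sequences, and its utterance verification step exists to reduce computation; neither is reproduced faithfully in this sketch.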
Related Papers
50 records in total (10 listed below)
  • [1] Smoothed N-best-based speaker adaptation for speech recognition
    Matsui, T
    Matsuoka, T
    Furui, S
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), VOLS I-V, 1997: 1015 - 1018
  • [2] N-best-based instantaneous speaker adaptation method for speech recognition
    Matsui, T
    Furui, S
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 973 - 976
  • [3] Unsupervised speaker adaptation for robust speech recognition in real environments
    Yamade, S
    Baba, A
    Yoshikawa, S
    Lee, A
    Saruwatari, H
    Shikano, K
    [J]. ELECTRONICS AND COMMUNICATIONS IN JAPAN PART II-ELECTRONICS, 2005, 88 (08): 30 - 41
  • [4] Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition
    Kosaka, Tetsuo
    Takeda, Yuui
    Ito, Takashi
    Kato, Masaharu
    Kohda, Masaki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): 2363 - 2369
  • [5] Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition
    Kim, Jae-Bok
    Park, Jeong-Sik
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2016, 52 : 126 - 134
  • [6] UNSUPERVISED SPEAKER ADAPTATION OF DEEP NEURAL NETWORK BASED ON THE COMBINATION OF SPEAKER CODES AND SINGULAR VALUE DECOMPOSITION FOR SPEECH RECOGNITION
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    Liu, Qingfeng
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4555 - 4559
  • [7] Supervised and unsupervised speaker adaptation in large vocabulary continuous speech recognition of Czech
    Cerva, P
    Nouza, J
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2005, 3658 : 203 - 210
  • [8] ON COMBINING DNN AND GMM WITH UNSUPERVISED SPEAKER ADAPTATION FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Liu, Shilin
    Sim, Khe Chai
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [9] On robustness of unsupervised domain adaptation for speaker recognition
    Bousquet, Pierre-Michel
    Rouvier, Mickael
    [J]. INTERSPEECH 2019, 2019, : 2958 - 2962
  • [10] PREDICTIVE SPEAKER ADAPTATION IN SPEECH RECOGNITION
    COX, S
    [J]. COMPUTER SPEECH AND LANGUAGE, 1995, 9 (01): 1 - 17