Long term on-line speaker adaptation for large vocabulary dictation

Cited by: 0
Author
Thelen, E
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
On-line speaker adaptation is desirable for speech recognition dictation applications because it offers the possibility of improving the system with the speaker-specific data obtained from the user. Since the user will work with such a device over a long period, long-term adaptation performance matters more for a dictation system than adaptation speed. In contrast to speaker-dependent re-training, on-line speaker adaptation does not require the speaker-specific speech data to be stored, and each adaptation step requires little computational effort. In this paper we describe our way of performing on-line Bayesian speaker adaptation using partial traceback. We compare supervised with unsupervised adaptation, and speaker adaptation with speaker-dependent training on the adaptation material. Compared to the speaker-independent startup models, the error rate was halved after five hours of supervised adaptation in our experiments. In the long-term experiments, supervised on-line adaptation performed similarly to speaker-dependent training on the adaptation material.
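The abstract's point that on-line adaptation needs no stored speech data and only a small computation per step can be illustrated with a generic textbook Bayesian (MAP) update of a single Gaussian mean. This is a minimal sketch under stated assumptions, not the method of the paper: the class name `OnlineMapMean`, the prior weight `tau`, and the scalar feature are all hypothetical, and partial traceback is not modelled.

```python
from dataclasses import dataclass


@dataclass
class OnlineMapMean:
    """Running MAP estimate of one scalar Gaussian mean (illustrative only)."""

    prior_mean: float    # speaker-independent startup mean
    tau: float = 10.0    # prior weight; hypothetical value, not from the paper
    count: float = 0.0   # accumulated (soft) frame count
    total: float = 0.0   # accumulated weighted sum of observations

    def update(self, frames, weights=None):
        """Fold new frames into the sufficient statistics, then discard them."""
        if weights is None:
            weights = [1.0] * len(frames)
        for x, w in zip(frames, weights):
            self.count += w
            self.total += w * x

    @property
    def mean(self):
        """Interpolate between the prior mean and the speaker's sample mean."""
        return (self.tau * self.prior_mean + self.total) / (self.tau + self.count)


model = OnlineMapMean(prior_mean=0.0, tau=2.0)
model.update([1.0, 1.0])   # first adaptation step
first_estimate = model.mean
model.update([1.0] * 6)    # later step; earlier frames are no longer needed
second_estimate = model.mean
```

Only two numbers per mean are kept between steps, and as more data accumulates the estimate moves from the speaker-independent prior toward the speaker's own sample mean, which is consistent with long-term adaptation approaching speaker-dependent training.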
Pages: 2139 - 2142
Page count: 4
Related papers
50 in total
  • [1] Rapid speaker adaptation for embedded large vocabulary dictation system with sparse training materials
    Huang, Wei
    Zhang, Yaxin
    He, Xin
    Bao, Qingfeng
    [J]. 2008 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING, VOLS 1 AND 2, PROCEEDINGS, 2008, : 1069 - 1072
  • [2] A study on speaker adaptation of large vocabulary
    Jeon, B
    Kim, J
    Hong, S
    Kwon, Y
    Lee, K
    [J]. ISIE 2001: IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS PROCEEDINGS, VOLS I-III, 2001, : 513 - 515
  • [3] On-line incremental speaker adaptation with automatic speaker change detection
    Zhang, ZP
    Furui, S
    Ohtsuki, K
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 961 - 964
  • [4] Iterative unsupervised speaker adaptation for batch dictation
    Homma, S
    Takahashi, J
    Sagayama, S
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1141 - 1144
  • [5] CCLMDS'96: Towards a speaker-independent large-vocabulary Mandarin dictation system
    Chiang, TH
    Pengwu, CM
    Chien, SC
    Chang, CH
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1799 - 1802
  • [6] SPEAKER ADAPTATION IN A LARGE-VOCABULARY GAUSSIAN HMM RECOGNIZER
    KENNY, P
    LENNIG, M
    MERMELSTEIN, P
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1990, 12 (09) : 917 - 920
  • [7] Experiments in speaker normalisation and adaptation for large vocabulary speech recognition
    Pye, D
    Woodland, PC
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1047 - 1050
  • [8] On-line incremental speaker adaptation for broadcast news transcription
    Zhang, ZP
    Furui, S
    Ohtsuki, K
    [J]. SPEECH COMMUNICATION, 2002, 37 (3-4) : 271 - 281
  • [9] Speaker clustering and transformation for speaker adaptation in large-vocabulary speech recognition systems
    Padmanabhan, M
    Bahl, LR
    Nahamoo, D
    Picheny, MA
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 701 - 704
  • [10] Speaker adaptation in the Philips system for large vocabulary continuous speech recognition
    Thelen, E
    Aubert, X
    Beyerlein, P
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1035 - 1038