On-line incremental speaker adaptation for broadcast news transcription

Cited by: 7
Authors
Zhang, ZP
Furui, S
Ohtsuki, K
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan
[2] NTT Corp, Cyber Space Labs, Media Proc Project, Yokosuka, Kanagawa 2390847, Japan
Keywords
speaker adaptation; speaker-change detection; likelihood comparison; GMM (Gaussian mixture models); SA (speaker-adaptive) GMM
DOI
10.1016/S0167-6393(01)00018-8
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes a new unsupervised, on-line and incremental speaker adaptation technique that improves the performance of speech recognition systems when speaker identity changes frequently and each speaker utters a series of several sentences. Speaker changes are detected using speaker-independent (SI) and speaker-adaptive (SA) Gaussian mixture models (GMMs), and both the phone hidden Markov models (HMMs) and the GMMs are adapted by maximum likelihood linear regression (MLLR) transformation. With this method, the word error rate on a broadcast news transcription task was reduced by 10.0% relative to that obtained with the SI models. (C) 2002 Elsevier Science B.V. All rights reserved.
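To make the detect-then-adapt loop concrete, here is a minimal sketch under several assumptions that the abstract does not confirm. The decision rule (declare a speaker change when the current SA GMM scores an utterance no better than the SI GMM, plus a hypothetical `margin`) is an assumption, as is adapting only the GMM here (the paper also adapts the phone HMMs). In place of full MLLR, which estimates a matrix A and bias b per regression class, the sketch swaps in bias-only MLLR (A fixed to the identity, one global regression class, diagonal covariances). All class and function names and the toy 1-D data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """Gaussian mixture model with diagonal covariances."""
    weights: list
    means: list      # one mean vector per component
    variances: list  # one variance vector per component

    def component_logpdfs(self, x):
        """log(w_m * N(x; mu_m, diag(var_m))) for every component m."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            lp = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                lp -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            out.append(lp)
        return out

    def frame_loglik(self, x):
        """log p(x), computed with log-sum-exp over components."""
        lps = self.component_logpdfs(x)
        top = max(lps)
        return top + math.log(sum(math.exp(lp - top) for lp in lps))

    def avg_loglik(self, frames):
        return sum(self.frame_loglik(x) for x in frames) / len(frames)

    def with_bias(self, bias):
        """Copy of this GMM with every mean shifted by the same bias vector."""
        shifted = [[m + b for m, b in zip(mu, bias)] for mu in self.means]
        return DiagGMM(self.weights, shifted, self.variances)


class IncrementalSpeakerAdapter:
    """Hypothetical sketch: SI-vs-SA likelihood comparison + bias-only MLLR."""

    def __init__(self, si_gmm, margin=0.0):
        self.si = si_gmm
        self.margin = margin  # assumed threshold on average per-frame log-likelihood
        self.reset()

    def reset(self):
        """New speaker: discard accumulated statistics and fall back to SI."""
        dim = len(self.si.means[0])
        self.num = [0.0] * dim
        self.den = [0.0] * dim
        self.sa = self.si
        self.adapted = False

    def _accumulate(self, frames):
        """Add bias-only MLLR statistics; posteriors come from the current SA model."""
        for x in frames:
            lps = self.sa.component_logpdfs(x)
            top = max(lps)
            total = top + math.log(sum(math.exp(lp - top) for lp in lps))
            for lp, mu, var in zip(lps, self.si.means, self.si.variances):
                gamma = math.exp(lp - total)  # component posterior
                for d in range(len(x)):
                    self.num[d] += gamma * (x[d] - mu[d]) / var[d]
                    self.den[d] += gamma / var[d]

    def process(self, frames):
        """Handle one utterance; return True if a speaker change was declared."""
        changed = False
        if self.adapted:
            if self.sa.avg_loglik(frames) <= self.si.avg_loglik(frames) + self.margin:
                changed = True
                self.reset()
        self._accumulate(frames)  # incremental: statistics persist across utterances
        bias = [n / d for n, d in zip(self.num, self.den)]
        self.sa = self.si.with_bias(bias)
        self.adapted = True
        return changed


# Toy usage: two utterances from speaker A, then one from speaker B.
si = DiagGMM(weights=[1.0], means=[[0.0]], variances=[[1.0]])
adapter = IncrementalSpeakerAdapter(si)
speaker_a = [[2.9], [3.0], [3.1]]
speaker_b = [[-3.1], [-3.0], [-2.9]]
flags = [adapter.process(u) for u in (speaker_a, speaker_a, speaker_b)]
print(flags)  # [False, False, True]
```

Because the statistics keep accumulating until a change is declared, each additional utterance from the same speaker refines the SA model, while a declared change restarts adaptation from the SI model rather than from the previous speaker's transform.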
Pages: 271-281
Number of pages: 11