Iterative unsupervised speaker adaptation for batch dictation

Times cited: 0
Authors
Homma, S
Takahashi, J
Sagayama, S
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper describes an automatic batch-style dictation paradigm. In it, the entire dictated speech is used for speaker adaptation and is then recognized using the results of that adaptation. The key point is that the same speech data serves both as the recognition target and as the adaptation data. Two steps are iterated to maximize the advantage of closed-data speaker adaptation: speech recognition, and speaker adaptation that uses the recognition results as supervision. The paradigm was applied to batch-style speech-to-text conversion of recorded dictation of Japanese medical diagnoses. In this practical application, recognition errors are reduced by 37% compared with speaker-independent recognition. To select only reliable recognition results, a supervision improvement procedure is used that eliminates erroneous recognition results from the supervision. With this procedure, 59-74% of the data are extracted from the tentative recognition results, with a reliability of 89-93%. This procedure also reduces recognition errors by 45%.
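The abstract does not specify the acoustic models or the adaptation technique. The following is therefore only a toy Python sketch of the iterated recognize/adapt loop with reliability-based selection of the supervision. Everything in it is an assumption made for illustration:

- 1-D observations.
- A nearest-class-mean "recognizer".
- A single global speaker shift as the "adaptation".
- The distance gap between the best and runner-up class as the reliability score.

The names `recognize`, `batch_adapt`, `n_iter` and `keep_fraction` are hypothetical. `keep_fraction=0.7` is chosen only because it falls within the 59-74% extraction range reported above.

```python
from typing import List, Tuple


def recognize(x: float, means: List[float], shift: float) -> Tuple[int, float]:
    """Toy recognizer (hypothetical): nearest speaker-shifted class mean.

    Returns (best_class, margin). The margin is the distance gap between the
    runner-up and the best class; it serves here as an assumed reliability
    score. Requires at least two classes.
    """
    scored = sorted((abs(x - (m + shift)), k) for k, m in enumerate(means))
    best_dist, best_class = scored[0]
    margin = scored[1][0] - best_dist
    return best_class, margin


def batch_adapt(data: List[float], means: List[float],
                n_iter: int = 5,
                keep_fraction: float = 0.7) -> Tuple[float, List[int]]:
    """Iterate recognition and unsupervised adaptation on the same batch.

    The "adaptation" here is only a global shift of all class means, an
    illustrative stand-in for real HMM speaker adaptation.
    """
    shift = 0.0  # start from the speaker-independent model (no shift)
    for _ in range(n_iter):
        # Step 1: recognize the whole batch with the current model.
        hyps = [recognize(x, means, shift) for x in data]
        # Step 2 (supervision improvement): keep only the most reliable
        # hypotheses, ranked by margin, and drop the rest.
        ranked = sorted(range(len(data)), key=lambda i: hyps[i][1],
                        reverse=True)
        n_keep = max(1, int(keep_fraction * len(data)))
        selected = ranked[:n_keep]
        # Step 3: re-estimate the speaker shift, treating the selected
        # hypotheses as if they were correct transcriptions.
        shift = sum(data[i] - means[hyps[i][0]] for i in selected) / n_keep
    # Final pass: recognize the same batch with the adapted model.
    labels = [recognize(x, means, shift)[0] for x in data]
    return shift, labels


if __name__ == "__main__":
    # Made-up batch: true classes repeat 0, 1, 2; the "speaker" adds ~1.6.
    means = [0.0, 4.0, 8.0]
    data = [1.6, 6.1, 9.4, 2.2, 5.3, 9.7, 1.2, 5.8, 9.9]
    truth = [0, 1, 2, 0, 1, 2, 0, 1, 2]
    si = [recognize(x, means, 0.0)[0] for x in data]
    shift, labels = batch_adapt(data, means)
    print("SI errors:", sum(a != b for a, b in zip(si, truth)))
    print("adapted errors:", sum(a != b for a, b in zip(labels, truth)))
    print("estimated shift:", round(shift, 3))
```

On this made-up batch, the unadapted recognizer mislabels two of the nine observations. Those two also have the smallest margins, so the selection step keeps them out of the first shift estimate. After adaptation, all nine labels are correct and the estimated shift is about 1.52. The point of the design is that errors in the tentative transcriptions are filtered out before they can bias the adaptation. A real system would use its own confidence measure and adaptation method in place of these toy choices.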
Pages: 1141-1144
Number of pages: 4