Adaptive systems for unsupervised speaker tracking and speech recognition

Cited by: 1
Authors
Herbig, Tobias [1, 3]
Gerl, Franz [2]
Minker, Wolfgang [3]
Haeb-Umbach, Reinhold [4]
Affiliations
[1] Nuance Communicat Aachen GmbH, D-89077 Ulm, Germany
[2] SVOX Deutschland GmbH, D-89077 Ulm, Germany
[3] Univ Ulm, Inst Informat Technol, D-89081 Ulm, Germany
[4] Univ Paderborn, Dept Communicat Engn, D-33095 Paderborn, Germany
Keywords
Speaker change detection; Speaker identification; Speaker adaptation
DOI
10.1007/s12530-011-9034-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech recognition offers an intuitive and convenient interface for controlling technical devices. Improvements achieved through ongoing research enable users to handle increasingly complex tasks via speech. For special applications, e.g., dictation, highly sophisticated techniques have been developed to yield high recognition accuracy. Many use cases, however, are characterized by changing conditions such as different speakers or time-variant environments. Numerous approaches have been published that handle changes in the acoustic environment or in speaker-specific voice characteristics by adapting the statistical models of a speech recognizer and by tracking speakers. Combining speaker adaptation and speaker tracking may be advantageous because it allows a system to adapt to more than one user at the same time. The performance of speech-controlled systems may thus be continuously improved over time. In this article we review some techniques and systems for unsupervised speaker tracking which may be combined with speech recognition. We discuss a unified view of speaker identification and speech recognition embedded in a self-learning system. This system adapts individually to its main users without requiring additional user intervention such as enrollment. Robustness is continuously improved by progressive speaker adaptation. We analyze our evaluation results for a realistic in-car application to validate the evolution of the system in terms of speech recognition accuracy and identification rate.
Pages: 199-214
Page count: 16