Adaptive systems for unsupervised speaker tracking and speech recognition

Cited by: 1

Authors:

Herbig, Tobias [1, 3]

Gerl, Franz [2]

Minker, Wolfgang [3]

Haeb-Umbach, Reinhold [4]

Affiliations:

[1] Nuance Communications Aachen GmbH, D-89077 Ulm, Germany

[2] SVOX Deutschland GmbH, D-89077 Ulm, Germany

[3] University of Ulm, Institute of Information Technology, D-89081 Ulm, Germany

[4] University of Paderborn, Department of Communications Engineering, D-33095 Paderborn, Germany

Source:

Evolving Systems | 2011 / Vol. 2 / No. 3

Keywords:

Speaker change detection; Speaker identification; Speaker adaptation

DOI:

10.1007/s12530-011-9034-1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Speech recognition offers an intuitive and convenient interface for controlling technical devices. Improvements achieved through ongoing research enable the user to handle increasingly complex tasks via speech. For special applications, e.g., dictation, highly sophisticated techniques have been developed to yield high recognition accuracy. Many use cases, however, are characterized by changing conditions such as different speakers or time-variant environments. Many approaches have been published to handle changes in the acoustic environment or in speaker-specific voice characteristics by adapting the statistical models of a speech recognizer and by speaker tracking. Combining speaker adaptation and speaker tracking may be advantageous because it allows a system to adapt to more than one user at the same time. The performance of speech-controlled systems may thus be continuously improved over time. In this article, we review some techniques and systems for unsupervised speaker tracking that may be combined with speech recognition. We discuss a unified view of speaker identification and speech recognition embedded in a self-learning system. The latter adapts individually to its main users without requiring additional user intervention such as enrollment. Robustness is continuously improved by progressive speaker adaptation. We analyze our evaluation results for a realistic in-car application to validate the evolution of the system in terms of speech recognition accuracy and identification rate.


Pages: 199-214

Page count: 16
