Adaptive systems for unsupervised speaker tracking and speech recognition

Times cited: 1

Authors:

Herbig, Tobias [1, 3]

Gerl, Franz [2]

Minker, Wolfgang [3]

Haeb-Umbach, Reinhold [4]

Affiliations:

[1] Nuance Communications Aachen GmbH, D-89077 Ulm, Germany

[2] SVOX Deutschland GmbH, D-89077 Ulm, Germany

[3] University of Ulm, Institute of Information Technology, D-89081 Ulm, Germany

[4] University of Paderborn, Department of Communications Engineering, D-33095 Paderborn, Germany

Source:

Evolving Systems | 2011, Vol. 2, No. 3

Keywords:

Speaker change detection; Speaker identification; Speaker adaptation;

DOI:

10.1007/s12530-011-9034-1

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Speech recognition offers an intuitive and convenient interface for controlling technical devices. Improvements achieved through ongoing research enable users to handle increasingly complex tasks by speech. For special applications, e.g. dictation, highly sophisticated techniques have been developed to yield high recognition accuracy. Many use cases, however, are characterized by changing conditions such as different speakers or time-variant environments. Numerous approaches have been published that handle changes in the acoustic environment or in speaker-specific voice characteristics by adapting the statistical models used for speech recognition and speaker tracking. Combining speaker adaptation and speaker tracking may be advantageous because it allows a system to adapt to more than one user at the same time. The performance of speech-controlled systems may thus be continuously improved over time. In this article we review some techniques and systems for unsupervised speaker tracking that may be combined with speech recognition. We discuss a unified view of speaker identification and speech recognition embedded in a self-learning system. This system adapts individually to its main users without requiring additional user interventions such as an enrollment. Robustness is continuously improved by progressive speaker adaptation. We analyze our evaluation results for a realistic in-car application to validate the evolution of the system in terms of speech recognition accuracy and identification rate.
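The abstract describes a self-learning system that recognizes its users without an enrollment and progressively adapts a model for each of them. As a toy illustration of that general idea only (not the system described in the article), the Python sketch below keeps one diagonal Gaussian per speaker. It assigns each utterance to the best-scoring known speaker, opens a new speaker when no known model beats a speaker-independent (SI) model by a margin, and then updates the chosen speaker's mean with a MAP-style rule. The SI prior, the single-Gaussian speaker models, the relevance factor `tau`, the `margin` decision rule, and all class and method names are hypothetical assumptions chosen for brevity. Practical systems use richer acoustic models and more careful speaker change detection.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeakerModel:
    """Hypothetical per-speaker model: one diagonal Gaussian over feature frames."""
    mean: list
    var: list
    count: float = 0.0  # number of frames absorbed by adaptation so far

    def log_likelihood(self, frame):
        # Log density of a diagonal Gaussian evaluated at a single frame.
        return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
                   for x, m, v in zip(frame, self.mean, self.var))

    def avg_log_likelihood(self, frames):
        return sum(self.log_likelihood(f) for f in frames) / len(frames)

    def map_update(self, frames, tau):
        # Progressive MAP mean update. Starting from the SI mean with count 0,
        # repeated calls yield (tau * mu_SI + sum of all frames) / (tau + N).
        n = len(frames)
        sums = [sum(f[d] for f in frames) for d in range(len(self.mean))]
        w = tau + self.count
        self.mean = [(w * m + s) / (w + n) for m, s in zip(self.mean, sums)]
        self.count += n


class SpeakerTracker:
    """Toy open-set tracker: identify (or create) a speaker, then adapt it."""

    def __init__(self, si_mean, si_var, tau=10.0, margin=0.0):
        self.si = SpeakerModel(list(si_mean), list(si_var))
        self.tau = tau        # hypothetical MAP relevance factor
        self.margin = margin  # hypothetical required score gain over the SI model
        self.speakers = []

    def process(self, frames):
        si_score = self.si.avg_log_likelihood(frames)
        best_idx, best_score = None, -math.inf
        for i, spk in enumerate(self.speakers):
            score = spk.avg_log_likelihood(frames)
            if score > best_score:
                best_idx, best_score = i, score
        # No known speaker explains the utterance better than the SI model: new user.
        if best_idx is None or best_score - si_score < self.margin:
            self.speakers.append(SpeakerModel(list(self.si.mean), list(self.si.var)))
            best_idx = len(self.speakers) - 1
        self.speakers[best_idx].map_update(frames, self.tau)
        return best_idx


# Usage with synthetic one-dimensional "features" for two speakers.
tracker = SpeakerTracker(si_mean=[0.0], si_var=[1.0], tau=10.0)
speaker_a = [[3.0]] * 20
speaker_b = [[-3.0]] * 20
ids = [tracker.process(u) for u in (speaker_a, speaker_a, speaker_b, speaker_a)]
print(ids)  # → [0, 0, 1, 0]
```

Because each call folds the new frames into a running MAP estimate, a speaker's model moves gradually from the SI prior toward that speaker's data as more utterances arrive, which mirrors the "progressive adaptation" idea at a very small scale.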


Pages: 199–214

Page count: 16
