Adaptive systems for unsupervised speaker tracking and speech recognition

Cited by: 1
Authors
Herbig, Tobias [1, 3]
Gerl, Franz [2]
Minker, Wolfgang [3]
Haeb-Umbach, Reinhold [4]
Affiliations
[1] Nuance Communicat Aachen GmbH, D-89077 Ulm, Germany
[2] SVOX Deutschland GmbH, D-89077 Ulm, Germany
[3] Univ Ulm, Inst Informat Technol, D-89081 Ulm, Germany
[4] Univ Paderborn, Dept Communicat Engn, D-33095 Paderborn, Germany
Keywords
Speaker change detection; Speaker identification; Speaker adaptation
DOI
10.1007/s12530-011-9034-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech recognition offers an intuitive and convenient interface for controlling technical devices. Improvements achieved through ongoing research enable users to handle increasingly complex tasks via speech. For special applications, e.g. dictation, highly sophisticated techniques have been developed to yield high recognition accuracy. Many use cases, however, are characterized by changing conditions such as different speakers or time-variant environments. Numerous approaches have been published that handle changes in the acoustic environment or in speaker-specific voice characteristics by adapting the statistical models of a speech recognizer and by tracking speakers. Combining speaker adaptation and speaker tracking may be advantageous because it allows a system to adapt to more than one user at the same time. The performance of speech-controlled systems may thus be improved continuously over time. In this article we review some techniques and systems for unsupervised speaker tracking which may be combined with speech recognition. We discuss a unified view of speaker identification and speech recognition embedded in a self-learning system. This system adapts individually to its main users without requiring additional user intervention such as enrollment. Robustness is continuously improved by progressive speaker adaptation. We analyze our evaluation results for a realistic in-car application to validate the evolution of the system in terms of speech recognition accuracy and identification rate.
Pages: 199-214
Page count: 16