Adaptive systems for unsupervised speaker tracking and speech recognition

Cited by: 1
Authors
Herbig, Tobias [1, 3]
Gerl, Franz [2]
Minker, Wolfgang [3]
Haeb-Umbach, Reinhold [4]
Affiliations
[1] Nuance Communicat Aachen GmbH, D-89077 Ulm, Germany
[2] SVOX Deutschland GmbH, D-89077 Ulm, Germany
[3] Univ Ulm, Inst Informat Technol, D-89081 Ulm, Germany
[4] Univ Paderborn, Dept Communicat Engn, D-33095 Paderborn, Germany
Keywords
Speaker change detection; Speaker identification; Speaker adaptation
DOI
10.1007/s12530-011-9034-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech recognition offers an intuitive and convenient interface to control technical devices. Improvements achieved through ongoing research enable the user to handle increasingly complex tasks via speech. For special applications, e.g., dictation, highly sophisticated techniques have been developed to yield high recognition accuracy. Many use cases, however, are characterized by changing conditions such as different speakers or time-variant environments. A variety of approaches have been published to handle changes in the acoustic environment or in speaker-specific voice characteristics by adapting the statistical models of a speech recognizer and by speaker tracking. Combining speaker adaptation and speaker tracking may be advantageous, because it allows a system to adapt to more than one user at the same time. The performance of speech-controlled systems may thus be continuously improved over time. In this article we review some techniques and systems for unsupervised speaker tracking which may be combined with speech recognition. We discuss a unified view on speaker identification and speech recognition embedded in a self-learning system. The latter adapts individually to its main users without requiring additional interventions by the user, such as an enrollment. Robustness is continuously improved by progressive speaker adaptation. We analyze our evaluation results for a realistic in-car application to validate the evolution of the system in terms of speech recognition accuracy and identification rate.
Pages: 199-214
Page count: 16