A speaker clustering algorithm for fast speaker adaptation in continuous speech recognition

被引:0
|
作者
Rodríguez, LJ [1 ]
Torres, MI [1 ]
机构
[1] Univ Basque Country, Fac Ciencia & Tecnol, Pattern Recognit & Speech Technol Grp, DEE, E-48080 Bilbao, Spain
来源
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper a speaker adaptation methodology is proposed, which first automatically determines a number of speaker clusters in the training material, then estimates the parameters of the corresponding models, and finally applies a fast match strategy - based on the so called histogram models - to choose the optimal cluster for each test utterance. The fast match strategy is critical to make this methodology useful in real applications, since carrying out several recognition passes - one for each cluster of speakers -, and then selecting the decoded string with the highest likelihood, would be too costly. Preliminary experimentation over two speech databases in Spanish reveal that both the clustering algorithm and the fast match strategy are consistent and reliable. The histogram models, though being suboptimal - they succeeded in guessing the right cluster for unseen test speakers in 85% of the cases with read speech, and in 63% of the cases with spontaneous speech -, yielded around a 6% decrease in error rate in phonetic recognition experiments.
引用
收藏
页码:433 / 440
页数:8
相关论文
共 50 条
  • [41] Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition
    Mori, K
    Nakagawa, S
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 413 - 416
  • [42] Adaptation of hidden Markov model for telephone speech recognition and speaker adaptation
    Natl Tsing Hua Univ, Hsinchu, Taiwan
    IEE Proc Vision Image Signal Proc, 3 (129-135):
  • [43] Adaptation of hidden Markov model for telephone speech recognition and speaker adaptation
    Chien, JT
    Wang, HC
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 1997, 144 (03): : 129 - 135
  • [44] Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech
    Das, Rohan Kumar
    Yang, Jichen
    Li, Haizhou
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1630 - 1635
  • [45] Robust several-speaker speech recognition with highly dependable online speaker adaptation and identification
    Shih, Po-Yi
    Lin, Po-Chuan
    Wang, Jhing-Fa
    Lin, Yuan-Ning
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2011, 34 (05) : 1459 - 1467
  • [46] Speaker clustering for speech recognition using vocal tract parameters
    Naito, M
    Deng, L
    Sagisaka, Y
    SPEECH COMMUNICATION, 2002, 36 (3-4) : 305 - 315
  • [47] Emotional Speech Clustering based Robust Speaker Recognition System
    Li, Dongdong
    Yang, Yingchun
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 4576 - +
  • [48] Fast model selection based speaker adaptation for nonnative speech
    He, XD
    Zhao, YX
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (04): : 298 - 307
  • [49] Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems
    Siniscalchi, Sabato Marco
    Li, Jinyu
    Lee, Chin-Hui
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (10): : 2152 - 2161
  • [50] Speaker adaptation of fuzzy-perceptron-based speech recognition
    Lin, CT
    Nein, HW
    Lin, WF
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 1999, 7 (01) : 1 - 30