Speaker clustering for speech recognition using vocal tract parameters

Cited by: 10
Authors
Naito, M
Deng, L
Sagisaka, Y
Affiliations
[1] ATR, Interpreting Telephony Res Labs, Kyoto 6190288, Japan
[2] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Keywords
vocal tract parameters; speaker-clustering; speech recognition
DOI
10.1016/S0167-6393(00)00089-3
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We propose speaker clustering methods for speech recognition based on vocal tract (VT) size-related articulatory parameters associated with individual speakers. Two parameters characterizing gross VT dimensions are first derived from the formant frequencies of two vowels and are then used to cluster speakers. The resulting speaker clusters differ significantly from those obtained by conventional acoustic criteria. Phoneme recognition experiments are then carried out using speaker-clustered HMMs (SC-HMMs) trained for each cluster. The proposed method requires only a small amount of speech data for speaker clustering and for selecting the most suitable SC-HMM for a target speaker, yet gives higher recognition rates than conventional speaker clustering methods based on acoustic criteria. (C) 2002 Elsevier Science B.V. All rights reserved.
Pages: 305 - 315
Number of pages: 11
Related papers
50 in total
  • [1] Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions
    Naito, M
    Deng, L
    Sagisaka, Y
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998: 981 - 984
  • [2] NORMALIZING THE VOCAL-TRACT LENGTH FOR SPEAKER-INDEPENDENT SPEECH RECOGNITION
    LIN, QG
    CHE, CW
    IEEE SIGNAL PROCESSING LETTERS, 1995, 2 (11) : 201 - 203
  • [3] Robust Speaker Recognition Using Denoised Vocal Source and Vocal Tract Features
    Wang, Ning
    Ching, P. C.
    Zheng, Nengheng
    Lee, Tan
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (01): 196 - 205
  • [4] Spectral Characteristics of Vocal Tract for Speaker Recognition
    Sigmund, Milan
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2006, 6 (1A): 17 - 19
  • [6] Speaker clustering and transformation for speaker adaptation in speech recognition systems
    Padmanabhan, M
    Bahl, LR
    Nahamoo, D
    Picheny, MA
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (01): 71 - 77
  • [7] Contemporary Speech/Speaker Recognition with Speech from Impaired Vocal Apparatus
    Nidhyananthan, S. Selva
    Selvakumari, R. Shantha
    Shenbagalakshmi, V.
    2014 INTERNATIONAL CONFERENCE ON COMMUNICATION AND NETWORK TECHNOLOGIES (ICCNT), 2014: 198 - 202
  • [8] Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition
    Mori, K
    Nakagawa, S
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURAL NETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001: 413 - 416
  • [9] A speaker clustering algorithm for fast speaker adaptation in continuous speech recognition
    Rodríguez, LJ
    Torres, MI
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2004, 3206 : 433 - 440
  • [10] DEEPTALK: VOCAL STYLE ENCODING FOR SPEAKER RECOGNITION AND SPEECH SYNTHESIS
    Chowdhury, Anurag
    Ross, Arun
    David, Prabu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6189 - 6193