Speaker clustering for speech recognition using vocal tract parameters

Cited by: 10

Authors:

Naito, M

Deng, L

Sagisaka, Y

Affiliations:

[1] ATR, Interpreting Telephony Res Labs, Kyoto 6190288, Japan

[2] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada

Source:

SPEECH COMMUNICATION | 2002 / Vol. 36 / Issue 3-4

Keywords:

vocal tract parameters; speaker-clustering; speech recognition;

DOI:

10.1016/S0167-6393(00)00089-3

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We propose speaker clustering methods for speech recognition based on vocal tract (VT) size-related articulatory parameters associated with individual speakers. Two parameters characterizing gross VT dimensions are first derived from the formant frequencies of two vowels and are then used to cluster speakers. The resulting speaker clusters are significantly different from speaker clusters obtained by conventional acoustic criteria. Phoneme recognition experiments are then carried out using speaker-clustered HMMs (SC-HMMs) trained for each cluster. The proposed method requires only a small amount of speech data for speaker clustering and for selecting the most suitable SC-HMM for a target speaker, yet gives higher recognition rates than conventional speaker clustering methods based on acoustic criteria. (C) 2002 Elsevier Science B.V. All rights reserved.
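
As a rough illustration of the clustering and cluster-selection steps the abstract describes, the sketch below groups speakers by a two-dimensional parameter vector and then picks the closest cluster for a new speaker. It is not the authors' procedure: the abstract does not say which clustering algorithm is used or how the two VT parameters are computed from formants, so plain k-means is swapped in here, and the speaker names, parameter values, and the helper names `nearest` and `kmeans` are all hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, iters=50):
    """Plain k-means on 2-D tuples; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each speaker joins the nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical per-speaker VT parameter pairs (made-up values, arbitrary units).
speakers = {
    "spk_a": (0.92, 0.95), "spk_b": (1.08, 1.10), "spk_c": (0.90, 0.97),
    "spk_d": (1.12, 1.05), "spk_e": (0.94, 0.93), "spk_f": (1.10, 1.08),
}

centroids, labels = kmeans(list(speakers.values()), k=2)

# Cluster selection for a target speaker: choose the cluster whose centroid
# is closest to the target's own parameter pair; in the paper's setting the
# SC-HMM trained for that cluster would then be used for recognition.
target_params = (1.05, 1.07)  # hypothetical target speaker
chosen = nearest(target_params, centroids)
```

Because the selection step needs only the target speaker's two parameters, it is consistent with the abstract's point that a small amount of speech data suffices to choose a model.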

Pages: 305-315

Page count: 11
