Model-based speaker normalization methods for speech recognition

Cited by: 1
Authors
Naito, M [1]
Deng, L
Sagisaka, Y
Affiliations
[1] ATR Interpreting Telecommun Res Labs, Kyoto 6190237, Japan
[2] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Keywords
vocal tract shape; articulatory model; vocal-tract area functions; frequency warping; speaker normalization
DOI
10.1002/ecjb.10119
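Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract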
A speaker normalization method using a speech generation model is proposed in order to achieve high-performance speaker adaptation with a small amount of adaptation data. The speaker- and phoneme-dependent vocal tract area function is approximated by the corresponding area function produced by the articulatory model of a standard speaker, combined with phoneme-independent feature quantities of the vocal-tract shape of the normalized target speaker as estimated from the formant frequencies of two vowels. The frequency warping functions are determined from the formant frequencies of speech calculated from the vocal-tract area functions thus obtained, and normalization of the uttered speech is performed by stretching the speech spectrum in the frequency-axis direction. Continuous phoneme recognition experiments using phoneme connection rules show that the recognition error using a gender-dependent model is reduced by about 30% in the proposed method and that recognition performance superior to that of vocal-tract length normalization is obtained. The recognition performance of the proposed method is also equivalent to that of speaker adaptation by moving vector field smoothing (VFS) using 10 phonetically balanced sentences, showing that high-performance speaker adaptation using a small amount of adaptation data can be achieved by the proposed method. (C) 2003 Wiley Periodicals, Inc.
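The frequency-axis stretching mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract derives its warping functions from formants computed through an articulatory model, whereas the sketch below simply assumes a piecewise-linear warp passing through pairs of formant frequencies. The function names, the formant values, and the 8 kHz upper band edge are all hypothetical choices made for illustration.

```python
import numpy as np

def piecewise_linear_warp(speaker_formants, reference_formants, max_freq):
    """Build a warp f -> f' that maps each speaker formant onto the
    corresponding reference formant (assumed piecewise-linear form)."""
    src = np.concatenate(([0.0], np.asarray(speaker_formants, float), [max_freq]))
    dst = np.concatenate(([0.0], np.asarray(reference_formants, float), [max_freq]))

    def warp(freqs):
        # Linear interpolation between the anchor points (0, formants, max_freq).
        return np.interp(freqs, src, dst)

    return warp

def warp_spectrum(magnitude, freqs, warp):
    """Stretch a magnitude spectrum along the frequency axis so that the
    content originally at frequency f appears at warp(f)."""
    warped_positions = warp(freqs)  # where each original bin lands
    # Resample back onto the original frequency grid.
    return np.interp(freqs, warped_positions, magnitude)

# Hypothetical formant values in Hz, for illustration only.
warp = piecewise_linear_warp([600.0, 1800.0, 3000.0],
                             [500.0, 1500.0, 2500.0],
                             max_freq=8000.0)
freqs = np.linspace(0.0, 8000.0, 161)              # 50 Hz grid
magnitude = np.exp(-((freqs - 600.0) / 100.0) ** 2)  # toy peak at 600 Hz
warped = warp_spectrum(magnitude, freqs, warp)
peak_hz = freqs[np.argmax(warped)]                 # peak location after warping
```

In this toy case the spectral peak placed at the speaker's first formant moves to the reference speaker's first formant. A recognizer front end would apply the same resampling to each analysis frame before feature extraction.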
Pages: 45-56
Page count: 12