Model-based speaker normalization methods for speech recognition

被引:1
|
作者
Naito, M [1 ]
Deng, L
Sagisaka, Y
机构
[1] ATR Interpreting Telecommun Res Labs, Kyoto 6190237, Japan
[2] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
关键词
vocal tract shape; articulatory model; vocal-tract area functions; frequency warping; speaker normalization;
D O I
10.1002/ecjb.10119
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
A speaker normalization method using a speech generation model is proposed in order to achieve high-performance speaker adaptation with a small amount of adaptation data. The speaker- and phoneme-dependent vocal tract area function is approximated by the corresponding area function produced by the articulatory model of a standard speaker, combined with phoneme-independent feature quantities of the vocal-tract shape of the normalized target speaker as estimated from the formant frequencies of two vowels. The frequency warping functions are determined from the formant frequencies of speech calculated from the vocal-tract area functions thus obtained, and normalization of the uttered speech is performed by stretching the speech spectrum in the frequency-axis direction. Continuous phoneme recognition experiments using phoneme connection rules show that the recognition error using a gender-dependent model is reduced by about 30% in the proposed method and that recognition performance superior to that of vocal-tract length normalization is obtained. The recognition performance of the proposed method is also equivalent to that of speaker adaptation by moving vector field smoothing (VFS) using 10 phonetically balanced sentences, showing that high-performance speaker adaptation using a small amount of adaptation data can be achieved by the proposed method. (C) 2003 Wiley Periodicals, Inc.
引用
收藏
页码:45 / 56
页数:12
相关论文
共 50 条
  • [21] Model-based feature compensation for robust speech recognition
    Shen, Haifeng
    Li, Qunxia
    Guo, Jun
    Liu, Gang
    [J]. FUNDAMENTA INFORMATICAE, 2006, 72 (04) : 529 - 539
  • [22] Model-Based Feature Enhancement for Reverberant Speech Recognition
    Krueger, Alexander
    Haeb-Umbach, Reinhold
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07): : 1692 - 1707
  • [23] Hidden Markov model-based speech emotion recognition
    Schuller, B
    Rigoll, G
    Lang, M
    [J]. 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 1 - 4
  • [24] Hidden Markov model-based speech emotion recognition
    Schuller, B
    Rigoll, G
    Lang, M
    [J]. 2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 401 - 404
  • [25] Adaptive model-based technique for robust speech recognition
    Graciarena, M
    [J]. CONFERENCE RECORD OF THE THIRTY-FOURTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2000, : 1512 - 1516
  • [26] Model-based feature compensation for robust speech recognition
    School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
    不详
    不详
    [J]. Fundam Inf, 2006, 4 (529-539):
  • [27] Model-based feature enhancement for noisy speech recognition
    Couvreur, C
    Van hamme, H
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1719 - 1722
  • [28] Speaker Recognition and Speech Emotion Recognition Based on GMM
    Xu, Shupeng
    Liu, Yan
    Liu, Xiping
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON ELECTRIC AND ELECTRONICS, 2013, : 434 - 436
  • [29] Exploring AI-based Speaker Dependent Methods in Dysarthric Speech Recognition
    Mulfari, Davide
    Celesti, Antonio
    Villari, Massimo
    [J]. 2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022), 2022, : 958 - 964
  • [30] Speech recognition using interphoneme dependencies based on a speaker space model
    Cho, Kook
    Ichimaru, Taiichiroh
    Yamashita, Yoichi
    [J]. Systems and Computers in Japan, 2005, 36 (08) : 15 - 22