Model-based speaker normalization methods for speech recognition

Authors
Naito, M [1]
Deng, L
Sagisaka, Y
Affiliations
[1] ATR Interpreting Telecommun Res Labs, Kyoto 6190237, Japan
[2] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Keywords
vocal tract shape; articulatory model; vocal-tract area functions; frequency warping; speaker normalization
DOI
10.1002/ecjb.10119
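Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract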
A speaker normalization method using a speech generation model is proposed in order to achieve high-performance speaker adaptation with a small amount of adaptation data. The speaker- and phoneme-dependent vocal tract area function is approximated by the corresponding area function produced by the articulatory model of a standard speaker, combined with phoneme-independent feature quantities of the vocal-tract shape of the normalized target speaker as estimated from the formant frequencies of two vowels. The frequency warping functions are determined from the formant frequencies of speech calculated from the vocal-tract area functions thus obtained, and normalization of the uttered speech is performed by stretching the speech spectrum in the frequency-axis direction. Continuous phoneme recognition experiments using phoneme connection rules show that the recognition error using a gender-dependent model is reduced by about 30% in the proposed method and that recognition performance superior to that of vocal-tract length normalization is obtained. The recognition performance of the proposed method is also equivalent to that of speaker adaptation by moving vector field smoothing (VFS) using 10 phonetically balanced sentences, showing that high-performance speaker adaptation using a small amount of adaptation data can be achieved by the proposed method. (C) 2003 Wiley Periodicals, Inc.
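The final step described in the abstract, stretching the speech spectrum along the frequency axis, can be illustrated with a minimal sketch. It is not the authors' implementation: the paper derives its warping functions from formants computed through an articulatory model, whereas this sketch simply assumes that paired formant frequencies for the target speaker and the standard speaker are already available, and it assumes a piecewise-linear warp anchored at 0 Hz and the Nyquist frequency. The names `make_warp` and `warp_spectrum` are hypothetical.

```python
from bisect import bisect_right


def make_warp(src_formants, dst_formants, nyquist):
    """Return a piecewise-linear map sending each source formant to the
    matching destination formant, with 0 Hz and nyquist held fixed.

    Assumption: formants are distinct and lie strictly between 0 and nyquist.
    """
    xs = [0.0] + sorted(src_formants) + [nyquist]
    ys = [0.0] + sorted(dst_formants) + [nyquist]

    def warp(f):
        f = min(max(f, 0.0), nyquist)
        # Index of the right-hand breakpoint of the segment containing f.
        i = max(1, min(bisect_right(xs, f), len(xs) - 1))
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (f - x0) / (x1 - x0)

    return warp


def warp_spectrum(spectrum, speaker_formants, standard_formants, nyquist):
    """Resample a magnitude spectrum (bins evenly spaced from 0 to nyquist)
    so that the speaker's formants move to the standard speaker's formants."""
    n = len(spectrum)
    step = nyquist / (n - 1)
    # For each output frequency, look up where it came from in the input.
    inverse = make_warp(standard_formants, speaker_formants, nyquist)
    out = []
    for k in range(n):
        src = inverse(k * step) / step      # fractional input bin index
        lo = min(int(src), n - 2)
        frac = src - lo
        out.append((1.0 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return out


# Toy example: 81 bins at 100 Hz spacing, peaks at the speaker's formants.
spectrum = [0.0] * 81
spectrum[6] = 1.0    # 600 Hz
spectrum[18] = 1.0   # 1800 Hz
normalized = warp_spectrum(spectrum, [600.0, 1800.0], [500.0, 1500.0], 8000.0)
```

In this toy case the peaks at 600 Hz and 1800 Hz are moved to the assumed standard-speaker formants at 500 Hz and 1500 Hz. The formant values are illustrative only and do not come from the paper.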
Pages: 45-56
Page count: 12