Model-based speaker normalization methods for speech recognition

Cited by: 1
Authors
Naito, M [1]
Deng, L
Sagisaka, Y
Affiliations
[1] ATR Interpreting Telecommun Res Labs, Kyoto 6190237, Japan
[2] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Keywords
vocal tract shape; articulatory model; vocal-tract area functions; frequency warping; speaker normalization
DOI
10.1002/ecjb.10119
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
A speaker normalization method using a speech generation model is proposed in order to achieve high-performance speaker adaptation with a small amount of adaptation data. The speaker- and phoneme-dependent vocal-tract area function is approximated by combining the corresponding area function produced by the articulatory model of a standard speaker with phoneme-independent feature quantities describing the vocal-tract shape of the speaker to be normalized, which are estimated from the formant frequencies of two vowels. Frequency warping functions are determined from the formant frequencies calculated from the resulting vocal-tract area functions, and the uttered speech is normalized by stretching its spectrum along the frequency axis. Continuous phoneme recognition experiments using phoneme connection rules show that the proposed method reduces the recognition error of a gender-dependent model by about 30% and outperforms vocal-tract length normalization. Its recognition performance is also equivalent to that of speaker adaptation by moving vector field smoothing (VFS) using 10 phonetically balanced sentences, showing that the proposed method achieves high-performance speaker adaptation with a small amount of adaptation data. © 2003 Wiley Periodicals, Inc.
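The normalization step described in the abstract, stretching the speech spectrum along the frequency axis according to a warping function derived from formant frequencies, can be illustrated with a minimal sketch. This is not the authors' implementation: the piecewise-linear warping through formant anchor points, the function name `warp_spectrum`, and all of its parameters are assumptions made for illustration, and the record does not say how the paper constructs its warping functions from the articulatory model.

```python
import numpy as np

def warp_spectrum(magnitude, sample_rate, speaker_formants, reference_formants):
    """Stretch a magnitude spectrum along the frequency axis.

    Hypothetical helper: builds a piecewise-linear warping function that
    maps each speaker formant onto the matching reference formant, with
    0 Hz and the Nyquist frequency held fixed. Formant lists must be
    sorted, of equal length, and lie strictly below the Nyquist frequency.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    nyquist = sample_rate / 2.0
    # Frequency (Hz) of every bin in the input spectrum.
    freqs = np.linspace(0.0, nyquist, len(magnitude))
    # Anchor points of the warping function, including the fixed endpoints.
    ref_knots = np.concatenate(([0.0], reference_formants, [nyquist]))
    spk_knots = np.concatenate(([0.0], speaker_formants, [nyquist]))
    # For each normalized (output) frequency, find the speaker frequency
    # whose spectral value should be placed there.
    source_freqs = np.interp(freqs, ref_knots, spk_knots)
    # Resample the speaker's spectrum at those frequencies.
    return np.interp(source_freqs, freqs, magnitude)
```

With identical speaker and reference formants the warping function is the identity and the spectrum is returned unchanged. A single global scaling factor, as used in basic vocal-tract length normalization, corresponds to the special case where every formant is scaled by the same ratio.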
Pages: 45-56
Number of pages: 12
Related papers
50 in total
  • [1] Speaker normalization for template based speech recognition
    Demange, Sebastien
    Van Compernolle, Dirk
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 560 - 563
  • [2] Adaptive Speaker Normalization for CTC-Based Speech Recognition
    Ding, Fenglin
    Guo, Wu
    Gu, Bin
    Ling, Zhenhua
    Du, Jun
    [J]. INTERSPEECH 2020, 2020, : 1266 - 1270
  • [3] Correlation Networks for Speaker Normalization in Automatic Speech Recognition
    Sharon, Rini A.
    Kothinti, Sandeep Reddy
    Umesh, Srinivasan
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 882 - 886
  • [4] Efficient Speaker and Noise Normalization for Robust Speech Recognition
    Joshi, Vikas
    Bilgi, Raghavendra
    Umesh, S.
    Benitez, C.
    Garcia, L.
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2612 - 2615
  • [5] Capturing local variability for speaker normalization in speech recognition
    Miguel, Antonio
    Lleida, Eduardo
    Rose, Richard
    Buera, Luis
    Saz, Oscar
    Ortega, Alfonso
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (03) : 578 - 593
  • [6] Improved automatic speech recognition through speaker normalization
    Giuliani, D
    Gerosa, M
    Brugnara, F
    [J]. COMPUTER SPEECH AND LANGUAGE, 2006, 20 (01) : 107 - 123
  • [7] SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION
    Gat, Itai
    Aronowitz, Hagai
    Zhu, Weizhong
    Morais, Edmilson
    Hoory, Ron
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7342 - 7346
  • [8] Speaker recognition from coded speech and the effects of score normalization
    Dunn, RB
    Quatieri, TF
    Reynolds, DA
    Campbell, JP
    [J]. CONFERENCE RECORD OF THE THIRTY-FIFTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1 AND 2, 2001, : 1562 - 1567
  • [9] A Generative Model for Score Normalization in Speaker Recognition
    Swart, Albert
    Brummer, Niko
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1477 - 1481
  • [10] Emotional speech feature normalization and recognition based on speaker-sensitive feature clustering
    Huang, Chengwei
    Song, Baolin
    Zhao, Li
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04) : 805 - 816