Speaker normalization for template based speech recognition

Cited by: 0
Authors
Demange, Sebastien [1]
Van Compernolle, Dirk [1]
Affiliations
[1] Katholieke Univ Leuven, Dept ESAT, B-3001 Louvain, Belgium
Keywords
template based speech recognition; speaker normalization; VTLN
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vocal Tract Length Normalization (VTLN) has been shown to be an efficient speaker normalization tool for HMM-based systems. In this paper we show that it is equally efficient for a template-based recognition system. Template-based systems, while promising, have a potential drawback: templates retain all non-phonetic details alongside the essential phonemic properties, i.e. information about the speaker and the acoustic recording conditions. This may lead to very inefficient use of the database. We show that after VTLN significantly more speakers, including speakers of the opposite gender, contribute templates to the matching sequence than in the non-normalized case. In experiments on the Wall Street Journal database this leads to a relative word error rate reduction of 10%.
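The abstract does not say which warping function the authors use. As an illustration only, the sketch below shows one widely used form of VTLN, a piecewise-linear warp of the frequency axis controlled by a per-speaker factor alpha. The function name vtln_warp, the 8000 Hz Nyquist frequency (assuming 16 kHz audio), and the cut_ratio value are assumptions made for this example, not details taken from the paper.

```python
def vtln_warp(freq_hz, alpha, nyquist_hz=8000.0, cut_ratio=0.85):
    """Map a frequency through a piecewise-linear VTLN warp.

    Assumed form (not taken from the paper): frequencies up to a cutoff are
    scaled by alpha; above the cutoff a second line segment brings the warped
    axis back so that the Nyquist frequency maps onto itself.
    """
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("freq_hz must lie in [0, nyquist_hz]")
    if alpha <= 0.0 or alpha * cut_ratio >= 1.0:
        raise ValueError("need alpha > 0 and alpha * cut_ratio < 1")

    cut_hz = cut_ratio * nyquist_hz  # hypothetical cutoff choice
    if freq_hz <= cut_hz:
        # Lower segment: plain linear scaling by the speaker's warp factor.
        return alpha * freq_hz

    # Upper segment: line from (cut_hz, alpha * cut_hz) to (nyquist_hz, nyquist_hz).
    slope = (nyquist_hz - alpha * cut_hz) / (nyquist_hz - cut_hz)
    return alpha * cut_hz + slope * (freq_hz - cut_hz)


if __name__ == "__main__":
    # Show how a few frequencies move for a compressing and a stretching factor.
    for alpha in (0.9, 1.0, 1.1):
        warped = [round(vtln_warp(f, alpha), 1) for f in (500.0, 2000.0, 7500.0)]
        print(alpha, warped)
```

In a typical setup the factor alpha is picked per speaker from a small grid (for instance 0.88 to 1.12) by keeping the value that scores best under some acoustic model, and the warp is applied to the filterbank centre frequencies before feature extraction. How the selection is done in a template-based system is not stated in the abstract.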
Pages: 560-563
Number of pages: 4