Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

Cited by: 1
Authors
Yapanel, Umit H. [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst, Richardson, TX 75083 USA
DOI
10.1155/2008/148967
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed that require only limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed perceptual minimum variance distortionless response (PMVDR) acoustic front end. The novel aspect of the algorithm is that conventional front-end processing with PMVDR and VTLN requires two separate warping phases, whereas the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the word error rate (WER) by 24% relative, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method. Copyright (C) 2008 U. H. Yapanel and J. H. L. Hansen.
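The idea of folding two warping phases into one can be illustrated with a small numerical sketch. The abstract does not state which warping function is used, so the code below assumes a first-order all-pass (bilinear) frequency warp, a common choice for perceptual warping. The function names and both warp parameters are hypothetical and chosen only for illustration. Under that assumption, applying a perceptual warp followed by a speaker warp gives the same frequency mapping as one warp with a combined parameter.

```python
import numpy as np

def allpass_warp(omega, alpha):
    """Warp angular frequency omega (radians, 0..pi) with a first-order
    all-pass system. Assumed warp form; not taken from the abstract."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def combine_alphas(alpha_1, alpha_2):
    """Parameter of the single all-pass warp equivalent to applying
    alpha_1 and then alpha_2 (composition of two bilinear transforms)."""
    return (alpha_1 + alpha_2) / (1.0 + alpha_1 * alpha_2)

# Hypothetical values: a perceptual warp and a small speaker-dependent warp.
perceptual_alpha = 0.57
speaker_alpha = 0.03

omega = np.linspace(0.0, np.pi, 257)

# Conventional pipeline in this sketch: two separate warping phases.
two_pass = allpass_warp(allpass_warp(omega, perceptual_alpha), speaker_alpha)

# Unified pipeline in this sketch: one speaker-dependent warp.
single_alpha = combine_alphas(perceptual_alpha, speaker_alpha)
one_pass = allpass_warp(omega, single_alpha)

# The two mappings should agree up to floating-point error.
max_diff = float(np.max(np.abs(two_pass - one_pass)))
print(f"combined alpha = {single_alpha:.4f}, max difference = {max_diff:.2e}")
```

In this sketch the saving comes from evaluating the warp once per frame instead of twice. How BISN estimates the speaker-dependent parameter is described in the paper itself and is not modeled here.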
Pages: 13
Related papers
  • [1] Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization
    Umit H. Yapanel
    John H.L. Hansen
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2008
  • [2] Towards an intelligent acoustic front-end for automatic speech recognition: built-in speaker normalization (BISN)
    Yapanel, UH
    Hansen, JHL
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 949 - 952
  • [3] Perceptual MVDR-based Unsupervised Built-in Speaker Normalization for Kazakh Speech Recognition
    Yessenbayev, Zhandos
    Yapanel, Umit
    [J]. 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), 2014, : 87 - 91
  • [4] Correlation Networks for Speaker Normalization in Automatic Speech Recognition
    Sharon, Rini A.
    Kothinti, Sandeep Reddy
    Umesh, Srinivasan
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 882 - 886
  • [5] Improved automatic speech recognition through speaker normalization
    Giuliani, D
    Gerosa, M
    Brugnara, F
    [J]. COMPUTER SPEECH AND LANGUAGE, 2006, 20 (01): : 107 - 123
  • [6] Acoustic quality normalization for robust automatic speech recognition
    Muhammad G.
    [J]. International Journal of Speech Technology, 2007, 10 (4) : 175 - 182
  • [7] Towards End-to-End Private Automatic Speaker Recognition
    Teixeira, Francisco
    Abad, Alberto
    Raj, Bhiksha
    Trancoso, Isabel
    [J]. INTERSPEECH 2022, 2022, : 2798 - 2802
  • [8] COMBINING SPEAKER AND NOISE FEATURE NORMALIZATION TECHNIQUES FOR AUTOMATIC SPEECH RECOGNITION
    Garcia, L.
    Benitez, C.
    Segura, J. C.
    Umesh, S.
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5496 - 5499
  • [9] An efficient front-end for automatic speech recognition
    Ahadi, SM
    Sheikhzadeh, H
    Brennan, RL
    Freeman, GH
    [J]. ICECS 2003: PROCEEDINGS OF THE 2003 10TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS, VOLS 1-3, 2003, : 128 - 131
  • [10] SPEAKER NORMALIZATION FOR AUTOMATIC WORD RECOGNITION
    BOEHM, JF
    WRIGHT, RD
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1971, 49 (01): : 133 - &