Stochastic Vector Mapping-based Feature Enhancement Using Prior Model and Environment Adaptation for Noisy Speech Recognition

Authors
Hsieh, Chia-Hsin [1]
Wu, Chung-Hsien [1]
Lin, Jun-Yu [1]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
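Keywords
noisy speech recognition; feature enhancement; environment adaptation; prior model
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract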
This paper presents an approach to feature enhancement for noisy speech recognition. Three prior models are introduced to characterize clean speech, noise, and noisy speech, respectively, using sequential noise estimation based on noise-normalized stochastic vector mapping. Environment adaptation is also adopted to reduce the mismatch between training data and test data. On the AURORA2 database, the experimental results show digit accuracy improvements of 0.77% for multi-condition training and 0.29% for clean speech training over the SPLICE-based approach with recursive noise estimation, achieved without stereo training data. On the MAT-BN Mandarin broadcast news database, syllable accuracy improved by 2.6% for anchor speech and 4.2% for field report speech compared to the MCE-based approach.
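As a rough illustration of the general idea behind stochastic vector mapping, the sketch below applies a generic SPLICE-style piecewise-linear correction: a noisy feature vector y is shifted by a posterior-weighted sum of per-component bias vectors, x_hat = y + sum_k p(k | y) r_k. This is not the method of the paper; it omits the noise normalization, the three prior models, and the environment adaptation described in the abstract. All names (gmm_posteriors, enhance, weights, means, variances, biases) and the toy values are assumptions made for illustration only.

```python
import numpy as np

def gmm_posteriors(y, weights, means, variances):
    """Posterior p(k | y) under a diagonal-covariance GMM.

    y: (D,) noisy feature vector; means, variances: (K, D); weights: (K,).
    """
    diff = y - means  # broadcast to (K, D)
    # Per-component log-likelihood of y under a diagonal Gaussian.
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1
    )
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def enhance(y, weights, means, variances, biases):
    """Generic bias-only mapping: x_hat = y + sum_k p(k | y) * r_k.

    biases: (K, D) correction vectors r_k (hypothetical values here; in
    practice they would be estimated from training data).
    """
    post = gmm_posteriors(y, weights, means, variances)
    return y + post @ biases

# Toy two-component example in two dimensions (illustrative numbers only).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [4.0, 4.0]])
variances = np.ones((2, 2))
biases = np.array([[-0.5, -0.5], [1.0, 1.0]])

y = np.array([0.1, -0.2])
x_hat = enhance(y, weights, means, variances, biases)
```

Because the toy input lies close to the first component's mean, almost all posterior mass falls on that component and the correction is dominated by its bias vector.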
Pages: 29-32
Page count: 4