On including temporal constraints in Viterbi alignment for speech recognition in noise

Cited by: 14
Authors
Yoma, NB [1]
McInnes, FR [1]
Jack, MA [1]
Stump, SD [1]
Ling, LL [1]
Affiliations
[1] Univ Chile, Dept Elect Engn, Santiago, Chile
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001, Vol. 9, No. 2
Keywords
duration modeling; noise robustness; speech recognition
DOI
10.1109/89.902285
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper addresses the problem of temporal constraints in the Viterbi algorithm in speaker-dependent and speaker-independent tasks. The results presented here suggest that, in a speaker-dependent task: (1) introducing temporal constraints can lead to a substantial improvement under additive or convolutional noise; (2) the statistical modeling of state durations is not relevant if maximum and minimum state duration restrictions are imposed; and (3) truncated probability densities give better results than a previously proposed metric. Finally, word-position-dependent and word-position-independent temporal restrictions are compared in connected-word speech recognition experiments, and the former is shown to lead to better results at the same computational load. However, the effect of the duration model could be much less significant when the acoustic model is optimized and the training and testing conditions are matched.
Pages: 179-182
Page count: 4
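
The abstract's central idea, imposing minimum and maximum state durations inside the Viterbi alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a left-to-right HMM without skips, uniform transition scores, and precomputed frame log-likelihoods, and it applies only hard duration bounds (the truncated probability densities mentioned in the abstract are not modeled). The names `constrained_viterbi`, `log_b`, `d_min`, and `d_max` are hypothetical.

```python
NEG_INF = float("-inf")


def constrained_viterbi(log_b, d_min, d_max):
    """Align T frames to a left-to-right HMM with per-state duration bounds.

    log_b[t][j]  : log-likelihood of frame t under state j (assumed given)
    d_min[j]     : minimum consecutive frames in state j (assumed >= 1)
    d_max[j]     : maximum consecutive frames in state j
    Returns (best_log_score, state_path), or (-inf, []) if no path is feasible.
    """
    T, N = len(log_b), len(d_min)

    def empty(fill):
        # One slot per (state, duration); index 0 of each row is unused.
        return [[fill] * (d_max[j] + 1) for j in range(N)]

    # score[j][d]: best score of a path that is in state j at the current
    # frame and has occupied it for exactly d consecutive frames.
    score = empty(NEG_INF)
    score[0][1] = log_b[0][0]
    back = [empty(None)]  # back[t][j][d] = (previous state, previous duration)

    for t in range(1, T):
        new, ptr = empty(NEG_INF), empty(None)
        for j in range(N):
            for d in range(1, d_max[j] + 1):
                s = score[j][d]
                if s == NEG_INF:
                    continue
                # Stay in state j only while the maximum duration allows it.
                if d + 1 <= d_max[j]:
                    cand = s + log_b[t][j]
                    if cand > new[j][d + 1]:
                        new[j][d + 1], ptr[j][d + 1] = cand, (j, d)
                # Leave state j only once the minimum duration is reached.
                if j + 1 < N and d >= d_min[j]:
                    cand = s + log_b[t][j + 1]
                    if cand > new[j + 1][1]:
                        new[j + 1][1], ptr[j + 1][1] = cand, (j, d)
        score = new
        back.append(ptr)

    # The path must end in the last state with an admissible duration.
    last = N - 1
    best_d = max(range(d_min[last], d_max[last] + 1), key=lambda d: score[last][d])
    best = score[last][best_d]
    if best == NEG_INF:
        return NEG_INF, []

    # Trace the (state, duration) pointers back to frame 0.
    path, j, d = [], last, best_d
    for t in range(T - 1, -1, -1):
        path.append(j)
        if t > 0:
            j, d = back[t][j][d]
    path.reverse()
    return best, path


# Frame 1 scores better under state 1, but d_min forces two frames in state 0.
log_b = [[0.0, -5.0]] + [[-5.0, 0.0]] * 5
score, path = constrained_viterbi(log_b, d_min=[2, 2], d_max=[4, 4])
```

In this toy input, an unconstrained alignment would leave state 0 after a single frame; with `d_min = [2, 2]` the alignment is forced to keep state 0 for two frames. Expanding each state by its elapsed duration multiplies the search space by roughly the largest `d_max`, which is the usual cost of this kind of hard constraint.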