On including temporal constraints in Viterbi alignment for speech recognition in noise

Cited by: 14
Authors
Yoma, NB [1]
McInnes, FR [1]
Jack, MA [1]
Stump, SD [1]
Ling, LL [1]
Affiliations
[1] Univ Chile, Dept Elect Engn, Santiago, Chile
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001 / Vol. 9 / No. 2
Keywords
duration modeling; noise robustness; speech recognition;
DOI
10.1109/89.902285
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper addresses the problem of temporal constraints in the Viterbi algorithm in speaker-dependent and speaker-independent tasks. The results presented here suggest that in a speaker-dependent task the introduction of temporal constraints can lead to a substantial improvement under additive or convolutional noise, the statistical modeling of state durations is not relevant if maximum and minimum state duration restrictions are imposed, and truncated probability densities give better results than a previously proposed metric. Finally, word-position-dependent and word-position-independent temporal restrictions are compared in connected-word speech recognition experiments, and the former are shown to give better results at the same computational load. However, the effect of the duration model could be much less significant when the acoustic model is optimized and when the training and testing conditions are matched.
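As a rough illustration of what imposing maximum and minimum state duration restrictions on a Viterbi alignment can look like, the sketch below aligns frames to a strictly left-to-right state sequence under hard duration bounds. It is not the paper's implementation: the function name `constrained_viterbi`, the inputs `log_b`, `d_min`, and `d_max`, the omission of transition probabilities, and the toy scores are all assumptions made for this example.

```python
import math

def constrained_viterbi(log_b, d_min, d_max):
    """Align T frames to S left-to-right states with hard duration bounds.

    log_b[t][s]: finite log-likelihood of frame t under state s (assumed given).
    d_min[s], d_max[s]: allowed number of frames in state s (d_min[s] >= 1).
    Returns (best_log_score, durations), or (-inf, None) if no path fits.
    """
    T, S = len(log_b), len(d_min)
    NEG = -math.inf
    # prefix[s][t] = sum of log_b[0..t-1][s], so a segment score is one subtraction.
    prefix = [[0.0] * (T + 1) for _ in range(S)]
    for s in range(S):
        for t in range(T):
            prefix[s][t + 1] = prefix[s][t] + log_b[t][s]
    # best[s][t] = best score when states 0..s cover exactly frames 0..t-1.
    best = [[NEG] * (T + 1) for _ in range(S)]
    back = [[0] * (T + 1) for _ in range(S)]
    for s in range(S):
        for t in range(1, T + 1):
            for d in range(d_min[s], min(d_max[s], t) + 1):
                if s == 0:
                    # The first state must start at frame 0.
                    prev = 0.0 if t == d else NEG
                else:
                    prev = best[s - 1][t - d]
                if prev == NEG:
                    continue
                score = prev + prefix[s][t] - prefix[s][t - d]
                if score > best[s][t]:
                    best[s][t], back[s][t] = score, d
    if best[S - 1][T] == NEG:
        return NEG, None
    # Trace back the chosen duration of each state, last state first.
    durations, t = [], T
    for s in range(S - 1, -1, -1):
        durations.append(back[s][t])
        t -= back[s][t]
    return best[S - 1][T], durations[::-1]

# Toy scores (made up): every frame after the first favors state 1.
log_b = [[-1, -5], [-4, -1], [-4, -1], [-4, -1], [-4, -1], [-4, -1]]
loose = constrained_viterbi(log_b, [1, 1], [6, 6])  # bounds too wide to matter
tight = constrained_viterbi(log_b, [2, 2], [4, 4])  # 2 to 4 frames per state
```

With the loose bounds, state 0 is left after a single frame; with the tight bounds, the alignment is forced to keep it for at least two frames, which is the kind of implausible-duration path a hard temporal constraint rules out. Replacing the hard bounds with a truncated duration density would amount to adding a per-state log-probability of `d` to `score` inside the same loop.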
Pages: 179 - 182
Page count: 4
Related papers
50 in total
  • [1] Weighted Viterbi algorithm and state duration modelling for speech recognition in noise
    Yoma, NB
    McInnes, FR
    Jack, MA
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 709 - 712
  • [2] A robust Viterbi algorithm against impulsive noise with application to speech recognition
    Siu, Manhung
    Chan, Arthur
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (06): : 2122 - 2133
  • [3] Noise robust speech recognition with state duration constraints
    Laurila, K
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 871 - 874
  • [4] Temporal Resolution and Speech Recognition in Noise in Adults with Hearing Aid
    Gumus, Birgul
    Derinsu, Ufuk
    B-ENT, 2023, 19 (01) : 18 - 23
  • [5] Influence of Instantaneous Compression on Recognition of Speech in Noise with Temporal Dips
    Rasetshwane, Daniel M.
    Raybine, David A.
    Kopun, Judy G.
    Gorga, Michael P.
    Neely, Stephen T.
    JOURNAL OF THE AMERICAN ACADEMY OF AUDIOLOGY, 2019, 30 (01) : 16 - 30
  • [6] Combining feature compensation and Weighted Viterbi Decoding for noise robust speech recognition with limited adaptation data
    Cui, XD
    Alwan, A
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 969 - 972
  • [7] SPEECH RECOGNITION IN NOISE, TEMPORAL AND SPECTRAL RESOLUTION IN NORMAL AND IMPAIRED HEARING
    ARLINGER, S
    DRYSELIUS, H
    ACTA OTO-LARYNGOLOGICA, 1990, : 30 - 37
  • [8] Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise
    Guangxin Hu
    Sarah C. Determan
    Yue Dong
    Alec T. Beeve
    Joshua E. Collins
    Yan Gai
    Journal of the Association for Research in Otolaryngology, 2020, 21 : 73 - 87
  • [9] On the temporal decorrelation of feature parameters for noise-robust speech recognition
    Jung, HY
    Lee, SY
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): : 407 - 416
  • [10] Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise
    Hu, Guangxin
    Determan, Sarah C.
    Dong, Yue
    Beeve, Alec T.
    Collins, Joshua E.
    Gai, Yan
    JARO-JOURNAL OF THE ASSOCIATION FOR RESEARCH IN OTOLARYNGOLOGY, 2020, 21 (01): : 73 - 87