On including temporal constraints in Viterbi alignment for speech recognition in noise

Cited by: 14

Authors:

Yoma, NB [1]

McInnes, FR [1]

Jack, MA [1]

Stump, SD [1]

Ling, LL [1]

Affiliations:

[1] Univ Chile, Dept Elect Engn, Santiago, Chile

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001 / Vol. 9 / Issue 02

Keywords:

duration modeling; noise robustness; speech recognition;

DOI:

10.1109/89.902285

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper addresses the problem of temporal constraints in the Viterbi algorithm in speaker-dependent and speaker-independent tasks. The results presented here suggest that, in a speaker-dependent task: (1) introducing temporal constraints can lead to a large improvement under additive or convolutional noise; (2) statistical modeling of state durations is not relevant once maximum and minimum state duration restrictions are imposed; and (3) truncated probability densities give better results than a previously proposed metric. Finally, word-position-dependent and word-position-independent temporal restrictions are compared in connected-word speech recognition experiments, and the former is shown to give better results at the same computational load. However, the effect of the duration model could be much less significant when the acoustic model is optimized and when the training and testing conditions are matched.
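
As a rough illustration of what imposing minimum and maximum state durations on Viterbi alignment can look like, the sketch below aligns frames to a left-to-right state sequence using a segment-based dynamic program. It is not taken from the paper: the function name `constrained_viterbi`, the inputs `log_obs`, `d_min`, and `d_max`, and the choice to ignore transition probabilities and duration densities (only hard bounds are enforced) are all assumptions made for this example.

```python
NEG_INF = float("-inf")

def constrained_viterbi(log_obs, d_min, d_max):
    """Align T frames to N left-to-right states under duration bounds.

    log_obs[t][s]: log-likelihood of frame t under state s (hypothetical input).
    d_min[s], d_max[s]: allowed consecutive frames in state s (assumes d_min[s] >= 1).
    Returns (best_score, path), or (NEG_INF, None) if no alignment is feasible.
    """
    T, N = len(log_obs), len(d_min)

    # prefix[s][t] = sum of log_obs[0..t-1][s], so any segment score is one subtraction.
    prefix = [[0.0] * (T + 1) for _ in range(N)]
    for s in range(N):
        for t in range(T):
            prefix[s][t + 1] = prefix[s][t] + log_obs[t][s]

    # best[s][t]: best score when the first s states cover exactly the first t frames.
    best = [[NEG_INF] * (T + 1) for _ in range(N + 1)]
    back = [[0] * (T + 1) for _ in range(N + 1)]  # duration chosen for state s-1
    best[0][0] = 0.0

    for s in range(N):
        for t in range(1, T + 1):
            # Only durations inside [d_min, d_max] are ever considered.
            for d in range(d_min[s], min(d_max[s], t) + 1):
                prev = best[s][t - d]
                if prev == NEG_INF:
                    continue
                score = prev + prefix[s][t] - prefix[s][t - d]
                if score > best[s + 1][t]:
                    best[s + 1][t] = score
                    back[s + 1][t] = d

    if best[N][T] == NEG_INF:
        return NEG_INF, None

    # Backtrack the chosen durations, then expand them into a frame-level path.
    durations, t = [], T
    for s in range(N, 0, -1):
        d = back[s][t]
        durations.append(d)
        t -= d
    durations.reverse()
    path = [s for s, d in enumerate(durations) for _ in range(d)]
    return best[N][T], path


# Toy data: only the first frame favors state 0, so a loosely bounded
# alignment leaves state 0 after a single frame.
toy = [[0.0, -5.0]] + [[-1.0, 0.0]] * 5
loose = constrained_viterbi(toy, d_min=[1, 1], d_max=[6, 6])
tight = constrained_viterbi(toy, d_min=[3, 1], d_max=[4, 6])
```

With the toy data, the loosely bounded call leaves state 0 after one frame, while the tighter bounds force state 0 to hold for three frames at a lower total score. A truncated duration density, as mentioned in the abstract, could be approximated by adding a per-duration log-probability term to `score` inside the inner loop; that extension is likewise an assumption, not the paper's formulation.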


Pages: 179 - 182

Number of pages: 4

50 records in total

[1] Weighted Viterbi algorithm and state duration modelling for speech recognition in noise
Yoma, NB
McInnes, FR
Jack, MA
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998 : 709 - 712
[2] A robust Viterbi algorithm against impulsive noise with application to speech recognition
Siu, Manhung
Chan, Arthur
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (06) : 2122 - 2133
[3] Noise robust speech recognition with state duration constraints
Laurila, K
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997 : 871 - 874
[4] Temporal Resolution and Speech Recognition in Noise in Adults with Hearing Aid
Gumus, Birgul
Derinsu, Ufuk
B-ENT, 2023, 19 (01) : 18 - 23
[5] Influence of Instantaneous Compression on Recognition of Speech in Noise with Temporal Dips
Rasetshwane, Daniel M.
Raybine, David A.
Kopun, Judy G.
Gorga, Michael P.
Neely, Stephen T.
JOURNAL OF THE AMERICAN ACADEMY OF AUDIOLOGY, 2019, 30 (01) : 16 - 30
[6] Combining feature compensation and Weighted Viterbi Decoding for noise robust speech recognition with limited adaptation data
Cui, XD
Alwan, A
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004 : 969 - 972
[7] SPEECH RECOGNITION IN NOISE, TEMPORAL AND SPECTRAL RESOLUTION IN NORMAL AND IMPAIRED HEARING
ARLINGER, S
DRYSELIUS, H
ACTA OTO-LARYNGOLOGICA, 1990 : 30 - 37
[8] Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise
Guangxin Hu
Sarah C. Determan
Yue Dong
Alec T. Beeve
Joshua E. Collins
Yan Gai
Journal of the Association for Research in Otolaryngology, 2020, 21 : 73 - 87
[9] On the temporal decorrelation of feature parameters for noise-robust speech recognition
Jung, HY
Lee, SY
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04) : 407 - 416
[10] Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise
Hu, Guangxin
Determan, Sarah C.
Dong, Yue
Beeve, Alec T.
Collins, Joshua E.
Gai, Yan
JARO-JOURNAL OF THE ASSOCIATION FOR RESEARCH IN OTOLARYNGOLOGY, 2020, 21 (01) : 73 - 87
