Predicting automatic speech recognition performance using prosodic cues

Cited by: 0

Authors:

Litman, DJ [1]

Hirschberg, JB [1]

Swerts, M [1]

Affiliations:

[1] AT&T Labs Res, Florham Pk, NJ 07932 USA

Source:

6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP | 2000

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In spoken dialogue systems, it is important for a system to know how likely a speech recognition hypothesis is to be correct, so it can reprompt for fresh input, or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have discovered prosodic features which more accurately predict when a recognition hypothesis contains a word error than the acoustic confidence score thresholds traditionally used in automatic speech recognition. We present analytic results indicating that there are significant prosodic differences between correctly and incorrectly recognized turns in the TOOT train information corpus. We then present machine learning results showing how the use of prosodic features to automatically predict correctly versus incorrectly recognized turns improves over the use of acoustic confidence scores alone.

Pages: A218-A225

Page count: 8