Predicting automatic speech recognition performance using prosodic cues

Cited by: 0
Authors
Litman, DJ [1 ]
Hirschberg, JB [1 ]
Swerts, M [1 ]
Affiliation
[1] AT&T Labs Res, Florham Pk, NJ 07932 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In spoken dialogue systems, it is important for a system to know how likely a speech recognition hypothesis is to be correct, so it can reprompt for fresh input, or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have discovered prosodic features that predict when a recognition hypothesis contains a word error more accurately than the acoustic confidence score thresholds traditionally used in automatic speech recognition. We present analytic results indicating that there are significant prosodic differences between correctly and incorrectly recognized turns in the TOOT train information corpus. We then present machine learning results showing how the use of prosodic features to automatically predict correctly versus incorrectly recognized turns improves over the use of acoustic confidence scores alone.
Pages: A218-A225
Page count: 8
Related papers
  • [1] Prosodic and other cues to speech recognition failures
    Hirschberg, J
    Litman, D
    Swerts, M
    [J]. SPEECH COMMUNICATION, 2004, 43 (1-2) : 155 - 175
  • [2] Towards automatic detection of reported speech in dialogue using prosodic cues
    Cervone, Alessandra
    Lai, Catherine
    Pareti, Silvia
    Bell, Peter
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 3061 - 3065
  • [3] Automatic speech recognition using audio visual cues
    Yashwanth, H
    Mahendrakar, H
    David, S
    [J]. PROCEEDINGS OF THE IEEE INDICON 2004, 2004: 166 - 169
  • [4] Prosodic and accentual information for automatic speech recognition
    Milone, DH
    Rubio, AJ
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (04) : 321 - 333
  • [5] Prosodic knowledge sources for automatic speech recognition
    Vergyri, D
    Stolcke, A
    Gadde, VRR
    Ferrer, L
    Shriberg, E
    [J]. 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003: 208 - 211
  • [6] CONSONANTAL CUES FOR AUTOMATIC SPEECH RECOGNITION
    LARKIN, WD
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1960, 32 (11) : 1518 - 1518
  • [7] On using prosodic cues in automatic language identification
    ThymeGobbel, AE
    Hutchins, SE
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996: 1768 - 1771
  • [8] INFRASONIC CUES FOR AUTOMATIC RECOGNITION OF SPEECH SOUNDS
    MYASNIKO.
    MYASNIKO.EN
    PEKELNYI, MY
    TRILESNIK, A
    [J]. SOVIET PHYSICS ACOUSTICS-USSR, 1969, 14 (04) : 522 - +
  • [9] Intelligibility Rating with Automatic Speech Recognition, Prosodic, and Cepstral Evaluation
    Haderlein, Tino
    Moers, Cornelia
    Moebius, Bernd
    Rosanowski, Frank
    Noeth, Elmar
    [J]. TEXT, SPEECH AND DIALOGUE, TSD 2011, 2011, 6836 : 195 - 202
  • [10] Automatic Evaluation of Parkinson's Speech - Acoustic, Prosodic and Voice Related Cues
    Bocklet, Tobias
    Steidl, Stefan
    Noeth, Elmar
    Skodda, Sabine
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 1148 - 1152