Predicting automatic speech recognition performance using prosodic cues

Authors
Litman, DJ [1]
Hirschberg, JB [1]
Swerts, M [1]
Affiliation
[1] AT&T Labs Res, Florham Pk, NJ 07932 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In spoken dialogue systems, it is important for a system to know how likely a speech recognition hypothesis is to be correct, so that it can reprompt for fresh input or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have discovered prosodic features that predict when a recognition hypothesis contains a word error more accurately than the acoustic confidence score thresholds traditionally used in automatic speech recognition. We present analytic results indicating that there are significant prosodic differences between correctly and incorrectly recognized turns in the TOOT train information corpus. We then present machine learning results showing that using prosodic features to automatically predict correctly versus incorrectly recognized turns improves over the use of acoustic confidence scores alone.
Pages: A218-A225
Page count: 8