PREDICTING WORD ERROR RATE FOR REVERBERANT SPEECH

Cited by: 0
Authors
Gamper, Hannes [1 ]
Emmanouilidou, Dimitra [1 ]
Braun, Sebastian [1 ]
Tashev, Ivan J. [1 ]
Affiliations
[1] Microsoft Res, One Microsoft Way, Redmond, WA 98052 USA
Keywords
Distant speech recognition; ASR; reverberation; T60; C50; RECOGNITION;
DOI
10.1109/icassp40776.2020.9053025
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
Reverberation negatively impacts the performance of automatic speech recognition (ASR). Prior work on quantifying the effect of reverberation has shown that clarity (C50), a parameter that can be estimated from the acoustic impulse response, is correlated with ASR performance. In this paper we propose predicting ASR performance in terms of the word error rate (WER) directly from acoustic parameters via a polynomial, sigmoidal, or neural network fit, as well as blindly from reverberant speech samples using a convolutional neural network (CNN). We carry out experiments on two state-of-the-art ASR models and a large set of acoustic impulse responses (AIRs). The results confirm C50 and C80 to be highly correlated with WER, allowing WER to be predicted with the proposed fitting approaches. The proposed non-intrusive CNN model outperforms C50-based WER prediction, indicating that WER can be estimated blindly, i.e., directly from the reverberant speech samples without knowledge of the acoustic parameters.
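The abstract notes that clarity (C50) can be estimated from the acoustic impulse response. As a minimal sketch, not the authors' implementation, the function below computes the standard clarity index as the ratio of early to late energy in dB. The function name `clarity_db`, its parameters, and the choice of the largest absolute sample as the direct-path onset are assumptions made for illustration; setting `early_ms=80.0` gives C80.

```python
import math


def clarity_db(air, fs, early_ms=50.0):
    """Clarity index in dB for an impulse response (hypothetical helper).

    air      -- sequence of impulse response samples
    fs       -- sampling rate in Hz
    early_ms -- early/late boundary in ms (50 for C50, 80 for C80)
    """
    if not any(air):
        raise ValueError("impulse response is empty or all zeros")

    # Assumption: the direct path is the sample with the largest magnitude.
    onset = max(range(len(air)), key=lambda i: abs(air[i]))

    # Index of the first sample counted as late energy.
    split = onset + round(early_ms * 1e-3 * fs)

    early = sum(x * x for x in air[onset:split])
    late = sum(x * x for x in air[split:])

    # No energy after the boundary: the ratio is unbounded.
    if late == 0.0:
        return math.inf
    return 10.0 * math.log10(early / late)


# Synthetic two-tap response at 1 kHz: direct path plus one reflection at 60 ms.
air = [0.0] * 100
air[0] = 1.0
air[60] = 0.1
c50 = clarity_db(air, fs=1000)                  # reflection counts as late energy
c80 = clarity_db(air, fs=1000, early_ms=80.0)   # reflection counts as early energy
```

In this toy case the reflection falls after the 50 ms boundary, so C50 is about 20 dB, while C80 is unbounded because nothing arrives after 80 ms. Measured impulse responses would normally also need noise-floor handling, which this sketch omits.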
Pages: 491-495
Page count: 5