PREDICTING WORD ERROR RATE FOR REVERBERANT SPEECH

Authors
Gamper, Hannes [1 ]
Emmanouilidou, Dimitra [1 ]
Braun, Sebastian [1 ]
Tashev, Ivan J. [1 ]
Affiliations
[1] Microsoft Res, One Microsoft Way, Redmond, WA 98052 USA
Keywords
Distant speech recognition; ASR; reverberation; T60; C50; RECOGNITION;
DOI
10.1109/icassp40776.2020.9053025
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Reverberation negatively impacts the performance of automatic speech recognition (ASR). Prior work on quantifying the effect of reverberation has shown that clarity (C50), a parameter that can be estimated from the acoustic impulse response, is correlated with ASR performance. In this paper we propose predicting ASR performance in terms of the word error rate (WER) directly from acoustic parameters via a polynomial, sigmoidal, or neural network fit, as well as blindly from reverberant speech samples using a convolutional neural network (CNN). We carry out experiments on two state-of-the-art ASR models and a large set of acoustic impulse responses (AIRs). The results confirm C50 and C80 to be highly correlated with WER, allowing WER to be predicted with the proposed fitting approaches. The proposed non-intrusive CNN model outperforms C50-based WER prediction, indicating that WER can be estimated blindly, i.e., directly from the reverberant speech samples without knowledge of the acoustic parameters.
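As an illustration of the quantities named in the abstract, the sketch below computes clarity (C50 or C80) from an acoustic impulse response as the early-to-late energy ratio in dB, and maps C50 to a WER value with a logistic curve. It is a minimal sketch under stated assumptions, not the authors' implementation: the AIR is assumed to start at the direct-path arrival, `synthetic_air` is a toy exponential decay rather than a measured response, and the logistic form and all parameters of `sigmoid_wer` are hypothetical placeholders, since the record does not give the fitted function.

```python
import math


def clarity_db(air, sample_rate, early_ms=50.0):
    """Early-to-late energy ratio in dB (early_ms=50 gives C50, 80 gives C80).

    Assumption: air[0] is the direct-path arrival.
    """
    split = int(round(sample_rate * early_ms / 1000.0))
    early = sum(x * x for x in air[:split])
    late = sum(x * x for x in air[split:])
    if early <= 0.0 or late <= 0.0:
        raise ValueError("AIR needs energy on both sides of the split point")
    return 10.0 * math.log10(early / late)


def synthetic_air(t60_s, sample_rate, duration_s=1.0):
    """Toy AIR: amplitude envelope that decays by 60 dB after t60_s seconds."""
    n = int(sample_rate * duration_s)
    return [10.0 ** (-3.0 * (i / sample_rate) / t60_s) for i in range(n)]


def sigmoid_wer(c50_db, wer_min, wer_max, slope, midpoint_db):
    """Hypothetical logistic mapping: WER falls from wer_max toward wer_min
    as C50 increases (for slope > 0)."""
    return wer_min + (wer_max - wer_min) / (
        1.0 + math.exp(slope * (c50_db - midpoint_db))
    )


if __name__ == "__main__":
    fs = 16000
    for t60 in (0.3, 0.6, 0.9):
        c50 = clarity_db(synthetic_air(t60, fs), fs)
        # Placeholder parameters, chosen only to show the shape of the curve.
        wer = sigmoid_wer(c50, wer_min=0.05, wer_max=0.65, slope=0.4, midpoint_db=5.0)
        print(f"T60={t60:.1f} s  C50={c50:.2f} dB  illustrative WER={wer:.3f}")
```

In practice the curve parameters would be fitted to measured (C50, WER) pairs for a given ASR model; the values used here carry no empirical meaning.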
Pages: 491-495
Page count: 5