WHY WORD ERROR RATE IS NOT A GOOD METRIC FOR SPEECH RECOGNIZER TRAINING FOR THE SPEECH TRANSLATION TASK?

Authors
He, Xiaodong [1]
Deng, Li [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Research, Redmond, WA 98052 USA
Keywords
Speech translation; speech recognition; machine translation; translation metric; word error rate; BLEU score optimization; log-linear model
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech translation (ST) is an enabling technology for cross-lingual oral communication. An ST system consists of two major components: an automatic speech recognizer (ASR) and a machine translator (MT). Today, most ASR systems are trained and tuned by minimizing word error rate (WER). However, WER counts word errors only at the surface level: it does not consider the contextual and syntactic roles of a word, which are often critical for MT. In end-to-end ST scenarios, whether WER is a good metric for the ASR component of the full ST system is an open question that lacks systematic study. In this paper, we report our recent investigation of this question, focusing on the interactions between ASR and MT in an ST system. We show that BLEU-oriented global optimization of the ASR system parameters improves translation quality by an absolute 1.5% BLEU over the conventional WER-optimized ASR system, at the cost of a higher WER. We also conducted an in-depth study of the impact of ASR errors on the final ST output. Our findings suggest that the speech recognizer component of the full ST system should be optimized with translation metrics instead of the traditional WER.
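To make the "surface level" point concrete, here is a minimal sketch, not taken from the paper, that computes WER as word-level Levenshtein distance divided by the reference length. The function name `word_error_rate` and the example sentences are hypothetical illustrations. One hypothesis deletes a negation (reversing the meaning) and the other swaps an article (preserving it), yet WER scores both identically.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions)
    divided by the number of reference words. Hypothetical helper."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


reference = "please do not close the door"
drops_negation = "please do close the door"   # one deletion, meaning reversed
swaps_article = "please do not close a door"  # one substitution, meaning kept

# Each hypothesis has exactly one word error, so both WERs equal 1/6.
print(word_error_rate(reference, drops_negation))
print(word_error_rate(reference, swaps_article))
```

Because every edit costs 1 regardless of which word is affected, WER cannot tell these two outputs apart, although the first would likely yield a badly wrong translation. A metric computed on the downstream translation, such as BLEU, is the kind of signal the abstract argues the ASR component should be tuned on.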
Pages: 5632-5635
Number of pages: 4