WHY WORD ERROR RATE IS NOT A GOOD METRIC FOR SPEECH RECOGNIZER TRAINING FOR THE SPEECH TRANSLATION TASK?

Times cited: 0

Authors:

He, Xiaodong [1]

Deng, Li [1]

Acero, Alex [1]

Affiliation:

[1] Microsoft Res, Redmond, WA 98052 USA

Source:

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2011

Keywords:

Speech translation; speech recognition; machine translation; translation metric; word error rate; BLEU score optimization; log-linear model;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Speech translation (ST) is an enabling technology for cross-lingual oral communication. An ST system consists of two major components: an automatic speech recognizer (ASR) and a machine translator (MT). Most current ASR systems are trained and tuned by minimizing word error rate (WER). However, WER counts word errors at the surface level: it does not consider the contextual and syntactic roles of a word, which are often critical for MT. In end-to-end ST scenarios, whether WER is a good metric for the ASR component of the full ST system is an open question that lacks systematic study. In this paper, we report our recent investigation of this issue, focusing on the interactions between ASR and MT in an ST system. We show that BLEU-oriented global optimization of the ASR system parameters improves translation quality by 1.5% absolute in BLEU score, at the cost of a worse WER than the conventional, WER-optimized ASR system. We also conduct an in-depth study of the impact of ASR errors on the final ST output. Our findings suggest that the speech recognizer component of a full ST system should be optimized with translation metrics rather than the traditional WER.
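WER is conventionally defined as the word-level Levenshtein distance between the recognizer output and the reference transcript, divided by the number of reference words, i.e. (S + D + I) / N for substitutions, deletions, and insertions. The minimal sketch below implements that standard definition; it is not code from the paper, and the function name `word_error_rate` and the example sentences are hypothetical. It illustrates the abstract's point that WER charges every word error the same cost, regardless of the word's role for translation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Hypothetical helper for illustration; whitespace tokenization is assumed.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits to turn the reference prefix seen so far
    # into the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # column 0: i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # cost 0 if words match
            deletion = prev[j] + 1                 # reference word dropped
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "please do not open the door"
    # Hypothetical ASR outputs, each with exactly one deleted word.
    drops_negation = "please do open the door"  # meaning is inverted
    drops_article = "please do not open door"   # meaning is preserved
    # Both hypotheses receive the same WER (1/6).
    print(word_error_rate(reference, drops_negation))
    print(word_error_rate(reference, drops_article))
```

Both hypotheses score identically under WER, although a translation of the first would reverse the speaker's request; this is the kind of mismatch that motivates tuning the recognizer with a translation metric instead.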


Pages: 5632-5635

Number of pages: 4
