WHY WORD ERROR RATE IS NOT A GOOD METRIC FOR SPEECH RECOGNIZER TRAINING FOR THE SPEECH TRANSLATION TASK?

Authors
He, Xiaodong [1]
Deng, Li [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Research, Redmond, WA 98052, USA
Keywords
Speech translation; speech recognition; machine translation; translation metric; word error rate; BLEU score optimization; log-linear model
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech translation (ST) is an enabling technology for cross-lingual oral communication. An ST system consists of two major components: an automatic speech recognizer (ASR) and a machine translator (MT). Most current ASR systems are trained and tuned by minimizing word error rate (WER). However, WER counts word errors only at the surface level: it ignores the contextual and syntactic roles of words, which are often critical for MT. In end-to-end ST scenarios, whether WER is a good metric for the ASR component of the full ST system remains an open question that has not been studied systematically. In this paper, we report our recent investigation of this issue, focusing on the interactions between ASR and MT in an ST system. We show that BLEU-oriented global optimization of the ASR system parameters improves translation quality by 1.5% absolute in BLEU score, even though it yields a worse WER than the conventional, WER-optimized ASR system. We also conduct an in-depth study of the impact of ASR errors on the final ST output. Our findings suggest that the speech recognizer component of a full ST system should be optimized with translation metrics rather than the traditional WER.
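The abstract's point that WER counts errors only at the surface level can be illustrated with a small sketch. The code below is our own illustration, not code from the paper: it computes WER in the usual way, as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical. Both hypotheses contain exactly one substituted word and therefore receive the same WER, yet one only swaps an article while the other loses the negation, which would reverse the meaning of any downstream translation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: two single-word errors with very different impact on meaning.
reference = "i do not want the red car"
hyp_minor = "i do not want a red car"   # article substituted: meaning preserved
hyp_major = "i do now want the red car"  # negation lost: meaning reversed
print(word_error_rate(reference, hyp_minor))
print(word_error_rate(reference, hyp_major))
```

Because every word carries the same unit cost, WER cannot distinguish these two errors, whereas a translation metric such as BLEU computed on the final ST output would be sensitive to the lost negation.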
Pages: 5632-5635
Number of pages: 4