WHY WORD ERROR RATE IS NOT A GOOD METRIC FOR SPEECH RECOGNIZER TRAINING FOR THE SPEECH TRANSLATION TASK?

Cited by: 0
Authors
He, Xiaodong [1]
Deng, Li [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Res, Redmond, WA 98052 USA
Keywords
Speech translation; speech recognition; machine translation; translation metric; word error rate; BLEU score optimization; log-linear model
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech translation (ST) is an enabling technology for cross-lingual oral communication. An ST system consists of two major components: an automatic speech recognizer (ASR) and a machine translator (MT). Today, most ASR systems are trained and tuned by minimizing word error rate (WER). However, WER counts word errors only at the surface level: it does not consider the contextual and syntactic roles of a word, which are often critical for MT. In end-to-end ST scenarios, whether WER is a good metric for the ASR component of the full ST system remains an open question that has not been studied systematically. In this paper, we report our recent investigation of this question, focusing on the interactions between ASR and MT in an ST system. We show that BLEU-oriented global optimization of the ASR system parameters improves translation quality by 1.5% absolute in BLEU score over the conventional WER-optimized ASR system, at the cost of a higher WER. We also conducted an in-depth study of the impact of ASR errors on the final ST output. Our findings suggest that the speech recognizer component of a full ST system should be optimized with translation metrics instead of the traditional WER.
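The point that WER counts errors only at the surface level can be illustrated with a small sketch. The Python below computes WER in the standard way, as the word-level Levenshtein distance (substitutions, deletions, insertions) divided by the number of reference words. The helper names `word_edit_distance` and `wer` and the example sentences are hypothetical illustrations, not taken from the paper. Two hypotheses, each containing a single word error, receive exactly the same WER even though only one of them reverses the meaning that the downstream MT component would translate.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance normalized by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


# Hypothetical example: every word error costs the same, regardless of its role.
reference = "please do not close the door"
hyp_a = "please do close the door"    # deletes "not": the meaning is reversed
hyp_b = "please do not close a door"  # substitutes "the" -> "a": meaning largely kept

print(f"{wer(reference, hyp_a):.3f}")  # 0.167
print(f"{wer(reference, hyp_b):.3f}")  # 0.167
```

Both hypotheses score 1/6, yet dropping "not" would invert the translation while swapping "the" for "a" would barely change it. WER has no way to separate the two cases, which is the gap that motivates tuning the recognizer against a translation metric instead.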
Pages: 5632-5635
Number of pages: 4
Related papers (50 in total)
  • [41] METHODS OF CONTROLLING WORD RATE OF RECORDED SPEECH
    FOULKE, E
    [J]. JOURNAL OF COMMUNICATION, 1970, 20 (03) : 305 - 314
  • [42] Accounting for Speech Rate in Spoken Word Recognition
    Li, David Cheng-Huan
    Kaiser, Elsi
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2007 - 2010
  • [43] Unified Speech-Text Pre-training for Speech Translation and Recognition
    Tang, Yun
    Gong, Hongyu
    Dong, Ning
    Wang, Changhan
    Hsu, Wei-Ning
    Gu, Jiatao
    Baevski, Alexei
    Li, Xian
    Mohamed, Abdelrahman
    Auli, Michael
    Pino, Juan
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1488 - 1499
  • [44] A programmable application-specific VLSI architecture and implementation for speech word-recognizer
    Suen, AN
    Wang, JF
    Wang, TD
    [J]. PROCEEDINGS OF THE ASP-DAC '97 - ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 1997, 1996, : 71 - 75
  • [45] Using a large vocabulary continuous speech recognizer for a constrained domain with limited training
    Siu, MH
    Jonas, M
    Gish, H
    [J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 105 - 108
  • [46] Speech Synthesis for Error Training Models in CALL
    Zhang, Xin
    Lu, Qin
    Wan, Jiping
    Ma, Guangguang
    Chiu, Tin Shing
    Ye, Weiping
    Zhou, Wenli
    Li, Qiao
    [J]. COMPUTER PROCESSING OF ORIENTAL LANGUAGES: LANGUAGE TECHNOLOGY FOR THE KNOWLEDGE-BASED ECONOMY, 2009, 5459 : 260 - +
  • [47] Word Error Rate Comparison between Single and Double Radar Solutions for Silent Speech Recognition
    Lee, Sunghwa
    Seo, Jiwon
    [J]. 2019 19TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2019), 2019, : 1211 - 1214
  • [48] A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks
    Zhou, Yue
    Yuan, Yuxuan
    Shi, Xiaodong
    [J]. NEURAL COMPUTING & APPLICATIONS, 2024, 36 (15): : 8641 - 8656
  • [49] A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks
    Yue Zhou
    Yuxuan Yuan
    Xiaodong Shi
    [J]. Neural Computing and Applications, 2024, 36 : 8641 - 8656
  • [50] Remote Spoken Document Retrieval using Foreground Speech Segmentation based Isolated Word Recognizer
    Deepak, K. T.
    Prasanna, S. R. Mahadeva
    [J]. 2013 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2013,