Identification of Relevant and Redundant Automatic Metrics for MT Evaluation

Cited by: 7
Authors
Munk, Michal [1 ]
Munkova, Dasa [2 ]
Benko, L'ubomir [3 ]
Affiliations
[1] Constantine Philosopher Univ Nitra, Fac Nat Sci, Dept Informat, Nitra, Slovakia
[2] Constantine Philosopher Univ Nitra, Dept Translat Studies, Nitra, Slovakia
[3] Univ Pardubice, Inst Syst Engn & Informat, Pardubice, Czech Republic
Keywords
Machine translation; Evaluation; Automatic metrics; Reliability; Entropy; Redundancy
DOI
10.1007/978-3-319-49397-8_12
CLC number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on automatic metrics for translation quality assessment (TQA), specifically on machine translation (MT) output and the metrics used to evaluate it (Precision, Recall, F-measure, BLEU, PER, WER, and CDER). We examine their reliability and identify the metrics that decrease the reliability of the automatic evaluation of MT output. Besides the traditional measures (Cronbach's alpha and standardized alpha), we use entropy to assess the reliability of the automatic MT evaluation metrics. The results were obtained on a dataset covering translation from a low-resource language, Slovak (SK), into English (EN). The main contribution is the identification of redundant automatic MT evaluation metrics.
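As a minimal sketch of the reliability analysis the abstract describes, Cronbach's alpha can be computed over a matrix of sentence-level scores, one column per metric. The function and the score values below are illustrative assumptions, not data from the paper; it only shows the standard alpha formula applied to MT metric scores.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_sentences, n_metrics) score matrix."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each metric's scores
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the per-sentence sums
    return (n_items / (n_items - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical sentence-level scores for three metrics (e.g. BLEU, PER, WER),
# oriented so that higher means better translation quality.
scores = np.array([
    [0.62, 0.58, 0.60],
    [0.41, 0.44, 0.39],
    [0.75, 0.70, 0.72],
    [0.33, 0.30, 0.35],
    [0.55, 0.57, 0.53],
])
print(round(cronbach_alpha(scores), 3))
```

Highly correlated columns drive alpha toward 1, which is also how a redundant metric reveals itself: dropping it barely changes the scale's reliability.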
Pages: 141-152
Number of pages: 12
Related papers
50 records in total
  • [21] Sensitivity of automated MT evaluation metrics on higher quality MT output: BLEU vs task-based evaluation methods
    Babych, Bogdan
    Hartley, Anthony
    [J]. SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 2133 - 2136
  • [22] Reassessing Automatic Evaluation Metrics for Code Summarization Tasks
    Roy, Devjeet
    Fakhoury, Sarah
    Arnaoudova, Venera
    [J]. PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21), 2021, : 1105 - 1116
  • [23] Automatic Metrics for Machine Translation Evaluation and Minority Languages
    Munkova, Dasa
    Munk, Michal
    [J]. PROCEEDINGS OF THE MEDITERRANEAN CONFERENCE ON INFORMATION & COMMUNICATION TECHNOLOGIES 2015 (MEDCT 2015), VOL 2, 2016, 381 : 631 - 636
  • [24] Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech
    Laaridh, Imed
    Ben Kheder, Waad
    Fredouille, Corinne
    Meunier, Christine
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1834 - 1838
  • [25] A Study of Automatic Metrics for the Evaluation of Natural Language Explanations
    Clinciu, Miruna-Adriana
    Eshghi, Arash
    Hastie, Helen
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 2376 - 2387
  • [26] The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification
    Alva-Manchego, Fernando
    Scarton, Carolina
    Specia, Lucia
    [J]. COMPUTATIONAL LINGUISTICS, 2021, 47 (04) : 861 - 889
  • [27] The price of debiasing automatic metrics in natural language evaluation
    Chaganty, Arun Tejasvi
    Mussmann, Stephen
    Liang, Percy
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 643 - 653
  • [28] Significance tests of automatic machine translation evaluation metrics
    Zhang, Ying
    Vogel, Stephan
    [J]. MACHINE TRANSLATION, 2010, 24 (01) : 51 - 65
  • [29] Automatic Identification of Information Quality Metrics in Health News Stories
    Al-Jefri, Majed
    Evans, Roger
    Lee, Joon
    Ghezzi, Pietro
    [J]. FRONTIERS IN PUBLIC HEALTH, 2020, 8
  • [30] Empirical legal analysis simplified: reducing complexity through automatic identification and evaluation of legally relevant factors
    Gray, Morgan A.
    Savelka, Jaromir
    Oliver, Wesley M.
    Ashley, Kevin D.
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2024, 382 (2270)