Identification of Relevant and Redundant Automatic Metrics for MT Evaluation

Cited by: 7
Authors
Munk, Michal [1 ]
Munkova, Dasa [2 ]
Benko, L'ubomir [3 ]
Affiliations
[1] Constantine Philosopher Univ Nitra, Fac Nat Sci, Dept Informat, Nitra, Slovakia
[2] Constantine Philosopher Univ Nitra, Dept Translat Studies, Nitra, Slovakia
[3] Univ Pardubice, Inst Syst Engn & Informat, Pardubice, Czech Republic
Keywords
Machine translation; Evaluation; Automatic metrics; Reliability; Entropy; Redundancy
DOI
10.1007/978-3-319-49397-8_12
CLC number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on automatic metrics for translation quality assessment (TQA), specifically on machine translation (MT) output and the metrics used to evaluate it (Precision, Recall, F-measure, BLEU, PER, WER, and CDER). We examine their reliability and identify the metrics that decrease the reliability of the automatic evaluation of MT output. Besides the traditional measures (Cronbach's alpha and standardized alpha), we use entropy to assess the reliability of the automatic MT evaluation metrics. The results were obtained on a dataset covering translation from a low-resource language, Slovak (SK), into English (EN). The main contribution is the identification of redundant automatic MT evaluation metrics.
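As a minimal sketch of the reliability analysis the abstract describes, Cronbach's alpha can be computed over a matrix of sentence-level scores, one column per metric. The function and the score values below are illustrative assumptions, not data from the paper; it only shows the standard alpha formula applied to MT metric scores.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_sentences, n_metrics) score matrix."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each metric's scores
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the per-sentence sums
    return (n_items / (n_items - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical sentence-level scores for three metrics (e.g. BLEU, PER, WER),
# oriented so that higher means better translation quality.
scores = np.array([
    [0.62, 0.58, 0.60],
    [0.41, 0.44, 0.39],
    [0.75, 0.70, 0.72],
    [0.33, 0.30, 0.35],
    [0.55, 0.57, 0.53],
])
print(round(cronbach_alpha(scores), 3))
```

Highly correlated columns drive alpha toward 1, which is also how a redundant metric reveals itself: dropping it barely changes the scale's reliability.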
Pages: 141-152
Number of pages: 12
Related papers
50 records in total
  • [21] Sensitivity of automated MT evaluation metrics on higher quality MT output: BLEU vs task-based evaluation methods
    Babych, Bogdan
    Hartley, Anthony
    [J]. SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 2133 - 2136
  • [22] Reassessing Automatic Evaluation Metrics for Code Summarization Tasks
    Roy, Devjeet
    Fakhoury, Sarah
    Arnaoudova, Venera
    [J]. PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21), 2021, : 1105 - 1116
  • [23] Automatic Metrics for Machine Translation Evaluation and Minority Languages
    Munkova, Dasa
    Munk, Michal
    [J]. PROCEEDINGS OF THE MEDITERRANEAN CONFERENCE ON INFORMATION & COMMUNICATION TECHNOLOGIES 2015 (MEDCT 2015), VOL 2, 2016, 381 : 631 - 636
  • [24] Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech
    Laaridh, Imed
    Ben Kheder, Waad
    Fredouille, Corinne
    Meunier, Christine
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1834 - 1838
  • [25] A Study of Automatic Metrics for the Evaluation of Natural Language Explanations
    Clinciu, Miruna-Adriana
    Eshghi, Arash
    Hastie, Helen
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 2376 - 2387
  • [26] The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification
    Alva-Manchego, Fernando
    Scarton, Carolina
    Specia, Lucia
    [J]. COMPUTATIONAL LINGUISTICS, 2021, 47 (04) : 861 - 889
  • [27] The price of debiasing automatic metrics in natural language evaluation
    Chaganty, Arun Tejasvi
    Mussmann, Stephen
    Liang, Percy
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 643 - 653
  • [28] Significance tests of automatic machine translation evaluation metrics
    Zhang, Ying
    Vogel, Stephan
    [J]. MACHINE TRANSLATION, 2010, 24 (01) : 51 - 65
  • [29] Automatic Identification of Information Quality Metrics in Health News Stories
    Al-Jefri, Majed
    Evans, Roger
    Lee, Joon
    Ghezzi, Pietro
    [J]. FRONTIERS IN PUBLIC HEALTH, 2020, 8
  • [30] Empirical legal analysis simplified: reducing complexity through automatic identification and evaluation of legally relevant factors
    Gray, Morgan A.
    Savelka, Jaromir
    Oliver, Wesley M.
    Ashley, Kevin D.
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2024, 382 (2270)