Towards the use of entropy as a measure for the reliability of automatic MT evaluation metrics

Cited by: 10

Authors:

Munk, Michal [1]

Munkova, Dasa [2]

Benko, Lubomir [3]

Affiliations:

[1] Constantine Philosopher Univ Nitra, Dept Informat, Nitra, Slovakia

[2] Constantine Philosopher Univ Nitra, Dept Translat Studies, Nitra, Slovakia

[3] Univ Pardubice, Inst Syst Engn & Informat, Studentska 95, Pardubice 53210, Czech Republic

Source:

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS | 2018 / Vol. 34 / No. 5

Keywords:

Entropy; machine translation; reliability estimation; quality; automatic MT evaluation

DOI:

10.3233/JIFS-169505

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The study describes an experiment comparing different estimates of reliability. Reliability reflects the technical quality of a measurement procedure, such as the automatic evaluation of machine translation (MT). It indicates how accurately the procedure measures; in our case, how accurately automatic metrics (precision, recall, F-measure, BLEU-n, WER, PER, and CDER) measure the accuracy and error rate of MT output. Using entropy, the experiment identified the metrics (BLEU-4 and WER) that reduce the overall reliability of the automatic evaluation of accuracy and error rate. The results indicate that estimating reliability with entropy gives more accurate results than conventional estimates of reliability (Cronbach's alpha and correlation). Entropy-based MT evaluation built on n-grams or edit distance could offer a new view of lexicon-based metrics compared with commonly used ones.
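The abstract contrasts a conventional reliability estimate (Cronbach's alpha) with an entropy-based one. The sketch below shows only the textbook definitions of the two quantities. It is not the authors' procedure, which this record does not describe. The function names, the equal-width binning of scores, and the sample scores are all assumptions made for illustration.

```python
import math
from collections import Counter


def cronbach_alpha(scores):
    """Cronbach's alpha; rows are translated segments, columns are metrics."""
    k = len(scores[0])  # number of metrics (items)

    def variance(values):
        # Sample variance with an (n - 1) denominator.
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def shannon_entropy(values, bins=4):
    """Shannon entropy in bits of scores in [0, 1] after equal-width binning.

    The binning scheme is an assumption of this sketch.
    """
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical per-segment scores: columns are precision, recall, F-measure.
sample = [
    [0.62, 0.58, 0.60],
    [0.40, 0.45, 0.42],
    [0.81, 0.77, 0.79],
    [0.55, 0.50, 0.52],
]
alpha = cronbach_alpha(sample)
entropy_per_metric = [shannon_entropy([row[j] for row in sample]) for j in range(3)]
```

Alpha summarizes how consistently the metrics vary together across segments. The per-metric entropy describes how spread out each metric's scores are over the bins, so the two numbers are not directly interchangeable.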


Pages: 3225-3233

Number of pages: 9
