Towards the use of entropy as a measure for the reliability of automatic MT evaluation metrics

Cited by: 10
Authors
Munk, Michal [1]
Munkova, Dasa [2]
Benko, Lubomir [3]
Affiliations
[1] Constantine Philosopher Univ Nitra, Dept Informat, Nitra, Slovakia
[2] Constantine Philosopher Univ Nitra, Dept Translat Studies, Nitra, Slovakia
[3] Univ Pardubice, Inst Syst Engn & Informat, Studentska 95, Pardubice 53210, Czech Republic
Keywords
Entropy; machine translation; reliability estimation; quality; automatic MT evaluation
DOI
10.3233/JIFS-169505
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The study describes an experiment with different estimations of reliability. Reliability reflects the technical quality of a measurement procedure, such as the automatic evaluation of machine translation (MT). It indicates how accurately a quantity is measured; in our case, that quantity is the accuracy and error rate of MT output as measured by automatic metrics (precision, recall, F-measure, Bleu-n, WER, PER, and CDER). Using entropy, the experiment identified the metrics (Bleu-4 and WER) that reduce the overall reliability of the automatic evaluation of accuracy and error rate. Based on the results, we can say that using entropy to estimate reliability gives more accurate results than conventional reliability estimates (Cronbach's alpha and correlation). Applying entropy to MT evaluation based on n-grams or edit distance could offer a new view of lexicon-based metrics in comparison with the commonly used approaches.
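The abstract contrasts entropy with conventional reliability estimates but does not give the authors' entropy formula. The following is only a minimal sketch, under stated assumptions, of the conventional side: Cronbach's alpha over a table of segment-level metric scores, plus "alpha if item deleted" to flag a metric that lowers overall consistency. A plain Shannon entropy of binned scores is included purely for illustration; it is an assumption, not the authors' entropy-based estimate. The function names, the four-bin discretization, and the requirement that error rates (WER, PER, CDER) be converted to "higher is better" (e.g. 1 - WER) are all hypothetical choices.

```python
import math
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows = segments, columns = metrics ("items").

    Assumption: every metric points the same way (higher = better), so
    error rates such as WER or PER are converted first, e.g. to 1 - WER.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two metrics")
    columns = list(zip(*scores))
    item_var = sum(pvariance(col) for col in columns)      # sum of per-metric variances
    total_var = pvariance([sum(row) for row in scores])    # variance of row totals
    return k / (k - 1) * (1 - item_var / total_var)


def alpha_if_deleted(scores):
    """Alpha recomputed with each metric left out in turn (needs >= 3 metrics)."""
    k = len(scores[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in scores]) for j in range(k)]


def shannon_entropy(values, bins=4):
    """Shannon entropy in bits of scores in [0, 1] after equal-width binning.

    Illustrative only (hypothetical binning); not the authors' formula.
    """
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1  # a score of 1.0 goes into the last bin
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


# Hypothetical scores: metrics 0 and 1 agree closely, metric 2 is noisy.
scores = [
    [0.2, 0.25, 0.9],
    [0.5, 0.55, 0.1],
    [0.8, 0.85, 0.5],
]
print(cronbach_alpha(scores))
print(alpha_if_deleted(scores))
print([shannon_entropy(col) for col in zip(*scores)])
```

In this sketch, a metric whose removal raises alpha is flagged as lowering overall consistency (here, metric 2). The abstract reports a finding of this kind for Bleu-4 and WER, although it was obtained with the authors' entropy-based estimate, not with this code.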
Pages: 3225-3233
Number of pages: 9